A comprehensive set of definitions and terms used when discussing face recognition technology.
This page will be a continuously evolving reference for the basic terminology used when evaluating, integrating, and operating face recognition algorithms. Please let us know if there are any definitions or descriptions you would like added!
Accuracy – the rate at which the system makes a correct prediction regarding a person’s identity. Accuracy ranges from 0.0 to 1.0; it may also be expressed as a percentage, in which case it ranges from 0.0% to 100.0%. Accuracy = 1.0 – Error.
Error – the rate at which the system makes an incorrect prediction regarding a person’s identity. Error ranges from 0.0 to 1.0; it may also be expressed as a percentage, in which case it ranges from 0.0% to 100.0%. Error = 1.0 – Accuracy.
Type I error / false match / false positive / false acceptance – when two different persons are incorrectly determined to be the same person because a comparison of their face templates exceeds the specified similarity threshold.
False Match Rate (FMR) / False Accept Rate (FAR) – the fraction of comparisons between different persons that are false matches, often expressed as a percentage.
Type II error / false non-match / false negative / false rejection – when two instances of the same person are incorrectly determined to be different persons because a comparison of their templates falls below the specified similarity threshold.
False Non-Match Rate (FNMR) / False Reject Rate (FRR) / 1.0 – True Accept Rate (TAR) – the fraction of comparisons between two instances of the same person that are false non-matches, often expressed as a percentage.
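As a minimal sketch of how the two error rates above relate to a threshold, the following computes FMR and FNMR from two lists of similarity scores. The function name, the example scores, and the choice to treat a score equal to the threshold as a match are all assumptions made for illustration.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FMR, FNMR) at a single similarity threshold.

    genuine_scores: scores from comparisons of the same person.
    impostor_scores: scores from comparisons of different persons.
    Assumption: a score greater than or equal to the threshold is a match.
    """
    false_matches = sum(1 for s in impostor_scores if s >= threshold)
    false_non_matches = sum(1 for s in genuine_scores if s < threshold)
    fmr = false_matches / len(impostor_scores)
    fnmr = false_non_matches / len(genuine_scores)
    return fmr, fnmr


# Made-up scores, for illustration only.
genuine = [0.91, 0.84, 0.77, 0.42]
impostor = [0.10, 0.35, 0.58, 0.22, 0.05]

fmr, fnmr = error_rates(genuine, impostor, threshold=0.5)
print(f"FMR={fmr:.2f} FNMR={fnmr:.2f} TAR={1.0 - fnmr:.2f}")
# prints: FMR=0.20 FNMR=0.25 TAR=0.75
```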
Receiver Operating Characteristic (ROC) curve – measures the tradeoff between false matches and false non-matches on a dataset of face images. The curve is generated by systematically adjusting the match threshold, and for each different threshold measuring the FAR and TAR. As the threshold increases both the FAR and TAR will decrease.
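The threshold sweep described for the ROC curve can be sketched as follows. This is an illustrative example rather than a reference implementation: the function name, the scores, and the threshold grid are hypothetical, and a score equal to the threshold is assumed to count as a match.

```python
def roc_points(genuine_scores, impostor_scores, thresholds):
    """Return one (threshold, FAR, TAR) tuple per threshold."""
    points = []
    for t in thresholds:
        # FAR: fraction of different-person comparisons accepted at t.
        far = sum(1 for s in impostor_scores if s >= t) / len(impostor_scores)
        # TAR: fraction of same-person comparisons accepted at t.
        tar = sum(1 for s in genuine_scores if s >= t) / len(genuine_scores)
        points.append((t, far, tar))
    return points


# Made-up scores, for illustration only.
genuine = [0.91, 0.84, 0.77, 0.42]
impostor = [0.10, 0.35, 0.58, 0.22, 0.05]

for t, far, tar in roc_points(genuine, impostor, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(f"threshold={t:.2f} FAR={far:.2f} TAR={tar:.2f}")
```

In the printed rows, neither FAR nor TAR ever rises as the threshold goes up, which is the behavior the definition describes.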
Detection Error Tradeoff (DET) curve, also called a Decision Error Tradeoff curve – similar to the ROC curve, measures the tradeoff between false matches and false non-matches. The difference between a DET curve and a ROC curve is that a DET curve plots FAR versus FRR, each typically on a logarithmic axis. Thus, the information reported is the same, but the presentation style is different.
Cumulative Match Characteristic (CMC) curve – measures how often the person in a probe image is matched to their own identity when searched against a gallery. The x-axis of the plot contains the rank. The frequency plotted at rank 1 is the percentage of times the top match in the gallery is the same person. The frequency plotted at rank 2 is the percentage of times that at least one of the top two matches in the gallery is the same person. The frequency plotted at rank 3 is the percentage of times that at least one of the top three matches in the gallery is the same person, and so on for higher ranks.
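A CMC curve can be computed from the rank at which each probe's true identity was returned. The sketch below assumes those ranks are already known; the function name and the example ranks are hypothetical, and None marks a probe whose true identity was not returned at all.

```python
def cmc_curve(ranks_of_true_match, max_rank):
    """Return the cumulative match rate at ranks 1..max_rank.

    ranks_of_true_match: for each probe, the 1-based rank of the correct
    gallery identity, or None if it did not appear in the results.
    """
    total = len(ranks_of_true_match)
    curve = []
    for k in range(1, max_rank + 1):
        # Count probes whose true match is at rank k or better.
        hits = sum(1 for r in ranks_of_true_match if r is not None and r <= k)
        curve.append(hits / total)
    return curve


# Made-up search results for eight probes, for illustration only.
ranks = [1, 1, 2, 5, None, 1, 3, 2]
print(cmc_curve(ranks, max_rank=5))
# prints: [0.375, 0.625, 0.75, 0.75, 0.875]
```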
System – software and hardware configured to perform a particular task(s). A system can be operated by a person(s) or another system.
Software – a series of instructions that are performed by a computer.
Hardware – physical devices, which may include a central processing unit (CPU), memory (e.g., RAM), storage, touchscreen, camera, etc.
Native software – byte-level machine code that is executed directly by a central processing unit (CPU). Native software is dependent on the software platform it was compiled for.
Software Platform – the CPU architecture and operating system used for running software. E.g., Ubuntu Linux 16.04 running on an x64 CPU.
Software Development Kit (SDK) – provides software libraries that perform specific functions, such as face detection and recognition, and are accessed through APIs and command line interfaces. An SDK typically has little off-the-shelf utility, and it must instead be embedded into a system. SDKs are a critical component of most software systems.
In terms of face recognition, many developers of face recognition systems license SDKs from third parties. There are also larger companies that both have their own software development kit and develop systems around it.
An effective SDK will require little to no installation, provide an intuitive, documented API, and support a variety of software platforms.
Software Application – executable software designed for end-user interaction, such as a Graphical User Interface (GUI) or a command line interface (CLI).
End-user – a person who interacts with a software application or a system.
Software Library – a collection of software functions that are called by other software libraries or applications.
Software function – a set of computer instructions that are called and run based on provided input and output parameters. Input and output parameters are defined by the function.
Application Programming Interface (API) – the set of functions accessible to a developer. An API is written in a specific programming language (e.g., C, C++, Java, Python, Go).
Native API – an API that accesses functions in native software running on the same machine that calls the API. With respect to face recognition, an SDK with a native API gives a system developer the most control over how their data is handled: the SDK should only perform the actions stated in its documentation, and the developer controls any data storage or transmission. Some SDKs may still perform unwanted data transmission or storage, but this is not difficult to identify during security testing.
Web API – an API that allows functions to be called between machines, one being the client machine that makes the web API call and the other being the host machine that receives and processes the API call. With respect to face recognition, a web API means the client machine will be required to send images and data to the host machine (e.g., a cloud server). The client generally cannot verify whether the host machine is storing the images and data longer than necessary.
Face Recognition API Concepts
Enrollment – the process of receiving an image or video frame, detecting all faces present, and outputting a template for each detected face.
Template – the numerical encoding of a face in an image.
Template comparison – the process of measuring the facial similarity between two templates.
Facial similarity – the similarity measured during the template comparison process. While the similarity will be a numerical value, and often ranges from 0 to 1, no assumptions can be made about the meaning of a given similarity score for an algorithm without knowledge of the underlying distribution, which will be different for every vendor.
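To make template comparison concrete, the sketch below treats a template as a short list of floating-point values and uses cosine similarity as the comparison function. Both choices are assumptions for illustration: template formats and similarity measures are vendor-specific, and the example templates are invented.

```python
import math


def cosine_similarity(template_a, template_b):
    """Cosine of the angle between two templates, in the range [-1, 1]."""
    if len(template_a) != len(template_b):
        raise ValueError("templates must have the same length")
    dot = sum(x * y for x, y in zip(template_a, template_b))
    norm_a = math.sqrt(sum(x * x for x in template_a))
    norm_b = math.sqrt(sum(y * y for y in template_b))
    return dot / (norm_a * norm_b)


# Invented 4-value templates; real templates are typically much longer.
person_1_image_1 = [0.12, -0.40, 0.88, 0.05]
person_1_image_2 = [0.10, -0.35, 0.90, 0.00]
person_2_image_1 = [-0.70, 0.20, -0.10, 0.65]

print(cosine_similarity(person_1_image_1, person_1_image_2))
print(cosine_similarity(person_1_image_1, person_2_image_1))
```

The first score is higher than the second, but as noted above, the raw values carry no meaning on their own without knowledge of the algorithm's score distribution.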
Similarity thresholding – the process of converting a numerical similarity score measured between two face templates into a match or no-match determination. This typically involves a single static similarity threshold, such that any similarity score lower than the threshold is determined to be a no-match, and any similarity score greater than the threshold is determined to be a match.
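A single static threshold can be applied as in the short sketch below. The threshold value and scores are arbitrary, and treating a score exactly equal to the threshold as a match is an assumption, since that boundary case depends on the system.

```python
def decide(similarity, threshold):
    """Convert a similarity score into a match or no-match determination.

    Assumption: a score equal to the threshold counts as a match.
    """
    return "match" if similarity >= threshold else "no-match"


THRESHOLD = 0.62  # arbitrary value, for illustration only
scores = [0.31, 0.62, 0.97]
print([decide(s, THRESHOLD) for s in scores])
# prints: ['no-match', 'match', 'match']
```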
Probe / Query – a template submitted for search against a gallery.
Gallery / Database – a collection of templates to be searched against.
Candidate match list – an ordered list of the top matching templates in a gallery to a submitted probe image. Templates are typically returned in decreasing similarity. Typically a trained facial examiner will make the final determination as to whether any of the images in the candidate match list are the same person as the probe image.
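Building a candidate match list amounts to keeping the top-scoring gallery templates in decreasing order of similarity. The sketch below assumes the probe has already been compared against every gallery template; the function name, gallery IDs, and scores are hypothetical.

```python
import heapq


def candidate_list(probe_scores, k):
    """Return up to k (gallery_id, score) pairs, highest similarity first.

    probe_scores: dict mapping a gallery template ID to its similarity
    with the probe template.
    """
    return heapq.nlargest(k, probe_scores.items(), key=lambda item: item[1])


# Made-up similarity scores for a five-template gallery.
scores = {"g001": 0.41, "g002": 0.93, "g003": 0.12, "g004": 0.77, "g005": 0.58}
print(candidate_list(scores, k=3))
# prints: [('g002', 0.93), ('g004', 0.77), ('g005', 0.58)]
```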
1:N search / human-guided search – the process of manually submitting a probe image to be searched against a gallery, receiving the candidate match list, and determining if a match exists. “1” refers to the single probe image, and “N” is an integer that represents the number of templates in the gallery.
1:(N+1) search / watch-list identification – the process of automatically searching a probe image against a gallery. As opposed to 1:N search, which returns a candidate list to a human examiner, watch-list identification instead sends match alerts if any of the gallery templates exceed a similarity threshold when compared against the probe template. Thus, the “+1” refers to the null hypothesis case of the person not being in the watch-list gallery. In this searching paradigm a human is only alerted when a match occurs, as opposed to a human always reviewing the search results when a probe is compared against a gallery.
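The difference from 1:N search can be sketched as follows: a watch-list search returns alerts only for gallery templates that pass the threshold, and an empty result corresponds to the “+1” case. The function name, IDs, scores, and threshold are hypothetical, and a score equal to the threshold is assumed to trigger an alert.

```python
def watchlist_alerts(probe_scores, threshold):
    """Return (gallery_id, score) pairs that pass the threshold, best first.

    An empty list means no alert is raised, i.e. the probe is treated as
    not being in the watch-list gallery.
    """
    hits = [(gid, s) for gid, s in probe_scores.items() if s >= threshold]
    hits.sort(key=lambda item: item[1], reverse=True)
    return hits


THRESHOLD = 0.80  # arbitrary value, for illustration only

person_on_list = {"w01": 0.22, "w02": 0.88, "w03": 0.35}
person_not_on_list = {"w01": 0.18, "w02": 0.30, "w03": 0.27}

print(watchlist_alerts(person_on_list, THRESHOLD))      # [('w02', 0.88)]
print(watchlist_alerts(person_not_on_list, THRESHOLD))  # []
```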
1:1 / identity verification – the process of comparing two face templates and determining if they are a match using similarity thresholding.
Interocular distance (IOD) / Inter-pupillary distance (IPD) – the number of pixels (Euclidean distance) between the centers of the two eyes. It is common for a face recognition algorithm’s sensitivity to image resolution to be measured as accuracy versus IPD.
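Given eye-center coordinates in pixels, the distance is a direct Euclidean computation. In the sketch below the function name and the coordinates are made up; in practice the eye locations would come from a landmark detector.

```python
import math


def interocular_distance(left_eye, right_eye):
    """Euclidean distance in pixels between two (x, y) eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.hypot(dx, dy)


# Hypothetical eye centers in image pixel coordinates.
print(interocular_distance((120, 200), (168, 214)))
# prints: 50.0
```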
Minimum bounding box size – the smallest size face that will be searched for in an image. This is typically a single number, measured in pixels, which specifies the height and width of the square face bounding box. As the minimum bounding box size is set smaller, many more candidate face regions will be considered, which will slow down the enrollment speed and increase the chances of a false positive face detection.
Enrollment speed – the amount of time it takes to detect and templatize all faces in an image. Enrollment speed is typically measured on a single processing core, and will be dependent on the speed of the processor, the number of faces in the image, and the resolution of the image.
Comparison speed – the amount of time it takes to compare two templates and generate a similarity score.
Template size – the number of bytes required to represent a face. When performing 1:N search it is generally required to cache all N templates into the computer’s memory to provide quick responses. Thus, the amount of RAM required to load N templates will be N times the template size. In addition to requiring less memory, smaller templates enable larger galleries to be searched more quickly.
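The memory estimate above is a simple multiplication, worked through below for two template sizes. The sizes and the gallery size are hypothetical and do not describe any particular algorithm; any per-template overhead beyond the raw bytes is ignored.

```python
def gallery_memory_bytes(num_templates, template_size_bytes):
    """RAM needed to hold N templates, ignoring any per-template overhead."""
    return num_templates * template_size_bytes


GALLERY_SIZE = 10_000_000  # hypothetical N

for template_size in (512, 2048):  # hypothetical template sizes in bytes
    total = gallery_memory_bytes(GALLERY_SIZE, template_size)
    print(f"{template_size} B templates: {total / 1e9:.2f} GB")
# prints:
# 512 B templates: 5.12 GB
# 2048 B templates: 20.48 GB
```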
Facial pose angle – the orientation of a face relative to a camera, measured as Yaw, Pitch, and Roll.
Yaw angle – rotation of the face about the Y-axis of the camera plane. E.g., when a person turns their face to the left or the right relative to the camera.
Pitch angle – rotation of the face about the X-axis of the camera plane. E.g., when a person tilts their face up or down relative to the camera.
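Roll angle – rotation of the face about the Z-axis of the camera plane. E.g., when a person tilts their head toward their left or right shoulder while still facing the camera.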
Occlusion – when regions of the face are covered. E.g., due to sunglasses, scarf, hair, or capture conditions.