A comprehensive set of definitions and terms used when discussing face recognition technology.
This page will be a continuously evolving reference for the basic terminology used when evaluating, integrating, and operating face recognition algorithms. Please let us know if there are any definitions or descriptions you would like added!
Accuracy – the rate at which the system makes a correct prediction regarding a person’s identity. Accuracy ranges from 0.0 to 1.0, though it may also be expressed as a percentage, in which case it ranges from 0.0% to 100.0%. Accuracy = 1.0 – Error.
Error – the rate at which the system makes an incorrect prediction regarding a person’s identity. Error ranges from 0.0 to 1.0, though it may also be expressed as a percentage, in which case it ranges from 0.0% to 100.0%. Error = 1.0 – Accuracy.
Type I error / false match / false positive / false acceptance – when two different persons are incorrectly determined to be the same person because a comparison of their face templates exceeds the specified similarity threshold.
Type II error / false non-match / false negative / false rejection – when two instances of the same person are incorrectly determined to be different persons because a comparison of their templates falls below the specified similarity threshold.
False Non-Match Rate (FNMR) / False Reject Rate (FRR) / 1.0 – True Accept Rate (TAR) – the frequency / percentage of comparisons that are false non-matches.
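As a minimal sketch of how these rates are computed, the Python snippet below counts false non-matches and false matches at a single threshold. The function name, the sample scores, and the choice to treat a score equal to the threshold as a match are illustrative assumptions, not part of any particular vendor's API.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FNMR, false match rate) at the given threshold.

    genuine_scores: similarity scores from same-person comparisons.
    impostor_scores: similarity scores from different-person comparisons.
    Assumption: a score equal to the threshold counts as a match.
    """
    # A genuine comparison that falls below the threshold is a false non-match.
    false_non_matches = sum(1 for s in genuine_scores if s < threshold)
    # An impostor comparison at or above the threshold is a false match.
    false_matches = sum(1 for s in impostor_scores if s >= threshold)
    fnmr = false_non_matches / len(genuine_scores)
    fmr = false_matches / len(impostor_scores)
    return fnmr, fmr


# Made-up scores, for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.35, 0.62, 0.05, 0.20]

fnmr, fmr = error_rates(genuine, impostor, threshold=0.5)
tar = 1.0 - fnmr  # True Accept Rate, as defined above
print(f"FNMR={fnmr:.2f} TAR={tar:.2f} false match rate={fmr:.2f}")
```

One of the four genuine comparisons falls below 0.5, so the FNMR here is 0.25 and the TAR is 0.75.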
Receiver Operating Characteristic (ROC) curve – measures the tradeoff between false matches and false non-matches on a dataset of face images. The curve is generated by systematically adjusting the match threshold, and for each different threshold measuring the FAR and TAR. As the threshold increases both the FAR and TAR will decrease.
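The threshold sweep described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name and the sample scores are hypothetical, and a score equal to the threshold is treated as a match.

```python
def roc_points(genuine_scores, impostor_scores, thresholds):
    """Return a list of (threshold, FAR, TAR) tuples, one per threshold."""
    points = []
    for t in thresholds:
        # TAR: fraction of same-person comparisons accepted at this threshold.
        tar = sum(1 for s in genuine_scores if s >= t) / len(genuine_scores)
        # FAR: fraction of different-person comparisons accepted at this threshold.
        far = sum(1 for s in impostor_scores if s >= t) / len(impostor_scores)
        points.append((t, far, tar))
    return points


# Made-up scores, for illustration only.
genuine = [0.9, 0.8, 0.6, 0.3]
impostor = [0.1, 0.2, 0.4, 0.7]

points = roc_points(genuine, impostor, thresholds=[0.25, 0.5, 0.75])
for t, far, tar in points:
    print(f"threshold={t:.2f} FAR={far:.2f} TAR={tar:.2f}")
```

On this toy data both FAR and TAR fall as the threshold rises, which is the behavior the definition describes.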
Detection Error Tradeoff (DET) curve – like the ROC curve, measures the tradeoff between false matches and false non-matches. The difference is that a DET curve plots FAR versus FRR, each typically on a logarithmic axis. Thus, the information reported is the same, but the presentation style is different.
Cumulative Match Characteristic (CMC) curve – measures how often the person in a probe image is matched to their own identity when searched against a gallery. The x-axis of the plot contains the rank. The frequency plotted at rank 1 is the percentage of times the top match in the gallery is the same person. The frequency plotted at rank 2 is the percentage of times that at least one of the top two matches in the gallery is the same person. The frequency plotted at rank 3 is the percentage of times that at least one of the top three matches in the gallery is the same person. Etc.
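A CMC curve can be computed from the rank at which each probe's true identity appeared in its search results. The sketch below assumes those ranks have already been collected; the function name and the sample ranks are hypothetical.

```python
def cmc_curve(true_match_ranks, max_rank):
    """Return the CMC value at ranks 1..max_rank.

    true_match_ranks: for each probe, the 1-based rank at which the correct
    identity appeared in the results, or None if it was not returned at all.
    """
    total = len(true_match_ranks)
    curve = []
    for k in range(1, max_rank + 1):
        # A probe counts as a hit at rank k if its true match is in the top k.
        hits = sum(1 for r in true_match_ranks if r is not None and r <= k)
        curve.append(hits / total)
    return curve


# Made-up ranks for eight probe searches, for illustration only.
ranks = [1, 1, 3, 2, None, 1, 5, 1]
curve = cmc_curve(ranks, max_rank=5)
for rank, rate in enumerate(curve, start=1):
    print(f"rank {rank}: {rate:.3f}")
```

Because each rank includes the hits from all lower ranks, the curve never decreases as the rank grows.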
Face Recognition API Concepts
Enrollment – the process of receiving an image or video frame, detecting all faces present, and outputting a template for each detected face.
Template – the numerical encoding of a face in an image.
Template comparison – the process of measuring the facial similarity between two templates.
Facial similarity – the similarity measured during the template comparison process. While the similarity will be a numerical value, and often ranges from 0 to 1, no assumptions can be made about the meaning of a given similarity score for an algorithm without knowledge of the underlying distribution, which will be different for every vendor.
Similarity thresholding – the process of converting a numerical similarity score measured between two face templates into a match or no-match determination. This typically involves a single static similarity threshold, such that any similarity score lower than the threshold is determined to be a no-match, and any similarity score greater than the threshold is determined to be a match.
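To make template comparison and similarity thresholding concrete, here is a minimal Python sketch. It assumes templates are plain lists of floats and uses cosine similarity as the comparison function; real algorithms use vendor-specific template formats and similarity measures, so both choices are assumptions made for illustration. A score equal to the threshold is treated as a match here, which is also an assumption.

```python
import math


def cosine_similarity(template_a, template_b):
    """Hypothetical comparison function: cosine of the angle between two templates."""
    dot = sum(x * y for x, y in zip(template_a, template_b))
    norm_a = math.sqrt(sum(x * x for x in template_a))
    norm_b = math.sqrt(sum(y * y for y in template_b))
    return dot / (norm_a * norm_b)


def is_match(score, threshold):
    """Convert a similarity score into a match / no-match determination."""
    return score >= threshold


# Toy three-dimensional templates, for illustration only.
template_a = [1.0, 0.0, 0.0]
template_b = [0.8, 0.6, 0.0]
template_c = [0.0, 1.0, 0.0]

score_ab = cosine_similarity(template_a, template_b)
score_ac = cosine_similarity(template_a, template_c)
print(is_match(score_ab, threshold=0.7), is_match(score_ac, threshold=0.7))
```

The threshold of 0.7 is arbitrary. As noted under facial similarity, a sensible threshold depends on the score distribution of the specific algorithm.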
Probe / Query – a template submitted for search against a gallery.
Gallery / Database – a collection of templates to be searched against.
Candidate match list – an ordered list of the top matching templates in a gallery to a submitted probe image. Templates are typically returned in decreasing similarity. Typically a trained facial examiner will make the final determination as to whether any of the images in the candidate match list are the same person as the probe image.
1:N search / human-guided search – the process of manually submitting a probe image to be searched against a gallery, receiving the candidate match list, and determining if a match exists. “1” refers to the single probe image, and “N” is an integer that represents the number of templates in the gallery.
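The ranking step of a 1:N search can be sketched as follows. The snippet assumes the probe has already been compared against every gallery template, so it starts from a dictionary of similarity scores; the function name, gallery identifiers, and scores are hypothetical.

```python
def candidate_match_list(gallery_scores, top_k):
    """Return the top_k (gallery_id, score) pairs in decreasing similarity.

    gallery_scores: mapping from gallery template id to its similarity
    score against the probe.
    """
    ranked = sorted(gallery_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Made-up scores for a four-template gallery, for illustration only.
scores = {"id_17": 0.42, "id_03": 0.91, "id_28": 0.66, "id_09": 0.12}
candidates = candidate_match_list(scores, top_k=3)
for gallery_id, score in candidates:
    print(gallery_id, score)
```

No threshold is applied here: the list is always returned, and the examiner reviewing it makes the final determination.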
1:(N+1) search / watch-list identification – the process of automatically searching a probe image against a gallery. As opposed to 1:N search, which returns a candidate list to a human examiner, watch-list identification sends match alerts if any of the gallery templates exceed a similarity threshold when compared against the probe template. Thus, the “+1” refers to the null hypothesis case of the person not being in the watch-list gallery. In this searching paradigm a human is only alerted when a match occurs, as opposed to a human always reviewing the search results when a probe is compared against a gallery.
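The difference from 1:N search can be sketched in Python: instead of always returning a ranked list, the function below returns alerts only for gallery templates that reach the threshold, and an empty list in the “+1” case. The function name, identifiers, scores, and the decision to treat a score equal to the threshold as a match are assumptions made for illustration.

```python
def watchlist_alerts(gallery_scores, threshold):
    """Return (gallery_id, score) alerts for scores at or above the threshold.

    An empty list corresponds to the "+1" outcome: the person in the probe
    is judged not to be on the watch-list, so nobody is alerted.
    """
    alerts = [(gid, s) for gid, s in gallery_scores.items() if s >= threshold]
    # Strongest matches first, so an operator sees the most likely identity first.
    alerts.sort(key=lambda item: item[1], reverse=True)
    return alerts


# Made-up scores for a three-template watch-list, for illustration only.
scores = {"wl_01": 0.30, "wl_02": 0.82, "wl_03": 0.75}

alerts = watchlist_alerts(scores, threshold=0.7)
no_alerts = watchlist_alerts(scores, threshold=0.9)
print(alerts, no_alerts)
```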
1:1 / identity verification – the process of comparing two face templates and determining if they are a match using similarity thresholding.
N:N / facial clustering – facial clustering is the process of taking a set of N face images and grouping them into their different identities. This process involves, either explicitly or implicitly, measuring the facial similarity between all N face images, which means N*(N-1)/2 total facial comparisons. Depending on the efficiency of an algorithm, this process can be extremely slow and resource intensive if N is large.
There are many different applications for facial clustering. One of the most common use cases is in child exploitation investigations, where investigators serving warrants often end up with large amounts of digital evidence (images and videos) containing children (victims) and adults (perpetrators). Using facial clustering, the investigators can ingest these images and determine how many different persons are in the dataset. In turn, the images from these different identities can be passed into a 1:N search system to determine the identity of each person, and either help rescue them (victims) or proceed with the criminal investigation (perpetrators).
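To see how quickly the N*(N-1)/2 comparison count from the facial clustering definition grows, the Python sketch below evaluates the formula and checks it against an explicit enumeration of pairs for a small N. The function name is hypothetical.

```python
from itertools import combinations


def pairwise_comparisons(n):
    """Number of distinct pairs among n face images: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Cross-check the formula by enumerating every unordered pair for a small set.
small_n = 5
enumerated_pairs = list(combinations(range(small_n), 2))
print(len(enumerated_pairs), pairwise_comparisons(small_n))

# The count grows quadratically with the number of images.
for n in (1_000, 100_000, 1_000_000):
    print(f"N={n:>9,}: {pairwise_comparisons(n):,} comparisons")
```

Going from one thousand to one million images multiplies the number of comparisons by roughly a million, which is why clustering large datasets is resource intensive.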
Interocular distance (IOD) / Inter-pupillary distance (IPD) – the number of pixels (Euclidean distance) between the centers of the two eye sockets. It is common for a face recognition algorithm’s sensitivity to image resolution to be measured as accuracy versus IPD.
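Given the pixel coordinates of the two eye centers, the distance is a straightforward Euclidean computation. The sketch below assumes (x, y) pixel coordinates; the function name and the sample coordinates are hypothetical.

```python
import math


def interocular_distance(left_eye, right_eye):
    """Euclidean distance in pixels between two (x, y) eye-center coordinates."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.hypot(dx, dy)


# Made-up eye-center coordinates, for illustration only.
iod = interocular_distance(left_eye=(100, 120), right_eye=(160, 200))
print(iod)
```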
Minimum bounding box size – the smallest size face that will be searched for in an image. This is typically a single number, measured in pixels, which specifies the height and width of the square face bounding box. As the minimum bounding box size is set smaller, substantially more face regions will be considered, which will slow down the enrollment speed and increase the chances of a false positive face detection.
System – software and hardware configured to perform a particular task(s). A system can be operated by a person(s) or another system.
Software – a series of instructions that are performed by a computer.
Hardware – physical devices, which may include a central processing unit (CPU), memory (e.g., RAM), storage, touchscreen, camera, etc.
Native software – byte-level machine code that is executed directly by a central processing unit (CPU). Native software is dependent on the software platform it was compiled for.
Software Platform – the CPU architecture and operating system used for running software. E.g., Ubuntu Linux 16.04 running on an x64 CPU.
Software Development Kit (SDK) – provides software libraries that perform specific functions, such as face detection and recognition, and are accessed through APIs and command line interfaces. An SDK typically has little off-the-shelf utility, and it must instead be embedded into a system. An SDK is a critical component that powers nearly every system that exists.
In terms of face recognition, many developers of face recognition systems license SDKs from third parties. There are also larger companies that both have their own software development kit and develop systems around it.
An effective SDK will require little to no installation, provide an intuitive documented API, and support a variety of software platforms.
Software Application – executable software designed for end-user interaction, such as a Graphical User Interface (GUI) or a command line interface (CLI).
End-user – a person who interacts with a software application or a system.
Software Library – a collection of software functions that are called by other software libraries or applications.
Software function – a set of computer instructions that are called and run based on provided input and output parameters. Input and output parameters are defined by the function.
Application Programming Interface (API) – the set of functions accessible to a developer. An API is written in a specific software language (e.g., C, C++, Java, Python, Go).
Native API – an API that accesses functions in native software that is running on the same machine that calls the API. With respect to face recognition, an SDK with a native API provides a system developer the most control over how their data is handled, as it should be the case that the SDK only performs the actions stated in the documentation, and the developer controls any data storage or transmission. Some SDKs may still perform unwanted data transmission and storage, which is not difficult to identify during security testing.
Turn-key system – a system that needs no custom integration, modification, or complicated installation in order to work. It is simply a matter of “turning the key” and using the system.
One-off system – a system that is custom developed for use in only a single deployment. Typically a one-off system is derived from an existing system.
Web API – an API that allows functions to be called between machines, one being the client machine that makes the web API call and the other being the host machine that receives and processes the API call. With respect to face recognition, a web API means the client machine will be required to send images and data to the host machine (e.g., a cloud server). It is not possible to know if the host machine is storing the images and data longer than necessary.
Frames Per Second (FPS) – the number of frames / images sampled from a video in a second. The frames are sampled in a uniform / even manner. Standard cameras record video at up to 30 FPS. Face recognition systems typically do not benefit from sampling frames at a rate higher than 5 FPS.
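Uniform sampling from a higher-rate video can be sketched as keeping every k-th frame. The Python snippet below assumes frames are addressed by a 0-based index and rounds the sampling step to a whole number of frames, so non-integer ratios are only approximated; the function name is hypothetical.

```python
def sampled_frame_indices(total_frames, source_fps, target_fps):
    """Return the indices of frames to keep when downsampling a video uniformly.

    Assumption: the step between kept frames is rounded to a whole number,
    so the effective rate only approximates target_fps for non-integer ratios.
    """
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep every `step`-th frame; never sample faster than the source rate.
    step = max(1, round(source_fps / target_fps))
    return list(range(0, total_frames, step))


# One second of 30 FPS video sampled at 5 FPS keeps every sixth frame.
indices = sampled_frame_indices(total_frames=30, source_fps=30, target_fps=5)
print(indices)
```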
Enrollment speed – the amount of time it takes to detect and templatize all faces in an image. Enrollment speed is typically measured on a single processing core, and will be dependent on the speed of the processor, the number of faces in the image, and the resolution of the image.
Comparison speed – the amount of time it takes to compare two templates and generate a similarity score.
Template size – the number of bytes required to represent a face. When performing 1:N search it is generally required to cache all N templates into the computer’s memory to provide quick responses. Thus, the amount of RAM required to load N templates will be N times the template size. In addition to requiring less memory, smaller templates enable larger galleries to be searched more quickly.
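The memory estimate above is simple arithmetic, sketched below in Python. The gallery size and the 512-byte template size are hypothetical values chosen for illustration, and the estimate covers only the raw templates, not any index or metadata an application may keep alongside them.

```python
def gallery_ram_bytes(num_templates, template_size_bytes):
    """RAM needed to hold all templates in memory: N times the template size."""
    return num_templates * template_size_bytes


# Hypothetical gallery: ten million templates of 512 bytes each.
required_bytes = gallery_ram_bytes(num_templates=10_000_000, template_size_bytes=512)
required_gib = required_bytes / 2**30
print(f"{required_bytes:,} bytes (about {required_gib:.2f} GiB)")
```

Halving the template size halves this figure, which is one reason smaller templates make large galleries easier to host.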
Binary size – the amount of memory (RAM) used by face recognition software, which comprises the code libraries and statistical models. Embedded devices (e.g., mobile phones) have limited memory resources, and some face recognition algorithms require more memory than is available on the entire device (let alone the fact that other applications need memory as well).
Constrained – when aspects of the capture environment (camera configuration, illumination, background, etc.) can be controlled. Face recognition algorithms typically do better in constrained / controlled environments.
Unconstrained – when aspects of the capture environment (camera configuration, illumination, background, etc.) cannot be controlled. Face recognition algorithms typically have increased error rates in unconstrained / non-controlled environments. Until around 2015 almost no commercial face recognition algorithms supported unconstrained environments. Today, viable face recognition vendors are expected to handle off-angle face images, varying illumination, different camera types, and other variations.
Facial pose angle – the orientation of a face relative to a camera, measured as Yaw, Pitch, and Roll.
Yaw angle – rotation of the face about the Y-axis of the camera plane. E.g., when a person turns their face to the left or the right relative to the camera.
Pitch angle – rotation of the face about the X-axis of the camera plane. E.g., when a person tilts their face up or down relative to the camera.
Roll angle – rotation of the face about the Z-axis of the camera plane.
Occlusion – when regions of the face are covered. E.g., due to sunglasses, scarf, hair, or capture conditions.
Identity Deduplication – the process of cross-referencing an identity document applicant’s face image against existing images in an identity document database. This process is typically semi-automated, where a human investigator only intervenes if face images from two different identities generate a similarity score above the specified threshold, which is indicative of a fraudulent application. Identity deduplication is performed by agencies and organizations that grant identity documents, such as driver’s licenses and passports, and by financial institutions.
Forensic Search – when an analyst searches a gallery with a probe image in an attempt to determine the identity of the person in the probe image. Typically a forensic search system will provide the analyst with a list of the gallery images with the highest similarity scores to the probe image. In turn, the analyst will manually verify if any of the search results are the same person as the probe image.
Access Control – a 1:1 verification system where a user claims their identity (e.g., via a username, ID card, or badge) and presents their face to the system for access. These systems range from mobile device unlock to accessing a secure facility.
Real-time Screening – a 1:(N+1) watch-list identification system that analyzes a camera feed or streaming video to compare each detected facial identity against a gallery watch-list. While real-time screening shares similarities with identity deduplication, it is typically performed on much smaller galleries, and receives streaming video as input data.
Perpetual license – a grant to use software in a specified manner (e.g., on a specific machine, with certain usage parameters) in perpetuity (forever).
Maintenance fees – fees collected on a perpetual license to provide software updates and technical support. Typically these fees are collected on an annual basis and are a percentage of the original license cost.
Random Access Memory (RAM) – volatile memory that can be rapidly read by a central processing unit (CPU). The term volatile means that the contents of the memory will be erased if power is lost, which differs from persistent storage mediums such as hard-drives. The notion of random access, or uniform read and write times, means that data can be read from any physical location in memory in roughly the same amount of time, or written to any location in memory in roughly the same amount of time.
As it pertains to face recognition, search applications generally require that the templates are stored in RAM so they can be rapidly compared against a probe template. This means there will be a latency when a search application is first initialized while the templates are read from persistent storage into RAM, and that there needs to be enough RAM capacity available to hold all of the templates at the same time.
Persistent storage – non-volatile memory that is typically 20x to 1000x slower to read and write to than RAM, and can also have non-uniform read and write times. Examples of persistent storage include hard-drives, flash memory, cloud storage, and file servers (note that cloud storage and file servers are typically abstractions over configured hard-drives).
As it pertains to face recognition, templates, images, and videos need to be saved on a persistent storage medium. For verification applications, as they only need to reference a single template, a template can typically be read from the persistent storage when needed (as opposed to having all templates loaded into RAM first). For search applications, the templates need to be first read from persistent storage into RAM in order to enable searching the gallery in a reasonable period of time. Finally, any template generated by a face recognition application that is not saved to a persistent storage medium will eventually be lost.
I/O bound – when an algorithm or application has to wait longer to read or write data than it does to process the data. The “I” in I/O refers to Input, and the “O” refers to Output.
Compute bound – when an algorithm or application has to wait longer to process data than it does to read or write the data.