EP1828959A1 - Face recognition using features along iso-radius contours - Google Patents
Face recognition using features along iso-radius contours
- Publication number
- EP1828959A1
- Authority
- EP
- European Patent Office
- Prior art keywords
- interest
- point
- contour
- data
- irad
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
Definitions
- the present invention relates to the representation of 3D objects and is also concerned, although not exclusively, with the alignment, recognition and/or verification of 3D objects.
- Preferred embodiments of the invention are concerned with the representation of three-dimensional (3D) structural data and its associated registered two-dimensional (2D) colour-intensity image data.
- 3D structural data may be acquired from any 3D measurement system — for example, stereo camera systems, projected light ranging systems, shape from shading systems and so on.
- In order to recognise the shape of an object or verify that an object belongs to a particular class of objects, using 3D data and 3D data models, it is necessary to acquire 3D data using some form of 3D sensor and then compare the acquired data with examples of 3D data or models stored in a database.
- the way in which the 3D data is modelled and represented is crucial, since it dictates the manner in which the comparison between captured 3D data and pre-stored 3D data is made. Thus, ultimately, it has a large influence on the performance of the system, for example, in terms of the system's overall recognition performance.
- Typical approaches for matching 3D objects first attempt to align the object to a standard orientation, consistent with the orientation of the models stored in a database. There are many ways of encoding 3D structure in the public domain.
- a 3D curve C is defined by intersecting a sphere of radius r, centred on a feature point-of-interest, with the captured 3D facial surface.
- a best-fit plane is then fitted to this curve, although the curve is not likely to be planar and in some cases could be highly non-planar.
- a second plane is then defined by translating the extracted plane such that it contains the point-of-interest.
- the orthogonal projection of the curve C onto this plane is then denoted by the planar curve C′, and the orthogonal projection distance of C onto its corresponding curve C′ forms a (signed) distance profile, which is sampled at regular angles through the full range of 360 degrees.
- the best fit plane will also be sensitive to changes in facial structure, such as those caused by changes in expression. Hence, any local change in surface structure will affect the whole representation of the surface around the point-of-interest.
- Preferred embodiments of the present invention aim to provide a method that maintains a consistent signal for all rigid sections of the surface, regardless of any structural changes in other sections. For example, the part of a contour passing through the rigid forehead is not affected by the same contour passing through the malleable mouth area.
- a method of representing 3D object data comprising the steps of determining a point-of-interest in a predetermined position relative to the object, and generating for that point-of-interest a set of multiple isoradius surface contours, each of which comprises a locus of a set of points on the surface of the object that are at a constant predetermined distance from the point-of-interest, which distance is different for each contour of the set.
- Said point-of-interest may be located in or on said object.
- Said point-of-interest may be located on a surface of said object.
- a method as above may include the step of determining one or more property of the object at points along each said contour.
- Said one or more property may include at least one of: curvature of the respective contour; object surface orientation; local gradient of object surface orientation along the respective contour; and object surface curvature along the respective contour.
- Said one or more property may include colour and/or colour-intensity.
- a plurality of properties of the object are determined at points along each said contour, and a plurality of aligned 1D signals are derived therefrom.
- said object is a human face.
- said point-of-interest is a nose-tip.
- a method according to any of the preceding aspects of the invention may include the further step of comparing the represented 3D data with stored data to recognise or verify the object.
- Such a method may further include at least one of the following steps: a. size-shape vector prefiltering;
- a method according to any of the preceding aspects of the invention may comprise the steps of determining a plurality of points-of-interest as aforesaid, and generating for each point-of-interest a set of multiple isoradius surface contours as aforesaid.
- the invention also extends to apparatus for representing 3D data, the apparatus being adapted to carry out a method according to any of the preceding aspects of the invention.
- the data would be in the form of a set of 3D point coordinates. This is often called a 'point-cloud' representation.
- this point cloud is converted into a mesh representation, where each point is connected to several neighbours, creating a mesh of triangular planar facets. If a corresponding standard colour-intensity image can be acquired at the same time as the 3D model, this colour-intensity information can be aligned to the captured 3D structure and texture-mapped to the surface of the model using the known facets in the mesh representation. In this way the 2D image is said to be registered to the 3D data.
- Example embodiments of the invention can be divided into two parts, namely:
- a "point-of-interest" (POI) is extracted.
- the first stage in extracting the representation is to locate one or more interest points on the captured 3D object surface.
- a crucial point is that the 3D position of such interest points should be detected reliably, such that there is high repeatability (low variance in 3D position).
- Good interest points are those that have a local maximum in 3D surface curvature.
- the tip of the nose is a good choice for interest point.
- the absolute orientation of the face is immaterial, since contours are generated by intersecting several spheres of different radius with the object surface.
- each point-of-interest used in the method must be "semantic" in the sense it is labelled ("nose-tip", "eye-corner", etc.) and one must be able to match points of interest between test data and data in a stored database.
- An advantage of using a single point-of-interest is that matching is implicit as it is one-to-one.
- POIs could be detected using colour-texture data (from a standard 2D image) as well, or may be combined with the use of structural information.
- IRAD is the locus in 3D space of a point on the surface of the object, which is a constant distance from a "point-of-interest".
- IRAD loci can take many forms.
- Typical methods include (i) simple approaches such as linear interpolation across a 3D mesh, (ii) accurate but computationally expensive approaches such as the direct use of a parametric Gaussian smoothing function, and (iii) more complex but faster approaches such as the generation of a gridded depth map and the subsequent use of a 3D interpolation scheme, such as those based on the generation of 3D Hermite patches.
- a typical set of properties might include shape properties, for example:-
- the shape of the IRAD itself for example, expressed as a curvature property (see below) or orientation property relative to some reference orientation. Note that an orientation signal has less noise than a curvature signal as it is a first order difference rather than a second order difference.
- Depth properties may be included, as may colour-intensity properties, for example: intensity (possibly normalised).
- Red, green or blue (RGB) colour channel (possibly normalised, for example, with respect to intensity).
- Any other transformation of RGB data (such as HSV, CIE, etc).
- the final representation is thus a set of 1D signals.
- Extraction of a set of 1D signals facilitates matching to pre-stored (database) 3D data and models through a process of correlation, although we do not preclude using the representation in other forms of matching process. For example, if several interest points were used, graph matching approaches may be appropriate.
- Size-shape prefilter
- 1D signal correlation
- LDA: Linear Discriminant Analysis
- the length of an isoradius contour depends on how much it meanders across its spherical surface, although obviously one generally expects contour lengths at lower radii to be shorter than contour lengths at higher radii.
- a distance measure eg Euclidean, Mahalanobis
- the prefilter should be implemented as a weak constraint, so that the expected number of false rejects of the prefilter is zero.
- a slightly more sophisticated prefilter would use the ratio of contour lengths associated with different IRAD radii. This would prevent scaled versions of the same object shape being rejected. Such scalings could occur with slight variations in camera calibration, such as the value used for the stereo baseline.
- the method can only be applied to contours which do not intersect with any holes on the 3D surface. In such cases, the contour length cannot be measured and should be eliminated from the feature vector before the feature vector match is made.
- 1D signal correlation: the process of 1D signal correlation is standard and well documented. It forms the core of the matching process. Much of the power of the disclosed embodiments of the invention is that the dense, comprehensive multi-contour, multi-feature representation employed makes this technique central to the matching process. It is noted that, for fast searching of large databases, the correlation process is likely to be implemented directly in hardware.
- This weighting scheme is applied by computing the inner (dot) product of correlation scores with a predetermined weights vector.
- the element values of these weight vectors will be dependent on the application scenario in mind and calculated using such methods as LDA applied to typical examples of the target data.
- Such a process produces a weighting scheme in which between-class variance of correlation scores is maximised and within-class variance (e.g. due to facial expression and other factors) is minimised.
- results of LDA on a particular database may indicate certain features of certain IRADs are not helpful in that they do not provide sufficient discriminatory information and should therefore not be used in the matching process, or even computed in the live operation of the system.
- the method handles this situation by maintaining multiple signal fragments along a single IRAD in the face representation and generating correlation scores for all fragments independently. The aim then is to determine the maximum correlation score across all fragments of all IRADs that is consistent with a single 3D rotation.
- the 1D correlation processes proceed independently on a number of counts: 1. Every feature (colour, shape) on every IRAD is correlated independently.
- Fragmented IRADs due to holes in the surface patch have multiple segments, which are correlated independently. What we are then looking for is the set of correlations, both within IRADs (due to multiple features and multiple fragments) and across the whole set of IRADs, which has a consistent orientation alignment between test data and model being matched.
- orientation alignment is a 3 degree of freedom (dof) rotation about the point-of-interest (nose-tip). The rotation is determined from the known correlation between test data surface coordinates and database model surface coordinates on the specific IRAD.
- the output of the system may wish to display a list of descending scores, with some sort of cut-off below which it is unlikely that a correct match has been achieved.
- facial range/colour/intensity images of a subject need to be collected, analysed and stored.
- the representation needs to be augmented, such that it captures all variations of facial structure in a wide range of expressions. Given that a face in a semantically known expression (smile, frown, wrinkle nose) will match, the system will also be capable of outputting facial expression.
- PCA: Principal Component Analysis
- the IRAD technique may be used for alignment to a standard pose.
- the IRAD representation is used to simultaneously perform recognition and alignment, where the alignment is between a pair of 3D or 3D/2D images (one captured image and a database image).
- the IRAD technique may also be used as a means of aligning data to a standard pose (position and orientation), which is used as a precursor to other recognition techniques, such as LDA-based recognition and verification.
- a face may be aligned to a forward looking pose, with some feature or features in a standard position.
- the IRAD representation may be used to align the 3D face to a standard forward looking pose in the following way:
- the "point signature method” uses a single contour around a point-of-interest.
- the IRAD representation uses multiple contours.
- PSM uses multiple points-of-interest to identify features in the face.
- the IRAD methods may typically use one point-of-interest as a reference point to encode the whole face (with multiple contours).
- PSM uses a single contour as a marker to encode local surface shape by measuring orthogonal depth to a reference plane.
- the IRAD methods encode IRAD contours themselves by measuring IRAD contour curvature. They also use IRAD contours in a similar way to the PSM method, i.e. to act as a repeatable surface position locator (a "surface marker"), but the way in which the data is encoded along the contour is different.
- the IRAD methods measure the way in which the surface normal is changing along the contour and measure three colour-intensity signals along the contour (namely RGB values or any colour-space transformation of these values, such as HSV, CIE).
- PSM uses the encoded depth to match point signatures using Euclidean distance on an ordered feature vector.
- the ordering of the feature vector is chosen from a distinctive point on the (orthonormally projected) IRAD curve, such as the point of maximum curvature.
- the IRAD methods use a signal correlation process, before any feature vector matching is used.
- PSM matches a single signal/feature vector.
- the IRAD methods match multiple signals/feature vectors, due to the representation employing:
- the first task is to find the nose tip. This may be achieved by finding the local maxima in surface curvature.
- the nose forms a distinct ridge, which may be helpful to locate the nose tip.
- Another approach is to determine surface regions above a certain threshold of curvature and locate the centroid of that surface patch.
- Other approaches may use physical analogies, such as artificial potential fields.
- IRAD contours are then generated.
- a specific IRAD contour that is, we wish to identify a locus of points on the face surface that is a fixed radius from the nose-tip point-of-interest. This set of points meanders across the surface of a sphere, centred on the point-of- interest. Assume that this sphere has radius, R.
- any facet in the mesh we can detect if it straddles or touches the intersecting sphere, by checking the distance of its three points relative to the nose-tip. For sufficiently high resolution, we are not likely to find many instances when a point lies exactly on the sphere. More likely are the instances when we find two points of a facet inside the sphere and one outside, or one inside and two outside. In such cases, two linear interpolations can be used on two sides of the facet to generate two connected points on both the facial surface and the sphere surface, thus forming part of the IRAD contour. The information required to link such pairs of connected points is gathered via the known 3D mesh. It is known from the mesh data, which facets neighbour which other facets and thus it is straightforward to link up pairs of connected points into a chain, which represents the IRAD contour.
- the first of these is addressed by re-sampling the IRAD contour through a process of interpolation, to generate what are called reference points along the IRAD contour.
- the second of these requires us to maintain multiple signal fragments in the representation, as discussed previously.
- Encoding the face shape may be achieved in a number of ways, the most obvious of which are :
- contour in the direction ⁇ is computed as the cross-product of the two vectors
- the pixels from the standard 2D colour image that correspond to the 3D data along an IRAD can easily be extracted so that an IRAD effectively consists of a set of 1D signals, registered in terms of their position on the IRAD contour.
- These signals could be the raw RGB values (registered to the 3D mesh) from the colour camera or derivatives of this information, such as HSV or CIE colour space.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
3D object data is represented by determining a point-of-interest (poi) in a predetermined position relative to the object, and generating for that point-of-interest (poi) a set of multiple isoradius surface contours (IRAD), each of which comprises a locus of a set of points (p1, p2, p3) on the surface of the object that are at a constant predetermined distance from the point-of-interest (poi), which distance is different for each contour of the set. A plurality of properties, such as curvature and colour/intensity, can be taken along each contour, to provide a series of aligned 1D signals that can be used for object recognition or verification. This may find particular application to the recognition or verification of human faces.
Description
FACE RECOGNITION USING FEATURES ALONG ISO-RADIUS CONTOURS
The present invention relates to the representation of 3D objects and is also concerned, although not exclusively, with the alignment, recognition and/or verification of 3D objects.
Preferred embodiments of the invention are concerned with the representation of three-dimensional (3D) structural data and its associated registered two-dimensional (2D) colour-intensity image data.
Although preferred embodiments of the invention may find particular application to the representation, alignment, recognition and/or verification of human faces, the invention may find more general application to any object.
In examples of the invention, 3D structural data may be acquired from any 3D measurement system — for example, stereo camera systems, projected light ranging systems, shape from shading systems and so on.
In order to recognise the shape of an object or verify that an object belongs to a particular class of objects, using 3D data and 3D data models, it is necessary to acquire 3D data using some form of 3D sensor and then compare the acquired data with examples of 3D data or models stored in a database. The way in which the 3D data is modelled and represented is crucial, since it dictates the manner in which the comparison between captured 3D data and pre-stored 3D data is made. Thus, ultimately, it has a large influence on the performance of the system, for example, in terms of the system's overall recognition performance. Typical approaches for matching 3D objects first attempt to align the object to a standard orientation, consistent with the orientation of the models stored in a database.
There are many ways of encoding 3D structure in the public domain.
Gordon [G.G. Gordon. Face recognition based on depth and curvature features. In proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 808-810, 1992] discusses face recognition based on depth and curvature features. In this work, 3D facial features are extracted, such as nose ridge, nose-bridge, eye corners, etc., and comparison between two faces is based on their relationship in feature space. Also there are academic papers [Ho Y Wang, Chua C and Ren Y. Integrated 2D and 3D images for face recognition. In 11th International Conference on Image Analysis and Processing, pp 48-53, 2001] in which 2D (colour/intensity) and 3D images are integrated for the purpose of face recognition/verification.
The only method known to us that bears some relation to the present disclosure is that of Chua et al [Feng Han, Chin-Seng Chua and Yeong-Khing Ho. 3D Human face recognition using point signatures. In the 4th IEEE International Conference on Automatic Face and Gesture Recognition, 2001, pp 233-238]. The concept of the "point signature" was introduced in Chua and Jarvis's earlier work [Chin-Seng Chua and Ray Jarvis. Point signatures: A new representation for 3D object recognition. International Journal of Computer Vision, 25(1), 1997].
In the work of point signatures applied to face recognition, a 3D curve C is defined by intersecting a sphere of radius r, centred on a feature point-of-interest, with the captured 3D facial surface. A best-fit plane is then fitted to this curve, although the curve is not likely to be planar and in some cases could be highly non-planar. A second plane is then defined by translating the extracted plane such that it contains the point-of-interest. The orthogonal projection of the curve C onto this plane is then denoted by the planar curve C′, and the orthogonal projection distance of C onto its corresponding curve C′ forms a (signed) distance profile, which is sampled at regular angles through the full range of 360 degrees.
One of the problems with this method is that it suffers from "missing parts". If there are missing parts in the data, then the plane fitting process will be corrupted, which in turn will corrupt the projection distances.
The best fit plane will also be sensitive to changes in facial structure, such as those caused by changes in expression. Hence, any local change in surface structure will affect the whole representation of the surface around the point-of-interest.
Preferred embodiments of the present invention aim to provide a method that maintains a consistent signal for all rigid sections of the surface, regardless of any structural changes in other sections. For example, the part of a contour passing through the rigid forehead is not affected by the same contour passing through the malleable mouth area.
According to one aspect of the present invention, there is provided a method of representing 3D object data, comprising the steps of determining a point-of-interest in a predetermined position relative to the object, and generating for that point-of-interest a set of multiple isoradius surface contours, each of which comprises a locus of a set of points on the surface of the object that are at a constant predetermined distance from the point-of-interest, which distance is different for each contour of the set.
Said point-of-interest may be located in or on said object.
Said point-of-interest may be located on a surface of said object.
A method as above may include the step of determining one or more property of the object at points along each said contour.
Said one or more property may include at least one of: curvature of the respective contour; object surface orientation; local gradient of object surface orientation along the respective contour; and object surface curvature along the respective contour.
Said one or more property may include colour and/or colour-intensity.
Preferably, a plurality of properties of the object are determined at points along each said contour, and a plurality of aligned 1D signals are derived therefrom.
Preferably, said object is a human face.
Preferably, said point-of-interest is a nose-tip.
A method according to any of the preceding aspects of the invention may include the further step of comparing the represented 3D data with stored data to recognise or verify the object.
Such a method may further include at least one of the following steps:
a. size-shape vector prefiltering;
b. a weighting function to generate match score;
c. maintaining signal fragments to deal with holes (missing parts) in the data;
d. employing rotational alignment constraints to ensure that all data is at a consistent orientation;
e. storing, analysing and classifying the variation of isoradius contour signals which occur due to changes in facial expression; and
f. in addition to recognition/verification match scores, outputting both the best rigid Euclidean transformation that aligns test and database data and a value for facial expression.
A method according to any of the preceding aspects of the invention may comprise the steps of determining a plurality of points-of-interest as aforesaid, and generating for each point-of-interest a set of multiple isoradius surface contours as aforesaid.
The invention also extends to apparatus for representing 3D data, the apparatus being adapted to carry out a method according to any of the preceding aspects of the invention.
For a better understanding of the invention and to show how embodiments of the same may be carried into effect, embodiments thereof will now be described, by way of example, and reference will be made, by way of example, to the accompanying diagrammatic drawing, in which Figure 1 illustrates encoding of an isoradius contour.
Firstly, it is assumed that some form of sensor is able to measure the 3D structure of an object of interest, such as a human face. Typically, the data would be in the form of a set of 3D point coordinates. This is often called a 'point-cloud' representation. Also, typically, this point cloud is converted into a mesh representation, where each point is connected to several neighbours, creating a mesh of triangular planar facets. If a corresponding standard colour-intensity image can be acquired at the same time as the 3D model, this colour-intensity information can be aligned to the captured 3D structure and texture-mapped to the surface of the model using the known facets in the mesh representation. In this way the 2D image is said to be registered to the 3D data.
Example embodiments of the invention can be divided into two parts, namely:
1. Representations used to encode 3D facial structure and registered colour-intensity image.
2. Methods used to implement face alignment, recognition and verification.
Considering firstly the aspect of representation, extraction of the representation can be divided into three conceptual stages:
a. A "point-of-interest" (POI) is extracted.
b. A series of isoradius surface contours is extracted.
c. Both 3D surface properties and aligned colour-intensity properties along each of those contours are extracted into a set of one-dimensional (1D) signals.
These three steps are now considered in more detail.
The first stage in extracting the representation is to locate one or more interest points on the captured 3D object surface.
A crucial point is that the 3D position of such interest points should be detected reliably, such that there is high repeatability (low variance in 3D position). Good interest points are those that have a local maximum in 3D surface curvature. In the case of the human face, the tip of the nose is a good choice for interest point. Once the nose-tip is identified, the absolute orientation of the face is immaterial, since contours are generated by intersecting several spheres of different radius with the object surface. Although, in general, methods embodying the invention may include multiple points of interest, it is anticipated that, in many cases, only one may be used, namely the nose-tip, for human face recognition.
Note that each point-of-interest used in the method must be "semantic" in the sense it is labelled ("nose-tip", "eye-corner", etc.) and one must be able to match points of interest between test data and data in a stored database. An advantage of using a single point-of-interest is that matching is implicit as it is one-to-one.
Finally, it is noted that POIs could be detected using colour-texture data (from a standard 2D image) as well, or may be combined with the use of structural information.
We now turn to extraction of an isoradius surface contour.
An "isoradius surface contour", or IRAD for short, is the locus in 3D space of a point on the surface of the object, which is a constant distance from a "point-of-interest". One can think of this as the intersection of the surface of a sphere, centred on that point-of-interest, with the object surface. If we then vary the radius of that intersecting sphere, we can generate a whole series of IRADs of arbitrary density. A typical density, however, would be a radius step size equal to the average distance between neighbouring points in the mesh representation.
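By way of illustration only, the choice of radii described above might be sketched as follows. The function names, the (vertices, triangles) mesh layout and the use of the farthest vertex as the upper radius limit are assumptions made for this example and do not appear in the disclosure.

```python
import math

def mean_edge_length(vertices, triangles):
    """Average length of the unique edges of a triangle mesh."""
    edges = set()
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))  # store each edge once
    total = sum(math.dist(vertices[i], vertices[j]) for i, j in edges)
    return total / len(edges)

def irad_radii(vertices, triangles, poi):
    """Radii spaced one mean edge length apart (assumed upper limit:
    the distance from the point-of-interest to the farthest vertex)."""
    step = mean_edge_length(vertices, triangles)
    r_max = max(math.dist(poi, v) for v in vertices)
    return [step * k for k in range(1, int(r_max / step) + 1)]
```

Each radius in the returned list would then define one intersecting sphere, and hence one IRAD.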
It may be noted that several interest points (see above) may be required for objects when the generation of isoradius contours is poorly conditioned, that is, when the surface normal and sphere normal are nearly parallel for some areas on some intersecting spheres.
Extraction of a set of IRAD signals will now be considered. This relates to the extraction of 3D shape properties and colour-intensity properties of the object, along isoradius contours, such that a set of 1D signals is generated. In a typical case, properties would be extracted at regular intervals along the IRAD, for example, using a fixed step size of space-curve length.
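Sampling at a fixed step of space-curve length could, for example, be sketched as below. This is a minimal illustration that assumes the contour is already available as an ordered open polyline of 3D points; the function name is hypothetical.

```python
import math

def resample_by_arc_length(points, step):
    """Return points spaced `step` apart (measured along the polyline),
    starting at points[0]."""
    out = [tuple(points[0])]
    target = step   # arc length at which the next sample is due
    walked = 0.0    # arc length at the start of the current segment
    for p, q in zip(points, points[1:]):
        seg = math.dist(p, q)
        while seg > 0 and target <= walked + seg:
            t = (target - walked) / seg  # fractional position on this segment
            out.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
            target += step
        walked += seg
    return out
```

Properties such as surface normal or colour would then be evaluated at each returned sample position.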
The actual extraction of IRAD loci can take many forms. Typical methods include (i) simple approaches such as linear interpolation across a 3D mesh, (ii) accurate but computationally expensive approaches such as the direct use of a parametric Gaussian smoothing function, and (iii) more complex but faster approaches such as the generation of a gridded depth map and the subsequent use of a 3D interpolation scheme, such as those based on the generation of 3D Hermite patches.
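Approach (i), linear interpolation across a 3D mesh, might be sketched for a single triangular facet as follows. The function names are hypothetical, and interpolating linearly in the distance from the point-of-interest is an assumption of this example; it is only approximate, because distance does not vary linearly along a mesh edge, and the error shrinks as the mesh becomes finer.

```python
import math

def edge_crossing(p, q, poi, radius):
    """Approximate point on edge pq lying on the sphere, or None if the
    edge does not pass from inside the sphere to outside."""
    dp = math.dist(p, poi) - radius
    dq = math.dist(q, poi) - radius
    if dp * dq >= 0:          # both ends on the same side (or touching)
        return None
    t = dp / (dp - dq)        # linear interpolation in distance
    return tuple(a + t * (b - a) for a, b in zip(p, q))

def facet_segment(tri, poi, radius):
    """Two contour points where the sphere cuts a triangle, or None."""
    hits = []
    for i in range(3):
        x = edge_crossing(tri[i], tri[(i + 1) % 3], poi, radius)
        if x is not None:
            hits.append(x)
    return tuple(hits) if len(hits) == 2 else None
```

Segments from neighbouring facets would then be chained together, using the mesh connectivity, to form the contour.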
A typical set of properties might include shape properties, for example:-
1. The shape of the IRAD itself, for example, expressed as a curvature property (see below) or orientation property relative to some reference orientation. Note that an orientation signal has less noise than a curvature signal as it is a first order difference rather than a second order difference.
2. The change in normal vector of the captured model surface on neighbouring points of the IRAD surface or the normal vector orientation relative to some reference normal vector orientation (again this first-order difference measure will have less noise than a second-order difference measure).
Depth properties may be included, as may colour-intensity properties, for example:-
1. Intensity (possibly normalised).
2. Red, green or blue (RGB) colour channel (possibly normalised, for example, with respect to intensity).
3. A second, possibly normalised, colour channel.
4. Any other transformation of RGB data (such as HSV, CIE, etc).
The final representation is thus a set of 1D signals.
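By way of illustration only, the colour-intensity signals listed above might be derived as in the following minimal Python sketch. The function name `colour_signals`, the use of the RGB channel mean as the intensity, and the normalisation by the channel sum are assumptions made for this example; the method itself does not prescribe them.

```python
import numpy as np

def colour_signals(rgb):
    """Derive example 1D colour-intensity signals from RGB samples.

    rgb: (N, 3) array of RGB values sampled at N points along one IRAD.
    Returns intensity, normalised red and normalised green, each of length N.
    """
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=1)
    intensity = total / 3.0                  # assumed definition: channel mean
    safe = np.where(total > 0, total, 1.0)   # avoid division by zero on black samples
    r_norm = rgb[:, 0] / safe                # red normalised with respect to intensity
    g_norm = rgb[:, 1] / safe                # a second normalised colour channel
    return intensity, r_norm, g_norm
```

Each returned array is one 1D signal, indexed by position along the contour.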
A matching algorithm is now considered.
Extraction of a set of 1D signals facilitates matching to pre-stored (database) 3D data and models through a process of correlation, although we do not preclude using the representation in other forms of matching process. For example, if several interest points were used, graph matching approaches may be appropriate.
There are several basic elements to the matching algorithm, which can be described as follows:
1. Size-shape prefilter.
2. 1D signal correlation.
3. Prioritisation and weighting of feature vector elements using methods such as, but not limited to, Linear Discriminant Analysis (LDA).
4. Feature vector matching, using some distance metric such as correlation score, Euclidean distance, cosine distance or some other metric.
There are several more detailed features in matching algorithms, as follows:
1. Dealing with holes.
2. Using rotational alignment constraints to improve system performance.
3. Dealing with changes in facial expression.
The aspect of size-shape prefilter will now be considered.
To reduce computational expense, it is desirable to filter any obvious non-matches from the database before invoking time consuming signal correlation. We call this size-shape prefiltering and the process uses the lengths and relative lengths of IRAD contours.
The length of an isoradius contour depends on how much it meanders across its spherical surface, although obviously one generally expects contour lengths at lower radii to be shorter than contour lengths at higher radii. This means that, once the range and density of radii have been chosen for the whole dataset, a distance measure (e.g. Euclidean, Mahalanobis) of the feature vector of IRAD lengths associated with that radius range may act as a prefilter to eliminate any very poor matches. The prefilter should be implemented as a weak constraint, so that the expected number of false rejects of the prefilter is zero. A slightly more sophisticated prefilter would use the ratio of contour lengths associated with different IRAD radii. This would prevent scaled versions of the same object shape being rejected. Such scalings could occur with slight variations in camera calibration, such as the value used for the stereo baseline.
Note that the method can only be applied to contours which do not intersect with any holes on the 3D surface. Where a contour does intersect a hole, its length cannot be measured and the corresponding element should be eliminated from the feature vector before the feature vector match is made.
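The size-shape prefilter described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names, the use of NaN to mark contours interrupted by holes, the Euclidean distance and the tolerance `tol` are all assumptions made for the example.

```python
import numpy as np

def contour_length(points, closed=True):
    """Length of a polyline through (N, 3) contour points."""
    pts = np.asarray(points, dtype=float)
    length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    if closed:
        length += np.linalg.norm(pts[0] - pts[-1])  # closing segment
    return float(length)

def prefilter_pass(query_lengths, model_lengths, tol, use_ratios=False):
    """Return True if the model survives the (weak) size-shape prefilter.

    Lengths are ordered by increasing radius; NaN marks a contour whose
    length could not be measured because it intersects a hole.
    """
    q = np.asarray(query_lengths, dtype=float)
    m = np.asarray(model_lengths, dtype=float)
    valid = ~(np.isnan(q) | np.isnan(m))   # drop unmeasurable contours
    q, m = q[valid], m[valid]
    if q.size == 0:
        return True                        # nothing to compare: do not reject
    if use_ratios:
        # Ratios relative to the outermost valid contour remove global scale.
        q, m = q / q[-1], m / m[-1]
    return bool(np.linalg.norm(q - m) <= tol)
```

With `use_ratios=True`, a query that is a uniformly scaled copy of the model passes the filter even with a very small tolerance, which is the behaviour wanted when the stereo baseline is slightly miscalibrated.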
Turning now to 1D signal correlation, the process of 1D signal correlation is standard and well documented. It forms the core of the matching process. Much of the power of the disclosed embodiments of the invention is that the dense, comprehensive multi-contour, multi-feature representation employed makes this technique central to the matching process. It is noted that, for fast searching of large databases, the correlation process is likely to be implemented directly in hardware.
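As an illustration of the correlation step, the sketch below computes a normalised circular cross-correlation between two equally sampled signals from a closed contour, using the FFT. The function name and the choice of a circular (periodic) correlation are assumptions for this example; fragmented contours would need a non-periodic variant.

```python
import numpy as np

def circular_correlation(query, model):
    """Best circular shift and normalised correlation score.

    Returns (shift, score) such that query is best explained as
    np.roll(model, shift); score lies in [-1, 1].
    """
    q = np.asarray(query, dtype=float)
    m = np.asarray(model, dtype=float)
    if q.shape != m.shape:
        raise ValueError("signals must have the same length")
    q = q - q.mean()
    m = m - m.mean()
    denom = np.linalg.norm(q) * np.linalg.norm(m)
    if denom == 0:
        return 0, 0.0                      # a constant signal carries no shape
    # Cross-correlation theorem: corr[k] = sum_n m[n] * q[n + k].
    corr = np.fft.ifft(np.fft.fft(q) * np.conj(np.fft.fft(m))).real / denom
    shift = int(np.argmax(corr))
    return shift, float(corr[shift])
```

The recovered shift, multiplied by the sampling step along the contour, indicates how far the query contour is rotated relative to the model contour.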
We now consider match scoring using weightings from Linear Discriminant Analysis.
Once signal correlation has been determined, we may prioritise sections of IRAD signals by giving a higher weighting to those sections with greater discriminating ability. This weighting scheme is applied by computing the inner (dot) product of correlation scores with a predetermined weights vector. The element values of these weight vectors will be dependent on the application scenario in mind and calculated using such methods as LDA applied to typical examples of the target data. Such a process produces a weighting scheme in which between-class variance of correlation scores is maximised and within-class variance (e.g. due to facial expression and other factors) is minimised.
Note that the results of LDA on a particular database may indicate certain features of certain IRADs are not helpful in that they do not provide sufficient discriminatory information and should therefore not be used in the matching process, or even computed in the live operation of the system.
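A minimal sketch of such a weighting scheme is given below. It assumes a two-class Fisher discriminant, trained on correlation-score vectors from genuine (same-person) and impostor (different-person) comparisons; the function names, the two-class formulation and the small ridge term are assumptions for the example rather than details taken from the method.

```python
import numpy as np

def fisher_weights(genuine, impostor, ridge=1e-6):
    """Two-class Fisher/LDA weights for correlation-score vectors.

    genuine, impostor: (N, D) arrays, one row of D correlation scores
    per training comparison. Returns a unit-length weight vector.
    """
    g = np.asarray(genuine, dtype=float)
    i = np.asarray(impostor, dtype=float)
    gc, ic = g - g.mean(axis=0), i - i.mean(axis=0)
    sw = gc.T @ gc + ic.T @ ic             # pooled within-class scatter
    sw += ridge * np.eye(sw.shape[0])      # keeps the solve well conditioned
    w = np.linalg.solve(sw, g.mean(axis=0) - i.mean(axis=0))
    return w / np.linalg.norm(w)

def match_score(corr_scores, weights):
    """Single match figure: dot product of scores with the weights."""
    return float(np.dot(corr_scores, weights))
```

Elements of the weight vector that come out close to zero correspond to features that contribute little discriminatory information and could be omitted from live operation.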
It may be necessary to consider how to deal with holes. Some captured test data will have "holes" in the 3D mesh, due to self occlusion, poor lighting conditions, etc, and this problem is likely to be worse at extreme (non-fronto-parallel) head orientations.
The method handles this situation by maintaining multiple signal fragments along a single IRAD in the face representation and generating correlation scores for all fragments independently. The aim then is to determine the maximum correlation score across all fragments of all IRADs that is consistent with a single 3D rotation.
Enforcement of orientation alignment constraints will now be considered.
In the first stages of the algorithm, the 1D correlation processes proceed independently on a number of counts:
1. Every feature (colour, shape) on every IRAD is correlated independently.
2. Fragmented IRADs (due to holes in the surface patch) have multiple segments, which are correlated independently.
What we are then looking for is the set of correlations, both within IRADs (due to multiple features and multiple fragments) and across the whole set of IRADs, which has a consistent orientation alignment between test data and model being matched. Such orientation alignment is a 3 degree of freedom (dof) rotation about the point-of-interest (nose-tip). The rotation is determined from the known correlation between test data surface coordinates and database model surface coordinates on the specific IRAD.
For each correlated IRAD signal pair (query and database), with a distinctive maximum in correlation, we can generate a list of 3D correspondences along the matched pair of IRAD contours. Thus, if we had a face with 40 IRADs, we could end up with 50 or so correlation peaks, each of which has a (different sized) set of 3D correspondences. Each of these sets of 3D correspondences defines a 3D rotation about the origin, which is positioned at the nose tip. We compute these rotations using a least squares method, as follows.
First we compute the cross-covariance matrix (K) of the 3D correspondences associated with the signal correlation, and then we compute the singular value decomposition (SVD) of this matrix as K = USV', where ' denotes the transpose. The rotation matrix between the two sets of data is then given by multiplying together the orthonormal matrices in the SVD as R = VU'.
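The least-squares rotation can be sketched in a few lines of NumPy. The function name and argument order are assumptions for this example, and the determinant check that guards against a reflection is a standard addition that is not stated in the description above. No centroids are subtracted, because the rotation is taken about the origin at the point-of-interest.

```python
import numpy as np

def rotation_from_correspondences(test_pts, model_pts):
    """Least-squares rotation R about the origin with R @ test ~ model.

    test_pts, model_pts: (N, 3) arrays of corresponding 3D points,
    both expressed relative to the point-of-interest (the origin).
    """
    a = np.asarray(test_pts, dtype=float)
    b = np.asarray(model_pts, dtype=float)
    k = a.T @ b                            # 3x3 cross-covariance, sum of a_i b_i'
    u, _, vt = np.linalg.svd(k)            # K = U S V'
    # Assumed safeguard: force det(R) = +1 so that R is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T   # R = V U'
```

For noise-free correspondences generated by a known rotation, the function recovers that rotation to within numerical precision.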
The 3D rotation that this matrix represents is then converted to a normalised (unit) quaternion, which has a distinct advantage over Euler angles in that the representation of a 3D rotation is unique up to sign. A quaternion has four components but, since these are normalised such that their sum of squares is unity, a 3D rotation expressed as a quaternion can be represented in a three-dimensional quaternion space. In order to extract the best rotational displacement that aligns the two 3D facial surfaces, we determine the largest cluster in quaternion space. We do this by a very simple clustering process, which uses systematic sampling consensus, due to the relatively small number of data points in quaternion space. For each point, we count the number of neighbours within a sphere of some predefined radius. The point in this space that has the largest number of neighbours is considered to approximately represent the rotation between the two facial surfaces and the points within its neighbourhood are labelled as putative inliers. We then eliminate, from the putative inliers, points corresponding to multiple correlation peaks so that, for a given IRAD signal, only the most central quaternion space point in the neighbourhood region is retained. Finally, we can create a large list of 3D point-point correspondences based on all the IRAD data associated with the correlations that have generated inliers within the quaternion space neighbourhood. We then recompute K and again use SVD to compute the refined rotation matrix, R.
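The neighbour-counting step might be sketched as follows. The function name, the sign-insensitive distance (a quaternion and its negative describe the same rotation) and the radius value are assumptions for this example; the subsequent removal of multiple peaks belonging to the same IRAD signal is omitted for brevity.

```python
import numpy as np

def largest_quaternion_cluster(quats, radius):
    """Find the rotation hypothesis with the most neighbours.

    quats: (N, 4) array, one quaternion per correlation peak.
    Returns (centre_index, inlier_indices).
    """
    q = np.asarray(quats, dtype=float)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)   # normalise to unit length
    # Distance min(|p - q|, |p + q|), since q and -q are the same rotation.
    dots = np.abs(q @ q.T)
    dists = np.sqrt(np.clip(2.0 - 2.0 * dots, 0.0, None))
    counts = (dists <= radius).sum(axis=1) - 1         # exclude the point itself
    centre = int(np.argmax(counts))
    inliers = np.flatnonzero(dists[centre] <= radius)  # putative inliers
    return centre, inliers
```

The all-pairs distance matrix is affordable here because the number of correlation peaks is small, of the order of tens per face pair.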
Now we compute the correlation score feature vector with all IRAD features aligned in terms of orientation and we apply the weighted sum of vector elements (dot product) based on LDA in order to get a single figure for the match.
The output of the system may, for example, be a list of match scores in descending order, with some sort of cut-off below which it is unlikely that a correct match has been achieved.
In order to deal with facial expressions, multiple facial range/colour/intensity images of a subject need to be collected, analysed and stored. The representation needs to be augmented, such that it captures all variations of facial structure in a wide range of expressions. Given that a face in a semantically known expression (smile, frown, wrinkle nose) will match, the system will also be capable of outputting facial expression.
Note that if the nose is indeed the point-of-interest, then wrinkling one's nose is likely to have the largest effect on the data representation, as the origin is being moved relative to a large part of the facial structure.
The simplest way to do this would be to have an "expression" dimension in the database, so that if we store 10 expressions per person, we simply have a factor of 10 increase in the number of database comparisons that have to be made.
In matching across the "expression" dimension, we would find the two expressions that have the best match over a consistent orientation of both expressions. We can then find the best linear interpolation between the two expression matches and this is used to form the overall match score for that facial model in the database. Many more sophisticated methods can be applied to model and recognise IRAD signal variations due to facial expression. One example would be Principal Component Analysis (PCA), which can extract the main modes of IRAD signal variation under a full range of facial expressions.
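One possible reading of the linear interpolation step is sketched below: given a query signal and the signals of the two best-matching stored expressions, all already aligned, find the blend of the two stored signals that is closest to the query in the least-squares sense. The function name, the restriction of the blend factor to the range 0 to 1 and the use of a residual as the output are assumptions for this example.

```python
import numpy as np

def best_expression_blend(query, expr_a, expr_b):
    """Least-squares blend (1 - alpha) * expr_a + alpha * expr_b.

    All inputs are aligned 1D signals of equal length.
    Returns (alpha, residual), with alpha clipped to [0, 1].
    """
    q, a, b = (np.asarray(x, dtype=float) for x in (query, expr_a, expr_b))
    d = b - a
    denom = float(d @ d)
    if denom == 0.0:
        alpha = 0.0                        # identical expressions: nothing to blend
    else:
        # Projection of (q - a) onto the line from a to b.
        alpha = float(np.clip((q - a) @ d / denom, 0.0, 1.0))
    blend = (1.0 - alpha) * a + alpha * b
    return alpha, float(np.linalg.norm(q - blend))
```

A small residual indicates that the query is well explained as lying between the two stored expressions.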
The IRAD technique may be used for alignment to a standard pose.
In the techniques described so far, the IRAD representation is used to simultaneously perform recognition and alignment, where the alignment is between a pair of 3D or 3D/2D images (one captured image and a database image). The IRAD technique may also be used as a means of aligning data to a standard pose (position and orientation), which is used as a precursor to other recognition techniques, such as LDA-based recognition and verification. In such a face-recognition system, a face may be aligned to a forward looking pose, with some feature or features in a standard position.
The IRAD representation may be used to align the 3D face to a standard forward looking pose in the following way:
1. Translate the point-of-interest to a 3D standard position.
2. Use the maxima in IRAD curvature to find the ridge of the nose.
3. Use the direction of the ridge of the nose to define a pencil of planes and determine which planar orientation provides the best axis of symmetry between the left and right sides of the face. This determines the correct zero pan angle of the head.
4. Choose either the tilt of the nose ridge or use points above and below the nose to define the tilt angle of the head.
Some differences between the IRAD methods discussed here and the "point-signature" representation of Chua and Jarvis are now discussed.
1. The "point signature method" (PSM) uses a single contour around a point-of-interest. The IRAD representation uses multiple contours.
2. PSM uses multiple points-of-interest to identify features in the face. The IRAD methods may typically use one point-of-interest as a reference point to encode the whole face (with multiple contours).
3. PSM uses a single contour as a marker to encode local surface shape by measuring orthogonal depth to a reference plane. The IRAD methods encode IRAD contours themselves by measuring IRAD contour curvature. They also use IRAD contours in a similar way to the PSM method, i.e. to act as a repeatable surface position locator (a "surface marker"), but the way in which the data is encoded along the contour is different. The IRAD methods measure the way in which the surface normal is changing along the contour and measure three colour-intensity signals along the contour (namely RGB values or any colour-space transformation of these values, such as HSV, CIE).
Differences in matching algorithm are as follows:-
1. PSM uses the encoded depth to match point signatures using Euclidean distance on an ordered feature vector. The ordering of the feature vector is chosen from a distinctive point on the (orthonormally projected) IRAD curve, such as the point of maximum curvature. By contrast, the IRAD methods use a signal correlation process, before any feature vector matching is used.
2. PSM matches a single signal/ feature vector. The IRAD methods match multiple signals/feature vectors, due to the representation employing:
a. multiple IRAD contours over a densely populated range of radii; and
b. multiple representations of shape and colour/intensity along a single IRAD contour.
A particular example of an embodiment of the invention will now be described.
It is assumed that we start with any method of generating a 3D data point cloud, an associated 3D triangular mesh, and a colour-intensity image registered to the mesh. It is also assumed that the application is to develop an IRAD contour representation of a human face using a single point-of-interest, which is positioned at the nose tip.
The first task is to find the nose tip. This may be achieved by finding the local maxima in surface curvature. The nose forms a distinct ridge, which may be helpful to locate the nose tip. Another approach is to determine surface regions above a certain threshold of curvature and locate the centroid of that surface patch. Other approaches may use physical analogies, such as artificial potential fields.
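The centroid-based approach can be illustrated with the short sketch below. It assumes that a per-vertex curvature value has already been estimated by some other means and that the vertices above the threshold form a single patch around the nose; the function name, the threshold and the final snap to the nearest mesh vertex are assumptions for the example.

```python
import numpy as np

def nose_tip_from_curvature(vertices, curvature, threshold):
    """Estimate the nose tip as the centroid of a high-curvature patch.

    vertices: (N, 3) mesh vertices; curvature: (N,) per-vertex values.
    Returns (vertex_index, vertex_position) of the vertex nearest the centroid.
    """
    v = np.asarray(vertices, dtype=float)
    c = np.asarray(curvature, dtype=float)
    mask = c > threshold
    if not mask.any():
        raise ValueError("no vertex exceeds the curvature threshold")
    centroid = v[mask].mean(axis=0)
    # Assumed refinement: snap to a mesh vertex so the point lies on the surface.
    idx = int(np.argmin(np.linalg.norm(v - centroid, axis=1)))
    return idx, v[idx]
```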
IRAD contours are then generated. Consider the case where we are generating a specific IRAD contour, that is, we wish to identify a locus of points on the face surface that is a fixed radius from the nose-tip point-of-interest. This set of points meanders across the surface of a sphere, centred on the point-of-interest. Assume that this sphere has radius, R.
For any facet in the mesh, we can detect if it straddles or touches the intersecting sphere, by checking the distance of its three points relative to the nose-tip. For sufficiently high resolution, we are not likely to find many instances when a point lies exactly on the sphere. More likely are the instances when we find two points of a facet inside the sphere and one outside, or one inside and two outside. In such cases, two linear interpolations can be used on two sides of the facet to generate two connected points on both the facial surface and the sphere surface, thus forming part of the IRAD contour.
The information required to link such pairs of connected points is gathered via the known 3D mesh. It is known from the mesh data, which facets neighbour which other facets and thus it is straightforward to link up pairs of connected points into a chain, which represents the IRAD contour.
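The facet test and the two linear interpolations described above can be sketched as follows. The function name and data layout are assumptions for this example. Vertices lying exactly on the sphere are ignored, and the linking of segments into a chain via facet adjacency is omitted. Because distance from the point-of-interest is interpolated linearly along each edge, the computed points lie on the facial mesh and only approximately on the sphere.

```python
import numpy as np

def facet_sphere_segments(vertices, faces, centre, radius):
    """Intersect a triangular mesh with a sphere, facet by facet.

    vertices: (N, 3) floats; faces: (M, 3) vertex indices.
    Returns a list of (point_a, point_b) contour segments, one per
    facet that straddles the sphere.
    """
    v = np.asarray(vertices, dtype=float)
    # Signed distance of each vertex from the sphere surface.
    d = np.linalg.norm(v - np.asarray(centre, dtype=float), axis=1) - radius
    segments = []
    for tri in np.asarray(faces, dtype=int):
        pts = []
        for i, j in ((0, 1), (1, 2), (2, 0)):
            a, b = tri[i], tri[j]
            if d[a] * d[b] < 0:                    # edge crosses the sphere
                t = d[a] / (d[a] - d[b])           # linear interpolation factor
                pts.append(v[a] + t * (v[b] - v[a]))
        if len(pts) == 2:                          # one inside/two outside, or vice versa
            segments.append((pts[0], pts[1]))
    return segments
```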
There are some important remaining issues concerning the practical use of IRAD contours:
1. The points along the IRAD contour are unevenly spaced.
2. There may be gaps in the IRAD contour due to "missing parts" (holes) in the 3D data.
The first of these is addressed by re-sampling the IRAD contour through a process of interpolation, to generate what are called reference points along the IRAD contour. The second of these requires us to maintain multiple signal fragments in the representation, as discussed previously.
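Re-sampling to evenly spaced reference points might be done as in the sketch below, which treats one unbroken contour fragment as a polyline and interpolates linearly at fixed increments of arc length. The function name and the choice of piecewise-linear interpolation are assumptions for this example; each fragment of a broken contour would be re-sampled separately.

```python
import numpy as np

def resample_by_arc_length(points, step):
    """Resample an ordered (N, 3) polyline at a fixed arc-length step.

    Returns a (K, 3) array of reference points starting at points[0].
    """
    if step <= 0:
        raise ValueError("step must be positive")
    pts = np.asarray(points, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    s = np.concatenate(([0.0], np.cumsum(seg)))        # arc length at each vertex
    targets = np.arange(0.0, s[-1] + 1e-12, step)      # evenly spaced arc lengths
    # Interpolate each coordinate independently against arc length.
    return np.column_stack([np.interp(targets, s, pts[:, k]) for k in range(3)])
```

Any property sampled at the original contour points (surface normal, colour) can be interpolated against the same arc-length parameter so that all signals remain registered.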
Encoding the face shape may be achieved in a number of ways, the most obvious of which are:
1. using the shape of the IRAD contour itself to encode the face shape; and
2. using the change in object surface normal to encode shape.
The second of these is straightforward, and requires measuring the change in surface normal from one reference point to the adjacent one using a straightforward backward (or forward) difference operation. Obviously, the operation should not be computed across breaks in the IRAD contour.
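A backward-difference version of this measurement is sketched below. It expresses the change in surface normal as the angle between the normals at adjacent reference points, which is one possible scalar encoding; the function name, the fragment labelling scheme and the use of NaN where no difference is defined are assumptions for this example.

```python
import numpy as np

def normal_change_signal(normals, fragment_ids):
    """Angle (radians) between each surface normal and its predecessor.

    normals: (N, 3) surface normals at consecutive reference points.
    fragment_ids: (N,) labels; a change of label marks a break in the contour.
    Entries with no valid predecessor are NaN.
    """
    n = np.asarray(normals, dtype=float)
    n = n / np.linalg.norm(n, axis=1, keepdims=True)
    frag = np.asarray(fragment_ids)
    cosang = np.clip(np.einsum("ij,ij->i", n[1:], n[:-1]), -1.0, 1.0)
    angles = np.arccos(cosang)
    out = np.full(len(n), np.nan)
    same = frag[1:] == frag[:-1]          # do not difference across breaks
    out[1:][same] = angles[same]
    return out
```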
Encoding the shape of the IRAD contour itself is slightly more complicated. Here we measure the curvature that is due to the face shape, rather than the curvature that is simply due to the fact that the IRAD is distributed across the surface of a sphere.
Consider Figure 1. Given that curvature κ = Δθ / Δs, if we maintain a constant step length Δs along the isoradius contour, then the angular changes Δθ encode the contour shape. How do we actually compute Δθ along the contour? Consider three consecutive points p1, p2, p3 on the contour, separated by a fixed but small Δs, as shown in Figure 1. A normal to the contour in the direction n̂ is computed as the cross-product of the two vectors Op1 and Op2, where p1 and p2 are two points on the isoradius contour and O is the POI-centred origin. This vector can be recomputed for the points p2 and p3 using the cross-product of Op2 and Op3. The change in angle of this normal vector is the angle that we need. It is noted that the sum of angular changes around the contour should be 2π radians.
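The cross-product construction can be sketched as follows. The function name is an assumption for this example, the returned angles are unsigned (a signed version would need a reference direction), and consecutive points are assumed never to be collinear with the origin, since the cross-product would then vanish.

```python
import numpy as np

def contour_angle_changes(points, origin):
    """Angular changes between successive contour normals.

    points: (N, 3) evenly spaced reference points on one isoradius contour.
    origin: the point-of-interest O.
    Returns N - 2 angles, one per consecutive triple (p1, p2, p3).
    """
    p = np.asarray(points, dtype=float) - np.asarray(origin, dtype=float)
    n = np.cross(p[:-1], p[1:])                   # Op1 x Op2, Op2 x Op3, ...
    n /= np.linalg.norm(n, axis=1, keepdims=True)
    cosang = np.clip(np.einsum("ij,ij->i", n[:-1], n[1:]), -1.0, 1.0)
    return np.arccos(cosang)                      # change in normal direction
```

For points lying on a great circle of the sphere all the normals coincide and every angle is zero, so the signal responds only to the way the contour departs from that plane.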
To encode registered colour-intensity information, the pixels from the standard 2D colour image that correspond to the 3D data along an IRAD can easily be extracted so that an IRAD effectively consists of a set of 1D signals, registered in terms of their position on the IRAD contour. These signals could be the raw RGB values (registered to the 3D mesh) from the colour camera or derivatives of this information, such as HSV or CIE colour space.
In this specification, the verb "comprise" has its normal dictionary meaning, to denote non-exclusive inclusion. That is, use of the word "comprise" (or any of its derivatives) to include one feature or more, does not exclude the possibility of also including further features.
The reader's attention is directed to all and any priority documents identified in connection with this application and to all and any papers and documents which are filed concurrently with or previous to this specification in connection with this application and which are open to public inspection with this specification, and the contents of all such papers and documents are incorporated herein by reference.
All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and/or all of the steps of any method or process so disclosed, may be combined in any combination, except combinations where at least some of such features and/or steps are mutually exclusive.
Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Thus, unless expressly stated otherwise, each feature disclosed is one example only of a generic series of equivalent or similar features.
The invention is not restricted to the details of the foregoing embodiment(s). The invention extends to any novel one, or any novel combination, of the features disclosed in this specification (including any accompanying claims, abstract and drawings), or to any novel one, or any novel combination, of the steps of any method or process so disclosed.
Claims
1. A method of representing 3D object data, comprising the steps of determining a point-of-interest in a predetermined position relative to the object, and generating for that point-of-interest a set of multiple isoradius surface contours, each of which comprises a locus of a set of points on the surface of the object that are at a constant predetermined distance from the point-of-interest, which distance is different for each contour of the set.
2. A method according to claim 1, wherein said point-of-interest is located in or on said object.
3. A method according to claim 2, wherein said point-of-interest is located on a surface of said object.
4. A method according to claim 1, 2 or 3, including the step of determining one or more property of the object at points along each said contour.
5. A method according to claim 4, wherein said one or more property includes at least one of: curvature of the respective contour; object surface orientation; local gradient of object surface orientation along the respective contour; and object surface curvature along the respective contour.
6. A method according to claim 4 or 5, wherein said one or more property includes colour and/or colour-intensity.
7. A method according to claim 4, 5 or 6, wherein a plurality of properties of the object are determined at points along each said contour, and a plurality of aligned 1D signals are derived therefrom.
8. A method according to any of the preceding claims, wherein said object is a human face.
9. A method according to claim 8, wherein said point-of-interest is a nose-tip.
10. A method according to any of the preceding claims, including the further step of comparing the represented 3D data with stored data to recognise or verify the object.
11. A method according to claim 10, further including at least one of the following steps:
a. size-shape vector prefiltering;
b. a weighting function to generate match score;
c. maintaining signal fragments to deal with holes (missing parts) in the data;
d. employing rotational alignment constraints to ensure that all data is at a consistent orientation;
e. storing, analysing and classifying the variation of isoradius contour signals which occur due to changes in facial expression; and
f. in addition to recognition/verification match scores, outputting both the best rigid Euclidean transformation that aligns test and database data and a value for facial expression.
12. A method according to any of the preceding claims, comprising the steps of determining a plurality of points-of-interest as aforesaid, and generating for each point-of-interest a set of multiple isoradius surface contours as aforesaid.
13. A method of representing 3D object data, the method being substantially as hereinbefore described with reference to the accompanying drawings.
14. Apparatus for representing 3D data, the apparatus being adapted to carry out a method according to any of the preceding claims.
15. Apparatus for representing 3D object data, the apparatus being substantially as hereinbefore described with reference to the accompanying drawings.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB0426595.5A GB0426595D0 (en) | 2004-12-06 | 2004-12-06 | Representation of 3D objects |
PCT/EP2005/056470 WO2006061365A1 (en) | 2004-12-06 | 2005-12-05 | Face recognition using features along iso-radius contours |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1828959A1 (en) | 2007-09-05 |
Family
ID=34044031
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP05813319A Withdrawn EP1828959A1 (en) | 2004-12-06 | 2005-12-05 | Face recognition using features along iso-radius contours |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP1828959A1 (en) |
GB (1) | GB0426595D0 (en) |
WO (1) | WO2006061365A1 (en) |
2004
- 2004-12-06 GB GBGB0426595.5A patent/GB0426595D0/en not_active Ceased
2005
- 2005-12-05 EP EP05813319A patent/EP1828959A1/en not_active Withdrawn
- 2005-12-05 WO PCT/EP2005/056470 patent/WO2006061365A1/en active Application Filing
Non-Patent Citations (1)
Title |
---|
See references of WO2006061365A1 * |
Also Published As
Publication number | Publication date |
---|---|
GB0426595D0 (en) | 2005-01-05 |
WO2006061365A1 (en) | 2006-06-15 |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 20070706 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR |
17Q | First examination report despatched | Effective date: 20070927 |
DAX | Request for extension of the european patent (deleted) | |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
18D | Application deemed to be withdrawn | Effective date: 20080408 |