
EP3506149A1 - Method, system and computer program product for eye gaze direction estimation - Google Patents

Method, system and computer program product for eye gaze direction estimation Download PDF

Info

Publication number
EP3506149A1
EP3506149A1 EP17382912.8A EP17382912A EP3506149A1 EP 3506149 A1 EP3506149 A1 EP 3506149A1 EP 17382912 A EP17382912 A EP 17382912A EP 3506149 A1 EP3506149 A1 EP 3506149A1
Authority
EP
European Patent Office
Prior art keywords
face
eye
image
model
gaze
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP17382912.8A
Other languages
English (en)
French (fr)
Other versions
EP3506149B1 (de)
Inventor
Luis Unzueta Irurtia
Jon GOENETXEA IMAZ
Unai ELORDI HIDALGO
Oihana OTAEGUI MADURGA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fundacion Centro de Tecnologias de Interaccion Visual y Comunicaciones Vicomtech
Original Assignee
Fundacion Centro de Tecnologias de Interaccion Visual y Comunicaciones Vicomtech
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fundacion Centro de Tecnologias de Interaccion Visual y Comunicaciones Vicomtech filed Critical Fundacion Centro de Tecnologias de Interaccion Visual y Comunicaciones Vicomtech
Priority to EP17382912.8A priority Critical patent/EP3506149B1/de
Publication of EP3506149A1 publication Critical patent/EP3506149A1/de
Application granted granted Critical
Publication of EP3506149B1 publication Critical patent/EP3506149B1/de
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/013Eye tracking input arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/03Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/0304Detection arrangements using opto-electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18Eye characteristics, e.g. of the iris
    • G06V40/19Sensors therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Definitions

  • the present invention relates to the field of methods and systems for the estimation of eye gaze direction from images.
  • In particular, it relates to methods and systems in which the images or videos from which the eye gaze direction is estimated are monocular.
  • The model-based approach relies explicitly on 3D graphical models that represent the geometry of the eye (typically as spheres), which are fitted to the person's detected eye features in the image (typically, the iris and the eye corners).
  • The fitted 3D model allows inferring the 3D eye gaze vector, which is then used to deduce where the person is looking (e.g., a specific position on a screen in front of the person's face).
  • Strupczewski A. et al. have provided a review of methods for estimating the eye gaze direction based on eye modelling (Strupczewski A. et al., "Geometric Eye Gaze Tracking", Proc. of the Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP), 2016).
  • the appearance-based approach establishes a direct relation between the person's eye appearance and the corresponding eye gaze data of interest (e.g., the 3D eye gaze vector) by applying machine learning techniques.
  • A dataset of annotated images is used to train a regression model, which, when applied to the person's eye image extracted from the input image, is then used to deduce where the person is looking.
  • Deep learning techniques generalize the learned relation between the eye appearance and the corresponding eye gaze data much better than alternative machine learning approaches (based on "handcrafted" image features and "shallow" layered learning architectures), provided that a huge dataset of annotated images is used for training.
  • Typically, hundreds of thousands or even millions of samples are used, which may include real data (Zhang X. et al., 2015) or photorealistic synthetic data (Wood, et al., "Rendering of Eyes for Eye-Shape Registration and Gaze Estimation." Proc. of the IEEE International Conference on Computer Vision (ICCV), 2015; Wood, et al., "Learning an Appearance-Based Gaze Estimator from One Million Synthesised Images." Proc. of the ACM Symposium on Eye Tracking Research & Applications, 2016).
  • An effective eye gaze direction estimation system not only requires obtaining accurate eye gaze data from eye images; it also requires properly applying the eye gaze data to the environment, so that it is possible to deduce where the person is looking.
  • The most recent approaches have simplified the use cases to specific ones, in which the kinds of targets are pre-defined and implicitly included during the regression model's learning stage.
  • The computer-implemented method for eye gaze direction estimation described in the present disclosure is intended to overcome the shortcomings of prior-art methods.
  • the method of the present disclosure can effectively extend its applicability to different environmental conditions.
  • the considered entities, such as the person's face, targets (if any), and the imaging device
  • the method of the present disclosure estimates the direction of eye gaze of a user, from an image of the user taken by an imaging device (such as a camera).
  • the image is a monocular image captured by a monocular imaging device, meaning that the image only provides basic information, such as RGB or greyscale values and resolution. In other words, the image does not provide additional information that binocular cameras may provide, such as 3D information (depth).
  • the image may be a color image or a black-and-white image.
  • the image used for the eye gaze estimation provides limited information about the imaging device used for capturing the image. This limited information is typically the resolution of the imaging device.
  • the image may be extracted from a video comprising a plurality of images.
  • the image -or video from which images are extracted- may be taken in real time (substantially at the moment of executing the method) by an imaging device, or may have been registered well in advance with respect to the execution of the method.
  • any other information of the imaging device relevant for the estimation of eye gaze direction is modelled in order to estimate, for example, the distance between the person in the image and any potential target.
  • the imaging device is therefore modelled in order to have a virtual model thereof that may be integrated in a 3D reconstruction of a virtual space common to the user, imaging device and potential targets.
  • the modelled imaging device also provides a fixed reference system within the space, required for establishing a relationship between the user and the imaging device.
  • the "entities" to be considered by the method herein disclosed are at least the imaging device (e.g. camera) with which the image(s) has(have) been taken, and the user (i.e. the person whose eye gaze direction is to be estimated).
  • Other entities involved in the method are potential targets at which the user may be looking.
  • the method of the present disclosure may be applied for identifying which object(s) a user is looking at and for how long the user is looking at a certain object.
  • A target refers to a specific area of interest at which the user is supposed to be looking.
  • the target or targets may be different depending on the application.
  • Non-limiting examples of targets are: the screen of a mobile device or specific areas thereof; the screen of a TV set or different areas thereof; a display; a control panel of an industrial machine, such as a machine tool, or specific areas thereof; different elements of a vehicle at which the driver may be looking, such as the dashboard, windscreen or rear-view mirrors; an advertising panel; and different areas configured to interact with a user of a smart sensorized house.
  • other elements can be targets.
  • The virtual 3D space reconstruction is generated by combining a set of computer vision procedures in a specific manner and by considering a set of assumptions regarding the physical characteristics of the different components, which is the principal factor in achieving the goal. Different computer vision procedures may be applied to fulfill each of the steps in the proposed method, as long as they provide the required output for each step. In embodiments of the invention, computer vision procedures that have a good balance between accuracy, robustness and efficiency are selected, so that they can also be integrated in embedded hardware systems with low computational capabilities.
  • Non-limiting examples of applications or uses of the method of the present disclosure are: assisting a driver by identifying the object in a vehicle at which the driver is looking (dashboard, windscreen, inner rear-view mirror, left outer rear-view mirror, right outer rear-view mirror...) by taking images of the driver with a camera disposed within the vehicle (for example, in front of the driver); identifying in a video the relationship between people and objects; and assessing the impact of advertising, for example identifying whether an advertising message shown on a display during an internet session is of interest to the user, or, when a person is watching TV, identifying whether an advertisement being shown is of interest to the user.
  • a first aspect of the invention relates to a computer-implemented method for estimating eye gaze direction, comprising: fitting a 3D face model to a monocular image obtained from an imaging device, thus obtaining values of a set of face model parameters representing at least one model position parameter, at least one orientation parameter, at least one shape parameter and at least one action parameter; obtaining normalized 3D gaze estimation vectors for the right and left eyes with respect to the imaging device viewpoint; estimating the eye gaze direction with respect to at least one target in the scene.
  • the 3D face model fitted to the monocular image is obtained as follows: applying an appearance-based face region identification process to the monocular image for selecting a cropped face image; applying an appearance-based face landmark detection algorithm to the cropped face image for extracting a set of facial characterization landmarks representative of the shape of face elements, the face landmark detection algorithm being fed with a trained face landmark detection model; applying a model-based face model fitting algorithm for fitting a 3D deformable face model to the set of facial characterization landmarks, thus obtaining the values of the set of face model parameters that minimize the error between each facial characterization landmark and a corresponding projection in the 3D deformable face model.
  • the 3D deformable face model has 6 degrees of freedom corresponding to the XYZ position coordinates and roll-pitch-yaw rotation angles with respect to the imaging device coordinate system.
  • the appearance-based face region identification process comprises applying at least one of the following algorithms: a face detection algorithm or a face tracking algorithm.
  • applying an appearance-based face landmark detection algorithm to the cropped face image for extracting a set of facial characterization landmarks representative of the shape of face elements is done as follows: holistically estimating model position and orientation parameters; holistically estimating shape parameters; and sequentially estimating each action parameter.
  • the imaging device from which the monocular image is obtained is modelled at least by means of its focal length, its image pixel width and its image pixel height.
  • obtaining normalized 3D gaze estimation vectors for the right and left eyes with respect to the imaging device viewpoint is done as follows: extracting and normalizing a left-eye image patch and a right-eye image patch by calculating an affine transformation matrix M for each eye; applying a trained 3D eye gaze vector regression model to each normalized patch.
  • the trained 3D eye gaze vector regression model is a deep neural network.
  • the neural network used in the current method is unimodal because it only considers the normalized ocular image related to the gaze vector, and not the orientation of the head.
  • The orientation is used explicitly and in an uncoupled way in the reconstruction of the virtual 3D world. This reduces the negative impact that an insufficiently accurate estimation of the head orientation may have on the estimation of the gaze vector.
  • obtaining normalized 3D gaze estimation vectors for the right and left eyes is done as follows: applying an eye shape normalization procedure for one eye; mirroring the resulting matrix for the eye not corresponding to that considered by the regressor (left or right); processing both matrixes with the pre-trained deep neural network; un-mirroring the response for the mirrored eye image; applying a head rotation's correction factor; dividing both regression results by their corresponding Euclidean norms.
  • The affine transformation matrix M is calculated as follows:
    $$M = \begin{pmatrix} \alpha & \beta & (1-\alpha)\,c_x - \beta\,c_y \\ -\beta & \alpha & \beta\,c_x + (1-\alpha)\,c_y \end{pmatrix}$$
    where:
  • obtaining normalized 3D gaze estimation vectors for the right and left eyes further comprises applying an eye image equalization algorithm.
  • Estimating the eye gaze direction with respect to at least one target in the scene is done as follows: modelling each target of the at least one target with a set of polygons formed by k points b and lines l, and their corresponding planar surfaces {v, q}, where v is the normal vector and q the distance from the origin, that define the objects that need to be related with the user's point of gaze, and placing the set of polygons with respect to the imaging device's coordinate system; placing the 3D face model, represented by values of a set of face model parameters, with respect to the same imaging device's coordinate system; transforming the normalized 3D gaze estimation vectors for the right and left eyes, so that they are referred to the coordinate system of the imaging device; calculating the geometric mean of both gaze vectors; calculating the point of gaze for each target plane; applying a point-in-polygon strategy to select either a point of gaze that lies within any of the polygons that represent the at least one target or a point of gaze of
  • a second aspect of the invention relates to a system comprising at least one processor configured to perform the steps of the method of the first aspect of the invention.
  • a third aspect of the invention relates to a computer program product comprising computer program instructions/code for performing the method of the first aspect of the invention.
  • a fourth aspect of the invention relates to a computer-readable memory/medium that stores program instructions/code for performing the method of the first aspect of the invention.
  • a method for estimating eye gaze direction from monocular images taken by a single imaging device has been proposed.
  • The method provides a reconstruction of the 3D virtual space and relates the 2D projections to the reconstructed 3D virtual space.
  • the method does not require extra equipment, such as several imaging devices or depth sensors. On the contrary, the method only requires a single monocular imaging device that does not provide 3D data.
  • relating the 2D image projections with the reconstructed 3D space would require not only obtaining the person's 3D eye gaze vectors from the images, as disclosed for example by Strupczewski et al., 2016, but also the person's 3D eye positions, the surrounding potential targets' geometries in the same 3D space, the camera characteristics from which that space is observed and an additional calibration stage done by the user.
  • the present disclosure manages to perform such estimation from a simple monocular imaging device.
  • the method of the present disclosure does not use a 3D graphic model to represent the shape of the eye.
  • a 3D graphic model is used to represent the face.
  • The eyes are then positioned in 3D space in a global way, by positioning the ocular centers.
  • an appearance-based methodology using a deep neural network is preferably contemplated.
  • The deep neural network is trained from a database of ocular images and their corresponding 3D gaze vectors, with respect to the coordinate system located at the imaging device and not at the head as in some conventional approaches.
  • Estimating the position of the iris in the image is not necessary. Therefore, the dependency of the current method on the accuracy of the head pose estimation is lower, and its dependency on the accuracy of an iris pose estimation is null. This is relevant because it makes the direction estimates achieved by the current method generally closer to the real one.
  • The complete processing of the system is more efficient than in conventional methods. Therefore, the current method can work efficiently on a device (i.e. processor) with a more modest processing capacity (such as, but not limited to, mobile phones and tablets with ARM chips).
  • FIG. 1A shows a block diagram of the method of the present disclosure.
  • the method is applied to an image or, in general, to a sequence of images (input monocular image(s) 101 in figure 1A ).
  • Each image has been captured by a camera (not shown in the figure).
  • the method may be executed substantially while the camera is registering images (or video) or a posteriori, that is to say, using registered images or video.
  • Each input image 101 is a monocular image, that is to say, the main information it provides is the RGB values of the image (either color image or black-and-white one). In particular, it does not provide 3D information (depth) or other extra features, such as infrared information.
  • The camera must have enough resolution in the ocular area to distinguish where the person in the image is looking. For example, and without limitation, the camera should provide at least 40 pixels between the left corner and the right corner of the same eye.
  • the diagram shown in figure 1A includes three main blocks: A first block (Block 1) 11 in which a 3D face model is adjusted to the user's face image.
  • the input to this first block 11 is an input monocular image 101 or, in general, a sequence of monocular images I j .
  • the output of this first block 11 is a 3D face model 112 fitted to the user's face image 101.
  • The 3D face model 112 is represented by the values taken by a set of parameters {t, r, s, a}, which will be explained in detail later.
  • In the second block (Block 2) 12, normalized 3D gaze vectors 121 of the user with respect to the camera viewpoint are obtained from the 3D face model 112 fitted to the user's face image and from the user's face image 101.
  • In the third block (Block 3) 13, the eye gaze direction is estimated with respect to the targets.
  • the output of this third block 13 is the user's estimated point of gaze with respect to the targets in the scene.
  • In figure 1B, a method for estimating eye gaze direction according to the present disclosure, corresponding to the main blocks 11-13 depicted in figure 1A, is represented in the form of a flow chart.
  • Block 11 in figure 1A corresponds to stages 101-112 in figure 1B
  • block 12 in figure 1A corresponds to stages 113-115 in figure 1B
  • block 13 in figure 1A corresponds to stages 116-120 in figure 1B .
  • reference to blocks or stages of figure 1B will also be made.
  • Figure 2 represents in more detail the first block (Block 1) 11 depicted in figure 1A , that is to say, the block for adjusting a 3D face model to the user's face image(s).
  • Figure 3 shows an example of a generic deformable 3D face model (111 in figure 2) that may be used in the method of the present disclosure. Any other deformable 3D face model may be used instead. It is recommended that the model 111 be simple; in other words, the model 111 preferably achieves a compromise between simplicity and richness.
  • First block 11 is implemented by means of hybrid nested algorithms to automatically obtain the values of the face model parameters {t, r, s, a} based on the user's face in the image.
  • The nested algorithms are hybrid because the face region identification procedure (blocks 106-108 in figure 1B; blocks 106-107 in figure 2) and the face landmark detection procedure (block 109 in figures 1B and 2) are appearance-based, while the remaining procedures are model-based.
  • two stages are distinguished: (1) the initial face detection 106; and (2) the subsequent face tracking 107. This is explained in detail in relation to the upper part of the flow diagram of figure 4 (stages 41-47). Depending on the application, it may be required to detect all the faces present in each image.
  • the detection algorithm 106 is typically required, because the face or faces to be detected in the image do not necessarily correspond to people already identified.
  • Image tracking algorithms may be used, preferably together with detection algorithms. Because the face (or faces) in the images may be faces which have previously been identified as belonging to the persons whose eye gaze direction is to be estimated, image tracking algorithms can be used. The current method makes it possible to estimate the eye gaze direction of several people at the same time.
  • The algorithm analyzes whether the observed image requires face detection or not (in which case face tracking can be applied), i.e., whether the face in the image is new in the scene or not. This is relevant when the current method is applied to image sequences, as the time efficiency of computer vision tracking algorithms is typically much higher than that of face detection algorithms, and also because of memory constraints of the targeted hardware.
  • The answer to the "Face detection needed" query 41 shown in figure 4 depends on the use case. For instance, in one application the processing of only a limited and controlled number of faces could be required, and therefore a pre-defined buffer with a maximum number of faces (e.g., only one) would be reserved in memory.
  • In that case, the face detection procedure (stages 41-44 in figure 4) would be ignored until the tracking procedure (stages 45-47 in figure 4) loses that face.
  • If face detection is needed, then the face model parameters are reset to a neutral configuration (42 in figure 4), a face region detector is run on the image (43 in figure 4) and an image patch and face region of the detected user's face are stored (44 in figure 4).
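  • As an illustration, the detect-or-track decision described above could be organized as in the following minimal Python sketch; the detector, tracker and state objects are hypothetical placeholders, not the actual implementation:

```python
def face_region_for_frame(frame, detector, tracker, state):
    """Sketch of stages 41-47: detect when needed, otherwise track the stored face."""
    if state.needs_detection:                      # query 41: face detection needed?
        state.reset_to_neutral()                   # stage 42: neutral face model parameters
        region = detector.detect(frame)            # stage 43: run the face region detector
        if region is not None:
            state.store(frame, region)             # stage 44: store image patch and face region
            state.needs_detection = False
        return region
    region = tracker.track(frame, state.last_region)   # stages 45-47: face tracking procedure
    if region is None:
        state.needs_detection = True               # tracking lost: fall back to detection
    else:
        state.store(frame, region)                 # keep the stored face region up to date
    return region
```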
  • a neutral configuration is explained later, in relation to equation (1).
  • the application of a face detection algorithm (106 in figure 1B ) requires a conventional trained face detection model 105, which is out of the scope of the present invention.
  • a face region has been identified (answer to query 48).
  • This region or rectangle represents a cropped face image (108 in figure 1B ).
  • all the faces shown in the image are identified (i.e. captured within or delimited by a rectangle) and a cropped face image 108 of the face of interest is obtained.
  • The face landmark detection algorithm 109 extracts a set of facial characterization points or landmarks that identify the shape of face elements such as the left eye, right eye, left eyebrow, right eyebrow, nose, left nostril, right nostril, mouth, forehead, left cheek, right cheek, chin, face corners or limits, etc.
  • a trained face landmark detection model 110 is used for such detection.
  • the output of the face landmark detection algorithm 109 is an image coordinates vector. This vector has the positions of characteristic facial landmarks from which control points of the 2D facial shape (the projection of the shape of face, in other words) will be obtained.
  • the deformation parameters of the model 111 are adjusted in order to optimize (that is to say, minimize) the mentioned error.
  • The result of this adjustment is the set of values of the face model parameters {t, r, s, a}.
  • the facial appearance of the person is identified in the 3D world, thanks to the overlapping of the face texture on the adjusted 3D face model.
  • the considered deformation parameters can be distinguished in two groups: those related to facial actions and those related to the user's shape.
  • a hard constraint to the facial shape deformations through time is not applied, considering that, in theory, it is kept constant for a specific person, and that the facial shape deformations cannot be measured with complete certainty from any viewpoint.
  • The 3D deformable face model also has 6 degrees of freedom that correspond to the XYZ position coordinates and roll-pitch-yaw rotation angles with respect to the camera coordinate system, which is assumed to be well-known and located at the center of the image, with zero depth.
  • the vertices of the 3D deformable model are those vertices of the model whose positioning is semantically equivalent to the detected landmarks.
  • a graphic model is nothing more than a list of points (vertices), joined by straight lines in one way or another (usually forming triangles).
  • The model is deformable through a series of parameters (numbers). If they are zero, they do not deform the model; if any of them has a value different from 0, then the positioning of some vertices varies according to a pre-established deformation vector multiplied by the value of that parameter, which acts as a "weight".
  • These deformation vectors are also designed according to concepts, such as facial actions, or specific forms of the person. For example, a parameter that represents the interocular distance will change the vertices in the vicinity of the eyes, maintaining the symmetry of the face.
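  • For illustration, the weighted deformation described above can be sketched as follows; the array shapes and names are assumptions, not those of the actual model 111:

```python
import numpy as np

def deform_vertices(base_vertices, deformation_vectors, weights):
    """base_vertices: (V, 3) neutral mesh; deformation_vectors: (P, V, 3), one
    pre-established displacement field per parameter; weights: (P,) parameter values.
    Zero weights leave the base mesh unchanged."""
    offsets = np.tensordot(np.asarray(weights, float),
                           np.asarray(deformation_vectors, float), axes=1)
    return np.asarray(base_vertices, float) + offsets
```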
  • the model may have more vertices than the landmarks detected.
  • a semantic relationship must be established between vertices and landmarks in order to deform the model through landmarks.
  • the 2D landmark that represents the tip of the nose must be paired with the 3D vertex positioned on the tip of the nose of the 3D model.
  • the facial model and its adjustment are relatively complex.
  • The number of landmarks may be 68, because in the facial adjustment stage the deformability parameters of the 3D facial graphic object are adjusted (and to do so, it is better to use more landmarks). Consequently, the achieved estimation of the 3D positions of the eye centers is much closer to the real one than in conventional methods.
  • The objective function is parameterized with camera parameters {f, w, h} and face model parameters {t, r, s, a}.
  • Any conventional camera model may be adopted.
  • the pinhole camera model [Hartley and Zisserman, 2003] is adopted, without considering lens distortion, as it provides a good balance between simplicity and realism for obtaining perspective projections, for most common cases.
  • The camera parameters {f, w, h} are assumed to be known beforehand and constant in time. Hence, the optimization procedure obtains those values of the face model parameters {t, r, s, a} that minimize the objective function e.
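  • A hedged sketch of such a fit is given below, assuming a pinhole projection with the principal point at the image centre and SciPy's Levenberg-Marquardt solver; the actual objective function e and model parameterization may differ:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project_pinhole(points_3d, f, w, h):
    # Pinhole projection with the principal point assumed at the image centre (w/2, h/2).
    x, y, z = points_3d[:, 0], points_3d[:, 1], points_3d[:, 2]
    return np.stack([f * x / z + w / 2.0, f * y / z + h / 2.0], axis=1)

def landmark_residuals(params, landmarks_2d, deform_fn, f, w, h):
    # params = [tx, ty, tz, roll, pitch, yaw, shape and action parameters...]
    t, r = params[:3], params[3:6]
    pts = deform_fn(params[6:])                        # (N, 3) deformed model points
    pts = Rotation.from_euler("xyz", r).apply(pts) + t
    return (project_pinhole(pts, f, w, h) - landmarks_2d).ravel()

# Levenberg-Marquardt fit of {t, r, s, a}, with camera parameters {f, w, h} kept fixed:
# fit = least_squares(landmark_residuals, x0, method="lm",
#                     args=(landmarks_2d, deform_fn, f, w, h))
```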
  • the textures of the user's face elements under control can be obtained, including the eyes, for their processing in the following block (block 12 in figure 1 ).
  • key parts of the image have been identified (for example eyes, nose, etc.) and they are aligned with the graphical object (the graphical object being the deformable graphical model representing the face).
  • A neutral configuration is established.
  • a non-limiting example of neutral configuration is a configuration in which the face is expressionless, looking at the front and located at the central region of the image, at a distance from the camera viewpoint in which the face boundaries cover about one quarter of the shortest image size (width or height).
  • the face parameters are normalized so that in the neutral configuration they all have zero value and the range that each one covers (e.g., the translation in X direction, a facial action or shape deformation from one extreme to the other, etc.) lies in the same order of values.
  • The algorithm corresponding to the first block 11 (figure 2) and schematized in the flow chart of figure 4 is as follows: considering in general that the input 101 is formed by an image sequence I (that may be a single image or a video comprising a plurality of images or frames), for each I_j ∈ I, wherein j is the frame number:
  • the algorithm schematized in figure 4 can be summarized as follows, in which additional inputs to the algorithm are trained face detection model 105, trained face landmark detection model 110 and generic 3D face deformable model 111. These models are implemented as neural networks, preferably deep neural networks.
  • the algorithm for adjusting a 3D face model to the detected landmarks, applied at stage 50 in figure 4 (stage 112 in figures 1B and 2 ), is explained in detail in figure 5 (it is also referred to as Algorithm 2 in Algorithm 1 listed above).
  • the face model adjustment is carried out following a three-stage procedure, in which the first stage (block 51 in figure 5 ) corresponds to the overall holistic estimation of the model position (t) and orientation (r) parameters, the second stage (block 53 in figure 5 ) corresponds to the holistic estimation of the shape parameters (s) and the third stage (block 54 in figure 5 ) corresponds to the sequential estimation of each action parameter (a).
  • the current parameter values are preferably converted to the normalized range workspace (block 52 in figure 5 ). If a face has been detected (blocks 41-44 in figure 4 ), then the current parameter values are those of the neutral configuration (block 42 in figure 4 ).
  • the current parameter values are those of the previous configuration.
  • Levenberg-Marquardt algorithm [ Levenberg, "A Method for the Solution of Certain Non-Linear Problems in Least Squares.” Quarterly of Applied Mathematics, 2 (1944), pp. 164-168 ; Marquardt, "An Algorithm for Least-Squares Estimation of Nonlinear Parameters.” SIAM Journal on Applied Mathematics, 11 (2) (1963), pp.
  • the shape and action values are preferably initialized with zero values, but the position and orientation values are preferably kept and updated from frame to frame while tracking.
  • a hard constraint to the facial shape deformations through time is not applied, because, in theory, it should keep constant for a specific person, and because facial shape deformations cannot be measured with complete certainty from any viewpoint.
  • the obtained parameter values may optionally be filtered by taking into account their frame-to-frame variation and an appropriate filtering method for face movements.
  • The 3D face model fitted to the user's face image 101 and represented by parameters {t, r, s, a} 112 has already been obtained (block 11 of figure 1).
  • the result of this stage 112 is the geometry in the 3D space of the facial shape of the person in the image.
  • the information of interest of said geometry of the facial shape is the 3D positioning of the eyes and patches of eye images, in order to be able to apply an eye gaze direction estimation technique.
  • normalized 3D gaze vectors 121 of the user with respect to the camera viewpoint need to be obtained (block 12 of figure 1 ). This may be done in two main stages (113 & 115 in figure 1B ).
  • eye image patches of the person are extracted and normalized 113 ( figure 1B ).
  • two eye image patches are extracted: a left-eye image patch and a right-eye image patch.
  • Each patch is an image represented by a matrix.
  • the result of this stage 113 is, for each eye, the eye image region transformed to a normalized space.
  • the normalized matrix has fixed image size, fixed left and right corners for each eye, and/or the intensity values for the pixels are within a fixed range.
  • a 3D eye gaze vector regression model is applied 115 to each normalized matrix. This is explained next.
  • The output 121 of block 12 in figures 1A-1B is the normalized estimated 3D gaze vectors {g_l, g_r}_norm, for each eye, of the user with respect to the camera viewpoint, throughout I.
  • Figure 6 represents the geometry of the eye's key points in a normalized eye shape: w and h are, respectively, the image pixel width and the image pixel height, and have already been predefined.
  • a trained 3D eye gaze vector regression model 114 has been applied 115 to the normalized eye image patches obtained in the previous stage 113 in order to extract 3D vectors of the eye gaze.
  • the trained 3D eye gaze vector regression model 114 is a neural network.
  • the neural network is preferably a deep neural network.
  • The neural network has a set of parameters (in the case of a deep neural network, a very large number of parameters) that allow a response (the desired 3D vector) to be inferred from a stimulus (each normalized eye image patch). These parameters have to be previously trained from databases of images annotated with the corresponding answers (in the case of deep neural networks, with a great number of annotated samples).
  • When the neural network is fed with an unknown image (the eye image patch obtained in stage 113), the neural network, based on what it has "learned" during a training stage that generated a model (trained 3D gaze vector estimation model 114 in figure 1B), responds with a result (the 3D vector corresponding to that eye image patch).
  • A neural network, preferably a deep neural network, has been trained with a geometry of the eyes (for example the one shown in figure 6), assuming that each eye is centered ("c" is thus estimated).
  • the deep neural network is shown in figure 1B as a trained 3D gaze vector estimation model 114.
  • A 3D vector represents the eye gaze direction.
  • The patches are fed into a trained neural network.
  • A 3D vector is thus obtained.
  • The estimation of the eye gaze direction is improved by combining the two 3D vectors of the respective eyes.
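  • The following is a minimal PyTorch sketch of such a regressor; the layer configuration and the 1 x 36 x 60 patch size are illustrative assumptions and not the network actually used as model 114:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeRegressor(nn.Module):
    """Maps a batch of normalized eye patches (N, 1, 36, 60) to unit-norm 3D gaze vectors."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.fc1 = nn.Linear(32 * 9 * 15, 128)
        self.fc2 = nn.Linear(128, 3)                  # (x, y, z) gaze direction

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)    # 36x60 -> 18x30
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)    # 18x30 -> 9x15
        g = self.fc2(F.relu(self.fc1(x.flatten(1))))
        return F.normalize(g, dim=1)                  # unit-norm 3D gaze vectors
```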
  • Figure 7 shows a flow chart of the algorithm corresponding to the second block 12 (figure 1A). It is an appearance-based algorithm, which automatically infers the normalized 3D gaze vector 121 from the eye textures of the user.
  • The inputs to the algorithm are: the image sequence I; the 2D left {e1, e2}_l and right {e1, e2}_r eye corner landmark positions, throughout I; the adjusted face model geometry and parameters, throughout I; and the pre-trained deep neural network for regressing 3D gazes from normalized eye images.
  • The output of the algorithm is the estimation of the user's normalized left and right eye gaze vectors {g_l, g_r}_norm, throughout I.
  • the normalized image patches of both eyes need to be extracted (113 in figure 1B ).
  • the normalization is made by aligning the eye textures in a pre-defined shape and preferably by also equalizing the histogram, so that the image brightness is normalized, and its contrast increased.
  • The shape normalization is solved by applying an affine warping transformation (using the affine transformation matrix M) based on a set of key points of the eye (both eye corners e1, e2), obtained from the face landmarks detected previously, so that they fit normalized positions, as shown in figure 6.
  • The affine transformation matrix M is calculated as follows (stage 71 of figure 7):
    $$M = \begin{pmatrix} \alpha & \beta & (1-\alpha)\,c_x - \beta\,c_y \\ -\beta & \alpha & \beta\,c_x + (1-\alpha)\,c_y \end{pmatrix}$$
    where:
  • The shape-normalized eye image is obtained as $I_{norm}^{shape}(x, y) = I_{input}(M_{11}x + M_{12}y + M_{13},\; M_{21}x + M_{22}y + M_{23})$, wherein in $M_{rq}$, r denotes the row and q denotes the column, r being 1 or 2 and q being 1, 2 or 3.
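  • A possible OpenCV sketch of this shape normalization is shown below; the matrix built by cv2.getRotationMatrix2D has the same form as the matrix above (rotation and scale about the eye centre c), while the extra translation, patch size and margin are illustrative assumptions:

```python
import cv2
import numpy as np

def extract_normalized_eye(image, e1, e2, out_w=60, out_h=36, margin=0.3):
    """Warp the eye region so that the corners e1, e2 land on fixed, horizontal
    positions of an out_w x out_h patch. Note that the warping equation above reads
    M as an output-to-input map, whereas warpAffine by default expects the forward
    matrix and inverts it internally."""
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    c = 0.5 * (e1 + e2)                               # centre between the eye corners
    d = e2 - e1
    angle = np.degrees(np.arctan2(d[1], d[0]))        # in-image roll of the eye
    scale = out_w * (1.0 - 2.0 * margin) / (np.linalg.norm(d) + 1e-9)
    M = cv2.getRotationMatrix2D((float(c[0]), float(c[1])), angle, scale)
    M[0, 2] += out_w / 2.0 - c[0]                     # shift c to the patch centre
    M[1, 2] += out_h / 2.0 - c[1]
    return cv2.warpAffine(image, M, (out_w, out_h), flags=cv2.INTER_LINEAR)
```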
  • the inference of both eye gaze vectors is obtained by the same regression model trained (block 114 in figure 1B ) only with appearances and the corresponding gaze vectors of one of the eyes (e.g., the right eye). Therefore, the other normalized eye image needs to be mirrored before it is processed by the regressor (trained regression model 114).
  • This regressor 114 preferably corresponds to a deep neural network trained with an annotated set of normalized eye images, in which the annotation corresponds to the corresponding normalized 3D eye gaze vector, i.e., three floating point numbers representing the XYZ gaze direction with respect to a normalized camera viewpoint, as a unit vector.
  • normalized camera viewpoint refers to a viewpoint from which the eye shape is observed as shown in figure 6 , i.e., with the eye corners e 1 , e 2 in fixed positions of the normalized image.
  • Different kind of deep neural networks can be employed for this purpose, i.e., with different number, kind and configuration of layers (e.g., inspired in LeNet [ LeCun, "Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE, 1998 ], VGG [ Simonyan and Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition.” CoRR abs/1409.1556, 2014 ], etc.).
  • This (preferably deep) neural network processes the incoming normalized eye image patch and infers a vector of 3 floating point numbers associated with the gaze direction.
  • The result obtained for the image that was previously mirrored now needs to be un-mirrored (just by changing the sign of the first number).
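  • A minimal sketch of this mirroring trick, assuming a regressor trained on right-eye appearances only and NumPy image patches, is:

```python
import numpy as np

def regress_both_eyes(regressor, left_patch, right_patch):
    """regressor: any callable mapping a normalized eye patch to a 3D gaze vector."""
    g_right = np.asarray(regressor(right_patch), float)
    g_left = np.asarray(regressor(np.fliplr(left_patch)), float)   # mirrored left eye
    g_left[0] = -g_left[0]              # un-mirror: change the sign of the first number
    return g_left, g_right
```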
  • To train the regression model 114, either synthetic images, real images, or a combination of synthetic and real images may be used.
  • Figure 8 shows three examples of the distortion that occurs in the normalized appearance of distant eyes in non-frontal faces when the head's yaw angle is changed.
  • the red rectangle represents the output of the face detection algorithm, that is to say, the detected face region, represented by a rectangle in image coordinates (for example, left top corner position, width and height).
  • The green points are the face landmarks (d1, d2, d3...) detected by the face landmark detection algorithm.
  • the white points are the vertices of the 3D facial model and its triangles, which define the 3D model.
  • the green points do not match exactly the white ones because the deformability of the graphical object is not perfect.
  • e is minimized in block 11 (figure 1A). Consequently, this distortion may affect the stability of the estimated gaze for different yaw rotation angles of the head. A similar instability may also happen for different pitch angles, but to a lower degree.
  • A head rotation correction factor is applied as follows:
    $$\{g_l, g_r\}_{norm}^{corrected} = \{g_l, g_r\}_{norm}^{regression} + \big(K_y\,(r_y - r_y^0),\; K_x\,(r_x - r_x^0),\; 0\big)$$
  • The reference pitch and yaw angles could be the average values of those observed during the image sequence while the user's head pose is close to frontal viewpoints, whereas the proportionality constants could be determined based on observations of the gaze stability while the user moves the head but maintains the point of gaze.
  • Each vector is divided by its Euclidean norm, to ensure that the resulting vectors have unit norm; this way both normalized gaze vectors are obtained.
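  • A hedged sketch of this correction followed by the unit normalization is given below; which gaze component pairs with yaw or pitch, and the values of the constants K, are assumptions to be tuned as described above:

```python
import numpy as np

def correct_and_normalize(g_regressed, yaw, pitch, yaw_ref, pitch_ref, k_yaw, k_pitch):
    g = np.asarray(g_regressed, float).copy()
    g[0] += k_yaw * (yaw - yaw_ref)           # x component corrected with the yaw deviation
    g[1] += k_pitch * (pitch - pitch_ref)     # y component corrected with the pitch deviation
    return g / np.linalg.norm(g)              # divide by the Euclidean norm (unit vector)
```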
  • the eye image equalization algorithm (block 73 in figure 7 ) may be implemented as described next in relation to figure 9 .
  • The input to this algorithm is the shape-normalized eye image $I_{norm}^{shape}$.
  • The shape-normalized eye image may be an 8-bit single-channel image.
  • The output of this algorithm is a normalized eye image $I_{norm}$.
  • the histogram H of the shape-normalized eye image is calculated (stage 91).
  • the histogram is normalized so that the sum of histogram bins is 255 (stage 92).
  • Finally, the image is transformed using H' as a look-up table: $I_{norm}(x, y) = H'(I_{norm}^{shape}(x, y))$.
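  • These steps correspond to standard histogram equalization; in the following NumPy sketch the cumulative sum used to build the look-up table H' is an assumption (it is the usual choice, equivalent to cv2.equalizeHist):

```python
import numpy as np

def equalize_eye_image(eye_norm_shape):
    """8-bit single-channel shape-normalized eye image -> equalized eye image."""
    img = np.asarray(eye_norm_shape, dtype=np.uint8)
    hist = np.bincount(img.ravel(), minlength=256).astype(float)   # histogram H (stage 91)
    hist *= 255.0 / hist.sum()              # normalize so the bins sum to 255 (stage 92)
    lut = np.cumsum(hist)                   # assumed: H' as the cumulative histogram
    return np.clip(np.round(lut[img]), 0, 255).astype(np.uint8)    # apply H' as a look-up table
```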
  • At this point, the normalized 3D gaze vectors (for the left and right eyes) 121 (figures 1A-1B) of the user with respect to the camera viewpoint have already been obtained (block 12 of figure 1A) from the normalized eye image patches (left-eye patch and right-eye patch). It is remarkable that these 3D eye gaze vectors have been obtained without any previous calibration, i.e. without any initialization procedures. This is especially important in applications requiring real-time monitoring of the eye gaze, such as automotive applications. Finally, the eye gaze direction needs to be estimated with respect to the targets (block 13 of figure 1A). In other words, the 3D eye gaze vector already obtained must be related with the environment. This is explained next. In the present disclosure, a virtual 3D modelling between the target (e.g. a screen) and the person is performed.
  • This virtual 3D space reconstruction is generated by combining a set of computer vision procedures in a specific manner, as described next, and by considering a set of assumptions regarding the physical characteristics of the different components.
  • the target's geometry (117 in figure 1B ) and camera intrinsic assumptions (116 in figure 1B ) are used.
  • the 3D modelling is virtual (rather than real) because the characteristics of the camera or user measurements, such as interocular distance, are not known. So, from each 3D eye gaze vector (obtained in stage 115) and from the target's geometry 117 and camera intrinsic assumptions 116, all the scene components (face, 3D eye gaze vectors, camera and possible targets) are set in a common 3D space (stage 118).
  • a filtered average 3D eye gaze vector is then calculated (stage 119). Finally, the intersection between the filtered average 3D eye gaze vector and the target's geometry is checked (stage 120). This checking provides the estimated eye gaze direction with respect to the target 131.
  • the output 131 of block 13 in figures 1A-1B is the user's estimated point of gaze with respect to the targets in the scene.
  • Figure 10 shows a flow chart of the algorithm corresponding to the third block 13 of the method of the invention ( figure 1A ). It is a target-related point of gaze estimation algorithm. It automatically infers the user's eye gaze with respect to the targets in the scene.
  • the algorithm is as follows: First, the target geometries are placed with respect to the camera's coordinate system, which is the same reference used for the face and eye gaze vectors, already estimated in previous blocks of the method of the present disclosure. The camera's coordinate system has been previously pre-established. In other words, it is assumed that the camera's coordinate system is well-known.
  • A target is modelled or referred to as a set of polygons formed by k points b and lines l, and their corresponding planar surfaces {v, q} (where v is the normal vector and q the distance from the origin) that define the objects that need to be related with the user's point of gaze (e.g., a TV screen is represented by a rectangular plane, or the frontal and lateral windows in a vehicle are represented by 3 rectangular planes, etc., depending on the final application).
  • the 3D face model is placed in the scene with the parameters obtained in block 11 ( figure 1A ).
  • the normalized left and right eye 3D gaze vectors obtained in block 12 are transformed, so that they are referred to the coordinate system of the camera (i.e., not to the normalized camera viewpoint, as before).
  • This is done by removing the effect of the rotation angle θ that was used for the affine transformation applied to each normalized eye shape, like this:
    $$\{g_l, g_r\} \leftarrow \begin{pmatrix} \cos\theta \cdot \{g_l, g_r\}_{norm,x} + \sin\theta \cdot \{g_l, g_r\}_{norm,y} \\ -\sin\theta \cdot \{g_l, g_r\}_{norm,x} + \cos\theta \cdot \{g_l, g_r\}_{norm,y} \\ \{g_l, g_r\}_{norm,z} \end{pmatrix}$$
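  • For a single normalized gaze vector, this de-rotation can be sketched as:

```python
import numpy as np

def unrotate_gaze(g_norm, theta):
    """Remove the in-image rotation angle theta used in the eye-shape normalization."""
    gx, gy, gz = g_norm
    return np.array([np.cos(theta) * gx + np.sin(theta) * gy,
                     -np.sin(theta) * gx + np.cos(theta) * gy,
                     gz])
```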
  • Both gaze vectors are combined by calculating their geometric mean g, which is assumed to be the user's overall gaze vector.
  • the gaze vector may optionally be filtered by taking into account its frame-to-frame motion and an appropriate filtering method for eye movements.
  • The origin of this vector is preferably placed in the middle position (mean value) of both eye centers of the 3D face (see, for example, the origin of this vector in the image in the lower left corner of figure 1B).
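  • For illustration, the combination of the two gaze vectors and the placement of the ray origin could be sketched as follows; the normalized vector sum is used here as a stand-in for the geometric mean:

```python
import numpy as np

def combined_gaze(g_left, g_right, eye_center_left, eye_center_right):
    g = np.asarray(g_left, float) + np.asarray(g_right, float)
    g /= np.linalg.norm(g)                                   # overall unit gaze vector
    origin = 0.5 * (np.asarray(eye_center_left, float) +
                    np.asarray(eye_center_right, float))     # midpoint of the 3D eye centres
    return origin, g
```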
  • A point-in-polygon strategy is applied to see if any of the calculated pog-s lies within any of the polygons that represent the different targets.
  • The polygon within which any of the calculated pog-s lies is selected, or the closest one if none of the calculated pog-s lies within a polygon.
  • Different point-in-polygon strategies are reported in E. Haines, "Point in Polygon Strategies.” Graphics Gems IV, ed. Paul Heckbert, Academic Press, p. 24-46, 1994 .
  • the point-in-polygon strategy may result in that the point of gaze goes through a polygon, or that the point of gaze does not go through any polygon. In the event it does not, it may provide the closest distance to a polygon.
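  • A minimal sketch of the per-plane point-of-gaze computation and of a simple crossing-number point-in-polygon test (one of the strategies surveyed by Haines, 1994) is given below; the 2D parameterization of each target plane is assumed to be available:

```python
import numpy as np

def gaze_plane_intersection(origin, g, v, q):
    """Point of gaze on the target plane {v, q} (v . x = q) along the gaze ray;
    returns None if the gaze is parallel to the plane or points away from it."""
    denom = float(np.dot(v, g))
    if abs(denom) < 1e-9:
        return None
    t = (q - float(np.dot(v, origin))) / denom
    return None if t < 0 else np.asarray(origin, float) + t * np.asarray(g, float)

def point_in_polygon_2d(p, poly):
    """Crossing-number test on the polygon expressed in the plane's 2D coordinates."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > p[0]:
                inside = not inside
    return inside
```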
  • In the method of the present disclosure, if the point of gaze does not go through a polygon, the method provides the closest polygon to the point of gaze. For example, in line 11 above, if the point of gaze does not go through a polygon, the distance to the polygon is stored. And in line 12 above, the currently measured distance is compared to the minimum measured distance (which is the stored one), in order to guarantee that the closest polygon is finally selected.
  • One of the advantages of the computer-implemented method of the present invention is that it can be integrated, processed and executed in embedded hardware systems with low computational capabilities, such as mobile phones. Besides, in order to run the current method, no extra equipment, such as several imaging devices or depth sensors, is required apart from a single imaging device. Unlike current attempts at relating 2D image projections with a reconstructed 3D space, which require not only obtaining the person's 3D eye gaze vectors from the images, but also the person's 3D eye positions, the surrounding potential targets' geometries in the same 3D space, the camera characteristics from which that space is observed and an additional calibration stage done by the user, the current method only requires a single monocular imaging device that does not provide 3D data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Computation (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Ophthalmology & Optometry (AREA)
  • Image Analysis (AREA)
EP17382912.8A 2017-12-27 2017-12-27 Method, system and computer program product for estimating the point of gaze Active EP3506149B1 (de)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP17382912.8A EP3506149B1 (de) 2017-12-27 2017-12-27 Method, system and computer program product for estimating the point of gaze

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP17382912.8A EP3506149B1 (de) 2017-12-27 2017-12-27 Method, system and computer program product for estimating the point of gaze

Publications (2)

Publication Number Publication Date
EP3506149A1 true EP3506149A1 (de) 2019-07-03
EP3506149B1 EP3506149B1 (de) 2025-01-08

Family

ID=61027403

Family Applications (1)

Application Number Title Priority Date Filing Date
EP17382912.8A Active EP3506149B1 (de) 2017-12-27 2017-12-27 Verfahren, system und computerprogrammprodukt zur abschätzung des blickpunkts

Country Status (1)

Country Link
EP (1) EP3506149B1 (de)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931694A (zh) * 2020-09-02 2020-11-13 北京嘀嘀无限科技发展有限公司 确定人物的视线朝向的方法、装置、电子设备和存储介质
EP3819813A1 (de) * 2019-11-08 2021-05-12 Honda Research Institute Europe GmbH Bestimmung eines bereichs von interesse in einer kopf-augen-verfolgungsanwendung
CN113661495A (zh) * 2021-06-28 2021-11-16 华为技术有限公司 视线校准方法及装置、设备、计算机可读存储介质、系统、车辆
WO2022023142A1 (en) * 2020-07-27 2022-02-03 Roomality Limited Virtual window
US11335104B2 (en) 2020-03-31 2022-05-17 Toyota Research Institute, Inc. Methods and system for predicting driver awareness of a feature in a scene
CN114610150A (zh) * 2022-03-09 2022-06-10 上海幻电信息科技有限公司 图像处理方法及装置
CN115482574A (zh) * 2022-09-29 2022-12-16 珠海视熙科技有限公司 基于深度学习的屏幕注视点估计方法、装置、介质及设备
CN115830675A (zh) * 2022-11-28 2023-03-21 深圳市华弘智谷科技有限公司 一种注视点跟踪方法、装置、智能眼镜及存储介质
CN116052264A (zh) * 2023-03-31 2023-05-02 广州视景医疗软件有限公司 一种基于非线性偏差校准的视线估计方法及装置

Non-Patent Citations (20)

* Cited by examiner, † Cited by third party
Title
ANDREW T DUCHOWSKI ET AL: "Binocular eye tracking in virtual reality for inspection training", PROCEEDINGS EYE TRACKING RESEARCH & APPLICATIONS SYMPOSIUM 2000, 8 November 2000 (2000-11-08), NEW YORK, NY : ACM, US, pages 89 - 96, XP058268054, ISBN: 978-1-58113-280-9, DOI: 10.1145/355017.355031 *
BERTHILSSON R: "Finding correspondences of patches by means of affine transformations", COMPUTER VISION, 1999. THE PROCEEDINGS OF THE SEVENTH IEEE INTERNATION AL CONFERENCE ON KERKYRA, vol. 2, 20 September 1999 (1999-09-20) - 27 September 1999 (1999-09-27), IEEE COMPUT. SOC,LOS ALAMITOS, CA, USA,, pages 1117 - 1122, XP010350527, ISBN: 978-0-7695-0164-2 *
BROYDEN: "The Convergence of a Class of Double Rank Minimization Algorithms: 2. The New algorithm", J. INST. MATH. APPL., vol. 6, 1970, pages 222 - 231
E. HAINES: "Graphics Gems IV", 1994, ACADEMIC PRESS, article "Point in Polygon Strategies", pages: 24 - 46
FLETCHER: "A New Approach to Variable Metric Algorithms", COMPUTER J., vol. 13, 1970, pages 317 - 322
GOLDFARB: "A Family of Variable Metric Methods Derived by Variational Means", MATH. COMP., vol. 24, 1970, pages 23 - 26
KRAFKA K. ET AL.: "Eye Tracking for Everyone", PROC. OF THE IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 2016
LECUN: "Gradient-Based Learning Applied to Document Recognition", PROCEEDINGS OF THE IEEE, 1998
LEVENBERG: "A Method for the Solution of Certain Non-Linear Problems in Least Squares", QUARTERLY OF APPLIED MATHEMATICS, vol. 2, 1944, pages 164 - 168
MARQUARDT: "An Algorithm for Least-Squares Estimation of Nonlinear Parameters", SIAM JOURNAL ON APPLIED MATHEMATICS, vol. 11, no. 2, 1963, pages 431 - 441, XP000677023, DOI: doi:10.1137/0111030
SHANNO: "Conditioning of Quasi-Newton Methods for Function Minimization", MATH. COMP., vol. 24, 1970, pages 647 - 650
SHRIVASTAVA ET AL.: "Learning from Simulated and Unsupervised Images through Adversarial Training", CORR ABS/1612.07828, 2016
SIMONYAN; ZISSERMAN: "Very Deep Convolutional Networks for Large-Scale Image Recognition", CORR ABS/1409.1556, 2014
STRUPCZEWSKI A. ET AL.: "Proc. of the Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP", vol. 3, 2016, VISAPP, article "Geometric Eye Gaze Tracking", pages: 446 - 457
UNZUETA LUIS ET AL: "Efficient generic face model fitting to images and videos", IMAGE AND VISION COMPUTING, vol. 32, no. 5, 1 May 2014 (2014-05-01), pages 321 - 334, XP028845847, ISSN: 0262-8856, DOI: 10.1016/J.IMAVIS.2014.02.006 *
WOOD ET AL.: "Learning an Appearance-Based Gaze Estimator from One Million Synthesised Images", PROC. OF THE ACM SYMPOSIUM ON EYE TRACKING RESEARCH & APPLICATIONS, 2016, pages 131 - 138, XP058079701, DOI: doi:10.1145/2857491.2857492
WOOD ET AL.: "Rendering of Eyes for Eye-Shape Registration and Gaze Estimation", PROC. OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV, 2015
XUCONG ZHANG ET AL: "MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 24 November 2017 (2017-11-24), XP080839951 *
ZHANG X. ET AL.: "Appearance-Based Gaze Estimation in the Wild", PROC. OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 2015, pages 4511 - 4520, XP032793907, DOI: doi:10.1109/CVPR.2015.7299081
ZHANG X. ET AL.: "It's Written All Over Your Face: Full-Face Appearance-Based Gaze Estimation", CORR ABS/1611.08860, 2017

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3819813A1 (de) * 2019-11-08 2021-05-12 Honda Research Institute Europe GmbH Bestimmung eines bereichs von interesse in einer kopf-augen-verfolgungsanwendung
US11335104B2 (en) 2020-03-31 2022-05-17 Toyota Research Institute, Inc. Methods and system for predicting driver awareness of a feature in a scene
WO2022023142A1 (en) * 2020-07-27 2022-02-03 Roomality Limited Virtual window
CN111931694A (zh) * 2020-09-02 2020-11-13 北京嘀嘀无限科技发展有限公司 确定人物的视线朝向的方法、装置、电子设备和存储介质
CN113661495A (zh) * 2021-06-28 2021-11-16 华为技术有限公司 视线校准方法及装置、设备、计算机可读存储介质、系统、车辆
CN114610150A (zh) * 2022-03-09 2022-06-10 上海幻电信息科技有限公司 图像处理方法及装置
CN115482574A (zh) * 2022-09-29 2022-12-16 珠海视熙科技有限公司 基于深度学习的屏幕注视点估计方法、装置、介质及设备
CN115830675A (zh) * 2022-11-28 2023-03-21 深圳市华弘智谷科技有限公司 一种注视点跟踪方法、装置、智能眼镜及存储介质
CN115830675B (zh) * 2022-11-28 2023-07-07 深圳市华弘智谷科技有限公司 一种注视点跟踪方法、装置、智能眼镜及存储介质
CN116052264A (zh) * 2023-03-31 2023-05-02 广州视景医疗软件有限公司 一种基于非线性偏差校准的视线估计方法及装置

Also Published As

Publication number Publication date
EP3506149B1 (de) 2025-01-08

Similar Documents

Publication Publication Date Title
EP3506149B1 (de) Method, system and computer program product for estimating the point of gaze
US10684681B2 (en) Neural network image processing apparatus
US11836880B2 (en) Adjusting a digital representation of a head region
Wang et al. EM enhancement of 3D head pose estimated by point at infinity
US8467596B2 (en) Method and apparatus for object pose estimation
US20220400246A1 (en) Gaze correction of multi-view images
US9866820B1 (en) Online calibration of cameras
Kumano et al. Pose-invariant facial expression recognition using variable-intensity templates
CN104123749A (zh) 一种图像处理方法及系统
EP3154407B1 (de) Blickbeurteilungsverfahren und -vorrichtung
CN107004275A (zh) 用于确定实物的至少一部分的处于绝对空间比例的3d重构件的空间坐标的方法和系统
JP2003015816A (ja) ステレオカメラを使用した顔・視線認識装置
US11159717B2 (en) Systems and methods for real time screen display coordinate and shape detection
CN109583338A (zh) 基于深度融合神经网络的驾驶员视觉分散检测方法
CN102013011A (zh) 基于正脸补偿算子的多姿态人脸识别方法
JPWO2019003973A1 (ja) 顔認証装置、顔認証方法およびプログラム
CN114270417A (zh) 能够更新注册人脸模板的人脸识别系统及方法
CN106981078A (zh) 视线校正方法、装置、智能会议终端及存储介质
Unzueta et al. Efficient generic face model fitting to images and videos
CN108694348B (zh) 一种基于自然特征的跟踪注册方法及装置
WO2020068104A1 (en) Generating spatial gradient maps for a person in an image
US20210165999A1 (en) Method and system for head pose estimation
KR101844367B1 (ko) 부분 포즈 추정에 의하여 개략적인 전체 초기설정을 사용하는 머리 포즈 추정 방법 및 장치
Afroze et al. Detection of human’s focus of attention using head pose
JP4623320B2 (ja) 三次元形状推定システム及び画像生成システム

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20200102

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210303

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

REG Reference to a national code

Ref country code: DE

Ref legal event code: R079

Ref document number: 602017087225

Country of ref document: DE

Free format text: PREVIOUS MAIN CLASS: G06K0009000000

Ipc: G06V0040190000

Ref country code: DE

Ref legal event code: R079

Free format text: PREVIOUS MAIN CLASS: G06K0009000000

Ipc: G06V0040190000

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 3/03 20060101ALI20240625BHEP

Ipc: G06F 3/01 20060101ALI20240625BHEP

Ipc: G06V 40/16 20220101ALI20240625BHEP

Ipc: G06V 10/82 20220101ALI20240625BHEP

Ipc: G06V 40/19 20220101AFI20240625BHEP

INTG Intention to grant announced

Effective date: 20240724

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602017087225

Country of ref document: DE