WO2022250063A1 - Image processing apparatus and image processing method for performing face authentication - Google Patents
Image processing apparatus and image processing method for performing face authentication
- Publication number
- WO2022250063A1 (PCT/JP2022/021288)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- trained model
- feature
- feature amount
- learning
- Prior art date
Classifications
- G06V40/172: Human faces, e.g. facial parts, sketches or expressions; Classification, e.g. identification
- G06T7/0002: Image analysis; Inspection of images, e.g. flaw detection
- G06T7/62: Analysis of geometric attributes of area, perimeter, diameter or volume
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, loops, corners, strokes or intersections; Connectivity analysis
- G06V10/60: Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
- G06T2207/30168: Image quality inspection
- G06T2207/30201: Face
Definitions
- the present invention relates to face recognition technology using images.
- In Patent Document 1, when extracting features of a person from an image, whether a mask or eyeglasses are worn is determined, and the image region from which feature amounts are extracted is dynamically changed according to the determination result.
- However, in Patent Literature 1 it was necessary to store multiple patterns of feature amounts, according to the state of the person's attire, when registering a person.
- the present invention has been made in view of the above problems, and aims to reduce the amount of information to be registered when matching objects in different states.
- An image processing apparatus for solving the above problems includes: first acquisition means for acquiring a first feature amount from a first image based on a first trained model for extracting features from an image; second acquisition means for acquiring a second feature amount from a second image based on a second trained model for extracting features from an image, the second trained model being determined according to the state of the second image; and collation means for determining, based on the first feature amount and the second feature amount, whether an object included in the first image and an object included in the second image are the same.
- The second trained model is a model that has been trained to output the second feature amount in the same feature space as the first trained model.
- Block diagram showing a functional configuration example of an image processing device
- Block diagram showing a hardware configuration example of an image processing device
- Schematic diagrams showing examples of the matching processing operation
- Schematic diagrams showing examples of the operation of the learning process
- Block diagram showing a functional configuration example of an image processing device
- Flowcharts showing processing executed by the image processing apparatus
- The image processing apparatus of this embodiment converts an object in an image into a feature amount using different feature amount conversion means according to the state of the object at the time of photographing, and then performs matching. As a result, the matching accuracy is superior to that of the conventional method in which the feature amount conversion means is not changed according to the state.
- In the present invention, although different conversion means are used, learning is adjusted so that the feature amounts output for the same object are similar to each other. Therefore, even if the conversion means differ, the resulting feature amounts can be used in the collation processing without distinction. Compared with the conventional method of storing a feature amount for every registered image pattern, the amount of memory required for storing feature amounts can thus be reduced, and the calculation cost and speed of the matching processing are also improved.
- FIG. 1 is a diagram showing a functional configuration example of an image processing apparatus.
- The image processing apparatus 1 includes a first image acquisition unit 101, a second image acquisition unit 102, an object parameter determination unit 103, a storage unit 104, a first feature amount conversion unit 105, a second feature amount conversion unit 106, and a feature amount matching unit 107. Details will be described later.
- FIG. 2 is a hardware configuration diagram of the image processing device 1 in this embodiment.
- the CPU H101 controls the entire apparatus by executing the control program stored in the ROM H102.
- RAM H103 temporarily stores various data from each component. In addition, it expands the program and makes it executable by the CPU H101.
- the storage unit H104 stores transformation parameters for image transformation according to the present embodiment. As a medium for the storage unit H104, an HDD, flash memory, various optical media, etc. can be used.
- the acquisition unit H105 is composed of a keyboard, a touch panel, a dial, etc., receives input from a user, and is used for setting an arbitrary viewpoint when reconstructing an image of a subject.
- the display unit H106 is composed of a liquid crystal display or the like, and displays the reconstruction result of the image of the subject.
- the present apparatus can communicate with an imaging apparatus and other apparatuses via the communication unit H107.
- FIGS. 3A and 3B are schematic diagrams of the matching process of this embodiment, showing the difference between the method of the present invention and the conventional method.
- FIG. 3A shows a conventional method, in which the input image including the person to be authenticated and the registered image including the registered person are converted using the same parameters. At this time, if there is a large change in appearance, such as whether or not a mask or sunglasses are worn, the accuracy is likely to deteriorate. On the other hand, there is a problem that the configuration scale of the feature amount conversion unit becomes large when trying to deal with all appearance changes.
- FIG. 3B is an example of a schematic diagram of the present invention.
- The object parameter determination unit 103 determines the state of the subject, such as whether or not a mask is worn. According to the determination result, the feature amount conversion unit 106 reads the appropriate conversion parameters from the storage unit 104 and performs feature amount conversion.
- a plurality of types of conversion parameters are learned according to the state of the person and the shooting environment. Since the conversion parameters are learned individually for each subject's condition, robust matching can be achieved even with large changes in appearance, such as whether or not a mask or sunglasses are worn.
- The feature matching unit 107 may calculate the degree of similarity with a basic method such as an inner product or the angle between feature vectors, and requires no special processing. In this way, a single type of similarity can be used as a standard collation measure regardless of the state of the object. In the method of Patent Literature 1, for example, as many feature amounts of a registered person must be stored as there are feature extraction methods. In the present embodiment, by contrast, conversion parameters selected according to the state are applied, so the feature amounts to be registered can be narrowed down.
- An object of the present embodiment is to determine whether the same person or different persons are shown based on the image feature amount when two person images are given.
- The processing shown in the flowchart of FIG. 4 is executed by the CPU 101 of FIG. 2, which is a computer, according to a computer program stored in the storage device 104.
- In the following description, each process (step) is denoted by the prefix S, and the word "step" is omitted.
- the first image acquisition unit 101 acquires the first image (first image) including the object to be authenticated (here, the person).
- The determination unit 103 determines whether the first image satisfies a predetermined condition. If the condition is satisfied, the state of the object and the shooting environment are regarded as normal (similar to the environment used for learning); otherwise, for example when a mask is worn or the illuminance of the environment has changed, the state is determined not to be normal. Here, specifically, it is determined whether or not the person in the first image is wearing a mask, using a technique such as template matching to detect the mask. If the predetermined condition (no mask) is satisfied, the process proceeds to S103; if it is not satisfied (mask worn), the process proceeds to S104.
- The first feature amount conversion unit (first feature acquisition unit) 105 reads out the parameters for normal-person feature amount conversion (first parameter set) and sets them in the trained model.
- a trained model is a neural network for acquiring feature values of an object from an image.
- a trained model to which the first parameter set is applied is called a first trained model.
- the first feature amount conversion unit 105 reads the feature amount conversion parameters (second parameter set) for the person wearing the mask and sets them in the learned model.
- a trained model to which the second parameter set is applied is called a second trained model.
- the feature quantity conversion unit 105 is configured by a convolutional neural network known in Non-Patent Document 1, for example.
- Alternatively, the feature amount conversion unit 105 may be composed of a deep neural network (hereinafter abbreviated as DNN) such as the Transformer network known from Patent Document 2. That is, the feature amount conversion unit 105 acquires the feature amount using a trained model for acquiring the features of the person included in the image, with a parameter set learned according to the state of that person.
- the parameters for feature quantity conversion are various parameters such as the number of layers of neurons, the number of neurons, and connection weights.
- The first feature amount conversion unit 105 converts the first image received from the first image acquisition unit 101 into a feature amount based on the first trained model or the second trained model.
- In steps S106 to S110, the same processing as in steps S101 to S105 is performed on the second image. That is, when the person included in the second image is not wearing a mask, the feature amount is acquired using the first trained model to which the first parameter set is applied; when the person is wearing a mask, the feature amount is acquired using the second trained model to which the second parameter set is applied.
- the above processing is performed by the second image acquisition unit 102 and the second feature amount conversion unit (second feature acquisition unit) 106 .
- the first image and the second image are converted into feature amounts.
- These feature amounts are denoted f1 and f2. As in Non-Patent Document 1, f1 and f2 are one-dimensional vectors.
- In this embodiment, the DNNs used by the first feature amount conversion unit 105 and the second feature amount conversion unit 106 have the same configuration. Although this is not strictly necessary, the number of output channels of the neurons in the final layer is the same, so the dimension lengths of f1 and f2 are assumed to be equal.
- the feature quantity matching unit 107 calculates a similarity score between the two feature quantities. That is, based on the first feature amount and the second feature amount, it is determined whether or not the object included in the first image and the object included in the second image are the same. If the similarity score between the first feature quantity and the second feature quantity is greater than or equal to a predetermined threshold, the two images contain the same object. If the similarity score between the first feature and the second feature is less than a predetermined threshold, the two images contain different objects.
- A plurality of indices for measuring the degree of similarity between feature amounts are known; here, as in Non-Patent Document 1, the angle between the feature vectors is used, and the similarity score is calculated, for example, as the cosine similarity (f1 · f2) / (||f1|| ||f2||).
- the feature matching unit 107 determines that the person is the same person if the similarity score is equal to or greater than a predetermined threshold value, and otherwise determines that the person is a different person. This completes the operation of the matching process. It should be noted that the first image and the second image may be configured such that feature amounts are acquired by a common image acquisition unit and feature amount conversion unit.
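- As an illustration only, the following is a minimal Python sketch of the matching step described above, using an angle-based (cosine) similarity and a fixed threshold. The function names and the threshold value are hypothetical and are not part of the disclosure.

```python
import numpy as np

def similarity_score(f1: np.ndarray, f2: np.ndarray) -> float:
    """Cosine (angle-based) similarity between two one-dimensional feature vectors."""
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))

def is_same_person(f1: np.ndarray, f2: np.ndarray, threshold: float = 0.5) -> bool:
    """Same object if the similarity score is at or above the (hypothetical) threshold."""
    return similarity_score(f1, f2) >= threshold
```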
- the learning phase of this embodiment will be described.
- Learning is performed by the <representative vector method> known from Non-Patent Document 1.
- the representative vector method is a face authentication learning method that increases the learning efficiency by setting a feature amount vector that represents each person and using this together. See Non-Patent Document 1 for details.
- the image processing apparatus 2 in the learning processing phase is shown in FIG.
- The image conversion unit 200 converts a first image group, which is a set of reference images of the object (for example, face images of a person without any wearable object), into a second image group representing a predetermined state of the object (for example, face images of the person wearing a mask).
- Specifically, an image of a wearable object such as a mask is composited onto the face image, or the image is converted to have a specific brightness.
- the image acquisition unit 201 acquires an image group used for learning. Here, two or more types of image groups are acquired in order to learn two or more types of parameter sets.
- the feature amount conversion unit 202 acquires feature amounts from each image using a parameter set according to the state of the image and a learning model for extracting feature amounts from the image.
- a learning unit 203 learns a parameter set and a learning model for extracting a feature amount from an image. Note that this embodiment describes an example in which the first learning model and the second learning model are alternately learned.
- the processing flow procedure of this embodiment consists of FIGS. 5A and 5B.
- The process shown in FIG. 5A is called the <first learning process>,
- and the process shown in FIG. 5B is called the <second learning process>.
- In the <first learning process>, normal feature conversion learning is performed using an image group of persons not wearing a mask (first image group).
- In the <second learning process>, a group of images of persons wearing a mask (second image group) is used to perform learning specialized for masked persons.
- The solid line portion in FIG. 14 shows the configuration used in the <first learning process>,
- and the dashed line portion shows the configuration used in the <second learning process>.
- FIG. 5A shows processing in the learning phase executed by the image processing apparatus.
- The feature amount conversion unit 202 initializes the parameter set of the first learning model and the representative vectors v1 to vn with random numbers.
- Here, 1 to n are the IDs of all persons included in the learning images.
- Each representative vector v is a d-dimensional vector (d is a predetermined value).
- The image acquisition unit 201 acquires images I1 to Im randomly selected from the first image group.
- The first image group is the reference image group, consisting of a plurality of person images without a mask, with one or more images per person. Each image is annotated with the person's ID.
- The feature amount conversion unit 202 acquires the first learning feature amount fi by inputting each image Ii of the first image group to the first learning model.
- Each learning feature amount fi is a d-dimensional vector.
- The feature amount conversion unit 202 calculates a loss value from the feature similarity between each person image and that person's representative vector (intra-class similarity) and the feature similarity between each person image and the representative vectors of other persons (inter-class similarity).
- Here, y(i) denotes the ID number of the person shown in image Ii; the intra-class similarity of fi is thus measured against vy(i).
- The loss value used for learning is obtained by summing these terms over the images as follows.
- Loss value = Σi [ inter-class similarity score(fi) − λ · intra-class similarity score(fi) ]
- λ is a weight parameter for balancing the terms during learning. Note that the above is only one example of a loss value; various other known formulations exist, such as using a similarity score with a margin, or a cross-entropy loss. For details, refer to Non-Patent Document 1 and the like.
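- A minimal sketch of this representative-vector loss is shown below for illustration. It assumes cosine similarity, and it takes the inter-class term to be the similarity to the closest other person's representative vector; that aggregation choice and all names are assumptions, not the disclosed implementation.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def representative_vector_loss(feats, person_ids, reps, lam=1.0):
    """feats: (m, d) learning features f_i, person_ids: (m,) values of y(i),
    reps: (n, d) representative vectors v_1..v_n, lam: balance weight (lambda)."""
    loss = 0.0
    for f, y in zip(feats, person_ids):
        intra = cosine(f, reps[y])                          # similarity to own representative vector
        inter = max(cosine(f, v) for k, v in enumerate(reps) if k != y)  # closest other person
        loss += inter - lam * intra                         # per-image loss term
    return loss
```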
- the learning unit 203 updates the first parameter set of the feature conversion unit (first learning model) so as to reduce the loss value.
- In S205, the learning unit 203 updates the values of the representative vectors,
- and in S206 it updates the first parameter set.
- The storage unit 104 then stores the first parameter set and the values of the representative vectors v1 to vn.
- Fig. 6 schematically shows an example of the state at the time the <first learning process> is completed.
- Representative vectors 601, 602, and 603 are obtained in the feature space 600 as feature vectors representing the persons with IDs 1 to 3.
- The first parameter set has been learned appropriately so that the features a and b, the features p and q, and so on of each person are located in the vicinity of their respective representative vectors (the image features of each person are indicated by black circles in the figure).
- Next, a learning image group of persons wearing masks (second image group) is used to learn the feature amount conversion DNN for masked persons (second learning model).
- This <second learning process> will be described with reference to FIG. 5B.
- the image conversion unit 200 converts the first image group into a second image group that satisfies a predetermined condition. Specifically, an image obtained by synthesizing wearing objects such as a mask and sunglasses, and images with different illuminances are generated using an existing conversion method. If the second group of images has been prepared in advance, S300 may be skipped.
- the feature amount conversion unit 202 acquires the first parameter set and uses it as the initial values of the parameters of the second learning model.
- learning of the second parameter of the second learning model is performed in the same manner as in the processing flow of FIG. 5A.
- the contents of the processing, the calculation of the loss, etc. are the same as the processing of S202 to S207.
- However, the representative vectors v1 to vn are not updated in S205; the values stored in S208 of the preceding stage are used as they are.
- learning is performed such that the feature amount of a person wearing a mask approaches the representative vector of a person not wearing a mask.
- the storage unit 104 saves the second parameter set and ends the learning. Note that the representative vector values are used only during learning, and are not used during matching operations.
- FIG. 7 is a diagram schematically showing the starting point of the <second learning process>.
- The positions of the representative vectors 601, 602, and 603 are fixed and are not updated by the subsequent learning. Images c and d of the person wearing a mask are initially positioned far from that person's representative vector 601.
- Through the learning adjustment of the <second learning process>, the second parameter set is learned so that, as indicated by the arrow attached to feature c (numbered 702), the feature of each person approaches the direction of that person's representative vector.
- As a result, the feature amounts obtained with the first parameter set for images of a person not wearing a mask (a and b in FIG. 6) and the feature amounts obtained with the second parameter set for images of the same person wearing a mask (c and d in FIG. 7) are close to each other in the feature space.
- An example of the flow of this learning operation will be described with reference to FIGS. 8A and 8B and the schematic diagrams of FIGS. 9A to 9C.
- Here, a set of normal person images and an image group obtained by superimposing a mask image onto those same images are used.
- FIG. 9A shows examples of normal person images a, b, p and images a′, b′, p′ superimposed with masks.
- the second parameter set is learned so that the feature amounts of images a', b', and p' approach those of images a, b, and p, respectively.
- The loss value is calculated from the intra-class similarity and the inter-class similarity without using representative vectors, and the first parameter set of the first learning model is updated.
- FIG. 9B shows the results of the ⁇ first learning process>.
- the feature amount conversion unit 202 initializes the parameters of the DNN.
- The image acquisition unit 201 obtains, as learning images, an original image before mask superimposition (first learning image) and the corresponding mask-superimposed image (second learning image). That is, the first learning image and the second learning image capture the same object and form a pair of images that differ in the state of the object or the shooting environment.
- The feature amount conversion unit 202 acquires the first learning feature amount from the original image (first learning image) using the first learning model, and acquires the second learning feature amount from the synthesized image (second learning image) using the second learning model.
- the learning unit 203 calculates the intra-class and inter-class loss values of the person. At this time, in addition to the terms of similarity score within and between classes of persons used so far, a term of similarity of image pair is newly added as shown in the following equation.
- Image pair similarity score(fx) = similarity score(fx, fx′)
- Loss value = Σi [ inter-class similarity score(fi) − λ1 · intra-class similarity score(fi) − λ2 · image pair similarity score(fi) ]
- Here, fx is the feature amount of image x,
- and fx′ is the feature amount of the image x′ obtained by superimposing a mask onto image x.
- λ1 and λ2 are parameters for balancing the terms.
- The image-pair similarity term becomes large (and the loss correspondingly small) when the distance between the learning feature amounts of the original image before mask superimposition (first learning image) and the synthesized image after mask superimposition (second learning image) is small.
- FIG. 9C shows a schematic diagram of the terms of the degree of similarity of the feature amount pairs with the numbers 900, 901, and 902 attached to the arrows.
- an arrow 903 indicates the conventional intra-class similarity
- an arrow 904 indicates the inter-class similarity.
- Learning is performed such that the feature amounts of the original images without a mask are treated as "fixed" and do not move, while the feature amounts of the mask-composited images change in the direction approaching the corresponding non-masked feature amounts.
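- A minimal sketch of this modified loss with the added image-pair term is given below, using PyTorch. It assumes cosine similarity, takes a max over same-person and different-person similarities as the intra- and inter-class terms (an assumption), and detaches the first model's features so that the non-masked side stays "fixed". Names are illustrative only.

```python
import torch
import torch.nn.functional as F

def pair_loss(f, f_masked, labels, lam1=1.0, lam2=1.0):
    """f: (m, d) features of original images from the first learning model,
    f_masked: (m, d) features of the mask-composited images from the second learning model,
    labels: (m,) person IDs; lam1/lam2 correspond to lambda1 and lambda2."""
    f = f.detach()                                   # non-masked features are "fixed" and do not move
    sim = F.cosine_similarity(f_masked.unsqueeze(1), f.unsqueeze(0), dim=2)  # (m, m) similarity matrix
    same = labels.unsqueeze(1) == labels.unsqueeze(0)
    pair = sim.diagonal()                            # similarity score(f_x, f_x') for each image pair
    intra = torch.where(same, sim, torch.full_like(sim, -1.0)).max(dim=1).values
    inter = torch.where(~same, sim, torch.full_like(sim, -1.0)).max(dim=1).values
    return (inter - lam1 * intra - lam2 * pair).sum()
```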
- When the learning unit 203 determines in S507 that the learning has converged, the second parameter set of the second learning model is saved in S508 and the learning ends.
- The above is an example of a derived form of the learning method.
- As a further variation, the learning for normal persons may include a small number of images of masked persons. In this way, even if the object parameter determination unit 103 fails during matching and an erroneous feature conversion parameter set is applied, a significant performance deterioration can be prevented. Similarly, images of normal persons may be mixed into the learning of the feature amount conversion unit for mask-wearing persons.
- Derived forms of the DNN configuration are also possible.
- For example, a configuration can be considered in which the DNN for normal-person feature conversion and the DNN for masked persons share the earlier layers and change only some of the later layers according to the state of the person.
- Conversely, feature amount conversion means with completely different configurations may be used for the normal-person feature amount conversion unit and the masked-person feature amount conversion unit.
- For example, a convolutional neural network may be used for the normal-person feature amount conversion unit, and the Transformer network known from Patent Document 2 may be used for masked persons.
- A recurrent neural network or the like may also be used.
- a wide variety of feature amount conversion means, not limited to DNN, can be applied to the feature amount conversion unit as long as the means can adjust the parameters based on the loss value.
- The feature amounts f1 and f2 obtained by converting the input images may take the form of an N-dimensional matrix instead of a one-dimensional vector.
- the feature vectors obtained from the first trained model and the second trained model have the same length, but they may have different lengths.
- a known method for calculating similarity between vectors of unequal lengths such as Earth Mover's Distance, may be used.
- Next, a form in which the present invention is applied to something other than switching on the presence or absence of a mask or sunglasses will be described.
- So far, one-to-one images have been input and it has been determined whether the subjects are the same object.
- Here, an example assuming a use case such as an automatic door gate that opens and closes by face authentication will be described.
- The feature amounts of N persons are registered in advance in the image processing apparatus of this embodiment. At the time of matching, one image taken by the camera in front of the gate is input, and it is determined whether the input person is the same as one of the N registered persons.
- In the example described above, the presence or absence of a mask was determined in order to switch the feature amount conversion unit.
- In this use case, the face image for registration is a frontal face taken under good lighting conditions,
- whereas in the face image for inquiry the lighting conditions may be poor depending on how the camera is installed, the angle of the face may be large, and so on.
- Because the shooting conditions differ greatly, a feature amount conversion unit corresponding to each is learned and used.
- FIG. 11A shows the person registration operation
- FIG. 11B shows the matching operation between the input image and the registered person.
- the processing mode setting unit 109 sets the current operation mode to the registration operation mode (S601).
- The first feature amount conversion unit 105 acquires the conversion parameter set for the registration operation mode (first parameter set) and applies it to the trained model.
- The first image acquisition unit 101 inputs the N person images for registration one by one (S604), the feature amount conversion unit 105 converts each into a feature amount (S605), and the result is registered in the feature registration unit 108 as the feature amount of each person.
- the registered image is assumed to be a front face of a person photographed under favorable conditions. For this reason, the first feature value conversion unit is trained in advance mainly using frontal faces.
- the processing mode setting unit 109 sets the operation mode to the matching operation mode (S701).
- the second feature quantity conversion unit 106 acquires a parameter set (second parameter set) selected according to the situation from among a plurality of learned parameter sets.
- the second parameter set is learned in advance by using people at various angles as learning data.
- the second image acquisition unit 102 acquires one captured input image. Depending on the positional relationship between the camera and the gate door, it is not determined in advance where the person appears in the image. Therefore, a face detector may be provided inside the second image acquisition unit 102 to detect the face and cut out only the image around the face. (A widely known face detector may be used.)
- the second feature quantity conversion unit 106 acquires a second feature quantity from the input image (S704).
- the feature quantity matching unit 107 calculates the degree of similarity between the feature quantity of the input image and each registered feature quantity one by one (S706). The result is output (S708).
- The gate door is opened and closed based on the above result. Specifically, if the person included in the second image matches one of the registered persons, control is performed to open the gate; otherwise, a notification may be output to the administrator. The authentication result may also be output to a display device near the entrance gate.
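- As an illustrative sketch only (not the claimed implementation), the registration and 1:N matching operations of this use case could be organized as follows. The model objects, their extract method, and the threshold are hypothetical placeholders.

```python
import numpy as np

registered = {}  # person ID -> feature amount obtained with the first (registration) parameter set

def register(person_id, image, first_model):
    """Registration mode: convert a frontal registration image with the first parameter set."""
    registered[person_id] = first_model.extract(image)

def match_at_gate(query_image, second_model, threshold=0.5):
    """Matching mode: convert the gate-camera image with the second parameter set and
    compare it against every registered feature amount."""
    f_query = second_model.extract(query_image)
    best_id, best_score = None, -1.0
    for pid, f_reg in registered.items():
        score = float(np.dot(f_query, f_reg) /
                      (np.linalg.norm(f_query) * np.linalg.norm(f_reg)))
        if score > best_score:
            best_id, best_score = pid, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)
```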
- FIG. 12 is the flow of the learning process of the second embodiment.
- FIG. 13 also shows a schematic diagram. The difference here is that the first learning model and the second learning model are learned at the same time, unlike the form of the first embodiment. It will be explained that the learning method of this embodiment can also be applied to such a method.
- the hardware configuration example is the same as that shown in FIG. 2, and the functional configuration example of the image processing apparatus is the same as that shown in FIG.
- the image acquisition unit 201 acquires a first learning image group in which only front images simulating the photographing conditions of the registered images are collected.
- the feature amount conversion unit 202 acquires the first learning feature amount from the first learning image group based on the first learning model using the first parameter set.
- the image acquisition unit 201 acquires the second learning image group.
- The second image group includes person images at various angles, including looking-down angles, simulating the conditions of the input images.
- the feature amount conversion unit 202 acquires the second learning feature amount from the second learning image group based on the second learning model using the second parameter set.
- The learning unit 203 randomly selects images from each image group to create same-person pairs (intra-class pairs) and different-person pairs (inter-class pairs), and obtains a loss value based on the similarity between their feature amounts.
- As the loss, the triplet loss known from Non-Patent Document 2 or the like is used; a typical form is Σi max(0, ||fi − fk||² − ||fi − fj||² + α), where α is a margin.
- (Non-Patent Document 2: Florian Schroff, Dmitry Kalenichenko, and James Philbin. FaceNet: A unified embedding for face recognition and clustering. In CVPR, 2015.)
- Here, fi is the feature amount of a person image Ii,
- fj is the feature amount of an image of a person different from that in image Ii,
- and fk is the feature amount of another image Ik of the same person as Ii.
- The person image Ii is randomly selected from the first learning set or the second learning set, and the person images Ij and Ik are sampled accordingly to form inter-class and intra-class pairs.
- When the person image Ii is selected from the first learning set, the person images Ij and Ik are selected from the second learning set; when Ii is selected from the second learning set,
- the images Ij and Ik are selected from the first learning set. Thereby, the first learning model and the second learning model can be learned in an interlocked manner.
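- The following PyTorch sketch illustrates how the two learning models might be trained simultaneously with the triplet loss and the cross-set sampling described above. The sampler callables, model objects, margin, and shared optimizer are assumptions for illustration only.

```python
import random
import torch
import torch.nn.functional as F

def triplet_loss(f_i, f_k, f_j, margin=0.2):
    """f_k: another image of the same person as f_i (positive), f_j: a different person (negative)."""
    pos = (f_i - f_k).pow(2).sum()
    neg = (f_i - f_j).pow(2).sum()
    return F.relu(pos - neg + margin)

def training_step(model1, model2, sample_from_set1, sample_from_set2, optimizer):
    """sample_from_set1/2: callables returning (anchor_img, positive_img, negative_img), where the
    anchor comes from that set and the positive/negative come from the opposite set."""
    # Anchor from the first set uses model1; its positive/negative are converted by model2, and vice versa.
    if random.random() < 0.5:
        anchor_model, other_model, sampler = model1, model2, sample_from_set1
    else:
        anchor_model, other_model, sampler = model2, model1, sample_from_set2
    img_i, img_k, img_j = sampler()
    loss = triplet_loss(anchor_model(img_i), other_model(img_k), other_model(img_j))
    optimizer.zero_grad()
    loss.backward()          # error back-propagation updates both learning models
    optimizer.step()
    return loss.item()
```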
- the learning unit 203 performs parameter learning update using the error backpropagation method in the direction in which each of the first learning model and the second learning model reduces the loss value.
- In this way, a loss value based on the similarity is calculated from the outputs of the two learning models and is back-propagated as an error to each feature conversion unit to update the parameters.
- The first learning model and the second learning model thus each process images with different characteristics while both are learned at the same time.
- A combination is also conceivable in which the two learning models are learned simultaneously in the initial stage, and in the latter half only the second learning model is learned while the first is held fixed.
- In the embodiments so far, both the state determination and the feature amount conversion obtained the state and the feature amount directly from the image.
- an embodiment will be described in which an intermediate feature amount is generated from an image, and state determination and feature amount conversion are performed based on the intermediate feature amount.
- the state includes, for example, attributes of a person such as gender, race, and age.
- some parameters of the learning model are changed according to the attributes of the person.
- a common layer is used for the learning model layer that executes the attribute (state) determination of a person and the processing of feature amount conversion. As a result, the processing of state determination and feature value conversion is shared, and the speed and memory efficiency are improved.
- With reference to FIGS. 15 to 18, the case of "one-to-one image matching processing", in which one-to-one images are input as in the first embodiment and it is determined whether the subjects are the same object, will be described.
- With reference to FIGS. 19, 20A, and 20B, the case of matching processing in which it is determined whether the person appearing in the input image is the same as one of N persons registered in advance will be described.
- the hardware configuration is the same as that of the information processing apparatus of FIG. 2 in the first and second embodiments.
- FIG. 15 shows an example of the functional configuration of the image processing device 15.
- the basic configuration conforms to FIG. The difference is that the first feature amount conversion unit 1501 generates intermediate feature amounts.
- a parameter determination unit 1502, a second feature amount conversion unit 1504, and a third feature amount conversion unit 1505 (third feature acquisition unit) operate based on the intermediate feature amount.
- a parameter determination unit 1502 determines parameters of a trained model according to the state (attribute in the case of a person) of an object included in an image.
- a parameter determination unit 1502 estimates the state of an object included in the image based on the intermediate feature amount of the image.
- The estimated attribute is treated as the attribute of interest.
- the state of the object included in the image is estimated based on a third trained model that outputs a feature amount regarding the state of the object from the image.
- the parameter determination unit 1502 determines transformation parameters associated in advance according to the estimated state (person's attribute). That is, when the attribute of the object included in the first image and the attribute of the object included in the second image are the same, the same trained model (or feature transformation parameter) is determined. If the attributes of the object included in the first image and the attributes of the object included in the second image are different, different trained models (or model parameters) are determined.
- the storage unit 1503 stores conversion parameters to be supplied to the second feature amount conversion unit 1504 and the third feature amount conversion unit 1505 .
- FIG. 16 is a schematic diagram of the matching process of this embodiment.
- the input image is converted by the first feature amount conversion unit 1501 into an intermediate feature amount regarding the state of the object.
- a transformation parameter corresponding to the state is obtained by the parameter determination unit 1502 using the transformed intermediate feature amount.
- the state of an object includes gender, race, and the like. Alternatively, it may be age, facial orientation, presence or absence of a mask, etc., and is not limited to these.
- Storage unit 1503 stores transformation parameters 1602 specialized for state Y and predetermined transformation parameters 1601 for general use corresponding to all states. For example, if the state determination for the input image is “state Y,” the conversion parameter 1602 for state Y is set in the third feature amount conversion unit 1505 .
- the third feature quantity conversion unit 1505 converts the intermediate feature quantity into a face feature quantity based on the parameters determined by the parameter determination unit 1502 .
- the term "feature amount” is used, but the term “face feature amount” is used to make it easier to distinguish from the intermediate feature amount.
- the registered image is also converted into facial feature amounts, and the feature amount matching unit 107 performs matching of the facial feature amounts of the input image and the registered image.
- the processing speed can be increased because the parts that are converted to intermediate feature values are shared.
- the size of the transformation parameters managed by the storage unit 1503 can be reduced, and the transformation parameter reading speed can be increased.
- the parameter determination unit 1502 obtains the state of the object (whether or not the mask is worn) by a method such as template matching.
- the parameter determination unit 1502 may also be configured by a deep neural network like the second and third feature conversion units.
- the first feature quantity conversion unit may also be configured as a deep neural network. A specific state determination method will be described later with reference to FIG.
- each transformation parameter may be learned so that it is possible to some extent to perform feature transformation for images in states other than the corresponding state. For example, in addition to images in corresponding states as learning data, a small amount of images in other states may be included in learning. Alternatively, learning with a changed loss function, such as a smaller loss value, may be performed in other states.
- the first image acquisition unit 101 acquires a first image (first image) including a person.
- the first feature amount conversion unit 1501 converts the first image into an intermediate feature amount (first intermediate feature amount).
- The parameter determination unit 1502 obtains the state of the first image (first state) from the first intermediate feature amount. Specifically, it is determined whether the sex of the person shown in the first image is male or female.
- the parameter determination unit 1502 reads the transformation parameters corresponding to the first state from the storage unit 1503 based on the determination result, and sets them in the second feature amount transformation unit 1504.
- The second feature amount conversion unit 1504 converts the first intermediate feature amount to obtain a face feature amount (first face feature amount).
- That is, the second feature amount conversion unit 1504 acquires features from the image based on a trained model in which parameters suited to identifying males are set.
- the second image acquisition unit 102 acquires the second image (second image) including the person.
- the first feature amount conversion unit 1501 converts the second image into an intermediate feature amount (second intermediate feature amount).
- the parameter determination unit 1502 determines the state of the second image (second state) from the second intermediate feature amount. Specifically, it is determined whether the gender of the person appearing in the second image is male (not female).
- the conversion parameter corresponding to the second state is read out from the storage unit 1503 and set in the third feature amount conversion unit 1505.
- the third feature quantity conversion unit 1505 converts the second intermediate feature quantity to obtain a face feature quantity (second face feature quantity).
- If the first state and the second state are the same, the parameters of the trained models set in the second feature amount conversion unit 1504 and the third feature amount conversion unit 1505 are the same.
- If the states are different, the trained model parameters set in the two units are different.
- the feature amount matching unit 107 calculates the similarity score of the two feature amounts obtained in S1705 and S1710. By thresholding the similarity score, it is possible to determine whether the persons appearing in the two images are the same.
- When the state determined by the parameter determination unit 1502 is race, sex, or the like, persons can be determined to be different if their states differ.
- Therefore, the states of the two images are obtained in advance, and if the certainty of the state determination is high and the states are determined to be different, the conversion to face feature amounts is skipped. This reduces processing. Further, when the two images are determined to be in the same state, processing can be reduced by reading out the conversion parameters only once.
- S1801 to S1803 in FIG. 18 are the same as S1701 to S1703 described above.
- the first feature conversion unit 1501 converts the second image into intermediate feature amounts to obtain the state of the second image (second state).
- the parameter determination unit 1502 determines whether the first state and the second state obtained in S1803 and S1806 are the same. If they are the same, the process moves to S1808, otherwise the process moves to S1812.
- the parameter determination unit 1502 reads the transformation parameters corresponding to the first state from the storage unit 1503 and sets them in the second feature amount transformation unit 1504 and the third feature amount transformation unit 1505.
- the second feature quantity conversion unit 1504 converts the first intermediate feature quantity into a face feature quantity (first face feature quantity).
- the third feature quantity conversion unit 1505 converts the second intermediate feature quantity into a face feature quantity (second face feature quantity).
- the feature amount matching unit 107 calculates a similarity score between the first facial feature amount and the second facial feature amount.
- parameter determining section 1502 is configured to output the score along with the state.
- the parameter determination unit 1502 is configured as a deep neural network and configured to obtain an output for each state. Then, learning is performed so that the output corresponding to the state of the image is maximized. The state determination may be performed as the state in which the output is maximized, and the output value may be used as the state score. A specific method for obtaining the state score will be described later with reference to FIG. If the state score is greater than the predetermined threshold, the process moves to S1813. Otherwise, the process moves to S1814.
- the feature quantity matching unit 107 outputs the similarity between the first image and the second image as zero. In other words, when the certainty for state determination is equal to or greater than a predetermined value and the states of the objects (attributes of the person) are different, it can be determined that the objects are unlikely to be the same object.
- the parameter determination unit 1502 reads the conversion parameters corresponding to the first state from the storage unit 1503 and sets them in the second feature amount conversion unit 1504.
- the second feature quantity conversion unit 1504 converts the first intermediate feature quantity to obtain a face feature quantity (first face feature quantity).
- the conversion parameter corresponding to the second state is read out from the storage unit 1503 and set in the third feature amount conversion unit 1505.
- the third feature quantity conversion unit 1505 converts the second intermediate feature quantity to obtain a face feature quantity (second face feature quantity).
- the feature amount matching unit 107 calculates the similarity score of the two feature amounts obtained in S1815 and S1817. Similar to the embodiment described above, two objects are determined to be the same if the similarity score is greater than or equal to a predetermined threshold, and different if less than the threshold.
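- For illustration, the flow of FIG. 18 might be sketched as follows: the intermediate feature is computed once per image, the state and its score are obtained from it, and the face-feature conversion is skipped when the states differ with high certainty. All function names, return conventions, and threshold values are assumptions.

```python
import numpy as np

def match_with_state_gate(img1, img2, backbone, state_head, param_store,
                          state_score_threshold=0.9, sim_threshold=0.5):
    """backbone: first feature conversion (image -> intermediate feature),
    state_head: parameter determination (intermediate feature -> (state label, state score)),
    param_store: mapping from state label to a face-feature conversion callable."""
    z1, z2 = backbone(img1), backbone(img2)
    (s1, score1), (s2, score2) = state_head(z1), state_head(z2)
    if s1 != s2 and min(score1, score2) >= state_score_threshold:
        return 0.0, False                      # states differ with high certainty: treated as different persons
    f1 = param_store[s1](z1)                   # face feature with the parameters for state s1
    f2 = param_store[s2](z2)                   # face feature with the parameters for state s2
    sim = float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))
    return sim, sim >= sim_threshold
```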
- FIG. 19 shows a functional configuration example of the image processing device 19 .
- the basic configuration conforms to FIG. The difference is that a processing mode setting unit 1901 and a feature amount registration unit 1902 are provided.
- the flow of matching processing is shown in FIGS. 20A and 20B.
- FIG. 20A shows the person registration operation
- FIG. 20B shows the matching operation between the input image and the registered person.
- the parameter determination unit 1502 determines conversion parameters according to the racial status of the registered person acquired in advance. This is because the race of the registered person can be accurately known at the time of registration, and it is not necessary to estimate it from the image. A specific flow of processing will be described with reference to FIG. 20A.
- the processing mode setting unit 109 sets the current operation mode to the registered operation mode.
- the processing mode setting unit 109 acquires the racial status of the registered person. For example, a list of race conditions for each registered person is stored in advance in the storage unit H104 such as an HDD, and the list is acquired. Alternatively, the racial status of the person to be registered is obtained from the obtaining unit H105 such as a keyboard.
- S2003a is the beginning of a loop for sequentially processing registered persons. It is assumed that registered persons are assigned numbers in order from 1. In order to refer to the registered person using the variable i, i is initialized to 1 first. Furthermore, when i is equal to or less than the number of registered persons, the process proceeds to S2005a, and when this is not satisfied, the loop is exited and the process ends.
- the parameter determining unit 1502 reads the corresponding conversion parameters from the storage unit 1503 based on the state of the person i acquired by the processing mode setting unit 109, and sets them in the second feature amount conversion unit 1504.
- the first image acquisition unit 101 acquires a registered image of person i.
- the first feature quantity conversion unit 1501 converts the registered image into an intermediate feature quantity.
- the second feature amount conversion unit 1504 converts the intermediate feature amount to obtain a face feature amount.
- the facial feature amount of the person i is registered in the feature registration unit 1902.
- the race status of person i is also registered.
- S2009a is the end of the registered person loop, increments i by 1, and returns to S2003a.
- the matching operation between the input image and the registered person will be explained using FIG. 20B.
- At the time of matching, the state of the input image, such as race, is unknown, so processing is performed based on the state estimated from the image.
- If states such as race and gender differ between two persons, it can be determined that they are different persons. Therefore, when the state of the input image, such as race, can be estimated with a high degree of certainty, the processing speed is improved by narrowing down the registered persons to be collated.
- In this example, the state obtained by the parameter determination unit 1502 is "race".
- the processing mode setting unit 109 sets the operation mode to the collation operation mode. As a result, the state is not acquired from the processing mode setting unit 109 .
- the second image acquisition unit 102 acquires a query image (second image).
- the first feature amount conversion unit 1501 converts the second image into an intermediate feature amount (second intermediate feature amount).
- the parameter determination unit 1502 determines the state of the second image (second state) from the second intermediate feature amount. Specifically, the race of the person appearing in the second image is determined.
- the parameter determination unit 1502 determines conversion parameters corresponding to the second state from the storage unit 1503 according to the second state.
- the determined conversion parameter is set in the (third) trained model in the third feature quantity conversion unit 1505 .
- the third feature quantity conversion unit 1505 converts the second intermediate feature quantity to obtain a face feature quantity (second face feature quantity).
- S2007b it is determined whether the score of the state output by the parameter determination unit 1502 (state score) is high. If the state score is greater than the predetermined threshold, the process moves to S2008b. Otherwise, the process proceeds to S2009b.
- the feature amount matching unit 107 narrows down registered persons who are in the same state as the second state as candidate persons. That is, in the present embodiment, registered persons of the same race are narrowed down.
- S2009b is the beginning of a loop for sequentially processing registered persons.
- the feature amount matching unit 107 performs matching processing on the narrowed down registered persons in order. Therefore, in order to sequentially refer to the registered persons by the variable i, numbers are assigned to the registered persons to be processed in order from 1, and i is initialized to 1. Furthermore, when i is equal to or less than the number of registered persons to be processed, the process proceeds to S2010b, and when this is not satisfied, the loop is exited and the process proceeds to S2012b.
- the feature amount matching unit 107 obtains the facial feature amount of the person i stored in the feature registration unit 1902. Then, the feature amount matching unit 107 calculates a similarity score between the second facial feature amount obtained in S2006b and the facial feature amount of the person i.
- S2011b is the end of the registered person loop, increments i by 1, and returns to S2009b.
- the output unit 1900 outputs the result if there is a person whose similarity score obtained in S2010b is equal to or greater than a predetermined value. Note that the output unit 1900 outputs the result of matching by the feature amount matching unit 107, that is, the result of face authentication to a display device or the like.
- Example of a state determination method: a method by which the first feature amount conversion unit 1501 and the parameter determination unit 1502 obtain the state from an image will now be described.
- the first feature quantity conversion unit 1501 and the parameter determination unit 1502 are configured using the DNN described above.
- the parameter determination unit 1502 is configured to obtain the output through the Softmax function by making the number of outputs of the neural network the same as the number of states.
- a state label is associated with each dimension of the output of the Softmax function of the parameter determination unit 1502, and learning is performed so that the corresponding state of the image takes 1 and the other states take 0.
- a learning flow will be described with reference to FIG.
- the parameter set used by the first feature quantity conversion unit 1501 is initialized with random numbers or the like. Alternatively, it may be initialized with a parameter set obtained by learning face authentication by the method described in FIG. 5A and the like.
- the parameter set used by the parameter determination unit 1502 is initialized with random numbers or the like.
- a group of face images labeled with the state is acquired. For example, if the status is race, a group of face images labeled with race is acquired.
- the parameter determination unit 1502 estimates the label of the state. An image is used as an input, and DNN is forward-processed to obtain the value of the Softmax function.
- the loss is calculated based on Equation 9, which is known as the cross-entropy loss.
- p(i) indicates correct label information that takes 1 when the i-th state is the correct state, and takes 0 otherwise.
- q(i) indicates the value of the Softmax function corresponding to the i-th state.
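A minimal sketch of the loss of Equation 9, assuming q holds the Softmax output of the parameter determination unit and p is the one-hot correct-state label; the names and example values are illustrative.

```python
import numpy as np

def softmax(logits):
    # Numerically stable Softmax over the state dimension.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / np.sum(e)

def cross_entropy_loss(p, q, eps=1e-12):
    # Equation 9: loss = -sum_i p(i) * log(q(i))
    return float(-np.sum(p * np.log(q + eps)))

logits = np.array([2.0, 0.5, -1.0])  # raw network outputs, one per state
q = softmax(logits)                  # q(i): Softmax value for the i-th state
p = np.array([1.0, 0.0, 0.0])        # p(i): 1 for the correct state, 0 otherwise
loss = cross_entropy_loss(p, q)
```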
- the parameter sets of the first feature quantity conversion unit 1501 and the parameter determination unit 1502 are updated so that the loss value becomes small.
- small updates are performed in the direction of reducing the loss value.
- In S2107, it is determined whether or not learning has ended. For example, it is determined that learning has ended when the decrease in the loss value becomes small. Alternatively, it may be determined that learning is complete when it has been repeated a predetermined number of times. If learning has ended, the process moves to S2108; otherwise, the process returns to S2103.
- the state of the image can then be obtained. Specifically, the value of the Softmax function for the image is obtained, and the state corresponding to the dimension with the largest value is determined. Since the Softmax value is larger when the certainty is higher, this value can also be used as the state score.
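A small sketch of this inference step (names are illustrative): the state is read as the argmax of the Softmax output, and that maximum value is reused as the state score checked in S2007b.

```python
import numpy as np

def determine_state(softmax_values, state_labels):
    # The state is the label of the dimension with the largest Softmax value;
    # that value itself serves as the state score (certainty).
    idx = int(np.argmax(softmax_values))
    return state_labels[idx], float(softmax_values[idx])

state, state_score = determine_state(np.array([0.1, 0.8, 0.1]),
                                     ["state_A", "state_B", "state_C"])
```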
- processing speed is increased because the calculation of the intermediate feature amount is shared between state determination and feature amount conversion.
- the model size for state determination and feature quantity conversion can be reduced, and the amount of memory used can also be reduced.
- since the conversion parameters managed by the storage unit 1503 can be reduced, the reading speed of the conversion parameters can be increased.
- the present invention can be applied even when both state determination and feature amount conversion are performed using an image as an input.
- an attribute that is difficult for a person to change during his or her lifetime may be set.
- appearance attributes such as age, presence or absence of a beard, and hairstyle may be used.
- Alternatively, attributes such as skin color may be used instead of race. The state used is therefore not limited to race or gender.
- the present invention can be applied to various tasks related to identity matching and similarity calculation. For example, it can be applied to a task of detecting objects of a specific category, an image inquiry task of extracting a design of a specific shape from a moving image, a similar image search, and so on.
- the conditions determined by the condition determination unit 103 and the processing mode setting unit 109 include the image quality of the input image, the viewing angle of the object, the size of the object, the clarity of the appearance of the object, the brightness of the lighting, occlusion of the object, the presence or absence of attachments or wearables on the object, the subtype of the object, or combinations thereof.
- the comparison of whether or not objects are the same has been mainly described, but it is also possible to perform regression estimation of the similarity values between objects.
- the true similarity between a pair of object i and object j is given as a teacher value as shown in the following equation, and the squared error from the estimated similarity score is defined as the loss value.
- Loss value = ΣiΣj (true pair similarity score(fi, fj) - pair similarity score(fi, fj))²
- the parameters of the feature quantity conversion unit 105 and the feature quantity conversion unit 106 may be learned so as to reduce this loss value.
- fi and fj are the feature amounts of the pair of images transformed by the first trained model and the second trained model, respectively. The above shows that the present invention can be applied to various tasks.
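A sketch of this regression-style loss, assuming true_scores[i][j] holds the teacher similarity for the pair (i, j) and feats_a / feats_b hold the features obtained with the first and second trained models; the names are illustrative assumptions.

```python
import numpy as np

def pairwise_regression_loss(feats_a, feats_b, true_scores):
    # Loss value = sum_i sum_j (true pair similarity score(fi, fj) - pair similarity score(fi, fj))^2
    loss = 0.0
    for i, fi in enumerate(feats_a):
        for j, fj in enumerate(feats_b):
            est = np.dot(fi, fj) / (np.linalg.norm(fi) * np.linalg.norm(fj))
            loss += (true_scores[i][j] - est) ** 2
    return loss
```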
- the present invention is also realized by executing the following processing. That is, the software (program) that implements the functions of the above-described embodiments is supplied to the system or device via a network for data communication or various storage media. Then, the computer (or CPU, MPU, etc.) of the system or device reads and executes the program. Alternatively, the program may be recorded on a computer-readable recording medium and provided.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Geometry (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Description
An image processing apparatus according to an embodiment of the present invention will be described with reference to the drawings. Elements having the same reference numerals across the drawings operate in the same manner, and redundant descriptions thereof are omitted. The constituent elements described in this embodiment are merely examples, and the scope of the present invention is not intended to be limited to them alone.

FIGS. 3A and 3B are schematic diagrams of the matching process of this embodiment and show the difference between the method of the present invention and a conventional method. FIG. 3A shows the conventional method, in which feature amounts are converted with the same parameters for the input image containing the person to be authenticated and the registered image containing the registered person. In this case, a large change in appearance, such as the presence or absence of a mask or sunglasses, easily causes a deterioration in accuracy. On the other hand, trying to handle every possible change in appearance has the problem that the scale of the feature quantity conversion unit becomes large. FIG. 3B is a schematic diagram of an example of the present invention. When an input image is input, the object parameter determination unit 103 determines the state of the subject, such as whether a mask is worn. According to the determination result, the feature quantity conversion unit 106 reads appropriate conversion parameters from the storage unit 104 and performs feature quantity conversion. Here, a plurality of types of conversion parameters are learned in advance according to the state of the person and the shooting environment. Since each set of conversion parameters is learned individually for a specific state of the subject, robust matching can be achieved even against large changes in appearance such as the presence or absence of a mask or sunglasses.
Similarity score(f1, f2) := cos(θ12)
= <f1, f2> ÷ (|f1|·|f2|)
The learning phase of this embodiment will be described. Here, learning is performed by the <representative vector method> known from Non-Patent Document 1. The representative vector method is a learning technique for face authentication that sets a feature vector representing each person and uses it together with the image features to raise learning efficiency. See Non-Patent Document 1 for details. The image processing apparatus 2 in the learning processing phase is shown in FIG. 14. The image conversion unit 200 converts a first image group, which is a set of reference images of the target (for example, face images of persons without any wearable), into a second image group, which is a set of images showing the target in a predetermined state (for example, face images of persons wearing masks). Specifically, an image showing a wearable such as a mask is synthesized onto the face image, or the image is converted so as to have a certain brightness. The image acquisition unit 201 acquires the image groups used for learning. Here, two or more types of image groups are acquired in order to learn two or more types of parameter sets. The feature quantity conversion unit 202 acquires a feature amount from each image using the parameter set corresponding to the state of the image and a learning model that extracts feature amounts from images. The learning unit 203 learns the parameter sets and the learning model that extracts feature amounts from images. In this embodiment, an example in which the first learning model and the second learning model are trained alternately will be described.
Intra-class similarity score(fi) = similarity score(fi, vy(i)) ,
Inter-class similarity score(fi) = Σj≠y(i) similarity score(fi, vj)
Loss value = Σi inter-class similarity score(fi) - λ intra-class similarity score(fi)
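A sketch of this representative-vector loss under the cosine similarity score defined above; reps[j] is the representative vector vj of class j and labels[i] = y(i). The function names and the value of λ are assumptions made only for illustration.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def representative_vector_loss(features, labels, reps, lam=1.0):
    # Loss value = sum_i ( inter-class score(fi) - lambda * intra-class score(fi) )
    loss = 0.0
    for f, y in zip(features, labels):
        intra = cos(f, reps[y])                                       # similarity to the own class vector v_y(i)
        inter = sum(cos(f, v) for j, v in enumerate(reps) if j != y)  # similarity to the other class vectors
        loss += inter - lam * intra
    return loss
```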
Other derivative forms of learning are given here. For example, a learning form that does not use the <representative vectors> is also conceivable. An example of the operation flow of this learning will be described with reference to FIGS. 8A and 8B and the schematic diagrams of FIGS. 9A to 9C. In this example, a set of ordinary person images and a group of images obtained by superimposing and synthesizing mask images on those images are used. FIG. 9A shows examples of ordinary person images a, b, and p and of images a', b', and p' on which masks are superimposed. In this derivative example, the second parameter set is learned so that the feature amounts of images a', b', and p' approach the feature amounts of images a, b, and p, respectively.
Intra-class similarity score(fi) = Σy(k)=y(i) similarity score(fi, fk) ,
Inter-class similarity score(fi) = Σy(j)≠y(i) similarity score(fi, fj) ,
Loss value = Σi inter-class similarity score(fi) - λ intra-class similarity score(fi)
Image pair similarity score(fx) = similarity score(fx, fx')
Loss value = Σi inter-class similarity score(fi) - λ1 intra-class similarity score(fi)
- λ2 image pair similarity score(fi)
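A sketch of this derived loss, assuming features[i] is the feature of an ordinary image, masked_features[i] the feature of its mask-synthesized counterpart (fi'), and labels[i] = y(i); λ1, λ2 and the names are illustrative assumptions, and the intra-class sum here excludes the sample itself.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def derived_loss(features, masked_features, labels, lam1=1.0, lam2=1.0):
    loss = 0.0
    n = len(features)
    for i in range(n):
        fi = features[i]
        # intra-class: similarity to other samples of the same person y(i)
        intra = sum(cos(fi, features[k]) for k in range(n) if labels[k] == labels[i] and k != i)
        # inter-class: similarity to samples of other persons
        inter = sum(cos(fi, features[j]) for j in range(n) if labels[j] != labels[i])
        # image pair: similarity between the image and its mask-synthesized counterpart fi'
        pair = cos(fi, masked_features[i])
        loss += inter - lam1 * intra - lam2 * pair
    return loss
```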
Next, derivative examples of the DNN configuration will be given. For example, the number of layers and the number of neurons may be changed between the feature conversion DNN for ordinary persons and the DNN for persons wearing masks. In general, for targets that are difficult to match, such as persons wearing masks or persons seen in profile, or for targets with rich variation in appearance, performance is easily improved by using a large-scale DNN. Therefore, adjusting the scale of each DNN according to the target to be handled can improve the cost-effectiveness between computational cost and matching accuracy.

In this embodiment, the present invention is applied to a form other than switching on the presence or absence of a mask or sunglasses. In Embodiment 1, one image was compared against one image to determine whether they show the same object. In this embodiment, an example assuming a use case such as a gate with an automatic door that opens and closes by face authentication will be described. The feature amounts of N persons are registered in advance in the image processing apparatus of this embodiment. At the time of matching, a single image captured by a camera in front of the gate is input as the input image, and it is determined whether the input person is identical to one of the N registered persons or corresponds to none of them.
Loss value = Σi [ inter-class pair similarity score(fi, fj)
- intra-class pair similarity score(fi, fk) + m ]+ ,
where m is a constant margin value of the loss for making learning robust, and [·]+ is the function defined by
(Equation 8)
[x]+ = x if x > 0
[x]+ = 0 otherwise.
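A sketch of this margin loss with the hinge of Equation 8, where for each anchor fi, fk is a sample of the same person and fj a sample of a different person; how the triplets are sampled and the value of m are assumptions for illustration.

```python
import numpy as np

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def margin_loss(triplets, m=0.2):
    # triplets: iterable of (fi, fj, fk) with fj from a different person and fk from the same person.
    # Loss value = sum_i [ inter-class pair score(fi, fj) - intra-class pair score(fi, fk) + m ]_+
    loss = 0.0
    for fi, fj, fk in triplets:
        hinge = cos(fi, fj) - cos(fi, fk) + m
        loss += max(hinge, 0.0)  # [x]_+ = x if x > 0, and 0 otherwise
    return loss
```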
In the above-described embodiments, the state determination and the feature quantity conversion each obtained the state or the feature amount directly from the image. In this embodiment, a form in which an intermediate feature amount is generated from the image and the state determination and the feature quantity conversion are performed based on that intermediate feature amount will be described. Here, the state includes, for example, attributes of the person such as gender, race, and age. In this embodiment, when obtaining the feature amount for identifying the individual person contained in the image, some of the parameters of the learning model are made different according to the attributes of the person. On the other hand, common layers of the learning model are used for the processes of determining the person's attribute (state) and converting the feature amount. As a result, the state determination and feature quantity conversion processes are shared, and speed and memory efficiency are improved.
FIG. 15 shows an example of the functional configuration of the image processing apparatus 15. The basic configuration conforms to FIG. 1. The difference is that the first feature quantity conversion unit 1501 generates an intermediate feature amount. Accordingly, the parameter determination unit 1502, the second feature quantity conversion unit 1504, and the third feature quantity conversion unit 1505 (third feature acquisition unit) operate based on the intermediate feature amount. The parameter determination unit 1502 determines the parameters of the trained model according to the state (for a person, the attributes) of the object contained in the image. The parameter determination unit 1502 estimates the state of the object contained in the image based on the intermediate feature amount of the image. As the estimation method, the object is estimated to have the attribute of interest when the degree of match with a representative feature amount of that attribute is equal to or greater than a predetermined threshold. Alternatively, the state of the object contained in the image is estimated based on a third trained model that outputs a feature amount related to the state of the object from the image. Furthermore, the parameter determination unit 1502 determines the conversion parameters associated in advance with the estimated state (attribute of the person). That is, when the attribute of the object contained in the first image and the attribute of the object contained in the second image are the same, the same trained model (or the same feature conversion parameters) is determined. When the attribute of the object contained in the first image and the attribute of the object contained in the second image are different, different trained models (or model parameters) are determined. The storage unit 1503 stores the conversion parameters supplied to the second feature quantity conversion unit 1504 and the third feature quantity conversion unit 1505.
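To make the flow of FIG. 15 concrete, the following non-limiting sketch shows a shared backbone producing an intermediate feature, an attribute decision made on that feature, and a per-attribute set of head parameters used for the final feature conversion. All class, function, and parameter names are illustrative assumptions and not taken from the present disclosure.

```python
import numpy as np

def backbone(image_vec, W_shared):
    # First feature quantity conversion: shared layers producing the intermediate feature amount.
    return np.tanh(W_shared @ image_vec)

def decide_attribute(intermediate, attribute_prototypes, threshold=0.5):
    # Estimate the attribute whose representative feature matches best (cf. parameter determination unit 1502);
    # return None if no match reaches the predetermined threshold.
    scores = {a: float(np.dot(intermediate, v) / (np.linalg.norm(intermediate) * np.linalg.norm(v)))
              for a, v in attribute_prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def convert_feature(intermediate, head_params, attribute):
    # Second/third feature quantity conversion: apply the conversion parameters selected for the attribute.
    W_head = head_params[attribute]
    return np.tanh(W_head @ intermediate)
```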
FIG. 19 shows an example of the functional configuration of the image processing apparatus 19. The basic configuration conforms to FIG. 15. The differences are that a processing mode setting unit 1901 and a feature registration unit 1902 are provided. The flow of the matching process is shown in FIGS. 20A and 20B. FIG. 20A shows the person registration operation, and FIG. 20B shows the operation of matching an input image against the registered persons.

A method of obtaining the state from an image by the first feature quantity conversion unit 1501 and the parameter determination unit 1502 will be described. The first feature quantity conversion unit 1501 and the parameter determination unit 1502 are configured using the DNN described above. The parameter determination unit 1502 is configured so that the number of outputs of the neural network equals the number of states and the output is obtained through a Softmax function.
Loss value = -Σi p(i) log(q(i))
Although the description in this specification has focused on the matching of persons, the present invention can be applied to various tasks related to identity matching and similarity calculation. For example, it can be applied to a task of detecting objects of a specific category, an image inquiry task of extracting a design of a specific shape from a moving image, similar image search, and so on.
Loss value = ΣiΣj (true pair similarity score(fi, fj)
- pair similarity score(fi, fj))²
Claims (19)
- An image processing apparatus comprising: a first acquisition means for acquiring a first feature amount from a first image based on a first trained model that extracts features from an image; a second acquisition means for acquiring a second feature amount from a second image based on a second trained model that extracts features from an image and that is determined according to the state of the second image; and a matching means for determining, based on the first feature amount and the second feature amount, whether or not the object contained in the first image and the object contained in the second image are the same, wherein the second trained model is a model that has learned the second feature amount in the same feature space as the first trained model.
- The image processing apparatus according to claim 1, further comprising a determination means for determining whether or not the second image satisfies a predetermined condition, wherein the second acquisition means determines the second trained model according to a result of the determination of the predetermined condition.
- The image processing apparatus according to claim 2, wherein the determination means determines the predetermined condition for detecting at least one state among the image quality of the input image, the viewing angle of the object, the size of the object, the clarity of the appearance of the object, the brightness of the lighting, occlusion of the object, the presence or absence of attachments or wearables on the object, and the subtype of the object.
- The image processing apparatus according to claim 2 or 3, wherein the second acquisition means determines, as the second trained model, a trained model different from the first trained model when the person contained in the second image is wearing a mask.
- The image processing apparatus according to any one of claims 1 to 4, further comprising a learning means for training the first trained model and the second trained model so that the similarity between a feature amount extracted based on the first trained model and a feature amount extracted based on the second trained model becomes larger than a predetermined value.
- The image processing apparatus according to claim 5, wherein the learning means trains each of the first trained model and the second trained model based on a plurality of image groups in different states.
- The image processing apparatus according to claim 6, wherein the plurality of image groups include a first image group serving as a reference and a second image group obtained by converting the reference image group, and the learning means performs learning so that, when an image contained in the first image group and an image contained in the second image group show the same object, the feature amount of the image contained in the first image group and the feature amount of the image contained in the second image group become similar.
- The image processing apparatus according to claim 7, wherein the second image group is an image group obtained by synthesizing a wearable onto the first image group.
- The image processing apparatus according to any one of claims 5 to 8, wherein each of the first trained model and the second trained model is a neural network composed of a plurality of layers.
- The image processing apparatus according to claim 9, wherein the first trained model and the second trained model share the parameters of some layers.
- The image processing apparatus according to claim 9 or 10, wherein the first trained model and the second trained model are transformer networks.
- The image processing apparatus according to any one of claims 5 to 11, wherein the learning means trains the second trained model, after training the first trained model, based on feature amounts extracted based on the first trained model.
- The image processing apparatus according to any one of claims 5 to 11, wherein the parameters of the first trained model and the second trained model are learned simultaneously or alternately.
- The image processing apparatus according to any one of claims 1 to 13, further comprising: a third acquisition means for acquiring an intermediate feature amount of the first image based on a third trained model that outputs a feature amount related to the state of an object from an image; and a parameter determination means for determining parameters of the first trained model based on the acquired intermediate feature amount of the first image.
- The image processing apparatus according to claim 14, wherein the third acquisition means further acquires an intermediate feature amount of the second image, the parameter determination means determines parameters of the second trained model based on the acquired intermediate feature amount of the second image, and the parameters of the second trained model are determined to be parameters different from the parameters of the first trained model when the attribute of the object indicated by the intermediate feature amount of the first image and the attribute of the object indicated by the acquired intermediate feature amount of the second image are different.
- The image processing apparatus according to claim 14, wherein the first acquisition means acquires the first feature amount using the intermediate feature amount of the first image acquired by the third acquisition means.
- The image processing apparatus according to claim 15, wherein the second acquisition means acquires the second feature amount using the intermediate feature amount of the second image acquired by the third acquisition means.
- An image processing method comprising: a first acquisition step of acquiring a first feature amount from a first image based on a first trained model that extracts features from an image; a second acquisition step of acquiring a second feature amount from a second image based on a second trained model that extracts features from an image and that is determined according to the state of the second image; and a matching step of determining, based on the first feature amount and the second feature amount, whether or not the object contained in the first image and the object contained in the second image are the same, wherein the second trained model is a model that has learned the second feature amount in the same feature space as the first trained model.
- A program for causing a computer to execute: a first acquisition step of acquiring a first feature amount from a first image based on a first trained model that extracts features from an image; a second acquisition step of acquiring a second feature amount from a second image based on a second trained model that extracts features from an image and that is determined according to the state of the second image; and a matching step of determining, based on the first feature amount and the second feature amount, whether or not the object contained in the first image and the object contained in the second image are the same, wherein the second trained model is a model that has learned the second feature amount in the same feature space as the first trained model.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22811333.8A EP4350611A1 (en) | 2021-05-26 | 2022-05-24 | Image processing device and image processing method for performing facial authentication |
CN202280037899.7A CN117396929A (zh) | 2021-05-26 | 2022-05-24 | 进行面部识别的图像处理设备和图像处理方法 |
US18/514,325 US20240087364A1 (en) | 2021-05-26 | 2023-11-20 | Image processing apparatus configured to perform face recognition, image processing method, and storage medium |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021088227 | 2021-05-26 | ||
JP2021-088227 | 2021-05-26 | ||
JP2021192448A JP7346528B2 (ja) | 2021-05-26 | 2021-11-26 | 画像処理装置、画像処理方法及びプログラム |
JP2021-192448 | 2021-11-26 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/514,325 Continuation US20240087364A1 (en) | 2021-05-26 | 2023-11-20 | Image processing apparatus configured to perform face recognition, image processing method, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250063A1 true WO2022250063A1 (ja) | 2022-12-01 |
Family
ID=84228849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/021288 WO2022250063A1 (ja) | 2021-05-26 | 2022-05-24 | 顔認証を行う画像処理装置および画像処理方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240087364A1 (ja) |
EP (1) | EP4350611A1 (ja) |
WO (1) | WO2022250063A1 (ja) |
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007280250A (ja) * | 2006-04-11 | 2007-10-25 | Matsushita Electric Ind Co Ltd | 顔認証システム |
JP4957056B2 (ja) | 2006-04-11 | 2012-06-20 | パナソニック株式会社 | 顔認証システムおよび顔認証方法 |
JP2021192448A (ja) | 2010-11-05 | 2021-12-16 | 株式会社半導体エネルギー研究所 | 半導体装置 |
JP2017117024A (ja) * | 2015-12-22 | 2017-06-29 | キヤノン株式会社 | 画像認識装置、画像認識方法、及び撮像装置 |
JP2018147240A (ja) * | 2017-03-06 | 2018-09-20 | パナソニックIpマネジメント株式会社 | 画像処理装置、画像処理方法、及び画像処理プログラム |
JP2018160237A (ja) * | 2017-03-23 | 2018-10-11 | 三星電子株式会社Samsung Electronics Co.,Ltd. | 顔認証方法及び装置 |
JP2018165980A (ja) * | 2017-03-28 | 2018-10-25 | 三星電子株式会社Samsung Electronics Co.,Ltd. | 顔認証方法及び装置 |
JP2018165983A (ja) * | 2017-03-28 | 2018-10-25 | 三星電子株式会社Samsung Electronics Co.,Ltd. | 顔認証方法及び装置 |
US10956819B2 (en) | 2017-05-23 | 2021-03-23 | Google Llc | Attention-based sequence transduction neural networks |
US20180373924A1 (en) * | 2017-06-26 | 2018-12-27 | Samsung Electronics Co., Ltd. | Facial verification method and apparatus |
JP2019102081A (ja) * | 2017-12-05 | 2019-06-24 | 富士通株式会社 | データ処理装置及びデータ処理方法 |
WO2020121425A1 (ja) * | 2018-12-12 | 2020-06-18 | 三菱電機株式会社 | 状態判定装置、状態判定方法、及び状態判定プログラム |
JP2020115311A (ja) * | 2019-01-18 | 2020-07-30 | オムロン株式会社 | モデル統合装置、モデル統合方法、モデル統合プログラム、推論システム、検査システム、及び制御システム |
JP2021088227A (ja) | 2019-12-02 | 2021-06-10 | Toyo Tire株式会社 | 回避走行支援システムおよび回避走行支援方法 |
Non-Patent Citations (2)
Title |
---|
DENG: "ArcFace: Additive Angular Margin Loss for Deep Face Recognition", CVPR, 2019 |
FLORIAN SCHROFFDMITRY KALENICHENKOJAMES PHILBIN: "Facenet: A unified embedding for face recognition and clustering", CVPR, 2015 |
Also Published As
Publication number | Publication date |
---|---|
US20240087364A1 (en) | 2024-03-14 |
EP4350611A1 (en) | 2024-04-10 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22811333; Country of ref document: EP; Kind code of ref document: A1
 | WWE | Wipo information: entry into national phase | Ref document number: 202280037899.7; Country of ref document: CN
 | NENP | Non-entry into the national phase | Ref country code: DE
 | WWE | Wipo information: entry into national phase | Ref document number: 2022811333; Country of ref document: EP
 | ENP | Entry into the national phase | Ref document number: 2022811333; Country of ref document: EP; Effective date: 20240102