
WO2016174915A1 - Image processing device, image processing method, and program - Google Patents

Image processing device, image processing method, and program Download PDF

Info

Publication number
WO2016174915A1
Authority
WO
WIPO (PCT)
Prior art keywords: normal, image, unit, polarization, subject
Prior art date
Application number: PCT/JP2016/056191
Other languages: French (fr), Japanese (ja)
Inventors: Fumika Nakatani (中谷 文香), Yasutaka Hirasawa (平澤 康孝), Yuhi Kondo (近藤 雄飛), Ying Lu (陸 穎)
Original Assignee: Sony Corporation (ソニー株式会社)
Priority date
Filing date
Publication date
Application filed by Sony Corporation
Priority to ES16786192T: ES2929648T3
Priority to US15/565,968: US10444617B2
Priority to JP2017515413A: JP6693514B2
Priority to CN201680023380.8A: CN107533370B
Priority to EP16786192.1A: EP3291052B1
Publication of WO2016174915A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/002Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates
    • G01B11/005Measuring arrangements characterised by the use of optical techniques for measuring two or more coordinates coordinate measuring machines
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/16Measuring arrangements characterised by the use of optical techniques for measuring the deformation in a solid, e.g. optical strain gauge
    • G01B11/168Measuring arrangements characterised by the use of optical techniques for measuring the deformation in a solid, e.g. optical strain gauge by means of polarisation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01BMEASURING LENGTH, THICKNESS OR SIMILAR LINEAR DIMENSIONS; MEASURING ANGLES; MEASURING AREAS; MEASURING IRREGULARITIES OF SURFACES OR CONTOURS
    • G01B11/00Measuring arrangements characterised by the use of optical techniques
    • G01B11/26Measuring arrangements characterised by the use of optical techniques for measuring angles or tapers; for testing the alignment of axes
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B35/00Stereoscopic photography
    • G03B35/02Stereoscopic photography by sequential recording
    • G03B35/04Stereoscopic photography by sequential recording with movement of beam-selecting members in a system defining two or more viewpoints
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B35/00Stereoscopic photography
    • G03B35/02Stereoscopic photography by sequential recording
    • G03B35/06Stereoscopic photography by sequential recording with axial movement of lens or gate between exposures
    • GPHYSICS
    • G03PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03BAPPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B35/00Stereoscopic photography
    • G03B35/18Stereoscopic photography by simultaneous viewing
    • G03B35/26Stereoscopic photography by simultaneous viewing using polarised or coloured light separating different viewpoint images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/145Illumination specially adapted for pattern recognition, e.g. using gratings
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/147Details of sensors, e.g. sensor lenses
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects

Definitions

  • This technology relates to an image processing apparatus, an image processing method, and a program that make it possible to recognize a subject accurately and easily.
  • the normal line of a subject is calculated from polarized images of a plurality of polarization directions.
  • normals are calculated by applying polarization images of a plurality of polarization directions to a model equation.
  • Patent Document 1 illuminates a subject by arranging illumination means so that the illumination light is p-polarized with respect to a predetermined reference plane. It separates the light reflected from the reference surface into s-polarized and p-polarized components, measures the light intensity of each polarization component while the object under measurement is moved along the reference surface, and identifies the subject based on the intensity measurements.
  • Patent Document 2 detects the movements of objects corresponding to a hand and a forearm, and recognizes a movement close to the natural movement of a person turning a page.
  • the normal of the target surface corresponding to the palm is used.
  • the object is detected by extracting the distribution of the distance and the contour in the distance image.
  • the relationship between the polarization direction and the luminance of the polarization image has a periodicity of 180 degrees, so the orientation of the normal cannot be determined uniquely; the so-called 180-degree ambiguity problem remains.
  • with two polarization directions, differences in the surface material of the subject can be identified, but in principle the three-dimensional shape of the subject cannot be determined.
  • an image with high distance resolution is required to accurately detect the object and calculate normals, and such a high-resolution image is not easy to acquire.
  • an object of this technique is to provide an image processing apparatus, an image processing method, and a program that can easily and accurately recognize a subject.
  • the first aspect of this technology is an image processing apparatus including: a polarization image acquisition unit that acquires a plurality of polarization images, with different polarization directions, in which a subject to be recognized is imaged; a normal calculation unit that calculates a normal for each pixel based on the polarization images acquired by the polarization image acquisition unit; and a recognition unit that recognizes the subject using the normals calculated by the normal calculation unit.
  • a plurality of polarized images with different polarization directions captured using an input indicator in a user interface as a subject to be recognized are acquired by the polarization image acquisition unit.
  • the normal calculation unit calculates a normal for each pixel based on the acquired polarization image. For example, the normal calculation unit performs temporary recognition of a subject using a temporary recognition processing image generated from a plurality of polarization images.
  • normals are calculated from the plurality of polarization images, and the ambiguity of the calculated normals is resolved based on the temporary recognition result.
  • the model closest to the subject, determined using the temporary recognition processing image and model images registered in advance, is used as the temporary recognition result of the subject, and the ambiguity of the normals is resolved based on the temporarily recognized model.
  • the normal calculation unit temporarily recognizes the positions of the fingertip and finger pad using the temporary recognition processing image and model images registered in advance, and resolves the ambiguity of the normals of the finger region of the hand based on the temporary recognition result.
  • the recognizing unit determines the pointing direction based on the normal of the finger region in which the ambiguity is eliminated.
  • the normal calculation unit may temporarily recognize the position and skeleton structure of the hand region using the temporary recognition processing image and model images registered in advance, and resolve the ambiguity of the normals of the hand region based on the temporary recognition result.
  • the recognizing unit determines the hand shape based on the normal of the hand region in which the ambiguity is eliminated.
  • the normal calculation unit temporarily recognizes the position of the face region using the temporary recognition processing image and model images registered in advance, and resolves the ambiguity of the normals of the face based on the temporary recognition result.
  • the recognizing unit determines a face shape or a facial expression based on the normal of the face area in which the ambiguity is eliminated.
  • for the recognition unit, teacher data corresponding to a learning subject, for example the distribution of normals of the learning subject calculated from a plurality of polarization images with different polarization directions obtained by imaging the learning subject, is stored in a teacher database unit.
  • the recognition unit uses the distribution of normals calculated from a plurality of polarization images with different polarization directions obtained by imaging the subject to be recognized as student data, and the learning subject corresponding to the teacher data stored in the teacher database unit that is most similar to the student data may be used as the recognition result of the subject to be recognized.
  • the second aspect of this technology is an image processing method including: acquiring, in a polarization image acquisition unit, a plurality of polarization images with different polarization directions in which the subject to be recognized is imaged; calculating, in a normal calculation unit, a normal for each pixel based on the polarization images acquired by the polarization image acquisition unit; and recognizing, in a recognition unit, the subject using the normals calculated by the normal calculation unit.
  • the third aspect of this technology is a program for causing a computer to execute: a procedure for acquiring a plurality of polarization images with different polarization directions in which the subject to be recognized is imaged; a procedure for calculating a normal for each pixel based on the acquired polarization images; and a procedure for recognizing the subject using the calculated normals.
  • the program of the present technology can be provided, in a computer-readable format, to a general-purpose computer capable of executing various program codes, via a storage medium such as an optical disc, a magnetic disk, or a semiconductor memory, or via a communication medium such as a network. By providing the program in a computer-readable format, processing corresponding to the program is realized on the computer.
  • a plurality of polarization images having different polarization directions in which a subject to be recognized is imaged are acquired, and a normal is calculated for each pixel based on the acquired polarization image.
  • subject recognition is performed using the normals, so the subject can be recognized easily and accurately. Note that the effects described in the present specification are merely examples and are not limiting; there may be additional effects.
  • FIG. 1 shows the basic configuration of the image processing apparatus. FIG. 2 is a flowchart showing the basic operation of the image processing apparatus. FIG. 3 shows the configuration of the first embodiment of the image processing apparatus. FIG. 4 illustrates configurations for generating polarization images in the polarization image acquisition unit.
  • FIG. 1 shows a basic configuration of the image processing apparatus.
  • the image processing apparatus 10 includes a polarization image acquisition unit 20, a normal calculation unit 30, and a recognition unit 40.
  • the polarization image acquisition unit 20 acquires a plurality of polarization images with different polarization directions in which the subject to be recognized is captured, for example, polarization images with three or more polarization directions.
  • the polarization image acquisition unit 20 may include an imaging unit that generates polarization images with three or more polarization directions, or it may be configured to acquire polarization images with three or more polarization directions from an external device or a recording medium.
  • the normal calculation unit 30 calculates the normal of the subject to be recognized based on the polarization image acquired by the polarization image acquisition unit 20.
  • the normal line calculation unit 30 calculates a normal line by applying a plurality of polarization images having different polarization directions acquired by the polarization image acquisition unit 20 to the model formula.
  • the normal calculation unit 30 may, for example, perform processing to resolve the ambiguity of the normal calculated for each pixel.
  • the recognition unit 40 performs a process of recognizing the subject to be recognized based on the normal calculated by the normal calculation unit 30. For example, when the subject to be recognized is an input indicator in the user interface, the recognition unit 40 recognizes the type, position, orientation, and the like of the subject and outputs the recognition result as input information in the user interface.
  • FIG. 2 is a flowchart showing the basic operation of the image processing apparatus.
  • the image processing apparatus acquires a polarization image.
  • the polarization image acquisition unit 20 of the image processing apparatus 10 acquires a plurality of polarization images having different polarization directions in which the recognition target subject is imaged, and proceeds to step ST2.
  • the image processing apparatus calculates a normal line.
  • the normal calculation unit 30 of the image processing apparatus 10 calculates a normal for each pixel based on the polarization image acquired in step ST1, and proceeds to step ST3.
  • the image processing apparatus outputs a recognition result.
  • the recognition unit 40 of the image processing apparatus 10 performs recognition processing on the subject to be recognized based on the normal calculated in step ST2, and outputs a recognition result.
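  • The three steps above can be summarized in code. Below is a minimal sketch in Python; the helper names compute_normals and recognize_subject are hypothetical stand-ins for the stages detailed later in this document, not names from the patent.

```python
import numpy as np

def compute_normals(pol_images, angles):
    """Step ST2: per-pixel normals from the polarization images.
    A concrete fitting procedure is sketched later in this document."""
    raise NotImplementedError

def recognize_subject(normals):
    """Step ST3: application-specific recognition (e.g. UI input)."""
    raise NotImplementedError

def process_frame(pol_images, angles):
    # Basic operation of FIG. 2: ST1 acquire -> ST2 normals -> ST3 recognize.
    normals = compute_normals(np.asarray(pol_images), np.asarray(angles))
    return recognize_subject(normals)
```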
  • FIG. 3 shows the configuration of the first embodiment of the image processing apparatus.
  • the image processing apparatus 11 includes a polarization image acquisition unit 20, a normal calculation unit 31, and a user interface (UI) processing unit 41.
  • the polarization image acquisition unit 20 acquires a plurality of polarization images having different polarization directions.
  • FIG. 4 illustrates a configuration when a polarization image is generated by the polarization image acquisition unit.
  • the polarization image acquisition unit 20 generates polarization images by arranging a polarization filter 202, having a pixel configuration with a plurality of polarization directions, on the image sensor 201 and performing imaging.
  • FIG. 4A illustrates a case where a polarization filter 202, in which each pixel takes one of four different polarization directions (the polarization directions are indicated by arrows), is arranged on the front surface of the image sensor 201.
  • the polarization image acquisition unit 20 may generate a plurality of polarization images having different polarization directions using a multi-lens array configuration.
  • a plurality of lenses 203 (four in the figure) are provided on the front surface of the image sensor 201, and an optical image of a subject is formed on the imaging surface of the image sensor 201 by each lens 203.
  • a polarizing plate 204 is provided on the front surface of each lens 203, and a plurality of polarization images having different polarization directions are generated with the polarization direction of the polarizing plate 204 being different.
  • when the polarization image acquisition unit 20 is configured in this way, a plurality of polarization images can be acquired in one imaging operation, so the subject to be recognized can be recognized quickly. Also, as shown in FIG. 4C, polarizing plates 212-1 to 212-4 with different polarization directions may be provided in front of imaging units 210-1 to 210-4, so that a plurality of polarization images with different polarization directions are generated from a plurality of different viewpoints.
  • alternatively, a polarizing plate 211 provided in front of an imaging unit may be rotated to capture images at a plurality of different polarization directions, thereby acquiring a plurality of polarization images with different polarization directions.
  • the polarization image acquisition unit 20 can acquire a luminance polarization image.
  • an image equivalent to a non-polarized normal luminance image can be acquired by averaging the luminances of four adjacent pixels in different directions of polarization.
  • in the configurations using the lenses 203 or the imaging units 210-1 to 210-4, if the spacing between them is short enough to be ignored relative to the distance to the subject, the parallax between the polarization images with different polarization directions can be ignored. Therefore, by averaging the luminances of polarization images with different polarization directions, an image equivalent to a non-polarized normal luminance image can be obtained.
  • an image equivalent to a normal luminance image that is non-polarized can be acquired by averaging the luminances of luminance-polarized images having different polarization directions for each pixel.
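  • A minimal sketch of this averaging in Python/NumPy, assuming co-registered single-channel polarization images with negligible parallax (the function name is illustrative):

```python
import numpy as np

def unpolarized_luminance(pol_images):
    """Average co-located luminances over the polarization directions.

    pol_images: array-like of shape (n_directions, H, W). With negligible
    parallax and evenly spaced polarizer angles, the cos(2(v - phi)) term
    of the polarization model averages out, leaving an image equivalent
    to a non-polarized normal luminance image.
    """
    return np.mean(np.asarray(pol_images, dtype=np.float64), axis=0)
```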
  • the polarization image acquisition unit 20 may generate not only the luminance polarization image but also the three primary color images at the same time by providing the image sensor 201 with a color filter, and may simultaneously generate the infrared image and the like. Further, the polarization image acquisition unit 20 may generate a luminance image by calculating the luminance from the three primary color images.
  • the normal calculation unit 31 calculates normals whose ambiguity has been resolved, from the plurality of polarization images acquired by the polarization image acquisition unit 20 and auxiliary information such as various models.
  • the normal line calculation unit 31 includes, for example, a polarization processing unit 301, a provisional recognition processing image generation unit 302, a provisional recognition processing unit 303, a model database unit 304, and an indeterminacy elimination unit 305.
  • the polarization processing unit 301 calculates a normal line from the polarization image and outputs it to the indeterminacy eliminating unit 305.
  • the shape of the subject and the polarization image will be described with reference to FIG.
  • the light source LT is used to illuminate the subject OB
  • the imaging unit DC images the subject OB via the polarizing plate PL.
  • the luminance of the subject OB changes according to the polarization direction of the polarizing plate PL.
  • a plurality of polarization images are acquired by rotating the polarizing plate PL, for example, and the highest luminance is Imax and the lowest luminance is Imin.
  • the angle in the y-axis direction with respect to the x-axis when the polarizing plate PL is rotated is the polarization angle υ.
  • the polarizing plate PL returns to the original polarization state when rotated 180 degrees and has a period of 180 degrees.
  • the polarization angle υ at which the maximum luminance Imax is observed is defined as the azimuth angle φ.
  • the luminance I observed when the polarizing plate PL is rotated can be expressed as in Expression (1): I = (Imax + Imin)/2 + ((Imax − Imin)/2)·cos(2(υ − φ)) ... (1)
  • FIG. 6 illustrates the relationship between the luminance and the polarization angle. This example shows a diffuse reflection model. In the case of specular reflection, the azimuth angle is shifted by 90 degrees from the polarization angle.
  • in Expression (1), the polarization angle υ is known when the polarization image is generated, and the maximum luminance Imax, the minimum luminance Imin, and the azimuth angle φ are variables. Therefore, by fitting the luminances of polarization images with three or more polarization directions to the model equation of Expression (1), the azimuth angle φ, the polarization angle at which the luminance is maximal, can be determined based on the model equation indicating the relationship between luminance and polarization angle (a sketch of this fitting follows below).
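  • Since Expression (1) can be rewritten as I(υ) = a0 + a1·cos 2υ + a2·sin 2υ with a0 = (Imax + Imin)/2, a1 = ((Imax − Imin)/2)·cos 2φ, and a2 = ((Imax − Imin)/2)·sin 2φ, the fit is linear and can be solved in closed form. A minimal per-pixel sketch in Python/NumPy (not code from the patent):

```python
import numpy as np

def fit_polarization_model(intensities, angles):
    """Least-squares fit of I(v) = a0 + a1*cos(2v) + a2*sin(2v) per pixel.

    intensities: observed luminances (three or more values).
    angles: the corresponding polarizer angles v in radians.
    Returns (i_max, i_min, phi), where phi is the azimuth angle at which
    the maximum luminance is observed (still 180-degree ambiguous).
    """
    v = np.asarray(angles, dtype=np.float64)
    basis = np.stack([np.ones_like(v), np.cos(2 * v), np.sin(2 * v)], axis=1)
    a0, a1, a2 = np.linalg.lstsq(basis, np.asarray(intensities, dtype=np.float64),
                                 rcond=None)[0]
    amp = np.hypot(a1, a2)              # (Imax - Imin) / 2
    phi = 0.5 * np.arctan2(a2, a1)      # azimuth angle of maximum luminance
    return a0 + amp, a0 - amp, phi
```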
  • the normal of the object surface is expressed in a polar coordinate system, and the normal is defined by the azimuth angle φ and the zenith angle θ.
  • the zenith angle θ is the angle from the z-axis toward the normal
  • the azimuth angle φ is the angle in the y-axis direction with respect to the x-axis, as described above.
  • the degree of polarization ρ can be calculated based on Expression (2): ρ = (Imax − Imin)/(Imax + Imin) ... (2)
  • the zenith angle θ can be determined from the characteristics shown in FIG. 7, which relate the degree of polarization to the zenith angle.
  • the characteristics shown in FIG. 7 are merely examples, and the characteristics change depending on the refractive index of the subject. For example, the degree of polarization increases as the refractive index increases.
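  • A curve of this kind can be computed from the standard diffuse-reflection polarization model and inverted numerically, as sketched below; the refractive index n = 1.6 is an arbitrary example value, since the patent only states that the characteristics depend on the refractive index.

```python
import numpy as np

def diffuse_degree_of_polarization(theta, n=1.6):
    """Degree of polarization rho(theta) for diffuse reflection at
    refractive index n (standard diffuse polarization model)."""
    s, c = np.sin(theta), np.cos(theta)
    num = (n - 1.0 / n) ** 2 * s ** 2
    den = (2 + 2 * n ** 2 - (n + 1.0 / n) ** 2 * s ** 2
           + 4 * c * np.sqrt(n ** 2 - s ** 2))
    return num / den

def zenith_from_rho(rho, n=1.6):
    """Invert rho(theta) numerically; rho is monotonic in theta here,
    so a table lookup with interpolation suffices."""
    thetas = np.linspace(0.0, np.pi / 2 - 1e-6, 2048)
    rhos = diffuse_degree_of_polarization(thetas, n)
    return np.interp(rho, rhos, thetas)
```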
  • FIG. 8 is a diagram for explaining the indefiniteness of 180 degrees.
  • when normals are calculated by imaging the subject OB shown in FIG. 8A with the imaging unit DC, the luminance change accompanying rotation of the polarization direction has a period of 180 degrees. Therefore, as shown in FIG. 8B, there are cases where the normal direction (indicated by arrows) is correct in the upper half area GA of the subject OB but reversed in the lower half area GB.
  • the temporary recognition processing image generation unit 302 generates a temporary recognition processing image based on the plurality of polarization images acquired by the polarization image acquisition unit 20.
  • the temporary recognition processing image generation unit 302 generates a temporary recognition processing image equivalent to a captured image (normal image) obtained without using a polarizing plate or a polarization filter, for example by averaging the plurality of polarization images.
  • the temporary recognition processing image generation unit 302 may extract a polarization image of one polarization direction from a plurality of polarization images to obtain a temporary recognition processing image.
  • the temporary recognition processing image generation unit 302 may use the acquired plurality of polarization images as the temporary recognition processing image.
  • the temporary recognition processing image generation unit 302 may use a normal image and a polarization image as the temporary recognition processing image.
  • the temporary recognition processing image generation unit 302 outputs the temporary recognition processing image to the temporary recognition processing unit 303.
  • the temporary recognition processing unit 303 performs temporary recognition of the subject to be recognized using the temporary recognition processing image generated by the temporary recognition processing image generation unit 302.
  • the temporary recognition processing unit 303 performs subject recognition using the temporary recognition processing image, and determines the type, position, posture, and the like of the subject to be recognized.
  • the temporary recognition processing unit 303 determines the model that most closely approximates the subject to be recognized, using, for example, the temporary recognition processing image and the images of various object models (normal images and polarization images) stored in advance in the model database unit 304.
  • when the temporary recognition processing image generated by the temporary recognition processing image generation unit 302 includes a polarization image, the temporary recognition processing unit 303 determines the model closest to the subject to be recognized in consideration of polarization characteristics.
  • the temporary recognition processing unit 303 outputs the determined model to the indeterminacy eliminating unit 305 as a temporary recognition result.
  • the temporary recognition processing unit 303 may perform temporary recognition of the subject to be recognized using not only model fitting but also other object recognition techniques.
  • the ambiguity elimination unit 305 resolves the ambiguity of the normals calculated by the polarization processing unit 301, based on the temporary recognition result supplied from the temporary recognition processing unit 303.
  • the temporary recognition result carries information indicating the type, position, posture, and the like of the subject to be recognized, as described above. Therefore, based on the model indicated by the temporary recognition result, the ambiguity elimination unit 305 selects, between the two normal directions that differ in phase by 180 degrees, the direction corresponding to the shape of the subject to be recognized; it thereby resolves the ambiguity of the normals and outputs the resulting normals to the UI processing unit 41.
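  • One plausible way to implement this selection, comparing both candidate directions against the normal predicted by the temporarily recognized model at the same pixel, is sketched below; the dot-product test is an assumption, not a rule stated in the patent.

```python
import numpy as np

def normal_from_angles(phi, theta):
    """Unit normal from azimuth phi and zenith theta (polar coordinates)."""
    return np.array([np.cos(phi) * np.sin(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(theta)])

def disambiguate(phi, theta, model_normal):
    """Pick the candidate azimuth (phi or phi + 180 degrees) whose normal
    better matches the normal of the temporarily recognized model."""
    cand_a = normal_from_angles(phi, theta)
    cand_b = normal_from_angles(phi + np.pi, theta)
    if np.dot(cand_a, model_normal) >= np.dot(cand_b, model_normal):
        return cand_a
    return cand_b
```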
  • the UI processing unit 41 recognizes a subject using the normal generated by the normal calculation unit 31 from which the ambiguity is eliminated.
  • the UI processing unit 41 uses the subject to be recognized in the polarization image acquired by the polarization image acquisition unit 20 as an input indicator in the user interface.
  • the UI processing unit 41 performs subject recognition based on the normal calculated by the normal calculation unit 31 and generates input information (hereinafter referred to as “UI information”) in the user interface.
  • since normals are information indicating the three-dimensional shape of the subject to be recognized, the UI processing unit 41 performs subject recognition, recognizes the type, position, posture, and the like of the subject, and outputs the recognition result as UI information.
  • FIG. 9 is a flowchart showing the operation of the first embodiment.
  • in step ST11, the polarization image acquisition unit 20 acquires polarization images.
  • the polarization image acquisition unit 20 performs imaging using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to steps ST12 and ST13.
  • in step ST12, the normal calculation unit 31 calculates normals.
  • for each pixel of the polarization image, the normal calculation unit 31 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normal based on the fitted model equation, and proceeds to step ST15.
  • in step ST13, the normal calculation unit 31 generates a temporary recognition processing image.
  • the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images with different polarization directions generated in step ST11, uses the averaged image as the temporary recognition processing image (equivalent to a normal image), and proceeds to step ST14.
  • in step ST14, the normal calculation unit 31 performs temporary recognition processing.
  • the normal calculation unit 31 performs fitting using, for example, the temporary recognition processing image and models stored in advance, recognizes the type, position, posture, and the like of the subject to be recognized, and proceeds to step ST15.
  • in step ST15, the normal calculation unit 31 resolves the ambiguity of the normals.
  • based on the temporary recognition result of step ST14, that is, the type, position, posture, and the like of the subject to be recognized, the normal calculation unit 31 resolves the ambiguity of the normals calculated in step ST12, that is, the normals with 180-degree ambiguity, and proceeds to step ST16.
  • in step ST16, the UI processing unit 41 generates UI information.
  • the UI processing unit 41 recognizes the type, position, posture, and the like of the subject to be recognized based on the normals whose ambiguity has been resolved, and uses the recognition result as UI information.
  • in the first specific example, the subject to be recognized is a hand, and the temporary recognition processing unit recognizes the positions of the fingertip and finger pad using the temporary recognition processing image and pre-registered model images.
  • the ambiguity elimination unit resolves the ambiguity of the normals of the finger region of the hand based on the positions of the fingertip and finger pad temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the pointing direction based on the normals of the finger region whose ambiguity has been resolved.
  • FIG. 10 is a flowchart showing the operation of the first specific example in the first embodiment.
  • in step ST21, the polarization image acquisition unit 20 acquires polarization images.
  • the polarization image acquisition unit 20 images a pointing hand using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to steps ST22 and ST23.
  • in step ST22, the normal calculation unit 31 calculates normals. For each pixel of the polarization image, the normal calculation unit 31 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normal based on the fitted model equation, and proceeds to step ST25.
  • in step ST23, the normal calculation unit 31 generates a temporary recognition processing image.
  • the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images with different polarization directions generated in step ST21, uses the averaged image as the temporary recognition processing image (equivalent to a normal image), and proceeds to step ST24.
  • in step ST24, the normal calculation unit 31 detects the positions of the fingertip and finger pad.
  • the normal calculation unit 31 detects the fingertip region and the finger pad region from the temporary recognition processing image using image recognition techniques.
  • using, for example, the method disclosed in S. K. Kang, M. Y. Nam, and P. K. Rhee, "Color Based Hand and Finger Detection Technology for User Interaction," International Conference on Convergence and Hybrid Information Technology, pp. 229-236, 2008, the normal calculation unit 31 extracts a skin-color image region from the color temporary recognition processing image and performs edge detection in the periphery of the extracted region to detect the contour of the hand.
  • FIG. 11 illustrates the detection result of the hand outline, and the image area within the hand outline is the hand area ARh.
  • the normal calculation unit 31 detects a finger region from the hand region.
  • the normal calculation unit 31 applies a morphological opening to the hand region.
  • in a morphological opening, small objects in the image are removed while large objects retain their shape and size. Therefore, when a morphological opening is applied to the hand region, the finger regions, which are thinner than the palm, are removed; by taking the difference between the image before and the image after the opening, the finger region can be detected.
  • the finger region ARf can thereby be detected, as shown in FIG. 11.
  • the normal calculation unit 31 separates the detected finger area into a fingertip area and a finger pad area. As shown in FIG. 11, the normal line calculation unit 31 sets a difference area between the hand area ARh and the detected finger area ARf as a fist area ARg. Further, in the finger area ARf, the normal line calculation unit 31 sets the area farthest from the center of gravity BG of the fist area ARg as the fingertip area ARfs and the other area as the finger pad area ARft.
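  • A sketch of this finger-region extraction with OpenCV is shown below, assuming a binary hand mask has already been obtained from the skin-color segmentation; the kernel shape and size are implementation choices not specified in the patent.

```python
import cv2
import numpy as np

def detect_finger_regions(hand_mask, kernel_size=15):
    """Finger region = hand mask minus its morphological opening.

    hand_mask: uint8 binary image, 255 inside the hand contour ARh.
    The opening removes thin structures (fingers) while preserving the
    palm, so the difference image isolates the finger region ARf.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE,
                                       (kernel_size, kernel_size))
    opened = cv2.morphologyEx(hand_mask, cv2.MORPH_OPEN, kernel)  # fist/palm
    fingers = cv2.subtract(hand_mask, opened)                     # ARf
    return fingers, opened
```

  • The fingertip region ARfs can then be taken as the part of the finger region farthest from the centroid of the opened (fist) image, mirroring the separation into ARfs and ARft described above.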
  • in step ST25, the normal calculation unit 31 resolves the ambiguity of the normals.
  • regarding the outer shape of the finger as convex, the normal calculation unit 31 resolves the ambiguity of the normals corresponding to the finger region, among the normals calculated in step ST22 (that is, the normals with 180-degree ambiguity), based on the detection result of step ST24, and proceeds to step ST26.
  • FIG. 12 illustrates the normals of the finger pad region: FIG. 12(a) shows the normals of the finger pad region ARft before the ambiguity is resolved, and FIG. 12(b) shows the normals of the finger pad region ARft after the ambiguity is resolved.
  • in step ST26, the UI processing unit 41 determines the pointing direction.
  • the UI processing unit 41 determines the direction in which the fingertip is facing, that is, the pointing direction, based on the normals of the finger region whose ambiguity has been resolved, and uses the determination result as UI information.
  • the pointing direction is a direction fs substantially perpendicular to the normals of the finger pad region ARft, as shown in FIG. 13. Therefore, for example, when the user performs a pointing operation toward the polarization image acquisition unit 20, the UI processing unit 41 takes as the pointing direction the direction FP that is orthogonal to the normals of the finger pad region and faces from the back side toward the front side of the image.
  • the UI processing unit 41 obtains, for the normals at a plurality of positions (for example, k positions) in the finger pad region, collected as the matrix N of Expression (6), the vector p expressed by Expression (7), that is, the vector minimizing the sum of squared inner products with the normals, under the unit-norm constraint of Expression (8).
  • the function minimized in Expression (7) can be written as the function C of Expression (10), C = pᵀWp, where W is the matrix derived from N (W = NᵀN); that is, the vector p that minimizes C is obtained under the constraint of Expression (8).
  • the Lagrangian multiplier method is used to calculate the vector p that minimizes the function C.
  • with the Lagrange multiplier λ, Expression (10) can be rewritten as Expression (11); therefore, if a vector p satisfying the eigen-equation Wp = λp of Expression (12) is obtained, the vector p satisfying Expression (7) can be calculated.
  • the eigen-equation is satisfied when p is an eigenvector of W, with λ the corresponding eigenvalue; consequently, the vector p satisfying Expression (7) is the eigenvector corresponding to the minimum eigenvalue of W (a sketch follows below).
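  • In code, this amounts to taking the eigenvector of W = NᵀN with the smallest eigenvalue. A minimal sketch in Python/NumPy; the sign convention (flipping the vector so it faces from the back toward the front of the image) is an assumption based on the description above.

```python
import numpy as np

def pointing_direction(normals):
    """Direction most nearly perpendicular to all finger-pad normals.

    normals: (k, 3) array of disambiguated normals (the matrix N).
    Minimizes sum_i (n_i . p)^2 subject to ||p|| = 1, i.e. returns the
    eigenvector of W = N^T N for the minimum eigenvalue (Expression (12)).
    """
    n = np.asarray(normals, dtype=np.float64)
    w = n.T @ n
    _, eigvecs = np.linalg.eigh(w)      # eigenvalues in ascending order
    p = eigvecs[:, 0]                   # eigenvector of the smallest eigenvalue
    if p[2] > 0:                        # assumed: +z points into the scene, so
        p = -p                          # flip to face from back toward front
    return p
```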
  • in this way, the pointing direction can be determined with higher accuracy than when a distance image is used. The first specific example illustrates determination of the pointing direction, but as shown in FIG. 14, the line-of-sight direction FP can also be determined by detecting the position of the pupil EP and determining in which direction, and by how much, the pupil EP has moved from the center.
  • in the second specific example, the subject to be recognized is a face, and the temporary recognition processing unit recognizes the position of the face region using the temporary recognition processing image and model images registered in advance.
  • the ambiguity elimination unit resolves the ambiguity of the normals of the face based on the position of the face region temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the face shape or facial expression based on the normals of the face region whose ambiguity has been resolved.
  • FIG. 15 is a flowchart showing the operation of the second specific example in the first embodiment.
  • in step ST31, the polarization image acquisition unit 20 acquires polarization images.
  • the polarization image acquisition unit 20 images a face using a polarizing plate or a polarization filter. The polarization image acquisition unit 20 also provides a color filter in the imaging unit, acquires a plurality of color polarization images with different polarization directions, and proceeds to steps ST32 and ST33.
  • in step ST32, the normal calculation unit 31 calculates normals. For each pixel of the polarization image, the normal calculation unit fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normal based on the fitted model equation, and proceeds to step ST35.
  • in step ST33, the normal calculation unit 31 generates a temporary recognition processing image.
  • the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images with different polarization directions generated in step ST31, uses the averaged image as the temporary recognition processing image (equivalent to a normal image), and proceeds to step ST34.
  • in step ST34, the normal calculation unit 31 performs face detection and detects feature points of the detected face.
  • the normal calculation unit 31 detects the position of the face from the temporary recognition processing image using face recognition techniques.
  • the normal calculation unit 31 detects the feature points of the face using, for example, the Active Shape Model method disclosed in T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application," Computer Vision and Image Understanding, Vol. 61, No. 1, January, pp. 38-59, 1995.
  • the Active Shape Model can automatically detect feature points that determine the posture of the recognition target in an image.
  • a plurality of learning images in which feature points are manually arranged are prepared, and an intermediate shape to be recognized is generated from these images.
  • the object to be recognized is searched for by changing the position of the intermediate shape with respect to the image to be recognized.
  • template matching is performed by looking at the luminance change around the feature point in the intermediate shape.
  • in step ST35, the normal calculation unit 31 resolves the ambiguity of the normals.
  • the normal calculation unit 31 determines the three-dimensional shape of the face and the orientation of the face from the positional relationship between the feature points of the face detected in step ST34 and the feature points of the three-dimensional model stored in advance.
  • the normal line calculation unit 31 determines the three-dimensional shape of the face and the orientation of the face from the positional relationship between the face OBf and the feature points (for example, eyes, nose, mouth, etc.) of the three-dimensional model ML.
  • the normal calculation unit 31 resolves the ambiguity of the normals corresponding to the face region, among the normals calculated in step ST32 (that is, the normals with 180-degree ambiguity), based on the detection result of step ST34, and proceeds to step ST36.
  • in step ST36, the UI processing unit 41 determines the face shape and expression.
  • the UI processing unit 41 determines the detailed face shape and expression based on the normals of the face region whose ambiguity has been resolved, and uses the determination result as UI information.
  • for example, the UI processing unit 41 integrates the normals of the face region whose ambiguity has been resolved to obtain a detailed face shape, and determines the facial expression from that shape (one way to perform such an integration is sketched below). Distance image information, a three-dimensional model, or the like may also be used to determine the face shape and expression.
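  • The patent does not specify the integration method; one common choice for turning a normal map into a relative depth map is the Frankot-Chellappa algorithm, sketched here under the assumption that the disambiguated normals are given as per-pixel components (nx, ny, nz):

```python
import numpy as np

def integrate_normals(nx, ny, nz, eps=1e-8):
    """Frankot-Chellappa integration of a normal map into relative depth.

    nx, ny, nz: HxW arrays, components of the disambiguated normals.
    Returns a depth map defined up to an additive constant.
    """
    p = -nx / (nz + eps)                     # surface gradient dz/dx
    q = -ny / (nz + eps)                     # surface gradient dz/dy
    h, w = p.shape
    u, v = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                       np.fft.fftfreq(h) * 2 * np.pi)
    fp, fq = np.fft.fft2(p), np.fft.fft2(q)
    denom = u ** 2 + v ** 2
    denom[0, 0] = 1.0                        # avoid division by zero at DC
    fz = (-1j * u * fp - 1j * v * fq) / denom
    fz[0, 0] = 0.0                           # absolute depth is unknown
    return np.real(np.fft.ifft2(fz))
```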
  • the face shape and facial expression can be discriminated more accurately than when the distance image is used.
  • the second specific example illustrates face recognition; however, if a three-dimensional model of another known object is prepared, that known object can be recognized in the same way.
  • in the third specific example, the subject to be recognized is a hand, and the temporary recognition processing unit recognizes the position of the hand region and its skeleton structure using the temporary recognition processing image and model images registered in advance.
  • the ambiguity elimination unit resolves the ambiguity of the normals of the hand region based on the position and skeleton structure of the hand region temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the hand shape based on the normals of the hand region whose ambiguity has been resolved.
  • FIG. 17 is a flowchart showing the operation of the third specific example in the first embodiment.
  • in step ST41, the polarization image acquisition unit 20 acquires polarization images.
  • the polarization image acquisition unit 20 images a hand using a polarizing plate or a polarization filter. The polarization image acquisition unit 20 also provides a color filter in the imaging unit, acquires a plurality of color polarization images with different polarization directions, and proceeds to steps ST42 and ST43.
  • in step ST42, the normal calculation unit 31 calculates normals. For each pixel of the polarization image, the normal calculation unit 31 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normal based on the fitted model equation, and proceeds to step ST45.
  • in step ST43, the normal calculation unit 31 generates a temporary recognition processing image.
  • the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images with different polarization directions generated in step ST41, uses the averaged image as the temporary recognition processing image (equivalent to a normal image), and proceeds to step ST44.
  • in step ST44, the normal calculation unit 31 detects the position and posture of the hand.
  • the normal calculation unit 31 performs the same processing as step ST24, detects the fist or palm region and the fingertip region, and detects the skeleton of the hand by connecting the center of gravity of the fist or palm region and the fingertip.
  • FIG. 18 is a diagram for explaining an operation when detecting a skeleton of a hand.
  • the normal calculation unit 31 detects the palm area ARk and the finger area ARf, and detects the skeleton of the hand as shown by a broken line connecting the center of gravity of the palm area ARk and the tip of the finger area ARf.
  • the normal calculation unit 31 performs fitting between the detected hand skeleton and skeleton models stored in advance for each hand posture, and takes the posture of the skeleton model that minimizes the fitting error as the posture of the imaged hand.
  • for example, the normal calculation unit 31 aligns the centroid of the detected hand skeleton with the centroid of each skeleton model stored in advance, and calculates the sum of absolute differences (SAD) of the position coordinates of joints, fingertips, and the like for each posture (a sketch of this matching follows below).
  • the normal calculation unit 31 takes the posture with the smallest calculated SAD as the posture of the imaged hand.
  • the normal calculation unit 31 detects the hand position and hand posture in this way, and proceeds to step ST45.
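  • A minimal sketch of this SAD-based pose matching, assuming each skeleton is represented as an ordered array of joint and fingertip coordinates so that corresponding points line up across models (this representation is an assumption; the patent only describes centroid alignment and the SAD):

```python
import numpy as np

def match_hand_pose(detected, models):
    """Pick the stored skeleton model with the smallest SAD.

    detected: (k, 2) array of joint/fingertip coordinates.
    models: dict mapping pose name -> (k, 2) array in the same order.
    Skeletons are centroid-aligned before the SAD is computed.
    """
    d = np.asarray(detected, dtype=np.float64)
    d = d - d.mean(axis=0)                      # align centroid to origin
    best_pose, best_sad = None, np.inf
    for pose, pts in models.items():
        m = np.asarray(pts, dtype=np.float64)
        m = m - m.mean(axis=0)
        sad = np.abs(d - m).sum()               # sum of absolute differences
        if sad < best_sad:
            best_pose, best_sad = pose, sad
    return best_pose
```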
  • in step ST45, the normal calculation unit 31 resolves the ambiguity of the normals.
  • based on the hand position and posture detected in step ST44, the normal calculation unit 31 resolves the ambiguity of the normals corresponding to the hand region, among the normals calculated in step ST42 (that is, the normals with 180-degree ambiguity), and proceeds to step ST46.
  • in step ST46, the UI processing unit 41 determines the shape of the hand.
  • the UI processing unit 41 determines the shape of the hand based on the normals of the hand region whose ambiguity has been resolved, and uses the determination result as UI information.
  • for example, the UI processing unit 41 integrates the normals of the hand region whose ambiguity has been resolved to determine the shape of the hand in detail. Distance image information, a three-dimensional model, or the like may also be used in determining the shape of the hand.
  • the shape of the hand can be discriminated more accurately than when the distance image is used.
  • the normal calculation unit 31 performs fitting between the finger area detected by the same process as in step ST24 and the finger shape model stored in advance for each pointing direction. In the fitting, the detected finger area (or hand area) and the finger area (or hand area) of the finger shape model are overlapped using the fingertip area detected by the same process as in step ST24 as a fulcrum.
  • the UI processing unit 41 sets the orientation of the finger shape model that minimizes the fitting error, that is, the overlay error, as the pointing direction indicated by the imaged hand.
  • the operations shown in the flowcharts of the first embodiment include a process that generates normals having ambiguity and a process that generates a temporary recognition processing image and performs temporary recognition of the subject to be recognized; these processes are not restricted to being performed in parallel. For example, one process may be performed before the other.
  • <Second Embodiment> In the first embodiment described above, UI information is generated based on normals whose ambiguity has been resolved. However, UI information can also be generated based on normals that retain the ambiguity. The second embodiment describes a case where UI information is generated based on normals having ambiguity.
  • FIG. 19 illustrates the configuration of the second embodiment of the image processing apparatus.
  • the image processing apparatus 12 includes polarization image acquisition units 20-1 and 20-2, normal calculation units 32-1 and 32-2, and a user interface (UI) processing unit 42.
  • the polarization image acquisition units 20-1 and 20-2 acquire a plurality of polarization images having different polarization directions.
  • the polarization image acquisition units 20-1 and 20-2 are configured in the same manner as the polarization image acquisition unit 20 in the first embodiment.
  • the polarization image acquisition unit 20-1 acquires a plurality of polarization images having different polarization directions in which the recognition subject is imaged, and outputs the plurality of polarization images to the normal calculation unit 32-1.
  • the polarization image acquisition unit 20-2 acquires a plurality of polarization images having different polarization directions in which the learning subject is imaged and outputs the acquired images to the normal calculation unit 32-2. Further, the polarization image acquisition units 20-1 and 20-2 may output the acquired polarization image to the UI processing unit 42.
  • the normal calculation unit 32-1 (32-2) calculates a normal from a plurality of polarization images acquired by the polarization image acquisition unit 20-1 (20-2).
  • the normal calculation unit 32-1 (32-2) is configured using the polarization processing unit 301 of the normal calculation unit 31 in the first embodiment.
  • the normal calculation unit 32-1 performs processing similar to that of the polarization processing unit 301 described above on the polarization images of the subject to be recognized acquired by the polarization image acquisition unit 20-1, calculates normals, and outputs them to the UI processing unit 42.
  • the normal calculation unit 32-2 calculates a normal from the polarization image of the learning subject acquired by the polarization image acquisition unit 20-2 and outputs the normal to the UI processing unit 42.
  • the UI processing unit 42 uses the subject to be recognized in the polarization image acquired by the polarization image acquisition unit 20-1 as an input indicator in the user interface.
  • the UI processing unit 42 performs subject recognition based on the normals, with unresolved ambiguity, calculated by the normal calculation unit 32-1, and generates input information (hereinafter referred to as "UI information") for the user interface.
  • since normals are information indicating the three-dimensional shape of the subject to be recognized, the UI processing unit 42 performs subject recognition, recognizes the type, position, posture, and the like of the subject, and outputs the recognition result as UI information.
  • the UI processing unit 42 stores teacher data corresponding to learning subjects in advance, and performs recognition processing of the subject to be recognized based on the stored teacher data and on student data calculated from a plurality of polarization images with different polarization directions obtained by imaging the subject to be recognized.
  • the UI processing unit 42 includes a teacher data generation unit 421, a teacher database unit 422, and a recognition processing unit 423.
  • the teacher data generation unit 421 generates teacher data corresponding to the learning subject using the normals calculated by the normal calculation unit 32-2, and stores the teacher data in the teacher database unit 422. The teacher data generation unit 421 may also generate a non-polarized image (normal image) from the polarization images supplied by the polarization image acquisition unit 20-2, and generate teacher data using feature amounts calculated from the non-polarized image together with the acquired normals.
  • the teacher database unit 422 stores the teacher data generated by the teacher data generation unit 421. In addition, the teacher database unit 422 outputs the stored teacher data to the recognition processing unit 423.
  • the recognition processing unit 423 generates student data based on the normals calculated by the normal calculation unit 32-1, performs recognition processing using the generated student data and the teacher data stored in the teacher database unit 422, and generates UI information. The recognition processing unit 423 may also generate a non-polarized image (normal image) from the polarization images supplied by the polarization image acquisition unit 20-1, and generate student data using feature amounts calculated from the non-polarized image together with the acquired normals.
  • FIG. 20 is a flowchart showing the learning operation.
  • in step ST51, the polarization image acquisition unit 20-2 acquires polarization images of the learning subject.
  • the polarization image acquisition unit 20-2 images the learning subject using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to step ST52.
  • in step ST52, the normal calculation unit 32-2 calculates normals.
  • for each pixel of the polarization image, the normal calculation unit 32-2 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normal based on the fitted model equation, and proceeds to step ST54.
  • in step ST54, the UI processing unit 42 generates teacher data.
  • the UI processing unit 42 generates teacher data based on the normals calculated from the polarization images of the learning subject, and proceeds to step ST55.
  • in step ST55, the UI processing unit 42 stores the teacher data.
  • the UI processing unit 42 stores the teacher data generated in step ST54 in the teacher database unit 422.
  • steps ST51 to ST55 are performed for each learning subject, and the UI processing unit 42 stores teacher data generated with various objects as learning subjects.
  • when generating UI information based on normals having ambiguity and on a polarization image, the UI processing unit 42 performs the process of step ST53 to generate a non-polarized image from the polarization images.
  • in step ST54, the UI processing unit 42 then generates teacher data using the normals calculated from the polarization images of the learning subject and feature amounts calculated from the non-polarized image.
  • FIG. 21 is a flowchart showing the recognition operation using the learning result.
  • In step ST61, the polarization image acquisition unit 20-1 acquires polarization images of the subject to be recognized.
  • the polarization image acquisition unit 20-1 captures a subject to be recognized using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to step ST62.
  • In step ST62, the normal calculation unit 32-1 calculates normals.
  • the normal calculation unit 32-1 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation for each pixel of the polarization image, calculates normals based on the fitted model equation, and proceeds to step ST64.
  • In step ST64, the UI processing unit 42 generates student data.
  • the UI processing unit 42 generates student data based on the normals calculated from the polarization images of the subject to be recognized, and proceeds to step ST65.
  • In step ST65, the UI processing unit 42 generates UI information.
  • the UI processing unit 42 determines the type, position, orientation, and the like of the subject to be recognized based on the student data generated in step ST64 and the teacher data stored by the processing of steps ST51 to ST55, and outputs the determination result as UI information.
  • When generating UI information based on normals having indefiniteness and a polarization image, the UI processing unit 42 performs the process of step ST63 to generate a non-polarized image from the polarization images.
  • In this case, the UI processing unit 42 generates student data using the normals calculated from the polarization images of the subject to be recognized and the feature amounts calculated from the non-polarized image, as sketched below.
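Steps ST61 to ST65 can then be sketched as follows, reusing the hypothetical helpers from the earlier sketches; the recognition result (the label of the nearest teacher data) would become the UI information.

```python
def recognize_subject(db, polarization_images):
    # Steps ST61-ST65: compute normals of the subject to be recognized,
    # form the student data, and return the most similar teacher entry.
    azimuth, zenith = compute_normals(polarization_images)  # hypothetical helper
    student = normal_histogram(azimuth, zenith)
    return db.nearest(student)
```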
  • FIG. 22 shows the operation of a specific example in the second embodiment.
  • The specific example shows a case where UI information is generated with a hand as the subject to be recognized.
  • In step ST71, the polarization image acquisition unit 20-2 acquires polarization images (teacher polarization images) of the learning subject. For example, with the hand in the rock state, the polarization image acquisition unit 20-2 images the hand using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to step ST72.
  • In step ST72, the normal calculation unit 32-2 calculates normals. For each pixel of the polarization image, the normal calculation unit 32-2 fits the pixel values of the plurality of polarization images with different polarization directions to the model formula, calculates the normals of the hand in the rock state based on the fitted model formula, and proceeds to step ST73.
  • In step ST73, the UI processing unit 42 generates teacher data.
  • the UI processing unit 42 generates teacher data based on the normals of the learning subject. For example, the UI processing unit 42 forms a histogram of the normals of the hand in the rock state, and proceeds to step ST74 with the obtained normal histogram as teacher data.
  • In step ST74, the UI processing unit 42 stores the teacher data.
  • the UI processing unit 42 stores the normal histogram for the rock state in the teacher database unit 422 as teacher data.
  • Steps ST71 to ST74 are performed for each learning subject, for example with the hand in the paper state or the scissors state, and the teacher data for each state is stored in the teacher database unit 422.
  • In step ST75, the polarization image acquisition unit 20-1 acquires polarization images of the subject to be recognized.
  • the polarization image acquisition unit 20-1 images, for example, a hand making one of these gestures using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to step ST76.
  • In step ST76, the normal calculation unit 32-1 calculates normals.
  • the normal calculation unit 32-1 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation for each pixel of the polarization image, calculates normals based on the fitted model equation, and proceeds to step ST77.
  • In step ST77, the UI processing unit 42 generates student data.
  • the UI processing unit 42 generates student data based on the normals of the subject to be recognized. For example, the UI processing unit 42 forms a histogram of the normals for the hand state to be recognized, and proceeds to step ST78 with the obtained normal histogram as student data.
  • In step ST78, the UI processing unit 42 generates UI information.
  • the UI processing unit 42 retrieves from the teacher database unit 422 the teacher data most similar to the student data obtained in step ST77. The UI processing unit 42 then determines the hand state corresponding to that teacher data as the hand state captured in the polarization images acquired in step ST75, and outputs the determination result as UI information.
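Putting the earlier sketches together, this rock/scissors/paper example might be driven as follows; the capture variables are hypothetical placeholders for the polarization image stacks acquired in steps ST71 and ST75.

```python
db = TeacherDatabase()
for label, images in [("rock", rock_images),          # hypothetical captures
                      ("scissors", scissors_images),
                      ("paper", paper_images)]:
    learn_subject(db, label, images)

hand_state = recognize_subject(db, observed_images)   # e.g. "rock"
```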
  • UI information can be generated without performing a process for eliminating the indefiniteness of the normal.
  • the recognition process can be performed with higher accuracy than when the distance image is used.
  • In the configuration described above, normals are generated from the learning subject and from the subject to be recognized by separate configurations, so the configuration that generates normals from the learning subject can generate teacher data with higher accuracy than the configuration on the recognition side. By storing this high-precision teacher data, which serves as the criterion for discrimination, in the teacher database unit 422, an accurate discrimination result can be obtained.
  • Note that the polarization image acquisition unit and the normal calculation unit may be shared between generating normals from the learning subject and generating normals from the subject to be recognized.
  • In addition, a communication unit, a recording medium mounting unit, and the like may be provided in the UI processing unit so that teacher data can be updated or added from the outside.
  • FIG. 23 exemplifies a user interface to which the present technology is applied.
  • When the recognition target is a hand, hand recognition is performed, and the hand shape and pointing direction can be determined.
  • When the recognition target is a face, face recognition is performed, and personal authentication, facial expression, and line-of-sight direction can be determined.
  • When the recognition target is a person, human recognition is performed, and body authentication and pose discrimination can be performed.
  • Since a normal close to the three-dimensional shape of the subject can be calculated from the polarization images, compared with a normal generated from a conventional distance image, the normal can be calculated stably even for a subject angled toward the camera, for example. Therefore, the subject to be recognized can be recognized easily and accurately by using the normals calculated from the polarization images. Furthermore, if this technique is applied to a user interface, the pointing direction and the like can be recognized more reliably than when they are recognized from a distance image, making it possible to provide a stress-free user interface. For example, if the configuration shown in FIGS. 4A and 4B is used as the polarization image acquisition unit, the normals can be calculated from polarization images acquired by a monocular camera, so there is no need to use a plurality of cameras, and application to a user interface is easy.
  • the series of processes described in the specification can be executed by hardware, software, or a combined configuration of both.
  • For example, a program in which the processing sequence is recorded can be installed in a memory of a computer incorporated in dedicated hardware and executed, or the program can be installed and executed on a general-purpose computer capable of executing various processes.
  • the program can be recorded in advance on a hard disk, SSD (Solid State Drive), or ROM (Read Only Memory) as a recording medium.
  • Alternatively, the program can be stored (recorded) temporarily or permanently in a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a BD (Blu-ray Disc (registered trademark)), a magnetic disk, or a semiconductor memory card. Such a removable recording medium can be provided as so-called package software.
  • the program may be transferred from the download site to the computer wirelessly or by wire via a network such as a LAN (Local Area Network) or the Internet.
  • the computer can receive the program transferred in this way and install it on a recording medium such as a built-in hard disk.
  • the image processing apparatus may have the following configuration.
  • a polarization image acquisition unit that acquires a plurality of polarization images having different polarization directions in which a subject to be recognized is imaged;
  • a normal calculation unit that calculates a normal for each pixel based on the polarization image acquired by the polarization image acquisition unit;
  • An image processing apparatus comprising: a recognition unit that recognizes the subject using the normal calculated by the normal calculation unit.
  • The image processing apparatus according to (1), wherein the subject to be recognized is an input indicator in a user interface, and the recognition unit uses the recognition result of the subject as input information in the user interface.
  • The image processing apparatus according to (1) or (2), wherein the normal calculation unit includes: a temporary recognition processing image generation unit that generates a temporary recognition processing image from the plurality of polarization images; a temporary recognition processing unit that performs temporary recognition of the subject using the temporary recognition processing image generated by the temporary recognition processing image generation unit; a polarization processing unit that calculates normals from the plurality of polarization images; and an indeterminacy eliminating unit that eliminates the indeterminacy of the normals calculated by the polarization processing unit based on the temporary recognition result of the temporary recognition processing unit; and wherein the recognition unit recognizes the subject using normals whose indeterminacy has been eliminated by the normal calculation unit.
  • the temporary recognition processing unit sets, as the temporary recognition result of the subject, the model closest to the subject, determined using the temporary recognition processing image and model images registered in advance.
  • The image processing apparatus according to (4), wherein the subject to be recognized is a hand; the temporary recognition processing unit recognizes the positions of the fingertips and finger pads using the temporary recognition processing image and a pre-registered model image; and the indefiniteness eliminating unit eliminates the indeterminacy of the normals of the finger regions in the hand based on the positions of the fingertips and finger pads temporarily recognized by the temporary recognition processing unit.
  • the subject to be recognized is a face
  • the temporary recognition processing unit recognizes the position of the face area using the temporary recognition processing image and a model image registered in advance
  • the image processing apparatus according to (4), wherein the indefiniteness eliminating unit eliminates indefiniteness of the normal of the face based on the position of the face area temporarily recognized by the temporary recognition processing unit.
  • the recognizing unit determines a face shape or a facial expression based on a normal of the face region in which indefiniteness is eliminated by the normal calculation unit.
  • the subject to be recognized is a hand
  • the temporary recognition processing unit recognizes the position and skeleton structure of the hand region using the temporary recognition processing image and a pre-registered model image
  • The image processing apparatus according to (10), wherein the recognition unit determines a hand shape based on the normals of the hand region whose indeterminacy has been eliminated by the normal calculation unit.
  • The image processing apparatus according to (1) or (2), wherein the recognition unit includes: a teacher data generation unit that generates teacher data corresponding to a learning subject from normals calculated based on a plurality of polarization images with different polarization directions obtained by imaging the learning subject; a teacher database unit that stores the teacher data generated for each learning subject by the teacher data generation unit; and a recognition processing unit that recognizes the subject to be recognized based on student data, generated using normals calculated from a plurality of polarization images with different polarization directions obtained by imaging the subject to be recognized, and the teacher data stored in the teacher database unit.
  • The image processing apparatus according to (11), wherein the polarization image acquisition unit acquires a plurality of polarization images with different polarization directions for each of the recognition target and the learning subject, and the normal calculation unit calculates normals for each of the recognition target and the learning subject based on the polarization images acquired by the polarization image acquisition unit.
  • The image processing apparatus according to (11) or (12), further including: a learning polarization image acquisition unit that acquires a plurality of polarization images with different polarization directions obtained by imaging the learning subject; and a learning normal calculation unit that calculates normals based on the polarization images acquired by the learning polarization image acquisition unit.
  • The image processing apparatus according to any one of (11) to (13), wherein the teacher data is data indicating a distribution of normals for the learning subject, and the student data is data indicating a distribution of normals calculated for the subject to be recognized.
  • the image processing device according to any one of (11) to (14), wherein the recognition processing unit sets the learning subject corresponding to the teacher data closest to the student data as a recognition result.
  • In the image processing apparatus, the image processing method, and the program of this technology, a plurality of polarization images with different polarization directions in which a subject to be recognized is captured are acquired, a normal is calculated for each pixel based on the acquired polarization images, and the subject is recognized using the calculated normals. For this reason, the subject can be recognized easily and accurately. The technology is therefore suitable for devices having an interface that performs operation control and the start, end, change, or update of signal processing according to the recognition result of the type, position, orientation, and the like of an object.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Vascular Medicine (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • User Interface Of Digital Computer (AREA)
  • Length Measuring Devices By Optical Means (AREA)
  • Position Input By Displaying (AREA)

Abstract

A polarized light image acquisition unit 20 acquires a plurality of polarized light images having different polarization directions. A polarized light image is an image in which, for example, a pointing body in a user interface is imaged as a subject to be recognized. A normal computation unit 30 computes a normal for each pixel of the subject to be recognized on the basis of the polarized light images acquired by the polarized light image acquisition unit 20. The normal is information corresponding to the three-dimensional shape of the subject to be recognized. Using the normals computed by the normal computation unit 30, a recognition unit 40 recognizes the subject, assesses the type, position, orientation, etc. of the pointing body, and outputs the result of the assessment as input information in the user interface. Thus, the subject can be recognized easily and with good precision.

Description

Image processing apparatus, image processing method, and program
This technology relates to an image processing apparatus, an image processing method, and a program, and enables a subject to be recognized accurately and easily.
Conventionally, the normal of a subject has been calculated from polarization images of a plurality of polarization directions. For example, in Non-Patent Document 1 and Non-Patent Document 2, normals are calculated by fitting polarization images of a plurality of polarization directions to a model equation.
Subjects have also been recognized using polarization images. For example, in Patent Document 1, illumination means is arranged so that the illumination light is p-polarized with respect to a predetermined reference surface, and the subject is illuminated. Patent Document 1 further separates the light reflected from the reference surface into s-polarized and p-polarized components, measures the light intensity of each polarization component, and identifies the object based on the intensity measurements obtained while moving the object to be measured along the reference surface.
Furthermore, normals have been used for user interfaces. For example, Patent Document 2 detects the movement of objects corresponding to a hand and a forearm and recognizes movements close to the natural movement of a person turning a page; in detecting the movement, the normal of the target surface corresponding to the palm is used. The objects are detected by extracting the distance distribution and contours in a distance image.
JP 2011-150689 A
JP 2012-242901 A
When a normal is calculated by fitting polarization images of a plurality of polarization directions to a model equation, the relationship between the polarization direction and the luminance of the polarization image has a periodicity of 180 degrees, so the so-called 180-degree indefiniteness problem remains when obtaining the azimuth angle of the normal direction. In identification using s-polarized and p-polarized light, differences in the surface material of the subject can be identified, but in principle the three-dimensional shape of the subject cannot be identified from two polarization directions. Furthermore, when object detection and normal calculation are performed based on a distance image, an image with high distance resolution is required for accurate detection and calculation, and it is not easy to acquire an image with such high resolution.
Accordingly, an object of this technology is to provide an image processing apparatus, an image processing method, and a program that can recognize a subject accurately and easily.
The first aspect of this technology is an image processing apparatus including: a polarization image acquisition unit that acquires a plurality of polarization images with different polarization directions in which a subject to be recognized is imaged; a normal calculation unit that calculates a normal for each pixel based on the polarization images acquired by the polarization image acquisition unit; and a recognition unit that recognizes the subject using the normals calculated by the normal calculation unit.
In this technology, a plurality of polarization images with different polarization directions, in which, for example, an input indicator in a user interface is imaged as the subject to be recognized, are acquired by the polarization image acquisition unit. The normal calculation unit calculates a normal for each pixel based on the acquired polarization images. For example, the normal calculation unit performs temporary recognition of the subject using a temporary recognition processing image generated from the plurality of polarization images, calculates normals from the plurality of polarization images, and eliminates the indeterminacy of the calculated normals based on the temporary recognition result. In the temporary recognition, the model closest to the subject, determined using the temporary recognition processing image and model images registered in advance, is taken as the temporary recognition result, and the indefiniteness of the normals is eliminated based on the temporarily recognized model. When the subject to be recognized is a hand, the normal calculation unit temporarily recognizes the positions of the fingertips and finger pads using the temporary recognition processing image and the pre-registered model images, and eliminates the indefiniteness of the normals of the finger regions in the hand based on the temporary recognition result; the recognition unit then determines the pointing direction based on the disambiguated normals of the finger region. The normal calculation unit may also temporarily recognize the position and skeleton structure of the hand region using the temporary recognition processing image and the pre-registered model images, and eliminate the indefiniteness of the normals of the hand region based on the temporary recognition result; in this case, the recognition unit determines the hand shape based on the disambiguated normals of the hand region. Furthermore, when the subject to be recognized is a face, the normal calculation unit temporarily recognizes the position of the face region using the temporary recognition processing image and the pre-registered model images, and eliminates the indefiniteness of the normals of the face based on the temporary recognition result. The recognition unit determines the face shape or facial expression based on the disambiguated normals of the face region.
The recognition unit may also store, in a teacher database unit, teacher data corresponding to a learning subject, for example the distribution of normals of the learning subject, calculated from a plurality of polarization images with different polarization directions obtained by imaging the learning subject. Using the distribution of normals calculated from a plurality of polarization images with different polarization directions obtained by imaging the subject to be recognized as student data, the recognition unit may then take, as the recognition result of the subject to be recognized, the learning subject corresponding to the teacher data most similar to the student data among the teacher data stored in the teacher database unit.
The second aspect of this technology is an image processing method including: acquiring, with a polarization image acquisition unit, a plurality of polarization images with different polarization directions in which a subject to be recognized is imaged; calculating, with a normal calculation unit, a normal for each pixel based on the polarization images acquired by the polarization image acquisition unit; and recognizing, with a recognition unit, the subject using the normals calculated by the normal calculation unit.
The third aspect of this technology is a program for causing a computer to execute: a procedure of acquiring a plurality of polarization images with different polarization directions in which a subject to be recognized is imaged; a procedure of calculating a normal for each pixel based on the acquired polarization images; and a procedure of recognizing the subject using the calculated normals.
The program of the present technology can be provided, for example, in a computer-readable format to a general-purpose computer capable of executing various program codes, via a storage medium such as an optical disc, a magnetic disk, or a semiconductor memory, or via a communication medium such as a network. By providing such a program in a computer-readable format, processing corresponding to the program is realized on the computer.
According to this technology, a plurality of polarization images with different polarization directions in which a subject to be recognized is imaged are acquired, a normal is calculated for each pixel based on the acquired polarization images, and the subject is recognized using the calculated normals. For this reason, the subject can be recognized easily and accurately. Note that the effects described in this specification are merely examples and are not limiting, and there may be additional effects.
Brief Description of Drawings
FIG. 1 is a diagram showing the basic configuration of the image processing apparatus.
FIG. 2 is a flowchart showing the basic operation of the image processing apparatus.
FIG. 3 is a diagram showing the configuration of the first embodiment of the image processing apparatus.
FIG. 4 is a diagram illustrating configurations for generating polarization images in the polarization image acquisition unit.
FIG. 5 is a diagram for explaining the shape of a subject and the polarization image.
FIG. 6 is a diagram illustrating the relationship between luminance and polarization angle.
FIG. 7 is a diagram showing the relationship between the degree of polarization and the zenith angle.
FIG. 8 is a diagram for explaining the 180-degree indefiniteness.
FIG. 9 is a flowchart showing the operation of the first embodiment.
FIG. 10 is a flowchart showing the operation of the first specific example in the first embodiment.
FIG. 11 is a diagram illustrating the detection result of a hand contour.
FIG. 12 is a diagram showing the normals of the finger pad region.
FIG. 13 is a diagram showing the pointing direction.
FIG. 14 is a diagram showing the line-of-sight direction.
FIG. 15 is a flowchart showing the operation of the second specific example in the first embodiment.
FIG. 16 is a diagram for explaining the determination of the three-dimensional shape and orientation of a face.
FIG. 17 is a flowchart showing the operation of the third specific example in the first embodiment.
FIG. 18 is a diagram for explaining the operation of detecting the skeleton of a hand.
FIG. 19 is a diagram illustrating the configuration of the second embodiment of the image processing apparatus.
FIG. 20 is a flowchart showing the learning operation.
FIG. 21 is a flowchart showing the recognition operation using the learning result.
FIG. 22 shows the operation of a specific example in the second embodiment.
FIG. 23 is a diagram illustrating user interfaces to which the present technology is applied.
Hereinafter, embodiments for carrying out the present technology will be described. The description will be given in the following order.
1. Basic configuration and basic operation of the image processing apparatus
2. First embodiment
2-1. Configuration of the first embodiment
2-2. Operation of the first embodiment
2-3. First specific example in the first embodiment
2-4. Second specific example in the first embodiment
2-5. Third specific example in the first embodiment
3. Second embodiment
3-1. Configuration of the second embodiment
3-2. Operation of the second embodiment
3-3. Specific example in the second embodiment
<1. Basic Configuration and Basic Operation of Image Processing Device>
FIG. 1 shows the basic configuration of the image processing apparatus. The image processing apparatus 10 includes a polarization image acquisition unit 20, a normal calculation unit 30, and a recognition unit 40.
The polarization image acquisition unit 20 acquires a plurality of polarization images with different polarization directions in which the subject to be recognized is imaged, for example polarization images of three or more polarization directions. The polarization image acquisition unit 20 may include an imaging unit that generates polarization images of three or more polarization directions, or may acquire polarization images of three or more polarization directions from an external device, a recording medium, or the like.
The normal calculation unit 30 calculates the normals of the subject to be recognized based on the polarization images acquired by the polarization image acquisition unit 20. The normal calculation unit 30 calculates the normals by fitting the plurality of polarization images with different polarization directions acquired by the polarization image acquisition unit 20 to a model equation. The normal calculation unit 30 may also perform processing to eliminate the indeterminacy of the normal calculated for each pixel, for example.
The recognition unit 40 performs processing to recognize the subject to be recognized based on the normals calculated by the normal calculation unit 30. For example, when the subject to be recognized is an input indicator in a user interface, the recognition unit 40 recognizes the type, position, orientation, and the like of the subject and outputs the recognition result as input information for the user interface.
FIG. 2 is a flowchart showing the basic operation of the image processing apparatus. In step ST1, the image processing apparatus acquires polarization images: the polarization image acquisition unit 20 of the image processing apparatus 10 acquires a plurality of polarization images with different polarization directions in which the subject to be recognized is imaged, and proceeds to step ST2. In step ST2, the image processing apparatus calculates normals: the normal calculation unit 30 of the image processing apparatus 10 calculates a normal for each pixel based on the polarization images acquired in step ST1, and proceeds to step ST3. In step ST3, the image processing apparatus outputs a recognition result: the recognition unit 40 of the image processing apparatus 10 performs recognition processing on the subject to be recognized based on the normals calculated in step ST2, and outputs the recognition result.
<2. First Embodiment>
<2-1. Configuration of First Embodiment>
FIG. 3 shows the configuration of the first embodiment of the image processing apparatus. The image processing apparatus 11 includes a polarization image acquisition unit 20, a normal calculation unit 31, and a user interface (UI) processing unit 41.
The polarization image acquisition unit 20 acquires a plurality of polarization images with different polarization directions. FIG. 4 illustrates configurations for generating polarization images in the polarization image acquisition unit. For example, as shown in (a) of FIG. 4, the polarization image acquisition unit 20 generates polarization images by performing imaging with a polarization filter 202, having a pixel configuration of a plurality of polarization directions, arranged on an image sensor 201. In (a) of FIG. 4, a polarization filter 202 in which each pixel takes one of four different polarization directions (indicated by arrows) is arranged on the front surface of the image sensor 201. As shown in (b) of FIG. 4, the polarization image acquisition unit 20 may also generate a plurality of polarization images with different polarization directions using a multi-lens array configuration. For example, a plurality of lenses 203 (four in the figure) are provided on the front surface of the image sensor 201, and each lens 203 forms an optical image of the subject on the imaging surface of the image sensor 201. A polarizing plate 204 is provided on the front surface of each lens 203, with the polarization directions of the polarizing plates 204 set to different directions, so that a plurality of polarization images with different polarization directions are generated. With the polarization image acquisition unit 20 configured in this way, a plurality of polarization images can be acquired in a single imaging operation, so the subject to be recognized can be recognized quickly. Alternatively, as shown in (c) of FIG. 4, polarizing plates 212-1 to 212-4 with mutually different polarization directions may be provided in front of imaging units 210-1 to 210-4, so that a plurality of polarization images with different polarization directions are generated from a plurality of different viewpoints.
When the movement of the subject to be recognized is slow, or when the subject moves in steps, a polarizing plate 211 may be provided in front of the imaging unit 210 as shown in (d) of FIG. 4. In this case, the polarizing plate 211 is rotated and imaging is performed for each of a plurality of different polarization directions, thereby acquiring a plurality of polarization images with different polarization directions.
When the image sensor 201 does not use a color filter, the polarization image acquisition unit 20 acquires luminance polarization images. In the case of (a) of FIG. 4, an image equivalent to a non-polarized normal luminance image can be acquired by averaging the luminance of four adjacent pixels with different polarization directions. In the cases of (b) and (c) of FIG. 4, if the spacing between the lenses 203 or between the imaging units 210-1 to 210-4 is negligibly short relative to the distance to the subject, the parallax between the polarization images with different polarization directions can be ignored, so averaging the luminance of the polarization images with different polarization directions yields an image equivalent to a non-polarized normal luminance image. When the parallax cannot be ignored, the polarization images with different polarization directions are aligned according to the amount of parallax, and averaging the luminance of the aligned polarization images yields an image equivalent to a non-polarized normal luminance image. In the case of (d) of FIG. 4, an image equivalent to a non-polarized normal luminance image can be acquired by averaging, for each pixel, the luminance of the luminance polarization images with different polarization directions.
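A minimal sketch of this averaging for the configuration of (a) of FIG. 4 is shown below; the 2x2 layout of the 0/45/90/135-degree polarizers is an assumption, as the patent does not fix the mosaic pattern.

```python
import numpy as np

def split_polarization_channels(raw):
    """Split a polarizer-mosaic frame into four quarter-resolution images,
    one per assumed polarization direction (in degrees)."""
    return {0: raw[0::2, 0::2], 45: raw[0::2, 1::2],
            135: raw[1::2, 0::2], 90: raw[1::2, 1::2]}

def unpolarized_image(raw):
    """Averaging the four neighboring polarization directions yields an image
    equivalent to one captured without a polarizer."""
    ch = split_polarization_channels(raw.astype(np.float64))
    return (ch[0] + ch[45] + ch[90] + ch[135]) / 4.0
```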
Furthermore, the polarization image acquisition unit 20 may generate not only luminance polarization images but also three-primary-color images simultaneously by providing the image sensor 201 with a color filter, and may also generate infrared images and the like simultaneously. The polarization image acquisition unit 20 may also generate a luminance image by calculating luminance from the three-primary-color images.
The normal calculation unit 31 calculates normals with their indeterminacy eliminated, from the plurality of polarization images acquired by the polarization image acquisition unit 20 and auxiliary information such as various models. The normal calculation unit 31 includes, for example, a polarization processing unit 301, a temporary recognition processing image generation unit 302, a temporary recognition processing unit 303, a model database unit 304, and an indeterminacy eliminating unit 305.
The polarization processing unit 301 calculates normals from the polarization images and outputs them to the indeterminacy eliminating unit 305. The shape of a subject and the polarization image will be described with reference to FIG. 5. As shown in FIG. 5, for example, a light source LT illuminates a subject OB, and an imaging unit DC images the subject OB through a polarizing plate PL. In this case, the luminance of the subject OB in the captured image changes according to the polarization direction of the polarizing plate PL. For ease of explanation, imaging is performed while rotating the polarizing plate PL, for example, to acquire a plurality of polarization images; the highest luminance is denoted Imax and the lowest luminance Imin. With the x-axis and y-axis of two-dimensional coordinates taken in the plane of the polarizing plate PL, the angle in the y-axis direction with respect to the x-axis when the polarizing plate PL is rotated is the polarization angle υ.
The polarizing plate PL returns to its original polarization state when rotated 180 degrees, that is, it has a period of 180 degrees. The polarization angle υ when the maximum luminance Imax is observed is defined as the azimuth angle φ. With these definitions, the luminance I observed when the polarizing plate PL is rotated can be expressed as in equation (1). FIG. 6 illustrates the relationship between luminance and polarization angle. This example shows the diffuse reflection model; in the case of specular reflection, the azimuth angle is shifted by 90 degrees from the polarization angle.
$$I = \frac{I_{max}+I_{min}}{2} + \frac{I_{max}-I_{min}}{2}\,\cos 2(\upsilon-\phi) \qquad (1)$$
In equation (1), the polarization angle υ is known when the polarization image is generated, and the maximum luminance Imax, the minimum luminance Imin, and the azimuth angle φ are the variables. Therefore, by fitting the luminance of polarization images of three or more polarization directions to the model equation shown in equation (1), the azimuth angle φ, the polarization angle at which the luminance is maximal, can be determined from the fitted model equation relating luminance and polarization angle.
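A minimal per-pixel fitting sketch follows, using the identity I(υ) = a + b·cos 2υ + c·sin 2υ with a = (Imax + Imin)/2 and (b, c) = ((Imax − Imin)/2)(cos 2φ, sin 2φ), so that the fit reduces to linear least squares; the function and variable names are hypothetical.

```python
import numpy as np

def fit_polarization_model(angles, images):
    """angles: polarization angles in radians, three or more as stated above.
    images: stack of shape (len(angles), H, W).
    Returns per-pixel (Imax, Imin, phi)."""
    angles = np.asarray(angles, dtype=np.float64)
    A = np.stack([np.ones_like(angles),
                  np.cos(2.0 * angles),
                  np.sin(2.0 * angles)], axis=1)        # (N, 3) design matrix
    obs = images.reshape(len(angles), -1)               # (N, H*W) observations
    coeff, *_ = np.linalg.lstsq(A, obs, rcond=None)     # least-squares fit
    a, b, c = coeff
    amplitude = np.hypot(b, c)                          # (Imax - Imin) / 2
    phi = 0.5 * np.arctan2(c, b)                        # azimuth, modulo 180 deg
    shape = images.shape[1:]
    return ((a + amplitude).reshape(shape),
            (a - amplitude).reshape(shape),
            phi.reshape(shape))
```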
The normal of the object surface is expressed in a polar coordinate system as an azimuth angle φ and a zenith angle θ, where the zenith angle θ is the angle from the z-axis toward the normal and the azimuth angle φ is, as described above, the angle in the y-axis direction with respect to the x-axis. When the minimum luminance Imin and the maximum luminance Imax are obtained by rotating the polarizing plate PL, the degree of polarization ρ can be calculated based on equation (2).
$$\rho = \frac{I_{max}-I_{min}}{I_{max}+I_{min}} \qquad (2)$$
In the case of diffuse reflection, the relationship between the degree of polarization ρ and the zenith angle θ is known from the Fresnel equations to have, for example, the characteristic shown in FIG. 7, so the zenith angle θ can be determined from the degree of polarization ρ based on this characteristic. The characteristic shown in FIG. 7 is an example and changes depending on the refractive index of the subject; for example, the degree of polarization increases as the refractive index increases.
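The patent gives the ρ-θ relationship only as the curve in FIG. 7. For reference, a commonly used closed form derived from the Fresnel equations for diffuse reflection with refractive index n (an assumption, since the text only notes that the curve depends on the refractive index) is

$$\rho(\theta) = \frac{\left(n-\tfrac{1}{n}\right)^{2}\sin^{2}\theta}{2 + 2n^{2} - \left(n+\tfrac{1}{n}\right)^{2}\sin^{2}\theta + 4\cos\theta\sqrt{n^{2}-\sin^{2}\theta}}$$

which is monotonically increasing in θ and can therefore be inverted numerically, for example with a lookup table, to recover the zenith angle from the measured degree of polarization.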
The normals calculated in this way have a 180-degree indeterminacy. FIG. 8 is a diagram for explaining the 180-degree indeterminacy. When normals are calculated by imaging the subject OB shown in (a) of FIG. 8 with the imaging unit DC, the luminance change with the rotation of the polarization direction has a period of 180 degrees. Therefore, as shown in (b) of FIG. 8, for example, the normal direction (indicated by arrows) may be correct in the upper half region GA of the subject OB while being reversed in the lower half region GB.
The temporary recognition processing image generation unit 302 generates a temporary recognition processing image based on the plurality of polarization images acquired by the polarization image acquisition unit 20. For example, by averaging the plurality of polarization images, the temporary recognition processing image generation unit 302 generates a temporary recognition processing image equivalent to a captured image (normal image) obtained by imaging without a polarizing plate or polarization filter. The temporary recognition processing image generation unit 302 may instead extract a polarization image of one polarization direction from the plurality of polarization images and use it as the temporary recognition processing image, may use the acquired plurality of polarization images themselves as the temporary recognition processing images, or may use both a normal image and polarization images as the temporary recognition processing images. The temporary recognition processing image generation unit 302 outputs the temporary recognition processing image to the temporary recognition processing unit 303.
The temporary recognition processing unit 303 performs temporary recognition of the subject to be recognized using the temporary recognition processing image generated by the temporary recognition processing image generation unit 302. The temporary recognition processing unit 303 performs subject recognition using the temporary recognition processing image and determines the type, position, posture, and the like of the subject to be recognized. For example, the temporary recognition processing unit 303 determines the model closest to the subject to be recognized using the temporary recognition processing image and images of various object models (normal images and polarization images) stored in advance in the model database unit 304. When the temporary recognition processing image includes a polarization image, the temporary recognition processing unit 303 determines the closest model with the polarization characteristics also taken into account. The temporary recognition processing unit 303 outputs the determined model to the indeterminacy eliminating unit 305 as the temporary recognition result. Note that the temporary recognition processing unit 303 may perform temporary recognition of the subject using other object recognition methods, not only model fitting.
The indeterminacy eliminating unit 305 eliminates the indeterminacy of the normals calculated by the polarization processing unit 301, based on the temporary recognition result supplied from the temporary recognition processing unit 303. As described above, the temporary recognition result carries information indicating the type, position, posture, and the like of the subject to be recognized. The indeterminacy eliminating unit 305 therefore selects, between the two normal directions that differ in phase by 180 degrees, the one consistent with the model indicated by the temporary recognition result. That is, the indeterminacy eliminating unit 305 eliminates the indeterminacy of the normals by specifying the normal directions so as to correspond to the shape of the subject to be recognized, and outputs the disambiguated normals to the UI processing unit 41.
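A minimal sketch of this selection is shown below, assuming the temporary recognition result has been rendered into a coarse per-pixel normal map of the matched model (model_normals, a hypothetical input): for each pixel, the candidate whose direction agrees better with the model is kept.

```python
import numpy as np

def resolve_ambiguity(normals, model_normals):
    """normals, model_normals: (H, W, 3) arrays of unit vectors.
    Flipping the azimuth by 180 degrees negates the x and y components of
    a normal while keeping z; keep whichever candidate better matches
    the temporarily recognized model."""
    flipped = normals * np.array([-1.0, -1.0, 1.0])
    keep = (np.sum(normals * model_normals, axis=-1) >=
            np.sum(flipped * model_normals, axis=-1))
    return np.where(keep[..., None], normals, flipped)
```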
The UI processing unit 41 recognizes the subject using the disambiguated normals generated by the normal calculation unit 31. The UI processing unit 41 treats the subject to be recognized in the polarization images acquired by the polarization image acquisition unit 20 as an input indicator in a user interface. The UI processing unit 41 performs subject recognition based on the normals calculated by the normal calculation unit 31 and generates input information for the user interface (hereinafter referred to as "UI information"). As described above, the normals are information corresponding to the three-dimensional shape of the subject to be recognized; the UI processing unit 41 performs subject recognition, recognizes the type, position, posture, and the like of the subject, and outputs the recognition result as UI information.
<2-2. Operation of First Embodiment>
FIG. 9 is a flowchart showing the operation of the first embodiment. In step ST11, the polarization image acquisition unit 20 acquires polarization images. The polarization image acquisition unit 20 performs imaging using a polarizing plate or a polarization filter, acquires a plurality of polarization images with different polarization directions, and proceeds to steps ST12 and ST13.
In step ST12, the normal calculation unit 31 calculates normals. For each pixel of the polarization image, the normal calculation unit 31 fits the pixel values of the plurality of polarization images with different polarization directions to the model equation, calculates the normals based on the fitted model equation, and proceeds to step ST15.
In step ST13, the normal calculation unit 31 generates a temporary recognition processing image. For example, the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images with different polarization directions generated in step ST11, takes the average values as the pixel values of the temporary recognition processing image (equivalent to a normal image), and proceeds to step ST14.
In step ST14, the normal calculation unit 31 performs temporary recognition processing. For example, the normal calculation unit 31 performs fitting using the temporary recognition processing image and models stored in advance, recognizes the type, position, posture, and the like of the subject to be recognized, and proceeds to step ST15.
In step ST15, the normal calculation unit 31 eliminates the indeterminacy of the normals. The normal calculation unit 31 takes the normals calculated in step ST12, which have a 180-degree indeterminacy, eliminates the indeterminacy based on the temporary recognition result of step ST14, that is, the type, position, posture, and the like of the subject to be recognized, and proceeds to step ST16.
In step ST16, the UI processing unit 41 generates UI information. Based on the disambiguated normals, the UI processing unit 41 recognizes the type, position, posture, and the like of the subject to be recognized, and takes the recognition result as UI information.
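The flow of FIG. 9 can be summarized in the following sketch, reusing the hypothetical helpers from the earlier sketches; normals_from_phi_rho, provisional_recognition, and generate_ui_information stand in for processing the patent describes only at the block-diagram level.

```python
def first_embodiment_pipeline(angles, polarization_images):
    # ST12: per-pixel model fitting and normal candidates
    imax, imin, phi = fit_polarization_model(angles, polarization_images)
    rho = (imax - imin) / (imax + imin)                 # equation (2)
    normals = normals_from_phi_rho(phi, rho)            # hypothetical (uses FIG. 7)
    # ST13: temporary recognition processing image (average of the stack)
    recognition_image = polarization_images.mean(axis=0)
    # ST14: temporary recognition against pre-registered models
    model_normals = provisional_recognition(recognition_image)  # hypothetical
    # ST15: eliminate the 180-degree indeterminacy
    normals = resolve_ambiguity(normals, model_normals)
    # ST16: generate UI information from the disambiguated normals
    return generate_ui_information(normals)             # hypothetical
```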
<2-3. First Specific Example in First Embodiment>
Next, a first specific example in the first embodiment will be described. In the first specific example, the subject to be recognized is a hand, and the temporary recognition processing unit recognizes the positions of the fingertips and finger pads using the temporary recognition processing image and pre-registered model images. The indeterminacy eliminating unit eliminates the indefiniteness of the normals of the finger regions in the hand based on the positions of the fingertips and finger pads temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the pointing direction based on the disambiguated normals of the finger region.
FIG. 10 is a flowchart showing the operation of the first specific example in the first embodiment. In step ST21, the polarization image acquisition unit 20 acquires polarization images. The polarization image acquisition unit 20 images a pointing hand using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to steps ST22 and ST23.
In step ST22, the normal calculation unit 31 calculates normals. For each pixel of the polarization images, the normal calculation unit 31 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates a normal on the basis of the fitted model equation, and proceeds to step ST25.
In step ST23, the normal calculation unit 31 generates a temporary recognition processing image. For example, the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images having different polarization directions generated in step ST21, takes the average value as the pixel value of the temporary recognition processing image (equivalent to an ordinary image), and proceeds to step ST24.
In step ST24, the normal calculation unit 31 detects the fingertip and finger pad positions. The normal calculation unit 31 detects the fingertip region and the finger pad region from the temporary recognition processing image using an image recognition technique. For example, as described in S.K. Kang, M.Y. Nam, and P.K. Rhee, "Color Based Hand and Finger Detection Technology for User Interaction," International Conference on Convergence and Hybrid Information Technology, pp. 229-236, 2008, the normal calculation unit 31 extracts a skin-colored image region from the color temporary recognition processing image and performs edge detection within the surrounding area of the extracted region to detect the contour of the hand. FIG. 11 illustrates a detection result of the hand contour; the image region inside the contour is the hand region ARh.
Next, the normal calculation unit 31 detects the finger region from the hand region. The normal calculation unit 31 performs morphological opening on the hand region. Morphological opening removes small objects in an image first, while large objects retain their shape and size. That is, when morphological opening is applied to the hand region, the finger regions, which are thinner than the palm, are removed first, so the finger region can be detected by taking the difference between the image before and after the opening.
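A minimal OpenCV sketch of this opening-and-difference step, assuming hand_mask is an 8-bit binary mask of the hand region; the kernel size, chosen wider than a finger, is an illustrative value that depends on the image scale:

```python
import cv2

# Opening with a kernel wider than a finger removes the fingers but keeps the palm.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (25, 25))
palm = cv2.morphologyEx(hand_mask, cv2.MORPH_OPEN, kernel)
# The difference between the mask before and after opening is the finger region.
fingers = cv2.subtract(hand_mask, palm)
# A small second opening suppresses residual speckle along the palm boundary.
fingers = cv2.morphologyEx(fingers, cv2.MORPH_OPEN,
                           cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)))
```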
Other methods may also be used to detect the finger region. For example, regarding the external shape of a finger as being composed of a plurality of convex hulls, convex hulls can be detected from the hand region and the finger region determined from the detection result. A convex hull can easily be detected using, for example, the convex hull method provided in OpenCV. In this way, the finger region ARf can be detected as shown in FIG. 11.
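A sketch of this hull-based alternative using OpenCV; the use of convexity defects to separate adjacent fingers is an illustrative choice, since the text only states that the finger region is determined from the hull detection result:

```python
import cv2

contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contour = max(contours, key=cv2.contourArea)          # largest contour = hand outline
hull_pts = cv2.convexHull(contour)                    # hull vertices (fingertip candidates)
hull_idx = cv2.convexHull(contour, returnPoints=False)
defects = cv2.convexityDefects(contour, hull_idx)     # deep defects lie between fingers
```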
The normal calculation unit 31 divides the detected finger region into a fingertip region and a finger pad region. As shown in FIG. 11, the normal calculation unit 31 takes the difference region between the hand region ARh and the detected finger region ARf as the fist region ARg. In the finger region ARf, the normal calculation unit 31 takes the region farthest from the center of gravity BG of the fist region ARg as the fingertip region ARfs and the remaining region as the finger pad region ARft.
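A sketch of this split, continuing from the masks above; the 80th-percentile distance threshold is purely illustrative, as the text specifies only "the region farthest from the center of gravity":

```python
import cv2
import numpy as np

fist = cv2.subtract(hand_mask, fingers)             # hand minus fingers = fist region ARg
m = cv2.moments(fist, binaryImage=True)
cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]   # fist center of gravity BG

ys, xs = np.nonzero(fingers)                        # pixels of the finger region ARf
dist = np.hypot(xs - cx, ys - cy)
tip = dist >= np.percentile(dist, 80)               # farthest pixels -> fingertip ARfs
pad = ~tip                                          # remaining pixels -> finger pad ARft
```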
In step ST25, the normal calculation unit 31 eliminates the indefiniteness of the normals. Regarding the external shape of the finger as convex, the normal calculation unit 31 eliminates, on the basis of the detection result of step ST24, the indefiniteness of the normals corresponding to the finger region from among the normals calculated in step ST22, that is, normals having 180-degree indefiniteness, and proceeds to step ST26. FIG. 12 illustrates the normals of the finger pad region: FIG. 12(a) shows the normals of the finger pad region ARft before the indefiniteness is eliminated, and FIG. 12(b) shows the normals after the indefiniteness is eliminated.
In step ST26, the UI processing unit 41 determines the pointing direction. On the basis of the finger region normals whose indefiniteness has been eliminated, the UI processing unit 41 determines the direction in which the fingertip is facing, that is, the pointing direction, and sets the determination result as the UI information.
As shown in FIG. 13, the pointing direction is the direction fs substantially orthogonal to the normals of the finger pad region ARft. Therefore, for example, when the user performs a pointing gesture while facing the polarization image acquisition unit 20 and this gesture is to be determined, the UI processing unit 41 takes as the pointing direction the direction FP that is orthogonal to the normals of the finger pad region and points from the far side of the image toward the near side.
Let the pointing-direction vector p be given by equation (3) and a normal-direction vector n in the finger pad region by equation (4); then, as shown in equation (5), the inner product of p and n is 0. Therefore, for the normals at a plurality of positions (for example, k positions) in the finger pad region, that is, the normals N shown in equation (6), the UI processing unit 41 obtains the vector p of equation (7) under the constraint of equation (8).
p = (p_x, p_y, p_z)^T                                            (3)
n = (n_x, n_y, n_z)^T                                            (4)
p^T n = 0                                                        (5)
N = (n_1, n_2, ..., n_k)^T                                       (6)
p = arg min_p Σ_{i=1}^{k} (n_i^T p)^2 = arg min_p ||N p||^2      (7)
p^T p = 1                                                        (8)
Here, when the function W is defined as in equation (9), the function to be minimized in equation (7) can be defined as the function C shown in equation (10). That is, it suffices to obtain the vector p that minimizes the function C under the constraint of equation (8). The vector p minimizing the function C is calculated using the method of Lagrange multipliers: with a Lagrange multiplier λ, equation (10) can be written as equation (11). Accordingly, if a vector p satisfying the eigenvalue equation shown in equation (12) is obtained, the vector p satisfying equation (7) can be calculated. The eigenvalue equation holds when p is an eigenvector of W, with λ the corresponding eigenvalue. Substituting an eigenvector of W for p makes the value of the function to be minimized C = λ, so the vector p satisfying equation (7) is the eigenvector corresponding to the minimum eigenvalue of W.
W = N^T N                                                        (9)
C = p^T W p                                                      (10)
L(p, λ) = p^T W p − λ(p^T p − 1)                                 (11)
W p = λ p                                                        (12)
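In code, this comes down to taking the eigenvector of W = N^T N associated with its smallest eigenvalue; a sketch (the axis convention used to orient p toward the viewer is an assumption):

```python
import numpy as np

def pointing_direction(normals):
    # normals: (k, 3) array of unit normals sampled over the finger pad region.
    # Returns the unit vector p minimizing sum_i (n_i . p)^2 subject to |p| = 1.
    N = np.asarray(normals, dtype=float)
    W = N.T @ N                           # 3x3 symmetric matrix of equation (9)
    eigvals, eigvecs = np.linalg.eigh(W)  # eigenvalues in ascending order
    p = eigvecs[:, 0]                     # eigenvector of the smallest eigenvalue
    # Assuming the camera looks along +z, flip so p points out of the
    # image toward the viewer (far side to near side).
    return p if p[2] < 0 else -p
```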
With such processing, the pointing direction can be determined more accurately than when a distance image is used. Although the first specific example illustrates determination of the pointing direction, it is also possible, as shown in FIG. 14, to detect the position of the pupil EP and determine the line-of-sight direction FP according to how far, and in which direction, the pupil EP has moved from the center.
<2-4. Second Specific Example in First Embodiment>
Next, a second specific example in the first embodiment will be described. In the second specific example, the subject to be recognized is a face, and the temporary recognition processing unit recognizes the position of the face region using the temporary recognition processing image and a pre-registered model image. The indefiniteness eliminating unit then eliminates the indefiniteness of the normals of the face on the basis of the face region position temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the face shape or facial expression on the basis of the face region normals whose indefiniteness has been eliminated.
FIG. 15 is a flowchart showing the operation of the second specific example in the first embodiment. In step ST31, the polarization image acquisition unit 20 acquires polarization images. The polarization image acquisition unit 20 images a face using a polarizing plate or a polarization filter. The polarization image acquisition unit 20 also provides a color filter in the imaging unit, acquires a plurality of color polarization images having different polarization directions, and proceeds to steps ST32 and ST33.
In step ST32, the normal calculation unit 31 calculates normals. For each pixel of the polarization images, the normal calculation unit fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates a normal on the basis of the fitted model equation, and proceeds to step ST35.
In step ST33, the normal calculation unit 31 generates a temporary recognition processing image. For example, the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images having different polarization directions generated in step ST31, takes the average value as the pixel value of the temporary recognition processing image (equivalent to an ordinary image), and proceeds to step ST34.
In step ST34, the normal calculation unit 31 performs face recognition and detects feature points of the recognized face. The normal calculation unit 31 detects the position of the face from the temporary recognition processing image using a face recognition technique. Further, the normal calculation unit 31 detects facial feature points using, for example, the Active Shape Model disclosed in T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models - Their Training and Application," Computer Vision and Image Understanding, Vol. 61, No. 1, January, pp. 38-59, 1995. The Active Shape Model can automatically detect the feature points that determine the posture of a recognition target in an image. Specifically, a plurality of learning images with manually placed feature points are first prepared, and an intermediate shape of the recognition target is generated from these images. The target object is then searched for in the image to be recognized by shifting the position of the intermediate shape; at each position, template matching is performed on the luminance changes around the feature points of the intermediate shape. This search is repeated at each level of an image pyramid, from coarse to fine resolution. The normal calculation unit 31 performs such processing, detects the feature points of the recognized face, and proceeds to step ST35.
In step ST35, the normal calculation unit 31 eliminates the indefiniteness of the normals. The normal calculation unit 31 determines the three-dimensional shape and orientation of the face from the positional relationship between the facial feature points detected in step ST34 and the feature points of a three-dimensional model stored in advance. As shown in FIG. 16, the normal calculation unit 31 determines the three-dimensional shape and orientation of the face from the positional relationship between the feature points (for example, eyes, nose, and mouth) of the face OBf and those of the three-dimensional model ML. Further, on the basis of the determined three-dimensional shape and orientation of the face and the detection result of step ST34, the normal calculation unit 31 eliminates the indefiniteness of the normals corresponding to the face region from among the normals calculated in step ST32, that is, normals having 180-degree indefiniteness, and proceeds to step ST36.
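The patent does not name the fitting algorithm; one common way to recover orientation from matched 2-D feature points and 3-D model feature points is a perspective-n-point solve, sketched here with OpenCV (model_pts, image_pts, and the camera matrix K are assumed inputs):

```python
import cv2
import numpy as np

# model_pts: (n, 3) 3-D feature points of the stored model ML (eyes, nose, mouth, ...)
# image_pts: (n, 2) corresponding 2-D feature points detected in step ST34
# K: 3x3 camera intrinsic matrix
ok, rvec, tvec = cv2.solvePnP(model_pts.astype(np.float64),
                              image_pts.astype(np.float64), K, None)
R, _ = cv2.Rodrigues(rvec)                # rotation: model -> camera coordinates
face_dir = R @ np.array([0.0, 0.0, 1.0])  # model's forward axis in the camera frame
```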
In step ST36, the UI processing unit 41 determines the face shape and facial expression. On the basis of the face region normals whose indefiniteness has been eliminated, the UI processing unit 41 determines the detailed face shape and facial expression and sets the determination result as the UI information. For example, the UI processing unit 41 integrates the face region normals whose indefiniteness has been eliminated to obtain the detailed face shape, and determines the facial expression from that shape. Distance image information, a three-dimensional shape model, or the like may also be used for determining the face shape and facial expression.
With such processing, the face shape and facial expression can be determined more accurately than when a distance image is used. Although the second specific example illustrates face recognition, a known object can also be recognized if a three-dimensional model or the like of the known object is prepared.
<2-5. Third Specific Example in First Embodiment>
Next, a third specific example in the first embodiment will be described. In the third specific example, the subject to be recognized is a hand, and the temporary recognition processing unit recognizes the position and skeletal structure of the hand region using the temporary recognition processing image and a pre-registered model image. The indefiniteness eliminating unit then eliminates the indefiniteness of the normals of the hand region on the basis of the hand region position and skeletal structure temporarily recognized by the temporary recognition processing unit, and the recognition unit determines the hand shape on the basis of the hand region normals whose indefiniteness has been eliminated.
FIG. 17 is a flowchart showing the operation of the third specific example in the first embodiment. In step ST41, the polarization image acquisition unit 20 acquires polarization images. The polarization image acquisition unit 20 images a hand using a polarizing plate or a polarization filter. The polarization image acquisition unit 20 also provides a color filter in the imaging unit, acquires a plurality of color polarization images having different polarization directions, and proceeds to steps ST42 and ST43.
In step ST42, the normal calculation unit 31 calculates normals. For each pixel of the polarization images, the normal calculation unit 31 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates a normal on the basis of the fitted model equation, and proceeds to step ST45.
In step ST43, the normal calculation unit 31 generates a temporary recognition processing image. For example, the normal calculation unit 31 averages, for each pixel, the pixel values of the plurality of polarization images having different polarization directions generated in step ST41, takes the average value as the pixel value of the temporary recognition processing image (equivalent to an ordinary image), and proceeds to step ST44.
In step ST44, the normal calculation unit 31 detects the position and posture of the hand. The normal calculation unit 31 performs processing similar to step ST24 to detect the fist or palm region and the fingertip regions, and detects the skeleton of the hand by connecting the center of gravity of the fist or palm region to the fingertips. FIG. 18 is a diagram for explaining the operation of detecting the hand skeleton. The normal calculation unit 31 detects the palm region ARk and the finger regions ARf, and detects the hand skeleton, indicated by the broken lines, by connecting the center of gravity of the palm region ARk to the tips of the finger regions ARf.
The normal calculation unit 31 fits the detected hand skeleton to skeleton models stored in advance for each hand posture, and takes the posture of the skeleton model with the smallest fitting error as the posture of the imaged hand. For example, the normal calculation unit 31 aligns the center of gravity of the detected hand skeleton with that of each stored skeleton model and calculates, for each skeleton model, the sum of absolute differences (SAD) of the position coordinates of the joints, fingertips, and the like. The normal calculation unit 31 takes the posture with the smallest calculated SAD as the posture of the imaged hand. Having detected the hand position and posture in this way, the normal calculation unit 31 proceeds to step ST45.
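A minimal sketch of this SAD-based fitting, assuming the detected skeleton and every stored model are given as equal-length, consistently ordered arrays of joint and fingertip coordinates:

```python
import numpy as np

def best_pose(skeleton, models):
    # skeleton: (m, 2) detected joint/fingertip coordinates.
    # models: dict mapping pose name -> (m, 2) stored skeleton model.
    # Centers of gravity are aligned, then the smallest SAD wins.
    s = skeleton - skeleton.mean(axis=0)
    sad = {name: np.abs(s - (pts - pts.mean(axis=0))).sum()
           for name, pts in models.items()}
    return min(sad, key=sad.get)
```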
In step ST45, the normal calculation unit 31 eliminates the indefiniteness of the normals. On the basis of the hand position and posture detected in step ST44, the normal calculation unit 31 eliminates the indefiniteness of the normals corresponding to the hand region from among the normals calculated in step ST42, that is, normals having 180-degree indefiniteness, and proceeds to step ST46.
In step ST46, the UI processing unit 41 determines the shape of the hand. On the basis of the hand region normals whose indefiniteness has been eliminated, the UI processing unit 41 determines the hand shape and sets the determination result as the UI information. For example, the UI processing unit 41 integrates the hand region normals whose indefiniteness has been eliminated to determine the hand shape in detail. Distance image information, a three-dimensional shape model, or the like may also be used for determining the hand shape.
With such processing, the hand shape can be determined more accurately than when a distance image is used. In the third specific example, the pointing direction may be determined as well as the hand shape. In this case, in step ST44, the normal calculation unit 31 fits the finger region detected by processing similar to step ST24 to finger shape models stored in advance for each pointing direction. In the fitting, the detected finger region (or hand region) is superimposed on the finger region (or hand region) of each finger shape model, using the fingertip region detected by processing similar to step ST24 as a fulcrum. The UI processing unit 41 takes the posture of the finger shape model with the smallest fitting error, that is, the smallest superposition error, as the pointing direction indicated by the imaged hand.
The operations shown in the flowcharts of the first embodiment are not limited to performing in parallel the processing for generating normals having indefiniteness and the processing for generating the temporary recognition processing image and temporarily recognizing the subject. For example, one processing may be performed before the other.
<3. Second Embodiment>
In the first embodiment described above, the case where UI information is generated on the basis of normals whose indefiniteness has been eliminated was described. However, UI information can also be generated on the basis of normals having indefiniteness. The second embodiment describes the case where UI information is generated on the basis of normals having indefiniteness.
<3-1. Configuration of Second Embodiment>
FIG. 19 illustrates the configuration of the second embodiment of the image processing apparatus. The image processing apparatus 12 includes polarization image acquisition units 20-1 and 20-2, normal calculation units 32-1 and 32-2, and a user interface (UI) processing unit 42.
The polarization image acquisition units 20-1 and 20-2 acquire a plurality of polarization images having different polarization directions, and are configured in the same manner as the polarization image acquisition unit 20 in the first embodiment. The polarization image acquisition unit 20-1 acquires a plurality of polarization images, having different polarization directions, in which the subject to be recognized is imaged, and outputs them to the normal calculation unit 32-1. The polarization image acquisition unit 20-2 acquires a plurality of polarization images, having different polarization directions, in which a learning subject is imaged, and outputs them to the normal calculation unit 32-2. The polarization image acquisition units 20-1 and 20-2 may also output the acquired polarization images to the UI processing unit 42.
The normal calculation unit 32-1 (32-2) calculates normals from the plurality of polarization images acquired by the polarization image acquisition unit 20-1 (20-2), and is configured using the polarization processing unit 301 of the normal calculation unit 31 in the first embodiment. The normal calculation unit 32-1 performs processing similar to that of the polarization processing unit 301 described above on the polarization images of the recognition target acquired by the polarization image acquisition unit 20-1, calculates normals, and outputs them to the UI processing unit 42. Similarly, the normal calculation unit 32-2 calculates normals from the polarization images of the learning subject acquired by the polarization image acquisition unit 20-2 and outputs them to the UI processing unit 42.
The UI processing unit 42 uses the subject to be recognized in the polarization images acquired by the polarization image acquisition unit 20-1 as an input indicator in a user interface. The UI processing unit 42 performs subject recognition on the basis of the normals calculated by the normal calculation unit 32-1, whose indefiniteness has not been eliminated, and generates input information for the user interface (hereinafter referred to as "UI information"). As described above, normals are information indicating the three-dimensional shape of the subject to be recognized; the UI processing unit 42 performs subject recognition, recognizes the type, position, posture, and the like of the subject, and outputs the recognition result as the UI information. The UI processing unit 42 also stores in advance teacher data corresponding to learning subjects, and performs recognition processing of the subject to be recognized on the basis of the stored teacher data and student data calculated from the plurality of polarization images, having different polarization directions, in which the subject to be recognized is imaged.
The UI processing unit 42 includes a teacher data generation unit 421, a teacher database unit 422, and a recognition processing unit 423.
The teacher data generation unit 421 generates teacher data corresponding to the learning subject using the normals calculated by the normal calculation unit 32-2, and stores the teacher data in the teacher database unit 422. The teacher data generation unit 421 may also generate a non-polarized image (ordinary image) from the polarization images supplied from the polarization image acquisition unit 20-2, and generate the teacher data using feature amounts calculated from the non-polarized image together with the acquired normals.
The teacher database unit 422 stores the teacher data generated by the teacher data generation unit 421, and outputs the stored teacher data to the recognition processing unit 423.
The recognition processing unit 423 generates student data on the basis of the normals calculated by the normal calculation unit 32-1, performs recognition processing using the generated student data and the teacher data stored in the teacher database unit 422, and generates the UI information. The recognition processing unit 423 may also generate a non-polarized image (ordinary image) from the polarization images supplied from the polarization image acquisition unit 20-1, and generate the student data using feature amounts calculated from the non-polarized image together with the acquired normals.
<3-2. Operation of Second Embodiment>
Next, the operation of the second embodiment will be described. FIG. 20 is a flowchart showing the learning operation. In step ST51, the polarization image acquisition unit 20-2 acquires polarization images of the learning subject. The polarization image acquisition unit 20-2 images the learning subject using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to step ST52.
In step ST52, the normal calculation unit 32-2 calculates normals. For each pixel of the polarization images, the normal calculation unit 32-2 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates a normal on the basis of the fitted model equation, and proceeds to step ST54.
In step ST54, the UI processing unit 42 generates teacher data. The UI processing unit 42 generates teacher data on the basis of the normals calculated from the polarization images of the learning subject, and proceeds to step ST55.
In step ST55, the UI processing unit 42 stores the teacher data. The UI processing unit 42 stores the teacher data generated in step ST54 in the teacher database unit 422.
The processing from step ST51 to step ST55 is performed for each learning subject, and teacher data generated with various objects as learning subjects is stored in the UI processing unit 42. When UI information is to be generated on the basis of both normals having indefiniteness and polarization images, the UI processing unit 42 performs the processing of step ST53 to generate a non-polarized image from the polarization images, and, in step ST54, generates the teacher data using the normals calculated from the polarization images of the learning subject and feature amounts calculated from the non-polarized image.
FIG. 21 is a flowchart showing the recognition operation using the learning result. In step ST61, the polarization image acquisition unit 20-1 acquires polarization images of the subject to be recognized. The polarization image acquisition unit 20-1 images the subject to be recognized using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to step ST62.
In step ST62, the normal calculation unit 32-1 calculates normals. For each pixel of the polarization images, the normal calculation unit 32-1 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates a normal on the basis of the fitted model equation, and proceeds to step ST64.
In step ST64, the UI processing unit 42 generates student data. The UI processing unit 42 generates student data on the basis of the normals calculated from the polarization images of the recognition target, and proceeds to step ST65.
In step ST65, the UI processing unit 42 generates UI information. The UI processing unit 42 determines the type, position, posture, and the like of the subject to be recognized on the basis of the student data generated in step ST64 and the teacher data stored through the processing of steps ST51 to ST55, and sets the determination result as the UI information. When UI information is to be generated on the basis of both normals having indefiniteness and polarization images, the UI processing unit 42 performs the processing of step ST63 to generate a non-polarized image from the polarization images, and, in step ST64, generates the student data using the normals calculated from the polarization images of the recognition target and feature amounts calculated from the non-polarized image.
<3-3. Specific Example in Second Embodiment>
Next, a specific example in the second embodiment will be described. FIG. 22 shows the operation of the specific example in the second embodiment. This specific example shows the case where UI information is generated on the basis of normals having indefiniteness. In step ST71, the polarization image acquisition unit 20-2 acquires polarization images (teacher polarization images) of the learning subject. For example, with the hand in the rock state, the polarization image acquisition unit 20-2 images the hand using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to step ST72.
In step ST72, the normal calculation unit 32-2 calculates normals. For each pixel of the polarization images, the normal calculation unit 32-2 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates the normals for the hand in the rock state on the basis of the fitted model equation, and proceeds to step ST73.
In step ST73, the UI processing unit 42 generates teacher data. The UI processing unit 42 generates teacher data on the basis of the normals of the learning subject. For example, the UI processing unit 42 forms a histogram of the normals obtained with the hand in the rock state, and proceeds to step ST74 with the obtained normal histogram as the teacher data.
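One plausible way to histogram the normals (the binning is not fixed by the text) is a two-dimensional azimuth-zenith histogram normalized into a fixed-length descriptor:

```python
import numpy as np

def normal_histogram(normals, bins=16):
    # normals: (k, 3) unit normals from the hand region (180-deg ambiguity intact).
    n = np.asarray(normals, dtype=float)
    azimuth = np.arctan2(n[:, 1], n[:, 0])             # in [-pi, pi]
    zenith = np.arccos(np.clip(n[:, 2], -1.0, 1.0))    # in [0, pi]
    hist, _, _ = np.histogram2d(azimuth, zenith, bins=bins,
                                range=[[-np.pi, np.pi], [0.0, np.pi]])
    return hist.ravel() / max(hist.sum(), 1.0)         # normalized descriptor
```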
In step ST74, the UI processing unit 42 stores the teacher data. The UI processing unit 42 stores, for example, the normal histogram for the hand in the rock state in the teacher database unit as teacher data.
The processing from step ST71 to step ST74 is also performed for each learning subject, for example, with the hand in the paper state and in the scissors state, and the teacher data of each state is stored in the teacher database unit 422.
In step ST75, the polarization image acquisition unit 20-1 acquires polarization images of the recognition target. The polarization image acquisition unit 20-1 images, for example, a hand playing rock-paper-scissors using a polarizing plate or a polarization filter, acquires a plurality of polarization images having different polarization directions, and proceeds to step ST76.
In step ST76, the normal calculation unit 32-1 calculates normals. For each pixel of the polarization images, the normal calculation unit 32-1 fits the pixel values of the plurality of polarization images having different polarization directions to a model equation, calculates normals on the basis of the fitted model equation, and proceeds to step ST77.
In step ST77, the UI processing unit 42 generates student data. The UI processing unit 42 generates student data on the basis of the normals of the recognition target. For example, the UI processing unit 42 forms a histogram of the normals for the state of the hand to be recognized, and proceeds to step ST78 with the obtained normal histogram as the student data.
In step ST78, the UI processing unit 42 generates UI information. The UI processing unit 42 determines, from the teacher database unit 422, the teacher data most similar to the student data obtained in step ST77. The UI processing unit 42 then determines the hand state corresponding to the determined teacher data to be the hand state captured in the polarization images acquired in step ST75, and outputs the determination result as the UI information.
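Matching the student histogram to the stored teacher histograms can then be a nearest-neighbor search; the L2 distance below is an illustrative choice, as the similarity measure is not specified in the text:

```python
import numpy as np

def recognize(student, teacher_db):
    # teacher_db: dict mapping state name ("rock", "paper", "scissors")
    # to its stored normal histogram; the nearest teacher entry wins.
    return min(teacher_db, key=lambda k: np.linalg.norm(student - teacher_db[k]))
```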
With such processing, UI information can be generated without performing processing to eliminate the indefiniteness of the normals. As in the first embodiment, the recognition processing can be performed more accurately than when a distance image is used.
In the second embodiment, the configuration for generating normals from the learning subject and the configuration for generating normals from the subject to be recognized are provided separately. The configuration for generating normals from the learning subject can therefore generate teacher data with higher accuracy than the configuration for generating normals from the subject to be recognized. Accordingly, by storing the teacher data used as the determination reference in the teacher database unit 422 as high-accuracy data, an accurate determination result can be obtained. In the second embodiment, the polarization image acquisition unit and the normal calculation unit may also be shared between generating normals from the learning subject and generating normals from the desired recognition target; in this case, the configuration of the image processing apparatus is simplified, and the image processing apparatus can be provided at low cost. Further, in the second embodiment, the UI processing unit may be provided with a communication unit, a recording medium mounting unit, or the like so that teacher data can be updated or added from outside. If teacher data can thus be updated or added from outside via a communication path or a recording medium, more objects can be used as subjects to be recognized, and versatility can be improved.
FIG. 23 illustrates user interfaces to which the present technology can be applied. For example, hand recognition is performed with a hand as the recognition target; in hand recognition, the hand shape and pointing direction can be determined. In determining the hand shape, the three-dimensional shape can be acquired with high accuracy, so fine finger shapes can also be acquired. Face recognition is performed with a face as the recognition target; in face recognition, personal authentication, facial expression, and line-of-sight direction can be determined. Person recognition is performed with a person as the recognition target; in person recognition, body-shape authentication and pose determination can be performed. When object authentication is performed with an object as the recognition target, the posture of a known object can be determined.
According to the present technology, normals closer to the three-dimensional shape of the subject can be calculated from polarization images than the normals generated from conventional distance images, so normals can be calculated stably even for, for example, a subject angled toward the camera. Therefore, by using the normals calculated from polarization images, the subject to be recognized can be recognized accurately and easily. Further, if this technology is applied to a user interface, the pointing direction and the like can be recognized more reliably than when they are recognized from a distance image, so a stress-free user interface can be provided. If, for example, the configurations of FIGS. 4(a) and 4(b) are used as the polarization image acquisition unit, normals can be calculated from polarization images acquired with a monocular camera, so there is no need to use a plurality of cameras; application to a user interface is therefore easy.
The series of processing described in the specification can be executed by hardware, software, or a combined configuration of both. When processing is executed by software, a program recording the processing sequence is installed in a memory in a computer incorporated in dedicated hardware and executed. Alternatively, the program can be installed and executed on a general-purpose computer capable of executing various kinds of processing.
For example, the program can be recorded in advance on a hard disk, an SSD (Solid State Drive), or a ROM (Read Only Memory) as a recording medium. Alternatively, the program can be stored (recorded) temporarily or permanently on a removable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disc, a DVD (Digital Versatile Disc), a BD (Blu-Ray Disc (registered trademark)), a magnetic disk, or a semiconductor memory card. Such a removable recording medium can be provided as so-called package software.
In addition to being installed on a computer from a removable recording medium, the program may be transferred from a download site to the computer, wirelessly or by wire, via a network such as a LAN (Local Area Network) or the Internet. The computer can receive the program transferred in this way and install it on a recording medium such as a built-in hard disk.
The effects described in this specification are merely examples and are not limiting, and there may be additional effects not described. Further, the present technology should not be construed as being limited to the embodiments of the technology described above. The embodiments of this technology disclose the present technology in the form of examples, and it is obvious that those skilled in the art can make modifications and substitutions of the embodiments without departing from the gist of the present technology. In other words, the scope of the claims should be considered in order to determine the gist of the present technology.
 また、本技術の画像処理装置は以下のような構成も取ることができる。
 (1) 認識対象の被写体が撮像されている偏光方向が異なる複数の偏光画像を取得する偏光画像取得部と、
 前記偏光画像取得部で取得された偏光画像に基づいて、画素毎に法線を算出する法線算出部と、
 前記法線算出部で算出された法線を用いて前記被写体の認識を行う認識部と
を備える画像処理装置。
 (2) 前記認識対象の被写体はユーザインタフェースにおける入力指示体であり、
 前記認識部は、前記被写体の認識結果を前記ユーザインタフェースにおける入力情報とする(1)に記載の画像処理装置。
 (3) 前記法線算出部は、
 前記複数の偏光画像から仮認識処理用画像を生成する仮認識処理用画像生成部と、
 前記仮認識処理用画像生成部で生成された仮認識処理用画像を用いて前記被写体の仮認識を行う仮認識識処理部と、
 前記複数の偏光画像から法線を算出する偏光処理部と、
 前記仮認識処理部の仮認識結果に基づいて前記偏光処理部で算出された法線の不定性を解消する不定性解消部とを有し、
 前記認識部は前記法線算出部で不定性が解消された法線を用いて前記被写体の認識を行う(1)または(2)に記載の画像処理装置。
 (4) 前記仮認識処理部は、前記仮認識処理用画像と予め登録されているモデルの画像を用いて前記被写体に最も近似したモデルを前記被写体の仮認識結果として、
 前記不定性解消部は、前記仮認識処理部で仮認識された前記モデルに基づいて前記偏光処理部で算出された法線の不定性を解消する(3)に記載の画像処理装置。
 (5) 前記認識対象の被写体は手であり、
 前記仮認識処理部は、前記仮認識処理用画像と予め登録されているモデルの画像を用いて前記手の指先と指腹の位置を認識して、
 前記不定性解消部は、前記仮認識処理部で仮認識された前記指先と指腹の位置に基づいて前記手における指領域の法線の不定性を解消する(4)に記載の画像処理装置。
 (6) 前記認識部は前記法線算出部で不定性が解消された前記指領域の法線に基づき指さし方向を判別する(5)に記載の画像処理装置。
 (7) 前記認識対象の被写体は顔であり、
 前記仮認識処理部は、前記仮認識処理用画像と予め登録されているモデルの画像を用いて顔領域の位置を認識して、
 前記不定性解消部は、前記仮認識処理部で仮認識された前記顔領域の位置に基づいて前記顔の法線の不定性を解消する(4)に記載の画像処理装置。
 (8) 前記認識部は前記法線算出部で不定性が解消された前記顔領域の法線に基づき顔形状または表情を判別する(7)に記載の画像処理装置。
 (9) 前記認識対象の被写体は手であり、
 前記仮認識処理部は、前記仮認識処理用画像と予め登録されているモデルの画像を用いて前記手の領域の位置と骨格構造を認識して、
 前記不定性解消部は、前記仮認識処理部で仮認識された前記手の領域の位置と骨格構造に基づいて前記手の領域の法線の不定性を解消する(4)に記載の画像処理装置。
 (10) 前記認識部は前記法線算出部で不定性が解消された前記手領域の法線に基づき手形状を判別する(10)に記載の画像処理装置。
 (11) 前記認識部は、
 学習用被写体を撮像した偏光方向が異なる複数の偏光画像に基づいて算出した法線から前記学習用被写体に応じた教師データを生成する教師データ生成部と、
 前記教師データ生成部によって学習用被写体毎に生成された前記教師データを記憶する教師データベース部と、
 前記認識対象の被写体を撮像した偏光方向が異なる複数の偏光画像に基づいて算出した法線を用いて前記認識対象に応じて生成した生徒データと、前記教師データベース部に記憶されている教師データに基づいて、前記認識対象の被写体を認識する認識処理部とを有する(1)または(2)に記載の画像処理装置。
 (12) 偏光画像取得部は、前記認識対象および前記学習用被写体毎に前記偏光方向が異なる複数の偏光画像を取得して、
 前記法線算出部は、前記偏光画像取得部で取得された偏光画像に基づいて、前記認識対象および前記学習用被写体毎に法線を算出する(11)に記載の画像処理装置。
 (13) 前記学習用被写体を撮像した偏光方向が異なる複数の偏光画像を取得する学習用偏光画像取得部と、
 前記学習用偏光画像取得部で取得された偏光画像に基づいて法線を算出する学習用法線算出部をさらに備える(11)または(12)に記載の画像処理装置。
 (14) 前記教師データは、前記学習用被写体についての法線の分布を示すデータであり、前記生徒データは前記認識対象の被写体について算出した法線の分布を示すデータである(11)乃至(13)のいずれかに記載の画像処理装置。
 (15) 認識処理部は、前記生徒データに最も近似した教師データに対応する学習用被写体を認識結果とする(11)乃至(14)のいずれかに記載の画像処理装置。
In addition, the image processing apparatus according to the present technology may have the following configuration.
(1) a polarization image acquisition unit that acquires a plurality of polarization images having different polarization directions in which a subject to be recognized is imaged;
A normal calculation unit that calculates a normal for each pixel based on the polarization image acquired by the polarization image acquisition unit;
An image processing apparatus comprising: a recognition unit that recognizes the subject using the normal calculated by the normal calculation unit.
(2) The subject to be recognized is an input indicator in a user interface,
The image processing apparatus according to (1), wherein the recognition unit uses the recognition result of the subject as input information in the user interface.
(3) The normal calculation unit
A provisional recognition processing image generation unit that generates a provisional recognition processing image from the plurality of polarization images;
A temporary recognition recognition processing unit that performs temporary recognition of the subject using the temporary recognition processing image generated by the temporary recognition processing image generation unit;
A polarization processing unit that calculates normals from the plurality of polarization images;
An indeterminacy eliminating unit that eliminates the indeterminacy of the normal calculated by the polarization processing unit based on the temporary recognition result of the temporary recognition processing unit,
The image processing apparatus according to (1) or (2), wherein the recognizing unit recognizes the subject using a normal whose indefiniteness has been eliminated by the normal calculating unit.
(4) The temporary recognition processing unit uses, as a temporary recognition result of the subject, a model closest to the subject using the temporary recognition processing image and a model image registered in advance.
The image processing apparatus according to (3), wherein the indeterminacy canceling unit cancels the indefiniteness of the normal calculated by the polarization processing unit based on the model temporarily recognized by the temporary recognition processing unit.
(5) The subject to be recognized is a hand,
The temporary recognition processing unit recognizes the positions of the fingertips and finger pads using the temporary recognition processing image and a pre-registered model image,
The image processing apparatus according to (4), wherein the indefiniteness eliminating unit eliminates indeterminacy of a normal of a finger region in the hand based on the positions of the fingertip and the finger pad temporarily recognized by the temporary recognition processing unit. .
(6) The image processing device according to (5), wherein the recognizing unit determines a pointing direction based on a normal of the finger region whose indefiniteness has been eliminated by the normal calculation unit.
(7) The subject to be recognized is a face,
The temporary recognition processing unit recognizes the position of the face area using the temporary recognition processing image and a model image registered in advance,
The image processing apparatus according to (4), wherein the indefiniteness eliminating unit eliminates indefiniteness of the normal of the face based on the position of the face area temporarily recognized by the temporary recognition processing unit.
(8) The image processing apparatus according to (7), wherein the recognizing unit determines a face shape or a facial expression based on a normal of the face region in which indefiniteness is eliminated by the normal calculation unit.
(9) The subject to be recognized is a hand,
The temporary recognition processing unit recognizes the position and skeleton structure of the hand region using the temporary recognition processing image and a pre-registered model image,
The image processing according to (4), wherein the indefiniteness eliminating unit eliminates indefiniteness of the normal of the hand region based on a position and a skeleton structure of the hand region temporarily recognized by the temporary recognition processing unit. apparatus.
(10) The image processing apparatus according to (10), wherein the recognizing unit determines a hand shape based on a normal of the hand region in which indefiniteness is eliminated by the normal calculation unit.
(11) The recognition unit
A teacher data generation unit that generates teacher data corresponding to the learning subject from normals calculated based on a plurality of polarized images having different polarization directions obtained by imaging the learning subject;
A teacher database unit for storing the teacher data generated for each learning subject by the teacher data generation unit;
Student data generated according to the recognition target using normals calculated based on a plurality of polarization images having different polarization directions obtained by imaging the subject to be recognized, and teacher data stored in the teacher database unit The image processing apparatus according to (1) or (2), further including a recognition processing unit that recognizes the subject to be recognized.
(12) The image processing apparatus according to (11), wherein the polarization image acquisition unit acquires a plurality of polarization images having different polarization directions for each of the recognition target and the learning subject, and
the normal calculation unit calculates normals for each of the recognition target and the learning subject based on the polarization images acquired by the polarization image acquisition unit.
(13) The image processing apparatus according to (11) or (12), further including:
a learning polarization image acquisition unit that acquires a plurality of polarization images, having different polarization directions, obtained by imaging the learning subject; and
a learning normal calculation unit that calculates normals based on the polarization images acquired by the learning polarization image acquisition unit.
(14) The image processing apparatus according to any one of (11) to (13), wherein the teacher data is data indicating the distribution of normals of the learning subject, and the student data is data indicating the distribution of normals calculated for the subject to be recognized.
(15) The image processing apparatus according to any one of (11) to (14), wherein the recognition processing unit sets the learning subject corresponding to the teacher data closest to the student data as the recognition result.
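The normal calculation and indeterminacy elimination of (3) to (10) can be pictured with a short sketch. The publication contains no source code, so everything below is an assumption-laden illustration: the four polarizer angles (0, 45, 90, 135 degrees), the diffuse reflection model, the refractive index of 1.5, and NumPy are my choices, not the publication's. The sketch fits the per-pixel sinusoid I(theta) = I_mean + A*cos(2*(theta - phi)), takes the azimuth from the phase phi (ambiguous by 180 degrees, which is the indeterminacy at issue), recovers the zenith from the degree of polarization, and then flips each azimuth to agree with a coarse direction cue such as the fingertip-to-finger-pad direction from the temporary recognition step.

import numpy as np

def normals_from_polarization(i0, i45, i90, i135, n=1.5):
    # Fit I(theta) = I_mean + A*cos(2*(theta - phi)) through the four samples.
    i_mean = (i0 + i45 + i90 + i135) / 4.0
    s1 = i0 - i90                        # A*cos(2*phi) component
    s2 = i45 - i135                      # A*sin(2*phi) component
    azimuth = 0.5 * np.arctan2(s2, s1)   # phase phi; ambiguous by 180 degrees
    dop = np.sqrt(s1**2 + s2**2) / (2.0 * i_mean + 1e-8)  # degree of polarization

    # Invert the diffuse degree-of-polarization model rho(zenith) by table lookup.
    zen = np.linspace(0.0, np.pi / 2 - 1e-3, 1024)
    s, c = np.sin(zen), np.cos(zen)
    rho = ((n - 1.0 / n) ** 2 * s ** 2) / (
        2.0 + 2.0 * n ** 2 - (n + 1.0 / n) ** 2 * s ** 2
        + 4.0 * c * np.sqrt(n ** 2 - s ** 2))
    zenith = np.interp(np.clip(dop, 0.0, rho.max()), rho, zen)
    return azimuth, zenith

def eliminate_indeterminacy(azimuth, reference_direction):
    # Flip every azimuth whose in-image direction disagrees with a coarse
    # reference direction taken from the temporary recognition result
    # (e.g. the fingertip-minus-finger-pad vector for a pointing finger).
    d = np.stack([np.cos(azimuth), np.sin(azimuth)], axis=-1)
    flip = d @ np.asarray(reference_direction, dtype=float) < 0.0
    return np.where(flip, azimuth + np.pi, azimuth)

A pointing direction as in (6) could then be estimated, for example, as the mean disambiguated normal over the finger region; that aggregation step is likewise an assumption rather than something the publication specifies.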
In the image processing apparatus, image processing method, and program of this technique, a plurality of polarization images with different polarization directions in which a subject to be recognized is captured are acquired, a normal is calculated for each pixel based on the acquired polarization images, and the subject is recognized using the calculated normals. The subject can therefore be recognized easily and with high accuracy. The technique is accordingly suitable for devices having an interface that starts, ends, changes, or updates operation control or signal processing according to recognition results such as the type, position, and posture of an object.
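For the learning-based recognition of (11) to (15), a toy sketch of the teacher/student matching follows. The publication states only that teacher data and student data indicate distributions of normals and that the learning subject whose teacher data is closest to the student data becomes the recognition result; the 18x9 azimuth/zenith histogram and the Euclidean distance used here are illustrative assumptions.

import numpy as np

def normal_histogram(azimuth, zenith, bins=(18, 9)):
    # Teacher or student data: a normalized 2-D distribution of normals.
    h, _, _ = np.histogram2d(
        azimuth.ravel(), zenith.ravel(), bins=bins,
        range=[[-np.pi, np.pi], [0.0, np.pi / 2]])
    return h / max(h.sum(), 1.0)

def recognize(student, teacher_db):
    # Return the label of the teacher data closest to the student data.
    return min(teacher_db, key=lambda label: np.linalg.norm(teacher_db[label] - student))

# Hypothetical usage, with labels and data of my own invention:
#   teacher_db = {label: normal_histogram(az, ze) for label, (az, ze) in samples}
#   result = recognize(normal_histogram(az_test, ze_test), teacher_db)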
DESCRIPTION OF REFERENCE SYMBOLS
10, 11, 12 ... Image processing apparatus
20, 20-1, 20-2 ... Polarization image acquisition unit
30, 31, 32-1, 32-2 ... Normal calculation unit
40 ... Recognition unit
41, 42 ... User interface (UI) processing unit
201 ... Image sensor
202 ... Polarizing filter
203 ... Lens
204, 211, 212-1 to 212-4 ... Polarizing plate
210, 210-1 to 210-4 ... Imaging unit
301 ... Polarization processing unit
302 ... Temporary recognition processing image generation unit
303 ... Temporary recognition processing unit
304 ... Model database unit
305 ... Indeterminacy elimination unit
421 ... Teacher data generation unit
422 ... Teacher database unit
423 ... Recognition processing unit

Claims (17)

1. An image processing apparatus comprising:
    a polarization image acquisition unit that acquires a plurality of polarization images, having different polarization directions, in which a subject to be recognized is imaged;
    a normal calculation unit that calculates a normal for each pixel based on the polarization images acquired by the polarization image acquisition unit; and
    a recognition unit that recognizes the subject using the normals calculated by the normal calculation unit.
2. The image processing apparatus according to claim 1, wherein
    the subject to be recognized is an input indicator in a user interface, and
    the recognition unit uses the recognition result of the subject as input information in the user interface.
3. The image processing apparatus according to claim 1, wherein
    the normal calculation unit includes:
    a temporary recognition processing image generation unit that generates a temporary recognition processing image from the plurality of polarization images;
    a temporary recognition processing unit that performs temporary recognition of the subject using the temporary recognition processing image generated by the temporary recognition processing image generation unit;
    a polarization processing unit that calculates normals from the plurality of polarization images; and
    an indeterminacy elimination unit that eliminates the indeterminacy of the normals calculated by the polarization processing unit based on the temporary recognition result of the temporary recognition processing unit, and
    the recognition unit recognizes the subject using the normals whose indeterminacy has been eliminated by the normal calculation unit.
4. The image processing apparatus according to claim 3, wherein
    the temporary recognition processing unit sets, as the temporary recognition result of the subject, the model that most closely approximates the subject, selected using the temporary recognition processing image and model images registered in advance, and
    the indeterminacy elimination unit eliminates the indeterminacy of the normals calculated by the polarization processing unit based on the model temporarily recognized by the temporary recognition processing unit.
5. The image processing apparatus according to claim 4, wherein
    the subject to be recognized is a hand,
    the temporary recognition processing unit recognizes the positions of the fingertip and the finger pad of the hand using the temporary recognition processing image and model images registered in advance, and
    the indeterminacy elimination unit eliminates the indeterminacy of the normals of the finger region of the hand based on the positions of the fingertip and the finger pad temporarily recognized by the temporary recognition processing unit.
6. The image processing apparatus according to claim 5, wherein the recognition unit determines a pointing direction based on the normals of the finger region whose indeterminacy has been eliminated by the normal calculation unit.
7. The image processing apparatus according to claim 4, wherein
    the subject to be recognized is a face,
    the temporary recognition processing unit recognizes the position of a face region using the temporary recognition processing image and model images registered in advance, and
    the indeterminacy elimination unit eliminates the indeterminacy of the normals of the face based on the position of the face region temporarily recognized by the temporary recognition processing unit.
8. The image processing apparatus according to claim 7, wherein the recognition unit determines a face shape or a facial expression based on the normals of the face region whose indeterminacy has been eliminated by the normal calculation unit.
9. The image processing apparatus according to claim 4, wherein
    the subject to be recognized is a hand,
    the temporary recognition processing unit recognizes the position and skeletal structure of the hand region using the temporary recognition processing image and model images registered in advance, and
    the indeterminacy elimination unit eliminates the indeterminacy of the normals of the hand region based on the position and skeletal structure of the hand region temporarily recognized by the temporary recognition processing unit.
10. The image processing apparatus according to claim 9, wherein the recognition unit determines a hand shape based on the normals of the hand region whose indeterminacy has been eliminated by the normal calculation unit.
11. The image processing apparatus according to claim 1, wherein
    the recognition unit includes:
    a teacher data generation unit that generates teacher data corresponding to a learning subject from normals calculated based on a plurality of polarization images, having different polarization directions, obtained by imaging the learning subject;
    a teacher database unit that stores the teacher data generated for each learning subject by the teacher data generation unit; and
    a recognition processing unit that recognizes the subject to be recognized based on student data, generated for the recognition target using normals calculated based on a plurality of polarization images having different polarization directions obtained by imaging the subject to be recognized, and the teacher data stored in the teacher database unit.
12. The image processing apparatus according to claim 11, wherein
    the polarization image acquisition unit acquires a plurality of polarization images having different polarization directions for each of the recognition target and the learning subject, and
    the normal calculation unit calculates normals for each of the recognition target and the learning subject based on the polarization images acquired by the polarization image acquisition unit.
13. The image processing apparatus according to claim 11, further comprising:
    a learning polarization image acquisition unit that acquires a plurality of polarization images, having different polarization directions, obtained by imaging the learning subject; and
    a learning normal calculation unit that calculates normals based on the polarization images acquired by the learning polarization image acquisition unit.
14. The image processing apparatus according to claim 11, wherein the teacher data is data indicating the distribution of normals of the learning subject, and the student data is data indicating the distribution of normals calculated for the subject to be recognized.
15. The image processing apparatus according to claim 11, wherein the recognition processing unit sets the learning subject corresponding to the teacher data closest to the student data as the recognition result.
16. An image processing method comprising:
    acquiring, with a polarization image acquisition unit, a plurality of polarization images having different polarization directions in which a subject to be recognized is imaged;
    calculating, with a normal calculation unit, a normal for each pixel based on the acquired polarization images; and
    recognizing, with a recognition unit, the subject using the calculated normals.
17. A program for causing a computer to execute:
    a procedure of acquiring a plurality of polarization images having different polarization directions in which a subject to be recognized is imaged;
    a procedure of calculating a normal for each pixel based on the acquired polarization images; and
    a procedure of recognizing the subject using the calculated normals.
PCT/JP2016/056191 2015-04-30 2016-03-01 Image processing device, image processing method, and program WO2016174915A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
ES16786192T ES2929648T3 (en) 2015-04-30 2016-03-01 Image processing device, image processing method and program
US15/565,968 US10444617B2 (en) 2015-04-30 2016-03-01 Image processing apparatus and image processing method
JP2017515413A JP6693514B2 (en) 2015-04-30 2016-03-01 Image processing apparatus, image processing method, and program
CN201680023380.8A CN107533370B (en) 2015-04-30 2016-03-01 Image processing apparatus, image processing method, and program
EP16786192.1A EP3291052B1 (en) 2015-04-30 2016-03-01 Image processing device, image processing method, and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-093070 2015-04-30
JP2015093070 2015-04-30

Publications (1)

Publication Number Publication Date
WO2016174915A1 true WO2016174915A1 (en) 2016-11-03

Family

ID=57199057

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/056191 WO2016174915A1 (en) 2015-04-30 2016-03-01 Image processing device, image processing method, and program

Country Status (6)

Country Link
US (1) US10444617B2 (en)
EP (1) EP3291052B1 (en)
JP (1) JP6693514B2 (en)
CN (1) CN107533370B (en)
ES (1) ES2929648T3 (en)
WO (1) WO2016174915A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3392606A1 (en) * 2017-04-17 2018-10-24 Faro Technologies, Inc. Three-dimensional inspection
WO2019021569A1 (en) * 2017-07-26 2019-01-31 ソニー株式会社 Information processing device, information processing method, and program
JP2019020330A (en) * 2017-07-20 2019-02-07 セコム株式会社 Object detector
WO2019069536A1 (en) * 2017-10-05 2019-04-11 ソニー株式会社 Information processing device, information processing method, and recording medium
JP2020008399A (en) * 2018-07-06 2020-01-16 富士通株式会社 Distance measuring device, distance measuring method, and distance measuring program
JPWO2020105679A1 (en) * 2018-11-21 2021-10-07 ソニーグループ株式会社 Work discrimination system, work discrimination device and work discrimination method
JP7213396B1 (en) * 2021-08-30 2023-01-26 ソフトバンク株式会社 Electronics and programs

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018029280A (en) * 2016-08-18 2018-02-22 ソニー株式会社 Imaging device and imaging method
WO2019102734A1 (en) * 2017-11-24 2019-05-31 ソニー株式会社 Detection device and electronic device manufacturing method
EP3617999B1 (en) * 2018-09-01 2023-04-19 Tata Consultancy Services Limited Systems and methods for dense surface reconstruction of an object using graph signal processing
EP3852375B1 (en) * 2018-09-12 2024-09-25 Sony Group Corporation Image processing device, image processing method, and program
US11004253B2 (en) * 2019-02-21 2021-05-11 Electronic Arts Inc. Systems and methods for texture-space ray tracing of transparent and translucent objects
WO2020202695A1 (en) * 2019-04-03 2020-10-08 ソニー株式会社 Image processing device and information generation device and methods therefor
EP3937476A4 (en) * 2019-04-19 2022-05-04 Sony Group Corporation Image capturing device, image processing device, and image processing method
WO2021084907A1 (en) * 2019-10-30 2021-05-06 ソニー株式会社 Image processing device, image processing method, and image processing program
CN114731368B (en) * 2019-12-13 2024-07-16 索尼集团公司 Imaging apparatus, information processing apparatus, imaging method, and information processing method
JP2022095024A (en) * 2020-12-16 2022-06-28 キヤノン株式会社 Learning data generating apparatus, learning data generating method, and computer program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10187978A (en) * 1996-12-27 1998-07-21 Sanyo Electric Co Ltd Component form recognizing method
JP2008146243A (en) * 2006-12-07 2008-06-26 Toshiba Corp Information processor, information processing method and program
WO2009147814A1 (en) * 2008-06-02 2009-12-10 パナソニック株式会社 Image processing device, method, and computer program for generating normal (line) information, and viewpoint-converted image generation device
WO2012011246A1 (en) * 2010-07-21 2012-01-26 パナソニック株式会社 Image processing device

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5028138A (en) * 1989-05-23 1991-07-02 Wolff Lawrence B Method of and apparatus for obtaining object data by machine vision form polarization information
US6239907B1 (en) * 1999-09-03 2001-05-29 3M Innovative Properties Company Rear projection screen using birefringent optical film for asymmetric light scattering
US7853082B2 (en) * 2004-11-08 2010-12-14 Panasonic Corporation Normal line information estimation device, registered image group formation device and image collation device, and normal line information estimation method
US20090171415A1 (en) * 2006-04-27 2009-07-02 Kenneth Dowling Coated medical leads and method for preparation thereof
US7711182B2 (en) * 2006-08-01 2010-05-04 Mitsubishi Electric Research Laboratories, Inc. Method and system for sensing 3D shapes of objects with specular and hybrid specular-diffuse surfaces
EP2202688B1 (en) 2007-02-13 2013-11-20 Panasonic Corporation System, method and apparatus for image processing and image format
EP2071280B1 (en) 2007-08-07 2015-09-30 Panasonic Intellectual Property Management Co., Ltd. Normal information generating device and normal information generating method
WO2010116476A1 (en) * 2009-03-30 2010-10-14 富士通オプティカルコンポーネンツ株式会社 Optical device
JP5664152B2 (en) 2009-12-25 2015-02-04 株式会社リコー Imaging device, in-vehicle imaging system, and object identification device
JP5588196B2 (en) * 2010-02-25 2014-09-10 キヤノン株式会社 Recognition device, control method therefor, and computer program
JP5658618B2 (en) 2011-05-16 2015-01-28 パナソニックIpマネジメント株式会社 Operation input device, program
US20150253428A1 (en) * 2013-03-15 2015-09-10 Leap Motion, Inc. Determining positional information for an object in space
JP6004235B2 (en) * 2012-02-03 2016-10-05 パナソニックIpマネジメント株式会社 Imaging apparatus and imaging system
US9025067B2 (en) 2013-10-09 2015-05-05 General Electric Company Apparatus and method for image super-resolution using integral shifting optics
JP2015115041A (en) * 2013-12-16 2015-06-22 ソニー株式会社 Image processor, and image processing method
JP6456156B2 (en) * 2015-01-20 2019-01-23 キヤノン株式会社 Normal line information generating apparatus, imaging apparatus, normal line information generating method, and normal line information generating program
CN107251553B (en) * 2015-02-27 2019-12-17 索尼公司 Image processing apparatus, image processing method, and image pickup element
US10260866B2 (en) * 2015-03-06 2019-04-16 Massachusetts Institute Of Technology Methods and apparatus for enhancing depth maps with polarization cues
US9741163B2 (en) * 2015-12-22 2017-08-22 Raytheon Company 3-D polarimetric imaging using a microfacet scattering model to compensate for structured scene reflections

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10187978A (en) * 1996-12-27 1998-07-21 Sanyo Electric Co Ltd Component form recognizing method
JP2008146243A (en) * 2006-12-07 2008-06-26 Toshiba Corp Information processor, information processing method and program
WO2009147814A1 (en) * 2008-06-02 2009-12-10 パナソニック株式会社 Image processing device, method, and computer program for generating normal (line) information, and viewpoint-converted image generation device
WO2012011246A1 (en) * 2010-07-21 2012-01-26 パナソニック株式会社 Image processing device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAYATO MURAI: "Hand pose estimation using orientation histograms", THE ROBOTICS AND MECHATRONICS CONFERENCE 2014 KOEN RONBUNSHU, THE JAPAN SOCIETY OF MECHANICAL ENGINEERS, 24 May 2014 (2014-05-24), XP009506987 *
See also references of EP3291052A4 *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3392606A1 (en) * 2017-04-17 2018-10-24 Faro Technologies, Inc. Three-dimensional inspection
US10462444B2 (en) 2017-04-17 2019-10-29 Faro Technologies, Inc. Three-dimensional inspection
JP2019020330A (en) * 2017-07-20 2019-02-07 セコム株式会社 Object detector
US11189042B2 (en) * 2017-07-26 2021-11-30 Sony Corporation Information processing device, information processing method, and computer program
WO2019021569A1 (en) * 2017-07-26 2019-01-31 ソニー株式会社 Information processing device, information processing method, and program
JP7103357B2 (en) 2017-07-26 2022-07-20 ソニーグループ株式会社 Information processing equipment, information processing methods, and programs
JPWO2019021569A1 (en) * 2017-07-26 2020-06-11 ソニー株式会社 Information processing apparatus, information processing method, and program
WO2019069536A1 (en) * 2017-10-05 2019-04-11 ソニー株式会社 Information processing device, information processing method, and recording medium
US11244145B2 (en) 2017-10-05 2022-02-08 Sony Corporation Information processing apparatus, information processing method, and recording medium
JP7071633B2 (en) 2018-07-06 2022-05-19 富士通株式会社 Distance measuring device, distance measuring method and distance measuring program
JP2020008399A (en) * 2018-07-06 2020-01-16 富士通株式会社 Distance measuring device, distance measuring method, and distance measuring program
JPWO2020105679A1 (en) * 2018-11-21 2021-10-07 ソニーグループ株式会社 Work discrimination system, work discrimination device and work discrimination method
JP7435464B2 (en) 2018-11-21 2024-02-21 ソニーグループ株式会社 Workpiece discrimination system, workpiece discrimination device, and workpiece discrimination method
JP7213396B1 (en) * 2021-08-30 2023-01-26 ソフトバンク株式会社 Electronics and programs

Also Published As

Publication number Publication date
US20180107108A1 (en) 2018-04-19
CN107533370A (en) 2018-01-02
EP3291052A1 (en) 2018-03-07
ES2929648T3 (en) 2022-11-30
CN107533370B (en) 2021-05-11
JP6693514B2 (en) 2020-05-13
US10444617B2 (en) 2019-10-15
JPWO2016174915A1 (en) 2018-02-22
EP3291052B1 (en) 2022-10-05
EP3291052A4 (en) 2018-12-26

Similar Documents

Publication Publication Date Title
JP6693514B2 (en) Image processing apparatus, image processing method, and program
US10636155B2 (en) Multi-modal depth mapping
Pavlakos et al. Harvesting multiple views for marker-less 3d human pose annotations
CN105389554B (en) Living body determination method and equipment based on recognition of face
JP5873442B2 (en) Object detection apparatus and object detection method
JP5715833B2 (en) Posture state estimation apparatus and posture state estimation method
US9684815B2 (en) Mobility empowered biometric appliance a tool for real-time verification of identity through fingerprints
CN107273846B (en) Human body shape parameter determination method and device
US20130051626A1 (en) Method And Apparatus For Object Pose Estimation
WO2012117687A1 (en) Posture estimation device, posture estimation system, and posture estimation method
CN103871045B (en) Display system and method
JP5672112B2 (en) Stereo image calibration method, stereo image calibration apparatus, and computer program for stereo image calibration
JP5170094B2 (en) Spoofing detection system, spoofing detection method, and spoofing detection program
WO2006049147A1 (en) 3d shape estimation system and image generation system
CN112102947A (en) Apparatus and method for body posture assessment
CN111652018A (en) Face registration method and authentication method
KR101818992B1 (en) COSMETIC SURGERY method USING DEPTH FACE RECOGNITION
WO2019016879A1 (en) Object detection device and object detection method
JP2010072910A (en) Device, method, and program for generating three-dimensional model of face
Medioni et al. Non-cooperative persons identification at a distance with 3D face modeling
JP5092093B2 (en) Image processing device
JP2007257489A (en) Image processor and image processing method
KR101711307B1 (en) Portable and Computer Equipment Unlock System using Depth Face Recognition
JP2004288222A (en) Image collation device, image collation method, recording medium recording its control program
Berretti et al. Face recognition by SVMS classification of 2D and 3D radial geodesics

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16786192

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017515413

Country of ref document: JP

Kind code of ref document: A

REEP Request for entry into the european phase

Ref document number: 2016786192

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 15565968

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE