KR101357581B1

KR101357581B1 - A Method of Detecting Human Skin Region Utilizing Depth Information

Info

Publication number: KR101357581B1
Application number: KR1020120066939A
Authority: KR
Inventors: 장석우
Original assignee: 안양대학교 산학협력단
Priority date: 2012-06-21
Filing date: 2012-06-21
Publication date: 2014-02-05
Also published as: KR20130143405A

Abstract

The invention relates to a depth-information-based method of detecting human skin regions, which labels pixels with similar depth features in a stereo three-dimensional image and, among the labeled regions, detects those with a human skin color distribution as actual skin color regions. The method comprises: (a) extracting depth information from the stereo three-dimensional image; (b) labeling the image of the depth information; (c) detecting skin color regions from the three-dimensional image; (d) detecting candidate skin regions by combining the labeled regions and the skin color regions with an AND operation; and (e) evaluating texture complexity in the candidate skin regions to detect the final skin regions.
By effectively combining color features and depth features, the method removes many of the skin color regions that existing approaches incorrectly detect in the background, and so extracts skin regions more accurately overall.

Description

Depth Information-based Human Skin Region Detection Method {A Method of Detecting Human Skin Region Utilizing Depth Information}

The present invention relates to a depth-information-based method of detecting human skin regions, which labels pixels with similar depth features in a stereo three-dimensional image and, among the labeled regions, detects those with a human skin color distribution as actual skin color regions.

In recent years, the rapid spread of low-cost digital cameras and of smartphones and tablet computers with built-in cameras has made it very easy to capture still images and video, and the volume of image data is growing exponentially. Interest in image analysis techniques for managing this data effectively has grown steadily as well.

One of the important research topics in image analysis is finding regions of a particular color in a given image. Accurately extracting regions of human skin color is especially important, because it provides meaningful cues for detecting people in an input still image or video. Skin color extraction is useful in many applications, such as detecting and tracking moving objects, gesture recognition based on hand region detection, face recognition, content-based image retrieval, and detecting and filtering harmful content [1, 2].

The related literature describes a number of existing, mainly two-dimensional, methods for extracting skin color regions. Lee segmented skin color in the YCbCr space using skin color models that tolerate the color shifts caused by particular lighting effects, and then used several features to verify whether each segmented skin region was genuine [3]. Cho proposed a method that uses the HSV color space and finds face regions by adaptively shifting a predetermined threshold according to the color and brightness of the whole image [4]. Because this method targeted the skin color of only one ethnic group (East Asian), it ran into problems when skin colors of other groups (white or black) were present. Hsu proposed a new face detection method using an EyeMap and a MouthMap [5]. In this method, facial components were detected using feature maps derived from a skin color model, and the relationships among the detected components were defined from their geometric features. Because only simple geometric relationships were used, however, the flexibility of the method was limited in several ways. Fang proposed a new color-histogram-based method for face detection [6]. In this method, the color histograms of different regions of the face are concatenated into a vector that establishes the spatial relationships among those regions, and a face detection algorithm is built on that vector. Methods other than those mentioned above also continue to appear in the literature [7].

These existing skin color region extraction algorithms still have many problems. Human skin color differs between individuals and between races, so the skin colors contained in captured images are not inherently the same. In addition, the skin color present in an input image varies slightly with environmental conditions such as color makeup, costume makeup, the camera used, and changes in lighting. In particular, extracting skin color using only two-dimensional features such as color, texture, shape, and geometric relationships has limitations.

The skin color region extraction methods described above can be applied to a three-dimensional image to detect a person's face region. Even then, the problems of two-dimensional skin color region extraction remain.

A method is therefore needed that detects skin color regions more accurately by combining the color features most commonly used in two-dimensional skin color extraction with three-dimensional distance (depth) information, which is characteristic of three-dimensional images.

[1] A. Drosou, D. Ioannidis, K. Moustakas, and D. Tzovaras, "Spatiotemporal Analysis of Human Activities for Biometric Authentication," Computer Vision and Image Understanding, Vol. 116, No. 3, pp. 411-421, 2012.
[2] S.-W. Jang, Y.-J. Park, G.-Y. Kim, and S.-Y. Lee, "Skin Region Extraction Combining 3D Depth and Color Features," In Proc. of the Winter Conference on the Korea Society of Computer and Information, Vol. 20, No. 1, pp. 201-204, 2012.
[3] J.-S. Lee, Y.-M. Kuo, P.-C. Chung, and E.-L. Chen, "Naked Image Detection based on Adaptive and Extensible Skin Color Model," Pattern Recognition, Vol. 40, No. 8, pp. 2261-2270, 2007.
[4] K.-M. Cho, J.-H. Jang, and K.-S. Hong, "Adaptive Skin-Color Filter," Pattern Recognition, Vol. 34, No. 5, pp. 1067-1073, 2001.
[5] R.-L. Hsu, M. Abdel-Mottaleb, and A. K. Jain, "Face Detection in Color Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 5, pp. 696-706, 2002.
[6] J. Fang and G. Qiu, "A color histogram-based approach to human face detection," In Proc. of the International Conference on Visual Information Engineering, pp. 133-136, 2003.
[7] K. M. Lee, "Component-based Face Detection and Verification," Pattern Recognition Letters, Vol. 29, No. 3, pp. 200-214, 2008.
[8] N. Baha and S. Larabi, "Accurate Real-Time Neural Disparity MAP Estimation with FPGA," Pattern Recognition, Vol. 45, No. 3, pp. 1195-1204, 2012.
[9] Y.-M. Paik, H.-J. Choi, Y.-H. Seo, and D.-W. Kim, "A Study on the Outlier Improvement Method Using Cost Function," In Proc. of the Fall Conf. of the Korean Society of Broadcasting Engineers, pp. 269-272, 2009.
[10] G.-J. Liu, X.-L. Tang, H.-D. Cheng, J.-H. Huang, and J.-F. Liu, "A Novel Approach for Tracking High Speed Skaters in Sports Using a Panning Camera," Pattern Recognition, Vol. 42, No. 11, pp. 2922-2935, 2009.
[11] H.-H. Do, S. Melnik, and E. Rahm, "Comparison of Schema Matching Evaluations," Lecture Notes in Computer Science, Vol. 2593, pp. 221-237, 2003.
[12] N. Otsu, "A Threshold Selection Method from Gray-Level Histogram," IEEE Transactions on Systems, Man and Cybernetics, Vol. 9, No. 1, pp. 62-66, 1979.

The object of the present invention is to solve the problems described above by providing a depth-information-based method of detecting human skin regions, which labels pixels with similar depth features in a stereo three-dimensional image and, among the labeled regions, detects those with a human skin color distribution as actual skin color regions.

To achieve this object, the present invention relates to a depth-information-based method of detecting human skin regions from a stereo three-dimensional image composed of left and right images, the method comprising: (a) extracting depth information from the stereo three-dimensional image; (b) labeling the image of the depth information; (c) detecting skin color regions from the three-dimensional image; (d) detecting candidate skin regions by combining the labeled regions and the skin color regions with an AND operation; and (e) evaluating texture complexity in the candidate skin regions to detect the final skin regions.

In the method, step (a) extracts the depth information using a stereo matching technique based on graph cuts.

In the method, step (a) converts the left and right color images of the stereo three-dimensional image to left and right gray images and extracts the depth information by applying stereo matching between the left and right gray images.

In the method, step (c) detects an eye region in the three-dimensional image, generates a skin map from the detected eye region, generates a skin color model from the skin map, and extracts the skin color regions.

In the method, step (c) selects samples located in the expanded region of the minimum enclosing rectangle (MER) corresponding to the extracted eye region, and generates from the selected samples a skin map as in [Formula 1].

[Formula 1]

where C_r and C_b are the pixel values, in the YCbCr space, of samples located in the expanded region of the minimum enclosing rectangle (MER) corresponding to the eye region, and the two reference values in the formula are the C_r and C_b values of typical skin color in the YCbCr space.

In the method, step (e) calculates the texture complexity of each region belonging to the candidate skin regions (hereinafter, a candidate region) by applying the Sobel edge operator to extract the edge strength.

In the method, the texture complexity T(R_i) of the i-th candidate skin region belonging to the candidate skin regions is calculated by [Formula 2].

[Formula 2]

where I_gray(x, y) is the intensity at position (x, y),

w_h(x, y) and w_v(x, y) are the horizontal and vertical masks of the Sobel edge operator,

E(x, y) is the edge strength at position (x, y), and

N(R_i) is the number of pixels belonging to region R_i.
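The formula for the texture complexity does not survive in this text. As a minimal sketch, assuming that E(x, y) is the Sobel gradient magnitude of I_gray and that T(R_i) is the mean of E over the N(R_i) pixels of the region, the calculation might look like the following. The function names, the replicated image border, and the exact form of E are illustrative assumptions, not the patent's definition.

```python
import math

# Standard 3x3 Sobel masks. W_H responds to intensity changes along x,
# W_V to intensity changes along y.
W_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
W_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_strength(gray, x, y):
    """E(x, y): Sobel gradient magnitude (assumed form), border replicated."""
    height, width = len(gray), len(gray[0])
    gh = gv = 0
    for j in range(3):
        for i in range(3):
            yy = min(max(y + j - 1, 0), height - 1)
            xx = min(max(x + i - 1, 0), width - 1)
            gh += W_H[j][i] * gray[yy][xx]
            gv += W_V[j][i] * gray[yy][xx]
    return math.hypot(gh, gv)


def texture_complexity(gray, region):
    """T(R_i): mean edge strength over the (x, y) pixels of one region."""
    return sum(edge_strength(gray, x, y) for x, y in region) / len(region)
```

Under these assumptions a smooth region such as skin yields a low value and a cluttered background region a high one, which is the property step (e) relies on.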

As described above, the depth-information-based method of detecting human skin regions according to the present invention effectively combines color features and depth features, which removes many of the skin color regions that existing approaches incorrectly detect in the background and so extracts skin regions more accurately overall.

FIG. 1 is a diagram showing the configuration of the overall system for carrying out the present invention.
FIG. 2 is a flowchart illustrating a depth-information-based method of detecting human skin regions according to an embodiment of the present invention.
FIG. 3 is a diagram explaining the basic concept of stereo matching.
FIG. 4 shows an example of extracting depth information from a stereo image according to the present invention.
FIG. 5 is a flowchart illustrating the step of extracting skin color regions according to an embodiment of the present invention.
FIG. 6 shows an example of skin map generation and binarization according to the present invention.
FIG. 7 is an example of the Sobel operator masks according to the present invention.
FIG. 8 is an example of skin region detection in an experiment of the present invention.
FIG. 9 is a table comparing performance with the prior art in an experiment of the present invention.
FIG. 10 is a graph comparing performance with the prior art in an experiment of the present invention.
FIG. 11 is a block diagram of the configuration of a depth-information-based system for detecting human skin regions according to an embodiment of the present invention.

The specific details for carrying out the present invention are described below with reference to the drawings.

In describing the present invention, identical parts are given identical reference numerals, and repeated descriptions of them are omitted.

First, examples of the configuration of the overall system for carrying out the present invention are described with reference to FIG. 1.

As shown in FIG. 1, the depth-information-based method of detecting human skin regions according to the present invention can be implemented as a program system on a computer terminal (20) that receives a stereo video (or image) (10) and detects human skin regions in it. That is, the skin region detection method can be written as a program and installed and executed on the computer terminal (20). The program installed on the computer terminal (20) can operate as a single program system (30).

In another embodiment, instead of being written as a program that runs on a general-purpose computer, the skin region detection method can be implemented as a single electronic circuit such as an ASIC (application-specific integrated circuit). It can also be developed as a dedicated computer terminal (20) that exclusively handles tasks such as skin region detection in digital video (or images). This is referred to as a skin region detection device. Other possible forms can also be implemented.

Next, the depth-information-based method of detecting human skin regions according to an embodiment of the present invention is described in more detail with reference to FIG. 2.

As shown in FIG. 2, the method consists of: (a) extracting depth information from the three-dimensional image (S10); (b) labeling pixels in the depth information to divide it into regions (S20); (c) extracting skin color regions (S30); (d) detecting candidate skin regions by combining the labeled regions and the skin color regions with an AND operation (S40); and (e) detecting the final skin regions using texture complexity (S50).

The present invention basically takes a three-dimensional stereoscopic image as input. In general, three-dimensional stereoscopic display is a next-generation technology that applies multi-view stereoscopic vision to add depth information to a two-dimensional image; this depth information gives viewers a sense of liveliness and realism, as if they were present at the scene where the image is being produced.

FIG. 2 shows an overview of the skin color region extraction method according to the present invention in the form of a block diagram. The proposed method first converts the input color images to gray images and then uses stereo matching to extract, from the left and right images, a three-dimensional depth feature representing the distance between the camera and the objects. It then labels pixels with similar depth features and judges only those labeled regions that have a two-dimensional skin color distribution to be actual skin color regions.
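The combination of depth labels and skin color described above (step (d), S40) amounts to a per-pixel AND, grouped by depth region. The following is a minimal sketch under stated assumptions: `depth_labels` and `skin_mask` are hypothetical names for the outputs of steps (b) and (c), given as 2-D lists, and label 0 is assumed to mean "no region".

```python
def candidate_regions(depth_labels, skin_mask):
    """Group skin-colored pixels by the depth-labeled region they fall in.

    depth_labels: 2-D list of ints, 0 meaning unlabeled (an assumption).
    skin_mask:    2-D list of 0/1 values, 1 meaning skin-colored.
    Returns a dict mapping each label to its list of (x, y) candidate pixels.
    """
    candidates = {}
    for y, (label_row, skin_row) in enumerate(zip(depth_labels, skin_mask)):
        for x, (label, skin) in enumerate(zip(label_row, skin_row)):
            if label != 0 and skin:  # AND of the two per-pixel conditions
                candidates.setdefault(label, []).append((x, y))
    return candidates
```

Each entry of the returned dictionary is one candidate region that step (e) would then accept or reject by its texture complexity.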

First, the step of extracting depth information from the three-dimensional image (S10) is described.

Stereo matching for extracting three-dimensional depth information is one of the classic problems of computer vision and is useful in many applications, such as robot navigation, three-dimensional modeling, and image-based rendering. The goal of stereo matching is to compute a disparity map for a reference image, given two or more images of the same scene, where disparity means the difference in position between two corresponding pixels. Computing the disparity map requires solving the correspondence problem for each pixel [8].

In general, for binocular stereo, if the two input images are assumed to have been captured with calibrated cameras and rectified, the epipolar lines are horizontal. Even with this constraint, determining disparity accurately remains difficult because stereo matching is ill-posed, especially in occluded or textureless regions [9].

Stereo matching algorithms can be broadly classified into local methods and global methods. Local methods use a matching window of fixed size to make the correspondence search more discriminative. Corresponding points are found by comparing intensity values inside the local window using one of various matching measures. These methods are very fast, but their accuracy drops in locally occluded or textureless regions and at boundaries where the disparity is discontinuous.
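To make the local approach concrete, here is a minimal sketch of window-based matching for a single pixel of a rectified pair, using the sum of absolute differences (SAD) as the matching measure. SAD is only one of the possible measures, and the function name, the parameters `half_window` and `max_disparity`, and the replicated border are illustrative assumptions. The invention itself uses a graph-cut-based global method, not this one.

```python
def sad_disparity(left, right, x, y, half_window, max_disparity):
    """Disparity of left-image pixel (x, y) that minimizes the SAD cost.

    left, right: rectified gray images as 2-D lists of equal size.
    The candidate match for column x in the left image is column x - d
    in the right image, for d in 0..max_disparity.
    """
    height, width = len(left), len(left[0])
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        cost = 0
        for dy in range(-half_window, half_window + 1):
            for dx in range(-half_window, half_window + 1):
                yy = min(max(y + dy, 0), height - 1)      # replicate border
                xl = min(max(x + dx, 0), width - 1)
                xr = min(max(x + dx - d, 0), width - 1)
                cost += abs(left[yy][xl] - right[yy][xr])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

In a textureless window every candidate d gives nearly the same cost, which is the failure mode of local methods noted above.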

Global methods address the ill-posed nature of stereo matching with an energy minimization algorithm that includes a smoothness constraint. They can therefore find correspondences even in textureless regions. The smoothness constraint, however, removes discontinuities from the disparity information, so stereo matching algorithms with discontinuity-preserving smoothness constraints have been introduced. Global methods find corresponding points with various minimization techniques; recently, algorithms based on graph cuts and belief propagation have been widely used because of their good performance. Many global methods, however, do not explicitly take the occlusion problem into account.

The present invention measures distance using a graph-cut-based stereo matching technique. This technique uses gray images and does not take color information into account.

The graph-cut-based method is one of the global minimization methods. Many tasks in computer vision that minimize an energy function come down to assigning a label to every pixel; in stereo, a label usually means a disparity. The goal is to find a labeling f that assigns to each pixel p ∈ P a label f_p ∈ L, where P is the set of pixels and L is the set of labels. To solve this problem, an energy function over the labeling f is formulated from a data term and a smoothness term.

Here D_p is the data term, which indicates how well the label f_p fits pixel p. V_{p,q} is the smoothness term, which represents the change in disparity between pixels p and q. The graph cut is used to find the labeling f that minimizes the energy function.
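The energy function itself is not reproduced in this text. The form commonly used in the graph-cut literature, and consistent with the two terms described here, is E(f) = Σ_p D_p(f_p) + Σ_(p,q) V_{p,q}(f_p, f_q), where the second sum runs over pairs of neighboring pixels. The sketch below only evaluates this energy for a given labeling; it does not perform the graph-cut minimization. The 4-connected neighborhood, the linear smoothness penalty, and all names are assumptions made for illustration.

```python
def labeling_energy(labels, data_cost, smooth_weight):
    """Evaluate E(f) for a labeling on a 4-connected pixel grid.

    labels:        2-D list, labels[y][x] = f_p (for example a disparity).
    data_cost:     data_cost[y][x][l] = D_p(l), cost of label l at pixel p.
    smooth_weight: V_pq(f_p, f_q) is taken as smooth_weight * |f_p - f_q|
                   (an assumed penalty, chosen for simplicity).
    """
    height, width = len(labels), len(labels[0])
    energy = 0
    for y in range(height):
        for x in range(width):
            energy += data_cost[y][x][labels[y][x]]           # data term
            if x + 1 < width:                                  # right neighbor
                energy += smooth_weight * abs(labels[y][x] - labels[y][x + 1])
            if y + 1 < height:                                 # lower neighbor
                energy += smooth_weight * abs(labels[y][x] - labels[y + 1][x])
    return energy
```

A minimizer trades the two terms against each other: a labeling that follows the data exactly may pay a large smoothness cost, and a perfectly smooth labeling may pay a large data cost.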

First, the input stereo color images are converted to gray images using [Equation 1].

[Equation 1]

where r, g, and b are the red, green, and blue color values of the input image, and gray is the intensity of the pixel after conversion to grayscale.
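The coefficients of [Equation 1] are not reproduced in this text. As a minimal sketch, the conversion can be written as a weighted sum of r, g, and b; the default weights below are the ITU-R BT.601 luma coefficients, which are an assumption here and may differ from the ones the patent uses.

```python
def to_gray(image, weights=(0.299, 0.587, 0.114)):
    """Convert an RGB image to a gray image by a weighted sum per pixel.

    image:   2-D list of (r, g, b) tuples with values in 0..255.
    weights: (w_r, w_g, w_b); the defaults are assumed, not from the source.
    """
    w_r, w_g, w_b = weights
    return [[round(w_r * r + w_g * g + w_b * b) for r, g, b in row]
            for row in image]
```

The same function would be applied to both the left and the right image before stereo matching.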

After the input images are converted to gray images, depth information is extracted by matching the left and right images. The basic concept is shown in FIG. 3. Let P be a point in the real world, x_l and x_r the x-coordinates at which P is projected onto the left and right images, f the focal length of the camera, T the baseline of the camera, and Z the depth value to be extracted. Because the two triangles (p_l, P, p_r) and (O_l, P, O_r) are similar, [Equation 2] holds; solving [Equation 2] for Z gives [Equation 3], from which the depth Z can be obtained.

[Equation 2]

$$\frac{T - (x_l - x_r)}{Z - f} = \frac{T}{Z}$$

[Equation 3]

$$Z = \frac{fT}{d}, \qquad d = x_l - x_r$$

As described above, the disparity between the left and right images is extracted by stereo matching with the existing graph-cut-based method. In [Equation 2] the disparity is x_l - x_r, and in [Equation 3] x_l - x_r is written simply as d. The depth Z in [Equation 3] is thus obtained from the disparity d.
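As a worked illustration of how depth follows from disparity, the sketch below assumes the standard rectified-stereo relation Z = fT/d, with f the focal length in pixels, T the baseline, and d = x_l - x_r the disparity in pixels. The function name and the handling of non-positive disparities are illustrative choices, not taken from the source.

```python
def depth_from_disparity(disparity, focal_length, baseline):
    """Return the depth Z = f * T / d, or None when d is not positive.

    disparity:    d = x_l - x_r, in pixels.
    focal_length: f, in pixels.
    baseline:     T, in the unit wanted for Z (for example meters).
    """
    if disparity <= 0:
        return None  # no valid match, or a point at infinity
    return focal_length * baseline / disparity
```

With f = 500 pixels and T = 0.1 m, a disparity of 10 pixels corresponds to a depth of about 5 m, and halving the disparity doubles the depth.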

Next, step (b), labeling pixels in the depth information to divide it into regions (S20), is described.

After the depth information is extracted, the present invention labels pixels with similar depth values and divides the image into regions. That is, in the two-dimensional image made up of depth values, pixels of similar depth are labeled so that the image is divided into regions of similar depth. Pixels whose depth values differ by no more than a predetermined range receive the same label, which yields the division into regions.
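The text does not specify the labeling procedure. One way to realize it, shown here as a minimal sketch, is breadth-first region growing over 4-connected neighbors: a neighbor joins the current region when its depth differs from the adjacent pixel's depth by at most a tolerance. The function name, the `tolerance` parameter, and the neighbor-to-neighbor comparison are assumptions made for illustration.

```python
from collections import deque


def label_by_depth(depth, tolerance):
    """Label connected pixels of similar depth; labels start at 1.

    depth:     2-D list of depth values.
    tolerance: maximum depth difference between adjacent pixels that
               share a label (the "predetermined range", assumed form).
    """
    height, width = len(depth), len(depth[0])
    labels = [[0] * width for _ in range(height)]
    next_label = 0
    for sy in range(height):
        for sx in range(width):
            if labels[sy][sx]:
                continue  # already part of a region
            next_label += 1
            labels[sy][sx] = next_label
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and not labels[ny][nx]
                            and abs(depth[ny][nx] - depth[y][x]) <= tolerance):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
    return labels
```

Because similarity is checked between adjacent pixels, a surface whose depth changes gradually stays in one region, while a sharp depth jump such as a person in front of a wall starts a new one.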

FIG. 4 shows an example of extracting depth from a stereo image. FIG. 4(a) and (b) show the left and right images of the stereo pair, and (c) shows the depth information extracted by applying the stereo matching algorithm to the left and right input images.

Next, step (c), extracting the skin color region (S30), is described in more detail with reference to FIG. 5.

In the present invention, a skin color region is first obtained from the image using color and is then combined with the depth information extracted in the previous step to produce the final skin color region.

In general, most existing skin color extraction methods use a skin color distribution model defined in advance through prior training, but they still have inherent problems. Skin color itself differs from person to person because of individual differences and differences between ethnic groups. In addition, the skin color regions of captured images differ because of special-effects makeup, color cosmetics, the optical equipment used for capture, lighting effects, and so on. Existing algorithms that rely on a predefined skin model therefore have difficulty coping with these variations. The best solution to this problem is, each time a different image is input, to reliably select skin color samples specific to the person, generate a skin model adapted to the input image, and then use that model to extract the skin color region of the input image.

To this end, the present invention finds the eyes, a major component of the face, in the input image and reliably extracts skin color samples around the detected eye regions to generate a skin model best suited to the input image. This model is then used to extract the skin region from the input image. Here, the input image is the color image of either the left or the right image of the stereo three-dimensional image.

As shown in FIG. 5A, step (c) of extracting the skin color region (S30) consists of: (c1) a preprocessing step (S31); (c2) detecting an eye region (S32); (c3) if no eye region is detected, selecting a predefined skin map (S33); (c4) if an eye region is detected, generating a skin map from the eye region (S34); (c5) generating a skin color model from the skin map (S35); and (c6) extracting the skin color region using the skin color model (S36).

First, as preprocessing, the RGB color space of the input image is converted, as in [Equation 4], to the YC_bC_r space, which is known to be the most suitable for skin color extraction (S31).

[Equation 4]
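The coefficients of the conversion are not legible in this text, so the sketch below assumes the common full-range ITU-R BT.601 (JFIF) form for 8-bit RGB; the patent's own [Equation 4] may use a different variant. The function name and the sample pixel are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert 8-bit RGB to full-range YCbCr using BT.601/JFIF coefficients (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


# A reddish, skin-like pixel has Cr above the neutral value 128 and Cb below it.
y, cb, cr = rgb_to_ycbcr(220, 170, 140)

# A neutral gray pixel has no chroma: Cb and Cr both sit at 128.
gray_y, gray_cb, gray_cr = rgb_to_ycbcr(128, 128, 128)
```

Separating luminance (Y) from chrominance (C_b, C_r) is what allows the later steps to work with C_r and C_b alone.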

Next, both eye regions are detected using the EyeMap [Reference 5], which combines a color-based map and an intensity-based map (S32).

Next, if no eye region is detected, a predefined skin map is selected as the skin map to be used (S33).

Next, if an eye region is detected, a skin map is generated from the eye region (S34).

Samples are selected from the region obtained by expanding the minimum enclosing rectangle (MER) of the extracted eye region by a factor of five, and a skin map as in [Equation 5] below is generated from the selected samples (S34).

[Equation 5]

Here, the two reference terms in [Equation 5] are the C_r and C_b values of typical skin color, respectively. The skin map takes values from 0 to 255; the closer a selected sample is to the average skin color, the closer its value is to 255.

FIG. 6 shows an example of generating a skin map from an input image and then binarizing it. In FIG. 6, the rectangles around both eyes in the left image indicate the regions obtained by expanding the eye-region MERs by a factor of five. Selecting samples from these regions and generating the skin map gives the picture in the middle of FIG. 6. In other words, applying [Equation 5] to each sample located within the five-times-expanded MER produces the skin map. Binarizing the skin map into skin and non-skin regions then gives the image on the right of FIG. 6.

In [Equation 5], C_r and C_b are the C_r and C_b values of the samples located in the region obtained by expanding the MER of the eye region by a factor of five, and the two reference terms are the C_r and C_b values of the generalized skin model.

The reason for doing this is to make a first-pass decision as to whether the pixel value of a sample taken from the three-dimensional input image is close to skin color. The C_r and C_b values of the generalized skin model are not 100 percent accurate skin color values for every input image, but they are similar to human skin color.

Next, the skin map is binarized, a skin color model is generated using only the part of the skin map judged to be skin color (S35), and the skin color model generated in this way is used to robustly extract the skin color region from the whole image (S36).

That is, a skin map is first generated using the C_r and C_b values of the generalized skin model; the skin map is then binarized to remove the samples judged not to be skin color, and only the actual skin color samples are selected to generate the skin color model.

To select only the skin sample pixels from the skin map, the present invention uses the histogram binarization method proposed by Otsu [Reference 12]. The Otsu method statistically selects the optimal threshold for binarizing an intensity histogram without prior knowledge, and is known to perform well when the histogram consists of two probability densities. In the present invention, a histogram of the data contained in the skin map is built and binarized by the Otsu method, so that non-skin-color pixels are excluded and only skin color pixels are selected.
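Otsu's method picks the threshold that maximizes the between-class variance of the histogram. The sketch below is a plain textbook version applied to skin map values in the range 0 to 255, not the code used in the patent; the function name and the sample values are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the threshold t that maximises between-class variance (class 0: v <= t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of samples at or below the current threshold
    sum0 = 0  # sum of those samples
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical skin map samples: a low (non-skin) cluster and a high (skin) cluster.
skin_map_values = [10, 12, 11, 200, 210, 205]
t = otsu_threshold(skin_map_values)
skin_samples = [v for v in skin_map_values if v > t]
```

Since skin map values close to 255 indicate samples near the average skin color, the samples above the threshold are the ones kept for the skin color model.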

FIG. 5B is another representation of the process of FIG. 5A and gives an overview of the method described above for robustly extracting the skin color region using a skin model adapted to the input image. As FIG. 5B shows, when no human eye is present in the input image, or when the eyes cannot be detected because the person has closed them, the skin color region is extracted using a predefined skin model as in existing methods, so the execution stability of the proposed method is not affected.

Next, in step (d), the labeled regions and the skin color region are AND-combined to detect a candidate group of skin regions (S40). That is, the regions labeled in the depth image in step S20 are combined with the color-based skin region detected in the color image of the three-dimensional image to detect three-dimensional skin color regions in the input image (S40).

To this end, regions of the distance image that have the same level are labeled first; in general, several levels of distance are measured on a single object. Then, as in [Equation 5-2], the labeled distance image (depth image) is ANDed with the binarized skin color image extracted using the skin color model adapted to the input image, so that only the part of the distance image corresponding to the two-dimensional skin region is selected.

[Equation 5-2]
$$I'_{depth}(x,y) = I_{depth}(x,y) \wedge I_{bi\_skin}(x,y)$$

In other words, [Equation 5-2] above can be described as follows, where I_depth(x,y) is the depth image, I_bi_skin(x,y) is the color-based binarized skin region, and I_depth'(x,y) is the distance image corresponding to the two-dimensional skin region obtained through the AND operation of [Equation 5-2]. The algorithm that detects the skin region by ANDing the labeled regions with the skin color region is as follows.

[Algorithm]

IF I_bi_skin(x,y) ≡ 0 THEN
    I_depth'(x,y) = I_depth(x,y)
ELSE
    I_depth'(x,y) = 0
END IF

Then, from I_depth'(x,y) and the color-based skin image I_skincolor(x,y), a candidate group of skin color regions can be extracted for each depth label.
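The AND combination and the per-label grouping can be sketched as follows. The listing above tests I_bi_skin(x,y) against 0, but the encoding of the binary skin image is not stated here, so the sketch takes the value that marks skin as a parameter (`skin_value`, assumed to be 1 by default). The function names and the toy arrays are hypothetical.

```python
def mask_depth_by_skin(depth, skin_mask, skin_value=1):
    """Keep the depth value where the binary mask marks skin; set other pixels to 0."""
    return [
        [d if m == skin_value else 0 for d, m in zip(depth_row, mask_row)]
        for depth_row, mask_row in zip(depth, skin_mask)
    ]


def candidates_by_label(labels, skin_mask, skin_value=1):
    """Group the (x, y) coordinates of skin-coloured pixels by their depth label."""
    groups = {}
    for y, (label_row, mask_row) in enumerate(zip(labels, skin_mask)):
        for x, (label, m) in enumerate(zip(label_row, mask_row)):
            if m == skin_value:
                groups.setdefault(label, []).append((x, y))
    return groups


depth = [[5, 5, 9], [5, 5, 9]]
labels = [[1, 1, 2], [1, 1, 2]]     # depth labels, e.g. from the labeling step
skin_mask = [[1, 0, 1], [0, 0, 1]]  # binarized color-based skin image

masked_depth = mask_depth_by_skin(depth, skin_mask)
candidates = candidates_by_label(labels, skin_mask)
```

If the binary image encodes skin as 0 instead, passing `skin_value=0` reproduces the branch order of the listing.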

Next, step (e), detecting the final skin region using texture complexity (S50), is described. That is, texture complexity is evaluated for the candidate group of skin regions to detect the final skin region (S50).

To select the final three-dimensional skin regions, the texture complexity (texture smoothness) of the skin region candidates is evaluated for each distance label. In general, skin color regions have a smooth rather than rough texture, so the texture complexity of each three-dimensional skin region candidate is measured, regions with rough texture are removed, and only smooth regions are judged to be actual skin regions.

In the present invention, texture complexity is measured by applying the Sobel edge operator within the region in question and extracting the degree of edges. The more edges there are in the region, the rougher the texture; the fewer edges there are, the smoother the texture. The Sobel edge operator is a derivative operator for edge detection that differentiates once along the x axis and once along the y axis; the convolution masks corresponding to the Sobel operator are shown in FIG. 7.

The texture complexity T(R_i) of the i-th candidate skin region R_i, extracted through the AND operation between the depth image and the binarized skin image, can be computed as in [Equation 6]. In [Equation 6], I_gray(x,y) is the intensity value at position (x, y), and w_h(x,y) and w_v(x,y) are the horizontal and vertical masks of the Sobel edge operator. E(x,y) is the degree of edge at position (x, y), and N(R_i) is the number of pixels belonging to region R_i.

[Equation 6]
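The exact form of the equation is not legible in this text. One reading consistent with the definitions above is to take E(x,y) as the Sobel gradient magnitude and T(R_i) as the mean of E over the N(R_i) pixels of the region. The sketch below follows that assumption; the standard 3x3 Sobel masks, the border handling (edge replication), and all names are likewise assumptions.

```python
import math

# Standard 3x3 Sobel masks (assumed to match the masks of FIG. 7).
SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_strength(gray, x, y):
    """E(x, y): gradient magnitude from the two Sobel responses (borders replicated)."""
    h, w = len(gray), len(gray[0])
    gh = gv = 0
    for j in range(3):
        for i in range(3):
            yy = min(max(y + j - 1, 0), h - 1)
            xx = min(max(x + i - 1, 0), w - 1)
            gh += SOBEL_H[j][i] * gray[yy][xx]
            gv += SOBEL_V[j][i] * gray[yy][xx]
    return math.hypot(gh, gv)


def texture_complexity(gray, region):
    """T(R_i): mean edge strength over the (x, y) pixels of a candidate region."""
    return sum(edge_strength(gray, x, y) for x, y in region) / len(region)


region = [(x, y) for y in range(4) for x in range(4)]
flat = [[100] * 4 for _ in range(4)]             # smooth patch, no edges
step = [[0, 0, 255, 255] for _ in range(4)]      # patch with a strong vertical edge

t_flat = texture_complexity(flat, region)
t_step = texture_complexity(step, region)
```

A candidate region would then be kept only if its value falls inside the chosen threshold range, which the description determines experimentally.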

In the present invention, [Equation 6] is used to compute the texture complexity of each candidate skin region, and only candidate regions whose texture complexity falls within an appropriate (or predetermined) threshold range are judged to be actual skin regions. The threshold for texture complexity was determined through experiments with a variety of stereoscopic images.

Next, the effects of the present invention as shown by experiments are described in more detail with reference to FIGS. 8 to 10.

The computer used for the experiments had a 3.40 GHz Intel Core i7-2600 CPU and 8 GB of memory, and ran Microsoft Windows 7. The proposed skin color region extraction algorithm was implemented with Microsoft Visual C++ 2008. To build the image database for the experiments, a variety of three-dimensional still images and videos captured in ordinary environments without specific constraints were collected and used.

FIG. 8 shows an example of detecting skin regions from an input image using the present invention. FIG. 8(a) shows the left image of the input stereo pair, and FIG. 8(b) shows the distance image measured from the input images by stereo matching. FIG. 8(c) shows the binary result of extracting skin regions with the two-dimensional skin color model, and FIG. 8(d) shows the three-dimensional skin regions finally detected by combining the distance image with the color-based skin regions.

As FIG. 8(c) shows, the skin color region extracted with the two-dimensional skin color model contains many false detections: when a region similar in color to skin exists in the background rather than on a person, it is wrongly detected as a skin region. By contrast, the three-dimensional skin color region detected by combining the two-dimensional skin color feature with the depth feature extracts only the skin regions more accurately.

In the present invention, three-dimensional depth information is difficult to extract in areas of the input image where texture is very poor or absent, so skin color extraction may be difficult in such areas.

To evaluate the performance of the skin region detection method according to the present invention quantitatively, the root mean square error (RMSE) measure of [Equation 7] was defined. RMSE is a measure that addresses both general and specific aspects of image quality; it is often used to measure the difference between actual and measured values and is known to be a good measure of accuracy [Reference 10].

[Equation 7]
$$RMSE = \sqrt{\frac{1}{MN}\sum_{i}\sum_{j}\bigl(OB(i,j) - RB(i,j)\bigr)^{2}}$$

In [Equation 7], M and N are the width and height of the image, and i and j are the row and column indices of a position in the image. OB(i,j) is the ground-truth binary image, and RB(i,j) is the binary result image of skin region detection.
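As a small illustration, the RMSE between a ground-truth binary image OB and a result binary image RB can be computed as below. This assumes the usual RMSE definition (square root of the mean squared pixel difference) and 0/1 pixel values; the function name and the toy images are hypothetical.

```python
import math


def rmse(ob, rb):
    """Root mean square error between two equally sized binary images."""
    rows, cols = len(ob), len(ob[0])
    squared = sum(
        (ob[i][j] - rb[i][j]) ** 2 for i in range(rows) for j in range(cols)
    )
    return math.sqrt(squared / (rows * cols))


ground_truth = [[1, 0], [0, 1]]  # OB: manually marked skin pixels
result = [[1, 1], [0, 1]]        # RB: detector output with one false detection

error = rmse(ground_truth, result)
```

With 0/1 images the squared difference simply counts mismatched pixels, so the RMSE is the square root of the fraction of wrongly classified pixels.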

To evaluate performance more precisely, each original test image was manually converted into skin and non-skin regions and then compared with the result image. Because skin region extraction has two kinds of error, false positives and false negatives, the overall measure of [Equation 8] is used for accuracy evaluation [Reference 11].

[Equation 8]

FIGS. 9 and 10 present, as a table and a chart respectively, the results of comparing the existing two-dimensional skin region detection method with the three-dimensional skin region detection method according to the present invention.

As FIG. 10 shows, because the present invention detects the final skin region on the basis of the two-dimensional skin region detection result, the proposed method obtained a slightly lower precision. However, when colors similar to skin color are distributed in the background, two-dimensional skin region detection produces many false detections and therefore yields a somewhat low recall. In contrast, because the present invention verifies skin regions after separating regions on the basis of the distance image, the false detection rate can be lowered significantly, so good results were obtained for both precision and recall.
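Precision and recall are not defined in this text, so the sketch below assumes their standard pixel-wise definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN), counted over binary images with 1 for skin. It does not reproduce the overall measure of [Equation 8]; the function name and the toy images are hypothetical.

```python
def precision_recall(ground_truth, result):
    """Pixel-wise precision and recall for binary skin images (1 = skin)."""
    tp = fp = fn = 0
    for truth_row, result_row in zip(ground_truth, result):
        for truth, detected in zip(truth_row, result_row):
            if detected and truth:
                tp += 1  # skin pixel correctly detected
            elif detected and not truth:
                fp += 1  # background pixel wrongly detected as skin
            elif truth and not detected:
                fn += 1  # skin pixel that was missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


ground_truth = [[1, 1], [0, 0]]
result = [[1, 0], [1, 0]]
precision, recall = precision_recall(ground_truth, result)
```

Under these definitions false positives lower precision and false negatives lower recall, which is why both kinds of error need to be tracked.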

Next, a depth-information-based human skin region detection system according to an embodiment of the present invention is described with reference to FIG. 11.

As described above, the depth-information-based human skin region detection method according to the present invention may be implemented as a program system, with each step of the method configured as a functional means.

The skin region detection system (30) is a system that detects human skin regions from a stereo three-dimensional image composed of left and right images.

As shown in FIG. 11, the depth-information-based human skin region detection system according to an embodiment of the present invention consists of: a depth information extraction unit (31) that extracts depth information from the stereo three-dimensional image; a labeling unit (32) that labels the image of the depth information; a color region detection unit (33) that detects a skin color region from the three-dimensional image; a candidate region detection unit (34) that detects a candidate group of skin regions by AND-combining the labeled regions and the skin color region; and a final region detection unit (35) that evaluates texture complexity in the candidate group of skin regions to detect the final skin region.

The present invention proposes a method that combines depth and color features to robustly extract the human skin color regions present in an input three-dimensional video. First, three-dimensional depth information, that is, the distance between the camera and an object, is extracted from the input left and right images using a stereo matching technique. Regions with similar depth features are then clustered, and among the clustered regions those with a two-dimensional skin color distribution are judged to be actual skin color regions.

The invention made by the present inventor has been described above in detail with reference to the embodiment, but the present invention is not limited to that embodiment and can of course be modified in various ways without departing from its gist.

10: three-dimensional image 20: computer terminal
30: skin region detection system 31: depth information extraction unit
32: labeling unit 33: color region detection unit
34: candidate region detection unit 35: final region detection unit
36: memory

Claims

A depth-information-based method of detecting a human skin region from a stereo three-dimensional image composed of left and right images, the method comprising:
(a) extracting depth information from the stereo three-dimensional image;
(b) labeling the image of the depth information;
(c) detecting a skin color region from the three-dimensional image;
(d) detecting a candidate group of skin regions by AND-combining the labeled regions and the skin color region; and
(e) evaluating texture complexity in the candidate group of skin regions to detect the final skin region,
wherein, in step (c), an eye region is detected in the three-dimensional image, a skin map is generated from the detected eye region, and a skin color model is generated from the skin map to extract the skin color region.

The method of claim 1,
wherein, in step (a), the depth information is extracted using a graph-cuts-based stereo matching technique.

The method of claim 1,
wherein, in step (a), the left and right color images of the stereo three-dimensional image are converted into left and right gray images, and the depth information is extracted by applying stereo matching between the left and right gray images.

(Deleted)

The method of claim 1,
wherein, in step (c), samples located in the extended region of the minimum enclosing rectangle (MER) corresponding to the extracted eye region are selected, and a skin map as in [Equation 1] is generated from the selected samples.
[Equation 1]

Here, C_r and C_b are the pixel values, in the YC_bC_r space, of the samples located in the extended region of the minimum enclosing rectangle (MER) corresponding to the eye region.

The method of claim 1,
wherein, in step (e), the texture complexity is calculated by applying a Sobel edge operator to each region belonging to the candidate group of skin regions (hereinafter, a candidate region) and extracting its degree of edges.

The method according to claim 6,
wherein the texture complexity T(R_i) of the i-th candidate skin region belonging to the candidate group of skin regions is calculated by [Equation 2].
[Equation 2]

where I_gray(x,y) is the intensity value at position (x, y),
w_h(x,y) and w_v(x,y) are the horizontal and vertical masks of the Sobel edge operator,
E(x,y) is the degree of edge at position (x, y), and
N(R_i) is the number of pixels belonging to the region R_i.