KR20080055622A

KR20080055622A - Human-Friendly Computer I/O System

Info

Publication number: KR20080055622A
Application number: KR1020070112970A
Authority: KR
Inventors: 김종철
Original assignee: 김종철
Priority date: 2006-12-16
Filing date: 2007-11-07
Publication date: 2008-06-19
Also published as: KR100903490B1

Abstract

In a GUI environment, where the human-computer interface operates through the position of graphic objects, the present invention uses a stereo vision sensor to recognize and measure the user's facial feature points and thereby identify an authorized user. It determines the position and pose of the face to obtain the user's line of sight and point of interest, automatically displays graphic objects, adjusts the resolution of a stereoscopic display, and drives an HRTF-based 3D audio system. It thereby provides three-dimensional visual and auditory effects that increase realism and improve work efficiency.

Description

Ergonomic Human Computer Interface

The present invention relates to an interface between a computer (or computer-like device) and a human. The computer recognizes the human face and perceives the three-dimensional position and pose of the face, the gaze direction and point of interest (POI), and the position of the ears, and thereby provides audiovisual output suited to the user and an input method convenient for the user.

A conventional computer is passive: it waits until the user issues an instruction through the keyboard or mouse, then presents the result of the instructed task as text, images, or sound, that is, as expressions that stimulate sight and hearing among the five human senses, regardless of the user's posture or position.

Consequently, with a passive stereoscopic display the user sees the correct image only when viewing from a fixed position.

The computer mouse, together with the keyboard, has become a standard input device, but because the motions of the left and right arms are asymmetric, it distorts the sitting posture, causes fatigue quickly, and with prolonged use causes pain in the hand or shoulder.

If the computer recognizes the user, estimates the user's interest and intention, prepares candidate command targets in advance, and lets the user select or confirm them, that is, if it makes the user's act of instruction easier, work becomes more efficient and more convenient.

Prior art usable for this purpose can be summarized as follows.

A monocular vision system treats a pixel group that is clearly distinguishable in the image by brightness, color shade, or the like as a single object, and identifies it by comparison with previously stored shapes. Face recognition generally proceeds in the following five steps.

1. Image acquisition

2. Detection of face-like shapes in the acquired image (e.g., an elliptical face outline with two eyes and a mouth)

3. Detection of facial features in the face image and generation of a template

4. Search of the database for candidate face templates highly similar to the generated template

5. Verification or recognition

Facial pose means the direction in which the user's face is pointing. It is expressed by pitch, yaw, and roll angles and ultimately indicates where the user's attention is directed.

Methods of estimating the three-dimensional pose of a face from a two-dimensional image fall into the following two classes.

1. Analytic-geometric methods: the two-dimensional coordinates of the facial components are obtained from the image in which the three-dimensional coordinate system is projected onto two dimensions, and the coordinate transformation equations describing that projection are solved.

Examples: US PN 6,154,559, US PN 6,937,745, US PN 7,121,946

2. Learning-based methods: sample two-dimensional images are stored for each pose, and the pose is estimated by finding the sample closest to the acquired image.

Example: US PN 6,471,756

More recently, methods using stereo images, long used in aerial photogrammetry, have also been proposed.

Example: US PN 6,188,777

The above techniques estimate facial pose by learning the feature shapes, colors, and the like of the facial components on the basis of a generic head model. They can therefore be used by an unspecified large number of people, but they suffer from insufficient accuracy and a heavy computational load.

Methods of estimating gaze from the position of the pupils have also been studied extensively, but because human eyes constantly look around to take in new information, gaze is too sensitive to serve in the role of a mouse.

(Reference: Arne John Glenstrup and Theo Engell-Nielsen, "Eye Controlled Media: Present and Future State", University of Copenhagen, June 1, 1995.)

The technical problem addressed by the present invention is to implement, in a human-friendly manner, a method and apparatus in which the computer identifies who the user is; measures the user's position, facial pose, and pupil positions; estimates the user's intention in a GUI (Graphical User Interface) environment; prepares and presents in advance the commands the user is likely to issue; and, upon the user's confirmation or selection, executes the corresponding command and outputs the result with audiovisual effects suited to the user's eyes and ears.

In the present invention, the basic configuration comprises, as peripherals of the computer (or computer-like machine), two or more speakers; a two-dimensional flat display or a stereoscopic display that provides three-dimensional images; and at least two CCD cameras or video cameras placed at the corners or periphery of the display. The user wears an eyeglass frame without lenses.

The frame carries at least three visual markers that are easy for the cameras to detect, and also carries scale marks for camera calibration.

Because the lensless frame is used to calibrate the stereo cameras and also acts as a head mouse, it is referred to as the Head Mouse Template.

FIG. 1 shows two CCD cameras (5), (6) at the two corners of a two-dimensional display (7), and a user wearing the Head Mouse Template (4), which carries three markers (1), (2), (3).

The Head Mouse Template is planar and, when worn on the user's face like eyeglasses, is assumed to be parallel to the face reference plane. The face reference plane is the plane formed by the two eyes and is perpendicular to the line of sight. For convenience in developing the equations, the line of sight from the midpoint between the two eyes is assumed to pass through the midpoint of markers (1) and (2).

FIG. 2 is a view looking down from above the user's head. The axis joining the left and right CCD cameras along the display plane is the X axis, and the axis perpendicular to the display and pointing toward the Head Mouse Template is the Z axis.

For a single marker (8), the images projected on two CCD cameras separated by a distance b appear at different positions, x_img1 and x_img2. This position difference divided by the unit pixel length a of the CCD element is called the disparity d. The distance z from the display to the marker is then given, in terms of the focal length f of the CCD camera lens and the baseline b between the CCD cameras, by the following approximation. (Reference: N. Scherer and R. Gabler, "Range and Velocity Estimation of Objects at Long Ranges Using Multiocular Video Sequences"; http://www.isprs.org/istanbul2004/comm5/papers/649.pdf)

z = (b · f) / (d · a)
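The approximation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the choice of millimetres, and the sample numbers are hypothetical and do not come from the patent; only the relation z = (b · f) / (d · a) does.

```python
def depth_from_disparity(x_img1, x_img2, baseline, focal_length, pixel_size):
    """Distance z to a marker from the stereo approximation z = (b*f)/(d*a).

    x_img1, x_img2: image positions of the same marker on the two sensors,
    in the same length unit as pixel_size (millimetres in this sketch).
    """
    d = (x_img1 - x_img2) / pixel_size  # disparity d, in pixels
    if d == 0:
        raise ValueError("zero disparity: the marker is effectively at infinity")
    return (baseline * focal_length) / (d * pixel_size)


# Hypothetical setup: 400 mm baseline, 4 mm lens, 0.005 mm pixel length.
z_mm = depth_from_disparity(0.9, -1.1, baseline=400.0,
                            focal_length=4.0, pixel_size=0.005)
```

Because z is inversely proportional to d, a one-pixel error in the disparity matters more the farther the marker is from the display.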

Thus, if the Head Mouse Template coordinate system and the display coordinate system share the horizontal plane, the horizontal position of the gaze point the user is looking at is obtained from markers (1) and (2) on the frame, and the vertical position of the gaze point from markers (2) and (3).

FIG. 3 explains, in two dimensions, how to find the point at which the line of sight strikes the display, with the origin of the display coordinate system moved to the center of sensor 1 (the left CCD camera).

If the images of markers M1 (1) and M2 (2) are projected on the left CCD camera at positions x_p1 and x_p2, then by the approximation above the positions of M1 and M2 are (x_p1 · z₁ / f, z₁) and (x_p2 · z₂ / f, z₂), respectively. The position of the point (9), at which the line of sight that is perpendicular to the line joining the markers and passes through their midpoint M_c meets the display, follows from these quantities.

[Equation for the position of point (9): not reproduced in the source text]
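The expression for point (9) is not reproduced in the source text, so the following Python sketch is a derivation from the stated geometry rather than the patent's own formula. It assumes the display lies along z = 0, that M1 = (x1, z1) and M2 = (x2, z2) are already in display coordinates, and that the line of sight is the perpendicular to M1M2 through the midpoint M_c = (x_c, z_c). Solving for z = 0 gives x = x_c + z_c (z2 - z1) / (x2 - x1). The function names are hypothetical.

```python
def marker_position(x_p, z, f):
    """Back-project an image coordinate x_p at depth z to (x_p*z/f, z)."""
    return (x_p * z / f, z)


def poi_x_2d(m1, m2):
    """X coordinate where the perpendicular to M1-M2 through their midpoint
    meets the display line z = 0 (2D top view)."""
    (x1, z1), (x2, z2) = m1, m2
    if x2 == x1:
        raise ValueError("markers lie along the Z axis: "
                         "the line of sight never meets the display")
    xc, zc = (x1 + x2) / 2.0, (z1 + z2) / 2.0
    # Normal to (x2-x1, z2-z1) is (z2-z1, -(x2-x1)); follow it until z = 0.
    return xc + zc * (z2 - z1) / (x2 - x1)


# A face squarely facing the display looks at the point directly in front.
frontal = poi_x_2d((-60.0, 700.0), (60.0, 700.0))
# Turning the head so that M2 is farther away shifts the POI toward +X.
turned = poi_x_2d((-60.0, 690.0), (60.0, 710.0))
```

When both markers are at the same depth the correction term vanishes and the POI is simply the X coordinate of the midpoint.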

FIG. 4 illustrates the relationship between the display coordinate system and the Head Mouse Template coordinate system. The positions of the user's facial features behind the Head Mouse Template can be measured, namely feature points that are relatively independent of facial expression, such as the two ends of the eyebrows, the corners of the eyes, and the ears. In addition, for users who have them, the position of a wart or mole (10) can be measured as a feature peculiar to the individual user.

If the coordinates of facial feature point k in the Head Mouse Template coordinate system are (x_T^k, y_T^k, z_T^k), its pixel coordinates in the images of the left and right CCD cameras, expressed in the display coordinate system as (x_cpL^k, y_cpL^k, -f) and (x_cpR^k, y_cpR^k, -f), are given by a projection equation. [Equation not reproduced in the source text]

The transformation between the coordinate system based on the Head Mouse Template and the display coordinate system is that of a Template coordinate system translated from the display coordinate system by X_T, Y_T, Z_T and rotated by θ, φ, ψ about the X, Y, Z axes in that order. [Equation not reproduced in the source text]
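Since the source text omits both equations, the Python sketch below shows one conventional way to write them. It rests on several assumptions that are not confirmed by the patent: the rotation is composed as R = Rz(ψ) · Ry(φ) · Rx(θ), a display-frame point is R · p_T + (X_T, Y_T, Z_T), the cameras sit on the X axis looking along +Z, and the projection is the non-inverted pinhole form x_p = f · X / Z that matches the back-projection (x_p · z / f, z) used in the description. All function names are hypothetical.

```python
import math


def rotation_xyz(theta, phi, psi):
    """Rotation about X by theta, then Y by phi, then Z by psi (radians)."""
    cx, sx = math.cos(theta), math.sin(theta)
    cy, sy = math.cos(phi), math.sin(phi)
    cz, sz = math.cos(psi), math.sin(psi)
    # R = Rz(psi) * Ry(phi) * Rx(theta), written out element by element.
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def template_to_display(p_t, rotation, translation):
    """Map a Template-frame point into the display frame: R * p_T + T."""
    return tuple(
        sum(rotation[i][j] * p_t[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(p_display, camera_x, f):
    """Pinhole image coordinates for a camera at (camera_x, 0, 0)."""
    x, y, z = p_display
    return (f * (x - camera_x) / z, f * y / z)


# Hypothetical feature at the Template origin, 800 mm in front of the display.
p = template_to_display((0.0, 0.0, 0.0), rotation_xyz(0.0, 0.0, 0.0),
                        (100.0, 0.0, 800.0))
left = project(p, camera_x=0.0, f=4.0)     # left camera at the origin
right = project(p, camera_x=400.0, f=4.0)  # right camera one baseline away
```

The difference between the two projected x coordinates is the metric disparity d · a, so this forward model is consistent with the depth approximation z = (b · f) / (d · a).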

Once the position coordinates of the user's facial feature points have been stored, the line of sight and the point of interest (POI) (9) on the display plane can be obtained even without the Head Mouse Template, by measuring the facial feature points and calculating the transformation between the template coordinate system and the display coordinate system.
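As a rough three-dimensional illustration of this step, the Python sketch below intersects the line of sight with the display. It assumes, beyond what the text states, that three non-collinear points on the face reference plane and the midpoint between the eyes are already known in display coordinates, and that the display is the plane z = 0. The helper names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def poi_on_display(p1, p2, p3, eye_mid):
    """(x, y) where the normal to plane (p1, p2, p3) through eye_mid
    meets the display plane z = 0."""
    n = _cross(_sub(p2, p1), _sub(p3, p1))  # normal of the face plane
    if n[2] == 0:
        raise ValueError("face plane is perpendicular to the display: "
                         "the line of sight never reaches it")
    t = -eye_mid[2] / n[2]  # parameter at which the line reaches z = 0
    return (eye_mid[0] + t * n[0], eye_mid[1] + t * n[1])


# Face squarely facing the display: the POI is straight ahead of the eyes.
poi = poi_on_display((-50.0, 0.0, 600.0), (50.0, 0.0, 600.0),
                     (50.0, -30.0, 600.0), eye_mid=(10.0, 20.0, 610.0))
```

The orientation of the normal does not matter here, because the line is extended in whichever direction reaches z = 0.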

A binocular vision system has the advantage over monocular vision of being independent of illumination, distance, and viewpoint, but unlike the human eye, computer vision faces the correspondence problem. Solving it requires camera calibration. Known methods include Q U Xu et al., "Camera Calibration of Stereo Photogrammetric System with One-Dimensional Optical Reference Bar", International Symposium on Instrumentation Science and Technology (2006), 1048-1052, and Zhengyou Zhang, "A Flexible New Technique for Camera Calibration", December 2, 1998, http://research.microsoft.com. It is convenient to recalibrate by applying such a method with the scale marks of the Head Mouse Template whenever the computer display setup changes.

When the computer (or computer-like machine) is initialized, the cameras are first calibrated with the Head Mouse Template. The POI position on the display is then calculated from the Head Mouse Template and a cursor is drawn at the calculated position, after which the user gives feedback: if the cursor is not where the user wants it, the user changes the facial pose and the computer places the cursor at a new position according to the changed pose. This is repeated until the cursor reaches the desired position, at which point the user issues an instruction such as confirm or select by pressing or releasing a designated key on the keyboard. In this way the arrangement takes over the role of the hand mouse.

During initialization, the coordinates of the facial feature points that are independent of facial expression are also read by stereo photogrammetry of the user's facial features and are stored and registered. In operation after initialization, the user's facial features can be measured even when the Head Mouse Template is not worn and compared with the database of registered faces to recognize a specific user. At the same time the position and pose of the face reference plane are calculated, so the POI on the display can be estimated and the cursor displayed.
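The patent does not say how the measured feature points are compared with the registered faces. The Python sketch below shows one simple possibility, not the author's method: it compares the distances between pairs of feature points, which do not change when the head moves or turns. It assumes the feature points are always listed in the same order; the function names, the RMS error measure, and the tolerance are all hypothetical.

```python
import math
from itertools import combinations


def signature(points):
    """Distances between every pair of 3D feature points, in a fixed order.
    These are unchanged by translation and rotation of the head."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def identify(measured, database, tolerance):
    """Return the registered user whose signature is closest to the
    measured one, or None if no one is within the tolerance."""
    sig = signature(measured)
    best_user, best_err = None, float("inf")
    for user, registered in database.items():
        ref = signature(registered)
        if len(ref) != len(sig):
            continue  # different number of feature points: not comparable
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(sig, ref)) / len(sig))
        if err < best_err:
            best_user, best_err = user, err
    return best_user if best_err <= tolerance else None


# Hypothetical registered users with four feature points each (millimetres).
database = {
    "user_a": [(0, 0, 0), (60, 0, 0), (30, 40, 10), (30, -50, 5)],
    "user_b": [(0, 0, 0), (66, 0, 0), (33, 45, 12), (33, -55, 4)],
}
# The same face as user_a, turned 90 degrees about Z and moved.
seen = [(-y + 100, x + 20, z + 700) for x, y, z in database["user_a"]]
who = identify(seen, database, tolerance=1.0)
```

A production system would need a more robust matcher, since pairwise distances alone cannot distinguish a face from its mirror image and are sensitive to measurement noise.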

FIG. 5 is an installation example of computer peripherals that provide stereophonic sound together with a stereoscopic vision system. The vision system creates a sense of depth through a parallax difference, by presenting to the left eye and the right eye superimposed images of the same three-dimensional object that differ from each other. The arrangement has displays (11), (12) on the left and right, which provide the images for the left eye and the right eye, and mirrors (13), (14) that reflect them; four speakers (15), (16), (17), (18) for producing the stereophonic effect; and microphones (19), (20) for three-dimensional sound capture. In this example the CCD cameras (5), (6) are located at the centers of the mirrors (13), (14).

FIG. 6 is a plan view of FIG. 5 showing the part relating to the vision system. In FIG. 6, region (21) is the stereo region, where the images for the left eye and the right eye overlap, and the POI (9) is shown. FIG. 7-a shows the pupil positions when the POI is at infinity, and FIG. 7-b shows the pupil positions when the POI is at a finite distance. The distance to the POI, that is, the stereoscopic depth (3-dimensional depth), can be obtained from the difference between these pupil positions. Accordingly, on a stereoscopic display, the resolution can be raised only in the Panum's area region, about ±10° around the POI, and lowered elsewhere, which reduces eye strain. (Reference: Arzu Coltekin, "Foveation for 3D Visualization and Stereo Imaging", TKK Institute of Photogrammetry and Remote Sensing Publication, Espoo 2006, Helsinki University of Technology.)
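To give a feel for the size of that high-resolution region, the Python sketch below converts the roughly ±10° half-angle into a radius in display pixels. It assumes a flat display viewed approximately head-on and a circular region centered on the POI; the function names and the sample viewing distance and pixel pitch are hypothetical.

```python
import math


def high_resolution_radius_px(viewing_distance, pixel_pitch, half_angle_deg=10.0):
    """Radius, in pixels, of the region within half_angle_deg of the POI.

    viewing_distance and pixel_pitch use the same length unit.
    """
    radius = viewing_distance * math.tan(math.radians(half_angle_deg))
    return radius / pixel_pitch


def in_high_resolution_region(pixel, poi_pixel, radius_px):
    """True if a pixel lies inside the circle around the POI."""
    return math.hypot(pixel[0] - poi_pixel[0],
                      pixel[1] - poi_pixel[1]) <= radius_px


# Hypothetical setup: eyes 600 mm from the screen, 0.25 mm pixel pitch.
radius_px = high_resolution_radius_px(600.0, 0.25)
```

With these sample values the radius comes to a little over 420 pixels, so only a modest part of a typical screen would need full resolution at any moment.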

In the step of calibrating the cameras and of obtaining and registering the coordinates of the user's facial feature points at computer initialization, a virtual POI is generated and brought from infinite stereoscopic depth toward the near range while the change in position of the user's two pupils is recorded, compiled into a table, and registered. That is, the user's pupil position differences (22) and (23) are measured between the pupil positions for the viewpoint at infinity in FIG. 7-a and those for the finite-distance point of interest (POI) (9) in FIG. 7-b, and a table of stereoscopic depth versus pupil position difference is built. In operation after initialization, the pupil position differences (22) and (23) are measured, the corresponding stereoscopic depth is read from the table, and the graphic object is rendered at the corresponding position.
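The table lookup can be sketched as follows in Python. The patent only says that the depth is read from the table; the linear interpolation between neighboring entries, the clamping at the ends of the table, the function name, and the sample values are assumptions made for illustration.

```python
from bisect import bisect_left


def depth_from_pupil_shift(table, shift):
    """Look up the stereoscopic depth for a measured pupil position difference.

    table: (pupil_shift, depth) pairs sorted by increasing pupil_shift,
    recorded for this user during initialization. Values outside the
    recorded range are clamped to the nearest entry.
    """
    shifts = [s for s, _ in table]
    if shift <= shifts[0]:
        return table[0][1]
    if shift >= shifts[-1]:
        return table[-1][1]
    i = bisect_left(shifts, shift)       # first entry with pupil_shift >= shift
    (s0, d0), (s1, d1) = table[i - 1], table[i]
    w = (shift - s0) / (s1 - s0)         # position between the two entries
    return d0 + w * (d1 - d0)


# Hypothetical calibration: larger pupil shift means a nearer POI (mm).
user_table = [(0.5, 2000.0), (1.0, 1000.0), (2.0, 500.0)]
depth = depth_from_pupil_shift(user_table, 1.5)
```

Because the relation between convergence and depth is not linear, a denser table near the user would give better results than relying on interpolation across wide gaps.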

Just as depth in vision is perceived through the difference in position (parallax) of the images formed in the left and right eyes for the same point (POI), the spatial impression of sound is perceived through the difference in arrival time of the sound at the two ears (Interaural Time Difference, ITD), the difference in intensity (Interaural Level Difference, ILD), and the direction-dependent frequency response (Head-Related Transfer Function, HRTF). The HRTF describes the frequency-dependent acoustic effect, due to diffraction and reflection, that sound below 1 kHz and sound above 4 kHz undergoes according to its direction as it passes the head, torso, and outer ear. Therefore, to operate an HRTF-based 3D audio system, the signal information of each sound source, the position and facial pose of the listener, and the position of the sound source must be input. (References: Kenneth Wang, "Efficient and Effective Use of Low Cost 3D Audio Systems", Proceedings of the 2002 International Conference on Auditory Display, Kyoto, Japan, July 2-5, 2002; US PN 6,961,439, Method and Apparatus for Producing Spatialized Audio Signals.)
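Of the three cues, the ITD is the simplest to illustrate once the ear positions are known. The Python sketch below uses straight-line path lengths from the source to each ear, which is a simplification that ignores diffraction around the head; it is not an HRTF implementation. The function name, the sign convention, and the sample coordinates (in metres) are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at about 20 degrees C


def interaural_time_difference(source, left_ear, right_ear):
    """ITD in seconds from straight-line path lengths.
    Positive when the sound reaches the left ear first."""
    return (math.dist(source, right_ear)
            - math.dist(source, left_ear)) / SPEED_OF_SOUND


# Hypothetical listener 0.5 m from the display, ears 0.16 m apart.
left_ear, right_ear = (-0.08, 0.0, 0.5), (0.08, 0.0, 0.5)
itd_side = interaural_time_difference((-1.0, 0.0, 0.5), left_ear, right_ear)
itd_front = interaural_time_difference((0.0, 0.0, 0.0), left_ear, right_ear)
```

A source directly to the left gives an ITD of a few hundred microseconds, while a source on the median plane gives zero, which is why the listener's ear positions must be tracked as the head moves.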

In the present invention, to input the listener's position and facial pose, the left and right ears are recognized and their positions are measured and input. When the ears are not captured in the image, the position data of the user's ears can instead be input by recognizing, through stereo photogrammetry, the three-dimensional position and pose of other feature points of the user's face or of the Head Mouse Template.

FIG. 1 is a conceptual diagram of a user wearing the Head Mouse Template with three markers and looking at the display.

FIG. 2 is a plan view of FIG. 1 showing the concept of measuring depth from the disparity appearing on the two sensors.

FIG. 3 is a plan view of FIG. 1 showing the concept of obtaining the line of sight and the POI.

FIG. 4 is a conceptual diagram of measuring facial feature points while the Head Mouse Template is worn.

FIG. 5 is an example of a system that provides stereoscopic vision and stereophonic sound.

FIG. 6 is a conceptual diagram of representing stereoscopic depth on a stereoscopic display.

FIG. 7 is a conceptual diagram showing the difference in the user's pupil positions between a viewpoint at infinity and a viewpoint at a finite distance.

Claims

A method of controlling the position of a graphic object, such as a cursor or pointer, on the display of a computer system in a GUI environment, the method comprising:

Step 1: acquiring images of the user's face and head with at least two image acquisition devices (CCD cameras);

Step 2: recognizing three or more visual markers of the Head Mouse Template in the user's face images, and calculating the three-dimensional position of each marker relative to the display from the position difference (disparity) of the marker's image pixels on the respective CCD camera images;

Step 3: obtaining the straight line (line of sight) that is perpendicular to the plane formed by the markers and passes through the midpoint between the user's left and right eyes, and calculating the intersection point (POI) at which this line meets the display plane; and

Step 4: displaying the graphic object at the position of the calculated POI.

The method of claim 1, further comprising, when calculating the three-dimensional position of each marker in step 2: calibrating the intrinsic and extrinsic parameters of the CCD cameras by a known method using the markers and scale marks of the Head Mouse Template; and obtaining the three-dimensional coordinates of the user's facial feature points (both ends of the eyebrows, the corners of the eyes, the corners of the lips, the ears, moles, warts, and the like) and registering them in a user face database as information on a user authorized to use the computer.

The method of claim 2, further comprising, when the display is a stereoscopic display and in the step of obtaining the three-dimensional coordinates of the user's facial feature points and registering the user's face: forming a virtual POI on the stereoscopic display, recording the change in the center positions of both pupils while the POI is moved from infinite stereoscopic depth to the nearest stereoscopic depth, and adding the record as that user's stereoscopic vision characteristic data.

The method of claim 1, wherein, if the image of the Head Mouse Template cannot be extracted in the step of recognizing three or more visual markers of the Head Mouse Template in the user's face images in step 2, the method comprises recognizing a specific authorized user by finding matching face image data in the registered user face database, and calculating the three-dimensional position and pose of the face reference plane from the positions of the respective facial feature points.

A computer system in which the position of a graphic object, such as a cursor or pointer, on a display or stereoscopic display screen serves as the human-computer interface, the system comprising:

at least two image acquisition devices (CCD cameras) spaced apart so that the user's face lies within the stereo region;

a Head Mouse Template that is worn parallel to the user's face reference plane and has at least three visual markers and scale marks at regular intervals;

a logic processor that detects the positions of the image pixels on the CCD cameras corresponding to the user's facial feature points or the visual markers, calculates their three-dimensional positions from the disparity, estimates the position of the user's POI, and renders the graphic object at the estimated position; and

a storage medium that stores data such as the user's facial feature data and stereoscopic vision data.

The method of claim 1, further comprising, for a system having a stereoscopic display and in step 3 of obtaining the line of sight: measuring the positions of the left and right pupils and reading the corresponding stereoscopic depth from the user's stereoscopic vision data table; rendering the graphic object at the positions on the left-eye and right-eye screens that correspond to that stereoscopic depth; and increasing the resolution only in the vicinity of Panum's area around that position while lowering the resolution elsewhere.

The system of claim 5, further comprising, as a peripheral device of the computer system, an HRTF-based 3D audio system having at least two speakers or earphones.

A method of driving the HRTF-based 3D audio system of claim 7, the method comprising:

Step 2: recognizing three or more facial feature points, or three or more visual markers of the Head Mouse Template, in the user's face images, calculating their three-dimensional positions relative to the display from the position difference (disparity) of the image pixels on the respective CCD camera images, and calculating the three-dimensional position and pose of the face reference plane; and

Step 3: obtaining the positions of the left and right ears from the position and pose of the face reference plane and inputting them to the HRTF-based 3D audio system.