KR20230023520A

KR20230023520A - Long-distance motion gesture recognition apparatus

Info

Publication number: KR20230023520A
Application number: KR1020210105717A
Authority: KR
Inventors: 이태양
Original assignee: 이태양
Priority date: 2021-08-10
Filing date: 2021-08-10
Publication date: 2023-02-17

Abstract

The present invention relates to a long-distance motion gesture recognition device comprising: a camera that obtains an image of a user; an image processing unit that obtains data for extracting the user's gesture by processing the image obtained from the camera; a command action extraction unit that extracts a corresponding command by performing deep-learning-based voice recognition on the data obtained by the image processing unit; a control unit that outputs a control signal corresponding to the command extracted by the command action extraction unit; and a transmission unit that transmits the control signal output from the control unit to a controlled device by wire or wirelessly. The present invention increases the recognition rate, greatly contributes to improving the gesture recognition rate at long distances, and can be applied to smart home appliances and smart home electronic products widely used with smartphone apps or with smart home systems such as SmartThings, Home Assistant and HomeKit, thereby increasing convenience of use.

Description

Long-distance motion gesture recognition apparatus

The present invention relates to a long-distance motion gesture recognition device and, more particularly, to a long-distance motion gesture recognition device that increases the recognition rate of commands for controlling a controlled device and can greatly contribute to improving the gesture recognition rate at long distances.

In general, products and services that combine voice recognition technology with artificial intelligence, a key element of the Fourth Industrial Revolution, are being developed and deployed at a rapidly growing pace. Among them, AI speakers are attracting attention as the hub of the home IoT because, in the IoT era, they can remotely control other home appliances through simple voice commands.

As a technology related to conventional AI speakers, the "artificial intelligence speaker system" of Korean Patent Registration No. 10-2266320 includes a main device and a sub device. The main device has an internal microphone mounted on a main body; a main terminal control unit that, when a sound signal input through the internal microphone contains a voice instruction, outputs response information for that instruction through an internal speaker; and a main terminal communication unit that, under the control of the main terminal control unit, communicates with registered user terminals. The sub device is separate from the main body and, when it receives an instruction to execute an external noise blocking mode from the main device through its sub-terminal communication unit, generates an external destructive interference signal that cancels the external sound signal received from an external microphone and outputs it through an external speaker. The main device supports an external noise blocking mode, a sleep mode, a parenting mode and a study mode. When an instruction to set the sleep mode is received, it records the sound information received through the internal microphone in a storage unit. When an instruction to set the parenting mode is received and a crying sound is detected in the sound information received through the internal microphone, it transmits a message notifying the registered user terminal of the crying. When an instruction to set the study mode is received, it generates an internal destructive interference signal that cancels the sound signal received through the internal microphone and outputs it through the internal speaker.

However, such prior art suffers from a low voice recognition rate and the inconvenience of having to speak aloud, and it is difficult to use for people with disabilities who cannot speak.

To solve the problems of the prior art described above, an object of the present invention is to increase the command recognition rate for controlling a controlled device, to greatly improve the gesture recognition rate at long distances, and to increase convenience of use by making the invention applicable to smart home appliances and smart home electronic products widely used with smartphone apps or with smart home systems such as SmartThings, Home Assistant and HomeKit.

Other objects of the present invention will be readily understood from the description of the following embodiments.

To achieve the above objects, according to one aspect of the present invention, there is provided a long-distance motion gesture recognition device including: a camera for obtaining an image of a user; an image processing unit that obtains data for extracting the user's gesture by image processing on the image obtained from the camera; a command action extraction unit that extracts a corresponding command by performing deep-learning-based voice recognition on the data acquired by the image processing unit; a control unit that outputs a control signal corresponding to the command extracted by the command action extraction unit; and a transmission unit that transmits the control signal output from the control unit to a controlled device by wire or wirelessly.

The image processing unit may obtain an image of the user from the face down to the chest as data for extracting the user's gesture. Using the data obtained by the image processing unit, the command action extraction unit may perform face detection to secure a region of interest (ROI) around the face so that the face-to-chest image can be used as input for deep learning, feed time-series data of a specific number of frames into an LSTM and a CNN, and classify it as a specific gesture through the deep learning model, thereby extracting the command indicated by that gesture.

During deep learning, the command action extraction unit may collect images frame by frame and repeat the process in real time, with a predetermined number of frames overlapping between successive repetitions.

The device may further include: a microphone through which the user's voice is input; and a voice processing unit that processes the voice input to the microphone to obtain data for command extraction, wherein the command action extraction unit compares the user's voice, taken from the data obtained by the voice processing unit, with a plurality of pre-stored reference voices and extracts the command corresponding to the matching reference voice.

The device may further include: a main user recognition unit that recognizes the main user's gesture from the data obtained by the image processing unit, so that the command action extraction unit extracts the corresponding command by performing deep-learning-based voice recognition on the main user's gesture; and a pointing motion extraction unit that, within the image obtained by the camera, sequentially identifies, each for a specified time, the devices pointed at by hand in the main user's gesture recognized by the main user recognition unit. The command action extraction unit extracts the corresponding command by performing deep-learning-based voice recognition on the motion gesture the main user makes after pointing by hand, and the control unit transmits the control signal corresponding to that command to the device that was identified immediately before the command was extracted.

With the long-distance motion gesture recognition device according to the present invention, the recognition rate of commands for controlling a controlled device is increased, the gesture recognition rate at long distances can be greatly improved, and convenience of use is enhanced because the device can be applied to smart home appliances and smart home electronic products widely used with smartphone apps or with smart home systems such as SmartThings, Home Assistant and HomeKit.

FIG. 1 is a perspective view illustrating a long-distance motion gesture recognition device according to an embodiment of the present invention.
FIG. 2 is a configuration diagram illustrating a long-distance motion gesture recognition device according to an embodiment of the present invention.
FIG. 3 is a perspective view illustrating a long-distance motion gesture recognition device according to another embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to those specific embodiments; it should be understood to include all modifications, equivalents and substitutes within the spirit and technical scope of the present invention. The invention may be modified in various other forms, and its scope is not limited to the following embodiments.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical or corresponding components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions are omitted.

FIG. 1 is a configuration diagram showing a long-distance motion gesture recognition device according to an embodiment of the present invention, and FIG. 2 is a perspective view showing the long-distance motion gesture recognition device according to an embodiment of the present invention in use.

Referring to FIGS. 1 and 2, a long-distance motion gesture recognition device 10 according to an embodiment of the present invention may include a camera 11, an image processing unit 12, a command action extraction unit 13, a control unit 14 and a transmission unit 15. Its casing 25 has a cylindrical shape suitable for portable use, but is not necessarily limited thereto.

The camera 11 acquires images of the user and may be installed, for example, on the front of the casing 25 facing forward. Video acquired by the camera 11 is a sequence of still images delivered at a specific number of frames per second; for example, a 5-second video at 60 frames per second consists of 300 images acquired in rapid succession over those 5 seconds.

The image processing unit 12 obtains data for extracting the user's gesture by processing the images acquired from the camera 11. It may obtain an image of the user from the face down to the chest as this data. Using the user's body features in the image acquired by the camera 11, such as the head, face, arms, hands, legs and feet, the image processing unit 12 captures the user's appearance as data by image processing techniques and provides it to the command action extraction unit 13. Known techniques may be used for this image processing.

The command action extraction unit 13 extracts a corresponding command by performing deep-learning-based voice recognition on the data acquired by the image processing unit 12. Rather than recognizing a specific static hand sign, the command action extraction unit 13 recognizes moving frames, that is, motion. The command action extraction unit 13 may also be configured to compare the data acquired by the image processing unit 12 with reference gesture data stored in advance in the memory unit 21 and to extract, as a command, the control signal indicated by the matching reference gesture data.
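The passage does not say how the acquired data is compared with the reference gesture data in the memory unit 21. As a minimal sketch, assuming each gesture is stored as a sequence of 2-D hand positions (one per frame), dynamic time warping (DTW) is one common way to compare motions performed at different speeds. DTW, the function names, the template contents and the distance threshold below are illustrative assumptions, not the patent's method.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) hand position in one frame


def dtw_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Classic dynamic time warping distance between two point sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def match_gesture(observed: Sequence[Point],
                  templates: Dict[str, Sequence[Point]],
                  max_distance: float) -> Optional[str]:
    """Return the command of the closest reference gesture, or None if none is close enough."""
    best_cmd, best = None, float("inf")
    for cmd, template in templates.items():
        d = dtw_distance(observed, template)
        if d < best:
            best_cmd, best = cmd, d
    return best_cmd if best <= max_distance else None


# Hypothetical reference gestures, as they might be stored in the memory unit.
TEMPLATES = {
    "swipe_right": [(0, 0), (1, 0), (2, 0), (3, 0)],
    "swipe_up": [(0, 0), (0, 1), (0, 2), (0, 3)],
}
observed = [(0, 0), (0.5, 0), (1, 0), (2, 0.1), (3, 0)]  # a slower rightward swipe
print(match_gesture(observed, TEMPLATES, max_distance=1.0))
```

DTW tolerates the observed swipe having five frames while the template has four; the threshold keeps unrelated motion from being forced onto the nearest template.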

Using the data acquired by the image processing unit 12, the command action extraction unit 13 may perform face detection to secure a region of interest (ROI) around the face so that the face-to-chest image can be used as input for deep learning, feed time-series data of a specific number of frames into an LSTM and a CNN, and classify it as a specific gesture through the deep learning model, thereby extracting the command indicated by that gesture. Conventional gesture recognition often relies on depth vision and similar techniques, which work well at short range but lose accuracy as distance increases. By contrast, after face detection the command action extraction unit 13 recognizes hand motion within a specific region that includes the face, which makes long-distance gesture recognition possible at about 10 m. Through model optimization, the command action extraction unit 13 may also be configured to run on an embedded board in a local environment rather than on a server.
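A minimal sketch of securing the face-to-chest ROI, assuming a face detector has already returned a bounding box `(x, y, width, height)` in pixels with a top-left origin. The expansion factors (one face width to each side, half a face height above, two and a half face heights below) are illustrative assumptions, not values from the patent.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin


def face_to_chest_roi(face: Box, img_w: int, img_h: int,
                      side: float = 1.0, above: float = 0.5,
                      below: float = 2.5) -> Box:
    """Expand a face box into a face-to-chest ROI, clamped to the image.

    The ROI scales with the detected face size, so a distant user (small face)
    still yields a crop that covers the same body region.
    """
    x, y, w, h = face
    left = max(0, int(x - side * w))
    top = max(0, int(y - above * h))
    right = min(img_w, int(x + w + side * w))
    bottom = min(img_h, int(y + h + below * h))
    return (left, top, right - left, bottom - top)


print(face_to_chest_roi((100, 50, 40, 40), 640, 480))
```

With a NumPy image array, the crop fed to the model would then be `frame[top:top + h, left:left + w]`.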

For gesture recognition, the command action extraction unit 13 may apply an R-CNN + LSTM technique using a two-stage detector. By performing face detection to secure an ROI around the face so that the face-to-chest image can be used as deep learning input, the deep learning model can be optimized and simplified, reducing the model size and increasing accuracy. The command action extraction unit 13 feeds time-series data of a specific number of frames into the LSTM and CNN and classifies it as a specific gesture through the deep learning model. Gestures may include various motions such as swiping left, right, up or down, a stop signal, clockwise rotation, arm rolling, clapping, and no gesture. The AI model runs in the board's own local offline environment without going through a server; usable embedded boards include, for example, an embedded board with a built-in OS, a microkernel-based AI processor and an FPGA-based processor.
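The patent does not disclose the network architecture. The sketch below only illustrates the data flow described above: per-frame feature vectors (standing in for CNN outputs on each ROI crop) are fed in time order through an LSTM cell, and the final hidden state is mapped to probabilities over the listed gesture classes. It uses plain Python and small random weights for readability; the class names, feature size, hidden size and initialization are assumptions, and a real system would run a trained model in a deep learning framework.

```python
import math
import random
from typing import Dict, List

# Gesture classes named in the text (label strings are hypothetical).
GESTURES = ["swipe_left", "swipe_right", "swipe_up", "swipe_down", "stop",
            "rotate_clockwise", "arm_roll", "clap", "none"]


def matvec(m: List[List[float]], v: List[float]) -> List[float]:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyLstmGestureHead:
    """Untrained LSTM + linear softmax head over per-frame CNN features."""

    def __init__(self, feat_dim: int, hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(rows: int, cols: int) -> List[List[float]]:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        # One weight matrix per gate (input, forget, candidate, output),
        # each acting on the concatenation [x_t, h_{t-1}].
        self.w: Dict[str, List[List[float]]] = {g: rand(hidden, feat_dim + hidden) for g in "ifgo"}
        self.b: Dict[str, List[float]] = {g: [0.0] * hidden for g in "ifgo"}
        self.w_out = rand(n_classes, hidden)

    def forward(self, frames: List[List[float]]) -> List[float]:
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in frames:  # one feature vector per frame, in time order
            xh = list(x) + h
            z = {g: [s + b for s, b in zip(matvec(self.w[g], xh), self.b[g])] for g in "ifgo"}
            i = [sigmoid(v) for v in z["i"]]
            f = [sigmoid(v) for v in z["f"]]
            cand = [math.tanh(v) for v in z["g"]]
            o = [sigmoid(v) for v in z["o"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        logits = matvec(self.w_out, h)
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


rng = random.Random(42)
window = [[rng.random() for _ in range(8)] for _ in range(30)]  # 30 frames x 8 features
model = TinyLstmGestureHead(feat_dim=8, hidden=16, n_classes=len(GESTURES))
probs = model.forward(window)
predicted = GESTURES[probs.index(max(probs))]
```

Because the weights are untrained, `predicted` is meaningless here; the point is the shape of the computation, one 30-frame window in and one probability per gesture class out.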

During deep learning, the command action extraction unit 13 collects images frame by frame and repeats the process in real time, with a predetermined number of frames overlapping between successive repetitions. The deep learning model performs recognition once per unit of time-series data of a specific number of frames, and accuracy can be improved by continuously receiving frames in real time and running recognition repeatedly. For example, suppose a video has frames 1 to 120, the model takes 30 frames as input, and 10 frames are dropped each time the window advances. The model first runs on frames 1-30, then on frames 11-40, then on frames 21-50, and so on until it runs on frames 91-120, producing 10 gesture results in total. This allows a more accurate algorithm, for example one that executes a gesture only when the first and second results agree; alternatively, if the first result is taken as the gesture, recognition is very fast.
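The overlapping-window scheme can be sketched as follows, assuming a classifier that maps a window of frames to a gesture label. `classify`, the `"none"` label and the agreement rule are illustrative assumptions; `agree=2` corresponds to acting only when two consecutive results match, and `agree=1` to taking the first result for speed. With 120 frames, a window of 30 and a stride of 10, the windows start at frames 1, 11, ..., 91, i.e. ten windows.

```python
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], size: int, stride: int) -> Iterator[Sequence[T]]:
    """Yield overlapping windows of `size` frames, advancing by `stride` frames."""
    for start in range(0, len(frames) - size + 1, stride):
        yield frames[start:start + size]


def recognize_stream(frames: Sequence[T],
                     classify: Callable[[Sequence[T]], str],
                     size: int = 30, stride: int = 10,
                     agree: int = 2) -> Optional[str]:
    """Return the first gesture predicted by `agree` consecutive windows."""
    streak_label: Optional[str] = None
    streak = 0
    for window in sliding_windows(frames, size, stride):
        label = classify(window)
        if label == streak_label:
            streak += 1
        else:
            streak_label, streak = label, 1
        if label != "none" and streak >= agree:
            return label
    return None


frames = list(range(1, 121))  # stand-ins for frames 1..120
starts = [w[0] for w in sliding_windows(frames, 30, 10)]
# starts == [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]

# Hypothetical classifier: sees a left swipe once the window starts at frame 21.
fake_classify = lambda w: "swipe_left" if w[0] >= 21 else "none"
print(recognize_stream(frames, fake_classify))
```

In a live system `frames` would be a rolling buffer fed by the camera rather than a finished list, but the window arithmetic is the same.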

The command action extraction unit 13 maps a plurality of specific gestures to predetermined commands, which may cover various operations including stopping, running and changing a setting value.

The control unit 14 outputs a control signal corresponding to the command extracted by the command action extraction unit 13.
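The gesture-to-command and command-to-signal steps can be sketched as a lookup table. The patent names stop, run and setting changes only as example commands; the specific gesture assignments, command strings and `ControlSignal` fields below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical gesture-to-command table.
GESTURE_COMMANDS: Dict[str, str] = {
    "stop": "STOP",
    "clap": "RUN",
    "swipe_up": "SETTING_UP",
    "swipe_down": "SETTING_DOWN",
}


@dataclass(frozen=True)
class ControlSignal:
    device_id: int       # e.g. controlled device 1, 2 or 3
    command: str
    value_delta: int = 0  # change applied to a setting value, if any


def to_control_signal(gesture: str, device_id: int) -> Optional[ControlSignal]:
    """Translate a recognized gesture into a control signal, or None if unmapped."""
    command = GESTURE_COMMANDS.get(gesture)
    if command is None:
        return None  # unmapped gestures, including "none", produce no signal
    delta = {"SETTING_UP": 1, "SETTING_DOWN": -1}.get(command, 0)
    return ControlSignal(device_id, command, delta)


print(to_control_signal("swipe_up", 2))
```

Keeping the table as data rather than code means it could be stored in the memory unit and edited without changing the recognition logic.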

The transmission unit 15 transmits the control signal output from the control unit 14 to the controlled devices 1, 2 and 3 by wire or wirelessly. It may send the control signal to each of the controlled devices 1, 2 and 3 over a cable or, without being limited thereto, exchange signals with pre-registered controlled devices 1, 2 and 3 through various wireless communication methods including ZigBee, Bluetooth, Wi-Fi, 3G, LTE and 5G. The controlled devices 1, 2 and 3 may be smart home appliances or smart home electronic products used with smartphone apps or with smart home systems such as SmartThings, Home Assistant and HomeKit.

The long-distance motion gesture recognition device 10 according to an embodiment of the present invention may further include a microphone 16 and a voice processing unit 17, and may further include a main user recognition unit 18 and a pointing motion extraction unit 19.

The microphone 16 receives the user's voice and is, for example, located on the front of the casing 25, but is not necessarily limited thereto.

The voice processing unit 17 processes the voice input to the microphone 16 to obtain data for command extraction. It may use any of various speech recognition algorithms known in the art, for example statistical pattern recognition methods such as a hidden Markov model (HMM), a Gaussian mixture model (GMM) or a support vector machine (SVM), artificial neural network models such as an RNN, LSTM, DNN or CNN, or a combination of these.

From the data acquired by the voice processing unit 17, the command action extraction unit 13 compares the user's voice with a plurality of pre-stored reference voices and extracts the command corresponding to the matching reference voice.
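A minimal sketch of the reference-voice comparison, assuming the voice processing unit 17 reduces each utterance to a fixed-length feature vector (for example an embedding from one of the models listed above). Cosine similarity, the 0.8 threshold and the command names are assumptions; the patent does not specify the comparison metric.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_voice(features: List[float],
                references: Dict[str, List[float]],
                threshold: float = 0.8) -> Optional[str]:
    """Return the command of the most similar reference voice above `threshold`."""
    best_cmd, best_sim = None, -1.0
    for cmd, ref in references.items():
        sim = cosine(features, ref)
        if sim > best_sim:
            best_cmd, best_sim = cmd, sim
    return best_cmd if best_sim >= threshold else None


# Hypothetical pre-stored reference voices (toy 3-dimensional features).
REFERENCES = {"light_on": [1.0, 0.0, 0.0], "light_off": [0.0, 1.0, 0.0]}
print(match_voice([0.9, 0.1, 0.0], REFERENCES))
```

The threshold lets the device ignore speech that resembles no registered command instead of always returning the nearest one.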

The main user recognition unit 18 recognizes the main user's gesture from the data acquired by the image processing unit 12, so that the command action extraction unit 13 extracts the corresponding command by performing deep-learning-based voice recognition on the main user's gesture. The main user can be identified in various ways. Among the people captured by the camera 11, the unit may, for example, treat the person closest to the camera as the main user, since the main user is usually nearest to the camera 11; treat the person making the largest motion as the main user, since the main user tends to make command gestures with larger motions than the people around them; or treat the person whose gestures are recognized most clearly as the main user, since the main user tends to perform the required gestures more accurately than others. This removes noise caused by the motions of people around the main user and enables accurate extraction of the main user's commands.
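The three main-user heuristics above can be sketched as selectable scoring keys. Using face height in pixels as a proxy for closeness, mean hand displacement as "motion size" and classifier confidence as "clarity" are assumptions, as are the `Person` fields and strategy names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    track_id: int
    face_height_px: float      # proxy for distance: a larger face means closer to the camera
    motion_energy: float       # e.g. mean hand displacement per frame
    gesture_confidence: float  # classifier confidence for this person's gesture


def select_main_user(people: List[Person], strategy: str = "closest") -> Optional[Person]:
    """Pick the main user by one of three heuristics; None if nobody is visible."""
    if not people:
        return None
    keys = {
        "closest": lambda p: p.face_height_px,
        "largest_motion": lambda p: p.motion_energy,
        "clearest_gesture": lambda p: p.gesture_confidence,
    }
    return max(people, key=keys[strategy])


people = [Person(1, 40, 5.0, 0.6), Person(2, 80, 2.0, 0.7), Person(3, 30, 9.0, 0.9)]
print(select_main_user(people, "closest").track_id)
```

Once a main user is chosen, only that person's ROI would be passed on for gesture classification, which is how the bystander noise described above is suppressed.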

Within the image acquired by the camera 11, the pointing motion extraction unit 19 sequentially identifies, each for a specified time, the devices pointed at by hand in the main user's gesture recognized by the main user recognition unit 18. When the pointing motion extraction unit 19 determines that the recognized gesture points at a specific device, it starts from the device among the devices 1, 2 and 3 with the highest pointing accuracy and proceeds to the devices around it, outputting each device's name through the speaker 22 for a predetermined time, for example within a range of 0.5 to 2.0 seconds, so that the user can easily confirm which device is being indicated. If the name output through the speaker 22 differs from the device the user is actually pointing at, the user can correct the gesture to point at the device again, and the pointing motion extraction unit 19 then identifies the newly indicated device 1, 2 or 3. Device names may be registered in advance by the user through the operation unit 23 under the control of the control unit 14.
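One way to rank devices by "pointing accuracy" is sketched below, assuming a pose estimator supplies elbow and wrist keypoints in image coordinates and each device's position in the image has been registered in advance. The elbow-to-wrist ray, the 30-degree acceptance cone and all names are assumptions working only in the 2-D image plane; the 0.5 to 2.0 second announcement range comes from the text.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def angle_between(u: Point, v: Point) -> float:
    """Unsigned angle in radians between 2-D vectors u and v."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return abs(math.atan2(cross, dot))


def rank_pointed_devices(elbow: Point, wrist: Point,
                         devices: Dict[str, Point],
                         max_angle_deg: float = 30.0) -> List[str]:
    """Order devices by how closely they lie on the elbow-to-wrist pointing ray."""
    ray = (wrist[0] - elbow[0], wrist[1] - elbow[1])
    scored = []
    for name, pos in devices.items():
        to_device = (pos[0] - wrist[0], pos[1] - wrist[1])
        angle = math.degrees(angle_between(ray, to_device))
        if angle <= max_angle_deg:  # ignore devices well off the pointing direction
            scored.append((angle, name))
    return [name for _, name in sorted(scored)]


def announce_schedule(ranked: List[str], dwell_s: float = 1.0) -> List[Tuple[float, str]]:
    """Start times at which each device name would be spoken, one per dwell period."""
    if not 0.5 <= dwell_s <= 2.0:
        raise ValueError("dwell time outside the 0.5-2.0 s range")
    return [(i * dwell_s, name) for i, name in enumerate(ranked)]


DEVICES = {"tv": (10.0, 0.5), "lamp": (10.0, 3.0), "fan": (0.0, 10.0)}
ranked = rank_pointed_devices(elbow=(0.0, 0.0), wrist=(1.0, 0.0), devices=DEVICES)
print(ranked, announce_schedule(ranked))
```

A corrected pointing gesture would simply produce a new ranking and restart the announcement schedule.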

The command action extraction unit 13 may extract the corresponding command by performing deep-learning-based voice recognition on the motion gesture the main user makes after pointing by hand. The control unit 14 may then transmit the control signal corresponding to that command to the device that was identified immediately before the command was extracted by the command action extraction unit 13. Through this function, the control unit 14 and related components carry out the interaction with the user needed for operation control: the user confirms the controlled device 1, 2 or 3 by pointing at it by hand and can then issue a command to that device accurately.
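The "device identified immediately before the command" rule can be sketched as a small state holder. The class and method names are hypothetical, and clearing the target after one command (so that a later command requires pointing again) is an assumed policy; the patent does not say whether the target persists.

```python
from typing import Callable, Optional


class PointThenCommand:
    """Routes the next recognized command to the most recently identified device."""

    def __init__(self, send: Callable[[str, str], None]):
        self._send = send  # e.g. a wrapper around the transmission unit
        self._current: Optional[str] = None

    def on_device_identified(self, device: str) -> None:
        # A corrected pointing gesture simply overwrites the previous target.
        self._current = device

    def on_command(self, command: str) -> bool:
        if self._current is None:
            return False  # nothing has been pointed at yet
        self._send(self._current, command)
        self._current = None  # assumed policy: one command per pointing gesture
        return True


sent = []
ctl = PointThenCommand(lambda device, command: sent.append((device, command)))
ctl.on_device_identified("tv")
ctl.on_device_identified("lamp")  # user corrected the pointing gesture
ctl.on_command("RUN")
print(sent)
```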

The long-distance motion gesture recognition device 10 according to an embodiment of the present invention may further include a memory unit 21, a speaker 22, an operation unit 23 and a power supply unit 24. The memory unit 21 may store the various data needed for the operation of the command action extraction unit 13 and the control unit 14, as well as the command associated with each reference gesture and with each piece of reference voice data; it provides the corresponding data when requested by the command action extraction unit 13, the control unit 14 and so on, and may also store other data, programs and applications needed for operation. The speaker 22 may be provided for audio output of necessary signals or for the functions required of an AI speaker. The operation unit 23 may consist of a plurality of buttons, switches or a touch panel for directly inputting the signals needed for operation. The power supply unit 24 supplies the power needed for operation and may be a battery rechargeable by a charging circuit, or may be configured to supply power from a removably housed dry-cell battery or from an external power source.

FIG. 3 is a perspective view illustrating a long-distance motion gesture recognition device according to another embodiment of the present invention.

Referring to FIG. 3, a long-distance motion gesture recognition device 30 according to another embodiment of the present invention has components with the same names as those of the long-distance motion gesture recognition device 10 of the previous embodiment but may differ in the structure of the casing 35. In this embodiment, a casing 35 with a fixed structure, such as a hexahedron, may be used to allow stationary use.

A camera 31, a microphone 36, a speaker 42, an operation unit 43 and the like may be provided on the front of the casing 35.

The long-distance motion gesture recognition device according to the present invention can thus raise the recognition rate and, in particular, can greatly contribute to improving the gesture recognition rate at long distances.

The present invention can also improve convenience of use by being applicable to the smart home appliances and electronic products commonly used in conjunction with smartphone apps or smart home systems such as SmartThings, Home Assistant, and HomeKit.

Although the present invention has been described above with reference to the accompanying drawings, various modifications and variations can of course be made without departing from its technical spirit. Therefore, the scope of the present invention should not be limited to the described embodiments but should be defined by the claims below and their equivalents.

1, 2, 3: controlled devices
11, 31: camera
12: image processing unit
13: command action extraction unit
14: control unit
15: transmission unit
16, 36: microphone
17: voice processing unit
18: main user recognition unit
19: pointing action extraction unit
21: memory unit
22, 42: speaker
23, 43: operation unit
24: power supply unit
25, 45: casing

Claims

A long-distance motion gesture recognition device, comprising:
a camera that obtains an image of a user;
an image processing unit that performs image processing on the image obtained by the camera to obtain data for extracting the user's gesture;
a command action extraction unit that extracts a corresponding command by performing deep learning-based voice recognition on the data obtained by the image processing unit;
a control unit that outputs a control signal corresponding to the command extracted by the command action extraction unit; and
a transmission unit that transmits the control signal output from the control unit to a controlled device in a wired or wireless manner.
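To make the data flow of claim 1 concrete, the following minimal sketch wires the claimed stages together as plain Python callables. The function name `run_pipeline`, the toy "frames" (lists of numbers), and the threshold rule standing in for the deep learning model are hypothetical; the sketch only shows the order in which data passes from camera frames to a transmitted control signal.

```python
from typing import Callable, List, Optional, Sequence


def run_pipeline(
    frames: Sequence[object],
    preprocess: Callable[[object], object],
    extract_command: Callable[[List[object]], Optional[str]],
    to_signal: Callable[[str], bytes],
    transmit: Callable[[bytes], None],
) -> Optional[bytes]:
    """Pass camera frames through the claimed stages in order.

    preprocess      ~ image processing unit (applied per frame)
    extract_command ~ command action extraction unit (over processed frames)
    to_signal       ~ control unit (command -> control signal)
    transmit        ~ transmission unit (wired or wireless link)
    """
    data = [preprocess(frame) for frame in frames]
    command = extract_command(data)
    if command is None:
        # No recognizable gesture: nothing is sent to the controlled device.
        return None
    signal = to_signal(command)
    transmit(signal)
    return signal


# Toy usage with hypothetical stand-ins for each stage.
sent: List[bytes] = []
run_pipeline(
    frames=[[5, 6], [1]],  # hypothetical frames as lists of pixel values
    preprocess=sum,  # toy feature: sum of pixel values
    extract_command=lambda d: "POWER_ON" if sum(d) > 10 else None,
    to_signal=str.encode,
    transmit=sent.append,
)
print(sent)  # prints [b'POWER_ON']
```

Passing each stage in as a callable keeps the sketch neutral about how any single unit is implemented.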

The long-distance motion gesture recognition device of claim 1, wherein
the image processing unit
obtains, as the data for extracting the user's gesture, an image covering the user from the chest up to the face; and
the command action extraction unit,
using the data obtained by the image processing unit, performs face recognition to secure a region of interest (ROI) based on the face so that an image including the face and chest can be used as deep learning input, uses time-series data of a specific number of frames as input to an LSTM and a CNN, classifies the result as a specific gesture through the deep learning model, and extracts the command indicated by that gesture.
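The claim says only that the ROI is secured "through the face" so that the face and chest can be fed to the model. A minimal sketch of one way to do that, assuming a face detector has already returned a bounding box (the detector itself is not shown), is to grow that box sideways and downward and clip it to the image. The `Box` type and all three scale factors are hypothetical choices, not values from the document; the subsequent CNN/LSTM classification is also not shown.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x: int  # left edge in pixels
    y: int  # top edge in pixels
    w: int  # width in pixels
    h: int  # height in pixels


def face_to_roi(
    face: Box,
    img_w: int,
    img_h: int,
    side_scale: float = 1.0,
    up_scale: float = 0.25,
    down_scale: float = 2.0,
) -> Box:
    """Grow a detected face box into a face-and-chest region of interest.

    The scale factors are hypothetical defaults chosen for illustration;
    the result is clipped to the image bounds.
    """
    left = max(0, int(face.x - side_scale * face.w))
    top = max(0, int(face.y - up_scale * face.h))
    right = min(img_w, int(face.x + face.w + side_scale * face.w))
    bottom = min(img_h, int(face.y + face.h + down_scale * face.h))
    return Box(left, top, right - left, bottom - top)


print(face_to_roi(Box(100, 50, 40, 40), img_w=640, img_h=480))
# prints Box(x=60, y=40, w=120, h=130)
```

Scaling by the face size, rather than by fixed pixel offsets, keeps the ROI proportionate when the user stands farther from the camera and the face appears smaller.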

The long-distance motion gesture recognition device of claim 1, wherein
the command action extraction unit,
for deep learning, collects images frame by frame and repeats this collection in real time, with a predetermined number of frames overlapping between successive repetitions.
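The frame collection described in this claim behaves like a sliding window over the live frame stream. The minimal sketch below emits fixed-size windows as frames arrive, with consecutive windows sharing `overlap` frames. The window length and overlap are hypothetical parameters, since the document specifies only "a predetermined number of frames". Each emitted window would be one time-series input to the model.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def overlapping_windows(
    frames: Iterable[T], window: int, overlap: int
) -> Iterator[List[T]]:
    """Yield windows of `window` frames as frames arrive.

    Consecutive windows share `overlap` frames, so after the first full
    window a new one is emitted every `window - overlap` frames.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    buf: Deque[T] = deque(maxlen=window)  # oldest frames drop out automatically
    pending = window  # frames still needed before the next window is ready
    for frame in frames:
        buf.append(frame)
        pending -= 1
        if pending == 0:
            yield list(buf)
            pending = window - overlap


print(list(overlapping_windows(range(8), window=4, overlap=2)))
# prints [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Because the function is a generator over any iterable, it can consume a live camera feed one frame at a time, and the bounded `deque` keeps memory use fixed at `window` frames.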

The long-distance motion gesture recognition device of claim 1, further comprising:
a microphone through which the user's voice is input; and
a voice processing unit that processes the voice input to the microphone to obtain data for command extraction;
wherein the command action extraction unit
compares the user's voice, from the data obtained by the voice processing unit, with a plurality of pre-stored reference voices and extracts the command corresponding to the matched reference voice.
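The claim does not say how the user's voice is compared with the stored reference voices. One simple possibility, sketched below under that stated assumption, is to represent each utterance as a fixed-length feature vector, which a voice processing stage would have to produce upstream (not shown). The sketch then picks the reference with the highest cosine similarity above a cut-off. Cosine matching, the feature vectors, and the 0.9 threshold are swapped-in, hypothetical choices, not the document's method.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def match_voice_command(
    user_features: Sequence[float],
    references: List[Tuple[Sequence[float], str]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the command of the most similar reference voice, or None if
    no reference reaches `threshold` (a hypothetical cut-off)."""
    best_command: Optional[str] = None
    best_score = threshold
    for ref_features, command in references:
        score = cosine_similarity(user_features, ref_features)
        if score >= best_score:
            best_command, best_score = command, score
    return best_command


refs = [([1.0, 0.0], "LIGHT_ON"), ([0.0, 1.0], "LIGHT_OFF")]
print(match_voice_command([0.9, 0.1], refs))  # prints LIGHT_ON
```

The threshold lets an utterance that resembles no stored reference fall through to `None` instead of being forced onto the nearest command.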

The long-distance motion gesture recognition device of any one of claims 1 to 4, further comprising:
a main user recognition unit that recognizes a gesture of a main user from the data obtained by the image processing unit, so that the command action extraction unit extracts a corresponding command by performing deep learning-based voice recognition on the gesture of the main user; and
a pointing action extraction unit that processes the image obtained by the camera so as to sequentially identify, over a specified period of time, a device pointed at by the hand in the gesture of the main user recognized by the main user recognition unit;
wherein the command action extraction unit
extracts a corresponding command by performing deep learning-based voice recognition on the gestures made after the main user points with the hand, and
the control unit
performs control so that a control signal corresponding to the command is transmitted to the device identified immediately before the command was extracted by the command action extraction unit.
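The routing rule at the end of this claim, where a command is sent to the device identified immediately before it, can be sketched as a small piece of state that remembers the most recent pointing target. In the sketch below, the class name, device identifiers, timestamps in seconds, and the `max_age_s` validity period are hypothetical. Detecting the main user and the pointed-at device is assumed to happen upstream and is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class CommandRouter:
    """Hypothetical sketch of the routing rule: a command goes to the
    device identified immediately before it."""

    max_age_s: float = 3.0  # hypothetical validity period for a pointing target
    _last: Optional[Tuple[str, float]] = field(default=None, init=False, repr=False)

    def on_pointed(self, device_id: str, t: float) -> None:
        # Called each time the main user's hand is found pointing at a device;
        # a later identification replaces an earlier one.
        self._last = (device_id, t)

    def on_command(self, command: str, t: float) -> Optional[Tuple[str, str]]:
        # Returns (device_id, command) to transmit, or None if no recent target.
        if self._last is None:
            return None
        device_id, t_pointed = self._last
        if not 0.0 <= t - t_pointed <= self.max_age_s:
            return None
        return device_id, command


router = CommandRouter()
router.on_pointed("tv", 1.0)
router.on_pointed("air_conditioner", 2.0)
print(router.on_command("POWER_ON", 3.5))  # prints ('air_conditioner', 'POWER_ON')
```

The age check is one assumed way to keep a stale pointing gesture from silently capturing a command issued much later.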