KR20220043847A

KR20220043847A - Method, apparatus, electronic device and storage medium for estimating object pose

Info

Publication number: KR20220043847A
Application number: KR1020210082024A
Authority: KR
Inventors: 장 차오; 김지연; 류 양; 카오 우에잉; 왕 하오; 장현성; 왕 창; 홍성훈; 리 웨이밍
Original assignee: Samsung Electronics Co., Ltd.
Priority date: 2020-09-29
Filing date: 2021-06-24
Publication date: 2022-04-05
Also published as: CN114332214A

Abstract

The present disclosure relates to an object pose estimation method, a device thereof, an electronic apparatus, and a storage medium. The object pose estimation method determines the reliability of a depth image according to a color image and a depth image of an object, estimates the pose of the object based on three-dimensional key points when the depth image is reliable, and estimates the pose of the object based on two-dimensional key points when the depth image is not reliable.

Description

Object pose estimation method, apparatus, electronic device and storage medium {METHOD, APPARATUS, ELECTRONIC DEVICE AND STORAGE MEDIUM FOR ESTIMATING OBJECT POSE}

The following embodiments relate to the fields of artificial intelligence and augmented reality, and more particularly to an object pose estimation method, an apparatus, an electronic device, and a storage medium.

With the development of information technology and artificial intelligence, society's demand for automation and intelligent systems is growing, and technologies such as virtual reality, autonomous driving, and robotics are receiving increasing attention. Among these, object pose estimation can determine the pose of the camera relative to an object, and the spatial layout around the object can be constructed from that pose information. It therefore plays an important role in technologies such as virtual reality, autonomous driving, and augmented reality.

In the prior art, object pose estimation has mainly been performed on color images. However, the accuracy of this approach is limited in complex applications, so it cannot achieve the accuracy that practical applications require.

Conventional object pose estimation methods can also estimate the pose of an object using a depth image together with a color image, but the resulting pose has been found to be inaccurate depending on the quality of the depth image. That is, when depth data is missing, erroneous, or noisy, the accuracy of the estimated object pose is low.

In addition, when the 6D pose of an object is estimated from a color image and a depth image, the color image and the depth image are processed separately using heterogeneous reception, features are extracted using a fusion network, and the pose is estimated from the extracted features. However, this approach uses the features of only a single object, and accurate pose estimation is difficult when the object is overlapped in the image or shadows fall on it.

Furthermore, estimating an object pose by fusing color image features and depth features consumes a large amount of memory and computing resources, so accurate pose estimation is inefficient and cannot meet the real-time requirements of object pose estimation.

An object of the present invention is to provide an object pose estimation method, an apparatus, an electronic device, and a storage medium.

An object pose estimation method according to an embodiment of the present invention includes: determining the reliability of a depth image according to a color image and the depth image of an object; estimating the pose of the object based on three-dimensional key points when the depth image is reliable; and estimating the pose of the object based on two-dimensional key points when the depth image is not reliable.

Here, determining the reliability of the depth image according to the color image and the depth image may include: extracting image features based on the color image, or based on the color image and the depth image; extracting point cloud features from the depth image; fusing the image features and the point cloud features to obtain fused features; and determining the reliability of the depth image based on the fused features.

Here, determining the reliability of the depth image based on the fused features may include: obtaining an object instance segmentation image and a depth reliability image based on the fused features; and determining, according to the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.

Here, determining the reliability of the depth image according to the color image and the depth image may include extracting image features based on the color image and the depth image, and determining the reliability of the depth image according to the image features.

Here, determining the reliability of the depth image according to the image features may include: obtaining an object instance segmentation image and a depth reliability image according to the image features; and determining, according to the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.
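One way to picture how an object instance segmentation image and a depth reliability image could be combined into a per-object reliability is sketched below. This is only an illustrative sketch: the encoding of the instance image (0 for background, a positive integer per instance), the confidence range [0, 1], the use of a mean, and the function name `per_object_depth_reliability` are all assumptions, not details taken from the text.

```python
from collections import defaultdict


def per_object_depth_reliability(instance_image, reliability_image):
    """Average the depth reliability over the pixels of each object instance.

    instance_image:    2D list of ints; 0 = background, k > 0 = instance id
                       (assumed encoding).
    reliability_image: 2D list of floats in [0, 1] with the same shape
                       (assumed per-pixel depth confidence).
    Returns a dict mapping instance id -> mean reliability of its pixels.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for id_row, rel_row in zip(instance_image, reliability_image):
        for instance_id, reliability in zip(id_row, rel_row):
            if instance_id == 0:
                continue  # background pixels do not belong to any target object
            sums[instance_id] += reliability
            counts[instance_id] += 1
    return {k: sums[k] / counts[k] for k in sums}


if __name__ == "__main__":
    instance_image = [[0, 1, 1],
                      [2, 2, 0]]
    reliability_image = [[0.9, 0.8, 0.6],
                         [0.2, 0.4, 0.5]]
    print(per_object_depth_reliability(instance_image, reliability_image))
```

Each target object then has its own reliability value, so the choice between three-dimensional and two-dimensional key points could be made per object rather than per image.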

Here, obtaining the object instance segmentation image and the depth reliability image according to the image features may include: obtaining, according to the image features, region image features of the image region corresponding to each target object; and, for each target object, determining the depth reliability image of that target object based on the region image features corresponding to it, and obtaining the object instance segmentation image based on the region image features corresponding to each target object.

Here, the object pose estimation method may further include: obtaining, based on the color image and the depth image, a first appearance feature of each target object and geometric relationship features between the target objects; and, for each target object, determining a second appearance feature of that target object based on its first appearance feature, the first appearance features of the other target objects, and the geometric relationship features between that target object and the other target objects.
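The text does not say here how the three inputs are combined into the second appearance feature. The sketch below shows one plausible scheme, stated purely as an assumption: each pair of objects is given a scalar relation score (a hypothetical summary of their geometric relationship feature), the scores are turned into softmax weights, and the weighted sum of the other objects' first appearance features is added to the object's own feature. The names `second_appearance_features` and `relation_scores` are illustrative.

```python
import math


def second_appearance_features(first_features, relation_scores):
    """Augment each object's feature with context from the other objects.

    first_features:  list of N feature vectors (lists of floats).
    relation_scores: N x N nested list; relation_scores[i][j] is a
                     hypothetical scalar derived from the geometric
                     relationship between objects i and j.
    Returns N vectors: own feature + softmax-weighted sum of the others.
    """
    n = len(first_features)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            # A single object has no neighbours to borrow context from.
            result.append(list(first_features[i]))
            continue
        # Numerically stable softmax over the scores of the other objects.
        max_score = max(relation_scores[i][j] for j in others)
        exps = {j: math.exp(relation_scores[i][j] - max_score) for j in others}
        total = sum(exps.values())
        dim = len(first_features[i])
        context = [
            sum(exps[j] / total * first_features[j][d] for j in others)
            for d in range(dim)
        ]
        result.append([own + ctx for own, ctx in zip(first_features[i], context)])
    return result
```

In this sketch an object that is partly occluded can still receive information from neighbouring objects, which is the motivation the text gives for using features of more than one object.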

Here, estimating the pose of the object based on the three-dimensional key points may include estimating the pose of each target object according to the fused features and the second appearance feature of each target object.

Here, estimating the pose of the object based on the three-dimensional key points may include estimating the pose of each target object according to the image features and the second appearance feature of each target object.

Here, obtaining the first appearance feature of each target object and the geometric relationship features between the target objects based on the color image and the depth image may include: extracting image features based on the color image, or based on the color image and the depth image; extracting point cloud features from the depth image; fusing the image features and the point cloud features to obtain fused features; obtaining the first appearance feature of each target object and an object instance segmentation image based on the fused features; and obtaining the geometric relationship features between the target objects based on the object instance segmentation image.

Here, obtaining the first appearance feature of each target object and the geometric relationship features between the target objects based on the color image and the depth image may include: extracting image features based on the color image and the depth image; obtaining, according to the image features, region image features corresponding to the image region of each target object; obtaining the first appearance feature of each target object and a corresponding object detection result based on those region image features; and obtaining the geometric relationship features of each target object based on the object detection result of each target object.

Here, the object pose estimation method may further include determining whether a video frame is an initial frame by detecting whether a target object or a target pose appears for the first time in the video frame.

Here, determining whether the video frame is an initial frame by detecting whether the target object or the target pose appears for the first time in the video frame may include: obtaining an image bounding box of each target object in the video frame; matching the image bounding box of each target object against the image bounding boxes corresponding to the target objects in a pose result list; when a matching target object exists in the pose result list, comparing first point cloud data of the image bounding box corresponding to each target object in the video frame with second point cloud data corresponding to each target object in the previous video frame to determine whether the first point cloud data and the second point cloud data differ, and, if they differ, determining that the pose of the object corresponding to the target object appears for the first time; and, when no matching target object exists in the pose result list, determining that the target object appears for the first time.
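The initial-frame test described above can be sketched roughly as follows. The text does not fix how bounding boxes are matched or how point clouds are compared, so this sketch assumes intersection-over-union (IoU) matching and a centroid-distance test; the thresholds, the dictionary keys `"box"` and `"cloud"`, and all function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def centroid(points):
    """Mean of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def classify_detection(box, cloud, pose_results,
                       iou_threshold=0.5, cloud_threshold=0.05):
    """Return 'new_object', 'new_pose', or 'tracked' for one detection.

    pose_results: list of dicts with the 'box' and 'cloud' recorded for each
                  object in the previous frame (assumed layout).
    """
    best = max(pose_results, key=lambda r: iou(box, r["box"]), default=None)
    if best is None or iou(box, best["box"]) < iou_threshold:
        return "new_object"  # no matching entry in the pose result list
    c_now, c_prev = centroid(cloud), centroid(best["cloud"])
    shift = sum((a - b) ** 2 for a, b in zip(c_now, c_prev)) ** 0.5
    # A changed point cloud is treated as a pose seen for the first time.
    return "new_pose" if shift > cloud_threshold else "tracked"


def is_initial_frame(detections, pose_results):
    """A frame is initial if any detection is a new object or a new pose."""
    return any(
        classify_detection(box, cloud, pose_results) != "tracked"
        for box, cloud in detections
    )
```

A frame classified as initial would go through full pose estimation, while other frames could reuse an earlier result.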

Here, the object pose estimation method may further include: when the video frame is not an initial frame, obtaining a motion parameter corresponding to the video frame and determining a pose result corresponding to the video frame based on the motion parameter and the object pose result of the initial frame corresponding to the video frame; and updating, in the pose result list, the object pose result of the initial frame corresponding to the video frame according to the pose result corresponding to the video frame.
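For a non-initial frame, the pose can be derived from an earlier result instead of being re-estimated. The sketch below assumes that both the stored pose and the motion parameter are 4x4 homogeneous rigid transforms and that the motion maps the coordinate frame of the stored pose to that of the current frame; the text specifies neither the representation nor the composition order, and the function names are hypothetical.

```python
def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
        for i in range(4)
    ]


def propagate_pose(stored_pose, motion):
    """Pose in the current frame = motion applied to the stored pose.

    Assumes 'motion' maps coordinates of the frame in which 'stored_pose'
    was estimated to coordinates of the current frame.
    """
    return matmul4(motion, stored_pose)


def update_pose_list(pose_results, object_id, motion):
    """Replace the stored pose of one object with its propagated pose."""
    pose_results[object_id] = propagate_pose(pose_results[object_id], motion)
    return pose_results


if __name__ == "__main__":
    # Object one unit along x in the initial frame; two units of motion along y.
    initial = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    motion = [[1, 0, 0, 0], [0, 1, 0, 2], [0, 0, 1, 0], [0, 0, 0, 1]]
    print(propagate_pose(initial, motion))
```

A single matrix product per object is far cheaper than running the estimation networks again, which is consistent with the real-time goal stated in the text.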

An object pose estimation apparatus according to an embodiment of the present invention includes: an image reliability determining unit configured to determine the reliability of a depth image according to a color image and the depth image of an object; and a pose estimation unit configured to estimate the pose of the object based on three-dimensional key points when the depth image is reliable, and based on two-dimensional key points when the depth image is not reliable.

FIG. 1 is a flowchart illustrating a process of estimating an object pose according to an embodiment.
FIG. 2 is a diagram illustrating a process of estimating an object pose based on a single feature extraction according to an embodiment.
FIG. 3 is a diagram illustrating a process of estimating an object pose based on two feature extractions according to an embodiment.
FIG. 4 is a flowchart illustrating a process of estimating an object pose based on a depth image according to an embodiment.
FIG. 5 is a flowchart illustrating a process of obtaining fused features by fusing image features and point cloud features according to an embodiment.
FIG. 6 is a diagram illustrating a process of estimating an object pose based on a single feature extraction and using the second appearance feature of a target object according to an embodiment.
FIG. 7 is a diagram illustrating a process of estimating an object pose based on two feature extractions and using the second appearance feature of a target object according to an embodiment.
FIG. 8 is a flowchart illustrating a process of estimating an object pose according to another embodiment.
FIG. 9 is a diagram illustrating a process of estimating an object pose based on a color image and a depth image according to an embodiment.
FIG. 10 is a flowchart illustrating a process of estimating an object pose based on an initial frame of a video according to an embodiment.
FIG. 11 is a diagram illustrating examples of three initial frame selection methods according to an embodiment.
FIG. 12 is a flowchart illustrating a process of determining whether a video frame is an initial frame based on object detection according to an embodiment.
FIG. 13 is a flowchart illustrating a process of checking for a new object or a new pose according to an embodiment.
FIG. 14 is a diagram illustrating a schematic configuration of an object pose estimation apparatus according to an embodiment.
FIG. 15 is a diagram illustrating a schematic configuration of an electronic device according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Because various changes may be made to the embodiments, the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for description only and should not be construed as limiting. A singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "comprise" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. Where a detailed description of related known technology is judged likely to obscure the gist of an embodiment unnecessarily, that detailed description is omitted.

In describing the components of the embodiments, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another and do not limit the nature, sequence, or order of the components. When a component is described as being "connected", "coupled", or "joined" to another component, the component may be directly connected or joined to that other component, but it should be understood that yet another component may also be "connected", "coupled", or "joined" between them.

A component included in one embodiment and a component having a common function are described using the same name in other embodiments. Unless stated otherwise, a description given for one embodiment also applies to the other embodiments, and detailed descriptions are omitted where they would overlap.

Related technologies are described below to aid understanding of the present disclosure.

An object is a target with six degrees of freedom in space, such as a cup, a monitor, or a book, and is also called a 6-DOF (six degrees of freedom) object.

Augmented reality adds virtual content to the scene displayed in front of the user, giving the user a realistic information experience. To achieve a high-quality fusion of the virtual and the real in front of the user, an augmented reality system needs high-precision, real-time processing and understanding of the three-dimensional state of surrounding objects in three-dimensional space.

Pose estimation represents the structure and shape of an object with a geometric model or structure, extracts features of the object to establish correspondences between the model and the image, and estimates the spatial pose of the object by geometric or other methods. The model used may be a simple geometric body, such as a plane, a cylinder, or some other geometric structure, or it may be a 3D model obtained by laser scanning or other means.

6D pose estimation from an image estimates, for a given single image containing color and depth information, the 6D pose of a target object in the image. A 6D pose comprises a three-dimensional position and a three-dimensional spatial orientation, and is also called a 6-DOF pose.
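As a notational aid, a 6D pose is commonly written as a rigid transform. The symbols below ($R$, $\mathbf{t}$, $T$, $\mathbf{p}_{\text{obj}}$, $\mathbf{p}_{\text{cam}}$) are standard notation introduced here for illustration and do not appear in the original text.

```latex
T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
\qquad R \in SO(3), \quad \mathbf{t} \in \mathbb{R}^{3},
\qquad \mathbf{p}_{\text{cam}} = R\,\mathbf{p}_{\text{obj}} + \mathbf{t}
```

The rotation $R$ carries the three orientation degrees of freedom and the translation $\mathbf{t}$ carries the three position degrees of freedom, which together give the six degrees of freedom of the pose.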

A multi-layer perceptron (MLP) network is a multi-layer neural network, a type of feedforward artificial neural network.

Semantic segmentation separates the target object from the background.

Object detection determines the location of an object in an image.

Instance segmentation determines which pixels belong to each target object.

The present disclosure relates to an object pose estimation method that addresses at least one of the problems described above. When an object pose is estimated based on a color image and a depth image, the method can effectively improve the accuracy of the estimate without depending on the depth image being of reliable quality.

Hereinafter, an object pose estimation method, an apparatus, an electronic device, and a storage medium according to an embodiment of the present disclosure are described in detail with reference to the accompanying FIGS. 1 to 15.

FIG. 1 is a flowchart illustrating a process of estimating an object pose according to an embodiment.

Referring to FIG. 1, the object pose estimation apparatus determines the reliability of a depth image according to a color image and the depth image of an object (110).

When the depth image is reliable, the object pose estimation apparatus estimates the pose of the object based on three-dimensional key points (120).

When the depth image is not reliable, the object pose estimation apparatus estimates the pose of the object based on two-dimensional key points (130).
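The control flow of steps 110 to 130 can be summarized in a few lines. This is a minimal sketch of the branching only: the three callables stand in for the reliability determination and the two pose estimators, and their names and signatures are placeholders rather than anything defined in the text.

```python
def estimate_object_pose(color_image, depth_image,
                         depth_is_reliable, pose_from_3d_keypoints,
                         pose_from_2d_keypoints):
    """Select the key point type according to the depth image reliability.

    depth_is_reliable(color, depth)      -> bool  (step 110, placeholder)
    pose_from_3d_keypoints(color, depth) -> pose  (step 120, placeholder)
    pose_from_2d_keypoints(color, depth) -> pose  (step 130, placeholder;
                                                   it may ignore the depth)
    """
    if depth_is_reliable(color_image, depth_image):
        return pose_from_3d_keypoints(color_image, depth_image)
    return pose_from_2d_keypoints(color_image, depth_image)


if __name__ == "__main__":
    # Stub estimators that only report which branch was taken.
    pose = estimate_object_pose(
        "color", "depth",
        depth_is_reliable=lambda c, d: False,
        pose_from_3d_keypoints=lambda c, d: "pose from 3D key points",
        pose_from_2d_keypoints=lambda c, d: "pose from 2D key points",
    )
    print(pose)
```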

Here, the color image and the depth image are images that correspond to the same scene and contain the same objects; the type and number of objects are not limited. The embodiments of the present disclosure do not limit how the color image and the corresponding depth image are acquired. For example, they may be captured simultaneously by an ordinary image acquisition device and a depth image acquisition device, or they may be captured by a single image or video acquisition device that supports both color and depth acquisition. For example, an RGBD (Red+Green+Blue+Depth) image, that is, an image containing the three primary colors red, green, and blue together with depth information, may be captured by an image acquisition device, and the depth image and the color image may be obtained from that RGBD image.
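Obtaining the color image and the depth image from one RGBD image amounts to separating the channels. The sketch below assumes the RGBD image is stored as rows of `(r, g, b, depth)` tuples, which is an illustrative layout only; real devices and libraries use their own formats.

```python
def split_rgbd(rgbd_image):
    """Split an RGBD image into a color image and a depth image.

    rgbd_image: list of rows, each row a list of (r, g, b, depth) tuples
                (assumed layout).
    Returns (color_image, depth_image) with the same height and width.
    """
    color_image = [[(r, g, b) for (r, g, b, _depth) in row] for row in rgbd_image]
    depth_image = [[depth for (_r, _g, _b, depth) in row] for row in rgbd_image]
    return color_image, depth_image


if __name__ == "__main__":
    rgbd = [[(255, 0, 0, 1.2), (0, 255, 0, 1.3)],
            [(0, 0, 255, 0.0), (10, 20, 30, 2.5)]]
    color, depth = split_rgbd(rgbd)
    print(color)
    print(depth)
```

Because both images come from the same pixels, they are aligned by construction and contain the same objects.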

As for the color image, the object pose estimation apparatus may be configured to use either a grayscale image or a color image, depending on the requirements of the actual application. For example, it may use a color image to obtain a better pose estimation result, or a grayscale image to improve the efficiency of pose estimation. For convenience, the following description uses a color image as an example.

In practice, many application scenarios require continuous, real-time object pose estimation. In such cases, an RGBD video of the object may optionally be captured with a video acquisition device. Each frame of the video is an RGBD image, and the color image and depth image of the object can be extracted from the same video frame, so that a corresponding color image and depth image are obtained for every video frame. The objects in the color image and the depth image corresponding to the same video frame are identical, and there may be one or more objects. Real-time object pose estimation can then be achieved by applying the method of the present disclosure to the color image and depth image of each captured video frame.

The object pose estimation apparatus may extract and process image features from the color image and the depth image, and use the processed image features to judge the reliability of the depth image. Specifically, judging the reliability of the depth image may mean judging the reliability of its depth data. A depth image generally contains various kinds of data, such as depth data and contour data, and judging the reliability of the depth image may specifically mean judging whether its depth data can be trusted.

The reliability of the depth image may be judged by setting a reliability threshold, or by setting a threshold on the proportion of reliable pixels. If the reliability corresponding to the depth image exceeds the reliability threshold, or if the proportion of pixels counted as reliable exceeds the proportion threshold, the depth image is considered reliable.
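The two threshold rules can be sketched as below. The text does not say how an image-level reliability is computed from per-pixel values, so the mean is used here purely as an assumption; the default threshold values and the function name `depth_image_is_reliable` are likewise hypothetical.

```python
def depth_image_is_reliable(pixel_reliabilities,
                            reliability_threshold=0.5,
                            ratio_threshold=0.6,
                            use_ratio_rule=False):
    """Decide whether a depth image (or an object's region of it) is reliable.

    pixel_reliabilities: flat list of per-pixel reliability values in [0, 1].
    use_ratio_rule=False: compare the mean reliability with the reliability
                          threshold (the mean is an assumed aggregate).
    use_ratio_rule=True:  count pixels above the reliability threshold and
                          compare their proportion with the ratio threshold.
    """
    if not pixel_reliabilities:
        return False  # nothing to trust when there are no depth pixels
    if use_ratio_rule:
        reliable = sum(1 for r in pixel_reliabilities if r > reliability_threshold)
        return reliable / len(pixel_reliabilities) > ratio_threshold
    mean = sum(pixel_reliabilities) / len(pixel_reliabilities)
    return mean > reliability_threshold


if __name__ == "__main__":
    values = [0.51, 0.51, 0.51, 0.0]
    print(depth_image_is_reliable(values))                       # mean rule
    print(depth_image_is_reliable(values, use_ratio_rule=True))  # ratio rule
```

The two rules can disagree: a few pixels with very low reliability pull the mean down, whereas the ratio rule only counts how many pixels pass, so the choice between them affects how tolerant the decision is to isolated bad depth readings.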

When the depth image is reliable, the object pose estimation apparatus estimates the pose of the object based on three-dimensional key points; when the depth image is not reliable, it estimates the pose of the object based on two-dimensional key points.

Here, the two-dimensional key points may be obtained using only the color image. In that case, among the network structures described in the present disclosure, the MLP network used for two-dimensional key point extraction, whose input is in point cloud feature form, is replaced by a CNN network whose input is in image feature form. Alternatively, both the two-dimensional key points and the three-dimensional key points may be obtained based on the depth image and the color image.

When the depth image is reliable, the object pose estimation apparatus estimates the pose of the object based on three-dimensional key points. Optionally, the apparatus may obtain the three-dimensional key points based on the depth image and the color image, and estimate the pose of the object using them. The data of these three-dimensional key points includes the depth data provided by the depth image. Compared with approaches that use only a color image, the apparatus of the present disclosure estimates the object pose from both the color image and the depth image. This increases the amount of underlying data, which helps improve the accuracy and precision of object pose estimation, and allows an accurate object pose to be obtained even under conditions such as object occlusion, sensor noise, and insufficient lighting.

When the depth image is not reliable, the object pose estimation apparatus estimates the pose of the object based on two-dimensional key points. Optionally, the two-dimensional key points of the image may be obtained from the color image and the contour data of the depth image, and the pose is estimated using these key points. Because the depth data of the depth image is unreliable in this case, estimating the object pose from the depth data could introduce errors into the final pose. Therefore, when the depth image is not reliable, the apparatus can estimate the pose based on the color image and the object contour data of the depth image, without relying on the integrity of the depth data. As a result, when depth data is missing, erroneous, or noisy, pose estimation based on the color image and the object contour data can improve the accuracy of the estimated pose compared with object pose estimation based on the color image alone.

In the present disclosure, the object pose estimation apparatus estimates the object pose using a depth image and a color image. First, the apparatus determines the reliability of the depth image from the color image and the depth image. When the depth image is reliable, the apparatus estimates the pose of the object based on three-dimensional key points. When the depth image is not reliable, that is, when depth data is missing, erroneous, or noisy, the apparatus estimates the pose based on two-dimensional key points, which reduces the dependence on a complete depth image and increases the accuracy of object pose estimation. In addition, estimating the object pose by adaptively selecting the depth image or the color image according to the reliability determination result improves the robustness of object pose estimation.

Hereinafter, each optional embodiment provided in the present disclosure is described in detail.

In an optional embodiment, the reliability of the depth image may be determined from the color image and the depth image of the object by either of the following methods, Method A or Method B.

Method A may include the following steps.

A1: Extract image features based on the color image, or based on the color image and the depth image.

A2: Extract point cloud features from the depth image.

A3: Fuse the image features and the point cloud features to obtain fusion features.

A4: Determine the reliability of the depth image based on the fusion features.

It should be understood that some of the above steps need not be performed in a fixed order; for example, steps A1 and A2 may be performed in either order.

Method A provided in this embodiment may extract image features based on the color image alone, or based on the color image and the depth image. The image features may be extracted by an image feature extraction network, an image feature extraction algorithm, or the like. Because a later step of Method A fuses the image features with the point cloud features, the fused features still include the depth data, contour data, and other data of the depth image even when the image features are extracted from the color image alone.

The method provided by one optional embodiment is shown in FIG. 2; each step of FIG. 2 is described in detail later. In Method A, the image features are extracted from the color image and the depth image by an image feature extraction network. That is, Method A inputs the color image and the depth image to the image feature extraction network. When the color image is a color (RGB) image, the input to the image feature extraction network is an H*W four-channel image obtained by concatenating (splicing) the color image and the depth image pixel by pixel. Here, H is the height of the image, W is the width of the image, and the four channels are the three channels corresponding to the RGB data of the color image and one channel corresponding to the depth data of the depth image. The output of the image feature extraction network includes an image feature vector for each pixel.
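The pixel-by-pixel concatenation into an H*W four-channel input can be sketched in plain Python. The nested-list image layout and the function name are assumptions of this sketch; a real implementation would more likely use array tensors.

```python
def stack_rgbd(rgb, depth):
    """Concatenate an H*W RGB image and an H*W depth image pixel by pixel.

    rgb:   H rows of W (r, g, b) tuples
    depth: H rows of W depth values
    Returns H rows of W (r, g, b, d) tuples, i.e. a four-channel image.
    """
    height, width = len(rgb), len(rgb[0])
    if len(depth) != height or any(len(row) != width for row in depth):
        raise ValueError("color and depth images must have the same H and W")
    return [
        [(*rgb[y][x], depth[y][x]) for x in range(width)]
        for y in range(height)
    ]
```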

Optionally, Method A may extract point cloud features from the depth image. Method A first converts the depth image into a point cloud to obtain point cloud data corresponding to the depth image, and then extracts point cloud features from the point cloud data. The point cloud features may be extracted by a point cloud feature extraction network. The input of this network is the point cloud data, and its output includes a point cloud feature vector for each three-dimensional point, from which a point cloud feature vector for each pixel is obtained. The point cloud feature vector may be regarded as a geometric feature vector.
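The depth-to-point-cloud conversion is not detailed in the text. A common way to do it, assumed here, is pinhole back-projection with camera intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical parameters of this sketch). Keying the result by pixel coordinates keeps the pixel-to-point correspondence that per-pixel fusion relies on.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points with a pinhole camera model.

    depth: H rows of W depth values; values <= 0 are treated as missing.
    Returns a dict mapping pixel (u, v) to its 3D point (x, y, z).
    """
    points = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels without a valid depth measurement
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points[(u, v)] = (x, y, z)
    return points
```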

Method A fuses the obtained image features and point cloud features to obtain fusion features. Specifically, for each pixel in the image, Method A fuses the image feature vector of that pixel, obtained by image feature extraction, with the point cloud feature vector of that pixel, obtained by point cloud feature extraction, to obtain the fusion feature.

Optionally, the fusion operation may be dense fusion. The fusion operation may concatenate (splice) the image feature vector and the point cloud feature vector, or may project the image feature vector and/or the point cloud feature vector into another feature space and use the resulting feature vector in that space as the fusion feature of the pixel. Fusing by projection into another feature space can filter out interfering information in the image and reduce information redundancy in the fusion features.
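Both fusion variants can be sketched per pixel. Here the projection is modeled as a plain linear map applied to the concatenated vector; in practice the weights would be learned, and the matrix used below is purely illustrative. All names are assumptions of this sketch.

```python
def project(vector, weights):
    """Apply a linear map (list of weight rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def fuse_pixelwise(image_features, point_features, weights=None):
    """Fuse per-pixel image and point cloud feature vectors.

    image_features / point_features: dicts mapping pixel (u, v) to a vector.
    Without weights the vectors are concatenated (spliced); with weights the
    concatenated vector is projected into another feature space.
    Pixels lacking a point cloud feature (e.g. missing depth) are skipped.
    """
    fused = {}
    for pixel, image_vec in image_features.items():
        if pixel not in point_features:
            continue
        combined = list(image_vec) + list(point_features[pixel])
        fused[pixel] = project(combined, weights) if weights else combined
    return fused
```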

Compared with determining reliability from the color image alone, Method A, which determines the reliability of the depth image based on the fusion features, yields a more accurate result and helps improve the efficiency of determining depth image reliability.

Optionally, the reliability of the depth image is determined from the fusion features as follows. A depth reliability image composed of the depth data of each pixel is obtained based on the fusion features, and whether the depth image is reliable is determined according to the depth data of each pixel of the depth reliability image and a preset reliability threshold or ratio threshold.

Method B may include the following steps.

B1: Extract image features based on the color image and the depth image.

B2: Determine the reliability of the depth image from the image features.

Method B provided in an optional embodiment is shown in FIG. 3; each step of FIG. 3 is described in detail later. Method B uses an image feature extraction network to extract features from the color image and the depth image, obtaining image features. When the image contains at least two target objects, Method B performs region segmentation on the image regions of candidate target objects to obtain the image region (ROI, Region of Interest) of each target object. Method B applies pooling to each image region to obtain the image features of that region, and obtains the depth reliability image corresponding to the target object from the pooled image features. Method B then compares the depth reliability image corresponding to the target object with a preset reliability threshold or ratio threshold to determine the reliability of the image data.

In an optional embodiment, the reliability of the depth image is determined based on the fusion features in a manner that includes the following steps.

C1: Obtain an object instance segmentation image and a depth reliability image based on the fusion features.

C2: Determine, from the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.

As shown in FIG. 2, after obtaining the fusion features of the image features and the point cloud features, the object pose estimation apparatus can predict the reliability of the depth image based on the fusion features and determine the depth reliability image corresponding to the image. The apparatus can also perform semantic segmentation and center offset estimation based on the fusion features to segment object instances and obtain the object instance segmentation image. The reliability of the depth image corresponding to each target object in the color image is then determined from the object instance segmentation image and the depth reliability image.

Specifically, the object pose estimation apparatus can obtain the pixels of the image region corresponding to each target object from the object instance segmentation image, and the depth data of each pixel of that image region from the depth reliability image. That is, the apparatus can obtain the depth image corresponding to each target object. The apparatus can then determine the reliability of the depth image corresponding to each target object according to a preset reliability threshold or ratio threshold. Specifically, if the depth data of the depth image of a target object exceeds the preset threshold, or if the ratio of pixels counted as reliable is greater than the preset threshold, the apparatus determines that the depth image of that target object is reliable; otherwise, it determines that the depth image of that target object is not reliable.
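The per-object decision can be sketched by combining an instance label map with a per-pixel reliability map and applying the ratio criterion to each object separately. The label convention (0 for background), the names, and the threshold values are assumptions of this sketch.

```python
def per_object_reliability(instance_map, reliability_map,
                           pixel_threshold=0.5, ratio_threshold=0.6):
    """Decide depth reliability separately for each segmented object.

    instance_map:    H rows of W integer labels (0 = background, assumed)
    reliability_map: H rows of W per-pixel reliability scores
    Returns a dict mapping object label to True (reliable) or False.
    """
    counts = {}  # label -> (total pixels, pixels counted as reliable)
    for labels, scores in zip(instance_map, reliability_map):
        for label, score in zip(labels, scores):
            if label == 0:
                continue
            total, reliable = counts.get(label, (0, 0))
            counts[label] = (total + 1, reliable + (1 if score > pixel_threshold else 0))
    return {
        label: reliable / total > ratio_threshold
        for label, (total, reliable) in counts.items()
    }
```

An object whose result is True would then go to the three-dimensional key point branch, and one whose result is False to the two-dimensional branch.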

Because an image may contain one or more target objects, some of which may be reliable and others not, the reliability of each individual target object cannot be derived from a reliability determination made for the depth image of the whole image. To address this, the method provided in the present disclosure uses the fusion features to obtain the object instance segmentation image corresponding to each target object, then obtains the depth image corresponding to each target object, and determines the reliability of that depth image against the preset reliability threshold, thereby determining the reliability of each target object in the image.

Alternatively, the reliability of the depth image may be determined for the image as a whole rather than separately for each object. That is, even if the image contains more than one target object, a single determination can be made for the entire image. If the result is that the depth image is reliable, the apparatus may estimate the pose of every target object using the pose estimation method based on three-dimensional key points. If the result is that the depth image is not reliable, the apparatus may estimate the pose of every target object using the pose estimation method based on two-dimensional key points. This approach reduces the amount of data processed during reliability determination and improves determination efficiency.

In an optional embodiment provided by the present disclosure, determining the reliability of the depth image from the image features may include the following steps.

D1: Obtain an object instance segmentation image and a depth reliability image from the image features.

D2: Determine, from the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.

As shown in FIG. 3, image features are obtained based on the color image and the depth image, and region processing is applied to the image features. The region processing may be performed by a region proposal network, which extracts image regions that may belong to target objects. Based on the image region of each target object, the apparatus obtains the object instance segmentation image and the depth reliability image corresponding to the target object, determines the depth image corresponding to the target object from them, and determines the reliability of that depth image by comparing the depth data of each of its pixels with a preset threshold.

The method provided in this embodiment determines the reliability of the depth image corresponding to a target object based on the image features: it segments the target objects based on the image features and obtains the depth image corresponding to each target object, which reduces the amount of data processed to determine depth image reliability.

According to an optional embodiment, the object pose estimation apparatus may obtain the object instance segmentation image and the depth reliability image from the image features. This may be implemented by a method including the following steps.

E1: Obtain, from the image features, the region image features of the image region corresponding to each target object.

E2: For each target object, determine the depth reliability image of that target object based on its region image features, and obtain the object instance segmentation image based on the region image features corresponding to each target object.

As shown in FIG. 3, the object pose estimation apparatus may determine image features based on the color image and the depth image, and apply region processing to the image features.

The object pose estimation apparatus may segment the image into regions for the target objects using a region proposal network or a region segmentation algorithm, obtaining the image region corresponding to each target object. The apparatus then extracts features from the image region of each target object to obtain region image features, so that the image is segmented at the target object level. The region image features may be obtained by pooling over the object image region (ROI pooling).
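ROI pooling can be illustrated on a single-channel feature map. The text does not state the pooling type or output size, so max pooling into a 2x2 grid is assumed here, and the box format `(y0, x0, y1, x1)` with exclusive end coordinates is a convention of this sketch only.

```python
def roi_max_pool(feature_map, box, out_h=2, out_w=2):
    """Max-pool the region `box` of a 2D feature map into an out_h x out_w grid.

    box: (y0, x0, y1, x1), end coordinates exclusive; the box must be non-empty.
    Each output cell takes the maximum over its (at least one pixel) bin.
    """
    y0, x0, y1, x1 = box
    height, width = y1 - y0, x1 - x0
    if height < 1 or width < 1:
        raise ValueError("ROI must contain at least one pixel")
    pooled = []
    for i in range(out_h):
        ys = y0 + (i * height) // out_h
        ye = max(ys + 1, y0 + ((i + 1) * height) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * width) // out_w
            xe = max(xs + 1, x0 + ((j + 1) * width) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Regions of different sizes all map to the same fixed-size output, which is what lets a later fully connected layer consume them.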

For each target object, the apparatus may process the region image features corresponding to that target object through a fully connected layer of a neural network. After determining the depth reliability image corresponding to the target object, the apparatus may determine whether the depth reliability image is reliable by comparing it against a preset threshold for the depth data corresponding to that target object.

For each target object, the apparatus may also perform object instance segmentation on the image using a neural network (for example, a convolutional neural network) based on the region image features corresponding to that target object, obtaining the object instance segmentation image. The object instance segmentation image provides the contour information of the target object and the detailed information of the object within the contour.

On this basis, the apparatus may determine the depth image corresponding to each target object from the object instance segmentation image and the corresponding depth reliability image, and determine the reliability of the depth image of that target object according to the preset threshold corresponding to the depth image.

The method provided in the present disclosure segments regions at the target object level based on the image features, and obtains the contour and detailed information of each target object through neural network processing, yielding an accurate segmentation of the target object. This helps improve the accuracy of the reliability determination for the depth image corresponding to the target object.

In this embodiment, the apparatus may additionally estimate the pose of the object based on the depth image, or based on the color image, using the object instance segmentation image and the region image features corresponding to each target object. The specific steps are as follows.

E3: Obtain the image features of each target object from the object instance segmentation image and the corresponding region image features of that target object.

E4: For each target object, estimate the pose of the object based on the image features of that target object.

As shown in FIG. 3, the apparatus may process the object instance segmentation image together with the corresponding region image features, for example by multiplying them, to obtain the image features corresponding to the target object.
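The multiplication can be read as an element-wise product between a per-pixel mask and the region feature vectors, which zeroes out features outside the object. This reading, the binary mask, and the nested-list layout are assumptions of this sketch.

```python
def mask_region_features(region_features, mask):
    """Multiply each pixel's feature vector by its mask value.

    region_features: H rows of W feature vectors (lists of floats)
    mask:            H rows of W values, 1 inside the object and 0 outside
    """
    return [
        [[channel * m for channel in feature] for feature, m in zip(feature_row, mask_row)]
        for feature_row, mask_row in zip(region_features, mask)
    ]
```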

For each target object, the image features of that target object serve as the base data for object pose estimation (for example, pose estimation based on two-dimensional key points or on three-dimensional key points). Obtaining the two-dimensional or three-dimensional key point offset image of the object from the image features of the target object helps improve the accuracy of those offset images.

In step 120 of FIG. 1, the object pose estimation apparatus estimates the pose of the object based on three-dimensional key points. Step 120 may proceed as shown in FIG. 4.

FIG. 4 is a flowchart illustrating a process of estimating an object pose based on a depth image according to an embodiment.

Referring to FIG. 4, the object pose estimation apparatus extracts image features based on the color image, or based on the color image and the depth image (410).

The apparatus then extracts point cloud features based on the depth image (420).

The apparatus then fuses the image features and the point cloud features to obtain fusion features (430).

The apparatus then estimates the pose of the object from the fusion features (440).

The method provided in the present disclosure may extract image features based on the color image alone, or based on the color image and the depth image. The image features may be extracted by an image feature extraction network, an image feature extraction algorithm, or the like. Because the image features are subsequently fused with the point cloud features, the fused features include the depth data, contour data, and other data of the depth image even when the image features are extracted from the color image alone.

The method provided in an optional embodiment may be as shown in FIG. 2. In this method, image features are extracted from the color image and the depth image by an image feature extraction network. That is, the apparatus may input the color image and the depth image to the image feature extraction network to extract image features. The output of the network includes an image feature vector for each pixel.

Optionally, the apparatus extracts point cloud features from the depth image. The apparatus first converts the depth image into a point cloud to obtain point cloud data corresponding to the depth image, and then extracts point cloud features from the point cloud data. The apparatus may extract the point cloud features using a point cloud feature extraction network, whose input is the point cloud data and whose output includes a point cloud feature vector for each three-dimensional point.

The object pose estimation apparatus fuses the acquired image features with the point cloud features to obtain fusion features. Specifically, for each pixel in the image, the image feature vector and the point cloud feature vector of that pixel are fused to obtain the fusion feature. The fusion operation may be dense fusion.
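As a non-limiting illustration, per-pixel fusion may be sketched as below. The disclosure does not fix the fusion operator; concatenation of the two feature vectors is an assumption of this sketch, and the name `dense_fuse` is hypothetical. The sketch also assumes that entry i of both inputs refers to the same pixel.

```python
from typing import List, Sequence


def dense_fuse(
    image_features: Sequence[Sequence[float]],
    point_features: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Fuse per-pixel image and point cloud feature vectors.

    Concatenation is an assumed fusion operator; entry i of both inputs
    must describe the same pixel.
    """
    if len(image_features) != len(point_features):
        raise ValueError("both inputs must cover the same pixels")
    return [
        list(img_vec) + list(pt_vec)
        for img_vec, pt_vec in zip(image_features, point_features)
    ]
```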

The object pose estimation apparatus may determine the reliability of the depth image based on the fusion features. Compared with determining the reliability from the color image alone, determining it from the fusion features yields a more accurate result and helps obtain the reliability of the depth image more efficiently.

The two-dimensional key point offset image and the three-dimensional key point offset image may be obtained from the fusion features through deep learning. The object pose estimation apparatus then estimates the pose of the object according to the reliability determination result for the depth image: when the depth image is reliable, the pose is estimated from the three-dimensional key point offset image, and when the depth image is not reliable, the pose is estimated from the two-dimensional key point offset image.
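As a non-limiting illustration, the selection between the two estimation branches may be sketched as below. The function name and the two callables are hypothetical placeholders for the three-dimensional and two-dimensional key point pipelines; the returned label is included only to make the chosen branch visible.

```python
from typing import Callable, Tuple, TypeVar

PoseT = TypeVar("PoseT")


def estimate_object_pose(
    depth_is_reliable: bool,
    pose_from_3d_offsets: Callable[[], PoseT],
    pose_from_2d_offsets: Callable[[], PoseT],
) -> Tuple[str, PoseT]:
    """Run only the branch selected by the depth reliability result."""
    if depth_is_reliable:
        # Reliable depth: use the 3D key point offset image.
        return "3d", pose_from_3d_offsets()
    # Unreliable depth: fall back to the 2D key point offset image.
    return "2d", pose_from_2d_offsets()
```

Passing the branches as callables means the unused branch is never evaluated.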

As shown in FIG. 3, the object pose estimation apparatus performs regionalization processing on the target object using the image features to obtain instance segmentation image features of the target object, extracts point cloud features from the depth image to obtain instance segmentation point cloud features, fuses the segmented image features with the segmented point cloud features, estimates a three-dimensional key point offset image based on the fusion features, and then estimates the pose of the object based on the three-dimensional key point offset image.

Here, the three channels of the three-dimensional key point offset image represent, for each pixel, the deviation vector from the 3D coordinates of the object point corresponding to that pixel to the 3D coordinates of a reference key point preset on the object.

Specifically, the pose of the object may be estimated from the three-dimensional key point offset image as follows. For any pixel of the target object, the object pose estimation apparatus determines a straight line that passes through the preset reference key point of the object. From the three-dimensional key point offset image, the apparatus thus obtains N straight lines passing through the preset reference key point, one for each of the N pixels in the pixel region of the target object. The apparatus then determines the three-dimensional coordinates of the preset reference key point of the target object by voting. After the three-dimensional coordinates of M key points on the target object have been obtained in this way, the three-dimensional rotation and three-dimensional translation between these points and the M corresponding points on the three-dimensional model of the object are estimated by least squares, which yields the 6D pose of the object.
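As a non-limiting illustration, the voting step for one reference key point may be sketched as below. The disclosure does not specify the voting rule; this sketch assumes that each pixel casts one vote equal to its 3D point plus its predicted offset, and that the votes are aggregated with a per-coordinate median to limit the influence of outliers. The name `vote_keypoint_3d` is hypothetical. The subsequent least-squares estimation of rotation and translation is not shown.

```python
from statistics import median
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def vote_keypoint_3d(points: Sequence[Vec3], offsets: Sequence[Vec3]) -> Vec3:
    """Aggregate per-pixel votes for one 3D reference key point.

    Each vote is point + offset; the per-coordinate median is an assumed
    aggregation rule chosen for robustness to outlier votes.
    """
    if not points or len(points) != len(offsets):
        raise ValueError("need one offset per point and at least one point")
    votes = [
        (p[0] + o[0], p[1] + o[1], p[2] + o[2])
        for p, o in zip(points, offsets)
    ]
    return (
        median(v[0] for v in votes),
        median(v[1] for v in votes),
        median(v[2] for v in votes),
    )
```

Repeating the call once per reference key point would give the M three-dimensional key points used for the alignment with the object model.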

As shown in FIG. 3, the object pose estimation apparatus may extract image features based on the color image and the depth image, and determine the reliability of the depth image based on the image features to obtain a reliability determination result. When the depth image is reliable, the apparatus fuses the image features with the point cloud features extracted from the depth image and acquires three-dimensional key points based on the fused features.

When there are a plurality of target objects, the object pose estimation apparatus fuses the image features and the point cloud features corresponding to each target object, and acquires three-dimensional key points based on the fused features.

Specifically, the object pose estimation apparatus fuses the image features and the point cloud features to obtain the fusion features.

Object pose estimation based on the fusion features proceeds as follows. Using a tool for drawing 3D images, such as an MPL tool, the object pose estimation apparatus estimates a three-dimensional key point offset image from the fusion features of the target object, and estimates the object pose based on the three-dimensional key point offset image.

Meanwhile, the step of estimating the pose of the object based on two-dimensional key points, provided in step 130 of FIG. 1, may proceed as follows.

F1: Extract image features based on the color image and the depth image.

F2: Estimate the pose of the object according to the image features.

This method estimates the pose of the object based on the color image when the depth image is not reliable. The object pose estimation apparatus acquires contour data of the object from the depth image, and estimates the pose of the object according to the color image and the contour data of the object in the depth image. Because the depth data of the object is not used as a basis for the pose estimation, pose estimation according to the image features is also referred to as object pose estimation based on two-dimensional key points.

Optionally, the pose of the object may be estimated from the image features as follows. The object pose estimation apparatus uses a convolutional neural network to estimate a two-dimensional key point offset image of the target object from the image features of the target object, and estimates the pose of the object based on the two-dimensional key point offset image.

Optionally, the object pose estimation apparatus performs regionalization processing on the target object according to the image features to obtain region image features, and obtains instance segmentation image features based on the region image features. The apparatus then uses the instance segmentation image features as the input of the convolutional neural network to estimate the two-dimensional key point offset image, and estimates the pose of the object based on that image.

Here, the two channels of each pixel of the two-dimensional key point offset image represent the deviation vector from the two-dimensional coordinates of the object point corresponding to that pixel to the two-dimensional coordinates of a reference point preset on the target object.

Specifically, the pose of the object may be estimated from the two-dimensional key point offset image as follows. For any pixel of the target object, the object pose estimation apparatus determines a straight line that passes through the corresponding preset reference point. From the two-dimensional key point offset image, the apparatus thus obtains N straight lines passing through the preset reference point, one for each of the N pixels in the pixel region of the target object. The apparatus then obtains the two-dimensional coordinates of the preset reference point on the target object by voting. After the two-dimensional coordinates of the M preset reference points on the target object have been obtained in this way, the apparatus uses the M corresponding key points on the three-dimensional model and a PnP (Perspective-n-Point) algorithm to calculate the 3D rotation and 3D translation between the camera coordinate system and the object coordinate system, which yields the 6D pose of the object.
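As a non-limiting illustration, the voting over N straight lines may be sketched as below. The disclosure does not specify the voting rule; this sketch assumes that the reference point is taken as the point minimizing the sum of squared perpendicular distances to all lines, where each line passes through a pixel along its predicted offset direction. The name `intersect_lines_2d` is hypothetical. The M resulting 2D points, paired with their 3D model points, could then be passed to a PnP solver, which is not shown.

```python
import math
from typing import Sequence, Tuple

Vec2 = Tuple[float, float]


def intersect_lines_2d(pixels: Sequence[Vec2], directions: Sequence[Vec2]) -> Vec2:
    """Least-squares point closest to N lines (assumed voting rule).

    Line i passes through pixels[i] along directions[i]. With unit
    direction d, the projector (I - d d^T) measures perpendicular error,
    so the solution satisfies sum(I - d d^T) x = sum((I - d d^T) p).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in zip(pixels, directions):
        norm = math.hypot(dx, dy)
        if norm == 0.0:
            continue  # a zero offset defines no line; skip this vote
        dx, dy = dx / norm, dy / norm
        m11, m12, m22 = 1.0 - dx * dx, -dx * dy, 1.0 - dy * dy
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * py
        b2 += m12 * px + m22 * py
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are parallel or there are too few lines")
    # Closed-form solution of the symmetric 2x2 system.
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```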

The step of obtaining the fusion features by fusing the image features and the point cloud features, provided in step 430 of FIG. 4, may be implemented in the manner of FIG. 5 below.

FIG. 5 is a flowchart illustrating a process of obtaining fused features by fusing image features and point cloud features, according to an embodiment.

Referring to FIG. 5, the object pose estimation apparatus obtains an object instance segmentation image according to the image features (510).

The object pose estimation apparatus then acquires the image features and the point cloud features of each target object according to the object instance segmentation image (520).

The object pose estimation apparatus then fuses the image features and the point cloud features of the same target object to obtain the fusion features of each target object (530).

As shown in FIG. 3, the object pose estimation apparatus performs regionalization processing based on the image features to obtain region image features, and performs object instance segmentation based on the region image features to obtain an object instance segmentation image. The apparatus obtains the instance segmentation image features, that is, the image features of each target object, according to the region image features and the object instance segmentation image. The apparatus also obtains the instance segmentation geometric features, that is, the point cloud features of each target object, according to the instance segmentation image and the point cloud features. The apparatus then fuses the image features and the point cloud features corresponding to the same target object to obtain the fusion features of each target object.

The embodiment of FIG. 3 provides another way of obtaining the fusion features of a target object, which is applicable when one or more target objects are present in the color image. Compared with extracting features from the color image and the depth image and fusing them directly, this approach refines the image features and the point cloud features of each target object through regionalization processing, instance segmentation, and similar processing, and the two rounds of feature extraction help obtain more accurate fusion features.

The object pose estimation apparatus then estimates the pose of the object according to the fusion features. That is, it estimates the pose of each target object according to the fusion features of that target object.

Here, the object pose estimation apparatus estimates the pose of each target object according to its fusion features, performing the following operations for every target object.

The object pose estimation apparatus estimates a three-dimensional key point offset image based on the fusion features of the target object, estimates the pose of that target object based on the three-dimensional key point offset image, and in this way acquires the pose of each target object in turn.

When there are several target objects, the object pose estimation apparatus performs regionalization processing and instance segmentation on every target object to obtain accurate fusion features for each, and then estimates the pose of each target object based on its fusion features. Estimating the pose of each target object from its own fusion features is effective for obtaining accurate poses of all target objects in the color image.

For the case where an image contains a plurality of target objects, the present disclosure provides an embodiment that estimates poses based on the relationships among the target objects.

H1: Based on the color image and the depth image, the object pose estimation apparatus acquires a first appearance feature of each target object and geometric relationship features between the target objects.

H2: For each target object, the object pose estimation apparatus acquires a second appearance feature of that target object based on the first appearance feature of that target object, the first appearance features of the other target objects, and the geometric relationship features between that target object and the other target objects.

As shown in FIG. 6, the object pose estimation apparatus may perform feature extraction and fusion on the color image and the depth image, and obtain the first appearance feature of each target object and the geometric relationship features between the target objects based on the fusion features. The steps of FIG. 6 are described in detail later. Here, the apparatus obtains an object instance segmentation image based on the fusion features, and obtains the geometric relationship features between the target objects based on the object instance segmentation image.

The first appearance feature of each target object may be obtained from the fusion features as follows. Based on the fusion features, the object pose estimation apparatus predicts, by voting, the center point coordinates of the target object to which each pixel of the input data belongs, together with the appearance feature of that target object. It then clusters the predicted center point coordinates to obtain the plurality of target objects and the pixels belonging to each, and, for each target object, fuses the features of the pixels belonging to it to obtain the first appearance feature of that target object.
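As a non-limiting illustration, the clustering of center votes and the fusion of pixel features may be sketched as below. The disclosure names neither the clustering algorithm nor the fusion operator; this sketch assumes two-dimensional center votes, a simple greedy distance-threshold clustering, and mean pooling of pixel features. The names `cluster_center_votes`, `pool_features`, and `radius` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cluster_center_votes(
    votes: Sequence[Tuple[float, float]], radius: float
) -> List[List[int]]:
    """Greedily group pixel indices whose center votes lie close together.

    A vote joins the first cluster whose running mean is within `radius`;
    otherwise it starts a new cluster (an assumed clustering rule).
    """
    centers: List[Tuple[float, float]] = []
    members: List[List[int]] = []
    for idx, (x, y) in enumerate(votes):
        for k, (cx, cy) in enumerate(centers):
            if math.hypot(x - cx, y - cy) <= radius:
                members[k].append(idx)
                n = len(members[k])
                # Incrementally update the running mean of the cluster.
                centers[k] = (cx + (x - cx) / n, cy + (y - cy) / n)
                break
        else:
            centers.append((x, y))
            members.append([idx])
    return members


def pool_features(
    features: Sequence[Sequence[float]], members: Sequence[Sequence[int]]
) -> List[List[float]]:
    """Mean-pool the pixel features of each cluster (assumed fusion rule)."""
    pooled = []
    for group in members:
        dim = len(features[group[0]])
        pooled.append(
            [sum(features[i][d] for i in group) / len(group) for d in range(dim)]
        )
    return pooled
```

Each pooled vector stands in for the first appearance feature of one target object.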

As shown in FIG. 7, the object pose estimation apparatus may extract image features based on the color image and the depth image, obtain region image features through regionalization processing based on the image features, and obtain the first appearance feature of each target object and the geometric relationship features between the target objects based on the region image features. The steps of FIG. 7 are described in detail later. Here, the apparatus detects objects based on the region image features, and obtains the geometric relationship features between the target objects based on the object detection results.

For each target object, the object pose estimation apparatus determines the second appearance feature of that target object based on its first appearance feature, the first appearance features of the other target objects, and the geometric relationship features between that target object and the other target objects, and estimates the pose of the object based on the second appearance feature.
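As a non-limiting illustration, one possible way to combine these three inputs is sketched below. The disclosure does not define the combination; this sketch assumes that each pairwise geometric relationship has already been reduced to a scalar score, that the scores are normalized with a softmax over the other objects, and that the weighted sum of the other objects' first appearance features is added to the object's own first appearance feature. The name `second_appearance_features` and the input `relation_score` are hypothetical.

```python
import math
from typing import List, Sequence


def second_appearance_features(
    first: Sequence[Sequence[float]],
    relation_score: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Enrich each object's feature with relation-weighted context.

    relation_score[i][j] is an assumed scalar summary of the geometric
    relationship between objects i and j; higher means more relevant.
    """
    n = len(first)
    result: List[List[float]] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            # A single object has no context; keep its feature unchanged.
            result.append(list(first[i]))
            continue
        peak = max(relation_score[i][j] for j in others)
        exps = {j: math.exp(relation_score[i][j] - peak) for j in others}
        total = sum(exps.values())
        context = [
            sum(exps[j] / total * first[j][d] for j in others)
            for d in range(len(first[i]))
        ]
        result.append([own + ctx for own, ctx in zip(first[i], context)])
    return result
```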

Meanwhile, the object pose estimation apparatus may optionally estimate the pose of the object from the fusion features as follows. The apparatus estimates the pose of each target object according to the fusion features and the second appearance feature of that target object.

As shown in FIG. 6, the object pose estimation apparatus fuses the relationship features, using the fusion features of the target object and the second appearance feature corresponding to that target object, and estimates a two-dimensional key point offset image and a three-dimensional key point offset image. When the depth image is not reliable, the apparatus estimates the pose of the object based on the two-dimensional key point offset image, and when the depth image is reliable, it estimates the pose of the object based on the three-dimensional key point offset image.

Embodiments of the present disclosure further provide another method of estimating the pose of an object according to the fusion features. As shown in FIG. 7, the object pose estimation apparatus performs regionalization processing based on the image features to obtain region image features, performs object instance segmentation based on the region image features to obtain an object instance segmentation image, and obtains segmented image features based on the region image features and the object instance segmentation image. The apparatus then fuses the segmented image features, the segmented point cloud features, and the second appearance feature of the target object, estimates a three-dimensional key point offset image based on the fusion features, and estimates the pose of the object based on the estimated three-dimensional key point offset image.

In the method of estimating the pose of the object according to the image features, the pose of each target object is estimated according to the image features and the second appearance feature of that target object.

Specifically, the object pose estimation apparatus extracts image features based on the color image and the depth image, obtains region image features based on the image features, obtains segmented image features based on the region image features, estimates a two-dimensional key point offset image according to the image features of the target object and the corresponding second appearance feature, and estimates the pose of the target object based on the estimated two-dimensional key point offset image.

The method of the present disclosure introduces relationship features of the target objects: it extracts and fuses geometric relationship information between multiple objects and adds the first appearance features of the other target objects to determine the second appearance feature of each target object. This improves the accuracy and robustness of object pose estimation under occlusion, truncation of the object by the field of view, and small objects.

In an embodiment of the present disclosure, the first appearance feature of each target object and the geometric relationship features between the target objects may be obtained from the color image and the depth image through at least one of the following methods: Method 1 (P1-P5) and Method 2 (Q1-Q4).

Method 1 (P1-P5) is as follows.

P1: Extract image features based on the color image, or based on the color image and the depth image.

P2: Extract point cloud features according to the depth image.

P3: Fuse the image features and the point cloud features to obtain fusion features.

P4: Based on the fusion features, obtain the first appearance feature of each target object and an object instance segmentation image.

P5: Based on the object instance segmentation image, obtain the geometric relationship features between the target objects.

In Method 1, the object pose estimation apparatus determines the appearance feature of each target object and the object instance segmentation image based on the fusion features, and obtains the geometric relationship features between the target objects based on the object instance segmentation image. Because Method 1 relies on a single round of feature extraction, it can determine the appearance features of the target objects and the geometric relationship features between them quickly.

Method 2 (Q1-Q4) is as follows.

Q1: Extract image features based on the color image and the depth image.

Q2: According to the image features, obtain the region image features corresponding to the image region of each target object.

Q3: Based on the region image features corresponding to the image region of each target object, obtain the first appearance feature of each target object and the corresponding object detection result.

Q4: Based on the object detection result of each target object, obtain the geometric relationship features between the target objects.

Method 2 relies on two rounds of feature extraction, one for the image features and one for the region image features, through which the object pose estimation apparatus obtains the first appearance feature of each target object and the geometric relationship features between the target objects. The two rounds of feature extraction help improve the accuracy of the first appearance features and of the geometric relationship features.
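As a non-limiting illustration, a geometric relationship feature between two detected objects may be sketched as below. The disclosure does not define the form of this feature; the sketch assumes that each object detection result is an axis-aligned box given as (center x, center y, width, height) and uses normalized center offsets and logarithmic size ratios. The name `pairwise_geometry` is hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center_x, center_y, width, height)


def pairwise_geometry(box_i: Box, box_j: Box) -> Tuple[float, float, float, float]:
    """Describe box_j relative to box_i (an assumed feature definition).

    Offsets are divided by the size of box_i and sizes are compared as
    log-ratios, so the feature does not change when both boxes are
    translated together or scaled by the same factor.
    """
    xi, yi, wi, hi = box_i
    xj, yj, wj, hj = box_j
    if min(wi, hi, wj, hj) <= 0.0:
        raise ValueError("box width and height must be positive")
    return (
        (xj - xi) / wi,
        (yj - yi) / hi,
        math.log(wj / wi),
        math.log(hj / hi),
    )
```

Evaluating the function for every ordered pair of detected objects would give one relationship vector per pair.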

Meanwhile, the color image and the depth image are obtained from one color-depth frame of a video. When the video frame corresponding to the color image and the depth image is an initial frame of the video, the reliability of the depth image may be determined according to the color image and the depth image.

The color image and the depth image are extracted from a video frame that includes color information and depth information. When the video frame corresponding to the color image and the depth image is an initial frame of the video, that is, when no pose reference information is available from any earlier video frame, the object pose estimation apparatus determines the reliability of the depth image using the color image and the depth image. The reliability of the depth image may be determined using any one of the embodiments described above.

FIG. 8 is a flowchart illustrating a process of estimating an object pose according to another embodiment.

Referring to FIG. 8, after the pose estimation result of the object is obtained in steps 120 and 130 of FIG. 1, the method further includes the following steps.

The object pose estimation apparatus stores the pose estimation result of the object in a pose result list (140). The pose result includes each target object and the object pose corresponding to that target object. So that subsequent pose estimation can build on the object poses already obtained, the pose estimation results are stored in a pre-configured pose result list. That is, the object poses in the pose result list serve as a reference that simplifies subsequent object pose acquisition, improves its real-time performance, and reduces the amount of data to be processed.

On this basis, when the video frame corresponding to the color image and the depth image is not an initial frame, the object pose estimation apparatus acquires the motion parameters of the video acquisition device corresponding to that video frame, and determines the pose result corresponding to that video frame based on the motion parameters and the object pose result of the initial frame corresponding to that video frame (150).

그리고, 객체 포즈 추정 장치는, 해당 비디오 프레임에 대응하는 포즈 결과에 따라, 포즈 결과 목록에서 해당 비디오 프레임에 대응하는 초기 프레임의 포즈 결과를 업데이트한다(160).Then, the object pose estimation apparatus updates the pose result of the initial frame corresponding to the video frame in the pose result list according to the pose result corresponding to the video frame ( 160 ).

하나의 비디오는 복수의 비디오 프레임 시퀀스를 가질 수 있고, 각 비디오 프레임 시퀀스에는 하나의 초기 프레임만 있고, 나머지 비디오 프레임은 초기 프레임이 아니다. 컬러 이미지 및 깊이 이미지에 대응하는 비디오 프레임이 초기 프레임이 아닌 경우, 즉 적어도 하나의 타겟 객체와 대응하는 객체 포즈가 포즈 결과 목록에 저장되어 있는 경우(포즈 결과 목록에 적어도 초기 프레임에 대응하는 포즈 결과 목록이 저장된 경우), 컬러 이미지 및 깊이 이미지에 대응하는 비 초기 프레임에 대한 비디오 획득 장치의 모션 파라미터를 획득한다. 해당 비디오 획득 장치의 모션 파라미터는 하나의 상대적 모션 파라미터일 수 있다. 해당 상대적 모션 파라미터는 초기 프레임에 대한 비 초기 프레임의 비디오 프레임의 획득 장치의 모션 파라미터로, 해당 상대적 모션 파라미터와 해당 비디오 프레임 시퀀스의 초기 프레임의 객체 포즈 결과를 기반으로, 해당 비디오 프레임의 객체 포즈 결과가 결정된다.One video may have a plurality of video frame sequences, each video frame sequence has only one initial frame, and the remaining video frames are not initial frames. When the video frame corresponding to the color image and the depth image is not an initial frame, that is, when an object pose corresponding to at least one target object is stored in the pose result list (a pose result corresponding to at least an initial frame in the pose result list) list is stored), and acquires motion parameters of the video acquisition device for non-initial frames corresponding to color images and depth images. The motion parameter of the corresponding video acquisition device may be one relative motion parameter. The corresponding relative motion parameter is a motion parameter of the acquiring device of the video frame of the non-initial frame with respect to the initial frame. Based on the relative motion parameter and the object pose result of the initial frame of the corresponding video frame sequence, the object pose result of the corresponding video frame is decided
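As an illustration of how a relative motion parameter can carry an initial-frame pose into a non-initial frame, the following minimal sketch composes two 4x4 homogeneous transforms. It assumes, beyond what the text states, that the object pose maps object coordinates to camera coordinates and that the relative motion maps initial-camera coordinates to current-camera coordinates; the names `matmul4`, `propagate_pose`, and `translation` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Return the 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def propagate_pose(cam_motion_init_to_cur: Matrix, obj_pose_init: Matrix) -> Matrix:
    """Object pose in the current camera frame.

    Assumed convention (not specified in the source):
    - obj_pose_init maps object coordinates to initial-camera coordinates;
    - cam_motion_init_to_cur maps initial-camera to current-camera coordinates.
    """
    return matmul4(cam_motion_init_to_cur, obj_pose_init)


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """Pure translation as a homogeneous transform (helper for the example)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


# The object sits 2 m in front of the initial camera. The camera then moves
# 0.5 m forward, so points shift by -0.5 m along z in the current camera frame.
obj_init = translation(0.0, 0.0, 2.0)
motion = translation(0.0, 0.0, -0.5)
obj_cur = propagate_pose(motion, obj_init)
```

With these conventions the object ends up 1.5 m in front of the current camera. A different convention (for example camera-to-world poses) would change the order of multiplication or require an inverse.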

The object pose estimation apparatus obtains the pose results corresponding to all target objects in the video frame, stores the pose results, each consisting of a target object of the video frame and its corresponding object pose, in the pose result list, and updates the pose result list by treating the pose result corresponding to the video frame as the pose result of the initial frame corresponding to that video frame.

More specifically, the object pose estimation apparatus uses the pose result corresponding to the initial frame and the motion parameter of the video acquisition device to determine the object pose of each non-initial frame in the video frame sequence that contains the initial frame; in other words, it estimates the object pose from the difference between earlier and later video frames. Estimating the pose of a non-initial frame from the estimated pose of the initial frame improves the efficiency and the real-time performance of object pose acquisition. In addition, because adjacent video frames differ little, updating the pose result list with the pose result of a non-initial frame helps reduce the data processing required for the next video frame.

Several ways of determining whether a video frame in a video is an initial frame are further provided below.

For any video frame, if the video frame is the first frame of the video, it may be regarded as an initial frame of the video.

Alternatively, for any video frame, object detection may be performed on the video frame to determine the target objects and object poses in it, and whether the video frame is an initial frame may be determined by detecting whether a target object or an object pose appears for the first time in that frame. When a target object or an object pose appears for the first time in a video frame, that video frame is an initial frame of the video.

FIG. 10 is a flowchart illustrating a process of estimating an object pose based on an initial frame of a video according to an embodiment.

Referring to FIG. 10, the object pose estimation apparatus reads a video frame of an RGBD image (1010).

The object pose estimation apparatus obtains a color image and a depth image of the object from the RGBD image, and determines whether the video frame is an initial frame (1012).

If step 1012 determines that the frame is an initial frame, the object pose estimation apparatus performs object instance segmentation using a convolutional neural network and estimates the 6-DOF pose of the object based on the object instance segmentation image (1014).

The object pose estimation apparatus then obtains each target object and its corresponding object pose, and updates the pose result list accordingly (1016).

If step 1012 determines that the frame is not an initial frame, the object pose estimation apparatus obtains the 6-DOF camera motion parameters of the current video frame and, for each object instance in the pose result list, uses those camera motion parameters to transform the pose of the object instance in the initial frame into the current video frame, thereby obtaining the object pose for the non-initial frame (1018).
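The flow of FIG. 10 can be sketched as a single dispatch function. This is only an outline under stated assumptions: the estimator, the camera motion source, and the pose composition are passed in as callables, and all names (`process_frame`, `estimate_poses`, `get_camera_motion`, `apply_motion`) are hypothetical placeholders rather than anything named in the disclosure.

```python
from typing import Callable, Dict, Hashable, TypeVar

Pose = TypeVar("Pose")
Motion = TypeVar("Motion")


def process_frame(
    frame: object,
    is_initial: bool,
    pose_list: Dict[Hashable, Pose],
    estimate_poses: Callable[[object], Dict[Hashable, Pose]],
    get_camera_motion: Callable[[object], Motion],
    apply_motion: Callable[[Motion, Pose], Pose],
) -> Dict[Hashable, Pose]:
    """Return the object poses for one frame, following the two paths of FIG. 10."""
    if is_initial:
        # Steps 1014 and 1016: full estimation, then update the pose result list.
        pose_list.update(estimate_poses(frame))
        return dict(pose_list)
    # Step 1018: reuse the stored initial-frame poses and the camera motion
    # of the current frame relative to the initial frame.
    motion = get_camera_motion(frame)
    return {obj: apply_motion(motion, pose) for obj, pose in pose_list.items()}


# Toy usage: a "pose" is just a distance along z and a "motion" is an offset.
pose_list: Dict[str, float] = {}
first = process_frame({"id": 0}, True, pose_list,
                      lambda f: {"cup": 2.0}, lambda f: 0.0, lambda m, p: p + m)
later = process_frame({"id": 3}, False, pose_list,
                      lambda f: {}, lambda f: -0.5, lambda m, p: p + m)
```

In this sketch the pose result list keeps the initial-frame poses unchanged on non-initial frames; an implementation that follows step 160 of FIG. 8 would additionally write the propagated poses back into the list.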

Note that FIG. 10 shows only one optional method and does not limit the embodiments of the present application.

The selection of the initial frame may be illustrated as in FIG. 11.

FIG. 11 is a diagram illustrating examples of three initial frame selection methods according to an embodiment.

For any video frame, if the video frame is the first frame of the video, it is an initial frame of the video. When a video has only one initial frame, that initial frame is the first frame of the video, which corresponds to initial frame selection method (1) of FIG. 11.

A video may also be divided into a plurality of video frame sequences according to a preset period. The first frame of each video frame sequence is the initial frame of that sequence, which corresponds to initial frame selection method (2).
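Initial frame selection method (2) can be illustrated with a few lines of code. The sketch assumes that the preset period is expressed as a number of frames (the text does not say whether it is a frame count or a time interval); `initial_frame_indices` and `sequence_index` are hypothetical names.

```python
from typing import List


def initial_frame_indices(num_frames: int, period: int) -> List[int]:
    """Indices of the initial frames when a video is split every `period` frames."""
    if period <= 0:
        raise ValueError("period must be a positive number of frames")
    return list(range(0, num_frames, period))


def sequence_index(frame_index: int, period: int) -> int:
    """Index of the video frame sequence that a given frame belongs to."""
    return frame_index // period


# A 10-frame video with a period of 4 frames yields three sequences.
starts = initial_frame_indices(10, 4)
```

With `period` equal to or larger than the number of frames, this reduces to method (1), where the first frame is the only initial frame.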

The present disclosure further provides initial frame selection method (3), which may divide a video into at least one video frame sequence, for example by means of the initial frame selection controller shown for initial frame selection method (3) in FIG. 11. The initial frame selection controller monitors for new objects and new poses, and determines that a video frame in which a new object or a new pose appears is the initial frame of the corresponding video frame sequence.

Specifically, the object pose estimation apparatus performs object detection on each video frame of the video, determines the target objects and poses in the video frame from the detection result, and thereby determines whether a target object or an object pose appears for the first time in that frame. For example, the apparatus may compare the target objects and object poses of the current video frame with those in the pose result list to judge whether any of them appears for the first time. When at least one new target object or new object pose appears in a video frame, that video frame is determined to be an initial frame of the video or video clip.

With the methods described above, the present disclosure can determine whether a video frame is an initial frame and perform the corresponding processing according to the result. When the video frame is not an initial frame, the object pose estimation apparatus estimates the object pose using the object pose of the initial frame and the motion parameter corresponding to the video, which improves the real-time performance of object pose estimation.

Determining whether a video frame is an initial frame according to whether a new object or a new pose appears in it is well suited to object pose estimation.

Compared with setting only the first frame of a video as the initial frame, selecting initial frames according to whether a new object or a new pose appears helps reduce the amount of data processing needed to estimate object poses in non-initial video frames from the initial frame.

FIG. 12 is a flowchart illustrating a process of determining whether a video frame is an initial frame based on object detection, according to an embodiment.

Referring to FIG. 12, the object pose estimation apparatus acquires a new RGBD video frame (1210).

The apparatus then obtains the target objects and object poses in the video frame through object detection (1212), and compares them against the pose result list, that is, the current list of object instances and their 6-DOF poses (1214).

Based on the comparison result, the apparatus determines whether there is a new object instance or a new pose (1216).

If step 1216 finds no new object instance and no new pose, the apparatus returns to step 1210 and acquires a new frame.

If step 1216 finds a new object instance or a new pose, the apparatus sets the current video frame as an initial frame (1218).
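The loop of FIG. 12 can be sketched as follows. The sketch assumes that object detection has already produced, for each frame, a mapping from an object identifier to a pose, and that a caller-supplied predicate decides whether two poses differ; both the data layout and the names `find_initial_frames` and `pose_changed` are illustrative assumptions.

```python
from typing import Callable, Dict, Hashable, Iterable, List


def find_initial_frames(
    detections_per_frame: Iterable[Dict[Hashable, object]],
    pose_changed: Callable[[object, object], bool],
) -> List[int]:
    """Return the indices of frames that would be set as initial frames."""
    known: Dict[Hashable, object] = {}  # stands in for the pose result list
    initial: List[int] = []
    for index, detections in enumerate(detections_per_frame):      # step 1210
        # Steps 1212-1216: compare detections with the current list.
        new_object = any(obj not in known for obj in detections)
        new_pose = any(obj in known and pose_changed(known[obj], pose)
                       for obj, pose in detections.items())
        if new_object or new_pose:
            initial.append(index)                                  # step 1218
            known.update(detections)
    return initial


# Toy usage: poses are plain numbers and "changed" means "not equal".
frames = [{"cup": 0}, {"cup": 0}, {"cup": 0, "box": 1}, {"cup": 5, "box": 1}]
initial_frames = find_initial_frames(frames, lambda old, new: old != new)
```

Here frame 0 introduces the cup, frame 2 introduces the box, and frame 3 shows the cup in a new pose, so those three frames are selected.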

After determining, by any of the methods described above, whether a video frame is an initial frame, the object pose estimation apparatus proceeds as follows. If the frame is an initial frame, the apparatus determines the reliability of the depth data of the depth image and estimates the object pose using the method selected by that determination. If the frame is not an initial frame, the apparatus estimates the object pose using the method provided in steps 150 to 160 of FIG. 8.

Detecting whether a target object or an object pose appears for the first time in a video frame may also be implemented through the following steps.

K1: Obtain the image bounding box of each target object in the video frame.

K2: Match the image bounding box of each target object against the image bounding boxes of the target objects in the pose result list.

K3: If the pose result list contains a matching target object, compare first point cloud data, taken from the image bounding box corresponding to the target object in the video frame, with second point cloud data corresponding to the same target object in the preceding video frame, to determine whether the two differ. If they differ, determine that the object pose of the target object appears for the first time.

K4: If the pose result list contains no matching target object, determine that the target object appears for the first time.

The detection method of K1 to K4 compares the image bounding box of a target object in the video frame with the image bounding boxes of the target objects in the pose result list. If the pose result list contains no image bounding box that matches, the target object appears in the video frame for the first time. If the pose result list contains a matching image bounding box, the target object does not appear for the first time.

For a target object that does not appear for the first time, the object pose estimation apparatus further examines its object pose by comparing the first point cloud data of the image bounding box corresponding to the target object in the video frame with the second point cloud data corresponding to the same target object in the preceding video frame. If the first and second point cloud data are the same, the pose of the target object does not appear for the first time in the video frame. If they differ, the object pose of the target object appears for the first time.

FIG. 13 is a flowchart illustrating a process of checking for a new object or a new pose, according to an embodiment.

Referring to FIG. 13, the object pose estimation apparatus identifies the image bounding box of each object found by object detection (1310).

For each object in the current object instance list, the apparatus then compares the two image bounding boxes to check whether they match and overlap (1312).

The apparatus then checks whether an object with a matching image bounding box can be found (1314).

If step 1314 finds a detected object whose image bounding box matches no object in the list, the apparatus determines that a new target object has been detected (1316).

If step 1314 finds an object with a matching image bounding box, the apparatus checks whether the point cloud inside the bounding box has changed compared with the frame preceding the current video frame (1318).

If step 1318 finds that the point cloud of the bounding box has changed, the apparatus determines that a new object pose has been detected (1320).

If step 1318 finds that the point cloud of the bounding box has not changed, the apparatus determines that neither a new object nor a new pose has been detected (1322).
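Steps K1 to K4 and the flow of FIG. 13 can be sketched as below. The source does not specify how bounding boxes are matched or how point clouds are compared, so this sketch assumes intersection-over-union with a threshold for matching and a symmetric mean nearest-neighbor distance with a tolerance for the point cloud comparison. The function names and both thresholds are hypothetical, and the point clouds are assumed to be non-empty.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cloud = Sequence[Tuple[float, float, float]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union


def mean_nn_distance(src: Cloud, dst: Cloud) -> float:
    """Mean distance from each point in src to its nearest point in dst."""
    return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)


def classify_detection(box: Box, cloud: Cloud,
                       known: List[Tuple[Box, Cloud]],
                       iou_thresh: float = 0.5,
                       cloud_tol: float = 0.01) -> str:
    """Return 'new_object', 'new_pose', or 'unchanged' for one detection.

    `known` pairs each listed object's bounding box with its point cloud
    from the preceding frame.
    """
    best = max(known, key=lambda item: iou(box, item[0]), default=None)
    if best is None or iou(box, best[0]) < iou_thresh:
        return "new_object"                       # K4 / step 1316
    prev_cloud = best[1]
    diff = max(mean_nn_distance(cloud, prev_cloud),
               mean_nn_distance(prev_cloud, cloud))
    return "new_pose" if diff > cloud_tol else "unchanged"  # 1320 / 1322


known = [((0.0, 0.0, 10.0, 10.0), [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)])]
same = classify_detection((0.0, 0.0, 10.0, 10.0),
                          [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)], known)
moved = classify_detection((0.0, 0.0, 10.0, 10.0),
                           [(0.0, 0.0, 1.5), (1.0, 0.0, 1.5)], known)
fresh = classify_detection((50.0, 50.0, 60.0, 60.0),
                           [(5.0, 5.0, 1.0)], known)
```

The nearest-neighbor search here is quadratic in the number of points, which is acceptable for a sketch; a practical system would more likely subsample the clouds or use a spatial index.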

The present disclosure thus provides a way of determining whether a target object or an object pose appears for the first time in a video frame. When either a target object or an object pose is determined to appear for the first time, the video frame is determined to be an initial frame, and the object pose is estimated according to the method provided in steps 110 to 140 of FIG. 8. Judging the target object by its image bounding box makes it possible to determine quickly and accurately whether the target object is new, without requiring more detailed data. Comparing the point cloud data within the image bounding box makes it possible to determine accurately whether an object pose appears for the first time.

To better understand and explain the schemes provided by the embodiments of the present disclosure, several specific examples are described below. In FIGS. 2, 3, 6, and 7, rectangles with square corners represent operation steps, and rectangles with rounded corners represent processing results.

Example 1

FIG. 9 is a diagram illustrating a process of estimating an object pose based on a color image and a depth image, according to an embodiment.

Referring to FIG. 9, the object pose estimation apparatus obtains a color image (901) and a depth image (902) of the same object. It extracts image features from the color image (901) and the depth image (902) using an image convolution network (911), extracts point cloud features from the point cloud data of the depth image (902) using a point cloud feature extraction network (912), and then fuses (920) the image features and the point cloud features to obtain a fusion feature.

Based on the fusion feature, the object pose estimation apparatus performs the following two branches of operations.

In the first branch, the apparatus performs depth reliability prediction (931) based on the fusion feature to obtain a depth reliability image, performs semantic segmentation (932) of the object based on the fusion feature to obtain an object semantic segmentation image, and performs object center offset estimation (933) to obtain an object center image. Still in this branch, instance segmentation (934) is performed on the image based on the object semantic segmentation image and the object center image to obtain an object instance segmentation image.

The apparatus then determines (940) the reliability of the depth data based on the object instance segmentation image and the depth reliability image.
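One way to picture the reliability determination (940) is to average the predicted depth reliability over the pixels of one object instance and compare the result with a threshold. This rule is an assumption made for illustration; the actual criterion is defined in the earlier embodiments, and the names `depth_is_reliable` and `choose_branch` as well as the threshold value are hypothetical.

```python
from typing import List, Sequence


def depth_is_reliable(confidence: Sequence[Sequence[float]],
                      instance_mask: Sequence[Sequence[int]],
                      threshold: float = 0.5) -> bool:
    """Mean per-pixel depth reliability inside the instance mask vs. a threshold."""
    values: List[float] = [
        c
        for conf_row, mask_row in zip(confidence, instance_mask)
        for c, m in zip(conf_row, mask_row)
        if m
    ]
    if not values:
        return False  # no pixels for this instance: treat depth as unreliable
    return sum(values) / len(values) >= threshold


def choose_branch(reliable: bool) -> str:
    """Select the key point type used for pose estimation."""
    return "3d_keypoints" if reliable else "2d_keypoints"


confidence = [[0.9, 0.1],
              [0.8, 0.2]]
left_object = [[1, 0],
               [1, 0]]   # instance covering the left column
right_object = [[0, 1],
                [0, 1]]  # instance covering the right column
```

In this toy input the left instance has a mean reliability of 0.85 and would use three-dimensional key points, while the right instance would fall back to two-dimensional key points.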

In the second branch, the apparatus performs relational feature fusion (950) on the object instance segmentation image and the fusion feature to obtain a fused feature.

According to the result of the depth data reliability determination, the apparatus decides whether to use two-dimensional or three-dimensional key points for object pose estimation. Specifically, when the depth data is unreliable, the apparatus performs two-dimensional key point offset estimation (961) based on the fused feature to obtain a two-dimensional key point offset image. It then performs two-dimensional key point voting (962) according to that image to obtain the 2D coordinates of the preset reference key points, and estimates (964) the 6D pose with the EPnP algorithm (963) from these 2D coordinates and the preset 3D coordinates of the reference key points.

When the depth data is reliable, the apparatus performs three-dimensional key point offset estimation (971) based on the fused feature to obtain a three-dimensional key point offset image, performs three-dimensional key point voting (972) according to that image, and estimates (974) the 6D pose through least squares fitting (973).
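The key point voting steps (962, 972) can be illustrated with a deliberately simple aggregation: every pixel or point of an instance adds its predicted offset to its own position, and the resulting votes are averaged. Averaging is an assumption chosen for brevity; the source does not state the aggregation rule, and practical systems often use a clustering scheme instead. `vote_keypoint` is a hypothetical name.

```python
from typing import Sequence, Tuple


def vote_keypoint(points: Sequence[Sequence[float]],
                  offsets: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Aggregate per-point votes (point + predicted offset) into one key point.

    Works for 2D pixel coordinates and 3D points alike, as long as every
    point and its offset have the same dimension.
    """
    if not points or len(points) != len(offsets):
        raise ValueError("points and offsets must be non-empty and equal in length")
    votes = [tuple(p_i + o_i for p_i, o_i in zip(p, o))
             for p, o in zip(points, offsets)]
    dim = len(votes[0])
    return tuple(sum(v[d] for v in votes) / len(votes) for d in range(dim))


# Three pixels whose predicted offsets all point at image location (2, 2).
pixels = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
pixel_offsets = [(2.0, 2.0), (-2.0, 2.0), (2.0, -2.0)]
keypoint_2d = vote_keypoint(pixels, pixel_offsets)
```

The same function applies to the three-dimensional branch when the points are taken from the point cloud and the offsets from the three-dimensional key point offset image.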

Example 2

FIG. 2 is a diagram illustrating a process of estimating an object pose based on a single feature extraction, according to an embodiment.

Referring to FIG. 2, the object pose estimation apparatus obtains a color image (201) and a depth image (202) of the same object. It extracts image features from the color image (201) and the depth image (202) using an image feature extraction network (211), and extracts point cloud features from the point cloud data of the depth image (202) using a point cloud feature extraction network (212).

The object pose estimation apparatus fuses (220) the obtained image features and point cloud features. Here, the fusion may be dense fusion. Based on the fusion feature, the apparatus performs the following three branches of operations.

First, the apparatus predicts depth reliability for the image based on the fusion feature and obtains a depth reliability image (231).

Second, the apparatus performs semantic segmentation of the object based on the fusion feature to obtain an object semantic segmentation image (232), and obtains an object center image (233) through object center offset estimation.

Third, the apparatus obtains a two-dimensional key point offset image (234) and a three-dimensional key point offset image (235).

Next, the apparatus performs object instance segmentation according to the object semantic segmentation image (232) and the object center image (233) to obtain an object instance segmentation image (241), determines (250) the reliability of the depth data according to the object instance segmentation image (241) and the depth reliability image (231), and, according to the result, decides whether to estimate the object pose using two-dimensional key points or three-dimensional key points.

Specifically, when the depth data is unreliable, the apparatus performs two-dimensional key point voting (261) according to the two-dimensional key point offset image to obtain the 2D coordinates of the preset reference key points, and estimates (263) the 6D pose with the PnP algorithm (262) from these 2D coordinates and the preset 3D coordinates of the reference key points.

When the depth data is reliable, the apparatus performs three-dimensional key point voting (271) according to the three-dimensional key point offset image to obtain the 3D coordinates of the reference key points, and estimates (273) the 6D pose through least squares fitting (272).
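The least squares fitting step can be written out as a rigid alignment problem. The formulation below is the standard closed-form solution based on singular value decomposition, given here as one plausible reading; the source does not state which solver is used. The symbols are assumptions: $p_i$ are the reference key points in the object model frame (for example taken from the object CAD model), $q_i$ are the corresponding voted 3D key points in the camera frame, and $(R, t)$ is the 6D pose.

```latex
\begin{aligned}
(R^{*}, t^{*}) &= \arg\min_{R \in SO(3),\; t \in \mathbb{R}^{3}}
    \sum_{i=1}^{N} \left\lVert R\,p_i + t - q_i \right\rVert^{2}, \\
\bar{p} &= \frac{1}{N}\sum_{i=1}^{N} p_i, \qquad
\bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i, \\
H &= \sum_{i=1}^{N} (p_i - \bar{p})(q_i - \bar{q})^{\top} = U \Sigma V^{\top}, \\
R^{*} &= V \,\mathrm{diag}\!\left(1,\, 1,\, \det(V U^{\top})\right) U^{\top}, \qquad
t^{*} = \bar{q} - R^{*}\bar{p}.
\end{aligned}
```

The determinant term keeps $R^{*}$ a proper rotation rather than a reflection, and at least three non-collinear key point pairs are needed for the solution to be unique.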

An object CAD model (280) may be used to estimate the 6D pose. The object CAD model (280) may be a simple geometric body such as a plane, a cylinder, or another simple geometric structure, or it may be a 3D model obtained by laser scanning or other methods.

Example 3

FIG. 3 is a diagram illustrating a process of estimating an object pose based on two rounds of feature extraction, according to an embodiment.

Referring to FIG. 3, the object pose estimation apparatus obtains a color image (301) and a depth image (302) of the same object, and extracts image features from the color image (301) and the depth image (302) using an image feature extraction network (311). The apparatus feeds the image features into a region proposal network (321) to generate regions, and then obtains region image features through ROI pooling (322).

Based on the region image features, the apparatus applies a fully connected layer (331) to obtain a depth reliability image (332), and segments object instances using a convolutional (CNN) segmentation network (341) to obtain an object instance segmentation image (342).

The apparatus obtains an instance segmentation image feature (343) based on the object instance segmentation image (342) and the region image features.

The apparatus determines (350) the reliability of the depth data according to the instance segmentation image feature (343) and the depth reliability image (332).

When the depth data is unreliable, the apparatus feeds the instance segmentation image feature (343) into object pose estimation based on two-dimensional key points. Specifically, the apparatus uses the instance segmentation image feature (343) as the input of a convolutional neural network (361), which outputs a two-dimensional key point offset image of the object. It performs two-dimensional key point voting (362) according to that image to obtain the 2D coordinates of the preset reference key points, and estimates (364) the 6D pose with the EPnP algorithm (363) from these 2D coordinates and the preset 3D coordinates of the reference key points.

When the depth data corresponding to the depth image is reliable, the object pose estimation apparatus fuses (372) the instance segmentation image feature with the instance segmentation geometric feature (371) obtained using the point cloud feature extraction network, and estimates the pose of the object based on three-dimensional key points derived from the fused feature.

Specifically, the object pose estimation apparatus uses a multilayer perceptron (MLP) (373) to estimate a three-dimensional key point offset image from the fused feature of the target object, performs three-dimensional key point voting (374) on the offset image to obtain the 3D coordinates of the reference key points, and estimates (376) the 6D pose through least squares fitting (375).
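The least squares fitting step (375) can be illustrated with the Kabsch algorithm, a standard SVD-based solution for the rigid transform that best aligns two sets of corresponding 3D points. The disclosure does not name a specific solver, so the choice of Kabsch, the function name, and the use of NumPy are assumptions of this sketch.

```python
import numpy as np


def fit_rigid_transform(model_pts, camera_pts):
    """Least-squares R, t such that camera_pts ~= R @ model_pts + t (Kabsch)."""
    model_pts = np.asarray(model_pts, dtype=float)    # (N, 3) key points in the object frame
    camera_pts = np.asarray(camera_pts, dtype=float)  # (N, 3) voted key points in the camera frame
    mu_m = model_pts.mean(axis=0)
    mu_c = camera_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (model_pts - mu_m).T @ (camera_pts - mu_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_m
    return R, t
```

The rotation R and translation t together form the 6D pose; at least three non-collinear key point correspondences are needed for the result to be unique.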

The object pose estimation apparatus may use an object CAD model (380) to estimate the 6D pose. The object CAD model (380) may be a simple geometric body such as a plane, a cylinder, or another geometric structure, or may be a 3D model obtained by laser scanning or other methods.

Compared with the scheme of FIG. 2, the scheme of FIG. 3 adopts a two-stage feature extraction network and further refines the extracted features, which helps obtain accurate object pose estimation results.

Example 4

FIG. 6 is a diagram illustrating a process of estimating an object pose based on single-pass feature extraction and using the second appearance feature of a target object, according to an embodiment.

Referring to FIG. 6, the object pose estimation apparatus uses an image feature extraction network (611) to extract image features from a color image (601) and a depth image (602), and uses a point cloud feature extraction network (612) to extract point cloud features from the point cloud data converted from the depth image.
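Converting a depth image to point cloud data is commonly done by back-projecting each pixel through a pinhole camera model. The sketch below assumes that model, with known intrinsics and with a depth value of 0 marking a missing measurement; the function name and parameter names are illustrative and are not taken from this disclosure.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image into camera-frame 3D points.

    depth: 2D list of depth values along the optical axis; 0 marks a missing value.
    fx, fy, cx, cy: pinhole intrinsics (focal lengths and principal point, in pixels).
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid depth measurements
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points
```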

The object pose estimation apparatus fuses (620) the image features and the point cloud features to obtain a fusion feature, and performs object semantic segmentation (641) and object instance segmentation (642) based on the fusion feature to obtain an object instance segmentation image (643).

Compared with the scheme shown in FIG. 2, the scheme shown in FIG. 6 adds an object relationship branch (630). The weighted appearance feature (637) obtained from the object relationship branch (630) affects object pose estimation based on three-dimensional key points but does not affect object pose estimation based on two-dimensional key points, which is the same as in the scheme shown in FIG. 2. The following description therefore uses object pose estimation based on three-dimensional key points as the example. In the object relationship branch (630), the object pose estimation apparatus determines the appearance feature (631) of a single target object and the appearance features (632) of a plurality of target objects according to the fusion feature; that is, it obtains the first appearance feature of each target object. It also obtains a geometric relationship feature (633) between the plurality of target objects based on the object instance segmentation image (643), and applies attention (634) to the first appearance feature of each target object and the geometric relationship features between the target objects to obtain the relationship feature (635) of the target object.

The object pose estimation apparatus performs per-pixel addition (636) of the obtained relationship feature (635) of a target object and the first appearance feature of that same target object to obtain the weighted appearance feature (637) of the target object, that is, its second appearance feature.
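The attention step (634) and the per-pixel addition (636) can be sketched in a simplified form. The sketch assumes that each first appearance feature is a flat vector, that the geometric relationship between two objects has already been reduced to one scalar score, and that attention weights come from a softmax over appearance similarity plus that score. Learned projection matrices, which a trained network would include, are omitted. All names are hypothetical.

```python
import math


def second_appearance_features(appearance, geometry):
    """Combine each object's feature with an attention-weighted relation feature.

    appearance: list of N equal-length vectors (first appearance features).
    geometry: N x N nested list of scalar geometric relation scores (assumed given).
    Returns N vectors: first appearance feature + relation feature, element-wise.
    """
    n = len(appearance)
    out = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if not others:
            out.append(list(appearance[i]))  # a single object has nothing to relate to
            continue
        # Attention logits: appearance similarity plus the geometric score.
        logits = [sum(a * b for a, b in zip(appearance[i], appearance[j])) + geometry[i][j]
                  for j in others]
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]  # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        # Relation feature: weighted sum of the other objects' appearance features.
        relation = [sum(w * appearance[j][k] for w, j in zip(weights, others))
                    for k in range(len(appearance[i]))]
        # Element-wise ("per-pixel") addition yields the second appearance feature.
        out.append([a + r for a, r in zip(appearance[i], relation)])
    return out
```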

The object pose estimation apparatus performs pose regression (638) based on the second appearance feature. For example, it processes the second appearance feature with a regression network to obtain the object pose predicted by the object relationship branch.

The object pose estimation apparatus performs relationship feature fusion (651) on the second appearance feature and the fusion feature, estimates (652) three-dimensional key point offsets from the result, and obtains a three-dimensional key point offset image. When the depth image is reliable, the apparatus performs three-dimensional key point voting (653) on the offset image to obtain the 3D coordinates of the reference key points and estimates the object pose through least squares fitting (654), yielding the object pose based on three-dimensional key points.

After obtaining the object pose predicted by the object relationship branch (630) and the object pose based on three-dimensional key points, the object pose estimation apparatus selects (655) between them, with selection criteria that may be adjusted according to the actual situation, and thereby performs 6-DOF pose estimation (656) to obtain the 6D pose of the object.
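The disclosure leaves the selection criterion (655) open. One possible criterion, shown here purely as an assumption, is to keep the candidate pose whose transformed model key points lie closest on average to the observed key points. The function names and the candidate labels are hypothetical.

```python
import math


def apply_pose(R, t, p):
    """Rotate point p by the 3x3 matrix R (nested lists) and translate by t."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))


def mean_alignment_error(pose, model_pts, observed_pts):
    """Mean Euclidean distance between transformed model points and observations."""
    R, t = pose
    return sum(math.dist(apply_pose(R, t, m), o)
               for m, o in zip(model_pts, observed_pts)) / len(model_pts)


def select_pose(candidates, model_pts, observed_pts):
    """Return the label of the candidate pose with the smallest mean error.

    candidates: dict mapping a label to a (R, t) pose.
    """
    return min(candidates,
               key=lambda name: mean_alignment_error(candidates[name], model_pts, observed_pts))
```

Other criteria, such as a confidence score produced by each branch, would fit the same interface by replacing the error function.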

The object pose estimation apparatus may use an object CAD model (660) to estimate the 6D pose.

This scheme adds object relationship analysis and estimates the object pose by combining relationship features between different objects. This helps improve the accuracy of the image region corresponding to each target object, and accurate image region segmentation in turn helps improve the accuracy of object pose estimation.

Example 5

FIG. 7 is a diagram illustrating a process of estimating an object pose based on two-pass feature extraction and using the second appearance feature of a target object, according to an embodiment.

Referring to FIG. 7, the object pose estimation apparatus uses an image feature extraction network (711) to extract image features from a color image (701) and a depth image (702). The apparatus inputs the image features to a region proposal network (721) for region proposal processing, and then obtains region image features through ROI pooling (722).

Compared with the scheme shown in FIG. 3, the scheme shown in FIG. 7 adds an object relationship branch (730), which obtains the first appearance features (731, 732) of the target objects and the geometric relationship feature (733) between the target objects based on the region image features. Here, the first appearance features of the target objects correspond to the image appearance feature (731) of a single target object and the image appearance features (732) of a plurality of target objects. In the object relationship branch (730), the geometric relationship feature (733) between the target objects is obtained from the object detection result (741), which is produced by inputting the region image features to the segmentation convolutional network (740).

The object pose estimation apparatus applies attention (734) to the image appearance feature (731) of a single target object, the image appearance features (732) of a plurality of target objects, and the geometric relationship feature (733) between the plurality of target objects to obtain the relationship feature (735) of the target object. The apparatus performs per-pixel addition (736) of the relationship feature and the appearance feature of the target object to obtain the weighted appearance feature (737), that is, the second appearance feature of the target object. The apparatus then performs pose regression (738) based on the second appearance feature. For example, it processes the second appearance feature with a regression network to obtain the object pose predicted by the object relationship branch.

As noted, the scheme shown in FIG. 7 differs from the scheme shown in FIG. 3 by the added object relationship branch (730). The weighted appearance feature (737) obtained from the object relationship branch (730) affects object pose estimation based on three-dimensional key points but does not affect object pose estimation based on two-dimensional key points, which is the same as in the scheme shown in FIG. 3. The following description therefore uses object pose estimation based on three-dimensional key points as the example.

After obtaining the region image features through ROI pooling (722), the object pose estimation apparatus inputs the region image features to the segmentation convolutional network (740) to obtain an object instance segmentation image (742), and then obtains the segmented image feature (743), that is, the instance segmentation image feature, from the region image features and the object instance segmentation image (742).

The object pose estimation apparatus performs point cloud data conversion and point cloud feature extraction on the depth image to obtain the segmented geometric feature (751), that is, the instance segmentation geometric feature.

The object pose estimation apparatus fuses (752) the obtained segmented image feature (743), the segmented geometric feature (751), and the weighted appearance feature (737), and uses a multilayer perceptron (MLP) (753) to estimate three-dimensional key point offsets (754) from the fused feature. It then performs three-dimensional key point voting (755) on the three-dimensional key point offset image to obtain the 3D coordinates of the reference key points (756), and estimates the object pose through least squares fitting (757), yielding the object pose based on three-dimensional key points.

The object pose estimation apparatus selects (760) a pose from the object pose predicted by the object relationship branch (730) and the object pose based on three-dimensional key points, and determines the 6-DOF pose estimate (761) of the object, that is, the 6D pose, according to the set selection condition.

This scheme adds object relationship analysis on top of two-stage feature extraction. The more accurate features help improve the segmentation of each object region, and the relationship features between different objects are then combined to estimate the object pose. Together, these improve the accuracy of the image region corresponding to each target object and the accuracy of object pose estimation.

FIG. 14 is a diagram illustrating a schematic configuration of an object pose estimation apparatus according to an embodiment.

Referring to FIG. 14, the object pose estimation apparatus (1400) includes an image reliability determination unit (1410) and a pose estimation unit (1420).

The image reliability determination unit (1410) determines the reliability of the depth image according to the color image and the depth image of the object.

When the depth image is reliable, the pose estimation unit (1420) estimates the pose of the object based on three-dimensional key points.

When the depth image is unreliable, the pose estimation unit (1420) estimates the pose of the object based on two-dimensional key points.

The image reliability determination unit (1410) may include at least one of a fusion feature reliability determination unit and an image feature reliability determination unit.

The fusion feature reliability determination unit extracts image features based on the color image, or based on the color image and the depth image, extracts point cloud features from the depth image, fuses the image features and the point cloud features to obtain a fusion feature, and determines the reliability of the depth image based on the fusion feature.

The image feature reliability determination unit extracts image features based on the color image and the depth image, and determines the reliability of the depth image according to the image features.

More specifically, the fusion feature reliability determination unit obtains an object instance segmentation image and a depth reliability image based on the fusion feature, and determines, according to the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.

The image feature reliability determination unit obtains an object instance segmentation image and a depth reliability image according to the image features, and determines, according to the object instance segmentation image and the depth reliability image, the reliability of the depth image corresponding to each target object in the color image.

In doing so, the image feature reliability determination unit obtains, from the image features, the region image features of the image region corresponding to each target object. For each target object, it determines the depth reliability image of that target object based on the corresponding region image features, and it obtains the object instance segmentation image based on the region image features corresponding to the target objects.

The object pose estimation apparatus (1400) may further include a geometric relationship feature acquisition unit and a second appearance feature determination unit.

The geometric relationship feature acquisition unit obtains the first appearance feature of each target object and the geometric relationship features between the target objects based on the color image and the depth image.

For each target object, the second appearance feature determination unit determines the second appearance feature of that target object based on its first appearance feature, the first appearance features of the other target objects, and the geometric relationship features between that target object and the other target objects.

The pose estimation unit (1420) may estimate the pose of each target object according to the fusion feature and the second appearance feature of each target object. Alternatively, the pose estimation unit (1420) may estimate the pose of each target object according to the image features and the second appearance feature of each target object.

The geometric relationship feature acquisition unit may obtain the geometric relationship features through at least one of the following two approaches.

In the first approach, the geometric relationship feature acquisition unit extracts image features based on the color image, or based on the color image and the depth image, extracts point cloud features from the depth image, and fuses the image features and the point cloud features to obtain a fusion feature. Based on the fusion feature, it obtains the first appearance feature of each target object and an object instance segmentation image, and based on the object instance segmentation image, it obtains the geometric relationship features between the target objects.

In the second approach, the geometric relationship feature acquisition unit extracts image features based on the color image and the depth image and obtains, from the image features, the region image features corresponding to the image region of each target object. Based on those region image features, it obtains the first appearance feature of each target object and the corresponding object detection result, and based on the object detection results of the target objects, it obtains the geometric relationship features between the target objects.

The object pose estimation apparatus (1400) may further include an initial frame unit.

The initial frame unit determines whether a video frame is an initial frame by detecting whether a target object or a target pose appears for the first time in the video frame.

The initial frame unit obtains the image bounding box of each target object in the video frame and matches it against the image bounding boxes corresponding to the target objects in the object pose result list. When a matching target object exists in the object pose result list, the unit compares first point cloud data in the image bounding box corresponding to the target object in the video frame with second point cloud data corresponding to the target object in the preceding video frame, and determines whether the two differ; if they differ, it determines that the pose of the object corresponding to the target object appears for the first time. When no matching target object exists in the object pose result list, it determines that the target object appears for the first time.
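The initial frame check can be sketched as follows. The disclosure does not specify how bounding boxes are matched or how point cloud data is compared, so this sketch assumes intersection over union (IoU) for matching and the distance between point cloud centroids as the difference measure. The thresholds, function names, and dictionary layout are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centroid(points):
    """Mean of a non-empty list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def classify_detection(box, cloud, pose_results, prev_clouds, iou_thr=0.5, dist_thr=0.05):
    """Return 'new_object', 'new_pose', or 'tracked' for one detection.

    pose_results: dict object_id -> bounding box stored in the pose result list.
    prev_clouds: dict object_id -> point cloud of that object in the previous frame.
    """
    best_id, best_iou = None, 0.0
    for obj_id, prev_box in pose_results.items():
        score = iou(box, prev_box)
        if score > best_iou:
            best_id, best_iou = obj_id, score
    if best_id is None or best_iou < iou_thr:
        return "new_object"  # nothing in the pose result list matches
    shift = math.dist(centroid(cloud), centroid(prev_clouds[best_id]))
    return "new_pose" if shift > dist_thr else "tracked"
```

A frame containing any detection classified as `new_object` or `new_pose` would be treated as an initial frame under this reading.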

The present disclosure may further provide an electronic device including a memory and a processor. The memory stores a computer program, and the processor performs the method provided in the embodiments of the present disclosure when executing the computer program.

The present disclosure further provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, the method provided in the embodiments of the present disclosure is performed.

FIG. 15 is a diagram illustrating a schematic configuration of an electronic device according to an embodiment.

Referring to FIG. 15, the electronic device (1500) includes a processor (1510) and a memory (1530). The processor (1510) and the memory (1530) are connected to each other through, for example, a bus (1520). Optionally, the electronic device (1500) may further include a transceiver (1540). In actual applications, the number of transceivers (1540) is not limited to one, and the structure of the electronic device (1500) does not limit the embodiments of the present disclosure.

The processor (1510) may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in the present disclosure. The processor (1510) may also be a combination that realizes computing functions, such as a combination of one or more microprocessors or a combination of a DSP and a microprocessor.

The bus (1520) may include a path for transferring information between the components. The bus (1520) may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus (1520) may be divided into an address bus, a data bus, a control bus, and so on. For convenience of illustration, only one thick line is used in FIG. 15, but this does not mean that there is only one bus or only one type of bus.

The memory (1530) may be a read only memory (ROM) or another type of static storage device capable of storing static information and instructions, or a random access memory (RAM) or another type of dynamic storage device capable of storing information and instructions. It may also be an electrically erasable programmable read only memory (EEPROM), a compact disc read only memory (CD-ROM) or other optical disc storage, a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but it is not limited thereto.

The memory (1530) stores application program code for carrying out the embodiments of the present disclosure, and execution of that code is controlled by the processor (1510). The processor (1510) executes the application program code (computer program) stored in the memory (1530) to implement the method embodiments described above.

In the embodiments provided in the present disclosure, the object pose estimation method executed in the electronic device may be performed using an artificial intelligence model.

The method executed in the electronic device (1500) of the present disclosure may use image data or video data as input data of an artificial intelligence model to obtain output data that identifies an image or the image features in an image. The artificial intelligence model may be obtained through training. Here, "obtained through training" means that a basic artificial intelligence model is trained with a plurality of training data through a training algorithm to obtain a predefined operating rule or artificial intelligence model configured to perform a desired feature (or purpose). The artificial intelligence model may include a plurality of neural network layers. Each of the neural network layers includes a plurality of weight values and performs its neural network computation by operating on the computation result of the previous layer with those weight values.

Visual understanding is a technology for recognizing and processing things in the way human vision does, and includes, for example, object recognition, object tracking, image retrieval, human recognition, scene recognition, 3D reconstruction/positioning, and image enhancement.

In the object pose estimation apparatus of the present disclosure, at least one of the plurality of components may be implemented through an AI model. AI-related functions may be performed through a non-volatile memory, a volatile memory, and a processor.

The processor may include one or more processors. The one or more processors may be general-purpose processors (e.g., a central processing unit (CPU) or an application processor (AP)), graphics-only processing units (e.g., a graphics processing unit (GPU) or a visual processing unit (VPU)), and/or AI-dedicated processors (e.g., a neural processing unit (NPU)).

The one or more processors control the processing of input data according to a predefined operating rule or an artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rule or artificial intelligence model is provided through training or learning.

Here, being provided through learning means that a learning algorithm is applied to a plurality of learning data to obtain a predefined operating rule or an AI model with desired characteristics. The learning may be performed in the device itself in which the AI according to the embodiment is performed, and/or may be implemented through a separate server/system.

The AI model may consist of a plurality of neural network layers. Each layer has a plurality of weight values, and the computation of a layer is performed using the computation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q-network.
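The layer-by-layer computation described above can be illustrated with a minimal sketch. It assumes fully connected layers; the bias term, the ReLU activation, and the names `dense_layer` and `forward` are illustrative assumptions and are not taken from the disclosure, which only states that each layer combines the previous layer's result with its own weights.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense_layer(prev_output: Sequence[float], weights: Matrix, bias: Sequence[float]) -> Vector:
    """Combine the previous layer's result with this layer's weights.

    The bias and the ReLU activation are assumptions made for illustration.
    """
    output = []
    for row, b in zip(weights, bias):
        z = sum(w * x for w, x in zip(row, prev_output)) + b
        output.append(max(0.0, z))  # ReLU (assumed activation)
    return output


def forward(x: Sequence[float], layers: Sequence[Tuple[Matrix, Sequence[float]]]) -> Vector:
    """Run the input through every layer in order."""
    result = list(x)
    for weights, bias in layers:
        result = dense_layer(result, weights, bias)
    return result


# Two small layers with hand-picked weights.
example_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-1.0]),
]
example_output = forward([2.0, 1.0], example_layers)
```

In a trained model the weight values would come from the training procedure described earlier rather than being written by hand.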

A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of learning data so as to cause, allow, or control the target device to make a determination or a prediction. Examples of the learning algorithm include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.

The method according to the embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded in a computer-readable medium. The computer-readable medium may store program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine language code such as that generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

Although the embodiments have been described above with reference to a limited set of drawings, those skilled in the art may apply various technical modifications and variations based on the above. For example, an appropriate result may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the following claims.

Claims

1. An object pose estimation method comprising:
determining the reliability of a depth image of an object according to a color image and the depth image of the object;
estimating the pose of the object based on three-dimensional key points when the depth image is reliable; and
estimating the pose of the object based on two-dimensional key points when the depth image is not reliable.
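As an illustration only, and not as part of the claim language, the branching in the claim above can be sketched as a small dispatch function. The scalar reliability score, the threshold value `0.5`, and the two solver callables are hypothetical; the claim itself only requires that a reliable depth image leads to the three-dimensional key point branch and an unreliable one to the two-dimensional key point branch.

```python
from typing import Any, Callable, Sequence, Tuple


def estimate_pose(
    depth_reliability: float,
    keypoints_3d: Sequence[Any],
    keypoints_2d: Sequence[Any],
    solve_from_3d: Callable[[Sequence[Any]], Any],
    solve_from_2d: Callable[[Sequence[Any]], Any],
    threshold: float = 0.5,  # assumed cut-off, not specified in the source
) -> Tuple[str, Any]:
    """Pick the key point branch according to the depth reliability.

    solve_from_3d and solve_from_2d are hypothetical stand-ins for whatever
    pose solvers are used with 3D and 2D key points respectively.
    """
    if depth_reliability >= threshold:
        return "3d", solve_from_3d(keypoints_3d)
    return "2d", solve_from_2d(keypoints_2d)


# Stub solvers that only report how many key points they received.
branch, result = estimate_pose(
    depth_reliability=0.8,
    keypoints_3d=[(0.0, 0.0, 1.0), (0.1, 0.0, 1.0)],
    keypoints_2d=[(320, 240), (330, 240)],
    solve_from_3d=lambda pts: {"used_points": len(pts)},
    solve_from_2d=lambda pts: {"used_points": len(pts)},
)
```

Passing the solvers in as callables keeps the reliability decision separate from the choice of solver, so either branch can be replaced without touching the dispatch.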

2. The method of claim 1,
wherein determining the reliability of the depth image according to the color image and the depth image comprises:
extracting image features based on the color image, or based on the color image and the depth image; extracting point cloud features according to the depth image; obtaining fusion features by fusing the image features and the point cloud features; and determining the reliability of the depth image based on the fusion features.

3. The method of claim 2,
wherein determining the reliability of the depth image based on the fusion features comprises:
obtaining an object instance segmentation image and a depth reliability image based on the fusion features; and
determining the reliability of the depth image corresponding to each target object in the color image according to the object instance segmentation image and the depth reliability image.
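As an illustration only, one way to combine an instance segmentation image with a depth reliability image is to average the per-pixel reliability inside each object's mask. Averaging, the `0.5` threshold, the use of `0` as the background label, and the function name `per_object_depth_reliability` are assumptions made for this sketch; the claim above does not specify how the two images are combined.

```python
from typing import Dict, Sequence, Tuple


def per_object_depth_reliability(
    instance_mask: Sequence[Sequence[int]],
    reliability_map: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off
) -> Dict[int, Tuple[float, bool]]:
    """Return {object_id: (mean reliability, is_reliable)} for each instance.

    instance_mask holds one integer object id per pixel (0 = background, assumed).
    reliability_map holds one reliability value per pixel, same shape.
    """
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for mask_row, rel_row in zip(instance_mask, reliability_map):
        for obj_id, rel in zip(mask_row, rel_row):
            if obj_id == 0:
                continue  # skip background pixels
            sums[obj_id] = sums.get(obj_id, 0.0) + rel
            counts[obj_id] = counts.get(obj_id, 0) + 1
    result = {}
    for obj_id, total in sums.items():
        mean = total / counts[obj_id]
        result[obj_id] = (mean, mean >= threshold)
    return result


mask = [
    [0, 1, 1],
    [2, 2, 0],
]
reliability = [
    [0.9, 0.75, 0.25],
    [0.25, 0.25, 0.9],
]
per_object = per_object_depth_reliability(mask, reliability)
```

With these inputs, object 1 averages 0.5 and is treated as reliable, while object 2 averages 0.25 and is not, so the two objects would take different key point branches.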

4. The method of claim 1,
wherein determining the reliability of the depth image according to the color image and the depth image comprises:
extracting image features based on the color image and the depth image, and determining the reliability of the depth image according to the image features.

5. The method of claim 4,
wherein determining the reliability of the depth image according to the image features comprises:
obtaining, according to the image features, an object instance segmentation image and a depth reliability image; and
determining the reliability of the depth image corresponding to each target object in the color image according to the object instance segmentation image and the depth reliability image.

6. The method of claim 5,
wherein obtaining the object instance segmentation image and the depth reliability image according to the image features comprises:
obtaining, according to the image features, a region image feature of the image region corresponding to each target object; and
for each target object, determining a depth reliability image of the target object based on the region image feature corresponding to the target object, and obtaining the object instance segmentation image based on the region image features corresponding to the respective target objects.

7. The method of claim 1, further comprising:
obtaining a first appearance feature of each target object and a geometric relationship feature between the target objects based on the color image and the depth image; and
for each target object, determining a second appearance feature of the target object based on the first appearance feature of the target object, the first appearance features of the other target objects, and the geometric relationship features between the target object and the other target objects.

8. The method of claim 7,
wherein estimating the pose of the object based on the three-dimensional key points comprises:
estimating a pose of each target object according to the fusion features and the second appearance feature of each target object.

9. The method of claim 7,
wherein estimating the pose of the object based on the three-dimensional key points comprises:
estimating a pose of each target object according to the image features and the second appearance feature of each target object.

10. The method of claim 7,
wherein obtaining the first appearance feature of each target object and the geometric relationship feature between the target objects based on the color image and the depth image comprises:
extracting image features based on the color image, or based on the color image and the depth image; extracting point cloud features according to the depth image; fusing the image features and the point cloud features to obtain fusion features; obtaining the first appearance feature of each target object and an object instance segmentation image based on the fusion features; and obtaining the geometric relationship feature between the target objects based on the object instance segmentation image.

11. The method of claim 7,
wherein obtaining the first appearance feature of each target object and the geometric relationship feature between the target objects based on the color image and the depth image comprises:
extracting image features based on the color image and the depth image; obtaining, according to the image features, a region image feature corresponding to the image region of each target object; obtaining the first appearance feature of each target object and a corresponding object detection result based on the region image feature corresponding to the image region of each target object; and obtaining the geometric relationship feature between the target objects based on the object detection results of the target objects.

12. The method of claim 1, further comprising:
determining whether a video frame is an initial frame by detecting whether a target object or a target pose appears for the first time in the video frame.

13. The method of claim 12,
wherein determining whether the video frame is an initial frame by detecting whether the target object or the target pose appears for the first time in the video frame comprises:
obtaining an image bounding box of each target object in the video frame;
matching the image bounding box of each target object with the image bounding box corresponding to each target object in a pose result list;
when a matching target object exists in the pose result list, comparing first point cloud data of the image bounding box corresponding to the target object in the video frame with second point cloud data of the image bounding box corresponding to the target object in the previous video frame of the video frame to determine whether the first point cloud data differs from the second point cloud data, and, when they differ, determining that the object pose corresponding to the target object appears for the first time; and
when no matching target object exists in the pose result list, determining that the target object appears for the first time.
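As an illustration only, the bounding box matching and point cloud comparison in the preceding claim can be sketched as follows. Matching by intersection over union (IoU), comparing point clouds by the distance between their centroids, both threshold values, and the dictionary layout of the pose result list are all assumptions made for this sketch; the claim does not state how boxes are matched or how the point cloud difference is measured.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def classify_detection(box: Box, points: Sequence[Point],
                       pose_result_list: List[Dict],
                       iou_thresh: float = 0.5,     # assumed
                       diff_thresh: float = 0.05    # assumed, scene units
                       ) -> str:
    """Return 'new_object', 'new_pose', or 'tracked' for one detection."""
    best: Optional[Dict] = None
    best_iou = 0.0
    for entry in pose_result_list:
        score = iou(box, entry["box"])
        if score >= iou_thresh and score > best_iou:
            best, best_iou = entry, score
    if best is None:
        return "new_object"  # no match in the list: object appears first
    # Compare current points with those stored from the previous frame.
    if math.dist(centroid(points), centroid(best["points"])) > diff_thresh:
        return "new_pose"    # matched object, but its point cloud changed
    return "tracked"


previous = [{"box": (0.0, 0.0, 10.0, 10.0),
             "points": [(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)]}]
```

A frame would then be treated as an initial frame whenever any detection is classified as `new_object` or `new_pose`.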

14. The method of claim 12, further comprising:
when the video frame is not an initial frame, obtaining a motion parameter corresponding to the video frame, and determining a pose result corresponding to the video frame based on the motion parameter and the object pose result of the initial frame corresponding to the video frame; and
updating, in a pose result list, the object pose result of the initial frame corresponding to the video frame according to the pose result corresponding to the video frame.
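As an illustration only, the propagation step in the preceding claim can be sketched with 4x4 homogeneous transforms. Representing both the motion parameter and the pose as 4x4 rigid transforms, composing them by left-multiplying the motion onto the initial-frame pose, and keying the pose result list by an object id are assumptions of this sketch; the claim does not fix the representation or the composition order.

```python
from typing import Dict, List

Matrix4 = List[List[float]]


def matmul4(a: Matrix4, b: Matrix4) -> Matrix4:
    """Product of two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def translation(tx: float, ty: float, tz: float) -> Matrix4:
    """Helper: a rigid transform with identity rotation."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def propagate_pose(initial_pose: Matrix4, motion: Matrix4) -> Matrix4:
    """Pose in the current frame = motion applied to the initial-frame pose.

    The left-multiplication order is an assumption for this sketch.
    """
    return matmul4(motion, initial_pose)


def update_pose_result_list(pose_result_list: Dict[str, Matrix4],
                            object_id: str, motion: Matrix4) -> None:
    """Replace the stored pose with the propagated one, in place."""
    pose_result_list[object_id] = propagate_pose(pose_result_list[object_id], motion)


poses = {"cup": translation(1.0, 0.0, 0.0)}   # hypothetical object id
update_pose_result_list(poses, "cup", translation(0.0, 2.0, 0.0))
```

Propagating from a stored pose avoids rerunning the full key point estimation on every frame; only frames classified as initial frames need the complete pipeline.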

15. An object pose estimation device comprising:
an image reliability determining unit configured to determine the reliability of a depth image of an object according to a color image and the depth image of the object; and
a pose estimation unit configured to estimate the pose of the object based on three-dimensional key points when the depth image is reliable, and to estimate the pose of the object based on two-dimensional key points when the depth image is not reliable.

16. An electronic device comprising a memory and a processor,
wherein the memory stores a computer program, and
the processor is configured to perform the method of any one of claims 1 to 14 when executing the computer program.

17. A computer-readable recording medium in which a program for executing the method of any one of claims 1 to 14 is recorded.