KR20190107984A

KR20190107984A - An image training apparatus extracting hard negative samples used to train a neural network based on sampling and an adaptively adjusted threshold, and a method performed by the image training apparatus

Info

Publication number: KR20190107984A
Application number: KR1020180029380A
Authority: KR
Inventors: 임영철; 강민성
Original assignee: 재단법인대구경북과학기술원
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2019-09-23
Also published as: KR102167011B1

Abstract

According to an embodiment of the present invention, an image learning apparatus can extract, from a training image, hard negative samples used to train a neural network. A hard negative sample can be used to teach the neural network a result that is undesirable as an object recognition result. Hard negative samples can be determined from among search regions sampled from the training image. The image learning apparatus can determine a class score, which is the probability that each sampled search region corresponds to an object in the training image, and can then determine the hard negative samples among the search regions based on the determined class scores. The number of hard negative samples actually used to train the neural network can be determined based on at least one of the number of positive samples, a preset threshold that is compared with the class score, and a preset ratio between positive samples and hard negative samples.

Description

IMAGE TRAINING APPARATUS EXTRACTING HARD NEGATIVE SAMPLES USED TO TRAIN A NEURAL NETWORK BASED ON SAMPLING AND AN ADAPTIVELY ADJUSTED THRESHOLD, AND A METHOD PERFORMED BY THE IMAGE TRAINING APPARATUS

The present invention relates to an apparatus and method for identifying an object included in an image using a neural network.

A neural network models the characteristics of human biological neurons through mathematical expressions, and can be used to extract or identify objects included in a training image. Supervised machine learning is a method of training a neural network using a training image together with ground-truth data that contains information about the objects included in the training image. That is, as the neural network is trained using the ground-truth data, the result of the neural network identifying an object in the training image may converge to the ground-truth data corresponding to that training image.

While a neural network is being trained, positive samples and negative samples, each of which is a part of the training image, may be used. A positive sample is a part of the training image that contains the object, and the neural network may be trained with positive samples to identify the region of the training image where the object exists. A negative sample is a part of the training image that contains no object or only a portion of the object, and the neural network may be trained with negative samples to identify image regions in which the object does not exist. In general, the number of negative samples obtained from a training image may be far greater than the number of positive samples obtained from it. Having more negative samples than positive samples can cause an imbalance in the data used to train the neural network.

The present invention proposes an image learning apparatus and method for extracting hard negative samples used to train a neural network, based on sampling and an adaptively changing threshold.

The present invention proposes an image learning apparatus and method for determining hard negative samples based on search regions obtained by sampling a training image.

The present invention proposes an image learning apparatus and method for selecting the search regions used to determine hard negative samples, based on an adaptively changing threshold.

According to an embodiment, there is provided an image learning method using a neural network, the method including: sampling a plurality of search regions from a training image; determining a class score, which is the probability that each of the plurality of search regions corresponds to an object included in the training image; identifying, among the plurality of search regions, hard negative samples having a class score greater than a preset threshold; and training the neural network based on the identified hard negative samples, wherein the threshold is adjusted based on the number of hard negative samples used to train the neural network.

According to an embodiment, there is provided an image learning method in which the identifying of the hard negative samples includes identifying the hard negative samples among those search regions, of the plurality of search regions, that do not correspond to a positive sample used to train the neural network.

According to an embodiment, there is provided an image learning method in which the training of the neural network includes: identifying positive samples determined by comparing the location of the object with all regions of the training image; and selecting, from the identified hard negative samples, the hard negative samples to be used for training the neural network, sequentially starting from the hard negative sample with the largest class score, wherein the number of hard negative samples used for training the neural network is determined based on at least one of the number of identified positive samples and a preset ratio between positive samples and hard negative samples.

According to an embodiment, there is provided an image learning method further including determining whether to change the threshold based on the number of positive samples, determined by comparing the location of the object with all regions of the training image, and the number of identified hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of whether to change the threshold includes deciding to change the threshold when the value obtained by applying the number of positive samples to the preset ratio between positive samples and hard negative samples is greater than the number of identified hard negative samples.

According to an embodiment, there is provided an image learning method in which, when the threshold is changed, the threshold takes, for a training image other than the current training image, a value smaller than the value used for the current training image.

According to an embodiment, there is provided an image learning method in which the sampling includes sampling the plurality of search regions from the training image based on a preset probability.

According to an embodiment, there is provided an image learning method including: determining positive samples by comparing all regions of a training image with the location of an object in the training image; determining hard negative samples based on a preset threshold and a class score of each of a plurality of regions sampled from the training image, the class score being the probability that each of the plurality of regions corresponds to the object; and training a neural network based on the determined hard negative samples and the determined positive samples, wherein the threshold is adjusted according to a result of comparing the number of hard negative samples with the number of positive samples.

According to an embodiment, there is provided an image learning method in which the determining of the positive samples includes: identifying the location of the object based on ground-truth data corresponding to the training image; and selecting the positive samples from among all regions of the training image based on whether the degree to which each region overlaps the identified location of the object is greater than or equal to a threshold.
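The overlap test described above can be sketched as follows. This is only an illustrative sketch: the text does not say how the degree of overlap is measured, so intersection over union (IoU) is assumed here, and the box format `(x1, y1, x2, y2)`, the function names, and the default `overlap_threshold=0.5` are all hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2).

    IoU is an assumed overlap measure; the source only speaks of a
    "degree of overlap" with the object's location.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_positives(regions, gt_box, overlap_threshold=0.5):
    """Keep every region whose overlap with the ground-truth box is at
    least the overlap threshold (hypothetical default of 0.5)."""
    return [r for r in regions if iou(r, gt_box) >= overlap_threshold]


regions = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
positives = select_positives(regions, gt_box=(0, 0, 10, 10))
```

Here only the first region overlaps the ground-truth box enough to count as a positive sample; the other two would be candidates for negative samples.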

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes: identifying negative samples, among the plurality of regions sampled from the training image, based on whether the degree of overlap with the location of the object is less than or equal to a threshold; and determining, among the identified negative samples, the negative samples having a class score greater than the preset threshold as the hard negative samples.

According to an embodiment, there is provided an image learning method in which the plurality of regions sampled from the training image include soft negative samples having a class score less than or equal to the threshold and hard negative samples having a class score greater than the threshold.

According to an embodiment, there is provided an image learning method further including determining whether to change the threshold by comparing (1) the number of regions, among the plurality of regions, having a class score greater than the threshold with (2) a target number of hard negative samples calculated based on the number of positive samples and a preset ratio between positive samples and hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of whether to change the threshold includes deciding to change the threshold to a smaller value when the target number is greater than the number of regions having a class score greater than the threshold.

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes, when the target number is greater than the number of regions having a class score greater than the threshold, determining one or more regions, among the plurality of regions, having a class score greater than the threshold as the hard negative samples.

According to an embodiment, there is provided an image learning method in which the determining of the hard negative samples includes: when the target number is smaller than the number of regions having a class score greater than the threshold, extracting as many regions as the target number from the plurality of regions, in descending order starting from the region with the largest class score; and determining the extracted regions as the hard negative samples.

According to an embodiment, there is provided an image recognition method using a neural network, the method including: identifying an input image; inputting the input image to the neural network; and recognizing an object included in the input image based on an output of the neural network, wherein the neural network has been trained in advance on one or more hard negative samples selected, based on a preset threshold, from a plurality of search regions sampled from a training image, and the threshold is adjusted based on at least one of the number of positive samples, determined by comparing all regions of the training image with the location of the object, and the number of selected hard negative samples.

According to an embodiment, there is provided an image recognition method in which the hard negative samples are search regions that have a class score greater than the threshold among the negative samples, the negative samples being the search regions that remain after the search regions corresponding to the positive samples are excluded from the plurality of search regions, and the class score is the probability that each of the plurality of search regions corresponds to the object.

According to an embodiment, hard negative samples used to train a neural network can be extracted based on sampling and an adaptively changing threshold.

According to an embodiment, hard negative samples can be determined based on search regions obtained by sampling a training image.

According to an embodiment, the search regions used to determine hard negative samples can be selected based on an adaptively changing threshold.

FIG. 1 is a flowchart illustrating an operation in which an image learning apparatus according to an embodiment trains a neural network using a training image.
FIG. 2 is an exemplary diagram illustrating the positive samples and hard negative samples that an image learning apparatus according to an embodiment uses to train a neural network.
FIG. 3 is an exemplary diagram illustrating an operation in which an image learning apparatus according to an embodiment sorts a plurality of search regions.
FIG. 4 is an exemplary diagram illustrating an operation in which an image learning apparatus according to an embodiment selects hard negative samples from a plurality of search regions.
FIG. 5 is a flowchart illustrating an operation of recognizing an object present in an input image using a neural network trained by an image learning apparatus according to an embodiment.
FIG. 6 is a diagram illustrating the structure of an image learning apparatus according to an embodiment.

The specific structural or functional descriptions of the embodiments according to the concept of the present invention disclosed herein are given merely for the purpose of describing those embodiments. The embodiments according to the concept of the present invention may be implemented in various forms and are not limited to the embodiments described herein.

Because the embodiments according to the concept of the present invention may be variously modified and may take various forms, the embodiments are illustrated in the drawings and described in detail herein. However, this is not intended to limit the embodiments according to the concept of the present invention to the specific disclosed forms, and the embodiments include modifications, equivalents, or substitutes falling within the spirit and technical scope of the present invention.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present. Other expressions describing the relationship between components, such as "between" and "directly between," or "directly adjacent to," should be interpreted in the same way.

The terminology used herein is used only to describe particular embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" are intended to indicate that the stated features, numbers, steps, operations, components, parts, or combinations thereof exist, and should be understood not to preclude the existence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined herein.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, the scope of the patent application is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a flowchart illustrating an operation in which an image learning apparatus according to an embodiment trains a neural network using a training image. The neural network trained by the image learning apparatus may be used to recognize objects included in the training image or in images other than the training image.

Referring to FIG. 1, in step 101, the image learning apparatus according to an embodiment may identify a training image to be used for training the neural network. The training image may be transmitted from a network connected to the image learning apparatus (for example, the Internet) or from another electronic device connected to the image learning apparatus (for example, a camera including an image sensor, a smartphone, etc.). The image learning apparatus may identify a training image that was transmitted from the network or the other electronic device and stored in a memory of the image learning apparatus. The training image may be divided into a region where a subject (that is, an object) exists and a region where the subject does not exist (for example, the background or another subject distinct from that subject).

Referring to FIG. 1, in step 102, the image learning apparatus according to an embodiment may sample search regions, each of which is a part of the training image. The image learning apparatus may determine a search region by dividing off a part of the training image. The image learning apparatus may sample search regions from the training image based on a preset probability (for example, 1/k, where k is a real number greater than 1). One or more search regions may be selected from the training image based on this probability. When a plurality of search regions are sampled, the search regions may differ in size.
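The sampling in step 102 might look like the sketch below. It assumes, purely for illustration, that candidate regions have already been enumerated as boxes and that each candidate is kept independently with probability 1/k; the text does not specify how candidates are generated or how the probability is applied, and the names `sample_search_regions` and `candidate_regions` are hypothetical.

```python
import random


def sample_search_regions(candidate_regions, k, rng=None):
    """Keep each candidate region independently with probability 1/k.

    Independent per-region sampling is an assumption; the source only
    states that regions are sampled based on a preset probability
    such as 1/k with k > 1.
    """
    if k <= 1:
        raise ValueError("k must be a real number greater than 1")
    rng = rng or random.Random()
    keep_probability = 1.0 / k
    return [r for r in candidate_regions if rng.random() < keep_probability]


# Hypothetical candidates: a 10 x 10 grid of 32 x 32 boxes.
candidates = [(x, y, x + 32, y + 32)
              for x in range(0, 320, 32)
              for y in range(0, 320, 32)]
search_regions = sample_search_regions(candidates, k=4, rng=random.Random(0))
```

With k = 4, roughly a quarter of the candidates are expected to survive, so class scores need to be computed for only that fraction of the image.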

Referring to FIG. 1, in step 103, the image learning apparatus according to an embodiment may determine a class score, which is the probability that each selected search region corresponds to an object included in the training image. That is, the class score may be a value indicating the probability that the corresponding search region corresponds to the object. When a plurality of search regions are sampled, the image learning apparatus may determine a class score for each of the search regions. Because the search regions are a sample of parts of the training image, the image learning apparatus need not compute class scores for every region of the training image. Accordingly, the amount of computation the image learning apparatus needs to calculate class scores can be reduced.

Referring to FIG. 1, in step 104, the image learning apparatus according to an embodiment may select and store, from the one or more search regions, the search regions having a class score greater than a preset threshold θ. The threshold θ is a real number from 0 to 1 inclusive; for example, its initial value may be set to a real number between 0.5 and 0.9. The threshold θ may be adaptively changed each time the neural network is trained using a different training image. Among the search regions having a class score greater than the threshold θ, the image learning apparatus may select only negative samples. A negative sample is a part of the training image that contains no object or only a portion of the object, and may be used to teach the neural network that the object is not contained in the negative sample. When the image learning apparatus has selected a plurality of search regions in step 102, one or more search regions having a class score greater than the threshold θ may be selected.

Negative samples may be divided into hard negative samples, which have a class score greater than the threshold θ, and soft negative samples, which have a class score less than or equal to the threshold θ. Accordingly, the search regions that the image learning apparatus stores in step 104 may be the search regions corresponding to hard negative samples.
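The hard/soft split can be illustrated with a short sketch. It assumes each negative sample is represented as a `(region, class_score)` pair; that representation and the function name `split_negatives` are hypothetical and not taken from the source.

```python
def split_negatives(scored_negatives, theta):
    """Split (region, class_score) pairs into hard and soft negatives.

    Hard negatives have a class score strictly greater than theta;
    soft negatives have a class score less than or equal to theta.
    """
    hard = [(r, s) for r, s in scored_negatives if s > theta]
    soft = [(r, s) for r, s in scored_negatives if s <= theta]
    return hard, soft


hard, soft = split_negatives([("a", 0.9), ("b", 0.7), ("c", 0.2)], theta=0.7)
```

Note that a score exactly equal to θ falls on the soft side, matching the "greater than" and "less than or equal to" wording above.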

Referring to FIG. 1, in step 105, the image learning apparatus according to an embodiment may group the stored search regions. Furthermore, when a plurality of search regions are selected and stored, the image learning apparatus may sort the stored search regions by their corresponding class scores. For example, the plurality of search regions stored in the image learning apparatus may be sorted in descending order of class score.

Referring to FIG. 1, in step 106, the image learning apparatus according to an embodiment may identify positive samples, which are parts of the training image used to teach the neural network the region of the training image where the object exists. In other words, a positive sample is a part of the training image that contains the object, and may be used to teach the neural network that the object is contained in the positive sample. The image learning apparatus may search all regions of the training image to identify one or more positive samples. More specifically, the image learning apparatus may determine one or more positive samples by comparing the location of the object with all regions of the training image. In the following, it is assumed that the image learning apparatus has identified N_p positive samples.

The search regions stored in step 104, which have a class score greater than the threshold θ, may be selected from the search regions that remain after the search regions corresponding to positive samples are excluded from the plurality of search regions. That is, the search regions sampled from the training image may be used only to extract hard negative samples. Whereas positive samples are determined by searching all regions of the training image, negative samples may be determined based on the search regions sampled from the training image. Accordingly, the time and computation required to determine negative samples can be reduced.

Referring to FIG. 1, in step 107, the image learning apparatus according to an embodiment may determine the number N_n of hard negative samples to be used for training the neural network, based on the number of positive samples. The ratio between the number N_p of positive samples and the number N_n of hard negative samples may be preset. In this case, N_n may be determined based on the preset ratio and the number N_p of positive samples identified in step 106. For example, when the ratio between N_p and N_n is predetermined to be 1:m, the number N_n of hard negative samples may be m × N_p. In this case, N_n is proportional to N_p.
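As a small worked example of step 107, the target count can be computed directly from the ratio 1:m. The function name is hypothetical, and rounding down when m × N_p is not an integer is an assumption; the text does not say how non-integer products are handled.

```python
import math


def hard_negative_target(num_positives, m):
    """Target number N_n of hard negatives for a 1:m positive-to-negative ratio.

    Rounding down with floor is an assumption made for this sketch.
    """
    if num_positives < 0 or m < 0:
        raise ValueError("num_positives and m must be non-negative")
    return math.floor(m * num_positives)


n_n = hard_negative_target(num_positives=4, m=3)  # 4 positives at 1:3 gives 12
```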

Referring to FIG. 1, in step 108, the image learning apparatus according to an embodiment may compare the number of search regions grouped in step 105 with the number N_n of hard negative samples determined in step 107. The image learning apparatus may select, from the grouped search regions, the hard negative samples to be used for training the neural network, based on the class scores and the number N_n of hard negative samples. That is, N_n may be the target number of hard negative samples to be used for training the neural network.

More specifically, when the number of grouped search regions is greater than the target number N_n of hard negative samples, in step 109 the image learning apparatus according to an embodiment may select hard negative samples from among the grouped search regions, based on the class score of each grouped search region. The number of search regions selected as hard negative samples may correspond to the target number N_n.

For example, the image learning apparatus may select the N_n search regions having relatively high class scores as hard negative samples. When the stored search regions are sorted by class score, for example stored in descending order of class score, the image learning apparatus may select a total of N_n search regions, from the first stored search region through the N_n-th stored search region, as hard negative samples.

When the number of grouped search regions is smaller than the target number N_n of hard negative samples, in step 110 the image learning apparatus according to an embodiment may adjust the threshold θ used to select search regions. The threshold θ may be changed to α × θ, where α is a preset real number. For example, the image learning apparatus may reduce the threshold θ so that more hard negative samples can be selected when training on the next training image. In this case, the α applied to the threshold θ may be a positive real number smaller than 1 (for example, 0.99). Thus, for the next training image, the image learning apparatus can use a larger number of hard negative samples to train the neural network.

When the number of grouped search regions is smaller than the target number N_n of hard negative samples, in step 111 the image learning apparatus according to an embodiment may select all of the search regions grouped in step 105 as hard negative samples. The order of steps 110 and 111 may differ from that shown in FIG. 1, and the image learning apparatus may perform steps 110 and 111 independently.
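Steps 107 to 111 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the apparatus's actual implementation: the function name `select_hard_negatives`, the `(region id, class score)` tuple layout, and the handling of the case where the number of grouped regions exactly equals N_n are all hypothetical choices made for the example.

```python
from typing import List, Tuple

Region = Tuple[str, float]  # (hypothetical region id, class score)


def select_hard_negatives(
    grouped: List[Region],   # regions that already passed the threshold and grouping
    num_positives: int,      # N_p, identified in step 106
    m: int,                  # preset ratio 1:m between positives and hard negatives
    theta: float,            # threshold used for the current training image
    alpha: float = 0.99,     # preset factor, a positive real number below 1
) -> Tuple[List[Region], float]:
    """Return the selected hard negatives and the threshold for the next image."""
    target = m * num_positives  # step 107: N_n = m * N_p
    ranked = sorted(grouped, key=lambda r: r[1], reverse=True)

    if len(ranked) > target:
        # Step 109: keep the N_n regions with the highest class scores.
        return ranked[:target], theta
    if len(ranked) < target:
        # Steps 110 and 111: use every region and lower the threshold.
        return ranked, alpha * theta
    # Assumption: when the counts match exactly, use all regions, keep theta.
    return ranked, theta
```

With five grouped regions, one positive sample and a ratio of 1:3, the function returns the three highest-scoring regions and leaves the threshold unchanged. With a ratio of 1:6 it returns all five regions and a threshold scaled by α.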

Referring to FIG. 1, in step 112, the image learning apparatus according to an embodiment may train the neural network using the selected hard negative samples and the identified positive samples. Accordingly, the neural network may be trained, based on the positive samples, to identify regions of the training image where an object exists. In addition, the neural network may identify, based on the hard negative samples, regions of the training image where no object exists. As described above, a hard negative sample is a negative sample having a class score greater than the threshold θ, so it can be used to teach the neural network an undesirable result of object recognition.

The hard negative samples used to train the neural network may be portions of the training image having a class score equal to or greater than the threshold θ. That is, among the plurality of negative samples that can be extracted from the training image, the hard negative samples used for training may be those for which it is relatively difficult to judge whether the neural network has recognized the object correctly. Because the image learning apparatus according to an embodiment selects only hard negative samples from among the negative samples to train the neural network, the neural network can be used to identify objects more accurately.

In addition, with respect to the imbalance arising from the difference between the number of positive samples and the number of negative samples, the number of hard negative samples used for training the neural network is limited to at most N_n, which can reduce the time required to train the neural network. Furthermore, instead of computing class scores for every region of the training image, the hard negative samples are determined by computing class scores only for portions of the training image sampled based on a preset probability (for example, the search regions sampled in step 102). This can reduce the time and the amount of computation required to determine the hard negative samples.

FIG. 2 is an exemplary diagram for describing the positive samples 220 and 230 and the hard negative sample 210 that the image learning apparatus according to an embodiment uses to train the neural network.

The image learning apparatus may determine the hard negative sample 210 to be used for training the neural network from the search regions obtained by sampling the training image 200 based on a preset probability. Referring to FIG. 2, the hard negative sample 210 may include the background of the training image 200. Alternatively, the hard negative sample 210 may include an object other than the object to be recognized through the neural network. Because the hard negative sample 210 is determined as a portion of the training image whose class score is equal to or greater than the threshold θ, or strictly greater than the threshold θ, it may be a portion that the neural network is relatively likely to determine as a region where the object exists.

The image learning apparatus may determine the positive samples 220 and 230 based on class scores determined over all regions of the training image 200. When the image learning apparatus has identified truth data 240 corresponding to the training image 200, the positive samples 220 and 230 may be determined in consideration of the truth data 240. The truth data 240 may include information (for example, coordinate information) indicating the position, in the training image 200, of the object to be recognized through the neural network.

For example, the image learning apparatus may compare every region of the training image 200 with the position of the object identified from the truth data 240, and determine a specific region whose degree of overlap with the object is equal to or greater than a preset threshold as a positive sample 220, 230. The degree of overlap with the object may be determined based on, for example, Intersection over Union (IOU). For example, if the IOU between the specific region and the object is 0.5 or more, the image learning apparatus may determine the specific region as a positive sample 220, 230. In summary, among all regions of the training image 200, the regions that overlap relatively heavily with the position of the object identified from the truth data 240 may be determined as the positive samples 220 and 230. Alternatively, the image learning apparatus may determine the single region that overlaps most with the position of the object indicated by the truth data 240 as the positive sample. Alternatively, the image learning apparatus may select positive samples based on whether the degree to which each region of the training image 200 overlaps the position of the identified object (which may be evaluated, for example, by the IOU) is equal to or greater than a threshold (for example, 0.5).
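The overlap test described above can be sketched as follows. This is an illustrative example only: the text does not fix a box representation, so the `(x_min, y_min, x_max, y_max)` layout and the helper names `iou` and `positive_samples` are assumptions introduced here.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_samples(regions: List[Box], truth: Box, min_iou: float = 0.5) -> List[Box]:
    """Keep the regions whose IOU with the truth box reaches the threshold."""
    return [r for r in regions if iou(r, truth) >= min_iou]
```

For a truth box `(0, 0, 2, 2)`, the region `(1, 0, 3, 2)` has an IOU of 1/3 and is rejected, while `(0, 0, 2, 1.5)` has an IOU of 0.75 and is kept as a positive sample.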

When the neural network is trained to identify the region of the training image 200 where an object exists, the hard negative sample 210 may represent an incorrect identification result that is not wanted from the neural network. For example, comparing the positive samples 220 and 230 with the hard negative sample 210, the area in which the hard negative sample 210 overlaps the object may be smaller than the area in which the positive samples 220 and 230 overlap the object. That is, the hard negative sample 210 may represent a result that identifies the region where the object exists less accurately than the positive samples 220 and 230 do. By learning the hard negative sample 210 as well as the positive samples 220 and 230, the neural network can identify the region of the training image 200 where the object exists more accurately.

FIG. 3 is an exemplary diagram for describing an operation in which the image learning apparatus according to an embodiment sorts a plurality of search regions.

Referring to FIG. 3, a plurality of search regions obtained by the image learning apparatus (search region 1 (310) through search region 4 (340)) are illustrated. The search regions may be obtained by sampling the input image 300. The image learning apparatus may determine a class score for each search region, which is the probability that the search region corresponds to an object included in the training image 300. For example, assume that the class score of search region 1 (310) is 0.21, that of search region 2 (320) is 0.32, that of search region 3 (330) is 0.12, and that of search region 4 (340) is 0.57.

From among the plurality of search regions, the image learning apparatus may remove any search region having a class score equal to or less than a preset threshold (for example, the threshold θ of FIG. 1), that is, a soft negative sample. For example, when the threshold is 0.2, search region 3 (330), whose class score is 0.12, may be removed. The image learning apparatus may perform clustering on the search regions whose class scores exceed the threshold. During clustering, search regions having relatively low class scores may be removed. After clustering, the image learning apparatus may sort the remaining search regions by class score and store them. Accordingly, search regions having a class score greater than the threshold θ (that is, hard negative samples) may be stored in the image learning apparatus.

Assume that search region 1 (310), search region 2 (320), and search region 4 (340) remain after clustering. The image learning apparatus may sort these search regions in descending order of class score: search region 4 (340), search region 2 (320), and search region 1 (310). From among the sorted search regions, the image learning apparatus may select one or more search regions, based on the number of positive samples identified from the training image 300, and use them to train the neural network. The number of search regions selected by the image learning apparatus (that is, hard negative samples) may be proportional to the number of positive samples.
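The FIG. 3 example can be reproduced with a few lines of Python. This is only a sketch of the thresholding and sorting steps using the scores given above. The clustering step is omitted, which matches the assumption in the text that search regions 1, 2 and 4 all survive it. The function name `filter_and_sort` is hypothetical.

```python
# Class scores from the FIG. 3 example, keyed by search region.
scores = {
    "search region 1 (310)": 0.21,
    "search region 2 (320)": 0.32,
    "search region 3 (330)": 0.12,
    "search region 4 (340)": 0.57,
}
theta = 0.2  # example threshold from the text


def filter_and_sort(scores, theta):
    """Drop regions at or below the threshold, then sort by descending score."""
    kept = [(name, score) for name, score in scores.items() if score > theta]
    return sorted(kept, key=lambda item: item[1], reverse=True)


ordered = filter_and_sort(scores, theta)
print([name for name, _ in ordered])
# ['search region 4 (340)', 'search region 2 (320)', 'search region 1 (310)']
```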

FIG. 4 is an exemplary diagram for describing an operation in which the image learning apparatus according to an embodiment selects hard negative samples from among a plurality of search regions.

Referring to FIG. 4, a plurality of search regions extracted from the training image 400 by the image learning apparatus are illustrated. The search regions may be regions sampled from the training image 400 based on a preset probability. The image learning apparatus may determine the class scores of the sampled search regions. In other words, because class scores are determined only for the sampled search regions rather than for every region of the training image 400, the amount of computation required to determine the class scores can be reduced.
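One way to picture probability-based sampling is shown below. The passage does not specify how the preset probability is applied, so this sketch makes a strong assumption: candidate regions come from a sliding window, and each is kept independently with probability `keep_prob`. The functions `candidate_regions` and `sample_regions` and all their parameters are hypothetical.

```python
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


def candidate_regions(width: int, height: int, size: int, stride: int) -> List[Box]:
    """Enumerate square sliding-window regions over an image of the given size."""
    return [
        (x, y, x + size, y + size)
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]


def sample_regions(candidates: List[Box], keep_prob: float, rng: random.Random) -> List[Box]:
    """Keep each candidate independently with probability keep_prob."""
    return [c for c in candidates if rng.random() < keep_prob]
```

Class scores would then be computed only for the regions returned by `sample_regions`, so the scoring cost scales with the sampled count rather than with the full candidate set.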

Once the class score of each search region has been determined, the image learning apparatus may extract, from the plurality of search regions, one or more search regions having a class score exceeding the preset threshold θ. Referring to FIG. 4, the search regions 420, 430, 440, 450, and 460 may have class scores exceeding the preset threshold θ. The image learning apparatus may discard the search regions other than the search regions 420, 430, 440, 450, and 460. The image learning apparatus may perform clustering or sorting on the search regions 420, 430, 440, 450, and 460. The clustering or sorting performed on these search regions is similar to that described with reference to FIG. 3, and a detailed description is therefore omitted.

The image learning apparatus may perform the above-described clustering or sorting on the search regions that remain after excluding, from the plurality of search regions, those corresponding to positive samples. Accordingly, the search regions 420, 430, 440, 450, and 460 may be hard negative samples having class scores exceeding the preset threshold θ.

The image learning apparatus may use at least one of the search regions 420, 430, 440, 450, and 460, which have class scores exceeding the preset threshold θ, to train the neural network. Before determining at least one of the search regions 420, 430, 440, 450, and 460 as a hard negative sample to be used for training the neural network, the image learning apparatus may determine one or more positive samples 410 in the training image 400. The operation of determining one or more positive samples 410 in the training image 400 is similar to that described with reference to FIG. 2, and a detailed description is therefore omitted.

Referring to FIG. 4, assume that the image learning apparatus has extracted one positive sample 410 from the training image 400. The number of hard negative samples, among the search regions 420, 430, 440, 450, and 460, that are used for training the neural network may be determined based on the number of positive samples 410. For example, the image learning apparatus may select the hard negative samples used for training from among the search regions 420, 430, 440, 450, and 460 so that the ratio between the number of hard negative samples used for training and the number of positive samples 410 satisfies a preset ratio.

For example, when the ratio is 1:3, since the number of positive samples 410 is 1, the image learning apparatus may select three of the five search regions 420, 430, 440, 450, and 460 whose class scores exceed the threshold θ and use them to train the neural network. Based on the class score of each of the search regions 420, 430, 440, 450, and 460, the image learning apparatus may select three search regions in descending order, starting from the search region with the highest class score, and use them to train the neural network.

When the number of hard negative samples determined based on the number of positive samples 410 or the preset ratio is equal to or greater than the number of search regions 420, 430, 440, 450, and 460 having class scores exceeding the threshold θ, the image learning apparatus may use all of the search regions 420, 430, 440, 450, and 460 to train the neural network. For example, when the ratio is 1:6, since the number of positive samples 410 is 1, the image learning apparatus may set the number of hard negative samples to be used for training the neural network to six. Because the number of search regions 420, 430, 440, 450, and 460 is smaller than this number, the image learning apparatus may determine all of the search regions 420, 430, 440, 450, and 460 as hard negative samples to be used for training. Thus, even though the number of hard negative samples was set to 6, the final number of search regions, or hard negative samples, used for training the neural network may be 5.
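The two FIG. 4 cases can be summarized in one expression. The symbols K (the number of search regions whose class score exceeds θ) and N_used (the number of hard negative samples actually used) are introduced here only for illustration and do not appear in the original text.

```latex
\[
  N_n = m \, N_p, \qquad N_{\text{used}} = \min(N_n, K)
\]
\[
  N_p = 1,\; K = 5:\qquad
  m = 3 \;\Rightarrow\; N_{\text{used}} = \min(3, 5) = 3,
  \qquad
  m = 6 \;\Rightarrow\; N_{\text{used}} = \min(6, 5) = 5
\]
```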

When the number of hard negative samples determined based on the number of positive samples 410 or the preset ratio is equal to or greater than the number of search regions 420, 430, 440, 450, and 460 having class scores exceeding the threshold θ, the image learning apparatus may change the threshold θ. For example, when a new training image is learned after training on the training image 400 is complete, the threshold θ may be changed to a smaller value so that the number of hard negative samples corresponding to the new training image increases. In other words, if the threshold applied to the training image 400 is θ and the threshold applied to the new training image is θ', then θ' = α × θ for a preset real number α smaller than 1 (for example, 0.99).

The positive sample 410 and the one or more hard negative samples selected by the above-described operations may be used, together with the training image 400, to train the neural network. As the image learning apparatus trains the neural network using a plurality of training images including the training image 400, the threshold θ applied sequentially to the training images may be gradually reduced. Accordingly, the class scores of the hard negative samples determined from the training images may also gradually decrease with the threshold θ. As a result, the accuracy of the neural network can converge quickly.
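The gradual decrease can be made concrete with a short calculation. It assumes the threshold is multiplied by α only for images on which too few hard negative samples were found, as described for FIG. 1. The symbols θ_0 (the initial threshold) and k (the number of such reductions so far) are introduced here for illustration.

```latex
\[
  \theta_k = \alpha^{k} \, \theta_0, \qquad 0 < \alpha < 1
\]
\[
  \theta_k \le \tfrac{1}{2}\,\theta_0
  \;\Longleftrightarrow\;
  k \ge \frac{\ln (1/2)}{\ln \alpha}
  \approx \frac{-0.6931}{-0.01005} \approx 68.97
  \quad (\alpha = 0.99)
\]
```

Under these assumptions, with α = 0.99 the threshold falls to half its initial value after 69 reductions, so the decay is slow relative to a single training image.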

FIG. 5 is a flowchart for describing an operation of recognizing an object present in an input image using a neural network trained by the image learning apparatus according to an embodiment. The neural network trained by the image learning apparatus according to an embodiment may be used to recognize input images in various application fields such as intelligent vehicles, video security devices, games, and robots. The following describes an operation in which an image recognition apparatus according to an embodiment recognizes an input image using the neural network trained by the image learning apparatus according to an embodiment. The image recognition apparatus may be applied to intelligent vehicles, video security devices, games, robots, and the like. The image learning apparatus may also recognize an object included in an input image based on the neural network trained by the operations of FIGS. 1 to 4.

Referring to FIG. 5, in step 510, the image recognition apparatus according to an embodiment may identify an input image. The input image may be received through a network connected to the image recognition apparatus and stored in a memory of the image recognition apparatus. Alternatively, the input image may be transmitted from another electronic device connected to the image recognition apparatus (for example, a camera including an image sensor, or a smartphone) and stored in the memory of the image recognition apparatus. The image recognition apparatus may identify the input image stored in the memory.

Referring to FIG. 5, in step 520, the image recognition apparatus according to an embodiment may preprocess the input image in order to input it to the neural network. The intensity, size, and other properties of the input image may be changed to suit the input of the neural network, and a feature vector associated with the input image may be extracted.

Referring to FIG. 5, in step 530, the image recognition apparatus according to an embodiment may input the preprocessed input image to a previously trained neural network. The neural network may be trained by the operations of the image learning apparatus described with reference to FIGS. 1 to 4. The neural network may include an input layer including a plurality of nodes, one or more hidden layers, and an output layer. Information related to the input image (for example, the feature vector) may be input to the input layer.

Referring to FIG. 5, in step 540, the image recognition apparatus according to an embodiment may determine, based on the output of the neural network, the region of the input image where an object exists. In other words, the image recognition apparatus may recognize the object included in the input image based on the output of the neural network. The neural network may include an output layer including a plurality of nodes. The image recognition apparatus may obtain the output values of the nodes included in the output layer and, based on the obtained output values, determine the region of the input image where the object exists.

Because the neural network has learned the positive samples and hard negative samples through the operations of the image learning apparatus described with reference to FIGS. 1 to 4, the image recognition apparatus can obtain, from the neural network, information related to the region of the input image where an object exists. Based on the output of the neural network, the image recognition apparatus may mark the region of the input image where the object exists with a bounding box and output the result. Along with the bounding box, the image recognition apparatus may output the probability that the object exists in the bounding box. Alternatively, the image recognition apparatus may extract and output a region of the input image where the object is likely to exist.

To apply an image recognition apparatus using a neural network to intelligent vehicles, video security devices, games, robots, and the like, it is necessary to develop and verify a neural network optimized for the environment of the field concerned while adjusting various parameters related to the neural network. Because the image recognition apparatus according to an embodiment uses a neural network trained by the image learning apparatus, the time required to develop and verify the neural network can be reduced. Accordingly, the time required to develop the image recognition apparatus can be shortened.

FIG. 6 is a diagram for describing the structure of an image learning apparatus according to an embodiment.

Referring to FIG. 6, the image learning apparatus according to an embodiment may include a memory 610 and a processor 620. The memory 610 and the processor 620 may communicate with each other through a bus 630.

The memory 610 may store computer-readable instructions. The training image used to train the neural network, the search regions sampled from the training image, and the class score corresponding to each search region may be stored in the memory 610. When the image learning apparatus sorts the search regions whose class scores are at or above the threshold, those search regions may be stored in the memory 610 in descending order of class score. In addition, parameters related to the neural network may be stored in the memory 610.

The processor 620 may perform the operations described above as the instructions stored in the memory 610 are executed by the processor 620. The memory 610 may be a volatile memory such as random access memory (RAM), or a nonvolatile memory such as a hard disk drive (HDD) or a solid state drive (SSD).

The processor 620 is a device that executes instructions or programs, or that controls the image learning apparatus, and may include, for example, a central processing unit (CPU) and a graphics processing unit (GPU). The image learning apparatus may be connected to an external device (for example, an image capturing device, a personal computer, or a network) through an input/output device (not shown) and may exchange data with it. For example, the image learning apparatus may receive a training image through an image sensor. The image learning apparatus may be implemented as at least part of a computing device such as a personal computer, a tablet computer, or a netbook; a mobile device such as a mobile phone, a smartphone, a PDA, a tablet computer, or a laptop computer; or an electronic product such as a smart television or a security device for gate control.

The processor 620 may sample a plurality of search regions from the training image based on a preset probability, determine a class score indicating whether each of the plurality of search regions corresponds to an object included in the training image, select, from among the plurality of search regions, the search regions having a class score greater than a preset threshold, and sort the selected search regions by class score and store them in the memory 610. The processor 620 may identify positive samples, each of which is a part of the training image used to teach the neural network a region of the training image in which the object exists. Based on the number of identified positive samples and the class scores, the processor 620 may select, from among the search regions stored in the memory 610, hard negative samples used to teach the neural network regions of the training image in which the object does not exist. The processor 620 may train the neural network based on the selected hard negative samples and the positive samples.
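The selection steps attributed to the processor 620 can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the function name `select_hard_negatives`, the callables `is_positive` and `class_score`, and the parameter `neg_per_pos` are hypothetical, and computing the cap on hard negatives as positives multiplied by a ratio is only one assumed reading of "based on the number of identified positive samples".

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

Region = TypeVar("Region")


def select_hard_negatives(
    regions: Sequence[Region],
    is_positive: Callable[[Region], bool],   # hypothetical: ground-truth match test
    class_score: Callable[[Region], float],  # hypothetical: network's class score
    sample_prob: float,                      # the "preset probability"
    threshold: float,                        # the "preset threshold"
    neg_per_pos: float,                      # assumed form of the preset ratio
    rng: random.Random,
) -> Tuple[List[Region], List[Region]]:
    # Positive samples are identified over all regions of the training image.
    positives = [r for r in regions if is_positive(r)]

    # Keep each non-positive region as a search region with the preset probability.
    sampled = [
        r for r in regions
        if not is_positive(r) and rng.random() < sample_prob
    ]

    # Score the sampled regions and keep only those above the threshold,
    # so that only the survivors need to be sorted.
    scored = [(class_score(r), r) for r in sampled]
    above = [(s, r) for s, r in scored if s > threshold]
    above.sort(key=lambda pair: pair[0], reverse=True)

    # Assumption: the number of hard negatives is capped at
    # (number of positives) x (negatives per positive).
    target = int(neg_per_pos * len(positives))
    hard_negatives = [r for _, r in above[:target]]
    return positives, hard_negatives
```

With `sample_prob=1.0` every non-positive region is kept, which makes the sketch deterministic for checking. Smaller values shrink the list that has to be scored and sorted, which is where the described time saving would come from.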

The matters described above with reference to FIGS. 1 to 5 apply as they are to each of the components shown in FIG. 6, so a more detailed description is omitted.

In summary, the image learning apparatus according to an embodiment may extract, from a training image, hard negative samples used to train a neural network. A hard negative sample may be used to teach the neural network an undesirable object recognition result. Hard negative samples may be determined from among search regions sampled from the training image according to a preset probability. The image learning apparatus may determine a class score, which is the probability that a sampled search region corresponds to the object, and then determine, based on the determined class scores, which of the search regions are to be used as hard negative samples for training the neural network. The number of hard negative samples to be used for training may be determined based on at least one of the number of positive samples, a preset threshold that is compared with the class score, and a preset ratio between positive samples and hard negative samples. Accordingly, the time and the amount of computation required to sort the hard negative samples may be reduced.
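One way to write down the count described in this summary is given below. The symbols are notation introduced here for illustration only, and treating the preset ratio as a multiplier on the number of positive samples is an assumption; the text says only that the count is based on these quantities.

```latex
% Notation (assumed): s_i = class score of sampled search region i,
% \tau = preset threshold, \rho = preset ratio (hard negatives per positive),
% N_{pos} = number of positive samples.
N_{\tau} = \bigl|\{\, i : s_i > \tau \,\}\bigr|, \qquad
N_{target} = \rho \, N_{pos}, \qquad
N_{hn} = \min\bigl(N_{target},\; N_{\tau}\bigr)

% Worked example with N_{pos} = 4 and \rho = 3, so N_{target} = 12:
%   N_{\tau} = 30  \Rightarrow  N_{hn} = 12  (the 12 highest-scoring regions)
%   N_{\tau} = 9   \Rightarrow  N_{hn} = 9   (all regions above the threshold)
```

Under this reading, only the regions counted in the first term ever need to be sorted, rather than every sampled region.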

Because search regions having a class score at or above the threshold are determined to be hard negative samples, the hard negative samples used to train the neural network may correspond to search regions in which it is relatively difficult to determine whether an object is present. Because the image learning apparatus determines the hard negative samples from among search regions sampled from the training image according to a preset probability, the time required to determine the hard negative samples may be reduced.

The number of search regions determined to be hard negative samples may not exceed the number of search regions having a class score exceeding the threshold. When the number of search regions determined to be hard negative samples is judged to be insufficient, the threshold may be changed to a smaller value. The image learning apparatus may judge whether that number is insufficient based on the number of positive samples and the preset ratio between positive samples and hard negative samples. When the image learning apparatus uses a plurality of training images sequentially to train the neural network, the threshold may be adaptively changed each time hard negative samples are extracted from a training image. By using the adaptively changed threshold and the search regions sampled according to the probability, the image learning apparatus according to an embodiment may train the neural network faster without degrading performance.
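The adaptive threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the disclosed implementation: the function names, the fixed decrement `step`, the lower bound `floor`, and the use of multiplication to form the target count are all hypothetical, since the text specifies only that the threshold is changed to a smaller value when too few hard negatives are found.

```python
from typing import List, Sequence, Tuple


def update_threshold(
    threshold: float,
    num_positives: int,
    num_above: int,        # regions whose class score exceeds the threshold
    neg_per_pos: float,    # assumed form of the preset ratio
    step: float = 0.05,    # assumption: fixed decrement, not given in the text
    floor: float = 0.0,    # assumption: never go below this value
) -> float:
    # Target number of hard negatives implied by the positives and the ratio.
    target = neg_per_pos * num_positives
    # Too few candidates above the threshold: lower it for the next image.
    if target > num_above:
        return max(floor, threshold - step)
    return threshold


def thresholds_over_images(
    per_image: Sequence[Tuple[int, Sequence[float]]],  # (num_positives, scores)
    threshold: float,
    neg_per_pos: float,
    step: float = 0.05,
) -> List[float]:
    # Record the threshold applied to each training image in sequence.
    used = []
    for num_positives, scores in per_image:
        used.append(threshold)
        num_above = sum(1 for s in scores if s > threshold)
        threshold = update_threshold(
            threshold, num_positives, num_above, neg_per_pos, step
        )
    return used
```

The threshold applied to one image is the value left behind by the previous image, so a shortage of hard negatives in one image relaxes the selection for the images that follow.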

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, apparatuses, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

Claims

An image learning method using a neural network, the method comprising:
sampling a plurality of search regions from a training image;
determining a class score that is a probability that each of the plurality of search regions corresponds to an object included in the training image;
identifying, among the plurality of search regions, hard negative samples having a class score greater than a preset threshold; and
training the neural network based on the identified hard negative samples,
wherein the threshold is adjusted based on the number of hard negative samples used for training the neural network.

The method of claim 1,
wherein identifying the hard negative samples comprises identifying the hard negative samples from among those of the plurality of search regions that do not correspond to a positive sample used for training the neural network.

The method of claim 1,
wherein training the neural network comprises:
identifying positive samples determined by comparing the location of the object with all regions of the training image; and
selecting, from among the identified hard negative samples, the hard negative samples to be used for training the neural network sequentially, starting from the hard negative sample having the largest class score,
wherein the number of hard negative samples used for training the neural network is determined based on at least one of the number of identified positive samples and a preset ratio between positive samples and hard negative samples.

The method of claim 1,
further comprising determining whether to change the threshold based on the number of identified hard negative samples and the number of positive samples determined by comparing the location of the object with all regions of the training image.

The method of claim 4,
wherein determining whether to change the threshold comprises changing the threshold when a value obtained by applying the preset ratio between positive samples and hard negative samples to the number of positive samples is greater than the number of identified hard negative samples.

The method of claim 5,
wherein, when the threshold is changed, the threshold has, for a training image different from the training image, a value smaller than the value used for the training image.

The method of claim 1,
wherein the sampling comprises sampling the plurality of search regions from the training image based on a preset probability.

An image learning method comprising:
determining positive samples by comparing all regions of a training image with the position of an object in the training image;
determining hard negative samples based on a preset threshold and a class score of each of a plurality of regions sampled from the training image, the class score being a probability that each of the plurality of regions corresponds to the object; and
training a neural network based on the determined hard negative samples and the determined positive samples,
wherein the threshold is adjusted according to a result of comparing the number of hard negative samples with the number of positive samples.

The method of claim 8,
wherein determining the positive samples comprises:
identifying the position of the object based on truth data corresponding to the training image; and
selecting the positive samples from all regions of the training image based on whether the overlap between each region of the training image and the identified position of the object is greater than or equal to a threshold value.

The method of claim 8,
wherein determining the hard negative samples comprises:
identifying negative samples from among the plurality of regions sampled from the training image, based on whether the overlap with the position of the object is less than or equal to a threshold value; and
determining, among the identified negative samples, negative samples having a class score greater than the threshold as the hard negative samples.

The method of claim 8,
wherein the plurality of regions sampled from the training image comprise:
soft negative samples having a class score below the threshold; and
hard negative samples having a class score greater than the threshold.

The method of claim 8,
further comprising determining whether to change the threshold by comparing (1) the number of regions, among the plurality of regions, having a class score greater than the threshold with (2) a target value for the number of hard negative samples, calculated based on the number of positive samples and a preset ratio between positive samples and hard negative samples.

The method of claim 12,
wherein determining whether to change the threshold comprises, when the target value is greater than the number of regions having a class score greater than the threshold, changing the threshold to a smaller value.

The method of claim 12,
wherein determining the hard negative samples comprises, when the target value is greater than the number of regions having a class score greater than the threshold, determining, among the plurality of regions, the one or more regions having a class score greater than the threshold as the hard negative samples.

The method of claim 12,
wherein determining the hard negative samples comprises:
when the target value is smaller than the number of regions having a class score greater than the threshold, extracting a number of regions equal to the target value in descending order of class score, starting from the region having the largest class score among the plurality of regions; and
determining the extracted regions as the hard negative samples.

An image recognition method using a neural network, the method comprising:
identifying an input image;
inputting the input image to the neural network; and
recognizing an object included in the input image based on an output of the neural network,
wherein the neural network has been trained in advance on one or more hard negative samples selected, based on a preset threshold, from among a plurality of search regions sampled from a training image, and
wherein the threshold is adjusted based on at least one of the number of positive samples and the number of selected hard negative samples.

The method of claim 16,
wherein each hard negative sample is a search region having a class score greater than the threshold among negative samples, the negative samples being the search regions, among the plurality of search regions, other than the search regions corresponding to the positive samples, and
wherein the class score is a probability that each of the plurality of search regions corresponds to the object.