KR20200067632A

KR20200067632A - Method and apparatus for allocating memory space for driving a neural network

Info

Publication number: KR20200067632A
Application number: KR1020180154692A
Authority: KR
Inventors: 송준호
Original assignee: 삼성전자주식회사
Priority date: 2018-12-04
Filing date: 2018-12-04
Publication date: 2020-06-12
Also published as: US11675507B2; US20200174686A1; US11112980B2; US20210365194A1

Abstract

The present invention relates to a method and a device for allocating a memory space for driving a neural network. The method for allocating a memory space for driving a neural network comprises: allocating a space to store an input feature map in a memory based on an initial address value of the memory and capacity information of the input feature map; and allocating a space to store an output feature map in the memory based on a last address value of the memory and capacity information of the output feature map.

Description

Method and apparatus for allocating memory space for driving a neural network

The present disclosure relates to a method and apparatus for allocating memory space for driving a neural network, and more specifically to a method and apparatus for allocating memory space for each of a plurality of layers of a neural network.

A neural network refers to a class of models in which artificial neurons, networked through synaptic connections, acquire problem-solving ability by changing the strength of those connections through learning.

A device that processes a neural network must perform a large number of operations on complex input data. For example, a neural network performs many operations on each image and generates a large amount of intermediate result data.

Accordingly, when the processor of a neural network reads a large amount of intermediate result data from an external memory, or writes it to the external memory, the performance of the neural network may degrade because of the limited bandwidth of the external memory.

An object is to provide a method and apparatus for allocating memory space for driving a neural network.

The technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may be inferred from the following embodiments.

A first aspect of the present disclosure may provide a method of allocating memory space for a plurality of layers of a neural network, the method including: obtaining first capacity information regarding the capacity of a space to store an input feature map of a first layer among the plurality of layers of the neural network, and second capacity information regarding the capacity of a space to store an output feature map of the first layer; and allocating, in a memory, a first storage space to store the input feature map based on an initial address value of the memory and the first capacity information, and allocating, in the memory, a second storage space to store the output feature map based on a last address value of the memory and the second capacity information.

A second aspect of the present disclosure may provide a neural network device including: a memory; and a processor that drives a neural network by executing at least one program, wherein the processor obtains first capacity information regarding the capacity of a space to store an input feature map of a first layer among a plurality of layers of the neural network and second capacity information regarding the capacity of a space to store an output feature map of the first layer, allocates, in the memory, a first storage space to store the input feature map based on an initial address value of the memory and the first capacity information, and allocates, in the memory, a second storage space to store the output feature map based on a last address value of the memory and the second capacity information.

A third aspect of the present disclosure may provide a computer-readable recording medium on which a program for executing the method of the first aspect on a computer is recorded.

The present disclosure may provide a method and apparatus for allocating memory space for driving a neural network. Specifically, internal memory space located in the processor may be allocated for each of a plurality of layers of the neural network. More specifically, the spaces for storing the input feature map and the output feature map of a layer may be allocated at the two ends of the internal memory space. Because the output feature map of the current layer becomes the input feature map of the next layer, allocating the spaces for the input feature map and the output feature map at opposite ends of the internal memory space reduces accesses to the external memory. Performance degradation of the neural network caused by the limited bandwidth of the external memory can therefore be minimized.

FIG. 1 is a diagram for explaining an example of a neural network architecture.
FIG. 2 is a diagram for explaining an example of the relationship between an input feature map and an output feature map in a neural network.
FIG. 3 is a diagram illustrating an example of a neural network device.
FIG. 4 is a diagram illustrating an example of a memory space allocation scheme.
FIG. 5 is a diagram illustrating an example of a process in which overhead occurs due to memory fragmentation.
FIG. 6 is a diagram illustrating an example of dividing a layer into a plurality of sub-layers.
FIG. 7 is a diagram illustrating an example of allocating memory by dividing a layer into a plurality of sub-layers.
FIG. 8 is a diagram illustrating an example of allocating memory by grouping tiles in a plurality of layers.
FIG. 9 is a flowchart for explaining an example of a process of allocating memory space in a neural network device.
FIG. 10 is a flowchart for explaining another example of a process of allocating memory space in a neural network device.
FIG. 11 is a flowchart illustrating an example of a process of allocating, in a memory, space to store an input feature map and an output feature map in each layer of a neural network.

The terms used in the present embodiments were selected, as far as possible, from general terms currently in wide use while taking into account their functions in the embodiments, but they may vary depending on the intention of a person skilled in the art, precedent, or the emergence of new technology. In certain cases a term has been chosen arbitrarily, and its meaning is then described in detail in the description of the corresponding embodiment. Therefore, the terms used in the present embodiments should be defined based on the meaning of each term and the overall content of the embodiments, not simply on the name of the term.

In the descriptions of the embodiments, when a part is said to be connected to another part, this includes not only the case where it is directly connected but also the case where it is electrically connected with another component in between. In addition, when a part is said to include a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

Terms such as "consists of" or "includes" used in the present embodiments should not be construed as necessarily including all of the components or steps described in the specification. They should be construed to mean that some of those components or steps may be omitted, or that additional components or steps may be included.

The following description of the embodiments should not be construed as limiting the scope of rights, and what a person skilled in the art can easily infer from it should be construed as belonging to the scope of the embodiments. Hereinafter, embodiments provided solely for illustration are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining an example of a neural network architecture.

Referring to FIG. 1, the neural network 1 may have the architecture of a deep neural network (DNN) or an n-layer neural network. The DNN or n-layer neural network may correspond to a convolutional neural network (CNN), a recurrent neural network (RNN), a deep belief network, a restricted Boltzmann machine, or the like. For example, the neural network 1 may be implemented as a convolutional neural network (CNN). The convolutional neural network that serves as the example of the neural network 1 in FIG. 1 may include, in addition to convolution layers, subsampling layers (or pooling layers), fully connected layers, and the like.

The neural network 1 may be implemented as an architecture having a plurality of layers that includes an input image, feature maps, and an output. In the neural network 1, a convolution operation is performed between the input image and a weight map, and feature maps are output as a result. The weight map is a set of parameters for finding features of the input image and is also called a kernel or a filter. The output feature maps generated in this way serve as input feature maps on which a convolution operation with a weight map is performed again, and new feature maps are output. As a result of repeating such convolution operations, a recognition result for the features of the input image may finally be output through the neural network 1.

For example, when an image of 24x24 pixels is input to the neural network 1 of FIG. 1, the input image may be output as 4-channel feature maps of size 20x20 through a convolution operation with a weight map. Then, through a subsampling process that uses only some of the pixel values of the 4-channel 20x20 feature maps, 4-channel feature maps of size 10x10 may be output. Subsampling schemes such as max-pooling and average-pooling may be applied.
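The sizes in this example follow from the usual output-size arithmetic for convolution and pooling. The sketch below assumes a 5x5 weight map with stride 1 and no padding, and a 2x2 pooling window with stride 2. None of these values is stated in the text; they are simply one choice that reproduces 24x24, 20x20, and 10x10. The function name `output_size` is hypothetical.

```python
def output_size(in_size: int, window: int, stride: int = 1, padding: int = 0) -> int:
    """Output length along one axis for a sliding window (convolution or pooling)."""
    return (in_size + 2 * padding - window) // stride + 1


# Assumed 5x5 weight map, stride 1, no padding: 24 -> 20
conv_out = output_size(24, window=5, stride=1)

# Assumed 2x2 pooling window, stride 2: 20 -> 10
pool_out = output_size(conv_out, window=2, stride=2)

print(conv_out, pool_out)
```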

After that, the 10x10 feature maps shrink further through repeated convolution operations with weight maps and subsampling operations, and global features may finally be output. By repeatedly performing convolution and subsampling (or pooling) operations across multiple layers, the neural network 1 filters out of the input image robust features that can represent the entire image and outputs them. The output global features are then input to a fully connected layer, from which the recognition result for the input image is finally derived.

FIG. 2 is a diagram for explaining an example of the relationship between an input feature map and an output feature map in a neural network.

Referring to FIG. 2, in a layer 2 of the neural network, the first feature map FM1 may correspond to an input feature map and the second feature map FM2 may correspond to an output feature map. A feature map may mean a data set in which various features of the input data are expressed. The feature maps FM1 and FM2 may have the elements of a two-dimensional matrix or of a three-dimensional matrix, and a pixel value may be defined for each element. The feature maps FM1 and FM2 have a width W (also called the columns), a height H (also called the rows), and a depth D. The depth D may correspond to the number of channels.

A convolution operation may be performed on the first feature map FM1 and a weight map WM, and the second feature map FM2 may be generated as a result. The weight map filters the features of the first feature map FM1 by performing a convolution operation with the first feature map FM1 using the weight parameters defined in its elements. The weight map is shifted across the first feature map FM1 in a sliding-window manner and is convolved with the windows (also called tiles) of the first feature map FM1. During each shift, each weight parameter in the weight map is multiplied by the corresponding pixel value of the overlapped window in the first feature map FM1, and the products are summed. Convolving the first feature map FM1 with one weight map generates one channel of the second feature map FM2. Although only one weight map is shown in FIG. 1, in practice a plurality of weight maps may each be convolved with the first feature map FM1 to generate a second feature map FM2 with a plurality of channels.
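The sliding-window operation described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: stride 1, no padding, feature maps stored as nested lists indexed `[row][column][channel]`, and the multiply-and-sum form without kernel flipping. The function names are hypothetical.

```python
def conv_one_weight_map(fm, wm):
    """Slide one weight map over fm and return one output channel.

    fm: H x W x D nested lists; wm: kH x kW x D nested lists.
    Assumes stride 1 and no padding, so the result is (H-kH+1) x (W-kW+1).
    """
    height, width, depth = len(fm), len(fm[0]), len(fm[0][0])
    k_h, k_w = len(wm), len(wm[0])
    out = []
    for y in range(height - k_h + 1):
        row = []
        for x in range(width - k_w + 1):
            acc = 0
            # Multiply each weight by the overlapped pixel value and sum.
            for i in range(k_h):
                for j in range(k_w):
                    for d in range(depth):
                        acc += fm[y + i][x + j][d] * wm[i][j][d]
            row.append(acc)
        out.append(row)
    return out


def conv_layer(fm, weight_maps):
    """One output channel per weight map, as in the multi-channel FM2 case."""
    return [conv_one_weight_map(fm, wm) for wm in weight_maps]


# 3x3 single-channel input and two 2x2 weight maps (all ones, all zeros).
fm1 = [[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
ones = [[[1], [1]], [[1], [1]]]
zeros = [[[0], [0]], [[0], [0]]]
fm2 = conv_layer(fm1, [ones, zeros])
print(fm2)
```

Each weight map contributes exactly one channel to `fm2`, so the number of weight maps determines the depth of the output feature map.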

Meanwhile, the second feature map FM2 may correspond to the input feature map of the next layer. For example, the second feature map FM2 may be the input feature map of a pooling (or subsampling) layer.

For convenience of explanation, FIGS. 1 and 2 show only a schematic architecture of the neural network 1. A person of ordinary skill in the art will understand, however, that unlike what is shown, the neural network 1 may be implemented with more or fewer layers, feature maps, weight maps, and the like, and that their sizes may also be varied.

FIG. 3 is a diagram illustrating an example of a neural network device.

The neural network device 300 may be implemented as various types of devices such as a personal computer (PC), a server device, a mobile device, or an embedded device. As specific examples, it may correspond to a smartphone, a tablet device, an augmented reality (AR) device, an Internet of Things (IoT) device, an autonomous vehicle, a robotics device, or a medical device that performs voice recognition, image recognition, image classification, or the like using a neural network, but it is not limited thereto. Furthermore, the neural network device 300 may correspond to a dedicated hardware accelerator (HW accelerator) mounted on such a device, and may be a hardware accelerator such as a neural processing unit (NPU), a tensor processing unit (TPU), or a Neural Engine, which are dedicated modules for driving a neural network, but it is not limited thereto.

Referring to FIG. 3, the neural network device 300 may include a processor 310, an internal memory 320, and an external memory 330. Only the components related to the present embodiments are shown in the neural network device 300 of FIG. 3. It is therefore apparent to a person of ordinary skill in the art that the neural network device 300 may further include other general-purpose components in addition to those shown in FIG. 3.

The processor 310 controls the overall functions for running the neural network device 300. For example, the processor 310 controls the neural network device 300 as a whole by executing programs stored in the neural network device 300. The processor 310 may be implemented as a central processing unit (CPU), a graphics processing unit (GPU), an application processor (AP), or the like provided in the neural network device 300, but it is not limited thereto.

The external memory 330 is hardware that stores various data processed in the neural network device 300. For example, the external memory 330 may store data that has been processed and data that is to be processed by the neural network device 300. The external memory 330 may also store applications, drivers, and the like to be run by the neural network device 300. The external memory 330 may be random access memory (RAM) such as dynamic random access memory (DRAM) or static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), CD-ROM, Blu-ray or other optical disc storage, a hard disk drive (HDD), a solid state drive (SSD), or flash memory, but it is not limited thereto.

For example, the external memory 330 may store output feature maps, which are intermediate results of the operations in each of the plurality of layers of the neural network, or weight maps that are not used in the layer currently being computed. Meanwhile, a neural network may perform giga-operations per second (GOPS) for each input image, use approximately hundreds of millions to hundreds of billions of weight maps, and generate hundreds of gigabytes of intermediate results. Accordingly, as the processor 310 accesses the external memory 330 more frequently, a considerable portion of the bandwidth of the external memory 330 is consumed, which may reduce the execution speed of the neural network device 300.

Accordingly, the neural network device 300 may further include an internal memory 320 within the processor 310 so that data can be accessed faster than by accessing the external memory 330 directly. The internal memory 320 may be a cache memory of the processor 310 or static random access memory (SRAM), but it is not limited thereto and may include various types of memory.

For each of the plurality of layers, the processor 310 may allocate the space of the internal memory 320 efficiently so as to minimize accesses to the external memory 330.

Specifically, for each of the plurality of layers, the processor 310 may obtain capacity information regarding the capacity of the space needed to store the various data used or generated in that layer. The processor 310 may obtain first capacity information regarding the capacity of the space to store the input feature map of a layer and second capacity information regarding the capacity of the space to store the output feature map of the layer. The first capacity information is the capacity of the space required to store the input feature map and may correspond to the size of the input feature map data. Likewise, the second capacity information is the capacity of the space required to store the output feature map and may correspond to the size of the output feature map data.

Based on the obtained capacity information, the processor 310 may allocate, in the internal memory 320, space to store each of the input feature map and the output feature map of the layer. The processor 310 may allocate a first storage space to store the input feature map in the internal memory 320 based on the initial address value of the internal memory 320 and the first capacity information, and may allocate a second storage space to store the output feature map in the internal memory 320 based on the last address value of the internal memory 320 and the second capacity information.

The first storage space corresponds to the space from the initial address value of the internal memory 320 to a first address value, and the second storage space corresponds to the space from a second address value of the internal memory 320 to the last address value. The first address value and the second address value may be any address values between the initial address value and the last address value of the memory. Because the output feature map of the current layer becomes the input feature map of the next layer in the neural network, allocating the spaces for the input feature map and the output feature map of a layer at opposite ends of the internal memory 320 minimizes accesses to the external memory 330.
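The two-ended placement described above can be sketched as follows. This is a minimal illustration, not the claimed implementation. The `Region` type and the `allocate_both_ends` function are hypothetical names, addresses are treated as half-open byte ranges, and the last address value is taken to be one past the final usable byte. Raising an error when the two feature maps do not fit together is also an assumption of this sketch.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    start: int  # first address of the region (inclusive)
    end: int    # address just past the region (exclusive)


def allocate_both_ends(initial_addr: int, last_addr: int,
                       first_capacity: int, second_capacity: int
                       ) -> Tuple[Region, Region]:
    """Place the input feature map at the start and the output at the end."""
    if first_capacity + second_capacity > last_addr - initial_addr:
        # Assumption: this sketch simply refuses maps that cannot coexist.
        raise ValueError("input and output feature maps do not fit together")
    # First storage space: initial address .. first address value.
    first_space = Region(initial_addr, initial_addr + first_capacity)
    # Second storage space: second address value .. last address.
    second_space = Region(last_addr - second_capacity, last_addr)
    return first_space, second_space


first_space, second_space = allocate_both_ends(0, 10_000, 3_000, 2_000)
print(first_space, second_space)
```

Here the first address value is `first_space.end` and the second address value is `second_space.start`; the range between them stays free for other data.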

FIG. 4 is a diagram illustrating an example of a memory space allocation scheme.

FIG. 4 shows an example in which the processor 310 allocates space in the memory 400 for each of a first layer, a second layer, and a third layer of the neural network. The memory 400 of FIG. 4 may correspond to the internal memory 320 of FIG. 3.

The memory 400 in the neural network device 300 may have a capacity of, for example, 10 Kbytes. In this case, data in the memory 400 can be distinguished by numbers from 0 (0K) to 10000 (10K), and these numbers may be called the address values of the memory 400. That is, the location of data stored in the memory can be specified using address values from address 0K to address 10K. The capacity given here is only an example, and the memory 400 may have various capacities. The address values of the memory 400 are likewise not limited to the above and may be expressed in various ways.

First, the processor 310 may obtain first capacity information regarding the capacity of the space to store the input feature map of the first layer and second capacity information regarding the capacity of the space to store the output feature map of the first layer.

The first capacity information may be determined based on the width, height, and number of channels of the input feature map, and the second capacity information may likewise be determined based on the width, height, and number of channels of the output feature map. For example, the processor 310 may obtain information indicating that 3 Kbytes are required to store the input feature map of the first layer and 2 Kbytes are required to store the output feature map of the first layer.
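One way such capacity information could be derived from the feature map dimensions is sketched below. The text only says that the capacity depends on width, height, and channel count, so the formula, the `bytes_per_element` parameter, and the example shapes are assumptions made for illustration.

```python
def feature_map_bytes(width: int, height: int, channels: int,
                      bytes_per_element: int = 1) -> int:
    """Bytes needed to store a dense width x height x channels feature map.

    bytes_per_element is an assumed parameter (e.g. 1 for 8-bit values,
    2 for 16-bit values); the source does not specify a data type.
    """
    return width * height * channels * bytes_per_element


# Hypothetical shapes, chosen only to show the arithmetic.
first_capacity = feature_map_bytes(32, 32, 3)    # 3072 bytes
second_capacity = feature_map_bytes(16, 16, 8)   # 2048 bytes
print(first_capacity, second_capacity)
```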

Based on the capacity information indicating that 3 Kbytes are required to store the input feature map of the first layer, the processor 310 may allocate, in the memory 400, a first storage space 410 corresponding to the space from address 0K, which is the initial address value of the memory 400, to address 3K. Likewise, based on the capacity information indicating that 2 Kbytes are required to store the output feature map of the first layer, the processor 310 may allocate, in the memory 400, a second storage space 420 corresponding to the space from address 8K to address 10K, which is the last address value of the memory 400.

The processor 310 may further obtain fourth capacity information regarding the capacity of the space to store the weight map that is operated on with the input feature map of the first layer. The fourth capacity information is the capacity of the space required to store the weight map of the first layer and may be determined based on the size and number of the weight maps. Based on the obtained fourth capacity information, the processor 310 may allocate space to store the weight map between the first storage space 410 and the second storage space 420. Specifically, the space to store the weight map may be at least part of the space from the first address value to the second address value in the memory.
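A sketch of placing the weight map in the gap between the two feature map regions follows. The text allows any part of the range between the first and second address values; placing the weights immediately after the first address value is an arbitrary choice made here. The function names, the one-byte weight size, and the example weight map dimensions are all hypothetical.

```python
from typing import Tuple


def weight_map_bytes(k_height: int, k_width: int, depth: int, count: int,
                     bytes_per_weight: int = 1) -> int:
    """Capacity from the size and number of weight maps (assumed dense layout)."""
    return k_height * k_width * depth * count * bytes_per_weight


def allocate_weight_space(first_addr_value: int, second_addr_value: int,
                          fourth_capacity: int) -> Tuple[int, int]:
    """Return a half-open [start, end) range inside the gap for the weights."""
    gap = second_addr_value - first_addr_value
    if fourth_capacity > gap:
        # Assumption: the sketch rejects weights that do not fit in the gap.
        raise ValueError("weight map does not fit between the feature maps")
    return first_addr_value, first_addr_value + fourth_capacity


# Gap of the 10K example: first address value 3K, second address value 8K.
capacity = weight_map_bytes(5, 5, 4, 8)  # hypothetical: eight 5x5x4 weight maps
weight_space = allocate_weight_space(3_000, 8_000, capacity)
print(capacity, weight_space)
```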

Meanwhile, in addition to the input feature map, the output feature map, and the weight map, the processor 310 may allocate space in the memory 400 to store various kinds of data generated during the convolution operation of a layer; such data may be referred to as working data. For example, the working data may include intermediate results of the convolution operation.

The input feature map and the weight map of the first layer may be stored in the memory 400 according to the way the processor 310 allocated the space of the memory 400 for the first layer. When the convolution operation between the input feature map and the weight map is completed, the output feature map of the first layer may be stored in the second storage space 420.

Thereafter, the processor 310 may reallocate the space of the memory 400 for the second layer, which follows the first layer. Here, the output feature map generated by the convolution operation in the first layer may serve as the input feature map of the second layer. Accordingly, the processor 310 may allocate the second storage space 420 as the space to store the input feature map of the second layer, and may use the output feature map of the first layer stored in the second storage space 420 as is, as the input feature map of the second layer.

Also, as in the first layer, the processor 310 may obtain third capacity information regarding the capacity of the space to store the output feature map of the second layer. Based on the initial address value of the memory 400 and the obtained third capacity information, the processor 310 may allocate, within the memory 400, a third storage space 430 to store the output feature map of the second layer. For example, when 1K (bytes) are required to store the output feature map of the second layer, the processor 310 may allocate, within the memory 400, the third storage space 430 corresponding to the space from address 0K, which is the initial address value of the memory 400, to address 1K.

Similarly, when the processor 310 allocates the space of the memory 400 for the third layer, the output feature map of the second layer becomes the input feature map of the third layer, so the processor 310 may allocate the third storage space 430 as the space to store the input feature map of the third layer.
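The layer-to-layer alternation described above, where each output feature map is written at the end of the memory opposite to its input and then reused in place as the next input, might be planned as in the following sketch. The function name and the list-based representation of feature map sizes are assumptions for illustration.

```python
K = 1024

def plan_feature_maps(mem_size, fm_bytes):
    """fm_bytes[0] is the input of layer 1; fm_bytes[i] is the output of
    layer i and the input of layer i + 1.

    Returns one (input_region, output_region) pair per layer, alternating
    between the initial-address end and the last-address end of the memory.
    """
    regions = []
    for i, size in enumerate(fm_bytes):
        if i % 2 == 0:
            regions.append((0, size))                    # initial-address end
        else:
            regions.append((mem_size - size, mem_size))  # last-address end
    plan = []
    for n in range(len(regions) - 1):
        if fm_bytes[n] + fm_bytes[n + 1] > mem_size:
            raise MemoryError("input and output feature maps would overlap")
        plan.append((regions[n], regions[n + 1]))
    return plan

# Sizes from the text: 3K input, 2K output of layer 1, 1K output of layer 2.
plan = plan_feature_maps(10 * K, [3 * K, 2 * K, 1 * K])
```

Here `plan[1]` reads from 8K to 10K, the region the first layer wrote, and writes to 0K to 1K, so no feature map has to be copied between layers.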

Meanwhile, when the spaces to store the input feature map and the output feature map of each layer are allocated at the two ends of the memory 400 in this way, memory fragmentation can be prevented. Memory fragmentation is described in more detail below with reference to FIG. 5.

FIG. 5 is a diagram illustrating an example of a process in which overhead occurs due to memory fragmentation.

FIG. 5 illustrates an example of how the processor 310 allocates the space of the memory 500 in each of the first layer and the second layer of the neural network. The memory 500 of FIG. 5 may correspond to the internal memory 320 of FIG. 3.

The processor 310 may allocate the space from address 4K to address 7K of the memory 500 to store the output feature map of the first layer. When the convolution operation between the input feature map and the weight map is completed in the first layer, the output feature map of the first layer may be stored in the space from address 4K to address 7K of the memory 500.

Thereafter, the processor 310 may reallocate the space of the memory 500 for the second layer, which follows the first layer. Because the input feature map of the second layer is the same as the output feature map generated by the convolution operation in the first layer, the output feature map of the first layer stored in the space from address 4K to address 7K of the memory 500 may be used as is, as the input feature map of the second layer.

Thereafter, the processor 310 may obtain capacity information regarding the capacity of the space to store each of the output feature map, the weight map, and the working data of the second layer. For example, the processor 310 may obtain capacity information indicating that 1K (bytes), 5K (bytes), and 1K (bytes) are required to store the output feature map, the weight map, and the working data, respectively.

The processor 310 may allocate the space from address 0K, which is the initial address value of the memory 500, to address 1K to store the output feature map of the second layer, and may allocate the space from address 9K to address 10K, which is the last address value, to store the working data. At this point, the space from address 1K to address 4K and the space from address 7K to address 9K may be available in the memory 500, but because 5K (bytes) are required to store the weight map, neither contiguous space is large enough and the weight map cannot be allocated in the memory 500; that is, memory fragmentation occurs.

Accordingly, the processor 310 may need to combine the fragmented spaces scattered within the memory 500 and rearrange the space of the memory 500, and overhead may occur in this process.
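The fragmentation in the FIG. 5 example can be checked numerically: 5K bytes are free in total, yet no single gap can hold the 5K weight map. The helper below is an illustrative sketch; its name and the half-open region convention are assumptions.

```python
K = 1024

def free_gaps(mem_size, used):
    """Return the free (start, end) gaps left by the used half-open regions."""
    gaps, cursor = [], 0
    for start, end in sorted(used):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < mem_size:
        gaps.append((cursor, mem_size))
    return gaps

# Regions from the text: layer 2 output, layer 2 input, working data.
used = [(0, 1 * K), (4 * K, 7 * K), (9 * K, 10 * K)]
gaps = free_gaps(10 * K, used)
total_free = sum(end - start for start, end in gaps)
largest_gap = max(end - start for start, end in gaps)
weight_bytes = 5 * K
weight_fits = largest_gap >= weight_bytes
```

The total free space equals the weight map size, but the largest contiguous gap is only 3K, so the allocation fails unless the regions are first compacted.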

Returning to FIG. 4, the spaces to store the input feature map and the output feature map of each layer are allocated at the two ends of the memory 400. Because the memory fragmentation shown in FIG. 5 therefore does not occur, the resulting overhead problem can also be resolved.

FIG. 6 is a diagram illustrating an example of dividing a layer into a plurality of sub-layers.

Referring to FIG. 6, the processor 310 may generate an output feature map 620 by performing a convolution operation between the input feature map 600 and the weight map 610 of the first layer. However, the capacity of the space needed to store the various data required for or generated during the operation in the first layer, such as the input feature map 600, the weight map 610, and the output feature map 620, may exceed the size of the memory space located inside the processor 310. In this case, the processor 310 reads data from or writes data to the external memory more frequently, and the execution speed may decrease.

Accordingly, the processor 310 may reduce accesses to the external memory by dividing the first layer into a first sub-layer, a second sub-layer, and a third sub-layer, allocating memory space for each of the sub-layers, and performing the operation on each. For example, the processor 310 may divide a layer into sub-layers when the size of the weight map 610 is dominant.

The weight map 610, which is operated on with the input feature map 600 of the first layer, may also be divided into a plurality of sub-weight maps, each of which is assigned to a sub-layer. The processor 310 may allocate memory for each of the sub-layers and perform the operation based on that allocation.

For example, a first sub-weight map 611 may be assigned to the first sub-layer, so that a convolution operation between the input feature map 600 and the first sub-weight map 611 is performed in the first sub-layer. Similarly, a second sub-weight map 612 may be assigned to the second sub-layer and convolved with the input feature map 600, and a third sub-weight map 613 may be assigned to the third sub-layer and convolved with the input feature map 600.

As a result of the operation, the processor 310 may generate each of the channels of the output feature map 620 for each of the sub-layers. Specifically, each sub-weight map may traverse the channels of the input feature map to compute convolutions, and the results may be combined to produce the corresponding channel of the output feature map 620.

For example, the first channel output feature map 621 may be generated by the operation between the input feature map 600 and the first sub-weight map 611 in the first sub-layer, and the second channel output feature map 622 may be generated by the operation between the input feature map 600 and the second sub-weight map 612 in the second sub-layer. Similarly, the third channel output feature map 623 may be generated by the operation between the input feature map 600 and the third sub-weight map 613 in the third sub-layer. Combining the first channel output feature map 621, the second channel output feature map 622, and the third channel output feature map 623 yields the same result as the output feature map 620.
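The equivalence stated above, that the per-sub-layer output channels together equal the output of the undivided layer, can be illustrated with a small pure-Python convolution. The `conv2d` helper (stride 1, no padding, no bias), the nested-list layout, and the sample values are assumptions made for this sketch and are not taken from the description.

```python
def conv2d(x, w):
    """Convolve input x[C][H][W] with kernels w[O][C][kh][kw].

    Stride 1, no padding, no bias (assumed). Returns out[O][H'][W'].
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0][0]), len(w[0][0][0])
    out = []
    for kernel in w:  # one kernel produces one output channel
        out.append([
            [sum(kernel[c][i][j] * x[c][r + i][q + j]
                 for c in range(channels)
                 for i in range(kh)
                 for j in range(kw))
             for q in range(width - kw + 1)]
            for r in range(height - kh + 1)
        ])
    return out

# Hypothetical 2-channel 3x3 input and three 2x2 kernels (one per sub-layer).
x = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
     [[1, 0, 1], [0, 1, 0], [1, 0, 1]]]
w = [[[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
     [[[0, 1], [1, 0]], [[1, 0], [0, 0]]],
     [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]]

full_output = conv2d(x, w)                          # undivided layer
sub_outputs = [conv2d(x, [sub_w])[0] for sub_w in w]  # one channel per sub-layer
```

Each sub-layer only needs its own sub-weight map in memory, yet `sub_outputs` holds the same channels as `full_output`.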

Meanwhile, the number of sub-layers and the number of sub-weight maps are not limited to those described above and may have various values.

FIG. 7 is a diagram illustrating an example of allocating memory by dividing a layer into a plurality of sub-layers.

Referring to FIG. 7, the processor 310 may allocate the space of the memory 700 for each of the first sub-layer, the second sub-layer, and the third sub-layer. The memory 700 of FIG. 7 may correspond to the internal memory 320 of FIG. 3.

As described above with reference to FIG. 6, when the processor 310 divides the first layer into a plurality of sub-layers and allocates the memory 700 for each of the sub-layers, the processor 310 may allocate space for a sub-weight map rather than for the weight map of the first layer.

For example, when allocating the memory 700 for the first sub-layer, the processor 310 may allocate, within the memory 700, a first storage space 710 to store the input feature map and a second storage space 720 to store the output feature map. In addition, the processor 310 may allocate a fourth storage space 730, which stores the first sub-weight map assigned to the first sub-layer, between the first storage space 710 and the second storage space 720.

Similarly, when allocating the memory 700 for each of the second sub-layer and the third sub-layer, the processor 310 may allocate the first storage space 710 as the space to store the input feature map and the second storage space 720 as the space to store the output feature map. In addition, the fourth storage space 730 may be allocated as the space to store the second sub-weight map in the second sub-layer and as the space to store the third sub-weight map in the third sub-layer.

Here, the input feature map stored in the first storage space 710 may be the same in each of the first sub-layer, the second sub-layer, and the third sub-layer. In the fourth storage space 730, the sub-weight map assigned to each sub-layer may be stored by overwriting the previous one as processing moves from the first sub-layer to the third sub-layer.

In the second storage space 720, the channels of the output feature map, each of which results from the convolution operation between the input feature map and the sub-weight map assigned to a sub-layer, may be stored sequentially. For example, in the first sub-layer, the first channel output feature map may be stored in the second storage space 720, and in the second sub-layer, the second channel output feature map may be added to the previously stored first channel output feature map. Finally, when the third channel output feature map is added to the previously stored first and second channel output feature maps in the third sub-layer, generation of all channels of the output feature map is complete and the convolution operation in the first layer may end.
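Because the fourth storage space 730 is overwritten for every sub-layer, only one sub-weight map is resident at a time. The arithmetic below sketches the resulting footprint under the assumption that the weight map is split into equal parts; the function name and the 9K weight map size are hypothetical.

```python
K = 1024

def sublayer_footprint(in_bytes, out_bytes, weight_bytes, num_sublayers):
    """Bytes resident while one sub-layer runs, assuming an equal split of
    the weight map and reuse of a single sub-weight region."""
    sub_weight = -(-weight_bytes // num_sublayers)  # ceiling division
    return in_bytes + out_bytes + sub_weight

# Hypothetical layer: 3K input, 2K output, 9K weight map, 10K internal memory.
whole = sublayer_footprint(3 * K, 2 * K, 9 * K, 1)
split = sublayer_footprint(3 * K, 2 * K, 9 * K, 3)
```

In this hypothetical case the undivided layer needs 14K and does not fit in 10K, whereas three sub-layers need only 8K at any one time.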

FIG. 8 is a diagram illustrating an example of allocating memory by grouping tiles in a plurality of layers.

The capacity of the space needed to store the various data required for or generated during the operation in a layer of the neural network, such as the input feature map, the weight map, and the output feature map, may exceed the size of the memory space located inside the processor 310. Moreover, even when a layer is divided into a plurality of sub-layers and memory is allocated accordingly, as described above with reference to FIGS. 6 and 7, the capacity of the space needed to store the various data may still exceed the size of the memory space located inside the processor 310.

In this case, the processor 310 may tile each of predetermined layers in order to make maximum use of the capacity of the internal memory and reduce accesses to the external memory. The processor 310 may group the tiles in the predetermined layers that correspond to an input tile and process them together. For example, when the size of the input feature map is dominant, the processor 310 may group tiles in the predetermined layers, allocate memory, and process the operations.

Referring to FIG. 8, an input feature map 810 of the first layer, an input feature map 820 of the second layer, and an output feature map 830 of the second layer are illustrated. The input feature map 820 of the second layer may include four channels 820-1, 820-2, 820-3, and 820-4, and the output feature map 830 of the second layer may include six channels 830-1, 830-2, 830-3, 830-4, 830-5, and 830-6. Here, the numbers of channels shown in FIG. 8 for the input feature map 820 and the output feature map 830 of the second layer are not limited to those described above and may have various values.

First, the processor 310 may select an input tile 812-1 within the input feature map 810 of the first layer. The input tile 812-1 may correspond to a part of the input feature map 810. In the first layer, a tile 822-1 may be generated through an operation between the input tile 812-1 and the weight map 840, and in the second layer, a tile 832-1 may be generated through an operation between the tile 822-1 and the weight map 850. The tile 822-1 may include the upper-left portion of each of the channels 820-1, 820-2, 820-3, and 820-4 of the input feature map 820, and the tile 832-1 may include the upper-left portion of each of the channels 830-1, 830-2, 830-3, 830-4, 830-5, and 830-6 of the output feature map 830.

When the processor 310 intends to group the tiles of the first layer and the second layer and process them together, the processor 310 may obtain capacity information on the capacity of the space to store each of the input tile 812-1, the tiles 822-1 and 832-1, the weight map 840 of the first layer, the weight map 850 of the second layer, and other working data. When the sum of the obtained capacities is within the capacity of the memory located inside the processor 310, the processor 310 may allocate, based on the obtained capacity information, space in that memory to store each of the input tile 812-1, the tile 822-1, the tile 832-1, the weight map 840 of the first layer, and the weight map 850 of the second layer.

Based on the allocated memory space, the processor 310 may process the operations of the first layer and the second layer corresponding to the input tile 812-1 in a single batch. Likewise, the processor 310 may process the operations of the first layer and the second layer in a single batch for each of the remaining input tiles 812-2, 812-3, and 812-4.
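The condition for grouping the tiles of two layers, namely that the combined capacity stays within the internal memory, reduces to a single comparison. The sketch below uses hypothetical byte counts for the tiles 812-1, 822-1, and 832-1 and the weight maps 840 and 850; the function name is also an assumption.

```python
K = 1024

def group_fits(mem_size, tile_bytes, weight_bytes, working_bytes=0):
    """True when every tile, weight map, and the working data of the
    grouped layers fit in the internal memory at the same time."""
    return sum(tile_bytes) + sum(weight_bytes) + working_bytes <= mem_size

# Hypothetical sizes: tiles 812-1, 822-1, 832-1 and weight maps 840, 850.
tiles = [1 * K, 2 * K, 3 * K]
weights = [1 * K, 2 * K]
fits_in_10k = group_fits(10 * K, tiles, weights, working_bytes=1 * K)
fits_in_8k = group_fits(8 * K, tiles, weights, working_bytes=1 * K)
```

When the check passes, the intermediate tile 822-1 never has to leave the internal memory between the first and second layers.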

Meanwhile, the size of the input tile and the number of predetermined layers that are grouped are not limited to those described above and may have various values.

FIG. 9 is a flowchart illustrating an example of a process of allocating memory space in a neural network device.

In step 910, the neural network device 300 may allocate memory for the n-th layer of the neural network. The neural network device 300 may begin by allocating memory for the first layer of the neural network. The memory may be located in the processor 310 of the neural network device 300.

In step 920, the neural network device 300 may obtain the capacity of each of the input feature map, the output feature map, the weight map, and the working data of the n-th layer.

In step 930, the neural network device 300 may allocate, within the memory, a first storage space to store the input feature map and a second storage space to store the output feature map. The first storage space may correspond to the space from the initial address value of the memory to a first address value, and the second storage space may correspond to the space from a second address value of the memory to the last address value.

In step 940, the neural network device 300 may determine whether the total sum of the obtained capacities is smaller than the size of the memory space. Specifically, the neural network device 300 may add up the capacities of the spaces for storing the input feature map, the output feature map, the weight map, and the working data of the n-th layer, and determine whether the resulting value is smaller than the size of the memory space. If the total sum is smaller than the size of the memory space, the process proceeds to step 950. If the total sum is not smaller than the size of the memory space, the process proceeds to step 960.

In step 950, the neural network device 300 may allocate a space to store the weight map between the first storage space and the second storage space.

In step 960, the neural network device 300 may divide the weight map into a plurality of sub-weight maps. The neural network device 300 may divide the weight map so that each of the plurality of channels of the output feature map can be generated from an operation between one of the sub-weight maps and the input feature map.

In step 970, the neural network device 300 may divide the n-th layer into a plurality of sub-layers and, for each of the sub-layers, allocate a space to store a sub-weight map between the first storage space and the second storage space.

In step 980, the neural network device 300 may allocate memory for the layer following the n-th layer.

In step 990, the neural network device 300 may determine whether the n-th layer is the last layer. If the n-th layer is the last layer, the memory allocation ends. If the n-th layer is not the last layer, the process returns to step 920. That is, steps 920 to 980 may be repeated until memory has been allocated for all layers.
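The flow of FIG. 9 can be summarized in a short loop. This is only a sketch under stated assumptions: the `Layer` dataclass, the `plan_layers` function, and the rule for choosing the number of sub-layers (the smallest equal split that keeps the total strictly below the memory size) are hypothetical; the description does not say how many sub-weight maps are produced.

```python
from dataclasses import dataclass

K = 1024

@dataclass
class Layer:
    in_bytes: int
    out_bytes: int
    weight_bytes: int
    working_bytes: int = 0

def plan_layers(layers, mem_size):
    """Return ("whole", 1) or ("split", number_of_sub_layers) per layer."""
    plans = []
    for layer in layers:  # steps 910, 980, 990: visit every layer in order
        # Step 920: capacities of everything except the weight map.
        fixed = layer.in_bytes + layer.out_bytes + layer.working_bytes
        if fixed + layer.weight_bytes < mem_size:      # step 940
            plans.append(("whole", 1))                 # step 950
            continue
        # Steps 960-970: split the weight map (assumed equal split).
        room = mem_size - fixed - 1  # largest sub-weight map that still fits
        if room < 1:
            raise MemoryError("feature maps alone exceed the internal memory")
        num_sub = -(-layer.weight_bytes // room)       # ceiling division
        plans.append(("split", num_sub))
    return plans

# Hypothetical network evaluated against a 10K internal memory.
layers = [Layer(3 * K, 2 * K, 4 * K), Layer(2 * K, 1 * K, 12 * K, 1 * K)]
plans = plan_layers(layers, 10 * K)
```

The second hypothetical layer needs three sub-layers rather than two because the comparison in step 940 is strict: two 6K sub-weight maps would bring the total to exactly 10K, which is not smaller than the memory size.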

FIG. 10 is a flowchart illustrating another example of a process of allocating memory space in a neural network device.

In step 1010, the neural network device 300 may allocate memory for the n-th layer of the neural network. The neural network device 300 may begin by allocating memory for the first layer of the neural network. The memory may be located in the processor 310 of the neural network device 300.

In step 1015, the neural network device 300 may determine whether the total sum of the obtained capacities is smaller than the size of the memory space. Specifically, the neural network device 300 may add up the capacities of the spaces for storing the input feature map, the output feature map, the weight map, and the working data of the n-th layer, and determine whether the resulting value is smaller than the size of the memory space. If the total sum is smaller than the size of the memory space, the process proceeds to step 1025. If the total sum is not smaller than the size of the memory space, the process proceeds to step 1030.

In step 1025, the neural network device 300 may allocate, within the memory, a first storage space to store the input feature map and a second storage space to store the output feature map. The first storage space may correspond to the space from the initial address value of the memory to a first address value, and the second storage space may correspond to the space from a second address value of the memory to the last address value.

In step 1030, the neural network device 300 may obtain the sum S of the capacities of the spaces to store the input tile within the input feature map of the n-th layer, the output tile, the weight map, and the working data. The input tile may correspond to a part of the input feature map, and the output tile may be generated by an operation between the input tile and the weight map in the n-th layer.

In step 1035, the neural network device 300 may determine whether S is smaller than the size of the memory space. If S is smaller than the size of the memory space, the process proceeds to step 1040. If S is not smaller than the size of the memory space, the process proceeds to step 1055.

In step 1040, the neural network device 300 may perform tiling on the layer following the n-th layer.

In step 1045, the neural network device 300 may obtain the sum S_n of the capacities of the spaces to store the output tile and the weight map. Specifically, the neural network device 300 may obtain the sum of the capacities of the spaces to store the output tile that is generated when the output tile of the n-th layer is passed as input through the layer following the n-th layer, and the weight map used in the operation of that following layer.

In step 1050, the neural network device 300 may accumulate S by adding S_n to S. The neural network device 300 may then determine again whether the accumulated S is smaller than the size of the memory space. That is, steps 1035 to 1050 may be repeated until S is no longer smaller than the size of the memory space.
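The accumulation loop of steps 1030 to 1050 might look like the sketch below, which follows the text literally: each following layer's S_n is added to S for as long as S is still smaller than the memory size. The function name, the list of per-layer S_n values, and the sample numbers are assumptions.

```python
K = 1024

def accumulate_group(s_initial, s_next, mem_size):
    """Return (accumulated S, number of following layers added)."""
    s = s_initial              # step 1030: S for the n-th layer
    layers_added = 0
    for s_n in s_next:         # S_n of each following layer, in order
        if s >= mem_size:      # step 1035: stop once S is not smaller
            break
        s += s_n               # steps 1040-1050: tile next layer, add S_n
        layers_added += 1
    return s, layers_added

# Hypothetical values: S = 4K, then S_n = 3K, 4K, 5K for the following layers.
total, added = accumulate_group(4 * K, [3 * K, 4 * K, 5 * K], 10 * K)
```

With these values S grows from 4K to 7K to 11K, at which point it is no longer smaller than the 10K memory and the loop stops after two following layers.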

In step 1055, the neural network device 300 may divide the weight map into a plurality of sub-weight maps. The neural network device 300 may divide the weight map so that each of the plurality of channels of the output feature map can be generated from an operation between one of the sub-weight maps and the input feature map.

In step 1060, the neural network device 300 may divide the n-th layer into a plurality of sub-layers and allocate a sub weight map to each of the sub-layers.
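As a rough illustration of steps 1055 and 1060, the sketch below models the weight map as a list of filters, one per output channel, and splits it into contiguous groups, one group per sub-layer. The function name `split_weight_map`, the list-of-filters representation, and the even contiguous split are assumptions made for illustration; the description only requires that each sub weight map yield channels of the output feature map.

```python
def split_weight_map(filters, num_sub_layers):
    """Split per-output-channel filters into contiguous sub weight maps.

    filters        -- list with one entry per output channel (assumed layout)
    num_sub_layers -- number of sub-layers to create
    Returns a list of sub weight maps; entry i is assigned to sub-layer i.
    """
    if num_sub_layers < 1:
        raise ValueError("num_sub_layers must be at least 1")
    base, extra = divmod(len(filters), num_sub_layers)
    sub_weight_maps = []
    start = 0
    for i in range(num_sub_layers):
        # The first `extra` sub-layers take one additional filter each.
        size = base + (1 if i < extra else 0)
        sub_weight_maps.append(filters[start:start + size])
        start += size
    return sub_weight_maps


if __name__ == "__main__":
    print(split_weight_map(["f0", "f1", "f2", "f3", "f4"], 2))
```

Because each sub-layer holds only its own sub weight map, the weight storage needed at any one time is the size of the largest group rather than the size of the whole weight map.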

In step 1065, the neural network device 300 may determine whether S is smaller than the size of the memory space. If S is smaller than the size of the memory space, the process proceeds to step 1070. If S is not smaller than the size of the memory space, the process proceeds to step 1075.

In step 1070, the neural network device 300 may allocate, within the memory, space to store the input tile, the output tile, the weight map, and the working data.

In step 1075, the neural network device 300 may store a portion of the input tile, the output tile, the weight map, and the working data in an external memory. That is, because S is larger than the size of the memory space, the processor 310 has no choice but to read data from, or write data to, the external memory during the operation.
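The branch of steps 1065 to 1075 can be sketched as follows. The description does not state which portion of the data is moved to the external memory, so the greedy, in-order choice below is purely an assumption, as are the function name `place_buffers` and the labels `"internal"` and `"external"`.

```python
def place_buffers(sizes, memory_size):
    """Decide where each buffer is stored (illustrative sketch only).

    sizes       -- dict mapping buffer name to required capacity
    memory_size -- size of the memory space
    Returns a dict mapping buffer name to "internal" or "external".
    """
    s = sum(sizes.values())
    # Steps 1065 and 1070: everything fits, so allocate all buffers
    # in the memory.
    if s < memory_size:
        return {name: "internal" for name in sizes}
    # Step 1075: part of the data goes to the external memory.
    # Assumption: keep buffers in the given order while they still fit.
    placement = {}
    used = 0
    for name, size in sizes.items():
        if used + size < memory_size:
            placement[name] = "internal"
            used += size
        else:
            placement[name] = "external"
    return placement


if __name__ == "__main__":
    demo = {"input_tile": 30, "output_tile": 30,
            "weight_map": 50, "working_data": 10}
    print(place_buffers(demo, 100))
```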

In step 1080, the neural network device 300 may allocate memory for the layer that follows the n-th layer.

In step 1085, the neural network device 300 may determine whether the n-th layer is the last layer. If the n-th layer is the last layer, memory allocation ends. If the n-th layer is not the last layer, the process returns to step 1015. That is, steps 1015 to 1080 may be repeated until memory has been allocated for all layers.

FIG. 11 is a flowchart illustrating an example of a process of allocating, within a memory, space to store an input feature map and an output feature map for each layer of a neural network.

In step 1110, the neural network device 300 may obtain first capacity information about the capacity of a space to store the input feature map of a first layer among a plurality of layers of the neural network, and second capacity information about the capacity of a space to store the output feature map of the first layer. The first capacity information is the capacity of the space required to store the input feature map of the first layer, and may correspond to the size of the input feature map data. Likewise, the second capacity information is the capacity of the space required to store the output feature map of the first layer, and may correspond to the size of the output feature map data.

In step 1120, the neural network device 300 may allocate a first storage space to store the input feature map in the memory based on the initial address value of the memory and the first capacity information, and may allocate a second storage space to store the output feature map in the memory based on the last address value of the memory and the second capacity information. The first storage space may correspond to the space from the initial address value of the memory to a first address value, and the second storage space may correspond to the space from a second address value of the memory to the last address value. The first address value and the second address value may each be any address value between the initial address value and the last address value of the memory. By allocating the spaces for the input feature map and the output feature map of a layer at opposite ends of the memory, the neural network device 300 can minimize accesses to an external memory.
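The address arithmetic of step 1120 can be sketched as below. The function name `allocate_both_ends`, the use of inclusive integer addresses, and the overlap check are illustrative assumptions; the description itself only states that the first storage space starts at the initial address and the second storage space ends at the last address.

```python
def allocate_both_ends(initial_addr, last_addr, first_capacity, second_capacity):
    """Place the input feature map at the start of the memory and the
    output feature map at the end (illustrative sketch only).

    Addresses are inclusive. Returns two (start, end) tuples:
    the first storage space and the second storage space.
    """
    # First storage space: from the initial address up to the
    # first address value.
    first_addr = initial_addr + first_capacity - 1
    # Second storage space: from the second address value up to
    # the last address.
    second_addr = last_addr - second_capacity + 1
    if first_addr >= second_addr:
        raise MemoryError("feature maps do not fit in the memory together")
    return (initial_addr, first_addr), (second_addr, last_addr)


if __name__ == "__main__":
    first, second = allocate_both_ends(0, 1023, 300, 200)
    print(first, second)
```

With this layout the region between the first address value and the second address value remains free, which leaves room for other data such as a weight map without moving either feature map.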

The embodiments may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal, or another transport mechanism, and include any information delivery medium.

In addition, in this specification, a "unit" may be a hardware component such as a processor or a circuit, and/or a software component executed by a hardware component such as a processor.

The foregoing description is provided for illustration, and a person of ordinary skill in the art to which this specification pertains will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the embodiments is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within that scope.

Claims

A method of allocating memory space for a plurality of layers of a neural network, the method comprising:
acquiring first capacity information about a capacity of a space to store an input feature map of a first layer among the plurality of layers of the neural network, and second capacity information about a capacity of a space to store an output feature map of the first layer; and
allocating a first storage space to store the input feature map in a memory based on an initial address value of the memory and the first capacity information, and allocating a second storage space to store the output feature map in the memory based on a last address value of the memory and the second capacity information.

The method of claim 1, further comprising:
allocating, to the second storage space, a space to store an input feature map of a second layer following the first layer;
obtaining third capacity information about a capacity of a space to store an output feature map of the second layer; and
allocating a third storage space to store the output feature map of the second layer in the memory based on the initial address value of the memory and the third capacity information.

The method of claim 1, wherein
the first storage space corresponds to a space from the initial address value of the memory to a first address value, and
the second storage space corresponds to a space from a second address value of the memory to the last address value.

The method of claim 1, wherein
the acquiring further comprises acquiring fourth capacity information about a capacity of a space to store a weight map on which an operation with the input feature map is performed, and
the allocating further comprises allocating a space to store the weight map between the first storage space and the second storage space based on the fourth capacity information.

The method of claim 1, further comprising:
dividing a weight map of the first layer into a plurality of sub weight maps;
dividing the first layer into a plurality of sub-layers, and allocating each of the plurality of sub weight maps to a respective one of the sub-layers;
obtaining sub-capacity information about a capacity of a space to store each of the plurality of sub weight maps; and
for each of the plurality of sub-layers, allocating a space to store the allocated sub weight map between the first storage space and the second storage space based on the sub-capacity information.

The method of claim 5, wherein
each of a plurality of channels of the output feature map is generated from an operation between a respective one of the plurality of sub weight maps and the input feature map,
the method further comprising sequentially storing each of the plurality of channels of the output feature map in the second storage space.

The method of claim 1, further comprising:
selecting an input tile in the input feature map of the first layer; obtaining capacity information about a capacity of a space to store each of the input tile, an output tile corresponding to the input tile, and a weight map of the first layer; and
allocating a space to store each of the input tile, the output tile, and the weight map in the memory based on the obtained capacity information.

The method of claim 1,
wherein the memory is located within a processor of a device that drives the neural network.

A computer-readable recording medium having recorded thereon a program for executing the method of claim 1 on a computer.

A neural network device comprising:
a memory; and
a processor configured to drive a neural network by executing at least one program,
wherein the processor is configured to:
acquire first capacity information about a capacity of a space to store an input feature map of a first layer among a plurality of layers of the neural network, and second capacity information about a capacity of a space to store an output feature map of the first layer; allocate a first storage space to store the input feature map in the memory based on an initial address value of the memory and the first capacity information; and allocate a second storage space to store the output feature map in the memory based on a last address value of the memory and the second capacity information.

The device of claim 10,
wherein the processor is configured to allocate, to the second storage space, a space to store an input feature map of a second layer following the first layer, obtain third capacity information about a capacity of a space to store an output feature map of the second layer, and allocate a third storage space to store the output feature map of the second layer in the memory based on the initial address value of the memory and the third capacity information.

The device of claim 10,
wherein the first storage space corresponds to a space from the initial address value of the memory to a first address value, and the second storage space corresponds to a space from a second address value of the memory to the last address value.

The device of claim 10,
wherein the processor is further configured to acquire fourth capacity information about a capacity of a space to store a weight map on which an operation with the input feature map is performed, and to allocate a space to store the weight map between the first storage space and the second storage space based on the fourth capacity information.

The device of claim 10,
wherein the processor is configured to divide a weight map of the first layer into a plurality of sub weight maps, divide the first layer into a plurality of sub-layers, allocate each of the plurality of sub weight maps to a respective one of the sub-layers, obtain sub-capacity information about a capacity of a space to store each of the plurality of sub weight maps, and, for each of the plurality of sub-layers, allocate a space to store the allocated sub weight map between the first storage space and the second storage space based on the sub-capacity information.

The device of claim 14,
wherein the processor is configured to generate each of a plurality of channels of the output feature map from an operation between a respective one of the plurality of sub weight maps and the input feature map, and to sequentially store each of the plurality of channels of the output feature map in the second storage space.

The device of claim 10,
wherein the processor is configured to select an input tile in the input feature map of the first layer, obtain capacity information about a capacity of a space to store each of the input tile, an output tile corresponding to the input tile, and a weight map of the first layer, and allocate a space to store each of the input tile, the output tile, and the weight map in the memory based on the obtained capacity information.

The device of claim 10,
wherein the memory is located within the processor of the neural network device.