JP2017151604A

JP2017151604A - Arithmetic processing unit

Info

Publication number: JP2017151604A
Application number: JP2016031949A
Authority: JP
Inventors: 智章尾崎; Tomoaki Ozaki
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2016-02-23
Filing date: 2016-02-23
Publication date: 2017-08-31
Anticipated expiration: 2036-02-23
Also published as: JP6645252B2

Abstract

PROBLEM TO BE SOLVED: To provide an arithmetic processing unit configured to improve the efficiency of arithmetic processing by implementing a plurality of arithmetic blocks and a plurality of memories. SOLUTION: An arithmetic processing unit 10 executes operations by a plurality of processing layers connected hierarchically, and includes relay units 13 and relay units 12. Each relay unit 13 outputs the arithmetic result data generated by its paired arithmetic block 11 to its paired data holding unit 14, or outputs accumulated arithmetic result data to the paired data holding unit and to the data holding units arranged lower than that data holding unit. Each relay unit 12 outputs the data held in its paired data holding unit to its paired arithmetic block, or outputs that data to the paired arithmetic block and to the arithmetic blocks arranged higher than that arithmetic block. The arithmetic processing unit thus keeps the wiring that connects the plurality of arithmetic blocks to the plurality of memories from becoming complicated, while preventing a reduction in processing speed or an increase in the size of the device. SELECTED DRAWING: Figure 5

Description

The present invention relates to an arithmetic processing device.

Arithmetic processing devices that execute operations using a neural network in which a plurality of processing layers are hierarchically connected have been studied for some time. In arithmetic processing devices that perform image recognition in particular, the so-called convolutional neural network (CNN) plays a central role.

Japanese Patent No. 5184824

In this kind of arithmetic processing device, to cope with an increase in the number of arithmetic processing layers and with more complex arithmetic processing, it has been considered to mount a plurality of arithmetic blocks and a plurality of memories and, for example, to connect a plurality of memories to one arithmetic block and a plurality of arithmetic blocks to one memory. With this configuration, one arithmetic block can write data to a plurality of memories, and a plurality of arithmetic blocks can read data from one memory, so arithmetic processing can be performed efficiently even as the number of layers grows and the processing becomes more complex. However, the wiring that connects the plurality of arithmetic blocks and the plurality of memories becomes complicated, which raises concerns about a resulting decrease in processing speed and an increase in the size of the arithmetic processing device.

Accordingly, an object of the present invention is to provide, for an arithmetic processing device that improves the efficiency of arithmetic processing by mounting a plurality of arithmetic blocks and a plurality of memories, a configuration that suppresses the complexity of the wiring connecting the arithmetic blocks and the memories and thereby avoids a decrease in processing speed and an increase in the size of the device.

An arithmetic processing device according to the present invention executes operations by a plurality of hierarchically connected processing layers and includes a plurality of arithmetic blocks that execute the operations, a plurality of data holding units each paired with one of the arithmetic blocks, and a plurality of data output units each paired with one of the arithmetic blocks. The arithmetic blocks, the data holding units, and the data output units are each arranged in a row from the lower side toward the upper side. The arithmetic blocks sequentially accumulate operation result data from the lower side toward the upper side. Each data output unit either outputs the operation result data generated by its paired arithmetic block to its paired data holding unit, or outputs the accumulated operation result data to its paired data holding unit and to the data holding units on the lower side of that data holding unit.

In another aspect, an arithmetic processing device according to the present invention executes operations by a plurality of hierarchically connected processing layers and includes a plurality of arithmetic blocks that execute the operations, a plurality of data holding units each paired with one of the arithmetic blocks, and a plurality of data output units each paired with one of the arithmetic blocks. The arithmetic blocks, the data holding units, and the data output units are each arranged in a row from the lower side toward the upper side. Each data output unit either outputs the data held by its paired data holding unit to its paired arithmetic block, or outputs the data held by its paired data holding unit to its paired arithmetic block and to the arithmetic blocks on the upper side of that arithmetic block.

According to the arithmetic processing device of the present invention, the selection processing performed by the data output units makes it possible to write the operation result data of one arithmetic block to a plurality of data holding units. Likewise, the selection processing performed by the data output units makes it possible to distribute the data held by one data holding unit to a plurality of arithmetic blocks. With this configuration, data can be read and written efficiently between the plurality of arithmetic blocks and the plurality of data holding units while the complexity of the wiring connecting them is suppressed, so that a decrease in processing speed and an increase in the size of the device can be avoided.

FIG. 1 is a diagram conceptually showing a configuration example of a convolutional neural network.
FIG. 2 is a diagram visually showing the flow of arithmetic processing in the intermediate layer (part 1).
FIG. 3 is a diagram visually showing the flow of arithmetic processing in the intermediate layer (part 2).
FIG. 4 is a diagram showing general arithmetic expressions and functions used for feature amount extraction processing.
FIG. 5 is a block diagram schematically showing a configuration example of the arithmetic processing device according to the present embodiment.
FIG. 6 is a block diagram schematically showing a configuration example of an arithmetic block.
FIG. 7 is a block diagram schematically showing an operation example of the arithmetic processing device (part 1).
FIG. 8 is a block diagram schematically showing an operation example of the arithmetic processing device (part 2).
FIG. 9 is a block diagram schematically showing an operation example of the arithmetic processing device (part 3).
FIG. 10 is a block diagram schematically showing an operation example of the arithmetic processing device (part 4).
FIG. 11 is a diagram showing an example of determining the external memory address accessed by a data holding unit (part 1).
FIG. 12 is a diagram showing an example of determining the external memory address accessed by a data holding unit (part 2).

Hereinafter, an embodiment of an arithmetic processing device will be described with reference to the drawings.
(Neural network)
FIG. 1 conceptually shows the configuration of a neural network, in this case a convolutional neural network, that is applied to an arithmetic processing device 10 described in detail later. The convolutional neural network N is applied to image recognition technology that recognizes predetermined shapes and patterns from image data D1, which is the input data, and has an intermediate layer Na and a fully connected layer Nb. The intermediate layer Na is a configuration in which a plurality of feature amount extraction processing layers Na1, Na2, ... are hierarchically connected. Each of the feature amount extraction processing layers Na1, Na2, ... includes a convolution layer C and a pooling layer P.

Next, the flow of processing in the intermediate layer Na will be described. As illustrated in FIG. 2, in the first feature amount extraction processing layer Na1, the arithmetic processing device scans the input image data D1 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature amounts contained in the input image by applying known feature amount extraction processing to the scanned data. The first feature amount extraction processing layer Na1 extracts relatively simple, individual feature amounts, such as a linear feature extending in the horizontal direction or a linear feature extending in an oblique direction. In doing so, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the features contained in the input image.

In the second feature amount extraction processing layer Na2, the arithmetic processing device scans the input data received from the preceding feature amount extraction processing layer Na1 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature amounts contained in the input image by applying known feature amount extraction processing to the scanned data. The second feature amount extraction processing layer Na2 extracts higher-order, composite feature amounts by integrating the feature amounts extracted in the first feature amount extraction processing layer Na1 while taking into account, for example, their spatial positional relationships. In doing so, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the features contained in the input image.

In the third feature amount extraction processing layer Na3, the arithmetic processing device scans the input data received from the preceding feature amount extraction processing layer Na2 in units of a predetermined size, for example by raster scanning. It then extracts a plurality of feature amounts contained in the input image by applying known feature amount extraction processing to the scanned data. The third feature amount extraction processing layer Na3 extracts still higher-order, composite feature amounts by integrating the feature amounts extracted in the second feature amount extraction processing layer Na2 while taking into account, for example, their spatial positional relationships. In doing so, the arithmetic processing device generates a plurality of feature maps, each corresponding to one of the features contained in the input image. By repeating feature amount extraction across the plurality of feature amount extraction processing layers in this way, the arithmetic processing device performs image recognition of the detection target object contained in the image data D1.

By repeating the processing of the feature amount extraction processing layers Na1, Na2, Na3, ... in the intermediate layer Na, the arithmetic processing device extracts the various feature amounts contained in the input image data D1 at progressively higher orders. The arithmetic processing device then outputs the results obtained by the processing of the intermediate layer Na to the fully connected layer Nb as intermediate operation result data.

The fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and outputs the final operation result data. That is, the fully connected layer Nb combines the plurality of intermediate operation result data obtained from the intermediate layer Na and then performs product-sum operations on the combined result while varying the weighting coefficients, thereby outputting the final operation result data, namely image data in which the detection target contained in the input image data D1 has been recognized. In this output, a portion where the product-sum result is large is recognized as part or all of the detection target.

Next, the flow of the feature amount extraction processing performed by the arithmetic processing device will be described. As illustrated in FIG. 3, the arithmetic processing device scans the input data Dn received from the preceding feature amount extraction processing layer in units of a predetermined size, in this case with a filter size of 3×3 pixels, shown hatched in the figure. The filter size is not limited to 3×3 pixels and can be changed as appropriate, for example to 5×5 pixels.

The arithmetic processing device performs a known convolution operation on each portion of the scanned data. It then applies known activation processing to the data resulting from the convolution operation and uses the result as the output of the convolution layer C. Next, it applies known pooling processing to the output data Cn of the convolution layer C in units of a predetermined size, in this case 2×2 pixels, and uses the result as the output of the pooling layer P. Finally, it outputs the output data Pn of the pooling layer P to the feature amount extraction processing layer of the next layer. The pooling size is not limited to 2×2 pixels and can be changed as appropriate.

FIG. 4 shows general examples of the convolution function used for the convolution operation processing, the functions used for the activation processing, and the functions used for the pooling processing. The convolution function Yij accumulates the values obtained by multiplying the output Xij of the immediately preceding layer by the weighting coefficients Wp,q obtained by learning. Here, "N" indicates the pixel size processed in one cycle of the convolution operation processing; for example, when the pixel size of one operation cycle is 3×3 pixels, the value of N is 2. The convolution function Yij may also add a predetermined bias value to the accumulated value. Any of various functions can be adopted as the convolution function, provided that it supports product-sum operations that can also handle the fully connected processing. For the activation processing, the well-known logistic sigmoid function, the ReLU (Rectified Linear Units) function, or the like is used. For the pooling processing, the well-known max pooling function, which outputs the maximum value of the input data, the well-known average pooling function, which outputs the average value of the input data, or the like is used.
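The three kinds of functions named above can be sketched in a few lines of Python. FIG. 4 itself is not reproduced in this text, so the index convention below (summing Wp,q × Xi+p,j+q for p and q from 0 to N, with N = 2 for a 3×3 filter) is an assumption based on the usual form of such expressions; the function names `convolve`, `relu`, and `max_pool` are illustrative and do not come from the publication.

```python
def convolve(x, w, bias=0.0):
    """Y[i][j] = bias + sum of w[p][q] * x[i+p][j+q] for p, q in 0..N (assumed form)."""
    n = len(w) - 1            # N = 2 for a 3x3 filter, as in the text
    rows = len(x) - n         # number of valid filter positions vertically
    cols = len(x[0]) - n      # number of valid filter positions horizontally
    return [
        [
            bias + sum(w[p][q] * x[i + p][j + q]
                       for p in range(n + 1) for q in range(n + 1))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def relu(y):
    """Activation: ReLU, one of the functions the text names."""
    return [[max(0.0, v) for v in row] for row in y]


def max_pool(y, size=2):
    """Pooling: maximum over non-overlapping size x size windows."""
    return [
        [max(y[i + a][j + b] for a in range(size) for b in range(size))
         for j in range(0, len(y[0]) - size + 1, size)]
        for i in range(0, len(y) - size + 1, size)
    ]


# A 4x4 input of ones with a 3x3 filter of ones gives a 2x2 map of 9.0,
# which a 2x2 max pooling reduces to a single value.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
pooled = max_pool(relu(convolve(image, kernel)))
```

The order convolution, activation, pooling follows the flow described for FIG. 3; swapping `relu` for a sigmoid or `max_pool` for an average would correspond to the other options the text lists.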

With the convolutional neural network N described above, repeating the processing of the convolution layer C and the processing of the pooling layer P makes it possible to extract higher-order feature amounts. Next, an embodiment of an arithmetic processing device to which this convolutional neural network N is applied will be described.

(One embodiment)
The arithmetic processing device 10 illustrated in FIG. 5 includes a plurality of arithmetic blocks 11, a plurality of relay units 12 and 13, a plurality of data holding units 14, a plurality of interfaces 15, and the like. In the arithmetic processing device 10, one arithmetic block 11, two relay units 12 and 13, and one data holding unit 14 constitute one arithmetic processing unit 16. The arithmetic processing device 10 is configured with a plurality of these arithmetic processing units 16 arranged in a row from the downstream side toward the upstream side. For convenience of explanation, the lower side of the figure is defined as the downstream side and the upper side of the figure as the upstream side. Each arithmetic processing unit 16 is connected to an interconnect unit 17 via its interface 15. The interconnect unit 17 is connected to an external memory 18 provided outside the arithmetic processing device 10.

As illustrated in FIG. 6, each arithmetic block 11 includes a convolution operation processing unit 11a, an accumulation processing unit 11b, an activation processing unit 11c, a pooling processing unit 11d, and the like. These processing units may be implemented in hardware such as circuits, in software, or in a combination of hardware and software. The convolution operation processing unit 11a executes known convolution operation processing on the input data received from the preceding layer and outputs the resulting data to the accumulation processing unit 11b.

The accumulation processing unit 11b is configured with, for example, an adder. When data is input from the accumulation processing unit 11b of the lower-side arithmetic block 11, the accumulation processing unit 11b adds that data to the data input from the convolution operation processing unit 11a of its own arithmetic block 11. This allows the plurality of arithmetic blocks 11 to sequentially accumulate, from the lower side toward the upper side, the operation result data produced by the convolution operation processing unit 11a of each arithmetic block 11.

When no data is input from the lower-side arithmetic block 11, the accumulation processing unit 11b outputs the data input from the convolution operation processing unit 11a of its own arithmetic block 11 to the activation processing unit 11c. When data is input from the lower-side arithmetic block 11, the accumulation processing unit 11b adds that data to the data input from the convolution operation processing unit 11a of its own arithmetic block 11 and outputs the resulting accumulated data to the activation processing unit 11c.
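The two cases handled by the accumulation processing unit 11b can be illustrated with a minimal Python sketch. Scalars stand in for the convolution results here, and the names `accumulate` and `accumulate_chain` are hypothetical; the publication describes an adder, not this code.

```python
from typing import List, Optional, Sequence


def accumulate(own: float, from_lower: Optional[float]) -> float:
    """Model of one accumulation processing unit 11b.

    own        -- result from the convolution unit 11a of the same block
    from_lower -- value arriving from the lower-side block, or None if absent
    """
    if from_lower is None:
        return own               # no lower-side input: pass own result through
    return own + from_lower      # lower-side input present: add it


def accumulate_chain(conv_results: Sequence[float]) -> List[float]:
    """Chain the blocks from the lowest (index 0) to the uppermost.

    Returns the accumulator output of every block; the last entry is the
    total accumulated across all blocks.
    """
    outputs: List[float] = []
    carry: Optional[float] = None
    for own in conv_results:
        carry = accumulate(own, carry)
        outputs.append(carry)
    return outputs


# Three blocks producing 1.0, 2.0, and 3.0 yield running sums 1.0, 3.0, 6.0.
running = accumulate_chain([1.0, 2.0, 3.0])
```

In this model the uppermost block ends up holding the sum over all blocks, which matches the statement that results are accumulated from the lower side toward the upper side.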

The activation processing unit 11c executes known activation processing on the data input from the accumulation processing unit 11b and outputs the resulting data to the pooling processing unit 11d. The pooling processing unit 11d executes known pooling processing on the data produced by the activation processing unit 11c and outputs the resulting data.

The relay unit 12 is an example of a data output unit for outputting data from the data holding unit 14 to the arithmetic block 11, and includes a multiplexer circuit, a flip-flop circuit, and the like (not shown). The relay unit 12 can output the data input from the data holding unit 14 of its own arithmetic processing unit 16 to the arithmetic block 11 of its own arithmetic processing unit 16. The relay unit 12 can also select either the data input from the data holding unit 14 of its own arithmetic processing unit 16 or the data input from the relay unit 12 on its lower side, and output the selected data to the arithmetic block 11 of its own arithmetic processing unit 16.

The relay unit 13 is an example of a data output unit for outputting data from the arithmetic block 11 to the data holding unit 14, and includes a multiplexer circuit, a flip-flop circuit, and the like (not shown). The relay unit 13 can output the data input from the arithmetic block 11 of its own arithmetic processing unit 16 to the data holding unit 14 of its own arithmetic processing unit 16. The relay unit 13 can also select either the data input from the arithmetic block 11 of its own arithmetic processing unit 16 or the data input from the relay unit 13 on its upper side, and output the selected data to the data holding unit 14 of its own arithmetic processing unit 16.

The data holding unit 14 functions as a so-called internal memory. It temporarily holds the input data it receives during the arithmetic processing of the current layer, that is, the operation result data, and the output data it outputs during the arithmetic processing of the next layer. Each data holding unit 14 includes two buffers 14a and 14b and a switching function unit (not shown). The switching function unit switches the buffers 14a and 14b between the role of outputting data to the arithmetic block 11 and the role of receiving data from the arithmetic block 11.

That is, when the buffer 14a is to serve for data output, the switching function unit switches the buffer 14b to serve for data input. Conversely, when the buffer 14a is to serve for data input, the switching function unit switches the buffer 14b to serve for data output.

The data holding unit 14 holds the input data received during the arithmetic processing of the current layer, that is, the operation result data, in whichever of the buffers 14a and 14b is currently switched to data input. During the arithmetic processing of the next layer, the data holding unit 14 switches that buffer, which served for data input during the processing of the current layer, to data output, and outputs the data held in it to the arithmetic block 11. In this way, the data holding unit 14 can pass the operation result data of the current layer to the next layer inside the arithmetic processing device 10, without saving that data to the external memory 18.
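The role swap between the two buffers is a form of double (ping-pong) buffering, which can be sketched as follows. The class `DataHolder` and its methods are hypothetical names chosen for illustration, and clearing the new input buffer at each swap is an assumption; the publication does not say when stale data is discarded.

```python
class DataHolder:
    """Toy model of a data holding unit 14 with buffers 14a and 14b."""

    def __init__(self):
        self._buffers = [[], []]   # stand-ins for buffers 14a and 14b
        self._input_index = 0      # index of the buffer currently used for input

    def write(self, value):
        """Store an operation result of the current layer in the input buffer."""
        self._buffers[self._input_index].append(value)

    def read(self):
        """Return a copy of the data in the output buffer for the arithmetic block."""
        return list(self._buffers[1 - self._input_index])

    def next_layer(self):
        """Swap roles: the buffer just filled becomes the output buffer."""
        self._input_index = 1 - self._input_index
        # Assumption: the buffer that now receives input starts out empty.
        self._buffers[self._input_index] = []


holder = DataHolder()
holder.write(1)          # results of the current layer go into the input buffer
holder.write(2)
holder.next_layer()      # roles swap when processing moves to the next layer
layer_input = holder.read()
```

Because reads and writes always target different buffers, the results of one layer become the input of the next without a round trip through external memory, which is the benefit the paragraph above describes.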

The external memory 18 is a large-capacity storage medium configured with, for example, Double-Data-Rate SDRAM, and can store the input image data D1, the operation result data produced by the arithmetic blocks 11, and the like. In this case, the external memory 18 is connected via the interconnect unit 17 to the plurality of interfaces 15, in other words to the plurality of arithmetic processing units 16. The interconnect unit 17 has a function of distributing the data read from the external memory 18 to the individual arithmetic processing units 16. The interconnect unit 17 also has a function of consolidating into the external memory 18 the data written out by the individual arithmetic processing units 16.

The arithmetic processing device 10 further includes a redundant data holding unit 19. Like the data holding unit 14, the redundant data holding unit 19 includes two buffers 19a and 19b, whose functions are the same as those of the buffers 14a and 14b described above. The redundant data holding unit 19 holds the data output by the lowermost relay unit 13 redundantly with the lowermost data holding unit 14. The lowermost arithmetic block 11 can then add the data held in the redundant data holding unit 19 to the operation result data that it generates itself.

Next, operation examples of the arithmetic processing device 10 will be described. As illustrated in FIG. 7, when the operation result data of the plurality of arithmetic blocks 11 is accumulated and processed, the uppermost relay unit 13 selects, as indicated by arrow A1, the data input from the arithmetic block 11 of its own arithmetic processing unit 16, that is, the accumulated data, and outputs it to the data holding unit 14 of its own arithmetic processing unit 16. Each of the other relay units 13, excluding the uppermost one, selects the data input from the relay unit 13 on its upper side, that is, the accumulated data, and outputs it to the data holding unit 14 of its own arithmetic processing unit 16.

As illustrated in FIG. 8, when the operation result data of the plurality of arithmetic blocks 11 is processed in parallel without being accumulated, each relay unit 13 selects, as indicated by arrow A2, the data input from the arithmetic block 11 of its own arithmetic processing unit 16 and outputs it to the data holding unit 14 of its own arithmetic processing unit 16.
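The two write-side selections described for FIG. 7 and FIG. 8 can be modeled with a short Python sketch. The function `route_results` and its `accumulate_mode` flag are hypothetical; in the device this choice is made by the multiplexer in each relay unit 13, and how the selection signal is generated is not described in this part of the text.

```python
def route_results(block_outputs, accumulate_mode):
    """Return the value each relay unit 13 writes to its data holding unit 14.

    block_outputs[0] belongs to the lowest unit and block_outputs[-1] to the
    uppermost unit. In accumulate mode the uppermost block output is taken to
    be the accumulated result.
    """
    n = len(block_outputs)
    written = [None] * n
    from_upper = None
    for k in reversed(range(n)):            # walk from the uppermost unit down
        if accumulate_mode and k < n - 1:
            selected = from_upper           # FIG. 7: take data from the upper relay unit 13
        else:
            selected = block_outputs[k]     # FIG. 8, or uppermost unit: take own block's data
        written[k] = selected
        from_upper = selected               # forward the selection to the lower relay unit
    return written


# Accumulated outputs of three blocks (lowest first): 1, 3, 6.
broadcast = route_results([1, 3, 6], accumulate_mode=True)    # every holder gets 6
parallel = route_results([1, 3, 6], accumulate_mode=False)    # each holder gets its own
```

Only neighbor-to-neighbor links between relay units are needed in this model, which reflects the wiring reduction the publication aims for compared with connecting every block to every memory.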

As illustrated in FIG. 9, when one piece of data is distributed to and processed by the plurality of arithmetic blocks 11, the lowermost relay unit 12 selects the data input from the data holding unit 14 of its own arithmetic processing unit 16 and outputs it to the arithmetic block 11 of the same arithmetic processing unit 16, as indicated by arrow A3. Each of the other relay units 12 selects the data input from the relay unit 12 below it and outputs it to the arithmetic block 11 of its own arithmetic processing unit 16.

As illustrated in FIG. 10, when the plurality of arithmetic blocks 11 process data in parallel without distributing it, each relay unit 12 selects the data input from the data holding unit 14 of its own arithmetic processing unit 16 and outputs it to the arithmetic block 11 of the same arithmetic processing unit 16, as indicated by arrow A4.
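The read-side selection of FIG. 9 and FIG. 10 mirrors the write side and can be sketched the same way. Again this is a hedged model rather than the actual hardware: the name `route_inputs`, the list representation (index 0 is the uppermost unit, the last index the lowermost), and the string payloads are assumptions for illustration.

```python
from typing import List, Optional


def route_inputs(held_data: List[str], distribute: bool) -> List[str]:
    """Model the selection made by each relay unit 12 (hypothetical helper).

    held_data[i] is the content of data holding unit 14 in unit i.
    Returns the data delivered to each arithmetic block 11, same indexing.
    """
    n = len(held_data)
    delivered: List[Optional[str]] = [None] * n
    # Walk from the lowermost unit to the uppermost one.
    for i in range(n - 1, -1, -1):
        if distribute and i < n - 1:
            # FIG. 9: select the data passed up from the relay unit below.
            delivered[i] = delivered[i + 1]
        else:
            # FIG. 10, or the lowermost relay unit in FIG. 9: select own holder.
            delivered[i] = held_data[i]
    return [item for item in delivered if item is not None]


held = ["upper", "middle", "lower"]
distributed_mode = route_inputs(held, distribute=True)
parallel_mode = route_inputs(held, distribute=False)
```

In this model the distribute mode hands the lowermost data holding unit's contents to every arithmetic block 11, and the parallel mode hands each block the contents of its own data holding unit 14.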

In the arithmetic processing device 10, one arithmetic block 11, two relay units 12 and 13, and one data holding unit 14 constitute one arithmetic processing unit 16. The relay unit 13 can output the operation result data generated by the arithmetic block 11 paired with it in the same arithmetic processing unit 16 to the data holding unit 14 paired with it in that unit. The relay unit 13 can also output the operation result data accumulated by the plurality of arithmetic blocks 11 to the paired data holding unit 14 in the same arithmetic processing unit 16 and to the data holding units 14 below that data holding unit 14.

Further, in the arithmetic processing device 10, the relay unit 12 can output the data held in the data holding unit 14 paired with it in the same arithmetic processing unit 16 to the arithmetic block 11 paired with it in that unit. The relay unit 12 can also output the data held in the paired data holding unit 14 to the paired arithmetic block 11 in the same arithmetic processing unit 16 and to the arithmetic blocks 11 above that arithmetic block 11.

That is, in the arithmetic processing device 10, the selection performed by the relay units 13 allows the operation result data of one arithmetic block 11 to be written to a plurality of data holding units 14. Likewise, the selection performed by the relay units 12 allows the data held in one data holding unit 14 to be distributed to a plurality of arithmetic blocks 11. This configuration keeps the wiring between the plurality of arithmetic blocks 11 and the plurality of data holding units 14 from becoming complicated while still allowing data to be read and written efficiently between them, so that a reduction in processing speed and an increase in device size can be avoided.

Further, in the arithmetic processing device 10, the redundant data holding unit 19 holds the data output by the lowermost relay unit 13 redundantly with the lowermost data holding unit 14. That is, the redundant data holding unit 19 holds the same data as the lowermost data holding unit 14. The lowermost arithmetic block 11 can read the data held in the redundant data holding unit 19 and add it to its own operation result data.

With this configuration, for example, when the number of data items to be processed in a given processing layer exceeds the number of arithmetic blocks 11 and not all of the data can be processed in a single operation cycle, the device first processes as many data items as there are arithmetic blocks 11 and holds the result in the redundant data holding unit 19. Then, when processing the next batch of data items, again as many as there are arithmetic blocks 11, it can read the data held in the redundant data holding unit 19 and add it cumulatively. Therefore, even when more data items, that is, feature map data, must be processed than there are arithmetic blocks 11, the total result for all the data can be obtained by sequentially accumulating the results obtained for each batch.
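The batch-wise accumulation described above can be sketched as follows. This is a simplified model under stated assumptions: the function name `process_in_batches` is hypothetical, each data item is reduced to a single integer, and a plain variable stands in for the redundant data holding unit 19.

```python
from typing import List


def process_in_batches(values: List[int], num_blocks: int) -> int:
    """Accumulate more data items than there are arithmetic blocks 11.

    Each loop iteration models one operation cycle that handles at most
    num_blocks items; the running total is carried between cycles in a
    variable standing in for the redundant data holding unit 19.
    """
    if num_blocks <= 0:
        raise ValueError("num_blocks must be positive")

    redundant_buffer = 0  # assumed to start empty before the first cycle
    for start in range(0, len(values), num_blocks):
        batch = values[start:start + num_blocks]
        # Result accumulated across the blocks within this cycle.
        batch_result = sum(batch)
        # The lowermost block adds the carried-over data to its own result,
        # and the new total is held again for the next cycle.
        redundant_buffer = redundant_buffer + batch_result
    return redundant_buffer


total = process_in_batches([1, 2, 3, 4, 5, 6, 7], num_blocks=3)
```

With seven items and three blocks, the model runs three cycles (3 + 3 + 1 items) and arrives at the same total as a single pass over all the data would.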

The arithmetic processing device 10 also includes an external memory 18 that the plurality of data holding units 14 can access directly. With this configuration, when the size of the data to be stored exceeds the capacity of the buffers 14a and 14b, the data can be saved to the external memory 18, so that the device can handle operations on large data sets.

Furthermore, the arithmetic processing device 10 uses direct memory access (DMA) as the access method between the data holding units 14 and the external memory 18. Data can therefore be written and read between the data holding units 14 and the external memory 18 at high speed.

Here, the external memory 18 is preferably configured so that the address accessed by each data holding unit 14 is determined by the following formula [1] or formula [2].
A[n] = a[n] + m × d ... [1]
A[n] = a[0] + m × d ... [2]
n: number of the data holding unit
A[n]: address accessed by the n-th data holding unit
a[n]: offset address of the n-th data holding unit
m: number of the feature map generated by the arithmetic block
d: data size of the feature map generated by the arithmetic block
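The two addressing rules can be written out as a small Python sketch. The function names `address_partitioned` and `address_shared` and the dictionary of offsets are assumptions introduced for illustration; the numeric values are the ones used in the worked examples of this description (offsets 1000 and 2000, feature map number 1, data size 300 bytes).

```python
from typing import Dict


def address_partitioned(offsets: Dict[int, int], n: int, m: int, d: int) -> int:
    """Formula [1]: A[n] = a[n] + m * d, using a per-unit offset address a[n]."""
    return offsets[n] + m * d


def address_shared(base: int, m: int, d: int) -> int:
    """Formula [2]: A[n] = a[0] + m * d, the same base a[0] for every unit."""
    return base + m * d


# Offset addresses a[1] and a[2] of the first and second data holding units.
offsets = {1: 1000, 2: 2000}
feature_map_number = 1   # m
feature_map_size = 300   # d, in bytes

partitioned = [
    address_partitioned(offsets, n, feature_map_number, feature_map_size)
    for n in (1, 2)
]
shared = [
    address_shared(1000, feature_map_number, feature_map_size)
    for _ in (1, 2)
]
```

Under formula [1] the two data holding units resolve to different addresses (1300 and 2300), whereas under formula [2] both resolve to 1300, because the result no longer depends on n.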

When the address is determined by formula [1], as illustrated in FIG. 11, the address A[1] accessed by, for example, the first (that is, the uppermost) data holding unit 14 is obtained as follows, where the offset address a[1] of the first data holding unit 14 is 1000, the number of the feature map generated by the arithmetic block 11 is 1, and the data size of feature map 1 generated by the arithmetic block 11 is 300 bytes:
A[1] = 1000 + 1 × 300 = 1300

Similarly, the address A[2] accessed by, for example, the second data holding unit 14 (that is, the second from the top) is obtained as follows, where the offset address a[2] of the second data holding unit 14 is 2000, the number of the feature map generated by the arithmetic block 11 is 1, and the data size of feature map 1 generated by the arithmetic block 11 is 300 bytes:
A[2] = 2000 + 1 × 300 = 2300

That is, when the address is determined by formula [1], each of the data holding units 14 accesses the address assigned to it. In other words, the memory area of the external memory 18 is divided among the plurality of data holding units 14.

When the address is determined by formula [2], as illustrated in FIG. 12, the address A[1] accessed by, for example, the first (that is, the uppermost) data holding unit 14 is obtained as follows, where the preset value a[0] is 1000, the number of the feature map generated by the arithmetic block 11 is 1, and the data size of feature map 1 generated by the arithmetic block 11 is 300 bytes:
A[1] = 1000 + 1 × 300 = 1300

Likewise, the address A[2] accessed by, for example, the second data holding unit 14 (that is, the second from the top) is obtained with the same values, where a[0] is 1000, the number of the feature map generated by the arithmetic block 11 is 1, and the data size of feature map 1 generated by the arithmetic block 11 is 300 bytes:
A[2] = 1000 + 1 × 300 = 1300

That is, when the address is determined by formula [2], all of the data holding units 14 access the same address, 1300. In other words, the memory area of the external memory 18 is shared by the plurality of data holding units 14.

Note that the present invention is not limited to the embodiment described above and can be applied to various embodiments without departing from its gist.

In the drawings, 10 denotes the arithmetic processing device, 11 an arithmetic block, 12 a relay unit (data output unit), 13 a relay unit (data output unit), 14 a data holding unit, 18 the external memory, and 19 the redundant data holding unit.

Claims

An arithmetic processing device (10) that executes operations by a plurality of hierarchically connected processing layers, comprising:
a plurality of arithmetic blocks (11) that perform the operations;
a plurality of data holding units (14), each paired with a respective one of the plurality of arithmetic blocks; and
a plurality of data output units (13), each paired with a respective one of the plurality of arithmetic blocks,
wherein the plurality of arithmetic blocks, the plurality of data holding units, and the plurality of data output units are each arranged in a row from a lower side to an upper side,
the plurality of arithmetic blocks sequentially accumulate operation result data from the lower side to the upper side, and
each data output unit outputs the operation result data generated by its paired arithmetic block to its paired data holding unit, or outputs the accumulated operation result data to its paired data holding unit and to the data holding units on the lower side of that data holding unit.

The arithmetic processing device according to claim 1, further comprising a redundant data holding unit (19) that holds the data output by the lowermost data output unit,
wherein the lowermost arithmetic block adds the data held in the redundant data holding unit to its operation result data.

An arithmetic processing device (10) that executes operations by a plurality of hierarchically connected processing layers, comprising:
a plurality of arithmetic blocks (11) that perform the operations;
a plurality of data holding units (14), each paired with a respective one of the plurality of arithmetic blocks; and
a plurality of data output units (12), each paired with a respective one of the plurality of arithmetic blocks,
wherein the plurality of arithmetic blocks, the plurality of data holding units, and the plurality of data output units are each arranged in a row from a lower side to an upper side, and
each data output unit outputs the data held in its paired data holding unit to its paired arithmetic block, or outputs the data held in its paired data holding unit to its paired arithmetic block and to the arithmetic blocks on the upper side of that arithmetic block.

The arithmetic processing device according to any one of claims 1 to 3, further comprising an external memory (18) that is directly accessible by the plurality of data holding units.

The arithmetic processing device according to claim 4, wherein
the arithmetic blocks generate a plurality of feature maps respectively corresponding to a plurality of features contained in input data, and
the external memory determines the address accessed by each data holding unit by the following formula [1] or formula [2].
A[n] = a[n] + m × d ... [1]
A[n] = a[0] + m × d ... [2]
n: number of the data holding unit
A[n]: address accessed by the n-th data holding unit
a[n]: offset address of the n-th data holding unit
m: number of the feature map generated by the arithmetic block
d: data size of the feature map generated by the arithmetic block