JP7078129B2

JP7078129B2 - Arithmetic processing device and control method of arithmetic processing device

Info

Publication number: JP7078129B2
Application number: JP2020552440A
Authority: JP
Inventors: 洋征和田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-10-24
Filing date: 2018-10-24
Publication date: 2022-05-31
Anticipated expiration: 2038-10-24
Also published as: JPWO2020084721A1; WO2020084721A1

Description

The present invention relates to an arithmetic processing device and a control method for the arithmetic processing device.

In recent years, a technique called SIMD (Single Instruction Multiple Data), which operates on a plurality of data items in parallel based on a single instruction code, has been proposed in order to process data used in image processing and the like efficiently with a processor such as a CPU (Central Processing Unit). Hereinafter, an operation using SIMD is referred to as a SIMD operation.

For example, SIMD operations are used in the convolution operations of deep learning. Specifically, an arithmetic processing device that executes deep learning performs SIMD operations such as carrying out convolution operations in parallel with a plurality of product-sum calculators in response to a single instruction.

On the other hand, a SIMD operation may need to refer to a table according to the value of each element of an array. For example, a function value may have to be obtained individually for each element of the array. In this case, one method is to approximate the target function by a polygonal line that connects straight lines whose slopes differ from interval to interval, and to calculate an approximate function value for each element using the straight line of the interval to which that element belongs. Approximating the target function by such a polygonal line is sometimes called linear interpolation.

In such a polygonal-line approximation, the value of an element determines which segment of the graph represented by the polygonal line the element belongs to, so the segment differs from element to element. The approximate function value of each element is therefore calculated using a linear expression whose slope and intercept differ from element to element. One conceivable method is to prepare a lookup table (LUT) in which the slope and intercept of each segment are arranged as pairs, to refer to the table using as input the element value truncated to its upper digits, and to calculate the linear expression based on the values obtained from the table to obtain the approximate value.
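The lookup-table method described above can be sketched in software as follows. This is only an illustrative model, not the circuit of the present disclosure: the function names `build_lut` and `approx`, the equal-width segments, and the use of floating-point truncation to pick the segment are all assumptions made for the example.

```python
import math

def build_lut(f, lo, hi, n_segments):
    """Return one (slope, intercept) pair per equal-width segment of [lo, hi)."""
    width = (hi - lo) / n_segments
    lut = []
    for i in range(n_segments):
        x0 = lo + i * width
        x1 = x0 + width
        slope = (f(x1) - f(x0)) / width      # slope of the chord over this segment
        intercept = f(x0) - slope * x0       # so that slope * x0 + intercept == f(x0)
        lut.append((slope, intercept))
    return lut

def approx(lut, lo, hi, x):
    """Evaluate the polygonal-line approximation at x, assuming lo <= x < hi."""
    width = (hi - lo) / len(lut)
    # Truncating the scaled input selects the segment, like keeping the upper digits.
    index = min(int((x - lo) / width), len(lut) - 1)
    slope, intercept = lut[index]
    return slope * x + intercept             # one multiply and one add per element
```

For example, `approx(build_lut(math.exp, 0.0, 4.0, 16), 0.0, 4.0, 1.3)` evaluates a 16-segment approximation of the exponential function with a single table reference followed by one multiply-add.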

For example, some recent instruction sets provide an instruction that refers to a lookup table for each element in a SIMD operation. In addition, a GPU (Graphics Processing Unit) may be equipped with an interpolator dedicated to interpolation calculations.

There is also a prior art technique in which a data element stored in one or more table registers, updated from an index value stored in an index register, is selected, and the position in a result register corresponding to the selected data element is determined. There is a further prior art technique in which a register file is divided into four banks and a single operand designates a plurality of registers so that four registers are accessed at the same time.

Japanese Unexamined Patent Publication No. 2005-174294
Japanese Unexamined Patent Publication No. 2002-149400

J. A. Pineiro, S. F. Oberman, J. M. Muller, and J. D. Bruguera, “High-speed function approximation using a minimax quadratic interpolator,” IEEE Trans. Computers, vol. 54, no. 3, pp. 304-318, 2005.

However, the conventional techniques may have significant disadvantages. For example, when an instruction provided in the instruction set is used, the content at the address of the corresponding entry of the target table is conventionally read from memory only after the value of the element has been determined. In this case, the overhead of calculating the address for each element, reading from memory, and delivering the data to the position corresponding to the target SIMD element is large, and a large delay may occur when lookup-table references are to be processed at high speed in SIMD fashion.

In addition, an interpolator mounted on a GPU is generally an arithmetic unit to which a large dedicated circuit has been added in order to store a table of parameters. However, there is a strong demand to increase the mounting density of arithmetic units on an LSI as much as possible, and it is difficult to mount a dedicated interpolator that requires many additional circuits.

Furthermore, even with the prior art technique that selects the result register via updated table registers, it is difficult to keep the circuit scale small because an area for the table registers must be secured. Even with the prior art technique that accesses a plurality of registers at the same time with a single operand, the register size grows with the size of the table stored in each register, so arranging a plurality of registers may increase the circuit scale.

The disclosed technique has been made in view of the above, and an object thereof is to provide an arithmetic processing device and a control method for the arithmetic processing device that improve processing efficiency while keeping the circuit scale small.

In one aspect of the arithmetic processing device and the control method for the arithmetic processing device disclosed in the present application, a storage unit has a plurality of element storage areas and stores a plurality of calculated values, one in each of the element storage areas. An arithmetic unit is a product-sum arithmetic unit having a multiplier and an adder. Using the multiplier and the adder, the arithmetic unit generates a first selection value based on input element data, acquires a first calculated value from among the calculated values stored in the element storage areas based on the first selection value, and acquires a calculation result based on the first calculated value.

In one aspect, the present invention can improve processing efficiency while keeping the circuit scale small.

FIG. 1 is an overall configuration diagram of an information processing device.
FIG. 2 is a detailed circuit diagram of the product-sum calculation unit.
FIG. 3 is a diagram showing the polygonal-line approximation used in the first embodiment.
FIG. 4 is a diagram of an example of a lookup table used to obtain the function value of the linear approximation function.
FIG. 5 is a diagram showing the values stored in the element registers according to the first embodiment.
FIG. 6 is a diagram for explaining the transition of the state of the element registers until the value of the intercept corresponding to given element data is acquired.
FIG. 7 is a diagram for explaining the transition of the state of the element registers from the acquisition of the intercept value to the acquisition of the calculation result.
FIG. 8 is a diagram of an example of a program that causes the product-sum calculation unit to perform the operations shown in FIGS. 5 to 7.
FIG. 9 is a flowchart of the processing of a SIMD operation instruction.
FIG. 10 is a flowchart of the process of executing a table search operation for each element data in a SIMD operation.
FIG. 11 is a diagram showing an example of the table values stored in the vector register according to the second embodiment.
FIG. 12 is a diagram of an example of a program that causes the product-sum calculation unit to perform an operation for obtaining a power of 2.

Hereinafter, embodiments of the arithmetic processing device and the control method for the arithmetic processing device disclosed in the present application will be described in detail with reference to the drawings. The following embodiments do not limit the arithmetic processing device and the control method for the arithmetic processing device disclosed in the present application.

FIG. 1 is an overall configuration diagram of an information processing device. The information processing device 50 includes a PCI (Peripheral Component Interconnect) card 1 and a host computer 2. The PCI card 1 and the host computer 2 are connected by a PCI bus and exchange data with each other.

The host computer 2 performs, for example, overall management when deep learning is executed. When executing deep learning, the host computer 2 instructs the PCI card 1 to execute predetermined deep-learning operations such as convolution operations. The host computer 2 also instructs the PCI card 1 to execute SIMD operations, such as acquiring a function value by referring to a table for each element data included in the array used for the SIMD operation.

The PCI card 1 executes operations in response to instructions from the host computer 2 and outputs the calculation results to the host computer 2. As shown in FIG. 1, the PCI card 1 has a plurality of processing units 10, an overall instruction control unit 11, a memory controller 12, a memory 13, and a PCI control unit 14. The PCI card 1 corresponds to an example of an "arithmetic processing device".

The PCI control unit 14 receives from the host computer 2 operation instructions that direct the execution of operations and the operation data used in those operations. The PCI control unit 14 then outputs the acquired operation instructions and operation data to the memory controller 12.

The PCI control unit 14 also receives from the memory controller 12 the calculation result of the instructed operation, and outputs the calculation result to the host computer 2.

The memory controller 12 receives the operation instructions and the operation data used in the operations from the PCI control unit 14. The memory controller 12 then stores the acquired operation instructions and operation data in the memory 13.

The memory controller 12 also receives from the overall instruction control unit 11 an instruction to store, in the vector register 111, the operation data used when executing an operation. The memory controller 12 then stores the designated operation data in the vector register 111 of the designated product-sum calculation unit 100. When the memory controller 12 transmits data to a processing unit 10 in a later stage among the processing units 10 arranged in series, it bypasses the product-sum calculation unit 100 and outputs the operation data to the multiplexer 103.

When the memory controller 12 receives an instruction to store a calculation result from the overall instruction control unit 11, it acquires the calculation result from the vector register 111 of the designated product-sum calculation unit 100 and stores it in the memory 13. Further, when the memory controller 12 receives an instruction from the host computer 2 via the PCI control unit 14, it reads the calculation result stored in the memory 13 and outputs it to the PCI control unit 14.

The overall instruction control unit 11 supervises the entire operation whose execution is instructed by the host computer 2. The overall instruction control unit 11 receives instructions from the host computer 2 via the PCI control unit 14, and reads and executes, one after another, the overall instruction sequence stored in the memory 13. The overall instructions include an instruction to transfer an operation instruction sequence from the memory 13 to the operation instruction buffer 102, an instruction to store operation data from the memory 13 in the vector register 111, an instruction to make the operation instruction control unit 101 start executing the operation instruction sequence stored in the operation instruction buffer 102, an instruction to store the calculation result held in the vector register 111 in the memory 13, and an instruction to end the execution of the instruction sequence.
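The kinds of overall instruction listed above can be modelled roughly as follows. This is a hypothetical sketch only: the names `OverallOp` and `run`, and the idea of representing the sequence as a Python list, are assumptions for illustration and do not reflect any actual instruction encoding in the disclosure.

```python
from enum import Enum, auto

class OverallOp(Enum):
    """Hypothetical labels for the overall instructions named in the text."""
    LOAD_INSTRUCTIONS = auto()  # memory 13 -> operation instruction buffer 102
    LOAD_DATA = auto()          # memory 13 -> vector register 111
    START = auto()              # start execution by operation instruction control unit 101
    STORE_RESULT = auto()       # vector register 111 -> memory 13
    END = auto()                # end execution of the instruction sequence

def run(sequence):
    """Read overall instructions one after another until END is reached."""
    executed = []
    for op in sequence:
        executed.append(op)
        if op is OverallOp.END:
            break               # anything after END is never executed
    return executed
```

A typical sequence under this model would load the instruction sequence, load the data, start execution, store the result, and end.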

The overall instruction control unit 11 causes the processing units 10 to execute operation instruction sequences. When making a processing unit 10 execute an operation, the overall instruction control unit 11 instructs the memory controller 12 to acquire the operation data used in that operation. When the operation in the processing unit 10 is completed, the overall instruction control unit 11 instructs the memory controller 12 to store the calculation result. Further, when all processing of the operation whose execution was instructed is completed, the overall instruction control unit 11 notifies the memory controller 12 of the completion.

Next, the processing unit 10 will be described. As shown in FIG. 1, a plurality of processing units 10 are mounted on one PCI card 1. The processing units 10 are connected to one another in parallel and in series. In one aspect, there are 128 processing units 10. Each processing unit 10 includes a product-sum calculation unit 100, an operation instruction control unit 101, an operation instruction buffer 102, and a multiplexer 103.

The operation instruction control unit 101 manages and controls the execution of operation instructions. The operation instruction control unit 101 receives instructions to execute individual operations from the overall instruction control unit 11. The instructions that the processing unit 10 can execute are called operation instructions in contrast to the overall instructions. They include operation instructions in the narrow sense, which cause the product-sum calculation unit to perform operations, as well as instructions that manipulate general-purpose registers (not shown), branch instructions, repeat instructions, and an instruction to stop the execution of the instruction sequence.

The operation instruction control unit 101 acquires the operation instructions stored in the operation instruction buffer 102 in execution order. Next, the operation instruction control unit 101 instructs the vector register 111 to output the operation data specified by the acquired operation instruction. In accordance with the acquired operation instruction, the operation instruction control unit 101 also outputs an instruction to execute the operation to the product-sum calculator 112. After that, the operation instruction control unit 101 causes operations that use the calculation result to loop within the product-sum calculator 112. The operation instruction control unit 101 also issues, for example, an instruction to execute a SIMD operation that acquires a function value by referring to a table for each element data.

The operation instruction buffer 102 is a storage area that stores operation instruction sequences. The operation instruction buffer 102 stores the operation instruction sequence input from the memory controller 12 in input order, starting from the designated address. Thereafter, in response to a request from the operation instruction control unit 101 to acquire an operation instruction, the operation instruction buffer 102 outputs the operation instruction at the requested address to the operation instruction control unit 101.

The product-sum calculation unit 100 has a vector register 111 and a product-sum calculator 112. Note that the vector register 111 of the product-sum calculation unit 100 is only a part of the entire vector register mounted on the processing unit 10.

The vector register 111 receives from the memory controller 12 the operation data used when executing an operation, and stores the input operation data. Thereafter, in response to an instruction from the operation instruction control unit 101, the vector register 111 outputs the operation data used in the operation to the product-sum calculator 112. After the loop processing of the operation by the product-sum calculator 112 is completed, the vector register 111 receives the calculation result of the product-sum calculator 112. When the vector register 111 then receives from the memory controller 12 an instruction to output to the memory 13, it outputs the calculation result stored in the designated area to the multiplexer 103.

In the case of a SIMD operation that acquires a function value by referring to a table for each element data, the vector register 111 holds the table values corresponding to the intervals to which the element data belong.

The product-sum calculator 112 receives an instruction to execute an operation from the operation instruction control unit 101. The product-sum calculator 112 then executes a product-sum operation using the operation data input from the vector register 111, and outputs the calculation result to the vector register 111. When accumulation is specified by the instruction, the product-sum calculator 112 holds the cumulative result in a register (accumulator) inside the calculator and uses it in subsequent cumulative operation instructions.

In the case of a product-sum accumulation operation, the product-sum calculator 112 repeats the product-sum operation on the values input from the vector register 111 until all the operations are completed. When the loop processing of the product-sum accumulation operation ends, the product-sum calculator 112 outputs the calculation result to the vector register 111, where it is stored.
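The accumulation loop described above can be illustrated with a minimal software model. The function name `multiply_accumulate` and the use of Python lists in place of operands read from the vector register are assumptions for illustration; the sketch only shows how an accumulator carries the running sum between multiply-add steps.

```python
def multiply_accumulate(a_values, b_values, acc=0.0):
    """Repeat acc = a * b + acc over all operand pairs and return the final sum."""
    for a, b in zip(a_values, b_values):
        acc = a * b + acc   # one multiplier pass and one adder pass per step
    return acc              # written back only after the loop completes
```

For instance, `multiply_accumulate([1, 2, 3], [4, 5, 6])` accumulates 1*4, 2*5, and 3*6 in turn, which is the inner product used in a convolution.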

In the case of a SIMD operation that acquires a function value by referring to a table for each element data, the product-sum calculator 112 sequentially acquires from the vector register 111 the coefficients of the function corresponding to the input element data, switching the reference destination in the vector register 111 as it does so. The product-sum calculator 112 then calculates the function value for each element data using the acquired coefficients, and outputs the calculation result to the vector register 111, where it is stored.

Here, with reference to FIG. 2, the processing by the product-sum calculation unit 100 of a SIMD operation that acquires a function value by referring to a table for each element data will be described in detail. FIG. 2 is a detailed circuit diagram of the product-sum calculation unit. For ease of understanding, FIG. 2 shows only the signal input paths extending from the operation instruction control unit 101 to the vector register 111A, the product-sum calculator 112A, and the multiplexer 122A. In practice, however, the input paths from the operation instruction control unit 101 also extend to the other vector registers 111B to 111C, product-sum calculators 112B to 112C, and multiplexers 122B to 122C.

As shown in FIG. 2, the product-sum calculation unit 100 has a plurality of the product-sum calculators 112 shown in FIG. 1. Here, the individual product-sum calculators 112 are denoted as product-sum calculators 112A to 112C. The product-sum calculation unit 100 also has the vector register 111 divided into units called banks. Here, the vector registers 111 divided into bank units are denoted as vector registers 111A to 111C. The vector registers 111A to 111C correspond one-to-one to the product-sum calculators 112A to 112C, respectively. Further, in the product-sum calculation unit 100, indirect address registers 121A to 121C and multiplexers 122A to 122C are arranged in correspondence with the vector registers 111A to 111C. In the following description, when these are not distinguished from one another, they are referred to as the element register 113, the product-sum calculator 112, the indirect address register 121, and the multiplexer 122.

The vector registers 111A to 111C are, for example, RAM (Random Access Memory). In this embodiment, a total of eight vector registers 111A to 111C are arranged. The description here assumes that the vector register 111A corresponds to bank #0, the vector register 111B corresponds to bank #1, and the vector register 111C corresponds to bank #7.

Further, the vector register 111A has a plurality of element registers 113A. Likewise, the vector register 111B has a plurality of element registers 113B, and the vector register 111C has a plurality of element registers 113C. The vector register 111A is described here as an example.

Each element register 113A corresponds to a unit called a line, to which a register number is assigned. Here, the numerical value representing the register number is referred to as the "address" of the vector register 111. That is, an address is assigned to each element register 113A. The description here assumes that there are element registers 113A corresponding to lines ##0 to ##511.

The vector register 111A has read ports and a write port so that, in each cycle, it can supply the product-sum calculator 112A with the plurality of operands used in the operation and write the calculation result back to one of the element registers 113A. In this embodiment, the vector register 111A has three read ports toward the product-sum calculator 112A.
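The bank and line organization described so far can be pictured with a small software model. The class name `VectorRegisterBank`, its methods, and the choice to reject reads of more than three operands are assumptions made for illustration; the constants follow the numbers given in this embodiment (eight banks, lines ##0 to ##511, three read ports).

```python
NUM_BANKS = 8      # banks #0 to #7 in this embodiment
NUM_LINES = 512    # lines ##0 to ##511, one element register per line
READ_PORTS = 3     # read ports toward the product-sum calculator

class VectorRegisterBank:
    """Hypothetical model of one bank: a list of element registers indexed by address."""

    def __init__(self):
        self.lines = [0.0] * NUM_LINES

    def read(self, *addresses):
        """Supply up to READ_PORTS operands in one modelled cycle."""
        if len(addresses) > READ_PORTS:
            raise ValueError("more operands requested than read ports")
        return tuple(self.lines[a] for a in addresses)

    def write(self, address, value):
        """Write a calculation result back to one element register."""
        self.lines[address] = value

# One bank per product-sum calculator, matching the one-to-one correspondence.
banks = [VectorRegisterBank() for _ in range(NUM_BANKS)]
```

Because each bank is paired with its own calculator, the same address can be read in every bank during one cycle, which is what lets a single instruction operate on all SIMD elements.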

In this embodiment, each read port of the vector register 111A is directly connected to the product-sum calculator 112A, but the connection is not limited to this and may pass through a router. When a router is used, one read port may be connected to the product-sum calculators 112A to 112C via the router while the other read ports are directly connected to the product-sum calculator 112A. Alternatively, a plurality of read ports may be connected to the product-sum calculators 112A to 112C via the router. Data output from a read port that passes through the router can be supplied as an operand to any of the product-sum calculators 112A to 112C.

In this embodiment, the acquisition of function values under the polygonal-line approximation shown in FIG. 3 is described as an example. FIG. 3 is a diagram showing the polygonal-line approximation used in the first embodiment.

In this embodiment, the product-sum operation unit 100 receives an instruction to acquire function values of the function shown in graph 202, which is obtained by linearly approximating y = f(x) shown in graph 201. As graph 202 shows, the function after linear approximation is the linear function f_1(x) = -0.5(x - 1) + 1 in interval D1, that is, a linear function with a slope of -0.5 and an intercept of 1.5. In interval D2 it is f_2(x) = 0(x - 2) + 0.5, that is, a linear function with a slope of 0 and an intercept of 0.5. In interval D3 it is f_3(x) = 0.5(x - 3) + 0.5, that is, a linear function with a slope of 0.5 and an intercept of -1.0. In interval D4 it is f_4(x) = 2(x - 4) + 1, that is, a linear function with a slope of 2 and an intercept of -7.0. In interval D5 it is f_5(x) = 0.5(x - 5) + 3, that is, a linear function with a slope of 0.5 and an intercept of 0.5. Each of these linear functions is an example of the "predetermined number of functions of the same type". Here, functions of the same type are functions that share the same form and differ only in their coefficients.
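The slope and intercept pairs listed above can be checked against the point-slope forms. The following is a minimal sketch, assuming each interval D_k covers k <= x < k + 1 (the text does not state the interval bounds; they are inferred here from the anchor points x - 1, x - 2, and so on). The names `POINT_SLOPE`, `to_slope_intercept`, and `f_approx` are illustrative only.

```python
# Each entry: (anchor x0, slope a, value c at the anchor), i.e. f_k(x) = a * (x - x0) + c.
# The numbers come from the text; the interval bounds [x0, x0 + 1) are an assumption.
POINT_SLOPE = [
    (1.0, -0.5, 1.0),  # D1
    (2.0, 0.0, 0.5),   # D2
    (3.0, 0.5, 0.5),   # D3
    (4.0, 2.0, 1.0),   # D4
    (5.0, 0.5, 3.0),   # D5
]


def to_slope_intercept(x0, a, c):
    """Rewrite a * (x - x0) + c as a * x + b and return (a, b)."""
    return a, c - a * x0


SLOPE_INTERCEPT = [to_slope_intercept(*seg) for seg in POINT_SLOPE]


def f_approx(x):
    """Evaluate the polygonal-line approximation for 1 <= x < 6."""
    k = int(x) - 1  # segment index under the assumed unit-width intervals
    if not 0 <= k < len(SLOPE_INTERCEPT):
        raise ValueError("x outside D1..D5")
    a, b = SLOPE_INTERCEPT[k]
    return a * x + b
```

Under these assumptions, adjacent segments meet at the interval boundaries, which is what makes the approximation a continuous polygonal line.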

When an interval value is given as input, the table used to obtain function values of the linear approximation function represented by graph 202 is lookup table 210 shown in FIG. 4. FIG. 4 is a diagram of an example lookup table used to obtain function values of the linear approximation function. In this case, element registers 113A to 113C store the slope values and intercept values listed in lookup table 210. Each slope value and intercept value stored in vector register 111 is an example of an "operation value".

More specifically, the values shown in FIG. 5 are stored in element registers 113A to 113C. FIG. 5 is a diagram showing the values stored in the element registers according to the first embodiment. In FIG. 5, vector registers 111A to 111C are denoted by their bank names, banks #1 to #7. The description here assumes that the linear approximation function of graph 202 is divided into 16 sections in total, that is, the linear approximation function is represented by 16 linear functions.

When function values of the linear approximation function are acquired, as shown in FIG. 5, element registers 113A to 113C at addresses 0x000 to 0x00F of vector registers 111A to 111C store the intercept values of the linear functions for intervals D1 to D5 of the linear approximation function. Here, element registers 113A to 113C at addresses 0x000 to 0x00F are referred to as intercept address section 301. Likewise, element registers 113A to 113C at addresses 0x010 to 0x01F of vector registers 111A to 111C store the slope values of the linear functions for intervals D1 to D5. Here, element registers 113A to 113C at addresses 0x010 to 0x01F are referred to as slope address section 302.

In this way, intercept address section 301 and slope address section 302 store the coefficients of the same linear function at the same position counted from the start of each section. This state corresponds to "holding, contiguously for each position, a predetermined number of coefficients corresponding to the terms of each function". In this embodiment, the predetermined number of coefficients is two: the intercept value and the slope value. The terms of the function are the term representing the slope and the term representing the intercept of the linear function.

In addition, memory controller 12 stores the value of each element data item in the array used for the SIMD operation in element registers 113A to 113C at address m of vector registers 111A to 111C. More specifically, the element data is stored as a 16-bit fixed-point value in the lower bits of each element register 113A to 113C at address m. The description here assumes that the lower 12 bits of the 16-bit fixed-point value are the fractional part. This fixed-point format is sometimes written as "Q12". As the operation proceeds, vector registers 111A to 111C also come to hold, for each element data item, the intercept address, the intercept value, the slope address, and the function value that is the operation result. The description assumes that multiplication results and accumulated values are 32-bit fixed-point values whose lower 24 bits are the fractional part. This fixed-point format is sometimes written as "Q24".
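The relationship between the two formats follows from ordinary fixed-point arithmetic: the raw product of two values with 12 fractional bits has 24 fractional bits. The following is a minimal sketch of that relationship; the helper names `to_q12`, `mul_q12`, and `from_q24` are illustrative and do not appear in the source.

```python
Q12_FRAC_BITS = 12
Q24_FRAC_BITS = 24


def to_q12(value):
    """Encode a real number as a Q12 integer (rounded to the nearest step)."""
    return round(value * (1 << Q12_FRAC_BITS))


def mul_q12(a_q12, b_q12):
    """Multiply two Q12 integers; the raw product is a Q24 integer."""
    return a_q12 * b_q12


def from_q24(raw):
    """Decode a Q24 integer back to a real number."""
    return raw / (1 << Q24_FRAC_BITS)
```

For example, `from_q24(mul_q12(to_q12(0.5), to_q12(3.0)))` gives 1.5, and multiplying any Q12 value by `to_q12(1.0)` leaves its real value unchanged while re-expressing it in Q24.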

Vector registers 111A to 111C also receive an address from the corresponding indirect address registers 121A to 121C. Vector registers 111A to 111C then output the value stored in the element register 113A to 113C at the received address to product-sum arithmetic units 112A to 112C.

When the operation is finished, vector registers 111A to 111C output the operation result to memory controller 12 in response to an instruction from memory controller 12. Vector registers 111A to 111C are an example of the "storage unit", and element registers 113A to 113C are an example of the "element storage area".

Indirect address registers 121A to 121C are, for example, flip-flops (FFs). When a SIMD calculation using the linear approximation function is executed, indirect address registers 121A to 121C hold no data before the calculation starts. During the operation that obtains the function value, indirect address registers 121A to 121C store indirect addresses, which are generated partway through the operation and indicate the addresses in vector registers 111A to 111C where an intercept or a slope is stored. Indirect address registers 121A to 121C then output the values they hold to the corresponding vector registers 111A to 111C via multiplexers 122A to 122C. Indirect address registers 121A to 121C are an example of the "indirect storage unit".

Multiplexers 122A to 122C each receive two inputs: one from indirect address registers 121A to 121C and one from memory controller 12. On instruction from arithmetic instruction control unit 101, multiplexers 122A to 122C select one of the two inputs and output the selected input to vector registers 111A to 111C.
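The address selection for one bank can be pictured as a two-way choice. The following is a minimal sketch, assuming the select signal is a single Boolean and the element registers are modeled as a Python list; the function and parameter names are hypothetical.

```python
def select_read_address(use_indirect, indirect_address, controller_address):
    """Model of one multiplexer: pick which address reaches the vector register."""
    return indirect_address if use_indirect else controller_address


def read_element(element_registers, use_indirect, indirect_address, controller_address):
    """Read the element register chosen by the multiplexer output."""
    address = select_read_address(use_indirect, indirect_address, controller_address)
    return element_registers[address]
```

Because each bank has its own indirect address register, the same select signal can make different banks read different element registers in the same cycle.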

Next, the functions of product-sum arithmetic units 112A to 112C are described with reference to FIGS. 6 and 7. FIG. 6 is a diagram for explaining the transition of the element register states up to the point where the intercept value corresponding to given element data is acquired. FIG. 7 is a diagram for explaining the transition of the element register states from the acquisition of the intercept value to the acquisition of the operation result. Product-sum arithmetic units 112A to 112C are an example of the "operation unit".

As shown in FIG. 6, product-sum arithmetic unit 112A shifts the element data value stored at address m of vector register 111A right by 12 bits to obtain its upper 4 bits. Product-sum arithmetic unit 112A then stores the obtained 4-bit data in element register 113A at address n (step S1). This 4-bit data is the address of the element register 113A that holds the intercept value corresponding to the element data. Hereinafter, the 4-bit data stored in element register 113A at address n is called the intercept address. The intercept address is an example of the "first selection value" and the "first number".

Next, product-sum arithmetic unit 112A acquires the intercept address stored in element register 113A at address n and sets it in indirect address register 121A (step S2). The value 1.0, input from memory controller 12, is then stored in element register 113A at address p. After that, the intercept address set in indirect address register 121A is input to vector register 111A (step S3), so that product-sum arithmetic unit 112A acquires the intercept value output from the element register 113A at the intercept address. Product-sum arithmetic unit 112A also acquires 1.0 from element register 113A at address p. Product-sum arithmetic unit 112A then stores the intercept value multiplied by 1.0 in element register 113A at address q (step S4). The intercept value stored by product-sum arithmetic unit 112A is an example of the "first operation value" and the "first coefficient".

Next, as shown in FIG. 7, product-sum arithmetic unit 112A acquires the value stored in element register 113A at address n. Product-sum arithmetic unit 112A adds 16, the number of linear functions in the linear approximation function, to the acquired value and stores the result back in element register 113A at address n (step S5). The data now stored in element register 113A at address n is the address of the element register 113A that holds the slope value corresponding to the element data. Hereinafter, this value is called the slope address. The intercept values and the slope values are stored in vector register 111 in groups of 16. Adding 16 therefore yields a slope address that points to the position in slope address section 302 matching the position in intercept address section 301 designated by the intercept address. The slope address is an example of the "second selection value" and the "second number".

Next, product-sum arithmetic unit 112A acquires the slope address stored in element register 113A at address n and sets it in indirect address register 121A (step S6).

After that, the slope address set in indirect address register 121A is input to vector register 111A (step S7), so that product-sum arithmetic unit 112A acquires the slope value output from the element register 113A at the slope address. This slope value is an example of the "second operation value" and the "second coefficient".

Product-sum arithmetic unit 112A also acquires the element data from element register 113A at address m and the intercept value from element register 113A at address q. Product-sum arithmetic unit 112A multiplies the slope value by the element data and adds the intercept value to the product to obtain the function value of the linear approximation function for the element data. Product-sum arithmetic unit 112A then stores the obtained function value in element register 113A at address q (step S8).
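The sequence described above can be traced for a single bank with plain integers. The following is a minimal sketch under several assumptions that the source does not state: the element data is treated as an unsigned 16-bit Q12 value, table entry i is taken to hold the coefficients of interval D_i for i = 1 to 5, the remaining entries are zero placeholders, and coefficients are stored in Q12. All Python names are illustrative.

```python
ONE_Q12 = 1 << 12    # 1.0 in Q12
NUM_SEGMENTS = 16    # one table entry per value of the upper 4 bits
SLOPE_BASE = 0x010   # slopes start 16 entries after the intercepts


def q12(value):
    """Encode a real number in Q12."""
    return round(value * ONE_Q12)


# (slope, intercept) per segment. Entries 1..5 reuse the pairs given for D1..D5;
# every other entry is a zero placeholder because the source does not list it.
COEFFS = [(0.0, 0.0)] * NUM_SEGMENTS
COEFFS[1:6] = [(-0.5, 1.5), (0.0, 0.5), (0.5, -1.0), (2.0, -7.0), (0.5, 0.5)]


def build_registers(coeffs):
    """Lay out intercepts at 0x000..0x00F and slopes at 0x010..0x01F."""
    regs = {}
    for i, (slope, intercept) in enumerate(coeffs):
        regs[i] = q12(intercept)
        regs[SLOPE_BASE + i] = q12(slope)
    return regs


def lookup_and_evaluate(regs, x_q12):
    """Trace the register updates for one bank; returns the Q24 function value."""
    m, n, p, q = "m", "n", "p", "q"       # symbolic work-register addresses
    regs[m] = x_q12                       # element data placed by the memory controller
    regs[n] = (regs[m] >> 12) & 0xF       # S1: upper 4 bits become the intercept address
    ind_reg = regs[n]                     # S2: set the indirect address register
    regs[p] = ONE_Q12                     # constant 1.0
    regs[q] = regs[ind_reg] * regs[p]     # S3, S4: intercept * 1.0, now in Q24
    regs[n] = regs[n] + NUM_SEGMENTS      # S5: slope address
    ind_reg = regs[n]                     # S6: set the indirect address register again
    slope = regs[ind_reg]                 # S7: indirect read of the slope
    regs[q] = slope * regs[m] + regs[q]   # final multiply-accumulate, Q24 result
    return regs[q]
```

With these placeholder tables, `lookup_and_evaluate(build_registers(COEFFS), q12(4.5)) / (1 << 24)` evaluates to 2.0, which matches f_4(4.5) = 2(4.5 - 4) + 1. One plausible reading of the multiplication by 1.0 is that it brings the Q12 intercept into the same Q24 scale as the slope-times-data product, although the source does not say so explicitly.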

FIG. 8 is a diagram of an example program that causes the product-sum operation unit to perform the operations shown in FIGS. 5 to 7. The processing that the program causes product-sum operation unit 100 to execute is described below using the line numbers at the left edge of the figure. In the program of FIG. 8, element registers 113A at addresses m, n, p, and q are written as FR#m, #n, #p, and #q, respectively, and indirect address register 121A is written as IND_REG.

Line 1 represents acquiring the fixed-point value from element register 113A at address m, shifting it right by 12 bits to obtain its upper 4 bits, and setting the obtained 4-bit data in the lower 4 bits of element register 113A at address n. This 4-bit data is the intercept address.

Line 2 represents storing the intercept address set in element register 113A at address n in indirect address register 121A.

Line 3 represents storing 1.0 in element register 113A at address p.

Line 4 represents multiplying the value read from the element register 113A addressed by indirect address register 121A by the value stored in element register 113A at address p, and storing the product in element register 113A at address q. In other words, line 4 stores the intercept value selected through indirect address register 121A in element register 113A at address q. This completes the processing shown in FIG. 6.

Line 5 represents adding 0x010 to the value stored in element register 113A at address n and storing the sum, which is the slope address, back in element register 113A at address n. The value 0x010 corresponds to 16 addresses, so line 5 shifts the address by 16.

Line 6 represents acquiring the slope address stored in element register 113A at address n and storing it in indirect address register 121A.

Line 7 represents the following two operations. The first acquires the slope address stored in indirect address register 121A, reads the slope stored in the element register 113A at that slope address, and multiplies it by the element data stored in element register 113A at address m. The second adds the product to the intercept stored in element register 113A at address q. As a result, the operation result is stored in element register 113A at address q.
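The seven-line program can be pictured as one instruction stream applied to every bank in lockstep, with each bank using its own indirect address register. The following is a minimal sketch; the instruction functions `line1` to `line7` are made-up stand-ins for the program lines (the mnemonics of FIG. 8 are not reproduced here), line 5 is modeled as writing the slope address back to FR#n so that line 6 can copy it into IND_REG, and the toy table (slope 0.5 and intercept i for segment i) is invented for illustration.

```python
class Bank:
    """One bank: a private element-register file plus its own indirect address register."""

    def __init__(self, table, x_q12):
        self.fr = dict(enumerate(table))  # table occupies addresses 0x000..0x01F
        self.fr["m"] = x_q12              # element data (unsigned Q12, an assumption)
        self.ind_reg = 0


def line1(b): b.fr["n"] = (b.fr["m"] >> 12) & 0xF            # intercept address
def line2(b): b.ind_reg = b.fr["n"]                          # IND_REG <- FR#n
def line3(b): b.fr["p"] = 1 << 12                            # FR#p <- 1.0 (Q12)
def line4(b): b.fr["q"] = b.fr[b.ind_reg] * b.fr["p"]        # intercept * 1.0
def line5(b): b.fr["n"] = b.fr["n"] + 0x010                  # slope address
def line6(b): b.ind_reg = b.fr["n"]                          # IND_REG <- FR#n
def line7(b): b.fr["q"] = b.fr[b.ind_reg] * b.fr["m"] + b.fr["q"]  # multiply-accumulate


PROGRAM = [line1, line2, line3, line4, line5, line6, line7]

# Toy table, invented for this sketch: segment i has intercept i and slope 0.5 (Q12).
TOY_TABLE = [i << 12 for i in range(16)] + [1 << 11] * 16


def run_simd(banks):
    """Apply each instruction to all banks before moving to the next instruction."""
    for instruction in PROGRAM:
        for bank in banks:
            instruction(bank)
    return [bank.fr["q"] for bank in banks]
```

For inputs 2.0 and 5.5 in two banks, `run_simd` returns Q24 values equal to 3.0 and 7.75: the same instruction sequence reads table entries 2 and 5 respectively because each bank's indirect address register holds a different address.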

Next, the flow of processing a SIMD operation instruction is described with reference to FIG. 9. FIG. 9 is a flowchart of the processing of a SIMD operation instruction. The description uses, as an example, the processing in each vector register 111 denoted as an arbitrary bank #i.

Vector register 111 of bank #i determines whether the instruction input from arithmetic instruction control unit 101 is a read instruction from indirect address register 121 (step S101).

If the instruction is a read instruction from indirect address register 121 (step S101: Yes), vector register 111 reads the address through multiplexer 122. Vector register 111 then sets the address read from indirect address register 121 as the read address of the output port corresponding to the input port connected to multiplexer 122 (step S102).

If, on the other hand, the instruction is not a read instruction from indirect address register 121 (step S101: No), vector register 111 sets the address given by the instruction, after address modification, as the read address of the output port corresponding to the input port connected to multiplexer 122 (step S103).

Next, vector register 111 sets the address given by the instruction, after address modification, as the read address of the other output ports (step S104).

Vector register 111 reads the contents of element registers 113 according to the addresses set as read addresses, outputs them from the respective output ports, and sets them as operands (step S105).

Next, product-sum arithmetic unit 112 determines whether the instruction input from arithmetic instruction control unit 101 is a write instruction to indirect address register 121 (step S106).

If the instruction is not a write instruction to indirect address register 121 (step S106: No), product-sum arithmetic unit 112 receives each operand at its own input port corresponding to each output port of vector register 111. Product-sum arithmetic unit 112 then executes the operation specified by the instruction using the operands and sets the result in its own operation result register (step S107).

Next, product-sum arithmetic unit 112 uses the address given by the instruction, after address modification, as the write address of the write port (step S108).

After that, product-sum arithmetic unit 112 writes the contents of the operation result register, through the write port, to the element register 113 designated by the write address (step S109).

If, on the other hand, the instruction is a write instruction to indirect address register 121 (step S106: Yes), vector register 111 sets a predetermined operand in indirect address register 121 (step S110).
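The branch structure of this flowchart can be summarized in a few lines. The following is a minimal sketch for one bank, assuming two read ports, assuming that the "predetermined operand" written to the indirect address register is the operand from the multiplexed port, and omitting address modification. The `Instruction` fields and the `execute` function are hypothetical names, not part of the source.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instruction:
    src_a: int                       # address for the read port behind the multiplexer
    src_b: int                       # address for the other read port
    dst: Optional[int] = None        # write address (unused when writing IND_REG)
    op: Optional[Callable[[int, int], int]] = None
    read_indirect: bool = False      # S101: take the first read address from IND_REG
    write_indirect: bool = False     # S106: result goes to IND_REG, not to a register


def execute(instr, regs, state):
    """Run one instruction on one bank; `state` holds the key 'ind_reg'."""
    # S101..S103: choose the read address of the multiplexed port.
    addr_a = state["ind_reg"] if instr.read_indirect else instr.src_a
    addr_b = instr.src_b                                  # S104
    operand_a, operand_b = regs[addr_a], regs[addr_b]     # S105
    if instr.write_indirect:                              # S106: Yes, then S110
        state["ind_reg"] = operand_a
    else:                                                 # S106: No, then S107..S109
        regs[instr.dst] = instr.op(operand_a, operand_b)
```

A typical use is a pair of instructions: the first copies an address held in an element register into the indirect address register, and the second sets `read_indirect=True` so that its first operand comes from the element register at that address.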

Next, the flow of executing a table search operation for each element data item in a SIMD operation is described with reference to FIG. 10. FIG. 10 is a flowchart of the processing that executes a table search operation for each element data item in a SIMD operation.

An administrator places the values of the table to be referenced in the operation in the element registers 113 with predetermined numbers in vector register 111 (step S201).

Product-sum arithmetic unit 112 processes the input value into an address used to reference vector register 111 (step S202). The input value is, for example, the element data at the start of the operation and, for example, an operation result partway through the operation.

After that, product-sum arithmetic unit 112 stores the address used to reference vector register 111 in vector register 111. Vector register 111 writes the stored address to indirect address register 121 (step S203).

Next, product-sum arithmetic unit 112 reads the contents of the element register 113 whose address is the value stored in indirect address register 121, and uses them in the operation together with the other operands (step S204).

After that, product-sum arithmetic unit 112 determines whether the operation is complete (step S205). If the operation is not complete (step S205: No), product-sum arithmetic unit 112 returns to step S202.

If, on the other hand, the operation is complete (step S205: Yes), product-sum arithmetic unit 112 stores the operation result in vector register 111 and ends the operation processing.
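The loop of steps S202 to S205 is generic: each pass derives an address, places it in the indirect address register, and folds the fetched table value into the running result. The following is a minimal sketch for one lane, assuming the table has already been loaded (step S201) and modeling each pass as a pair of functions. The names `table_search` and `LINEAR_STAGES` and the toy table used in the usage note are hypothetical.

```python
def table_search(x, table, stages):
    """Run the address/lookup/compute loop for one element data item."""
    acc, address = 0, None
    for make_address, combine in stages:     # S205: repeat until all passes are done
        address = make_address(x, address)   # S202: derive the table address
        ind_reg = address                    # S203: write the indirect address register
        acc = combine(acc, table[ind_reg], x)  # S204: indirect read used in the operation
    return acc                               # result to be stored in the vector register


# Two passes that reproduce the linear-approximation example (Q12 inputs, Q24 result):
# pass 1 fetches the intercept, pass 2 fetches the slope 16 entries later.
LINEAR_STAGES = [
    (lambda x, prev: (x >> 12) & 0xF, lambda acc, t, x: t * (1 << 12)),
    (lambda x, prev: prev + 16, lambda acc, t, x: acc + t * x),
]
```

With a toy table such as `[i << 12 for i in range(16)] + [1 << 11] * 16` (intercept i and slope 0.5 per segment, invented for illustration), `table_search(2 << 12, table, LINEAR_STAGES)` returns the Q24 encoding of 3.0.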

As described above, when executing an operation that performs a calculation by referring to a table, the product-sum operation unit according to this embodiment registers the table values in the vector register and uses product-sum operations on the element data or other input values to generate the address used to reference the vector register. The product-sum operation unit stores the generated address in the indirect address register. It then uses the value stored in the indirect address register as the read address to acquire the table value from the vector register and performs the operation. Providing an indirect address register for each bank in this way allows operations that perform calculations by referring to a table to be executed efficiently with a small number of instructions. In addition, different contents can be read for each bank, so the calculation for each SIMD element data item can be processed at high speed. That is, processing efficiency can be improved while keeping the circuit scale small.

This embodiment has been described using, as an example, an operation based on a function that approximates or interpolates with polynomials whose coefficients differ for each section. The usable operations are not limited to this: any operation that performs a calculation by referring to a table, that is, any operation that performs a table search for each element data item, may be used. For example, an operation that acquires initial values or parameters that differ for each section, as in Newton's method or other convergence calculations, may be used.

Next, a second embodiment is described. The product-sum operation unit according to this embodiment executes an operation that computes a power of 2 with the element data as the exponent. The information processing apparatus according to this embodiment also has the configuration shown in FIGS. 1 and 2. This embodiment is described for the case where each element data item used in the SIMD operation is an 8-bit fixed-point value. Specifically, each element data item is expressed in two's complement with the lower 7 bits as the fractional part (the "Q7 representation"). That is, each element data item is expressed as the 8 bits sxxxyyyy: if s = 0 it represents the binary number 0.xxxyyyy, and if s = 1 it represents the binary number 0.xxxyyyy - 1.0000000. The operation result is assumed to be obtained in FP32 format, that is, single-precision floating point.
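The Q7 interpretation described above can be written out directly. The following is a minimal sketch; the function name `decode_q7` is illustrative.

```python
def decode_q7(byte):
    """Interpret an 8-bit pattern sxxxyyyy as a two's-complement Q7 value."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    fraction = (byte & 0x7F) / 128  # binary 0.xxxyyyy
    return fraction - 1.0 if byte & 0x80 else fraction
```

For example, the pattern 01000000 decodes to 0.5 and 11000000 decodes to -0.5, so the representable range is -1.0 up to 127/128.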

FIG. 11 is a diagram showing an example of the table values stored in the vector register according to the second embodiment. As shown in FIG. 11, in the element registers 113 of vector register 111, addresses 00 to 0F hold the 16 values that can result from raising 2 to the power given by the upper 4 bits of the element data. Addresses 10 to 1F of element registers 113 hold the 16 values that can result from raising 2 to the power given by the lower 4 bits of the element data. Before the operation is executed, the element data is stored in element register 113 at address #m.

Product-sum arithmetic unit 112 acquires the element data from element register 113 at address #m. Product-sum arithmetic unit 112 then takes the logical AND of the acquired element data and 0x000000F0 to obtain data in which every bit below the upper 4 bits of the element data is 0. Hereinafter, this data is called the data representing the upper 4 bits of the element data. Product-sum arithmetic unit 112 then stores the data representing the upper 4 bits of the element data in element register 113 at address #x.

Next, product-sum arithmetic unit 112 acquires the element data from element register 113 at address #m. Product-sum arithmetic unit 112 then takes the logical AND of the acquired element data and 0x0000000F to obtain data in which every bit above the lower 4 bits of the element data is 0. Hereinafter, this data is called the data representing the lower 4 bits of the element data. Product-sum arithmetic unit 112 then stores the data representing the lower 4 bits of the element data in element register 113 at address #y.

Next, product-sum arithmetic unit 112 stores 1.0 in element register 113 at address #r, which serves as the result register.

Next, product-sum arithmetic unit 112 acquires the data representing the upper 4 bits of the element data stored in element register 113 at address #x. Product-sum arithmetic unit 112 then shifts the upper 4 bits down to the least significant position, producing data in which the bits that occupied the upper 4 bits at address #m are moved to the lower 4 bits. Product-sum arithmetic unit 112 then stores this data in element register 113 at address #x.

Product-sum arithmetic unit 112 also acquires the data representing the lower 4 bits of the element data stored in element register 113 at address #y. Product-sum arithmetic unit 112 then adds 0x010, the first address of the table values used for the lower-4-bit calculation, to the data representing the lower 4 bits. Product-sum arithmetic unit 112 then stores the sum in element register 113 at address #y.

Next, the product-sum calculator 112 acquires the data, stored in the element register 113 at address #x, in which the upper 4 bits of the element data have been moved to the lower 4 bits. The product-sum calculator 112 then stores this data in the indirect address register 121. This data corresponds to the address at which the result of calculating the power of 2 with the upper 4 bits of the element data as the exponent is stored.

After that, the product-sum calculator 112 acquires the contents of the element register 113 whose address is the value stored in the indirect address register 121. The acquired value is the result of calculating the power of 2 with the upper 4 bits of the element data as the exponent. The product-sum calculator 112 then multiplies the value stored in the element register 113 at address #r by the acquired value and stores the product in the element register 113 at address #r.

Next, the product-sum calculator 112 acquires the value, stored in the element register 113 at address #y, obtained by adding 0x010 to the lower 4 bits of the element data. The product-sum calculator 112 then stores this value in the indirect address register 121. This value corresponds to the address at which the result of calculating the power of 2 with the lower 4 bits of the element data as the exponent is stored.

After that, the product-sum calculator 112 acquires the contents of the element register 113 whose address is the value stored in the indirect address register 121. The acquired value is the result of calculating the power of 2 with the lower 4 bits of the element data as the exponent. The product-sum calculator 112 then multiplies the value stored in the element register 113 at address #r by the acquired value and stores the product in the element register 113 at address #r. This completes the calculation of the power of 2 with the element data as the exponent.

Here, the product-sum calculation unit 100 according to this embodiment uses the relationship EXP2(X + Y) = EXP2(X) × EXP2(Y) to obtain the power of 2 whose exponent is the element data. Specifically, the product-sum calculation unit 100 decomposes the element data into the sum of the number represented by the upper 4 bits and the number represented by the lower 4 bits, acquires the power of 2 for each of these exponents from the values stored in the vector register 111, and multiplies the two values together to obtain the power of 2 whose exponent is the element data.
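Written out as an equation, the decomposition reads as follows. The symbols e, X, and Y are introduced here for illustration only; e stands for the element data, X for the number contributed by its upper 4 bits, and Y for the number contributed by its lower 4 bits.

```latex
e = X + Y
\quad\Longrightarrow\quad
2^{e} = 2^{X + Y} = 2^{X} \cdot 2^{Y}
```

Because X and Y each take at most 16 distinct values, two 16-entry tables (32 stored values in total) are enough, whereas a single table indexed by all 8 bits would need 256 entries. This count assumes that the element data carries 8 significant bits.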

FIG. 12 is a diagram showing an example of a program that causes the product-sum calculation unit to perform the operation of obtaining a power of 2.

The first line represents the process of storing the data representing the upper 4 bits of the element data in the element register 113 at address #x.

The second line represents the process of storing the data representing the lower 4 bits of the element data in the element register 113 at address #y.

The third line represents the process of storing 1.0 in the element register 113 at address #r, which is the result register.

The fourth line represents the process of shifting the data representing the upper 4 bits of the element data down to the least significant position.

The fifth line represents the process of adding the start address of the table values used for the lower-4-bit calculation to the data representing the lower 4 bits of the element data.

The sixth line represents the process of setting the data representing the upper 4 bits of the element data in the indirect address register 121.

The seventh line represents the process of acquiring a value from the element register 113 whose address is the value set in the indirect address register 121, and multiplying the value of the element register 113 at address #r, which is the result register, by the acquired value.

The eighth line represents the process of setting, in the indirect address register 121, the value obtained by adding 0x010 to the data representing the lower 4 bits of the element data.

The ninth line represents the process of acquiring a value from the element register 113 whose address is the value set in the indirect address register 121, and multiplying the value of the element register 113 at address #r, which is the result register, by the acquired value.
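The nine lines can be traced with a small software model. The sketch below is not the program of FIG. 12 itself. It assumes a dictionary `regs` standing in for the element registers 113, a local variable `indirect` standing in for the indirect address register 121, arbitrary addresses for #m, #x, #y, and #r, and an interpretation of the element data as an unsigned fixed-point number with 4 integer bits and 4 fraction bits. Under that assumption the table at 0x000 holds 2^i and the table at 0x010 holds 2^(i/16). The upper-4-bit mask 0x000000F0 is also assumed.

```python
# Assumed register addresses for #m, #x, #y, #r (hypothetical, for illustration).
M, X, Y, R = 0x100, 0x101, 0x102, 0x103


def make_registers():
    """Build the model register file with both 16-entry tables preloaded."""
    regs = {}
    for i in range(16):
        regs[0x000 + i] = 2.0 ** i         # assumed upper table: integer part
        regs[0x010 + i] = 2.0 ** (i / 16)  # assumed lower table: sixteenths
    return regs


def exp2_program(regs, element):
    """Trace the nine steps for one element and return the result register."""
    regs[M] = element
    regs[X] = regs[M] & 0x000000F0      # line 1: upper 4 bits (assumed mask)
    regs[Y] = regs[M] & 0x0000000F      # line 2: lower 4 bits
    regs[R] = 1.0                       # line 3: initialise the result register
    regs[X] = regs[X] >> 4              # line 4: shift upper bits to the lowest position
    regs[Y] = regs[Y] + 0x010           # line 5: add the lower-table start address
    indirect = regs[X]                  # line 6: set the indirect address register
    regs[R] = regs[R] * regs[indirect]  # line 7: multiply by the looked-up value
    indirect = regs[Y]                  # line 8: set the indirect address register
    regs[R] = regs[R] * regs[indirect]  # line 9: multiply by the looked-up value
    return regs[R]
```

Calling `exp2_program(make_registers(), 0x38)` multiplies the entry for the upper value 3 by the entry for the lower value 8, which under the fixed-point assumption approximates 2 raised to 3.5. For plain integer element data, the upper table would instead have to hold powers of 2 whose exponents are multiples of 16.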

In this case, the data representing the upper 4 bits of the element data is an example of the "first selection value", and the result of calculating the power of 2 with the upper 4 bits of the element data as the exponent is an example of the "first calculation value". Likewise, the data representing the lower 4 bits of the element data is an example of the "second selection value", and the result of calculating the power of 2 with the lower 4 bits of the element data as the exponent is an example of the "second calculation value".

As described above, the product-sum calculation unit according to this embodiment stores, in the vector register, the values that the power of 2 can take when the number represented by the upper 4 bits and the number represented by the lower 4 bits of the element data are each used as the exponent. The product-sum calculation unit then stores the number represented by the upper 4 bits and the number represented by the lower 4 bits of the element data in the indirect address register, and uses the stored values to look up the vector register and acquire the power of 2 for each of the two exponents. After that, the product-sum calculation unit multiplies the two acquired values to obtain the power of 2 whose exponent is the element data.

In this case as well, operations that perform calculations by referring to a table can be executed efficiently with a small number of instructions, and the calculation for each SIMD element can be processed at high speed. That is, processing efficiency can be improved while keeping the circuit scale small.

1 PCI card
2 Host computer
10 Processing unit
11 Overall instruction control unit
12 Memory controller
13 Memory
14 PCI control unit
101 Operation instruction control unit
102 Operation instruction buffer
103 Multiplexer
111, 111A to 111C Vector register
112, 112A to 112C Product-sum calculator
113, 113A to 113C Element register
121, 121A to 121C Indirect address register
122, 122A to 122C Multiplexer

Claims

An arithmetic processing apparatus comprising:
a storage unit that has a plurality of element storage areas and stores each of a plurality of calculation values in a respective one of the element storage areas; and
a calculation unit that is a product-sum calculator having a multiplier and an adder, generates a first selection value based on input element data, acquires a first calculation value from among the calculation values stored in the element storage areas based on the first selection value, and acquires a calculation result based on the first calculation value by using the multiplier and the adder.

The arithmetic processing apparatus according to claim 1, wherein the calculation unit generates a second selection value based on the element data or the first selection value, acquires a second calculation value from among the calculation values stored in the element storage areas based on the second selection value, and acquires the calculation result based on the first calculation value and the second calculation value.

The arithmetic processing apparatus according to claim 2, wherein
the calculation result acquired by the calculation unit is the result of selecting, from among a predetermined number of functions of the same type each including a plurality of coefficients, the function corresponding to the element data and performing a calculation using the selected function,
the element storage areas are numbered sequentially, and element storage areas with consecutive numbers hold, consecutively for each term, the predetermined number of coefficients corresponding to that term of the functions, and
the calculation unit generates a first number, which is one of the numbers, based on the element data, acquires a first coefficient stored in the element storage area corresponding to the first number, adds the predetermined number to the first number to generate a second number, acquires a second coefficient stored in the element storage area corresponding to the second number, and acquires the calculation result based on the element data, the first coefficient, and the second coefficient.

The arithmetic processing apparatus according to claim 1, wherein
a specific number of the storage units are provided,
the specific number of the calculation units are provided, one corresponding to each storage unit, and
each calculation unit acquires one piece of element data from an array including a plurality of pieces of the element data.

The arithmetic processing apparatus according to claim 1, further comprising an indirect storage unit that inputs a value it holds to the storage unit and thereby causes the storage unit to output the value corresponding to the held value to the calculation unit, wherein
the storage unit receives input of the first selection value from the indirect storage unit and outputs the first calculation value to the calculation unit, and
the calculation unit acquires the first calculation value from the storage unit by storing the first selection value in the indirect storage unit.

A control method of an arithmetic processing apparatus including a storage unit having a plurality of element storage areas and a product-sum calculator having a multiplier and an adder, the control method comprising:
storing each of a plurality of calculation values in a respective one of the element storage areas;
generating a first selection value based on input element data;
acquiring a first calculation value from among the calculation values stored in the element storage areas based on the first selection value; and
acquiring a calculation result based on the first calculation value by using the multiplier and the adder.