JPH0863459A

JPH0863459A - Arithmetic processing method and arithmetic processing device

Info

Publication number: JPH0863459A
Application number: JP19438394A
Authority: JP
Inventors: Koji Kuroda; 浩二黒田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1994-08-18
Filing date: 1994-08-18
Publication date: 1996-03-08

Abstract

(57) [Abstract] [Object] The present invention is suited to cases that require a large volume of arithmetic processing at high speed. In particular, it relates to an arithmetic processing method and an arithmetic processing device that use a CPU and a VU, which can execute a plurality of operations simultaneously in response to a vector operation instruction from the CPU, to perform a total-sum operation, a total-product operation, or the like on predetermined vector data. Its object is to eliminate the situation in which subsequent instructions wait on the convergence operation, without increasing the amount of hardware, and thereby to speed up vector processing. [Constitution] After the VU 2 executes the partial operations on the predetermined vector data, the CPU 1 executes the convergence operation on the partial operation results produced by the VU 2.

Description

Detailed Description of the Invention

[0001] (Table of Contents) Field of Industrial Application; Prior Art (FIGS. 11 and 12); Problems to Be Solved by the Invention (FIGS. 11 and 12); Means for Solving the Problems (FIG. 1); Operation (FIG. 1); Embodiment (FIGS. 2 to 10); Effects of the Invention

[0002]

[Field of Industrial Application] The present invention relates to an arithmetic processing method and an arithmetic processing device suited to cases that require a large volume of arithmetic processing at high speed. More particularly, it relates to a method and a device that use a central processing unit (CPU) and a vector processing unit (VU), which can execute a plurality of operations simultaneously in response to a vector operation instruction from the central processing unit, to perform arithmetic processing (for example, a total-sum operation or a total-product operation) on predetermined vector data.

[0003]

[Prior Art] In recent years, information processing devices that must perform a large volume of arithmetic processing at high speed have been provided with a vector processing unit (hereinafter sometimes called a VU, for Vector Unit) in addition to a central processing unit (hereinafter sometimes called a CPU, for Central Processor Unit). The CPU gives a vector operation instruction to the VU, and the VU performs the arithmetic processing on the predetermined vector data.

[0004] Some VUs of this kind are configured as shown in FIG. 11 or FIG. 12. Both figures show an addition processing system for total-sum processing. Four arithmetic units for addition are provided, so that four additions can be performed simultaneously (degree of parallelism 4). The VU 30A shown in FIG. 11 comprises four arithmetic units 31-1 to 31-4 that perform addition, registers 32-1 to 32-4 that temporarily store the results from the arithmetic units 31-1 to 31-4, and a selector 33 that performs a switching function, described later, on the input side of the arithmetic unit 31-1.

[0005] As shown in FIG. 11, the (4i+1)-th, (4i+2)-th, (4i+3)-th, and (4i+4)-th elements (i = 0 to n) of the vector data are input sequentially to the arithmetic units 31-1 to 31-4, respectively, and the partial sums S1 to S4 are computed. That is, each of the arithmetic units 31-1 to 31-4 repeatedly adds the element input in the current cycle (the (4i+1)-th, (4i+2)-th, (4i+3)-th, or (4i+4)-th element) to the running result held in the corresponding register 32-1 to 32-4 (the sum up to the (4i-3)-th, (4i-2)-th, (4i-1)-th, or 4i-th element, respectively). Repeating this process yields the partial sums S1 to S4.
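The lane assignment described above can be sketched in a few lines of Python. This is only an illustrative software model, not the circuit of FIG. 11: the function name `partial_sums`, the `parallelism` parameter, and the assumption that the registers start at zero are all introduced here for the example.

```python
def partial_sums(data, parallelism=4):
    """Return one running sum per lane.

    Lane k (0-based) receives elements k, k+P, k+2P, ... of the vector,
    mirroring how arithmetic units 31-1 to 31-4 receive the
    (4i+1)-th to (4i+4)-th elements.
    """
    # Assumption: the accumulating registers are cleared before the instruction.
    registers = [0] * parallelism
    for start in range(0, len(data), parallelism):  # each pass models one cycle
        for lane, value in enumerate(data[start:start + parallelism]):
            registers[lane] += value  # add the new element to the stored sum
    return registers
```

For the vector 1, 2, ..., 12 this model returns `[15, 18, 21, 24]`, which still has to be combined into the total 78 by the convergence operation.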

[0006] During this time, the selector 33 provided on the input side of the arithmetic unit 31-1 is switched so that the (4i+1)-th elements of the vector data are input sequentially. When the arithmetic units 31-1 to 31-4 have finished computing the partial sums S1 to S4, the arithmetic unit 31-1 then executes the convergence operation that adds these partial sums S1 to S4.

[0007] That is, in the first cycle, the selector 33 selects the partial sum S2 from the arithmetic unit 31-2, and the arithmetic unit 31-1 adds S2 to its own partial sum S1 held in the register 32-1. In the second cycle, the selector 33 selects the partial sum S3 from the arithmetic unit 31-3, and the arithmetic unit 31-1 adds S3 to the partial sum S1+S2 held in the register 32-1.

[0008] In the third cycle, the selector 33 selects the partial sum S4 from the arithmetic unit 31-4, and the arithmetic unit 31-1 adds S4 to the partial sum S1+S2+S3 held in the register 32-1. This yields the final sum, that is, the total S = S1+S2+S3+S4, which is reported to the CPU. The VU 30B shown in FIG. 12 comprises the same four arithmetic units 31-1 to 31-4 and registers 32-1 to 32-4 as described for FIG. 11. In addition, it comprises an arithmetic unit 34 dedicated to the convergence operation that adds the partial sums S1 to S4 from the arithmetic units 31-1 to 31-4, a register 35 that temporarily stores the result from the arithmetic unit 34, and two selectors 36A and 36B that perform a switching function, described later, on the input side of the arithmetic unit 34.

[0009] The arithmetic units 31-1 to 31-4 compute the partial sums S1 to S4 exactly as described for FIG. 11. In the VU 30B shown in FIG. 12, however, the convergence operation that adds these partial sums is executed not by the arithmetic unit 31-1 but by the dedicated arithmetic unit 34 and the register 35. That is, in the first cycle, the selector 36A selects the partial sum S1 from the arithmetic unit 31-1 and the selector 36B selects the partial sum S2 from the arithmetic unit 31-2, and the dedicated arithmetic unit 34 adds S1 and S2.

[0010] In the second cycle, the selector 36A selects the partial sum S1+S2 held in the register 35 and the selector 36B selects the partial sum S3 from the arithmetic unit 31-3, and the dedicated arithmetic unit 34 adds S3 to the partial sum S1+S2 from the register 35. In the third cycle, the selector 36A selects the partial sum S1+S2+S3 held in the register 35 and the selector 36B selects the partial sum S4 from the arithmetic unit 31-4, and the dedicated arithmetic unit 34 adds S4 to the partial sum S1+S2+S3 from the register 35. This yields the final sum, that is, the total S = S1+S2+S3+S4, which is reported to the CPU.
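The three-cycle convergence walked through above is a sequential fold over the partial sums. The following Python sketch models only that arithmetic; `converge` and its cycle counter are names invented for this illustration, and the sketch does not distinguish between the FIG. 11 and FIG. 12 datapaths.

```python
def converge(partials):
    """Fold the partial sums one per cycle; return (total, cycles_used)."""
    if not partials:
        raise ValueError("at least one partial sum is required")
    running = partials[0]         # S1 is the starting value
    cycles = 0
    for partial in partials[1:]:  # the selector picks S2, then S3, then S4
        running += partial        # one addition per cycle
        cycles += 1
    return running, cycles
```

With four partial sums the fold needs three dependent additions, for example `converge([15, 18, 21, 24])` gives `(78, 3)`. In this model a degree of parallelism P needs P-1 such steps.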

[0011] Although the VUs 30A and 30B of FIGS. 11 and 12 have been described for total-sum processing, total-product processing can be performed in exactly the same way.
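To illustrate why the same structure serves both the total sum and the total product, here is a hedged Python sketch in which the operator is a parameter. The function `vector_reduce`, the `identity` argument, and the choice to initialise each lane with the identity element are assumptions made for this example and are not taken from the patent.

```python
import operator
from functools import reduce


def vector_reduce(data, op, identity, parallelism=4):
    """Reduce data in two phases: per-lane partial results, then convergence."""
    # Assumption: each lane starts from the identity element of the operator.
    lanes = [identity] * parallelism
    for index, value in enumerate(data):
        lane = index % parallelism           # interleaved lane assignment
        lanes[lane] = op(lanes[lane], value) # partial sum or partial product
    # Convergence: combine the per-lane partial results with the same operator.
    return reduce(op, lanes, identity)
```

Calling `vector_reduce(range(1, 13), operator.add, 0)` gives 78, and `vector_reduce([1, 2, 3, 4, 5, 6], operator.mul, 1)` gives 720. The two-phase split is valid here because integer addition and multiplication are associative and commutative.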

[0012]

[Problems to Be Solved by the Invention] In a vector processing unit that performs a plurality of operations simultaneously, the partial sums must, once computed, be added together in a convergence operation. In the vector processing unit 30A described with reference to FIG. 11, the convergence operation is performed by the arithmetic unit 31-1, so execution of the next operation instruction must wait until the convergence operation ends. Because the convergence operation normally takes several tens of τ, stalling the next operation instruction for that time is a major cause of degraded vector processing performance.

[0013] In the vector processing unit 30B described with reference to FIG. 12, the arithmetic unit 34 dedicated to the convergence operation hides that operation. As a result, the next operation instruction no longer waits for the convergence operation as it does in the vector processing unit 30A of FIG. 11. However, the dedicated arithmetic unit 34 and the register 35 are required, which undesirably increases the amount of hardware.

[0014] The present invention was devised in view of these problems. Its object is to provide an arithmetic processing method and an arithmetic processing device that eliminate the situation in which subsequent instructions wait on the convergence operation, without increasing the amount of hardware, and thereby speed up vector processing.

[0015]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the present invention. In FIG. 1, 1 is a central processing unit (CPU), and 2 is a vector processing unit (VU) that can execute a plurality of operations simultaneously in response to a vector operation instruction from the central processing unit 1. The central processing unit 1 and the vector processing unit 2 together constitute the arithmetic processing device.

[0016] In the present invention, when arithmetic processing is performed on predetermined vector data, the vector processing unit 2 executes the partial operations on the predetermined vector data, and the central processing unit 1 executes the convergence operation on the partial operation results from the vector processing unit 2 (claims 1 and 5). When the arithmetic processing is total-sum processing, the vector processing unit 2 executes the partial operations that obtain the partial sums of the predetermined vector data, and the central processing unit 1 executes the convergence operation that adds the partial sums from the vector processing unit 2 (claims 2 and 6).

[0017] Likewise, when the arithmetic processing is total-product processing, the vector processing unit 2 executes the partial operations that obtain the partial products of the predetermined vector data, and the central processing unit 1 executes the convergence operation that multiplies the partial products from the vector processing unit 2 (claims 3 and 7). The device may also be provided with registers that store the partial operation results from the vector processing unit 2, and the vector processing unit 2 may be provided with a partial operation end signal output unit that outputs a partial operation end signal to the central processing unit 1 when all the partial operations by the vector processing unit 2 have finished. In that case, the central processing unit 1 may be configured to execute the convergence operation on the partial operation results stored in the registers upon receiving the partial operation end signal from the partial operation end signal output unit (claims 4 and 8).

[0018] In this case, the partial operation end signal output unit can be composed of the following three parts (claim 9): a vector length register that stores the vector length of the vector data (sent from the central processing unit 1 together with the vector data and the vector operation instruction); a subtraction unit that, each time one cycle of partial operations is performed, subtracts the degree of parallelism of the vector processing unit 2 (the number of partial operations performed simultaneously in one cycle) from the value stored in the vector length register and stores the result back in the vector length register; and a comparison unit that compares the result of the subtraction with 0 and outputs the partial operation end signal when that result is 0 or less.
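The countdown performed by the vector length register, the subtraction unit, and the comparison unit can be modelled as follows. This is a minimal sketch; the function name `cycles_until_end_signal` and its argument names are hypothetical and appear nowhere in the patent.

```python
def cycles_until_end_signal(vector_length, parallelism=4):
    """Count partial-operation cycles until the comparison unit fires."""
    if vector_length <= 0 or parallelism <= 0:
        raise ValueError("vector_length and parallelism must be positive")
    remaining = vector_length     # value loaded into the vector length register
    cycles = 0
    while True:
        cycles += 1               # one cycle of partial operations
        remaining -= parallelism  # subtraction unit; result is stored back
        if remaining <= 0:        # comparison unit: result is 0 or less
            return cycles         # the end signal is output in this cycle
```

For a vector length of 12 and a degree of parallelism of 4 the register goes 8, 4, 0 and the signal fires after 3 cycles. For a length of 13 it goes 9, 5, 1, -3 and fires after 4 cycles, which is why the comparison is "0 or less" rather than "equal to 0".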

[0019] The device may further be provided with an operation queue that stores operation instructions from the central processing unit 1. In that case, at the same time as it outputs the vector operation instruction to the vector processing unit 2, the central processing unit 1 stores in the operation queue an instruction to execute the convergence operation on the partial operation results that the vector processing unit 2 stores in the registers. On receiving the partial operation end signal from the partial operation end signal output unit, the central processing unit 1 executes the convergence operation on the partial operation results stored in the registers in accordance with the instruction held in the operation queue (claim 10).
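The queue-based hand-off can be sketched as a small software model. Everything named here (`VectorUnit`, `CentralProcessor`, the `"CONVERGE_SUM"` token, the `trace` list) is hypothetical and exists only to show the ordering of events. The real device is hardware, and this sketch runs the steps one after another rather than concurrently.

```python
from collections import deque


class VectorUnit:
    def __init__(self, parallelism=4):
        self.parallelism = parallelism
        self.pending = None
        self.partial_registers = []

    def accept(self, data):
        self.pending = list(data)  # vector instruction and data arrive

    def run(self):
        lanes = [0] * self.parallelism
        for index, value in enumerate(self.pending):
            lanes[index % self.parallelism] += value
        self.partial_registers = lanes  # partial results go to the registers
        self.pending = None
        return True                     # models the partial operation end signal


class CentralProcessor:
    def __init__(self):
        self.operation_queue = deque()
        self.trace = []

    def issue_total_sum(self, vu, data):
        vu.accept(data)
        # Queue the convergence instruction at the moment of issue.
        self.operation_queue.append("CONVERGE_SUM")
        self.trace.append("issue")

    def other_work(self, label):
        self.trace.append(label)  # the CPU is free to do unrelated work

    def on_end_signal(self, vu):
        instruction = self.operation_queue.popleft()
        if instruction != "CONVERGE_SUM":
            raise RuntimeError("unexpected queued instruction")
        self.trace.append("converge")
        return sum(vu.partial_registers)  # convergence runs on the CPU side
```

A typical sequence is: call `issue_total_sum`, do `other_work`, let the vector unit `run`, then call `on_end_signal` once the end signal is seen. The point of the design is that the queued instruction records what to do later, so the CPU does not have to block on the vector unit.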

[0020]

[Operation] In the arithmetic processing method and the arithmetic processing device of the present invention described above with reference to FIG. 1, when arithmetic processing such as total-sum processing or total-product processing is performed on predetermined vector data, the vector processing unit 2 first executes the partial operations (partial-sum or partial-product operations) on the predetermined vector data in the usual way. The central processing unit 1 then executes the convergence operation (addition of the partial sums or multiplication of the partial products) on the partial operation results from the vector processing unit 2.

[0021] Consequently, subsequent vector operation instructions in the vector processing unit 2 are no longer made to wait by the convergence operation, and the vector processing unit 2 no longer needs a separate arithmetic unit or the like dedicated to the convergence operation (claims 1 to 3 and 5 to 7). When the vector processing unit 2 has finished all the partial operations and the partial operation results have been stored in the registers, the partial operation end signal output unit of the vector processing unit 2 outputs the partial operation end signal. On receiving this signal, the central processing unit 1 executes the convergence operation on the partial operation results stored in the registers (claims 4 and 8).

[0022] In the partial operation end signal output unit of the vector processing unit 2, the vector length of the vector data, as notified by the central processing unit 1, is stored in the vector length register in advance. Each time one cycle of partial operations is performed, the subtraction unit subtracts the degree of parallelism of the vector processing unit 2 from the value in the vector length register. When the result of the subtraction becomes 0 or less, the partial operations on all the vector data can be judged to have finished, and the comparison unit outputs the partial operation end signal (claim 9).

[0023] Furthermore, at the same time as the central processing unit 1 issues the vector operation instruction to the vector processing unit 2, it places in the operation queue an instruction to execute the convergence operation on the partial operation results stored in the registers. The central processing unit 1 can thus issue the vector operation instruction without waiting for it to complete, and can continue with other arithmetic processing (claim 10).

[0024]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 2 is a block diagram showing the overall configuration of an arithmetic processing device as one embodiment of the present invention. In FIG. 2, 1 is a central processing unit (hereinafter CPU), and 2 is a vector processing unit (hereinafter VU) that can execute a plurality of operations simultaneously in response to a vector operation instruction from the CPU 1.

[0025] The CPU 1 comprises a control unit 3, which controls the CPU 1 and operates according to the flowchart of FIG. 7 described later, and an arithmetic processing unit 4, which is configured as described later with reference to FIG. 4 and operates according to the flowchart of FIG. 8 described later. Similarly, the VU 2 comprises a control unit 5, which controls the VU 2 and operates according to the flowchart of FIG. 9 described later, and an arithmetic processing unit 6, which is configured as described later with reference to FIG. 3 and operates according to the flowchart of FIG. 10 described later.

[0026] Reference numeral 7 denotes a partial sum end signal output unit (partial operation end signal output unit) provided in the arithmetic processing unit 6 of the VU 2. When all the partial sum operations by the VU 2 have finished, the partial sum end signal output unit 7 outputs a partial sum end signal to the CPU 1 to inform the CPU 1 that the VU 2 has finished obtaining all the partial sums. In this embodiment it is configured as described later with reference to FIG. 5.

[0027] Reference numerals 8-1 to 8-4 denote partial sum registers that temporarily store the partial sums S1 to S4 computed by the VU 2 as described later, and 9 denotes an operation queue. The operation queue 9 stores operation instructions from the CPU 1. At the same time as the CPU 1 outputs a vector operation instruction to the VU 2, the CPU 1 stores (queues) in the operation queue 9 an add instruction to execute the addition (convergence operation) of the partial sums S1 to S4 that the VU 2 stores in the partial sum registers 8-1 to 8-4.

[0028] Because this embodiment describes the case in which total-sum processing is performed as the vector operation, FIGS. 3 and 4 show the configuration of the addition processing system of the arithmetic processing unit 6 of the VU 2 and of the arithmetic processing unit 4 of the CPU 1, respectively. As shown in FIG. 3, the arithmetic processing unit 6 of the VU 2 has four arithmetic units 10-1 to 10-4 for addition, like the conventional units shown in FIGS. 11 and 12, so that four additions can be performed simultaneously (degree of parallelism 4). The arithmetic processing unit 6 also has registers 11-1 to 11-4 that temporarily store the results from the arithmetic units 10-1 to 10-4, and the partial sum end signal output unit 7 described later with reference to FIG. 5.

[0029] The operation of the arithmetic processing unit 6 of the VU 2 in this embodiment is briefly as follows. As in the conventional units shown in FIGS. 11 and 12, when a total-sum instruction is started, the VU 2 executes it and computes the four partial sums S1 to S4. Although the arithmetic processing unit 6 of the VU 2 shown in FIG. 3 is described as producing the partial sums S1 to S4 with four parallel lanes (degree of parallelism 4), the degree of parallelism may equally be 2, 8, or more.

[0030] In this embodiment, with a degree of parallelism of 4, the arithmetic unit 10-1 among the four arithmetic units 10-1 to 10-4 sequentially takes in the 1st, 5th, 9th, ..., (4n+1)-th elements of the vector data, and the arithmetic unit 10-2 sequentially takes in the 2nd, 6th, 10th, ..., (4n+2)-th elements. Similarly, the arithmetic unit 10-3 sequentially takes in the 3rd, 7th, 11th, ..., (4n+3)-th elements, and the arithmetic unit 10-4 sequentially takes in the 4th, 8th, 12th, ..., (4n+4)-th elements.

[0031] The data taken into each of the arithmetic units 10-1 to 10-4 is added to the data that loops back through the corresponding register 11-1 to 11-4 on the loopback path. The additions continue until no input data remains for the arithmetic units 10-1 to 10-4, at which point the arithmetic units 10-1 to 10-4 have computed the partial sums S1 to S4. The partial sums S1 to S4 are then set in the partial sum registers 8-1 to 8-4, respectively. At the same time, the partial sum end signal is output from the partial sum end signal output unit 7 and reported to the CPU 1.

[0032] As shown in FIG. 4, the arithmetic processing unit 4 of the CPU 1, in order to execute the convergence operation that adds the partial sums S1 to S4, comprises an arithmetic unit 12 that adds the partial sums S1 to S4 stored in the partial sum registers 8-1 to 8-4, a register 13 that temporarily stores the result from the arithmetic unit 12, and two selectors 14A and 14B that perform a switching function, described later, on the input side of the arithmetic unit 12.

[0033] The configuration of the arithmetic processing unit 4 shown in FIG. 4 is the same as that of the conventional dedicated convergence arithmetic unit 34, register 35, and selectors 36A and 36B shown in FIG. 12, and its operation and function are also the same. In this embodiment, however, the arithmetic unit 12, the register 13, and the selectors 14A and 14B are not provided separately for the convergence operation as in the conventional design. They are instead built from parts that the CPU 1 ordinarily has.

[0034] The convergence operation of the arithmetic processing unit 4 of the CPU 1 in this embodiment is briefly as follows. On receiving the partial operation end signal from the partial sum end signal output unit 7 of the VU 2, the arithmetic processing unit 4 of the CPU 1 executes the convergence operation that adds the partial sums S1 to S4 stored in the partial sum registers 8-1 to 8-4, in accordance with the instruction held in the operation queue 9.

[0035] In this convergence operation, in the first cycle, the selector 14A selects the partial sum S1 from the partial sum register 8-1 and the selector 14B selects the partial sum S2 from the partial sum register 8-2. The arithmetic unit 12 adds S1 and S2, and the result is stored in the register 13.

[0036] In the second cycle, the selector 14A selects the data looping back from the register 13 on the loopback path (the partial sum S1+S2) and the selector 14B selects the partial sum S3 from the partial sum register 8-3. The arithmetic unit 12 adds S3 to the partial sum S1+S2 from the register 13, and the result is stored in the register 13.

[0037] In the third cycle, the selector 14A selects the partial sum S1+S2+S3 held in the register 13 and the selector 14B selects the partial sum S4 from the partial sum register 8-4. The arithmetic unit 12 adds S4 to the partial sum S1+S2+S3 from the register 13, yielding the final sum, that is, the total S = S1+S2+S3+S4. This completes the convergence operation by the arithmetic processing unit 4 of the CPU 1.
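The selector choices in the three cycles above can be traced with a short Python model. This is a sketch only: `cpu_convergence_trace` is a name made up for the example, and the variable `register_13` simply stands in for the register 13 of FIG. 4.

```python
def cpu_convergence_trace(partial_registers):
    """Return (total, trace), where trace lists the two adder inputs per cycle."""
    if len(partial_registers) < 2:
        raise ValueError("at least two partial sums are required")
    register_13 = None
    trace = []
    for cycle in range(1, len(partial_registers)):
        # Selector 14A: partial sum register 8-1 in cycle 1, register 13 afterwards.
        input_a = partial_registers[0] if cycle == 1 else register_13
        # Selector 14B: the next partial sum register (8-2, 8-3, 8-4, ...).
        input_b = partial_registers[cycle]
        register_13 = input_a + input_b  # arithmetic unit 12, stored in register 13
        trace.append((input_a, input_b))
    return register_13, trace
```

For partial sums 15, 18, 21, and 24 the trace is `[(15, 18), (33, 21), (54, 24)]` and the total is 78, matching the S1+S2, then +S3, then +S4 sequence in the text.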

[0038] Next, the configuration of the partial sum end signal output unit 7 provided in the arithmetic processing unit 6 of the VU 2 is described with reference to FIG. 5. In FIG. 5, 15 is a VL counter register (vector length register). The VL counter register 15 first stores the vector length VL of the vector data (the length of the elements to be vector-processed, that is, the number of data items), which is sent from the CPU 1 together with the vector data and the vector operation instruction. Thereafter it is updated with, and stores, the result from the adder 16 described below.

[0039] Reference numeral 16 denotes an adder (ADD; the subtraction unit). Each time the arithmetic units 10-1 to 10-4 perform one cycle of partial operations, the adder 16 subtracts the degree of parallelism of the VU 2 (the number of partial operations performed simultaneously in one cycle) from the value stored in the VL counter register 15. In this embodiment it does so by adding -4. The result from the adder 16 is written back to the VL counter register 15 through the selector 17 described below. If the degree of parallelism of the VU 2 were 8, the value added by the adder 16 would naturally be -8.

【００４０】Reference numeral 17 denotes a selector. At the start of a vector instruction, the selector 17 selects the vector length VL from the CPU 1 and writes it to the VL counter register 15. Thereafter it selects the operation result from the adder 16 and writes that to the VL counter register 15. Reference numeral 18 denotes a VL counter check unit (comparison unit). The VL counter check unit 18 compares the operation result of the adder 16 with 0. When the result is 0 or less, it determines that the partial sum operation processing has been completed for all the vector data, and it outputs an instruction end signal that is on (High) for only 1τ (one control cycle).
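
A minimal behavioral sketch of the register, adder, selector, and check unit described above, written in Python. The class and method names are hypothetical, and the one-cycle (1τ) pulse width of the instruction end signal is not modeled; `step` simply reports whether the adder output is 0 or less.

```python
class PartialSumEndCounter:
    """Behavioral model of VL counter register 15, adder 16, selector 17, check unit 18."""

    def __init__(self, parallelism=4):
        self.parallelism = parallelism
        self.vl_counter = 0  # VL counter register 15

    def start(self, vector_length):
        # Selector 17: at instruction start, take the vector length VL from the CPU.
        self.vl_counter = vector_length

    def step(self):
        # Adder 16: subtract the parallelism by adding its negative.
        result = self.vl_counter + (-self.parallelism)
        # Selector 17: after the start, write the adder output back.
        self.vl_counter = result
        # VL counter check unit 18: end when the adder output is 0 or less.
        return result <= 0


counter = PartialSumEndCounter(parallelism=4)
counter.start(8)
print(counter.step(), counter.step())  # False True
```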

【００４１】In this embodiment, when the output (instruction end signal) of the VL counter check unit 18 rises, it is not output directly as the partial sum end signal. Instead, whether the instruction being executed is a sum instruction is taken into account. An opcode register 19, a decoder 20, and an AND gate 21 are provided so that the output (instruction end signal) of the VL counter check unit 18 is output as the partial sum end signal only when the instruction is a sum instruction.

【００４２】Specifically, the opcode register 19 stores the instruction code (opcode) from the CPU 1, and the decoder 20 decodes the instruction code stored in the opcode register 19 and outputs a signal that rises when the instruction is a sum instruction. The AND gate 21 takes the logical AND of the output (instruction end signal) of the VL counter check unit 18 and the output of the decoder 20, and outputs the result to the CPU 1 as the partial sum end signal. In other words, when the instruction from the CPU 1 is a sum instruction and the output (instruction end signal) of the VL counter check unit 18 rises, that signal is output to the CPU 1 as the partial sum end signal.
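
The gating performed by the decoder 20 and the AND gate 21 amounts to a single logical AND. The sketch below is illustrative only; the opcode value `SUM_OPCODE` is a made-up placeholder, since the description does not give the actual instruction encoding.

```python
SUM_OPCODE = 0x01  # hypothetical encoding of the sum instruction


def partial_sum_end_signal(instruction_end, opcode):
    """Return True only when a sum instruction has just finished."""
    is_sum_instruction = (opcode == SUM_OPCODE)   # decoder 20
    return instruction_end and is_sum_instruction  # AND gate 21


print(partial_sum_end_signal(True, SUM_OPCODE))  # True
print(partial_sum_end_signal(True, 0x02))        # False
```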

【００４３】The operation of the partial sum end signal output unit 7 of this embodiment is now briefly described with reference to FIG. 6. Suppose, for example, that the vector length is VL = 17, the parallelism is 4, and the CPU 1 issues a sum instruction. At the start of the instruction, "17" is written to the VL counter register 15 as the VL value. Each time the arithmetic units 10-1 to 10-4 perform one cycle of partial sum operation processing, the adder 16 subtracts the parallelism 4 from the value stored in the VL counter register 15, so the output of the adder 16 and the value of the VL counter register 15 are updated in turn to "13", "9", "5", "1", "-3", and "-7".

【００４４】When the output of the adder 16 becomes "-3", the output (instruction end signal) of the VL counter check unit 18 rises for only 1τ. Because the instruction from the CPU 1 is a sum instruction, the output of the decoder 20 is also high at this point, so the partial sum end signal from the AND gate 21 rises for 1τ and the CPU 1 is notified that the partial sum operation has finished.
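
The countdown in this example can be reproduced with a short Python sketch. The function name is hypothetical, and the trace stops at the first value that is 0 or less, which is the point at which the instruction end signal rises.

```python
def vl_counter_trace(vector_length, parallelism):
    """List the adder 16 outputs up to and including the first value <= 0."""
    values = []
    value = vector_length
    while value > 0:
        value -= parallelism  # one cycle of partial sum processing
        values.append(value)
    return values


print(vl_counter_trace(17, 4))  # [13, 9, 5, 1, -3]
```

The five entries correspond to the five cycles needed to consume 17 elements four at a time.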

【００４５】Next, the operations of the control unit 3 and the arithmetic processing unit 4 of the CPU 1, and of the control unit 5 and the arithmetic processing unit 6 of the VU 2, in the arithmetic processing device of this embodiment configured as described above are described with reference to FIGS. 7 to 10. First, when the control operation starts in the control unit 3 of the CPU 1, an externally supplied instruction is decoded as shown in FIG. 7 (step A1), and it is determined whether the instruction is a vector instruction (step A2). If it is not a vector instruction, it is judged to be a scalar instruction; the scalar instruction is executed (step A3), and then the program counter is updated (step A4).

【００４６】If, on the other hand, the instruction is determined in step A2 to be a vector instruction, it is determined whether the instruction is a sum instruction (step A5). If it is a sum instruction, an instruction to add the partial sums is placed in the operation queue 9 (step A6), and the sum instruction is issued to the VU (vector unit) 2, which executes it (step A7). If the instruction is determined in step A5 not to be a sum instruction, step A6 is skipped, and the vector instruction is issued to the VU 2, which executes the corresponding vector operation processing (step A7). After the vector instruction has been issued to the VU 2, the program counter is updated (step A4) and the process returns to step A1.
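
Steps A1 to A7 can be summarized as a small dispatch routine. The following Python sketch is an illustration under stated assumptions: the dictionary-based instruction format, the `state` dictionary, and the string `"add_partial_sums"` are all hypothetical stand-ins, and the queues are modeled with `collections.deque`.

```python
from collections import deque


def cpu_control_step(instruction, operation_queue, vu_queue, state):
    """One pass through steps A1-A7 of the CPU control unit."""
    kind = instruction["kind"]                 # step A1: decode
    if kind != "vector":                       # step A2: vector instruction?
        state["scalar_executed"] += 1          # step A3: execute scalar instruction
    else:
        if instruction["op"] == "sum":         # step A5: sum instruction?
            operation_queue.append("add_partial_sums")  # step A6
        vu_queue.append(instruction)           # step A7: issue to the VU
    state["pc"] += 1                           # step A4: update program counter


operation_queue, vu_queue = deque(), deque()
state = {"pc": 0, "scalar_executed": 0}
cpu_control_step({"kind": "vector", "op": "sum"}, operation_queue, vu_queue, state)
print(list(operation_queue), len(vu_queue), state["pc"])  # ['add_partial_sums'] 1 1
```

Note that the control unit does not wait for the VU: it advances the program counter as soon as the instruction has been issued.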

【００４７】The control unit 5 of the VU 2 constantly checks whether there is an unexecuted vector instruction (step C1). If there is none, it waits until a vector instruction is issued from the CPU 1. If it determines in step C1 that there is an unexecuted vector instruction, it decodes the instruction (step C2) and determines whether the vector instruction can be executed (step C3). If the instruction cannot be executed, that is, if the arithmetic units 10-1 to 10-4 and the like are busy with operation processing, it waits until execution becomes possible. Once the vector instruction can be executed, it starts the vector instruction (step C4).
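
As a rough illustration of steps C1 to C4, the Python sketch below models one polling pass of the VU control unit. The function name, the `busy` flag, and the use of a `deque` for pending instructions are assumptions made for the example; returning `None` stands for "keep waiting".

```python
from collections import deque


def vu_control_step(pending, busy):
    """One polling pass: return the instruction to start, or None to keep waiting."""
    if not pending:           # step C1: no unexecuted vector instruction
        return None
    instruction = pending[0]  # step C2: decode the oldest pending instruction
    if busy:                  # step C3: arithmetic units still occupied
        return None
    pending.popleft()
    return instruction        # step C4: start the vector instruction


pending = deque(["vsum"])
print(vu_control_step(pending, busy=True))   # None
print(vu_control_step(pending, busy=False))  # vsum
```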

【００４８】The arithmetic processing unit 6 of the VU 2 constantly checks whether the control unit 5 has started a vector instruction (step D1) and waits until one is started. When a vector instruction is started, the unit executes it (step D2) and determines whether the vector instruction has finished on the basis of the rising edge of the output (instruction end signal) of the VL counter check unit 18 in the partial sum end signal output unit 7 described above with reference to FIG. 5 (step D3).

【００４９】When the output (instruction end signal) of the VL counter check unit 18 rises, the vector instruction is judged to have finished. In this embodiment, it is further determined whether the instruction end signal output from the VL counter check unit 18 is due to a sum instruction (step D4). That is, when a sum instruction has finished, the output of the decoder 20 is high, as described above with reference to FIGS. 5 and 6, and the AND gate 21 passes the instruction end signal of the VL counter check unit 18 to the CPU 1 as the partial sum end signal (step D5).

【００５０】The arithmetic processing unit 4 of the CPU 1 constantly checks whether a partial sum end signal has been sent from the VU 2 (step B1) and waits until one arrives. When the partial sum end signal arrives, the partial sums S1 to S4 obtained by the VU 2 are held in the partial sum registers 8-1 to 8-4, respectively, so the arithmetic processing unit 4 executes the partial sum addition instruction (convergence operation instruction) held in the operation queue 9, as described above with reference to FIG. 4 (step B2). When the convergence operation is finished, the arithmetic processing unit 4 removes (dequeues) the partial sum addition instruction from the operation queue 9 (step B3) and returns to waiting for a partial sum end signal.
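
Steps B1 to B3 can be sketched as follows. This Python fragment is illustrative only: the function name and arguments are hypothetical, the partial sum registers 8-1 to 8-4 are modeled as a plain list, and the extra check for an empty queue is a safety guard added for the sketch rather than something stated in the description.

```python
from collections import deque


def cpu_arithmetic_step(partial_sum_end, partial_sum_registers, operation_queue):
    """Run the queued convergence instruction when the end signal has arrived."""
    if not partial_sum_end or not operation_queue:  # step B1: keep waiting
        return None
    total = 0
    for s in partial_sum_registers:                  # step B2: convergence operation
        total += s
    operation_queue.popleft()                        # step B3: dequeue the instruction
    return total


queue = deque(["add_partial_sums"])
print(cpu_arithmetic_step(False, [1, 2, 3, 4], queue))  # None
print(cpu_arithmetic_step(True, [1, 2, 3, 4], queue))   # 10
```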

【００５１】Thus, according to this embodiment of the present invention, the partial sum operation on the predetermined vector data is executed by the VU 2 as usual, and the convergence operation that adds the partial sums S1 to S4 is then executed by the CPU 1. As a result, subsequent vector operation instructions in the VU 2 are no longer made to wait by the convergence operation, and vector processing can be made much faster. In addition, because the convergence operation is performed by the CPU 1, the VU 2 does not need a separate arithmetic unit dedicated to the convergence operation as in the conventional design, so the amount of hardware does not increase.
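
The division of labor summarized above can be illustrated end to end in Python. This is a sketch under assumptions: both function names are hypothetical, and the interleaved assignment of vector elements to the four arithmetic units (element i goes to unit i mod 4) is chosen for the example, since this passage does not specify the mapping.

```python
def vu_partial_sums(data, parallelism=4):
    """VU side: each arithmetic unit accumulates its own partial sum."""
    lanes = [0] * parallelism
    for i, x in enumerate(data):
        lanes[i % parallelism] += x  # assumed interleaved element-to-unit mapping
    return lanes


def cpu_converge(partials):
    """CPU side: add the partial sums to obtain the total sum."""
    total = partials[0]
    for s in partials[1:]:
        total += s
    return total


data = list(range(1, 18))          # 17 elements, as in the VL = 17 example
partials = vu_partial_sums(data)   # the VU is free again after this point
print(cpu_converge(partials))      # 153
```

Because the final additions run on the CPU side, the VU can accept the next vector instruction as soon as the partial sums have been produced.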

【００５２】Further, according to this embodiment, the partial sum end signal output unit 7 of the VU 2 can reliably determine, on the basis of the vector length VL and the parallelism of the VU 2, whether the partial sum operation processing has been completed for all the vector data. The partial sum end signal can therefore be output accurately, and the CPU 1 can reliably start and execute the convergence operation.

【００５３】Furthermore, according to this embodiment, the CPU 1 places the partial sum addition instruction (convergence operation instruction) in the operation queue 9 at the same time as it issues the vector operation instruction. On the CPU 1 side, the vector operation instruction is therefore handled on a fire-and-forget basis, and the CPU 1 can continue with other processing. In other words, by sharing the vector operation processing between the CPU 1 and the VU 2, the VU 2 can be used efficiently and processing performance can be greatly improved without increasing the amount of hardware.

【００５４】The embodiment above describes the case where the vector operation instruction is a sum instruction, but the present invention is not limited to this. It applies in the same way, and provides the same effects as the embodiment above, to any vector operation instruction whose result does not change when the order of the operations is changed (for example, a total product instruction).
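
The same split can be written generically for any operation whose result is independent of evaluation order. The Python sketch below is illustrative; `split_reduce` is a hypothetical name, and the interleaved lane assignment is again an assumption.

```python
from functools import reduce
import operator


def split_reduce(data, op, identity, parallelism=4):
    """Partial operations per lane (VU side), then convergence (CPU side)."""
    lanes = [identity] * parallelism
    for i, x in enumerate(data):
        lane = i % parallelism          # assumed interleaved mapping
        lanes[lane] = op(lanes[lane], x)
    return reduce(op, lanes, identity)  # convergence over the partial results


print(split_reduce([1, 2, 3, 4, 5, 6], operator.mul, 1))  # 720
print(split_reduce([1, 2, 3, 4, 5, 6], operator.add, 0))  # 21
```

For integers the result is exact regardless of order; with floating-point data, reordering can change rounding slightly, so results may differ in the last bits from a strictly sequential evaluation.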

【００５５】[0055]

[Effects of the Invention] As described in detail above, according to the arithmetic processing method and the arithmetic processing device of the present invention, the partial operation (partial sum operation or partial product operation) on the predetermined vector data is executed by the vector processing unit as usual, and the convergence operation on the partial operation results (addition of the partial sums or multiplication of the partial products) is then executed by the central processing unit. As a result, subsequent vector operation instructions in the vector processing unit are no longer made to wait by the convergence operation, and vector processing can be made much faster. In addition, because the convergence operation is performed by the central processing unit, the vector processing unit does not need a separate arithmetic unit dedicated to the convergence operation, so the amount of hardware does not increase (claims 1 to 3 and 5 to 7).

【００５６】Further, when the vector processing unit has completed all the partial operations and the partial operation results have been stored in the register, the vector processing unit outputs a partial operation end signal. In response to that signal, the central processing unit can reliably execute the convergence operation on the partial operation results stored in the register (claims 4 and 8). Furthermore, according to the arithmetic processing device of the present invention, the partial operation end signal output unit of the vector processing unit can reliably determine, on the basis of the vector length of the vector data and the parallelism of the vector processing unit, whether the partial operation processing has been completed for all the vector data. The partial operation end signal can therefore be output accurately, and the central processing unit can reliably start the convergence operation (claim 9).

【００５７】In addition, the central processing unit places an instruction to execute the convergence operation in the operation queue at the same time as it issues the vector operation instruction. On the central processing unit side, the vector operation instruction is therefore handled on a fire-and-forget basis, and the central processing unit can continue with other processing. That is, by sharing the vector operation processing between the central processing unit and the vector processing unit, the vector processing unit can be used efficiently and processing performance can be greatly improved without increasing the amount of hardware (claim 10).

[Brief description of drawings]

FIG. 1 is a block diagram illustrating the principle of the present invention.

FIG. 2 is a block diagram showing the overall configuration of an arithmetic processing device as one embodiment of the present invention.

FIG. 3 is a block diagram showing the main configuration of the vector processing unit of the embodiment.

FIG. 4 is a block diagram showing the main configuration of the central processing unit of the embodiment.

FIG. 5 is a block diagram showing the configuration of the partial sum end signal output unit in the vector processing unit of the embodiment.

FIG. 6 is a time chart for explaining the operation of the partial sum end signal output unit in the vector processing unit of the embodiment.

FIG. 7 is a flowchart for explaining the operation of the control unit in the central processing unit of the embodiment.

FIG. 8 is a flowchart for explaining the operation of the arithmetic processing unit in the central processing unit of the embodiment.

FIG. 9 is a flowchart for explaining the operation of the control unit in the vector processing unit of the embodiment.

FIG. 10 is a flowchart for explaining the operation of the arithmetic processing unit in the vector processing unit of the embodiment.

FIG. 11 is a block diagram for explaining a conventional vector processing unit and its sum operation processing.

FIG. 12 is a block diagram for explaining another example of a conventional vector processing unit and its sum operation processing.

[Explanation of symbols]

1: central processing unit (CPU)
2: vector processing unit (VU)
3, 5: control units
4, 6: arithmetic processing units
7: partial sum end signal output unit (partial operation end signal output unit)
8: partial sum register
9: operation queue
10-1 to 10-4, 12: arithmetic units
11-1 to 11-4, 13: registers
14A, 14B, 17: selectors
15: VL counter register (vector length register)
16: adder (ADD, subtraction unit)
18: VL counter check unit (comparison unit)
19: opcode register
20: decoder
21: AND gate

Claims

[Claims]

1. An arithmetic processing method for an arithmetic processing device comprising a central processing unit and a vector processing unit capable of receiving a vector operation instruction from the central processing unit and executing a plurality of operations simultaneously, wherein, when arithmetic processing is performed on predetermined vector data, a partial operation on the predetermined vector data is executed by the vector processing unit, and a convergence operation on the partial operation results from the vector processing unit is then executed by the central processing unit.

2. The arithmetic processing method according to claim 1, wherein, when the arithmetic processing on the predetermined vector data is a sum operation, a partial operation that obtains partial sums of the predetermined vector data is executed by the vector processing unit, and a convergence operation that adds the partial sums from the vector processing unit is then executed by the central processing unit.

3. The arithmetic processing method according to claim 1, wherein, when the arithmetic processing on the predetermined vector data is a total product operation, a partial operation that obtains partial products of the predetermined vector data is executed by the vector processing unit, and a convergence operation that multiplies the partial products from the vector processing unit is then executed by the central processing unit.

4. The arithmetic processing method according to any one of claims 1 to 3, wherein, when all the partial operations by the vector processing unit are completed, a partial operation end signal is output from the vector processing unit to the central processing unit, and the central processing unit executes the convergence operation on receiving the partial operation end signal.

5. An arithmetic processing device comprising a central processing unit and a vector processing unit capable of receiving a vector operation instruction from the central processing unit and executing a plurality of operations simultaneously, wherein, when arithmetic processing is performed on predetermined vector data, the vector processing unit executes a partial operation on the predetermined vector data, and the central processing unit executes a convergence operation on the partial operation results from the vector processing unit.

6. The arithmetic processing device according to claim 5, wherein, when the arithmetic processing on the predetermined vector data is a sum operation, the vector processing unit executes a partial operation that obtains partial sums of the predetermined vector data, and the central processing unit executes a convergence operation that adds the partial sums from the vector processing unit.

7. The arithmetic processing device according to claim 5, wherein, when the arithmetic processing on the predetermined vector data is a total product operation, the vector processing unit executes a partial operation that obtains partial products of the predetermined vector data, and the central processing unit executes a convergence operation that multiplies the partial products from the vector processing unit.

8. The arithmetic processing device according to any one of claims 5 to 7, further comprising a register that stores the partial operation results from the vector processing unit, wherein the vector processing unit is provided with a partial operation end signal output unit that outputs a partial operation end signal to the central processing unit when all the partial operations by the vector processing unit are completed, and the central processing unit, on receiving the partial operation end signal from the partial operation end signal output unit, executes the convergence operation on the partial operation results stored in the register.

9. The arithmetic processing device according to claim 8, wherein the partial operation end signal output unit comprises: a vector length register that stores the vector length of the vector data sent from the central processing unit together with the vector data and the vector operation instruction; a subtraction unit that, each time one cycle of partial operation processing is performed, subtracts the parallelism of the vector processing unit from the value stored in the vector length register and stores the result back in the vector length register; and a comparison unit that compares the subtraction result from the subtraction unit with 0 and outputs the partial operation end signal when the subtraction result becomes 0 or less.

10. The arithmetic processing device according to claim 8 or 9, further comprising an operation queue that stores operation instructions from the central processing unit, wherein the central processing unit, at the same time as it outputs the vector operation instruction to the vector processing unit, stores in the operation queue an instruction to execute the convergence operation on the partial operation results stored in the register, and, on receiving the partial operation end signal from the partial operation end signal output unit, executes the convergence operation on the partial operation results stored in the register in accordance with the instruction stored in the operation queue.