JP3005456B2

JP3005456B2 - Vector processing equipment

Info

Publication number: JP3005456B2
Application number: JP7150213A
Authority: JP
Inventors: 康宏井川
Original assignee: 甲府日本電気株式会社
Priority date: 1995-06-16
Filing date: 1995-06-16
Publication date: 2000-01-31
Anticipated expiration: 2015-01-31
Also published as: JPH096759A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application]
The present invention relates to a vector processing device.

[0002]

[Prior Art]
A vector processing device must move large amounts of data at high speed between main storage and its registers or arithmetic units. To achieve this, the vector operation unit typically issues several data items at the same timing and supplies them continuously to the memory access control unit.

[0003]

FIG. 2 shows a conventional vector processing device of this type. It has the following components:
four input ports, one per vector element request, fed from a vector operation unit (not shown);
input registers 21a to 21d, one per input port;
buffers 22a to 22d, which operate in lockstep and hold requests while a port conflict is being resolved;
buffer read registers 23a to 23d;
an arbiter 26, which detects port conflicts and controls the buffers and the crossbar;
a crossbar 24, which selects the input element for each output port according to control signals from the arbiter 26; and
output registers 25a to 25d, one per output port.

[0004]

In this conventional device, main storage is interleaved with 8 bytes per port, so the minimum access unit is 8 bytes. As a result, 4-byte memory access instructions are also executed in 8-byte units.

[0005]

FIG. 3(A) is used to explain both the conventional example and the present invention. For each vector element (hereinafter "element") of a vector load/store instruction, it shows the element's address and the output port to which it is sent. In FIG. 3(A), the base address of the vector instruction is "0" and the distance is "4". Under these conditions, each pair of consecutive elements goes to the same output port: elements 0 and 1 share a port, elements 2 and 3 share a port, and so on.
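The port assignment of FIG. 3(A) can be reproduced with a short sketch. The text only states that memory is interleaved in 8-byte units, so the sketch makes an assumption: consecutive 8-byte words map to consecutive output ports and wrap around after four ports. The function names and parameters are hypothetical.

```python
def element_address(element, base=0, distance=4):
    """Byte address of a vector element: base + element * distance."""
    return base + element * distance


def output_port(element, base=0, distance=4, port_bytes=8, ports=4):
    """Output port of an element, assuming 8-byte words are interleaved
    round-robin over `ports` ports (an assumption, not stated in the text)."""
    return (element_address(element, base, distance) // port_bytes) % ports


# FIG. 3(A): base 0, distance 4, so each pair of elements shares a port
print([output_port(e, distance=4) for e in range(8)])   # [0, 0, 1, 1, 2, 2, 3, 3]
# Distance 12 as in FIG. 3(B), under the same assumed mapping
print([output_port(e, distance=12) for e in range(8)])  # [0, 1, 3, 0, 2, 3, 1, 2]
```

With a 4-byte distance, two elements fall into each 8-byte word. This is why elements 0 and 1 (and 2 and 3) compete for the same port.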

[0006]

Next, consider how the conventional device (FIG. 2) executes the vector instruction of FIG. 3(A). Elements are loaded into the input registers 21a to 21d four at a time, continuously, starting from element 0. At a given timing, the arbiter 26 checks elements 0 to 3 in the read registers 23a to 23d for port conflicts. Here, elements 0 and 1 are both directed to output port 0, and elements 2 and 3 are both directed to output port 1, so port conflicts occur.

When a port conflict occurs, the arbiter 26 selects the conflicting element with the highest priority, namely the one with the smallest element number. The output-port-0 selector of the crossbar 24 selects element 0 from read register 23a. The output-port-1 selector selects element 2 from read register 23c. These elements are stored in output registers 25a and 25b, respectively.

[0007]

Because a conflict has occurred, the arbiter 26 also issues a hold request. The read registers 23a to 23d hold their contents, the buffers 22a to 22d hold their read address, and this state persists until the hold request is released. Meanwhile, elements that continue to arrive at the input registers 21a to 21d are buffered in the buffers 22a to 22d. Requests that arrived at the same timing are stored in the same word of each buffer. The hold preserves the order of memory accesses to the same address, both between elements within one instruction and between instructions.

[0008]

Elements 1 and 3 lost the arbitration, so they are checked again at the next timing. This time no port conflict occurs. The hold request from the arbiter 26 is released, and at the following timing the four elements 4 to 7 are loaded into the read registers 23a to 23d. The same process repeats until all elements have been handled.

[0009]

FIG. 5(A) is a timing chart of this operation. It shows the timing at which each element arrives at the output registers 25a to 25d, with the arrival of elements 0 and 2 taken as timing 1. For each timing, it also shows which elements are checked for port conflicts and how many elements reach the output registers. For the vector instruction of FIG. 3(A), port conflicts allow only two elements out per timing although four are input. The throughput is therefore at most 1/2.
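The single-arbiter behavior of paragraphs [0006] to [0009] can be checked with a small timing model. The model makes several assumptions:
the 8-byte round-robin port mapping used for FIG. 3(A);
per output port, the lowest-numbered pending element wins;
losers stay in the read registers under a hold; and
arbitration at timing t reaches the output registers at timing t + 1.

All names are hypothetical.

```python
def output_port(element, base=0, distance=4, port_bytes=8, ports=4):
    # Assumed interleave: 8-byte words go round-robin over the output ports.
    return ((base + element * distance) // port_bytes) % ports


def arbitrate(pending, ports_of):
    """Per output port, the pending element with the smallest number wins.

    Returns (winners, losers): winners maps output port -> element.
    """
    winners, losers = {}, []
    for e in sorted(pending):
        port = ports_of[e]
        if port in winners:
            losers.append(e)
        else:
            winners[port] = e
    return winners, losers


def conventional_schedule(num_elements, distance, n=4):
    """Arrival timing at the output registers with the single arbiter of FIG. 2.

    The read registers hold n elements; on a conflict they are held until every
    element of the group has been sent. Arbitration at timing t is counted as
    arriving at timing t + 1, so elements 0 and 2 arrive at timing 1.
    """
    ports_of = {e: output_port(e, distance=distance) for e in range(num_elements)}
    arrival, t = {}, 0
    for start in range(0, num_elements, n):
        pending = list(range(start, min(start + n, num_elements)))
        while pending:                      # hold until the whole group is sent
            winners, pending = arbitrate(pending, ports_of)
            for e in winners.values():
                arrival[e] = t + 1
            t += 1
    return arrival


arrival = conventional_schedule(24, distance=4)
print(max(arrival.values()))  # 12: two elements per timing, i.e. throughput 1/2
```

Every group of four needs two timings, so 24 elements take 12 timings in this model.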

[0010]

FIG. 3(B) is also used to explain both the conventional example and the present invention. It corresponds to FIG. 3(A) with only the distance changed to 12. For each element of a vector load/store instruction, it shows the element's address and the output port to which it is sent.

[0011]

FIG. 5(B) is the timing chart of the conventional example of FIG. 2 for the case of FIG. 3(B).

【００１２】[0012]

[Problem to Be Solved by the Invention]
The conventional device described above has a serious drawback for 4-byte vector load and store instructions, that is, the case where the distance is an odd multiple of 4 bytes. There, throughput drops to 1/2 of the maximum, which severely degrades the main-memory access time of these instructions.

【００１３】[0013]

[Means for Solving the Problem]
The vector processing device of the present invention comprises three main parts. First, a plurality of vector operation units perform vector operations on each vector element. Second, a plurality of main storage units are each built from memory modules that have a plurality of banks and a plurality of independently accessible ports. Third, a memory access control unit can perform multiple data transfers between the vector operation units and the main storage units independently, in units of the byte width of each main storage port.

The memory access control unit contains the following elements:
n (n ≥ 2) input registers, which hold the vector element requests arriving in pipelined fashion from the vector operation units, one element per register, in element-number order;
n input buffers, which store the contents of the corresponding input registers;
n read registers, which store the per-element requests read out of the input buffers;
an even arbiter and an odd arbiter, which arbitrate conflicts among the n vector element requests held in the read registers;
an even crossbar and an odd crossbar, which send each request arbitrated by the even and odd arbiters, respectively, to the output port designated by its address; and
one output buffer per output port, each able to store up to two requests at once (one from each crossbar), with the even-crossbar request given priority so that it is output first.

【００１４】[0014]

[Embodiments]
Embodiments of the present invention are described in detail below with reference to the drawings.

[0015]

FIG. 1 is a block diagram of an embodiment of the present invention. This vector processing device can transfer data in parallel in units of 8 bytes.

[0016]

In FIG. 1, the four input registers 11a to 11d receive vector load/store requests from a vector operation unit (not shown). Requests are issued per element, several at a time (four elements in this example), and are transferred to input ports 0 to 3 in element-number order.

[0017]

The four buffers 12a to 12d absorb the requests that the vector operation unit keeps issuing while earlier requests wait because of an output port conflict. All four buffers share a common read address and a common write address and operate identically.

[0018]

The four read registers 13a to 13d hold the data read out of the buffers 12a to 12d. Output port conflict detection is performed on their contents.

[0019]

The even port arbiter 16a is an output port conflict arbitration circuit that handles only elements on even-numbered input ports (ports 0, 2, ..., that is, read registers 13a and 13c). The odd port arbiter 16b handles only elements on odd-numbered input ports (ports 1, 3, ..., that is, read registers 13b and 13d). The two arbiters operate independently. Suppose, for example, that the requests on input port 0 and input port 1 are directed to the same output port. Because they are handled by different arbiters, neither arbiter sees a conflict, no output port conflict is detected, and both requests can be output.
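The effect of splitting arbitration by input-port parity can be seen on elements 0 to 3 of FIG. 3(A). The sketch below assumes two things: element e sits on input port e mod n, since elements arrive in order on ports 0 to 3, and the lowest-numbered element wins each output port within one arbiter. The function names are hypothetical.

```python
def arbitrate(pending, ports_of):
    """Per output port, the pending element with the smallest number wins.

    Returns (winners, losers): winners maps output port -> element.
    """
    winners, losers = {}, []
    for e in sorted(pending):
        if ports_of[e] in winners:
            losers.append(e)
        else:
            winners[ports_of[e]] = e
    return winners, losers


def split_arbitrate(group, ports_of, n=4):
    """Even-port and odd-port arbiters, each blind to the other's requests."""
    even = [e for e in group if e % n % 2 == 0]   # read registers 13a, 13c
    odd = [e for e in group if e % n % 2 == 1]    # read registers 13b, 13d
    return arbitrate(even, ports_of), arbitrate(odd, ports_of)


# Output ports of elements 0 to 3 in FIG. 3(A)
ports_of = {0: 0, 1: 0, 2: 1, 3: 1}
print(arbitrate([0, 1, 2, 3], ports_of))        # ({0: 0, 1: 2}, [1, 3])
print(split_arbitrate([0, 1, 2, 3], ports_of))  # (({0: 0, 1: 2}, []), ({0: 1, 1: 3}, []))
```

A single arbiter leaves elements 1 and 3 as losers. The split arbiters send all four elements in the same timing, two per output port.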

[0020]

If either the even port arbiter 16a or the odd port arbiter 16b detects an output port conflict, two things are held until all output port conflicts are resolved: the read registers 13a to 13d and the read address of the buffers 12a to 12d. This keeps subsequent requests out of arbitration. It preserves the order of accesses to the same address, both between elements within an instruction and between elements of different instructions.
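The hold rule of paragraph [0020] can be sketched as one arbitration step: the read registers advance only when neither arbiter has a losing element. The example uses a hypothetical 16-byte distance, which is not one of the patent's figures. Under the assumed 8-byte round-robin mapping, elements 0 and 2 both go to port 0, and elements 1 and 3 both go to port 2, so each arbiter sees its own conflict. Names are hypothetical.

```python
def arbitrate(pending, ports_of):
    """Per output port, the pending element with the smallest number wins."""
    winners, losers = {}, []
    for e in sorted(pending):
        if ports_of[e] in winners:
            losers.append(e)
        else:
            winners[ports_of[e]] = e
    return winners, losers


def split_step(pending, ports_of, n=4):
    """One timing of split arbitration.

    Returns (sent, still_pending, hold); hold is True when either arbiter
    found a conflict, so the read registers and buffer read address stay put.
    """
    even_win, even_lose = arbitrate([e for e in pending if e % n % 2 == 0], ports_of)
    odd_win, odd_lose = arbitrate([e for e in pending if e % n % 2 == 1], ports_of)
    sent = sorted(list(even_win.values()) + list(odd_win.values()))
    remaining = sorted(even_lose + odd_lose)
    return sent, remaining, bool(remaining)


# Hypothetical 16-byte distance: elements 0, 2 -> port 0 and 1, 3 -> port 2
ports_of = {0: 0, 1: 2, 2: 0, 3: 2}
sent, pending, hold = split_step([0, 1, 2, 3], ports_of)
print(sent, pending, hold)  # [0, 1] [2, 3] True
sent, pending, hold = split_step(pending, ports_of)
print(sent, pending, hold)  # [2, 3] [] False
```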

[0021]

The even port crossbar 14a is controlled by the even port arbiter 16a. Based on the arbitration result, it transfers only vector element requests from the even input ports to the output ports (four ports in this example). Likewise, the odd port crossbar 14b is controlled by the odd port arbiter 16b and transfers only vector element requests from the odd input ports to the output ports.

[0022]

The output buffers 15a to 15d are provided one per output port. Each output buffer can store up to two vector element requests at once: one from the even port crossbar 14a and one from the odd port crossbar 14b. The request from the even port crossbar is always given priority, so that it is output first.
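One output buffer from paragraph [0022] can be modeled as a two-entry FIFO that always enqueues the even-crossbar request ahead of the odd-crossbar request and emits one request per timing. The class and method names are hypothetical. Raising an error when the buffer is full is an assumption, because the text does not describe what happens on overflow.

```python
from collections import deque


class OutputBuffer:
    """Per-output-port buffer holding at most two vector element requests."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.queue = deque()

    def accept(self, even_req=None, odd_req=None):
        """Enqueue requests from the even and odd crossbars; even goes first."""
        incoming = [r for r in (even_req, odd_req) if r is not None]
        if len(self.queue) + len(incoming) > self.capacity:
            raise OverflowError("output buffer full")  # overflow handling assumed
        self.queue.extend(incoming)

    def emit(self):
        """Send one request to the output port per timing (None if empty)."""
        return self.queue.popleft() if self.queue else None


buf = OutputBuffer()
buf.accept(even_req=0, odd_req=1)  # element 0 (even crossbar), element 1 (odd crossbar)
print(buf.emit(), buf.emit(), buf.emit())  # 0 1 None
```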

[0023]

FIG. 3 is used again to explain the operation of this embodiment.

[0024]

FIG. 4(A) shows how the present invention outputs vector element requests. It assumes the relationship of FIG. 3(A) between each element's address in a vector load/store instruction and the output port to which the element is sent. The operation is described in detail below with reference to FIG. 1.

[0025]

Elements 0 to 3 enter through input ports 0 to 3 and are stored in the input registers 11a to 11d in element-number order. At the next timing, elements 4 to 7 arrive at the input ports. From then on, four elements arrive continuously at every timing in pipelined fashion and are stored in turn in the input registers 11a to 11d.

[0026]

The first four elements, 0 to 3, pass through the buffers 12a to 12d and are stored in the read registers 13a to 13d in element-number order. Before this point, the read registers 13a to 13d held no valid elements. The even port arbiter 16a and the odd port arbiter 16b now each check for output port conflicts.

The even port arbiter 16a checks the output ports of elements 0 and 2, which sit in the even ports of the read registers (ports 0 and 2, that is, 13a and 13c). According to FIG. 3(A), element 0 goes to output port 0 and element 2 goes to output port 1, so there is no output port conflict. The arbiter therefore directs the crossbar 14a to send element 0 to output port 0 and element 2 to output port 1. Meanwhile, the odd port arbiter 16b checks the output ports of elements 1 and 3, which sit in the odd ports of the read registers (ports 1 and 3, that is, 13b and 13d).

[0027]

According to FIG. 3(A), element 1 goes to output port 0 and element 3 goes to output port 1, so again there is no output port conflict. The arbiter directs the crossbar 14b to send element 1 to output port 0 and element 3 to output port 1. Since neither arbiter detected an output port conflict, no hold occurs, and the next elements 4 to 7 are loaded into the read registers 13a to 13d at the next timing.

[0028]

The output buffer 15a, which serves output port 0, receives element 0 from the even crossbar 14a and element 1 from the odd crossbar 14b. It queues element 0 so that it is output first and element 1 so that it is output next. Similarly, the output buffer 15b, which serves output port 1, receives element 2 from the even crossbar 14a and element 3 from the odd crossbar 14b. It queues element 2 first and element 3 after it. At the next timing, element 0 leaves output port 0 and element 2 leaves output port 1. Elements 1 and 3 are output at the timing after that.

[0029]

Elements 4 to 7 are then arbitrated by the even and odd port arbiters in the same way, and again no output port conflict is detected. As the output ports in FIG. 3(A) show, the same holds for every remaining element: four elements are loaded into the read registers 13a to 13d and processed at every timing. This is shown in FIG. 3(C).

Take the timing at which elements 0 and 2 are output as timing 1. At timing 2, elements 1 and 3, which remained in output buffers 15a and 15b, are output together with the next elements 4 and 6. After the first two elements, four elements are output at every timing, and all 24 elements have been output by timing 7. This is a substantial throughput improvement over the conventional configuration. The longer the vector, the closer throughput approaches the maximum of four elements per timing.
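The walkthrough in paragraphs [0025] to [0029] can be reproduced with a combined timing model. The model rests on these assumptions:
the 8-byte round-robin port mapping;
lowest-numbered-element-wins arbitration inside each of the even and odd arbiters;
a hold whenever either arbiter has a loser;
even-crossbar requests queued first in each output buffer; and
each output port emitting one element per timing, starting the timing after arbitration.

The sketch does not model back-pressure from a full output buffer. Instead, it reports the largest occupancy it observed. All names are hypothetical.

```python
from collections import deque


def output_port(element, base=0, distance=4, port_bytes=8, ports=4):
    # Assumed interleave: 8-byte words go round-robin over the output ports.
    return ((base + element * distance) // port_bytes) % ports


def arbitrate(pending, ports_of):
    """Per output port, the pending element with the smallest number wins."""
    winners, losers = {}, []
    for e in sorted(pending):
        if ports_of[e] in winners:
            losers.append(e)
        else:
            winners[ports_of[e]] = e
    return winners, losers


def split_schedule(num_elements, distance, n=4, ports=4):
    """Output timing of each element with split even/odd arbitration.

    Returns (out_time, max_fill): out_time maps element -> timing at which it
    leaves its output buffer; max_fill is the largest buffer occupancy seen.
    """
    ports_of = {e: output_port(e, distance=distance, ports=ports)
                for e in range(num_elements)}
    buffers = [deque() for _ in range(ports)]
    out_time, max_fill, t = {}, 0, 0
    start = 0
    pending = list(range(0, min(n, num_elements)))
    while len(out_time) < num_elements:
        # Each output port emits one element queued in an earlier timing.
        for buf in buffers:
            if buf:
                out_time[buf.popleft()] = t
        if pending:
            even_win, even_lose = arbitrate([e for e in pending if e % n % 2 == 0], ports_of)
            odd_win, odd_lose = arbitrate([e for e in pending if e % n % 2 == 1], ports_of)
            for p in range(ports):          # even-crossbar request is queued first
                if p in even_win:
                    buffers[p].append(even_win[p])
                if p in odd_win:
                    buffers[p].append(odd_win[p])
            max_fill = max(max_fill, max(len(b) for b in buffers))
            pending = even_lose + odd_lose
            if not pending:                 # no hold: load the next n elements
                start += n
                pending = list(range(start, min(start + n, num_elements)))
        t += 1
    return out_time, max_fill


out_time, max_fill = split_schedule(24, distance=4)
print(max(out_time.values()), max_fill)  # 7 2
```

In this model, elements 0 and 2 leave at timing 1, and elements 1, 3, 4 and 6 leave at timing 2. The last of the 24 elements leaves at timing 7, and no output buffer ever holds more than two entries.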

[0030]

FIG. 4(B) is the timing chart of this embodiment for the case of FIG. 3(B). Comparing it with FIG. 5(B) shows that the present invention substantially improves throughput over the conventional configuration. In short, the present invention improves the throughput of vector memory access instructions.

The embodiment uses four input ports and four output ports, but the same approach clearly applies to any number of ports. Likewise, the distance is 4 bytes (4 bytes × 1) in FIG. 3(A) and 12 bytes (4 bytes × 3) in FIG. 3(B). Throughput improves over the prior art in the same way for any 4-byte memory access instruction whose distance is 4 bytes × an odd number.
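A short argument, under the 8-byte round-robin mapping assumed in the earlier sketches, suggests why every distance of 4 bytes × an odd number behaves like FIGS. 3(A) and 3(B). With four input ports, the two even-port elements of a group are two elements apart, so their addresses differ by 2 × distance = 8 × odd bytes. Their 8-byte word indices therefore differ by an odd number, which is never a multiple of 4, so the two elements never share an output port. The same reasoning applies to the two odd-port elements. The check below is hypothetical and tests this property over many groups. It does not prove anything about the patent's actual address mapping.

```python
def output_port(element, base=0, distance=4, port_bytes=8, ports=4):
    # Assumed interleave: 8-byte words go round-robin over the output ports.
    return ((base + element * distance) // port_bytes) % ports


def arbiters_conflict_free(distance, base=0, n=4, groups=64):
    """True if, in every group of n elements, the even-port requests all go to
    different output ports and so do the odd-port requests."""
    for g in range(groups):
        group = range(g * n, (g + 1) * n)
        for parity in (0, 1):
            ports = [output_port(e, base, distance) for e in group if e % n % 2 == parity]
            if len(ports) != len(set(ports)):
                return False
    return True


print(all(arbiters_conflict_free(4 * k) for k in range(1, 40, 2)))  # True
print(arbiters_conflict_free(16))  # False: elements 0 and 2 both map to port 0
```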

【００３１】[0031]

[Effects of the Invention]
As described above, the vector processing device of the present invention splits conflict arbitration between two circuits: one that arbitrates elements on even input ports and one that arbitrates elements on odd input ports. This reduces output port conflicts in vector load and store instructions and limits the throughput loss caused by conflict arbitration, so overall throughput improves. For example, throughput roughly doubles for 4-byte vector memory access instructions.

[Brief description of the drawings]

FIG. 1 is a block diagram of a vector processing device according to an embodiment of the present invention.

FIG. 2 is a block diagram of an example of a conventional vector processing device.

FIG. 3 is a timing chart for explaining the operation of the present invention.

FIG. 4 is a timing chart for explaining the operation of the embodiment of the present invention.

FIG. 5 is a timing chart for explaining the operation of the conventional example.

[Explanation of symbols]

11a to 11d: input registers
12a to 12d: input buffers
13a to 13d: read registers
14a: even port crossbar
14b: odd port crossbar
15a to 15d: output buffers
16a: even port arbiter
16b: odd port arbiter
21a to 21d: input registers
22a to 22d: input buffers
23a to 23d: read registers
24: crossbar
25a to 25d: output registers
26: arbiter

Continuation of the front page: (58) Fields searched (Int.Cl.⁷, DB name): G06F 17/16, G06F 12/06

Claims

(57) [Claims]

1. A vector processing device comprising: a plurality of vector operation units that perform a vector operation for each vector element; a plurality of main storage units composed of memory modules that have a plurality of banks and a plurality of independently accessible ports; and a memory access control unit that can perform a plurality of data transfers between the vector operation units and the main storage units independently for each port of the main storage units; wherein the memory access control unit includes:
n (n ≥ 2) input registers that store vector element requests in element-number order;
n input buffers that respectively store the vector element requests from the n input registers;
n read registers that respectively store the vector element requests from the n input buffers;
an even port arbiter that performs conflict arbitration only on the elements whose input port is even among the n vector element requests held in the read registers;
an odd port arbiter that performs conflict arbitration only on the elements whose input port is odd among the n vector element requests stored in the read registers;
an even port crossbar that transfers each vector element request arbitrated by the even arbiter to the output port designated by its address;
an odd port crossbar that transfers each vector element request arbitrated by the odd arbiter to the output port designated by its address; and
an output buffer that stores up to two vector element requests transferred from the even port crossbar and the odd port crossbar such that the vector element request transferred from the even port crossbar is output first.