JPH07282023A

JPH07282023A - Data transfer amount variable processor and system using the same

Info

Publication number: JPH07282023A
Application number: JP6068293A
Authority: JP
Inventors: Shigeya Tanaka; 成弥田中; Takashi Hotta; 多加志堀田; Akihiro Katsura; 晃洋桂; Michio Morioka; 道雄森岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1994-04-06
Filing date: 1994-04-06
Publication date: 1995-10-27

Abstract

PURPOSE: To reduce the utilization of the communication bus and improve the processing capability of a multiprocessor by providing the processor with a memory access control unit that uses different data transfer amounts on the memory bus and on the transfer bus. CONSTITUTION: The processor 100 comprises a memory space range designation judgment unit 101, a memory access control 102, a memory bus control 103, a transfer bus control 104, a prefetch unit 105, an instruction cache unit 106, a data cache unit 107, and an arithmetic unit 108. The processor judges whether the data to be accessed resides in the distributed memory corresponding to the processor 100 or in another memory, and changes the data transfer amount of the access based on that judgment; specifically, the amount transferred over the transfer bus 111 is made smaller than the amount transferred over the memory bus 110. Each use of the transfer bus 111 therefore takes less time, the utilization of the transfer bus 111 falls, and high-speed processing can be achieved.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a data processing device, and more particularly to a processor with a variable data transfer amount for use with shared distributed memory.

【０００２】[0002]

[Prior Art] Conventional multiprocessor systems having a shared distributed memory, as described in JP-A-56-155465, JP-A-1-134656, JP-A-2-244253, and JP-A-4-326453, connect a plurality of processors and shared memory devices by a communication (transfer) bus so that all or part of the physical address space can be accessed from each processor.

[0003] In these known techniques, a processor, or an instruction processing device consisting of a processor and a distributed memory, determines whether the data to be accessed lies in the memory space of its own distributed memory or in the memory space of another distributed memory. If the address is within its own distributed memory space, it issues the access request to its own distributed memory device; otherwise, it issues the access request to another distributed memory device through the communication bus.

【０００４】[0004]

[Problems to be Solved by the Invention] The multiprocessor systems with shared distributed memory shown in the prior art above can provide a multiprocessor system that issues fewer communication bus requests than one that accesses a single memory in which data is not distributed.

[0005] However, the prior art says nothing about the amount of data or instructions (hereinafter, instructions are also treated as data) transferred per communication bus request.

[0006] Furthermore, processors with built-in cache memory are becoming common. For such a processor or system, the prior art also does not describe how the transfer amount relates to accesses made on a cache miss, or to advance prefetching, which issues memory accesses ahead of time for data that is likely to be used.

[0007] A first object of the present invention is, in a distributed shared memory multiprocessor system, to shorten the time the communication bus is occupied by reducing the number of bus requests and optimizing the transfer amount per request, thereby lowering the utilization of the communication bus and improving the processing capability of the multiprocessor as a whole.

[0008] A second object of the present invention is to provide a distributed shared memory multiprocessor system that performs advance prefetching as much as possible without making the communication bus a bottleneck, thereby improving the processing capability of the multiprocessor as a whole.

【０００９】[0009]

[Means for Solving the Problems] To achieve the first object, the invention provides a processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner, and by accessing at least one other distributed memory via a transfer bus. The processor is characterized by having a memory access control unit that performs accesses with different data transfer amounts on the memory bus and on the transfer bus.

[0010] To achieve the second object, the invention provides a processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner, and by accessing at least one other distributed memory via a transfer bus. The processor has an arithmetic unit that processes data, a cache memory unit that holds part of the data held in the distributed memory, and a prefetch unit that writes data into the cache memory unit according to the processing of the arithmetic unit. If the data written into the cache memory unit by the prefetch unit is not the data accessed by the arithmetic unit, then, in order to write the next data into the cache memory unit according to the next processing of the arithmetic unit, the prefetch unit determines whether that next data is held in the distributed memory or in the other distributed memory. The processor is characterized by having a memory access control unit that performs accesses with different data transfer amounts on the memory bus and on the transfer bus.

【００１１】[0011]

[Operation] According to the first feature, the processor determines whether the data to be accessed is in the distributed memory corresponding to the processor or in another distributed memory, and changes the data transfer amount of the access based on that determination; that is, it makes the amount transferred over the transfer bus smaller than the amount transferred over the memory bus. This shortens each use of the transfer bus, which lowers the utilization of the transfer bus and achieves high-speed processing for the processor or the system as a whole.

[0012] For the second object, the processor has a cache memory that holds a copy of main memory data and an advance prefetch unit that, according to the processing of the processor, issues memory accesses ahead of time for data that is likely to be used. When the requested prefetch data is not present, the processor determines whether the requested address is within the memory space of the distributed memory corresponding to the processor, and performs the access with a data transfer amount suited to the memory space to be accessed; in general, the amount transferred over the transfer bus is made smaller than the amount transferred over the memory bus. This shortens each use of the transfer bus and so works to lower the utilization of the transfer bus.

[0013] In addition, advance prefetch memory accesses to memory devices outside the instruction processing means are not performed. This reduces the number of accesses to the transfer bus and so works to lower the utilization of the transfer bus.
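The prefetch behavior in paragraphs [0012] and [0013] can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `plan_prefetch`, the `allow_remote` switch, and the bus labels are hypothetical, and the 32-byte and 8-byte sizes are the block and sub-block sizes the embodiment later gives for instruction fetches; applying them to prefetches is an assumption.

```python
BLOCK_BYTES = 32      # block transfer size (assumed to apply to local prefetches)
SUB_BLOCK_BYTES = 8   # sub-block transfer size (assumed to apply to remote prefetches)


def plan_prefetch(addr, own_lower, own_upper, allow_remote=False):
    """Return (bus, nbytes) for a prefetch, or None when the prefetch is suppressed.

    own_lower/own_upper bound the processor's own distributed memory space.
    allow_remote=False models [0013]: no prefetch goes out on the transfer bus.
    allow_remote=True models [0012]: a remote prefetch uses the smaller transfer.
    """
    if own_lower <= addr <= own_upper:
        return ("memory_bus", BLOCK_BYTES)
    if allow_remote:
        return ("transfer_bus", SUB_BLOCK_BYTES)
    return None
```

With `allow_remote=False`, every suppressed prefetch is one fewer transfer bus request, which is the effect paragraph [0013] describes.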

[0014] Lowering the utilization of the transfer bus in this way allows the number of processors in the multiprocessor to be increased, or reduces bus wait time and improves system performance.

【００１５】[0015]

[Embodiment] FIG. 1 shows a block diagram of the processor of this embodiment. The processor 100 comprises a memory space range designation judgment unit 101, a memory access control 102, a memory bus control 103, a transfer bus control 104, a prefetch unit 105, an instruction cache unit 106, a data cache unit 107, and an arithmetic unit 108. The buses connecting the processor to the outside are the memory bus 110 and the transfer bus 111.

[0016] FIG. 2 shows the configuration of a shared distributed memory multiprocessor using the processor shown in FIG. 1. Processor 201, distributed memory-1 (205), and the memory bus 215 connecting them constitute one instruction processing device; the system has several such devices, and the processors are connected to one another by the transfer bus 213. Reference numerals 201 to 204 denote the processors shown in FIG. 1, 205 to 208 denote distributed memories, 209 denotes an I/O controller, and 210 to 212 denote I/O devices such as a display, a printer, and a disk. Taking processor-1 (201) as an example, the transfer bus 111 of the processor 100 in FIG. 1 is connected to the transfer bus 213 in FIG. 2, and the memory bus 110 of the processor 100 in FIG. 1 is connected to the memory bus 215. The same applies to processors 202 to 204.

[0017] FIG. 3 shows the address space of the distributed memories of FIG. 2. The system of this embodiment implements a 4-Gbyte physical address space and uses memory-mapped I/O. The address space visible from each processor (201 to 204) is therefore 4 Gbytes, and addresses X'F0000000 to X'FFFFFFFF form the I/O space. Each of the distributed memories 205 to 208 in FIG. 2 is assigned the implemented address space shown for it in FIG. 3 and implements that space.

[0018] First, the basic operation of the multiprocessor system is described with reference to FIGS. 1 to 3, followed by the detailed operation inside the processor.

[0019] In FIG. 2, take processor-1 as the reference. When processor-1 accesses address X'00000100, control logic inside processor-1 recognizes that the access is to distributed memory-1, which is directly connected to processor-1, and processor-1 activates the memory bus 215 to access distributed memory-1 (205). When processor-1 accesses address X'4F000000, control logic inside processor-1 recognizes that the access must go to the transfer bus, and processor-1 activates the transfer bus 213. Processors-1 to -4 (201 to 204) monitor the transfer bus at all times; in this case, processors-2 to -4 each check whether the access is to the distributed memory directly connected to them. In this example, processor-2 recognizes that the access is to distributed memory-2, which is directly connected to it, and receives the data. Based on the data received from the transfer bus, processor-2 connects the transfer bus and the memory bus 216 inside processor-2 in preparation for accessing distributed memory-2. Through this series of operations, processor-1 (201), the transfer bus (213), processor-2 (202), the memory bus 216, and distributed memory-2 (206) are connected and the access can proceed. Further, when processor-1 accesses address X'F1000000, control logic inside processor-1 recognizes that the access must go to the transfer bus, and processor-1 activates the transfer bus 213. In this embodiment, the I/O controller 209 monitors the transfer bus 213, recognizes that the access is to the I/O area, and receives the data. The I/O controller 209 then connects the transfer bus and the I/O bus inside the I/O controller 209 in order to access the I/O devices 210 to 212. Through this series of operations, processor-1 (201), the transfer bus (213), the I/O controller 209, the I/O bus (214), and the I/O devices (210 to 212) are connected and the access can proceed.
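The three access paths described in paragraph [0019] can be illustrated with a small routing sketch. The I/O base address comes from paragraph [0017]; the per-memory ranges in `ASSUMED_RANGES` are hypothetical placeholders (FIG. 3 is not reproduced here), chosen only so that the three example addresses land where the text says they do. The function name `route` and the component labels are likewise illustrative.

```python
IO_BASE = 0xF0000000  # I/O space is X'F0000000 to X'FFFFFFFF per paragraph [0017]

# Hypothetical address ranges for distributed memory-1..-4; the real values
# are given in FIG. 3, which is not available in this text.
ASSUMED_RANGES = {
    1: (0x00000000, 0x3FFFFFFF),
    2: (0x40000000, 0x7FFFFFFF),
    3: (0x80000000, 0xBFFFFFFF),
    4: (0xC0000000, 0xEFFFFFFF),
}


def route(requester, addr):
    """Return the chain of components an access from processor-`requester` traverses."""
    me = f"processor-{requester}"
    if addr >= IO_BASE:
        # I/O accesses go over the shared transfer bus to the I/O controller.
        return [me, "transfer bus 213", "I/O controller 209", "I/O bus 214", "I/O device"]
    for owner, (lo, hi) in ASSUMED_RANGES.items():
        if lo <= addr <= hi:
            if owner == requester:
                # Local access: only the processor's own memory bus is used.
                return [me, "memory bus", f"distributed memory-{owner}"]
            # Remote access: transfer bus, then the owning processor's memory bus.
            return [me, "transfer bus 213", f"processor-{owner}",
                    "memory bus", f"distributed memory-{owner}"]
    raise ValueError(f"address {addr:#010x} is not mapped")
```

For example, `route(1, 0x4F000000)` passes through processor-2 before reaching distributed memory-2, mirroring the second case in the paragraph.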

[0020] In the multiprocessor of this embodiment, each processor has a built-in cache memory. To keep the caches always consistent with the data in main memory, the system uses a snoop cache configuration in which each processor monitors the transfer bus. In this embodiment, when one processor checks data consistency against the cache memory in another processor, it issues a CCC (cache coherence check) request. A CCC request broadcasts the address to be checked over the transfer bus. Because the address array inside each processor has two ports, the check can be performed independently of the internal operation that the processor itself is executing. Moreover, even when processor-1 has a cache miss and is accessing distributed memory-1, CCC requests from other processors can be executed independently.

[0021] Next, the detailed operation inside the processor is described, focusing on the parts that are characteristic of the present invention.

[0022] The basic instruction operation of the processor 100 shown in FIG. 1 is described with reference to FIGS. 4, 5, and 6. FIG. 4 shows the detailed configuration of the arithmetic unit 108. The arithmetic unit 108 comprises a register 1301, an ALU 1302, an instruction buffer register 1303, an instruction decoder 1304, a program counter 1305, a branch adder 1306, a byte aligner 1308 for load instructions, a byte aligner 1307 for store instructions, an address generator 1309 for interrupt processing, latches 1310a to 1310k that implement the pipeline, a pipeline control 1312, and an address generation circuit 1313 for not-taken conditional branches.

[0023] FIG. 5 shows the configuration of the instruction cache unit 106. It comprises a logical address latch 1201, an instruction TLB 1202, an instruction cache address array 1203, an address storage memory 1204, a data storage buffer and prefetch buffer 1205, an instruction operand storage memory 1206, an instruction cache control unit 1207, a TLB VPN (virtual page) comparator 1210, a TLB protection information comparator 1211, an address array comparator 1212, and a selector 1213. The TLB and the instruction cache are direct-mapped; the instruction TLB has 128 entries, and the instruction cache has 1K entries for a capacity of 32 Kbytes.

[0024] FIG. 6 shows the configuration of the data cache unit 107. It comprises a logical address latch 1001, a data TLB 1002, a data cache address array 1003, an address storage buffer 1004, a data storage buffer 1005, a data operand storage memory 1006, a data cache control unit 1007, a TLB VPN (virtual page) comparator 1010, a TLB protection information comparator 1011, an address array comparator 1012, a selector 1013, and a comparator 1014 for the CCC (cache coherence check). The TLB and the data cache are direct-mapped; the data TLB has 128 entries, and the data cache has 1K entries for a capacity of 32 Kbytes.

[0025] Pipeline operation is executed in six stages, as shown in FIG. 7. The instruction set has the register-to-register arithmetic instructions, branch instructions, load instructions, and store instructions used in typical RISC processors. Because the basic operation of these instructions is not directly related to the present invention, only the instruction fetch and data fetch portions, which involve memory access, are described.

[0026] In the IF stage, the instruction address 123 from the arithmetic unit 108 is set in the logical address latch 1201 of the instruction cache unit 106, as shown in FIG. 5, and the instruction fetch request 124 activates the instruction cache unit 106. The logical address is divided into an in-block address in bits 2-4, a block address in bits 5-14, and a page address in bits 12-31. In normal operation, the page address is translated from a logical address to a physical address by the TLB. Of the page address in bits 12-31, bits 12-18 select an entry in the instruction TLB; the address 1202a read from that entry is compared with bits 19-31 by the comparator 1210, and a match is a TLB hit. In addition, the protection information 1202c of the selected entry is read and checked against a management register by the comparator 1211. The address array 1203 uses bits 5-14 of the logical address to select one of its 1K entries. The physical address page (PPN) 1203a read from the address array is compared by the comparator 1212 with the physical address page (PPN) 1202b read from the instruction TLB, and a match is a cache hit.
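The bit fields named in paragraph [0026] can be extracted as in the sketch below. It assumes that bit 0 is the least significant bit of a 32-bit logical address; the text does not state the bit-numbering convention, so this is an assumption, as are the helper names `bits` and `split_logical_address`. The field widths are at least consistent with the stated sizes: bits 5-14 give 10 bits for 1K entries, bits 12-18 give 7 bits for 128 TLB entries, and 1K entries of 32 bytes make 32 Kbytes.

```python
def bits(value, lo, hi):
    """Extract bits lo..hi inclusive, treating bit 0 as the least significant bit."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def split_logical_address(addr):
    """Split a 32-bit logical address into the fields used by the TLB and cache lookup."""
    return {
        "in_block": bits(addr, 2, 4),      # 3 bits: one of eight 4-byte words in a block
        "cache_index": bits(addr, 5, 14),  # 10 bits: one of 1K address-array entries
        "tlb_index": bits(addr, 12, 18),   # 7 bits: one of 128 TLB entries
        "tlb_tag": bits(addr, 19, 31),     # compared with the address read from the TLB
    }
```

Note that the cache index (bits 5-14) overlaps the page address (bits 12-31) by three bits, so under this reading the index is taken from the logical address while the hit check compares physical page numbers.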

[0027] The instruction cache address array stores four sub-block valid bits 1203b so that processing can be performed in 8-byte units. FIG. 8(a) shows the contents of 1203b for one entry, and FIG. 8(b) shows the combinations of BV1 to BV4. A constraint is imposed that only consecutive sub-blocks within the same block are allowed. Meanwhile, the instruction operand storage memory 1206 uses bits 5-14 of the logical address to select one of its 1K entries. The selected 32-byte instruction operand is further narrowed to one eighth by the in-block address in bits 2-5, and the resulting 4-byte instruction is sent through the operand bus 122 to the instruction buffer register 1303 in the arithmetic unit 108.
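The constraint on sub-block valid bits in paragraph [0027] can be expressed as a small check. FIG. 8(b) is not reproduced in this text, so the exact table of permitted BV1 to BV4 combinations is unknown; the sketch below encodes only the stated rule that valid sub-blocks within a block must be consecutive. Treating the all-invalid pattern as acceptable is an assumption, and the function name `is_consecutive` is hypothetical.

```python
def is_consecutive(valid_bits):
    """Return True if the set bits in valid_bits (BV1..BV4) form one unbroken run.

    valid_bits is a sequence of four truthy/falsy values, one per 8-byte sub-block.
    The all-zero pattern (no valid sub-block) is accepted here by assumption.
    """
    pattern = "".join("1" if b else "0" for b in valid_bits)
    run = pattern.strip("0")   # drop invalid sub-blocks at both ends
    return "0" not in run      # any remaining 0 would be a gap inside the run
```

Four sub-blocks of 8 bytes each cover the 32-byte block, which matches the block size given for the cache.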

[0028] For a load instruction, the data address 126 from the arithmetic unit 108 is set in the logical address latch 1001, as shown in FIG. 6, and the data fetch request 127 activates the data cache unit 107. The logical address is divided into an in-block address in bits 2-4, a block address in bits 5-14, and a page address in bits 12-31. The page address is translated from a logical address to a physical address by the TLB. Of the page address in bits 12-31, bits 12-18 select an entry in the data TLB; the address 1002a read from that entry is compared with bits 19-31 by the comparator 1010, and a match is a TLB hit. In addition, the protection information 1002c of the selected entry is read and checked against a management register by the comparator 1011. The address array 1003 uses bits 5-14 of the logical address to select one of its 1K entries. The physical address page (PPN) 1003a read from the address array is compared by the comparator 1012 with the physical address page (PPN) 1002b read from the data TLB, and a match is a cache hit.

[0029] The data cache address array stores four sub-block valid bits 1003b so that processing can be performed in 8-byte units. The sub-blocks are organized in the same way as in the instruction cache. Meanwhile, the data storage memory 1006 uses bits 5-14 of the logical address to select one of its 1K entries. The selected 32 bytes of data are further narrowed to one eighth by the in-block address in bits 2-5, and the resulting 4 bytes of data are sent through the data bus 125 into the arithmetic unit 108.

[0030] For a store instruction, in the E stage the data address 126 from the arithmetic unit 108 is set in the logical address latch 1001, and the data request 124 (store request) activates the data cache unit. In the T stage, the data to be written is sent from the arithmetic unit 108 through the data bus 125 to the data cache unit 107 and stored in the data storage buffer 1005. If the comparators 1010, 1011, and 1012 report a TLB hit, a protection hit, and a cache hit, the data is written into the data storage memory 1006 in the W stage of the store instruction, and the instruction completes.

[0031] As this embodiment shows, when the built-in cache memory holds the data for a memory access request from the arithmetic unit inside the processor, the data corresponding to the request can be supplied from the cache.

[0032] Next, returning to FIG. 1, memory accesses from the instruction cache unit 106 and the data cache unit 107 to outside the chip are described with reference to FIGS. 9, 10, 11, and 12.

[0033] FIG. 9 shows the configuration of the memory space range designation judgment unit 101. It comprises a register 401 that stores the lower limit of the processor's own memory space within the distributed shared address space, a register 402 that stores the upper limit of the processor's own memory space within the distributed address space, a comparison circuit 403 that compares the internal address bus 140 with the lower limit register 401, a comparison circuit 404 that compares the internal address bus 140 with the upper limit register 402, and a judgment circuit that determines whether the internal address falls within the processor's own memory space.

[0034] FIG. 10 shows the configuration of the memory access control 102. It comprises a mask circuit 102a, a priority determination circuit 102c, a bus activation circuit and bus arbiter 102d, a prefetch interface control 1901, an instruction cache unit interface control 1902, and a data cache unit interface control 1903.

[0035] FIG. 11 shows the memory bus control 103. It comprises a memory control circuit 501, a data buffer circuit 502, an address buffer circuit 503, and switches 504, 505, and 506.

[0036] FIG. 12 shows the transfer bus control 104. It comprises a transfer bus control circuit 901, data buffer circuits 902 and 903, address buffer circuits 904 and 905, and switches 906 to 910.

[0037] Next, the operation when the instruction request 133 is asserted by the instruction cache unit 106 is described.

[0038] When the comparison by the address comparator 1212 in the instruction cache unit 106 results in a cache miss, the instruction cache control unit 1207 instructs the arithmetic unit 108 to stall pipeline processing (not shown in the figure) and issues an instruction fetch request to the memory access control 102 through the signal 133. The address storage buffer 1204 generates the physical address and outputs it as the instruction address 139 onto the internal address bus 140 via the selector 109. The memory space designation judgment unit monitors the internal address bus 140 at all times.

[0039] The operation of the memory space range designation judgment unit 101 is as follows. At system startup, each processor sets, by software, the required values in the lower limit storage register 401 and the upper limit storage register 402 via the bus 406. The read bus 407 and the write bus 406 allow data to be transferred to and from the register 1301 in the arithmetic unit by means of control register transfer instructions (omitted from the figure). Each processor sets the lower and upper limits of the distributed memory that it can access directly via the memory bus. In this embodiment, the values shown in FIG. 13(a) are set, based on FIG. 3.

[0040] The comparison circuit 403 detects that the address on the internal address bus 140 is greater than or equal to the value in the lower limit register 401. The comparison circuit 404 detects that the address on the internal address bus 140 is less than or equal to the value in the upper limit register 402. The judgment circuit 405 receives the output 410 of the comparison circuit 403 and the output 420 of the comparison circuit 404 and determines whether the address is within the space of the directly connected distributed memory. The resulting signal 139 is sent to the memory access control 102. The judgment circuit 405 can be implemented as the logical AND of the signal lines 410 and 420.
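The range check performed by the judgment unit in paragraph [0040] amounts to two comparisons and an AND. A minimal software sketch follows; the function name `in_own_memory` and the example register values are hypothetical, since the actual settings are those of FIG. 13(a), which is not reproduced here.

```python
def in_own_memory(addr, lower, upper):
    """Model the judgment unit 101: True when addr lies in the processor's own memory space.

    lower and upper stand for the contents of registers 401 and 402.
    """
    ge_lower = addr >= lower       # comparison circuit 403, output 410
    le_upper = addr <= upper       # comparison circuit 404, output 420
    return ge_lower and le_upper   # judgment circuit 405: logical AND of 410 and 420


# Example with assumed register values for processor-1 (not taken from FIG. 13(a)).
ASSUMED_LOWER = 0x00000000
ASSUMED_UPPER = 0x3FFFFFFF
local = in_own_memory(0x00000100, ASSUMED_LOWER, ASSUMED_UPPER)
remote = in_own_memory(0x4F000000, ASSUMED_LOWER, ASSUMED_UPPER)
```

Both bounds are inclusive, matching the "greater than or equal to" and "less than or equal to" wording of the paragraph.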

【００４１】The request 133 from the instruction cache enters the priority judgment circuit 102c. When several requests arrive at the same time, 102c puts them in order. If the request from the instruction cache is selected as a result of the priority judgment, the signal 1907 is asserted. This signal represents the processor-internal request that has been selected for issue to the outside. The bus activation circuit and bus arbiter 102d judges this request together with the external requests 146 and 148 and accepts the one with the highest priority. When the request 1907 is accepted, the bus activation circuit and bus arbiter 102d determines from the signal 139 whether to activate the memory bus 110 or the transfer bus 111. If 139 is asserted, a block transfer (32 bytes) request is output to the memory bus control 103 through the signal 145. If 139 is negated, a sub-block transfer (8 bytes) request is output to the transfer bus control 104 through the signal 147. The transfer amounts differ because the memory bus is the transfer path to the distributed memory that the processor can access directly, whereas the transfer bus is the path for accessing the distributed memories of other processors and is shared by four processors; reducing the transfer amount lowers the usage rate of the transfer bus.
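
The bus selection just described can be summarized in a small sketch. The function name and the returned labels are hypothetical; only the 32-byte and 8-byte sizes and the reference numerals come from the text.

```python
BLOCK_BYTES = 32      # block transfer on the memory bus 110
SUB_BLOCK_BYTES = 8   # sub-block transfer on the transfer bus 111


def issue_request(signal_139: bool):
    """Model of the choice made by bus activation circuit / arbiter 102d."""
    if signal_139:
        # Address is in the directly connected distributed memory.
        return ("memory_bus_control_103", BLOCK_BYTES)
    # Address belongs to another processor's memory or to I/O.
    return ("transfer_bus_control_104", SUB_BLOCK_BYTES)


print(issue_request(True))
print(issue_request(False))
```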

【００４２】Further, the priority judgment circuit 102c asserts the signal 1905 to notify the instruction cache unit interface control 1902 that the request has been accepted. The instruction cache unit interface control 1902 creates the response signals to the instruction cache unit 106 from the signal 1905 and the signal 139 from the memory space range designation judgment unit 101. 134a is the response signal, which is asserted when the request is accepted. 134b is accompanying information that indicates whether the request was accepted as a block transfer (32 bytes) or as a sub-block transfer (8 bytes). On receiving 134a and 134b, the instruction cache unit 106 learns that the request has been accepted and whether its transfer size is a block or a sub-block, and waits for the data accordingly.

【００４３】The operation when the request from the instruction cache memory is an access to the memory directly connected to the processor is as follows. The memory bus control 103 is activated by the memory bus request 145. For reads and writes to the distributed memory, the memory bus control 103 controls the switch operation using the signals 510, 516, 517, and 518.

【００４４】Next, the operation is described for the case where the request from the instruction cache memory is not an access to the memory directly connected to the processor but an access to another distributed memory (that is, an access to the transfer bus). The transfer bus control 104 is activated by the transfer bus request 147. For read and write accesses to distributed memory not directly connected to the processor and to I/O, the transfer bus control 104 controls the switch operation using the signals 930, 931, 932, 933, 934, and 935.

【００４５】When the transfer bus control has not been activated by a request, the switches 909 and 910 are controlled so as to connect 111 and 142, so that the transfer bus 111 can be monitored through the internal cache coherence check address.

【００４６】The data read from the memory bus or the transfer bus as described above is transferred to the instruction cache unit 106 through the internal bus 141. At the same time, a control signal is reported to the memory access control 102 via 146 or 148, and in response it is passed through 102f and 134 to notify the instruction cache unit 106.

【００４７】Next, the operation when the data request 135 is asserted by the data cache unit 107 is described. The request 135 from the data cache enters the priority judgment circuit 102c. If the request from the data cache is selected as a result of the priority judgment, the signal 1907 in FIG. 10 is asserted. When the bus activation circuit 102d accepts the request 1907, it determines from the signal 139 whether to activate the memory bus or the transfer bus. If 139 is asserted, a block transfer (32 bytes) request is output to the memory bus control 103 through the signal 145. If 139 is negated, a sub-block transfer (8 bytes) request is output to the transfer bus control 104 through the signal 147.

【００４８】Further, the priority judgment circuit 102c asserts the signal 1906 to notify the data cache unit interface control 1903 that the request has been accepted. The data cache unit interface control 1903 creates the response signals to the data cache unit 107 from the signal 1906 and the signal 139 from the memory space range designation judgment unit 101. 136a is the response signal, which is asserted when the request is accepted. 136b is accompanying information that indicates whether the request was accepted as a block transfer (32 bytes) or as a sub-block transfer (8 bytes). On receiving 136a and 136b, the data cache unit 107 learns that the request has been accepted and whether its transfer size is a block or a sub-block, and waits for the data accordingly.

【００４９】The operation when the request from the data cache memory is an access to the memory directly connected to the processor is as follows. The memory bus control 103 is activated by the memory bus request 145. For reads and writes to the distributed memory, the memory bus control 103 controls the switch operation using the signals 510, 516, 517, and 518.

【００５０】Next, the operation is described for the case where the request from the data cache memory is not an access to the memory directly connected to the processor but an access to another distributed memory (that is, an access to the transfer bus). The transfer bus control 104 is activated by the transfer bus request 147. For read and write accesses to distributed memory not directly connected to the processor and to I/O, the transfer bus control 104 controls the switch operation using the signals 930, 931, 932, 933, 934, and 935.

【００５１】The data read from the memory bus or the transfer bus is transferred to the data cache unit 107 through the internal bus 141. At the same time, a control signal is reported to the memory access control 102 via 146 or 148, and in response it is passed through 102f and 136 to notify the data cache unit 107.

【００５２】As the last of the requests from inside the processor for external memory access, the prefetch scheme is described, in which a fetch is performed ahead of time based on a prediction so that the data to be accessed is already present in the cache memory. Because a prefetch is only a speculative advance fetch, it has the property that the fetch need not actually be carried out. In this embodiment, both instruction prefetch and data prefetch are performed. FIGS. 14(a) and 14(b) show the concept. The instruction prefetch (a) prefetches the next block when the instruction cache unit 106 takes a cache miss: an arithmetic unit in the prefetch unit calculates the address of the next block, and the information obtained by the memory access is stored in the prefetch buffer.

【００５３】The data prefetch (b) makes use of the post-increment and pre-decrement functions of the load and store instructions.

【００５４】FIG. 15 shows the instruction functions of the load and store instructions.

【００５５】For the instruction "LOAD", displacement + general-purpose register (b) is the address, and the content at that address is stored in the general-purpose register (t). For the instruction "STORE", displacement + general-purpose register (b) is the address, and the content of the general-purpose register (r) is written to the memory location at that address.

【００５６】In addition, each instruction has two further variations according to the addressing mode. In "---PI", no address calculation is performed for the access itself: the general-purpose register (b) is used as the address, and at the same time displacement + general-purpose register (b) is calculated and the result is stored in the general-purpose register (b). This addressing mode provides the so-called post-increment function. In "---PD", displacement + general-purpose register (b) is used as the address, and at the same time that sum is stored in the general-purpose register (b). When the displacement has a negative value, this addressing mode provides the so-called pre-decrement function.
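
The three addressing modes can be compared side by side in a short sketch. The function name, the mode labels, and the sample register values are assumptions made for illustration; the behaviour per mode follows the description of "LOAD"/"STORE", "---PI", and "---PD" above.

```python
def effective_address(mode: str, base: int, disp: int):
    """Return (access_address, new_value_of_register_b) for one access."""
    modified = base + disp           # displacement + general-purpose register (b)
    if mode == "plain":              # LOAD / STORE: register (b) is unchanged
        return modified, base
    if mode == "PI":                 # post-increment: access at old (b), then update (b)
        return base, modified
    if mode == "PD":                 # pre-decrement when disp < 0: update (b), access there
        return modified, modified
    raise ValueError(f"unknown mode: {mode}")


print(effective_address("plain", 0x1000, 8))
print(effective_address("PI", 0x1000, 8))
print(effective_address("PD", 0x1000, -8))
```

In both "PI" and "PD" the new value of register (b) is the modified address, which is what the data prefetch later uses as its predicted next access.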

【００５７】The data prefetch (b) accesses the data cache unit with the modified address when there is no request from the processor to the data cache. If the data cache unit hits, the prefetch is complete; if it misses, a memory access is performed.

【００５８】Next, the prefetch scheme is described in detail. FIG. 16 shows the prefetch unit 105, which consists of a pre-decrement arithmetic unit 1401, an instruction prefetch arithmetic unit 1402, a prefetch control circuit 1403, a data prefetch address buffer 1405, an instruction prefetch address buffer 1406, and a selector 1407.

【００５９】The operation of the instruction prefetch is as follows. When a cache miss is reported by the signal 129, the address information (physical address) of the missed instruction is output to the prefetch unit via 128. The instruction prefetch arithmetic unit 1402 computes the instruction address information + 32 bytes. The result is the address of the block that follows the missed block, and it is stored in the instruction prefetch address buffer 1406. The prefetch control circuit 1403 then outputs the prefetch request 132 and the prefetch address 130. The request 132 is sent to the mask circuit 102a in the memory access control 102. Here, if the output 139 of the memory space range designation judgment unit 101 indicates an access to the distributed memory directly connected to the processor (the self-distributed memory), 1909 in FIG. 12 is asserted. Further, the priority judgment circuit 102c asserts the signal 1904 to notify the prefetch unit interface control 1901 that the request has been accepted and was not masked. The prefetch unit interface control 1901 creates the response signals to the prefetch unit 105 from the signal 1904 and the signal 139 from the storage and comparison units 403 to 405 of the memory space range designation judgment unit 101. 131a is the response signal, which is asserted when the request is accepted. 131b is accompanying information that indicates whether or not the request was masked. On receiving 131a and 131b, the prefetch unit 105 learns that the request has been accepted and was not masked, and informs the instruction cache unit 106 via the signal 129.

【００６０】When the request 1909 is asserted, the bus activation circuit and bus arbiter 102d outputs the request 145 according to the signal 139 as usual. The self-distributed memory is thereby accessed, and the data is stored through the internal data bus 141 in the data storage buffer and prefetch buffer 1205 of FIG. 5. At the same time, a control signal is reported to the memory access control 102 via 146 or 148, and in response it is passed through 102f and 131 to notify the prefetch unit 105, which in turn informs the instruction cache unit 106 via the signal 129.

【００６１】On the other hand, if the output 139 of the memory space range designation judgment unit 101 shows, within the mask circuit 102a, that the access is to another distributed memory or to I/O, 1909 in FIG. 10 is negated. Negation means that the request terminates at this point and that no prefetch to the transfer bus takes place.

【００６２】Further, the priority judgment circuit 102c asserts the signal 1904 to notify the prefetch unit interface control 1901 that the request has been accepted and was masked. The prefetch unit interface control 1901 creates the response signals to the prefetch unit 105 from the signal 1904 and the signal 139 from the memory space range designation judgment unit 101. 131a is the response signal, which is asserted when the request is accepted. 131b is accompanying information that indicates whether or not the request was masked. On receiving 131a and 131b, the prefetch unit 105 learns that the request has been accepted and was masked, and informs the instruction cache unit 106 via the signal 129.

【００６３】In other words, instruction prefetch requests are issued by the prefetch unit 105, but the memory access control 102 judges whether the address lies in the self-distributed memory space and permits the access only if it does; accesses to any other address space are not permitted. As a result, the transfer bus is never activated by speculative prefetches, which lowers the usage rate of the transfer bus.
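
The masking of instruction prefetches can be sketched as below. This is a simplified model under stated assumptions: the function name, the return format, and the sample range are hypothetical, and the handshake signals (1904, 131a, 131b) are reduced to a single return value.

```python
BLOCK_BYTES = 32  # block size on the memory bus


def instruction_prefetch(miss_addr: int, lower: int, upper: int):
    """Return the prefetch to issue after a miss, or None if it is masked."""
    next_block = miss_addr + BLOCK_BYTES       # arithmetic unit 1402: address + 32 bytes
    local = lower <= next_block <= upper       # signal 139 from judgment unit 101
    if not local:
        return None                            # mask circuit 102a: 1909 negated
    return ("memory_bus", next_block, BLOCK_BYTES)


# Hypothetical self-distributed memory range.
LOWER, UPPER = 0x0000_0000, 0x0FFF_FFFF
print(instruction_prefetch(0x0000_0100, LOWER, UPPER))
print(instruction_prefetch(0x4F00_0000, LOWER, UPPER))
```

A masked request simply ends; nothing is ever sent to the transfer bus on behalf of an instruction prefetch.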

【００６４】Next, the data prefetch operation is described. The data prefetch has two ways of producing the prefetch address.

【００６５】For the instructions "LOADPI" and "STOREPI", which have the post-increment function, the output of the arithmetic unit 1302 in the arithmetic unit 108 is itself the modified address, so it is passed along the path 1411 in FIG. 16 and stored in the data prefetch address buffer 1405 via the selector 1407.

【００６６】For the instructions "LOADPD" and "STOREPD", which have the pre-decrement function, the output of the arithmetic unit 1302 in the arithmetic unit 108 is not the modified address, so the arithmetic unit 1401 calculates operation result + displacement. The result is the modified address and is stored in the data prefetch address buffer 1405 via the selector 1407.

【００６７】With either method, the logical address stored in the data prefetch address buffer 1405 is sent to the data cache unit 107 as the prefetch address 150. It is received by the address storage buffer 1004 in FIG. 6 and stored in the logical address latch 1001 via 1023. The subsequent operation is the same as the normal operation of the data cache unit 107. That is, if the data cache unit hits, completion is reported to the prefetch unit 105 via the signal 129. On a miss, a block transfer (32 bytes) is performed if the access is within the self-distributed memory, or a sub-block transfer (8 bytes) if it is to another distributed memory or to I/O, and completion is then reported to the prefetch unit 105 via the signal 129.

【００６８】In other words, the data prefetch operation differs from the instruction prefetch operation. The data prefetch request is issued by the prefetch unit 105, but because the address is a logical address, the data cache unit 107 first checks whether the data is present. On a hit, the prefetch is complete. On a miss, as in the normal operation of the data cache unit, the memory access control 102 judges whether the address lies in the self-distributed memory space; block-unit access is used only for the self-distributed space, and sub-block-unit access is used for any other address space, after which the prefetch is complete. This shortens the time for which the transfer bus is occupied by speculative prefetches.
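
The data prefetch flow can be modelled in the same style. The sketch below is an assumption-laden simplification: the cache is a plain set of block-aligned addresses, logical and physical addresses are treated as identical, and the function name and sample range are hypothetical.

```python
BLOCK_BYTES = 32
SUB_BLOCK_BYTES = 8


def data_prefetch(addr: int, cached_blocks: set, lower: int, upper: int):
    """Return (action, bytes_transferred) for one data prefetch."""
    block_addr = addr - (addr % BLOCK_BYTES)
    if block_addr in cached_blocks:
        return ("hit", 0)                        # data cache unit 107 hits: done
    if lower <= addr <= upper:
        return ("memory_bus", BLOCK_BYTES)       # self-distributed memory: block
    return ("transfer_bus", SUB_BLOCK_BYTES)     # other memory or I/O: sub-block


cache = {0x0000_1000}
LOWER, UPPER = 0x0000_0000, 0x0FFF_FFFF  # hypothetical local range
print(data_prefetch(0x0000_1008, cache, LOWER, UPPER))
print(data_prefetch(0x0000_2000, cache, LOWER, UPPER))
print(data_prefetch(0x4F00_0000, cache, LOWER, UPPER))
```

Unlike the instruction prefetch, a remote data prefetch is not suppressed; it is only shortened to a sub-block.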

【００６９】The above has described access to external memory from inside the processor in this embodiment. Improvements to this embodiment are described next.

【００７０】In a system whose distributed memory capacity is fixed, the lower and upper limit registers 401 and 402 may be fixed in hardware (for example, implemented as ROM).

【００７１】Further, if the comparison circuit 404 is arranged to detect that the address on the internal address bus 140 is strictly smaller than the value in the upper limit value register 402, the addresses can be set as shown in FIG. 13(b).

【００７２】Further, if the unit by which the distributed memory can be increased or decreased is restricted to, for example, 2 to the 16th power (64 Kbytes), only the upper 16 bits need to be compared, and the values set in the registers 401 and 402 become as shown in FIG. 13(c). This allows the comparison circuits 403 and 404 and the lower and upper limit registers 401 and 402 all to be 16 bits wide, which reduces their size. In addition, as a variant of the memory space range designation judgment unit 101 shown in FIG. 9, an upper limit or lower limit address (a specific base address) and the size of the memory space range may be set in advance, and the unit may detect whether the address on the internal address bus lies within the range so specified.
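
A minimal sketch of the 16-bit comparison, assuming 32-bit addresses and a 64-Kbyte granularity. The function name and the register values are hypothetical (they are not the FIG. 13(c) values).

```python
def in_local_range_16(addr: int, lower16: int, upper16: int) -> bool:
    """Compare only the upper 16 bits of a 32-bit address."""
    hi = (addr >> 16) & 0xFFFF        # drop the low 16 bits (offset within 64 KB)
    return lower16 <= hi <= upper16   # 16-bit comparators and 16-bit registers


# Hypothetical setting: local memory covers X'00000000 to X'0001FFFF.
print(in_local_range_16(0x0001_2345, 0x0000, 0x0001))
print(in_local_range_16(0x0002_0000, 0x0000, 0x0001))
```

Because the low 16 bits never take part in the comparison, every range boundary necessarily falls on a 64-Kbyte boundary, which is the restriction the paragraph states.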

【００７３】In this embodiment, setting the value X'EFFFFFFF in the upper limit register 402 and the value X'00000000 in the lower limit register 401 makes it easy to build a single-processor system.

【００７４】Further, in this embodiment, setting the value X'FFFFFFFF in the upper limit register 402 and the value X'00000000 in the lower limit register 401 makes it easy to build a single-processor system that does not use the transfer bus at all.

【００７５】Further, in this embodiment, setting the value X'EFFFFFFF in the upper limit register 402 and the value X'EFFFFFFF in the lower limit register 401 means that only the transfer bus is used for every address other than X'EFFFFFFF (an address whose use is prohibited by the system), so that a conventional multiprocessor system can easily be built.
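
The effect of the three register settings above can be checked with a small sketch. The dictionary keys and the function name are hypothetical labels; the register values are the ones given in the text, and the comparison is assumed to be inclusive at both ends.

```python
# (lower limit register 401, upper limit register 402)
CONFIGS = {
    "single_processor": (0x0000_0000, 0xEFFF_FFFF),
    "single_processor_no_transfer_bus": (0x0000_0000, 0xFFFF_FFFF),
    "transfer_bus_only": (0xEFFF_FFFF, 0xEFFF_FFFF),
}


def uses_memory_bus(addr: int, config: str) -> bool:
    """True if the address is judged local (memory bus), else transfer bus."""
    lower, upper = CONFIGS[config]
    return lower <= addr <= upper


for name in CONFIGS:
    print(name, uses_memory_bus(0x0000_0100, name), uses_memory_bus(0xF000_0000, name))
```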

【００７６】Further, in general, accessing the transfer bus in sub-block units shortens the bus usage time. However, for applications that access consecutive addresses, a single block-unit access (32 bytes) occupies the transfer bus for less time than four sub-block-unit accesses (8 bytes × 4), because every transfer incurs an initial set-up phase before data can move. For this reason, the bus activation control 102 can also be modified so that the self-distributed memory is accessed in sub-block units and the other distributed memories are accessed in block units.
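
The trade-off can be made concrete with a toy cost model. All numbers here are assumptions for illustration only: the source gives no cycle counts, so a set-up cost of 2 cycles per request and a bus width of 8 bytes per cycle are invented for the example.

```python
def bus_cycles(total_bytes: int, bytes_per_request: int,
               setup_cycles: int, bytes_per_cycle: int = 8) -> int:
    """Bus occupancy for fetching total_bytes with fixed-size requests."""
    requests = -(-total_bytes // bytes_per_request)       # ceiling division
    data_cycles = bytes_per_request // bytes_per_cycle    # cycles of data per request
    return requests * (setup_cycles + data_cycles)


SETUP = 2  # hypothetical set-up cycles per transfer
print(bus_cycles(32, 8, SETUP), bus_cycles(32, 32, SETUP))   # consecutive 32 bytes
print(bus_cycles(8, 8, SETUP), bus_cycles(8, 32, SETUP))     # isolated 8 bytes
```

Under these assumed numbers, four sub-block requests cost more than one block request when all 32 bytes are needed, while a sub-block request is cheaper when only 8 bytes are needed, which is the distinction the paragraph draws.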

【００７７】Further, although this embodiment controls the prefetch of instructions and of data differently, a simple modification allows both to be controlled in the same way, following either scheme, and the instruction and data controls of this embodiment may also be swapped. Returning to this embodiment, the description now changes perspective and turns to processor access requests that come from outside the processor. Such accesses fall into two types.

【００７８】(1) Access from another processor to the distributed memory of this processor.

【００７９】(2) Cache coherence check from another processor against the data cache memory of this processor.

【００８０】The operation of (1) is as follows. The transfer bus control circuit 901 constantly monitors, via 932, whether the address on the transfer bus 111 lies within the self-distributed memory range (it contains the same judgment circuit as the one in the memory space range designation judgment unit). When the address is within the range, the processor recognizes that the transfer bus 111 carries an access request addressed to it, and the transfer bus control circuit 901 outputs the transfer bus request 148 to the bus activation circuit and arbiter 102d of FIG. 10 to request permission to use the internal bus. Thereafter, the memory bus and the transfer bus are controlled so that data is passed between them, and the memory transfer is completed.

【００８１】Next, the operation of (2) is described. For a cache coherence check from another processor against the data cache memory of this processor, the address to be checked is sent to the data cache unit via the buses 927 and 142 in FIG. 12. In the data cache unit 107 of FIG. 6, the data address array 1003 is used to check whether a line with the same address has been updated in the cache. Bits 5-14 of the address 144 select one of the 1K entries. The physical address page that is read out and the upper bits 13-32 of the address are compared by the comparator 1014. Its output 1037 and the cache coherence information are checked by the data cache control unit 1007. On a cache miss, or on a hit where the line has not been updated, the operation ends. On a hit where the line has been updated, however, the cache data must be written back to memory. This guarantees that the data is always consistent.
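
The coherence check can be sketched as a direct-mapped lookup. Several simplifications are assumed and are not taken from the source: bit 0 is treated as the least significant bit, the tag is taken to be all bits above the index (the source compares a physical page against "bits 13-32"), and the arrays are plain dictionaries with hypothetical names.

```python
OFFSET_BITS = 5    # 32-byte block
INDEX_BITS = 10    # bits 5-14 select one of 1K entries


def split(addr: int):
    """Return (index, tag) for an address under the assumptions above."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag


def snoop(addr: int, tags: dict, updated: dict) -> str:
    """Model of the check made by data cache control unit 1007."""
    index, tag = split(addr)
    hit = tags.get(index) == tag                 # comparator 1014 -> output 1037
    if hit and updated.get(index, False):
        return "write_back"                      # hit on an updated line
    return "done"                                # miss, or hit on a clean line


index, tag = split(0x0001_2340)
print(snoop(0x0001_2340, {index: tag}, {index: True}))
print(snoop(0x0001_2340, {index: tag}, {index: False}))
```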

【００８２】The data address array 1003 has a two-port structure, so normal processor operation and the cache coherence check can be performed at the same time. Moreover, even while the processor is accessing the self-distributed memory after a cache miss, the two can proceed in parallel without stopping either. This has the effect of improving multiprocessor performance.

【００８３】As improvements to this embodiment, the relationship with the memory space range designation storage and comparison unit when the memory configuration or the system configuration is changed is described with reference to FIGS. 17, 18, 19, and 20.

【００８４】FIG. 17 shows the memory space of each distributed memory when it is implemented as several separate ranges. In this case, as shown in FIG. 19, the memory space range designation judgment unit 101 consists of the lower limit value storage registers 401 and 450, the upper limit value storage registers 402 and 451, the comparators 403 and 452 that check that the internal address bus is greater than the register value, the comparators 404 and 453 that check that the internal address bus is smaller than the register value, and a judgment circuit 454 that judges the respective regions. In the initial state, processor-1 sets the lower and upper limit addresses of the region beginning at X'00000000 in the registers 401 and 402, and those of the region beginning at X'70000000 in the registers 450 and 451. The memory space range of distributed memory-1 can thereby be recognized.
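
A sketch of the two-range judgment follows. The region sizes are hypothetical (the source gives only the start addresses X'00000000 and X'70000000), the comparisons are assumed inclusive as in the single-range case, and the function name is invented for illustration.

```python
def in_any_range(addr: int, ranges) -> bool:
    """True if addr falls in at least one (lower, upper) register pair."""
    # Each pair models one lower/upper register set with its two comparators;
    # any() plays the role of judgment circuit 454 combining the regions.
    return any(lower <= addr <= upper for lower, upper in ranges)


# Hypothetical contents of registers (401, 402) and (450, 451) for processor-1.
PROCESSOR_1_RANGES = [
    (0x0000_0000, 0x0FFF_FFFF),
    (0x7000_0000, 0x7FFF_FFFF),
]

print(in_any_range(0x0000_0100, PROCESSOR_1_RANGES))
print(in_any_range(0x7000_0010, PROCESSOR_1_RANGES))
print(in_any_range(0x4F00_0000, PROCESSOR_1_RANGES))
```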

【００８５】FIG. 18 shows a system configuration to which a hierarchy of clusters has been added. The system consists of cluster-1 (1650) and cluster-2 (1651). Further, 1601 to 1604 are processors, 1604 to 1607 are distributed memories, 1608 and 1609 are I/O controllers, 1610 and 1611 are inter-cluster communication controllers, and 1612 is a network. The configuration of each distributed memory space of the system is the same as in FIG. 3. In this case, the values of FIG. 13(a) are set in the lower limit register and the upper limit register of the memory space range designation judgment unit.

【００８６】このシステムの動作は、図１８において、
プロセッサ−１を基準に考えると、プロセッサ−１がア
ドレスＸ′０００００１００なるアドレスをアクセスす
る場合、プロセッサ−１内部の制御により、プロセッサ
−１が直接接続している分散メモリ−１へのアクセスで
あることを認識し、プロセッサ−１はメモリバス−１を
起動して分散メモリ−１（１６０４）へブロック単位の
のアクセスを行う。The operation of this system is described with reference to FIG. 18. Taking processor-1 as the reference, when processor-1 accesses address X'00000100, the internal control of processor-1 recognizes that the access targets distributed memory-1, which is directly connected to processor-1. Processor-1 then activates memory bus-1 and accesses distributed memory-1 (1604) in block units.

【００８７】一方、プロセッサ−１がアドレスＸ′４Ｆ
００００００なるアドレスをアクセスする場合、プロセ
ッサ−１内部の制御により、プロセッサ−１が転送バス
へのアクセスであることを認識し、プロセッサ−１は転
送バス１６２４へサブブロック単位に起動する。プロセ
ッサ−１，２（１６０１，１６０２）はクラスタ内転送
バス−１２を常に監視しており、このケースでは、プロ
セッサ−２が各プロセッサに直接接続している分散メモ
リへのアクセスか否かを調べる。本例では、プロセッサ
−２が直接接続している分散メモリ−２へのアクセスで
あることを認識しデータを受け取る。プロセッサ−２は
転送バスより受け取ったデータより、分散メモリ−２へ
のアクセスをするための準備としてプロセッサ−２の内
部で転送バスとメモリバス−２を接続する。このような
一連の操作により、プロセッサ−１（１６０１），クラ
スタ内転送バス−１２（１６２４），プロセッサ−２
（１６０２），メモリバス−２，分散メモリ−２（１６
０５）と接続されアクセスすることが可能になる。On the other hand, when processor-1 accesses address X'4F000000, the internal control of processor-1 recognizes that the access goes to the transfer bus, and processor-1 activates intra-cluster transfer bus-12 (1624) in sub-block units. Processors-1 and 2 (1601, 1602) constantly monitor intra-cluster transfer bus-12; in this case, processor-2 checks whether the access targets the distributed memory directly connected to it. In this example, processor-2 recognizes that the access targets distributed memory-2, which is directly connected to it, and receives the data. Using the data received from the transfer bus, processor-2 connects the transfer bus and memory bus-2 internally in preparation for accessing distributed memory-2. Through this series of operations, processor-1 (1601), intra-cluster transfer bus-12 (1624), processor-2 (1602), memory bus-2, and distributed memory-2 (1605) are connected, and the access becomes possible.

【００８８】さらに、プロセッサ−１がアドレスＸ′Ｅ
Ｆ００００００なるアドレスをアクセスする場合、プロ
セッサ−１内部の制御により、プロセッサ−１が転送バ
スへのアクセスであることを認識し、プロセッサ−１は
クラスタ内転送バス−１２へサブブロック単位に起動
し、クラスタ間通信コントローラ−１２（１６１０）を
通してネット網１６１２をアクセスする。その後、クラ
スタ間通信コントローラ−３４（１６１１）を通してク
ラスタ内転送バス−３４をアクセスする。プロセッサ−
３，４（１６０３，１６０４）はクラスタ内転送バス−
３４を常に監視しており、このケースでは、プロセッサ
−３〜４が各プロセッサに直接接続している分散メモリ
へのアクセスか否かを調べる。本例では、プロセッサ−
４が直接接続している分散メモリ−４へのアクセスであ
ることを認識しデータを受け取る。プロセッサ−４はク
ラスタ内転送バス−３４より受け取ったデータより、分
散メモリ−４へのアクセスをするための準備としてプロ
セッサ−４の内部でクラスタ内転送バス−３４とメモリ
バス−４を接続する。このような一連の操作により、プ
ロセッサ−１（１６０１），クラスタ内転送バス−１２
（１６２４），クラスタ間通信コントローラ−１２（１
６１０），ネット網１６１２，クラスタ間通信コントロ
ーラ−３４（１６１１），クラスタ内転送バス−３４
（１６２５），プロセッサ−４（１６０４），メモリバ
ス−４，分散メモリ−４（１６０７）と接続されアクセ
スすることが可能になる。Further, when processor-1 accesses address X'EF000000, the internal control of processor-1 recognizes that the access goes to the transfer bus; processor-1 activates intra-cluster transfer bus-12 in sub-block units and accesses the network 1612 through inter-cluster communication controller-12 (1610). The access then reaches intra-cluster transfer bus-34 through inter-cluster communication controller-34 (1611). Processors-3 and 4 (1603, 1604) constantly monitor intra-cluster transfer bus-34; in this case, processors-3 and 4 each check whether the access targets the distributed memory directly connected to them. In this example, processor-4 recognizes that the access targets distributed memory-4, which is directly connected to it, and receives the data. Using the data received from intra-cluster transfer bus-34, processor-4 connects intra-cluster transfer bus-34 and memory bus-4 internally in preparation for accessing distributed memory-4. Through this series of operations, processor-1 (1601), intra-cluster transfer bus-12 (1624), inter-cluster communication controller-12 (1610), the network 1612, inter-cluster communication controller-34 (1611), intra-cluster transfer bus-34 (1625), processor-4 (1604), memory bus-4, and distributed memory-4 (1607) are connected, and the access becomes possible.
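The three access cases for the clustered system (own memory, memory within the cluster, memory in the other cluster) can be summarized as a path computation. The Python sketch below is illustrative only. The address map is an assumption: it splits the 32-bit space into four equal regions, which is merely consistent with the three example addresses in the text and is not taken from FIG. 3 or FIG. 13. The function and table names are hypothetical.

```python
# Assumed address map (one equal quarter per distributed memory).
MEMORY_MAP = {
    1: (0x00000000, 0x3FFFFFFF),
    2: (0x40000000, 0x7FFFFFFF),
    3: (0x80000000, 0xBFFFFFFF),
    4: (0xC0000000, 0xFFFFFFFF),
}
# Cluster membership as described for FIG. 18.
CLUSTER_OF = {1: 1, 2: 1, 3: 2, 4: 2}
INTRA_BUS = {1: "intra-cluster transfer bus-12", 2: "intra-cluster transfer bus-34"}
CONTROLLER = {
    1: "inter-cluster communication controller-12",
    2: "inter-cluster communication controller-34",
}


def owner_of(address: int) -> int:
    """Return the number of the distributed memory that holds the address."""
    for node, (low, high) in MEMORY_MAP.items():
        if low <= address <= high:
            return node
    raise ValueError(f"address {address:#010x} is not mapped")


def route(requester: int, address: int) -> list[str]:
    """List the units an access passes through, in order."""
    owner = owner_of(address)
    path = [f"processor-{requester}"]
    if owner != requester:
        src, dst = CLUSTER_OF[requester], CLUSTER_OF[owner]
        path.append(INTRA_BUS[src])
        if src != dst:
            # Crossing clusters goes through both controllers and the network.
            path += [CONTROLLER[src], "network", CONTROLLER[dst], INTRA_BUS[dst]]
        path.append(f"processor-{owner}")
    path += [f"memory bus-{owner}", f"distributed memory-{owner}"]
    return path
```

Under these assumptions, `route(1, 0x4F000000)` passes through intra-cluster transfer bus-12 and processor-2, while `route(1, 0xEF000000)` additionally crosses both inter-cluster communication controllers and the network before reaching processor-4.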

【００８９】図１８のシステムの更なる改良点として、
各プロセッサのメモリ空間範囲指定用判定ユニット１０
１を図１９のような２つの空間の指定範囲を判定できる
構成とする。この時、各レジスタにセットする値を図２
１に示す。ここで注目すべき点は、レジスタ４０１，４
０２は自己分散メモリの範囲であり、レジスタ４５０，
４５１はクラスタ内の分散メモリの範囲であることであ
る。こうすることによりさらに細かい制御が可能であ
る。例えば、図１６のシステムの動作は、プロセッサ−
１を基準に考えると、プロセッサ−１が分散メモリ−１
（１６０４）へブロック単位（３２バイト）のアクセス
を、さらに、分散メモリ−２（１６０５）に２つのサブ
ブロック単位（１６バイト）のアクセスを、さらに、分
散メモリ−３〜４（１６０６〜１６０７）にサブブロッ
ク単位（８バイト）のアクセスを起動することも可能で
ある。具体的には、判定回路４５４により１３９ａがア
サートされればブロック転送，１３９ａがネゲート，１
３９ｂがアサートの時２つのサブブロック転送，１３９
ａ，１３９ｂが共にネゲートの時サブブロック転送する
ようにバス起動制御１０２は制御する。つまり、自己分
散メモリ，クラスタ内の分散メモリ，クラスタ間の分散
メモリごとにブロック転送量を替えることが可能であ
る。なお、この時、転送バスのプロトコルとしてアドレ
ス，データ，プロトコル制御信号に加えて、アクセスす
る単位である転送量のサイズも制御信号と共に送れるよ
うにする必要がある。As a further improvement of the system of FIG. 18, the memory space range designation determination unit 101 of each processor is configured, as in FIG. 19, so that it can judge the designated ranges of two spaces. The values set in each register in this case are shown in FIG. 21. The point to note is that registers 401 and 402 hold the range of the processor's own distributed memory, while registers 450 and 451 hold the range of the distributed memories within the cluster. This allows finer control. For example, in the operation of the system of FIG. 16, taking processor-1 as the reference, processor-1 can issue accesses to distributed memory-1 (1604) in block units (32 bytes), to distributed memory-2 (1605) in units of two sub-blocks (16 bytes), and to distributed memories-3 to 4 (1606 to 1607) in sub-block units (8 bytes). Specifically, the bus activation control 102 performs a block transfer when the determination circuit 454 asserts 139a, a two-sub-block transfer when 139a is negated and 139b is asserted, and a single sub-block transfer when both 139a and 139b are negated. In other words, the transfer amount can be varied separately for the processor's own distributed memory, the distributed memories within the cluster, and the distributed memories in other clusters. Note that in this case the transfer bus protocol must be able to carry, in addition to the address, data, and protocol control signals, the transfer size that is the unit of access, sent together with the control signals.
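The selection of the transfer amount from signals 139a and 139b can be written out as a small Python sketch. The byte sizes (32, 16, 8) and the signal-to-size mapping follow the text. The rest is assumed for illustration: the function names are hypothetical, 139a is taken to come from the register pair 401/402 and 139b from the pair 450/451, and the register values are assumed rather than taken from FIG. 21.

```python
BLOCK_BYTES = 32      # block size given in the text
SUB_BLOCK_BYTES = 8   # sub-block size given in the text


def range_signals(address, own_range, cluster_range):
    """Return (139a, 139b) for an address; inclusive limits are assumed."""
    own_low, own_high = own_range
    cluster_low, cluster_high = cluster_range
    sig_139a = own_low <= address <= own_high          # own distributed memory
    sig_139b = cluster_low <= address <= cluster_high  # memories in the cluster
    return sig_139a, sig_139b


def transfer_bytes(sig_139a: bool, sig_139b: bool) -> int:
    """Transfer amount chosen by the bus activation control."""
    if sig_139a:
        return BLOCK_BYTES            # block transfer
    if sig_139b:
        return 2 * SUB_BLOCK_BYTES    # two-sub-block transfer
    return SUB_BLOCK_BYTES            # single sub-block transfer


# Assumed register contents for processor-1 (hypothetical values).
OWN_RANGE = (0x00000000, 0x3FFFFFFF)      # registers 401, 402
CLUSTER_RANGE = (0x00000000, 0x7FFFFFFF)  # registers 450, 451


def bytes_for(address: int) -> int:
    return transfer_bytes(*range_signals(address, OWN_RANGE, CLUSTER_RANGE))
```

With the assumed ranges, an address in the processor's own memory yields 32 bytes, an address elsewhere in the cluster yields 16 bytes, and an address in the other cluster yields 8 bytes.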

【００９０】更に、先行プリフェッチによるメモリアク
セスに対しては、プロセッサ−１が分散メモリ−１（１
６０４）へブロック単位（３２バイト）のアクセスを、
さらに、分散メモリ−２と分散メモリ−３〜４（１６０
６〜１６０７）にはアクセスしないようにすることも可
能である。Further, for memory accesses caused by advance prefetching, it is also possible to have processor-1 access distributed memory-1 (1604) in block units (32 bytes) while issuing no access at all to distributed memory-2 or to distributed memories-3 to 4 (1606 to 1607).
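One way to picture this prefetch policy is as a second transfer-amount table in which a size of zero means that no access is issued. The Python sketch below is a hypothetical illustration: the region labels, table names, and the use of `None` to represent a suppressed access are all assumptions, while the byte sizes are the ones quoted in the text.

```python
# Transfer amounts in bytes per destination class (sizes from the text).
DEMAND_BYTES = {"own": 32, "cluster": 16, "other_cluster": 8}
# For advance prefetch, only the processor's own memory is accessed;
# zero marks a destination that is not accessed (an assumed encoding).
PREFETCH_BYTES = {"own": 32, "cluster": 0, "other_cluster": 0}


def plan_access(region: str, is_prefetch: bool):
    """Return the number of bytes to transfer, or None if no access is issued."""
    table = PREFETCH_BYTES if is_prefetch else DEMAND_BYTES
    size = table[region]
    return size if size > 0 else None
```

In this sketch a prefetch that would have to cross the transfer bus is simply dropped, so prefetch traffic never adds load to the transfer bus, while ordinary accesses still use the reduced sizes.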

【００９１】本実施例は、１次キャッシュメモリだけを
考えているが、図２２は、プロセッサ２０１から２０４
が２次キャッシュを持つ場合のシステム構成である。特
徴は、各プロセッサ２０１から２０４から専用のキャッ
シュバスを通して２次キャッシュメモリ（４１０１から
４１０４）と接続する構成である。各プロセッサが２次
キャッシュ構成を取ったときの本発明への展開は、同業
者であれば容易類推可能であるため省略する。Although this embodiment considers only a primary cache memory, FIG. 22 shows a system configuration in which processors 201 to 204 have secondary caches. Its feature is that each of processors 201 to 204 is connected to a secondary cache memory (4101 to 4104) through a dedicated cache bus. How the present invention extends to processors with such a secondary cache configuration can readily be inferred by those skilled in the art, so its description is omitted.

【００９２】さらに、図２３は、図１の改良として、プ
ロセッサ内部でマルチプロセッサ構成を取ったときのブ
ロック図を示す。このプロセッサは、演算ユニット−
１，演算ユニット−２を持ち、主要信号は４２１４，４
２２１が命令アドレスバス、４２１３，４２２０がオペ
ランドバス、４２１５，４２２２が命令キャッシュとの
制御信号、４２１１，４２２４がデータアドレスバス、
４２１０，４２２３がデータバス、４２１２，４２２５
がデータキャッシュとの制御信号である。命令キャッシ
ュユニット１０６，データキャッシュユニット１０７は
２ポートのキャッシュメモリ構成を取り、演算ユニット
４２０１と演算ユニット４２０２は内蔵キャッシュメモ
リを共有したマルチプロセッサである。このようなプロ
セッサ内部でマルチプロセッサ構成を取った場合も本発
明を展開することが可能である。Further, FIG. 23 shows, as an improvement of FIG. 1, a block diagram of a multiprocessor configuration inside the processor. This processor has arithmetic unit-1 and arithmetic unit-2. The main signals are as follows: 4214 and 4221 are instruction address buses, 4213 and 4220 are operand buses, 4215 and 4222 are control signals to the instruction cache, 4211 and 4224 are data address buses, 4210 and 4223 are data buses, and 4212 and 4225 are control signals to the data cache. The instruction cache unit 106 and the data cache unit 107 have a two-port cache memory configuration, and arithmetic unit 4201 and arithmetic unit 4202 form a multiprocessor that shares the built-in cache memory. The present invention can also be applied when such a multiprocessor configuration is adopted inside the processor.

【００９３】[0093]

【発明の効果】本発明の第１の効果は、分散共有メモリ
型のマルチプロセッサシステムにおいて、通信バスの占
有する時間を短くすべく、バス要求回数の低減と１回当
りの転送量の最適化をおこない、通信バスの使用率を下
げ、マルチプロセッサ全体の処理能力を向上させること
である。The first effect of the present invention is that, in a distributed shared memory multiprocessor system, the number of bus requests is reduced and the transfer amount per request is optimized so as to shorten the time for which the communication bus is occupied, which lowers the utilization of the communication bus and improves the processing capacity of the multiprocessor as a whole.

【００９４】さらに、本発明の第２の効果は、プロセッ
サシステムにおいて、最も重要なことはメモリアクセス
すべきデータがあらかじめキャッシュメモリに存在して
いることであり、今後使われるであろうデータに対して
先行してメモリアクセスを起こす先行プリフェッチを、
転送バスのバスネックを起こさないようにしながら可能
な限りおこない、マルチプロセッサ全体の処理能力を向
上させるシステムを提供することにある。Furthermore, the second effect of the present invention follows from the fact that what matters most in a processor system is that the data to be accessed is already present in the cache memory: the invention provides a system that performs advance prefetching, which issues memory accesses ahead of time for data likely to be used in the future, as much as possible without making the transfer bus a bottleneck, and thereby improves the processing capacity of the multiprocessor as a whole.

[Brief description of drawings]

【図１】本実施例のプロセッサのブロック図。FIG. 1 is a block diagram of a processor according to an embodiment.

【図２】図１で示したプロセッサを使用した共有分散メ
モリ型のマルチプロセッサの構成。FIG. 2 shows the configuration of a shared distributed memory multiprocessor using the processor shown in FIG. 1.

【図３】図２の分散メモリのアドレス空間。FIG. 3 shows the address space of the distributed memories of FIG. 2.

【図４】演算ユニット１０８の構成。FIG. 4 shows a configuration of an arithmetic unit 108.

【図５】命令キャッシュユニット１０６の構成。FIG. 5 shows the configuration of the instruction cache unit 106.

【図６】データキャッシュユニット１０７の構成。FIG. 6 shows a configuration of a data cache unit 107.

【図７】パイプライン動作。FIG. 7: Pipeline operation.

【図８】サブブロック有効ビットの内部情報。FIG. 8 shows the internal information of the sub-block valid bits.

【図９】メモリ空間範囲指定用判定ユニット１０１の構
成。FIG. 9 is a configuration of a memory space range designation determination unit 101.

【図１０】メモリアクセス制御の構成。FIG. 10 shows a configuration of memory access control.

【図１１】メモリバス制御１０３の構成。FIG. 11 is a configuration of a memory bus control 103.

【図１２】転送バス制御１０４の構成。FIG. 12 shows a configuration of a transfer bus control 104.

【図１３】システム構成各下限，上限レジスタにセット
するアドレス値。FIG. 13 shows the address values set in the lower and upper limit registers for each system configuration.

【図１４】命令，データのプリフェッチの概念図。FIG. 14 is a conceptual diagram of instruction and data prefetching.

【図１５】ロード命令，ストア命令の命令機能。FIG. 15 is an instruction function of a load instruction and a store instruction.

【図１６】プリフェッチユニット１０５の構成。クラス
タによる階層をつけたときのシステム構成。FIG. 16 shows the configuration of the prefetch unit 105. System configuration when a cluster hierarchy is added.

【図１７】複数の範囲に分けて実装したときの各分散メ
モリのメモリ空間。FIG. 17 is a memory space of each distributed memory when implemented by being divided into a plurality of ranges.

【図１８】クラスタによる階層をつけたときのシステム
構成。FIG. 18 is a system configuration when a hierarchy is formed by clusters.

【図１９】２つの空間の指定範囲を判定できるメモリ空
間範囲指定用判定ユニット１０１の構成。FIG. 19 shows a configuration of a memory space range designation determination unit 101 capable of determining designated ranges of two spaces.

【図２０】図１９の構成の判定回路。FIG. 20 shows the determination circuit of the configuration of FIG. 19.

【図２１】クラスタによる階層をつけたときのシステム
構成各下限，上限レジスタにセットするアドレス値。FIG. 21 shows the address values set in each lower limit and upper limit register for the system configuration with a cluster hierarchy.

【図２２】２階層キャッシュを持つシステム例。FIG. 22 is an example of a system having a two-tier cache.

【図２３】チップ内マルチプロセッサの例。FIG. 23 shows an example of an on-chip multiprocessor.

[Explanation of symbols]

１００…プロセッサ、１０１…メモリ空間範囲指定用判
定ユニット、１０２…メモリアクセス制御、１０３…メ
モリバス制御、１０４…転送バス制御、１０５…プリフ
ェッチユニット、１０６…命令キャッシュユニット、１
０７…データキャッシュユニット、１０８…演算ユニッ
ト、１１０…メモリバス、１１１…転送バス、１２２…
オペランドバス、１２３…命令アドレス、１２５…デー
タバス、１２６…データアドレス、１３０…プリフェッ
チアドレス、１３１…リクエスト１３２に対する応答信
号、１３２…プリフェッチリクエスト、１３３…命令ア
ドレスリクエスト、１３４…リクエスト１３３に対応す
る応答信号、１３５…データキャッシュユニットから出
力しているデータアドレスリクエスト、１３６…リクエ
スト１３５に対する応答信号、１３７…命令キャッシュ
ユニットから出力している命令アドレス、１３９…メモ
リ空間範囲指定用記憶，比較ユニットから出力する範囲
指定結果信号、１４０…内部アドレスバス、１４１…デ
ータキャッシュユニット１０７，命令キャッシュユニッ
ト１０６，メモリバス制御１０３，転送バス制御１０４
を結ぶ内部データバス、１４５…バス起動制御から出力
しているメモリバス起動リクエスト、１４７…転送バス
起動リクエスト、２０１〜204…図１で示したプロセッ
サ、２０５〜２０８…分散メモリ、２０９…Ｉ／Ｏコン
トローラ、２１０〜２１２…ディスプレイ，プリンタ，
ディスク等のＩ／Ｏデバイス、２１３…転送バス、２１
４…Ｉ／Ｏバス、２１５〜２１８…各プロセッサ２０１
から２０４が分散メモリ２０５から２０８までと接続す
るメモリバス。100 ... processor, 101 ... memory space range designation determination unit, 102 ... memory access control, 103 ... memory bus control, 104 ... transfer bus control, 105 ... prefetch unit, 106 ... instruction cache unit, 107 ... data cache unit, 108 ... arithmetic unit, 110 ... memory bus, 111 ... transfer bus, 122 ... operand bus, 123 ... instruction address, 125 ... data bus, 126 ... data address, 130 ... prefetch address, 131 ... response signal to request 132, 132 ... prefetch request, 133 ... instruction address request, 134 ... response signal to request 133, 135 ... data address request output from the data cache unit, 136 ... response signal to request 135, 137 ... instruction address output from the instruction cache unit, 139 ... range designation result signal output from the memory space range designation storage and comparison unit, 140 ... internal address bus, 141 ... internal data bus connecting the data cache unit 107, instruction cache unit 106, memory bus control 103, and transfer bus control 104, 145 ... memory bus activation request output from the bus activation control, 147 ... transfer bus activation request, 201 to 204 ... processors shown in FIG. 1, 205 to 208 ... distributed memories, 209 ... I/O controller, 210 to 212 ... I/O devices such as a display, printer, and disk, 213 ... transfer bus, 214 ... I/O bus, 215 to 218 ... memory buses connecting processors 201 to 204 to distributed memories 205 to 208.

フロントページの続き (72)発明者森岡道雄茨城県日立市大みか町七丁目１番１号株式会社日立製作所日立研究所内 Continuation of the front page (72) Inventor Michio Morioka, 7-1-1 Omika-cho, Hitachi-shi, Ibaraki, Hitachi, Ltd., Hitachi Research Laboratory

Claims

[Claims]

1. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising a memory access control unit that makes accesses with different data transfer amounts on the memory bus and on the transfer bus.

2. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising a memory access control unit that determines whether the data to be accessed is held in the distributed memory or in the other distributed memory, and that accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

3. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: a memory space range determination unit that holds address space information of all distributed memories and determines, from the address of the data to be accessed and based on the address space information, whether the data is held in the distributed memory or in the other distributed memory; and a memory access control unit that, in accordance with the determination of the memory space range determination unit, accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

4. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory; and a memory access control unit that, when the cache memory unit does not hold the data accessed by the arithmetic unit, determines whether the data is held in the distributed memory or in the other distributed memory and makes accesses with different data transfer amounts on the memory bus and on the transfer bus.

5. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory; and a memory access control unit that, when the cache memory unit does not hold the data accessed by the arithmetic unit, determines whether the data to be accessed is held in the distributed memory or in the other distributed memory, and accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

6. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory and determines whether the data accessed by the arithmetic unit is held; a memory space range determination unit that holds address space information of all distributed memories and, when the cache memory unit determines that the data is not held, determines from the address of the data to be accessed and based on the address space information whether the data is held in the distributed memory or in the other distributed memory; and a memory access control unit that, in accordance with the determination of the memory space range determination unit, accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

7. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory; a prefetch unit that writes data into the cache memory unit in accordance with the processing of the arithmetic unit; and a memory access control unit that, in order for the prefetch unit to write next data into the cache memory unit in accordance with the next processing of the arithmetic unit when the data written in the cache memory unit by the prefetch unit is not the data accessed by the arithmetic unit, determines whether the data to be written next is held in the distributed memory or in the other distributed memory, and makes accesses with different data transfer amounts on the memory bus and on the transfer bus.

8. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory; a prefetch unit that writes data into the cache memory unit in accordance with the processing of the arithmetic unit; and a memory access control unit that, in order for the prefetch unit to write next data into the cache memory unit in accordance with the next processing of the arithmetic unit when the data written in the cache memory unit by the prefetch unit is not the data accessed by the arithmetic unit, determines whether the data to be written next is held in the distributed memory or in the other distributed memory, and accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

9. A variable data transfer amount processor that processes data by accessing, via a memory bus, a distributed memory that holds data in a distributed manner and by accessing, via a transfer bus, at least one other distributed memory, the processor comprising: an arithmetic unit that processes data; a cache memory unit that holds part of the data held in the distributed memory and determines whether the data accessed by the arithmetic unit is held; a prefetch unit that writes data into the cache memory unit in accordance with the processing of the arithmetic unit and, when the data written in the cache memory unit is not the data accessed by the arithmetic unit, writes next data into the cache memory unit in accordance with the next processing of the arithmetic unit; a memory space range determination unit that holds address space information of all distributed memories and determines, based on the address space information, whether the data to be written next is held in the distributed memory or in the other distributed memory; and a memory access control unit that, in accordance with the determination of the memory space range determination unit, accesses the data with a first transfer amount via the memory bus if it is held in the distributed memory and with a second transfer amount via the transfer bus if it is held in the other distributed memory.

10. A distributed processing system comprising a plurality of instruction processing units, each instruction processing unit comprising the variable data transfer amount processor according to any one of claims 1 to 9 and a distributed memory that is connected to the variable data transfer amount processor by a memory bus and holds a different address area out of the whole address area, the instruction processing units being connected to one another by a transfer bus.

11. An information processing system comprising a plurality of clusters, each cluster being the distributed processing system according to claim 10, the clusters being connected to one another by inter-cluster communication control units and a communication network, wherein the information transfer amount between the clusters differs from the data transfer amount within the distributed processing system.

12. The information processing system according to claim 11, wherein the inter-cluster communication control unit sets an address, data, a protocol control signal, and a data transfer amount as the transfer bus protocol and transmits them to another cluster via the communication network.

13. The information processing system according to claim 12, wherein, when the data transfer amount is zero, access to the address space for which the transfer amount is zero is stopped.

14. An information processing system according to claim 4, wherein the cache memory has a block size divided into n sub-blocks, and the data transfer amount is (0 to n) × block size.