JP3441996B2 - Data processing device

Info

Publication number: JP3441996B2
Application number: JP08563999A
Authority: JP
Inventors: 裕明山本; 伸治尾崎; 佳人西道
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-06-07
Filing date: 1999-03-29
Publication date: 2003-09-02
Anticipated expiration: 2018-09-02
Also published as: JP2000029771A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a data processing device used in microprocessors and the like, and in particular to increasing data processing speed and reducing power consumption.

[0002]

[Description of the Related Art] In recent years, in response to demand for higher-performance data processing devices such as microprocessors, microprocessors with a configuration that executes multiple instructions simultaneously (superscalar) have been proposed. In a microprocessor with this superscalar configuration, each cycle's instruction cache access fetches multiple instructions and supplies them to multiple instruction buses. These instructions are selectively issued to multiple instruction execution units and executed. Each instruction execution unit can often process only certain types of instructions. Selective issue therefore requires determining the type of each fetched instruction and issuing it to an instruction execution unit capable of processing that type.

[0003] The configuration of each part of such a conventional data processing device is described below.

[0004] FIG. 10 shows the configuration of a conventional data processing device, and in particular the detailed configuration of its instruction fetch unit. As shown in the figure, the data processing device includes an instruction cache 230, an instruction fetch unit 200, and instruction execution units 250 and 260. In this example, the instruction execution units are a first execution unit 250, built around an integer unit 252 that executes integer arithmetic instructions, and a second execution unit 260, built around a floating-point unit 262 that executes floating-point instructions. Two instruction decoders 251 and 261 are provided to decode the instruction signals sent to the execution units 250 and 260. The instruction fetch unit 200 contains predecoders 221 and 222, which determine the type of each instruction, and two instruction selection circuits 241 and 242, which determine from the instruction type which execution unit can execute the instruction and selectively supply it. The instruction selection circuits 241 and 242 are arranged in correspondence with the instruction execution units 250 and 260. Two instruction buses Bin1 and Bin2 lead out of the instruction cache 230 to supply instructions IR1 and IR2 to the instruction selection circuits 241 and 242, and each of the instruction buses Bin1 and Bin2 is connected to both instruction selection circuits 241 and 242. The instruction buses Bin1 and Bin2 are also connected to the input sides of the predecoders 221 and 222, respectively. The output signals PD1 and PD2 of the predecoders 221 and 222 are both used as control signals for each of the instruction selection circuits 241 and 242.

[0005] FIG. 11 is a timing chart showing the state of each signal in the data processing device configured as in FIG. 10. When the instruction cache 230 supplies the instructions IR1 and IR2 (timing ta in the figure), the predecoders 221 and 222 determine the types of these instructions and send control signals PD1 and PD2 to the instruction selection circuits 241 and 242 according to those types (timing tb). Based on the instruction types, each of the instruction selection circuits 241 and 242 selects the instruction that its corresponding instruction execution unit 250 or 260 can execute, and outputs it as execution instruction I1 or I2 to the instruction decoder 251 or 261 on the input side of that execution unit (timing tc).
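The conventional fetch path of FIG. 10 and FIG. 11 can be sketched roughly as follows. This is a minimal illustration, not the patent's circuit: the opcode table, the string instruction format, and the rule "pick the first matching instruction" are assumptions made for the example.

```python
from enum import Enum


class Kind(Enum):
    INTEGER = "integer"
    FLOAT = "float"


# Hypothetical opcode table; the patent does not define an instruction encoding.
OPCODE_KIND = {
    "add": Kind.INTEGER,
    "sub": Kind.INTEGER,
    "fadd": Kind.FLOAT,
    "fmul": Kind.FLOAT,
}


def predecode(instr):
    """Predecoder 221/222: classify an instruction by type (signal PD1/PD2)."""
    return OPCODE_KIND[instr.split()[0]]


def select(unit_kind, ir1, ir2):
    """Instruction selection circuit 241/242: return the first fetched
    instruction whose type matches the execution unit, or None (assumed rule)."""
    for instr in (ir1, ir2):
        if predecode(instr) == unit_kind:
            return instr
    return None


def fetch_cycle(ir1, ir2):
    # ta: IR1/IR2 arrive; tb: predecode; tc: I1/I2 are issued.
    # Predecode and selection run back to back inside the same fetch.
    i1 = select(Kind.INTEGER, ir1, ir2)  # to first execution unit 250
    i2 = select(Kind.FLOAT, ir1, ir2)    # to second execution unit 260
    return i1, i2
```

Because `select` cannot run until `predecode` has produced its result, the two steps add up within a single fetch, which is the serial dependency the timing chart illustrates.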

[0006] To speed up instruction issue control, read accesses to the instruction cache must also be fast. Conventional data processing devices have used the following configuration. An instruction address generation unit (not shown) is normally placed on the input side of the instruction cache 230 shown in FIG. 10, and the instruction address generation unit and the instruction cache are configured to operate from the same reference clock signal so that signals are processed smoothly. The address signal generated by the instruction address generation unit is output at a precise timing synchronized with the reference clock, but wiring capacitance and similar effects along the way inevitably delay the address signal before it reaches the instruction cache 230. Conventional data processing devices therefore modify the reference clock signal on the assumption of this address signal delay, and use the result to control the address decoder precharge timing, the address signal decode timing, the bit line precharge timing of the memory array, and the latch timing of the read data.

[0007] Control circuits such as the instruction fetch unit 200 in FIG. 10 are usually designed by automatic placement and routing of cells (logic elements such as buffers and latches). A latch cell, for example, has a data signal and an enable signal as inputs. To operate it in synchronization with a clock signal, the externally supplied clock signal is buffered by a buffer cell or the like to give it a drive capability matched to the load capacitance, and the result is used as the control signal (enable signal). FIG. 12 shows the layout of a control circuit designed by the conventional placement and routing method, and FIG. 13 shows the outline flow of conventional placement and routing. As shown in FIG. 12, a block 280 subject to placement and routing contains two control signal receiving cells (for example, latch cells) 281 and 282 and two control signal generating cells (for example, buffer cells) 283 and 284. Placement and routing of such a circuit proceeds as follows.

[0008] As shown in FIG. 13, rough placement and routing is first performed in step SR1. In step SR2, the load capacitances (C1, C2) of the control signal receiving cells 281 and 282 are extracted. In step SR3, the speed is evaluated; if it does not meet the design target, the flow proceeds to step SR4, where the drive capability of the control signal generating cells 283 and 284 is adjusted, that is, they are replaced with cells of a different drive capability. Steps SR1 to SR3 are then repeated, and once the check in step SR3 finds that the speed meets the design target, the flow proceeds to step SR5 and placement and routing is complete. In the resulting circuit, the externally supplied main clock CLK is received by the control signal generating cells 283 and 284 in block 280, which supply control signals to the control signal receiving cells 281 and 282.
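The SR1 to SR5 loop of FIG. 13 can be sketched as a simple iteration. The drive-strength list, the load model, and the delay model below are toy assumptions chosen only to make the loop runnable; they are not taken from the patent.

```python
# Hypothetical drive strengths available in the cell library (arbitrary units).
DRIVE_STEPS = [1, 2, 4, 8]


def place_and_route(drive):
    """Stand-in for SR1 and SR2: return the extracted load capacitance.
    Assumed toy model: the load shifts a little whenever the buffer cell is
    swapped, because re-running placement moves the surrounding cells."""
    return 10.0 + 0.5 * drive


def delay(load, drive):
    """Stand-in for the SR3 speed evaluation: delay rises with load and
    falls with drive capability."""
    return load / drive


def conventional_flow(target_delay):
    """Return (chosen drive, number of place-and-route runs)."""
    for iteration, drive in enumerate(DRIVE_STEPS, start=1):
        load = place_and_route(drive)            # SR1, SR2
        if delay(load, drive) <= target_delay:   # SR3
            return drive, iteration              # SR5: complete
        # SR4: swap in the next stronger cell and run the flow again
    raise RuntimeError("no cell in the library meets the target")
```

With these toy numbers, `conventional_flow(3.0)` needs three full place-and-route runs before the target is met, which illustrates why each adjustment of drive capability costs a complete rerun.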

[0009]

[Problems to Be Solved by the Invention] However, the conventional data processing device described above has the following problems in each of these respects.

[0010] First, in the instruction fetch unit of FIG. 10, the instruction signals supplied from the instruction cache are predecoded and the result is used immediately for instruction selection and issue control. Both the predecoding and the selection must therefore be completed between the time an instruction is supplied from the instruction cache and the time it is issued to an instruction execution unit. As a result, completing the fetch operation takes a considerable time T (see FIG. 11), which is one factor that prevents the data processing device from operating at high speed.

[0011] Fast instruction issue control in a data processing device also requires fast instruction cache access. Normally, the same reference clock signal is supplied to the address generating means and the cache memory, and the address signal produced by the address generating means is output in synchronization with the reference clock. A delay before it reaches the cache memory, caused by wiring capacitance and similar effects, is unavoidable. The cache memory side therefore modifies the reference clock signal on the basis of an assumed address signal delay, and uses the result to control the address decoder precharge timing, the address signal decode timing, the bit line precharge timing of the memory array, and the read data latch timing.

[0012] However, the delay of the address signal on its way to the cache memory is difficult to estimate accurately. Moreover, in an integrated circuit the actual delay often departs from the predicted value because of variations in manufacturing precision, operating voltage, and operating temperature, so the estimate must include a considerable margin. This margin cannot be ignored when fast instruction cache access is required. For example, if the instruction cache is accessed at 100 MHz (a 10 ns cycle time) and the margin is assumed to be 2 ns, the margin amounts to 20% of one cycle. Compared with a memory cell read time of around 4 ns, the margin clearly takes up a considerable share of the cycle.
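The arithmetic behind the 20% figure can be checked with a short calculation. The function names are illustrative; the 100 MHz, 2 ns, and 4 ns values are the ones given in the text.

```python
def cycle_time_ns(freq_mhz):
    """Cycle time in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def share_of_cycle(duration_ns, freq_mhz):
    """Fraction of one clock cycle taken up by duration_ns."""
    return duration_ns / cycle_time_ns(freq_mhz)


cycle = cycle_time_ns(100)                # 10 ns per cycle at 100 MHz
margin_share = share_of_cycle(2.0, 100)   # 2 ns margin
read_share = share_of_cycle(4.0, 100)     # 4 ns memory cell read
```

At 100 MHz the margin is half as long as the memory cell read itself, so trimming it recovers a meaningful part of the cycle.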

[0013] When logic such as the instruction fetch unit is designed using automatic placement and routing, achieving high-speed operation requires adjusting the drive capability of the clock signals carefully enough to keep clock skew (the difference in signal arrival times) small. In placing and routing a device with the conventional configuration, however, adjusting the drive capability means swapping logic elements (cells) and re-running automatic placement and routing. Re-running placement and routing with cells of a different drive capability changes the cell placement of the circuit, which in turn changes the drive capability of the clock signal, so fine adjustment of the drive capability is difficult. In addition, automatic placement and routing has to be repeated until an optimal circuit is obtained, which increases the design effort. If buffer cells with a large drive capability are used from the start, the clock propagation time and the clock skew both shrink, but the circuit area and the power consumption increase.

[0014] In view of the second problem above, an object of the present invention is to speed up the operation of the data processing device by providing a physical configuration that takes into account the delay of the address signal reaching the cache memory and yields cache memory operation timing control signals with optimal timing.

[0015]

[Means for Solving the Problems] To achieve the above object, the present invention uses a physical configuration to make the timing of the clock signal input to the cache memory of the data processing device coincide with the timing of the address signal.

[0016] Specifically, the invention of claim 1 presupposes a data processing device having at least a cache memory. It provides address generating means for generating an address signal; clock generating means configured to generate an address-synchronous clock signal at a timing that coincides with the change timing of the address signal generated by the address generating means; and cache control means for controlling the operation timing of the cache memory using the address-synchronous clock signal generated by the clock generating means.

[0017] In the invention of claim 2, based on the invention of claim 1, the address generating means comprises an address arithmetic circuit that computes an address; an address holding circuit that holds the address signal output from the address generating means for a fixed period and outputs it; and an address selection circuit that, in response to an address hold control signal, selects either the output signal of the address arithmetic circuit or the output signal of the address holding circuit and outputs it as the address signal. The operation timing of the address holding circuit is adjusted so that the address signal is output through the address selection circuit at a timing synchronized with the reference clock signal.
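A behavioural sketch of the three parts named in claim 2 (arithmetic circuit, holding circuit, selection circuit) might look like this. The sequential-fetch arithmetic, the step of 8, and the class and method names are assumptions for illustration; the patent text here does not specify how the address is computed.

```python
class AddressGenerator:
    """Toy model of the address generating means of claim 2."""

    def __init__(self, start=0, step=8):
        self.next_addr = start  # address arithmetic circuit (assumed sequential fetch)
        self.held = start       # address holding circuit
        self.step = step        # hypothetical fetch stride

    def clock(self, hold):
        """Advance one reference clock edge.
        hold is the address hold control signal."""
        if hold:
            out = self.held          # selection circuit picks the held address
        else:
            out = self.next_addr     # selection circuit picks the computed address
            self.next_addr += self.step
        self.held = out              # holding circuit latches the output address
        return out
```

For example, driving the model with hold values `False, False, True, True, False` repeats the second address for as long as hold is asserted and then resumes the sequence.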

[0018] In the invention of claim 3, based on the invention of claim 1 or 2, the clock generating means is an AND circuit that receives the reference clock signal, the address hold control signal, and a cache operation request signal, and generates the address-synchronous clock signal from the reference clock signal when the cache operation request signal requests operation of the cache memory and the address hold control signal does not request that the address be held.
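The gating condition of claim 3 reduces to a single Boolean expression. The sketch below assumes active-high signals represented as Python booleans; the function and parameter names are illustrative.

```python
def address_sync_clock(clk, cache_request, address_hold):
    """AND circuit of claim 3: pass the reference clock only while a cache
    operation is requested and no address hold is requested (active-high
    polarity assumed)."""
    return clk and cache_request and not address_hold
```

The output follows `clk` in exactly one of the four request/hold combinations (request asserted, hold deasserted) and stays low otherwise, so the cache sees no clock edges while the address is being held.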

[0019] In the invention of claim 4, based on the invention of claim 1, 2, or 3, the cache memory comprises a memory array and a delay circuit that generates a delayed clock signal by delaying the address-synchronous clock signal by a predetermined time, and the cache control means controls the operation timing of the memory array using the delayed clock signal.

[0020] In the invention of claim 5, based on the invention of claim 4, the cache memory comprises a data holding unit that holds the read data output from the memory array, and the cache control means uses the delayed clock signal to control the timing at which the data holding unit latches the read data.

[0021] In the invention of claim 6, based on the invention of claim 1, 2, 3, 4, or 5, the clock generating means and the address generating means are configured so that the address signal and the address-synchronous clock signal are supplied to the cache memory over wiring paths that run parallel to each other.

[0022] In the invention of claim 7, based on the invention of claim 6, the clock generating means and the address generating means are configured so that their output capabilities for the signals they transmit are equal to each other.

[0023] In the invention of claim 8, based on the invention of claim 6 or 7, the clock generating means and the address generating means are configured so that the address signal and the address-synchronous clock signal are supplied to the cache memory over paths formed in the same wiring layer.

[0024] With the above configurations, the inventions of the respective claims provide the following effects.

[0025] According to the invention of claim 1, the clock generating means outputs an address-synchronous clock signal whose timing coincides with the change timing of the address signal, and this clock signal is supplied to the cache memory. Time can therefore be allocated optimally to the operation of each part of the cache memory, wasted time is eliminated, and the operating cycle time of the cache memory as a whole can be minimized.

[0026] According to the invention of claim 2, the address generating means computes and holds addresses, and the address hold control signal selects either the computed address or the address held in the address holding circuit for output as the address signal. This address signal is output through the address selection circuit at a timing synchronized with the reference clock signal. The effect of the invention of claim 1 is therefore reliably obtained.
Then, this address signal is output at a timing synchronized with the reference clock signal via the address selection circuit. Therefore, the operation of the invention of claim 1 is surely obtained.

[0027] According to the invention of claim 3, when a cache operation is requested and no address hold is requested, the clock generating means outputs the address-synchronous clock signal at a timing synchronized with the reference clock signal, and the operation timing of the cache memory is controlled according to this address-synchronous clock signal. The timing of the address signal and that of the address-synchronous clock signal therefore coincide without causing the cache memory to malfunction.

[0028] According to the invention of claim 4, the delay circuit in the cache memory adjusts the timing of the address-synchronous clock signal to produce a delayed clock signal, and the operation of the memory array is controlled according to this delayed clock signal. The cache memory is therefore operated at an appropriate timing that allows for the time needed for operations such as decoding the address signal input to the cache memory.

[0029] According to the invention of claim 5, in addition to the effect of claim 4, the delayed clock signal controls the timing at which read data from the memory array is latched, so the timing at which the cache memory outputs instruction signals can be optimized simply by adjusting the delay time.

[0030] According to the invention of claim 6, the address signal generated by the address generating means and the address-synchronous clock signal generated by the clock generating means are supplied over wiring paths that run parallel to each other, so the delays of the address signal and the address-synchronous clock signal are physically matched.
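One way to see why matched delays matter is to compare the timing budget of a cache access in the two schemes. The model below is a deliberately simple sketch: the 2 ns decode time and the 1.5 ns wire delays are hypothetical values, while the 2 ns margin and 4 ns read time are the figures from paragraph [0012].

```python
def access_budget_ns(decode_ns, read_ns, timing_uncertainty_ns):
    """Time budget for one cache access: address decode, memory cell read,
    and the timing uncertainty that must be covered between address and clock."""
    return decode_ns + read_ns + timing_uncertainty_ns


def path_skew_ns(address_delay_ns, clock_delay_ns):
    """Residual mismatch between the address path and the clock path."""
    return abs(address_delay_ns - clock_delay_ns)


# Conventional scheme: the cache clock is derived from the reference clock,
# so a fixed worst-case margin covers the unknown address delay.
conventional = access_budget_ns(decode_ns=2.0, read_ns=4.0, timing_uncertainty_ns=2.0)

# Parallel routing: both signals see nearly the same wire delay, so only the
# residual skew between the two paths has to be covered.
matched = access_budget_ns(2.0, 4.0, path_skew_ns(1.5, 1.5))
```

In this toy model the absolute wire delay drops out entirely; only the difference between the two paths enters the budget, which is the quantity that parallel routing keeps small.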

[0031] According to the invention of claim 7, in addition to the effect of claim 6, the output capabilities of the clock generating means and the address generating means are equal, so the timings of the address signal and the address-synchronous clock signal are matched more accurately.

[0032] According to the invention of claim 8, the address signal generated by the address generating means and the address-synchronous clock signal generated by the clock generating means are supplied to the cache memory over wiring paths formed in the same wiring layer, which makes it easy to physically adjust the delays of the two signals to be equal.

[0033]

[Embodiments of the Invention] Embodiments of the data processing device of the present invention are described below with reference to the drawings.

[0034] As shown in FIG. 1, the data processing device of the present invention includes an instruction cache operation clock generation unit 10, an instruction address generation unit 20, an instruction cache 30, an instruction fetch unit 100 connected to the instruction cache 30 via instruction buses Bin1 and Bin2, and first and second instruction execution units 50 and 60 connected to the instruction fetch unit 100 via instruction issue buses Bout1 and Bout2, respectively. The instruction cache is connected to an instruction bus (not shown), and the first and second instruction execution units 50 and 60 are connected to a register file (not shown). A data cache (not shown), a data address generation unit, and a data cache operation clock generation unit are in turn connected to the register file.

[0035] The instruction cache operation clock generation unit 10, the instruction address generation unit 20, the instruction fetch unit 100, and the first and second instruction execution units 50 and 60 are all configured to operate from the external clock signal CLK, whereas the instruction cache 30 is configured to operate from the address-synchronous clock signal S10 generated by the instruction cache operation clock generation unit 10. The first and second instruction execution units 50 and 60 are configured to execute different types of instructions. When the address signal S22 from the instruction address generation unit 20 is input to the instruction cache 30, the instruction cache 30 reads two instructions in one cycle and outputs the instructions IR1 and IR2 at that address onto the instruction buses Bin1 and Bin2. When the instructions IR1 and IR2 reach the instruction fetch unit 100 over the instruction buses Bin1 and Bin2, the instruction fetch unit 100 sorts them by the type that each of the instruction execution units 50 and 60 can handle and sends them to the instruction execution units 50 and 60 over the instruction issue buses Bout1 and Bout2.

[0036] As shown in FIG. 2, the instruction fetch unit 100 comprises: predecoders 121 and 122, which decode the instructions IR1 and IR2 input from the instruction buses Bin1 and Bin2; an instruction queue 123, which temporarily holds the outputs of the predecoders 121 and 122 together with the signals on the instruction buses Bin1 and Bin2 taken before they pass through the predecoders 121 and 122, and outputs the earliest-input signals first; three-state buffers 141 and 142, inserted in the instruction buses Bin1 and Bin2 on the input side of the predecoders 121 and 122; an instruction fetch control circuit 143, which detects which instructions have been input to the instruction execution units 50 and 60 and controls the issue of subsequent instructions; and instruction selection circuits 151 and 152, each having three input terminals, two control terminals, and one output terminal, which select one of the input instructions and output it to the instruction execution unit 50 or 60. First and second standby instruction buses Bwt1 and Bwt2 are connected to the output terminals of the instruction queue 123, and both standby instruction buses Bwt1 and Bwt2 are connected to input terminals of each of the instruction selection circuits 151 and 152. Standby instruction decode signal lines Bdc1 and Bdc2, which carry the standby instruction decode signals, run from the output side of the predecoders 121 and 122 through the instruction queue 123 and are connected to the control terminals of the instruction selection circuits 151 and 152. In other words, the standby instruction decode signals are first held in the instruction queue 123 and then output to the instruction selection circuits 151 and 152 at the next timing. Within the instruction fetch unit 100, the predecoders 121 and 122 and the instruction queue 123 constitute an instruction standby unit 120, the instruction selection circuits 151 and 152 constitute an instruction selection unit 150, and the three-state buffers 141 and 142 and the instruction fetch control circuit 143 constitute control means 140.

【００３７】また、上記第１命令実行部５０には、第１
命令デコーダ５１と、ラッチ５３と、浮動小数点命令を
処理することができる浮動小数点ユニット５２とが配置
されている。また、上記第２命令実行部６０には、第２
命令デコーダ６１と、ラッチ６３と、整数演算命令を処
理することができる整数ユニット６２とが配置されてい
る。そして、上記第１，第２命令選択回路１５１，１５
２の出力端子と上記第１，第２命令デコーダ５１，６１
の入力端子とは、それぞれ第１，第２命令発行バスＢou
t1，Ｂout2を介して接続されている。ラッチ５３，６３
は、パイプラインのＬ（ロード）ステージとＥ（実行）
ステージとを切り分けるものである。
The first instruction execution unit 50 contains a first instruction decoder 51, a latch 53, and a floating-point unit 52 capable of processing floating-point instructions. The second instruction execution unit 60 contains a second instruction decoder 61, a latch 63, and an integer unit 62 capable of processing integer operation instructions. The output terminals of the first and second instruction selection circuits 151 and 152 are connected to the input terminals of the first and second instruction decoders 51 and 61 via the first and second instruction issuing buses Bout1 and Bout2, respectively. The latches 53 and 63 separate the L (load) stage of the pipeline from the E (execute) stage.

【００３８】以上のように構成されたデータ処理装置の
各要素の機能を説明する。上記命令キャッシュ３０は、
１サイクルに２命令の読み出しを行い各命令バスＢin1
，Ｂin2 に命令ＩＲ１，ＩＲ２を出力する。そして、
この命令ＩＲ１，ＩＲ２は各命令バスＢin1 ，Ｂin2 か
らプリデコーダ１２１，１２２及び命令キュー１２３に
入力される。一方の命令ＩＲ１は、第１，第２命令選択
回路１５１，１５２にも入力される。プリデコーダ１２
１，１２２はそれぞれ命令ＩＲ１，ＩＲ２を入力し、供
給される命令の種類（整数演算命令／浮動小数点演算命
令）を判別して待機命令デコード信号ＰＤ１，ＰＤ２を
命令キュー１２３に出力する。命令キュー１２３は、複
数のエントリをもつＦＩＦＯ（先入れ先出し）メモリ回
路を備えており、各エントリには命令と対応する待機命
令デコード信号を記憶することができ、先に書き込んだ
エントリから順次読み出されるように制御される。命令
キュー１２３は、１サイクルに２つの命令ＩＲ１，ＩＲ
２及び対応する待機命令デコード信号ＰＤ１，ＰＤ２を
連続する２つのエントリに書き込むことが可能に構成さ
れており、命令フェッチ制御回路１４３によって、これ
らのうち未実行の命令及び対応する待機命令デコード信
号のみが書き込まれるように制御される。また、先に書
き込まれた連続する２つのエントリの命令は、それぞれ
第１，第２待機命令バスＢwt1 ，Ｂwt2 を介し待機命令
Ｒ１，Ｒ２として各命令選択回路１５１，１５２の入力
端子に供給され、これに対応する待機命令デコード信号
ＱＤ１，ＱＤ２は待機命令デコード信号線Ｂdc1 ，Ｂdc
2 を介し第１，第２命令選択回路１５１，１５２の制御
端子及び命令フェッチ制御回路１４３に供給される。第
１，第２命令選択回路１５１，１５２は、第１命令バス
Ｂin1 と第１及び第２待機命令バスＢwt1 ，Ｂwt2 とか
ら入力される３つの信号のうちから、制御端子への待機
命令デコード信号ＱＤ１，ＱＤ２に応じて１つの信号を
選択し、それぞれ第１，第２命令発行バスＢout1，Ｂou
t2に出力する。
The function of each element of the data processing device configured as described above will now be described. The instruction cache 30 reads two instructions per cycle and outputs the instructions IR1 and IR2 to the instruction buses Bin1 and Bin2. The instructions IR1 and IR2 are input from the instruction buses Bin1 and Bin2 to the predecoders 121 and 122 and to the instruction queue 123. The instruction IR1 is also input to the first and second instruction selection circuits 151 and 152. The predecoders 121 and 122 receive the instructions IR1 and IR2, respectively, determine the type of each supplied instruction (integer operation instruction or floating-point operation instruction), and output the standby instruction decode signals PD1 and PD2 to the instruction queue 123. The instruction queue 123 includes a FIFO (first-in, first-out) memory circuit with a plurality of entries. Each entry can store an instruction together with its corresponding standby instruction decode signal, and the entries are read out in the order in which they were written. The instruction queue 123 can write the two instructions IR1 and IR2 and the corresponding standby instruction decode signals PD1 and PD2 into two consecutive entries in one cycle, and the instruction fetch control circuit 143 controls it so that only the unexecuted instructions and their standby instruction decode signals are written. The instructions in the two consecutive entries written earliest are supplied as the standby instructions R1 and R2 to the input terminals of the instruction selection circuits 151 and 152 via the first and second standby instruction buses Bwt1 and Bwt2, respectively, and the corresponding standby instruction decode signals QD1 and QD2 are supplied via the standby instruction decode signal lines Bdc1 and Bdc2 to the control terminals of the first and second instruction selection circuits 151 and 152 and to the instruction fetch control circuit 143. Each of the first and second instruction selection circuits 151 and 152 selects one of the three signals input from the first instruction bus Bin1 and the first and second standby instruction buses Bwt1 and Bwt2, according to the standby instruction decode signals QD1 and QD2 applied to its control terminals, and outputs it to the first or second instruction issuing bus Bout1 or Bout2, respectively.

【００３９】第１命令選択回路１５１は、待機命令Ｒ１
が浮動小数点演算命令であることを待機命令デコード信
号ＱＤ１が示している場合には待機命令Ｒ１を、待機命
令Ｒ１が整数演算命令であることを待機命令デコード信
号ＱＤ１が示しかつ待機命令Ｒ２が浮動小数点演算命令
であることを待機命令デコード信号ＱＤ２が示している
場合には待機命令Ｒ２を、その他の場合には第１命令バ
スＢin1 から入力された命令ＩＲ１をそれぞれ選択す
る。このようにして第１命令選択回路１５１で選択され
た命令は、発行命令Ｉ１として、第１命令発行バスＢou
t1を介し第１命令実行部５０に入力される。第２命令選
択回路１５２は、待機命令Ｒ１が整数演算命令であるこ
とを待機命令デコード信号ＱＤ１が示している場合には
待機命令Ｒ１を、待機命令Ｒ１が浮動小数点演算命令で
あることを待機命令デコード信号ＱＤ１が示しかつ待機
命令Ｒ２が整数演算命令であることを待機命令デコード
信号ＱＤ２が示している場合には待機命令Ｒ２を、その
他の場合には第１命令バスＢin1 から入力された命令Ｉ
Ｒ１をそれぞれ選択する。このようにして第２命令選択
回路１５２で選択された命令は、発行命令Ｉ２として、
第２命令発行バスＢout2を介し第２命令実行部６０に入
力される。
The first instruction selection circuit 151 selects the standby instruction R1 when the standby instruction decode signal QD1 indicates that R1 is a floating-point operation instruction; it selects the standby instruction R2 when QD1 indicates that R1 is an integer operation instruction and the standby instruction decode signal QD2 indicates that R2 is a floating-point operation instruction; in all other cases it selects the instruction IR1 input from the first instruction bus Bin1. The instruction selected in this way by the first instruction selection circuit 151 is input to the first instruction execution unit 50 as the issued instruction I1 via the first instruction issuing bus Bout1. The second instruction selection circuit 152 selects the standby instruction R1 when QD1 indicates that R1 is an integer operation instruction; it selects the standby instruction R2 when QD1 indicates that R1 is a floating-point operation instruction and QD2 indicates that R2 is an integer operation instruction; in all other cases it selects the instruction IR1 input from the first instruction bus Bin1. The instruction selected in this way by the second instruction selection circuit 152 is input to the second instruction execution unit 60 as the issued instruction I2 via the second instruction issuing bus Bout2.
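
The selection rules of this paragraph can be written out as a small truth-table style sketch. The function names, the string encoding of the decode signals, and the use of `None` for an empty queue entry are illustrative assumptions and do not appear in the patent.

```python
# Illustrative encoding of the standby instruction decode signals QD1/QD2.
# None stands for "no valid standby instruction" (an assumption of this sketch).
FP, INT = "fp", "int"

def select_151(qd1, qd2, r1, r2, ir1):
    """Sketch of the first instruction selection circuit 151 (floating-point side)."""
    if qd1 == FP:                    # R1 is a floating-point operation instruction
        return r1
    if qd1 == INT and qd2 == FP:     # R1 is integer, R2 is floating-point
        return r2
    return ir1                       # all other cases: take IR1 from Bin1

def select_152(qd1, qd2, r1, r2, ir1):
    """Sketch of the second instruction selection circuit 152 (integer side)."""
    if qd1 == INT:                   # R1 is an integer operation instruction
        return r1
    if qd1 == FP and qd2 == INT:     # R1 is floating-point, R2 is integer
        return r2
    return ir1                       # all other cases: take IR1 from Bin1
```

Note that both circuits fall back to IR1 from the first instruction bus Bin1, so with an empty queue the same instruction appears on both issuing buses and only the execution unit of the matching type acts on it.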

【００４０】次に、上記データ処理装置における具体的
な動作について、図３のタイミングチャートを参照しな
がら説明する。最初に、命令キュー１２３の全エントリ
が空の状態で第１，第２命令バスＢin1 ，Ｂin2 にそれ
ぞれ命令ＩＲ１，ＩＲ２が供給される場合（例えば、Ｉ
Ｒ１が整数演算命令、ＩＲ２が浮動小数点演算命令であ
る場合）についてみる。最初のクロックサイクルＰe1で
は、タイミングｔ1 で第１，第２命令バスＢin1 ，Ｂin
2 にそれぞれ命令ＩＲ１，ＩＲ２が供給されると、第
１，第２待機命令バスＢwt1 ，Ｂwt2 には命令が供給さ
れないことから、第１，第２命令選択回路１５１，１５
２ではともに第１命令バスＢin1 の命令ＩＲ１（整数演
算命令）が選択され、命令発行バスＢout1，Ｂout2に出
力される。この命令は第２命令実行部６０では実行され
るが、第１命令実行部５０では適合しないので無視され
ることになる。したがって、このサイクルＰe1では、第
２命令選択回路１５２の発行命令Ｉ２すなわち整数演算
命令ＩＲ１のみが実行されることになる（同図のタイミ
ングｔ2 ）。命令フェッチ制御回路１４３は、第１，第
２命令デコーダ５１，６１のデコード結果に応じて、各
スリーステートバッファ１４１，１４２及び命令キュー
１２３を制御する。この結果、未実行の第２命令バスＢ
in2 の浮動小数点演算命令ＩＲ２は命令キュー１２３に
書き込まれる。また、プリデコーダ１２２は該命令ＩＲ
２が浮動小数点演算命令であることを示す待機命令デコ
ード信号ＰＤ２を生成し、該待機命令デコード信号ＰＤ
２もまた命令キュー１２３に書き込まれる（同図のタイ
ミングｔ3 ）。
Next, the specific operation of the data processing device will be described with reference to the timing chart of FIG. 3. First, consider the case where all entries of the instruction queue 123 are empty and the instructions IR1 and IR2 are supplied to the first and second instruction buses Bin1 and Bin2, respectively (for example, IR1 is an integer operation instruction and IR2 is a floating-point operation instruction). In the first clock cycle Pe1, when the instructions IR1 and IR2 are supplied to the instruction buses Bin1 and Bin2 at timing t1, no instructions are present on the first and second standby instruction buses Bwt1 and Bwt2, so both the first and second instruction selection circuits 151 and 152 select the instruction IR1 (an integer operation instruction) on the first instruction bus Bin1 and output it to the instruction issuing buses Bout1 and Bout2. This instruction is executed by the second instruction execution unit 60 but is ignored by the first instruction execution unit 50, which cannot process it. Therefore, in cycle Pe1, only the issued instruction I2 of the second instruction selection circuit 152, that is, the integer operation instruction IR1, is executed (timing t2 in the figure). The instruction fetch control circuit 143 controls the three-state buffers 141 and 142 and the instruction queue 123 according to the decoding results of the first and second instruction decoders 51 and 61. As a result, the unexecuted floating-point operation instruction IR2 on the second instruction bus Bin2 is written into the instruction queue 123. The predecoder 122 also generates the standby instruction decode signal PD2 indicating that the instruction IR2 is a floating-point operation instruction, and this signal PD2 is likewise written into the instruction queue 123 (timing t3 in the figure).

【００４１】そして、次のクロックサイクルＰe2で、命
令キュー１２３から、待機中の浮動小数点演算命令ＩＲ
２が待機命令Ｒ１として、またそのデコード信号ＰＤ２
が待機命令デコード信号ＱＤ１としてそれぞれ出力され
る（タイミングｔ4 ）。続いて、このサイクルで、第
１，第２命令バスＢin1 ，Ｂin2 にそれぞれ新たな命令
ＩＲ１，ＩＲ２が供給される（例えば、命令ＩＲ１及び
ＩＲ２がいずれも整数演算命令）（同図のタイミングｔ
5 ）。そして、第１命令選択回路１５１では第１待機命
令バスＢwt1 の浮動小数点演算命令Ｒ１が選択されて発
行命令Ｉ１として第１命令実行部５０に出力される一
方、第２命令選択回路１５２では第１命令バスＢin1 の
整数演算命令ＩＲ１が選択されて発行命令Ｉ２として第
２命令実行部６０に出力される。これらの発行命令Ｉ
１，Ｉ２はそれぞれ第１，第２実行命令部５０，６０で
実行される。したがって、このサイクルＰe2では２命令
が並列して実行される（タイミングｔ6 ）。この際、タ
イミングｔ5 とタイミングｔ6 との間の時間Ｔのうちに
第１，第２命令バスＢin1 ，Ｂin2 の命令ＩＲ１，ＩＲ
２をプリデコードするわけではないので、時間Ｔが従来
より短縮される。一方、命令実行部５０，６０に入力さ
れなかった整数演算命令ＩＲ２は、命令フェッチ制御回
路１４３の制御により命令キュー１２３に書き込まれ
る。また、プリデコーダ１２２は該命令ＩＲ２が整数演
算命令であることを示す待機命令デコード信号ＰＤ２を
生成し、該待機命令デコード信号ＰＤ２もまた命令キュ
ー１２３に書き込まれる（タイミングｔ7 ）。
Then, in the next clock cycle Pe2, the instruction queue 123 outputs the waiting floating-point operation instruction IR2 as the standby instruction R1 and its decode signal PD2 as the standby instruction decode signal QD1 (timing t4). Next, in this cycle, new instructions IR1 and IR2 are supplied to the first and second instruction buses Bin1 and Bin2 (for example, both IR1 and IR2 are integer operation instructions) (timing t5 in the figure). The first instruction selection circuit 151 selects the floating-point operation instruction R1 on the first standby instruction bus Bwt1 and outputs it to the first instruction execution unit 50 as the issued instruction I1, while the second instruction selection circuit 152 selects the integer operation instruction IR1 on the first instruction bus Bin1 and outputs it to the second instruction execution unit 60 as the issued instruction I2. The issued instructions I1 and I2 are executed by the first and second instruction execution units 50 and 60, respectively. Thus, two instructions are executed in parallel in cycle Pe2 (timing t6). Because the instructions IR1 and IR2 on the instruction buses Bin1 and Bin2 do not have to be predecoded within the time T between timing t5 and timing t6, the time T is shorter than in the conventional device. Meanwhile, the integer operation instruction IR2, which was not input to the instruction execution units 50 and 60, is written into the instruction queue 123 under the control of the instruction fetch control circuit 143. The predecoder 122 also generates the standby instruction decode signal PD2 indicating that the instruction IR2 is an integer operation instruction, and this signal PD2 is likewise written into the instruction queue 123 (timing t7).

【００４２】さらに、次のクロックサイクルＰe3におい
て、命令キュー１２３から、待機中の整数演算命令ＩＲ
２が待機命令Ｒ１として、またそのデコード信号ＰＤ２
が待機命令デコード信号ＱＤ１としてそれぞれ出力され
る（タイミングｔ8 ）。続いて、このサイクルで、第
１，第２命令バスＢin1 ，Ｂin2 にそれぞれ新たな命令
ＩＲ１，ＩＲ２が供給される（例えば、命令ＩＲ１及び
ＩＲ２がいずれも浮動小数点演算命令）（同図のタイミ
ングｔ9 ）。そして、第１命令選択回路１５１では第１
命令バスＢin1 の浮動小数点演算命令ＩＲ１が、第２命
令選択回路１５２では第１待機命令バスＢwt1 の整数演
算命令Ｒ１がそれぞれ選択される（タイミングｔ10）。
一方、命令実行部５０，６０に入力されなかった浮動小
数点演算命令命令ＩＲ２とそのデコード信号ＰＤ２と
は、命令フェッチ制御回路１４３の制御により命令キュ
ー１２３に書き込まれる。
Further, in the next clock cycle Pe3, the instruction queue 123 outputs the waiting integer operation instruction IR2 as the standby instruction R1 and its decode signal PD2 as the standby instruction decode signal QD1 (timing t8). Next, in this cycle, new instructions IR1 and IR2 are supplied to the first and second instruction buses Bin1 and Bin2 (for example, both IR1 and IR2 are floating-point operation instructions) (timing t9 in the figure). The first instruction selection circuit 151 selects the floating-point operation instruction IR1 on the first instruction bus Bin1, and the second instruction selection circuit 152 selects the integer operation instruction R1 on the first standby instruction bus Bwt1 (timing t10). Meanwhile, the floating-point operation instruction IR2, which was not input to the instruction execution units 50 and 60, and its decode signal PD2 are written into the instruction queue 123 under the control of the instruction fetch control circuit 143.
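
The three cycles Pe1 to Pe3 described above can be replayed with a small behavioural model. This is only a sketch under simplifying assumptions that are not taken from the patent: each instruction is a `(name, type)` tuple with a unique hypothetical name, an issued instruction executes exactly when its type matches the unit, executed entries leave the queue immediately, and every fetched instruction that was not executed is appended to the queue.

```python
from collections import deque

FP, INT = "fp", "int"

def select(unit_type, r1, r2, ir1):
    """Selector feeding a unit of the given type (151 for FP, 152 for INT)."""
    other = INT if unit_type == FP else FP
    if r1 is not None and r1[1] == unit_type:
        return r1
    if r1 is not None and r2 is not None and r1[1] == other and r2[1] == unit_type:
        return r2
    return ir1

def step(queue, ir1, ir2):
    """Simulate one clock cycle; returns the names of the executed instructions."""
    r1 = queue[0] if len(queue) > 0 else None   # standby instruction R1
    r2 = queue[1] if len(queue) > 1 else None   # standby instruction R2
    issued = {FP: select(FP, r1, r2, ir1), INT: select(INT, r1, r2, ir1)}
    # A unit ignores an issued instruction of the wrong type.
    executed = [ins for unit, ins in issued.items() if ins[1] == unit]
    for ins in executed:
        if ins in queue:
            queue.remove(ins)                    # executed standby entry leaves the queue
    for ins in (ir1, ir2):
        if ins not in executed:
            queue.append(ins)                    # unexecuted fetched instruction waits
    return [name for name, _ in executed]

queue = deque()
log = [
    step(queue, ("a", INT), ("b", FP)),    # Pe1: only the integer instruction runs
    step(queue, ("c", INT), ("d", INT)),   # Pe2: queued FP and fetched INT run together
    step(queue, ("e", FP), ("f", FP)),     # Pe3: fetched FP and queued INT run together
]
```

In this model one instruction executes in Pe1 and two execute in each of Pe2 and Pe3, which matches the sequence in the text; the last fetched floating-point instruction remains in the queue.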

【００４３】このように、本実施例のデータ処理装置で
は、命令キュー１２３に命令があるサイクルにおいて
は、命令の組み合わせに応じて２命令を並列に実行する
ことが可能である。本実施例においては、命令実行数は
最大２であって命令供給数２を上回ることはありえない
ので、命令供給が連続的に行われる限り、常に１命令以
上の命令が命令キュー１２３に記憶されている。したが
って、命令の組み合わせが許せば、常に２つの命令を並
列に実行することが可能である。その場合、従来の例で
は命令選択回路で命令の選択／発行の制御をするために
命令バスＢin1 ，Ｂin2 の内容をデコードしたものを用
いていたのに対し、本実施例では待機命令デコード信号
ＱＤ１，ＱＤ２を用いる。命令キュー１２３の読み出し
時間は命令キャッシュ３０に比較して短いため、待機命
令バスＢwt1 ，Ｂwt2 の命令は命令バスＢin1 ，Ｂin2
に比較して早いタイミングで確定する。また、従来の例
では、命令バスＢin1 ，Ｂin2 からの読み出し・プリデ
コード・発行命令選択という一連の動作を１サイクルで
実行する必要があったが、本実施例では、命令バスＢin
1 からの読み出し，命令キュー１２３からの読み出し・
発行命令選択だけを１サイクルで行えばよい。したがっ
て、命令バスＢin1 ，Ｂin2 の命令ＩＲ１，ＩＲ２の種
類を判断して命令の発行／制御を行う構成に比較して、
命令の選択／発行を高速で行うことが可能になり、ひい
てはデータ処理装置の高速動作を実現できる。
As described above, in the data processing device of this embodiment, in any cycle in which the instruction queue 123 holds an instruction, two instructions can be executed in parallel, depending on the combination of instructions. In this embodiment, the number of instructions executed per cycle is at most two and can never exceed the number of instructions supplied, which is two, so as long as instructions are supplied continuously, at least one instruction is always stored in the instruction queue 123. Therefore, whenever the combination of instructions permits, two instructions can always be executed in parallel. In the conventional example, the instruction selection circuit controlled the selection and issue of instructions using the decoded contents of the instruction buses Bin1 and Bin2, whereas this embodiment uses the standby instruction decode signals QD1 and QD2. Since the read time of the instruction queue 123 is shorter than that of the instruction cache 30, the instructions on the standby instruction buses Bwt1 and Bwt2 are settled earlier than those on the instruction buses Bin1 and Bin2. Moreover, the conventional example had to perform the whole series of operations (reading from the instruction buses Bin1 and Bin2, predecoding, and selecting the instruction to issue) within one cycle, whereas in this embodiment only reading from the instruction bus Bin1, reading from the instruction queue 123, and selecting the instruction to issue need to be performed within one cycle. Therefore, compared with a configuration that determines the types of the instructions IR1 and IR2 on the instruction buses Bin1 and Bin2 in order to control instruction issue, instructions can be selected and issued at high speed, and consequently high-speed operation of the data processing device can be achieved.
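
The timing argument of this paragraph can be summarized symbolically. The following is a first-order sketch with assumed notation that does not appear in the patent: $t_{\mathrm{cache}}$ is the read time of the instruction cache 30, $t_{q}$ the read time of the instruction queue 123, $t_{\mathrm{pd}}$ the predecode time, and $t_{\mathrm{sel}}$ the selection time. It assumes that cache and queue reads proceed in parallel and that wiring delays are ignored.

```latex
\begin{aligned}
T_{\mathrm{conv}} &= t_{\mathrm{cache}} + t_{\mathrm{pd}} + t_{\mathrm{sel}}, \\
T_{\mathrm{emb}}  &= \max(t_{\mathrm{cache}},\, t_{q}) + t_{\mathrm{sel}}
                   = t_{\mathrm{cache}} + t_{\mathrm{sel}}
                   \quad \text{since } t_{q} < t_{\mathrm{cache}}, \\
T_{\mathrm{conv}} - T_{\mathrm{emb}} &= t_{\mathrm{pd}} > 0 .
\end{aligned}
```

Under these assumptions, the saving per cycle is the predecode time, which has been moved off the path between instruction fetch and issue.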

【００４４】なお、本実施例では、命令キャッシュ３０
は１クロックに２命令を２本の命令バスに供給する構成
となっているが、命令バスを１本もしくは３本以上とし
て、各々に命令を供給するようにしても同様の動作を実
現することが可能である。
In this embodiment, the instruction cache 30 supplies two instructions per clock to two instruction buses, but the same operation can be achieved with one instruction bus, or with three or more, with an instruction supplied to each bus.

【００４５】また、上記実施例の命令フェッチ部１００
では、命令キュー１２３が２本の待機命令バス及び待機
命令デコード信号線に、記憶した命令及び信号を出力す
る構成となっているが、待機命令バス及び待機命令デコ
ード信号線を各々１本又は３本以上設け、各々に命令キ
ューが記憶した命令及び信号を出力する構成としてもよ
い。
Also, in the instruction fetch unit 100 of the above embodiment, the instruction queue 123 outputs the stored instructions and signals to two standby instruction buses and two standby instruction decode signal lines. However, one, or three or more, standby instruction buses and standby instruction decode signal lines may be provided, with the instruction queue outputting the stored instructions and signals to each of them.

【００４６】図４は、図２の変形例を示し、命令バスＢ
in1 ，Ｂin2 において、スリーステートバッファ１４
１，１４２の後方に、まず命令キュー１２３を配置し、
次にプリデコーダ１２１，１２２を配置した例である。
この場合、待機命令デコード信号線Ｂdc1 ，Ｂdc2 はプ
リデコーダ１２１，１２２の出力端子に接続され、待機
命令バスＢwt1 ，Ｂwt2 は命令キュー１２３の出力端子
に接続されている。図４の構成でも、図２の場合と同じ
効果を得ることができる。
FIG. 4 shows a modification of FIG. 2 in which, on the instruction buses Bin1 and Bin2, the instruction queue 123 is placed first, downstream of the three-state buffers 141 and 142, followed by the predecoders 121 and 122. In this case, the standby instruction decode signal lines Bdc1 and Bdc2 are connected to the output terminals of the predecoders 121 and 122, and the standby instruction buses Bwt1 and Bwt2 are connected to the output terminals of the instruction queue 123. The configuration of FIG. 4 provides the same effect as that of FIG. 2.

【００４７】図５は、図１の命令キャッシュ動作クロッ
ク生成部１０、命令アドレス生成部２０及び命令キャッ
シュ３０の詳細な構成を示す電気回路図である。同図に
示すように、上記命令キャッシュ動作クロック生成部１
０は、基準クロック信号ＣＬＫと、アドレス保持制御信
号Ｓakc の反転信号と、キャッシュ動作要求信号Ｓcar
との論理積を演算する論理積回路１１を配置してなる。
また、命令アドレス生成部２０には、アドレス演算回路
２１と、セレクタ及びフリップフロップからなるアドレ
ス選択回路２２と、フリップフロップを配置してなるア
ドレス保持回路２３とが設けられている。また、命令キ
ャッシュ３０には、信号遅延回路３１と、メモリアレイ
３２と、ラッチ３３とが設けられている。信号遅延回路
３１は、例えば複数のダミー・ゲート容量で構成され
る。ラッチ３３は、パイプラインのＦ（フェッチ）ステ
ージとＬ（ロード）ステージとを切り分けるものであ
る。
FIG. 5 is an electric circuit diagram showing the detailed configuration of the instruction cache operation clock generation unit 10, the instruction address generation unit 20, and the instruction cache 30 of FIG. 1. As shown in the figure, the instruction cache operation clock generation unit 10 includes an AND circuit 11 that computes the logical product of the reference clock signal CLK, the inverted signal of the address holding control signal Sakc, and the cache operation request signal Scar. The instruction address generation unit 20 includes an address calculation circuit 21, an address selection circuit 22 consisting of a selector and a flip-flop, and an address holding circuit 23 comprising a flip-flop. The instruction cache 30 includes a signal delay circuit 31, a memory array 32, and a latch 33. The signal delay circuit 31 is composed of, for example, a plurality of dummy gate capacitances. The latch 33 separates the F (fetch) stage of the pipeline from the L (load) stage.

【００４８】アドレス演算回路２１は、入力データＤin
1 ，Ｄin2 （例えば、プログラムカウンタの値と、ある
レジスタの値）を取り込み、アドレス演算結果信号Ｓ21
を出力する。このアドレス演算結果信号Ｓ21は、アドレ
ス選択回路２２の第１データ入力となる。アドレス保持
回路２３は、アドレス選択回路２２から出力されるアド
レス信号Ｓ22を入力とし、保持アドレス信号Ｓ23を出力
する。この保持アドレス信号Ｓ23はアドレス選択回路２
２の第２データ入力となる。一方、アドレス保持制御信
号Ｓakc は、アドレス選択回路２２にも入力され、基準
クロック信号ＣＬＫは、アドレス選択回路２２とアドレ
ス保持回路２３にも入力される。
The address calculation circuit 21 takes in the input data Din1 and Din2 (for example, the value of the program counter and the value of a register) and outputs the address calculation result signal S21. This signal S21 is the first data input of the address selection circuit 22. The address holding circuit 23 receives the address signal S22 output from the address selection circuit 22 and outputs the held address signal S23, which is the second data input of the address selection circuit 22. The address holding control signal Sakc is also input to the address selection circuit 22, and the reference clock signal CLK is also input to the address selection circuit 22 and the address holding circuit 23.

【００４９】一方、命令キャッシュ３０では、メモリア
レイ３２に命令アドレス生成部２０からのアドレス信号
Ｓ22が入力され、信号遅延回路３１には命令キャッシュ
動作クロック生成部１０からアドレス同期クロック信号
Ｓ10が入力される。そして、信号遅延回路３１からは、
上記アドレス同期クロック信号Ｓ10を所定時間遅延させ
てなる遅延クロック信号Ｓ31が出力される。メモリアレ
イ３２は、遅延クロック信号Ｓ31に応じて作動し、アド
レス信号Ｓ22に応じた命令信号を出力し、この命令信号
はラッチ３３で保持された後、２命令ＩＲ１，ＩＲ２を
表わす最終的な命令出力信号Ｓ33として出力される。こ
のラッチ３３も、上記遅延クロック信号Ｓ31に応じて作
動するように構成されている。
Meanwhile, in the instruction cache 30, the address signal S22 from the instruction address generation unit 20 is input to the memory array 32, and the address synchronization clock signal S10 from the instruction cache operation clock generation unit 10 is input to the signal delay circuit 31. The signal delay circuit 31 outputs the delayed clock signal S31, obtained by delaying the address synchronization clock signal S10 by a predetermined time. The memory array 32 operates in response to the delayed clock signal S31 and outputs an instruction signal corresponding to the address signal S22. This instruction signal is held in the latch 33 and then output as the final instruction output signal S33, which represents the two instructions IR1 and IR2. The latch 33 is also configured to operate in response to the delayed clock signal S31.

【００５０】以上のように構成されたデータ処理装置に
ついて、図６に基づきその動作を説明する。図６は、上
から順に、基準クロック信号ＣＬＫ、アドレス保持制御
信号Ｓakc 、キャッシュ動作要求信号Ｓcar 、アドレス
演算結果信号Ｓ21、アドレス信号Ｓ22、保持アドレス信
号Ｓ23、アドレス同期クロック信号Ｓ10、遅延クロック
信号Ｓ31及び命令出力信号Ｓ33の状態を示す動作タイミ
ング図である。
The operation of the data processing device configured as described above will be described with reference to FIG. 6. FIG. 6 is an operation timing chart showing, in order from the top, the states of the reference clock signal CLK, the address holding control signal Sakc, the cache operation request signal Scar, the address calculation result signal S21, the address signal S22, the held address signal S23, the address synchronization clock signal S10, the delayed clock signal S31, and the instruction output signal S33.

【００５１】アドレス演算回路２１は、基準クロック信
号ＣＬＫが低レベルの間にアドレス演算を完了し、演算
されたアドレス情報であるアドレス演算結果信号Ｓ21を
出力し（同図中の符号ａ〜ｅで示す信号）、この信号Ｓ
21はアドレス選択回路２２に入力される。アドレス選択
回路２２は、アドレス保持制御信号Ｓakc が低レベルで
あればアドレス演算回路２１からのアドレス演算結果信
号Ｓ21を、アドレス保持制御信号Ｓakc が高レベルであ
ればアドレス保持回路２３からの保持アドレス信号Ｓ23
をセレクタで選択し、この信号をフリップフロップで保
持した後、アドレス信号Ｓ22として出力する。アドレス
保持回路２３は、１／２サイクル前にアドレス選択回路
２２から出力されたアドレス信号Ｓ22を取り込み１サイ
クルの間保持するとともに、保持アドレス信号Ｓ23とし
て出力し、この保持アドレス信号Ｓ23はアドレス選択回
路２２に入力される。命令キャッシュ動作クロック生成
部１０は、その構成要素である論理積回路１１によっ
て、キャッシュ動作要求信号Ｓcar が高レベルで、かつ
アドレス保持制御信号Ｓakc が低レベルであるときに、
アドレス同期クロック信号Ｓ10を出力する。つまり、命
令キャッシュ３０に対して動作要求があり、かつアドレ
ス信号Ｓ22が１サイクル前の値と異なる場合にのみ、基
準クロック信号ＣＬＫをアドレス同期クロック信号Ｓ10
とすることを意味する。
The address calculation circuit 21 completes the address calculation while the reference clock signal CLK is at the low level and outputs the calculated address information as the address calculation result signal S21 (the signals labeled a to e in the figure), which is input to the address selection circuit 22. In the address selection circuit 22, the selector selects the address calculation result signal S21 from the address calculation circuit 21 if the address holding control signal Sakc is at the low level, or the held address signal S23 from the address holding circuit 23 if Sakc is at the high level. The selected signal is held in the flip-flop and then output as the address signal S22. The address holding circuit 23 takes in the address signal S22 that was output from the address selection circuit 22 half a cycle earlier, holds it for one cycle, and outputs it as the held address signal S23, which is input to the address selection circuit 22. The instruction cache operation clock generation unit 10, by means of its AND circuit 11, outputs the address synchronization clock signal S10 when the cache operation request signal Scar is at the high level and the address holding control signal Sakc is at the low level. In other words, the reference clock signal CLK is passed through as the address synchronization clock signal S10 only when there is an operation request to the instruction cache 30 and the address signal S22 differs from its value one cycle earlier.
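
The gating and address selection described above can be sketched at logic level. This is a cycle-granular model under stated assumptions: signals are 0/1 integers, the half-cycle timing of the flip-flops is ignored, and the function names and the `trace` helper are hypothetical.

```python
def gated_clock(clk: int, sakc: int, scar: int) -> int:
    """AND circuit 11: S10 = CLK AND (NOT Sakc) AND Scar, on 0/1 levels."""
    return clk & (1 - sakc) & scar

def next_address(s21: int, s23: int, sakc: int) -> int:
    """Selector of address selection circuit 22: hold S23 when Sakc is high."""
    return s23 if sakc else s21

def trace(cycles, initial_address=0):
    """cycles: list of (S21, Sakc, Scar). Returns (S22, cache clocked?) per cycle."""
    held = initial_address            # models the held address signal S23
    out = []
    for s21, sakc, scar in cycles:
        s22 = next_address(s21, held, sakc)
        clocked = gated_clock(1, sakc, scar) == 1   # evaluated during the CLK high phase
        out.append((s22, clocked))
        held = s22                    # address holding circuit 23 captures S22
    return out
```

For example, a cycle with Sakc high repeats the previous address and produces no cache clock, and a cycle with Scar low also produces no cache clock, so the cache is clocked only when a new address is actually requested.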

【００５２】ここで、アドレス信号Ｓ22とアドレス同期
クロック信号Ｓ10とは、基準クロック信号ＣＬＫからの
遅延時間が同じになるように、各々命令アドレス生成部
２０と命令キャッシュ動作クロック生成部１０におい
て、図７及び図８を用いて後述するとおり各々伝達する
信号の出口（例えばバッファ）の駆動能力が互いに等し
くなるように設計段階で調整することにより、調整され
る（図６のタイミングｔ11〜ｔ13参照）。その場合、ア
ドレス信号Ｓ22と、アドレス同期クロック信号Ｓ10と
を、同一経路を通して命令キャッシュ３０に供給するこ
とにより、これらの信号の経路における配線負荷容量を
等しいものとすることができ、アドレス信号Ｓ22及びア
ドレス同期クロック信号Ｓ10の双方のタイミングを調整
することが容易になる。
Here, the address signal S22 and the address synchronization clock signal S10 are adjusted so that their delay times from the reference clock signal CLK are equal. As described later with reference to FIGS. 7 and 8, this is done at the design stage, in the instruction address generation unit 20 and the instruction cache operation clock generation unit 10 respectively, by adjusting the driving capabilities of the outputs (for example, buffers) that transmit each signal so that they are equal to each other (see timings t11 to t13 in FIG. 6). In that case, by supplying the address signal S22 and the address synchronization clock signal S10 to the instruction cache 30 through the same path, the wiring load capacitances on the paths of these signals can be made equal, which makes it easy to adjust the timing of both the address signal S22 and the address synchronization clock signal S10.

【００５３】さらに、アドレス信号Ｓ22と、アドレス同
期クロック信号Ｓ10との配線経路を同一にするだけでな
く、配線層の使い方をも同じくすることによって、これ
らの信号の配線負荷容量を等しいものとすることがで
き、アドレス信号Ｓ22とアドレス同期クロック信号Ｓ10
との双方のタイミングを調整することが容易になる。
Furthermore, by not only routing the address signal S22 and the address synchronization clock signal S10 along the same wiring path but also using the wiring layers in the same way, the wiring load capacitances of these signals can be made equal, which makes it easy to adjust the timing of both the address signal S22 and the address synchronization clock signal S10.

【００５４】また、信号遅延回路３１は、アドレス同期
クロック信号Ｓ10を所定時間だけ遅延させ、遅延クロッ
ク信号Ｓ31として出力する。この遅延クロック信号Ｓ31
は、メモリアレイ３２とラッチ３３とに、動作制御用信
号として入力される。命令キャッシュ３０は、アドレス
信号Ｓ22とアドレス同期クロック信号Ｓ10とを受けてそ
の動作を開始する。上述のように、アドレス信号Ｓ22の
出力タイミングとアドレス同期クロック信号Ｓ10の出力
タイミングとが同じであるため、信号遅延回路３１によ
り、アドレス信号Ｓ22をデコードするアドレスデコーダ
のセットアップ時間分だけアドレス同期クロック信号Ｓ
10を遅延させ、遅延クロック信号Ｓ31としてメモリアレ
イ３２に付与することで、最小の時間でアドレスデコー
ドを行わせることができる。さらに、アドレス同期クロ
ック信号Ｓ10の立ち下がりエッジのタイミングや、低レ
ベル期間の長さを調整することにより、メモリアレイ３
２の中のアドレスデコーダやメモリ部のビット線のプリ
チャージタイミングや、ラッチ３３における読み出しデ
ータのラッチタイミングを制御して、最終的な命令出力
信号Ｓ33の出力タイミングを制御することが容易にな
る。
The signal delay circuit 31 delays the address synchronization clock signal S10 by a predetermined time and outputs it as the delayed clock signal S31. The delayed clock signal S31 is input to the memory array 32 and the latch 33 as an operation control signal. The instruction cache 30 starts operating on receiving the address signal S22 and the address synchronization clock signal S10. As described above, the output timing of the address signal S22 is the same as that of the address synchronization clock signal S10. Therefore, by having the signal delay circuit 31 delay the address synchronization clock signal S10 by exactly the setup time of the address decoder that decodes the address signal S22, and applying the result to the memory array 32 as the delayed clock signal S31, address decoding can be performed in the minimum time. Furthermore, by adjusting the timing of the falling edge of the address synchronization clock signal S10 and the length of its low-level period, it becomes easy to control the precharge timing of the address decoder and of the bit lines of the memory section in the memory array 32, as well as the latch timing of the read data in the latch 33, and thereby to control the output timing of the final instruction output signal S33.

【００５５】なお、図５に示した命令キャッシュ動作ク
ロック生成部１０、命令アドレス生成部２０及び命令キ
ャッシュ３０の構成は、データキャッシュ動作クロック
生成部、データアドレス生成部及びデータキャッシュに
転用可能である。
The configurations of the instruction cache operation clock generation unit 10, the instruction address generation unit 20, and the instruction cache 30 shown in FIG. 5 can also be applied to a data cache operation clock generation unit, a data address generation unit, and a data cache.

【００５６】以上のように、本実施例によれば、アドレ
ス信号を発生するアドレス発生手段と、アドレス発生手
段により発生されるアドレス信号の変化タイミングと一
致したタイミングを持つアドレス同期クロック信号を発
生するクロック発生手段と、キャッシュメモリとを備
え、アドレス同期クロック信号を用いてキャッシュメモ
リの動作タイミングを制御することによって、キャッシ
ュメモリアクセスを行う際に無駄のないタイミング設計
が可能となる結果、キャッシュメモリ全体の動作サイク
ルタイムを最小のものとすることができる。
As described above, this embodiment comprises address generating means that generates an address signal, clock generating means that generates an address synchronization clock signal whose timing coincides with the change timing of the address signal generated by the address generating means, and a cache memory, and the operation timing of the cache memory is controlled using the address synchronization clock signal. This permits a timing design with no wasted margin for cache memory accesses, so that the operation cycle time of the cache memory as a whole can be minimized.

【００５７】図７は、データ処理装置の配置配線におけ
るレイアウトを示す図である。同図に示すように、制御
回路８０とは独立して設けられた制御信号生成部９０に
制御信号生成セル９１，９２が配設される。一方、制御
回路８０には各制御信号生成セル９１，９２の制御信号
を受ける制御信号受信セル８１，８２が配置されてい
る。各制御信号生成セル９１，９２は、バッファ・セル
等で構成されており、基準クロック信号ＣＬＫを入力
し、これをバッファリングして制御信号ＣＬＫ１，ＣＬ
Ｋ２を出力する。各制御信号生成セル９１，９２のレイ
アウト情報は、セルを形成するトランジスタのチャネル
幅やチャネル長によってパラメータ化されており、パラ
メータ値を変化させることによりセルの外形を変えるこ
となくその駆動能力を変更できる。一方、制御信号受信
セル８１，８２はラッチ・セル等で構成され、該各制御
信号受信セル８１，８２に、制御信号生成セル９１，９
２で生成された制御信号ＣＬＫ１，ＣＬＫ２が入力され
るようになされている。
FIG. 7 is a diagram showing a layout in the placement and routing of the data processing device. As shown in the figure, control signal generation cells 91 and 92 are arranged in a control signal generation unit 90 that is provided independently of the control circuit 80. Control signal receiving cells 81 and 82, which receive the control signals from the control signal generation cells 91 and 92, are arranged in the control circuit 80. Each of the control signal generation cells 91 and 92 is composed of a buffer cell or the like; it receives the reference clock signal CLK, buffers it, and outputs the control signal CLK1 or CLK2. The layout information of each control signal generation cell 91, 92 is parameterized by the channel width and channel length of the transistors forming the cell, so that its driving capability can be changed by changing the parameter values without changing the outline of the cell. The control signal receiving cells 81 and 82 are composed of latch cells or the like and receive the control signals CLK1 and CLK2 generated by the control signal generation cells 91 and 92.

[0058] The placement and routing method for the control circuit 80 and the control signal generation unit 90 will now be described with reference to the flowchart of FIG. 8. First, in step ST1, automatic placement and routing of the control circuit 80 including the control signal receiving cells 81 and 82 is executed. When the control circuit 80 including the control signal receiving cells 81 and 82 has been completed in step ST2, the load capacitances C1 and C2 of the control signal receiving cells 81 and 82 are extracted in step ST3. Meanwhile, after the logical layout of the control signal generation unit 90 including the control signal generation cells 91 and 92 is designed in step ST4, automatic placement and routing of the control signal generation unit 90 is executed in step ST5. Then, in step ST6, a speed evaluation is performed using the load capacitances of the control signal receiving cells 81 and 82 and the drive capabilities of the control signal generation cells 91 and 92 obtained in the preceding steps. If the result of the speed evaluation is good, the process proceeds to step ST7, where the placement and routing of the control signal generation unit 90 including the control signal generation cells 91 and 92 is completed. If the result of the speed evaluation in step ST6 is not good, the drive capabilities of the control signal generation cells 91 and 92 are further adjusted in step ST8, and the process then moves to step ST7.
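The evaluate-and-adjust part of this flow (steps ST6 to ST8) can be sketched in Python as follows. This is only an illustrative sketch, not the patented procedure itself: the first-order delay model (delay proportional to load capacitance divided by drive capability), the names `GeneratorCell`, `estimated_delay`, and `tune_generator_cells`, the per-cell pass/fail check, and all numeric values are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class GeneratorCell:
    """Hypothetical model of a control signal generation cell (e.g. 91 or 92)."""
    name: str
    drive: float  # drive capability in arbitrary units (assumed parameter)


def estimated_delay(load_capacitance: float, drive: float) -> float:
    # Assumed first-order model: delay grows with load and shrinks with drive.
    return load_capacitance / drive


def tune_generator_cells(cells, loads, target_delay):
    """Sketch of ST6 (speed evaluation), ST8 (drive adjustment), ST7 (completion)."""
    for cell in cells:
        load = loads[cell.name]  # load capacitance extracted in ST3
        if estimated_delay(load, cell.drive) > target_delay:  # ST6: result not good
            # ST8: change only the drive parameter; the cell outline is unchanged,
            # so no re-placement or re-routing is modeled here.
            cell.drive = load / target_delay
    return cells  # ST7: layout of the generation unit is complete


# Hypothetical values standing in for the extracted loads C1 and C2.
loads = {"cell91": 4.0, "cell92": 1.0}
cells = [GeneratorCell("cell91", 1.0), GeneratorCell("cell92", 1.0)]
tuned = tune_generator_cells(cells, loads, target_delay=2.0)
```

In this sketch only the `drive` field changes, which mirrors the point that the drive capability is a layout parameter that can be altered without changing the outer shape of the cell.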

[0059] In the configuration of FIG. 7, the control circuit 80 containing the control signal receiving cells 81 and 82 is provided separately from the control signal generation unit 90 containing the control signal generation cells 91 and 92, so automatic routing of the control circuit 80 can proceed independently of the drive capability of the control signal generation cells 91 and 92. The outer shape of the entire data processing device can therefore be determined earlier than when the control circuit 80 and the control signal generation unit 90 are designed without being separated. In addition, because the drive capability of the control signal generation cells 91 and 92 is determined using the control signal load capacitance obtained from the actual placement and routing information, highly accurate timing adjustment is possible. Furthermore, when the drive capability is adjusted, the outer shapes of the control circuit 80 and the control signal generation unit 90 have already been determined, and only the drive capability of the control signal generation cells 91 and 92 needs to be adjusted. Consequently, no re-placement or re-routing is required, and fine timing adjustment can be performed easily.

[0060] FIG. 9 shows an application example of the configuration of FIG. 7 and is an overall view of a data processing device having a superscalar configuration. The configuration of the data processing device of FIG. 9 is basically the same as that of FIG. 1, except that the instruction address generation unit 20, the instruction fetch unit 100, the first instruction execution unit 50, and the second instruction execution unit 60 each incorporate a control signal generation unit that supplies a clock signal to the control circuit forming the main body of that unit. On receiving the reference clock signal CLK generated by the main clock generation unit 40, the control signal generation unit of each unit individually generates the local clock signal C20, C100, C50, or C60 that controls the operation of its control circuit.

[0061] By configuring the data processing device in this way, the drive capability of the control signal generation cells in each control signal generation unit can easily be adjusted according to the load capacitance of the control signal receiving cells in the corresponding control circuit. Because the drive capability of the control signal receiving cells can be finely adjusted, the timing of signals between blocks can be adjusted accurately and easily. For example, in FIG. 9, the timing of the address signal S22 input to the instruction cache 30 can be matched accurately with that of the address synchronization clock signal S10, so high-speed operation of the data processing device can be achieved.

【００６２】[0062]

[Effects of the Invention] As described above, according to the invention of claim 1, in a data processing device comprising a cache memory, address generating means for generating an address signal, and clock generating means, the cache memory is controlled by the address synchronization clock signal generated by the clock generating means, at a timing that coincides with the change timing of the address signal generated by the address generating means. The operation cycle time of the cache memory as a whole is thereby reduced, which speeds up operation.

[0063] According to the invention of claim 2, the address generating means computes and holds addresses and outputs the address signal via the address selection circuit at a timing synchronized with the reference clock signal, so the effect of the invention of claim 1 can be obtained.

[0064] According to the invention of claim 3, the clock generating means, which includes an AND circuit, outputs the address synchronization clock signal at a timing synchronized with the reference clock signal when a cache operation is requested and no address hold is requested. The timing of the address signal and the address synchronization clock signal can therefore be matched without causing the cache memory to malfunction.
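The gating condition described for claim 3 can be illustrated with a small Python sketch. It is a behavioral illustration only, assuming active-high signals sampled once per half cycle; the function name `address_sync_clock` and the sample waveforms are hypothetical and do not come from the patent.

```python
def address_sync_clock(clk: int, cache_request: int, hold_request: int) -> int:
    """Logical AND of CLK, the cache operation request, and NOT the address hold request."""
    return int(bool(clk) and bool(cache_request) and not hold_request)


# Hypothetical sampled waveforms (1 = high, 0 = low).
clk           = [1, 0, 1, 0, 1, 0]
cache_request = [1, 1, 1, 1, 0, 0]
hold_request  = [0, 0, 1, 1, 0, 0]

gated = [
    address_sync_clock(c, r, h)
    for c, r, h in zip(clk, cache_request, hold_request)
]
print(gated)  # [1, 0, 0, 0, 0, 0]
```

The clock pulse passes only in the first sample, where a cache operation is requested and no hold is requested; it is suppressed while the address is held and while no cache operation is requested.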

[0065] According to the invention of claim 4, the cache memory is configured so that a signal delay circuit adjusts the timing of the address synchronization clock signal to produce a delayed clock signal, and the operation of the memory array is controlled by this delayed clock signal. The operation of the cache memory can therefore be controlled at an appropriate timing.

[0066] According to the invention of claim 5, in the invention of claim 4, the timing at which data read from the memory array is latched is controlled by the delayed clock signal, so the timing at which the cache memory outputs an instruction signal can be optimized simply by adjusting the delay time.
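The relationship between the address synchronization clock, the delayed clock, and the read-data latch described for claims 4 and 5 can be sketched as below. This is a simplified discrete-time illustration under assumptions: signals are lists of samples, the delay is a whole number of samples, the latch captures on the rising edge of the delayed clock, and the names `delay_signal`, `rising_edges`, and `latch_reads` as well as the sample data are hypothetical.

```python
def delay_signal(samples, delay_steps, idle=0):
    """Shift a sampled waveform later by delay_steps samples, keeping its length."""
    return ([idle] * delay_steps + list(samples))[:len(samples)]


def rising_edges(samples):
    """Indices at which the waveform goes from 0 to 1."""
    return [i for i in range(1, len(samples))
            if samples[i - 1] == 0 and samples[i] == 1]


def latch_reads(array_output, delayed_clock):
    """Capture the memory array output at each rising edge of the delayed clock."""
    return [array_output[i] for i in rising_edges(delayed_clock)]


# Hypothetical waveforms: "-" marks samples where the array output is not yet valid.
addr_sync_clk = [0, 1, 0, 0, 1, 0, 0, 0]
array_output  = ["-", "-", "-", "I0", "I0", "-", "I1", "I1"]

delayed_clk = delay_signal(addr_sync_clk, delay_steps=2)
latched = latch_reads(array_output, delayed_clk)
```

With a delay of two samples the latch edges land where the array output is valid; choosing a different `delay_steps` moves the capture point, which is the single adjustment knob the text refers to.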

[0067] According to the invention of claim 6, the address signal generated by the address generating means and the address synchronization clock signal generated by the clock generating means are supplied via wiring paths extending in parallel with each other, so the delays of the address signal and the address synchronization clock signal can be made physically equal.

[0068] According to the invention of claim 7, in the invention of claim 6, the output drive capabilities of the clock generating means and the address generating means are made equal, so the timing of the address signal and the address synchronization clock signal can be matched more closely.

[0069] According to the invention of claim 8, the address signal generated by the address generating means and the address synchronization clock signal generated by the clock generating means are supplied to the cache memory via wiring paths formed in the same wiring layer, so the delays of the address signal and the address synchronization clock signal can be adjusted more easily.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a data processing device according to an embodiment of the present invention.

FIG. 2 is an electric circuit diagram showing the detailed configuration of the instruction fetch unit in FIG. 1.

FIG. 3 is an operation timing chart of the instruction fetch unit in FIG. 2.

FIG. 4 is an electric circuit diagram showing a modification of FIG. 2.

FIG. 5 is an electric circuit diagram showing the detailed configurations of the instruction cache operation clock generation unit, the instruction address generation unit, and the instruction cache in FIG. 1.

FIG. 6 is an operation timing chart of the circuit of FIG. 5.

FIG. 7 is a partial layout diagram of the data processing device according to the embodiment of the present invention.

FIG. 8 is a flowchart showing the placement and routing method for the data processing device of FIG. 7.

FIG. 9 is a block diagram showing the overall configuration of a data processing device according to an application example of FIG. 7.

FIG. 10 is an electric circuit diagram showing the configuration of a conventional data processing device.

FIG. 11 is an operation timing chart of the instruction fetch unit in FIG. 10.

FIG. 12 is a partial layout diagram of a conventional data processing device.

FIG. 13 is a flowchart showing the placement and routing method for the data processing device of FIG. 12.

[Explanation of symbols]

10 Instruction cache operation clock generation unit
11 AND circuit
20 Instruction address generation unit
21 Address arithmetic circuit
22 Address selection circuit
23 Address holding circuit
30 Instruction cache
31 Signal delay circuit
32 Memory array
33 Latch
40 Main clock generation unit
50 First instruction execution unit
51 First instruction decoder
52 Floating point unit
53 Latch
60 Second instruction execution unit
61 Second instruction decoder
62 Integer unit
63 Latch
80 Control circuit
81, 82 Control signal receiving cells
90 Control signal generation unit
91, 92 Control signal generation cells
100 Instruction fetch unit
120 Instruction waiting unit
121, 122 Predecoders
123 Instruction queue
140 Control means
141, 142 Three-state buffers
143 Instruction fetch control circuit
150 Instruction selection unit
151 First instruction selection circuit
152 Second instruction selection circuit
Bin1, Bin2 Instruction buses
Bwt1, Bwt2 Waiting instruction buses
Bdc1, Bdc2 Waiting instruction decode signal lines
Bout1, Bout2 Instruction issue buses

Continuation of front page
(56) References: JP-A-2-40741 (JP, A); JP-A-6-149655 (JP, A)
(58) Fields investigated (Int. Cl.⁷, DB name): G06F 12/00; G06F 12/08

Claims

(57) [Claims]

1. A data processing device having at least a cache memory, comprising: address generating means for generating an address signal; clock generating means for generating an address synchronization clock signal at a timing coincident with a change timing of the address signal generated by the address generating means; and cache control means for controlling the operation timing of the cache memory using the address synchronization clock signal generated by the clock generating means.

2. The data processing device according to claim 1, wherein the address generating means includes: an address arithmetic circuit that computes an address; an address holding circuit that holds, for a predetermined time, the address signal output from the address generating means and outputs it; and an address selection circuit that selects, according to an address hold control signal, either the output signal of the address arithmetic circuit or the output signal of the address holding circuit and outputs it as the address signal, and wherein the operation timing of the address holding circuit is adjusted so that the address signal is output via the address selection circuit at a timing synchronized with a reference clock signal.

3. The data processing device according to claim 1, wherein the clock generating means receives a reference clock signal, an address hold control signal, and a cache operation request signal as inputs, and outputs the address synchronization clock signal generated from the reference clock signal when the cache operation request signal requests operation of the cache memory and the address hold control signal does not request that the address be held.

4. The data processing device according to claim 1, wherein the cache memory includes a memory array and a delay circuit that generates a delayed clock signal by delaying the address synchronization clock signal by a predetermined time, and the cache control means controls the operation timing of the memory array using the delayed clock signal.

5. The data processing device according to claim 4, wherein the cache memory has a data holding unit that holds read data output from the memory array, and the cache control means uses the delayed clock signal to control the timing at which the read data is latched in the data holding unit.

6. The data processing device according to claim 1, wherein the clock generating means and the address generating means are configured to supply the address signal and the address synchronization clock signal to the cache memory via wiring paths extending in parallel with each other.

7. The data processing device according to claim 6, wherein the clock generating means and the address generating means are configured so that the output drive capabilities for the signals they transmit are equal to each other.

8. The data processing device according to claim 6, wherein the clock generating means and the address generating means are configured to supply the address signal and the address synchronization clock signal to the cache memory via paths formed in the same wiring layer.