JPH0721061A

JPH0721061A - Program performance analyzing method

Info

Publication number: JPH0721061A
Application number: JP5191730A
Authority: JP
Inventors: Hiroyuki Hashimoto; 博幸橋本; Sumio Kikuchi; 純男菊池
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-07-05
Filing date: 1993-07-05
Publication date: 1995-01-24

Abstract

PURPOSE: To provide a method that analyzes program performance by targeting only the instruction paths actually executed and by calculating execution time with the pipeline interlocks of consecutive instruction sequences taken into account, without modifying the source program. CONSTITUTION: A machine language instruction sequence 70 is input and divided into a plurality of basic blocks by basic block division processing 31, and the static execution cycle count of each basic block is calculated in cycle count calculation processing 32 using interlock information 40. At the same time, the number of occurrences of each interlock type is tallied in occurrence count tabulation processing 33. The execution count of each basic block is input from execution count information 80, and execution time calculation processing 34 calculates the execution time of each basic block from the calculated cycle count and displays it.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a program performance analysis method, and more particularly to a program performance analysis method that can analyze the performance of a program by calculating its execution time with pipeline interlocks taken into account.

[0002]

[Prior Art] Various program performance analysis methods for evaluating the performance of programs have long been known.

[0003] For example, one program performance analysis method for determining which parts of a program account for a large share of execution, when a source program is compiled and the resulting load module is run, is to insert execution time measurement code into the source program. In this method, the person performing the analysis manually inserts statements for measuring execution time into the source program, compiles it, and runs the resulting load module, thereby measuring the execution time.

[0004] Another analysis method is disclosed in Japanese Patent Laid-Open No. 4-320536. In this method, a flow graph of the program is created from the machine language or assembler listing obtained by compiling the source program, the static execution time of each path in the flow graph is calculated, and the performance of the program with respect to execution time is evaluated from the sum of those times.

[0005] Furthermore, the paper by Pohua P. Chang et al., "Using Profile Information to Assist Code Optimization," Software-Practice and Experience, Vol. 21(12), 1301-1321 (December 1991), discloses a method of obtaining the execution count of each basic block (a sequence of assignment statements and expressions that is executed sequentially).

[0006] In the method of obtaining basic block execution counts disclosed in this paper, a function that records the execution count is called at the entry of each basic block, with the identifier assigned to that basic block as its argument. Each identifier is unique within the program. The called function adds 1 to the execution count of the basic block denoted by the identifier passed as the argument. As a result, when the program finishes executing, the number of times each basic block was executed is known, and the program can be evaluated from these execution counts.
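The counting scheme described in [0006] can be sketched roughly as follows. This is only an illustration, not the instrumentation used in the cited paper: the names `record_block` and `block_counts`, the block identifiers "B0" to "B3", and the toy function are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical counter table: basic block identifier -> execution count.
block_counts = Counter()


def record_block(block_id):
    """Add 1 to the execution count of the basic block named by block_id."""
    block_counts[block_id] += 1


def sum_below(n):
    """Toy program with a recording call at the entry of each basic block."""
    record_block("B0")          # entry block
    total = 0
    i = 0
    while True:
        record_block("B1")      # loop test
        if i >= n:
            break
        record_block("B2")      # loop body
        total += i
        i += 1
    record_block("B3")          # exit block
    return total
```

After `sum_below(3)` returns, `block_counts` holds one count per block identifier, which is the kind of per-block execution count the later steps take as input.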

[0007]

[Problems to Be Solved by the Invention] However, with the method of inserting execution time measurement code into the source program, measuring the execution time of every part of the program requires inserting a fairly large amount of measurement code. The overhead of the measurement code is therefore included in the execution time of the measured portions, so the execution time of the program portions alone cannot be measured accurately.

[0008] Moreover, when inserting time measurement code into a program, the analyst must predict to some extent which span of the program, from which position to which position, should be timed in order to evaluate the program, and decides the insertion points on the basis of that prediction. If the prediction turns out to be wrong, the analyst must insert the time measurement code into the source program again and then recompile and rerun it.

[0009] Furthermore, the method disclosed in the above Japanese Patent Laid-Open No. 4-320536 obtains the execution time of every path in the flow graph of the program and analyzes the performance of the program from the sum of those execution times, so it may include in the analysis the execution time of paths that are never executed at run time.

[0010] In addition, the method of evaluating a program by obtaining basic block execution counts cannot evaluate the program in terms of its actual execution time.

[0011] Accordingly, a first object of the present invention is to provide a program performance analysis method that targets only the execution paths that were actually executed.

[0012] A second object of the present invention is to provide a program performance analysis method that analyzes the performance of a program by calculating execution time with the pipeline interlocks of consecutive instruction sequences taken into account, without modifying the source program.

[0013]

[Means for Solving the Problems] To achieve the above objects, the present invention is characterized by comprising: a division step of dividing an input machine language instruction sequence into basic blocks, which are contiguous partial instruction sequences; an execution count input step of inputting the execution count of each basic block; a cycle count calculation step of obtaining the execution cycle count of each basic block with pipeline interlocks taken into account; and an execution time calculation step of calculating the execution time of each basic block using its execution cycle count and execution count.

[0014] The division step divides the machine language instruction sequence into basic blocks by, for example, reading the machine language instructions one at a time from the beginning and splitting between a branch instruction and the next instruction when the instruction read is a branch instruction, or splitting immediately before a label when the item read is a label.
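The two splitting rules in [0014] can be illustrated with a minimal sketch. The listing syntax is an assumption made for the example (a label is a line ending in ":", a branch is a line whose first word is "BR"), and `split_basic_blocks` is a hypothetical name, not something defined in this document.

```python
def split_basic_blocks(lines):
    """Split an assembler-style listing into basic blocks.

    Assumed toy syntax: a label is a line ending in ':', and a branch is a
    line whose first word is 'BR'. Both conventions are illustrative only.
    """
    blocks, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.endswith(":"):
            # Rule 2: a label starts a new block, so split before it.
            if current:
                blocks.append(current)
            current = [text]
        else:
            current.append(text)
            if text.split()[0] == "BR":
                # Rule 1: split between a branch and the next instruction.
                blocks.append(current)
                current = []
    if current:
        blocks.append(current)
    return blocks
```

For a listing such as `["lab1:", "R1 = Mem", "BR lab1", "R4 = R1 * R1", "lab2:", "Mem = R4"]`, the sketch yields three blocks: one ending at the branch, one holding the instruction after the branch, and one starting at `lab2:`.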

[0015] The machine language instruction sequence may be prepared by any procedure. For example, one produced by compiling the high-level language source program that is the target of performance analysis may be used.

[0016] The execution count of each basic block may be measured by a conventionally known method. For example, a method may be used in which, when the source program is compiled, a load module is generated and output with object code added for measuring the execution count of each basic block, and that load module is executed to measure the execution count of each basic block.

[0017] In the cycle count calculation step, if the instruction sequence contains instruction orderings that cause pipeline interlocks when the execution cycle count of a basic block is calculated, processing may be performed to tally the number of occurrences of each type of interlock. The number of occurrences of each type of interlock is also an indicator for evaluating program performance.

[0018] When the source program being analyzed contains a loop, the execution time of the loop may be calculated from the execution cycle counts and execution counts of the basic blocks that make up the loop.

[0019] Furthermore, the performance analysis results obtained as described above (for example, the execution count and execution time of each basic block and the number of interlock occurrences of each interlock type) are preferably displayed on a display device.

[0020]

[Operation] The machine language instruction sequence is divided into basic blocks, and the static execution cycle count of each basic block is calculated with pipeline interlocks taken into account. The execution count of each basic block is then input, and the execution time of the basic block can be calculated using the execution cycle count and the execution count. The execution count of each basic block can be measured by generating and outputting, when the source program is compiled, a load module with object code added for measuring the execution count of each basic block, and then executing that load module.

[0021] This makes it possible to analyze program performance without modifying the source program and with only the actually executed execution paths as the target.

[0022]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings.

[0023] Before the individual drawings are described, pipeline interlocks (hereinafter abbreviated as interlocks) are explained.

[0024] Consider two instructions; the one executed first is called the preceding instruction, and the one executed afterward is called the succeeding instruction. A dependency between the preceding and succeeding instructions can delay the start of execution of the succeeding instruction. This happens, for example, when the two instructions define and reference the same register or the same memory area, so that execution of the preceding instruction delays the start of the succeeding instruction, or when both instructions use the same arithmetic unit. In such cases, an interlock is said to exist between the preceding and succeeding instructions.

[0025] The conditions under which interlocks occur differ from CPU to CPU. The interlock conditions described in this embodiment are one example, and the present invention is not limited by them.

[0026] FIG. 17 shows the instruction types handled in this embodiment and how each instruction is written in assembler. In the operands, Mem denotes a memory access and Rx denotes a register access.

[0027] The load instruction 201 reads the data stored in memory Mem into register Ri. The store instruction 202 writes the data stored in register Ri to memory Mem. The add instruction adds the data stored in the two registers Rj and Rk and stores the result in register Ri. The multiply instruction multiplies the data stored in the two registers Rj and Rk and stores the result in register Ri. The branch instruction transfers control to the position of label lab-xx.

[0028] FIG. 1 is a system configuration diagram of a program performance analysis system to which a program performance analysis method according to one embodiment of the present invention is applied.

[0029] This system comprises a CPU device 1, a memory device 2, an input device 3, and a display device 4. Concretely, it is a system that combines a computer with predetermined programs. The CPU device 1 comprises a compiler 10, an execution unit 20, and a program performance analysis unit 30.

[0030] The compiler 10 takes a source program 50 as input and compiles it. In particular, machine language instruction sequence output processing 11 generates a machine language instruction sequence 70 from the input source program, and execution count output code generation processing 12 generates a load module 60 with object code added for measuring the execution count of each basic block.

[0031] The execution unit 20 takes the load module 60 as input and executes it, and execution count output processing 21 measures the execution count of each basic block. The measurement results are output as execution count information 80.

[0032] The program performance analysis unit 30 consists of basic block division processing 31, cycle count calculation processing 32, interlock occurrence count tabulation 33, and execution time calculation processing 34.

[0033] FIG. 2 shows the flow of processing by the compiler 10 and the execution unit 20 of FIG. 1, that is, the processing for obtaining the machine language instruction sequence 70 and the execution count information 80.

[0034] The machine language instruction sequence output processing 11 in the compiler 10 takes the source program 50 as input and outputs the machine language instruction sequence 70.

[0035] FIG. 4 shows an example of the source program 50 and the machine language instruction sequence 70. The high-level language source program denoted by 801 is compiled, and a machine language instruction sequence expressed as an assembler listing is output. This machine language instruction sequence consists of a label 802 and instructions 805-1 through 805-6. Instructions 805-1 through 805-6 are written in the notation described with FIG. 17.

[0036] Referring again to FIG. 2, the execution count output code generation processing 12 in the compiler 10 outputs the load module 60, to which object code for measuring basic block execution counts has been added. This object code is added to every basic block.

[0037] Next, the execution unit 20 executes the load module 60, which measures the execution count of each basic block. The execution count output processing 21 outputs the measurement results, creating the execution count information 80, which records the execution count of each basic block.

[0038] FIG. 3 shows the flow of processing in the program performance analysis unit 30 of FIG. 1.

[0039] The program performance analysis unit 30 takes three inputs: the interlock information 40, in which interlock information is registered; the machine language instruction sequence 70 output by the compiler; and the execution count information 80 obtained by executing the load module 60.

[0040] The program performance analysis unit 30 first performs the basic block division processing 31, which divides the input machine language instruction sequence 70 into basic blocks (sequences of assignment statements and expressions that are executed sequentially). Then, for every basic block obtained by the division, it creates a basic block information table, an instruction sequence information table, and an interlock tabulation table.

[0041] FIG. 5 shows the structure of these tables.

[0042] FIG. 5(A) shows the structure of the instruction sequence information table 900, which stores information on all the instructions in a basic block. For each basic block, as many of these tables are allocated as there are instructions in the block. Field 901 stores the instruction type; field 902 stores the start cycle number, which indicates at which cycle within the basic block the instruction becomes executable; field 903 stores the name of the register defined by the instruction, if any; and field 904 stores the name of the register referenced by the instruction, if any.

[0043] FIG. 5(B) shows the structure of the interlock tabulation table 920, which stores the number of occurrences of each type of interlock. An interlock tabulation table 920 is created for each type of interlock that occurs in a basic block. Field 921 records the interlock type, and field 922 stores the number of occurrences of interlocks of that type.

[0044] FIG. 5(C) shows the basic block information table 950, which stores information on all the basic blocks. A basic block information table 950 is created for each basic block.

[0045] Field 951 stores the label name of the basic block; field 952 stores the number of instructions in the basic block; field 953 stores the static execution cycle count of the basic block per execution; field 954 stores a pointer to the interlock tabulation table (FIG. 5(B)) that holds the interlock occurrence counts within the basic block; field 955 stores a pointer to the instruction sequence information table (FIG. 5(A)) for the instructions in the basic block; and field 956 stores a pointer to the next basic block information.

[0046] FIG. 5(D) shows the list structure formed by the basic block information tables, instruction sequence information tables, and interlock tabulation tables. A basic block information table 950 is created for each basic block, and the tables are chained together by pointer 956. Pointer 954 of basic block information 950 holds the head address of the list of interlock tabulation tables 920 for that basic block. Pointer 955 of basic block information 950 holds the head address of the list of instruction sequence information tables 900 for that basic block.

[0047] Referring again to FIG. 3, the basic block division processing 31 takes the machine language instruction sequence 70 as input and creates the basic block information tables, instruction sequence information tables, and interlock tabulation tables having the list structure described with FIG. 5. At this point, the instruction cycle count 902 of FIG. 5(A), the occurrence count 922 of FIG. 5(B), and the cycle count 953 of FIG. 5(C) are initialized to 0.

[0048] After the basic block division processing 31, the cycle count calculation processing 32 is performed. For every basic block produced by the division, it calculates the execution cycle count of the instruction sequence in the block on the basis of the interlock information registered in the interlock information 40.

[0049] FIG. 6 shows an example of the contents of the interlock information 40 of FIGS. 1 and 3. In this embodiment, the following are prepared in advance as interlock information: the "instruction combination" and "register usage condition," which are the conditions under which an interlock occurs, and the "interlock cycle count," which indicates how many cycles of delay result when the interlock occurs.

[0050] For example, interlock information 301 indicates that when the register defined by the preceding "load" instruction is the same as the register used (referenced) by the succeeding "add" instruction, execution of the succeeding "add" starts one cycle later than normal unless at least one instruction that is unaffected by the preceding "load" (one that causes no interlock) is placed between the two instructions.

[0051] For example, suppose that in the following instruction sequence execution of the load instruction starts at cycle n.

[0052]
Cycle   Instruction
n       Ri = Mem
n+1     (one-cycle interlock)
n+2     Rk = Ri + Rj

[0053] In this case, a one-cycle interlock occurs between the preceding load and the succeeding add, so the succeeding add instruction starts at cycle n+2. The other interlock information entries 302, 303, and 304 have meanings analogous to that of interlock information 301.

[0054] Referring again to FIG. 3, the cycle count calculation processing 32 calculates the execution cycle count of the instruction sequence in each basic block, taking interlocks into account by consulting interlock information such as that shown in FIG. 6. The result (that is, the execution cycle count of the basic block's instruction sequence) is set in cycle count 953 of basic block information 950 in FIG. 5(C). The start cycle number of each instruction, which is determined in the course of calculating the execution cycle count, is set in start cycle number 902 of the instruction sequence information table 900 in FIG. 5(A).

[0055] Next, interlock tabulation processing 33 is performed. On the basis of the interlock information registered in the interlock information 40, it tallies which types of interlock occurred, and how many times, in each basic block. The tally is set in occurrence count 922 of the interlock tabulation table 920 in FIG. 5(B).

[0056] Next, execution time calculation processing 34 is performed. For every basic block produced by the division, it calculates the execution time from the execution count of the basic block and its cycle count. The execution count of the basic block is obtained from the execution count information 80. The cycle count of the basic block is obtained from cycle count 953 of that block's basic block information table 950. The execution time is calculated with the following formula.

[0057] Execution time [s] = execution count × cycle count [cycles] × (1 ÷ CPU operating frequency) [s/cycle]

[0058] For example, if a basic block is executed 10,000,000 times, its cycle count is 100 cycles, and the CPU operating frequency is 50 MHz, the execution time of that basic block is as follows.

[0059] Execution time [s] = 10,000,000 × 100 × (1 ÷ 50,000,000) = 20 [s]

[0060] Next, the procedure of the program performance analysis processing 30 in this embodiment is described in detail with reference to FIGS. 7 to 11.

[0061] FIG. 7 shows the procedure of the program performance analysis processing 30 in this embodiment. First, in step 401a, the machine language instructions are read from the machine language instruction sequence 70, the basic block division processing (31 in FIG. 3) is performed, and a basic block information table, an instruction sequence information table, and an interlock tabulation table (FIG. 5) are created for each basic block. Processing then proceeds to step 401b.

[0062] In step 401b, for every basic block created in step 401a, the execution cycle count of the instruction sequence contained in the block and the number of interlock occurrences are calculated (32 and 33 in FIG. 3), and processing proceeds to step 401c.

[0063] In step 401c, for every basic block created in step 401a, the execution time is calculated from the cycle count calculated in step 401b and the execution count in the execution count information 80 (34 in FIG. 3), and the result is displayed on the display device 4 (FIGS. 1 and 3) together with the interlock occurrence counts tallied in step 401b.

[0064] FIGS. 8 and 9 are flowcharts showing the procedure of the basic block division processing 401a.

[0065] Before this procedure is described, the work data areas it uses are explained. The work data areas provided are a label stack, an instruction stack, and a variable inst_cnt.

[0066] The label stack is an area that stores the label names output in the machine language instruction sequence 70. A label name is used to identify a basic block and is assumed to correspond one-to-one with the label information output in the execution count information 80. The instruction stack is an area that stores the instructions in a basic block. It is initially empty. Each time an instruction is read, the instruction is stored in the instruction stack, and when the processing of one basic block is completed the instruction stack becomes empty again. The variable inst_cnt counts the number of instructions in the basic block.

[0067] Next, the basic block division processing 401a will be described with reference to FIGS. 8 and 9.

[0068] In step 501a, the label stack and the instruction stack are emptied, the variable inst_cnt is set to 0, and the process proceeds to step 505a. In step 505a, it is determined whether the machine language instruction sequence 70 contains a machine language instruction that has not yet been read. If it does, the process proceeds to step 501b. If it does not, the process proceeds to step 515a.

[0069] In step 501b, one machine language instruction is read from the machine language instruction sequence 70, and the process proceeds to step 505b. In step 505b, it is determined whether the instruction that was read is a branch instruction. If it is a branch instruction, the process proceeds to step 501d. If it is not, the process proceeds to step 505c.

[0070] In step 505c, it is determined whether the instruction that was read is a label. If it is a label, the process proceeds to step 505d. If it is not, the process proceeds to step 501c.

[0071] In step 501c, the instruction that was read is stored in the instruction stack. Because storing it adds one instruction to the basic block, 1 is added to the variable inst_cnt, and the process proceeds to step 505a.

[0072] In step 501d, the branch instruction that was read is stored in the instruction stack, 1 is added to the variable inst_cnt, and the process proceeds to step 501e.

[0073] In step 505d, it is determined whether the label stack holds no label name and the variable inst_cnt is 0. This check prevents the creation of a basic block that has no label name and contains no instructions. If the condition is satisfied, the process proceeds to step 505a. If it is not satisfied, the process proceeds to step 501e.

[0074] In step 501e, one set of basic block information is created from the information stored in the label stack and the instruction stack.

[0075] The basic block information is created as follows. First, one basic block information table 950 shown in FIG. 5(C) is created and appended to the end of the basic block information table list. Next, the label stored in the label stack is registered in the label name field 951 of the created basic block information table 950. Next, the value of the variable inst_cnt is stored in the in-block instruction count field 952. Finally, 0 is registered in the total cycle count field 953.

[0076] In addition, the interlock occurrence count table 920 shown in FIG. 5(B) is created, and the occurrence count fields of all interlock types in that table are initialized to 0. The interlock occurrence count table 920 is created with an area for tallying the number of occurrences of every interlock type registered in the interlock information 40 of FIG. 6.

[0077] Also, an instruction sequence information table 900 (FIG. 5(A)) with room for inst_cnt instructions is created, and the instructions stored in the instruction stack are registered in the instruction field 901 of the instruction sequence information table. At that time, the registers that each registered instruction defines or references are registered in the definition register field 903 or the reference register field 904, respectively. The start cycle count field 902 is initialized to 0 for all instructions.

[0078] This completes the creation of the basic block information (the tables of FIG. 5) in step 501e.
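The tables created in step 501e can be pictured with the following minimal Python sketch. The class and function names, the idea of linking tables 900 and 920 directly from table 950, and the interlock type name used in the usage note are illustrative assumptions; only the field numbers and their initial values come from the description above. The register fields are left empty because the operand syntax is not specified in this passage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class InstructionEntry:
    """One row of the instruction sequence information table 900 (FIG. 5(A))."""
    instruction: str                                     # instruction field 901
    start_cycle: int = 0                                 # start cycle count field 902
    def_regs: List[str] = field(default_factory=list)    # definition register field 903
    ref_regs: List[str] = field(default_factory=list)    # reference register field 904


@dataclass
class BasicBlockInfo:
    """Basic block information table 950 (FIG. 5(C)) plus its linked tables."""
    label: Optional[str]                                 # label name field 951
    inst_count: int                                      # in-block instruction count field 952
    total_cycles: int = 0                                # total cycle count field 953
    instructions: List[InstructionEntry] = field(default_factory=list)
    interlock_counts: Dict[str, int] = field(default_factory=dict)  # table 920


def make_block_info(label, insts, interlock_types):
    """Build the tables for one basic block with every counter set to 0."""
    return BasicBlockInfo(
        label=label,
        inst_count=len(insts),
        instructions=[InstructionEntry(text) for text in insts],
        interlock_counts={name: 0 for name in interlock_types},
    )
```

For example, `make_block_info("L1", ["LD R1,A", "ADD R2,R1"], ["load-use"])` yields a block with two instruction entries whose start cycles are 0 and a single interlock counter set to 0.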

[0079] In step 501e, the label stack and the instruction stack are then emptied, and if the most recently read instruction is a label, its label name is stored in the label stack. If the most recently read instruction is a branch instruction, no label name is stored in the label stack. Finally, the variable inst_cnt is set to 0, and the process proceeds to step 505a.

[0080] In step 515a, it is determined whether any instructions remain in the instruction stack. If any remain, the process proceeds to step 511a. If none remain, the basic block division processing ends. In step 511a, as in step 501e, one set of basic block information is created from the information stored in the label stack and the instruction stack. The basic block division processing then ends.
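The division procedure of FIGS. 8 and 9 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the patented implementation: a label is assumed to be a line ending in ":", the branch mnemonics in `branch_ops` are hypothetical, and the label stack is modelled as a single variable. The description does not say where a label is recorded when step 505d finds nothing to emit, so the sketch simply remembers it, which is also an assumption.

```python
def split_basic_blocks(lines, branch_ops=frozenset({"BR", "BC"})):
    """Split assembly-like lines into (label, instructions) basic blocks."""
    blocks = []
    label = None   # label stack (holds at most one name in this sketch)
    insts = []     # instruction stack; len(insts) plays the role of inst_cnt

    def emit():
        # Step 501e / 511a: create one basic block, then clear the work areas.
        nonlocal label, insts
        blocks.append((label, insts))
        label, insts = None, []

    for raw in lines:                        # steps 505a and 501b: read one line
        line = raw.strip()
        if not line:
            continue
        if line.split()[0] in branch_ops:    # step 505b: branch instruction?
            insts.append(line)               # step 501d: the branch ends the block
            emit()
        elif line.endswith(":"):             # step 505c: label?
            if label is not None or insts:   # step 505d: is there a block to emit?
                emit()
            label = line[:-1]                # the label starts the next block
        else:
            insts.append(line)               # step 501c: ordinary instruction
    if insts:                                # step 515a: leftover instructions
        emit()                               # step 511a
    return blocks
```

With the input `["L1:", "LD R1,A", "ADD R2,R1", "BC L1", "ST R2,B", "L2:", "LD R3,C"]`, the sketch produces three blocks: one labelled L1 that ends at the branch, one unlabelled block holding the store, and one labelled L2.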

[0081] FIG. 10 is a flowchart showing the procedure of the cycle count calculation and interlock occurrence tallying processing 401b.

[0082] First, in step 605a, it is determined whether the basic block information table list contains a basic block whose cycle count has not yet been calculated. This is done by checking whether there is a basic block whose cycle count field 953 in the basic block information table 950 of FIG. 5(C) is 0. If there is a basic block whose cycle count has not been calculated, the process proceeds to step 601a. If there is none, the cycle count has been calculated for all basic blocks, so the cycle count calculation and interlock occurrence tallying processing ends.

[0083] In step 601a, first, the variable crt_cycle, which holds the cycle to which an instruction can be assigned, is set to 1. Setting it to 1 means that the first instruction in the basic block is always assigned to cycle 1. Next, the variable i, which indicates which entry of the instruction sequence information table 900 is being referenced, is set to 1.

[0084] FIG. 12 shows an example of the instruction sequence information table 900. In that figure, table 900-1 is the first entry, table 900-2 is the second, and table 900-3 is the third.

[0085] Further, in step 601a, the value of the in-block instruction count field 952 of the basic block information table 950 is assigned to the variable max_inst, which holds the number of instructions in the basic block. The process then proceeds to step 605b.

[0086] In step 605b, it is determined whether the variable i is less than or equal to the variable max_inst. If i is less than or equal to max_inst, the process proceeds to step 601b. If i is greater than max_inst, the process proceeds to step 605e.

[0087] In step 605e, it is determined whether the variable max_inst is 0. If max_inst is 0, the basic block contains no instructions, so the cycle count field 953 of its basic block information table 950 is left at its initial value of 0, and the process proceeds to step 601g. If max_inst is not 0, the process proceeds to step 601f, where the value stored in the start cycle count field 902 of the max_inst-th entry of the instruction sequence information table 900 is taken as the execution cycle count of the basic block and stored in the cycle count field 953 of the basic block information table 950. The process then proceeds to step 601g.

[0088] In step 601g, the basic block following the one whose cycle count calculation has just finished is selected, and the process proceeds to step 605a.

[0089] In step 601b, first, the cycle at which the i-th instruction can start (that is, the start cycle of the i-th instruction) is determined by the following expression.

[0090] i_cycle = MAX(i_cycle, crt_cycle)

[0091] Here, the symbol i_cycle denotes the start cycle count field 902 of the i-th instruction in the instruction sequence information table 900. MAX is a function that returns the larger of its two arguments.

[0092] If the expression above returns i_cycle, an interlock has occurred between the i-th instruction and an instruction assigned before it. If it returns crt_cycle, either there is no interlock with a preceding instruction, or the interlock has been resolved because other instructions lie between the interlocking instructions. Next, the start cycle i_cycle of the i-th instruction is assigned to the variable crt_cycle.

[0093] Further, in step 601b, the check range is determined in order to check whether an interlock may occur between the i-th instruction and the instructions that follow it. The check is performed from step 605c onward. Specifically, the variable j is moved from i+1 to loop_end while it is checked whether there is an interlock between the i-th instruction and the j-th instruction. The value of loop_end, that is, the range over which interlocks with the i-th instruction are checked, is determined by the following expression.

[0094] loop_end = MIN(max_inst, (the largest interlock cycle count among the interlock types in which the i-th instruction appears) + i + 1)

[0095] Here, MIN is the opposite of MAX: it returns the smaller of its two arguments. The expression means that the check range basically extends to the instruction that lies the maximum interlock cycle count, among the interlocks the i-th instruction can cause, after the i-th instruction. Because the basic block contains only max_inst instructions, however, the smaller of the two values is set in loop_end.

[0096] Finally, i+1 is assigned as the initial value of the variable j, which indicates the j-th instruction to be checked against the i-th instruction. The process then proceeds to step 605c.

[0097] In step 605c, to determine whether all interlock checks against the i-th instruction have finished, it is checked whether the variable j is less than or equal to loop_end. If j is less than or equal to loop_end, the process proceeds to step 605d. If j is greater than loop_end, the process proceeds to step 601e.

[0098] In step 605d, it is determined, from the combination of instructions and the register usage conditions, whether an interlock registered in the interlock information table 300 occurs between instruction i and instruction j. If there is an interlock, the process proceeds to step 601c. If there is none, the process proceeds to step 601d.

[0099] In step 601c, first, 1 is added to the occurrence count (922 in FIG. 5(B)) of the interlock type identified in step 605d. Next, the cycle from which the j-th instruction can start, given the interlock caused by the i-th instruction, is determined by the following expression.

[0100] j_cycle = MAX(j_cycle, i_cycle + 1 + (interlock cycle count of i))

[0101] Here, like i_cycle, the symbol j_cycle denotes the start cycle count field 902 of the j-th instruction in the instruction sequence information table 900. The j-th instruction may already have an interlock with an instruction other than the i-th. If the delay from the interlock already recorded is larger, the already determined j_cycle is kept as the start cycle. If the delay caused by the i-th instruction is larger, "i_cycle + 1 + (interlock cycle count of i)" becomes the start cycle j_cycle. The process then proceeds to step 601d.

[0102] In step 601d, 1 is added to the variable j in order to check the i-th instruction against the (j+1)-th instruction. The process then proceeds to step 605c.

[0103] In step 601e, the interlock checks between the i-th instruction and its subsequent instructions have finished, so 1 is added to the variable i in order to perform the checks for the (i+1)-th instruction. The earliest cycle at which the (i+1)-th instruction can start is the cycle after crt_cycle, which holds the start cycle of the i-th instruction, that is, crt_cycle + 1. Therefore 1 is added to crt_cycle. The process then proceeds to step 605b.
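The per-block cycle calculation of FIG. 10 can be sketched in Python as follows. The interlock table is a hypothetical stand-in for the interlock information of FIG. 6, whose actual layout is not reproduced in this passage: here it maps an opcode pair to a stall cycle count, and an interlock is assumed to fire only when the later instruction references a register that the earlier one defines. When the i-th instruction appears in no interlock type, the description does not define the maximum, so the sketch treats it as 0.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Set


@dataclass
class Inst:
    op: str
    defs: Set[str]     # definition register field 903
    uses: Set[str]     # reference register field 904
    start: int = 0     # start cycle count field 902


def schedule_block(block, rules):
    """Return (block cycle count, interlock tally) for one basic block.

    rules: hypothetical table {(earlier op, later op): stall cycles}.
    """
    max_inst = len(block)
    tally = Counter()
    crt_cycle = 1                                    # step 601a
    for i in range(1, max_inst + 1):                 # steps 605b and 601e
        a = block[i - 1]
        a.start = max(a.start, crt_cycle)            # i_cycle = MAX(i_cycle, crt_cycle)
        crt_cycle = a.start
        worst = max((c for (first, _), c in rules.items() if first == a.op),
                    default=0)
        loop_end = min(max_inst, worst + i + 1)      # check range for j
        for j in range(i + 1, loop_end + 1):         # steps 605c and 601d
            b = block[j - 1]
            stall = rules.get((a.op, b.op))
            if stall is not None and a.defs & b.uses:    # step 605d
                tally[(a.op, b.op)] += 1                 # step 601c
                b.start = max(b.start, a.start + 1 + stall)
        crt_cycle += 1                               # earliest start of instruction i+1
    total = block[-1].start if max_inst else 0       # steps 605e and 601f
    return total, tally
```

For a three-instruction block LD, ADD, ST with a 2-cycle stall between LD and ADD, the start cycles become 1, 4, and 5, so the block cycle count is 5 and one interlock is tallied.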

[0104] FIG. 12 shows the instruction sequence information table 900 created by executing the basic block division processing 401a on the machine language instruction sequence of FIG. 4. The instruction field 901, the definition register field 903, and the reference register field 904 hold the data for each instruction. The start cycle count fields 902 are all initialized to 0.

[0105] FIG. 13 shows the result of executing the cycle count calculation and interlock occurrence tallying processing 401b on the instruction sequence information table 900 of FIG. 12. The start cycle count field 902 of each instruction holds the cycle at which that instruction starts.

[0106] Next, the execution time calculation processing 401c will be described.

[0107] FIG. 14 shows an example of the contents of the execution count information table 120, which stores the execution count information 80 (FIG. 2). The execution count information table 120 consists of a label name field 121, which stores the label name identifying a basic block, and an execution count field 122, which stores the number of times that basic block was actually executed.

[0108] FIG. 11 is a flowchart showing the procedure of the execution time calculation processing 401c.

[0109] First, in step 705a, it is determined whether any execution count information remains in the execution count information table 120. If there is execution count information, the process proceeds to step 701a. If there is none, the execution time calculation processing ends.

[0110] In step 701a, first, one item of execution count information is read. Next, the cycle count 953 is taken from the basic block information 950 of the basic block corresponding to the label name in the execution count information that was read. Next, the execution time is calculated from the execution count and the cycle count. Finally, the result is displayed on the display device 4. The process then proceeds to step 705a.
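The loop of steps 705a and 701a amounts to one multiplication per label, as the following Python sketch shows. It assumes that the factor (1 ÷ 50000000) in the worked example of this description is the cycle time of a 50 MHz clock; the function name, the dictionary-based tables, and the `clock_hz` parameter are illustrative.

```python
def execution_times(exec_counts, cycles_by_label, clock_hz=50_000_000):
    """Return {label: seconds} for every entry of the execution count table.

    exec_counts:     {label: execution count}   (fields 121 and 122)
    cycles_by_label: {label: cycles per run}    (cycle count field 953)
    clock_hz:        assumed clock frequency in Hz
    """
    times = {}
    for label, count in exec_counts.items():    # steps 705a and 701a
        cycles = cycles_by_label[label]         # look up the block by its label name
        times[label] = count * cycles / clock_hz
    return times
```

Calling `execution_times({"L1": 10_000_000}, {"L1": 100})` gives 20 seconds for L1, which matches the product 10000000 × 100 × (1 ÷ 50000000) in the worked example.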

[0111] FIG. 15 is a display example 130 of the calculated execution times and interlock occurrence counts. The displayed items are the label name of each basic block, its execution count, its cycle count per execution and execution time, and the number of occurrences of each interlock type.

[0112] With the embodiment described above, the execution time of each basic block can be calculated, the number of occurrences of each interlock type can be obtained, and both results can be displayed. This makes it possible to detect which parts of the program account for a large share of the execution time, and thus to evaluate the performance of the program.

[0113] In the above embodiment, when the source program under performance analysis contains loop processing, the execution time of the loop may be calculated from the execution cycle counts and execution counts of the partial instruction sequences forming the loop, and then displayed.

[0114] FIG. 16 is an example of a source program containing a loop. The procedure for calculating the execution time of a loop is described using this example. The basic block division processing divides a loop into several basic blocks. In the example in the figure, the loop is divided into three basic blocks 141, 142, and 143. The execution time of one basic block can be calculated by the procedure described above, so execution times can also be calculated for these three basic blocks. The execution time of the loop can therefore be calculated simply as the sum of the execution times of the three basic blocks.
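The summation for a loop can be sketched in Python as follows. The cycle counts and execution counts in the usage note are made-up values for three blocks standing in for basic blocks 141, 142, and 143, and the 50 MHz clock is the same assumption as in the worked execution time example of this description.

```python
def loop_execution_time(blocks, clock_hz=50_000_000):
    """Sum the execution times of the basic blocks that form a loop.

    blocks: iterable of (cycles per run, execution count) pairs, one per block.
    """
    total_cycles = sum(cycles * count for cycles, count in blocks)
    return total_cycles / clock_hz
```

For instance, `loop_execution_time([(3, 1), (5, 100), (2, 1)])` models a 3-cycle setup block run once, a 5-cycle body run 100 times, and a 2-cycle exit block run once, for 505 cycles in total.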

[0115] Displaying the execution time of a loop makes it possible to evaluate the performance of the loop portions of the program.

[0116]

[Effects of the Invention] As described above, according to the present invention, the execution time that takes pipeline interlocks into account can be obtained, and the performance of a program can be analyzed for the execution paths that were actually executed. Furthermore, there is no need to modify the source program.

[Brief description of drawings]

FIG. 1 is a system configuration diagram of a program performance analysis system to which a program performance analysis method according to an embodiment of the present invention is applied.

FIG. 2 is a flow diagram of the processing for obtaining the machine language instruction sequence and the execution count information.

FIG. 3 is a flow diagram of the processing in the program performance analysis unit.

FIG. 4 is a diagram showing an example of a source program and a machine language instruction sequence.

FIG. 5 is a diagram of the basic block information table, the instruction sequence information table, and the interlock occurrence count table.

FIG. 6 is a diagram showing an example of the contents of the interlock information.

FIG. 7 is a flowchart showing the operation of the program performance analysis processing.

FIG. 8 is a flowchart showing the operation of the basic block division processing.

FIG. 9 is a flowchart showing the operation of the basic block division processing.

FIG. 10 is a flowchart showing the operation of the cycle count calculation and interlock occurrence tallying processing for basic blocks.

FIG. 11 is a flowchart showing the operation of the execution time calculation processing.

FIG. 12 is a diagram showing an example of the contents of the instruction sequence information table after the basic block division processing has been executed.

FIG. 13 is a diagram showing an example of the contents of the instruction sequence information table after the cycle count calculation and interlock occurrence tallying processing.

FIG. 14 is a diagram showing the execution count information.

FIG. 15 is a display example of execution times and interlock occurrence counts.

FIG. 17 is a diagram showing the types of instructions handled in the embodiment and how they are written in assembler.

[Explanation of symbols]

4…display device, 30…program performance analysis, 31…basic block division, 32…cycle count calculation, 33…interlock tallying, 34…execution time calculation, 40…interlock information, 70…machine language instruction sequence, 80…execution count information.

─────────────────────────────────────────────────────

[Procedure amendment]

[Submission date] July 8, 1993

[Procedure Amendment 1]

[Document to be amended] Specification

[Item to be amended] Brief description of the drawings

[Method of amendment] Change

[Content of amendment]

[Brief description of drawings]

FIG. 7 is a flowchart showing the operation of the program performance analysis processing.

FIG. 8 is a flowchart showing the operation of the basic block division processing.

FIG. 16 is an example of a source program containing a loop.

[Explanation of symbols] 4…display device, 30…program performance analysis, 31…basic block division, 32…cycle count calculation, 33…interlock tallying, 34…execution time calculation, 40…interlock information, 70…machine language instruction sequence, 80…execution count information.

Claims

[Claims]

1. A program performance analysis method comprising: a division step of dividing an input machine language instruction sequence into basic blocks, each of which is a contiguous partial instruction sequence; an execution count input step of inputting the number of times each basic block is executed; a cycle count calculation step of obtaining the number of execution cycles of each basic block while taking pipeline interlocks into account; and an execution time calculation step of calculating the execution time of each basic block from the number of execution cycles and the execution count of that basic block.

2. The program performance analysis method according to claim 1, wherein the division step reads the machine language instructions of the machine language instruction sequence one by one from the beginning and divides the sequence into basic blocks by splitting between a branch instruction and the next instruction when the instruction read is a branch instruction, or immediately before the label when the instruction read is a label.

3. The program performance analysis method according to claim 1, wherein the machine language instruction sequence is created by compiling a source program.

4. The program performance analysis method according to claim 3, further comprising: an execution count output code generation step of generating and outputting, when the source program is compiled, a load module to which object code for measuring the execution count of each basic block has been added; and an execution count output step of executing the load module, measuring the execution count of each basic block, and outputting it to the execution count input step.

5. The program performance analysis method according to claim 1, wherein the cycle count calculation step, when calculating the number of execution cycles of a contiguous instruction sequence, also tallies the number of occurrences of each interlock type whenever an instruction sequence that causes a pipeline interlock is found.

6. The program performance analysis method according to claim 3, wherein, when the source program includes a loop, the execution time of the loop is calculated from the numbers of execution cycles and the execution counts of the basic blocks forming the loop.

7. The program performance analysis method according to claim 1, further comprising a step of displaying a performance analysis result of the program.