JP2014075046A

JP2014075046A - Trace generation method, device, and program, and multilevel compilation using the method

Info

Publication number: JP2014075046A
Application number: JP2012222362A
Authority: JP
Inventors: Hiroshi Inoue; 拓井上
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 2012-10-04
Filing date: 2012-10-04
Publication date: 2014-04-24
Also published as: US20140101643A1; US9104433B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique for actualizing multilevel compilation by a trace-based compiler.SOLUTION: A trace generation device includes: a TT graph generation part 220 which generates a digraph representing transition of execution between traces through execution of a compiled trace generated at low optimization level while a maximum length is limited to a predetermined length or less; a TT graph update part 221 which increases a recompilation counter for a trace reached by tracing back an edge of the digraph starting at a node corresponding to a trace having a timer tick in timer-based sampling and making a stop before a cycle trace or where the edge ends; and a trace generation part 222 which generates a new trace by allowing a trace longer than the predetermined length to be generated starting at the head of the corresponding trace as the head of the new trace on condition that a value of one recompilation counter exceeds a first threshold.

Description

本発明は、トレースベースのコンパイラ技術に関し、より詳細には、トレースベースのコンパイラに適用するマルチレベルのコンパイレーション技術に関する。 The present invention relates to a trace-based compiler technique, and more particularly to a multi-level compilation technique applied to a trace-based compiler.

従来、頻繁に実行される一続きのコード列（以下、「トレース」という）をコンパイル及び実行の基本単位とするトレースベースのコンパイラが知られている（例えば、非特許文献１、２、及び３を参照）。トレースベースのコンパイラではトレースの選択が重要な鍵となる。より長いトレースの生成は、コンパイラによる最適化の機会を増やし、コンパイル済みトレース間の遷移によるオーバーヘッドを減らす。しかしながら長いトレースの生成はしばしば重複したトレースの生成を引き起こし、コードサイズやコンパイル時間を増加させ、起動時のパフォーマンスを下げる（例えば、非特許文献１を参照）。 Conventionally, a trace-based compiler using a series of frequently executed code strings (hereinafter referred to as “trace”) as a basic unit of compilation and execution is known (for example, Non-Patent Documents 1, 2, and 3). See). Trace selection is an important key for trace-based compilers. The generation of longer traces increases the opportunity for optimization by the compiler and reduces the overhead due to transitions between compiled traces. However, the generation of long traces often results in the generation of duplicate traces, increasing code size and compilation time, and reducing startup performance (see, for example, Non-Patent Document 1).

トレースベースのコンパイラにおいてより長いトレースを生成することは、メソッドベースのコンパイラにおいて積極的にメソッド・インラインを行うことでコンパイルスコープを広げることに似ている。メソッドベースのＪＩＴコンパイラを用いるシステムでは、高速なスタートアップと高いピーク性能を両立させるために適応的マルチレベルのコンパイレーション技術が広く用いられている（例えば、非特許文献４〜８及び特許文献１を参照）。適応的マルチレベルのコンパイレーション技術では、プログラムの起動時には低い最適化レベルでコンパイルを行い、起動後においては、プロファイリングにより実行時間をより多く使っているメソッドをみつけより高い最適化レベルで再コンパイルを行う。トレースベースのコンパイラにおいても、マルチレベルのコンパイレーションを行って、起動速度とピークパフォーマンスの両立を図ることが望まれる。 Producing longer traces in a trace-based compiler is similar to expanding the compilation scope by aggressive method inlining in a method-based compiler. In a system using a method-based JIT compiler, an adaptive multi-level compilation technique is widely used in order to achieve both high-speed startup and high peak performance (for example, Non-Patent Documents 4 to 8 and Patent Document 1). reference). Adaptive multi-level compilation technology compiles at a low optimization level at program startup, and after startup finds methods that use more execution time by profiling and recompiles at a higher optimization level. Do. Even in a trace-based compiler, it is desired to achieve multi-level compilation to achieve both startup speed and peak performance.

なお、以下の先行技術文献のリストにおいて、非特許文献１０は、本発明の実施例において使用するトレース同士を結合する最適化技術（trace linking optimization）についての背景技術としてリストしたものである。また、非特許文献１１は、本発明の実施例において使用する技術のもととなったバーストトレーシング技術についての背景技術としてリストしたものである。また、非特許文献１２乃至１４は、再コンパイルをサポートする既存のトレーシング技術としてリストしたものである。しかしながら、非特許文献１２における再コンパイルは、頻繁にアボートするトレースを修正するためのものであり、本発明のようにアップグレードを目的とするものではない。また、非特許文献１３及び１４における１回目のコンパイルは、実行を監視するために挿入されたコードをコンパイルするためのものであり、これら文献における再コンパイルは通常のコンパイルに相当し、本発明のようにアップグレードを目的とするものではない。 In the following list of prior art documents, Non-Patent Document 10 is listed as a background art about optimization technology (trace linking optimization) for linking traces used in the embodiments of the present invention. Non-Patent Document 11 is a list of background technologies related to the burst tracing technology that is the basis of the technology used in the embodiments of the present invention. Non-Patent Documents 12 to 14 are listed as existing tracing technologies that support recompilation. However, the recompilation in Non-Patent Document 12 is for correcting frequently aborted traces, and is not intended to be upgraded as in the present invention. The first compilation in Non-Patent Documents 13 and 14 is for compiling the inserted code for monitoring the execution, and the recompilation in these documents corresponds to a normal compilation. Is not intended for upgrade.

米国特許第７３８６６８６号明細書US Pat. No. 7,386,686 米国特許第６９７１０９１号明細書US Pat. No. 6,971,091

P. Wu, H. Hayashizaki, H. Inoue,and T. Nakatani, “Reducing Trace Selection Footprint for Large-scale JavaApplications with no Performance Loss”, in Proceedings of the ACMObject-Oriented Programming, Systems, Languages & Applications, pp.789-804, 2011.P. Wu, H. Hayashizaki, H. Inoue, and T. Nakatani, “Reducing Trace Selection Footprint for Large-scale JavaApplications with no Performance Loss”, in Proceedings of the ACMObject-Oriented Programming, Systems, Languages & Applications, pp. 789-804, 2011. H. Inoue, H. Hayashizaki, P. Wu,and T. Nakatani, “A Trace-based Java JIT Compiler Retrofitted from aMethod-based Compiler”, in Proceedings of the International Symposium on CodeGeneration and Optimization, pp.246-256, 2011.H. Inoue, H. Hayashizaki, P. Wu, and T. Nakatani, “A Trace-based Java JIT Compiler Retrofitted from aMethod-based Compiler”, in Proceedings of the International Symposium on CodeGeneration and Optimization, pp.246-256, 2011. H. Hayashizaki, P. Wu, H. Inoue, M. Serrano, and T. Nakatani, “Improvingthe Performance of Trace-based Systems by False Loop Filtering”, In Proceedingsof Sixteenth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems, pp. 405-418, 2011.H. Hayashizaki, P. Wu, H. Inoue, M. Serrano, and T. Nakatani, “Improving the Performance of Trace-based Systems by False Loop Filtering”, In Proceedingsof Sixteenth International Conference on Architectural Support for ProgrammingLanguages and Operating Systems, pp 405-418, 2011. MichaelPaleczny, Christopher Vick, and Cliff Click, “TheJava Hotspot TM Server Compiler”, in Proceedings of the USENIX Java VirtualMachine Research and Technology Symposium, pp.1-12, 2001.MichaelPaleczny, Christopher Vick, and Cliff Click, “TheJava Hotspot TM Server Compiler”, in Proceedings of the USENIX Java VirtualMachine Research and Technology Symposium, pp.1-12, 2001. N.Greevski, A. Kielstra, K. Stoodley, M. Stoodley, and V. Sundaresan,“Java just-in-time compiler and virtual machine improvements for server andmiddleware application”. In Proceedings of the USENIX Virtual Machine Researchand Technology Symposium, pp. 151-162, 2004.N. Greevski, A. Kielstra, K. Stoodley, M. Stoodley, and V. Sundaresan, “Java just-in-time compiler and virtual machine improvements for server and middleware application”. In Proceedings of the USENIX Virtual Machine Research and Technology Symposium, pp. 151-162, 2004. T. Suganuma, T. Yasue, M. Kawahito,H. Komatsu, and T. Nakatani, “A dynamic optimization framework for a Javajust-in-time compiler”, in Proceedings of the ACM Conference on Object-OrientedProgramming Systems, Languages, and Applications, pp.180-195, 2001T. Suganuma, T. Yasue, M. Kawahito, H. Komatsu, and T. Nakatani, “A dynamic optimization framework for a Javajust-in-time compiler”, in Proceedings of the ACM Conference on Object-OrientedProgramming Systems, Languages, and Applications, pp.180-195, 2001 M. Arnold, S. Fink, D. Grove, M. Hind, andP.F.Sweeney, “Adaptive optimization in the Jalapeno JVM”, in Proceedings of theACM SIGPLAN conference on Object-oriented programming, systems, languages, andapplications, pp.47-65, 2000.M. Arnold, S. Fink, D. Grove, M. Hind, and P.F.Sweeney, “Adaptive optimization in the Jalapeno JVM”, in Proceedings of the ACM SIGPLAN conference on Object-oriented programming, systems, languages, andapplications, pp .47-65, 2000. U. Holzle and D. Ungar, “A thirdgeneration self implementation: Reconciling responsiveness with performance”,in Proceedings of the ACM conference on Object-Oriented Programming, Systems,Languages, and Applications, pp. 229-243, 1994.U. Holzle and D. Ungar, “A thirdgeneration self implementation: Reconciling responsiveness with performance”, in Proceedings of the ACM conference on Object-Oriented Programming, Systems, Languages, and Applications, pp. 229-243, 1994. T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney, “Evaluatingthe accuracy of Java profilers”, in Proceedings of the ACM SIGPLAN conferenceof Programming language design and implementation, pp. 1879-197. 2010.T. Mytkowicz, A. Diwan, M. Hauswirth, and P. F. Sweeney, “Evaluating the accuracy of Java profilers”, in Proceedings of the ACM SIGPLAN conferenceof Programming language design and implementation, pp. 1879-197. 2010. V. Bala, E. Duesterwald, and S. Banerjia,“Dynamo: A Transparent Runtime Optimization System”, in Proceedings of the ACMProgramming Language Design and Implementation, pp. 1-12, 2000.V. Bala, E. Duesterwald, and S. Banerjia, “Dynamo: A Transparent Runtime Optimization System”, in Proceedings of the ACM Programming Language Design and Implementation, pp. 1-12, 2000. M. Hirzel, and T. M. Chilimbi, “Burst tracing: a framework forlow-overhead temporal profiling”, in Proceedings of the 4th Workshopon Feedback-Directed and Dynamic Optimization, pp. 117-126, 2001.M. Hirzel, and T. M. Chilimbi, “Burst tracing: a framework forlow-overhead temporal profiling”, in Proceedings of the 4th Workshopon Feedback-Directed and Dynamic Optimization, pp. 117-126, 2001. C. Haubl and H. Mossenbock, “Trace-based Compilation for the JavaHotSpot Virtual Machine”, in Proceedings of the International Conference on thePrinciples and Practice of Programming in Java, pp. 129-138, 2011.C. Haubl and H. Mossenbock, “Trace-based Compilation for the JavaHotSpot Virtual Machine”, in Proceedings of the International Conference on the Principles and Practice of Programming in Java, pp. 129-138, 2011. M. Bebenita, F. Brandner, M. Fahndrich, F. Logozzo, W. Schulte, N. Tillmann, and H. venter, “SPUR: A trace-based JITcompiler for CIL”, in Proceedings of the ACM international conference on Objectoriented programming systems languages and applications, pp. 708-725, 2010.M. Bebenita, F. Brandner, M. Fahndrich, F. Logozzo, W. Schulte, N. Tillmann, and H. venter, “SPUR: A trace-based JITcompiler for CIL”, in Proceedings of the ACM international conference on Objectoriented programming systems languages and applications, pp. 708-725, 2010. M. Bebenita, M. Chang, G. Wagner, A. Gal, C. Wimmer, and M. Franz, “Trace-basedcompilation in execution environments without interpreters”, in Proceedings ofthe 8th International Conference on the Principles and Practice ofProgramming in Java, pp. 59-68, 2010.M. Bebenita, M. Chang, G. Wagner, A. Gal, C. Wimmer, and M. Franz, “Trace-basedcompilation in execution environments without interpreters”, in Proceedings ofthe 8th International Conference on the Principles and Practice of Programming in Java , pp. 59-68, 2010.

トレースベースのコンパイラにおいてマルチレベルのコンパイレーション技術を使用するためには、再コンパイル時に長いトレースを生成してより大きなコンパイルスコープを得る必要がある。しかしながらコンパイルの基本単位をメソッドとするメソッドベースのコンパイラと異なり、トレースの生成ではコンパイルスコープの開始位置及び終了位置の自由度が高いため、単にトレースの最大長の制約を緩めるだけでは上述したように重複したトレースや開始位置が不適切なトレースが生成されてしまう。 In order to use multi-level compilation techniques in a trace-based compiler, it is necessary to generate a long trace during recompilation to obtain a larger compilation scope. However, unlike a method-based compiler that uses the basic unit of compilation as a method, since the generation of traces has a high degree of freedom in the start and end positions of the compilation scope, simply relaxing the maximum trace length constraint as described above Duplicate traces and traces with inappropriate start positions are generated.

メソッドベースのコンパイラでは頻繁に実行されるメソッドを見つけるために、タイマーベースのサンプリングが利用されている（例えば、特許文献２、非特許文献９を参照）。タイマーベースのサンプリングでは安全な位置で実行を停止させるために、メソッドの先頭、ループのバックエッジにyieldpoint (async check point) が挿入される。そして実行時にタイマー割り込みが起きるとフラグが設定され、スレッドが次のyieldpointに達したときに止まる必要があることが示される。なお停止した位置がメソッドの先頭に挿入されたyieldpointである場合は、１段のスタックウォークを行ってメソッドの呼び出し元を特定し、呼び出し元にタイマーティック（timer tick）があたったとして実行時間がチャージされる。また、メソッドの呼び出しとループを含まないメソッドについては、メソッドを抜ける場所（リターン）にyieldpointが挿入されて、タイマーが全く当たらないということがないようにされる。 In the method-based compiler, timer-based sampling is used to find a frequently executed method (see, for example, Patent Document 2 and Non-Patent Document 9). In timer-based sampling, a yieldpoint (async check point) is inserted at the beginning of the method and at the back edge of the loop to stop execution at a safe position. And when a timer interrupt occurs at run time, a flag is set to indicate that the thread needs to stop when it reaches the next yieldpoint. If the yield point is the yieldpoint inserted at the beginning of the method, the caller of the method is identified by performing a one-step stack walk, and the execution time is assumed to be a timer tick. Charged. For methods that do not include method calls and loops, a yieldpoint is inserted at the place where the method exits (return) to prevent the timer from hitting at all.

トレースベースのコンパイラにおいても、タイマーベースのサンプリングを利用して頻繁に実行される直線実行パスを見つけることが考えられる。しかしながら、トレースを抜ける場所にyield pointを挿入しようとすると、条件分岐ごとに出口（exit）が存在するため挿入数が多くなり、コードサイズが大きくなる。また、トレースベースの実行では元の場所に戻るということがないため、スタックウォークなどの手段では停止したトレースの直前のトレースを見つけることができず、実行時間を適切にチャージできない。 Even in a trace-based compiler, it is conceivable to find a linear execution path that is frequently executed by using timer-based sampling. However, if a yield point is to be inserted at a location that leaves the trace, the number of insertions increases because there is an exit for each conditional branch, and the code size increases. Also, since trace-based execution does not return to the original location, it is not possible to find the trace immediately before the stopped trace by means such as a stack walk, and the execution time cannot be charged appropriately.

この発明は、上記の問題点を解決するためになされたものであって、トレースベースのコンパイラにおいてマルチレベルのコンパイレーションを実現することのできる技術を提供することを目的とする。本発明はまた、タイマーベースのサンプリングを利用して、重複したトレースの生成と、更に望ましくはコードサイズの増加とを避けながら、長い実行時間を費やすトレースを見つけ出すことのできる技術を提供することを目的とする。 The present invention has been made to solve the above-described problems, and an object thereof is to provide a technique capable of realizing multi-level compilation in a trace-based compiler. The present invention also provides a technique that can use timer-based sampling to find traces that spend a long execution time while avoiding the generation of duplicate traces and more preferably increasing the code size. Objective.

上記課題を解決するために、本発明の１態様によれば、以下のようなコンピュータによるトレースの生成方法が提供される。該トレース生成方法は、（ａ）前記コンピュータが、最大長を所定長以下に制限されたトレースをコンパイルして得られたコンパイル済みトレースの実行に基づいて、トレース間の実行の遷移を表す有向グラフ（以下、「ＴＴ（Trace Transition)グラフ」という）を作成するステップであって、トレースを示す各ノードが再コンパイルカウンタを有する、前記作成するステップと、（ｂ）前記コンピュータが、前記コンパイル済みコードの実行中におけるタイマーベースのサンプリングにおいて、タイマーティックがあたったトレースに対応するノードを出発点として前記ＴＴグラフのエッジを逆方向に辿り、周期トレース若しくは再コンパイル済みトレースの手前又はエッジがなくなったところで停止して辿り着いたトレースの前記再コンパイルカウンタを増加するステップと、（ｃ）前記コンピュータが、いずれかの前記再コンパイルカウンタの値が第１閾値を超えることを条件に、前記第１閾値を超えた前記再コンパイルカウンタを有するノードに対応するトレースの先頭を新たなトレースの先頭として決定し、前記所定長より長いトレースの生成を許可して前記新たなトレースを生成するステップとを含む。 In order to solve the above-described problems, according to one aspect of the present invention, there is provided the following method for generating a trace by a computer. The trace generation method includes: (a) a directed graph representing a transition of execution between traces based on execution of a compiled trace obtained by compiling a trace whose maximum length is limited to a predetermined length or less by the computer. (Hereinafter referred to as “TT (Trace Transition) graph”), wherein each node indicating a trace has a recompile counter, and (b) the computer In timer-based sampling during execution, the edge of the TT graph is traced in the reverse direction starting from the node corresponding to the trace hit by the timer tick, and stopped before the periodic trace or the recompiled trace, or when the edge disappears The recompilation of the trace that arrived Incrementing a counter; and (c) the computer corresponds to a node having the recompile counter that exceeds the first threshold, provided that any of the recompile counter values exceeds a first threshold. Determining the beginning of the trace to be performed as the beginning of a new trace, permitting generation of a trace longer than the predetermined length, and generating the new trace.

好ましくは、前記コンパイル済みのコードは、前記タイマーティックがあたったトレースを見つけるためのyield pointをトレースの先頭及びループのバックエッジにのみ挿入されている。 Preferably, in the compiled code, yield points for finding the trace hit by the timer tick are inserted only at the beginning of the trace and the back edge of the loop.

好ましくは、前記ＴＴグラフの各エッジは該エッジが表す遷移の相対頻度を示す重みを有する。そして上記トレース生成方法は、タイマーティックがあたったトレースとその直前に実行されたトレースとの間のエッジの前記重みを増加するステップ（ｄ）を更に含み、上記ステップ（ｂ）において、前記コンピュータは、その重みが所定の条件を満たすエッジのみを辿って辿り着いたトレースの前記再コンパイルカウンタを増加する。 Preferably, each edge of the TT graph has a weight indicating the relative frequency of the transition represented by the edge. The trace generation method further includes a step (d) of increasing the weight of an edge between a trace hit with a timer tick and a trace executed immediately before the trace, and in the step (b), the computer The recompilation counter of the trace arrived by tracing only the edge whose weight satisfies the predetermined condition is incremented.

ここで前記所定の条件を満たすエッジとは、前記有向グラフを逆方向に辿る途中に存在するノードに入ってくるエッジが複数ある場合には、該複数のエッジの重みの合計に対するそのエッジの重みの比率が第２閾値を超えるエッジである。 Here, the edge satisfying the predetermined condition is that when there are a plurality of edges entering a node existing in the reverse direction of the directed graph, the weight of the edge with respect to the sum of the weights of the plurality of edges. An edge whose ratio exceeds the second threshold.

また、次に辿ろうとするノードから出ているエッジが複数ある場合には、前記所定の条件を満たすエッジとは、該複数のエッジの重みの合計に対する現在のノードから前記次のノードへのエッジの重みの比率が第３閾値を超えるエッジである。 In addition, when there are a plurality of edges coming out from the node to be traced next, the edge satisfying the predetermined condition is an edge from the current node to the next node with respect to the sum of the weights of the plurality of edges. This is an edge whose weight ratio exceeds the third threshold.

更に好ましくは、ステップ（ｄ）は、前記コンピュータが、前記ＴＴグラフ上で前記タイマーティックがあたったトレースに続く１以上のトレースのyield pointにおいて連続して実行が停止するための設定を行うステップを含む。そして、前記連続して実行が停止するための設定は、実行が、周期トレース、既に停止済みのトレース及びコンパイル済みのトレースのうちのいずれかに到達したこと、連続した実行の停止が所定回数に達したこと、又は次のトレースが存在しないトレースを抜けたことに応答して終了する。 More preferably, the step (d) includes a step in which the computer performs a setting for continuously stopping execution at a yield point of one or more traces following the trace hit by the timer tick on the TT graph. Including. The setting for stopping the execution is that the execution has reached one of a periodic trace, a trace that has already been stopped, and a trace that has been compiled. Exit in response to having reached or exiting a trace for which the next trace does not exist.

また好ましくは、前記コンパイル済みのトレースは、その実行によりトレースを抜けることとなった命令のポインタを記録する命令を挿入されている。そして、ステップ（ｄ）は、トレースの入り口に挿入されたyield pointで実行が停止することに応答して、前記コンピュータが記録された前記命令のポインタの値を参照することにより直前に実行されたトレースを特定するステップを含む。 Preferably, the compiled trace is inserted with an instruction for recording a pointer of an instruction that has exited the trace due to its execution. Then, step (d) was executed immediately before by the computer referring to the recorded pointer value of the instruction in response to execution stopping at the yield point inserted at the entrance of the trace. Including the step of identifying the trace.

また、本発明の他の態様によれば、以下のようなコンピュータによるマルチレベルのコンパイレーション実行方法が提供される。該マルチレベルのコンパイレーション実行方法は、（ａ）前記コンピュータが、最大長を所定長以下に制限して生成されたトレースをコンパイルするステップと、（ｂ）前記コンピュータが、生成したコンパイル済みのトレースの実行結果を取得するステップと、（ｃ）前記コンピュータが、取得した実行結果に対して上記説明したいずれかのトレース生成方法の各ステップを実行するステップと、（ｄ）ステップ（ｃ）の結果生成された前記新たなレースに対して前記コンピュータが再コンパイルを実行するステップとを含む。 Further, according to another aspect of the present invention, there is provided a multilevel compilation execution method by a computer as follows. The multi-level compilation execution method includes: (a) a step in which the computer compiles a trace generated by limiting a maximum length to a predetermined length or less; and (b) a compiled trace generated by the computer. (C) a step in which the computer executes each step of any one of the trace generation methods described above on the acquired execution result; and (d) a result of step (c). And recompiling the computer for the new race created.

以上、トレースの生成方法及びマルチレベルのコンパイレーション実行方法として本発明を説明したが、本発明は、上記説明したトレース生成方法及びマルチレベルのコンパイレーション実行方法のそれぞれの各ステップをコンピュータに実行させるためのトレース生成プログラム及びマルチレベルのコンパイレーション実行プログラムとして把握することもできる。また、そのようなトレース生成プログラム及びマルチレベルのコンパイレーション実行プログラムをそれぞれ、１以上のコンピュータにインストールすることにより実現されるトレース生成装置及びマルチレベルのコンパイレーション実行装置として把握することもできる。 Although the present invention has been described above as a trace generation method and a multilevel compilation execution method, the present invention causes a computer to execute each step of the trace generation method and the multilevel compilation execution method described above. It can also be understood as a trace generation program and a multi-level compilation execution program. It is also possible to grasp such a trace generation program and a multi-level compilation execution program that are realized by installing the trace generation program and the multi-level compilation execution program on one or more computers.

本発明によれば、タイマーベースのサンプリングを用いてトレース間の実行の遷移を表すＴＴグラフを作成しこれに基づいて適切なトレースに実行時間のチャージを行うので、重複が少なくより長い真にホットなトレースを見つけることができる。また、ＴＴグラフを用いてより長い実行時間を消費するトレースを選択できるため、トレースベースのコンパイラにおいてマルチレベルのコンパイレーションを行うことが可能となり、高速なスタートアップと高いピーク性能の両立を図ることができる。本発明のその他の効果については、各実施の形態の記載から理解される。 In accordance with the present invention, timer-based sampling is used to create a TT graph that represents the transition of execution between traces, and based on this, the execution time is charged to the appropriate trace, so there is less overlap and a longer, truly hot Traces can be found. In addition, since a trace that consumes a longer execution time can be selected using a TT graph, it becomes possible to perform multi-level compilation in a trace-based compiler, thereby achieving both high-speed startup and high peak performance. it can. Other effects of the present invention will be understood from the description of each embodiment.

本発明の実施の形態に係るトレース生成装置及びマルチレベルのコンパイラ装置を実現するのに好適なコンピュータ・システム１００のハードウェア構成の一例を示した図である。1 is a diagram showing an example of a hardware configuration of a computer system 100 suitable for realizing a trace generation apparatus and a multi-level compiler apparatus according to an embodiment of the present invention. 図１に示すコンピュータ・システム１００のソフトウェアの構成の一例を示す図である。It is a figure which shows an example of a software structure of the computer system 100 shown in FIG. 図３（ａ）は、線形トレースにおけるyield pointの挿入位置を説明する図である。図３（ｂ）は、周期トレースにおけるyield pointの挿入位置を模式的に示した図であるFIG. 3A is a diagram illustrating the insertion position of the yield point in the linear trace. FIG. 3B is a diagram schematically showing the insertion position of the yield point in the periodic trace. ＴＴグラフの一例を示す図である。It is a figure which shows an example of a TT graph. 図５（ａ）は、ＴＴグラフにおけるエッジの重みの更新方法を説明する図である。図５（ｂ）は、ＴＴグラフにおけるエッジの重みの他の更新方法を説明する図である。FIG. 5A is a diagram for explaining an edge weight update method in the TT graph. FIG. 5B is a diagram for explaining another method for updating edge weights in the TT graph. ＴＴグラフを用いた再コンパイルカウンタの更新方法を説明する図である。It is a figure explaining the update method of the recompilation counter using a TT graph. ＴＴグラフを用いた再コンパイルカウンタの他の更新方法を説明する図である。It is a figure explaining the other update method of the recompilation counter using a TT graph. ＴＴグラフを用いた再コンパイルカウンタの更に他の更新方法を説明する図である。It is a figure explaining the further another update method of the recompilation counter using a TT graph. 図９（ａ）は、ＴＴグラフを用いて選択したトレースの一例を示す図である。図５（ｂ）は、トＴＴグラフを用いることなく選択したトレースの一例を示す図である。FIG. 9A is a diagram illustrating an example of a trace selected using a TT graph. FIG. 5B is a diagram showing an example of a trace selected without using a to-TT graph. 本発明の実施の形態に係るマルチレベルのコンパイレーション処理全体の流れの一例を示すフローチャートである。It is a flowchart which shows an example of the flow of the whole multi-level compilation process which concerns on embodiment of this invention. ＴＴグラフの更新処理及び該ＴＴグラフに基づく再コンパイル処理の流れの一例を示すフローチャートである。It is a flowchart which shows an example of the update process of a TT graph, and the flow of the recompilation process based on this TT graph. ＴＴグラフを用いた再コンパイルカウンタの増加処理の流れの一例を示す図である。It is a figure which shows an example of the flow of the increase process of the recompilation counter using a TT graph. ＴＴグラフの生成及び該ＴＴグラフに基づく再コンパイルの擬似コードの一例を示す図である。It is a figure which shows an example of the pseudo code of the production | generation of TT graph, and the recompilation based on this TT graph. 従来技術と本発明とで起動時間を比較した実験結果を示す図である。It is a figure which shows the experimental result which compared the starting time with a prior art and this invention. 従来技術と本発明とで実行時間を比較した実験結果を示す図である。It is a figure which shows the experimental result which compared execution time with the prior art and this invention. 従来技術と本発明とで合計コンパイル時間を比較した実験結果を示す図である。It is a figure which shows the experimental result which compared the total compilation time with a prior art and this invention.

以下、本発明を実施するための形態を図面に基づいて詳細に説明するが、以下の実施形態は特許請求の範囲にかかる発明を限定するものではなく、また実施形態の中で説明されている特徴の組み合わせの全てが発明の解決手段に必須であるとは限らない。なお、実施の形態の説明の全体を通じて同じ要素には同じ番号を付している。 DESCRIPTION OF EXEMPLARY EMBODIMENTS Hereinafter, modes for carrying out the invention will be described in detail with reference to the drawings. However, the following embodiments do not limit the invention according to the claims, and are described in the embodiments. Not all combinations of features are essential to the solution of the invention. Note that the same numbers are assigned to the same elements throughout the description of the embodiment.

図１は、本発明を実施するのに好適なコンピュータ・システム１００のハードウェア構成の一例を示す。コンピュータ・システム１００は、バス１０６に接続されたメインＣＰＵ（中央処理装置）１０２とメイン・メモリ１０４を含んでいる。ＣＰＵ１０２は好ましくは、32ビット又は64ビットのアーキテクチャに基づくものであり、例えば、インテル社のCore i（商標）シリーズ、Core 2（商標）シリーズ、Atom（商標）シリーズ、Xeon（商標）シリーズ、Pentium（登録商標）シリーズ、Celeron（登録商標）シリーズ、AMD社のPhenom（商標）シリーズ、Athlon（商標）シリーズ、Turion（商標）シリーズ又はSempron（商標）が使用されうる。メイン・メモリ１０４は好ましくは、1GB以上の容量、より好ましくは、2GB以上の容量をもつものであってよい。 FIG. 1 shows an example of a hardware configuration of a computer system 100 suitable for carrying out the present invention. The computer system 100 includes a main CPU (central processing unit) 102 and a main memory 104 connected to a bus 106. The CPU 102 is preferably based on a 32-bit or 64-bit architecture, such as Intel's Core i (TM) series, Core 2 (TM) series, Atom (TM) series, Xeon (TM) series, Pentium (Registered trademark) series, Celeron (registered trademark) series, AMD Phenom (trademark) series, Athlon (trademark) series, Turion (trademark) series, or Sempron (trademark) may be used. The main memory 104 may preferably have a capacity of 1 GB or more, more preferably 2 GB or more.

バス１０６には、ディスプレイ・コントローラ１０８を介して、ディスプレイ１１０、例えば液晶ディスプレイ（ＬＣＤ）が接続されうる。ディスプレイ１１０は、コンピュータの管理のために、通信回線を介してネットワークに接続されたコンピュータについての情報と、そのコンピュータ上で動作中のソフトウェアについての情報を、適当なグラフィック・インタフェースで表示するために使用される。 A display 110, for example, a liquid crystal display (LCD) can be connected to the bus 106 via a display controller 108. The display 110 is used to display information about a computer connected to a network via a communication line and information about software running on the computer with an appropriate graphic interface for managing the computer. used.

バス１０６にはまた、ＳＡＴＡ又はＩＤＥコントローラ１１２を介して、ディスク１１４、例えばシリコン・ディスク又はハードディスクが接続されうる。バス１０６にはまた、ＳＡＴＡ又はＩＤＥコントローラ１１２を介して、任意的に、ドライブ１１６、例えばＣＤ、ＤＶＤまたはＢＤドライブが接続されうる。バス１０６にはさらに、任意的に、キーボード・マウスコントローラ１１８又はＵＳＢバス（図示せず）を介して、キーボード１２０及びマウス１２２が接続されうるが、本発明を実施する上では必要ない。 A disk 114, such as a silicon disk or hard disk, can also be connected to the bus 106 via a SATA or IDE controller 112. The bus 106 may also optionally be connected to a drive 116, such as a CD, DVD or BD drive, via a SATA or IDE controller 112. Further, a keyboard 120 and a mouse 122 may optionally be connected to the bus 106 via a keyboard / mouse controller 118 or a USB bus (not shown), but this is not necessary for implementing the present invention.

ディスク１１４には、オペレーティング・システム、J2EEなどのJava（登録商標）処理環境、Java（登録商標）アプリケーション、Java（登録商標）仮想マシン（VM）を提供するプログラム、その他のプログラム及びデータが、メイン・メモリ１０４にロード可能なように記憶されている。 The disk 114 includes an operating system, a Java (registered trademark) processing environment such as J2EE, a Java (registered trademark) application, a program that provides a Java (registered trademark) virtual machine (VM), and other programs and data. It is stored in the memory 104 so that it can be loaded.

オペレーティング・システムは、例えば、LINUX（登録商標）、マイクロソフト・コーポレーションが提供するWindows（登録商標）オペレーティング・システム、アップル・コンピュータ・インコーポレイテッドが提供するMacOS（登録商標）若しくはiOS（登録商標）、XWindow System備えるUNIX（登録商標）系システム（たとえば、インターナショナル・ビジネス・マシーンズ・コーポレーション（登録商標）が提供するAIX（登録商標）でありうる。 Operating systems include, for example, LINUX (registered trademark), Windows (registered trademark) operating system provided by Microsoft Corporation, MacOS (registered trademark) or iOS (registered trademark) provided by Apple Computer Incorporated, XWindow UNIX (registered trademark) system provided with System (for example, AIX (registered trademark) provided by International Business Machines Corporation (registered trademark)).

上記ディスク１１４には更に、オペレーティング・システムと協働してＣＰＵ１０２に命令を与え、本発明を実施するためのコンピュータ・プログラムを記録することができる。即ち、上記ディスク１１４には、コンピュータ・システム１００にインストールされ、コンピュータ・システム１００を本発明の実施形態によるトレース生成装置として機能させるトレース生成プログラム、コンピュータ・システム１００を本発明の実施形態によるマルチレベルのコンパイレーション実行装置として機能させるマルチレベルのコンパイレーション実行プログラム、及びそれら関連データを記録することができる。なお、マルチレベルのコンパイレーション実行プログラムは、Ｊａｖａ（登録商標）実行時（ＪＩＴ）コンパイラを、トレース生成装置によって生成されたトレースに基づきマルチレベルのコンパイレーションを実行可能なように一部修正することによって実装可能である。 The disk 114 can further record a computer program for giving an instruction to the CPU 102 in cooperation with the operating system to implement the present invention. That is, the disk 114 is installed in the computer system 100, and a trace generation program that causes the computer system 100 to function as a trace generation apparatus according to the embodiment of the present invention. The computer system 100 includes the multi-level according to the embodiment of the present invention. A multi-level compilation execution program that functions as a compilation execution apparatus of the above and related data can be recorded. Note that the multi-level compilation execution program partially modifies the Java (registered trademark) runtime (JIT) compiler so that multi-level compilation can be executed based on the trace generated by the trace generation device. Can be implemented.

上記トレース生成プログラムは、ＴＴグラフ生成モジュールと、ＴＴグラフ更新モジュールと、トレース生成モジュールと、トレースキャッシュを含む。これらプログラム及びモジュールは、ＣＰＵ１０２に働きかけて、コンピュータ・システム１００を、各々後述するＴＴグラフ生成部２２０と、ＴＴグラフ更新部２２１と、トレース生成部２２２と、トレースキャッシュ２２４としてそれぞれ機能させる。また、上記マルチレベルのコンパイレーション実行プログラムは、中間コード生成モジュール、最適化モジュール、コード生成モジュールを含む。これらプログラム及びモジュールは、ＣＰＵ１０２に働きかけて、コンピュータ・システム１００を、各々後述する中間コード生成部２３０と、最適化部２３２と、コード生成部２３４としてそれぞれ機能させる。 The trace generation program includes a TT graph generation module, a TT graph update module, a trace generation module, and a trace cache. These programs and modules cause the CPU 102 to cause the computer system 100 to function as a TT graph generation unit 220, a TT graph update unit 221, a trace generation unit 222, and a trace cache 224, which will be described later. The multi-level compilation execution program includes an intermediate code generation module, an optimization module, and a code generation module. These programs and modules cause the CPU 102 to cause the computer system 100 to function as an intermediate code generation unit 230, an optimization unit 232, and a code generation unit 234, which will be described later.

上記コンピュータ・プログラムは圧縮し、また複数に分割して複数の媒体に記録することもできる。ドライブ１１６は、必要に応じて、ＣＤ−ＲＯＭ、ＤＶＤ−ＲＯＭまたはＢＤからプログラムをディスク１１４にインストールするために使用されうる。 The computer program can be compressed, divided into a plurality of pieces, and recorded on a plurality of media. The drive 116 can be used to install a program from the CD-ROM, DVD-ROM or BD to the disk 114 as required.

通信インタフェース１２６は、例えばイーサネット（登録商標）・プロトコルに従う。通信インタフェース１２６は、通信コントローラ１２４を介してバス１０６に接続され、コンピュータ・システム１００を通信回線１２８に物理的に接続する役割を担い、コンピュータ・システム１００のオペレーティング・システムの通信機能のＴＣＰ／ＩＰ通信プロトコルに対して、ネットワーク・インタフェース層を提供する。なお、通信回線は、有線ＬＡＮ環境に基づくもの、又は、無線ＬＡＮ環境、例えば、ＩＥＥＥ８０２．１１ａ／ｂ／ｇ／ｎなどのＷｉ−Ｆｉ規格に基づくものであってもよい。 The communication interface 126 follows, for example, the Ethernet (registered trademark) protocol. The communication interface 126 is connected to the bus 106 via the communication controller 124, plays a role of physically connecting the computer system 100 to the communication line 128, and is a TCP / IP of the communication function of the operating system of the computer system 100. A network interface layer is provided for the communication protocol. Note that the communication line may be based on a wired LAN environment or a wireless LAN environment, for example, based on a Wi-Fi standard such as IEEE 802.11a / b / g / n.

以上から、本発明の実施態様において使用されるコンピュータ・システム１００は、特定のオペレーティング・システム環境に限定されるものではないことを理解することができるであろう。なお、上記説明した構成要素は例示であり、そのすべての構成要素が本発明の必須構成要素となるわけではない。 From the foregoing, it will be appreciated that the computer system 100 used in embodiments of the present invention is not limited to a particular operating system environment. In addition, the component demonstrated above is an illustration, All the components are not necessarily an essential component of this invention.

図２は、図１に示すコンピュータ・システム１００のソフトウェアの構成の一例を示す図である。ＣＰＵ１０２は、ディスク１１４からJava（登録商標）仮想マシン（VM）、本発明の実施形態によるトレース生成プログラム、マルチレベルのコンパイレーション実行プログラムをメイン・メモリ１０４に読み出し実行することにより、オペレーティング・システム２０２、仮想マシン２０６、トレーシング実行環境２２６、及び動的コンパイラ２２８をメイン・メモリ１０４に展開する。オペレーティング・システム２０２は、ＣＰＵ１０２やメモリの管理など、コンピュータ・システム１００が有する基本的な機能を提供するソフトウェアである。 FIG. 2 is a diagram showing an example of a software configuration of the computer system 100 shown in FIG. The CPU 102 reads the Java (registered trademark) virtual machine (VM), the trace generation program according to the embodiment of the present invention, and the multi-level compilation execution program from the disk 114 to the main memory 104 and executes them, thereby executing the operating system 202. The virtual machine 206, the tracing execution environment 226, and the dynamic compiler 228 are expanded in the main memory 104. The operating system 202 is software that provides basic functions of the computer system 100 such as management of the CPU 102 and memory.

仮想マシン２０６は、バイトコードの低速実行（ｉｎｔｅｒｐｒｅｔ）、およびコンパイル済みトレースの実行を行うエミュレータである。仮想マシン２０６は、インタープリタ２０８と、実行部２１２と、トレースディスパッチャ２１４を含んで構成される。 The virtual machine 206 is an emulator that performs low-speed execution (interpret) of bytecode and execution of compiled traces. The virtual machine 206 includes an interpreter 208, an execution unit 212, and a trace dispatcher 214.

トレースディスパッチャ２１４は、トレースキャッシュ２２４を参照して次に実行するバイトコードアドレスから始まるコンパイル済みトレースがコードキャッシュ２１６に保存されている否かを判定する。インタープリタ２０８は、コンパイル済みトレースが存在しない場合に、処理対象のバイトコードを低速に実行する。実行部２１２は、コンパイル済みトレースが存在する場合、動的コンパイラ２２８が生成したコンパイル済みトレースを保存するメモリ領域であるコードキャッシュ２１６からコンパイル済みトレースを取得して実行する。 The trace dispatcher 214 refers to the trace cache 224 to determine whether or not a compiled trace starting from a bytecode address to be executed next is stored in the code cache 216. The interpreter 208 executes the processing target bytecode at a low speed when there is no compiled trace. When the compiled trace exists, the execution unit 212 acquires the compiled trace from the code cache 216 that is a memory area for storing the compiled trace generated by the dynamic compiler 228 and executes the acquired trace.

トレーシング実行環境２２６は、インタープリタ２０８によるバイトコード及び実行部２１２によるコンパイル済みトレースの実行の結果に基づき、現在の最大長に関する制約の下で実行時間をより多く使っているトレース（以下、「ホットパス」ともいう）をコンパイル対象として選択するためのソフトウェアモジュール群である。トレーシング実行環境２２６は、トレース選択エンジン２１８と、トレースキャッシュ２２４とを含んで構成される。トレース選択エンジン２１８は、起動時の低い最適化レベルにおいては最大長を所定長以下に制限してトレースを選択し、起動後のより高い最適化レベルにおいては所定長よりも長いトレースの生成を許可して新たにトレースを選択する。トレースキャッシュ２２４は、コンパイル時の最適化レベルや、後述するＴＴグラフ生成部２２０によって生成されるＴＴグラフのデータ構造など、トレースを管理するための情報を記憶する。 The tracing execution environment 226 is based on the result of the execution of the compiled trace by the byte code by the interpreter 208 and the execution unit 212, and the trace (hereinafter referred to as “hot path” which uses more execution time under the restriction on the current maximum length. Is also a software module group for selecting as a compilation target. The tracing execution environment 226 includes a trace selection engine 218 and a trace cache 224. The trace selection engine 218 selects a trace by limiting the maximum length to a predetermined length or less at a low optimization level at startup, and allows generation of a trace longer than the predetermined length at a higher optimization level after startup. To select a new trace. The trace cache 224 stores information for managing traces such as an optimization level at the time of compilation and a data structure of a TT graph generated by a TT graph generation unit 220 described later.

本発明では、起動後のより高い最適化レベルに対してタイマーベースのサンプリングにより頻繁に実行される直線実行パスを見つけるために、トレース間の実行の遷移を表す有向グラフであるＴＴグラフを作成する。そしてＴＴグラフを用いて適切なトレースに実行時間のチャージを行うことで、重複したトレースの生成を避けながら長い実行時間を費やすトレースを見つけ出す。そのため、本発明に係るトレース選択エンジン２１８は、ＴＴグラフ生成部２２０、ＴＴグラフ更新部２２１、トレース生成部２２２を含んで構成される。これら３つの構成要素のそれぞれの機能の詳細については図４〜図９を参照して後述する。 In the present invention, in order to find a linear execution path that is frequently executed by timer-based sampling with respect to a higher optimization level after activation, a TT graph that is a directed graph representing an execution transition between traces is created. Then, the execution time is charged to an appropriate trace using the TT graph to find a trace that spends a long execution time while avoiding generation of a duplicate trace. Therefore, the trace selection engine 218 according to the present invention includes a TT graph generation unit 220, a TT graph update unit 221, and a trace generation unit 222. Details of the functions of these three components will be described later with reference to FIGS.

動的コンパイラ２２８は、起動時には起動速度を優先して低い最適化レベルでコンパイルを行い、起動後にはピークパフォーマンスを優先してより高い最適化レベルでコンパイルを行う、マルチレベルのコンパイレーションを採用するコンパイラである。動的コンパイラ２２８は、トレース選択エンジン２１８が出力するトレースを入力として、入力トレースに適用された制約に対応する最適化レベルで最適化を行い、ネイティブコードを動的に生成する。動的コンパイラ２２８は、中間コード生成部２３０と、最適化部２３２と、コード生成部２３４とを含んで構成される。 The dynamic compiler 228 employs a multi-level compilation that compiles at a low optimization level giving priority to startup speed at startup, and compiles at a higher optimization level giving priority to peak performance after startup. It is a compiler. The dynamic compiler 228 receives the trace output from the trace selection engine 218 as an input, performs optimization at an optimization level corresponding to the constraint applied to the input trace, and dynamically generates native code. The dynamic compiler 228 includes an intermediate code generation unit 230, an optimization unit 232, and a code generation unit 234.

中間コード生成部２３０は、トレース選択エンジン２１８が出力したトレースを動的コンパイラ２２８内部で扱う表現（intermediate representation）に変換する。最適化部２３２は、起動時における所定長以下のトレースに対しては低い最適化レベルの最適化処理を施しコード生成部２３０に出力する。最適化部２３２はまた、起動後における所定長以上の長さを許可されて選択されたトレースに対しては、より高い最適化レベルの最適化処理を施しコード生成部２３０に出力してアップグレードのための再コンパイルを図る。コード生成部２３４は、最適化部２３２により出力された最適化済みのトレースをネイティブコードに変換し、コードキャッシュ２１６に格納する。 The intermediate code generation unit 230 converts the trace output from the trace selection engine 218 into an intermediate representation handled in the dynamic compiler 228. The optimization unit 232 performs optimization processing at a low optimization level for a trace having a predetermined length or less at the time of activation and outputs the trace to the code generation unit 230. The optimization unit 232 also performs an optimization process with a higher optimization level on the trace selected with a predetermined length or longer after activation and outputs the trace to the code generation unit 230 for upgrade. Recompile for that. The code generation unit 234 converts the optimized trace output from the optimization unit 232 into native code, and stores the native code in the code cache 216.

以下では、まず図３を参照して、タイマーベースのサンプリングとそこで使用するyield pointについて説明する。続いて、図４〜図９を参照して、ＴＴグラフの生成、更新及びＴＴグラフに基づく再コンパイルについて説明する。続いて図１０〜図１３を参照して、ＴＴグラフの生成及び更新処理を含む、マルチレベルのコンパイレーション処理全体の流れを説明する。最後に図１４〜図１６を参照して、従来技術と本発明とを比較した実験結果について説明する。 In the following, the timer-based sampling and yield points used therein will be described first with reference to FIG. Next, generation and update of a TT graph and recompilation based on the TT graph will be described with reference to FIGS. Next, the flow of the entire multi-level compilation process including the TT graph generation and update process will be described with reference to FIGS. Finally, with reference to FIG. 14 to FIG. 16, experimental results comparing the prior art and the present invention will be described.

１．タイマーベースのサンプリングとyield point
タイマーベースのサンプリングでは、コンパイル済みコードの実行中にタイマー割り込みが起こると、yield flagと呼ばれるスレッドローカルなフラグが設定されて、次のyield pointでスレッドが停止する必要があることが示される。実行が次のyield pointで停止すると、対応するメソッドで実行時間が使われているとみなされ実行時間がチャージされる。このようにyield pointを用いることで、ランタイムシステムはスレッドを安全に停止できると共に、タイマーティックがあたった正確なプログラム位置を特定できる。従ってyield pointの挿入位置がプロファイリングの正確さとオーバーヘッドとを決める（詳細は例えば非特許文献９を参照）。 1. Timer-based sampling and yield point
In timer-based sampling, if a timer interrupt occurs during the execution of compiled code, a thread-local flag called yield flag is set to indicate that the thread needs to stop at the next yield point. When execution stops at the next yield point, it is assumed that the corresponding method is using the execution time, and the execution time is charged. By using yield points in this way, the run-time system can safely stop the thread and identify the exact program location with a timer tick. Therefore, the yield point insertion position determines the profiling accuracy and overhead (for example, see Non-Patent Document 9 for details).

ここでトレースベースのコンパイラにおいてタイマーベースのサンプリングを利用することを考える。タイマー割り込み時に実行されているトレースを正確に特定できるようにするためには、トレースの入り口（entry point）と全ての出口（exit point）にyield pointを挿入する必要がある。しかし全ての出口にyield pointを挿入するとなると、既に述べたように条件分岐ごとにexit pointが存在するため、yield pointの挿入数が多くなりコードサイズが大きくなる。そこで本発明では、ほとんどのexit pointは実行時間のチャージのためのトリガーとなることはないという事実に基づき、yield pointをトレースの先頭及びループのバックエッジにのみ挿入する。 Consider using timer-based sampling in a trace-based compiler. In order to be able to accurately identify the trace being executed at the time of timer interruption, it is necessary to insert yield points at the entry point of the trace and all exit points. However, if yield points are inserted at all exits, exit points exist for each conditional branch as described above, so the number of yield points increases and the code size increases. Therefore, in the present invention, based on the fact that most exit points do not trigger execution time charging, yield points are inserted only at the beginning of the trace and the back edge of the loop.

図３（ａ）は、線形トレース（linear trace）におけるyield pointの挿入位置を説明する図である。図３（ａ）に示す線形トレースは、基本ブロックＡ、Ｂ、Ｃから構成される。タイマーティックを正しく処理するためには、トレースの入り口３００と、全ての出口３０２、３０４、３０６にyield pointを挿入する必要がある。しかしながら本発明では上記事実に基づいてトレースの入り口３００にのみyield pointを挿入し、コードサイズが大きくなることを防ぐ。 FIG. 3A is a diagram illustrating the insertion position of the yield point in the linear trace. The linear trace shown in FIG. 3A is composed of basic blocks A, B, and C. In order to correctly handle timer ticks, it is necessary to insert yield points at the trace entry 300 and at all exits 302, 304, 306. However, in the present invention, a yield point is inserted only at the entrance 300 of the trace based on the above fact to prevent the code size from becoming large.

また、図３（ｂ）は、周期トレース（cyclic trace）におけるyield pointの挿入位置を説明する図である。図３（ｂ）に示す周期トレースは、基本ブロックＡ、Ｂから構成される。タイマーティックを正しく処理するためには、トレースの入り口３０８と、ループのバックエッジ３１２と、全ての出口３１０にyield pointを挿入する必要がある。しかし本発明では上記事実に基づいてトレースの入り口３０８と、ループのバックエッジ３１２にのみyield pointを挿入し、コードサイズが大きくなることを防ぐ。なお、本実施例においてコンパイル単位とするトレースは、線形トレースと、周期トレースであり、各トレースは、図３（ａ）及び図３（ｂ）に示すように、１つの入り口と１以上の出口をもつブロックとする。 FIG. 3B is a diagram for explaining the insertion position of the yield point in the cyclic trace. The periodic trace shown in FIG. 3B is composed of basic blocks A and B. In order to correctly handle timer ticks, it is necessary to insert yield points at the entrance 308 of the trace, the back edge 312 of the loop, and all exits 310. However, in the present invention, yield points are inserted only at the entrance 308 of the trace and the back edge 312 of the loop based on the above fact, thereby preventing the code size from becoming large. Note that the traces that are the compilation units in the present embodiment are a linear trace and a periodic trace, and each trace has one entrance and one or more exits as shown in FIGS. 3 (a) and 3 (b). A block with

本発明ではトレース出口のyield pointを省略する代わりに、トレース内で最後に実行される命令として分岐及び結合命令（branch-and-link命令）を挿入し、その実行によりトレースを抜けることとなった命令のポインタをリンクレジスタに記録する。これによって、タイマー割り込みにより実行がトレースの入り口に挿入されたyield pointで停まった場合でも、リンクレジスタの値を参照して直前に実行されていたトレースを特定することができる。なお、メソッドベースのコンパイラでは、トレース入り口のyield pointで実行が停まった場合、常に直前に実行されていたトレースに対して実行時間のチャージを行うが、本発明に係るトレーシング実行環境２２６では、詳細は後述するがホットパス全体を見つけ出すためにＴＴグラフを用いて実行時間をチャージすべき適切なトレースをその都度決定する。 In the present invention, instead of omitting the yield point of the trace exit, a branch and join instruction (branch-and-link instruction) is inserted as the last executed instruction in the trace, and the trace is exited by the execution. Record the instruction pointer in the link register. As a result, even when execution stops at the yield point inserted at the entrance of the trace due to the timer interrupt, it is possible to identify the trace executed immediately before by referring to the value of the link register. In the case of the method-based compiler, when execution stops at the yield point at the entrance of the trace, the execution time is always charged for the trace executed immediately before. However, in the tracing execution environment 226 according to the present invention, As will be described in detail later, in order to find the entire hot path, an appropriate trace to be charged with the execution time is determined each time using the TT graph.

上述したbranch-and-link命令は、通常のbranch命令と同じジャンプ命令であるが、ジャンプする際にリンクレジスタという専用レジスタにジャンプ元のアドレスを保存する命令である。branch-and-link命令は、一般的にはメソッドの呼び出しに使用され、呼び出されたメソッドから戻る際にリンクレジスタに保存されたアドレスが参照される。本実施例では、トレースリンキング技術によりトレース同士をリンクする際に、branch命令の代わりにbranch-and-link命令を用いることで、直前に実行されたトレースを特定可能とする。 The branch-and-link instruction described above is the same jump instruction as the ordinary branch instruction, but is an instruction for saving the jump source address in a dedicated register called a link register when jumping. The branch-and-link instruction is generally used to call a method, and the address stored in the link register is referred to when returning from the called method. In this embodiment, when the traces are linked by the trace linking technique, the branch-and-link instruction is used instead of the branch instruction so that the trace executed immediately before can be specified.

なお、トレースリンキング技術とは、トレース間で実行をインタープリタ２０８に戻さずに直接次のトレースに実行を移させるために用いられる最適化技術である。トレースリンキングを実現するためには、元のトレースの出口から直に次のトレースの先頭アドレスへのジャンプを行うコードを生成しこれを元のトレースに挿入する必要がある。このため通常は、コンパイル時に実行コードの最後にトレースディスパッチャ２１４へのbranch命令を挿入し、トレースディスパッチャ２１４は、次のトレースが一意に決まることを条件として、ジャンプ元、ジャンプ先のトレースがコンパイルされた後最初の実行時にbranch命令のとび先を次のトレースの先頭アドレスに書き換える。本実施例では上述したように、トレース同士をリンクする際にbranch命令の代わりにbranch-and-link命令によりトレースの先頭アドレスへジャンプするよう書き換える。なおトレースリンキングは既存技術であり本発明の要旨ではないからこれ以上の説明は省略する。トレースリンキングの更なる詳細については、例えば非特許文献１０を参照されたい。 Note that the trace linking technique is an optimization technique used for transferring execution directly to the next trace without returning execution to the interpreter 208 between traces. In order to realize trace linking, it is necessary to generate a code that jumps from the exit of the original trace to the start address of the next trace and insert it into the original trace. For this reason, a branch instruction to the trace dispatcher 214 is usually inserted at the end of the execution code at the time of compilation, and the trace dispatcher 214 compiles the jump source and destination traces on condition that the next trace is uniquely determined. After the first execution, the branch instruction destination is rewritten to the start address of the next trace. In this embodiment, as described above, when linking traces, the branch instruction is rewritten to jump to the start address of the trace by using a branch-and-link instruction instead of the branch instruction. Since trace linking is an existing technique and is not the gist of the present invention, further explanation is omitted. For further details of trace linking, see, for example, Non-Patent Document 10.

２．ＴＴグラフの生成
ＴＴグラフ生成部２２０は、最大長を所定長以下に制限してコンパイルされたコンパイル済みトレースの実行中にタイマーベースのサンプリングを用いたプロファイリングを行うことで、ＴＴグラフを生成する。ＴＴグラフは、トレース間の実行の遷移を表す有向グラフであり、有向グラフの各ノードはトレースを示し、ノード間のエッジはトレース間の実行の遷移を示す。各ノードは該ノードにタイマーティックがあたった頻度を示す再コンパイルカウンタを有し、各エッジは対応するトレース間の遷移の相対頻度を示す重みを有する。 2. Generation of TT Graph The TT graph generation unit 220 generates a TT graph by performing profiling using timer-based sampling during execution of a compiled trace that is compiled with the maximum length limited to a predetermined length or less. The TT graph is a directed graph representing a transition of execution between traces. Each node of the directed graph represents a trace, and an edge between the nodes represents a transition of execution between the traces. Each node has a recompilation counter that indicates the frequency with which the node has a timer tick, and each edge has a weight that indicates the relative frequency of transitions between corresponding traces.

一例としてＴＴグラフは、ＴＴグラフ内の各ノードに対し次のａ〜ｃの情報を保持するデータ構造で保持されてよい。
ａ．タイマーティックが当たった回数を数えるカウンタ（以下、「再コンパイルカウンタ」という）
ｂ．入ってくるエッジ（incoming edge）それぞれについて｛元のノード，エッジの重みカウンタ｝
ｃ．出ていくエッジ（outgoing edge）それぞれについて｛行先のノード，エッジの重みカウンタ｝
ここで各ノードは、該ノードが示すトレースの情報を管理するデータ構造へのポインタによって識別されてよい。なお１のノードに入ってくる又は出ていくエッジ数が多くなる場合は、重みカウンタが大きい上位のエッジの情報のみを保持し、残りのエッジについてはその他のエッジとしてまとめてカウントしてよい。これによりエッジ数の多いノードが増えてもメモリ消費が問題となることはない。 As an example, the TT graph may be held in a data structure that holds information of the following ac for each node in the TT graph.
a. A counter that counts the number of times a timer tick is hit (hereinafter referred to as “recompile counter”)
b. For each incoming edge {original node, edge weight counter}
c. For each outgoing edge {destination node, edge weight counter}
Here, each node may be identified by a pointer to a data structure that manages trace information indicated by the node. When the number of edges entering or exiting one node increases, only the information on the upper edge having a large weight counter may be retained, and the remaining edges may be counted together as other edges. As a result, memory consumption does not become a problem even if the number of nodes having a large number of edges increases.

ＴＴグラフの生成は次のようにして行う。まずプログラムの起動時にコンパイル単位であるトレースの最大長を所定長以下に制限し短い多数のトレースを生成する。そして生成したトレースに対応する有向グラフのノードを作成する。但しこの時点では各ノードに対して保持される上記ａ〜ｃの各情報の中身は空である。次に生成された多数のトレースを低い最適化レベルでコンパイルする。このときトレースの先頭及びループのバックエッジにyield pointを挿入する。 The generation of the TT graph is performed as follows. First, at the time of starting the program, the maximum length of the trace that is a compilation unit is limited to a predetermined length or less, and a large number of short traces are generated. Then, a directed graph node corresponding to the generated trace is created. However, at this time, the contents of the information a to c held for each node are empty. The many generated traces are then compiled at a low optimization level. At this time, yield points are inserted at the beginning of the trace and the back edge of the loop.

次にコンパイル済みトレースを実行部２１２により実行し、トレースディスパッチャ２１４によりトレースリンキングが行われることに応答してＴＴグラフ内の対応するノード間のエッジをはる。より具体的には、元のトレースのノードに対しては情報ｃの行先のノードを設定し、現在のトレースのノードに対しては情報ｂの元のノードを設定し、それぞれ対応するエッジの重みカウンタを１で初期化する。これにより元のトレースと次のトレースとの間にエッジがはられる。 Next, the compiled trace is executed by the execution unit 212, and the edge between corresponding nodes in the TT graph is measured in response to the trace linking being performed by the trace dispatcher 214. More specifically, the destination node of information c is set for the original trace node, the original node of information b is set for the current trace node, and the corresponding edge weights are set. The counter is initialized with 1. This creates an edge between the original trace and the next trace.

図４は、トレース同士が結合されたばかりの時点におけるＴＴグラフの基本構造の一例を示す図である。図４に示すように、一般的にはホットループが周期トレース（ノード４００が示すcyclictraceを参照）として選択され、そこから抜けた先が順次次のトレース（ノード４０２、４０６、…が示すlineartrace 1、linear trace 2、…を参照）としてコンパイルされていく。ＴＴグラフ内のノードには、linear trace2を示すノード４０６のように入ってくるエッジが複数のものもある。上述したように本実施例においてトレースは１つの入り口と１以上の出口をもつブロックであるので、全ての入ってくるエッジは次のトレースの先頭にジャンプしなくてはならない。またＴＴグラフ内のノードには、lineartrace3を示すノード４０８のように出ていくエッジが複数のものもある。出ていくエッジは、トレースの途中にある出口又はトレースの終わりにある出口のいずれかから次のトレースの先頭への遷移を示す。 FIG. 4 is a diagram showing an example of the basic structure of the TT graph at the time when the traces have just been combined. As shown in FIG. 4, generally, a hot loop is selected as a periodic trace (refer to the cyclic trace indicated by the node 400), and the destination from there is sequentially the next trace (linear trace 1 indicated by the nodes 402, 406,...). , See linear trace 2, ...). Some nodes in the TT graph have a plurality of incoming edges such as a node 406 indicating linear trace2. As described above, in this embodiment, the trace is a block having one entry and one or more exits, so all incoming edges must jump to the beginning of the next trace. Also, some nodes in the TT graph have a plurality of edges that come out like a node 408 indicating lineartrace3. The outgoing edge indicates the transition from either the exit in the middle of the trace or the exit at the end of the trace to the beginning of the next trace.

３．エッジの重みの更新
ＴＴグラフの基本構造ができると、次にＴＴグラフ更新部２２１は、コンパイル済みトレースを実行中にタイマーベースのサンプリング用いてプロファイリングを行うことにより、エッジの重みがトレース間の遷移の相対頻度を示すように調整する。即ちＴＴグラフ更新部２２１は、プロファイリングにより２つのトレース間の遷移を見つけると、該遷移を示すエッジの重みを１増加する。より具体的には、ＴＴグラフ更新部２２１は、タイマーティックがあたったトレースとその直前に実行されたトレースとの間のエッジの重みを増加する。なお、ノードに入ってくるエッジが複数ある場合には、リンクレジスタを参照して該当する１のエッジを特定する。 3. Update of edge weights Once the basic structure of the TT graph is completed, the TT graph update unit 221 performs profiling using timer-based sampling during execution of the compiled trace, so that the edge weight changes between the traces. Adjust to show relative frequency. That is, when the TT graph update unit 221 finds a transition between two traces by profiling, the TT graph update unit 221 increments the edge weight indicating the transition by one. More specifically, the TT graph updating unit 221 increases the weight of the edge between the trace hit by the timer tick and the trace executed immediately before the trace. If there are a plurality of edges entering the node, the corresponding one edge is specified with reference to the link register.

図５（ａ）を参照して、エッジの重みの調整方法を具体的に説明する。図５（ａ）に示すように、linear trace 1（ノード５０２）を実行中にタイマー割り込みが起きてyield flagが設定され、実行が次のトレース、即ちlinear trace 2（ノード５０６）の先頭のyield pointで停まったとする。このとき直前のトレースはlineartrace 1（ノード５０２）であるので、linear trace 1のノード５０２とlinear trace 2のノード５０６間のエッジの重みを１増やす。なお、現在のトレースは、現在の命令のポインタからを特定され、直前のトレースはリンクレジスタを参照することにより特定される。また、周期トレースの実行中にタイマー割り込みが起こり、実行がその周期トレースのバックエッジのyieldpointで停まる場合は、同一のトレースへの遷移であり調整すべきエッジがないので何もしない。 With reference to FIG. 5A, the edge weight adjustment method will be described in detail. As shown in FIG. 5A, a timer interrupt occurs during execution of linear trace 1 (node 502), yield flag is set, and execution is the next trace, ie, yield at the beginning of linear trace 2 (node 506). Suppose you stop at point. At this time, since the trace immediately before is lineartrace 1 (node 502), the edge weight between the node 502 of linear trace 1 and the node 506 of linear trace 2 is increased by one. The current trace is specified from the pointer of the current instruction, and the previous trace is specified by referring to the link register. Also, if a timer interrupt occurs during execution of a periodic trace and execution stops at the yieldpoint of the back edge of the periodic trace, nothing is done because there is no transition to the same trace and there is no edge to be adjusted.

サンプルを効率的に収集して十分に正確なＴＴグラフをより早く構築するために、従来技術のバーストトレーシング技術を応用することもできる。バーストトレーシング技術は、サンプリング開始時に１のイベントの代わりに一連のイベント列（バースト）をサンプリングすることによって、低オーバーヘッドの短時間のプロファイリングを可能とする技術である（詳細は非特許文献１１を参照）。そこで本発明においてもバーストトレーシング技術を応用したバーストサンプリング、即ち、トレースの先頭のyieldpointごとに繰り返し実行を停止して、一連のトレース間の遷移をサンプリングすることを考える。 Prior art burst tracing techniques can also be applied to efficiently collect samples and build a sufficiently accurate TT graph faster. The burst tracing technique is a technique that enables profiling in a short time with low overhead by sampling a series of event sequences (bursts) instead of one event at the start of sampling (for details, refer to Non-Patent Document 11). reference). Therefore, in the present invention, it is also considered that burst sampling using the burst tracing technique, that is, the execution is repeatedly stopped for each yield point at the beginning of the trace, and the transition between a series of traces is sampled.

バーストサンプリングを利用する場合、ＴＴグラフ更新部２２１はＴＴグラフ上でタイマーティックがあたったトレースに続く１以上のトレースのyield pointにおいて連続して実行が停止するよう設定を行う。具体的には、ＴＴグラフ更新部２２１は実行がトレース先頭の任意のyield pointで停止するごとにyield flagを設定し、次のyield pointでスレッドが停止する必要があることを示す。 When burst sampling is used, the TT graph update unit 221 performs setting so that execution is continuously stopped at the yield point of one or more traces following the trace on which a timer tick is placed on the TT graph. Specifically, the TT graph update unit 221 sets a yield flag each time execution stops at an arbitrary yield point at the beginning of the trace, indicating that the thread needs to stop at the next yield point.

図５（ｂ）を参照してバーストサンプリングを具体的に説明する。図５（ｂ）に示すように、プロファイリングを開始後タイマー割り込みが起きてyield flagが設定され、linear trace 2を示すノード５２６の先頭のyieldpointで実行が停止したとする。ＴＴグラフ更新部２２１は対応するエッジの重み、即ちlinear trace 1のノード５２２とlineartrace 2を示すノード５２６の間のエッジの重みを１増加した後、次のyield pointで再び実行が停止するようにyield flagを設定する。そして次のlinear trace 3のyield pointで実行が停止するとＴＴグラフ更新部２２１は再び上記処理を繰り返す。 The burst sampling will be specifically described with reference to FIG. As shown in FIG. 5B, it is assumed that a timer interrupt occurs after profiling is started, yield flag is set, and execution stops at the yield point at the head of the node 526 indicating linear trace 2. The TT graph updating unit 221 increases the corresponding edge weight, that is, the edge weight between the node 522 of the linear trace 1 and the node 526 indicating the linear trace 2 by 1 and then stops the execution again at the next yield point. Set the yield flag. When the execution stops at the yield point of the next linear trace 3, the TT graph update unit 221 repeats the above process again.

ＴＴグラフ更新部２２１は上記処理を、（１）連続した実行の停止回数が所定数に達したこと、（２）実行が、周期トレース、バーストサンプリングにおいて既に停止済みのトレース、及びコンパイル済みトレースのうちのいずれかのトレースに到達したこと、（３）実行が、次のトレースが存在しないトレースの出口を通過したことのいずれかに応答して終了する。このようにバーストサンプリングを利用することで、タイマー割り込み頻度を増やすことなく連続したトレース間の遷移をサンプリングすることができる。また、ほとんどのホットパスが再コンパイル済みとなれば、上記（２）の条件によりほとんどのバーストサンプリングは終了する。そのため定常状態では、バーストサンプリングによるオーバーヘッドは、繰り返し停止のない通常のサンプリングと比較してそれほど大きくない。 The TT graph update unit 221 performs the above processing (1) that the number of consecutive execution stops has reached a predetermined number, and (2) execution is a periodic trace, a trace that has already been stopped in burst sampling, and a compiled trace. One of the traces has been reached, and (3) execution terminates in response to either the next trace passing through the exit of a non-existing trace. By using burst sampling in this way, transitions between consecutive traces can be sampled without increasing the timer interrupt frequency. If most of the hot paths have been recompiled, most burst sampling is completed under the condition (2). Therefore, in a steady state, the overhead due to burst sampling is not so large as compared to normal sampling without repeated stops.

４．再コンパイルカウンタの更新及びＴＴグラフに基づく再コンパイル
ＴＴグラフ更新部２２１は、エッジの重みの更新と平行して、ＴＴグラフ内の各ノードが有する再コンパイルカウンタを更新する。上述したように各ノードの再コンパイルカウンタはそのノードが示すトレースにタイマーティックがあたった頻度を示す。但しここでいうタイマーティックがあたった頻度とは、そのトレースに真にタイマーティックがあたった頻度ではなく、タイマー割り込みが起きたときにそのトレースに実行時間をチャージすべきと判断された頻度である。 4). Recompilation Counter Update and Recompilation Based on TT Graph The TT graph update unit 221 updates the recompilation counter of each node in the TT graph in parallel with the edge weight update. As described above, the recompilation counter of each node indicates the frequency at which the timer tick is applied to the trace indicated by the node. However, the frequency of timer ticks here is not the frequency of true timer ticks in the trace, but the frequency of determining that the execution time should be charged to the trace when a timer interrupt occurs. .

本発明では実行時間をチャージすべきトレースを以下のようにＴＴグラフを用いて決定する。即ち、ＴＴグラフ更新部２２１は、コンパイル済みコードの実行中において、タイマーティックがあたったトレースに対応するノードを出発点としてＴＴグラフのエッジを逆方向に辿り、周期トレース、再コンパイル済みトレース、若しくは後述する無効トレースの手前、又はエッジがなくなったところで停止して辿り着いたトレースの再コンパイルカウンタを増加する。 In the present invention, the trace to be charged with the execution time is determined using a TT graph as follows. That is, during execution of the compiled code, the TT graph update unit 221 traces the edge of the TT graph in the reverse direction starting from the node corresponding to the trace hit by the timer tick, and the periodic trace, recompiled trace, or The recompilation counter of the trace which stopped and arrived before the invalid trace which will be described later or when the edge disappears is increased.

再コンパイルカウンタを増加する際にＴＴグラフ更新部２２１はまた、その値を所定の閾値Ｓ１と比較する。そして再コンパイルカウンタの値が所定の閾値Ｓ１よりも大きいと判定した場合、ＴＴグラフ更新部２２１は、所定の閾値Ｓ１よりも大きい再コンパイルカウンタを有するノードに無効のマークをつけ、該ノードの全ての入ってくるエッジ及び出ていくエッジを削除する。次に実行が無効マークのついたトレースに到達するとトレース生成部２２２が呼び出され、トレース生成部２２２は無効マークのついたトレースの先頭アドレスを新たなトレースの開始位置として、所定長以上の長さを許可して新たにトレースを生成する。生成された新たなトレースはその後動的コンパイラ２２８に対して出力され、高い最適化レベルで最適化される。コンパイル済みとなった新たなトレースは、ＴＴグラフ生成部２２０により新規ノードとしてＴＴグラフに追加される。なお無効マークは一定時間経過した後は削除される。 When incrementing the recompile counter, the TT graph updater 221 also compares the value with a predetermined threshold value S1. If it is determined that the value of the recompilation counter is larger than the predetermined threshold S1, the TT graph update unit 221 marks the nodes having the recompilation counter larger than the predetermined threshold S1 as invalid, Delete incoming and outgoing edges. Next, when the execution reaches a trace with an invalid mark, the trace generation unit 222 is called, and the trace generation unit 222 uses the start address of the trace with the invalid mark as a new trace start position and is longer than a predetermined length. To generate a new trace. The generated new trace is then output to the dynamic compiler 228 and optimized at a high optimization level. The new trace that has been compiled is added to the TT graph as a new node by the TT graph generation unit 220. The invalid mark is deleted after a certain period of time.

このように本発明に係るトレーシング実行環境２２６は、ＴＴグラフを利用して適切なトレースに実行時間をチャージし、ホットパス全体を見つけ出す。なお、周期トレースに到達した際に逆方向の探索をそれ以上行わないのは、上述したように本実施例では周期トレースと線形トレースのみをコンパイル単位とするためである。従って、たとえ最適化レベルが高く設定され所定長よりも長くトレースを生成することが許されていても、周期トレースと線形トレースとが互いに混じるような形でＴＴグラフを逆方向に探索することはしない。 In this way, the tracing execution environment 226 according to the present invention uses the TT graph to charge the execution time to an appropriate trace and finds the entire hot path. The reason why the search in the reverse direction is not further performed when the periodic trace is reached is that, as described above, only the periodic trace and the linear trace are used as the compilation unit in the present embodiment. Therefore, even if the optimization level is set high and it is allowed to generate a trace longer than the predetermined length, searching the TT graph in the reverse direction in such a way that the periodic trace and the linear trace are mixed with each other is not possible. do not do.

図６は、上記再コンパイルカウンタの更新方法を説明する図である。図６に示すように、ノード６０８が示すlinear trace 3を実行中にタイマー割り込みが起こりyield flagが設定され、次のノード６１０が示すlinear trace 4の先頭のyield pointで実行が停まったとする。この場合タイマーティックがあたったトレースはlinear trace 4であるため対応するノード６１０が出発点となる。ノード６１０を出発点としてＴＴグラフを逆方向へ辿っていくと（矢印６１４参照）、linear trace 1を示すノード６０２に到達する。ノード６０２は周期トレースを示すノード６００の手前のノードであるため、ノード６０２で逆方向の探索を終了し、その再コンパイルカウンタを１増加して実行時間をチャージする。 FIG. 6 is a diagram for explaining a method for updating the recompile counter. As shown in FIG. 6, it is assumed that a timer interrupt occurs during execution of linear trace 3 indicated by node 608, yield flag is set, and execution stops at the top yield point of linear trace 4 indicated by next node 610. In this case, the trace given the timer tick is linear trace 4, so the corresponding node 610 is the starting point. When the TT graph is traced in the reverse direction starting from the node 610 (see arrow 614), the node 602 indicating linear trace 1 is reached. Since the node 602 is a node before the node 600 indicating the periodic trace, the backward search is ended at the node 602, and the recompile counter is incremented by 1 to charge the execution time.

上述したようにＴＴグラフ内には複数の入ってくるエッジや複数の出ていくエッジをもつノードが存在し得る。ＴＴグラフを逆方向に辿る途中に複数のエッジを有するノードがある場合、ＴＴグラフ更新部２２１は、その重みが所定の条件を満たすエッジのみを辿る。より具体的には、ＴＴグラフ更新部２２１は、ＴＴグラフを逆方向に辿る途中に存在するノードに入ってくるエッジが複数ある場合に、該複数のエッジの重みの合計に対するそのエッジの重みの比率が所定の閾値Ｓ２を超えるエッジのみを辿る（第１更新方法）。ＴＴグラフ更新部２２１はまた、次に辿ろうとするノードから出ているエッジが複数ある場合に、該複数のエッジの重みの合計に対する、現在のノードから次のノードへのエッジの重みの比率が所定の閾値Ｓ３を超えるエッジのみを辿る（第２更新方法）。これは、複数のエッジの重みの合計値に対する対象のエッジの重みの比率が所定の閾値より小さいということは、そのパス（エッジ）がめったに実行されないパスということになるからである。なお、条件を満たすエッジが複数ある場合はその条件を満たす全てのエッジを辿る。 As described above, there can be a node having a plurality of incoming edges and a plurality of outgoing edges in the TT graph. When there is a node having a plurality of edges in the middle of tracing the TT graph in the reverse direction, the TT graph update unit 221 traces only the edge whose weight satisfies a predetermined condition. More specifically, when there are a plurality of edges entering a node existing on the way of tracing the TT graph in the reverse direction, the TT graph update unit 221 calculates the weight of the edge with respect to the sum of the weights of the plurality of edges. Only edges whose ratio exceeds a predetermined threshold S2 are traced (first update method). The TT graph update unit 221 also determines the ratio of the weight of the edge from the current node to the next node with respect to the sum of the weights of the plurality of edges when there are a plurality of edges coming from the next node to be traced. Only edges that exceed a predetermined threshold S3 are traced (second update method). This is because the fact that the ratio of the weight of the target edge to the total value of the weights of the plurality of edges is smaller than the predetermined threshold means that the path (edge) is rarely executed. If there are a plurality of edges that satisfy the condition, all edges that satisfy the condition are traced.

図７は、上記再コンパイルカウンタの更新方法（第１更新方法）を説明する図である。図７に示すように、ノード７０８が示すlinear trace 3を実行中にタイマー割り込みが起こりyield flagが設定され、次のノード７１０が示すlinear trace 4の先頭のyield pointで実行が停まったとする。この場合タイマーティックがあたったトレースはlinear trace 4であるため対応するノード７１０が出発点となる。ノード７１０を出発点としてＴＴグラフを逆方向へ辿っていくと２つの入ってくるエッジ７１４、７１６を有するノード７０６に到達する。 FIG. 7 is a diagram for explaining the recompile counter update method (first update method). As shown in FIG. 7, it is assumed that a timer interrupt occurs during execution of linear trace 3 indicated by node 708, yield flag is set, and execution stops at the top yield point of linear trace 4 indicated by next node 710. In this case, since the trace given the timer tick is linear trace 4, the corresponding node 710 is the starting point. If the TT graph is traced in the reverse direction starting from the node 710, a node 706 having two incoming edges 714 and 716 is reached.

ここで所定の閾値Ｓ２を２つの入ってくるエッジ７１４、７１６の重みの合計値８（８＝６＋２）の２０％とすると、エッジ７１４の重み値６もエッジ７１６の重み値２もどちらも２０％を超えるので両方のエッジについて引き続き逆方向の探索が続けられる。するとエッジ７１４についてはノード７００が周期トレースとなるためその手前のノード７０２で逆方向探索を終了する。そしてノード７０２の再コンパイルカウンタを１増加して実行時間をlinear trace 1にチャージする。エッジ７１６についてはノード７０４に到達した時点で入ってくるエッジがなくなるので、ノード７０４の再コンパイルカウンタを１増加して実行時間をlinear trace 1’にチャージする。 Here, if the predetermined threshold S2 is 20% of the total weight 8 (8 = 6 + 2) of the two incoming edges 714 and 716, the weight value 6 of the edge 714 and the weight value 2 of the edge 716 are both 20 The search in the opposite direction is continued for both edges since the value exceeds%. Then, since the node 700 becomes a periodic trace with respect to the edge 714, the backward search is terminated at the node 702 before the edge 700. Then, the recompile counter of the node 702 is incremented by 1, and the execution time is charged to the linear trace 1. As for the edge 716, when the node 704 is reached, there is no incoming edge, so the recompile counter of the node 704 is incremented by 1 and the execution time is charged to the linear trace 1 '.

図８は、上記再コンパイルカウンタの更新方法（第２更新方法）を説明する図である。図８に示すように、ＴＴグラフの逆方向の探索が開始され、現在のノードがlinear trace 4’を示すノード８１２に到達したとする（矢印８１８を参照）。すると次に辿ろうとするノード８０８は複数の出ていくエッジ８１４、８１６を有しているので、重みを用いてそのエッジを辿るための条件が満たされるか否か検討する。 FIG. 8 is a diagram for explaining the recompile counter update method (second update method). As shown in FIG. 8, it is assumed that a reverse search of the TT graph is started and the current node reaches a node 812 indicating linear trace 4 '(see arrow 818). Then, since the node 808 to be traced has a plurality of outgoing edges 814 and 816, it is examined whether or not the condition for tracing the edges is satisfied using the weight.

ここで所定の閾値Ｓ３を２つの出ていくエッジ８１４、８１６の重みの合計値９（９＝８＋１）の２０％とすると、エッジ８１６の重み値１は２０％より小さいのでエッジ８１６を辿ることはしない。結果、逆方向の探索はノード８１２で終了し、ノード８１２の再コンパイルカウンタを１増加して実行時間をlinear trace 4’チャージする。 Here, when the predetermined threshold value S3 is 20% of the total weight 9 (9 = 8 + 1) of the two outgoing edges 814 and 816, the edge 816 follows the edge 816 because the weight value 1 of the edge 816 is smaller than 20%. I do not. As a result, the backward search ends at node 812, and the recompile counter at node 812 is incremented by 1 to charge the execution time by linear trace 4 '.

図９（ａ）は、ＴＴグラフを用いて選択した所定長よりも長いトレースの一例を示す図である。図９（ｂ）は、ＴＴグラフを用いることなく選択した所定長よりも長いトレースの一例を示す図である。なお、図９（ｂ）に示すＴＴグラフ（トレース９１６を除く）は、図９（ａ）に示したＴＴグラフと同一であり、これは理解を容易にするために提供しているに過ぎないことを理解されたい。 FIG. 9A is a diagram illustrating an example of a trace longer than a predetermined length selected using a TT graph. FIG. 9B is a diagram showing an example of a trace longer than a predetermined length selected without using a TT graph. Note that the TT graph (except for the trace 916) shown in FIG. 9B is the same as the TT graph shown in FIG. 9A, and this is only provided for ease of understanding. Please understand that.

図９（ａ）に示す所定長よりも長い再コンパイルのためのトレース９１４は、ＴＴグラフを使用することにより開始位置が適切に選択され生成されたトレースである。そのためトレース９１４はホットパス全体を含む。これに対し図９（ｂ）に示す所定長よりも長い再コンパイルのためのトレース９１６は、タイマー割り込みが偶然発生した位置を開始位置として生成されたトレースである。そのため、トレース９１６はホットパスの一部のみを含む。結果、トレース９１４に対してはホットパス全体を考慮した最適化を行うことができ、よい実行コードの生成が可能となる。一方トレース９１６はホットパス全体をカバーしないので、最適化の機会を失う可能性がある。また、トレース９１６については、ノード９０２、９０４が示すlinear trace1、linear trace 2からそれぞれ始まる別の長いトレースが新たに生成される可能性もあり、その場合無駄な重複を発生してしまう。このようにＴＴグラフを使用してトレースを選択することで、アップグレードのための再コンパイルに相応しいトレースを生成することが可能となる。 The trace 914 for recompilation longer than the predetermined length shown in FIG. 9A is a trace generated by appropriately selecting the start position by using the TT graph. Therefore, the trace 914 includes the entire hot path. In contrast, the trace 916 for recompilation longer than the predetermined length shown in FIG. 9B is a trace generated with the position where the timer interruption occurred accidentally as the start position. Therefore, the trace 916 includes only a part of the hot path. As a result, optimization considering the entire hot path can be performed on the trace 914, and a good executable code can be generated. On the other hand, since trace 916 does not cover the entire hot path, it may lose the opportunity for optimization. As for the trace 916, another long trace starting from the linear trace 1 and the linear trace 2 indicated by the nodes 902 and 904 may be newly generated. In this case, useless duplication occurs. By selecting a trace using the TT graph in this way, it is possible to generate a trace suitable for recompilation for upgrade.

５．動作説明
次に図１０を参照して、本発明の実施の形態に係るマルチレベルのコンパイレーション処理全体の流れを説明する。図１０に示すフローチャートは、仮想マシン２０６により実行対象のプログラムが起動されることによって開始し、トレース選択エンジン２１８は、インタープリタ２０８によるプログラムの実行結果に基づき、最大長を所定長以下に制限して頻繁に実行されるパスをトレースとして選択する（ステップ１０００）。続いて、動的コンパイラ２２８は、トレース選択エンジン２１８によって出力されたトレースに対しyield pointを挿入して、低い最適化レベルでコンパイル処理を行う（ステップ１００２）。 5. Description of Operation Next, the flow of the entire multilevel compilation process according to the embodiment of the present invention will be described with reference to FIG. The flowchart shown in FIG. 10 starts when the execution target program is started by the virtual machine 206, and the trace selection engine 218 limits the maximum length to a predetermined length or less based on the execution result of the program by the interpreter 208. A frequently executed path is selected as a trace (step 1000). Subsequently, the dynamic compiler 228 inserts yield points into the trace output by the trace selection engine 218, and performs compilation processing at a low optimization level (step 1002).

続いて、実行部２１２はコードキャッシュ２１６からコンパイル済みのトレースを読み出して実行する（ステップ１００４）。実行部２１２によるコンパイル済みのトレースの実行に応答して、トレース選択エンジン２１８はＴＴグラフ（基本構造）を作成する（ステップ１００６）。続いてトレース選択エンジン２１８はタイマーベースのサンプリングを用いたプロファイリングを開始する（ステップ１００８）。続いて、トレース選択エンジン２１８はプロファイリング結果に基づきＴＴグラフを更新し、所定の条件が満たされる場合には新たなトレースを生成して、再コンパイルのため動的コンパイラ２２８に出力する（ステップ１０１０）。動的コンパイラ２２８は、新たなトレースに対し高い最適化レベルで再コンパイルを行う。更新及び再コンパイル処理の詳細は図１１及び図１２を参照して後述する。そして処理は終了する。 Subsequently, the execution unit 212 reads the compiled trace from the code cache 216 and executes it (step 1004). In response to execution of the compiled trace by the execution unit 212, the trace selection engine 218 creates a TT graph (basic structure) (step 1006). Subsequently, the trace selection engine 218 starts profiling using timer-based sampling (step 1008). Subsequently, the trace selection engine 218 updates the TT graph based on the profiling result, and when a predetermined condition is satisfied, generates a new trace and outputs it to the dynamic compiler 228 for recompilation (step 1010). . The dynamic compiler 228 recompiles the new trace at a high optimization level. Details of the update and recompilation process will be described later with reference to FIGS. Then, the process ends.

次に図１１を参照して、図１０に示すＴＴグラフの更新及び再コンパイル処理（ステップ１０１０）の詳細を説明する。図１１に示すフローチャートはタイマー割り込みにより開始し、ＴＴグラフ更新部２２１は、バーストサンプリングのためのyieldflagを設定する（ステップ１１００）。続いて、yield flagの設定に応答して、実行部２１２が次のyield point で実行を停止する（ステップ１１０４）。 Next, details of the TT graph update and recompilation process (step 1010) shown in FIG. 10 will be described with reference to FIG. The flowchart shown in FIG. 11 is started by timer interruption, and the TT graph update unit 221 sets yieldflag for burst sampling (step 1100). Subsequently, in response to the setting of yield flag, the execution unit 212 stops execution at the next yield point (step 1104).

続いて、ＴＴグラフ更新部２２１は、停止位置のyield pointがトレースの先頭に挿入されたyieldpointであるか否かを判定する（ステップ１１０６）。トレースの先頭に挿入されたyieldpointでない場合（ステップ１１０６：ＮＯ）、即ち、周期トレースのバックエッジに挿入されたyieldpointである場合、ＴＴグラフ更新部２２１は今回の実行の停止がタイマー割り込み後の最初の実行停止であるか否かを判定する（ステップ１１１８）。 Subsequently, the TT graph update unit 221 determines whether or not the yield point at the stop position is the yield point inserted at the head of the trace (step 1106). When the yield point is not inserted at the beginning of the trace (step 1106: NO), that is, when the yield point is inserted at the back edge of the periodic trace, the TT graph update unit 221 first stops the execution after the timer interruption. It is determined whether or not execution is stopped (step 1118).

最初の実行停止である場合（ステップ１１１８：ＹＥＳ）、ＴＴグラフ更新部２２１は、現在のトレース、即ち周期トレースの再コンパイルカウンタを１増加する（ステップ１１２０）。続いてＴＴグラフ更新部２２１は、周期トレースの再コンパイルカウンタの値が所定の閾値Ｓ１より大きいか否かを判定する（ステップ１１２２）。再コンパイルカウンタの値が所定の閾値Ｓ１より大きい場合、トレース生成部２２２は次の実行時に周期トレースの先頭アドレスを開始位置として新たなトレースを生成し、動的コンパイラ２２８は新たなトレースを高い最適化レベルで再コンパイルする（ステップ１１２４）。なお、本実施例では周期トレースと線形トレースとを対象とするため、元の周期トレースと同一のトレースが再び生成されることになるが、適用される最適化レベルは前回よりも高いことに留意されたい。ステップ１１１８で最初の実行停止でない場合、又はステップ１１２２において周期トレースの再コンパイルカウンタの値が所定の閾値Ｓ１より大きくない場合、処理は終了する。 If it is the first execution stop (step 1118: YES), the TT graph update unit 221 increments the recompile counter of the current trace, that is, the periodic trace, by 1 (step 1120). Subsequently, the TT graph update unit 221 determines whether or not the value of the recompile counter of the periodic trace is larger than a predetermined threshold value S1 (step 1122). When the value of the recompile counter is larger than the predetermined threshold S1, the trace generation unit 222 generates a new trace with the start address of the periodic trace as the start position at the next execution, and the dynamic compiler 228 sets the new trace to a high optimum. Recompile at the optimization level (step 1124). In this embodiment, since periodic traces and linear traces are targeted, the same trace as the original periodic trace is generated again, but the optimization level applied is higher than the previous one. I want to be. If it is not the first stop in step 1118, or if the value of the recompile counter of the periodic trace is not greater than the predetermined threshold S1 in step 1122, the process ends.

一方、停止位置のyield pointがトレースの先頭に挿入されたyieldpointである場合（ステップ１１０６：ＹＥＳ）、ＴＴグラフ更新部２２１はリンクレジスタを参照して直前に実行されたトレースが判別可能か否かを判定する（ステップ１１０８）。直前に実行されたトレースが判別可能である場合（ステップ１１０８：ＹＥＳ）、ＴＴグラフ更新部２２１は、直前に実行されたトレースから現在のトレースへの遷移を示すＴＴグラフのエッジの重みを１増加する（ステップ１１１０）。 On the other hand, when the yield point at the stop position is the yield point inserted at the beginning of the trace (step 1106: YES), the TT graph update unit 221 refers to the link register to determine whether or not the trace executed immediately before can be determined. Is determined (step 1108). When the trace executed immediately before is discriminable (step 1108: YES), the TT graph update unit 221 increases the weight of the edge of the TT graph indicating the transition from the trace executed immediately before to the current trace by one. (Step 1110).

一方、直前に実行されたトレースが判別可能でない場合（ステップ１１０８：ＮＯ）、又はステップ１１１０の後処理はステップ１１１２へ進み、ＴＴグラフ更新部２２１は今回の実行の停止がタイマー割り込み後の最初の実行停止であるか否かを判定する。最初の実行停止である場合（ステップ１１１２：ＹＥＳ）、ＴＴグラフ更新部２２１は、ＴＴグラフを用いて再コンパイルカウンタの増加処理を行う（ステップ１１１４）。再コンパイルカウンタの増加処理の詳細は図１２を参照して後述する。 On the other hand, when the trace executed immediately before is not discriminable (step 1108: NO), or the post-processing of step 1110 proceeds to step 1112, and the TT graph update unit 221 first stops the current execution after the timer interruption. It is determined whether or not execution is stopped. When it is the first execution stop (step 1112: YES), the TT graph update unit 221 performs a recompile counter increment process using the TT graph (step 1114). Details of the recompile counter increment processing will be described later with reference to FIG.

一方、今回の実行停止がタイマー割り込み後の最初の停止でない場合（ステップ１１１２：ＮＯ）、又は再コンパイルカウンタの増加処理の後処理はステップ１１１６へ進み、ＴＴグラフ更新部２２１は、バーストサンプリングの終了条件が満たされるか否かを判定する。終了条件の内容は既に説明済みであるためここでは省略する。バーストサンプリングの終了条件が満されないと判定された場合（ステップ１１１６：ＮＯ）、処理はステップ１１０４へ戻り一連の処理を繰り返す。一方、バーストサンプリングの終了条件が満された場合（ステップ１１１６：ＹＥＳ）、又はステップ１１２４の再コンパイルの後処理は終了する。 On the other hand, when the current stop is not the first stop after the timer interruption (step 1112: NO), the post-processing of the recompile counter increment processing proceeds to step 1116, and the TT graph update unit 221 ends the burst sampling. Determine whether the condition is met. Since the contents of the end condition have already been explained, they are omitted here. If it is determined that the burst sampling end condition is not satisfied (step 1116: NO), the process returns to step 1104 to repeat a series of processes. On the other hand, when the burst sampling end condition is satisfied (step 1116: YES), the post-compilation post-processing in step 1124 ends.

次に図１２を参照して、図１１に示す再コンパイルカウンタの増加処理（ステップ１１１４）の詳細を説明する。図１２に示すフローチャートはステップ１２００から開始し、ＴＴグラフ更新部２２１は、現在のトレース、即ちその先頭のyieldpointで実行が停止したトレースを、現在の処理対象として選択する。続いてＴＴグラフ更新部２２１は、現在の処理対象のトレースがＴＴグラフ上で入ってくるエッジを有するか否かを判定する（ステップ１２０２）。 Next, details of the recompile counter increment process (step 1114) shown in FIG. 11 will be described with reference to FIG. The flowchart shown in FIG. 12 starts from Step 1200, and the TT graph update unit 221 selects the current trace, that is, the trace whose execution has stopped at the top yieldpoint, as the current processing target. Subsequently, the TT graph update unit 221 determines whether or not the current processing target trace has an incoming edge on the TT graph (step 1202).

現在の処理対象のトレースが入ってくるエッジを有する場合（ステップ１２０２：ＹＥＳ）、続いてＴＴグラフ更新部２２１は、その入ってくるエッジの元のトレースが、周期トレース、再コンパイル済みトレース、又は無効化されたトレースのいずれかであるか否かを判定する（ステップ１２０４）。なお、現在の処理対象のトレースが複数の入ってくるエッジを有する場合、その各々を現在の処理対象のエッジとしてステップ１２０４以下の処理を行う。 If the current trace to be processed has an incoming edge (step 1202: YES), then the TT graph updater 221 determines that the original trace of the incoming edge is a periodic trace, a recompiled trace, or It is determined whether the trace is one of invalidated traces (step 1204). If the current processing target trace has a plurality of incoming edges, the processing from step 1204 is performed with each of the current processing target edges as the current processing target edge.

元のトレースが上記３つのトレースのいずれでもない場合（ステップ１２０４：ＮＯ）、ＴＴグラフ更新部２２１は、現在の処理対象の入ってくるエッジの重みが小さいか否かを判定する（ステップ１２０６）。より具体的には、ＴＴグラフ更新部２２１は、現在の処理対象のトレースに入ってくるエッジが複数ある場合に該複数のエッジの重みの合計に対する現在の処理対象のエッジの重みの比率が所定の閾値Ｓ２より小さいか否かを判定する。また、ＴＴグラフ更新部２２１は、元のトレースが複数の出ていくエッジを有する場合に、該複数のエッジの重みの合計に対する現在の処理対象のエッジの重みの比率が所定の閾値Ｓ３より小さいか否かを判定する。 When the original trace is not one of the above three traces (step 1204: NO), the TT graph update unit 221 determines whether the weight of the incoming edge of the current processing target is small (step 1206). . More specifically, when there are a plurality of edges that enter the current processing target trace, the TT graph update unit 221 sets a ratio of the weight of the current processing target edge to the total of the weights of the plurality of edges. It is determined whether it is smaller than the threshold value S2. In addition, when the original trace has a plurality of outgoing edges, the TT graph update unit 221 has a ratio of the weight of the current processing target edge to the sum of the weights of the plurality of edges smaller than a predetermined threshold value S3. It is determined whether or not.

現在の処理対象の入ってくるエッジの重みが小さくない場合（ステップ１２０６：ＮＯ）、ＴＴグラフ更新部２２１は、元のトレースを次の現在の処理対象のトレースとし（ステップ１２０８）、ステップ１２０２の処理に戻る。一方、現在の処理対象の入ってくるエッジの重みが小さい場合（ステップ１２０６：ＹＥＳ）、現在の処理対象である入ってくるエッジについては処理を終了する。 When the weight of the incoming edge of the current processing target is not small (step 1206: NO), the TT graph update unit 221 sets the original trace as the next current processing target trace (step 1208), and Return to processing. On the other hand, when the weight of the incoming edge of the current processing target is small (step 1206: YES), the processing ends for the incoming edge that is the current processing target.

また、現在の処理対象のトレースに入ってくるエッジがない場合（ステップ１２０２：ＮＯ）、又は、元のトレースが、周期トレース、再コンパイル済みトレース又は無効化済みトレースのいずれかである場合（ステップ１２０４：ＹＥＳ）、処理はステップ１２１０へ進み、ＴＴグラフ更新部２２１は、現在の処理対象のトレースの再コンパイルカウンタを１増加する。 Also, when there is no edge that enters the current trace to be processed (step 1202: NO), or when the original trace is either a periodic trace, a recompiled trace, or a disabled trace (step (1204: YES), the process proceeds to step 1210, and the TT graph update unit 221 increments the recompilation counter of the current processing target trace by one.

続いてＴＴグラフ更新部２２１は、現在の処理対象のトレースの再コンパイルカウンタの値が所定の閾値Ｓ１より大きいか否かを判定する（ステップ１２１２）。再コンパイルカウンタの値が所定の閾値Ｓ１より大きくない場合、処理は終了する。再コンパイルカウンタの値が所定の閾値Ｓ１より大きい場合、トレース生成部２２２は次の実行時に現在処理対象のトレースの先頭アドレスを開始位置として所定長よりも長いトレースの生成を許可して新たなトレースを生成し、動的コンパイラ２２８は新たなトレースを高い最適化レベルで再コンパイルする（ステップ１２１４）。その後処理は終了する。 Subsequently, the TT graph update unit 221 determines whether or not the value of the recompile counter of the current trace to be processed is greater than a predetermined threshold value S1 (step 1212). If the value of the recompile counter is not greater than the predetermined threshold S1, the process ends. When the value of the recompile counter is larger than the predetermined threshold value S1, the trace generation unit 222 permits the generation of a trace longer than the predetermined length with the start address of the trace currently being processed as the start position at the next execution. And the dynamic compiler 228 recompiles the new trace at a high optimization level (step 1214). Thereafter, the process ends.

図１３は、これまで説明してきたＴＴグラフの生成及びＴＴグラフに基づく再コンパイルの擬似コードの一例である。 FIG. 13 is an example of pseudo code for generating a TT graph and recompiling based on the TT graph described so far.

６．実験結果
ＤａＣａｐｏベンチマークスイートを用いて本発明によるＴＴグラフに基づくマルチレベルのコンパイレーションを適用したトレースベースのＪＩＴコンパイラの性能を評価した。比較に用いた従来技術は以下の通りである。
従来技術１：長いトレースの生成を許可し、高い最適化レベルでコンパイル。アップグレードのための再コンパイルなし。
従来技術２：トレースの最大長を制限し、低い最適化レベルでコンパイル。アップグレードのための再コンパイルなし。
従来技術３：アップグレードのための再コンパイルあり。但し、ＴＴグラフを使用せずに、単に最大長の制限を緩和。 6). Experimental Results The performance of the trace-based JIT compiler using multi-level compilation based on the TT graph according to the present invention was evaluated using the DaCapo benchmark suite. The prior art used for the comparison is as follows.
Prior Art 1: Allows generation of long traces and compiles at a high optimization level. No recompilation for upgrade.
Prior Art 2: Compile at a low optimization level, limiting the maximum trace length. No recompilation for upgrade.
Prior Art 3: Recompile for upgrade. However, the maximum length is simply relaxed without using the TT graph.

図１４は、従来技術と本発明とで起動時間を比較した実験結果を示す。縦軸の起動時間は１イタレーション目の実行時間である。また図に示す値はＤａＣａｐｏの全ベンチマークの平均値である。従来技術１の起動時間が長いのは、当初より高い最適化レベルでコンパイルしているためである。これに対し本発明を含む残り３つは、起動時は低い最適化レベルでコンパイルを行っているため、起動時間が短い。 FIG. 14 shows the experimental results comparing the start-up time between the prior art and the present invention. The startup time on the vertical axis is the execution time of the first iteration. The values shown in the figure are average values of all DaCapo benchmarks. The reason why the startup time of the prior art 1 is long is that compiling is performed at a higher optimization level than at the beginning. On the other hand, since the remaining three including the present invention are compiled at a low optimization level at startup, the startup time is short.

図１５は、従来技術と本発明とでピーク性能を比較した実験結果を示す図である。縦軸は実行時間を示す。また、図に示す値はＤａＣａｐｏの全ベンチマークの平均値である。従来技術２の実行時間が長いのは、トレース長を制限し低い最適化レベルでコンパイルしているためである。またアップグレードのための再コンパイルを行いながらも従来技術３の実行時間が本発明よりも若干長くなっているのは、再コンパイルのためのトレースの選択が適切ではないため、最適化の機会を失ったり、重複したトレースを生成しているためと考えられる。 FIG. 15 is a diagram showing experimental results comparing peak performance between the prior art and the present invention. The vertical axis shows the execution time. The values shown in the figure are average values of all DaCapo benchmarks. The reason for the long execution time of the prior art 2 is that the trace length is limited and compiling is performed at a low optimization level. Moreover, the execution time of the prior art 3 is slightly longer than that of the present invention while performing recompilation for upgrade, because the selection of the trace for recompilation is not appropriate and the opportunity for optimization is lost. Or because it generates duplicate traces.

図１６は、従来技術と本発明とで合計コンパイル時間を比較した実験結果を示す図である。縦軸は合計コンパイル時間を示す。また、図に示す値はＤａＣａｐｏの全ベンチマークの平均値である。アップグレードのための再コンパイルを行っていないにも関わらず従来技術１の合計コンパイル時間が長いのは、長いトレースの生成を許可して高い最適化レベルでコンパイルしているためである。 FIG. 16 is a diagram showing experimental results comparing the total compile time between the prior art and the present invention. The vertical axis shows the total compilation time. The values shown in the figure are average values of all DaCapo benchmarks. Although the recompilation for the upgrade is not performed, the total compilation time of the conventional technique 1 is long because the compilation is performed at a high optimization level while allowing generation of a long trace.

図１４〜図１６の実験結果から、高速なスタートアップと高いピーク性能を両立させることができるのは、本発明によるＴＴグラフに基づくマルチレベルのコンパイレーションだけであることが分かる。 From the experimental results of FIGS. 14 to 16, it can be seen that only a multi-level compilation based on the TT graph according to the present invention can achieve both fast startup and high peak performance.

以上、実施形態を用いて本発明の説明をしたが、本発明の技術範囲は上記実施形態に記載の範囲には限定されない。上記の実施形態に、種々の変更または改良を加えることが可能であることが当業者に明らかである。例えば、上記説明した実施例では、起動の高速化を図る観点からは小さく単純なコンパイル単位に区切ることが好ましいことから、コンパイル単位とするトレースは、線形トレースと、周期トレースであり、各トレースは、図３（ａ）及び図３（ｂ）に示すように、１つの入り口と１以上の出口をもつブロックとした。しかしながら、本発明は、ツリー上のトレース（分岐を含むが合流を含まない）や周期トレースと線形トレースを組み合わせたトレース、更には合流を含むような複雑なトレースなど他のトレースに対しても同様の方法により適用可能である。また上記説明した実施例では、トレースリンキング技術とbranch-and-link命令を使用して、トレース間の遷移があった場合にその直前に実行されたトレースを特定可能とした。しかしながらこれら技術や命令を使用せずに、現在のトレースから次のトレースへジャンプする際にジャンプ前のアドレスを特定のレジスタやメモリに保存する構成としたり、トレース間の遷移はトレースディスパッチャ２１４を経由する構成とし、トレースディスパッチャ２１４において必要な情報を収集するようにしたりしてもよい。従って、そのような変更または改良を加えた形態も当然に本発明の技術的範囲に含まれる。 As mentioned above, although this invention was demonstrated using embodiment, the technical scope of this invention is not limited to the range as described in the said embodiment. It will be apparent to those skilled in the art that various modifications or improvements can be added to the above-described embodiments. For example, in the embodiment described above, since it is preferable to divide into small and simple compile units from the viewpoint of speeding up the startup, the traces used as compile units are a linear trace and a periodic trace. As shown in FIGS. 3 (a) and 3 (b), a block having one inlet and one or more outlets was formed. However, the present invention also applies to other traces such as traces on the tree (including branches but not including merging), traces combining periodic and linear traces, and even complex traces including merging. This method can be applied. In the above-described embodiment, the trace linking technique and the branch-and-link instruction are used to specify the trace executed immediately before a transition between traces. However, without using these techniques and instructions, when jumping from the current trace to the next trace, the address before the jump is saved in a specific register or memory, and the transition between traces goes through the trace dispatcher 214. The trace dispatcher 214 may collect necessary information. Therefore, it is a matter of course that embodiments with such changes or improvements are also included in the technical scope of the present invention.

Claims

A computer generated trace method,
(A) The computer creates a directed graph representing a transition of execution between traces based on execution of a compiled trace obtained by compiling a trace whose maximum length is limited to a predetermined length or less. The creating step wherein each node exhibiting a trace has a recompile counter;
(B) In the timer-based sampling during execution of the compiled trace, the computer traces the edge of the directed graph in the reverse direction starting from the node corresponding to the trace hit by the timer tick, and performs periodic tracing or recompilation. Increasing the recompilation counter for traces that have stopped and arrived before the end of the trace or when there are no more edges;
(C) On the condition that the value of any one of the recompilation counters exceeds the first threshold, the computer newly sets the head of the trace corresponding to the node having the recompilation counter exceeding the first threshold. Determining the beginning of a trace, allowing the generation of a trace longer than the predetermined length and generating the new trace;
Trace generation method including

The trace generation method according to claim 1, wherein the compiled trace has a yield point for finding a trace hit by the timer tick inserted only at a trace entrance and a loop back edge.

Each edge of the directed graph has a weight that indicates the relative frequency of the transition represented by the edge, and the computer increases the weight of the edge between the trace hit by the timer tick and the trace executed immediately before The trace generation method according to claim 2, further comprising step (d), wherein in step (b), the computer traces only an edge whose weight satisfies a predetermined condition.

The edge that satisfies the predetermined condition is that when there are a plurality of edges that enter a node existing in the reverse direction of the directed graph, the ratio of the weight of the edge to the total weight of the plurality of edges is the first. The trace generation method according to claim 3, wherein the edge is an edge exceeding two threshold values.

The edge satisfying the predetermined condition is that when there are a plurality of edges coming out from the node to be traced next, the weight of the edge from the current node to the next node with respect to the sum of the weights of the plurality of edges. The trace generation method according to claim 3, wherein the ratio is an edge that exceeds a third threshold.

The step (d) includes the step of setting the computer to stop execution continuously at the yield point of one or more traces following the trace hit by the timer tick on the directed graph. Trace generation method described in 1.

The setting for stopping the continuous execution is that the execution has reached one of a periodic trace, a trace that has already been stopped, and a trace that has been compiled. 7. The method of generating a trace according to claim 6, wherein the trace is terminated in response to exiting a trace in which no next trace exists.

In the compiled trace, an instruction that records a pointer of an instruction that has exited the trace due to its execution is inserted, and in step (d), execution stops at the yield point inserted at the entrance of the trace. 4. The trace generation method according to claim 3, further comprising the step of identifying the trace executed immediately before by referring to a value of a pointer of the recorded instruction.

The trace generation program for making a computer perform each step of the trace generation method as described in any one of Claims 1 thru | or 8.

A multi-level compilation execution method by a computer,
(A) Compiling a trace generated by limiting the maximum length to a predetermined length or less by the computer;
(B) the computer obtaining an execution result of the generated compiled trace;
(C) the computer executing each step of the method according to any one of claims 1 to 8 on the acquired execution result;
(D) the computer recompiling the new trace generated as a result of step (c);
Multi-level compilation execution method including

A multi-level compilation execution program for causing a computer to execute each step of the multi-level compilation execution method according to claim 10.

A trace generator,
A directed graph generation unit that generates a directed graph representing a transition of execution between traces based on the execution of a compiled trace obtained by compiling a trace whose maximum length is limited to a predetermined length or less. The directed graph generator, wherein the node has a recompile counter;
In timer-based sampling during execution of the compiled trace, the edge of the directed graph is traced backward starting from the node corresponding to the trace hit by the timer tick, and the front or edge of the periodic trace or the recompiled trace is A directed graph updater that increases the recompilation counter of the trace that has stopped and arrived when it has run out;
On the condition that the value of any one of the recompile counters exceeds a first threshold value, the head of the trace corresponding to the node having the recompile counter exceeding the first threshold value is determined as the head of a new trace; A generation unit that allows generation of a trace longer than the predetermined length and generates the new trace;
Trace generation device including

13. The trace generation device according to claim 12, wherein the compiled trace has a yield point for finding a trace hit by the timer tick inserted only at a trace entrance and a loop back edge.

Each edge of the directed graph has a weight indicating the relative frequency of the transition represented by the edge, and the directed graph update unit further includes the edge of the edge between the trace hit by the timer tick and the trace executed immediately before the trace. The trace generation device according to claim 13, wherein when the weight is increased and the edge of the directed graph is traced backward, only the edge whose weight satisfies a predetermined condition is traced.

The edge that satisfies the predetermined condition is that when there are a plurality of edges that enter a node existing in the reverse direction of the directed graph, the ratio of the weight of the edge to the total weight of the plurality of edges is the first. The trace generation device according to claim 14, wherein the trace generation device is an edge exceeding two threshold values.

The edge satisfying the predetermined condition is that when there are a plurality of edges coming out from the node to be traced next, the weight of the edge from the current node to the next node with respect to the sum of the weights of the plurality of edges. The trace generation device according to claim 14, wherein the ratio is an edge exceeding a third threshold.

The trace generation according to claim 14, wherein the directed graph update unit further performs setting to stop execution continuously at a yield point of one or more traces following the trace hit by the timer tick on the directed graph. apparatus.

The setting for stopping the continuous execution is that the execution has reached one of a periodic trace, a trace that has already been stopped, and a trace that has been compiled. 18. The trace generation apparatus according to claim 17, wherein the trace generation apparatus terminates in response to exiting a trace in which a next trace does not exist.

In the compiled trace, an instruction that records a pointer of an instruction that has exited the trace due to its execution is inserted, and the directed graph update unit stops executing at the yield point inserted at the entrance of the trace. The trace generation device according to claim 14, wherein the trace executed immediately before is specified by referring to a value of a pointer of the instruction recorded in response thereto.

A multi-level compilation device,
A compiler that compiles traces generated with a maximum length that is less than or equal to a predetermined length;
The trace generation device according to any one of claims 12 to 19, which receives an execution result of the generated compiled trace as an input.
The compiler is a multi-level compilation device that recompiles the new trace output by the trace generation device.