Abstract
Dynamic optimization relies on runtime profile information to improve the performance of program execution. Traditional profiling techniques incur significant overhead and are not suitable for dynamic optimization. In this paper, a new profiling technique is proposed that combines the strengths of software and hardware to achieve near-zero-overhead profiling. The compiler passes profiling requests to the hardware as a few bits of information in branch instructions, and the processor executes the profiling operations asynchronously, either in available free slots or on dedicated hardware. The compiler instrumentation for this technique is implemented in an Itanium research compiler. The results show that accurate block profiling adds very little overhead to the user program, measured in program scheduling cycles. For example, the average overhead is 0.6% for the SPECint95 benchmarks. The hardware support required for the new profiling technique is practical. The technique is also extended to collect edge profiles for continuous phase transition detection. It is believed that this hardware-software collaborative scheme will enable many profile-driven dynamic optimizations for EPIC processors such as the Itanium processors.
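The division of labor described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the paper's hardware design: the class `ProfilingCore`, the `profile_id` field, the request queue, and the trace format are all hypothetical names introduced for illustration. It models a branch that carries a small counter ID, and a processor that defers the counter update until a free slot is available, so the update never sits on the program's critical path.

```python
from collections import deque


class ProfilingCore:
    """Toy model (hypothetical): tagged branches enqueue profiling requests,
    and counter updates are applied only in otherwise idle slots."""

    def __init__(self, num_counters):
        self.counters = [0] * num_counters
        self.pending = deque()  # profiling requests not yet applied

    def execute_branch(self, profile_id):
        # profile_id is None when the compiler did not tag this branch.
        if profile_id is not None:
            self.pending.append(profile_id)

    def free_slot(self):
        # An idle slot applies at most one pending counter update.
        if self.pending:
            self.counters[self.pending.popleft()] += 1

    def drain(self):
        # Apply whatever is still queued, e.g. before the profile is read.
        while self.pending:
            self.free_slot()


def run_trace(core, trace):
    """Replay a list of ("branch", id_or_None) and ("free", None) events."""
    for kind, profile_id in trace:
        if kind == "branch":
            core.execute_branch(profile_id)
        elif kind == "free":
            core.free_slot()
    core.drain()
    return core.counters


if __name__ == "__main__":
    trace = [
        ("branch", 0), ("branch", 1), ("free", None),
        ("branch", 0), ("free", None), ("free", None),
        ("branch", None),  # untagged branch: no profiling request
    ]
    print(run_trace(ProfilingCore(2), trace))
```

In this sketch the counts stay exact because no request is ever dropped; a real design would have to bound the queue and decide what happens when free slots are scarce, which the abstract does not specify.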
Author information
Youfeng Wu received his B.S. degree from Fudan University and his M.S. and Ph.D. degrees from Oregon State University, in computer science. He is currently a principal engineer with Intel's Corporate Technology Group and manages a research team on multiprocessor compilation and dynamic binary optimizations. His research interests include parallel programming and transformations, multiprocessor architecture, binary and dynamic optimizations, and security and safety enhancement via compiler and binary tools.
Yong-Fong Lee received his M.S. and Ph.D. degrees from Rutgers University, both in computer science. He is currently a principal engineer with Intel's Software and Solutions Group, where he leads a team that works with key ISVs to optimize their solutions on Intel platforms. His technical interests include programming languages and compilers, computer architecture, and performance optimization of server applications.
Cite this article
Wu, Y., Lee, YF. Hardware-Software Collaborative Techniques for Runtime Profiling and Phase Transition Detection. J Comput Sci Technol 20, 665–675 (2005). https://doi.org/10.1007/s11390-005-0665-1
DOI: https://doi.org/10.1007/s11390-005-0665-1