DOI: 10.1145/951710.951728
Article

Frequent loop detection using efficient non-intrusive on-chip hardware

Published: 30 October 2003

Abstract

Dynamic software optimization methods are becoming increasingly popular for improving software performance and power. The first step in dynamic optimization is detecting frequently executed code, or "critical regions." Previous critical region detectors have targeted desktop processors. We introduce a critical region detector targeted to embedded processors. Its unique features are that it is very size- and power-efficient and completely non-intrusive to the software's execution, both of which are needed in timing-sensitive embedded systems. Our detector not only finds the critical regions but also determines their relative frequencies, a potentially important feature for selecting among alternative dynamic optimization methods. The detector uses a tiny cache coupled with a small amount of logic. We provide results of extensive explorations across seventeen embedded system benchmarks and show that highly accurate results can be achieved with only 0.02% power overhead and acceptable size overhead. Our detector is currently used as part of a dynamic hardware/software partitioning approach, but it is applicable to a wide variety of situations.
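The abstract describes the detector only at a high level: a tiny cache plus a small amount of logic that finds frequently executed loops and their relative frequencies without disturbing the running software. The Python sketch below models that bookkeeping in software to make the idea concrete. Everything beyond the abstract is an assumption and not the authors' design: the hypothetical class `FrequentLoopDetector`, identifying a loop by the address of a taken backward branch, a fully associative table of `entries` slots, evicting the least-counted loop, and halving all counters when one saturates so that `counter_bits`-wide counters never overflow while their ratios are roughly preserved.

```python
from dataclasses import dataclass, field


@dataclass
class FrequentLoopDetector:
    """Illustrative software model of a tiny frequent-loop table.

    Hypothetical design, not the paper's: a loop is identified by the address
    of a taken backward branch, the table is fully associative with `entries`
    slots, and each slot holds a `counter_bits`-wide saturating counter.
    """

    entries: int = 4
    counter_bits: int = 4
    table: dict[int, int] = field(default_factory=dict)  # branch PC -> count

    @property
    def max_count(self) -> int:
        return (1 << self.counter_bits) - 1

    def observe(self, branch_pc: int, target_pc: int) -> None:
        """Record one taken branch seen on the (snooped) instruction bus."""
        if target_pc >= branch_pc:
            return  # forward branch: not a loop back-edge, so ignore it
        if branch_pc not in self.table:
            if len(self.table) >= self.entries:
                # Assumed policy: evict the entry with the smallest count.
                victim = min(self.table, key=self.table.get)
                del self.table[victim]
            self.table[branch_pc] = 0
        self.table[branch_pc] += 1
        if self.table[branch_pc] >= self.max_count:
            # Assumed policy: on saturation, halve every counter so the
            # counters stay narrow while their ratios are roughly kept.
            for pc in self.table:
                self.table[pc] >>= 1

    def relative_frequencies(self) -> dict[int, float]:
        """Each loop's share of the counted back-edge executions."""
        total = sum(self.table.values())
        if total == 0:
            return {}
        return {pc: count / total for pc, count in self.table.items()}


if __name__ == "__main__":
    det = FrequentLoopDetector(entries=2, counter_bits=4)
    # Hypothetical trace: an inner loop (back-edge 0x120 -> 0x100) runs
    # 10 times per iteration of an outer loop (back-edge 0x140 -> 0x0F0).
    for _ in range(3):
        for _ in range(10):
            det.observe(0x120, 0x100)
        det.observe(0x140, 0x0F0)
    print({hex(pc): count for pc, count in det.table.items()})
    print(det.relative_frequencies())
```

In this trace the true back-edge ratio is 30:3, but with 4-bit counters the periodic halving leaves counts of 14 and 1. With `counter_bits=5` no counter saturates and the counts are exactly 30 and 3. This illustrates the size-versus-accuracy trade-off that a hardware detector of this kind must balance.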

    Information

    Published In

    CASES '03: Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
    October 2003
    340 pages
    ISBN: 1581136765
    DOI: 10.1145/951710
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic optimization
    2. frequent loop detection
    3. frequent value profiling
    4. hardware profiling
    5. hot spot detection
    6. on-chip profiling
    7. runtime profiling

    Qualifiers

    • Article

    Conference

    CASES '03

    Acceptance Rates

    CASES '03 paper acceptance rate: 31 of 162 submissions (19%)
    Overall acceptance rate: 52 of 230 submissions (23%)

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 1
    Reflects downloads up to 16 Nov 2024.

    Cited By

    • (2013) DLIC. ACM Transactions on Embedded Computing Systems 13:1, pp. 1-26. DOI: 10.1145/2512464. Online publication date: 5-Sep-2013.
    • (2013) Adaptive loop caching using lightweight runtime control flow analysis. ACM Transactions on Embedded Computing Systems 12:1s, pp. 1-23. DOI: 10.1145/2435227.2435251. Online publication date: 29-Mar-2013.
    • (2012) Randomized Instruction Injection to Counter Power Analysis Attacks. ACM Transactions on Embedded Computing Systems 11:3, pp. 1-28. DOI: 10.1145/2345770.2345782. Online publication date: 1-Sep-2012.
    • (2012) Loop instruction caching for energy-efficient embedded multitasking processors. 2012 IEEE 10th Symposium on Embedded Systems for Real-time Multimedia, pp. 97-106. DOI: 10.1109/ESTIMedia.2012.6507036. Online publication date: Oct-2012.
    • (2011) Binary acceleration using coarse-grained reconfigurable architecture. ACM SIGARCH Computer Architecture News 38:4, pp. 33-39. DOI: 10.1145/1926367.1926374. Online publication date: 14-Jan-2011.
    • (2011) Compiler-assisted technique for rapid performance estimation of FPGA-based processors. 2011 IEEE International SOC Conference, pp. 341-346. DOI: 10.1109/SOCC.2011.6085116. Online publication date: Sep-2011.
    • (2010) Enabling large decoded instruction loop caching for energy-aware embedded processors. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, pp. 247-256. DOI: 10.1145/1878921.1878957. Online publication date: 24-Oct-2010.
    • (2010) Performance estimation framework for FPGA-based processors. 2010 International Conference on Field-Programmable Technology, pp. 413-416. DOI: 10.1109/FPT.2010.5681448. Online publication date: Dec-2010.
    • (2009) Autonomous hardware/software partitioning and voltage/frequency scaling for low-power embedded systems. ACM Transactions on Design Automation of Electronic Systems 15:1, pp. 1-20. DOI: 10.1145/1640457.1640459. Online publication date: 28-Dec-2009.
    • (2009) Design and implementation of a MicroBlaze-based warp processor. ACM Transactions on Embedded Computing Systems 8:3, pp. 1-22. DOI: 10.1145/1509288.1509294. Online publication date: 22-Apr-2009.