Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Efficient discovery of regular stride patterns in irregular programs and its use in compiler prefetching

Published: 17 May 2002 Publication History

Abstract

Irregular data references are difficult to prefetch, as the future memory address of a load instruction is hard to anticipate by a compiler. However, recent studies as well as our experience indicate that some important load instructions in irregular programs contain stride access patterns. Although the load instructions with stride patterns are difficult to identify with static compiler techniques, we developed an efficient profiling method to discover these load instructions. The new profiling method integrates the profiling for stride information and the traditional profiling for edge frequency into a single profiling pass. The integrated profiling pass runs only 17% slower than the frequency profiling alone. The collected stride information helps the compiler to identify load instructions with stride patterns that can be prefetched efficiently and beneficially. We implemented the new profiling and prefetching techniques in a research compiler for Itanium Processor Family (IPF), and obtained significant performance improvement for the SPECINT2000 programs running on Itanium machines. For example, we achieved a 1.59x speedup for 181.mcf, 1.14x for 254.gap, and 1.08x for 197.parser. We also showed that the performance gain is stable across input data sets. These benefits make the new profiling and prefetching techniques suitable for production compilers.

References

[1]
Ball, T. and J. Larus, "Optimally profiling and tracing programs," ACM Transactions on Programming Languages and Systems, 16(3): 1319-1360, July 1994]]
[2]
Calder, B., P. Feller, and A. Eustance, "Value Profiling," MICRO30, Dec. 1997]]
[3]
Callahan, D., K. Kennedy, and A. Porterfield, "Software Prefetching", in Proceedings of the Fourth International Conference on Architecture Support for Programming Languages and Operating Systems, 1991, 40-52]]
[4]
Chang, P. P, S. Mahlke, and W.M. Hwu, "Using profile information to assist classic code optimizations," Software-practice and Experience, 1991]]
[5]
Collins, J., H. Wang, H. Christopher, D. Tullsen, C. J. Hughes, Y. F. Lee, D. Lavery and J. Shen, "Speculative Pre-computation: Long-range Prefetching of Delinquent Loads," ISCA28, 2001]]
[6]
Dahlgren, F., Stenstrom, P., "Evaluation of Hardware-Based Stride and Sequential Prefetching in Shared-Memory Multiprocessors", IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 4, April 1996]]
[7]
Fisher, J. and S. Freudenberger, "Predicting Conditional Branch Directions From Previous Runs of a Program," Proc. 5th Annual Intl. Conf. on Architecture Support for Prog. Lang. and Operating Systems, Oct. 1992]]
[8]
Ghiya, R., D. Lavery, D. Sehr, "On the Importance of Points-to Analysis and Other Memory Disambiguation Methods for C Programs," PLDI2001, May 2001]]
[9]
Henning, J.L., "SPEC CPU2000: Measuring CPU Performance in the New Millennium," IEEE Computer, July 2000]]
[10]
Intel Corp, "Benchmarks: Intel® Itanium™ based systems," http://www.intel.com/eBusiness/products/ia64/overview/bm012101.htm]]
[11]
Intel Corp, Intel® Itanium™ Processor Hardware Developer's Manual, 2000. http://developer.intel.com/design/ia-64/manuals.htm]]
[12]
Jouppi, N., "Improving direct-mapped cache performance by the addition of a small fully associative cache and prefetch buffers," ISCA17, May 1990]]
[13]
Krishnaiyer, R., D. Kulkarni, D. Lavery, W. Li, C. Lim, J. Ng, and D. Sehr, "An Advanced Optimizer for the IA64 Architecture, IEEE Micro, Vol 20, No 6, Nov 2000, 60-68]]
[14]
Lipasti, M.H., W.J. Schmidt, S.R. Kunkel, and R.R. Roediger, "SPAID: Software Prefetching in Pointer and Call Intensive Environments", MICRO28, Nov 1995, 231-236]]
[15]
Luk, C.K. and T.C. Mowry, "Compiler-Based Prefetching for Recursive Data Structures," in Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, September 1996, 222-233]]
[16]
Luk, C.K., "Tolerating Memory Latency through Software-Controlled Pre-Execution in Simultaneous Multithreading Processors," ISCA28, 2001]]
[17]
Mahlke, S.A., D.C. Lin, W.Y. Chen, R.E. Hank, and R.A. Bringmann, "Effective Compiler Support for Predicated Execution Using Hyperblock," MICRO25, Dec. 1992]]
[18]
Mowry, T.C., M.S. Lam, and A. Gupta, "Design and Evaluation of a Compiler Algorithm for Prefetching," in Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, October 1992, 62-73]]
[19]
Muth, R., S. Watterson, S. Debray, "Code Specialization based on Value Profiling," SAS2000]]
[20]
Palacharla, S. and R. Kessler, "Evaluating stream buffers as secondary cache replacement," ISCA21, April 1994]]
[21]
Roth, A., A. Moshovos, and G. Sohi, "Dependence Based Prefetching for Linked Data Structures," Proc. 8th ASPLOS, pages 115-126. Oct. 1998]]
[22]
Roth, A., and G. Sohi. "Effective Jump-Pointer Prefetching for linked data structures," ISCA26, June 1999, 111-121]]
[23]
Santhanam, V., E. Gornish, and W. Hsu, "Data Prefetching on the HP PA-8000," ISCA24, June 1997, 264 - 273]]
[24]
Serrano, M. J. and Y. Wu, "Memory Performance Analysis of SPEC2000C for the Intel Itanium™ Processor," IEEE 4th Annual Workshop on Workload Characterization, in Conjunction with MICRO34, Austin, Texas, December 2, 2001]]
[25]
Sherwood T., S. Sair, B. Calder, "Predictor-Directed Stream Buffers," MICRO33, Dec. 2000]]
[26]
Stoutchinin, A., J. N. Amaral, G. Gao, J. Dehnert, S. Jain, and A. Douillet "Speculative Prefetching of Induction Pointers," in Proceedings of CC 2001, Geneva, Italy, 2 - 6 April, 2001. Also in LNCS 2207, pp 289-303, 2001]]
[27]
Wu, Y., M. Serrano, R. Krishnaiyer, W. Li, J. Fang, "Value-Profile Guided Stride Prefetching for Irregular Code," in Proceedings of CC 2002, April 6 - 14, 2002, Grenoble, France]]
[28]
Zilles, C. and G. Sohi, "Execution-based Prediction Using Speculative Slices," ISCA28, 2001]]

Cited By

View all
  • (2015)AREPProceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)10.1109/PACT.2015.35(367-378)Online publication date: 18-Oct-2015
  • (2014)A hardware architecture to deploy complex multiprocessor scheduling algorithms2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910541(1-10)Online publication date: Aug-2014
  • (2014)Light-PREM: Automated software refactoring for predictable execution on COTS embedded systems2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910515(1-10)Online publication date: Aug-2014
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGPLAN Notices
ACM SIGPLAN Notices  Volume 37, Issue 5
May 2002
326 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/543552
Issue’s Table of Contents
  • cover image ACM Conferences
    PLDI '02: Proceedings of the ACM SIGPLAN 2002 conference on Programming language design and implementation
    June 2002
    338 pages
    ISBN:1581134630
    DOI:10.1145/512529
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 May 2002
Published in SIGPLAN Volume 37, Issue 5

Check for updates

Author Tags

  1. data prefetching
  2. integrated stride and frequency profiling
  3. performance evaluation
  4. phased multi-strided loads
  5. strongly single-strided loads

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)18
  • Downloads (Last 6 weeks)5
Reflects downloads up to 24 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2015)AREPProceedings of the 2015 International Conference on Parallel Architecture and Compilation (PACT)10.1109/PACT.2015.35(367-378)Online publication date: 18-Oct-2015
  • (2014)A hardware architecture to deploy complex multiprocessor scheduling algorithms2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910541(1-10)Online publication date: Aug-2014
  • (2014)Light-PREM: Automated software refactoring for predictable execution on COTS embedded systems2014 IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications10.1109/RTCSA.2014.6910515(1-10)Online publication date: Aug-2014
  • (2014)BibliographyAdvanced Backend Code Optimization10.1002/9781118625446.biblio(327-343)Online publication date: 3-Jun-2014
  • (2009)On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW ProcessorsProceedings of the 2009 11th IEEE International Conference on High Performance Computing and Communications10.1109/HPCC.2009.32(196-205)Online publication date: 25-Jun-2009
  • (2008)Compiler driven data layout optimization for regular/irregular array access patternsACM SIGPLAN Notices10.1145/1379023.137566443:7(41-50)Online publication date: 12-Jun-2008
  • (2008)Compiler driven data layout optimization for regular/irregular array access patternsProceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems10.1145/1375657.1375664(41-50)Online publication date: 12-Jun-2008
  • (2008)Runtime engine for dynamic profile guided stride prefetchingJournal of Computer Science and Technology10.1007/s11390-008-9159-223:4(633-643)Online publication date: 1-Jul-2008
  • (2006)Address-Value Delta (AVD) PredictionIEEE Transactions on Computers10.1109/TC.2006.19155:12(1491-1508)Online publication date: 1-Dec-2006
  • (2005)A Survey of Adaptive Optimization in Virtual MachinesProceedings of the IEEE10.1109/JPROC.2004.84030593:2(449-466)Online publication date: Feb-2005
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media