
Locality phase prediction

Published: 07 October 2004

Abstract

As computer memory hierarchy becomes adaptive, its performance increasingly depends on forecasting dynamic program locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and run-time prediction. By profiling a training input, it identifies locality phases by sifting through all accesses to all data elements using variable-distance sampling, wavelet filtering, and optimal phase partitioning. It then constructs a phase hierarchy through grammar compression. Finally, it inserts phase markers into the program using binary rewriting. When the instrumented program runs, it uses the first few executions of a phase to predict all its later executions. Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code. The second half of the paper presents a comprehensive evaluation. It measures the accuracy and the coverage of the new technique and compares it with the best known run-time methods. It measures its benefit in adaptive cache resizing and memory remapping. Finally, it compares the automatic analysis with manual phase marking. The results show that locality phase prediction is well suited for identifying large, recurring phases in complex programs.
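
The run-time side of this scheme invites a brief illustration. The following is a minimal sketch in Python, not the authors' implementation: the phase name, the metric, and the TRAINING_RUNS constant are hypothetical, and it shows only the core idea that a phase marker records the behavior observed during the first few executions of its phase and reuses that observation as the prediction for every later execution.

    # Minimal sketch (hypothetical, not the paper's code): each marked phase
    # keeps the metric observed during its first few executions and predicts
    # the same value for all later executions of that phase.
    from collections import defaultdict

    TRAINING_RUNS = 2  # assumed number of executions observed before predicting

    class PhasePredictor:
        def __init__(self):
            self.history = defaultdict(list)  # phase id -> observed metrics

        def on_phase_end(self, phase_id, observed_metric):
            # Record the metric (e.g., working-set size) measured for one execution.
            if len(self.history[phase_id]) < TRAINING_RUNS:
                self.history[phase_id].append(observed_metric)

        def predict(self, phase_id):
            # Return the prediction for later executions, or None while still observing.
            runs = self.history[phase_id]
            if len(runs) < TRAINING_RUNS:
                return None
            return sum(runs) / len(runs)

    # Hypothetical usage: an adaptive controller consults predict() at each
    # phase marker, e.g., to choose a cache size before the phase begins.
    predictor = PhasePredictor()
    predictor.on_phase_end("loop_nest_A", observed_metric=512)  # first execution
    predictor.on_phase_end("loop_nest_A", observed_metric=520)  # second execution
    print(predictor.predict("loop_nest_A"))                     # 516.0 for later executions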

Reviews

Peter C. Patton

The principles of task and data locality are very important in computer architecture, especially in the application of cache memory to bridge the speed mismatch between a central processing unit (CPU) and its primary memory. This paper goes well beyond qualitative consideration of the principles of locality, to describe a detailed, quantitative evaluation of their effects in actual memory hierarchies, for an actual benchmark program. As memory hierarchies become adaptive, performance becomes dependent on forecasting dynamic locality. This paper presents a method that predicts the locality phases of a program by a combination of locality profiling and runtime prediction. Previous studies on the manual adaptation of programs by dynamic data reorganization, at both the hardware and application program levels, have shown improvement in cache locality and prefetching efficiency. Unfortunately, such manual analysis is difficult, and depends on the user's intimate knowledge of the program and its data sets. Most large-scale scientific and engineering computations require substantial computer time and also have recurring locality phases; these computations are good candidates for adaptation using the method described in this paper. Applied to the familiar tomcatv benchmark, the three-step method predicts the program's locality phases with near-perfect accuracy, allowing a 40 percent reduction in cache size without additional misses and yielding a 35 percent program speedup. This is an excellent paper on quantitative computer architecture; even Hennessy and Patterson would be impressed.

Online Computing Reviews Service


Published In

ACM SIGPLAN Notices, Volume 39, Issue 11
ASPLOS '04
November 2004
283 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/1037187
  • ASPLOS XI: Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
    October 2004
    296 pages
    ISBN:1581138040
    DOI:10.1145/1024393
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 October 2004
Published in SIGPLAN Volume 39, Issue 11

Author Tags

  1. dynamic optimization
  2. locality analysis and optimization
  3. phase hierarchy
  4. program phase analysis and prediction
  5. reconfigurable architecture

Qualifiers

  • Article

Citations

Cited By

  • (2024) A Dynamic Task Mapping Scheme Based on Machine Learning. 2024 Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC), pp. 389-397. DOI: 10.1109/IPEC61310.2024.00074. Online publication date: 12-Apr-2024.
  • (2020) Characterizing and Exploiting Soft Error Vulnerability Phase Behavior in GPU Applications. IEEE Transactions on Dependable and Secure Computing, pp. 1-1. DOI: 10.1109/TDSC.2020.2991136. Online publication date: 2020.
  • (2019) Performance of Memory Virtualization Using Global Memory Resource Balancing. International Journal of Cloud Applications and Computing, 9(1), pp. 16-32. DOI: 10.4018/IJCAC.2019010102. Online publication date: 1-Jan-2019.
  • (2019) Safer Program Behavior Sharing Through Trace Wringing. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1059-1072. DOI: 10.1145/3297858.3304074. Online publication date: 4-Apr-2019.
  • (2019) A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior. IEEE Transactions on Parallel and Distributed Systems, 31(1), pp. 224-236. DOI: 10.1109/TPDS.2019.2929781. Online publication date: 17-Dec-2019.
  • (2019) Achieving Stagnation-Free Intermittent Computation with Boundary-Free Adaptive Execution. 2019 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 331-344. DOI: 10.1109/RTAS.2019.00035. Online publication date: Apr-2019.
  • (2019) The POP Detector: A Lightweight Online Program Phase Detection Framework. 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 48-57. DOI: 10.1109/ISPASS.2019.00013. Online publication date: Mar-2019.
  • (2019) Access-Aware Self-Adaptive Data Mapping onto 3D-Stacked Hybrid DRAM-PCM Based Chip-Multiprocessor. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 389-396. DOI: 10.1109/HPCC/SmartCity/DSS.2019.00066. Online publication date: Aug-2019.
  • (2019) Memory Distance Measurement for Concurrent Programs. Languages and Compilers for Parallel Computing, pp. 49-64. DOI: 10.1007/978-3-030-35225-7_5. Online publication date: 15-Nov-2019.
  • (2018) Forecast of Chaotic Series in a Horizon Superior to the Inverse of the Maximum Lyapunov Exponent. Complexity, vol. 2018. DOI: 10.1155/2018/1452683. Online publication date: 1-Jan-2018.
