DOI: 10.1145/1375527.1375536
research-article

Accurate memory signatures and synthetic address traces for HPC applications

Published: 07 June 2008

Abstract

Though the performance of many scientific codes is dominated by memory behavior, our ability to describe, capture, compare, and recreate that behavior is quite limited. This inability underlies much of the complexity in the field of performance analysis: it is fundamentally difficult to relate benchmarks and applications or use realistic workloads to guide system design and procurement. An observable, reproducible, and machine-independent memory characterization is needed.
The Chameleon framework is a software suite that includes tools to capture a concise, machine-independent memory signature from any application and produce synthetic memory address traces that mimic that signature. By simultaneously modeling both spatial and temporal locality, Chameleon produces uniquely accurate, general-purpose synthetic traces. Our results demonstrate that the cache hit rates generated by each synthetic trace are nearly identical to those of the application it targets on dozens of memory hierarchies representing many of today's commercial offerings.
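The abstract does not spell out the format of the memory signature. As a rough illustration of what a machine-independent locality measure can look like, the sketch below computes LRU reuse (stack) distances over a toy address trace and derives the hit rate of a fully associative LRU cache from them. The function names, the 64-byte block size, and the toy trace are assumptions made for this example, not part of Chameleon.

```python
def reuse_distances(trace, block_bytes=64):
    """Return the LRU stack distance of each access (None for a first touch).

    The distance is the number of distinct blocks referenced since the
    previous access to the same block. Grouping addresses into blocks
    folds a coarse notion of spatial locality into the measure.
    """
    stack = []  # most recently used block sits at the end
    distances = []
    for addr in trace:
        block = addr // block_bytes
        if block in stack:
            idx = stack.index(block)
            distances.append(len(stack) - 1 - idx)
            stack.pop(idx)
        else:
            distances.append(None)  # cold miss: block never seen before
        stack.append(block)
    return distances


def lru_hit_rate(distances, cache_blocks):
    """Hit rate of a fully associative LRU cache holding `cache_blocks` blocks.

    An access hits exactly when its stack distance is smaller than the
    cache capacity, so one pass over the trace serves every cache size.
    """
    hits = sum(1 for d in distances if d is not None and d < cache_blocks)
    return hits / len(distances)


# Toy trace: three blocks visited cyclically, then a second word in block 0.
toy_trace = [0, 64, 128, 0, 64, 128, 0, 8]
toy_distances = reuse_distances(toy_trace)
```

Because the distances depend only on the reference stream, the same list can be evaluated against many cache sizes, for example `lru_hit_rate(toy_distances, 3)` and `lru_hit_rate(toy_distances, 1)`. The linear `stack.index` search keeps the sketch short; it is quadratic in the worst case and would need a tree-based structure for real traces.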
We apply the framework to high-performance computing (HPC) by leveraging sampling techniques to capture the memory signatures of full-scale, parallel applications with only a 5x slowdown. The overall result is therefore a concise, observable, and machine-independent representation of the memory requirements of full-scale applications that can be tractably captured and accurately mimicked.
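The abstract states that sampling keeps the slowdown to about 5x but does not describe the sampling scheme. The sketch below shows one generic possibility, periodic burst sampling, in which only a short run of consecutive references out of every fixed-length window is observed. The function name `sample_bursts` and the `period` and `burst` parameters are hypothetical and are not taken from the paper.

```python
def sample_bursts(trace, period=1000, burst=100):
    """Yield the first `burst` references out of every `period` references.

    Keeping consecutive references (rather than isolated ones) preserves
    short-range reuse inside each observed window. This is an assumed,
    generic scheme, not the sampling method used by Chameleon.
    """
    if not 0 < burst <= period:
        raise ValueError("burst must be in the range 1..period")
    for i, addr in enumerate(trace):
        if i % period < burst:
            yield addr


# Observe 2 out of every 5 references of a toy trace.
sampled = list(sample_bursts(range(10), period=5, burst=2))
```

The ratio `burst / period` sets the fraction of references that an instrumentation tool would have to process, which is the knob that trades signature fidelity against tracing overhead in a scheme of this kind.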

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing
June 2008
390 pages
ISBN: 9781605581583
DOI: 10.1145/1375527
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. caches
  2. hpc
  3. locality
  4. synthetic memory traces

Qualifiers

  • Research-article

Conference

ICS08: International Conference on Supercomputing
June 7 - 12, 2008
Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 16
  • Downloads (Last 6 weeks): 4
Reflects downloads up to 14 Nov 2024

Cited By

  • (2024) Camouflage: Utility-Aware Obfuscation for Accurate Simulation of Sensitive Program Traces. ACM Transactions on Architecture and Code Optimization 21:2 (1-23). DOI: 10.1145/3650110. Online publication date: 21-May-2024
  • (2020) Trace Wringing for Program Trace Privacy. IEEE Micro (1-1). DOI: 10.1109/MM.2020.2986113. Online publication date: 2020
  • (2019) Safer Program Behavior Sharing Through Trace Wringing. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (1059-1072). DOI: 10.1145/3297858.3304074. Online publication date: 4-Apr-2019
  • (2018) HALO. Proceedings of the 2018 International Conference on Supercomputing (118-128). DOI: 10.1145/3205289.3205323. Online publication date: 12-Jun-2018
  • (2017) Fast and Accurate Exploration of Multi-level Caches Using Hierarchical Reuse Distance. 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA) (145-156). DOI: 10.1109/HPCA.2017.11. Online publication date: Feb-2017
  • (2016) Exploring system performance using elastic traces: Fast, accurate and portable. 2016 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation (SAMOS) (96-105). DOI: 10.1109/SAMOS.2016.7818336. Online publication date: Jul-2016
  • (2012) Survey of scheduling techniques for addressing shared resources in multicore processors. ACM Computing Surveys 45:1 (1-28). DOI: 10.1145/2379776.2379780. Online publication date: 7-Dec-2012
  • (2012) A lightweight hybrid hardware/software approach for object-relative memory profiling. Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software (46-57). DOI: 10.1109/ISPASS.2012.6189205. Online publication date: 1-Apr-2012
  • (2009) Modelling memory requirements for grid applications. Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (1-8). DOI: 10.1109/IPDPS.2009.5160962. Online publication date: 23-May-2009
  • (2009) Ctuning: A reuse distance based cache performance tuning tool. Journal of Electronics (China) 26:4 (517-524). DOI: 10.1007/s11767-008-0023-x. Online publication date: 19-Sep-2009
