research-article

Preserving time in large-scale communication traces

Authors:

Bronis R. de Supinski,

Martin Schulz

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing

Pages 46-55

https://doi.org/10.1145/1375527.1375537

Published: 07 June 2008

Abstract

Analyzing the performance of large-scale scientific applications is becoming increasingly difficult due to the sheer volume of performance data gathered. Recent work on scalable communication tracing addresses this problem with online interprocess compression. Yet analyzing communication traces requires knowledge of how time progresses, and this cannot trivially be encoded in a scalable manner during compression. We develop scalable time-stamp encoding schemes for communication traces.
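The abstract does not spell out the encoding itself. One common way to keep timing data from growing with trace length is to record the delta between consecutive events instead of absolute time stamps, and to fold repeated deltas into a fixed-size summary. The sketch below assumes that approach. `DeltaHistogram`, its log2 binning, and the 1 microsecond base are hypothetical choices made for illustration; they are not the paper's trace format.

```python
import math
from dataclasses import dataclass, field


@dataclass
class DeltaHistogram:
    """Hypothetical fixed-size summary of inter-event delta times (in seconds)."""

    num_bins: int = 16     # log2-spaced bins (illustrative choice)
    base: float = 1e-6     # bin width reference: 1 microsecond (assumption)
    count: int = 0
    total: float = 0.0
    min_delta: float = math.inf
    max_delta: float = 0.0
    bins: list = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.bins:
            self.bins = [0] * self.num_bins

    def add(self, delta: float) -> None:
        """Fold one delta into the summary; storage does not grow with count."""
        self.count += 1
        self.total += delta
        self.min_delta = min(self.min_delta, delta)
        self.max_delta = max(self.max_delta, delta)
        # Bin 0 holds deltas below 2*base; bin i >= 1 holds [base*2**i, base*2**(i+1)).
        # Anything larger than the table covers is clamped into the last bin.
        idx = 0 if delta <= self.base else int(math.log2(delta / self.base))
        self.bins[min(idx, self.num_bins - 1)] += 1

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


def encode_timestamps(timestamps: list[float]) -> DeltaHistogram:
    """Replace absolute time stamps with one summary of their successive deltas."""
    hist = DeltaHistogram()
    for prev, cur in zip(timestamps, timestamps[1:]):
        hist.add(cur - prev)
    return hist


hist = encode_timestamps([0.0, 1.0, 3.0, 6.0])
print(hist.count, hist.mean, hist.min_delta, hist.max_delta)  # prints: 3 2.0 1.0 3.0
```

The summary occupies the same 16 bins plus four scalars whether it absorbs three deltas or millions, which is the property a scalable encoding needs; the price is that individual deltas are no longer recoverable.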

At the same time, our work contributes novel insights into the scalable representation of time-stamped data. We show that our representations capture sufficient information to enable what-if exploration of architectural variations and analysis of path-based timing irregularities without requiring excessive disk space. We evaluate how well several time-stamped, compressed MPI trace approaches support accurate timed replay of communication events. Our lossless traces are orders of magnitude smaller, if not of near-constant size, regardless of the number of nodes, while preserving timing information suitable for application tuning or for assessing the requirements of future procurements. Our results prove that time-preserving tracing without loss of communication information can scale in both the number of nodes and the number of time steps, a result without precedent.
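The claim that trace size can stay near constant as the node count grows implies that timing data from many processes is combined rather than stored once per rank. A minimal sketch, assuming each call site keeps a fixed-size record (count, sum, minimum and maximum of its deltas) and that matching call sites from different ranks are merged pairwise, for example along a reduction tree. `DeltaSummary` and `merge_ranks` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class DeltaSummary:
    """Hypothetical per-call-site timing record: fixed size regardless of events or ranks."""

    count: int
    total: float
    min_delta: float
    max_delta: float

    @staticmethod
    def of(deltas: list[float]) -> "DeltaSummary":
        """Summarize the delta times one rank observed at one call site."""
        return DeltaSummary(len(deltas), sum(deltas), min(deltas), max(deltas))

    def merge(self, other: "DeltaSummary") -> "DeltaSummary":
        """Combine two ranks' records into one record of the same size."""
        return DeltaSummary(
            self.count + other.count,
            self.total + other.total,
            min(self.min_delta, other.min_delta),
            max(self.max_delta, other.max_delta),
        )

    @property
    def mean(self) -> float:
        return self.total / self.count


def merge_ranks(per_rank: list[DeltaSummary]) -> DeltaSummary:
    """Fold all ranks' records into one (a real tool might use a reduction tree)."""
    return reduce(DeltaSummary.merge, per_rank)


# 1,024 hypothetical ranks, each having observed deltas of 0.5 s and 1.5 s.
per_rank = [DeltaSummary.of([0.5, 1.5]) for _ in range(1024)]
merged = merge_ranks(per_rank)
print(merged.count, merged.mean)  # prints: 2048 1.0
```

Because `merge` is commutative and associative (up to floating-point rounding of the sum), ranks can be combined in any order, and the merged record has the same four fields whether 2 or 1,024 ranks contribute. A summary like this keeps aggregate timing only; it does not by itself retain every individual delta.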

Recommendations

ScalaTrace: tracing, analysis and modeling of HPC codes at scale
PARA'10: Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2

Characterizing the communication behavior of large-scale applications is a difficult and costly task due to code/system complexity and their long execution times. An alternative to running actual codes is to gather their communication traces and then ...
ScalaExtrap: trace-based communication extrapolation for spmd programs
PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming

Performance modeling for scientific applications is important for assessing potential application performance and systems procurement in high-performance computing (HPC). Recent progress on communication tracing opens up novel opportunities for ...

Information & Contributors

Information

Published In

ICS '08: Proceedings of the 22nd annual international conference on Supercomputing

June 2008

390 pages

ISBN: 9781605581583

DOI: 10.1145/1375527

General Chairs: Theo Papatheodorou (University of Patras, Greece); Utpal Banerjee (Intel (retired), USA)

Program Chairs: Avi Mendelson (Intel, Israel); Kyle Gallivan (Florida State University, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 June 2008

Qualifiers

Research-article

Conference

ICS08

Sponsor:

ICS08: International Conference on Supercomputing

June 7 - 12, 2008

Island of Kos, Greece

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics & Citations

Article Metrics

Total Citations: 39

Total Downloads: 305

Downloads (Last 12 months): 1

Downloads (Last 6 weeks): 1

Reflects downloads up to 30 Sep 2024

Cited By

Zhai J, Jin Y, Chen W, Zheng W (2023). Structure-Based Communication Trace Compression. Performance Analysis of Parallel Applications for HPC, pages 43-69. Online publication date: 19-Jun-2023.
https://doi.org/10.1007/978-981-99-4366-1_3

Luo X, Mueller F, Carns P, Jenkins J, Latham R, Ross R, Snyder S (2017). ScalaIOExtrap: Elastic I/O Tracing and Extrapolation. 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 585-594. Online publication date: May-2017.
https://doi.org/10.1109/IPDPS.2017.45

Zhang W, Cheng A, Subhlok J (2016). DwarfCode: A Performance Prediction Tool for Parallel Applications. IEEE Transactions on Computers 65:2, pages 495-507. Online publication date: 1-Feb-2016.
https://dl.acm.org/doi/10.1109/TC.2015.2417526

Weber M, Brendel R, Hilbrich T, Mohror K, Schulz M, Brunst H (2016). Structural Clustering: A New Approach to Support Performance Analysis at Scale. 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 484-493. Online publication date: May-2016.
https://doi.org/10.1109/IPDPS.2016.27

Lagadapati M, Mueller F, Engelmann C (2016). Benchmark Generation and Simulation at Extreme Scale. Proceedings of the 20th International Symposium on Distributed Simulation and Real-Time Applications, pages 9-18. Online publication date: 21-Sep-2016.
https://dl.acm.org/doi/10.1109/DS-RT.2016.18

Sikora A, Margalef T, Jorba J (2016). Automated and dynamic abstraction of MPI application performance. Cluster Computing 19:3, pages 1105-1137. Online publication date: 1-Sep-2016.
https://dl.acm.org/doi/10.1007/s10586-016-0615-4

Luo X, Mueller F, Carns P, Jenkins J, Latham R, Ross R, Snyder S (2015). HPC I/O trace extrapolation. Proceedings of the 4th Workshop on Extreme Scale Programming Tools, pages 1-6. Online publication date: 15-Nov-2015.
https://dl.acm.org/doi/10.1145/2832106.2832108

Casanova H, Desprez F, Markomanolis G, Suter F (2015). Simulation of MPI applications with time-independent traces. Concurrency and Computation: Practice & Experience 27:5, pages 1145-1168. Online publication date: 10-Apr-2015.
https://dl.acm.org/doi/10.1002/cpe.3278

Wu X, Mueller F, Pakin S (2014). A methodology for automatic generation of executable communication specifications from parallel MPI applications. ACM Transactions on Parallel Computing 1:1, pages 1-30. Online publication date: 3-Oct-2014.
https://dl.acm.org/doi/10.1145/2660249

Zhai J, Hu J, Tang X, Ma X, Chen W, Damkroger T, Dongarra J (2014). Cypress. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 143-153. Online publication date: 16-Nov-2014.
https://dl.acm.org/doi/10.1109/SC.2014.17
