Research article | Open access

ScalaExtrap: Trace-based communication extrapolation for SPMD programs

Published: 04 May 2012

Abstract

Performance modeling for scientific applications is important for assessing potential application performance and for systems procurement in high-performance computing (HPC). Recent progress on communication tracing opens up novel opportunities for communication modeling due to its lossless yet scalable trace collection. Estimating the impact of scaling on communication efficiency nonetheless remains nontrivial due to execution-time variations and exposure to hardware and software artifacts.
This work contributes a fundamentally novel modeling scheme. We synthetically generate the application trace for large numbers of nodes via extrapolation from a set of smaller traces. We devise an innovative approach for topology extrapolation of single program, multiple data (SPMD) codes with stencil or mesh communication. Experimental results show that the extrapolated traces precisely reflect the communication behavior and the performance characteristics at the target scale for both strong and weak scaling applications. The extrapolated trace can subsequently be (a) replayed to assess communication requirements before porting an application, (b) transformed to autogenerate communication benchmarks for various target platforms, and (c) analyzed to detect communication inefficiencies and scalability limitations.
To the best of our knowledge, rapidly obtaining the communication behavior of parallel applications at arbitrary scale, with timed replay available yet without actually executing the application at that scale, is without precedent and has the potential to enable otherwise infeasible system simulation at the exascale level.
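The core idea, deriving a trace for a large node count from several traces collected at small scales, can be illustrated with a sketch. The following Python example is hypothetical and not the authors' implementation: it assumes ranks laid out on a sqrt(P) x sqrt(P) grid with 5-point stencil communication, infers the per-rank neighbor offsets that hold at every observed small scale, and regenerates the neighbor topology at an unobserved larger scale. All function names and the grid layout are illustrative assumptions.

```python
import math

def stencil_neighbors(rank, p):
    """Neighbors of `rank` in a 5-point stencil on a sqrt(p) x sqrt(p) grid.
    Hypothetical ground truth standing in for a collected communication trace."""
    n = int(math.isqrt(p))
    r, c = divmod(rank, n)
    out = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            out.append(rr * n + cc)
    return out

def infer_offsets(traces):
    """From per-rank neighbor lists at several small scales, recover the
    grid-relative offsets (d_row, d_col) that are observed at every scale.
    Intersecting across scales discards scale-specific coincidences."""
    candidate = None
    for p, neigh in traces.items():
        n = int(math.isqrt(p))
        offs = set()
        for rank, nbrs in neigh.items():
            r, c = divmod(rank, n)
            for nb in nbrs:
                nr, nc = divmod(nb, n)
                offs.add((nr - r, nc - c))
        candidate = offs if candidate is None else candidate & offs
    return candidate

def extrapolate(offsets, p):
    """Synthesize the neighbor map at a larger, unobserved scale p by
    re-instantiating the inferred offsets on the bigger grid."""
    n = int(math.isqrt(p))
    neigh = {}
    for rank in range(p):
        r, c = divmod(rank, n)
        neigh[rank] = [(r + dr) * n + (c + dc)
                       for dr, dc in sorted(offsets)
                       if 0 <= r + dr < n and 0 <= c + dc < n]
    return neigh

# Observe 16- and 64-rank runs, then extrapolate to 1024 ranks.
traces = {p: {r: stencil_neighbors(r, p) for r in range(p)} for p in (16, 64)}
offsets = infer_offsets(traces)
predicted = extrapolate(offsets, 1024)
```

Replaying such an extrapolated topology (for instance, one send/receive pair per neighbor) then approximates the communication structure at the target scale; ScalaExtrap additionally extrapolates message volumes, collectives, and timing, which this sketch omits.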





Published In

ACM Transactions on Programming Languages and Systems  Volume 34, Issue 1
April 2012
225 pages
ISSN:0164-0925
EISSN:1558-4593
DOI:10.1145/2160910
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 May 2012
Accepted: 01 February 2012
Revised: 01 November 2011
Received: 01 June 2011
Published in TOPLAS Volume 34, Issue 1


Author Tags

  1. Communication
  2. compression
  3. trace extrapolation
  4. tracing

Qualifiers

  • Research-article
  • Research
  • Refereed


Cited By

  • (2021) Extracting clean performance models from tainted programs. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 403--417. DOI: 10.1145/3437801.3441613. Online publication date: 17-Feb-2021.
  • (2021) Lossy Compression of Communication Traces Using Recurrent Neural Networks. IEEE Transactions on Parallel and Distributed Systems, 1--1. DOI: 10.1109/TPDS.2021.3132417. Online publication date: 2021.
  • (2020) ExtraPeak: Advanced Automatic Performance Modeling for HPC Applications. Software for Exascale Computing - SPPEXA 2016-2019, 453--482. DOI: 10.1007/978-3-030-47956-5_15. Online publication date: 31-Jul-2020.
  • (2019) Automatic Instrumentation Refinement for Empirical Performance Modeling. 2019 IEEE/ACM International Workshop on Programming and Performance Visualization Tools (ProTools), 40--47. DOI: 10.1109/ProTools49597.2019.00011. Online publication date: Nov-2019.
  • (2018) Chameleon: Online Clustering of MPI Program Traces. 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 1102--1112. DOI: 10.1109/IPDPS.2018.00119. Online publication date: May-2018.
  • (2018) Predicting cloud performance for HPC applications before deployment. Future Generation Computer Systems 87:C, 618--628. DOI: 10.1016/j.future.2017.10.048. Online publication date: 1-Oct-2018.
  • (2017) Predicting Cloud Performance for HPC Applications. Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 524--533. DOI: 10.1109/CCGRID.2017.11. Online publication date: 14-May-2017.
  • (2017) Classification of thread profiles for scaling application behavior. Parallel Computing 66, 1--21. DOI: 10.1016/j.parco.2017.04.006. Online publication date: Aug-2017.
  • (2016) Efficient clustering for ultra-scale application tracing. Journal of Parallel and Distributed Computing 98, 25--39. DOI: 10.1016/j.jpdc.2016.08.001. Online publication date: Dec-2016.
  • (2015) Toward More Scalable Off-Line Simulations of MPI Applications. Parallel Processing Letters 25:03, 1541002. DOI: 10.1142/S0129626415410029. Online publication date: Sep-2015.
