research-article

ADAMANT

Published: 01 June 2016

Abstract

In the converging world of High Performance Computing and Big Data, moving data is becoming a critical aspect of performance and energy efficiency. In this paper we present the Advanced DAta Movement Analysis Toolkit (ADAMANT), a set of tools to capture and analyze data movement within an application, and to aid in understanding performance and energy efficiency in current and future systems. ADAMANT identifies all the data objects allocated by an application and uses instrumentation modules to monitor relevant events (e.g., cache misses). Finally, ADAMANT produces a per-object performance profile.

In this paper we demonstrate the use of ADAMANT in analyzing three applications, BT, BFS, and Velvet, and evaluate the impact of different memory technologies. With the information produced by ADAMANT we were able to model and compare different memory configurations and object placement solutions. In BFS we devised a placement that outperforms caching, while in the other two cases we were able to point out which data objects may be problematic for the configurations explored, and would require refactoring to improve performance.
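The idea of a per-object profile can be pictured as an address-range lookup: each monitored event carries a memory address, and that address is attributed to the allocated object whose range contains it. The sketch below is only an illustration of that idea, not ADAMANT's implementation. The names `ObjectMap`, `register`, `lookup`, and `per_object_profile`, the object names, and the sample addresses are all hypothetical, and the code assumes that object ranges do not overlap.

```python
from bisect import bisect_right
from collections import Counter


class ObjectMap:
    """Hypothetical registry mapping addresses to allocated data objects.

    Assumes object address ranges never overlap.
    """

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._objects = []  # (start, size, name), parallel to _starts

    def register(self, name, start, size):
        """Record an allocation covering [start, start + size)."""
        i = bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._objects.insert(i, (start, size, name))

    def lookup(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, name = self._objects[i]
        return name if addr < start + size else None


def per_object_profile(object_map, samples):
    """Count sampled events (e.g., cache-miss addresses) per object."""
    profile = Counter()
    for addr in samples:
        profile[object_map.lookup(addr) or "<unknown>"] += 1
    return profile


# Made-up allocations and sampled miss addresses, for illustration only.
objects = ObjectMap()
objects.register("grid", 0x1000, 0x400)
objects.register("queue", 0x2000, 0x100)
samples = [0x1004, 0x13FF, 0x2010, 0x1400, 0x0FFF]
profile = per_object_profile(objects, samples)
```

Keeping the start addresses sorted makes each lookup a binary search, which matters when many events are attributed to many objects. Addresses that fall outside every registered range are counted under `<unknown>` rather than dropped, so the totals still add up to the number of samples.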

Cited By

  • (2019) ComDetective. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1-21. DOI: 10.1145/3295500.3356214. Online publication date: 17-Nov-2019


Published In

Procedia Computer Science  Volume 80, Issue C
June 2016
2452 pages
ISSN:1877-0509
EISSN:1877-0509

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. caches
  2. computer architecture
  3. memory system
  4. modeling
  5. profiling

Qualifiers

  • Research-article
