research-article

ADAMANT

Authors:

Pietro Cicotti,

Laura Carrington

Procedia Computer Science, Volume 80, Issue C

Pages 450 - 460

https://doi.org/10.1016/j.procs.2016.05.323

Published: 01 June 2016

Abstract

In the converging world of High Performance Computing and Big Data, moving data is becoming a critical aspect of performance and energy efficiency. In this paper we present the Advanced DAta Movement Analysis Toolkit (ADAMANT), a set of tools to capture and analyze data movement within an application, and to aid in understanding performance and energy efficiency in current and future systems. ADAMANT identifies all the data objects allocated by an application and uses instrumentation modules to monitor relevant events (e.g., cache misses). Finally, ADAMANT produces a per-object performance profile.

We demonstrate the use of ADAMANT in analyzing three applications, BT, BFS, and Velvet, and evaluate the impact of different memory technologies. With the information produced by ADAMANT we were able to model and compare different memory configurations and object placement solutions. In BFS we devised a placement that outperforms caching, while in the other two cases we were able to point out which data objects may be problematic for the configurations explored and would require refactoring to improve performance.
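The per-object profile described in the abstract can be illustrated with a small sketch. This is not ADAMANT's implementation or interface: the `ObjectTable` class, the `per_object_profile` function, the event names, and the addresses are all hypothetical, chosen only to show the general idea of attributing monitored events (such as cache misses) to the allocated data object whose address range contains the accessed address.

```python
from bisect import bisect_right
from collections import Counter


class ObjectTable:
    """Hypothetical table mapping allocation address ranges to object names."""

    def __init__(self):
        self._starts = []   # sorted start addresses
        self._objects = []  # (start, size, name), kept parallel to _starts

    def register(self, start, size, name):
        """Record an allocation covering [start, start + size)."""
        i = bisect_right(self._starts, start)
        self._starts.insert(i, start)
        self._objects.insert(i, (start, size, name))

    def lookup(self, addr):
        """Return the name of the object containing addr, or None."""
        i = bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, size, name = self._objects[i]
            if addr < start + size:
                return name
        return None


def per_object_profile(table, events):
    """Count events per object; events is an iterable of (kind, address)."""
    profile = {}
    for kind, addr in events:
        name = table.lookup(addr) or "<unknown>"
        profile.setdefault(name, Counter())[kind] += 1
    return profile


# Illustrative data only: two made-up objects and four made-up events.
table = ObjectTable()
table.register(0x1000, 0x100, "grid")
table.register(0x2000, 0x400, "queue")

events = [
    ("l1_miss", 0x1010),
    ("l1_miss", 0x2004),
    ("llc_miss", 0x2008),
    ("l1_miss", 0x3000),  # falls outside every registered object
]
profile = per_object_profile(table, events)
for name, counts in sorted(profile.items()):
    print(name, dict(counts))
```

A sorted list with binary search keeps each lookup logarithmic in the number of live objects. A real tool would also have to handle deallocation, address reuse, and static or stack data, which this sketch ignores.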

Published In

Procedia Computer Science Volume 80, Issue C

June 2016

2452 pages

ISSN:1877-0509

EISSN:1877-0509

Copyright © The Authors.

Publisher

Elsevier Science Publishers B. V.

Netherlands
