Harmony: collection and analysis of parallel block vectors

Authors:

Melanie Kambadur,

Martha A. Kim

ISCA '12: Proceedings of the 39th Annual International Symposium on Computer Architecture

Pages 452 - 463

Published: 09 June 2012

Abstract

Efficient execution of well-parallelized applications is central to performance in the multicore era. Program analysis tools support the hardware and software sides of this effort by exposing relevant features of multithreaded applications. This paper describes parallel block vectors, which uncover previously unseen characteristics of parallel programs. Parallel block vectors provide block execution profiles per concurrency phase (e.g., the block execution profile of all serial regions of a program). This information provides a direct and fine-grained mapping between an application's runtime parallel phases and the static code that makes up those phases. This paper also demonstrates how to collect parallel block vectors with minimal application perturbation using Harmony. Harmony is an instrumentation pass for the LLVM compiler that introduces just 16-21% overhead on average across eight Parsec benchmarks.
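
To make the idea of a parallel block vector concrete, here is a minimal sketch that counts, for each basic block, how many times it ran at each concurrency level. It is not Harmony's implementation (Harmony is an LLVM instrumentation pass). The event format, the function name `collect_parallel_block_vectors`, and the assumption of a single, already-ordered trace are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def collect_parallel_block_vectors(events):
    """Count block executions per concurrency level.

    `events` is a hypothetical, already-ordered trace of tuples:
      ("start", thread_id)        a thread becomes active
      ("stop", thread_id)         a thread blocks or exits
      ("block", thread_id, name)  a thread enters basic block `name`
    Returns {block_name: {active_thread_count: executions}}.
    """
    active = set()  # threads currently running
    vectors = defaultdict(lambda: defaultdict(int))
    for event in events:
        kind, thread_id = event[0], event[1]
        if kind == "start":
            active.add(thread_id)
        elif kind == "stop":
            active.discard(thread_id)
        elif kind == "block":
            # Attribute this execution to the current degree of parallelism.
            vectors[event[2]][len(active)] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return {block: dict(counts) for block, counts in vectors.items()}


# Toy trace: one serial phase, one two-thread phase, then serial again.
trace = [
    ("start", 0),
    ("block", 0, "init"),
    ("start", 1),
    ("block", 0, "work"),
    ("block", 1, "work"),
    ("stop", 1),
    ("block", 0, "work"),
    ("block", 0, "finish"),
]
vectors = collect_parallel_block_vectors(trace)
```

In this toy trace the block `work` appears in both the serial phase (one active thread) and the parallel phase (two active threads), so its vector has an entry for each level.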

We apply parallel block vectors to uncover several novel insights about parallel applications with direct consequences for architectural design. First, the serial and parallel phases of execution used in Amdahl's Law are often composed of many of the same basic blocks. Second, program features such as instruction mix vary with the degree of parallelism, and serial phases in particular display different instruction mixes from the program as a whole. Third, a block's dynamic execution frequency does not necessarily correlate with its parallelism.
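
One way to picture the instruction-mix analysis is to weight each block's static instruction counts by how often the block ran at each degree of parallelism. The sketch below does that under stated assumptions: the input shapes, the opcode class names, and the function `instruction_mix_by_degree` are hypothetical, and the paper's actual analysis may differ.

```python
from collections import Counter


def instruction_mix_by_degree(vectors, block_instructions):
    """Estimate the instruction mix at each degree of parallelism.

    vectors:            {block: {degree: executions}} (a parallel block vector)
    block_instructions: {block: {opcode_class: static_count}} (hypothetical
                        per-block static instruction counts)
    Returns {degree: {opcode_class: fraction_of_dynamic_instructions}}.
    """
    totals = {}
    for block, counts in vectors.items():
        static_mix = block_instructions[block]
        for degree, executions in counts.items():
            mix = totals.setdefault(degree, Counter())
            for opcode_class, count in static_mix.items():
                # Dynamic count = static count x executions at this degree.
                mix[opcode_class] += count * executions

    result = {}
    for degree, mix in totals.items():
        total = sum(mix.values())
        if total == 0:
            continue  # no instructions recorded at this degree
        result[degree] = {name: count / total for name, count in mix.items()}
    return result


# Toy inputs: "work" runs once serially and twice with two active threads.
vectors = {"init": {1: 1}, "work": {1: 1, 2: 2}}
block_instructions = {
    "init": {"load": 1, "store": 3},
    "work": {"load": 2, "alu": 2},
}
mix = instruction_mix_by_degree(vectors, block_instructions)
```

With these toy inputs the serial phase (degree 1) includes stores from `init`, while the two-thread phase contains only loads and ALU operations, which is the kind of per-phase difference the abstract describes.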

Harmony: collection and analysis of parallel block vectors
1. General and reference
  1. Cross-computing tools and techniques
2. Software and its engineering
  1. Software notations and tools

Published In

ISCA '12: Proceedings of the 39th Annual International Symposium on Computer Architecture

June 2012

584 pages

ISBN:9781450316422

General Chair: Shih-Lien Lu (Intel)
Program Chair: Josep Torrellas (University of Illinois)

ACM SIGARCH Computer Architecture News Volume 40, Issue 3
ISCA '12
June 2012
559 pages
ISSN:0163-5964
DOI:10.1145/2366231

Publisher

IEEE Computer Society

United States

Qualifiers

Research-article

Conference

ISCA '12

Sponsor:

SIGARCH

ISCA '12: The 39th Annual International Symposium on Computer Architecture

June 9 - 13, 2012

Portland, Oregon

Acceptance Rates

ISCA '12 Paper Acceptance Rate: 47 of 262 submissions, 18%

Overall Acceptance Rate: 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 19
Total Downloads: 371
Downloads (Last 12 months): 9
Downloads (Last 6 weeks): 0

Reflects downloads up to 07 Mar 2025

Cited By

Zhou F, Gan Y, Ma S, Wang Y, Arpaci-Dusseau A, Voelker G (2018). wPerf. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pages 527-543. DOI: 10.5555/3291168.3291207. Online publication date: 8-Oct-2018.
https://dl.acm.org/doi/10.5555/3291168.3291207
Ham T, Aragón J, Martonosi M (2017). Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization 14:2, pages 1-27. DOI: 10.1145/3075620. Online publication date: 28-Jun-2017.
https://dl.acm.org/doi/10.1145/3075620
Zhang W, Ji X, Song B, Yu S, Chen H, Li T, Yew P, Zhao W (2017). VarCatcher. IEEE Transactions on Parallel and Distributed Systems 28:4, pages 1215-1228. DOI: 10.1109/TPDS.2016.2613524. Online publication date: 1-Apr-2017.
https://dl.acm.org/doi/10.1109/TPDS.2016.2613524
Rosà A, Chen L, Binder W (2016). Actor profiling in virtual execution environments. ACM SIGPLAN Notices 52:3, pages 36-46. DOI: 10.1145/3093335.2993241. Online publication date: 20-Oct-2016.
https://dl.acm.org/doi/10.1145/3093335.2993241
Rosà A, Chen L, Binder W (2016). Efficient profiling of actor-based applications in parallel and distributed systems. Proceedings of the 11th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, pages 1-3. DOI: 10.1145/3012408.3012418. Online publication date: 17-Jul-2016.
https://dl.acm.org/doi/10.1145/3012408.3012418
Rosà A, Chen L, Binder W, Fischer B, Schaefer I (2016). Actor profiling in virtual execution environments. Proceedings of the 2016 ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences, pages 36-46. DOI: 10.1145/2993236.2993241. Online publication date: 20-Oct-2016.
https://dl.acm.org/doi/10.1145/2993236.2993241
Rosà A, Chen L, Binder W, Toth M, Fritchie S (2016). Profiling actor utilization and communication in Akka. Proceedings of the 15th International Workshop on Erlang, pages 24-32. DOI: 10.1145/2975969.2975972. Online publication date: 23-Sep-2016.
https://dl.acm.org/doi/10.1145/2975969.2975972
Ham T, Aragón J, Martonosi M, Prvulovic M (2015). DeSC. Proceedings of the 48th International Symposium on Microarchitecture, pages 191-203. DOI: 10.1145/2830772.2830800. Online publication date: 5-Dec-2015.
https://dl.acm.org/doi/10.1145/2830772.2830800
Curtsinger C, Berger E, Miller E, Hand S (2015). Coz. Proceedings of the 25th Symposium on Operating Systems Principles, pages 184-197. DOI: 10.1145/2815400.2815409. Online publication date: 4-Oct-2015.
https://dl.acm.org/doi/10.1145/2815400.2815409
Railing B, Hein E, Conte T (2015). Contech. ACM Transactions on Architecture and Code Optimization 12:2, pages 1-24. DOI: 10.1145/2776893. Online publication date: 8-Jul-2015.
https://dl.acm.org/doi/10.1145/2776893
