research-article

Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs

Published: 21 January 2015

Abstract

Massive amounts of legacy sequential code need to be parallelized to make better use of modern multiprocessor architectures. Nevertheless, writing parallel programs is still a difficult task. Automated parallelization methods can be effective at the statement and loop levels and, more recently, at the task level, but they remain restricted to specific source code constructs or application domains. In this article, we present an innovative toolset that supports developers in performing manual code analysis and making parallelization decisions. It automatically collects the program profile and data dependencies and represents them in an interactive graphical format that facilitates the analysis and discovery of manual parallelization opportunities. The toolset can be used with arbitrary sequential C programs and parallelization patterns. Moreover, its program-scope data dependency tracing at runtime can complement tools based on static code analysis while also benefiting from them. We also evaluated the effectiveness of the toolset in terms of the time needed to reach parallelization decisions and the quality of those decisions, and measured a significant improvement for several representative real-world applications.
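The abstract's central mechanism, recording data dependencies at run time across the whole program instead of inferring them statically, can be illustrated with a small shadow-memory tracer. The C sketch below is a hypothetical illustration, not the toolset's implementation: the names `trace_store`, `trace_load`, `dep_edges`, `dump_dot`, `run_example`, the site numbering, and the fixed table sizes are all invented for this example. Each instrumented store records which code site last wrote an address; each instrumented load turns that record into a read-after-write (RAW) edge, and the edge counts can be emitted as a Graphviz DOT graph, standing in for the interactive graph the abstract describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names and sizes for illustration only; not the toolset's API. */
#define MAX_SITES 8        /* instrumented code sites (e.g. functions or loops) */
#define SHADOW_SLOTS 1024  /* shadow-memory hash table size, a power of two */

/* Shadow entry: the site that last wrote the word at `addr` (addr 0 = empty). */
typedef struct {
    uintptr_t addr;
    int last_writer;
} ShadowEntry;

static ShadowEntry shadow[SHADOW_SLOTS];
/* dep_count[w][r] counts loads at site r of values stored by site w (RAW). */
static unsigned long dep_count[MAX_SITES][MAX_SITES];

/* Linear-probing lookup; creates an empty entry when `insert` is nonzero.
   Returns NULL if addr is absent (lookup) or the table is full (insert). */
static ShadowEntry *shadow_lookup(uintptr_t addr, int insert) {
    size_t i = (size_t)(addr >> 2) & (SHADOW_SLOTS - 1);
    for (size_t n = 0; n < SHADOW_SLOTS; n++) {
        if (shadow[i].addr == addr)
            return &shadow[i];
        if (shadow[i].addr == 0) {
            if (!insert)
                return NULL;
            shadow[i].addr = addr;
            shadow[i].last_writer = -1;
            return &shadow[i];
        }
        i = (i + 1) & (SHADOW_SLOTS - 1);
    }
    return NULL;
}

/* Hook placed after a store to `p` executed at `site`. */
void trace_store(const void *p, int site) {
    assert(site >= 0 && site < MAX_SITES);
    ShadowEntry *e = shadow_lookup((uintptr_t)p, 1);
    if (e == NULL)
        return; /* table full: a real tracer would grow or evict entries */
    e->last_writer = site;
}

/* Hook placed before a load from `p` at `site`: adds a RAW edge writer -> site. */
void trace_load(const void *p, int site) {
    assert(site >= 0 && site < MAX_SITES);
    ShadowEntry *e = shadow_lookup((uintptr_t)p, 0);
    if (e != NULL && e->last_writer >= 0)
        dep_count[e->last_writer][site]++;
}

unsigned long dep_edges(int writer, int reader) {
    return dep_count[writer][reader];
}

/* Emit the observed dependencies as a Graphviz DOT graph. */
void dump_dot(FILE *out) {
    fprintf(out, "digraph deps {\n");
    for (int w = 0; w < MAX_SITES; w++)
        for (int r = 0; r < MAX_SITES; r++)
            if (dep_count[w][r] > 0)
                fprintf(out, "  s%d -> s%d [label=\"%lu\"];\n", w, r, dep_count[w][r]);
    fprintf(out, "}\n");
}

/* Toy program instrumented by hand: produce() is site 0, consume() is site 1. */
enum { SITE_PRODUCE = 0, SITE_CONSUME = 1 };
#define BUF_LEN 16
static int buf[BUF_LEN];

static void produce(void) {
    for (int i = 0; i < BUF_LEN; i++) {
        buf[i] = i * i;
        trace_store(&buf[i], SITE_PRODUCE);
    }
}

static long consume(void) {
    long sum = 0;
    for (int i = 0; i < BUF_LEN; i++) {
        trace_load(&buf[i], SITE_CONSUME);
        sum += buf[i];
    }
    return sum;
}

/* Runs the toy program once under tracing and returns its result. */
long run_example(void) {
    produce();
    return consume();
}
```

On a fresh run, calling `run_example()` and then `dump_dot(stdout)` prints a single edge `s0 -> s1` labelled 16: every value consumed at site 1 was produced at site 0, the kind of producer/consumer relation a developer would inspect before deciding, for example, to split the two loops into pipeline stages. In this sketch the hooks are inserted by hand, whereas a source-to-source instrumentation pass would normally add them; it also tracks only RAW dependencies and ignores write-after-read and write-after-write ones, which a complete tracer would need as well.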

Cited By

  • (2019) Design and Performance Analysis of Real-Time Dynamic Streaming Applications. Languages and Compilers for Parallel Computing, 21--36. DOI: 10.1007/978-3-030-34627-0_2. Online publication date: 13 November 2019.
  • (2018) Making Break-ups Less Painful. Proceedings of the 2018 Workshop on Forming an Ecosystem Around Software Transformation, 14--19. DOI: 10.1145/3273045.3273046. Online publication date: 15 October 2018.
  • (2017) A Region-Based Approach to Pipeline Parallelism in Java Programs on Multicores. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), 124--131. DOI: 10.1109/PDP.2017.69. Online publication date: 2017.

Information

Published In

ACM Transactions on Embedded Computing Systems, Volume 14, Issue 1
January 2015
443 pages
ISSN:1539-9087
EISSN:1558-3465
DOI:10.1145/2724585
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 January 2015
Accepted: 01 June 2014
Revised: 01 December 2013
Received: 01 December 2012
Published in TECS Volume 14, Issue 1

Author Tags

  1. Legacy C program parallelization
  2. data dependency analysis
  3. execution profiling
  4. graph abstraction
  5. graph analysis
  6. source annotation

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • European Commission in the context of the FP7 HEAP and PHARAON projects

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1
Reflects downloads up to 30 Nov 2024