Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article
Open access

PARSECSs: Evaluating the Impact of Task Parallelism in the PARSEC Benchmark Suite

Published: 22 December 2015 Publication History

Abstract

In this work, we show how parallel applications can be implemented efficiently using task parallelism. We also evaluate the benefits of such parallel paradigm with respect to other approaches. We use the PARSEC benchmark suite as our test bed, which includes applications representative of a wide range of domains from HPC to desktop and server applications. We adopt different parallelization techniques, tailored to the needs of each application, to fully exploit the task-based model. Our evaluation shows that task parallelism achieves better performance than thread-based parallelization models, such as Pthreads. Our experimental results show that we can obtain scalability improvements up to 42% on a 16-core system and code size reductions up to 81%. Such reductions are achieved by removing from the source code application specific schedulers or thread pooling systems and transferring these responsibilities to the runtime system software.

Supplementary Material

TACO1204-41 (taco1204-41.pdf)
Slide deck associated with this paper

References

[1]
Aaron B. Adcock, Blair D. Sullivan, Oscar R. Hernandez, and Michael W. Mahoney. 2013. Evaluating OpenMP tasking at scale for the computation of graph hyperbolicity. In OpenMP in the Era of Low Power Devices and Accelerators. Lecture Notes in Computer Science, Vol. 8122. Springer, 71--83.
[2]
Malte Appeltauer, Robert Hirschfeld, Michael Haupt, Jens Lincke, and Michael Perscheid. 2009. A comparison of context-oriented programming languages. In Proceedings of the International Workshop on Context-Oriented Programming (COP’09). ACM, New York, NY, Article No. 6.
[3]
Eduard Ayguadé, Nawal Copty, Alejandro Duran, Jay Hoeflinger, Yuan Lin, Federico Massaioli, Xavier Teruel, Priya Unnikrishnan, and Guansong Zhang. 2009. The design of OpenMP tasks. IEEE Transactions on Parallel and Distributed Systems 20, 3, 404--418.
[4]
Eduard Ayguadé, Alejandro Duran, Jay Hoeflinger, Federico Massaioli, and Xavier Teruel. 2008. An experimental evaluation of the new OpenMP tasking model. In Languages and Compilers for Parallel Computing. Springer-Verlag, 63--77.
[5]
Prithviraj Banerjee. 1994. Parallel Algorithms for VLSI Computer-Aided Design. Prentice Hall.
[6]
Pieter Bellens, Josep M. Perez, Rosa M. Badia, and Jesus Labarta. 2006. CellSs: A programming model for the cell BE architecture. In Proceedings of the ACM/IEEE Conference on Supercomputing (SC’06). Article No. 86. http://doi.acm.org/10.1145/1188455.1188546
[7]
Christian Bienia. 2011. Benchmarking Modern Multiprocessors. Ph.D. Dissertation. Princeton University, Princeton, NJ.
[8]
Christian Bienia, Sanjeev Kumar, Jaswinder Pal Singh, and Kai Li. 2008. The PARSEC benchmark suite: Characterization and architectural implications. In Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques (PACT’08). ACM, New York, NY 72--81.
[9]
Fischer Black and Myron S. Scholes. 1973. The pricing of options and corporate liabilities. Journal of Political Economy 81, 3, 637--654. http://ideas.repec.org/a/ucp/jpolec/v81y1973i3p637-54.html.
[10]
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An efficient multithreaded runtime system. ACM SIGPLAN Notices 30, 8, 207--216.
[11]
David R. Butenhof. 1997. Programming with POSIX Threads. Addison Wesley Longman.
[12]
Barbara Chapman. 2007. The multicore programming challenge. In Advanced Parallel Processing Technologies. Springer.
[13]
Barbara Chapman, Gabriele Jost, and Ruud van der Pas. 2007. Using OpenMP: Portable Shared Memory Parallel Programming (Scientific and Engineering Computation). MIT Press, Cambridge, MA.
[14]
Cristian Coarfa, Yuri Dotsenko, John Mellor-Crummey, François Cantonnet, Tarek El-Ghazawi, Ashrujit Mohanti, Yiyi Yao, and Daniel Chavarría-Miranda. 2005. An evaluation of global address space languages: Co-array fortran and unified parallel C. In Proceedings of the 10th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP’05). ACM, New York, NY, 36--47.
[15]
Henry Cook, Miquel Moreto, Sarah Bird, Khanh Dao, David A. Patterson, and Krste Asanovic. 2013. A hardware evaluation of cache partitioning to improve utilization and energy-efficiency while preserving responsiveness. ACM SIGARCH Compututer Architecture News 41, 3, 308--319.
[16]
Jack Dongarra, Robert Graybill, William Harrod, Robert F. Lucas, Ewing L. Lusk, Piotr Luszczek, Janice McMahon, Allan Snavely, Jeffrey S. Vetter, Katherine A. Yelick, Sadaf R. Alam, Roy L. Campbell, Laura Carrington, Tzu-Yi Chen, Omid Khalili, Jeremy S. Meredith, and Mustafa M. Tikir. 2008. DARPA’s HPCS program—history, models, tools, languages. Advances in Computers 72, 1--100.
[17]
Alejandro Duran, Eduard Ayguadé, Rosa M. Badia, Jesús Labarta, Luis Martinell, Xavier Martorell, and Judit Planas. 2011. OmpSs: A proposal for programming heterogeneous multi-core architectures. Parallel Processing Letters 21, 2, 173--193.
[18]
Alejandro Duran, Roger Ferrer, Eduard Ayguadé, Rosa M. Badia, and Jesus Labarta. 2009. A proposal to extend the OpenMP tasking model with dependent tasks. International Journal of Parallel Programming 37, 3, 292--305.
[19]
Kayvon Fatahalian, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken, William J. Dally, and Pat Hanrahan. 2006. Sequoia: Programming the memory hierarchy. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC’06). ACM, New York, NY, Article 83.
[20]
Gosta Grahne and Jianfei Zhu. 2003. Efficiently using prefix-trees in mining frequent itemsets. In Proceedings of the Workshop on Frequent Itemset Mining Implementations (FIMI’03). http://dblp.uni-trier.de/db/conf/fimi/fimi2003.html#GrahneZ03.
[21]
Jiawei Han, Jian Pei, and Yiwen Yin. 2000. Mining frequent patterns without candidate generation. SIGMOD Record 29, 2, 1--12.
[22]
David Heath, Robert Jarrow, and Andrew Morton. 1992. Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica 60, 1, 77--105. http://ideas.repec.org/a/ecm/emetrp/v60y1992i1p77-105.html.
[23]
James Christopher Jenista, Yong Hun Eom, and Brian Charles Demsky. 2011. OoOJava: Software out-of-order execution. ACM SIGPLAN Notices 46, 8, 57--68.
[24]
Ian Karlin, Abhinav Bhatele, Jeff Keasler, Bradford L. Chamberlain, Jonathan Cohen, Zachary Devito, Riyaz Haque, Dan Laney, Edward Luke, Felix Wang, David Richards, Martin Schulz, and Charles H. Still. 2013. Exploring traditional and emerging parallel programming models using a proxy application. In Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing (IPDPS’13). IEEE, Los Alamitos, CA, 919--932.
[25]
I-Ting Angelina Lee, Charles E. Leiserson, Tao B. Schardl, Jim Sukha, and Zhunping Zhang. 2013. On-the-fly pipeline parallelism. In Proceedings of the 25th Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA’13). ACM, New York, NY, 140--151. http://doi.acm.org/10.1145/2486159.2486174
[26]
Qin Lv, William Josephson, Zhe Wang, Moses Charikar, and Kai Li. 2006. Ferret: A toolkit for content-based similarity search of feature-rich data. SIGOPS Operating Systems Review 40, 4, 317--330.
[27]
Matthias Müller, David Charypar, and Markus Gross. 2003. Particle-based fluid simulation for interactive applications. In Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation (SCA’03). 154--159. http://dl.acm.org/citation.cfm?id=846276.846298
[28]
Dan Nagle. 2005. MPI—The Complete Reference, vol. 1, The MPI Core, 2nd ed., Scientific and Engineering Computation Series, by Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker and Jack Dongarra. Scientific Programming 13, 1, 57--63.
[29]
OpenMP Architecture Review Board. 2013. OpenMP Application Program Interface Version 3.0. http://www.openmp.org/mp-documents/openmp-4.5.pdf.
[30]
Judit Planas, Rosa M. Badia, Eduard Ayguadé, and Jesus Labarta. 2009. Hierarchical task-based programming with StarSs. International Journal of High Performance Computing Applications 23, 3, 284--299.
[31]
Artur Podobas and Mats Brorsson. 2010. A comparison of some recent task-based parallel programming models. In Proceedings of the 3rd Workshop on Programmability Issues for Multi-Core Computers (MULTIPROG’10).
[32]
Sean Quinlan and Sean Dorward. 2002. Awarded best paper! Venti: A new approach to archival data storage. In Proceedings of the 1st USENIX Conference on File and Storage Technologies (FAST’02). Article No. 7. http://dl.acm.org/citation.cfm?id=1083323.1083333.
[33]
Eftychios Sifakis, Igor Neverov, and Ronald Fedkiw. 2005. Automatic determination of facial muscle activations from sparse motion capture marker data. ACM Transactions on Graphics 24, 3, 417--425.
[34]
Christi Symeonidou, Polyvios Pratikakis, Angelos Bilas, and Dimitrios S. Nikolopoulos. 2013. DRASync: Distributed region-based memory allocation and synchronization. In Proceedings of the 20th European MPI Users’ Group Meeting (EuroMPI’13). ACM, New York, NY, 49--54.
[35]
George Tzenakis, Angelos Papatriantafyllou, John Kesapides, Polyvios Pratikakis, Hans Vandierendonck, and Dimitrios S. Nikolopoulos. 2012. BDDT: Block-level dynamic dependence analysis for deterministic task-based parallelism. ACM SIGPLAN Notices 47, 8, 301--302.
[36]
Hans Vandierendonck, Kallia Chronaki, and Dimitrios S. Nikolopoulos. 2013. Deterministic scale-free pipeline parallelism with hyperqueues. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC’13). ACM, New York, NY, Article No. 32. http://doi.acm.org/10.1145/2503210.2503233
[37]
Hans Vandierendonck, Polyvios Pratikakis, and Dimitrios S. Nikolopoulos. 2011. Parallel programming of general-purpose programs using task-based programming models. In Proceedings of the 3rd USENIX Conference on Hot Topic in Parallelism (HotPar’11). 13. http://dl.acm.org/citation.cfm?id=2001252.2001265.
[38]
Stéphane Zuckerman, Joshua Suetterlein, Rob Knauerhase, and Guang R. Gao. 2011. Using a “Codelet” program execution model for exascale machines: Position paper. In Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era (EXADAPT’11). ACM, New York, NY, 64--69. http://doi.acm.org/10.1145/2000417.2000424.

Cited By

View all
  • (2023)Studying the expressiveness and performance of parallelization abstractions for linear pipelinesProceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3582514.3582522(29-38)Online publication date: 25-Feb-2023
  • (2023)An Interplay of Energy and Temperature Minimization Techniques for Heterogeneous Multiprocessor SystemsTENCON 2023 - 2023 IEEE Region 10 Conference (TENCON)10.1109/TENCON58879.2023.10322452(629-634)Online publication date: 31-Oct-2023
  • (2023)Dynamic power budget redistribution under a power cap on multi-application environmentsSustainable Computing: Informatics and Systems10.1016/j.suscom.2023.10086538(100865)Online publication date: Apr-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Transactions on Architecture and Code Optimization
ACM Transactions on Architecture and Code Optimization  Volume 12, Issue 4
January 2016
848 pages
ISSN:1544-3566
EISSN:1544-3973
DOI:10.1145/2836331
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 December 2015
Accepted: 01 September 2015
Revised: 01 September 2015
Received: 01 March 2015
Published in TACO Volume 12, Issue 4

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Parallel Applications
  2. concurrency
  3. parallel architectures
  4. parallel benchmarks
  5. parallel runtime systems
  6. scalable applications
  7. synchronization
  8. task-based programming models

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Spanish Miinistry of Science and Innovation
  • Eropean Unions 7th FP, ERC
  • Spanish Goverment Severo Ochoa
  • Secretary for Universities and Research of the Ministry of Economy and Knowledge of the Goverment of Catalonia
  • Juan de la Cierva post-doctoral fellowship
  • Marie Curie Actions of the 7th R&D Framework Programme of teh European Union

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)166
  • Downloads (Last 6 weeks)42
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)Studying the expressiveness and performance of parallelization abstractions for linear pipelinesProceedings of the 14th International Workshop on Programming Models and Applications for Multicores and Manycores10.1145/3582514.3582522(29-38)Online publication date: 25-Feb-2023
  • (2023)An Interplay of Energy and Temperature Minimization Techniques for Heterogeneous Multiprocessor SystemsTENCON 2023 - 2023 IEEE Region 10 Conference (TENCON)10.1109/TENCON58879.2023.10322452(629-634)Online publication date: 31-Oct-2023
  • (2023)Dynamic power budget redistribution under a power cap on multi-application environmentsSustainable Computing: Informatics and Systems10.1016/j.suscom.2023.10086538(100865)Online publication date: Apr-2023
  • (2022)Execution Time Estimation of Multithreaded Programs With Critical SectionsIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2022.314345533:10(2470-2481)Online publication date: 1-Oct-2022
  • (2022)A Survey of Performance Tuning Techniques and Tools for Parallel ApplicationsIEEE Access10.1109/ACCESS.2022.314784610(15036-15055)Online publication date: 2022
  • (2022)A Task-Based Approach to Parallel Restricted Hartree–Fock CalculationsJournal of Chemical Theory and Computation10.1021/acs.jctc.1c0082018:4(2144-2161)Online publication date: 4-Apr-2022
  • (2022)A Quantitative Analysis of OpenMP Task Runtime SystemsBenchmarking, Measuring, and Optimizing10.1007/978-3-031-31180-2_1(3-18)Online publication date: 7-Nov-2022
  • (2021)Particle-In-Cell Simulation Using Asynchronous TaskingEuro-Par 2021: Parallel Processing10.1007/978-3-030-85665-6_30(482-498)Online publication date: 25-Aug-2021
  • (2020)AutoParBenchProceedings of the 34th ACM International Conference on Supercomputing10.1145/3392717.3392744(1-10)Online publication date: 29-Jun-2020
  • (2020)No barrier in the roadProceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming10.1145/3332466.3374535(348-361)Online publication date: 19-Feb-2020
  • Show More Cited By

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media