DOI: 10.1145/2464996.2465016

Holistic run-time parallelism management for time and energy efficiency

Published: 10 June 2013

Abstract

The ubiquity of parallel machines necessitates time- and energy-efficient parallel execution of programs across a wide range of hardware and software environments. Prevalent parallel execution models often fall short of this goal: unable to account for dynamic changes in operating conditions, they may create a non-optimal degree of parallelism, leading to underutilization of resources or contention for them. We propose ParallelismDial (PD), a model that dynamically, continuously, and judiciously adapts a program's degree of parallelism to its current operating environment. PD measures system efficiency with a holistic metric and uses this metric to systematically optimize the program's execution.
We apply PD to two diverse parallel programming models: Intel TBB, an industry standard, and Prometheus, a recent research effort. We implemented two prototypes of PD and evaluated them on two stock multicore workstations, in both dedicated and multiprogrammed environments. Experimental results show that the prototypes outperform state-of-the-art approaches by, on average, 15% in time and 31% in energy efficiency in the dedicated environment, and by 19% and 21% in time and energy, respectively, in the multiprogrammed environment.
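The abstract frames PD as a feedback-driven search over a program's degree of parallelism, guided by a measured efficiency metric. The sketch below is only an illustration of that general idea, not the paper's algorithm: it uses a hypothetical spin-loop workload and plain throughput (tasks completed per second) as the stand-in metric, and greedily widens the thread count while the measurement keeps improving.

    // Illustrative sketch only (not PD itself): adapt the degree of
    // parallelism (worker-thread count) to a measured efficiency score.
    // The workload and the throughput metric are placeholders; PD's
    // holistic time/energy metric and search strategy are not reproduced.
    #include <algorithm>
    #include <atomic>
    #include <chrono>
    #include <cmath>
    #include <functional>
    #include <iostream>
    #include <thread>
    #include <vector>

    // Placeholder work: each worker repeatedly completes small compute tasks.
    static void run_tasks(std::atomic<long>& completed, std::atomic<bool>& stop) {
        while (!stop.load(std::memory_order_relaxed)) {
            volatile double x = 0.0;
            for (int i = 0; i < 100000; ++i) x += std::sqrt(static_cast<double>(i));
            completed.fetch_add(1, std::memory_order_relaxed);
        }
    }

    // Run the placeholder workload with `dop` threads for a short window and
    // report tasks completed per second (the stand-in efficiency metric).
    static double measure_efficiency(int dop, std::chrono::milliseconds window) {
        std::atomic<long> completed{0};
        std::atomic<bool> stop{false};
        std::vector<std::thread> workers;
        for (int i = 0; i < dop; ++i)
            workers.emplace_back(run_tasks, std::ref(completed), std::ref(stop));
        std::this_thread::sleep_for(window);
        stop.store(true);
        for (auto& t : workers) t.join();
        return completed.load() / std::chrono::duration<double>(window).count();
    }

    int main() {
        const int max_dop =
            static_cast<int>(std::max(1u, std::thread::hardware_concurrency()));
        int best_dop = 1;
        double best_score = 0.0;
        // Greedy upward search: widen the degree of parallelism while the
        // measured efficiency keeps improving, then settle on the best point.
        for (int dop = 1; dop <= max_dop; ++dop) {
            double score = measure_efficiency(dop, std::chrono::milliseconds(200));
            std::cout << "dop=" << dop << "  score=" << score << "\n";
            if (score > best_score) { best_score = score; best_dop = dop; }
            else break;  // efficiency stopped improving; stop widening
        }
        std::cout << "selected degree of parallelism: " << best_dop << "\n";
        return 0;
    }

Because the abstract says PD adapts continuously to dynamic conditions and weighs energy as well as time, a controller in its spirit would repeat such a search during execution and fold an energy measure into the score rather than rely on throughput alone.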

Published In

ICS '13: Proceedings of the 27th international ACM conference on International conference on supercomputing
June 2013
512 pages
ISBN: 9781450321303
DOI: 10.1145/2464996

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. autotuning
  2. parallel programming
  3. performance portability
  4. performance tuning
  5. run-time optimization

Conference

ICS '13: International Conference on Supercomputing
June 10-14, 2013
Eugene, Oregon, USA

Acceptance Rates

ICS '13 Paper Acceptance Rate: 43 of 202 submissions, 21%
Overall Acceptance Rate: 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (Last 12 months): 14
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 21 Nov 2024

Cited By

  • (2022) Amphis: Managing Reconfigurable Processor Architectures With Generative Adversarial Learning. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 41(11):3993-4003. DOI: 10.1109/TCAD.2022.3197980. Online publication date: Nov 2022.
  • (2022) Thermal-Aware Thread and Turbo Frequency Throttling Optimization for Parallel Applications. 2022 35th SBC/SBMicro/IEEE/ACM Symposium on Integrated Circuits and Systems Design (SBCCI), pages 1-6. DOI: 10.1109/SBCCI55532.2022.9893245. Online publication date: 22 Aug 2022.
  • (2021) Teaching High Productivity and High Performance in an Introductory Parallel Programming Course. 2021 IEEE 28th International Conference on High Performance Computing, Data and Analytics Workshop (HiPCW), pages 21-28. DOI: 10.1109/HiPCW54834.2021.00010. Online publication date: Dec 2021.
  • (2021) Providing high-level self-adaptive abstractions for stream parallelism on multicores. Software: Practice and Experience, 51(6):1194-1217. DOI: 10.1002/spe.2948. Online publication date: 10 Jan 2021.
  • (2020) ALERT. Proceedings of the 2020 USENIX Conference on Usenix Annual Technical Conference, pages 353-369. DOI: 10.5555/3489146.3489170. Online publication date: 15 Jul 2020.
  • (2020) Enhancing Resource Management Through Prediction-Based Policies. Euro-Par 2020: Parallel Processing, pages 493-509. DOI: 10.1007/978-3-030-57675-2_31. Online publication date: 18 Aug 2020.
  • (2019) Generative and multi-phase learning for computer systems optimization. Proceedings of the 46th International Symposium on Computer Architecture, pages 39-52. DOI: 10.1145/3307650.3326633. Online publication date: 22 Jun 2019.
  • (2019) Aurora: Seamless Optimization of OpenMP Applications. IEEE Transactions on Parallel and Distributed Systems, 30(5):1007-1021. DOI: 10.1109/TPDS.2018.2872992. Online publication date: 1 May 2019.
  • (2019) Simplifying and implementing service level objectives for stream parallelism. The Journal of Supercomputing. DOI: 10.1007/s11227-019-02914-6. Online publication date: 5 Jun 2019.
  • (2019) Tuning Parallel Applications. Parallel Computing Hits the Power Wall, pages 41-54. DOI: 10.1007/978-3-030-28719-1_4. Online publication date: 6 Nov 2019.
