DOI: 10.1145/2749469.2750404
research-article

Harmonia: balancing compute and memory power in high-performance GPUs

Published: 13 June 2015

Abstract

In this paper, we address the problem of efficiently managing the relative power demands of a high-performance GPU and its memory subsystem. We develop a management approach that dynamically tunes the hardware operating configurations to maintain a balance between the power dissipated in compute and in memory access across GPGPU application phases. Our goal is to reduce power with minimal performance degradation.
Accordingly, we construct predictors that assess the online sensitivity of applications to three hardware tunables: compute frequency, number of active compute units, and memory bandwidth. Using these sensitivity predictors, we propose a two-level coordinated power management scheme, Harmonia, which coordinates the hardware power states of the GPU and the memory system. Through hardware measurements on a commodity GPU, we evaluate Harmonia against a state-of-the-practice commodity GPU power management scheme, as well as an oracle scheme. Results show that Harmonia improves measured energy-delay-squared (ED²) by up to 36% (12% on average) with negligible performance loss across representative GPGPU workloads, and is on average within 3% of the oracle scheme.
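
The energy-delay-squared metric cited above is conventionally the product of the energy consumed and the square of the execution time, so lower values are better. The following is a minimal sketch of how such a figure and a relative improvement could be computed; the function names and the sample measurements are hypothetical illustrations and are not taken from the paper.

```python
def ed2(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def ed2_improvement(baseline: float, candidate: float) -> float:
    """Fractional reduction in ED^2 of a candidate relative to a baseline."""
    return (baseline - candidate) / baseline


# Hypothetical measurements (not from the paper): the candidate scheme uses
# less energy while finishing in the same time as the baseline.
base = ed2(energy_joules=100.0, delay_seconds=2.0)
cand = ed2(energy_joules=80.0, delay_seconds=2.0)
improvement = ed2_improvement(base, cand)

print(f"baseline ED^2 = {base}, candidate ED^2 = {cand}")
print(f"improvement = {improvement:.1%}")
```

Because delay is squared, the metric penalizes slowdowns more heavily than a plain energy-delay product would, which matches the stated goal of reducing power with minimal performance degradation.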


Published In

ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture
June 2015
768 pages
ISBN:9781450334020
DOI:10.1145/2749469
Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Cited By

  • (2024) Improving GPU Energy Efficiency through an Application-transparent Frequency Scaling Policy with Performance Assurance. Proceedings of the Nineteenth European Conference on Computer Systems, pp. 769-785. DOI: 10.1145/3627703.3629584. Online publication date: 22-Apr-2024
  • (2024) DRLCAP: Runtime GPU Frequency Capping With Deep Reinforcement Learning. IEEE Transactions on Sustainable Computing 9:5, pp. 712-726. DOI: 10.1109/TSUSC.2024.3362697. Online publication date: Sep-2024
  • (2024) Analysis of Energy-Efficient LCRM Optimization Algorithm in Computer Vision-based CNNs. 2024 IEEE 8th Energy Conference (ENERGYCON), pp. 1-6. DOI: 10.1109/ENERGYCON58629.2024.10488814. Online publication date: 4-Mar-2024
  • (2023) High-Performance and Power-Saving Mechanism for Page Activations Based on Full Independent DRAM Sub-Arrays in Multi-Core Systems. IEEE Access 11, pp. 79801-79822. DOI: 10.1109/ACCESS.2023.3299848. Online publication date: 2023
  • (2023) LCRM: Layer-Wise Complexity Reduction Method for CNN Model Optimization on End Devices. IEEE Access 11, pp. 66838-66857. DOI: 10.1109/ACCESS.2023.3290620. Online publication date: 2023
  • (2023) Variation aware power management for GPU memories. Microprocessors & Microsystems 96:C. DOI: 10.1016/j.micpro.2022.104711. Online publication date: 1-Feb-2023
  • (2022) A Review on Statistical Power Modelling for a Graphics Processing Unit (GPU). 2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), pp. 327-330. DOI: 10.1109/I-SMAC55078.2022.9987403. Online publication date: 10-Nov-2022
  • (2022) A Modern Primer on Processing in Memory. Emerging Computing: From Devices to Systems, pp. 171-243. DOI: 10.1007/978-981-16-7487-7_7. Online publication date: 9-Jul-2022
  • (2020) HCAPP: Scalable Power Control for Heterogeneous 2.5D Integrated Systems. Proceedings of the 49th International Conference on Parallel Processing, pp. 1-11. DOI: 10.1145/3404397.3404448. Online publication date: 17-Aug-2020
  • (2020) Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID), pp. 744-751. DOI: 10.1109/CCGrid49817.2020.00-15. Online publication date: May-2020
