DOI: 10.1145/3370748.3406553
research-article
Open access

SAOU: safe adaptive overclocking and undervolting for energy-efficient GPU computing

Published: 10 August 2020

Abstract

The trend of ever-increasing performance in scientific applications comes with tremendous growth in energy consumption. In this paper, we present Safe Adaptive Overclocking and Undervolting (SAOU), a framework that reduces the energy consumption of GPU applications without sacrificing performance. The idea is to raise the clock frequency beyond the safe limit f_safeMax and lower the supply voltage below the safe limit V_safeMin to maximize energy savings. Because overclocking and undervolting past these limits can cause faults, we employ an enhanced checkpoint-recovery technique to recover from the resulting errors. We empirically characterize the errors that occur and derive a fault model that sets the undervolting and overclocking levels for maximum energy savings. As a representative scientific kernel, we target cuBLAS matrix multiplication (cuBLAS-MM) and correct its errors with the checkpoint and recovery (CR) technique. For cuBLAS-MM, SAOU achieves up to 22% energy reduction through undervolting and overclocking without sacrificing performance.
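
To make the checkpoint-recovery idea concrete, the following is a minimal CPU-side sketch in Python, not the SAOU implementation. It assumes a block-wise matrix multiplication in which each block of result rows is verified before it is committed to a checkpoint. The fault injector (`faulty_matmul_block`), the column-checksum detector (`checksum_ok`), and all parameter names are hypothetical and stand in for faults that an overclocked, undervolted GPU might produce; the abstract does not specify how SAOU detects errors.

```python
import random


def matmul_block(a_rows, b):
    """Multiply a block of rows of A by B using plain Python lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a_rows]


def checksum_ok(a_rows, b, c_rows):
    """Assumed detector: (column sums of the A block) x B must equal
    the column sums of the computed C block. Integer inputs keep the
    comparison exact."""
    a_sum = [sum(col) for col in zip(*a_rows)]
    expected = matmul_block([a_sum], b)[0]
    actual = [sum(col) for col in zip(*c_rows)]
    return expected == actual


def faulty_matmul_block(a_rows, b, fault_rate, rng):
    """Hypothetical stand-in for an unreliable kernel: with probability
    fault_rate, silently corrupt one element of the result."""
    c = matmul_block(a_rows, b)
    if rng.random() < fault_rate:
        i = rng.randrange(len(c))
        j = rng.randrange(len(c[0]))
        c[i][j] += 1
    return c


def multiply_with_checkpoints(a, b, block_rows=2, fault_rate=0.3,
                              seed=0, max_retries=100):
    """Compute A x B block by block. A verified block is committed to
    the checkpoint; a block that fails verification is discarded and
    recomputed from the last checkpoint."""
    rng = random.Random(seed)
    checkpoint = []   # rows of C that have passed verification
    recoveries = 0    # number of discarded (faulty) block results
    for start in range(0, len(a), block_rows):
        a_rows = a[start:start + block_rows]
        for _ in range(max_retries):
            c_rows = faulty_matmul_block(a_rows, b, fault_rate, rng)
            if checksum_ok(a_rows, b, c_rows):
                checkpoint.extend(c_rows)  # commit the verified block
                break
            recoveries += 1  # roll back: drop the block and retry
        else:
            raise RuntimeError("block failed repeatedly; fault rate too high")
    return checkpoint, recoveries


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6], [7, 8]]
    b = [[1, 0, 2], [0, 1, 3]]
    c, recoveries = multiply_with_checkpoints(a, b, fault_rate=0.5, seed=1)
    print(c, recoveries)
```

In this sketch a higher `fault_rate` means more recomputation, which illustrates the trade-off a fault model has to balance: more aggressive overclocking and undervolting saves energy per operation but raises the number of recoveries.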

Supplementary Material

MP4 File (3370748.3406553.mp4)
Presentation video

Published In

ISLPED '20: Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design
August 2020
263 pages
ISBN:9781450370530
DOI:10.1145/3370748
This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

In-Cooperation

  • IEEE CAS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. checkpoint-recovery
  2. energy efficiency
  3. overclocking
  4. undervolting

Qualifiers

  • Research-article

Funding Sources

  • NSF

Conference

ISLPED '20
Acceptance Rates

Overall Acceptance Rate 398 of 1,159 submissions, 34%
