
DOI: 10.1145/3597503.3639156
Research article · Open access

Tensor-Aware Energy Accounting

Published: 12 April 2024

Abstract

With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine to understand the energy behavior of BERT, one of the most widely used language models. Layer by layer and tensor by tensor, Smaragdine identifies the highest energy- and power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building: one on the comparative energy impact of hyperparameter tuning in BERT, the other on how energy behavior changes when BERT evolves into its next generation, ALBERT.
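To make the core idea concrete, below is a minimal, hypothetical sketch of tensor-aware accounting, not Smaragdine's actual implementation: each tensor operation's energy is measured over its execution interval and charged to every level of the op's name-scope hierarchy, so totals can be read off per layer, per module, or per individual tensor op. The `read_energy_uj` reader and the RAPL sysfs path are illustrative assumptions; any cumulative energy counter would serve.

```python
# Sketch: attribute per-op energy to the op's name-scope hierarchy
# and aggregate bottom-up. Illustrative only, not Smaragdine itself.
from collections import defaultdict

def read_energy_uj():
    """Read a cumulative energy counter in microjoules.

    Assumption: on Linux with Intel RAPL this sysfs file exposes
    package energy; substitute whatever meter your platform provides.
    """
    with open("/sys/class/powercap/intel-rapl:0/energy_uj") as f:
        return int(f.read())

def account(op_names, run_op):
    """Run each tensor op and charge its energy delta to every
    scope prefix on its path, e.g. 'bert/encoder/layer_0/MatMul'
    is charged to 'bert', 'bert/encoder', 'bert/encoder/layer_0',
    and the op itself."""
    energy_by_scope = defaultdict(float)
    for name in op_names:
        before = read_energy_uj()
        run_op(name)                       # execute one tensor operation
        delta = read_energy_uj() - before  # energy consumed meanwhile
        parts = name.split("/")
        for i in range(1, len(parts) + 1):
            energy_by_scope["/".join(parts[:i])] += delta
    return energy_by_scope
```

In a real tool, per-op execution intervals would come from a framework-level trace, and energy measured while ops run concurrently would need to be apportioned among them; the hierarchical roll-up above is what makes the accounting tensor-aware.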

Cited By

  • (2024) VESTA: Power Modeling with Language Runtime Events. Proceedings of the ACM on Programming Languages 8, PLDI, 621–646. DOI: 10.1145/3656402. Online publication date: 20 June 2024.

Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024, 2942 pages
ISBN: 9798400702174
DOI: 10.1145/3597503

This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
