
DOI: 10.1145/3597503.3639156
Research article · Open access

Tensor-Aware Energy Accounting

Published: 12 April 2024

Abstract

With the rapid growth of Artificial Intelligence (AI) applications supported by deep learning (DL), the energy efficiency of these applications has an increasingly large impact on sustainability. We introduce Smaragdine, a new energy accounting system for tensor-based DL programs implemented with TensorFlow. At the heart of Smaragdine is a novel white-box methodology of energy accounting: Smaragdine is aware of the internal structure of the DL program, which we call tensor-aware energy accounting. With Smaragdine, the energy consumption of a DL program can be broken down into units aligned with its logical hierarchical decomposition structure. We apply Smaragdine to understand the energy behavior of BERT, one of the most widely used language models. Layer by layer and tensor by tensor, Smaragdine identifies the highest energy- and power-consuming components of BERT. Furthermore, we conduct two case studies on how Smaragdine supports downstream toolchain building: one on the comparative energy impact of hyperparameter tuning in BERT, the other on how energy behavior changes when BERT evolves into its next generation, ALBERT.
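To make the core idea concrete, below is a minimal, hypothetical sketch of tensor-aware accounting, not Smaragdine's actual implementation: each tensor operation's energy is measured over its execution interval and charged to every level of the op's name-scope hierarchy, so totals can be read off per layer, per module, or per individual tensor op. The `read_energy_uj` reader and the RAPL sysfs path are illustrative assumptions; any cumulative energy counter would serve.

```python
# Sketch: attribute per-op energy to the op's name-scope hierarchy
# and aggregate bottom-up. Illustrative only, not Smaragdine itself.
from collections import defaultdict

def read_energy_uj():
    """Read a cumulative energy counter in microjoules.

    Assumption: on Linux with Intel RAPL this sysfs file exposes
    package energy; substitute whatever meter your platform provides.
    """
    with open("/sys/class/powercap/intel-rapl:0/energy_uj") as f:
        return int(f.read())

def account(op_names, run_op):
    """Run each tensor op and charge its energy delta to every
    scope prefix on its path, e.g. 'bert/encoder/layer_0/MatMul'
    is charged to 'bert', 'bert/encoder', 'bert/encoder/layer_0',
    and the op itself."""
    energy_by_scope = defaultdict(float)
    for name in op_names:
        before = read_energy_uj()
        run_op(name)                       # execute one tensor operation
        delta = read_energy_uj() - before  # energy consumed meanwhile
        parts = name.split("/")
        for i in range(1, len(parts) + 1):
            energy_by_scope["/".join(parts[:i])] += delta
    return energy_by_scope
```

In a real tool, per-op execution intervals would come from a framework-level trace, and energy measured while ops run concurrently would need to be apportioned among them; the hierarchical roll-up above is what makes the accounting tensor-aware.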

Cited By

  • (2024) VESTA: Power Modeling with Language Runtime Events. Proceedings of the ACM on Programming Languages 8, PLDI, 621–646. DOI: 10.1145/3656402. Online publication date: 20 June 2024.

Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024, 2942 pages
ISBN: 9798400702174
DOI: 10.1145/3597503

This work is licensed under a Creative Commons Attribution 4.0 International License.

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 276 of 1,856 submissions, 15%
