Abstract
Existing approaches for reducing DNN power consumption rely on general principles, including avoidance of multiplication operations and aggressive quantization of weights and activations. However, these methods do not consider the precise power consumed by each module in the network and are therefore not optimal. In this paper we develop accurate power consumption models for all arithmetic operations in the DNN, under various working conditions. We reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network and can also be used during training to achieve improved performance. Unlike previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network and enables seamless traversal of the power-accuracy trade-off at deployment time.
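To make the setting concrete, the sketch below shows a generic reduction of a single pre-trained layer to fixed-precision integer arithmetic: integer MACs followed by a single rescaling (cf. note 4 below). This is not the PANN algorithm itself; the symmetric max-abs scaling, the chosen bit-widths, and all function names are illustrative assumptions.

import numpy as np

def quantize_symmetric(x, num_bits):
    """Map a float tensor to signed integers of the given bit-width,
    using a simple symmetric (max-abs) scale. Real post-training
    schemes choose the scale and rounding more carefully."""
    qmax = 2 ** (num_bits - 1) - 1
    m = np.abs(x).max()
    scale = m / qmax if m > 0 else 1.0
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def quantized_linear(x_fp, w_fp, b_fp, w_bits=4, a_bits=8):
    """Approximate y = x @ W.T + b with integer weights and activations."""
    qw, sw = quantize_symmetric(w_fp, w_bits)
    qx, sx = quantize_symmetric(x_fp, a_bits)
    acc = qx.astype(np.int64) @ qw.T.astype(np.int64)  # integer MACs only
    return acc * (sw * sx) + b_fp                      # one rescaling at the end

# Compare against the full-precision layer
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 64)).astype(np.float32)
W = rng.standard_normal((32, 64)).astype(np.float32)
b = np.zeros(32, dtype=np.float32)
err = np.abs(quantized_linear(x, W, b) - (x @ W.T + b)).mean()
print(f"mean abs error with 4-bit weights, 8-bit activations: {err:.4f}")

Varying w_bits and a_bits in such a scheme is what allows traversing a power-accuracy trade-off: fewer bits mean fewer bit flips per MAC, at the cost of a larger approximation error.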
Notes
- 1.
- 2. The power consumed by a single bit flip may vary across platforms (e.g., between a 5 nm and a 45 nm fabrication process), but the number of bit flips per MAC does not change. We therefore report power in units of bit flips, which allows comparing implementations while ignoring the platform (a toggle-count sketch is given after these notes).
- 3. Batch-norm layers should first be absorbed into the weights and biases (a folding sketch is given after these notes).
- 4. In quantized models, MAC operations are always performed on integers and rescaling is applied at the end, as in the sketch following the abstract.
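For note 2, the snippet below illustrates what "power in units of bit flips" can mean in practice: counting the toggles (Hamming distance between successive values) on the operand buses feeding a multiplier. The paper's power model is more detailed and accounts for activity inside the arithmetic units; this is only a crude proxy under our own naming and assumptions.

import numpy as np

def toggle_count(prev_word, next_word):
    """Number of bit positions that flip when a bus value changes
    from prev_word to next_word (Hamming distance)."""
    return bin(int(prev_word) ^ int(next_word)).count("1")

def mac_stream_bitflips(weights, activations, num_bits=8):
    """Toggle-count proxy for the dynamic power of a stream of MACs
    executed on a single multiplier: flips on the two operand buses
    as successive (weight, activation) pairs are loaded."""
    mask = (1 << num_bits) - 1
    flips, prev_w, prev_a = 0, 0, 0
    for w, a in zip(weights, activations):
        w, a = int(w) & mask, int(a) & mask
        flips += toggle_count(prev_w, w) + toggle_count(prev_a, a)
        prev_w, prev_a = w, a
    return flips

rng = np.random.default_rng(0)
w = rng.integers(0, 256, size=1000)
a = rng.integers(0, 256, size=1000)
print("operand-bus bit flips for 1000 8-bit MACs:", mac_stream_bitflips(w, a))

Because such a count depends only on the operand values and bit-widths, it is independent of the fabrication process, which is exactly why the note reports power in these units.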
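Note 3 refers to the standard folding of an inference-time batch-norm layer into the preceding linear or convolutional layer. A minimal sketch for a fully-connected layer follows (the function name and argument layout are our assumptions; for convolutions the same per-output-channel scaling applies):

import numpy as np

def fold_batchnorm(W, b, gamma, beta, running_mean, running_var, eps=1e-5):
    """Absorb a frozen BatchNorm into the preceding layer so that
    BN(x @ W.T + b) == x @ W_folded.T + b_folded."""
    scale = gamma / np.sqrt(running_var + eps)   # one factor per output channel
    W_folded = W * scale[:, None]                # scale each output row of W
    b_folded = (b - running_mean) * scale + beta
    return W_folded, b_folded

# Quick check against applying BN explicitly
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))
W = rng.standard_normal((8, 16)); b = rng.standard_normal(8)
gamma = rng.standard_normal(8);   beta = rng.standard_normal(8)
mean = rng.standard_normal(8);    var = rng.random(8) + 0.1
eps = 1e-5
y_bn = gamma * ((x @ W.T + b) - mean) / np.sqrt(var + eps) + beta
Wf, bf = fold_batchnorm(W, b, gamma, beta, mean, var, eps)
print(np.allclose(y_bn, x @ Wf.T + bf))   # True

After folding, quantization is applied to W_folded and b_folded rather than to the original parameters.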
Acknowledgements
This research was partially supported by the Ollendorff Minerva Center at the Viterbi Faculty of Electrical and Computer Engineering, Technion.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Eliezer, N.S., Banner, R., Ben-Yaakov, H., Hoffer, E., Michaeli, T. (2023). Power Awareness in Low Precision Neural Networks. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_5
DOI: https://doi.org/10.1007/978-3-031-25082-8_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25081-1
Online ISBN: 978-3-031-25082-8
eBook Packages: Computer Science, Computer Science (R0)