
Power Awareness in Low Precision Neural Networks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 Workshops (ECCV 2022)

Abstract

Existing approaches for reducing DNN power consumption rely on quite general principles, such as avoiding multiplication operations and aggressively quantizing weights and activations. However, these methods do not account for the precise power consumed by each module in the network and are therefore not optimal. In this paper we develop accurate power-consumption models for all arithmetic operations in the DNN, under various working conditions. We reveal several important factors that have been overlooked to date. Based on our analysis, we present PANN (power-aware neural network), a simple approach for approximating any full-precision network by a low-power fixed-precision variant. Our method can be applied to a pre-trained network and can also be used during training to achieve improved performance. Unlike previous methods, PANN incurs only a minor degradation in accuracy w.r.t. the full-precision version of the network and enables seamless traversal of the power-accuracy trade-off at deployment time.


Notes

  1. https://www.graphcore.ai/.

  2. The power consumed by a single bit flip may vary across platforms (e.g., between a 5 nm and a 45 nm fabrication process), but the number of bit flips per MAC does not change. We therefore report power in units of bit flips, which allows comparison across implementations while ignoring the platform.

  3. Batch-norm layers should first be absorbed into the weights and biases.

  4. In quantized models, MAC operations are always performed on integers and rescaling is applied at the end.
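As an illustration of the implementation notes above, the sketch below folds a batch-norm layer into the preceding layer's parameters, performs an integer-only MAC with a single rescale at the end, and counts bit flips as the Hamming distance between two values, in the spirit of the bit-flip power unit. This is a minimal sketch, not the paper's implementation: all function names are hypothetical, weights are assumed to have shape (out, in), and batch-norm statistics are assumed to be per output channel.

```python
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """Absorb a batch-norm layer into the preceding layer's weights and bias.

    BN(wx + b) = gamma * (wx + b - mean) / sqrt(var + eps) + beta
    is rewritten as w'x + b' with the per-channel scale baked into w.
    """
    scale = gamma / np.sqrt(var + eps)   # per-output-channel scale
    w_folded = w * scale[:, None]        # w has shape (out, in)
    b_folded = beta + (b - mean) * scale
    return w_folded, b_folded

def int_mac_with_rescale(q_w, q_x, s_w, s_x, s_y):
    """Integer-only MACs, with a single rescale applied at the end.

    q_w, q_x are integer codes; s_w, s_x, s_y are the (float) quantization
    scales of the weights, inputs, and output.
    """
    acc = np.sum(q_w.astype(np.int32) * q_x.astype(np.int32))   # pure integer MACs
    return int(np.round(acc * (s_w * s_x / s_y)))               # rescale once

def bit_flips(a, b):
    """Hamming distance between two unsigned integers: the number of bits
    that toggle when a register holding `a` is overwritten with `b`."""
    return bin(a ^ b).count("1")
```

For example, folding gamma = 2, mean = 1, var = 1 (eps = 0) into a layer with weights [[1, 1]] and zero bias yields weights [[2, 2]] and bias -2, which reproduces the batch-normalized output exactly on any input.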


Acknowledgements

This research was partially supported by the Ollendorff Minerva Center at the Viterbi Faculty of Electrical and Computer Engineering, Technion.

Author information

Corresponding author

Correspondence to Nurit Spingarn Eliezer.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1160 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Eliezer, N.S., Banner, R., Ben-Yaakov, H., Hoffer, E., Michaeli, T. (2023). Power Awareness in Low Precision Neural Networks. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13807. Springer, Cham. https://doi.org/10.1007/978-3-031-25082-8_5


  • DOI: https://doi.org/10.1007/978-3-031-25082-8_5


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-25081-1

  • Online ISBN: 978-3-031-25082-8

  • eBook Packages: Computer Science, Computer Science (R0)
