
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Binary Neural Networks (BNNs) have proven highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization error, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but are also universally present in the vicinity of zero. We refer to these weights as “silent weights”, which slow down convergence and lead to significant accuracy degradation. Theoretically, we show that this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcoming Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify “silent weights” by tracking the weight flipping state, and apply an additional penalty to “silent weights” to facilitate their flipping. By updating weight signs efficiently, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures, respectively. Code is available at https://github.com/JingyangXiang/OvSW.
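The abstract only summarizes the two mechanisms, so the sketch below (PyTorch) illustrates the general idea rather than the paper's exact method: gradients are rescaled by a statistic of the latent weights (a stand-in for AGS), and an exponential moving average of sign flips is used to apply an extra decay to long-silent weights (a stand-in for SAD). The norm-ratio scaling rule, the EMA flip tracking, and the constants `tau`, `lam`, and `momentum` are illustrative assumptions; the authors' actual implementation is at https://github.com/JingyangXiang/OvSW.

```python
# Minimal illustrative sketch (not the authors' implementation) of the ideas
# described in the abstract. All rules and constants here are assumptions.
import torch


def adaptive_gradient_scaling(latent_w: torch.Tensor,
                              grad: torch.Tensor,
                              eps: float = 1e-8) -> torch.Tensor:
    """Couple the gradient magnitude to the latent-weight distribution.

    Hypothetical choice: rescale the gradient by the ratio of the weight norm
    to the gradient norm, so updates stay large enough to flip weight signs.
    """
    return grad * (latent_w.norm() / (grad.norm() + eps))


class SilenceAwarenessDecay:
    """Track sign flips and push long-'silent' weights toward zero.

    A weight whose flip EMA falls below `tau` receives an extra decay term
    that shrinks its latent value, making a future sign flip easier.
    """

    def __init__(self, shape, momentum: float = 0.99,
                 tau: float = 1e-3, lam: float = 1e-4):
        self.prev_sign = torch.zeros(shape)
        self.flip_ema = torch.zeros(shape)
        self.momentum, self.tau, self.lam = momentum, tau, lam

    def penalty_grad(self, latent_w: torch.Tensor) -> torch.Tensor:
        sign = torch.sign(latent_w)
        flipped = (sign != self.prev_sign).float()
        self.prev_sign = sign
        self.flip_ema = self.momentum * self.flip_ema + (1 - self.momentum) * flipped
        silent = (self.flip_ema < self.tau).float()
        # Gradient of an extra (lam/2) * w^2 penalty, applied only to silent weights.
        return self.lam * silent * latent_w


# Simplified training step for one binarized layer (dummy loss for illustration).
w = torch.randn(256, 128, requires_grad=True)   # latent full-precision weights
sad = SilenceAwarenessDecay(w.shape)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(3):
    loss = (w ** 2).mean()                      # stand-in for the task loss
    opt.zero_grad()
    loss.backward()
    with torch.no_grad():
        w.grad = adaptive_gradient_scaling(w, w.grad) + sad.penalty_grad(w)
    opt.step()
```

The sketch only conveys the control flow: which weight statistic to scale the gradient by, and how to threshold the flip state, are precisely what AGS and SAD define in the paper.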



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yong Liu.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3760 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xiang, J., Chen, Z., Li, S., Wu, Q., Liu, Y. (2025). OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Springer, Cham. https://doi.org/10.1007/978-3-031-73414-4_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73414-4_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73413-7

  • Online ISBN: 978-3-031-73414-4

  • eBook Packages: Computer Science, Computer Science (R0)
