
OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks

  • Conference paper
  • First Online:
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Binary Neural Networks (BNNs) have proven highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization error, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but are also universally present in the vicinity of zero. We refer to these weights as “silent weights”, which slow down convergence and lead to significant accuracy degradation. Theoretically, we show that this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcoming Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify “silent weights” by tracking the weight flipping state, and apply an additional penalty to “silent weights” to facilitate their flipping. By updating weight signs efficiently, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures, respectively. Code is available at https://github.com/JingyangXiang/OvSW.
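The abstract only summarizes the two mechanisms, so the sketch below (PyTorch) illustrates the general idea rather than the paper's exact method: gradients are rescaled by a statistic of the latent weights (a stand-in for AGS), and an exponential moving average of sign flips is used to apply an extra decay to long-silent weights (a stand-in for SAD). The norm-ratio scaling rule, the EMA flip tracking, and the constants `tau`, `lam`, and `momentum` are illustrative assumptions; the authors' actual implementation is at https://github.com/JingyangXiang/OvSW.

```python
# Minimal illustrative sketch (not the authors' implementation) of the ideas
# described in the abstract. All rules and constants here are assumptions.
import torch


def adaptive_gradient_scaling(latent_w: torch.Tensor,
                              grad: torch.Tensor,
                              eps: float = 1e-8) -> torch.Tensor:
    """Couple the gradient magnitude to the latent-weight distribution.

    Hypothetical choice: rescale the gradient by the ratio of the weight norm
    to the gradient norm, so updates stay large enough to flip weight signs.
    """
    return grad * (latent_w.norm() / (grad.norm() + eps))


class SilenceAwarenessDecay:
    """Track sign flips and push long-'silent' weights toward zero.

    A weight whose flip EMA falls below `tau` receives an extra decay term
    that shrinks its latent value, making a future sign flip easier.
    """

    def __init__(self, shape, momentum: float = 0.99,
                 tau: float = 1e-3, lam: float = 1e-4):
        self.prev_sign = torch.zeros(shape)
        self.flip_ema = torch.zeros(shape)
        self.momentum, self.tau, self.lam = momentum, tau, lam

    def penalty_grad(self, latent_w: torch.Tensor) -> torch.Tensor:
        sign = torch.sign(latent_w)
        flipped = (sign != self.prev_sign).float()
        self.prev_sign = sign
        self.flip_ema = self.momentum * self.flip_ema + (1 - self.momentum) * flipped
        silent = (self.flip_ema < self.tau).float()
        # Gradient of an extra (lam/2) * w^2 penalty, applied only to silent weights.
        return self.lam * silent * latent_w


# Simplified training step for one binarized layer (dummy loss for illustration).
w = torch.randn(256, 128, requires_grad=True)   # latent full-precision weights
sad = SilenceAwarenessDecay(w.shape)
opt = torch.optim.SGD([w], lr=0.1)

for _ in range(3):
    loss = (w ** 2).mean()                      # stand-in for the task loss
    opt.zero_grad()
    loss.backward()
    with torch.no_grad():
        w.grad = adaptive_gradient_scaling(w, w.grad) + sad.penalty_grad(w)
    opt.step()
```

The sketch only conveys the control flow: which weight statistic to scale the gradient by, and how to threshold the flip state, are precisely what AGS and SAD define in the paper.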



Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yong Liu.

Editor information

Editors and Affiliations

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3760 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Xiang, J., Chen, Z., Li, S., Wu, Q., Liu, Y. (2025). OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Springer, Cham. https://doi.org/10.1007/978-3-031-73414-4_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-73414-4_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73413-7

  • Online ISBN: 978-3-031-73414-4

  • eBook Packages: Computer Science, Computer Science (R0)
