Abstract
Binary Neural Networks (BNNs) have proven highly effective for deploying deep neural networks on mobile and embedded platforms. Most existing works focus on minimizing quantization errors, improving representation ability, or designing gradient approximations to alleviate gradient mismatch in BNNs, while leaving weight sign flipping, a critical factor for achieving powerful BNNs, untouched. In this paper, we investigate the efficiency of weight sign updates in BNNs. We observe that, for vanilla BNNs, over 50% of the weights keep their signs unchanged during training, and these weights are not only distributed at the tails of the weight distribution but also universally present in the vicinity of zero. We refer to these weights as “silent weights”, which slow down convergence and lead to significant accuracy degradation. Theoretically, we show that this is due to the independence of the BNN gradient from the latent weight distribution. To address the issue, we propose Overcome Silent Weights (OvSW). OvSW first employs Adaptive Gradient Scaling (AGS) to establish a relationship between the gradient and the latent weight distribution, thereby improving the overall efficiency of weight sign updates. Additionally, we design Silence Awareness Decaying (SAD) to automatically identify “silent weights” by tracking the weight flipping state, and to apply an additional penalty to “silent weights” to facilitate their flipping. By efficiently updating weight signs, our method achieves faster convergence and state-of-the-art performance on the CIFAR10 and ImageNet1K datasets with various architectures. For example, OvSW obtains 61.6% and 65.5% top-1 accuracy on ImageNet1K using binarized ResNet18 and ResNet34 architectures, respectively. Code is available at https://github.com/JingyangXiang/OvSW.
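As a rough illustration of the two mechanisms described above, the PyTorch-style sketch below shows one way gradient scaling tied to the latent weight distribution (AGS) and a flip-tracking penalty on long-silent weights (SAD) could be wired up. It is not the authors' released implementation (see the linked repository); names such as `ags_scale`, `FlipTracker`, `lam`, and `patience` are assumptions made for this example.

```python
# Minimal sketch, assuming a standard BNN training loop with latent
# (real-valued) weights w whose signs are used at inference time.
import torch


def ags_scale(grad, latent_w, eps=1e-8):
    """Adaptive Gradient Scaling (sketch): couple the gradient magnitude to the
    latent weight distribution so that sign updates depend on it."""
    return grad * (latent_w.abs().mean() / (grad.abs().mean() + eps))


class FlipTracker:
    """Silence Awareness Decaying (sketch): track how long each weight has kept
    its sign and add an extra decay term to weights that have not flipped."""

    def __init__(self, w, lam=1e-4, patience=100):
        self.prev_sign = torch.sign(w.detach())
        self.silent_steps = torch.zeros_like(w)
        self.lam, self.patience = lam, patience

    def penalize(self, w):
        sign = torch.sign(w.detach())
        flipped = sign != self.prev_sign
        # Reset the counter for weights that just flipped, increment the rest.
        self.silent_steps = torch.where(flipped,
                                        torch.zeros_like(self.silent_steps),
                                        self.silent_steps + 1)
        self.prev_sign = sign
        # Apply an additional penalty only to long-silent weights.
        silent = (self.silent_steps > self.patience).float()
        if w.grad is not None:
            w.grad = w.grad + self.lam * w.detach() * silent
```

In use, one would call `ags_scale` on each latent weight gradient and `tracker.penalize(w)` once per step before the optimizer update; the actual scaling rule and penalty schedule in the paper may differ.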
Rights and permissions
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xiang, J., Chen, Z., Li, S., Wu, Q., Liu, Y. (2025). OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15091. Springer, Cham. https://doi.org/10.1007/978-3-031-73414-4_1
DOI: https://doi.org/10.1007/978-3-031-73414-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73413-7
Online ISBN: 978-3-031-73414-4
eBook Packages: Computer Science, Computer Science (R0)