Abstract
Data augmentation is a crucial strategy for addressing inadequate model robustness and large generalization gaps. It has been shown to combat overfitting, improve deep neural network performance, and enhance generalization, particularly when training data are limited. In recent years, mixed sample data augmentation (MSDA), including variants such as Mixup and CutMix, has attracted significant attention. However, these methods can present the network with misleading signals, limiting their effectiveness. In this context, we propose LocMix, an MSDA method that generates new training samples by prioritizing local saliency information and mixing images according to local statistics. We achieve this by concealing salient regions with random masks and combining images efficiently through transport-based optimization of local saliency information. By prioritizing the local features within an image, LocMix captures image details more accurately and comprehensively, enhancing the model's capacity to understand the target image. We validate this approach extensively on several challenging datasets. When applied to the training of the PreAct-ResNet18 model, our method yields notable accuracy gains. On CIFAR-10, we observe a 1.71% accuracy improvement. On CIFAR-100, Tiny-ImageNet, ImageNet, and SVHN, we attain accuracies of 80.12%, 64.60%, 77.62%, and 97.12%, corresponding to improvements of 4.88%, 8.75%, 1.93%, and 0.57%, respectively. These experimental results clearly demonstrate the effectiveness of the proposed method.
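To make the mixing step concrete, the sketch below illustrates one way a saliency-guided MSDA step of this kind can be implemented. The gradient-magnitude saliency proxy, the quantile-based mask rule, and the area-proportional label weighting are simplifying assumptions chosen for exposition; they are not the exact LocMix procedure, which additionally optimizes the mix via transport-based methods.

```python
# A minimal, illustrative sketch of saliency-guided mixed sample data
# augmentation in the spirit of LocMix. The saliency proxy, mask rule,
# and label weighting below are assumptions, not the authors' exact
# procedure (which further optimizes the mix with transport methods).
import numpy as np

def saliency_map(img: np.ndarray) -> np.ndarray:
    """Proxy saliency: gradient magnitude of the grayscale image (assumption)."""
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    sal = np.hypot(gx, gy)
    return sal / (sal.sum() + 1e-8)

def locmix_style_mix(img_a, label_a, img_b, label_b, mask_frac=0.3):
    """Paste the most salient pixels of img_b over img_a, then mix the
    one-hot labels in proportion to the area each source contributes."""
    sal_b = saliency_map(img_b)
    # Binary mask covering roughly `mask_frac` of the image, biased
    # toward the locally salient regions of img_b.
    thresh = np.quantile(sal_b, 1.0 - mask_frac)
    mask = (sal_b >= thresh).astype(img_a.dtype)[..., None]
    mixed = (1.0 - mask) * img_a + mask * img_b
    lam = 1.0 - mask.mean()  # pixel fraction kept from img_a
    return mixed, lam * label_a + (1.0 - lam) * label_b

# Usage on dummy CIFAR-sized inputs:
rng = np.random.default_rng(0)
a, b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
ya, yb = np.eye(10)[3], np.eye(10)[7]  # one-hot labels
x_mix, y_mix = locmix_style_mix(a, ya, b, yb)
print(x_mix.shape, y_mix)  # (32, 32, 3) and a soft label vector
```

As in Mixup and CutMix, the soft label keeps the training loss consistent with the proportion of each source image that survives the mix.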
Data Availability
The datasets used in this study are sourced from publicly available repositories and can be downloaded from their respective official websites.
References
Chandio, A., et al.: Precise single-stage detector. arXiv preprint arXiv:2210.04252 (2022)
Khan, W., et al.: Introducing urdu digits dataset with demonstration of an efficient and robust noisy decoder-based pseudo example generator. Symmetry 14(10), 1976 (2022). https://doi.org/10.3390/sym14101976
Roy, A.M., et al.: Wildect-yolo: an efficient and robust computer vision-based accurate object localization model for automated endangered wildlife detection. Eco. Inform. 75, 101919 (2023). https://doi.org/10.1016/j.ecoinf.2022.101919
He, K., et al.: Mask R-CNN. In Proceedings of the IEEE international conference on computer vision (ICCV), 2961–2969 (2017). https://doi.org/10.48550/arXiv.1703.06870
Liu, X., Deng, Z., Yang, Y.: Recent progress in semantic image segmentation. Artif. Intell. Rev. 52, 1089–1106 (2019). https://doi.org/10.1007/s10462-018-9641-3
Baseri Saadi, S., et al.: Investigation of effectiveness of shuffled frog-leaping optimizer in training a convolution neural network. J. Healthc. Eng. (2022). https://doi.org/10.1155/2022/4703682
Ranjbarzadeh, R., et al.: Me-ccnn: multi-encoded images and a cascade convolutional neural network for breast tumor segmentation and recognition. Artif. Intell. Rev. (2023). https://doi.org/10.1007/s10462-023-10426-2
Ranjbarzadeh, R., et al.: Mrfe-cnn: multi-route feature extraction model for breast tumor segmentation in mammograms using a convolutional neural network. Ann. Oper. Res. (2022). https://doi.org/10.1007/s10479-022-04755-8
Russakovsky, O., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vision 115, 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
Shorten, C., Khoshgoftaar, T.M.: A survey on image data augmentation for deep learning. J. Big Data 6(1), 1–48 (2019). https://doi.org/10.1186/s40537-019-0197-0
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Singh, A., et al.: Understanding eeg signals for subject-wise definition of armoni activities. arXiv preprint arXiv:2301.00948 (2023)
Bayer, M., Kaufhold, M.A., Reuter, C.: A survey on data augmentation for text classification. ACM Comput. Surv. 55(7), 1–39 (2022). https://doi.org/10.1145/3544558
Harris, E., et al.: Fmix: Enhancing mixed sample data augmentation. arXiv preprint arXiv:2002.12047 (2020)
Zhang, H., et al.: Mixup: beyond empirical risk minimization. arXiv preprint arXiv:1710.09412 (2017)
Yun, S., et al.: Cutmix: regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF international conference on computer vision (ICCV), 6023–6032 (2019). arXiv:1905.04899
Uddin, A.F.M.S., et al.: Saliencymix: a saliency guided data augmentation strategy for better regularization. arXiv preprint arXiv:2006.01791 (2020)
Kim, J.H., Choo, W., Song, H.O.: Puzzle mix: exploiting saliency and local statistics for optimal mixup. In International conference on machine learning, 5275–5285. PMLR (2020). arXiv:2009.06962
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034 (2013)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
Zhong, Z., et al.: Random erasing data augmentation. In Proceedings of the AAAI conference on artificial intelligence 34, 13001–13008 (2020). https://doi.org/10.1609/aaai.v34i07.7000
Singh, K.K., et al.: Hide-and-seek: A data augmentation technique for weakly-supervised localization and beyond. arXiv preprint arXiv:1811.02545 (2018)
Taylor, L., Nitschke, G.: Improving deep learning with generic data augmentation. In 2018 IEEE symposium series on computational intelligence (SSCI), 1542–1547. IEEE (2018). https://doi.org/10.1109/SSCI.2018.8628742
Verma, V., et al.: Manifold mixup: better representations by interpolating hidden states. In International conference on machine learning, 6438–6447. PMLR (2019). arXiv:1806.05236
Yan, L., et al.: Lmix: regularization strategy for convolutional neural networks. SIViP 17(4), 1245–1253 (2023). https://doi.org/10.1007/s11760-022-02332-x
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In International conference on machine learning, 3319–3328. PMLR (2017)
Zhao, R., et al.: Saliency detection by multi-context deep learning. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 1265–1274 (2015)
Zhou, B., et al.: Learning deep features for discriminative localization. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 2921–2929 (2016)
Selvaraju, R.R., et al.: Grad-cam: visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision (ICCV), 618–626 (2017)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
Le, Y., Yang, X.: Tiny imagenet visual recognition challenge. CS 231N, 7(7), 3 (2015). http://cs231n.stanford.edu/tiny-imagenet-200
Netzer, Y., et al.: Reading digits in natural images with unsupervised feature learning (2011)
He, K., et al.: Identity mappings in deep residual networks. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part IV 14, 630–645. Springer (2016). https://doi.org/10.1007/978-3-319-46493-0_38
Zagoruyko, S., Komodakis, N.: Wide residual networks. arXiv preprint arXiv:1605.07146 (2016)
Han, D., Kim, J., Kim, J.: Deep pyramidal residual networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), 5927–5935 (2017). arxiv:1610.02915
Kim, J.H., et al.: Co-mixup: saliency guided joint mixup with supermodular diversity. In International conference on learning representations (ICLR) (2021). arXiv:2102.03065
Funding
This work is funded by the National Natural Science Foundation of China under Grant No. 61772180 and the Key R&D Plan of Hubei Province under Grant No. 2023BCB041.
Author information
Contributions
LY and YY performed the main manuscript work and experiments. WC and SY created Tables 3 and 4. All authors participated in manuscript review.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yan, L., Ye, Y., Wang, C. et al. LocMix: local saliency-based data augmentation for image classification. SIViP 18, 1383–1392 (2024). https://doi.org/10.1007/s11760-023-02852-0