emapDiffP: A novel learning algorithm for convolutional neural network optimization

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Deep neural networks (DNNs) with multiple hidden layers learn efficiently from large datasets and are applied in a wide range of domains. DNNs are trained on such datasets with learning algorithms that capture the relationships among the variables. The base method underlying the success of DNNs is stochastic gradient descent (SGD). The gradient indicates the direction of a function's steepest rate of change. Regardless of how the gradient behaves, the key issue with basic SGD is that all parameters are updated in equal-sized steps. Consequently, adapting the step size for each parameter individually is an effective approach to deep model optimization. Gradient-based adaptive techniques exploit local changes in gradients or the square roots of exponential moving averages of squared past gradients. However, current optimizers still struggle to make effective use of curvature information during optimization. The novel emapDiffP optimizer proposed in this study uses the previous two parameter values to construct a non-periodic, non-negative function, and its update rule incorporates a partially adaptive value to control learning rate adaptability. The optimization steps thus become smoother, with a more accurate step size obtained from the immediately preceding parameter, a partial adaptivity value, and the maximum of two momentum values in the denominator of the parameter update. Rigorous tests on benchmark datasets show that the proposed emapDiffP performs significantly better than its counterparts. In terms of classification accuracy, emapDiffP achieves the best results on the CIFAR10, MNIST, and Mini-ImageNet datasets for all examined networks, and on the CIFAR100 dataset for most of them; it also achieves the best classification accuracy on the ImageNet dataset with the ResNet18 model. For image classification on these datasets, emapDiffP offers excellent training speed: it achieves the lowest training times on MNIST, CIFAR100, and ImageNet, and the second-lowest on CIFAR10. On the Set5 dataset for the image super-resolution task, emapDiffP outperforms existing approaches in the majority of cases, and it also outperforms previous approaches on a variety of NLP and object detection tasks.
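The abstract only names the ingredients of the update rule; the short NumPy sketch below illustrates how such ingredients (a non-negative, non-periodic friction term built from consecutive gradients, a partially adaptive exponent on the second-moment term, and a running maximum in the denominator) can be combined in a single parameter update. This is an illustration under stated assumptions, not the exact emapDiffP update defined in the paper: the function name, the sigmoid friction choice, the exponent p, and the hyperparameter values are all assumptions made for the sketch.

```python
import numpy as np

def emapdiffp_like_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                        p=0.25, eps=1e-8):
    """One illustrative update step (names and structure are hypothetical)."""
    t = state["t"] + 1

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * state["m"] + (1 - beta1) * grad
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2

    # Bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Keep the larger of the current and all previous second-moment estimates,
    # so the denominator never shrinks (AMSGrad-style maximum).
    v_max = np.maximum(state["v_max"], v_hat)

    # Non-negative, non-periodic friction built from the change between the
    # previous and current gradients (a sigmoid of the absolute difference is
    # one plausible choice; the paper's exact function may differ).
    xi = 1.0 / (1.0 + np.exp(-np.abs(state["prev_grad"] - grad)))

    # Partially adaptive exponent p in (0, 0.5]: p -> 0 approaches SGD with
    # momentum, while p = 0.5 recovers a fully adaptive Adam-style denominator.
    theta = theta - lr * xi * m_hat / (v_max ** p + eps)

    state.update(m=m, v=v, v_max=v_max, prev_grad=grad.copy(), t=t)
    return theta, state


# Toy usage on f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
state = dict(m=np.zeros_like(w), v=np.zeros_like(w),
             v_max=np.zeros_like(w), prev_grad=np.zeros_like(w), t=0)
for _ in range(200):
    w, state = emapdiffp_like_step(w, w.copy(), state, lr=0.05)
print(w)  # approaches the minimizer [0, 0]
```

In a real training loop the same step would be applied to each parameter tensor, with the optimizer state (m, v, v_max, prev_grad, t) kept per parameter.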



Availability of data and material

Data will be made available on reasonable request.


Acknowledgements

We would like to thank the Dept. of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

SB contributed conceptualization, implementation, and drafting; UN contributed investigation, methodology, analysis, and supervision; the remaining authors contributed to reviewing and editing.

Corresponding author

Correspondence to Utpal Nandi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethics approval

The authors approve that the research presented in this paper was conducted following the principles of ethical and professional conduct.

Consent to participate

Not applicable.

Consent for publication

Not applicable; the authors used publicly available data only and provided the corresponding references.

Code availability

Custom code is available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bhakta, S., Nandi, U., Changdar, C. et al. emapDiffP: A novel learning algorithm for convolutional neural network optimization. Neural Comput & Applic 36, 11987–12010 (2024). https://doi.org/10.1007/s00521-024-09708-9



  • DOI: https://doi.org/10.1007/s00521-024-09708-9
