Abstract
Deep neural networks (DNNs) with multiple hidden layers learn effectively from large datasets and are applied in a wide range of domains. DNNs are trained on these datasets with learning algorithms that capture the relationships among variables. The foundational method behind their success is stochastic gradient descent (SGD). The gradient indicates the direction of a function's steepest change. The key limitation of basic SGD is that every parameter is updated with the same step size, no matter how its gradient behaves. Consequently, adapting the step size of each parameter is an effective strategy for optimizing deep models. Gradient-based adaptive techniques use local changes in gradients or the square roots of exponential moving averages of squared past gradients. However, current optimizers still struggle to exploit curvature information of the optimization landscape effectively. The emapDiffP optimizer proposed in this study uses the two preceding parameter values to construct a non-periodic, non-negative function, and its update rule employs a partially adaptive value to control learning-rate adjustability. The optimization steps thus become smoother, with a more accurate step size based on the immediately preceding parameter, a partially adaptive value, and the larger of the two momentum values as the denominator of the parameter update. Rigorous experiments on benchmark datasets show that emapDiffP performs significantly better than its counterparts. In terms of classification accuracy, it achieves the best results on the CIFAR10, MNIST, and Mini-ImageNet datasets for all networks examined and on CIFAR100 for most of them, and it attains the best accuracy on ImageNet with the ResNet18 model. For image classification, emapDiffP also offers excellent training speed: it achieves the lowest training times on MNIST, CIFAR100, and ImageNet, and the second-lowest on CIFAR10. On the Set5 dataset for the image super-resolution task, emapDiffP outperforms existing approaches in the majority of cases. It also performs better than previous approaches on a variety of natural language processing and object detection tasks.
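The exact emapDiffP update rule is not reproduced in this abstract. As a rough illustration of the ingredients named above, the following Python sketch contrasts plain SGD with a per-parameter adaptive step that combines an exponential moving average of squared gradients, a maximum over past second-moment estimates in the denominator, and a partially adaptive exponent. The function names, the exponent p, and the toy objective are illustrative assumptions, not the paper's method.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Plain SGD: every parameter moves by the same lr-scaled amount,
    # regardless of how its individual gradient behaves.
    return w - lr * grad

def partially_adaptive_step(w, grad, state, lr=0.001,
                            beta1=0.9, beta2=0.999, p=0.25, eps=1e-8):
    # Exponential moving averages of the gradient and the squared gradient,
    # as used by Adam-family optimizers.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    # Keep the larger of the current and past second-moment estimates
    # (AMSGrad-style maximum), so the denominator never shrinks.
    state["v_max"] = np.maximum(state["v_max"], state["v"])
    # Partial adaptivity (Padam-style): raise the second moment to a
    # power p < 0.5 instead of taking the full square root.
    denom = state["v_max"] ** p + eps
    return w - lr * state["m"] / denom

# Toy usage on f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0, 3.0])
state = {"m": np.zeros_like(w), "v": np.zeros_like(w), "v_max": np.zeros_like(w)}
for _ in range(200):
    w = partially_adaptive_step(w, w.copy(), state)
print(w)  # coordinates move toward the minimizer at the origin
```

Because each coordinate is divided by its own moment-based denominator, parameters with consistently large gradients take smaller effective steps than in plain SGD, which is the motivation behind adaptive and partially adaptive optimizers such as the one proposed here.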
Availability of data and material
Data will be made available on reasonable request.
Acknowledgements
We would like to thank the Dept. of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.
Funding
Not applicable.
Author information
Contributions
SB contributed conceptualization, implementation, and drafting; UN contributed investigation, methodology, analysis, and supervision; the remaining authors contributed reviewing and editing.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethics approval
The authors confirm that the research presented in this paper was conducted in accordance with the principles of ethical and professional conduct.
Consent to participate
Not applicable.
Consent for publication
Not applicable; the authors used publicly available data only and provided the corresponding references.
Code availability
Custom code is available.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bhakta, S., Nandi, U., Changdar, C. et al. emapDiffP: A novel learning algorithm for convolutional neural network optimization. Neural Comput & Applic 36, 11987–12010 (2024). https://doi.org/10.1007/s00521-024-09708-9
DOI: https://doi.org/10.1007/s00521-024-09708-9