emapDiffP: A novel learning algorithm for convolutional neural network optimization

  • Original Article
  • Published:
Neural Computing and Applications

Abstract

Deep neural networks (DNNs) with multiple hidden layers learn efficiently from large datasets and are applied in a wide range of domains. DNNs are trained on such datasets with learning algorithms that capture the relationships among the variables. The base method underlying the success of DNNs is stochastic gradient descent (SGD). The gradient indicates the direction of a function's steepest rate of change. Regardless of how the gradient behaves, the key issue with basic SGD is that all parameters are updated in equal-sized steps. Consequently, adapting the step size for each parameter individually is an effective approach to deep model optimization. Gradient-based adaptive techniques exploit local changes in gradients or the square roots of exponential moving averages of squared past gradients. However, current optimizers still struggle to make effective use of curvature information during optimization. The novel emapDiffP optimizer proposed in this study uses the previous two parameter values to construct a non-periodic, non-negative function, and its update rule incorporates a partially adaptive value to control learning rate adaptability. The optimization steps thus become smoother, with a more accurate step size obtained from the immediately preceding parameter, a partial adaptivity value, and the maximum of two momentum values in the denominator of the parameter update. Rigorous tests on benchmark datasets show that the proposed emapDiffP performs significantly better than its counterparts. In terms of classification accuracy, emapDiffP achieves the best results on the CIFAR10, MNIST, and Mini-ImageNet datasets for all examined networks, and on the CIFAR100 dataset for most of them; it also achieves the best classification accuracy on the ImageNet dataset with the ResNet18 model. For image classification on these datasets, emapDiffP offers excellent training speed: it achieves the lowest training times on MNIST, CIFAR100, and ImageNet, and the second-lowest on CIFAR10. On the Set5 dataset for the image super-resolution task, emapDiffP outperforms existing approaches in the majority of cases, and it also outperforms previous approaches on a variety of NLP and object detection tasks.
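The abstract only names the ingredients of the update rule; the short NumPy sketch below illustrates how such ingredients (a non-negative, non-periodic friction term built from consecutive gradients, a partially adaptive exponent on the second-moment term, and a running maximum in the denominator) can be combined in a single parameter update. This is an illustration under stated assumptions, not the exact emapDiffP update defined in the paper: the function name, the sigmoid friction choice, the exponent p, and the hyperparameter values are all assumptions made for the sketch.

```python
import numpy as np

def emapdiffp_like_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
                        p=0.25, eps=1e-8):
    """One illustrative update step (names and structure are hypothetical)."""
    t = state["t"] + 1

    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * state["m"] + (1 - beta1) * grad
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2

    # Bias correction, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Keep the larger of the current and all previous second-moment estimates,
    # so the denominator never shrinks (AMSGrad-style maximum).
    v_max = np.maximum(state["v_max"], v_hat)

    # Non-negative, non-periodic friction built from the change between the
    # previous and current gradients (a sigmoid of the absolute difference is
    # one plausible choice; the paper's exact function may differ).
    xi = 1.0 / (1.0 + np.exp(-np.abs(state["prev_grad"] - grad)))

    # Partially adaptive exponent p in (0, 0.5]: p -> 0 approaches SGD with
    # momentum, while p = 0.5 recovers a fully adaptive Adam-style denominator.
    theta = theta - lr * xi * m_hat / (v_max ** p + eps)

    state.update(m=m, v=v, v_max=v_max, prev_grad=grad.copy(), t=t)
    return theta, state


# Toy usage on f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
state = dict(m=np.zeros_like(w), v=np.zeros_like(w),
             v_max=np.zeros_like(w), prev_grad=np.zeros_like(w), t=0)
for _ in range(200):
    w, state = emapdiffp_like_step(w, w.copy(), state, lr=0.05)
print(w)  # approaches the minimizer [0, 0]
```

In a real training loop the same step would be applied to each parameter tensor, with the optimizer state (m, v, v_max, prev_grad, t) kept per parameter.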



Availability of data and material

Data will be made available on reasonable request.


Acknowledgements

We would like to thank the Dept. of Computer Science, Vidyasagar University, Paschim Medinipur, Midnapore 721102, West Bengal, India, for providing the infrastructure to carry out our experiments.

Funding

Not applicable.

Author information

Authors and Affiliations

Authors

Contributions

SB contributed conceptualization, implementation, and drafting; UN contributed investigation, methodology, analysis, and supervision; the remaining authors contributed to reviewing and editing.

Corresponding author

Correspondence to Utpal Nandi.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethics approval

The authors approve that the research presented in this paper was conducted following the principles of ethical and professional conduct.

Consent to participate

Not applicable.

Consent for publication

Not applicable; the authors used publicly available data only and provided the corresponding references.

Code availability

Custom code is available.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Bhakta, S., Nandi, U., Changdar, C. et al. emapDiffP: A novel learning algorithm for convolutional neural network optimization. Neural Comput & Applic 36, 11987–12010 (2024). https://doi.org/10.1007/s00521-024-09708-9



  • DOI: https://doi.org/10.1007/s00521-024-09708-9
