Abstract
Fostered by technological and theoretical developments, deep neural networks (DNNs) have achieved great success in many applications, but their training via mini-batch stochastic gradient descent (SGD) can be very costly, owing to the tens of millions of parameters that may need to be optimized and the large number of training examples that must be processed. The computational cost is exacerbated by the inefficiency of the uniform sampling typically used by SGD to form the training mini-batches: since not all training examples are equally relevant for training, sampling them under a uniform distribution is far from optimal, motivating the study of improved methods for training DNNs. A better strategy is to sample the training instances under a distribution in which the probability of being selected is proportional to the relevance of each individual instance; one way to achieve this is through importance sampling (IS), which minimizes the variance of the gradients with respect to the network parameters and consequently improves convergence. In this paper, an IS-based adaptive sampling method to improve the training of DNNs is introduced. The method exploits side information to construct the optimal sampling distribution and is dubbed regularized adaptive sampling (RAS). Experimental comparisons using deep convolutional networks for classification of the MNIST and CIFAR-10 datasets show that, compared against standard SGD and against another state-of-the-art sampling method, RAS improves the speed and reduces the variance of the training process without incurring significant overhead or affecting classification accuracy.
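To make the importance-sampling idea in the abstract concrete, the following minimal sketch (ours, not the paper's RAS algorithm) draws a mini-batch with probabilities proportional to per-example relevance scores and returns the importance weights 1/(N·p_i) that keep the weighted gradient estimate unbiased with respect to uniform sampling; the use of previous per-example losses as side information, the smoothing parameter and the function name sample_minibatch are illustrative assumptions.

```python
import numpy as np

def sample_minibatch(relevance, batch_size, rng, smoothing=0.1):
    """Draw a mini-batch with probability proportional to per-example
    relevance and return the importance weights 1 / (N * p_i) that keep
    the weighted gradient estimate unbiased w.r.t. uniform sampling."""
    n = relevance.shape[0]
    # Mixing with the uniform distribution is a simple form of
    # regularization that also avoids zero sampling probabilities.
    probs = (1.0 - smoothing) * relevance / relevance.sum() + smoothing / n
    idx = rng.choice(n, size=batch_size, replace=True, p=probs)
    weights = 1.0 / (n * probs[idx])
    return idx, weights

# Toy usage: per-example losses from a previous pass act as side information.
rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=1000)
idx, w = sample_minibatch(losses, batch_size=32, rng=rng)
# The weighted mini-batch gradient (1/batch_size) * sum_b w[b] * grad(x[idx[b]])
# has the same expectation as the uniform-sampling SGD gradient.
print(idx[:5], w[:5])
```

Mixing the relevance-based distribution with the uniform one, as done above, is only one simple way of regularizing the sampling distribution; RAS defines its own regularized distribution, which this sketch does not reproduce.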
Availability of data and material
Data are available from the authors upon reasonable request.
Notes
The term deep network is sometimes used for small networks with relatively few parameters arranged in more than one hidden layer. What we call full-scale models refers to DNNs with around a dozen layers and typically millions of parameters.
In this equation and throughout this work, \(\mathbb {E}\) is used to represent mathematical expectation.
This baseline is harder to establish because of the complexity of the problem and the variety of hyper-parameters and advanced techniques that can be used, but similar experiments reported in the literature achieve results around this value.
Slow convergence as well as large oscillations of the loss function are considered inconsistent with the desired behavior, even if the final test accuracy is good. In other words, the whole training process is examined, not just the end result.
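As a rough illustration of how such a whole-curve assessment could be quantified (the rolling window and the standard-deviation statistic are our own illustrative choices, not a metric defined in this work), one may summarize a loss curve by its final value together with a measure of its oscillation:

```python
import numpy as np

def training_stability(epoch_losses, window=5):
    """Summarize a loss curve by its final value and the mean rolling
    standard deviation, a crude proxy for oscillations during training."""
    losses = np.asarray(epoch_losses, dtype=float)
    rolling_std = np.array([losses[i:i + window].std()
                            for i in range(len(losses) - window + 1)])
    return {"final_loss": float(losses[-1]),
            "mean_rolling_std": float(rolling_std.mean())}

# Two runs that end at the same loss but differ in how smoothly they got there.
smooth = [1.0, 0.70, 0.50, 0.40, 0.35, 0.32, 0.30]
noisy  = [1.0, 0.40, 0.80, 0.30, 0.60, 0.25, 0.30]
print(training_stability(smooth))  # low mean_rolling_std
print(training_stability(noisy))   # higher mean_rolling_std
```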
Funding
This work was partially supported by The National Council of Science and Technology of Mexico (CONACYT) through grants: CÁTEDRAS-2598 (A. Rojas) and CÁTEDRAS-7795 (S.I. Valdez).
Author information
Authors and Affiliations
Contributions
A.R.D. conceived the study, participated in the design of the algorithms and implementation of the models; carried out computational experiments, analyzed experimental results, and wrote the paper with the other authors. M.O.R. participated in the design of the study and initial versions of the manuscript. M.C. participated in the design of the algorithms and analyzed the results. S.I.V. participated throughout in the preparation of the paper. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Ethics approval and consent to participate
Not Applicable.
Consent for publication
Not Applicable.
Additional information
Communicated by Oscar Castillo.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Rojas-Domínguez, A., Valdez, S.I., Ornelas-Rodríguez, M. et al. Improved training of deep convolutional networks via minimum-variance regularized adaptive sampling. Soft Comput 27, 13237–13253 (2023). https://doi.org/10.1007/s00500-022-07131-7