Abstract
This work proposes a new formulation for the supervised stacked autoencoder. We argue that features from the same class should be similar to each other and hence linearly dependent, so that when these features are stacked as columns, the feature matrix for each class is rank deficient (low-rank). We impose this constraint on the stacked autoencoder in the form of nuclear norm penalties on the class-wise feature matrices at each level. The nuclear norm is the convex surrogate of rank and promotes the low-rank solutions our proposal requires. Owing to the nuclear norm penalties, the cost function is non-smooth and hence cannot be minimized directly with gradient descent based techniques such as backpropagation. Moreover, we learn the stacked autoencoder in one go, without the usual pre-training followed by fine-tuning. Both ends (the non-smooth cost function and single-stage training of all layers simultaneously) are met by variable splitting followed by the augmented Lagrangian method of alternating directions. Two sets of experiments are carried out. The first, on a variety of benchmark datasets, shows that our method outperforms the deep learning models it is compared against: the class-sparse stacked autoencoder, the deep belief network, and the discriminative deep belief network. The second, on the brain-computer interface (BCI) classification problem, shows that our method outperforms prior deep learning based solutions for this task.
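The optimization strategy sketched in the abstract can be made concrete. The key primitive behind a nuclear norm penalty is singular value thresholding, the proximal operator that an augmented Lagrangian / alternating-directions (ADMM) scheme applies at every iteration. The NumPy sketch below is a minimal illustration under our own assumptions: the names `svt` and `admm_lowrank` are hypothetical, and it solves a generic nuclear-norm-regularized least-squares problem rather than the paper's multi-layer autoencoder cost. It shows how the splitting X = Z separates the smooth least-squares step from the non-smooth nuclear norm step.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: the proximal operator of tau * ||.||_*.
    Shrinking every singular value toward zero by tau is what makes the
    nuclear norm penalty promote low-rank solutions."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def admm_lowrank(Y, A, lam, rho=1.0, iters=200):
    """Minimize 0.5 * ||Y - A X||_F^2 + lam * ||X||_* via the splitting
    X = Z and scaled-dual ADMM (generic, not the paper's exact model)."""
    n, m = A.shape[1], Y.shape[1]
    Z = np.zeros((n, m))
    U = np.zeros((n, m))
    G = np.linalg.inv(A.T @ A + rho * np.eye(n))  # reused in every X-update
    for _ in range(iters):
        X = G @ (A.T @ Y + rho * (Z - U))  # smooth least-squares step
        Z = svt(X + U, lam / rho)          # non-smooth nuclear norm step
        U = U + X - Z                      # scaled multiplier update
    return Z

# Toy check: recover a rank-3 "class-wise feature matrix" from noisy
# linear observations.
rng = np.random.default_rng(0)
X_true = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 20))
A = rng.standard_normal((40, 30))
Y = A @ X_true + 0.01 * rng.standard_normal((40, 20))
X_hat = admm_lowrank(Y, A, lam=5.0)
print(np.linalg.matrix_rank(X_hat))  # typically close to 3
```

The paper applies the same splitting idea to the full multi-layer autoencoder cost, which is what permits single-stage training of all layers without the pre-training and fine-tuning regime.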
Cite this article
Gupta, K., Majumdar, A. Imposing Class-Wise Feature Similarity in Stacked Autoencoders by Nuclear Norm Regularization. Neural Process Lett 48, 615–629 (2018). https://doi.org/10.1007/s11063-017-9731-2