
Stacked Convolutional Sparse Auto-Encoders for Representation Learning

Published: 05 March 2021 Publication History

Abstract

Deep learning seeks to achieve excellent performance for representation learning on image datasets. However, supervised deep learning models such as convolutional neural networks require large amounts of labeled image data, which is often impractical in real applications, while unsupervised deep learning models such as stacked denoising auto-encoders cannot exploit label information. Moreover, the redundancy of image data degrades the representations learned by both kinds of models. To address these problems, we propose a semi-supervised deep learning framework called the stacked convolutional sparse auto-encoder, which learns robust and sparse representations from image data using only a small number of labeled records. More specifically, the framework is constructed by stacking layers: in each layer, higher-level feature representations are generated by convolving lower-layer features with kernels learned by a sparse auto-encoder. To address data redundancy, a Reconstruction Independent Component Analysis algorithm is trained on patches to sphere the input data, and label information is incorporated through a Softmax Regression model for semi-supervised learning. With this framework, higher-level representations are learned from image data layer by layer, which boosts the performance of subsequent base classifiers such as support vector machines. Extensive experiments demonstrate the superior classification performance of our framework compared with several state-of-the-art representation learning methods.
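The layer-wise pipeline sketched in the abstract — sphere extracted patches, then convolve the image with learned kernels and apply a nonlinearity — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: ZCA whitening stands in for the RICA-based sphering step, and random kernels stand in for kernels learned by a sparse auto-encoder; all names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def zca_whiten(patches, eps=1e-2):
    """Sphere flattened patches via ZCA whitening (a stand-in here for
    the RICA-based sphering step described in the abstract)."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / len(patches)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W

def conv_feature_maps(image, kernels):
    """Valid-mode correlation of one image with a bank of kernels,
    followed by a sigmoid, yielding one feature map per kernel."""
    k = kernels.shape[1]
    H, W = image.shape
    out = np.empty((len(kernels), H - k + 1, W - k + 1))
    for i, ker in enumerate(kernels):
        for r in range(H - k + 1):
            for c in range(W - k + 1):
                out[i, r, c] = np.sum(image[r:r + k, c:c + k] * ker)
    return 1.0 / (1.0 + np.exp(-out))  # sigmoid activation

# Toy data: one 8x8 "image" and 200 flattened 3x3 patches. In the paper
# the kernels are the weights of a sparse auto-encoder trained on the
# sphered patches; random kernels stand in for the learned ones here.
image = rng.standard_normal((8, 8))
patches = rng.standard_normal((200, 9))
white = zca_whiten(patches)                # sphered patch matrix
kernels = rng.standard_normal((4, 3, 3))   # 4 hypothetical 3x3 kernels
maps = conv_feature_maps(image, kernels)   # shape (4, 6, 6)
```

Stacking another layer would repeat the same two steps on `maps` in place of `image`, so each layer's representation is built convolutionally from the one below.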



    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 15, Issue 2
    Survey Paper and Regular Papers
    April 2021
    524 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3446665

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 March 2021
    Accepted: 01 November 2020
    Revised: 01 August 2020
    Received: 01 October 2019
    Published in TKDD Volume 15, Issue 2


    Author Tags

    1. Sparse auto-encoder
    2. representation learning

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • National Key Research and Development Program of China
    • Ministry of Education

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 7

    Reflects downloads up to 18 Nov 2024.

    Cited By

    • (2024) ProtoMGAE: Prototype-Aware Masked Graph Auto-Encoder for Graph Representation Learning. ACM Transactions on Knowledge Discovery from Data 18, 6, 1--22. DOI: 10.1145/3649143. Online publication date: 12-Apr-2024.
    • (2024) Supervised Representation Learning for Network Traffic With Cluster Compression. IEEE Transactions on Sustainable Computing 9, 1, 1--13. DOI: 10.1109/TSUSC.2023.3292404. Online publication date: Jan-2024.
    • (2024) Self-Adaptive Deep Asymmetric Network for Imbalanced Recommendation. IEEE Transactions on Emerging Topics in Computational Intelligence 8, 1, 968--980. DOI: 10.1109/TETCI.2023.3300740. Online publication date: Feb-2024.
    • (2024) Representation learning: serial-autoencoder for personalized recommendation. Frontiers of Computer Science 18, 4. DOI: 10.1007/s11704-023-2441-1. Online publication date: 1-Aug-2024.
    • (2024) Asymmetric Short-Text Clustering via Prompt. New Generation Computing 42, 4, 599--615. DOI: 10.1007/s00354-024-00244-7. Online publication date: 1-Nov-2024.
    • (2023) Hybrid robust convolutional autoencoder for unsupervised anomaly detection of machine tools under noises. Robotics and Computer-Integrated Manufacturing 79, 102441. DOI: 10.1016/j.rcim.2022.102441. Online publication date: Feb-2023.
    • (2022) Method and Application of Visual Relationship Detection Based on Deep Learning. Journal of Image and Signal Processing 11, 03, 144--161. DOI: 10.12677/JISP.2022.113016. Online publication date: 2022.
    • (2022) Representation learning with deep sparse auto-encoder for multi-task learning. Pattern Recognition 129, 108742. DOI: 10.1016/j.patcog.2022.108742. Online publication date: Sep-2022.
    • (2021) Personalized Recommendation Based on Entity Attributes and Graph Features. 2021 IEEE International Conference on Big Knowledge (ICBK), 7--14. DOI: 10.1109/ICKG52313.2021.00011. Online publication date: Dec-2021.