
Stacked Convolutional Sparse Auto-Encoders for Representation Learning

Published: 05 March 2021 Publication History

Abstract

Deep learning seeks to achieve excellent performance for representation learning on image datasets. However, supervised deep learning models such as convolutional neural networks require large amounts of labeled image data, which is often impractical in real applications, while unsupervised deep learning models such as stacked denoising auto-encoders cannot exploit label information. Moreover, the redundancy of image data degrades the representations learned by both kinds of models. To address these problems, we propose a semi-supervised deep learning framework called the stacked convolutional sparse auto-encoder, which learns robust and sparse representations from image data using only a small number of labeled records. More specifically, the framework is constructed by stacking layers: in each layer, higher-level feature representations are generated by convolving lower-layer features with kernels learned by a sparse auto-encoder. To address data redundancy, a Reconstruction Independent Component Analysis algorithm is trained on patches to sphere the input data, and label information is incorporated through a Softmax Regression model for semi-supervised learning. With this framework, higher-level representations are learned from image data layer by layer, which boosts the performance of subsequent base classifiers such as support vector machines. Extensive experiments demonstrate the superior classification performance of our framework compared with several state-of-the-art representation learning methods.
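The layer-wise pipeline sketched in the abstract — sphere extracted patches, then convolve the image with learned kernels and apply a nonlinearity — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: ZCA whitening stands in for the RICA-based sphering step, and random kernels stand in for kernels learned by a sparse auto-encoder; all names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def zca_whiten(patches, eps=1e-2):
    """Sphere flattened patches via ZCA whitening (a stand-in here for
    the RICA-based sphering step described in the abstract)."""
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / len(patches)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ W

def conv_feature_maps(image, kernels):
    """Valid-mode correlation of one image with a bank of kernels,
    followed by a sigmoid, yielding one feature map per kernel."""
    k = kernels.shape[1]
    H, W = image.shape
    out = np.empty((len(kernels), H - k + 1, W - k + 1))
    for i, ker in enumerate(kernels):
        for r in range(H - k + 1):
            for c in range(W - k + 1):
                out[i, r, c] = np.sum(image[r:r + k, c:c + k] * ker)
    return 1.0 / (1.0 + np.exp(-out))  # sigmoid activation

# Toy data: one 8x8 "image" and 200 flattened 3x3 patches. In the paper
# the kernels are the weights of a sparse auto-encoder trained on the
# sphered patches; random kernels stand in for the learned ones here.
image = rng.standard_normal((8, 8))
patches = rng.standard_normal((200, 9))
white = zca_whiten(patches)                # sphered patch matrix
kernels = rng.standard_normal((4, 3, 3))   # 4 hypothetical 3x3 kernels
maps = conv_feature_maps(image, kernels)   # shape (4, 6, 6)
```

Stacking another layer would repeat the same two steps on `maps` in place of `image`, so each layer's representation is built convolutionally from the one below.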



    Published In

    ACM Transactions on Knowledge Discovery from Data  Volume 15, Issue 2
    Survey Paper and Regular Papers
    April 2021
    524 pages
    ISSN:1556-4681
    EISSN:1556-472X
    DOI:10.1145/3446665

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 05 March 2021
    Accepted: 01 November 2020
    Revised: 01 August 2020
    Received: 01 October 2019
    Published in TKDD Volume 15, Issue 2


    Author Tags

    1. Sparse auto-encoder
    2. representation learning

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • Natural Science Foundation of China
    • National Key Research and Development Program of China
    • Ministry of Education

    Article Metrics

    • Downloads (last 12 months): 51
    • Downloads (last 6 weeks): 7

    Reflects downloads up to 18 Nov 2024.

    Cited By

    • (2024) ProtoMGAE: Prototype-Aware Masked Graph Auto-Encoder for Graph Representation Learning. ACM Transactions on Knowledge Discovery from Data 18, 6, 1--22. DOI: 10.1145/3649143. Online publication date: 12-Apr-2024.
    • (2024) Supervised Representation Learning for Network Traffic With Cluster Compression. IEEE Transactions on Sustainable Computing 9, 1, 1--13. DOI: 10.1109/TSUSC.2023.3292404. Online publication date: Jan-2024.
    • (2024) Self-Adaptive Deep Asymmetric Network for Imbalanced Recommendation. IEEE Transactions on Emerging Topics in Computational Intelligence 8, 1, 968--980. DOI: 10.1109/TETCI.2023.3300740. Online publication date: Feb-2024.
    • (2024) Representation learning: serial-autoencoder for personalized recommendation. Frontiers of Computer Science 18, 4. DOI: 10.1007/s11704-023-2441-1. Online publication date: 1-Aug-2024.
    • (2024) Asymmetric Short-Text Clustering via Prompt. New Generation Computing 42, 4, 599--615. DOI: 10.1007/s00354-024-00244-7. Online publication date: 1-Nov-2024.
    • (2023) Hybrid robust convolutional autoencoder for unsupervised anomaly detection of machine tools under noises. Robotics and Computer-Integrated Manufacturing 79, 102441. DOI: 10.1016/j.rcim.2022.102441. Online publication date: Feb-2023.
    • (2022) Method and Application of Visual Relationship Detection Based on Deep Learning. Journal of Image and Signal Processing 11, 03, 144--161. DOI: 10.12677/JISP.2022.113016. Online publication date: 2022.
    • (2022) Representation learning with deep sparse auto-encoder for multi-task learning. Pattern Recognition 129, 108742. DOI: 10.1016/j.patcog.2022.108742. Online publication date: Sep-2022.
    • (2021) Personalized Recommendation Based on Entity Attributes and Graph Features. 2021 IEEE International Conference on Big Knowledge (ICBK), 7--14. DOI: 10.1109/ICKG52313.2021.00011. Online publication date: Dec-2021.