Abstract
Multi-view representation is a crucial but challenging issue in visual recognition tasks. To address it, this paper proposes a deep mutual-information multi-view representation method. First, the multi-view inputs are fed to the encoder module of a variational autoencoder architecture to extract per-view latent features. Second, the correlation between each view's local features and its latent features is strengthened by maximizing their mutual information. Meanwhile, to obtain a robust multi-view representation, multi-view canonical correlation analysis and mutual-information maximization are used to compute the canonical correlation of the views' mean vectors and the information correlation of the views' distributions, respectively. Finally, a supervised loss is used to improve the discriminability of the intermediate feature layers. The proposed method yields more robust hidden-layer representations and handles scenes with more than two views. Experimental results on five publicly available datasets demonstrate that the proposed method achieves better recognition accuracy than the compared methods.
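To make the canonical-correlation component concrete, the following is a minimal NumPy sketch (not the authors' implementation) of computing canonical correlations between two views' latent mean vectors. The function name `canonical_correlations` and the synthetic two-view data are illustrative assumptions; the ridge term `r * I` corresponds to the regularization described in the Notes below.

```python
import numpy as np

def canonical_correlations(H1, H2, r=1e-4):
    """Canonical correlations between two views of latent features.

    H1, H2: (n_samples, dim) matrices of per-view latent mean vectors.
    r: ridge term added to the covariance diagonals for numerical stability.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)  # center each view
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + r * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + r * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is SPD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of S11^{-1/2} S12 S22^{-1/2} are the correlations.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False)

# Two noisy views of the same latent source should correlate strongly.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 4))
X = z + 0.1 * rng.normal(size=(500, 4))
Y = z + 0.1 * rng.normal(size=(500, 4))
rho = canonical_correlations(X, Y)
```

In a deep model such as the one described above, `H1` and `H2` would be the encoder-produced mean vectors, and the (sum of) canonical correlations would enter the loss with a sign that encourages maximization.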
Notes
We add a regularization term, RW ← RW + rI, to ensure numerical stability, where r = 10^-4 is the regularization parameter and I is the identity matrix.
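The effect of this ridge term can be seen on a small example: an exactly rank-deficient within-set covariance cannot be inverted, but adding rI shifts every eigenvalue up by r, making the matrix positive definite. The matrix values below are illustrative, not taken from the paper.

```python
import numpy as np

r = 1e-4  # regularization parameter from the note
I = np.eye(2)

# A rank-deficient within-set covariance matrix (exactly singular):
# its eigenvalues are 0 and 25, so plain inversion would fail.
RW = np.array([[5.0, 10.0],
               [10.0, 20.0]])
assert np.linalg.matrix_rank(RW) == 1

RW_reg = RW + r * I             # RW <- RW + rI
RW_inv = np.linalg.inv(RW_reg)  # now positive definite and invertible
```

The smallest eigenvalue of the regularized matrix is exactly r, which bounds the condition number and keeps the inverse square roots used in CCA well defined.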
Acknowledgments
This work was supported by Natural Science Foundations of China (No. 61771091, 61871066), National High Technology Research and Development Program (863 Program) of China (No. 2015AA016306), Natural Science Foundation of Liaoning Province of China (No. 20170540159), and Fundamental Research Fund for the Central Universities of China (No. DUT17LAB04).
This article belongs to the Topical Collection: Special Issue on Multi-view Learning. Guest Editors: Guoqing Chao, Xingquan Zhu, Weiping Ding, Jinbo Bi and Shiliang Sun.
Cite this article
Xu, X., Chen, Z. & Yin, F. Deep mutual information multi-view representation for visual recognition. Appl Intell 52, 14888–14904 (2022). https://doi.org/10.1007/s10489-022-03462-y