Abstract
Multi-view representation is a crucial but challenging issue in visual recognition tasks. To address it, this paper proposes a deep mutual-information multi-view representation method. First, the multi-view inputs are fed to the encoder module of a variational autoencoder architecture to extract per-view latent features. Second, the correlation between each view's local features and its latent features is strengthened by maximizing their mutual information. Meanwhile, to obtain a robust multi-view representation, multi-view canonical correlation analysis and mutual-information maximization are used to compute the canonical correlation of the views' mean vectors and the information correlation of the views' distributions, respectively. Finally, a supervised loss is used to improve the discriminability of the intermediate feature layers. The proposed method yields more robust hidden-layer representations and handles scenes with more than two views. Experimental results on five publicly available datasets demonstrate that the proposed method achieves better recognition accuracy than the compared methods.
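To make the canonical-correlation component concrete, the following is a minimal NumPy sketch (not the authors' implementation) of computing canonical correlations between two views' latent mean vectors. The function name `canonical_correlations` and the synthetic two-view data are illustrative assumptions; the ridge term `r * I` corresponds to the regularization described in the Notes below.

```python
import numpy as np

def canonical_correlations(H1, H2, r=1e-4):
    """Canonical correlations between two views of latent features.

    H1, H2: (n_samples, dim) matrices of per-view latent mean vectors.
    r: ridge term added to the covariance diagonals for numerical stability.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)  # center each view
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + r * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + r * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition (S is SPD).
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    # Singular values of S11^{-1/2} S12 S22^{-1/2} are the correlations.
    T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
    return np.linalg.svd(T, compute_uv=False)

# Two noisy views of the same latent source should correlate strongly.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 4))
X = z + 0.1 * rng.normal(size=(500, 4))
Y = z + 0.1 * rng.normal(size=(500, 4))
rho = canonical_correlations(X, Y)
```

In a deep model such as the one described above, `H1` and `H2` would be the encoder-produced mean vectors, and the (sum of) canonical correlations would enter the loss with a sign that encourages maximization.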
Notes
We add a regularization term, RW ← RW + rI, to ensure numerical stability, where r = 10^-4 is the regularization parameter and I is the identity matrix.
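The effect of this ridge term can be seen on a small example: an exactly rank-deficient within-set covariance cannot be inverted, but adding rI shifts every eigenvalue up by r, making the matrix positive definite. The matrix values below are illustrative, not taken from the paper.

```python
import numpy as np

r = 1e-4  # regularization parameter from the note
I = np.eye(2)

# A rank-deficient within-set covariance matrix (exactly singular):
# its eigenvalues are 0 and 25, so plain inversion would fail.
RW = np.array([[5.0, 10.0],
               [10.0, 20.0]])
assert np.linalg.matrix_rank(RW) == 1

RW_reg = RW + r * I             # RW <- RW + rI
RW_inv = np.linalg.inv(RW_reg)  # now positive definite and invertible
```

The smallest eigenvalue of the regularized matrix is exactly r, which bounds the condition number and keeps the inverse square roots used in CCA well defined.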
Acknowledgments
This work was supported by Natural Science Foundations of China (No. 61771091, 61871066), National High Technology Research and Development Program (863 Program) of China (No. 2015AA016306), Natural Science Foundation of Liaoning Province of China (No. 20170540159), and Fundamental Research Fund for the Central Universities of China (No. DUT17LAB04).
This article belongs to the Topical Collection: Special Issue on Multi-view Learning. Guest Editors: Guoqing Chao, Xingquan Zhu, Weiping Ding, Jinbo Bi and Shiliang Sun.
Cite this article
Xu, X., Chen, Z. & Yin, F. Deep mutual information multi-view representation for visual recognition. Appl Intell 52, 14888–14904 (2022). https://doi.org/10.1007/s10489-022-03462-y