

Deep mutual information multi-view representation for visual recognition

Published in Applied Intelligence

Abstract

Multi-view representation is a crucial but challenging issue in visual recognition tasks. To address it, this paper proposes a deep mutual information multi-view representation method. First, the multi-view inputs are fed to the encoder module of a variational autoencoder architecture to extract latent features for each view. Second, the correlation between the local features and the latent features of each view is strengthened by maximizing their mutual information. Meanwhile, to obtain a robust multi-view representation, multi-view canonical correlation analysis and mutual information maximization are used to compute, respectively, the canonical correlation between the mean vectors of different views and the information correlation between the distributions of different views. Finally, a supervised loss is applied to improve the discriminability of the intermediate feature layers. The proposed method yields more robust hidden-layer representations and handles multi-view scenarios with more than two views. Experimental results on five publicly available datasets demonstrate that the proposed method achieves better recognition accuracy than the compared methods.
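The paper's own implementation is not reproduced on this page, but the objective described above combines well-known ingredients. The sketch below is a minimal, illustrative PyTorch rendering of two of them: a VAE-style per-view encoder and the canonical-correlation term computed on view mean vectors, together with a Jensen-Shannon mutual-information lower bound of the kind commonly used for MI maximization. All names (ViewEncoder, jsd_mi, cca_corr) and architectural choices are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ViewEncoder(nn.Module):
    """VAE-style encoder: maps one view to a latent mean and log-variance."""
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def jsd_mi(joint_scores, marginal_scores):
    """Jensen-Shannon MI lower bound: reward high discriminator scores on
    paired (joint) samples and low scores on shuffled (marginal) ones."""
    return (-F.softplus(-joint_scores)).mean() - F.softplus(marginal_scores).mean()

def cca_corr(mu_a, mu_b, eps=1e-4):
    """Sum of canonical correlations between two batches of view mean vectors."""
    a = mu_a - mu_a.mean(dim=0)
    b = mu_b - mu_b.mean(dim=0)
    n = a.size(0) - 1
    c_aa = a.t() @ a / n + eps * torch.eye(a.size(1))  # regularized covariances
    c_bb = b.t() @ b / n + eps * torch.eye(b.size(1))
    c_ab = a.t() @ b / n
    # Whiten with inverse Cholesky factors; the singular values of the
    # whitened cross-covariance are the canonical correlations.
    l_a_inv = torch.linalg.inv(torch.linalg.cholesky(c_aa))
    l_b_inv = torch.linalg.inv(torch.linalg.cholesky(c_bb))
    t = l_a_inv @ c_ab @ l_b_inv.t()
    return torch.linalg.svdvals(t).sum()

# Toy usage: two 784-dim views, maximize correlation of their latent means.
enc1, enc2 = ViewEncoder(784, 64), ViewEncoder(784, 64)
x1, x2 = torch.randn(32, 784), torch.randn(32, 784)
mu1, _ = enc1(x1)
mu2, _ = enc2(x2)
loss = -cca_corr(mu1, mu2)  # negate: maximize correlation by minimizing loss
loss.backward()
```

In a full training step, the within-view and cross-view MI terms (via jsd_mi), the VAE reconstruction/KL terms, and the supervised cross-entropy loss would be summed into the same scalar objective before the backward pass.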


Notes

  1. We add a regularization term, replacing R_W with R_W + rI, to ensure numerical stability, where r = 10⁻⁴ is the regularization parameter and I is the identity matrix; the sketch below illustrates the effect.
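For illustration: a covariance matrix estimated from fewer samples than feature dimensions is rank-deficient, so a Cholesky factorization or inverse computed on it fails; adding rI shifts every eigenvalue up by r. A minimal NumPy sketch (the data and names are illustrative assumptions, not the paper's code):

```python
import numpy as np

r = 1e-4                                  # regularization parameter from the note
X = np.random.randn(8, 32)                # 8 samples, 32 features
R_W = X.T @ X / (X.shape[0] - 1)          # sample covariance: rank <= 7, singular
R_W_reg = R_W + r * np.eye(R_W.shape[0])  # R_W + rI: eigenvalues shifted up by r
L = np.linalg.cholesky(R_W_reg)           # succeeds; the raw R_W would typically fail
```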


Acknowledgments

This work was supported by the Natural Science Foundation of China (Nos. 61771091 and 61871066), the National High Technology Research and Development Program (863 Program) of China (No. 2015AA016306), the Natural Science Foundation of Liaoning Province of China (No. 20170540159), and the Fundamental Research Funds for the Central Universities of China (No. DUT17LAB04).

Author information


Corresponding author

Correspondence to Zhe Chen.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Multi-view Learning. Guest Editors: Guoqing Chao, Xingquan Zhu, Weiping Ding, Jinbo Bi and Shiliang Sun.


About this article


Cite this article

Xu, X., Chen, Z. & Yin, F. Deep mutual information multi-view representation for visual recognition. Appl Intell 52, 14888–14904 (2022). https://doi.org/10.1007/s10489-022-03462-y

