Multimodal information fusion based on LSTM for 3D model retrieval

Qi Liang¹, Ning Xu¹, Weijie Wang¹ & Xingjian Long¹

Abstract

With advances in low-cost 3D capture devices and virtual 3D modeling software, acquiring 3D data has become increasingly easy, and 3D model retrieval has become essential for making use of the resulting models. Although a large number of methods have been proposed to address this problem, most of them cannot fully utilize the information represented by 3D models. To solve this problem, we present a multimodal feature fusion method based on an LSTM network. First, we place cameras evenly around the 3D model at a fixed distance, each aimed at the centroid of the model, to obtain a series of rendered pictures. Second, skeleton information is extracted from these rendered pictures. Finally, the rendered pictures, along with the skeleton information, are fed sequentially into the LSTM network to obtain a feature of the fused information. A confusion matrix is used to evaluate retrieval performance. In the experiments, the NTU and ModelNet40 datasets are used to evaluate the proposed method, and the experimental results demonstrate the superiority of our approach.
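
The view-capture step described in the abstract (cameras spaced evenly around the model at a fixed distance, each aimed at the centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `centroid` and `ring_cameras`, the single ring layout, the vertex-mean centroid, and the default values for `num_views`, `distance`, and `elevation_deg` are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

def centroid(vertices):
    """Mean of the vertex positions (assumption: a plain vertex average)."""
    n = len(vertices)
    return tuple(sum(v[i] for v in vertices) / n for i in range(3))

def ring_cameras(vertices, num_views=12, distance=2.0, elevation_deg=30.0):
    """Place num_views cameras evenly on a ring around the model's centroid.

    Returns a list of (position, view_direction) pairs. Each view direction
    is a unit vector pointing from the camera toward the centroid.
    The default parameter values are illustrative, not taken from the paper.
    """
    cx, cy, cz = centroid(vertices)
    elev = math.radians(elevation_deg)
    cameras = []
    for k in range(num_views):
        # Evenly spaced azimuth angles around the vertical (z) axis.
        azim = 2.0 * math.pi * k / num_views
        # Spherical-to-Cartesian offset from the centroid at a fixed distance.
        px = cx + distance * math.cos(elev) * math.cos(azim)
        py = cy + distance * math.cos(elev) * math.sin(azim)
        pz = cz + distance * math.sin(elev)
        # Unit vector from the camera position to the centroid.
        direction = ((cx - px) / distance,
                     (cy - py) / distance,
                     (cz - pz) / distance)
        cameras.append(((px, py, pz), direction))
    return cameras
```

Calling `ring_cameras` on a model's vertex list yields one camera pose per view. In a pipeline like the one the abstract outlines, each pose would be passed to a renderer, and the resulting image sequence (together with the extracted skeleton information) would form the ordered input to the LSTM.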

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61472275, 61170239, 61303208, 61502337).

Author information

Authors and Affiliations

The School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Qi Liang, Ning Xu, Weijie Wang & Xingjian Long

Corresponding authors

Correspondence to Ning Xu or Weijie Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liang, Q., Xu, N., Wang, W. et al. Multimodal information fusion based on LSTM for 3D model retrieval. Multimed Tools Appl 79, 33943–33956 (2020). https://doi.org/10.1007/s11042-020-08817-6

Received: 13 January 2019
Revised: 11 February 2020
Accepted: 28 February 2020
Published: 11 April 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s11042-020-08817-6
