Abstract
With advances in low-cost 3D model capturing devices and virtual 3D model building software, the acquisition of 3D data has become increasingly easier. The subsequent 3D model retrieval skill has also become essential when we utilize 3D models. Although a large number of methods have been proposed to address this problem, most of them cannot fully utilize the information represented by 3D models. To solve this problem. We present a multimodal feature fusion method based on the LSTM network. First, we placed some cameras evenly around the 3D model at a fixed distance, which was aimed at the centroid of the 3D model to obtain a series of pictures. Second, the skeleton information is extracted from these rendered pictures. Finally, the rendered pictures, along with the skeleton information, were sequentially fed into the LSTM network to obtain the feature of the fusion information. The confusion matrix was completed to evaluate retrieval performance. In the experiment section, datasets named NTU and ModelNet40 were utilized to demonstrate the performance of the proposed method. Many experiments and corresponding experimental results also demonstrated the superiority of our approach.
Similar content being viewed by others
References
Akgül CB, Sankur B, Yemez Y, Schmitt F (2009) 3D model retrieval using probability density-based shape descriptors. IEEE Trans Pattern Anal Mach Intell 31 (6):1117–1133
Ansary TF, Daoudi M, Vandeborre JP (2006) A bayesian 3-d search engine using adaptive views clustering. IEEE Trans Multimed 9(1):78–88
Bai S, Bai X, Zhou Z, Zhang Z, Tian Q, Latecki LJ (2017) Gift: towards scalable 3d shape retrieval. IEEE Trans Multimed 19(6):1257–1271. https://doi.org/10.1109/TMM.2017.2652071
Bu S, Liu Z, Han J, Wu J, Ji R (2014) Learning high-level feature by deep belief networks for 3-d model retrieval and recognition. IEEE Trans Multimed 16 (8):2154–2167
Bustos B (2005) Feature-based similarity search in 3d object databases. Acm Computing Surveys 37(4):345–387
Cao B, Kang Y, Lin S, Luo X, Xu S, Lv Z (2016) Style-sensitive 3d model retrieval through sketch-based queries. J Intell Fuzzy Sys 31(5):2637–2644
Conrad M, De Doncker RW, Schniedenharn M, Diatlov A (2014) Packaging for power semiconductors based on the 3d printing technology selective laser melting. In: European conference on power electronics and applications, pp 1–7
Feng Y, Zizhao Z, Zhao X, Ji R, Gao Y (2018) Gvcnn: group-view convolutional neural networks for 3d shape recognition, pp 264–272. https://doi.org/10.1109/CVPR.2018.00035
Furuya T, Ohbuchi R (2016) Deep aggregation of local 3d geometric features for 3d model retrieval
Gao Y, Dai Q (2014) View-based 3d object retrieval: challenges and approaches. IEEE Multimed 21(3):52–57
Gao Y, Tang J, Hong R, Yan S (2012) Camera constraint-free view-based 3-d object retrieval. IEEE Transactions on Image Processing A Publication of the IEEE Signal Processing Society 21(4):2269–2281
Hu F, Xia G-S, Hu J, Zhang L (2015) Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery. Remote Sens 7(11):14680–14707
Irfanoglu M, Gokberk B, Akarun L (2004) 3d shape-based face recognition using registered surface similarity. In: Proceedings of the IEEE 12th signal processing and communications applications conference, 2004. IEEE, pp 571–574
Kanezaki A, Matsushita Y, Nishida Y (2018) Rotationnet: joint learning of object classification and viewpoint estimation using unaligned 3d object dataset. arXiv:1603.06208
Kazhdan M, Funkhouser T, Rusinkiewicz S (2003) Rotation invariant spherical harmonic representation of 3 d shape descriptors. In: Symposium on geometry processing, vol 6, pp 156–164
Leng B, Guo S, Du C, Zeng J, Xiong Z (2017) 3D object retrieval based on viewpoint segmentation. Multimed Sys 23(1):19–28
Liu A, Nie W, Gao Y, Su Y (2016) Multi-modal clique-graph matching for view-based 3d model retrieval. IEEE Trans Image Process 25(5):2103–2116
Liu AA, Nie WZ, Gao Y, Su YT (2016) Multi-modal clique-graph matching for view-based 3d model retrieval. IEEE Trans Image Process 25(5):2103–2116
Liu A, Nie W, Gao Y, Su Y (2018) View-based 3-d model retrieval: a benchmark. IEEE Trans Sys Man Cybern 48:916–928
Liu A, Su Y, Nie W, Kankanhalli M (2016) Hierarchical clustering multi-task learning for joint human action grouping and recognition. IEEE Trans Pattern Anal Mach Intell 39(1):102–114
Liu A, Wang Z, Nie W, Su Y (2015) Graph-based characteristic view set extraction and matching for 3d model retrieval. Inf Sci 320:429–442
Liu A-A, Nie W-Z, Gao Y, Su Y-T (2018) View-based 3-d model retrieval: a benchmark. IEEE Trans Cybern 48(3):916–928
Liu Q (2012) A survey of recent view-based 3d model retrieval methods. arXiv:1208.3670
Ma C, Guo Y, Yang J, An W (2019) Learning multi-view representation with lstm for 3-d shape recognition and retrieval. IEEE Trans Multimed 21(5):1169–1182. https://doi.org/10.1109/TMM.2018.2875512
Nie L, Wang M, Zha Z, Li G, Chua T-S (2011) Multimedia answering: enriching text qa with media information. In: Proceedings of the 34th international ACM SIGIR conference on Research and development in information retrieval. ACM, pp 695–704
Nie L, Wang M, Zha Z-J, Chua T-S (2012) Oracle in image search: a content-based approach to performance prediction. ACM Trans Inform Sys (TOIS) 30 (2):13
Nie W, Liu A, Gao Y, Su Y (2019) Hyper-clique graph matching and applications. IEEE Trans Circ Sys Video Technol 29(6):1619–1630. https://doi.org/10.1109/TCSVT.2018.2852310
Nie W, Wang K, Wang H, Su Y (2019) The assessment of 3d model representation for retrieval with cnn-rnn networks. Multimed Tools Appl
Nie W, Wang W, Liu A, Chen C (2019) Characteristic views extraction modal based-on deep reinforcement learning for 3d model retrieval. pp 2389–2393. https://doi.org/10.1109/ICIP.2019.8803343
Papoiu AD, Emerson NM, Patel TS, Kraft RA, Valdes-Rodriguez R, Nattkemper LA, Coghill RC, Yosipovitch G (2014) Voxel-based morphometry and arterial spin labeling fmri reveal neuropathic and neuroplastic features of brain processing of itch in end-stage renal disease. J Neurophys 112(7):1729–38
Paquet E, Rioux M, Murching A, Naveen T, Tabatabai A (2000) Description of shape information for 2-d and 3-d objects. Signal Processing Image Communication 16(s 1–2):103–122
Pickup D, Sun X, Rosin PL, Martin RR, Cheng Z, Nie S, Jin L (2015) Canonical forms for non-rigid 3d shape retrieval. In: Eurographics workshop on 3d object retrieval, pp 99–106
Saupe D, Vranić DV (2001) 3d model retrieval with spherical harmonics and moments. In: Joint pattern recognition symposium. Springer, Berlin, pp 392–397
Sfikas K, Theoharis T, Pratikakis I (2017) Exploiting the panorama representation for convolutional neural network classification and retrieval. In: Eurographics workshop on 3d object retrieval
Shen W, Zhao K, Jiang Y, Wang Y, Bai X, Yuille A (2016) Deepskeleton: learning multi-task scale-associated deep side outputs for object skeleton extraction in natural images. IEEE Trans Image Process PP(99):1–1
Shen W, Zhao K, Jiang Y, Wang Y, Bai X, Yuille A (2017) Deepskeleton: learning multi-task scale-associated deep side outputs for object skeleton extraction in natural images. IEEE Trans Image Process 26(11):5298–5311
Shi B, Bai S, Zhou Z, Bai X (2015) Deeppano: deep panoramic representation for 3-d shape recognition. IEEE Signal Process Lett 22(12):2339–2343
Shinagawa Y, Kunii TL (1991) Constructing a reeb graph automatically from cross sections. IEEE Comput Graph Appl 11(6):44–51
Su H, Maji S, Kalogerakis E, Learnedmiller E (2015) Multi-view convolutional neural networks for 3d shape recognition, pp 945–953
Sundar H, Silver D, Gagvani N, Dickinson S (2003) Skeleton based shape matching and retrieval. In: Shape modeling international, p 130
Tangelder JW, Veltkamp RC (2003) Polyhedral model retrieval using weighted point sets. Int J Image Graph 3(01):209–229
Wang D, Wang B, Zhao S, Yao H, Liu H (2017) View-based 3d object retrieval with discriminative views. Neurocomputing 252(C):58–66
Wu Z, Song S, Khosla A, Yu F (2015) 3d shapenets: a deep representation for volumetric shapes. In: IEEE conference on computer vision and pattern recognition, pp 1912–1920
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J (2014) 3d shapenets: a deep representation for volumetric shapes, pp 1912–1920
Wu Z, Song S, Khosla A, Yu F, Zhang L, Tang X, Xiao J, Wu Z, Song S, Khosla A (2015) 3D shapenets a deep representation for volumetric shapes. In: IEEE conference on computer vision & pattern recognition
Xie J, Fang Y, Zhu F, Wong E (2015) Deepshape: deep learned shape descriptor for 3d shape matching and retrieval. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1275–1283
Xu K, Shi Y, Zheng L, Zhang J, Liu M, Huang H, Su H, Cohen-Or D, Chen B (2016) 3D attention-driven depth acquisition for object identification. ACM Transactions on Graphics (TOG) 35(6):238
Yang S, Ramanan D (2015) Multi-scale recognition with DAG-CNNs. In: 2015 IEEE International conference on computer vision, ICCV 2015, Santiago, Chile, December 7-13, 2015. IEEE Computer Society, pp 1215–1223, DOI https://doi.org/10.1109/ICCV.2015.144, (to appear in print)
Zhao S, Yao H, Zhang Y, Wang Y, Liu S (2015) View-based 3d object retrieval via multi-modal graph learning. Signal Process 112:110–118
Zhao S, Yao H, Zhang Y, Wang Y, Liu S (2015) View-based 3d object retrieval via multi-modal graph learning. Signal Process 112:110–118
Zhao X, Si S, Dui H, Cai Z, Sun S (2013) Integrated importance measure for multi-state coherent systems of k level. J Syst Eng Electron 24(6):1029–1037
Zhao X, Si S, Dui H, Cai Z, Wang J, Song X (2015) Compositional performance evaluation with importance measures. Communications in Statistics-Theory and Methods 44(24):5240–5253
Zhu L, Huang Z, Li Z, Xie L, Shen HT (2018) Exploring auxiliary context: discrete semantic transfer hashing for scalable image retrieval. IEEE Trans Neural Netw Learn Sys PP(99):1–13
Zhu L, Huang Z, Liu X, He X, Sun J, Zhou X (2017) Discrete multimodal hashing with canonical views for robust mobile landmark search. IEEE Trans Multimed 19(9):2066–2079
Zhu L, Shen J, Jin H, Zheng R, Xie L (2015) Content-based visual landmark search via multimodal hypergraph learning. IEEE Trans Sys Man Cybern 45 (12):2756–2769
Zhu L, Shen J, Xie L, Cheng Z (2017) Unsupervised visual hashing with semantic assistant for content-based image retrieval. IEEE Trans Knowl Data Eng 29 (2):472–486
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (61472275, 61170239, 61303208, 61502337).
Author information
Authors and Affiliations
Corresponding authors
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Liang, Q., Xu, N., Wang, W. et al. Multimodal information fusion based on LSTM for 3D model retrieval. Multimed Tools Appl 79, 33943–33956 (2020). https://doi.org/10.1007/s11042-020-08817-6
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-08817-6