Multimodal information fusion based on LSTM for 3D model retrieval

Multimedia Tools and Applications

Abstract

With advances in low-cost 3D capture devices and 3D modeling software, acquiring 3D data has become increasingly easy, and effective 3D model retrieval has become essential for making use of these models. Although many methods have been proposed to address this problem, most of them cannot fully utilize the information represented by 3D models. To solve this problem, we present a multimodal feature fusion method based on the LSTM network. First, we place cameras evenly around the 3D model at a fixed distance, each aimed at the centroid of the model, to obtain a series of rendered pictures. Second, skeleton information is extracted from these rendered pictures. Finally, the rendered pictures, along with the skeleton information, are sequentially fed into the LSTM network to obtain a feature of the fused information. A confusion matrix is computed to evaluate retrieval performance. In the experiments, the NTU and ModelNet40 datasets are used to demonstrate the performance of the proposed method, and the experimental results demonstrate the superiority of our approach.
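The view-capture step in the abstract (cameras spaced evenly around the model at a fixed distance, each aimed at the centroid) can be illustrated with a minimal sketch. The function names `centroid_of` and `camera_ring`, the single ring at a fixed elevation, and the choice of 12 views are assumptions made for illustration; the abstract does not state the number of cameras or their exact layout.

```python
import math


def centroid_of(vertices):
    """Mean of the vertex coordinates (assumed definition of the model centroid)."""
    n = len(vertices)
    return tuple(sum(v[i] for v in vertices) / n for i in range(3))


def camera_ring(centroid, radius, n_views, elevation_deg=0.0):
    """Place n_views cameras evenly on a circle around the centroid.

    Returns a list of (position, view_direction) pairs. Every camera sits at
    distance `radius` from the centroid and looks straight at it.
    """
    if radius <= 0 or n_views <= 0:
        raise ValueError("radius and n_views must be positive")
    cx, cy, cz = centroid
    elev = math.radians(elevation_deg)
    cameras = []
    for k in range(n_views):
        azim = 2.0 * math.pi * k / n_views  # evenly spaced azimuth angles
        # Spherical-to-Cartesian offset at a fixed distance from the centroid.
        dx = radius * math.cos(elev) * math.cos(azim)
        dy = radius * math.cos(elev) * math.sin(azim)
        dz = radius * math.sin(elev)
        position = (cx + dx, cy + dy, cz + dz)
        # Unit vector pointing from the camera back toward the centroid.
        direction = (-dx / radius, -dy / radius, -dz / radius)
        cameras.append((position, direction))
    return cameras


# Hypothetical usage: a toy "model" given by two vertices, rendered from 12 views.
model_vertices = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]
views = camera_ring(centroid_of(model_vertices), radius=2.0, n_views=12,
                    elevation_deg=30.0)
```

Each returned pair would then be handed to a renderer to produce one picture of the sequence; the rendering itself is outside the scope of this sketch.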

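The fusion step, in which per-view picture features and skeleton features are fed sequentially into an LSTM, can be sketched as follows. This is not the authors' network: it is a plain single-layer LSTM cell written in standard-library Python with random, untrained weights, and it assumes that the two modalities are concatenated per view and that the final hidden state serves as the fused descriptor. The names `init_lstm`, `lstm_step`, and `fuse_views` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_lstm(input_dim, hidden_dim, seed=0):
    """Random (untrained) weights for the gates i, f, o and the candidate g."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {g: (mat(hidden_dim, input_dim + hidden_dim), [0.0] * hidden_dim)
            for g in "ifog"}


def lstm_step(params, x, h, c):
    """One standard LSTM update on the concatenated vector [x, h]."""
    z = x + h  # list concatenation
    gate = {}
    for name, (W, b) in params.items():
        pre = [p + bi for p, bi in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gate[name] = [act(p) for p in pre]
    c_new = [f * ci + i * g
             for f, ci, i, g in zip(gate["f"], c, gate["i"], gate["g"])]
    h_new = [o * math.tanh(cn) for o, cn in zip(gate["o"], c_new)]
    return h_new, c_new


def fuse_views(params, view_feats, skel_feats, hidden_dim):
    """Feed per-view [picture feature, skeleton feature] vectors in sequence.

    Assumption: modalities are concatenated per view, and the last hidden
    state is taken as the fused descriptor of the 3D model.
    """
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for v, s in zip(view_feats, skel_feats):
        h, c = lstm_step(params, v + s, h, c)
    return h


# Hypothetical usage: 3 views, 4-dim picture features, 2-dim skeleton features.
params = init_lstm(input_dim=6, hidden_dim=4)
view_feats = [[0.1, 0.2, 0.3, 0.4], [0.2, 0.1, 0.0, 0.3], [0.4, 0.3, 0.2, 0.1]]
skel_feats = [[0.5, 0.1], [0.4, 0.2], [0.3, 0.3]]
descriptor = fuse_views(params, view_feats, skel_feats, hidden_dim=4)
```

In practice the weights would be learned and the per-view features would come from an image feature extractor; a library LSTM layer would replace the hand-written cell.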
Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (61472275, 61170239, 61303208, 61502337).

Author information

Corresponding authors

Correspondence to Ning Xu or Weijie Wang.

About this article

Cite this article

Liang, Q., Xu, N., Wang, W. et al. Multimodal information fusion based on LSTM for 3D model retrieval. Multimed Tools Appl 79, 33943–33956 (2020). https://doi.org/10.1007/s11042-020-08817-6

  • DOI: https://doi.org/10.1007/s11042-020-08817-6