Cross Diffusion on Multi-hypergraph for Multi-modal 3D Object Recognition

Zizhao Zhang, Haojie Lin, Junjie Zhu, Xibin Zhao & Yue Gao

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11164)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

3D object recognition is a longstanding task in computer vision with wide applications in computer-aided design, virtual reality, and related areas. Current state-of-the-art methods mainly focus on 3D object representation for recognition. Given that 3D objects often come with multi-modal representations in practice, effectively combining such multi-modal information for recognition remains a challenging and pressing problem. In this paper, we perform 3D object recognition from multi-modal information through a cross diffusion process on a multi-hypergraph structure. Given multi-modal representations of 3D objects, the correlation among the objects is formulated with a multi-hypergraph structure in which one hypergraph is built for each representation separately, which makes it possible to model complex relationships among objects. To combine the multi-modal representations, we propose a cross diffusion process on the multi-hypergraph, in which label information is propagated over the multiple hypergraphs alternately. In this way, the multi-modal information is jointly combined through cross diffusion on the multi-hypergraph structure. We apply the proposed method to 3D object recognition using multiple representations and evaluate it with extensive experiments on two public 3D object datasets. Experimental results demonstrate that the proposed method achieves satisfactory multi-modal combination performance and outperforms current state-of-the-art methods.
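
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea (one hypergraph per modality, labels propagated over the hypergraphs in turn), not the authors' algorithm. It assumes k-nearest-neighbour hyperedges, the normalized operator Θ = Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2) commonly used in hypergraph learning, and a simple label-propagation update F ← αΘF + (1 − α)Y. All function names, the toy features, and the parameters k, alpha, and n_rounds are hypothetical.

```python
import numpy as np

def knn_hypergraph_incidence(features, k=2):
    """Incidence matrix H (vertices x hyperedges). Assumption: one hyperedge
    per object, joining it with its k nearest neighbours (Euclidean)."""
    n = features.shape[0]
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
    H = np.zeros((n, n))
    for centroid in range(n):
        members = np.argsort(dists[centroid])[: k + 1]  # centroid + k neighbours
        H[members, centroid] = 1.0
        H[centroid, centroid] = 1.0  # guard against ties displacing the centroid
    return H

def hypergraph_operator(H, edge_weights=None):
    """Theta = Dv^-1/2 H W De^-1 H^T Dv^-1/2 (unit edge weights by default)."""
    w = np.ones(H.shape[1]) if edge_weights is None else edge_weights
    dv = H @ w            # vertex degrees
    de = H.sum(axis=0)    # hyperedge degrees
    scaled = H / np.sqrt(dv)[:, None]
    return scaled @ np.diag(w / de) @ scaled.T

def cross_diffusion(thetas, Y, alpha=0.9, n_rounds=50):
    """Alternate one propagation step per hypergraph, re-injecting labels Y."""
    F = Y.astype(float).copy()
    for _ in range(n_rounds):
        for theta in thetas:  # visit the modalities in turn
            F = alpha * (theta @ F) + (1 - alpha) * Y
    return F

# Toy data: six objects, two classes, two hypothetical modalities.
modality_a = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
modality_b = np.array([[0.0], [0.2], [0.1], [3.0], [3.2], [3.1]])

Y = np.zeros((6, 2))
Y[0, 0] = 1.0  # object 0 is labelled class 0
Y[3, 1] = 1.0  # object 3 is labelled class 1

thetas = [hypergraph_operator(knn_hypergraph_incidence(m, k=2))
          for m in (modality_a, modality_b)]
F = cross_diffusion(thetas, Y)
pred = F.argmax(axis=1)  # predicted class per object
print(pred)
```

Because each Θ has spectral norm at most 1, choosing alpha below 1 keeps the alternating update from diverging; how the paper actually weights or schedules the hypergraphs is not stated in the abstract.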

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2017YFC0113000) and the National Natural Science Funds of China (U1701262, 61671267).

Author information

Authors and Affiliations

Beijing National Research Center for Information Science and Technology KLISS, School of Software, Tsinghua University, Beijing, China
Zizhao Zhang, Haojie Lin, Junjie Zhu, Xibin Zhao & Yue Gao

Corresponding author

Correspondence to Yue Gao.

Editor information

Editors and Affiliations

Hefei University of Technology, Hefei, China
Richang Hong
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
University of Tokyo, Tokyo, Japan
Toshihiko Yamasaki
Hefei University of Technology, Hefei, China
Meng Wang
City University of Hong Kong, Hong Kong, Hong Kong
Chong-Wah Ngo

About this paper

Cite this paper

Zhang, Z., Lin, H., Zhu, J., Zhao, X., Gao, Y. (2018). Cross Diffusion on Multi-hypergraph for Multi-modal 3D Object Recognition. In: Hong, R., Cheng, WH., Yamasaki, T., Wang, M., Ngo, CW. (eds) Advances in Multimedia Information Processing – PCM 2018. PCM 2018. Lecture Notes in Computer Science, vol 11164. Springer, Cham. https://doi.org/10.1007/978-3-030-00776-8_4

DOI: https://doi.org/10.1007/978-3-030-00776-8_4
Published: 19 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00775-1
Online ISBN: 978-3-030-00776-8
eBook Packages: Computer Science, Computer Science (R0)