Abstract
Deep learning on 3D point clouds is a promising field of research that allows a neural network to learn features of point clouds directly, making it a robust tool for solving 3D scene understanding tasks. While recent works show that point cloud convolutions can be invariant to translation and point permutation, investigations of the rotation invariance property for point cloud convolution have so far been scarce. Although some existing methods perform point cloud convolutions with rotation-invariant features, they generally do not perform as well as their translation-invariant-only counterparts. In this work, we argue that a key reason is that, compared to point coordinates, the rotation-invariant features consumed by point cloud convolution are not as distinctive. To address this problem, we propose a simple yet effective convolution operator that enhances feature distinction by designing powerful rotation-invariant features from the local regions. We consider the relationship between the point of interest and its neighbors, as well as the internal relationships among the neighbors, to largely improve the feature descriptiveness. Our network architecture can capture both local and global context by simply tuning the neighborhood size in each convolution layer. We conduct several experiments on synthetic and real-world point cloud classification, part segmentation, and shape retrieval to evaluate our method, which achieves state-of-the-art accuracy under challenging rotations.
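The core idea of rotation-invariant features can be illustrated with a minimal sketch: distances between points and angles between difference vectors are unchanged by any rigid rotation, so descriptors built from them are rotation invariant by construction. The snippet below is an illustrative example of this principle only, not the authors' exact RIConv++ feature design; the function name and the choice of distances and angles are assumptions for demonstration.

```python
import numpy as np

def rotation_invariant_features(reference, neighbors):
    """Illustrative rotation-invariant descriptors for one local region.

    `reference` is the point of interest, shape (3,); `neighbors` is (k, 3).
    Each neighbor is described by its distance to the reference point, its
    distance to the neighborhood centroid, and the cosine of the angle it
    subtends between the two — all quantities preserved under rotation.
    """
    centroid = neighbors.mean(axis=0)
    feats = []
    for p in neighbors:
        d_ref = np.linalg.norm(p - reference)   # neighbor-to-reference distance
        d_cen = np.linalg.norm(p - centroid)    # neighbor-to-centroid distance
        # angle at the neighbor between directions to reference and centroid
        v1, v2 = reference - p, centroid - p
        denom = np.linalg.norm(v1) * np.linalg.norm(v2)
        cos_a = np.dot(v1, v2) / denom if denom > 1e-12 else 1.0
        feats.append([d_ref, d_cen, cos_a])
    return np.asarray(feats)  # (k, 3) rotation-invariant features

# Rotating the whole region leaves the features unchanged:
rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
ref, nbrs = pts[0], pts[1:]
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix
f1 = rotation_invariant_features(ref, nbrs)
f2 = rotation_invariant_features(ref @ Q.T, nbrs @ Q.T)
print(np.allclose(f1, f2))  # True: features survive the rotation
```

Features of this kind are what a rotation-invariant convolution consumes in place of raw coordinates; the paper's contribution lies in choosing such features to be as distinctive as possible.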
Acknowledgements
We thank the anonymous reviewers for their constructive comments. This research project is supported by a grant from the Ningbo Research Institute of Zhejiang University (1149957B20210125), and partially supported by an internal grant from HKUST (R9429).
Additional information
Communicated by A. Hilton.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Zhang, Z., Hua, BS. & Yeung, SK. RIConv++: Effective Rotation Invariant Convolutions for 3D Point Clouds Deep Learning. Int J Comput Vis 130, 1228–1243 (2022). https://doi.org/10.1007/s11263-022-01601-z