Semi-supervised Viewpoint Estimation with Geometry-Aware Conditional Generation

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12536))

Included in the following conference series:

European Conference on Computer Vision

2026 Accesses
3 Citations
8 Altmetric

Abstract

There is a growing interest in developing computer vision methods that can learn from limited supervision. In this paper, we consider the problem of learning to predict camera viewpoints, where obtaining ground-truth annotations are expensive and require special equipment, from a limited number of labeled images. We propose a semi-supervised viewpoint estimation method that can learn to infer viewpoint information from unlabeled image pairs, where two images differ by a viewpoint change. In particular our method learns to synthesize the second image by combining the appearance from the first one and viewpoint from the second one. We demonstrate that our method significantly improves the supervised techniques, especially in the low-label regime and outperforms the state-of-the-art semi-supervised methods.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation

Shape and Viewpoint Without Keypoints

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

References

Berthelot, D., Carlini, N., Goodfellow, I., Papernot, N., Oliver, A., Raffel, C.: MixMatch: a holistic approach to semi-supervised learning. arXiv:1905.02249 (2019)
Chang, A.X., et al.: ShapeNet: an information-rich 3D model repository. arXiv preprint arXiv:1512.03012 (2015)
Cohen, T.S., Geiger, M., Köhler, J., Welling, M.: Spherical CNNs. In: Proceedings of the International Conference on Learning Representations (ICLR) (2018)
Google Scholar
Esteves, C., Allen-Blanchette, C., Makadia, A., Daniilidis, K.: 3D object classification and retrieval with spherical CNNs. arXiv preprint arXiv:1711.06721 (2017)
Esteves, C., Sud, A., Luo, Z., Daniilidis, K., Makadia, A.: Cross-domain 3D equivariant image embeddings. arXiv preprint arXiv:1812.02716 (2018)
Georgakis, G., Reza, M.A., Mousavian, A., Le, P.H., Košecká, J.: Multiview RGB-D dataset for object instance detection. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 426–434. IEEE (2016)
Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Grabner, A., Roth, P.M., Lepetit, V.: 3D pose estimation and 3D model retrieval for objects in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3022–3031 (2018)
Google Scholar
Hinterstoisser, S., et al.: Multimodal templates for real-time detection of texture-less objects in heavily cluttered scenes. In: International Conference on Computer Vision, pp. 858–865. IEEE (2011)
Google Scholar
Hinterstoisser, S., et al.: Model based training, detection and pose estimation of texture-less 3D objects in heavily cluttered scenes. In: Lee, K.M., Matsushita, Y., Rehg, J.M., Hu, Z. (eds.) ACCV 2012. LNCS, vol. 7724, pp. 548–562. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37331-2_42
Chapter Google Scholar
Hodan, T., Haluza, P., Obdržálek, Š., Matas, J., Lourakis, M., Zabulis, X.: T-LESS: an RGB-D dataset for 6D pose estimation of texture-less objects. In: 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 880–888. IEEE (2017)
Google Scholar
Insafutdinov, E., Dosovitskiy, A.: Unsupervised learning of shape and pose with differentiable point clouds. In: Advances in Neural Information Processing Systems, pp. 2802–2812 (2018)
Google Scholar
Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks. In: Advances in Neural Information Processing Systems, pp. 2017–2025 (2015)
Google Scholar
Jakab, T., Gupta, A., Bilen, H., Vedaldi, A.: Conditional image generation for learning the structure of visual objects. Methods 43, 44 (2018)
Google Scholar
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 694–711. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_43
Chapter Google Scholar
Joung, S., et al.: Cylindrical convolutional networks for joint object detection and viewpoint estimation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14163–14172 (2020)
Google Scholar
Kanazawa, A., Tulsiani, S., Efros, A.A., Malik, J.: Learning category-specific mesh reconstruction from image collections. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 371–386 (2018)
Google Scholar
Kanezaki, A., Matsushita, Y., Nishida, Y.: RotationNet: joint object categorization and pose estimation using multiviews from unsupervised viewpoints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5010–5019 (2018)
Google Scholar
Kato, H., Ushiku, Y., Harada, T.: Neural 3D mesh renderer. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3907–3916 (2018)
Google Scholar
Kehl, W., Manhardt, F., Tombari, F., Ilic, S., Navab, N.: SSD-6D: making RGB-based 3D detection and 6D pose estimation great again. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1521–1529 (2017)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
Kulkarni, T.D., Whitney, W.F., Kohli, P., Tenenbaum, J.: Deep convolutional inverse graphics network. In: Advances in Neural Information Processing Systems, pp. 2539–2547 (2015)
Google Scholar
Liao, S., Gavves, E., Snoek, C.G.: Spherical regression: learning viewpoints, surface normals and 3D rotations on n-spheres. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9759–9767 (2019)
Google Scholar
Liebelt, J., Schmid, C.: Multi-view object class detection with a 3D geometric model. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 1688–1695. IEEE (2010)
Google Scholar
Mahendran, S., Ali, H., Vidal, R.: 3D pose regression using convolutional neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2174–2182 (2017)
Google Scholar
Mustikovela, S.K., et al.: Self-supervised viewpoint learning from image collections. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3971–3981 (2020)
Google Scholar
Nguyen-Phuoc, T., Li, C., Theis, L., Richardt, C., Yang, Y.L.: HoloGAN: unsupervised learning of 3D representations from natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 7588–7597 (2019)
Google Scholar
Rad, M., Lepetit, V.: BB8: a scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3828–3836 (2017)
Google Scholar
Rhodin, H., Salzmann, M., Fua, P.: Unsupervised geometry-aware representation for 3D human pose estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 750–767 (2018)
Google Scholar
Su, H., Qi, C.R., Li, Y., Guibas, L.J.: Render for CNN: viewpoint estimation in images using CNNs trained with rendered 3D model views. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2686–2694 (2015)
Google Scholar
Sundermeyer, M., Marton, Z.C., Durner, M., Brucker, M., Triebel, R.: Implicit 3D orientation learning for 6D object detection from RGB images. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 699–715 (2018)
Google Scholar
Suwajanakorn, S., Snavely, N., Tompson, J.J., Norouzi, M.: Discovery of latent 3D keypoints via end-to-end geometric reasoning. In: Advances in Neural Information Processing Systems, pp. 2059–2070 (2018)
Google Scholar
Tan, V., Budvytis, I., Cipolla, R.: Indirect deep structured learning for 3D human body shape and pose prediction. In: British Machine Vision Conference (BMVC) (2018)
Google Scholar
Tarvainen, A., Valpola, H.: Mean teachers are better role models: weight-averaged consistency targets improve semi-supervised deep learning results. In: Advances in Neural Information Processing Systems, pp. 1195–1204 (2017)
Google Scholar
Tekin, B., Sinha, S.N., Fua, P.: Real-time seamless single shot 6D object pose prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 292–301 (2018)
Google Scholar
Thewlis, J., Bilen, H., Vedaldi, A.: Unsupervised learning of object frames by dense equivariant image labelling. In: Advances in Neural Information Processing Systems, pp. 844–855 (2017)
Google Scholar
Thewlis, J., Bilen, H., Vedaldi, A.: Unsupervised learning of object landmarks by factorized spatial embeddings. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5916–5925 (2017)
Google Scholar
Thewlis, J., Bilen, H., Vedaldi, A.: Modelling and unsupervised learning of symmetric deformable object categories. In: Advances in Neural Information Processing Systems, pp. 8178–8189 (2018)
Google Scholar
Tobin, J., Fong, R., Ray, A., Schneider, J., Zaremba, W., Abbeel, P.: Domain randomization for transferring deep neural networks from simulation to the real world. In: 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 23–30. IEEE (2017)
Google Scholar
Tulsiani, S., Efros, A.A., Malik, J.: Multi-view consistency as supervisory signal for learning shape and pose prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2897–2905 (2018)
Google Scholar
Tulsiani, S., Malik, J.: Viewpoints and keypoints. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1510–1519 (2015)
Google Scholar
Worrall, D.E., Garbin, S.J., Turmukhambetov, D., Brostow, G.J.: Interpretable transformations with encoder-decoder networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5726–5735 (2017)
Google Scholar
Xiang, Y., Mottaghi, R., Savarese, S.: Beyond pascal: a benchmark for 3D object detection in the wild. In: IEEE Winter Conference on Applications of Computer Vision, pp. 75–82. IEEE (2014)
Google Scholar
Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848 (2019)
Yan, X., Yang, J., Yumer, E., Guo, Y., Lee, H.: Perspective transformer nets: learning single-view 3D object reconstruction without 3D supervision. In: Advances in Neural Information Processing Systems, pp. 1696–1704 (2016)
Google Scholar
Yang, G., Cui, Y., Belongie, S., Hariharan, B.: Learning single-view 3D reconstruction with limited pose supervision. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 86–101 (2018)
Google Scholar
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: beyond empirical risk minimization. In: International Conference on Machine Learning (2018)
Google Scholar
Zhou, X., Karpur, A., Luo, L., Huang, Q.: Starmap for category-agnostic keypoint and viewpoint estimation. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 318–334 (2018)
Google Scholar

Download references

Acknowledgements

The authors acknowledge the support of Toyota Motor Europe.

Author information

Authors and Affiliations

School of Informatics, University of Edinburgh, Edinburgh, UK
Octave Mariotti & Hakan Bilen

Authors

Octave Mariotti
View author publications
You can also search for this author in PubMed Google Scholar
Hakan Bilen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Octave Mariotti .

Editor information

Editors and Affiliations

University of Clermont Auvergne, Clermont Ferrand, France
Adrien Bartoli
Università degli Studi di Udine, Udine, Italy
Andrea Fusiello

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mariotti, O., Bilen, H. (2020). Semi-supervised Viewpoint Estimation with Geometry-Aware Conditional Generation. In: Bartoli, A., Fusiello, A. (eds) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science(), vol 12536. Springer, Cham. https://doi.org/10.1007/978-3-030-66096-3_42

Download citation

DOI: https://doi.org/10.1007/978-3-030-66096-3_42
Published: 03 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-66095-6
Online ISBN: 978-3-030-66096-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Semi-supervised Viewpoint Estimation with Geometry-Aware Conditional Generation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation

Shape and Viewpoint Without Keypoints

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Semi-supervised Viewpoint Estimation with Geometry-Aware Conditional Generation

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

AUTO3D: Novel View Synthesis Through Unsupervisely Learned Variational Viewpoint and Global 3D Representation

Shape and Viewpoint Without Keypoints

RelPose: Predicting Probabilistic Relative Rotation for Single Objects in the Wild

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation