
3D Congealing: 3D-Aware Image Alignment in the Wild

Published: 30 September 2024
DOI: 10.1007/978-3-031-73232-4_22

Abstract

We propose 3D Congealing, the novel task of 3D-aware alignment for 2D images capturing semantically similar objects. Given a collection of unlabeled Internet images, our goal is to associate the shared semantic parts across the inputs and aggregate the knowledge from 2D images into a shared 3D canonical space. We introduce a general framework that tackles this task without assuming shape templates, poses, or any camera parameters. At its core is a canonical 3D representation that encapsulates geometric and semantic information. The framework jointly optimizes this canonical representation, the pose of each input image, and a per-image coordinate map that warps 2D pixel coordinates to the 3D canonical frame to account for shape matching. The optimization fuses prior knowledge from a pre-trained image generative model with semantic information from the input images: the former provides strong guidance for this under-constrained task, while the latter mitigates the training-data bias of the pre-trained model. Our framework supports tasks such as pose estimation and image editing, achieving strong results on real-world image datasets under challenging illumination conditions and on in-the-wild online image collections. Project page: https://ai.stanford.edu/~yzzhang/projects/3d-congealing/.
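The abstract describes a joint optimization over three sets of variables: the canonical 3D representation, per-image poses, and per-image coordinate maps. Below is a minimal PyTorch sketch of that structure, not the authors' implementation. CanonicalField, WarpMLP, the toy render, and generative_prior_loss are hypothetical placeholders; a real system would use a NeRF/NeuS-style field with proper volume rendering, score-distillation guidance from a pre-trained diffusion model, and DINO-like semantic features. The sketch only shows which quantities are optimized together and where each loss term enters.

```python
# Hedged sketch of the joint optimization; all components are simplified
# stand-ins for illustration, not the paper's actual modules.
import torch
import torch.nn as nn

N, FEAT = 8, 64                               # number of images, feature dim

class CanonicalField(nn.Module):
    """Stand-in for the canonical 3D representation (geometry + semantics)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                 nn.Linear(128, 1 + 3 + FEAT))
    def forward(self, xyz):                   # xyz: (..., 3) canonical coords
        out = self.mlp(xyz)
        return out[..., :1], out[..., 1:4], out[..., 4:]  # density, rgb, feat

class WarpMLP(nn.Module):
    """Per-image map from 2D pixel coordinates to the 3D canonical frame."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))
    def forward(self, uv):                    # uv: (..., 2) pixel coords
        return self.mlp(uv)

def se3_rotate(points, pose):
    # Toy pose application: small-angle rotation plus translation.
    w = pose[:3]
    return points + torch.cross(w.expand_as(points), points, dim=-1) + pose[3:]

def render(field, pose):
    # Toy "renderer": query the field at posed sample points and composite
    # colors weighted by density (a crude stand-in for volume rendering).
    pts = se3_rotate(torch.rand(1024, 3) - 0.5, pose)
    density, rgb, _ = field(pts)
    return (torch.sigmoid(density) * rgb).mean(dim=0)

def generative_prior_loss(rendered):
    # Placeholder for score-distillation-style guidance (in the spirit of
    # DreamFusion); a real implementation queries a frozen diffusion model.
    return rendered.pow(2).mean()

canonical = CanonicalField()
poses = nn.Parameter(torch.zeros(N, 6))       # per-image se(3) pose params
warps = nn.ModuleList(WarpMLP() for _ in range(N))
images_feat = torch.randn(N, 1024, FEAT)      # placeholder DINO-like features
pixels = torch.rand(1024, 2)                  # pixel grid in [0, 1]^2

opt = torch.optim.Adam([poses, *canonical.parameters(), *warps.parameters()],
                       lr=1e-3)

for step in range(1000):
    i = step % N
    # Generative prior on renderings of the canonical shape under pose i.
    loss = generative_prior_loss(render(canonical, poses[i]))
    # Semantic term: canonical features sampled through the per-image warp
    # should match pre-trained features of input image i.
    _, _, feat = canonical(warps[i](pixels))
    loss = loss + (feat - images_feat[i]).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```

The two loss terms mirror the two information sources in the abstract: the generative prior constrains the otherwise under-constrained 3D reconstruction, while the per-image semantic matching grounds the canonical space in the actual input collection.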

    Published In

Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part I
September 2024, 580 pages
ISBN: 978-3-031-73231-7
DOI: 10.1007/978-3-031-73232-4
Editors: Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, Gül Varol

    Publisher

Springer-Verlag, Berlin, Heidelberg
