Shape-From-Silhouette Across Time Part I: Theory and Algorithms

Kong-man (German) Cheung¹,
Simon Baker¹ &
Takeo Kanade¹

Abstract

Shape-From-Silhouette (SFS) is a shape reconstruction method which constructs a 3D shape estimate of an object using silhouette images of the object. The output of an SFS algorithm is known as the Visual Hull (VH). Traditionally, SFS is either performed on static objects, or separately at each time instant in the case of videos of moving objects. In this paper we develop a theory of performing SFS across time: estimating the shape of a dynamic object (with unknown motion) by combining all of the silhouette images of the object over time. We first introduce a one-dimensional element called a Bounding Edge to represent the Visual Hull. We then show that aligning two Visual Hulls using just their silhouettes is in general ambiguous and derive the geometric constraints (in terms of Bounding Edges) that govern the alignment. To break the alignment ambiguity, we combine stereo information with silhouette information and derive a Temporal SFS algorithm which consists of two steps: (1) estimate the motion of the objects over time (Visual Hull Alignment) and (2) combine the silhouette information using the estimated motion (Visual Hull Refinement). The algorithm is first developed for rigid objects and then extended to articulated objects. In Part II of this paper we apply our Temporal SFS algorithm to two human-related applications: (1) the acquisition of detailed human kinematic models and (2) marker-less motion tracking.
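
To make the refinement step concrete, the following is a minimal toy sketch rather than the algorithm of the paper. It assumes a voxel grid instead of Bounding Edges, two orthographic cameras looking along the x and z axes, and a rigid motion (a 90-degree rotation about z) that is already known rather than estimated from silhouettes and stereo. All names (`project`, `visual_hull`, `refine`, `rotate_z90`) are hypothetical and chosen only for illustration.

```python
from itertools import product

# Assumption: a 5x5x5 voxel grid centred on the origin, so that a
# 90-degree rotation about z maps the grid onto itself.
GRID = list(product(range(-2, 3), repeat=3))
VIEWS = (0, 2)  # assumption: orthographic cameras along the x and z axes


def project(p, axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return tuple(c for i, c in enumerate(p) if i != axis)


def silhouettes(obj):
    """Silhouette of a voxel set in every view, as a set of 2D pixels."""
    return {a: {project(p, a) for p in obj} for a in VIEWS}


def consistent(p, sils):
    """A voxel is consistent if it projects inside every silhouette."""
    return all(project(p, a) in sils[a] for a in VIEWS)


def visual_hull(sils):
    """Classical SFS at one time instant: keep all consistent voxels."""
    return {p for p in GRID if consistent(p, sils)}


def rotate_z90(p):
    """Assumed known rigid motion between the two time instants."""
    x, y, z = p
    return (-y, x, z)


def refine(hull, sils_t2, motion):
    """Carve the time-1 hull with the time-2 silhouettes via the motion."""
    return {p for p in hull if consistent(motion(p), sils_t2)}


obj_t1 = {(0, 0, 0), (1, 0, 1)}                 # toy object at time 1
obj_t2 = {rotate_z90(p) for p in obj_t1}        # same object at time 2
hull_t1 = visual_hull(silhouettes(obj_t1))
refined = refine(hull_t1, silhouettes(obj_t2), rotate_z90)
print(len(hull_t1), len(refined))  # 4 2
```

In this toy setting the single-instant hull contains four voxels, while carving with the second instant's silhouettes leaves exactly the two object voxels. The moving object effectively supplies extra viewpoints, which is the intuition behind combining silhouettes across time.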

Author information

Authors and Affiliations

The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
Kong-man (German) Cheung, Simon Baker & Takeo Kanade

Corresponding author

Correspondence to Kong-man (German) Cheung.

About this article

Cite this article

Cheung, Km., Baker, S. & Kanade, T. Shape-From-Silhouette Across Time Part I: Theory and Algorithms. Int J Comput Vision 62, 221–247 (2005). https://doi.org/10.1007/s11263-005-4881-5

Received: 13 October 2003
Revised: 27 May 2004
Accepted: 27 May 2004
Published: 01 November 2004
Issue Date: May 2005
DOI: https://doi.org/10.1007/s11263-005-4881-5
