Abstract
We present an approach to real-time person tracking in crowded or unknown environments that integrates multiple visual modalities. We combine stereo, color, and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing isolates users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (the user stays within the field of view), medium-term (the user exits and reenters within minutes), and long-term (the user returns after hours or days). Short-term tracking uses simple region position and size correspondences, while medium-term and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.
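The short-term tracking step described in the abstract, matching regions between frames by position and size, can be illustrated with a minimal sketch. This is not the authors' implementation: the `Region` type, the greedy pairing strategy, the cost function, and the thresholds `max_shift` and `max_size_ratio` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Region:
    """Image region given by center (cx, cy) and size (w, h), in pixels.

    Hypothetical type; width and height are assumed to be positive.
    """
    cx: float
    cy: float
    w: float
    h: float


def match_cost(a, b, max_shift, max_size_ratio):
    """Return a pairing cost for regions a and b, or None if implausible."""
    shift = hypot(a.cx - b.cx, a.cy - b.cy)
    area_a, area_b = a.w * a.h, b.w * b.h
    size_ratio = max(area_a, area_b) / min(area_a, area_b)
    if shift > max_shift or size_ratio > max_size_ratio:
        return None
    # Assumed cost: normalized displacement plus normalized size change.
    return shift / max_shift + (size_ratio - 1.0) / (max_size_ratio - 1.0)


def match_regions(previous, current, max_shift=40.0, max_size_ratio=1.5):
    """Greedily pair previous-frame tracks with current-frame regions.

    previous: dict mapping track id -> Region from the last frame.
    current:  list of Region detected in the new frame.
    Returns a dict mapping track id -> index into `current`.
    Threshold defaults are illustrative; max_size_ratio must exceed 1.
    """
    candidates = []
    for track_id, prev in previous.items():
        for j, cur in enumerate(current):
            cost = match_cost(prev, cur, max_shift, max_size_ratio)
            if cost is not None:
                candidates.append((cost, track_id, j))
    # Take the cheapest pairings first; each track and region is used once.
    candidates.sort(key=lambda c: c[0])
    assigned, used = {}, set()
    for _cost, track_id, j in candidates:
        if track_id not in assigned and j not in used:
            assigned[track_id] = j
            used.add(j)
    return assigned
```

A track left without a match here would correspond to a user leaving the field of view, which is the point at which the abstract's medium-term and long-term appearance statistics would take over; that part is not sketched.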
Cite this article
Darrell, T., Gordon, G., Harville, M. et al. Integrated Person Tracking Using Stereo, Color, and Pattern Detection. International Journal of Computer Vision 37, 175–185 (2000). https://doi.org/10.1023/A:1008103604354
DOI: https://doi.org/10.1023/A:1008103604354