Abstract
A simple method is presented for detecting, localizing and recognizing instances of classes of objects, while accommodating a wide variation in an object's pose. The method utilizes a small two-dimensional template that is warped into an image, and converts localization to a one-dimensional sub-problem, with the search for a match between image and template executed by dynamic programming. For roughly cylindrical objects (like heads), the method recovers three of the six degrees of freedom of motion (2 translation, 1 rotation), and accommodates two more degrees of freedom in the search process (1 rotation, 1 translation). Experiments demonstrate that the method provides an efficient search strategy that outperforms normalized correlation. This is demonstrated in the example domain of face detection and localization, and can be extended to more general detection tasks. An additional technique recovers rough object pose from the match results, and is used in a two-stage recognition experiment in conjunction with maximization of mutual information.
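To give a feel for the one-dimensional matching step, the following is a minimal sketch of classic dynamic time warping between a template sequence and an image scanline. It is a generic illustration, not the authors' algorithm: the function name `dtw_align`, the squared-difference cost, and the three allowed moves (match, stretch the template, stretch the signal) are all assumptions made for the example.

```python
import numpy as np

def dtw_align(template, signal):
    """Align two 1-D sequences by dynamic programming (generic DTW sketch).

    Hypothetical helper for illustration only; returns the total
    alignment cost and the warping path as (template_idx, signal_idx) pairs.
    """
    t = np.asarray(template, dtype=float)
    s = np.asarray(signal, dtype=float)
    n, m = len(t), len(s)

    # cost[i, j] = best cost of aligning t[:i] with s[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (t[i - 1] - s[j - 1]) ** 2  # assumed local match cost
            cost[i, j] = d + min(cost[i - 1, j - 1],  # advance both
                                 cost[i - 1, j],      # advance template only
                                 cost[i, j - 1])      # advance signal only

    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        k = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
        if k == 0:
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return cost[n, m], path

# Usage: a short template matched against a stretched copy of itself.
total, path = dtw_align([0, 1, 2, 1, 0], [0, 0, 1, 2, 2, 1, 0])
```

The table is filled in time proportional to the product of the two sequence lengths, which is what makes a one-dimensional formulation of the search attractive compared with exhaustively testing every warp.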
Cite this article
Ratan, A.L., Grimson, W.E.L. & Wells, W.M. Object Detection and Localization by Dynamic Template Warping. International Journal of Computer Vision 36, 131–147 (2000). https://doi.org/10.1023/A:1008147915077
DOI: https://doi.org/10.1023/A:1008147915077