Abstract
We introduce the self-similar sketch, a new method for extracting intermediate image features that combines three principles: detection of self-similar structures, nonaccidental alignment, and instance-specific modelling. The method searches for self-similar image structures that form nonaccidental patterns, for example collinear arrangements. We demonstrate a simple implementation of this idea in which self-similar structures are found by looking for SIFT descriptors that map to the same visual words in image-specific vocabularies. This yields a visual word map, which is searched for elongated connected components. Finally, segments are fitted to these connected components, extracting linear image structures beyond those captured by conventional edge detectors, which implicitly assume a specific edge appearance (a step). The resulting collection of segments constitutes a “sketch” of the image. We apply the sketch to the task of estimating vanishing points, horizon, and zenith on standard benchmark data, obtaining state-of-the-art results. We also propose a new vanishing point estimation algorithm based on recently introduced techniques for the continuous-discrete optimisation of energies arising from model selection priors.
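The last two stages described in the abstract (searching a visual word map for elongated connected components, then fitting a segment to each) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the word map is already available as a small grid of integer word ids, so SIFT extraction and the image-specific vocabulary are replaced by a hand-written toy map. The function names, the 8-connectivity, the PCA-based elongation score, and the thresholds `min_cells` and `min_elongation` are all hypothetical choices made for illustration.

```python
import math
from collections import deque

# Toy visual word map (assumption: stands in for SIFT + image-specific vocabulary).
TOY_WORD_MAP = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
]


def connected_components(word_map):
    """Group 8-connected cells that carry the same visual word id."""
    rows, cols = len(word_map), len(word_map[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            word = word_map[r][c]
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:  # breadth-first flood fill over same-word neighbours
                y, x = queue.popleft()
                cells.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx]
                                and word_map[ny][nx] == word):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append((word, cells))
    return components


def fit_segment(cells):
    """Fit a segment along the principal axis of a set of (row, col) cells.

    Returns (endpoint_a, endpoint_b, elongation); endpoints are (x, y).
    """
    n = len(cells)
    mx = sum(x for _, x in cells) / n
    my = sum(y for y, _ in cells) / n
    sxx = sum((x - mx) ** 2 for _, x in cells) / n
    syy = sum((y - my) ** 2 for y, _ in cells) / n
    sxy = sum((x - mx) * (y - my) for y, x in cells) / n
    # Closed-form eigenvalues of the 2x2 coordinate covariance matrix.
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)
    # Direction of the major axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project the cells on the axis and keep the extreme projections.
    proj = [(x - mx) * ux + (y - my) * uy for y, x in cells]
    a = (mx + min(proj) * ux, my + min(proj) * uy)
    b = (mx + max(proj) * ux, my + max(proj) * uy)
    # Hypothetical score: axis ratio, with 1/12 (the variance of a
    # unit-width cell) added to keep one-cell-thick lines finite.
    elongation = math.sqrt(lam_major / (lam_minor + 1.0 / 12.0))
    return a, b, elongation


def sketch(word_map, min_cells=4, min_elongation=3.0):
    """Return (word, endpoint_a, endpoint_b) for each elongated component."""
    segments = []
    for word, cells in connected_components(word_map):
        if len(cells) < min_cells:
            continue
        a, b, elongation = fit_segment(cells)
        if elongation >= min_elongation:
            segments.append((word, a, b))
    return segments
```

Calling `sketch(TOY_WORD_MAP)` keeps the two one-cell-thick rows and rejects both the compact 2×2 block and the L-shaped region, because their elongation scores fall below the threshold. In a real pipeline the thresholds would have to be tuned to the descriptor sampling density, which this toy example does not model.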
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vedaldi, A., Zisserman, A. (2012). Self-similar Sketch. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds) Computer Vision – ECCV 2012. ECCV 2012. Lecture Notes in Computer Science, vol 7573. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33709-3_7
DOI: https://doi.org/10.1007/978-3-642-33709-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33708-6
Online ISBN: 978-3-642-33709-3
eBook Packages: Computer Science (R0)