Abstract
We describe the Photobook system, a set of interactive tools for browsing and searching images and image sequences. These query tools differ from those used in standard image databases in that they make direct use of the image content rather than relying on text annotations. Direct search on image content is made possible by semantics-preserving image compression, which reduces images to a small set of perceptually significant coefficients. We discuss three types of Photobook descriptions in detail: one that allows search based on appearance, one that uses 2-D shape, and one that allows search based on textural properties. These image content descriptions can be combined with each other and with text-based descriptions to provide a sophisticated browsing and search capability. In this paper we demonstrate Photobook on databases containing images of people, video keyframes, hand tools, fish, texture swatches, and 3-D medical data.
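As a rough illustration of how coefficient-based descriptions might be combined with each other and with text annotations, the sketch below ranks database entries by a weighted sum of per-description distances, optionally filtered by a text tag. It is not the Photobook implementation: the `ImageRecord` type, the `search` function, the Euclidean distance, the weights, and the toy coefficient values are all assumptions made for this example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical database entry: coefficient vectors per description, plus text tags."""
    name: str
    coeffs: dict  # e.g. {"shape": [...], "texture": [...]}
    tags: set = field(default_factory=set)


def euclidean(a, b):
    """Euclidean distance between two equal-length coefficient vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(database, query, weights, required_tag=None):
    """Rank records by a weighted sum of per-description distances to the query.

    weights maps a description name to an assumed importance; descriptions
    without a weight are ignored. required_tag optionally filters on text first.
    """
    candidates = [
        r for r in database
        if r is not query and (required_tag is None or required_tag in r.tags)
    ]

    def score(record):
        return sum(
            w * euclidean(record.coeffs[d], query.coeffs[d])
            for d, w in weights.items()
        )

    return sorted(candidates, key=score)


# Toy data: the coefficient values are invented for illustration only.
db = [
    ImageRecord("fish_01", {"shape": [0.9, 0.1], "texture": [0.2, 0.3]}, {"fish"}),
    ImageRecord("fish_02", {"shape": [0.8, 0.2], "texture": [0.9, 0.9]}, {"fish"}),
    ImageRecord("tool_01", {"shape": [0.85, 0.15], "texture": [0.25, 0.3]}, {"tool"}),
    ImageRecord("fish_03", {"shape": [0.1, 0.9], "texture": [0.2, 0.3]}, {"fish"}),
]

query = db[0]
by_shape = search(db, query, {"shape": 1.0})
combined = search(db, query, {"shape": 1.0, "texture": 1.0}, required_tag="fish")
print([r.name for r in by_shape])
print([r.name for r in combined])
```

Searching on shape alone can return an entry from a different category if its outline is similar; adding the text filter and a second description narrows and reorders the results. The equal weights used here are arbitrary, and a real system would need to normalize distances across descriptions before summing them.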
Additional information
Perceptual Computing Section, The Media Laboratory, Massachusetts Institute of Technology
About this article
Cite this article
Pentland, A., Picard, R.W. & Sclaroff, S. Photobook: Content-based manipulation of image databases. Int J Comput Vision 18, 233–254 (1996). https://doi.org/10.1007/BF00123143
DOI: https://doi.org/10.1007/BF00123143