Abstract
This chapter describes the indexing, search, and retrieval of various combinations of audio, video, text, and image media, and the automated content processing that enables them. The intent is to provide a framework for data analysis in multimedia digital libraries. The chapter is organized as follows. The introduction briefly distinguishes digital from traditional libraries and touches on the specific issues important to searching the content of multimedia libraries. Section 2 introduces the Informedia Digital Video Library as an example of a multimedia library, including a quick tour of its functionality. Section 3 discusses the processing of audio and image information as it relates to a multimedia library. Section 4 illustrates the interplay between audio and video information, using a video information retrieval experiment as an example. Section 5 discusses the exporting and sharing of metadata in a digital library using MPEG-7. Finally, Section 6 presents one vision of a future digital library, in which all personal memory can be recorded and accessed.
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hauptmann, A., Jin, R., Wactlar, H. (2003). Data Analysis for a Multimedia Library. In: Renals, S., Grefenstette, G. (eds) Text- and Speech-Triggered Information Access. Lecture Notes in Computer Science, vol 2705. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45115-0_2
DOI: https://doi.org/10.1007/978-3-540-45115-0_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40635-8
Online ISBN: 978-3-540-45115-0