Abstract
Multimedia data, including audio, video, and image archives, has become an increasingly prevalent resource in digital library systems. However, each of these data types may need specific tools to support effective and efficient retrieval. In this paper, we focus on retrieval from speech audio collections, which include audio books, speech recordings, interviews, and lectures. Currently, most audio retrieval systems are based on keyword, title, or author searches typed in by users. The system searches for the given keywords and returns a list of entire audio files that are potentially relevant to the query. Nonetheless, browsing for a particular section of an audio file without knowing its actual content remains a very difficult task. Moreover, since audio transcription and keyword annotation are very labor intensive and become infeasible for large collections, we introduce a preliminary framework that locates the subsections of the audio that correspond to a voice query made by a user. We demonstrate the utility of our approach on query retrieval tasks over various types of audio recordings. We also show that this simple framework can potentially help retrieve and locate the voice query within the audio accurately and efficiently.
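To make the idea of locating a voice query inside a longer recording concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not state which matching technique the framework uses; this sketch assumes subsequence dynamic time warping, a common choice for query-by-example search. The function name `locate_query` and the use of one scalar feature per frame are hypothetical simplifications; a real system would more likely compare per-frame feature vectors extracted from the audio.

```python
from math import inf


def locate_query(query, audio):
    """Find the span of `audio` that best matches `query`.

    Both arguments are sequences of numbers (one hypothetical scalar
    feature per frame). Returns (start, end, cost) with `end` inclusive,
    using subsequence dynamic time warping: the query may start and end
    anywhere in the audio.
    """
    n, m = len(query), len(audio)
    if n == 0 or m == 0:
        raise ValueError("query and audio must be non-empty")

    # cost[i][j]: minimal accumulated cost of aligning query[0..i]
    #             with some span of audio that ends at index j.
    # start[i][j]: audio index where that best span begins.
    cost = [[inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]

    # The first query frame may match any audio frame at no extra cost,
    # which is what lets the match begin anywhere in the recording.
    for j in range(m):
        cost[0][j] = abs(query[0] - audio[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            # Best predecessor among: above, left, diagonal.
            best_cost, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                if cost[i][j - 1] < best_cost:
                    best_cost, best_start = cost[i][j - 1], start[i][j - 1]
                if cost[i - 1][j - 1] < best_cost:
                    best_cost, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
            cost[i][j] = abs(query[i] - audio[j]) + best_cost
            start[i][j] = best_start

    # The match may end anywhere: pick the cheapest cell in the last row.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return start[n - 1][end], end, cost[n - 1][end]


if __name__ == "__main__":
    audio = [0, 0, 0, 1, 3, 2, 0, 0]
    query = [1, 3, 2]
    print(locate_query(query, audio))
```

Under these assumptions the returned frame indices could be converted to timestamps so that a user is taken directly to the matching section instead of receiving the entire file. The quadratic table is kept for readability; it would need indexing or pruning to scale to large collections.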
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ratanamahatana, C.A., Tohlong, P. (2006). Speech Audio Retrieval Using Voice Query. In: Sugimoto, S., Hunter, J., Rauber, A., Morishima, A. (eds) Digital Libraries: Achievements, Challenges and Opportunities. ICADL 2006. Lecture Notes in Computer Science, vol 4312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11931584_56
DOI: https://doi.org/10.1007/11931584_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49375-4
Online ISBN: 978-3-540-49377-8
eBook Packages: Computer Science (R0)