Speech Audio Retrieval Using Voice Query

Chotirat Ann Ratanamahatana &
Phubes Tohlong

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4312)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

Multimedia data, including audio, video, and image archives, has become an increasingly prevalent resource in digital library systems. However, each type of data may require specialized tools to support effective and efficient retrieval. In this paper, we focus on retrieval from speech audio collections, which include audio books, speech recordings, interviews, and lectures. Currently, most audio retrieval systems rely on keyword, title, or author searches typed in by users. The system then searches for those keywords and returns a list of entire audio files that are potentially relevant to the query. Even so, browsing for a particular section of a recording without knowing what it actually contains remains very difficult. Moreover, because audio transcription and keyword annotation are labor-intensive and become infeasible for large collections, we introduce a preliminary framework that locates the subsections of the audio corresponding to a voice query spoken by the user. We demonstrate the utility of our approach on query retrieval tasks across various types of audio recordings. We also show that this simple framework can potentially retrieve and locate the voice query within the audio accurately and efficiently.
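The abstract does not say how a voice query is matched against a recording. As a rough illustration only, the sketch below assumes one common query-by-example technique, subsequence dynamic time warping (DTW), applied to per-frame feature vectors (in practice these might be MFCCs; here they are plain lists of floats). The function name `locate_query` and the use of Euclidean frame distance are hypothetical choices, not taken from the paper. Subsequence DTW lets the alignment begin and end at any audio frame, so it can find where a query occurs inside a much longer recording even when the query is spoken faster or slower than the original.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: subsequence DTW is an assumed technique here,
# not necessarily the method used in the paper.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two feature frames (e.g. MFCC vectors)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locate_query(query: List[Vector], audio: List[Vector]) -> Tuple[int, int, float]:
    """Return (start_frame, end_frame, cost) of the audio span that best
    matches the whole query; frame indices are inclusive."""
    n, m = len(query), len(audio)
    if n == 0 or m == 0:
        raise ValueError("query and audio must be non-empty")

    # cost[i][j]: minimum accumulated cost of aligning query[0..i]
    # so that the alignment ends at audio frame j.
    cost = [[math.inf] * m for _ in range(n)]
    # start[i][j]: audio frame where that best alignment began.
    start = [[0] * m for _ in range(n)]

    # Free start: the first query frame may align with any audio frame.
    for j in range(m):
        cost[0][j] = euclidean(query[0], audio[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            d = euclidean(query[i], audio[j])
            # Predecessors: vertical (i-1, j), diagonal (i-1, j-1),
            # horizontal (i, j-1); keep the cheapest and its start frame.
            best, best_start = cost[i - 1][j], start[i - 1][j]
            if j > 0:
                if cost[i - 1][j - 1] < best:
                    best, best_start = cost[i - 1][j - 1], start[i - 1][j - 1]
                if cost[i][j - 1] < best:
                    best, best_start = cost[i][j - 1], start[i][j - 1]
            cost[i][j] = d + best
            start[i][j] = best_start

    # Free end: choose the audio frame where the full query alignment is cheapest.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return start[n - 1][end], end, cost[n - 1][end]
```

For example, `locate_query([[1.0], [2.0], [3.0]], [[0.0], [0.0], [1.0], [2.0], [3.0], [0.0]])` returns `(2, 4, 0.0)`, and a slower rendition of the same query such as `[[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]` maps to the same span. The returned indices are frames, so converting them to seconds would require the hop size of whatever feature extractor is used (another assumption). The dynamic program fills an n × m table, so its cost grows with query length times recording length; long archives would likely need indexing or lower-bounding techniques to keep retrieval efficient.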

Author information

Authors and Affiliations

Dept. of Computer Engineering, Chulalongkorn University, Bangkok, 10330, Thailand
Chotirat Ann Ratanamahatana & Phubes Tohlong

Editor information

Editors and Affiliations

University of Tsukuba, Tsukuba 1-2, Ibaraki, Japan
Shigeo Sugimoto
The University of Queensland, St Lucia, Queensland, Australia
Jane Hunter
Vienna University of Technology, Vienna, Austria
Andreas Rauber
Research Center for Knowledge Communities, University of Tsukuba, 305-8550, Ibaraki, Japan
Atsuyuki Morishima

About this paper

Cite this paper

Ratanamahatana, C.A., Tohlong, P. (2006). Speech Audio Retrieval Using Voice Query. In: Sugimoto, S., Hunter, J., Rauber, A., Morishima, A. (eds) Digital Libraries: Achievements, Challenges and Opportunities. ICADL 2006. Lecture Notes in Computer Science, vol 4312. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11931584_56

DOI: https://doi.org/10.1007/11931584_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49375-4
Online ISBN: 978-3-540-49377-8
eBook Packages: Computer Science, Computer Science (R0)