research-article

Two-stream indexing for spoken web search

Authors:

Kundan Srivastava

WWW '11: Proceedings of the 20th international conference companion on World wide web

Pages 503 - 512

https://doi.org/10.1145/1963192.1963364

Published: 28 March 2011

Abstract

This paper presents a two-stream approach to processing audio in order to index audio content for Spoken Web search. The first stream indexes the meta-data associated with each audio document. This meta-data is usually very sparse but accurate, so it yields a high-precision, low-recall index. The second stream uses a novel language-independent speech recognition technique to generate text to be indexed. Owing to the multiple languages and the noise in user-generated content on the Spoken Web, the recognition accuracy of such systems is not high, so they yield a low-precision, high-recall index. The paper combines these two complementary streams into a single index to improve precision-recall performance in audio content search.

The problem of audio content search is motivated by the real-world role of the Web in developing regions, where, owing to literacy and affordability constraints, people use the Spoken Web, which consists of interconnected VoiceSites whose content is audio. The experiments are based on more than 20,000 audio documents spanning seven live VoiceSites and four languages. The results suggest a significant improvement over a meta-data-only or a speech-recognition-only system, thus justifying the two-stream approach. Audio content search is a growing problem area, and this paper aims to be a first step toward solving it at large scale, across languages, in a Web context.
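The abstract does not say how the two streams are combined, so the following is only a minimal sketch of one plausible scheme and not the authors' method. It builds one inverted index per stream and ranks documents by a weighted sum of term matches. The weights, the function names (`build_index`, `search`, `precision_recall`), and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed weights: meta-data hits are trusted more than noisy ASR hits.
META_WEIGHT = 1.0
ASR_WEIGHT = 0.4


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(query, meta_index, asr_index):
    """Rank documents by a weighted sum of per-stream term matches."""
    scores = defaultdict(float)
    for token in query.lower().split():
        for doc_id in meta_index.get(token, ()):
            scores[doc_id] += META_WEIGHT
        for doc_id in asr_index.get(token, ()):
            scores[doc_id] += ASR_WEIGHT
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved list against a relevant set."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy data, invented for illustration only.
metadata = {"a1": "crop prices", "a2": "weather", "a3": "health camp"}
transcripts = {
    "a1": "today crop prices rose",
    "a2": "rain expected crop damage likely",
    "a3": "crop",  # simulated recognition error
}

meta_index = build_index(metadata)
asr_index = build_index(transcripts)
ranking = search("crop damage", meta_index, asr_index)
```

In this toy setup, with `a1` and `a2` taken as the relevant documents, the meta-data stream alone retrieves only `a1` (precise but incomplete), while the speech stream alone also returns the spurious `a3`. The weighted ranking places both relevant documents ahead of `a3`, which is the kind of complementarity the abstract describes.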

Cited By

Remy C, Agarwal S, Kumar A, Srivastava S (2013). Supporting Voice Content Sharing among Underprivileged People in Urban India. Human-Computer Interaction – INTERACT 2013, pages 489-506. Online publication date: 2013.
https://doi.org/10.1007/978-3-642-40498-6_38

Oard D, Agrawal R, Oard D, Rajput N (2012). Query by babbling. Proceedings of the first workshop on Information and knowledge management for developing region, pages 17-22. Online publication date: 2-Nov-2012.
https://dl.acm.org/doi/10.1145/2389776.2389781

Sahay S, Rajput N, Pansare N (2011). Social ranking for spoken web search. Proceedings of the 20th ACM international conference on Information and knowledge management, pages 1835-1840. Online publication date: 24-Oct-2011.
https://dl.acm.org/doi/10.1145/2063576.2063840

Index Terms

Two-stream indexing for spoken web search
1. Information systems
  1. Information retrieval
  2. Information systems applications
    1. Multimedia information systems

Reviews

Reviewer: Gerald Friedland

The Spoken Web affects many people in rural India and other developing areas that have just recently adopted the widespread use of voice-only cell phones (as opposed to smartphones). The Spoken Web, like the World Wide Web (WWW), enables users to browse and surf for information. The presentation, however, is spoken language instead of text and graphics, so users can access content using a cell phone with no display. Since there is no textual information, though, how does one search this Web? This is exactly the topic of this paper.

This paper presents a technique that combines metadata and speech recognition into an indexing approach for Spoken Web search. The paper describes different variants of algorithms, and then presents an evaluation of precision and recall based on more than 20,000 voice documents. It provides evidence that the combination of the metadata and the speech recognition stream performs better than comparable algorithms on a single modality.

This paper is for speech and natural language processing researchers. I definitely recommend it to those working in such fields, even if they are not working on the Spoken Web. Though the authors performed the experiments within this domain, as a speech and multimedia researcher, I see no reason why the techniques could not also be applied to "found data" from the WWW, such as consumer-produced videos on social networking sites. The characteristics of the data match reasonably.

Online Computing Reviews Service

Information & Contributors

Information

Published In

WWW '11: Proceedings of the 20th international conference companion on World wide web

March 2011

552 pages

ISBN:9781450306379

DOI:10.1145/1963192

General Chairs: S. Sadagopan (IIIT-Bangalore, India), Krithi Ramamritham (IIT-Bombay, India), Arun Kumar (IBM Research, India), M. P. Ravindra (Infosys E & R, India)

Program Chairs: Elisa Bertino (Purdue University, USA), Ravi Kumar (Yahoo! Research, USA)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web
The International Institute of Information Technology Bangalore

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

WWW '11

WWW '11: 20th International World Wide Web Conference

March 28 - April 1, 2011

Hyderabad, India

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Total Citations: 3
Total Downloads: 245
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 02 Oct 2024
