DOI: 10.1145/2911451.2914727
Retrievability of Code Mixed Microblogs

Published: 07 July 2016

Abstract

Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is common among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval effectiveness because vocabularies with different collection statistics are mixed within a single collection of documents. In this paper, we investigate indexing and retrieval strategies for a mixed collection comprising code-mixed and monolingual documents. In particular, we address three alternative modes of indexing, namely (a) a single index for the two sub-collections; (b) a separate index for each sub-collection; and (c) a clustered index with the two individual sub-collection statistics coupled with the overall one. We use the expected retrievability scores of the two classes of documents to show empirically that indexing strategies (a) and (b) mostly retrieve the monolingual documents at top ranks with standard retrieval approaches. Our experiments show that, by contrast, the clustered index (c) alleviates this problem by improving the retrievability of the code-mixed documents.
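
To make the notion of expected retrievability per document class concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common cumulative form of retrievability, in which a document's score is the number of queries that rank it within a fixed cutoff, with all queries weighted equally. The abstract does not say which variant or cutoff the paper uses. The function names, class labels ("mono", "mixed") and toy rankings are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def retrievability(rankings, cutoff):
    """Cumulative retrievability: for each document, count the queries
    whose ranked list contains it within the top `cutoff` positions.
    Assumes each document appears at most once per ranked list."""
    scores = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def expected_retrievability(scores, doc_class):
    """Mean retrievability per document class; documents that were
    never retrieved within the cutoff contribute a score of 0."""
    by_class = defaultdict(list)
    for doc_id, label in doc_class.items():
        by_class[label].append(scores.get(doc_id, 0))
    return {label: mean(values) for label, values in by_class.items()}


# Hypothetical toy data: two monolingual and two code-mixed documents.
doc_class = {"d1": "mono", "d2": "mono", "d3": "mixed", "d4": "mixed"}

# Hypothetical ranked lists returned by some retrieval run, one per query.
rankings = {
    "q1": ["d1", "d2", "d3", "d4"],
    "q2": ["d2", "d1", "d4", "d3"],
    "q3": ["d3", "d1", "d2", "d4"],
}

scores = retrievability(rankings, cutoff=2)
print(expected_retrievability(scores, doc_class))
# → {'mono': 2.5, 'mixed': 0.5}
```

In this toy run the monolingual class has a much higher mean score than the code-mixed class, which is the kind of imbalance the abstract reports for strategies (a) and (b). Comparing these per-class means across indexing strategies is one way to check whether a strategy narrows the gap.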

Cited By

  • (2023) The Effect of Stopword Removal on Information Retrieval for Code-Mixed Data Obtained Via Social Media. SN Computer Science 4:5. DOI: 10.1007/s42979-023-01942-7. Online publication date: 26-Jun-2023
  • (2019) Extracting Resource Needs and Availabilities From Microblogs for Aiding Post-Disaster Relief Operations. IEEE Transactions on Computational Social Systems 6:3 (604-618). DOI: 10.1109/TCSS.2019.2914179. Online publication date: Jun-2019
  • (2018) Exploitation of Social Media for Emergency Relief and Preparedness. Information Systems Frontiers 20:5 (901-907). DOI: 10.1007/s10796-018-9878-z. Online publication date: 1-Oct-2018

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. code mixing
  2. fusion
  3. microblog retrieval
  4. retrievability

Qualifiers

  • Short-paper

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate 62 of 341 submissions, 18%;
Overall Acceptance Rate 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 18 Sep 2024
