IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search

Authors:

Manish Shrivastava

FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation

Pages 48 - 53

https://doi.org/10.1145/2824864.2824872

Published: 05 December 2014

Abstract

This paper describes our submission to the FIRE 2014 Shared Task on Transliterated Search. The shared task comprises two subtasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics.

Query Word Labeling involves token-level language identification of the words in code-mixed queries and back-transliteration of the identified Indian-language words into their native scripts. We developed letter-based language models for token-level language identification and a structured perceptron model for back-transliteration of Indic words.
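As an illustration of token-level language identification with letter-based language models, the following is a minimal sketch rather than the submitted system. The abstract does not state the n-gram order, the smoothing method, or the training data, so the trigram order, add-one smoothing, the labels `hi` and `en`, and the toy word lists below are all assumptions made for the example.

```python
import math
from collections import Counter


class CharNgramLM:
    """Letter n-gram language model with add-one smoothing (assumed choices)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of the (n-1)-letter histories
        self.vocab = set()         # letters seen in training, plus padding

    def _pad(self, word):
        # "^" marks the word start and "$" the word end.
        return "^" * (self.n - 1) + word.lower() + "$"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(len(padded) - self.n + 1):
                gram = padded[i:i + self.n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def logprob(self, word):
        padded = self._pad(word)
        v = len(self.vocab) + 1  # one extra slot for unseen letters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            total += math.log(
                (self.ngrams[gram] + 1) / (self.contexts[gram[:-1]] + v)
            )
        return total


def label_tokens(query, models):
    """Tag each token with the language whose model scores it highest."""
    return [
        (tok, max(models, key=lambda lang: models[lang].logprob(tok)))
        for tok in query.split()
    ]


# Hypothetical toy training data; a real system would use large word lists.
HINDI_WORDS = ["pyar", "dil", "zindagi", "tum", "hai", "mera",
               "tera", "kya", "nahi", "sapna", "yaad", "kahan"]
ENGLISH_WORDS = ["love", "song", "the", "lyrics", "download", "movie",
                 "video", "heart", "you", "what", "free", "music"]

MODELS = {"hi": CharNgramLM(), "en": CharNgramLM()}
MODELS["hi"].train(HINDI_WORDS)
MODELS["en"].train(ENGLISH_WORDS)

print(label_tokens("dil song", MODELS))
```

With these toy lists, `dil` is tagged `hi` and `song` is tagged `en`. Tokens tagged as Hindi would then be passed to a back-transliteration component, which is not sketched here.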

The second subtask, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics, given an input query in Devanagari or transliterated Roman script. We used edit-distance-based query expansion and language modeling, followed by relevance-based reranking, to retrieve the Hindi song lyrics relevant to a given query.
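The query-expansion step can be illustrated with a short sketch, assuming plain Levenshtein distance over Roman-script spellings. The distance threshold, the vocabulary, and the function names `edit_distance` and `expand_query` are hypothetical; the abstract does not give these details, and the language-modeling and reranking stages are not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expand_query(query, vocabulary, max_distance=1):
    """Map each query term to itself plus nearby spellings from the vocabulary."""
    expanded = {}
    for term in query.lower().split():
        near = {w for w in vocabulary if edit_distance(term, w) <= max_distance}
        expanded[term] = sorted(near | {term})
    return expanded


# Hypothetical vocabulary of Roman-script spellings seen in a lyrics corpus.
VOCAB = {"zindagi", "zindgi", "jindagi", "pyar", "pyaar", "dil", "dill"}

print(expand_query("zindagi pyar", VOCAB))
```

Here `zindagi` expands to `jindagi`, `zindagi`, and `zindgi`, and `pyar` expands to `pyaar` and `pyar`, so spelling variants of the same transliterated word can all match the corpus.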

Index Terms

Computing methodologies → Artificial intelligence → Natural language processing

Published In

FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation

December 2014

151 pages

ISBN:9781450337557

DOI:10.1145/2824864

Editors:
Prasenjit Majumder, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Mandar Mitra, Indian Statistical Institute, Kolkata, India
Sukomal Pal, Indian School of Mines, Dhanbad
Madhulika Agrawal, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India
Parth Mehta, Dhirubhai Ambani Institute of Information and Communication Technology, Gujarat, India

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

FIRE '14: Forum for Information Retrieval Evaluation

December 5 - 7, 2014

Bangalore, India

Acceptance Rates

Overall Acceptance Rate 19 of 64 submissions, 30%

