Research article. DOI: 10.1145/2824864.2824872

IIIT-H System Submission for FIRE2014 Shared Task on Transliterated Search

Published: 05 December 2014

Abstract

This paper describes our submission to the FIRE 2014 Shared Task on Transliterated Search. The shared task comprises two subtasks: Query Word Labeling and Mixed-script Ad hoc Retrieval for Hindi Song Lyrics.
Query Word Labeling involves token-level language identification of query words in code-mixed queries and back-transliteration of the identified Indian-language words into their native scripts. We developed letter-based language models for token-level language identification and a structured perceptron model for back-transliteration of Indic words.
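To make the idea of letter-based language models for token-level language identification concrete, here is a minimal sketch, not the system described in the paper. It assumes character trigram models with add-one smoothing and labels a token with the language whose model gives it the lowest perplexity. The class name `CharNgramLM`, the function `label_token`, the n-gram order, the smoothing scheme, and the toy word lists are all assumptions made for illustration.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram model with add-k smoothing (illustrative only)."""

    def __init__(self, n=3, k=1.0):
        self.n, self.k = n, k
        self.ngrams = Counter()    # counts of full n-grams
        self.contexts = Counter()  # counts of (n-1)-character contexts
        self.vocab = set()         # characters seen in training

    def _pad(self, word):
        # "^" marks the word start and "$" the word end.
        return "^" * (self.n - 1) + word.lower() + "$"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(len(padded) - self.n + 1):
                gram = padded[i:i + self.n]
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def perplexity(self, word):
        padded = self._pad(word)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        log_prob, count = 0.0, 0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i:i + self.n]
            p = (self.ngrams[gram] + self.k) / (self.contexts[gram[:-1]] + self.k * v)
            log_prob += math.log(p)
            count += 1
        return math.exp(-log_prob / count)


def label_token(token, models):
    """Return the language whose model assigns the lowest perplexity."""
    return min(models, key=lambda lang: models[lang].perplexity(token))


# Toy training data (hypothetical): Romanized Hindi and English word lists.
hindi_words = ["pyaar", "dil", "tere", "mera", "zindagi",
               "kabhi", "tum", "hai", "nahi", "sapna"]
english_words = ["love", "heart", "the", "song", "with",
                 "you", "this", "night", "dream", "when"]

models = {"hi": CharNgramLM(), "en": CharNgramLM()}
models["hi"].train(hindi_words)
models["en"].train(english_words)

for token in ["zindagi", "night"]:
    print(token, label_token(token, models))
```

With word lists this small the models only behave sensibly on tokens close to the training data; a realistic setup would train on much larger monolingual word lists and would likely need a separate decision for tokens that belong to neither language.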
The second subtask, Mixed-script Ad hoc Retrieval for Hindi Song Lyrics, is to retrieve a ranked list of songs from a corpus of Hindi song lyrics, given an input query in Devanagari or transliterated Roman script. We used edit-distance-based query expansion and language modeling, followed by relevance-based reranking, to retrieve relevant Hindi song lyrics for a given query.
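Edit-distance-based query expansion can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: it uses plain Levenshtein distance, a hypothetical vocabulary of Romanized terms, and an assumed threshold of one edit. The names `edit_distance`, `expand_query`, and `max_distance` are introduced here for illustration only.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def expand_query(query, vocabulary, max_distance=1):
    """For each query term, list the term followed by near-matching vocabulary words."""
    expanded = []
    for term in query.lower().split():
        variants = sorted(w for w in vocabulary
                          if edit_distance(term, w) <= max_distance)
        expanded.append([term] + [w for w in variants if w != term])
    return expanded


# Hypothetical vocabulary of Romanized terms drawn from a lyrics corpus.
vocabulary = {"pyar", "pyaar", "piya", "dil", "dill", "tera", "tere", "mera"}

print(expand_query("pyar tera", vocabulary))
# → [['pyar', 'pyaar'], ['tera', 'mera', 'tere']]
```

In this toy run "mera" is admitted as a variant of "tera" even though it is a different word. Expansion by surface similarity alone tends to over-generate in this way, which is one reason a later scoring or reranking stage is useful.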


    Published In

    FIRE '14: Proceedings of the 6th Annual Meeting of the Forum for Information Retrieval Evaluation
    December 2014
    151 pages
    ISBN: 9781450337557
    DOI: 10.1145/2824864
    Editors: Prasenjit Majumder, Mandar Mitra, Sukomal Pal, Madhulika Agrawal, Parth Mehta

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Information Retrieval
    2. Language Identification
    3. Language Modeling
    4. Perplexity
    5. Transliteration

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    FIRE '14: Forum for Information Retrieval Evaluation
    December 5 - 7, 2014
    Bangalore, India

    Acceptance Rates

    Overall Acceptance Rate 19 of 64 submissions, 30%

    Cited By

    • (2023) Code-mixed Hindi-English text correction using fuzzy graph and word embedding. Expert Systems. DOI: 10.1111/exsy.13328. Online publication date: 14-May-2023
    • (2023) Sentence Level Language Identification in Code-mix Gujarati Language with Transformers. 2023 15th International Conference on Innovations in Information Technology (IIT), pp. 218-221. DOI: 10.1109/IIT59782.2023.10366421. Online publication date: 14-Nov-2023
    • (2023) Evaluating Ensembled Transformers for Multilingual Code-Switched Sentiment Analysis. 2023 International Conference on Computational Science and Computational Intelligence (CSCI), pp. 165-173. DOI: 10.1109/CSCI62032.2023.00032. Online publication date: 13-Dec-2023
    • (2023) HHSD: Hindi Hate Speech Detection Leveraging Multi-Task Learning. IEEE Access 11, pp. 101460-101473. DOI: 10.1109/ACCESS.2023.3312993. Online publication date: 2023
    • (2023) The Effect of Stopword Removal on Information Retrieval for Code-Mixed Data Obtained Via Social Media. SN Computer Science 4:5. DOI: 10.1007/s42979-023-01942-7. Online publication date: 27-Jun-2023
    • (2023) Improving English-Assamese Neural Machine Translation Using Transliteration-Based Approach. Evolution in Computational Intelligence, pp. 223-231. DOI: 10.1007/978-981-19-7513-4_20. Online publication date: 26-Apr-2023
    • (2023) Deep Bidirectional LSTM Network Learning-Based Sentiment Analysis for Tunisian Dialectical Facebook Content During the Spread of the Coronavirus Pandemic. Advances in Computational Collective Intelligence, pp. 96-109. DOI: 10.1007/978-3-031-41774-0_8. Online publication date: 22-Sep-2023
    • (2023) Code Mixed Information Retrieval for Gujarati Script News Articles. Advances in Computing and Data Sciences, pp. 265-276. DOI: 10.1007/978-3-031-37940-6_22. Online publication date: 23-Jul-2023
    • (2022) Ceasing hate with MoH. Information Processing and Management: an International Journal 59:1. DOI: 10.1016/j.ipm.2021.102760. Online publication date: 1-Jan-2022
    • (2022) FA-Net: fused attention-based network for Hindi English code-mixed offensive text classification. Social Network Analysis and Mining 12:1. DOI: 10.1007/s13278-022-00929-1. Online publication date: 3-Aug-2022
