Learning to bridge colloquial and formal language applied to linking and search of E-Commerce data

Poster. Published: 03 July 2014. DOI: 10.1145/2600428.2609543

Abstract

We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its design follows the intuition that some words are shared between two idioms of the same language, while others are idiom-specific. We demonstrate that the model learns relations between cross-idiomatic topics in a dataset of product descriptions and reviews. We evaluate the model intrinsically using perplexity. As an extrinsic evaluation, we then show the utility of MiLDA in a recently proposed IR task: linking Pinterest pins (written in colloquial English on the users' side) to online webshops (written in formal English on the retailers' side). Our multi-idiomatic model outperforms both the standard monolingual LDA model and the pure bilingual LDA model in terms of perplexity and of MAP scores in the IR task.
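
The two evaluation measures named in the abstract, perplexity and MAP (mean average precision), can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the input layout (per-document log-likelihoods and token counts, ranked ID lists with sets of relevant IDs) and the toy data are assumptions made purely for illustration, using the standard textbook definitions of both measures.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-(sum of log p(doc)) / (total token count)).

    Both arguments are hypothetical inputs: one natural-log likelihood and
    one token count per held-out document.
    """
    total_log_likelihood = sum(doc_log_likelihoods)
    total_tokens = sum(doc_lengths)
    return math.exp(-total_log_likelihood / total_tokens)


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(rankings, relevance_sets):
    """MAP: mean of the per-query average precision values."""
    scores = [
        average_precision(ranked, relevant)
        for ranked, relevant in zip(rankings, relevance_sets)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy check: a model that assigns probability 1/4 to every token
    # has a perplexity close to 4.
    print(perplexity([3 * math.log(0.25)], [3]))

    # Two toy queries (e.g. pins) with ranked candidate IDs (e.g. products).
    rankings = [["a", "b", "c"], ["x", "y"]]
    relevance_sets = [{"a", "c"}, {"y"}]
    print(mean_average_precision(rankings, relevance_sets))
```

Lower perplexity means the model predicts held-out text better, while a higher MAP means relevant items are ranked nearer the top, so the two measures improve in opposite directions.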

Cited By

  • (2018) Retrieving Information from Multiple Sources. Companion Proceedings of the The Web Conference 2018, pages 43-44. DOI: 10.1145/3184558.3186920. Online publication date: 23-Apr-2018
  • (2016) Fashion Meets Computer Vision and NLP at e-Commerce Search. International Journal of Computer and Electrical Engineering, 8:1 (31-43). DOI: 10.17706/IJCEE.2016.8.1.31-43. Online publication date: 2016
  • (2015) Probabilistic topic modeling in multilingual settings: An overview of its methodology and applications. Information Processing & Management, 51:1 (111-147). DOI: 10.1016/j.ipm.2014.08.003. Online publication date: Jan-2015

        Published In

        SIGIR '14: Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval
        July 2014
        1330 pages
        ISBN:9781450322577
        DOI:10.1145/2600428

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. personalized linking
        2. recommendation systems
        3. topic models
        4. unstructured data
        5. user interests
        6. user-generated data

        Qualifiers

        • Poster

        Conference

        SIGIR '14

        Acceptance Rates

        SIGIR '14 Paper Acceptance Rate 82 of 387 submissions, 21%;
        Overall Acceptance Rate 792 of 3,983 submissions, 20%
