research-article

Learning Distributed Representations of Data in Community Question Answering for Question Retrieval

Authors:

Zhoujun Li

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

Pages 533-542

https://doi.org/10.1145/2835776.2835786

Published: 08 February 2016

Abstract

We study the problem of question retrieval in community question answering (CQA). The biggest challenge in this task is the lexical gap between questions: similar questions are often expressed with different but semantically related words. To bridge the gap, state-of-the-art methods incorporate extra information, such as word-to-word translations and question categories, into traditional language models. We find that the existing language-model-based methods can be interpreted within a new framework: they represent words and question categories in a vector space and calculate question-question similarity as a linear combination of dot products of these vectors. The problem is that these methods are either heuristic in how they represent data or difficult to scale up. We propose a principled and efficient approach to learning representations of data in CQA. Our method simultaneously learns word vectors and question-category vectors by optimizing an objective function naturally derived from the framework. For question retrieval, we incorporate the learned representations into traditional language models in an effective and efficient way. We conduct experiments on large-scale data from Yahoo! Answers and Baidu Knows, and compare our method with state-of-the-art methods on two public data sets. Experimental results show that our method significantly outperforms the baseline methods in retrieval relevance. With 1 million training examples, our method takes less than 50 minutes to learn a model on a single multicore machine, whereas the translation-based language model needs more than 2 days to learn a translation table on the same machine.
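The framework described above, in which question-question similarity is a linear combination of dot products between word vectors and category vectors, can be illustrated with a minimal sketch. This is not the authors' formulation: the function name `question_similarity`, the averaging over word pairs, the split into a word-word term and a word-category term, and the weights `alpha` and `beta` are hypothetical choices made for illustration. The sketch assumes NumPy, non-empty questions, and vectors already available in plain dictionaries; the toy vectors are invented, not learned.

```python
import numpy as np

def question_similarity(query, candidate, cand_category,
                        word_vecs, cat_vecs, alpha=0.7, beta=0.3):
    """Score a candidate question against a query question.

    Hypothetical illustration of "a linear combination of dot products":
    alpha * (mean word-word dot product) + beta * (mean word-category dot product).
    Assumes both questions are non-empty and every word has a vector.
    """
    q = np.stack([word_vecs[w] for w in query])      # shape (|query|, dim)
    c = np.stack([word_vecs[w] for w in candidate])  # shape (|candidate|, dim)
    # Word-to-word term: average over all query/candidate word pairs,
    # playing the role of a translation-style word association.
    word_term = float((q @ c.T).mean())
    # Word-to-category term: how well the query words fit the candidate's category.
    cat_term = float((q @ cat_vecs[cand_category]).mean())
    return alpha * word_term + beta * cat_term

# Toy 2-d vectors (made up for illustration, not learned from data).
word_vecs = {
    "laptop":   np.array([1.0, 0.0]),
    "notebook": np.array([0.9, 0.1]),
    "battery":  np.array([0.0, 1.0]),
    "recipe":   np.array([-1.0, 0.0]),
}
cat_vecs = {
    "computers": np.array([1.0, 0.5]),
    "cooking":   np.array([-1.0, 0.0]),
}

query = ["laptop", "battery"]
score_a = question_similarity(query, ["notebook", "battery"], "computers", word_vecs, cat_vecs)
score_b = question_similarity(query, ["recipe"], "cooking", word_vecs, cat_vecs)
print(score_a, score_b)  # the related candidate scores higher
```

In this toy setting, the query "laptop battery" scores higher against "notebook battery" in a computers category than against an unrelated cooking question, even though "laptop" and "notebook" never match as strings. That is the lexical-gap effect that learned vectors are meant to capture; in practice the vectors and the combination weights would come from training rather than being set by hand.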

Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Published In

WSDM '16: Proceedings of the Ninth ACM International Conference on Web Search and Data Mining

February 2016

746 pages

ISBN: 9781450337168

DOI: 10.1145/2835776

General Chairs: Paul N. Bennett (Microsoft Research), Vanja Josifovski (Pinterest)

Program Chairs: Jennifer Neville (Purdue University), Filip Radlinski (Microsoft)

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

National Natural Science Foundation of China
National High Technology Research and Development Program of China
Science and Technology Innovation Ability Promotion Project of Beijing
Major Projects of the National Social Science Fund of China
State Key Laboratory of Software Development Environment
Microsoft Research Asia Fund

Conference

WSDM 2016

WSDM 2016: Ninth ACM International Conference on Web Search and Data Mining

February 22-25, 2016

San Francisco, California, USA

Acceptance Rates

WSDM '16 Paper Acceptance Rate: 67 of 368 submissions, 18%

Overall Acceptance Rate: 498 of 2,863 submissions, 17%

Bibliometrics & Citations

Total Citations: 7
Total Downloads: 875
Downloads (last 12 months): 4
Downloads (last 6 weeks): 0

Reflects downloads up to 14 Feb 2025

Cited By

Khushhal S, Majid A, Abbas S, Nadeem M, Shah S (2020). Question retrieval using combined queries in community question answering. Journal of Intelligent Information Systems. Online publication date: 24-Jul-2020.
https://doi.org/10.1007/s10844-020-00612-x
Chelliah M, Shrivastava M, Ram Tej J (2020). Principle-to-Program: Neural Methods for Similar Question Retrieval in Online Communities. Advances in Information Retrieval, pp. 663-668. Online publication date: 8-Apr-2020.
https://doi.org/10.1007/978-3-030-45442-5_88
Gallant M, Isah H, Zulkernine F, Khan S (2019). Xu: An Automated Query Expansion and Optimization Tool. 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), pp. 443-452. Online publication date: Jul-2019.
https://doi.org/10.1109/COMPSAC.2019.00070
Abujabal A, Saha Roy R, Yahya M, Weikum G, Champin P, Gandon F, Médini L, Lalmas M, Ipeirotis P (2018). Never-Ending Learning for Open-Domain Question Answering over Knowledge Bases. Proceedings of the 2018 World Wide Web Conference, pp. 1053-1062. Online publication date: 10-Apr-2018.
https://dl.acm.org/doi/10.1145/3178876.3186004
Yu Q, Lam W, Chang Y, Zhai C, Liu Y, Maarek Y (2018). Review-Aware Answer Prediction for Product-Related Questions Incorporating Aspects. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 691-699. Online publication date: 2-Feb-2018.
https://dl.acm.org/doi/10.1145/3159652.3159718
Kamineni A, Shrivastava M, Yenala H, Chinnakotla M (2018). Siamese LSTM with Convolutional Similarity for Similar Question Retrieval. 2018 International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), pp. 1-7. Online publication date: Nov-2018.
https://doi.org/10.1109/iSAI-NLP.2018.8692937
Zhou G, Huang J (2017). Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering, 29(6), pp. 1226-1239. Online publication date: 1-Jun-2017.
https://dl.acm.org/doi/10.1109/TKDE.2017.2665625
