QALink: Enriching Text Documents with Relevant Q&A Site Contents

Authors:

Anthony K.H. Tung,

Beibei Zhang

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

Pages 1359 - 1368

https://doi.org/10.1145/3132847.3132934

Published: 06 November 2017

Abstract

With the rapid development of Q&A sites such as Quora and StackExchange, users have produced high-quality question-answer pairs. This Q&A content covers a wide range of topics and helps users resolve queries and acquire new knowledge. Meanwhile, people reading digital documents may run into difficulties such as missing background information and unclear explanations of concepts. We believe that Q&A sites offer high-quality content that can serve as a rich supplement to digital documents. In this paper, we devise a rigorous formulation of the novel text enrichment problem and design an end-to-end system named QALink, which assigns the most relevant Q&A content to the corresponding section of a document. We first present a new segmentation approach that models each document as a hierarchical structure. Based on this hierarchy, queries are constructed to retrieve and rank related question-answer pairs. Both syntactic and semantic features are adopted in our system. The empirical evaluation results indicate that QALink effectively enriches text documents with relevant Q&A content, helping people better understand the documents.
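
The retrieve-and-rank step described in the abstract can be illustrated with a minimal sketch. It does not reproduce QALink's segmentation, features, or ranking model; it substitutes plain TF-IDF cosine similarity as a stand-in lexical signal, and every function name and sample text below is a hypothetical chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (term -> weight) per input text."""
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_qa_for_sections(sections, qa_pairs, top_k=1):
    """For each section, return the top_k (qa_index, score) pairs."""
    qa_texts = [q + " " + a for q, a in qa_pairs]
    vectors = tfidf_vectors(list(sections) + qa_texts)
    section_vecs = vectors[:len(sections)]
    qa_vecs = vectors[len(sections):]
    ranking = []
    for sec_vec in section_vecs:
        scored = [(i, cosine(sec_vec, qa_vec)) for i, qa_vec in enumerate(qa_vecs)]
        scored.sort(key=lambda item: item[1], reverse=True)
        ranking.append(scored[:top_k])
    return ranking


# Hypothetical sample data: two document sections and two Q&A pairs.
sections = [
    "K-means clustering partitions points into groups around centroids.",
    "BM25 ranks documents by term frequency and document length.",
]
qa_pairs = [
    ("How does BM25 handle document length?",
     "It normalizes term frequency by document length."),
    ("What is a centroid in k-means?",
     "The mean of the points assigned to a cluster."),
]

ranking = rank_qa_for_sections(sections, qa_pairs, top_k=1)
for section, best in zip(sections, ranking):
    qa_index, score = best[0]
    print(section, "->", qa_pairs[qa_index][0], round(score, 3))
```

A purely lexical score like this one misses paraphrases, which is one reason a system might combine it with semantic features, as the abstract says QALink does.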

Index Terms

QALink: Enriching Text Documents with Relevant Q&A Site Contents
1. Information systems
  1. Information retrieval
    1. Document representation
      1. Document structure
    2. Retrieval models and ranking
      1. Learning to rank

Information & Contributors

Information

Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management

November 2017

2604 pages

ISBN:9781450349185

DOI:10.1145/3132847

General Chairs:
Ee-Peng Lim (Singapore Management University, Singapore)
Marianne Winslett (University of Illinois at Urbana-Champaign, USA, and Advanced Digital Sciences Center, Singapore)

Program Chairs:
Mark Sanderson (RMIT, Australia)
Ada Fu (Chinese University of Hong Kong, Hong Kong)
Jimeng Sun (Georgia Tech, USA)
Shane Culpepper (RMIT, Australia)
Eric Lo (Chinese University of Hong Kong, Hong Kong)
Joyce Ho (Emory University, USA)
Debora Donato (Mix Tech, Inc., USA)
Rakesh Agrawal (Data Insights Laboratories, USA)
Yu Zheng (Microsoft Research Asia, China)
Carlos Castillo (Qatar Computing Research Institute, Qatar)
Aixin Sun (Nanyang Technological University, Singapore)
Vincent S. Tseng (National Cheng Kung University, Taiwan)
Chenliang Li (Wuhan University, China)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 November 2017

Qualifiers

Research-article

Funding Sources

National Research Foundation Singapore

Conference

CIKM '17

CIKM '17: ACM Conference on Information and Knowledge Management

November 6 - 10, 2017

Singapore, Singapore

Acceptance Rates

CIKM '17 Paper Acceptance Rate 171 of 855 submissions, 20%;

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Bibliometrics

Total Citations: 8
Total Downloads: 238
Downloads (Last 12 months): 15
Downloads (Last 6 weeks): 3

Reflects downloads up to 16 Dec 2024

Citations

Cited By

Bai X, Wu X, Stojkovic I, Tsioutsiouliklis K, Serra E, Spezzano F (2024). Leveraging Large Language Models for Improving Keyphrase Generation for Contextual Targeting. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4349-4357. DOI: 10.1145/3627673.3680093. Online publication date: 21-Oct-2024
https://dl.acm.org/doi/10.1145/3627673.3680093
Li S, Jiang T, Zhang Y (2024). A Phrase-Level Attention Enhanced CRF for Keyphrase Extraction. Advances in Information Retrieval, 455-469. DOI: 10.1007/978-3-031-56027-9_28. Online publication date: 20-Mar-2024
https://doi.org/10.1007/978-3-031-56027-9_28
Zhang Y, Jiang T, Yang T, Li X, Wang S, Amigo E, Castells P, Gonzalo J, Carterette B, Culpepper J, Kazai G (2022). HTKG. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1044-1054. DOI: 10.1145/3477495.3531990. Online publication date: 6-Jul-2022
https://dl.acm.org/doi/10.1145/3477495.3531990
Zhang Y, Yang T, Jiang T, Li X, Wang S (2022). Hyperbolic Deep Keyphrase Generation. Machine Learning and Knowledge Discovery in Databases, 521-536. DOI: 10.1007/978-3-031-26390-3_30. Online publication date: 19-Sep-2022
https://dl.acm.org/doi/10.1007/978-3-031-26390-3_30
Zhang Y, Liu H, Shi B, Li X, Wang S (2020). WEKE: Learning Word Embeddings for Keyphrase Extraction. Web and Big Data, 245-260. DOI: 10.1007/978-3-030-60290-1_19. Online publication date: 14-Oct-2020
https://doi.org/10.1007/978-3-030-60290-1_19
Zhou T, Zhang Y, Zhu H (2020). Multi-level Memory Network with CRFs for Keyphrase Extraction. Advances in Knowledge Discovery and Data Mining, 726-738. DOI: 10.1007/978-3-030-47426-3_56. Online publication date: 6-May-2020
https://doi.org/10.1007/978-3-030-47426-3_56
Wang X, Wang R, Bao Z, Liang J, Lu W, Piwowarski B, Chevalier M, Gaussier E, Maarek Y, Nie J, Scholer F (2019). Effective Medical Archives Processing Using Knowledge Graphs. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1141-1144. DOI: 10.1145/3331184.3331350. Online publication date: 18-Jul-2019
https://dl.acm.org/doi/10.1145/3331184.3331350
Zhang Y, Liu H, Wang S, Ip. W, Fan W, Xiao C (2019). Automatic keyphrase extraction using word embeddings. Soft Computing 24:8, 5593-5608. DOI: 10.1007/s00500-019-03963-y. Online publication date: 29-Mar-2019
https://doi.org/10.1007/s00500-019-03963-y
