DOI: 10.1145/3132847.3132934
research-article

QALink: Enriching Text Documents with Relevant Q&A Site Contents

Published: 06 November 2017

Abstract

With the rapid development of Q&A sites such as Quora and StackExchange, users have produced high-quality question-answer pairs. This Q&A content covers a wide range of topics and helps users resolve queries and acquire new knowledge. Meanwhile, people reading digital documents may encounter problems such as missing background information and unclear explanations of concepts. We believe that the high-quality content on Q&A sites can serve as a rich supplement to digital documents. In this paper, we give a rigorous formulation of this novel text enrichment problem and design an end-to-end system, QALink, which assigns the most relevant Q&A content to the corresponding section of a document. We first present a new segmentation approach that models each document as a hierarchical structure. Based on this hierarchy, queries are constructed to retrieve and rank related question-answer pairs. The system uses both syntactic and semantic features. Empirical evaluation results indicate that QALink effectively enriches text documents with relevant Q&A content, helping people better understand them.
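The abstract says that queries built from the document hierarchy are used to retrieve and rank question-answer pairs, but it does not say how candidates are scored. The sketch below illustrates only that retrieval step, assuming Okapi BM25 as a stand-in ranking function. It does not reproduce QALink's segmentation, its syntactic and semantic features, or its ranking model. The `build_query` helper (which prepends ancestor section headings to a section's text) and the sample Q&A pairs are hypothetical illustrations.

```python
from __future__ import annotations

import math
from collections import Counter

PUNCT = ".,?!:;()\"'"


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: split on whitespace, trim punctuation, lowercase.
    # No stop-word removal or stemming is applied.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]


def build_query(section_path: list[str], section_text: str) -> str:
    # Hypothetical query construction (not QALink's): prepend the headings of
    # the section's ancestors so the query carries context from the hierarchy.
    return " ".join(section_path + [section_text])


class BM25:
    """Okapi BM25 over a small, non-empty in-memory collection of Q&A pairs."""

    def __init__(self, docs: list[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]  # per-pair term frequencies
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.n = len(self.tfs)
        self.avgdl = sum(self.lengths) / self.n
        # Document frequency: number of Q&A pairs that contain each term.
        self.df = Counter(term for tf in self.tfs for term in tf)

    def idf(self, term: str) -> float:
        # Smoothed IDF with +1 inside the log so it is never negative.
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], self.lengths[i]
        total = 0.0
        for term in set(tokenize(query)):
            f = tf.get(term, 0)
            if f == 0:
                continue
            # Saturating term frequency, normalised by pair length vs. average.
            denom = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf(term) * f * (self.k1 + 1) / denom
        return total

    def rank(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        # Return (index, score) pairs, best first.
        scores = [(i, self.score(query, i)) for i in range(self.n)]
        return sorted(scores, key=lambda s: s[1], reverse=True)[:top_k]


qa_pairs = [
    "What is a B-tree index? A balanced tree that keeps keys sorted for fast lookup.",
    "How do I cook rice? Rinse it, then simmer with water.",
    "Why are hash indexes bad for range queries? Hashing destroys key order.",
]
engine = BM25(qa_pairs)
query = build_query(["Databases", "Indexing"], "A B-tree index supports range queries.")
print(engine.rank(query))
```

Running it prints the three pairs best first: the B-tree pair, then the hash-index pair, then the cooking pair with a score of 0.0, since it shares no terms with the query. A realistic system would at least add stop-word removal and stemming, and would rerank these candidates with richer features.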



Published In

CIKM '17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management
November 2017
2604 pages
ISBN:9781450349185
DOI:10.1145/3132847
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. hierarchical text segmentation
  2. learning to rank
  3. probabilistic information retrieval
  4. q&a sites
  5. text enrichment

Conference

CIKM '17

Acceptance Rates

CIKM '17 paper acceptance rate: 171 of 855 submissions (20%)
Overall acceptance rate: 1,861 of 8,427 submissions (22%)

Article Metrics

  • Downloads (last 12 months): 15
  • Downloads (last 6 weeks): 3

Download counts reflect downloads up to 16 Dec 2024.

Cited By

  • (2024) Leveraging Large Language Models for Improving Keyphrase Generation for Contextual Targeting. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 4349-4357. DOI: 10.1145/3627673.3680093. Online publication date: 21-Oct-2024.
  • (2024) A Phrase-Level Attention Enhanced CRF for Keyphrase Extraction. Advances in Information Retrieval, 455-469. DOI: 10.1007/978-3-031-56027-9_28. Online publication date: 20-Mar-2024.
  • (2022) HTKG. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 1044-1054. DOI: 10.1145/3477495.3531990. Online publication date: 6-Jul-2022.
  • (2022) Hyperbolic Deep Keyphrase Generation. Machine Learning and Knowledge Discovery in Databases, 521-536. DOI: 10.1007/978-3-031-26390-3_30. Online publication date: 19-Sep-2022.
  • (2020) WEKE: Learning Word Embeddings for Keyphrase Extraction. Web and Big Data, 245-260. DOI: 10.1007/978-3-030-60290-1_19. Online publication date: 14-Oct-2020.
  • (2020) Multi-level Memory Network with CRFs for Keyphrase Extraction. Advances in Knowledge Discovery and Data Mining, 726-738. DOI: 10.1007/978-3-030-47426-3_56. Online publication date: 6-May-2020.
  • (2019) Effective Medical Archives Processing Using Knowledge Graphs. Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 1141-1144. DOI: 10.1145/3331184.3331350. Online publication date: 18-Jul-2019.
  • (2019) Automatic keyphrase extraction using word embeddings. Soft Computing 24(8), 5593-5608. DOI: 10.1007/s00500-019-03963-y. Online publication date: 29-Mar-2019.
