Article

Incorporating expert knowledge into keyphrase extraction

Authors:

Sujatha Das Gollapalli,

Peng Yang

AAAI'17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence

Pages 3180 - 3187

Published: 04 February 2017

Abstract

Keyphrases that efficiently summarize a document's content are used in various document processing and retrieval tasks. Current state-of-the-art techniques for keyphrase extraction operate at the phrase level and score candidate phrases based on features of their component words. In this paper, we learn keyphrase taggers for research papers through sequence labeling, using token-based features that incorporate linguistic, surface-form, and document-structure information. We show experimentally that, using within-document features alone, our tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems, further enhancing extraction performance. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently compiled datasets of keyphrases from Computer Science research papers.
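The abstract frames keyphrase extraction as token-level sequence labeling rather than phrase scoring. The Python below is a minimal sketch of the data side of that framing under stated assumptions, not the authors' implementation. It assumes a B/I/O tag set, uses a few hypothetical feature names (`in_title`, `prev_pos`, and so on) to stand for the linguistic, surface-form, and document-structure feature groups, and assumes part-of-speech tags come from some external tagger. CRF training itself is omitted. The last function shows one well-known form of feature labeling, a generalized expectation (GE) style penalty: an expert states a target label distribution for tokens carrying a feature, and the penalty is the KL divergence from that target to the model's average predicted label distribution over those tokens. Whether the paper uses exactly this form is not stated in the abstract.

```python
import math
from typing import Dict, List, Sequence

# Tag set is an assumption: B = first token of a keyphrase, I = inside, O = outside.
LABELS = ("B", "I", "O")


def bio_tags(tokens: Sequence[str], keyphrases: Sequence[str]) -> List[str]:
    """Convert gold keyphrases into per-token B/I/O training labels.

    Matching is case-insensitive; a span is tagged only if none of its tokens
    already belongs to an earlier match, so phrases listed first win overlaps.
    """
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for phrase in keyphrases:
        words = phrase.lower().split()
        n = len(words)
        if n == 0:
            continue  # skip empty phrases
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            if lowered[start:start + n] == words and all(tags[k] == "O" for k in span):
                tags[start] = "B"
                for k in range(start + 1, start + n):
                    tags[k] = "I"
    return tags


def token_features(tokens: Sequence[str], pos_tags: Sequence[str],
                   in_title: Sequence[bool]) -> List[Dict[str, bool]]:
    """Illustrative per-token features (all names hypothetical).

    POS tags are assumed to come from an external tagger; in_title marks
    tokens that also occur in the paper's title.
    """
    feats = []
    for i, tok in enumerate(tokens):
        prev_pos = pos_tags[i - 1] if i > 0 else "<s>"
        feats.append({
            "word=" + tok.lower(): True,          # lexical identity
            "pos=" + pos_tags[i]: True,           # linguistic
            "prev_pos=" + prev_pos: True,         # local context
            "is_capitalized": tok[:1].isupper(),  # surface form
            "has_hyphen": "-" in tok,             # surface form
            "in_title": bool(in_title[i]),        # document structure
        })
    return feats


def decode_phrases(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Turn a predicted B/I/O sequence back into keyphrase strings.

    A stray I with no open phrase is treated as the start of a new phrase.
    """
    phrases: List[str] = []
    current: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I":
            current.append(tok)
        else:  # tag == "O" closes any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def ge_kl_penalty(marginals: Sequence[Dict[str, float]], has_feature: Sequence[bool],
                  target: Dict[str, float]) -> float:
    """GE-style penalty for one labeled feature: KL(target || model expectation).

    marginals[i][y] is the model's probability of label y at token i (a trained
    CRF would supply these); the expectation averages them over the tokens
    where the feature fires.
    """
    idx = [i for i, fires in enumerate(has_feature) if fires]
    if not idx:
        return 0.0
    expected = {y: sum(marginals[i][y] for i in idx) / len(idx) for y in LABELS}
    return sum(p * math.log(p / max(expected[y], 1e-12))
               for y, p in target.items() if p > 0)
```

For example, `bio_tags(["Conditional", "random", "fields", "label", "sequence", "data"], ["conditional random fields"])` yields `["B", "I", "I", "O", "O", "O"]`, and `decode_phrases` maps that sequence back to `["Conditional random fields"]`. In a full pipeline the feature dictionaries and labels would go to a CRF trainer, and a labeled feature such as "tokens in the title are mostly B or I" would add a `ge_kl_penalty` term to the training objective; the specific toolkit and target distributions are assumptions, not details from the abstract.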

Cited By

Abulaish M, Fazil M, Zaki M (2022). Domain-Specific Keyword Extraction Using Joint Modeling of Local and Global Contextual Semantics. ACM Transactions on Knowledge Discovery from Data 16:4 (1-30). DOI: 10.1145/3494560. Online publication date: 8-Jan-2022.
https://dl.acm.org/doi/10.1145/3494560

Yao K, Qin C, Zhu H, Ma C, Zhang J, Du Y, Xiong H (2021). An Interactive Neural Network Approach to Keyphrase Extraction in Talent Recruitment. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Demartini G, Zuccon G, Culpepper J, Huang Z, Tong H (eds.), 2383-2393. DOI: 10.1145/3459637.3482319. Online publication date: 26-Oct-2021.
https://dl.acm.org/doi/10.1145/3459637.3482319

Alzaidy R, Caragea C, Giles C (2019). Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents. The World Wide Web Conference, 2551-2557. DOI: 10.1145/3308558.3313642. Online publication date: 13-May-2019.
https://dl.acm.org/doi/10.1145/3308558.3313642

Incorporating expert knowledge into keyphrase extraction
1. Computing methodologies
  1. Artificial intelligence
2. Hardware
  1. Power and energy
    1. Power estimation and optimization

Information & Contributors

Information

Published In

AAAI'17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence

February 2017

5106 pages

Program Chairs:
Satinder Singh, University of Michigan
Shaul Markovitch, Technion-Israel Institute of Technology

Sponsors

Association for the Advancement of Artificial Intelligence
Amazon
Infosys
Facebook
IBM

Publisher

AAAI Press

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 3
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 21 Nov 2024