Article (DOI: 10.5555/3298023.3298031)

Incorporating expert knowledge into keyphrase extraction

Published: 04 February 2017

Abstract

Keyphrases that efficiently summarize a document's content are used in various document processing and retrieval tasks. Current state-of-the-art techniques for keyphrase extraction operate at the phrase level and score candidate phrases based on features of their component words. In this paper, we learn keyphrase taggers for research papers through sequence labeling, using token-based features that incorporate linguistic, surface-form, and document-structure information. We show experimentally that, using within-document features alone, our tagger trained with Conditional Random Fields performs on par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems, which further improves extraction performance. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently compiled datasets of keyphrases from Computer Science research papers.
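
To make the token-level framing in the abstract concrete, the following is a minimal sketch of how a document might be prepared for a sequence labeler such as a CRF. It is not the authors' implementation: the label names (`B-KP`, `I-KP`, `O`), the function names, and the specific features are assumptions chosen for illustration, standing in for the linguistic, surface-form, and document-structure features the abstract mentions.

```python
from typing import Dict, List, Sequence, Set


def bio_labels(tokens: Sequence[str],
               keyphrases: Sequence[Sequence[str]]) -> List[str]:
    """Tag each token B-KP, I-KP, or O by case-insensitive phrase matching.

    The BIO label scheme is an assumption; the abstract does not name one.
    """
    lowered = [t.lower() for t in tokens]
    labels = ["O"] * len(tokens)
    for phrase in keyphrases:
        target = [w.lower() for w in phrase]
        n = len(target)
        if n == 0:
            continue
        # Slide a window over the document and mark every exact match.
        for start in range(len(lowered) - n + 1):
            if lowered[start:start + n] == target:
                labels[start] = "B-KP"
                for k in range(start + 1, start + n):
                    labels[k] = "I-KP"
    return labels


def token_features(tokens: Sequence[str], index: int,
                   title_words: Set[str]) -> Dict[str, object]:
    """Build a hypothetical feature dictionary for one token."""
    tok = tokens[index]
    last = len(tokens) - 1
    return {
        # Surface-form features
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "suffix3": tok[-3:].lower(),
        # Document-structure features
        "in_title": tok.lower() in title_words,
        "relative_position": index / max(last, 1),
        # Neighbouring tokens as context
        "prev_lower": tokens[index - 1].lower() if index > 0 else "<BOS>",
        "next_lower": tokens[index + 1].lower() if index < last else "<EOS>",
    }


tokens = "We learn keyphrase taggers using Conditional Random Fields".split()
keyphrases = [["keyphrase", "taggers"], ["conditional", "random", "fields"]]

labels = bio_labels(tokens, keyphrases)
# labels == ['O', 'O', 'B-KP', 'I-KP', 'O', 'B-KP', 'I-KP', 'I-KP']

features = [token_features(tokens, i, {"keyphrase", "extraction"})
            for i in range(len(tokens))]
```

The per-token feature dictionaries and label sequences are the usual input format for off-the-shelf CRF toolkits; part-of-speech tags or other linguistic features would be added to the same dictionaries.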


Cited By

  • (2022) Domain-Specific Keyword Extraction Using Joint Modeling of Local and Global Contextual Semantics. ACM Transactions on Knowledge Discovery from Data 16:4 (1-30). DOI: 10.1145/3494560. Online publication date: 8-Jan-2022
  • (2021) An Interactive Neural Network Approach to Keyphrase Extraction in Talent Recruitment. Proceedings of the 30th ACM International Conference on Information & Knowledge Management (2383-2393). DOI: 10.1145/3459637.3482319. Online publication date: 26-Oct-2021
  • (2019) Bi-LSTM-CRF Sequence Labeling for Keyphrase Extraction from Scholarly Documents. The World Wide Web Conference (2551-2557). DOI: 10.1145/3308558.3313642. Online publication date: 13-May-2019


Published In

AAAI'17: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence
February 2017
5106 pages

Sponsors

  • Association for the Advancement of Artificial Intelligence
  • Amazon
  • Infosys
  • Facebook
  • IBM

Publisher

AAAI Press
