Abstract
Massive open online courses (MOOCs) have emerged as a valuable resource for learners, but numerous challenges must still be addressed to make them more useful and convenient. One such challenge is automatically extracting a set of keyphrases from MOOC video lectures so that students can quickly identify the knowledge they want to learn and thus expedite their learning. In this paper, we propose SemKeyphrase, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. SemKeyphrase incorporates a new semantic relatedness metric and a ranking algorithm, called PhraseRank, that ranks candidates in two phases. We conducted experiments on a real-world dataset of MOOC video lectures, and the results show that our proposed approach outperforms state-of-the-art keyphrase extraction methods.
Acknowledgements
The authors would like to thank Mr. Melvyn Leon Boois for his help in proofreading the paper.
Cite this article
Albahr, A., Che, D. & Albahar, M. A novel cluster-based approach for keyphrase extraction from MOOC video lectures. Knowl Inf Syst 63, 1663–1686 (2021). https://doi.org/10.1007/s10115-021-01568-2
DOI: https://doi.org/10.1007/s10115-021-01568-2