A novel cluster-based approach for keyphrase extraction from MOOC video lectures

Abstract

Massive open online courses (MOOCs) have become a valuable resource for learners, but many challenges must still be addressed to make them more useful and convenient. One such challenge is automatically extracting keyphrases from MOOC video lectures so that students can quickly locate the knowledge they want and thus learn faster. In this paper, we propose SemKeyphrase, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. SemKeyphrase incorporates a new semantic relatedness metric and a ranking algorithm, called PhraseRank, that ranks candidates in two phases. We conducted experiments on a real-world dataset of MOOC video lectures, and the results show that our approach outperforms state-of-the-art keyphrase extraction methods.
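
To make the general idea of cluster-based keyphrase extraction concrete, the following is a minimal sketch of a generic cluster-then-rank pipeline. It is not SemKeyphrase or PhraseRank: the abstract does not specify the semantic relatedness metric or the two ranking phases, so the word-overlap (Jaccard) measure, the greedy clustering, the frequency-based scoring, and all function names here are assumptions chosen purely for illustration.

```python
from collections import Counter


def jaccard(a: str, b: str) -> float:
    """Stand-in relatedness (an assumption): word overlap between two phrases."""
    wa, wb = set(a.split()), set(b.split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster_phrases(phrases, threshold=0.3):
    """Greedy single-pass clustering: a phrase joins the first cluster
    whose seed (first member) is related enough, else starts a new cluster."""
    clusters = []
    for phrase in phrases:
        for cluster in clusters:
            if jaccard(phrase, cluster[0]) >= threshold:
                cluster.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


def rank_keyphrases(candidates, top_k=3):
    """Hypothetical two-step ranking: order clusters by total frequency,
    then pick the most frequent phrase inside each of the top clusters."""
    freq = Counter(candidates)
    # Seed clusters with the most frequent phrases first (ties broken alphabetically).
    ordered = sorted(freq, key=lambda p: (-freq[p], p))
    clusters = cluster_phrases(ordered)
    clusters.sort(key=lambda c: -sum(freq[p] for p in c))
    return [max(c, key=lambda p: freq[p]) for c in clusters[:top_k]]


# Toy candidate phrases, as might be pulled from a lecture transcript.
candidates = [
    "gradient descent", "gradient descent", "stochastic gradient descent",
    "learning rate", "learning rate", "learning rate", "loss function",
]
print(rank_keyphrases(candidates, top_k=2))
# → ['learning rate', 'gradient descent']
```

Grouping related candidates before ranking keeps near-duplicate phrases (here, the two "gradient descent" variants) from occupying several slots in the final list, which is the usual motivation for clustering in this setting.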

Acknowledgements

The authors would like to thank Mr. Melvyn Leon Boois for his help in proofreading the paper.

Author information

Authors and Affiliations

King Saud bin Abdulaziz University for Health Sciences, Al-Ahsa, Saudi Arabia
Abdulaziz Albahr
King Abdullah International Medical Research Center, Al-Ahsa, Saudi Arabia
Abdulaziz Albahr
Southern Illinois University Carbondale, Carbondale, IL, USA
Dunren Che
Umm Al-Qura University, Mecca, Saudi Arabia
Marwan Albahar

Authors

Abdulaziz Albahr
Dunren Che
Marwan Albahar

Corresponding author

Correspondence to Abdulaziz Albahr.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Albahr, A., Che, D. & Albahar, M. A novel cluster-based approach for keyphrase extraction from MOOC video lectures. Knowl Inf Syst 63, 1663–1686 (2021). https://doi.org/10.1007/s10115-021-01568-2

Received: 12 May 2020
Revised: 01 April 2021
Accepted: 07 April 2021
Published: 21 April 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10115-021-01568-2
