A novel cluster-based approach for keyphrase extraction from MOOC video lectures

  • Regular Paper
Knowledge and Information Systems

Abstract

Massive open online courses (MOOCs) have emerged as a valuable resource for learners, but numerous challenges must still be addressed to make them more useful and convenient. One such challenge is how to automatically extract a set of keyphrases from MOOC video lectures so that students can quickly identify the knowledge they want to learn and thus expedite their learning process. In this paper, we propose SemKeyphrase, an unsupervised cluster-based approach for keyphrase extraction from MOOC video lectures. SemKeyphrase incorporates a new semantic relatedness metric and a ranking algorithm, called PhraseRank, that ranks candidates in two phases. We conducted experiments on a real-world dataset of MOOC video lectures, and the results show that our proposed approach outperforms state-of-the-art keyphrase extraction methods.
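
To make the general shape of a cluster-based extractor with a two-phase ranking concrete, the following is a minimal sketch under stated assumptions. It is not SemKeyphrase or PhraseRank, whose details the abstract does not give. Candidate phrases and their frequencies are assumed to be already extracted from a lecture transcript. The `relatedness` function is a hypothetical stand-in (word-overlap Jaccard) for a semantic relatedness metric, the greedy `cluster` routine is a placeholder for a real clustering algorithm, and ranking by frequency in both phases is an illustrative choice only.

```python
def relatedness(p, q):
    """Stand-in relatedness: Jaccard overlap of the words in two phrases.
    (Assumption: a real system would use a semantic metric instead.)"""
    a, b = set(p.split()), set(q.split())
    return len(a & b) / len(a | b)


def cluster(candidates, threshold=0.3):
    """Greedy one-pass clustering: a phrase joins the first cluster whose
    seed (first member) is related enough, otherwise it starts a new one."""
    clusters = []
    for phrase in candidates:
        for members in clusters:
            if relatedness(phrase, members[0]) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append([phrase])
    return clusters


def rank(freq, threshold=0.3, top_k=None):
    """Two-phase ranking over a {phrase: frequency} mapping.

    Phase 1 orders clusters by the total frequency of their members.
    Phase 2 picks one representative per cluster: the most frequent
    member, preferring longer phrases on ties.
    """
    # Visit frequent phrases first so they become cluster seeds.
    ordered = sorted(freq, key=lambda p: (-freq[p], p))
    clusters = cluster(ordered, threshold)
    # Phase 1: rank clusters.
    clusters.sort(key=lambda c: -sum(freq[p] for p in c))
    # Phase 2: rank candidates inside each cluster and keep the best one.
    best = [max(c, key=lambda p: (freq[p], len(p.split()))) for c in clusters]
    return best[:top_k]


# Toy candidate counts (invented for illustration, not from the paper).
freq = {
    "linked list": 5,
    "doubly linked list": 2,
    "hash table": 4,
    "hash function": 2,
    "pointer": 3,
}
keyphrases = rank(freq)
print(keyphrases)
```

Returning one representative per cluster keeps near-duplicate phrases from crowding the output, which is the usual motivation for clustering candidates before ranking them.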

Notes

  1. https://www.coursera.org/learn/c-plus-plus-a.

  2. https://radimrehurek.com/gensim/.

  3. https://dumps.wikimedia.org/enwiki/.

Acknowledgements

The authors would like to thank Mr. Melvyn Leon Boois for his help in proofreading the paper.

Author information

Corresponding author

Correspondence to Abdulaziz Albahr.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Albahr, A., Che, D. & Albahar, M. A novel cluster-based approach for keyphrase extraction from MOOC video lectures. Knowl Inf Syst 63, 1663–1686 (2021). https://doi.org/10.1007/s10115-021-01568-2

  • DOI: https://doi.org/10.1007/s10115-021-01568-2
