DOI: 10.1007/978-3-030-54956-5_11
Article

Knowledge-Based Categorization of Scientific Articles for Similarity Predictions

Published: 25 August 2020

Abstract

Staying aware of new approaches emerging within specific areas can be challenging for researchers, who have to follow many feeds, such as journal articles and authors' papers, or rely on basic keyword-based matching algorithms. Hence, this paper proposes an information retrieval process for scientific articles that suggests semantically related articles using exclusively a knowledge base. The first step categorizes articles by disambiguating their keywords, identifying common categories within the knowledge base. Then, similar articles are identified using the information extracted from the categorization, such as synonyms. The experimental evaluation shows that the proposed approach significantly outperforms the well-known cosine similarity measure computed on word2vec embedding vectors. Indeed, there is a difference of 30% in P@k in favor of the proposed approach.
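To make the two evaluation notions in the abstract concrete, here is a minimal sketch of a cosine-similarity ranking and of precision at rank k (P@k). It is not the authors' implementation: the function names, the two-dimensional vectors, and the relevance judgments are all made up for illustration, and real word2vec embeddings would have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def rank_by_cosine(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    return sorted(
        doc_vecs,
        key=lambda doc_id: cosine_similarity(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


# Hypothetical toy data: one query article and four candidate articles.
query = [1.0, 0.0]
candidates = {
    "a": [1.0, 0.1],
    "b": [0.0, 1.0],
    "c": [0.9, 0.5],
    "d": [-1.0, 0.0],
}
relevant = {"a", "b"}  # made-up relevance judgments

ranking = rank_by_cosine(query, candidates)
print(ranking)
print(precision_at_k(ranking, relevant, 2))
```

With these toy values, only one of the two top-ranked candidates is in the relevant set, so P@2 is 0.5. Comparing two retrieval approaches then amounts to computing P@k for each ranking over the same queries and relevance judgments.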



Published In

Digital Libraries for Open Knowledge: 24th International Conference on Theory and Practice of Digital Libraries, TPDL 2020, Lyon, France, August 25–27, 2020, Proceedings
Aug 2020, 234 pages
ISBN: 978-3-030-54955-8
DOI: 10.1007/978-3-030-54956-5
Editors: Mark Hall, Tanja Merčun, Thomas Risse, Fabien Duchateau

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Information retrieval
2. Categorization
3. Scientific literature
4. Document similarity
