DOI: 10.1145/3487553.3524677

SciNoBo: A Hierarchical Multi-Label Classifier of Scientific Publications

Published: 16 August 2022

Abstract

Classifying scientific publications according to Field-of-Science (FoS) taxonomies is crucial: it allows funders, publishers, scholars, companies, and other stakeholders to organize scientific literature more effectively. Most existing work addresses classification either at the venue level or solely on the basis of a publication's textual content. We present SciNoBo, a novel system that classifies publications into predefined FoS taxonomies by leveraging the structural properties of a publication together with its citations and references, organized in a multilayer network. In contrast to other work, our system supports assigning a publication to multiple fields by considering its multidisciplinarity potential. By unifying publications and venues under a common multilayer network structure made up of citing and publishing relationships, venue-level classifications can be augmented with publication-level classifications. We evaluate SciNoBo on a dataset of publications extracted from Microsoft Academic Graph and perform a comparative analysis against a state-of-the-art neural-network baseline. The results indicate that the proposed system is capable of producing high-quality classifications of publications.
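
To make the general idea concrete, the following is a minimal sketch of how venue-level field labels might be passed to a publication through its references, with more than one field kept when the evidence is split. It is not the SciNoBo algorithm: the abstract does not specify the scoring, so the voting rule, the `threshold` parameter, the `classify` function, and all venue names, field names, and publication ids below are assumptions made purely for illustration.

```python
from collections import Counter

# Hypothetical toy data; none of these names or values come from the paper.
# Venue-level classification: each venue carries one or more field labels.
VENUE_FIELDS = {
    "Venue A": ["computer science"],
    "Venue B": ["information science"],
    "Venue C": ["computer science", "biology"],
}
# "Publishing" layer: publication -> venue it appeared in.
PUBLISHED_IN = {"p1": "Venue A", "p2": "Venue B", "p3": "Venue C", "p4": "Venue A"}
# "Citing" layer: publication -> publications it references.
CITES = {"p0": ["p1", "p2", "p3", "p4"]}


def classify(pub, threshold=0.25):
    """Return {field: share} for fields whose share of reference votes >= threshold."""
    votes = Counter()
    for ref in CITES.get(pub, []):
        fields = VENUE_FIELDS.get(PUBLISHED_IN.get(ref), [])
        for field in fields:
            # A venue labelled with k fields gives each field a 1/k vote.
            votes[field] += 1.0 / len(fields)
    total = sum(votes.values())
    if total == 0:
        return {}  # no labelled neighbours, so nothing to propagate
    # Keeping every field above the threshold allows multi-label output.
    return {f: v / total for f, v in votes.items() if v / total >= threshold}


print(classify("p0"))
```

In this toy setup, `p0` receives two fields because its references are spread across venues with different labels; a publication with no labelled references gets an empty result rather than a guess.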

Cited By

  • (2024) FoRC@NSLP2024: Overview and Insights from the Field of Research Classification Shared Task. Natural Scientific Language Processing and Research Knowledge Graphs, 189–204. DOI: 10.1007/978-3-031-65794-8_12. Online publication date: 15-Aug-2024
  • (2023) SCINOBO: a novel system classifying scholarly communication in a dynamically constructed hierarchical Field-of-Science taxonomy. Frontiers in Research Metrics and Analytics 8. DOI: 10.3389/frma.2023.1149834. Online publication date: 4-May-2023
  • (2023) Knowledge organisation in institutional repositories: a case study on policies and procedures manuals in the Ibero-American environment. The Electronic Library 41, 6, 770–786. DOI: 10.1108/EL-05-2023-0128. Online publication date: 28-Jun-2023

Published In

WWW '22: Companion Proceedings of the Web Conference 2022
April 2022
1338 pages
ISBN: 9781450391306
DOI: 10.1145/3487553
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. digital libraries
  2. field of science publication classification
  3. hierarchical classification
  4. label propagation
  5. multi-label classification
  6. multilayer network
  7. neural networks
  8. scholarly data

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

WWW '22: The ACM Web Conference 2022
April 25 - 29, 2022
Virtual Event, Lyon, France

Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions (23%)

Article Metrics

  • Downloads (Last 12 months): 53
  • Downloads (Last 6 weeks): 8
Reflects downloads up to 21 Nov 2024