Open access

Assessing Scientific Research Papers with Knowledge Graphs

Authors: Daniel Benjamin, Fred Morstatter, Kristina Lerman, Jay Pujara

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 2467 - 2472

https://doi.org/10.1145/3477495.3531879

Published: 07 July 2022

Abstract

In recent decades, the growing scale of scientific research has produced numerous novel findings, and reproducing these findings is the foundation of future research. However, because experiments are complex, manually assessing scientific research is laborious and time-intensive, especially in the social and behavioral sciences. Although reproducibility studies have garnered increasing attention in the research community, there is still no systematic way to evaluate scientific research at scale. In this paper, we propose a novel approach to automatically assessing scientific publications by constructing a knowledge graph (KG) that captures a holistic view of the research contributions. Specifically, during KG construction, we combine information from two perspectives: micro-level features that capture knowledge from published articles, such as sample sizes, effect sizes, and experimental models, and macro-level features that comprise relationships between entities, such as authorship and reference information. We then use language models and knowledge graph embeddings to learn low-dimensional representations of entities (nodes in the KG), which are in turn used for the assessments. A comprehensive set of experiments on two benchmark datasets shows the usefulness of leveraging KGs for scoring scientific research.
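The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: macro-level relations (authorship, citation) and micro-level features (sample size, effect size) are merged into one set of triples, and a translation-based (TransE-style) function scores a triple from entity and relation vectors. All paper IDs, author IDs, feature values, and function names are hypothetical. The embeddings are random and untrained, and literal values are treated as plain entity nodes for simplicity, which may differ from what the paper does.

```python
import math
import random

# Hypothetical toy data: IDs and feature values are invented for illustration.
MACRO_TRIPLES = [
    ("paper:A", "authored_by", "author:X"),
    ("paper:B", "authored_by", "author:X"),
    ("paper:B", "cites", "paper:A"),
]
MICRO_FEATURES = {
    "paper:A": {"sample_size": 120, "effect_size": 0.35},
    "paper:B": {"sample_size": 48, "effect_size": 0.80},
}


def build_triples(macro, micro):
    """Merge relational triples with feature triples (literals become nodes)."""
    triples = list(macro)
    for paper, feats in micro.items():
        for name, value in feats.items():
            triples.append((paper, f"has_{name}", f"literal:{name}={value}"))
    return triples


def init_embeddings(symbols, dim, rng):
    """Assign each symbol a random vector; a real system would train these."""
    return {s: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for s in sorted(symbols)}


def transe_score(h, r, t):
    """TransE-style plausibility: negative L2 distance between h + r and t."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


triples = build_triples(MACRO_TRIPLES, MICRO_FEATURES)
entities = {x for h, _, t in triples for x in (h, t)}
relations = {r for _, r, _ in triples}

rng = random.Random(0)  # fixed seed so runs are repeatable
ent_emb = init_embeddings(entities, dim=8, rng=rng)
rel_emb = init_embeddings(relations, dim=8, rng=rng)

h, r, t = triples[0]
score = transe_score(ent_emb[h], rel_emb[r], ent_emb[t])
```

In a trained model, higher (less negative) scores would indicate more plausible triples, and the learned paper vectors could then feed a downstream scoring model. Here the score only shows the mechanics.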

Supplementary Material

MP4 File (SIGIR22-sp1936.mp4)

A short version of the presentation video of the research work on constructing knowledge graphs to assess scientific publications.


Cited By

Bhuvanesh Shathyan R, Begam M, Jashwanth K, and Jayaprakash A. 2023. Knowledge Graph Based Medical Chatbot building. In 2023 4th IEEE Global Conference for Advancement in Technology (GCAT). 1--6. Online publication date: 6-Oct-2023. https://doi.org/10.1109/GCAT59970.2023.10353415

Index Terms

Assessing Scientific Research Papers with Knowledge Graphs
1. Applied computing
  1. Law, social and behavioral sciences
2. Computing methodologies
  1. Machine learning

Published In

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2022

3569 pages

ISBN: 9781450387323

DOI: 10.1145/3477495

General Chairs: Enrique Amigo (UNED), Pablo Castells (UAM and Amazon), Julio Gonzalo (UNED)

Program Chairs: Ben Carterette (Spotify), J. Shane Culpepper (RMIT University), Gabriella Kazai (Waseda University)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Funding Sources

Defense Advanced Research Projects Agency

Conference

SIGIR '22

Sponsor:

SIGIR

SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 11 - 15, 2022

Madrid, Spain

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
