DOI:10.1145/3459637.3482003
research-article

GAKG: A Multimodal Geoscience Academic Knowledge Graph

Published: 30 October 2021

Abstract

Geoscience research plays a strong role in helping people better understand the Earth. Given the enormous volume of geoscience research papers, knowledge graphs (KGs) can be a powerful means to represent the knowledge they contain, manage the relationships among data, and integrate the knowledge extracted from them. However, existing geoscience KGs mainly focus on the external connections between concepts, while the abundant information contained in a paper's internal multimodal data, which could support more fine-grained knowledge mining, is largely overlooked. To this end, we propose GAKG, a large-scale multimodal academic KG based on 1.12 million papers published in various geoscience-related journals. In addition to the bibliometric elements, we extract the internal illustrations, tables, and text of the articles, and mine the papers' knowledge entities together with the era and spatial attributes of the articles, coupling multimodal academic data and features. Specifically, GAKG performs knowledge entity extraction under our proposed Human-In-the-Loop framework, whose novelty lies in combining machine reading and information retrieval techniques with manual annotation by geoscientists in the loop. Because geoscience literature often contains more abundant illustrations and time-scale information than that of other disciplines, we extract all the geographical and era information from the papers' text and illustrations, mapping papers to the atlas and chronology. Based on GAKG, we build several knowledge discovery benchmarks for finding geoscience communities and predicting potential links. GAKG and its services are publicly available and designed to be user-friendly.
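As a rough illustration of what coupling bibliometric, illustration, era, and spatial information in one graph can look like, the following minimal sketch stores a few triples in memory and queries them. All identifiers and relation names (`paper:001`, `has_era`, `has_location`, and so on) are hypothetical and chosen only for this example; they are not taken from the GAKG schema or its services.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical identifiers and relation names, for illustration only;
# they are not taken from the GAKG schema.
TRIPLES = [
    Triple("paper:001", "has_illustration", "figure:001-1"),
    Triple("paper:001", "mentions_entity", "entity:craton"),
    Triple("paper:001", "has_era", "era:Precambrian"),
    Triple("paper:001", "has_location", "place:North China"),
    Triple("paper:002", "mentions_entity", "entity:craton"),
    Triple("paper:002", "has_era", "era:Cambrian"),
    Triple("paper:002", "cites", "paper:001"),
]


def build_index(triples):
    """Index tails by (head, relation) for direct attribute lookup."""
    index = defaultdict(set)
    for t in triples:
        index[(t.head, t.relation)].add(t.tail)
    return index


def papers_with(triples, relation, tail):
    """Return the sorted heads linked to `tail` through `relation`."""
    return sorted(
        t.head for t in triples if t.relation == relation and t.tail == tail
    )


index = build_index(TRIPLES)
craton_papers = papers_with(TRIPLES, "mentions_entity", "entity:craton")
```

Here `index[("paper:001", "has_era")]` returns the era attached to one paper, while `papers_with` finds every paper that shares a knowledge entity, which is the kind of shared-attribute lookup that community-finding and link-prediction benchmarks build on.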

Supplementary Material

MP4 File (gakg.mp4)
GAKG is a multimodal Geoscience Academic Knowledge Graph framework that fuses papers' illustrations, text, and bibliometric data. To our knowledge, GAKG is currently the largest and most comprehensive geoscience academic knowledge graph, consisting of more than 68 million triples. To explore the entire GAKG, visit https://gakg.acemap.info.

Published In

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
October 2021
4966 pages
ISBN:9781450384469
DOI:10.1145/3459637

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data management
  2. geoscience academic knowledge graph
  3. information extraction
  4. knowledge base

Qualifiers

  • Research-article

Funding Sources

  • 2021 Tencent AI Lab RhinoBird Focused Research Program
  • National Key R&D Program of China
  • NSF China
  • Shanghai Academic/Technology Research Leader Program

Conference

CIKM '21

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 248
  • Downloads (Last 6 weeks): 27
Reflects downloads up to 24 Nov 2024

Cited By

  • (2024) Bibliometric Analysis on the Research of Geoscience Knowledge Graph (GeoKG) from 2012 to 2023. ISPRS International Journal of Geo-Information, 13:7 (255). DOI:10.3390/ijgi13070255. Online publication date: 16-Jul-2024
  • (2024) ESDC: An open Earth science data corpus to support geoscientific literature information extraction. SCIENTIA SINICA Terrae. DOI:10.1360/N072023-0247. Online publication date: 12-Nov-2024
  • (2024) Fault Diagnosis for Test Alarms in Microservices through Multi-source Data. Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering (115-125). DOI:10.1145/3663529.3663833. Online publication date: 10-Jul-2024
  • (2024) A Survey of Multi-modal Knowledge Graphs: Technologies and Trends. ACM Computing Surveys, 56:11 (1-41). DOI:10.1145/3656579. Online publication date: 28-Jun-2024
  • (2024) K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization. Proceedings of the 17th ACM International Conference on Web Search and Data Mining (161-170). DOI:10.1145/3616855.3635772. Online publication date: 4-Mar-2024
  • (2024) Review, framework, and future perspectives of Geographic Knowledge Graph (GeoKG) quality assessment. Geo-spatial Information Science (1-21). DOI:10.1080/10095020.2024.2403785. Online publication date: 20-Sep-2024
  • (2024) Multi-granularity retrieval of mineral resource geological reports based on multi-feature association. Ore Geology Reviews, 165 (105889). DOI:10.1016/j.oregeorev.2024.105889. Online publication date: Feb-2024
  • (2024) Efficacy of Knowledge Graphs to Systematize Primitive Research Methodology. Smart Trends in Computing and Communications (365-375). DOI:10.1007/978-981-97-1329-5_29. Online publication date: 15-May-2024
  • (2024) DDE KG Editor: A data service system for knowledge graph construction in geoscience. Geoscience Data Journal. DOI:10.1002/gdj3.245. Online publication date: 12-Apr-2024
  • (2023) Fine-Grained Scene Graph Generation with Overlap Region and Geometrical Center. Computer Graphics Forum, 41:7 (359-370). DOI:10.1111/cgf.14683. Online publication date: 20-Mar-2023
