research-article

Online maintenance of evolving knowledge graphs with RDFS-based saturation and why-provenance support

Published: 01 October 2023

Abstract

Enterprise RDF knowledge graphs (KGs) are often built using data extraction pipelines fed by several heterogeneous sources (relational databases, CSV files, or even unstructured textual data). As a direct consequence, these KGs undergo a number of changes in the early stages of their life cycle. These changes are initiated by a human developer and therefore need to be applied interactively and efficiently. Driven by these needs, we present a solution for the incremental maintenance of KGs given user-prescribed changes. A key feature of the proposed solution is its support for provenance collection, which can assist the developer in analyzing and debugging the KG. Specifically, we strive to compute and maintain the provenance of asserted and inferred facts in the knowledge graph incrementally (and thus efficiently). The evaluation we conducted shows the effectiveness of our solution and highlights the parameters that impact performance.
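
To make the idea of maintaining provenance for asserted and inferred facts concrete, the following is a minimal Python sketch and not the algorithm of the paper. It assumes only two standard RDFS entailment rules (rdfs9 and rdfs11), represents why-provenance as a set of witnesses (each witness being a set of asserted triples), and handles a deletion by discarding every witness that contains the deleted triple. The names `derive`, `saturate`, and `delete` are hypothetical and chosen for illustration only.

```python
from itertools import product

TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

def derive(facts):
    """Yield (conclusion, (premise1, premise2)) for rules rdfs9 and rdfs11."""
    facts = list(facts)
    for t1, t2 in product(facts, repeat=2):
        s1, p1, o1 = t1
        s2, p2, o2 = t2
        if p2 == SUBCLASS and o1 == s2:
            if p1 == SUBCLASS:      # rdfs11: subClassOf is transitive
                yield (s1, SUBCLASS, o2), (t1, t2)
            elif p1 == TYPE:        # rdfs9: instances inherit superclasses
                yield (s1, TYPE, o2), (t1, t2)

def saturate(asserted):
    """Map every asserted or inferred triple to its set of witnesses."""
    prov = {t: {frozenset([t])} for t in asserted}
    changed = True
    while changed:                  # naive fixpoint, fine for a toy graph
        changed = False
        for conclusion, (a, b) in list(derive(prov)):
            # A witness of the conclusion combines one witness per premise.
            new = {wa | wb for wa in prov[a] for wb in prov[b]}
            known = prov.setdefault(conclusion, set())
            if not new <= known:
                known |= new
                changed = True
    return prov

def delete(prov, triple):
    """Drop witnesses that use `triple`; facts left with no witness vanish."""
    result = {}
    for fact, witnesses in prov.items():
        kept = {w for w in witnesses if triple not in w}
        if kept:
            result[fact] = kept
    return result

# Hypothetical toy graph, used only to exercise the sketch.
asserted = {
    ("Alice", TYPE, "Student"),
    ("Student", SUBCLASS, "Person"),
    ("Person", SUBCLASS, "Agent"),
}
prov = saturate(asserted)
after = delete(prov, ("Student", SUBCLASS, "Person"))
```

In this sketch, a deletion does not require re-saturating the graph: an inferred fact survives exactly when at least one of its witnesses avoids the deleted triple. The fixpoint loop here is deliberately naive and recomputes all rule applications on every pass, so it says nothing about the performance of an incremental approach.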

Published In

Web Semantics: Science, Services and Agents on the World Wide Web, Volume 78, Issue C
Oct 2023
59 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Knowledge graph
  2. Maintenance
  3. RDF

Qualifiers

  • Research-article
