DOI: 10.1145/2247596.2247640
research-article

Efficient distributed query processing for autonomous RDF databases

Published: 27 March 2012

Abstract

The inherent flexibility of the RDF data model has led to its notable adoption in many domains, especially in the life sciences. Some of these domains have an emerging need to access data integrated from various distributed sources of information. This cannot always be achieved by simply loading all data into one central RDF store. For example, in inter-institutional collaborations for drug development and clinical research, participants often want to maintain control over their local databases. Alternatively, distributed query processing techniques can be used to evaluate queries by accessing the remote data sources only on demand and in conformance with local authorization models. In this paper we present an efficient approach to distributed query processing for large autonomous RDF databases. The groundwork is laid by a comprehensive RDF-specific schema- and instance-level synopsis. We present an optimizer that uses this synopsis to generate compact execution plans by precisely determining, at compile-time, those sources that are relevant to a query. Furthermore, we present a tightly integrated query engine that further reduces the volume of intermediate results at run-time. An extensive evaluation shows that, compared to a naïve implementation, our approach improves query execution times by up to two orders of magnitude and reduces transferred data volumes by up to three orders of magnitude.
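The abstract names two mechanisms only at a high level: a synopsis that lets the optimizer rule out irrelevant sources before execution, and a query engine that shrinks intermediate results at run-time. The Python sketch that follows is a toy illustration of both ideas under loudly stated assumptions, not the paper's actual algorithm. The synopsis is modelled as exact per-source sets of predicates (schema level) and subjects (instance level), whereas a real system would keep compact summaries. The run-time reduction is a semi-join-style filter, a standard technique swapped in here for illustration. The names `Source`, `relevant_sources` and `path_join`, and the drug/gene sample data, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# A triple is (subject, predicate, object); all terms are plain strings here.
Triple = tuple[str, str, str]


@dataclass
class Source:
    """Hypothetical autonomous RDF source with a precomputed synopsis."""
    name: str
    triples: list[Triple]
    # Schema-level synopsis (assumption): predicates occurring in this source.
    predicates: set[str] = field(init=False)
    # Instance-level synopsis (assumption): subjects occurring in this source,
    # kept as an exact set here; a real system would use a compact summary.
    subjects: set[str] = field(init=False)

    def __post_init__(self) -> None:
        self.predicates = {p for _, p, _ in self.triples}
        self.subjects = {s for s, _, _ in self.triples}

    def fetch(self, predicate: str,
              subjects: set[str] | None = None) -> list[tuple[str, str]]:
        """Return (subject, object) pairs for `predicate`, restricted to
        `subjects` when a semi-join filter is shipped along."""
        return [(s, o) for s, p, o in self.triples
                if p == predicate and (subjects is None or s in subjects)]


def relevant_sources(sources: list[Source], predicate: str,
                     subjects: set[str] | None = None) -> list[Source]:
    """Keep only sources whose synopsis says they can contribute: the
    predicate must occur and, if candidate subjects are known, at least
    one of them must occur too."""
    return [src for src in sources
            if predicate in src.predicates
            and (subjects is None or not subjects.isdisjoint(src.subjects))]


def path_join(sources: list[Source], p1: str, p2: str,
              reduce: bool) -> tuple[set[tuple[str, str, str]], int]:
    """Evaluate ?a p1 ?b . ?b p2 ?c and count the (s, o) pairs shipped."""
    shipped = 0
    left: list[tuple[str, str]] = []
    for src in relevant_sources(sources, p1):
        rows = src.fetch(p1)
        shipped += len(rows)
        left.extend(rows)

    # Bindings of the join variable ?b produced by the first pattern.
    keys = {b for _, b in left}
    right: list[tuple[str, str]] = []
    for src in relevant_sources(sources, p2, keys if reduce else None):
        # With reduction, only triples whose subject is a known ?b are sent.
        rows = src.fetch(p2, keys if reduce else None)
        shipped += len(rows)
        right.extend(rows)

    result = {(a, b, c) for a, b in left for b2, c in right if b == b2}
    return result, shipped


# Toy data: three sources, only two of which hold relevant predicates.
sources = [
    Source("pharma", [("drugA", "targets", "g1"), ("drugB", "targets", "g2")]),
    Source("genomics", [("g1", "associatedWith", "d1"),
                        ("g3", "associatedWith", "d2"),
                        ("g4", "associatedWith", "d3")]),
    Source("clinical", [("p1", "enrolledIn", "t1")]),
]
naive, naive_cost = path_join(sources, "targets", "associatedWith", reduce=False)
reduced, reduced_cost = path_join(sources, "targets", "associatedWith", reduce=True)
print(sorted(reduced), naive_cost, reduced_cost)
```

With this sample data both plans return the same single answer, but the reduced plan ships 3 instead of 5 (subject, object) pairs, and the `clinical` source is never contacted because its synopsis contains neither predicate. How much a filter like this saves in practice depends on how selective the join keys are.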

Published In

ACM Other conferences
EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
March 2012
643 pages
ISBN:9781450307901
DOI:10.1145/2247596

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. RDF
  2. SPARQL
  3. distributed query processing

Qualifiers

  • Research-article

Conference

EDBT '12

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Article Metrics

  • Downloads (last 12 months): 5
  • Downloads (last 6 weeks): 0
Reflects downloads up to 22 Nov 2024

Cited By

  • (2024) A systematic overview of data federation systems. Semantic Web 15:1, pp. 107-165. DOI: 10.3233/SW-223201. Online publication date: 12-Jan-2024
  • (2024) Answering Property Path Queries over Federated RDF Systems. Web and Big Data, pp. 16-31. DOI: 10.1007/978-981-97-2387-4_2. Online publication date: 28-Apr-2024
  • (2023) Optimizing Keyword Search Over Federated RDF Systems. IEEE Transactions on Big Data 9:3, pp. 918-935. DOI: 10.1109/TBDATA.2022.3224749. Online publication date: 1-Jun-2023
  • (2023) A Cost-Driven Top-K Queries Optimization Approach on Federated RDF Systems. IEEE Transactions on Big Data 9:2, pp. 665-676. DOI: 10.1109/TBDATA.2022.3156090. Online publication date: 1-Apr-2023
  • (2021) Subgraph matching over graph federation. Proceedings of the VLDB Endowment 15:3, pp. 437-450. DOI: 10.14778/3494124.3494129. Online publication date: 1-Nov-2021
  • (2019) Optimizing Multi-Query Evaluation in Federated RDF Systems. IEEE Transactions on Knowledge and Data Engineering, pp. 1-1. DOI: 10.1109/TKDE.2019.2947050. Online publication date: 2019
  • (2019) Partitioning Large-Scale Property Graph for Efficient Distributed Query Processing. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1643-1650. DOI: 10.1109/HPCC/SmartCity/DSS.2019.00225. Online publication date: Aug-2019
  • (2019) Federated RDF Query Processing. Encyclopedia of Big Data Technologies, pp. 754-761. DOI: 10.1007/978-3-319-77525-8_228. Online publication date: 20-Feb-2019
  • (2018) Multi-query Optimization in Federated RDF Systems. Database Systems for Advanced Applications, pp. 745-765. DOI: 10.1007/978-3-319-91452-7_48. Online publication date: 13-May-2018
  • (2018) Federated RDF Query Processing. Encyclopedia of Big Data Technologies, pp. 1-8. DOI: 10.1007/978-3-319-63962-8_228-1. Online publication date: 21-Feb-2018
