Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2247596.2247635acmotherconferencesArticle/Chapter ViewAbstractPublication PagesedbtConference Proceedingsconference-collections
research-article

Heuristics-based query optimisation for SPARQL

Published: 27 March 2012 Publication History

Abstract

Query optimization in RDF Stores is a challenging problem as SPARQL queries typically contain many more joins than equivalent relational plans, and hence lead to a large join order search space. In such cases, cost-based query optimization often is not possible. One practical reason for this is that statistics typically are missing in web scale setting such as the Linked Open Datasets (LOD). The more profound reason is that due to the absence of schematic structure in RDF, join-hit ratio estimation requires complicated forms of correlated join statistics; and currently there are no methods to identify the relevant correlations beforehand. For this reason, the use of good heuristics is essential in SPARQL query optimization, even in the case that are partially used with cost-based statistics (i.e., hybrid query optimization). In this paper we describe a set of useful heuristics for SPARQL query optimizers. We present these in the context of a new Heuristic SPARQL Planner (HSP) that is capable of exploiting the syntactic and the structural variations of the triple patterns in a SPARQL query in order to choose an execution plan without the need of any cost model. For this, we define the variable graph and we show a reduction of the SPARQL query optimization problem to the maximum weight independent set problem. We implemented our planner on top of the MonetDB open source column-store and evaluated its effectiveness against the state-of-the-art RDF-3X engine as well as comparing the plan quality with a relational (SQL) equivalent of the benchmarks.

References

[1]
Yet Another Great Ontology. http://www.mpi-inf.mpg.de/yago-naga/yago.
[2]
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. Scalable semantic web data management using vertical partitioning. In VLDB (2007), pp. 411--422.
[3]
Abadi, D. J., Marcus, A., Madden, S. R., and Hollenbach, K. SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management. VLDB Journal 18, 2 (April 2009).
[4]
Atre, M., Chaoji, V., Zaki, M. J., and Hendler, J. A. Matrix "Bit" loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW (2010).
[5]
Baolin, L., and Bo, H. HPRD: A High Performance RDF Database. In Network and Parallel Computing (2007), pp. 364--374.
[6]
Broekstra, J., Kampman, A., and van Harmelen, F. Sesame: An Architecture for Storing and Querying RDF Data and Schema Information. In Spinning the Semantic Web (2003).
[7]
Chong, E. I., Das, S., Eadon, G., and Srinivasan, J. An efficient SQL-based RDF querying scheme. In VLDB (2005).
[8]
Duan, S., Kementsietsidis, A., Srinivas, K., and Udrea, O. Apples and Oranges: A Comparison of RDF benchmarks and Real RDF Datasets. In SIGMOD (2011).
[9]
Erling, O., and Mikhailov, I. RDF Support in the Virtuoso DBMS. In In Proc. of the CSSW (2007).
[10]
Erling, O., and Mikhailov, I. RDF Support in Virtuoso DBMS. In Networked Knowledge - Networked Media, SCI (2009).
[11]
Garey, M. R., and Johnson, D. S. Computers and Intractability: A Guide to the Theory qf NP-Completeness. Freeman, 1979.
[12]
Gibbons, A. Algorithmic Graph Theory. Cambridge University Press, 1985.
[13]
Harth, A., Umbrich, J., Hogan, A., and Decker, S. YARS2: A Federated Repository for Querying Graph Structured Data from the Web. In ISWC (2007).
[14]
Hartig, O., and Heese, R. The SPARQL Query Graph Model for Query Optimization. In ESWC (2007).
[15]
Husain, M. F., McGlothin, J., Masud, M. M., Khan, L. R., and Thuraisingham, B. Heuristics-Based Query Processing for Large RDF Graphs Using Cloud Computing. TKDE (2011).
[16]
Kobilarov, G., Scott, T., Raimond, Y., Oliver, S., Sizemore, C., Smethurst, M., Bizer, C., and Lee, R. Media Meets Semantic Web -- How the BBC Uses DBpedia and Linked Data to Make Connections. In ESWC (2009).
[17]
Linked data. http://linkeddata.org/.
[18]
Lu, J., Cao, F., Ma, L., Yu, Y., and Pan, Y. An Effective SPARQL Support over Relational Databases. In SWDB-ODBIS (2007).
[19]
Manola, F., Miller, E., and McBride, B. RDF Primer. www.w3.org/TR/rdf-primer, 2004.
[20]
MonetDB. http://www.monetdb.org.
[21]
Neumann, T., and Moerkotte, G. Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins. In ICDE (2011).
[22]
Neumann, T., and Weikum, G. RDF-3X: a RISC-style engine for RDF. PVLDB 1, 1 (2008).
[23]
Neumann, T., and Weikum, G. Scalable join processing on very large rdf graphs. In SIGMOD (June 2009), pp. 627--640.
[24]
Neumann, T., and Weikum, G. The RDF-3X engine for scalable management of RDF data. VLDB Journal 19, 1 (2010).
[25]
Oracle database semantic technologies. http://www.oracle.com/technetwork/database/options/semantic-tech/index.html.
[26]
Ostergard, P. R. J. A new algorithm for the maximum-weight clique problem. Nordic Journal of Computing 8 (2001), 424--436.
[27]
Prud'hommeaux, E., and Seaborne, A. SPARQL Query Language for RDF. www.w3.org/TR/rdf-sparql-query, 2008.
[28]
http://librdf.org/raptor/.
[29]
Schmidt, M., Hornung, T., Lausen, G., and Pinkel, C. SP2bench: A SPARQL performance benchmark. In ICDE (2009).
[30]
Schmidt, M., Meier, M., and Lausen, G. Foundations of SPARQL Query Optimization. In ICDT (2010).
[31]
Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N., and Manegold, S. Column-store support for RDF data management: not all swans are white. PVLDB 1, 2 (2008).
[32]
Stocker, M., Seaborne, A., Bernstein, A., Kiefer, C., and Reynolds, D. SPARQL basic graph pattern optimization using selectivity estimation. In WWW (2008).
[33]
Stonebraker, M., Abadi, D., Batkin, A., Chen, X., Cherniak, M., Ferreira, M., Lau, E., Lin, A., Madden, S., E. O'Neil, P., Rasin, A., Tran, N., and Zdonik, S. C-Store: A Column Oriented DBMS. In VLDB (2005).
[34]
Theoharis, Y., Christophides, V., and Karvounarakis, G. Benchmarking Database Representations of RDF/S Stores. In ISWC (2005).
[35]
Tsialiamanis, P. Heuristic Optimization of SPARQL queries over Column-Store DBMS. Master's thesis, University of Crete, September 2011.
[36]
Udrea, O., Pugliese, A., and Subrahmanian, V. S. Grin: A Graph Based RDF Index. In AAAI (2007).
[37]
Ullman, J. D. Principles of Database and Knowledge-Base Systems. Computer Science Press, 1988.
[38]
Vidal, M. -E., Ruckhaus, E., Lampo, T., Martinez, A., Sierra, J., and Polleres, A. Efficiently Joining Group Patterns in SPARQL Queries. In ESWC (2010).
[39]
Weiss, C., Karras, P., and Bernstein, A. Hexastore: sextuple indexing for semantic web data management. PVLDB 1, 1 (2008).

Cited By

View all
  • (2022)An Efficient Source Selection Approach for Retrieving Electronic Health Records From Federated Clinical RepositoriesInternational Journal of Information Technologies and Systems Approach10.4018/IJITSA.30702515:2(1-18)Online publication date: 18-Aug-2022
  • (2022)HERMES: data placement and schema optimization for enterprise knowledge basesThe VLDB Journal10.1007/s00778-022-00756-y32:3(549-574)Online publication date: 26-Jul-2022
  • (2022)RDF_QDAG in Action: Efficient RDF Data Querying at ScaleWeb Information Systems Engineering – WISE 202210.1007/978-3-031-20891-1_45(633-640)Online publication date: 7-Nov-2022
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Other conferences
EDBT '12: Proceedings of the 15th International Conference on Extending Database Technology
March 2012
643 pages
ISBN:9781450307901
DOI:10.1145/2247596
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 March 2012

Permissions

Request permissions for this article.

Check for updates

Qualifiers

  • Research-article

Conference

EDBT '12

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)27
  • Downloads (Last 6 weeks)2
Reflects downloads up to 25 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2022)An Efficient Source Selection Approach for Retrieving Electronic Health Records From Federated Clinical RepositoriesInternational Journal of Information Technologies and Systems Approach10.4018/IJITSA.30702515:2(1-18)Online publication date: 18-Aug-2022
  • (2022)HERMES: data placement and schema optimization for enterprise knowledge basesThe VLDB Journal10.1007/s00778-022-00756-y32:3(549-574)Online publication date: 26-Jul-2022
  • (2022)RDF_QDAG in Action: Efficient RDF Data Querying at ScaleWeb Information Systems Engineering – WISE 202210.1007/978-3-031-20891-1_45(633-640)Online publication date: 7-Nov-2022
  • (2021)An Approach to an efficient Evaluation of SPARQL Queries in DatabasesThe 23rd International Conference on Information Integration and Web Intelligence10.1145/3487664.3487765(375-384)Online publication date: 29-Nov-2021
  • (2021)SymphonyDB: A Polyglot Model for Knowledge Graph Query Processing2021 Third International Conference on Transdisciplinary AI (TransAI)10.1109/TransAI51903.2021.00013(25-32)Online publication date: Sep-2021
  • (2021)Traveling Light — A Low-Overhead Approach for SPARQL Query Optimization2021 IEEE 15th International Conference on Semantic Computing (ICSC)10.1109/ICSC50631.2021.00014(56-61)Online publication date: Jan-2021
  • (2021)Property Graph Schema Optimization for Domain-Specific Knowledge Graphs2021 IEEE 37th International Conference on Data Engineering (ICDE)10.1109/ICDE51399.2021.00085(924-935)Online publication date: Apr-2021
  • (2021)A Neural Networks Approach to SPARQL Query Performance Prediction2021 XLVII Latin American Computing Conference (CLEI)10.1109/CLEI53233.2021.9639899(1-9)Online publication date: 25-Oct-2021
  • (2021)Discovery and diagnosis of wrong SPARQL queries with ontology and constraint reasoningExpert Systems with Applications10.1016/j.eswa.2020.113772165(113772)Online publication date: Mar-2021
  • (2021)An Effective Discrete Artificial Bee Colony Based SPARQL Query Path Optimization by Reordering TriplesJournal of Computer Science and Technology10.1007/s11390-020-9901-y36:2(445-462)Online publication date: 31-Mar-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media