Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Hybrid query execution engine for large attributed graphs

Published: 01 May 2014 Publication History

Abstract

Graphs are widely used for modeling complicated data such as social networks, bibliographical networks and knowledge bases. The growing sizes of graph databases motivate the crucial need for developing powerful and scalable graph-based query engines. We propose a SPARQL-like language, G-SPARQL, for querying attributed graphs. The language enables the expression of different types of graph queries that are of large interest in the databases that are modeled as large graph such as pattern matching, reachability and shortest path queries. Each query can combine both structural predicates and value-based predicates (on the attributes of the graph nodes/edges). We describe an algebraic compilation mechanism for our proposed query language which is extended from the relational algebra and based on the basic construct of building SPARQL queries, the Triple Pattern. We describe an efficient hybrid Memory/Disk representation of large attributed graphs where only the topology of the graph is maintained in memory while the data of the graph are stored in a relational database. The execution engine of our proposed query language splits parts of the query plan to be pushed inside the relational database (using SQL) while the execution of other parts of the query plan is processed using memory-based algorithms, as necessary. Experimental results on real and synthetic datasets demonstrate the efficiency and the scalability of our approach and show that our approach outperforms native graph databases by several factors.

References

[1]
D. Abadi, A. Marcus, S. Madden, K. Hollenbach, Scalable semantic web data management using vertical partitioning, in: VLDB, 2007.
[2]
Abiteboul, S., Quass, D., McHugh, J., Widom, J. and Wiener, J., The Lorel query language for semistructured data. Int. J. Digital Libr. v1 i1.
[3]
Abouzeid, A., Bajda-Pawlikowski, K., Abadi, D., Rasin, A. and Silberschatz, A., HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. . PVLDB. v2 i1.
[4]
Alkhateeb, F., Baget, J. and Euzenat, J., Extending SPARQL with regular expression patterns. J. Web Sem. v7 i2.
[5]
Angles, R. and Gutiérrez, C., Survey of graph database models. ACM Comput. Surv. v40 i1.
[6]
K. Anyanwu, A. Maduko, A. Sheth, SPARQ2L: towards support for subgraph extraction queries in RDF databases, in: WWW, 2007.
[7]
M. Bröcheler, A. Pugliese, V.S. Subrahmanian, DOGMA: a disk-oriented graph matching algorithm for RDF databases, in: ISWC, 2009.
[8]
D. Cai, Z. Shao, X. He, X. Yan, J. Han, Community mining from multi-relational networks, in: PKDD, 2005.
[9]
D. Chakrabarti, Y. Zhan, C. Faloutsos, R-MAT: a recursive model for graph mining, in: SDM, 2004.
[10]
Chebotko, A., Lu, S. and Fotouhi, F., Semantics preserving SPARQL-to-SQL translation. DKE. v68 i10.
[11]
C. Chen, X. Yan, P. Yu, J. Han, D. Zhang, X. Gu, Towards graph containment search and indexing, in: VLDB, 2007.
[12]
J. Cheng, J. Yu, On-line exact shortest distance query processing, in: EDBT, 2009.
[13]
Cohen, E., Halperin, E., Kaplan, H. and Zwick, U., Reachability and distance queries via 2-hop labels. SIAM J. Comput. v32 i5.
[14]
G. Copeland, S. Khoshafian, A decomposition storage model, in: SIGMOD, 1985.
[15]
Cormen, T., Rivest, R., Leiserson, C. and Stein, C., Introduction to Algorithms. 2009. third edition. The MIT Press.
[16]
I. Cruz, A.O. Mendelzon, P.T. Wood, A graphical query language supporting recursion, in: SIGMOD Conference, 1987.
[17]
R. Cyganiak, A relational algebra for SPARQL, Technical Report HPL-2005-170, HP Laboratories Bristol, 2005.
[18]
B. Elliott, E. Cheng, C. Thomas-Ogbuji, Z. Meral Özsoyoglu, A complete translation from SPARQL into efficient SQL, in: IDEAS, 2009.
[19]
Gassner, P., Lohman, G., Schiefer, K. and Wang, Y., Query optimization in the IBM DB2 family. IEEE Data Eng. Bull. v16 i4.
[20]
Gou, G. and Chirkova, R., Efficiently querying large XML data repositories: a survey. . TKDE. v19 i10.
[21]
G. Graefe, Sorting and indexing with partitioned B-trees, in: CIDR, 2003.
[22]
T. Grust, M. Mayr, J. Rittinger, S. Sakr, J. Teubner, A SQL: 1999 code generator for the pathfinder XQuery compiler, in: SIGMOD, 2007.
[23]
T. Grust, M. Mayr, J. Rittinger, T. Schreiber, FERRY: database-supported program execution, in: SIGMOD, 2009.
[24]
T. Grust, S. Sakr, J. Teubner, XQuery on SQL hosts, in: VLDB, 2004.
[25]
R.H. Güting, GraphDB: modeling and querying graphs in databases, in: VLDB, 1994.
[26]
M. Gyssens, J. Paredaens, D.V. Gucht, A graph-oriented object database model, in: PODS, 1990.
[27]
H. He, A. Singh, Graphs-at-a-time: query language and access methods for graph databases, in: SIGMOD, 2008.
[28]
J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Tropsha, Mining protein family specific residue packing patterns from protein structure graphs, in: RECOM, 2004.
[29]
H. Jagadish, L.V.S. Lakshmanan, D. Srivastava, K. Thompson, TAX: a tree algebra for XML, in: DBPL, 2001.
[30]
R. Jin, Y. Xiang, N. Ruan, D. Fuhry, 3-HOP: a high-compression indexing scheme for reachability query, in: SIGMOD, 2009.
[31]
S. Klinger, J. Austin, Chemical similarity searching using a neural graph matcher, in: ESANN, 2005.
[32]
Juchang Lee, Yong Sik Kwon, Franz Färber, Michael Muehle, Chulwon Lee, Christian Bensberg, Joo-Yeon Lee, Arthur H. Lee, Wolfgang Lehner, SAP HANA distributed in-memory database system: transaction, session, and metadata management, in: ICDE, 2013, pp. 1165-1173.
[33]
Leser, U., A query language for biological networks. Bioinformatics. v21 i2.
[34]
G. Malewicz, M. Austern, A. Bik, J. Dehnert, I. Horn, N. Leiser, G. Czajkowski, Pregel: a system for large-scale graph processing, in: SIGMOD, 2010.
[35]
F. Manola, E. Miller, RDF primer, February 2004. {http://www.w3.org/TR/2004/REC-rdf-primer-20040210/}.
[36]
T. Neumann, G. Moerkotte, Characteristic sets: accurate cardinality estimation for RDF queries with multiple joins, in: ICDE, 2011.
[37]
Neumann, T. and Weikum, G., RDF-3X: a RISC-style engine for RDF. . PVLDB. v1 i1. 34
[38]
P. Peng, L. Zou, L. Chen, X. Lin, D. Zhao, Subgraph search over massive disk resident graphs, in: SSDBM, 2011.
[39]
Pérez, J., Arenas, M. and Gutierrez, C., Semantics and complexity of SPARQL. TODS. v34 i3.
[40]
H. Pirahesh, T. Cliff Leung, W. Hasan, A rule engine for query transformation in starburst and IBM DB2 C/S DBMS, in: ICDE, 1997.
[41]
S. Powers, Practical RDF, O'Reilly Media, 2003.
[42]
E. Prud'hommeaux, A. Seaborne, SPARQL query language for RDF, W3C recommendation, {http://www.w3.org/TR/rdf-sparql-query/}, January 2008.
[43]
Michael Rudolf, Marcus Paradies, Christof Bornhövd, Wolfgang Lehner, The Graph Story of the SAP HANA Database, BTW, 2013, pp. 403-420, {http://www.btw-2013.de/proceedings/The%20Graph%20Story%20of%20the%20SAP%20HANA%20Database.pdf}
[44]
S. Sakr, GraphREL: a decomposition-based and selectivity-aware relational framework for processing sub-graph queries, in: DASFAA, 2009.
[45]
Sakr, S. and Al-Naymat, G., Relational processing of RDF queries: a survey. . SIGMOD Rec. v38 i4.
[46]
S. Sakr, A. Awad, A framework for querying graph-based business process models, in: WWW, 2010.
[47]
S. Sakr, S. Elnikety, Y. He, G-SPARQL: a hybrid engine for querying large attributed graphs, in: CIKM, 2012.
[48]
M. Sarwat, S. Elnikety, Y. He, G. Kliot, Horton: online query execution engine for large distributed graphs, in: ICDE, 2011.
[49]
L. Sheng, Z.M. Özsoyoglu, G. Özsoyoglu, A graph query language and its query processing, in: ICDE, 1999.
[50]
Sidirourgos, L., Goncalves, R., Kersten, M., Nes, N. and Manegold, S., Column-store support for RDF data management: not all swans are white. . PVLDB. v1 i2.
[51]
Sikka, Vishal, Färber, Franz, Goel, Anil K. and Lehner, Wolfgang, SAP HANA: the evolution from a modern main-memory data platform to an enterprise application platform. . PVLDB. v6 i11. 1184-1185.
[52]
M. Stonebraker, D. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. O'Neil, P. O'Neil, A. Rasin, N. Tran, S. Zdonik, C-Store: a column-oriented DBMS, in: VLDB, 2005.
[53]
F. Suchanek, G. Kasneci, G. Weikum, Yago: a core of semantic knowledge, in: WWW, 2007.
[54]
H. Tong, C. Faloutsos, B. Gallagher, T. Eliassi-Rad, Fast best-effort pattern matching in large attributed graphs, in: KDD, 2007.
[55]
S. Triíl, U. Leser, Fast and practical indexing and querying of very large graphs, in: SIGMOD, 2007.
[56]
C. Wang, J. Han, Y. Jia, J. Tang, D. Zhang, Y. Yu, J. Guo, Mining advisor-advisee relationships from research publication networks, in: KDD, 2010.
[57]
F. Wei, TEDI: efficient shortest path query answering on graphs, in: SIGMOD, 2010.
[58]
X. Yan, P. Yu, J. Han, Graph indexing: a frequent structure-based approach, in: SIGMOD, 2004.
[59]
S. Zhang, M. Hu, J. Yang, TreePi: a novel graph indexing method, in: ICDE, 2007.
[60]
S. Zhang, J. Li, H. Gao, Z. Zou, A novel approach for efficient supergraph query processing on graph databases, in: EDBT, 2009.
[61]
S. Zhang, S. Li, J. Yang, GADDI: distance index based subgraph matching in biological networks, in: EDBT, 2009.
[62]
Zhao, P. and Han, J., On graph query optimization in large networks. PVLDB. v3 i1.
[63]
Zou, L., Chen, L., Özsu, M. and Zhao, D., Answering pattern match queries in large graph databases via graph embedding. VLDB J. v32 i5.
[64]
Zou, L., Mo, J., Chen, L., Özsu, M. and Zhao, D., gStore: answering SPARQL queries via subgraph matching. . PVLDB. v4 i8.

Cited By

View all
  • (2019)Towards compiling graph queries in relational enginesProceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages10.1145/3315507.3330200(30-41)Online publication date: 23-Jun-2019
  • (2018)Time Complexity and Parallel Speedup of Relational Queries to Solve Graph ProblemsDatabase and Expert Systems Applications10.1007/978-3-319-98812-2_30(339-349)Online publication date: 3-Sep-2018

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Information Systems
Information Systems  Volume 41, Issue
May, 2014
74 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Publication History

Published: 01 May 2014

Author Tags

  1. Access methods
  2. Data models
  3. Graph queries
  4. Graphs
  5. Query
  6. SPARQL

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 20 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2019)Towards compiling graph queries in relational enginesProceedings of the 17th ACM SIGPLAN International Symposium on Database Programming Languages10.1145/3315507.3330200(30-41)Online publication date: 23-Jun-2019
  • (2018)Time Complexity and Parallel Speedup of Relational Queries to Solve Graph ProblemsDatabase and Expert Systems Applications10.1007/978-3-319-98812-2_30(339-349)Online publication date: 3-Sep-2018

View Options

View options

Login options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media