Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Hexastore: sextuple indexing for semantic web data management

Published: 01 August 2008 Publication History

Abstract

Despite the intense interest towards realizing the Semantic Web vision, most existing RDF data management schemes are constrained in terms of efficiency and scalability. Still, the growing popularity of the RDF format arguably calls for an effort to offset these drawbacks. Viewed from a relational-database perspective, these constraints are derived from the very nature of the RDF data model, which is based on a triple format. Recent research has attempted to address these constraints using a vertical-partitioning approach, in which separate two-column tables are constructed for each property. However, as we show, this approach suffers from similar scalability drawbacks on queries that are not bound by RDF property value. In this paper, we propose an RDF storage scheme that uses the triple nature of RDF as an asset. This scheme enhances the vertical partitioning idea and takes it to its logical conclusion. RDF data is indexed in six possible ways, one for each possible ordering of the three RDF elements. Each instance of an RDF element is associated with two vectors; each such vector gathers elements of one of the other types, along with lists of the third-type resources attached to each vector element. Hence, a sextuple-indexing scheme emerges. This format allows for quick and scalable general-purpose query processing; it confers significant advantages (up to five orders of magnitude) compared to previous approaches for RDF data management, at the price of a worst-case five-fold increase in index space. We experimentally document the advantages of our approach on real-world and synthetic data sets with practical queries.

References

[1]
Longwell browser, http://simile.mit.edu/longwell.
[2]
MIT Libraries Barton Catalog Data. http://simile.mit.edu/rdf-test-data/barton/.
[3]
The SIMILE Project, http://simile.mit.edu/.
[4]
D. J. Abadi, S. R. Madden, and M. Ferreira. Integrating compression and execution in column-oriented database systems. In SIGMOD, 2006.
[5]
D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Scalable Semantic Web Data Management using vertical partitioning. In VLDB, 2007.
[6]
D. J. Abadi, A. Marcus, S. R. Madden, and K. Hollenbach. Using the Barton Libraries dataset as an RDF benchmark. Technical Report MIT-CSAIL-TR-2007-036, MIT, 2007.
[7]
D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. Madden. Materialization strategies in a column-oriented DBMS. In ICDE, 2007.
[8]
S. Alexaki, V. Christophides, G. Karvounarakis, and D. Plexousakis. On storing voluminous RDF descriptions: The case of web portal catalogs. In WebDB, 2001.
[9]
S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, and K. Tolle. The ICS-FORTH RDFSuite: Managing voluminous RDF description bases. In SemWeb, 2001.
[10]
R. Angles and C. Gutiérrez. Querying RDF data from a graph database perspective. In ESWC, 2005.
[11]
D. Beckett. The design and implementation of the Redland RDF application framework. In WWW, 2001.
[12]
J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. F. Naughton. Extending RDBMSs to support sparse datasets using an interpreted attribute storage format. In ICDE, 2006.
[13]
T. Berners-Lee, J. Hendler, and O. Lassila. The Semantic Web. Scientific American, 284(5):34--43, 2001.
[14]
P. A. Boncz and M. L. Kersten. MIL primitives for querying a fragmented world. VLDB Jounral, 8(2):101--119, 1999.
[15]
P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-pipelining query execution. In CIDR, 2005.
[16]
V. Bönström, A. Hinze, and H. Schweppe. Storing RDF as a graph. In LA-WEB, 2003.
[17]
J. Broekstra, A. Kampman, and F. van Harmelen. Sesame: A generic architecture for storing and querying RDF and RDF Schema. In ISWC, 2002.
[18]
K. S. Candan, H. Liu, and R. Suvarna. Resource Description Framework: Metadata and its applications. SIGKDD Explorations Newsletter, 3(1):6--19, 2001.
[19]
J. J. Carroll, I. Dickinson, C. Dollin, D. Reynolds, A. Seaborne, and K. Wilkinson. Jena: implementing the Semantic Web recommendations. In WWW (Alternate Track Papers & Posters), 2004.
[20]
E. I. Chong, S. Das, G. Eadon, and J. Srinivasan. An efficient SQL-based RDF querying scheme. In VLDB, 2005.
[21]
S. Dar and R. Ramakrishnan. A performance study of transitive closure algorithms. In SIGMOD, 1994.
[22]
R. V. Guha. rdfDB: An RDF database. http://www.guha.com/rdfdb/.
[23]
Y. Guo, J. Heflin, and Z. Pan. Benchmarking DAML+OIL repositories. In ISWC, 2003.
[24]
Y. Guo, Z. Pan, and J. Heflin. An evaluation of knowledge base systems for large OWL datasets. In ISWC, 2004.
[25]
S. Harris and N. Gibbins. 3store: Efficient bulk RDF storage. In PSSS, 2003.
[26]
S. Harris and N. Shadbolt. SPARQL query processing with conventional relational database systems. In SSWS, 2005.
[27]
A. Harth and S. Decker. Optimized index structures for querying rdf from the web. In LA-WEB, 2005.
[28]
J. Hayes and C. Gutiérrez. Bipartite graphs as intermediate model for RDF. In ISWC, 2004.
[29]
S. Idreos, M. L. Kersten, and S. Manegold. Database cracking. In CIDR, 2007.
[30]
S. Idreos, M. L. Kersten, and S. Manegold. Updating a cracked database. In SIGMOD, 2007.
[31]
Y. Ioannidis, R. Ramakrishnan, and L. Winger. Transitive closure algorithms based on graph traversal. ACM TODS, 18(3):512--576, 1993.
[32]
M. L. Kersten and S. Manegold. Cracking the database store. In CIDR, 2005.
[33]
C. Kiefer, A. Bernstein, and M. Stocker. The fundamentals of iSPARQL - a virtual triple approach for similarity-based Semantic Web tasks. In ISWC, 2007.
[34]
Y. Kim, B. Kim, J. Lee, and H. Lim. The path index for query processing on RDF and RDF Schema. In ICACT, 2005.
[35]
E. Liarou, S. Idreos, and M. Koubarakis. Continuous RDF query processing over DHTs. In ISWC, 2007.
[36]
L. Ma, Z. Su, Y. Pan, L. Zhang, and T. Liu. RStar: an RDF storage and query system for enterprise resource management. In CIKM, 2004.
[37]
F. Manola and E. Miller, editors. RDF Primer. W3C Recommendation. WWW Consortium, 2004.
[38]
A. Matono, T. Amagasa, M. Yoshikawa, and S. Uemura. A path-based relational RDF database. In ADC, 2005.
[39]
Z. Pan and J. Heflin. DLDB: Extending relational databases to support Semantic Web queries. In PSSS, 2003.
[40]
M. Petrovic, H. Liu, and H.-A. Jacobsen. G-ToPSS: Fast filtering of graph-based metadata. In WWW, 2005.
[41]
M. Stocker, A. Seaborne, A. Bernstein, C. Kiefer, and D. Reynolds. SPARQL basic graph pattern optimization using selectivity estimation. In WWW, 2008.
[42]
M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. R. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-store: a column-oriented DBMS. In VLDB, 2005.
[43]
J. D. Ullman and M. Yannakakis. The input/output complexity of transitive closure. In SIGMOD, 1990.
[44]
R. Volz, D. Oberle, S. Staab, and B. Motik. KAON SERVER - A Semantic Web Management System. In WWW (Alternate Paper Tracks), 2003.
[45]
K. Wilkinson. Jena property table implementation. In SSWS, 2006.
[46]
K. Wilkinson, C. Sayers, H. A. Kuno, and D. Reynolds. Efficient RDF storage and retrieval in Jena2. In SWDB, 2003.
[47]
D. Wood, P. Gearon, and T. Adams. Kowari: A platform for Semantic Web storage and analysis. In XTech, 2005.

Cited By

View all
  • (2024)Sorting on Byte-Addressable Storage: The Resurgence of Tree StructureProceedings of the VLDB Endowment10.14778/3648160.364818517:6(1487-1500)Online publication date: 1-Feb-2024
  • (2024)The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra SpaceACM Transactions on Database Systems10.1145/364482449:2(1-45)Online publication date: 23-Mar-2024
  • (2024)MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation LearningIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.339639019(5257-5271)Online publication date: 2-May-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment  Volume 1, Issue 1
August 2008
1216 pages

Publisher

VLDB Endowment

Publication History

Published: 01 August 2008
Published in PVLDB Volume 1, Issue 1

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)63
  • Downloads (Last 6 weeks)7
Reflects downloads up to 16 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Sorting on Byte-Addressable Storage: The Resurgence of Tree StructureProceedings of the VLDB Endowment10.14778/3648160.364818517:6(1487-1500)Online publication date: 1-Feb-2024
  • (2024)The Ring: Worst-case Optimal Joins in Graph Databases using (Almost) No Extra SpaceACM Transactions on Database Systems10.1145/364482449:2(1-45)Online publication date: 23-Mar-2024
  • (2024)MEGR-APT: A Memory-Efficient APT Hunting System Based on Attack Representation LearningIEEE Transactions on Information Forensics and Security10.1109/TIFS.2024.339639019(5257-5271)Online publication date: 2-May-2024
  • (2024)Analyzing workload trends for boosting triple stores performanceInformation Systems10.1016/j.is.2024.102420125:COnline publication date: 1-Nov-2024
  • (2024)Compressed and queryable self-indexes for RDF archivesKnowledge and Information Systems10.1007/s10115-023-01967-766:1(381-417)Online publication date: 1-Jan-2024
  • (2023)Design and Development of a Knowledge Service Platform in the Field of Computer ScienceProceedings of the 2023 8th International Conference on Distance Education and Learning10.1145/3606094.3606108(57-65)Online publication date: 9-Jun-2023
  • (2023)Distributed SPARQL queries in collaboration with the routing protocolProceedings of the 27th International Database Engineered Applications Symposium10.1145/3589462.3589497(99-106)Online publication date: 5-May-2023
  • (2023)A Universal Question-Answering Platform for Knowledge GraphsProceedings of the ACM on Management of Data10.1145/35889111:1(1-25)Online publication date: 30-May-2023
  • (2023)Efficient query evaluation techniques over large amount of distributed linked dataInformation Systems10.1016/j.is.2023.102194115:COnline publication date: 1-May-2023
  • (2023)HyperBitData & Knowledge Engineering10.1016/j.datak.2022.102128144:COnline publication date: 1-Mar-2023
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media