Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/3302424.3303962acmconferencesArticle/Chapter ViewAbstractPublication PageseurosysConference Proceedingsconference-collections
research-article

Matrix Algebra Framework for Portable, Scalable and Efficient Query Engines for RDF Graphs

Published: 25 March 2019 Publication History

Abstract

Existing query engines for RDF graphs follow one of two design paradigms: relational or graph-based. We explore sparse matrix algebra as a third paradigm and propose MAGiQ: a framework for implementing SPARQL query engines that are portable on various hardware architectures, scalable over thousands of compute nodes, and efficient for very large RDF datasets. MAGiQ represents the RDF graph as a sparse matrix and defines a domain-specific language of algebraic operations. SPARQL queries are translated into matrix algebra programs that are oblivious to the underlying computing infrastructure. Existing matrix algebra libraries, optimized for each particular architecture, are called to execute the program and handle the performance issues. We present three case studies of matrix algebra back-end libraries: SuiteSparse, MATLAB, and CombBLAS; we demonstrate how MAGiQ can effortlessly be ported on a variety of architectures such as Intel CPUs, NVIDIA GPUs, and Cray XC40 supercomputers. Our experiments on large-scale real and synthetic datasets show that MAGiQ performs comparably to or better than existing specialized SPARQL query engines for data-intensive queries, scales to very large computing infrastructures, and handles datasets with up to 512 billion triples.

References

[1]
Basic Linear Algebra Subprograms. http://www.netlib.org/blas/.
[2]
Bio2RDF Repository. http://bio2rdf.org/.
[3]
GraphBLAS Standard. http://graphblas.org.
[4]
Intel MKL. https://software.intel.com/mkl.
[5]
LUBM Benchmark. http://swat.cse.lehigh.edu/projects/lubm.
[6]
NVIDIA cuSPARSE. https://developer.nvidia.com/cusparse.
[7]
SPARQL. https://www.w3.org/TR/rdf-sparql-query/.
[8]
SuiteSparse. http://faculty.cse.tamu.edu/davis/suitesparse.html.
[9]
Urika-GD. http://www.cray.com/sites/default/files/resources/Urika-GD-TechSpecs.pdf.
[10]
W3C: RDF. http://www.w3.org/RDF.
[11]
WatDiv Benchmark. http://dsg.uwaterloo.ca/watdiv/.
[12]
YAGO2 Repository. http://yago-knowledge.org.
[13]
Ibrahim Abdelaziz, Razen Harbi, Zuhair Khayyat, and Panos Kalnis. 2017. A Survey and Experimental Comparison of Distributed SPARQL Engines for Very Large RDF Data. PVLDB 10, 13 (2017), 2049--2060.
[14]
Ibrahim Abdelaziz, Razen Harbi, Semih Salihoglu, and Panos Kalnis. 2017. Combining Vertex-centric Graph Processing with SPARQL for Large-scale RDF Data Analytics. IEEE TPDS 28, 12 (2017), 3374--3388.
[15]
Yousuf Ahmad, Omar Khattab, Arsal Malik, Ahmad Musleh, Mohammad Hammoud, Mucahid Kutlu, Mostafa Shehata, and Elsayed Tamer. 2018. LA3: A Scalable Link- and Locality-Aware Linear Algebra-Based Graph Analytics System. PVLDB 11, 8 (2018), 920--933.
[16]
Ziyad AlGhamdi, Fuad Jamour, Spiros Skiadopoulos, and Panos Kalnis. 2017. A Benchmark for Betweenness Centrality Approximation Algorithms on Large Graphs. In SSDBM 2017.
[17]
Michael J Anderson, Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Theodore L Willke, and Pradeep Dubey. GraphPad: Optimized Graph Primitives for Parallel and Distributed Platforms. In IEEE IPDPS 2016.
[18]
Medha Atre, Vineet Chaoji, Mohammed J. Zaki, and James A. Hendler. Matrix "Bit" Loaded: a Scalable Lightweight Join Query Processor for RDF Data. In WWW 2010.
[19]
Scott Beamer, Aydın Buluç, Krste Asanovic, and David Patterson. Distributed Memory Breadth-first Search Revisited: Enabling Bottom-up Search. In IEEE IPDPSW 2013.
[20]
Maciej Besta, Florian Marending, Edgar Solomonik, and Torsten Hoefler. SlimSell: A Vectorizable Graph Representation for Breadth-First Search. In IEEE IPDPS 2017.
[21]
Angela Bonifati, Wim Martens, and Thomas Timm. 2017. An Analytical Study of Large SPARQL Query Logs. PVLDB 11, 2 (2017), 149--161.
[22]
Aydın Buluç and John R Gilbert. On the Representation and Multiplication of Hypersparse Matrices. In IEEE IPDPS 2008.
[23]
Aydın Buluç and John R Gilbert. 2011. The Combinatorial BLAS: Design, Implementation, and Applications. IJHPCA 25, 4 (2011), 496--509.
[24]
Aydın Buluç and John R Gilbert. 2012. Parallel Sparse Matrix-matrix Multiplication and Indexing: Implementation and Experiments. SIAM Journal on Scientific Computing 34, 4 (2012), 170--191.
[25]
Aydın Buluç, Tim Mattson, Scott McMillan, José Moreira, and Carl Yang. Design of the GraphBLAS API for C. In IEEE IPDPSW 2017.
[26]
Chantana Chantrapornchai and Chidchanok Choksuchat. 2018. TripleID-Q: RDF Query Processing Framework using GPU. IEEE TPDS 29, 9 (2018), 2121--2135.
[27]
Timothy Davis. 2018. Algorithm 9xx: SuiteSparse: GraphBLAS: Graph Algorithms in the Language of Sparse Linear Algebra. Submitted to ACM TOMS (2018).
[28]
Timothy Davis. 2018. Graph Algorithms via SuiteSparse: GraphBLAS: Triangle Counting and K-truss. In IEEE HPEC 2018.
[29]
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. In USENIX OSDI 2004.
[30]
Iain S Duff, Roger G Grimes, and John G Lewis. 1989. Sparse Matrix Test Problems. ACM TOMS 15, 1 (1989), 1--14.
[31]
Kattamuri Ekanadham, William P Horn, Manoj Kumar, Joefon Jann, José Moreira, Pratap Pattnaik, Mauricio Serrano, Gabriel Tanase, and Hao Yu. Graph Programming Interface (GPI): a Linear Algebra Programming Model for Large Scale Graph Computations. In ACM ICCF 2016.
[32]
Orri Erling. 2012. Virtuoso, a Hybrid RDBMS/Graph Column Store. IEEE Data Engineering Bulletin 35, 1 (2012), 3--8.
[33]
Mario Arias Gallego, Javier D Fernández, Miguel A Martínez-Prieto, and Pablo de la Fuente. 2011. An Empirical Study of Real-world SPARQL Queries. In USEWOD workshop 2011.
[34]
Sairam Gurajada, Stephan Seufert, Iris Miliaraki, and Martin Theobald. TriAD: a Distributed Shared-nothing RDF Engine Based on Asynchronous Message Passing. In ACM SIGMOD 2014.
[35]
Razen Harbi, Ibrahim Abdelaziz, Panos Kalnis, Nikos Mamoulis, Yasser Ebrahim, and Majed Sahli. 2016. Accelerating SPARQL Queries by Exploiting Hash-based Locality and Adaptive Partitioning. VLDBJ 25, 3 (2016), 355--380.
[36]
Liang He, Bin Shao, Yatao Li, Huanhuan Xia, Yanghua Xiao, Enhong Chen, and Liang Jeff Chen. 2017. Stylus: A Strongly-Typed Store for Serving Massive RDF Data. PVLDB 11, 2 (2017).
[37]
Fuad Jamour, Ibrahim Abdelaziz, and Panos Kalnis. 2018. A Demonstration of MAGiQ: Matrix Algebra Approach for Solving RDF Graph Queries. PVLDB 11, 12 (2018), 1978--1981.
[38]
Zoi Kaoudi and Ioana Manolescu. 2015. RDF in the Clouds: A Survey. VLDBJ 24, 1 (2015), 67--91.
[39]
Jeremy Kepner, Peter Aaltonen, David Bader, Aydın Buluç, Franz Franchetti, John Gilbert, Dylan Hutchison, Manoj Kumar, Andrew Lumsdaine, Henning Meyerhenke, et al. Mathematical Foundations of the GraphBLAS. In IEEE HPEC 2016.
[40]
Jeremy Kepner, David Bader, Aydın Buluç, John Gilbert, Timothy Mattson, and Henning Meyerhenke. 2015. Graphs, Matrices, and the GraphBLAS: Seven Good Reasons. Procedia Computer Science 51 (2015), 2453--2462.
[41]
Jeremy Kepner and John Gilbert. 2011. Graph Algorithms in the Language of Linear Algebra. SIAM.
[42]
Kisung Lee and Ling Liu. 2013. Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning. PVLDB 6, 14 (2013), 1894--1905.
[43]
Grzegorz Malewicz, Matthew Austern, Aart Bik, James Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. Pregel: a System for Large-scale Graph Processing. In ACM SIGMOD 2010.
[44]
Saskia Metzler and Pauli Miettinen. 2015. On Defining SPARQL with Boolean Tensor Algebra. arXiv preprint arXiv:1503.00301 (2015).
[45]
Thomas Neumann and Gerhard Weikum. 2010. The RDF-3X Engine for Scalable Management of RDF Data. VLDBJ 19, 1 (2010), 91--113.
[46]
M Tamer Özsu. 2016. A Survey of RDF Data Management Systems. Frontiers of Computer Science 10, 3 (2016), 418--432.
[47]
Alexander Schätzle, Martin Przyjaciel-Zablocki, Simon Skilevic, and Georg Lausen. 2016. S2RDF: RDF Querying with SPARQL on Spark. PVLDB 9, 10 (2016), 804--815.
[48]
Jiaxin Shi, Youyang Yao, Rong Chen, Haibo Chen, and Feifei Li. Fast and Concurrent RDF Queries with RDMA-Based Distributed Graph Exploration. In USENIX OSDI 2016.
[49]
Edgar Solomonik, Maciej Besta, Flavio Vella, and Torsten Hoefler. Scaling Betweenness Centrality using Communication-Efficient Sparse Matrix Multiplication. In ACM/IEEE SC 2017.
[50]
Narayanan Sundaram, Nadathur Satish, Md Mostofa Ali Patwary, Subramanya R Dulloor, Michael J Anderson, Satya Gautam Vadlamudi, Dipankar Das, and Pradeep Dubey. 2015. GraphMat: High Performance Graph Analytics Made Productive. PVLDB 8, 11 (2015), 1214--1225.
[51]
Siyuan Wang, Chang Lou, Rong Chen, and Haibo Chen. Fast and Concurrent RDF Queries using RDMA-assisted GPU Graph Exploration. In USENIX ATC 2018.
[52]
Cathrin Weiss, Panagiotis Karras, and Abraham Bernstein. 2008. Hexa-store: Sextuple Indexing for Semantic Web Data Management. PVLDB 1, 1 (2008), 1008--1019.
[53]
Pingpeng Yuan, Pu Liu, Buwen Wu, Hai Jin, Wenya Zhang, and Ling Liu. 2013. TripleBit: A Fast and Compact System for Large Scale RDF Data. PVLDB 6, 7 (2013), 517--528.
[54]
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J Franklin, Scott Shenker, and Ion Stoica. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In USENIX NSDI 2012.
[55]
Kai Zeng, Jiacheng Yang, Haixun Wang, Bin Shao, and Zhongyuan Wang. 2013. A Distributed Graph Engine for Web Scale RDF Data. PVLDB 6, 4 (2013), 265--276.
[56]
Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, and Dongyan Zhao. 2011. gStore: Answering SPARQL Queries via Subgraph Matching. PVLDB 4, 8 (2011), 482--493.

Cited By

View all
  • (2023)SGSI – A Scalable GPU-Friendly Subgraph Isomorphism AlgorithmIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.323074435:11(11899-11916)Online publication date: 1-Nov-2023
  • (2023)RSG-Search: Semantic Traffic Scene Retrieval Using Graph-Based Scene Representation2023 IEEE Intelligent Vehicles Symposium (IV)10.1109/IV55152.2023.10186641(1-8)Online publication date: 4-Jun-2023
  • (2023)Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines2023 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC58863.2023.10363617(1-7)Online publication date: 25-Sep-2023
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM Conferences
EuroSys '19: Proceedings of the Fourteenth EuroSys Conference 2019
March 2019
714 pages
ISBN:9781450362818
DOI:10.1145/3302424
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Graph Query Engines
  2. Matrix Algebra
  3. RDF

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EuroSys '19
Sponsor:
EuroSys '19: Fourteenth EuroSys Conference 2019
March 25 - 28, 2019
Dresden, Germany

Acceptance Rates

Overall Acceptance Rate 241 of 1,308 submissions, 18%

Upcoming Conference

EuroSys '25
Twentieth European Conference on Computer Systems
March 30 - April 3, 2025
Rotterdam , Netherlands

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)39
  • Downloads (Last 6 weeks)7
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2023)SGSI – A Scalable GPU-Friendly Subgraph Isomorphism AlgorithmIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2022.323074435:11(11899-11916)Online publication date: 1-Nov-2023
  • (2023)RSG-Search: Semantic Traffic Scene Retrieval Using Graph-Based Scene Representation2023 IEEE Intelligent Vehicles Symposium (IV)10.1109/IV55152.2023.10186641(1-8)Online publication date: 4-Jun-2023
  • (2023)Exploiting Fusion Opportunities in Linear Algebraic Graph Query Engines2023 IEEE High Performance Extreme Computing Conference (HPEC)10.1109/HPEC58863.2023.10363617(1-7)Online publication date: 25-Sep-2023
  • (2022)TCUDB: Accelerating Database with Tensor ProcessorsProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3517869(1360-1374)Online publication date: 10-Jun-2022
  • (2022)Wukong+G: Fast and Concurrent RDF Query Processing Using RDMA-Assisted GPU Graph ExplorationIEEE Transactions on Parallel and Distributed Systems10.1109/TPDS.2021.312156833:7(1619-1635)Online publication date: 1-Jul-2022
  • (2022)Space/time-efficient RDF stores based on circular suffix sortingThe Journal of Supercomputing10.1007/s11227-022-04890-w79:5(5643-5683)Online publication date: 25-Oct-2022
  • (2021)Fast and Accurate Optimizer for Query Processing over Knowledge GraphsProceedings of the ACM Symposium on Cloud Computing10.1145/3472883.3486991(503-517)Online publication date: 1-Nov-2021
  • (2021)cuTSProceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis10.1145/3458817.3476214(1-14)Online publication date: 14-Nov-2021
  • (2021)Database systems research in the Arab worldCommunications of the ACM10.1145/344775064:4(120-123)Online publication date: 22-Mar-2021
  • (2021)Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS)10.1109/IPDPS49936.2021.00018(90-100)Online publication date: May-2021
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media