research-article

The LDBC Social Network Benchmark: Business Intelligence Workload

Published: 01 December 2022

Abstract

The Social Network Benchmark's Business Intelligence workload (SNB BI) is a comprehensive graph OLAP benchmark targeting analytical data systems capable of supporting graph workloads. This paper marks the finalization of almost a decade of research in academia and industry via the Linked Data Benchmark Council (LDBC). SNB BI advances the state of the art in synthetic and scalable analytical database benchmarks in many aspects. Its base is a sophisticated data generator, implemented on a scalable distributed infrastructure, that produces a social graph with small-world phenomena, whose value properties follow skewed and correlated distributions, and where values correlate with structure. This is a temporal graph where all nodes and edges follow lifespan-based rules with temporal skew, enabling realistic and consistent temporal inserts and (recursive) deletes. The query workload, which exploits this skew and correlation, is based on LDBC's "choke point"-driven design methodology and will entice technical and scientific improvements in future (graph) database systems. SNB BI includes the first adoption of "parameter curation" in an analytical benchmark, a technique that ensures stable runtimes of query variants across different parameter values. Two performance metrics characterize peak single-query performance (power) and sustained concurrent query throughput. To demonstrate the portability of the benchmark, we present experimental results on a relational and a graph DBMS. Note that these do not constitute an official LDBC Benchmark Result; only audited results can use this trademarked term.
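The lifespan-based rules and (recursive) deletes mentioned in the abstract can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the SNB Datagen implementation: the `TemporalGraph` class, the integer timestamps, and the `children` containment relation (for example, a post and its replies) are hypothetical, and the exact rules for which SNB BI entities cascade on deletion are defined by LDBC rather than here.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TemporalGraph:
    """Hypothetical toy model (not LDBC code): nodes and edges carry (created, deleted) lifespans."""

    nodes: dict[str, tuple[int, int]] = field(default_factory=dict)
    edges: dict[tuple[str, str], tuple[int, int]] = field(default_factory=dict)
    # Hypothetical containment relation, e.g. a post -> its replies.
    children: dict[str, list[str]] = field(default_factory=dict)

    def add_node(self, n: str, created: int, deleted: int,
                 parent: Optional[str] = None) -> None:
        if parent is not None:
            p_created, p_deleted = self.nodes[parent]
            # Lifespan rule: a contained node must live within its parent's lifespan.
            if not (p_created <= created and deleted <= p_deleted):
                raise ValueError(f"{n} violates the lifespan of its parent {parent}")
            self.children.setdefault(parent, []).append(n)
        self.nodes[n] = (created, deleted)

    def add_edge(self, a: str, b: str, created: int, deleted: int) -> None:
        # Lifespan rule: an edge may only exist while both endpoints exist.
        for end in (a, b):
            e_created, e_deleted = self.nodes[end]
            if not (e_created <= created and deleted <= e_deleted):
                raise ValueError(f"edge {a}-{b} violates the lifespan of {end}")
        self.edges[(a, b)] = (created, deleted)

    def delete(self, n: str) -> set[str]:
        """Recursively delete n and its contained nodes, then all incident edges."""
        removed: set[str] = set()
        stack = [n]
        while stack:
            cur = stack.pop()
            if cur in removed or cur not in self.nodes:
                continue
            removed.add(cur)
            del self.nodes[cur]
            # Contained nodes are deleted along with their container.
            stack.extend(self.children.pop(cur, []))
        # Drop every edge with at least one removed endpoint.
        self.edges = {
            (a, b): span for (a, b), span in self.edges.items()
            if a not in removed and b not in removed
        }
        return removed
```

In this toy model, deleting a person also removes the posts it contains, their replies, and every edge touching any of them, so no dangling edge survives the delete; an insert that would place an edge or a contained node outside its endpoints' lifespans is rejected up front.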
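The idea behind parameter curation can be sketched in a simplified form: given a proxy for the work each candidate parameter value causes (for instance, an intermediate-result count such as a person's number of friends), pick the k values whose proxies are closest to one another, so that query variants instantiated with them do similar amounts of work. The function `curate_parameters` and its inputs are hypothetical; the curation procedure actually used in SNB BI is specified by LDBC and is not reproduced here.

```python
from __future__ import annotations

from statistics import pvariance


def curate_parameters(counts: dict[str, int], k: int) -> list[str]:
    """Simplified, hypothetical sketch of parameter curation.

    counts maps each candidate parameter value to a proxy for the work a query
    does with it (e.g. an intermediate-result count). Returns the k candidates
    whose proxies have the lowest variance, so their query runtimes should be similar.
    """
    if not 1 <= k <= len(counts):
        raise ValueError("k must be between 1 and the number of candidates")
    # Sort by count (ties broken by name) so that similar counts are adjacent;
    # the minimum-variance k-subset is always a contiguous run in sorted order.
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    best_start, best_var = 0, float("inf")
    # Slide a window of k neighbours and keep the one with the smallest variance.
    for start in range(len(ordered) - k + 1):
        window = [count for _, count in ordered[start:start + k]]
        var = pvariance(window)
        if var < best_var:
            best_start, best_var = start, var
    return [param for param, _ in ordered[best_start:best_start + k]]
```

For example, with counts `{"p1": 3, "p2": 100, "p3": 4, "p4": 5}` and `k = 3`, the sketch selects `p1`, `p3`, and `p4` and leaves out `p2`, whose query variant would do far more work than the others.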

Information

Published In

Proceedings of the VLDB Endowment, Volume 16, Issue 4
December 2022
426 pages
ISSN: 2150-8097

Publisher

VLDB Endowment

Publication History

Published in PVLDB Volume 16, Issue 4

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 218
  • Downloads (last 6 weeks): 18
Reflects downloads up to 21 Nov 2024

Cited By

  • (2024) Saving Money for Analytical Workloads in the Cloud. Proceedings of the VLDB Endowment 17:11, pp. 3524-3537. DOI: 10.14778/3681954.3682018. Online publication date: 1-Jul-2024.
  • (2024) Robust Join Processing with Diamond Hardened Joins. Proceedings of the VLDB Endowment 17:11, pp. 3215-3228. DOI: 10.14778/3681954.3681995. Online publication date: 1-Jul-2024.
  • (2024) Incremental Sliding Window Connectivity over Streaming Graphs. Proceedings of the VLDB Endowment 17:10, pp. 2473-2486. DOI: 10.14778/3675034.3675040. Online publication date: 1-Jun-2024.
  • (2024) Speeding Up Subgraph Matching Queries with Schema Guided Index. Proceedings of the 2024 3rd International Conference on Networks, Communications and Information Technology, pp. 34-38. DOI: 10.1145/3672121.3672129. Online publication date: 7-Jun-2024.
  • (2024) Simple, Efficient, and Robust Hash Tables for Join Processing. Proceedings of the 20th International Workshop on Data Management on New Hardware, pp. 1-9. DOI: 10.1145/3662010.3663442. Online publication date: 10-Jun-2024.
  • (2024) Reservoir Sampling over Joins. Proceedings of the ACM on Management of Data 2:3, pp. 1-26. DOI: 10.1145/3654921. Online publication date: 30-May-2024.
  • (2024) GraphScope Flex: LEGO-like Graph Computing Stack. Companion of the 2024 International Conference on Management of Data, pp. 386-399. DOI: 10.1145/3626246.3653383. Online publication date: 9-Jun-2024.
  • (2023) Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries. ACM Computing Surveys 56:2, pp. 1-40. DOI: 10.1145/3604932. Online publication date: 15-Sep-2023.
  • (2023) Microarchitectural Analysis of Graph BI Queries on RDBMS. Proceedings of the 19th International Workshop on Data Management on New Hardware, pp. 102-106. DOI: 10.1145/3592980.3595321. Online publication date: 18-Jun-2023.
  • (2023) A Key-Value Based Approach to Scalable Graph Database. Database and Expert Systems Applications, pp. 338-344. DOI: 10.1007/978-3-031-39847-6_26. Online publication date: 28-Aug-2023.