research-article

Tensor relational algebra for distributed machine learning system design

Authors:

Dimitrije Jankov,

Daniel Bourgeois,

Chris JermaineAuthors Info & Claims

Proceedings of the VLDB Endowment, Volume 14, Issue 8

Pages 1338 - 1350

https://doi.org/10.14778/3457390.3457399

Published: 01 April 2021 Publication History

Abstract

We consider the question: what is the abstraction that should be implemented by the computational engine of a machine learning system? Current machine learning systems typically push whole tensors through a series of compute kernels such as matrix multiplications or activation functions, where each kernel runs on an AI accelerator (ASIC) such as a GPU. This implementation abstraction provides little built-in support for ML systems to scale past a single machine, or for handling large models with matrices or tensors that do not easily fit into the RAM of an ASIC. In this paper, we present an alternative implementation abstraction called the tensor relational algebra (TRA). The TRA is a set-based algebra based on the relational algebra. Expressions in the TRA operate over binary tensor relations, where keys are multi-dimensional arrays and values are tensors. The TRA is easily executed with high efficiency in a parallel or distributed environment, and amenable to automatic optimization. Our empirical study shows that the optimized TRA-based back-end can significantly outperform alternatives for running ML workflows in distributed clusters.

References

[1]

[n.d.]. Dask. https://dask.org/.

[2]

[n.d.]. PETSc. https://www.mcs.anl.gov/petsc/.

[3]

[n.d.]. PyTorch. https://pytorch.org/.

[4]

[n.d.]. TensorFlow. https://www.tensorflow.org/.

[5]

Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} symposium on operating systems design and implementation ({OSDI} 16). 265--283.

Digital Library

[6]

Christopher Aberger, Andrew Lamb, Kunle Olukotun, and Christopher Ré. 2018. Levelheaded: A unified engine for business intelligence and linear algebra querying. In 2018 IEEE 34th International Conference on Data Engineering (ICDE). IEEE, 449--460.

[7]

Mahmoud Abo Khamis, Hung Q Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. In-database learning with sparse tensors. In Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 325--340.

Digital Library

[8]

Mahmoud Abo Khamis, Hung Q Ngo, and Atri Rudra. 2016. FAQ: questions asked frequently. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. 13--28.

Digital Library

[9]

Ramesh C Agarwal, Susanne M Balle, Fred G Gustavson, Mahesh Joshi, and Prasad Palkar. 1995. A three-dimensional approach to parallel matrix multiplication. IBM Journal of Research and Development 39, 5 (1995), 575--582.

Digital Library

[10]

Satish Balay, Shrirang Abhyankar, Mark Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Alp Dener, Victor Eijkhout, W Gropp, et al. 2019. PETSc users manual. (2019).

[11]

Pablo Barceló, Nelson Higuera, Jorge Pérez, and Bernardo Subercaseaux. 2019. Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators. In Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning. ACM, 9.

Digital Library

[12]

Brandon Barker. 2015. Message passing interface (mpi). In Workshop: High Performance Computing on Stampede, Vol. 262.

[13]

Peter Baumann, Andreas Dehmel, Paula Furtado, Roland Ritsch, and Norbert Widmann. 1998. The multidimensional database system RasDaMan. In SIGMOD Record, Vol. 27. ACM, 575--577.

Digital Library

[14]

Paul G Brown. 2010. Overview of SciDB: large scale array storage, processing and analysis. In SIGMOD. 963--968.

Digital Library

[15]

Zhuhua Cai, Zografoula Vagena, Luis Perez, Subramanian Arumugam, Peter J Haas, and Christopher Jermaine. 2013. Simulation of database-valued Markov chains using SimSQL. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. 637--648.

Digital Library

[16]

Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. {TVM}: An automated end-to-end optimizing compiler for deep learning. In 13th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 18). 578--594.

Digital Library

[17]

Jaeyoung Choi, Jack J Dongarra, Roldan Pozo, and David W Walker. 1992. ScaLA-PACK: A scalable linear algebra library for distributed memory concurrent computers. In The Fourth Symposium on the Frontiers of Massively Parallel Computation. IEEE Computer Society, 120--121.

[18]

Jason Jinquan Dai, Yiheng Wang, Xin Qiu, Ding Ding, Yao Zhang, Yanzhang Wang, Xianyan Jia, Cherry Li Zhang, Yan Wan, Zhichao Li, et al. 2019. Bigdl: A distributed deep learning framework for big data. In Proceedings of the ACM Symposium on Cloud Computing. 50--60.

Digital Library

[19]

Leonardo De Moura and Nikolaj Bjørner. 2008. Z3: An efficient SMT solver. In International conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 337--340.

[20]

Oksana Dolmatova, Nikolaus Augsten, and Michael H Böhlen. 2020. A Relational Matrix Algebra and its Implementation in a Column Store. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2573--2587.

Digital Library

[21]

Amol Ghoting, Rajasekar Krishnamurthy, Edwin Pednault, Berthold Reinwald, Vikas Sindhwani, Shirish Tatikonda, Yuanyuan Tian, and Shivakumar Vaithyanathan. 2011. SystemML: Declarative machine learning on MapReduce. In ICDE. 231--242.

Digital Library

[22]

Rong Gu, Yun Tang, Chen Tian, Hucheng Zhou, Guanru Li, Xudong Zheng, and Yihua Huang. 2017. Improving execution concurrency of large-scale matrix multiplication on distributed data-parallel platforms. IEEE Transactions on Parallel and Distributed Systems 28, 9 (2017), 2539--2552.

Digital Library

[23]

Donghyoung Han, Yoon-Min Nam, Jihye Lee, Kyongseok Park, Hyunwoo Kim, and Min-Soo Kim. 2019. DistME: A Fast and Elastic Distributed Matrix Computation Engine using GPUs. In Proceedings of the 2019 International Conference on Management of Data. 759--774.

Digital Library

[24]

T Hunter. 2016. Tensorframes on google's tensorflow and apache spark. Bay Area Spark Meetup (2016).

[25]

Dylan Hutchison, Bill Howe, and Dan Suciu. 2017. LaraDB: A minimalist kernel for linear and relational algebra computation. In Proceedings of the 4th ACM SIGMOD Workshop on Algorithms and Systems for MapReduce and Beyond. ACM, 2.

Digital Library

[26]

Dimitrije Jankov, Shangyu Luo, Binhang Yuan, Zhuhua Cai, Jia Zou, Chris Jermaine, and Zekai J Gao. 2019. Declarative recursive computation on an RDBMS: or, why you should use a database for distributed machine learning. Proceedings of the VLDB Endowment 12, 7 (2019), 822--835.

Digital Library

[27]

Matthias Jasny, Tobias Ziegler, Tim Kraska, Uwe Roehm, and Carsten Binnig. 2020. DB4ML-An In-Memory Database Kernel with Machine Learning Support. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 159--173.

Digital Library

[28]

Mahmoud Abo Khamis, Hung Q Ngo, XuanLong Nguyen, Dan Olteanu, and Maximilian Schleich. 2018. AC/DC: in-database learning thunderstruck. In Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning. 1--10.

Digital Library

[29]

Mijung Kim and K Selçuk Candan. 2014. Efficient static and dynamic in-database tensor decompositions on chunk-based array stores. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. 969--978.

Digital Library

[30]

Mijung Kim and K Selçuk Candan. 2014. Tensordb: In-database tensor manipulation with tensor-relational query plans. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management. ACM, 2039--2041.

Digital Library

[31]

Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The tensor algebra compiler. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 1--29.

Digital Library

[32]

Sören Laue, Matthias Mitterreiter, and Joachim Giesen. 2020. A simple and efficient tensor calculus. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 4527--4534.

[33]

Shen Li, Yanli Zhao, Rohan Varma, Omkar Salpekar, Pieter Noordhuis, Teng Li, Adam Paszke, Jeff Smith, Brian Vaughan, Pritam Damania, et al. [n.d.]. PyTorch Distributed: Experiences on Accelerating Data Parallel Training. Proceedings of the VLDB Endowment 13, 12 ([n. d.]).

Digital Library

[34]

Xupeng Li, Bin Cui, Yiru Chen, Wentao Wu, and Ce Zhang. 2017. Mlog: Towards declarative in-database machine learning. Proceedings of the VLDB Endowment 10, 12 (2017), 1933--1936.

Digital Library

[35]

Shangyu Luo, Zekai J Gao, Michael Gubanov, Luis L Perez, and Christopher Jermaine. 2018. Scalable linear algebra on a relational database system. IEEE Transactions on Knowledge and Data Engineering 31, 7 (2018), 1224--1238.

[36]

Julian McAuley, Rahul Pandey, and Jure Leskovec. 2015. Inferring networks of substitutable and complementary products. In Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining. 785--794.

Digital Library

[37]

Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval. 43--52.

Digital Library

[38]

Philipp Moritz, Robert Nishihara, Ion Stoica, and Michael I Jordan. 2015. Sparknet: Training deep networks in spark. arXiv preprint arXiv:1511.06051 (2015).

[39]

Matthew Rocklin. 2015. Dask: Parallel computation with blocked algorithms and task scheduling. In Proceedings of the 14th python in science conference, Vol. 126. Citeseer.

[40]

Edgar Solomonik and James Demmel. 2011. Communication-optimal parallel 2.5 D matrix multiplication and LU factorization algorithms. In European Conference on Parallel Processing. Springer, 90--109.

Digital Library

[41]

D Team et al. 2016. Deeplearning4j: Open-source distributed deep learning for the jvm. Apache Software Foundation License 2 (2016), 2.

[42]

Robert A Van De Geijn and Jerrell Watts. 1997. SUMMA: Scalable universal matrix multiplication algorithm. Concurrency: Practice and Experience 9, 4 (1997), 255--274.

[43]

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S Moses, Sven Verdoolaege, Andrew Adams, and Albert Cohen. 2018. Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions. arXiv preprint arXiv:1802.04730 (2018).

[44]

Pete Warden. 2018. Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209 (2018).

[45]

Matei Zaharia, Mosharaf Chowdhury, Michael J Franklin, Scott Shenker, and Ion Stoica. 2010. Spark: cluster computing with working sets. In USENIX HotCloud. 1--10.

Digital Library

Cited By

Xu LQiu SYuan BJiang JRenggli CGan SKara KLi GLiu JWu WYe JZhang C(2024)Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systemsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00845-033:5(1231-1255)Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1007/s00778-024-00845-0
Wang GLiu ZJiang ZLiu NZou NHu XKrause ABrunskill ECho KEngelhardt BSabato SScarlett J(2023)DIVISIONProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619904(36036-36057)Online publication date: 23-Jul-2023
https://dl.acm.org/doi/10.5555/3618408.3619904
Guan HMasood SDwarampudi MGunda VMin HYu LNag SZou J(2023)A Comparison of End-to-End Decision Forest Inference PipelinesProceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624656(200-215)Online publication date: 30-Oct-2023
https://dl.acm.org/doi/10.1145/3620678.3624656
Show More Cited By

Index Terms

Tensor relational algebra for distributed machine learning system design
1. Computing methodologies

Index terms have been assigned to the content through auto-classification.

Recommendations

Sparse Tensor Algebra Compilation
One-class support higher order tensor machine classifier

One-class classification problems have been widely encountered in the fields that the negative class patterns are difficult to be collected, and the one-class support vector machine is one of the popular algorithms for solving them. However, one-class ...
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3

Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine learning. However, due to the unstructured sparsity of SpTA, the general-purpose processors (e.g., GPU and CPU) are inefficient because of the underutilized hardware ...

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Proceedings of the VLDB Endowment

Proceedings of the VLDB Endowment Volume 14, Issue 8

April 2021

200 pages

ISSN:2150-8097

Editors:
Xin Luna Dong
Amazon
,
Felix Naumann
HPI, University of Potsdam

Issue’s Table of Contents

Publisher

VLDB Endowment

Publication History

Published: 01 April 2021

Published in PVLDB Volume 14, Issue 8

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

12
Total Citations
View Citations
85
Total Downloads

Downloads (Last 12 months)32
Downloads (Last 6 weeks)10

Reflects downloads up to 26 Sep 2024

Other Metrics

View Author Metrics

Citations

Cited By

Xu LQiu SYuan BJiang JRenggli CGan SKara KLi GLiu JWu WYe JZhang C(2024)Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systemsThe VLDB Journal — The International Journal on Very Large Data Bases10.1007/s00778-024-00845-033:5(1231-1255)Online publication date: 1-Sep-2024
https://dl.acm.org/doi/10.1007/s00778-024-00845-0
Wang GLiu ZJiang ZLiu NZou NHu XKrause ABrunskill ECho KEngelhardt BSabato SScarlett J(2023)DIVISIONProceedings of the 40th International Conference on Machine Learning10.5555/3618408.3619904(36036-36057)Online publication date: 23-Jul-2023
https://dl.acm.org/doi/10.5555/3618408.3619904
Guan HMasood SDwarampudi MGunda VMin HYu LNag SZou J(2023)A Comparison of End-to-End Decision Forest Inference PipelinesProceedings of the 2023 ACM Symposium on Cloud Computing10.1145/3620678.3624656(200-215)Online publication date: 30-Oct-2023
https://dl.acm.org/doi/10.1145/3620678.3624656
Kovach SKolichala PGu TKjolstad F(2023)Indexed Streams: A Formal Intermediate Representation for Fused Contraction ProgramsProceedings of the ACM on Programming Languages10.1145/35912687:PLDI(1169-1193)Online publication date: 6-Jun-2023
https://dl.acm.org/doi/10.1145/3591268
Schleich MShaikhha ASuciu D(2023)Optimizing Tensor Programs on Flexible StorageProceedings of the ACM on Management of Data10.1145/35887171:1(1-27)Online publication date: 30-May-2023
https://dl.acm.org/doi/10.1145/3588717
Boehm MInterlandi MJermaine CDas SPandis ISelçuk Candan KAmer-Yahia S(2023)Optimizing Tensor Computations: From Applications to Compilation and Runtime TechniquesCompanion of the 2023 International Conference on Management of Data10.1145/3555041.3589407(53-59)Online publication date: 4-Jun-2023
https://dl.acm.org/doi/10.1145/3555041.3589407
He DNakandala SBanda DSen RSaur KPark KCurino CCamacho-Rodríguez JKaranasos KInterlandi M(2022)Query processing on tensor computation runtimesProceedings of the VLDB Endowment10.14778/3551793.355183315:11(2811-2825)Online publication date: 1-Jul-2022
https://dl.acm.org/doi/10.14778/3551793.3551833
Ahrens PKjolstad FAmarasinghe SJhala RDillig I(2022)Autoscheduling for sparse tensor algebra with an asymptotic cost modelProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation10.1145/3519939.3523442(269-285)Online publication date: 9-Jun-2022
https://dl.acm.org/doi/10.1145/3519939.3523442
Xu LQiu SYuan BJiang JRenggli CGan SKara KLi GLiu JWu WYe JZhang CIves ZBonifati AEl Abbadi A(2022)In-Database Machine Learning with CorgiPile: Stochastic Gradient Descent without Full Data ShuffleProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526150(1286-1300)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526150
Olteanu DVortmeier NZivanović ĐIves ZBonifati AEl Abbadi A(2022)Givens QR Decomposition over Relational DatabasesProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526144(1948-1961)Online publication date: 10-Jun-2022
https://dl.acm.org/doi/10.1145/3514221.3526144
Show More Cited By

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Issue’s Table of Contents