
AWARE: Workload-aware, Redundancy-exploiting Linear Algebra

Published: 30 May 2023

Abstract

Compression is an effective technique for fitting data in available memory, reducing I/O, and increasing instruction parallelism. While data systems primarily rely on lossless compression, modern machine learning (ML) systems exploit the approximate nature of ML and mostly use lossy compression via low-precision floating- or fixed-point representations. The resulting unknown impact on learning progress and model accuracy, however, creates trust concerns, requires trial and error, and is problematic for declarative ML pipelines. Given the trend towards increasingly complex, composite ML pipelines (with outer loops for hyper-parameter tuning, feature selection, and data cleaning/augmentation), it is hard for a user to infer the impact of lossy compression. Sparsity exploitation is a common lossless scheme used to improve performance without this uncertainty, and evolving this concept to general redundancy-exploiting compression is a natural next step. Existing work on lossless compression and compressed linear algebra (CLA) enables such exploitation to a degree, but faces challenges for general applicability. In this paper, we address these limitations with a workload-aware compression framework comprising a broad spectrum of new compression schemes and kernels. Instead of a data-centric approach that optimizes compression ratios, our workload-aware compression summarizes the workload of an ML pipeline and optimizes the compression and execution plan to minimize execution time. On various micro-benchmarks and end-to-end ML pipelines, we observe improvements for individual operations of up to 10,000x, and for ML algorithms of up to 6.6x, compared to uncompressed operations.
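To make the core idea of the abstract concrete, below is a minimal sketch of workload-aware compression planning. It is not AWARE's actual implementation: the classes (WorkloadSummary, Scheme), the scheme names, and the cost constants are hypothetical stand-ins, assuming a simple linear cost model in which each candidate scheme trades a one-time compression cost against faster multiplies and possible forced decompressions.

```python
# Hypothetical sketch of workload-aware compression planning (not AWARE's
# actual code): choose the compression scheme that minimizes estimated
# execution time for a summarized workload, not the best compression ratio.
from dataclasses import dataclass

@dataclass
class WorkloadSummary:
    # Hypothetical operation counts summarized from an ML pipeline,
    # including its outer loops (hyper-parameter tuning, etc.).
    matrix_vector_mults: int   # products computable in compressed space
    unsupported_ops: int       # ops that would force decompression

@dataclass
class Scheme:
    name: str
    compress_time: float       # one-time compression cost (seconds, assumed)
    mult_speedup: float        # multiply speedup vs. uncompressed (assumed)
    forces_decompress: bool    # True if unsupported ops must decompress

def estimated_time(w: WorkloadSummary, s: Scheme,
                   base_mult_time: float, decompress_time: float) -> float:
    """Total estimated execution time of workload w under scheme s."""
    t = s.compress_time
    t += w.matrix_vector_mults * base_mult_time / s.mult_speedup
    if s.forces_decompress:
        t += w.unsupported_ops * decompress_time
    return t

def plan(w: WorkloadSummary, schemes: list[Scheme],
         base_mult_time: float = 1.0, decompress_time: float = 5.0) -> Scheme:
    """Pick the plan (a scheme or no compression) with minimal estimated time."""
    uncompressed = Scheme("uncompressed", 0.0, 1.0, False)
    return min([uncompressed, *schemes],
               key=lambda s: estimated_time(w, s, base_mult_time,
                                            decompress_time))

if __name__ == "__main__":
    # An iterative algorithm touching the matrix 200 times can amortize an
    # up-front compression cost; a single-pass workload typically cannot.
    w = WorkloadSummary(matrix_vector_mults=200, unsupported_ops=2)
    candidates = [Scheme("scheme-A", compress_time=3.0, mult_speedup=4.0,
                         forces_decompress=False),
                  Scheme("scheme-B", compress_time=1.0, mult_speedup=2.0,
                         forces_decompress=True)]
    print("chosen plan:", plan(w, candidates).name)
```

The design point this sketch illustrates is the abstract's contrast: a data-centric compressor would always pick the scheme with the best compression ratio, whereas a workload-aware planner may choose a cheaper scheme, or none at all, when the pipeline does not amortize the compression cost.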

Supplemental Material

  • MP4 File: Presentation video for SIGMOD 2023 of AWARE: Workload-aware, Redundancy-exploiting Linear Algebra.
  • PDF File: Read me.
  • ZIP File: Source code.


Cited By

  • (2024) The Duck's Brain. Datenbank-Spektrum. https://doi.org/10.1007/s13222-024-00485-2. Online publication date: 9 Oct 2024.
  • (2023) Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques. In Companion of the 2023 International Conference on Management of Data, 53-59. https://doi.org/10.1145/3555041.3589407. Online publication date: 4 Jun 2023.


Information

Published In

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, May 2023. 2807 pages. EISSN: 2836-6573. DOI: 10.1145/3603164.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1


Author Tags

  1. declarative
  2. large-scale
  3. linear algebra
  4. lossless compression
  5. machine learning
  6. online compression
  7. redundancy exploitation
  8. workload-aware optimization

Qualifiers

  • Research-article

Funding Sources

  • Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK)


Article Metrics

  • Downloads (last 12 months): 188
  • Downloads (last 6 weeks): 21

Reflects downloads up to 24 Nov 2024.

