
AWARE: Workload-aware, Redundancy-exploiting Linear Algebra

Published: 30 May 2023

Abstract

Compression is an effective technique for fitting data in available memory, reducing I/O, and increasing instruction parallelism. While data systems primarily rely on lossless compression, modern machine learning (ML) systems exploit the approximate nature of ML and mostly use lossy compression via low-precision floating- or fixed-point representations. The resulting unknown impact on learning progress and model accuracy, however, creates trust concerns, requires trial and error, and is problematic for declarative ML pipelines. Given the trend towards increasingly complex, composite ML pipelines (with outer loops for hyper-parameter tuning, feature selection, and data cleaning/augmentation), it is hard for a user to infer the impact of lossy compression. Sparsity exploitation is a common lossless scheme used to improve performance without this uncertainty, and evolving this concept to general redundancy-exploiting compression is a natural next step. Existing work on lossless compression and compressed linear algebra (CLA) enables such exploitation to a degree, but faces challenges for general applicability. In this paper, we address these limitations with a workload-aware compression framework comprising a broad spectrum of new compression schemes and kernels. Instead of a data-centric approach that optimizes compression ratios, our workload-aware compression summarizes the workload of an ML pipeline and optimizes the compression and execution plan to minimize execution time. On various micro-benchmarks and end-to-end ML pipelines, we observe improvements for individual operations of up to 10,000x, and for ML algorithms of up to 6.6x, compared to uncompressed operations.
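To make the core idea of the abstract concrete, below is a minimal sketch of workload-aware compression planning. It is not AWARE's actual implementation: the classes (WorkloadSummary, Scheme), the scheme names, and the cost constants are hypothetical stand-ins, assuming a simple linear cost model in which each candidate scheme trades a one-time compression cost against faster multiplies and possible forced decompressions.

```python
# Hypothetical sketch of workload-aware compression planning (not AWARE's
# actual code): choose the compression scheme that minimizes estimated
# execution time for a summarized workload, not the best compression ratio.
from dataclasses import dataclass

@dataclass
class WorkloadSummary:
    # Hypothetical operation counts summarized from an ML pipeline,
    # including its outer loops (hyper-parameter tuning, etc.).
    matrix_vector_mults: int   # products computable in compressed space
    unsupported_ops: int       # ops that would force decompression

@dataclass
class Scheme:
    name: str
    compress_time: float       # one-time compression cost (seconds, assumed)
    mult_speedup: float        # multiply speedup vs. uncompressed (assumed)
    forces_decompress: bool    # True if unsupported ops must decompress

def estimated_time(w: WorkloadSummary, s: Scheme,
                   base_mult_time: float, decompress_time: float) -> float:
    """Total estimated execution time of workload w under scheme s."""
    t = s.compress_time
    t += w.matrix_vector_mults * base_mult_time / s.mult_speedup
    if s.forces_decompress:
        t += w.unsupported_ops * decompress_time
    return t

def plan(w: WorkloadSummary, schemes: list[Scheme],
         base_mult_time: float = 1.0, decompress_time: float = 5.0) -> Scheme:
    """Pick the plan (a scheme or no compression) with minimal estimated time."""
    uncompressed = Scheme("uncompressed", 0.0, 1.0, False)
    return min([uncompressed, *schemes],
               key=lambda s: estimated_time(w, s, base_mult_time,
                                            decompress_time))

if __name__ == "__main__":
    # An iterative algorithm touching the matrix 200 times can amortize an
    # up-front compression cost; a single-pass workload typically cannot.
    w = WorkloadSummary(matrix_vector_mults=200, unsupported_ops=2)
    candidates = [Scheme("scheme-A", compress_time=3.0, mult_speedup=4.0,
                         forces_decompress=False),
                  Scheme("scheme-B", compress_time=1.0, mult_speedup=2.0,
                         forces_decompress=True)]
    print("chosen plan:", plan(w, candidates).name)
```

The design point this sketch illustrates is the abstract's contrast: a data-centric compressor would always pick the scheme with the best compression ratio, whereas a workload-aware planner may choose a cheaper scheme, or none at all, when the pipeline does not amortize the compression cost.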

Supplemental Material

  • MP4 File: Presentation video for SIGMOD 2023 of AWARE: Workload-aware, Redundancy-exploiting Linear Algebra.
  • PDF File: Read me.
  • ZIP File: Source code.


Cited By

  • (2024) The Duck's Brain. Datenbank-Spektrum. https://doi.org/10.1007/s13222-024-00485-2. Online publication date: 9 Oct 2024.
  • (2023) Optimizing Tensor Computations: From Applications to Compilation and Runtime Techniques. In Companion of the 2023 International Conference on Management of Data, 53-59. https://doi.org/10.1145/3555041.3589407. Online publication date: 4 Jun 2023.


Information

Published In

Proceedings of the ACM on Management of Data (PACMMOD), Volume 1, Issue 1, May 2023. 2807 pages. EISSN: 2836-6573. DOI: 10.1145/3603164.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2023
Published in PACMMOD Volume 1, Issue 1


Author Tags

  1. declarative
  2. large-scale
  3. linear algebra
  4. lossless compression
  5. machine learning
  6. online compression
  7. redundancy exploitation
  8. workload-aware optimization

Qualifiers

  • Research-article

Funding Sources

  • Austrian Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK)


Article Metrics

  • Downloads (last 12 months): 188
  • Downloads (last 6 weeks): 21

Reflects downloads up to 24 Nov 2024.

