What Operations can be Performed Directly on Compressed Arrays, and with What Error?

Authors: Tripti Agarwal, Harvey Dam, Ponnuswamy Sadayappan, Ganesh Gopalakrishnan, Dorra Ben Khalifa, Matthieu Martel

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

Pages 254 - 262

https://doi.org/10.1145/3624062.3625122

Published: 12 November 2023

Abstract

In response to the rapidly escalating data-movement costs of computing with large matrices and multidimensional arrays, several lossy compression methods have been developed to reduce the volume of data moved. Unfortunately, all of these methods require the data to be decompressed before it can be operated on. In this work, we develop a lossy compressor for arbitrary-dimensional arrays, called PyBlaz, that supports a dozen non-trivial operations directly on compressed data while also offering good compression ratios. PyBlaz is based on the PyTorch framework and can therefore run on CPUs or GPUs without any code changes. We evaluate the efficacy of PyBlaz on datasets originating in three applications: comparing shallow-water simulation implementations, measuring statistics from MRI images, and detecting the scission point in plutonium fission data. Our results demonstrate that PyBlaz's compressed-domain operations achieve good scalability while incurring controllable errors. To the best of our knowledge, this is the first lossy compressor to support compressed-domain operations on arbitrary-dimensional scientific datasets.
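As a rough illustration of what a compressed-domain operation can look like, the sketch below assumes a blockwise orthonormal transform (a one-dimensional DCT here) followed by dropping high-frequency coefficients. This is not PyBlaz's API and not necessarily its method; the function names `compress`, `decompress`, `add_compressed`, and `mean_compressed` and the parameters `block` and `keep` are hypothetical, chosen only for this example. Because the assumed transform is linear, adding two compressed arrays reduces to adding their kept coefficients, and the mean can be read from the first coefficient of each block without decompressing.

```python
import math


def dct_matrix(n):
    """Rows of the orthonormal DCT-II matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def compress(values, block=4, keep=2):
    """Transform each block and keep only its first `keep` coefficients.

    Assumes len(values) is a multiple of `block` (no padding is done here).
    """
    basis = dct_matrix(block)
    compressed = []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        coeffs = [sum(b * x for b, x in zip(row, chunk)) for row in basis]
        compressed.append(coeffs[:keep])
    return compressed


def decompress(blocks, block=4):
    """Zero-fill the dropped coefficients and apply the inverse transform."""
    basis = dct_matrix(block)
    values = []
    for coeffs in blocks:
        padded = coeffs + [0.0] * (block - len(coeffs))
        for i in range(block):
            # The inverse of an orthonormal transform is its transpose.
            values.append(sum(basis[k][i] * padded[k] for k in range(block)))
    return values


def add_compressed(a, b):
    """Element-wise sum of two arrays, computed on kept coefficients only."""
    return [[x + y for x, y in zip(ca, cb)] for ca, cb in zip(a, b)]


def mean_compressed(blocks, block=4):
    """Mean of the array, computed from each block's first coefficient.

    The first coefficient equals sum(chunk) / sqrt(block), so dividing the
    total by sqrt(block) * number_of_blocks gives the mean.
    """
    return sum(c[0] for c in blocks) / (math.sqrt(block) * len(blocks))
```

In this sketch, addition and the mean introduce no error beyond what compression itself already discarded, since both depend only on coefficients that were kept. Nonlinear operations would generally not have this property, which is where the question of how much error an operation incurs becomes relevant.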

Supplemental Material

MP4 File

Recording of "What Operations can be Performed Directly on Compressed Arrays, and with What Error?" presentation at DRBSD-9.


Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

November 2023

2180 pages

ISBN: 9798400707858

DOI: 10.1145/3624062

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

SC-W 2023: Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis

November 12-17, 2023

Denver, CO, USA
