DOI: 10.1145/3624062.3625122
research-article
Open access

What Operations can be Performed Directly on Compressed Arrays, and with What Error?

Published: 12 November 2023

Abstract

In response to the rapidly escalating data-movement costs of computing with large matrices and multidimensional arrays, several lossy compression methods have been developed that help reduce the volume of data moved. Unfortunately, all of these methods require the data to be decompressed before it can be operated on. In this work, we develop a lossy compressor for arbitrary-dimensional arrays called PyBlaz that supports a dozen non-trivial operations directly on compressed data while also offering good compression ratios. PyBlaz is based on the PyTorch framework, and thus can be run on CPUs or GPUs without any code changes. We evaluate the efficacy of PyBlaz on datasets originating in three applications: comparing shallow-water simulation implementations, measuring statistics from MRI images, and detecting the scission point in plutonium fission data. Our results demonstrate that PyBlaz’s compressed-domain operations achieve good scalability while incurring controllable errors. To the best of our knowledge, this is the first lossy compressor that supports compressed-domain operations on arbitrary-dimensional scientific datasets.
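
The abstract does not describe how PyBlaz represents compressed data. As a rough illustration of what operating directly on compressed data with a controllable error can mean, the sketch below assumes a generic scheme: a blockwise orthonormal cosine transform followed by uniform binning of the coefficients. The function names, the block size and the bin width `step` are all hypothetical; this is not PyBlaz's API or algorithm.

```python
import math

BLOCK = 4  # hypothetical block size, chosen only for this illustration


def dct_matrix(n):
    """Rows of the orthonormal DCT-II basis for blocks of length n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


BASIS = dct_matrix(BLOCK)


def compress(values, step):
    """Transform each block, then store each coefficient as an integer bin."""
    assert len(values) % BLOCK == 0, "length must be a multiple of BLOCK"
    blocks = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        coeffs = [sum(b * x for b, x in zip(row, block)) for row in BASIS]
        blocks.append([round(c / step) for c in coeffs])
    return blocks


def decompress(blocks, step):
    """Invert the transform; an orthonormal basis is inverted by its transpose."""
    out = []
    for bins in blocks:
        coeffs = [q * step for q in bins]
        out.extend(sum(BASIS[k][i] * coeffs[k] for k in range(BLOCK))
                   for i in range(BLOCK))
    return out


def add_compressed(a, b):
    """Element-wise sum of two arrays, computed on the stored bins only.

    The transform is linear, so adding bins adds the arrays. Each coefficient
    carries at most `step` of rounding error, so each reconstructed element
    is off by at most sqrt(BLOCK) * step.
    """
    return [[x + y for x, y in zip(ba, bb)] for ba, bb in zip(a, b)]


def mean_compressed(blocks, step):
    """Mean of the array, read from the first (DC) coefficient of each block.

    The DC coefficient equals sum(block) / sqrt(BLOCK), so the block mean is
    DC / sqrt(BLOCK); the error is at most step / (2 * sqrt(BLOCK)).
    """
    dc = [bins[0] * step for bins in blocks]
    return sum(dc) / (len(dc) * math.sqrt(BLOCK))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 0.5, -1.5, 2.25, 0.0]
    y = [0.1, 0.2, 0.3, 0.4, 5.0, 6.0, 7.0, 8.0]
    step = 0.01
    cx, cy = compress(x, step), compress(y, step)
    approx = decompress(add_compressed(cx, cy), step)
    print("max error of compressed-domain sum:",
          max(abs(a - (p + q)) for a, p, q in zip(approx, x, y)))
    print("error of compressed-domain mean:",
          abs(mean_compressed(cx, step) - sum(x) / len(x)))
```

In this toy scheme the error of both operations scales with `step`, which is one simple sense in which a compressed-domain error can be called controllable; real compressors add further steps (such as dropping coefficients) that change the bounds.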

Supplemental Material

MP4 File
Recording of "What Operations can be Performed Directly on Compressed Arrays, and with What Error?" presentation at DRBSD-9.

Published In

SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis
November 2023
2180 pages
ISBN: 9798400707858
DOI: 10.1145/3624062
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. arrays
2. data compression
3. floating-point arithmetic
4. high-performance computing
5. parallel computing
6. tensors

Qualifiers

• Research-article
• Research
• Refereed limited
