RGB-D Data Compression via Bi-Directional Cross-Modal Prior Transfer and Enhanced Entropy Modeling

Authors:

Pingping Zhang,

Xu Wang

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 21, Issue 2

Article No.: 58, Pages 1 - 17

https://doi.org/10.1145/3702997

Published: 25 December 2024

Abstract

RGB-D data, as homogeneous cross-modal data, exhibits significant correlations among its data elements. However, current research considers only a uni-directional pattern of cross-modal contextual information and neglects bi-directional relationships in the compression field. We therefore propose a joint RGB-D compression scheme that combines Bi-Directional Cross-Modal Prior Transfer (Bi-CPT) modules with a Bi-Directional Cross-Modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module is designed to produce compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at different granularity levels. In contrast to traditional entropy models, the proposed Bi-CEE model not only achieves spatial-channel contextual adaptation by partitioning RGB and depth features but also incorporates information from the other modality as a prior to improve the accuracy of probability estimation for latent variables. Furthermore, this model enables parallel multi-stage processing to accelerate coding. Experimental results demonstrate that the proposed framework outperforms current compression schemes in both rate-distortion performance and downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at https://github.com/xyy7/Learning-based-RGB-D-Image-Compression.
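To illustrate why a cross-modal prior can sharpen probability estimation, the toy sketch below estimates the bit cost of quantized depth latents under a discretized Gaussian, once with a fixed zero-mean model and once with a mean predicted from the decoded RGB latent. It is not the Bi-CEE model: the synthetic latents, the linear predictor with weight 0.8, and the names `estimated_bits`, `rgb_hat`, and `depth_hat` are assumptions made for illustration, and only one transfer direction (RGB to depth) is shown.

```python
import math
import numpy as np

# Element-wise error function; NumPy itself does not provide erf.
_erf = np.vectorize(math.erf)


def gaussian_cdf(x):
    """Standard normal CDF evaluated element-wise."""
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))


def estimated_bits(y_hat, mean, scale):
    """Total bits for integer latents y_hat under a discretized Gaussian."""
    upper = gaussian_cdf((y_hat + 0.5 - mean) / scale)
    lower = gaussian_cdf((y_hat - 0.5 - mean) / scale)
    likelihood = np.clip(upper - lower, 1e-9, 1.0)  # avoid log(0)
    return float(-np.log2(likelihood).sum())


# Synthetic, correlated latents (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
rgb_latent = rng.normal(0.0, 4.0, size=1000)
depth_latent = 0.8 * rgb_latent + rng.normal(0.0, 1.0, size=1000)

# Quantize both; the decoder only ever sees the quantized values.
rgb_hat = np.round(rgb_latent)
depth_hat = np.round(depth_latent)

# Baseline: depth coded with a single zero-mean Gaussian.
bits_no_prior = estimated_bits(depth_hat, mean=0.0, scale=depth_hat.std())

# Cross-modal prior: the mean is predicted from the decoded RGB latent
# (assumed linear predictor; a learned network would replace it).
bits_with_prior = estimated_bits(depth_hat, mean=0.8 * rgb_hat, scale=1.0)

print(f"without RGB prior: {bits_no_prior:.0f} bits")
print(f"with RGB prior:    {bits_with_prior:.0f} bits")
```

Because the depth latents in this toy setup are strongly correlated with the RGB latents, the conditioned model assigns higher probability to the observed symbols and the estimated bit count drops. A bi-directional scheme would apply the same idea in both directions across coding stages.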

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Image compression

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 21, Issue 2

February 2025

651 pages

EISSN:1551-6865

DOI:10.1145/3703007

Editor:
Abuabdulmotaleb El Saddik
ZapAura and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 December 2024

Online AM: 05 November 2024

Accepted: 13 October 2024

Revised: 11 September 2024

Received: 13 May 2024

Published in TOMM Volume 21, Issue 2

Qualifiers

Research-article

Funding Sources

National Natural Science Foundation of China
Guangdong Basic and Applied Basic Research Foundation
Stable Support Project of Shenzhen
