research-article

RGB-D Data Compression via Bi-Directional Cross-Modal Prior Transfer and Enhanced Entropy Modeling

Published: 25 December 2024

Abstract

RGB-D data, being homogeneous cross-modal data, exhibits significant correlations among its elements. However, existing compression research exploits cross-modal contextual information in only one direction and leaves the bi-directional relationships between the modalities unexplored. We therefore propose a joint RGB-D compression scheme that combines Bi-Directional Cross-Modal Prior Transfer (Bi-CPT) modules with a Bi-Directional Cross-Modal Enhanced Entropy (Bi-CEE) model. The Bi-CPT module produces compact representations of cross-modal features, effectively eliminating spatial and modality redundancies at different levels of granularity. Unlike traditional entropy models, the Bi-CEE model not only achieves spatial-channel contextual adaptation by partitioning the RGB and depth features but also uses information from the other modality as a prior, improving the accuracy of probability estimation for the latent variables. The model also supports parallel multi-stage processing to accelerate coding. Experimental results show that the proposed framework outperforms current compression schemes in both rate-distortion performance and downstream tasks, including surface reconstruction and semantic segmentation. The source code will be available at https://github.com/xyy7/Learning-based-RGB-D-Image-Compression.
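The abstract describes the Bi-CEE model only at a high level, so the following Python sketch is a hypothetical illustration of two generic ideas it builds on, not the authors' implementation (their code is at the repository linked above). First, it computes the bit cost of integer-quantized latents under a discretized Gaussian entropy model and shows how a sharper mean/scale estimate, such as one informed by the other modality, lowers that cost. Second, it lists an interleaved multi-stage decoding order that combines channel groups with checkerboard spatial halves, in the spirit of published checkerboard (He et al., CVPR 2021) and channel-grouped (ELIC, CVPR 2022) context models. All function names, the toy latent values, and the RGB/depth interleaving order are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_cdf(x: float) -> float:
    """Standard normal CDF computed with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def symbol_bits(y_hat: int, mu: float, sigma: float) -> float:
    """Bits for one integer latent under a discretized Gaussian N(mu, sigma^2).

    The probability of y_hat is the Gaussian mass on [y_hat - 0.5, y_hat + 0.5].
    """
    sigma = max(sigma, 1e-6)                        # guard against a zero scale
    upper = gaussian_cdf((y_hat + 0.5 - mu) / sigma)
    lower = gaussian_cdf((y_hat - 0.5 - mu) / sigma)
    p = max(upper - lower, 1e-9)                    # likelihood floor keeps log finite
    return -math.log2(p)


def total_bits(latents: Sequence[int], params: Sequence[Tuple[float, float]]) -> float:
    """Sum of per-symbol bits given one (mu, sigma) pair per latent."""
    return sum(symbol_bits(y, mu, s) for y, (mu, s) in zip(latents, params))


def checkerboard_mask(h: int, w: int) -> List[List[bool]]:
    """True marks 'anchor' positions, decoded before their 'non-anchor' neighbours."""
    return [[(i + j) % 2 == 0 for j in range(w)] for i in range(h)]


def decoding_schedule(num_groups: int) -> List[Tuple[str, int, str]]:
    """Hypothetical interleaved stage order for two modalities.

    Each stage decodes every position of one (modality, channel group,
    checkerboard half) at once, so positions within a stage run in parallel.
    Interleaving RGB and depth lets each stage condition on all earlier
    stages of both modalities (the bi-directional part of this sketch).
    """
    stages = []
    for g in range(num_groups):
        for half in ("anchor", "non_anchor"):
            for modality in ("rgb", "depth"):
                stages.append((modality, g, half))
    return stages


if __name__ == "__main__":
    # Toy quantized depth latents (made-up numbers for illustration only).
    depth_latents = [3, -2, 0, 5]
    # Uni-modal prior: a broad guess with no knowledge of the co-located RGB content.
    uni_params = [(0.0, 3.0)] * len(depth_latents)
    # Hypothetical cross-modal prior: RGB features yield sharper, better-centred estimates.
    cross_params = [(2.8, 1.0), (-1.9, 1.0), (0.2, 1.0), (4.6, 1.0)]
    print(f"uni-modal prior:   {total_bits(depth_latents, uni_params):.2f} bits")
    print(f"cross-modal prior: {total_bits(depth_latents, cross_params):.2f} bits")
    for stage in decoding_schedule(num_groups=2):
        print(stage)
```

With these made-up numbers the cross-modal estimate costs fewer bits, because its means sit close to the actual symbols and its scale is narrower. In the schedule, the number of sequential steps is 4 × `num_groups` regardless of image size, which is why grouped and checkerboard contexts decode much faster than a serial, position-by-position autoregressive context.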

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 21, Issue 2
    February 2025
    651 pages
    EISSN: 1551-6865
    DOI: 10.1145/3703007

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 25 December 2024
    Online AM: 05 November 2024
    Accepted: 13 October 2024
    Revised: 11 September 2024
    Received: 13 May 2024
    Published in TOMM Volume 21, Issue 2

    Author Tags

    1. RGB-D
    2. Cross-modal
    3. Image Compression
    4. Enhanced Entropy Model

    Qualifiers

    • Research-article

    Funding Sources

    • National Natural Science Foundation of China
    • Guangdong Basic and Applied Basic Research Foundation
    • Stable Support Project of Shenzhen

    Article Metrics

    • 0 Total Citations
    • 207 Total Downloads
    • Downloads (last 12 months): 207
    • Downloads (last 6 weeks): 25
    Reflects downloads up to 19 Feb 2025
