DOI: 10.1145/3581783.3611851
research-article

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Published: 27 October 2023

Abstract

Traditional image codecs prioritize signal fidelity and human perception, often neglecting machine vision tasks. Deep learning approaches have shown promising coding performance by leveraging rich semantic embeddings that can be optimized for both human and machine vision. However, these compact embeddings struggle to represent low-level details such as contours and textures, leading to imperfect reconstructions. Additionally, existing learning-based coding tools lack scalability. To address these challenges, this paper presents a content-adaptive diffusion model for scalable image compression. The method encodes accurate texture through a diffusion process, enhancing human perception while preserving important features for machine vision tasks. It employs a Markov palette diffusion model with commonly used feature extractors and image generators, enabling efficient data compression. By utilizing collaborative texture-semantic feature extraction and pseudo-label generation, the approach accurately learns texture information. A content-adaptive Markov palette diffusion model is then applied to capture both low-level texture and high-level semantic knowledge in a scalable manner. This framework controls the compression ratio by flexibly selecting intermediate diffusion states, eliminating the need to retrain the deep learning model at different operating points. Extensive experiments demonstrate the effectiveness of the proposed framework in image reconstruction and in downstream machine vision tasks such as object detection, segmentation, and facial landmark detection. It achieves superior perceptual quality scores compared to state-of-the-art methods.
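
The rate-control idea in the abstract, picking an intermediate state of a degradation chain instead of retraining a model per operating point, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the uniform quantizer standing in for a "palette", the level schedule, the zeroth-order entropy used as a bitrate proxy, and all function names are hypothetical. It is not the paper's Markov palette diffusion model and involves no learned components.

```python
import numpy as np

# Toy stand-in for a palette degradation chain (NOT the paper's model).
# Each state is computed only from the previous one, so the chain is Markov.

def quantize_uniform(img, levels):
    """Map 8-bit samples onto `levels` evenly spaced values (a crude 'palette')."""
    step = 256 / levels
    return (np.floor(img / step) * step + step / 2).astype(np.uint8)

def empirical_bits_per_sample(img):
    """Zeroth-order entropy of the sample values, used here as a bitrate proxy."""
    _, counts = np.unique(img, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def build_states(img, level_schedule=(256, 64, 16, 4, 2)):
    """Progressively coarser states; state t is derived from state t-1."""
    states, current = [], img
    for levels in level_schedule:
        current = quantize_uniform(current, levels)
        states.append(current)
    return states

def select_state(states, budget_bits):
    """Index of the least-degraded state whose entropy fits the budget.

    Falls back to the coarsest state when nothing fits.
    """
    for t, state in enumerate(states):
        if empirical_bits_per_sample(state) <= budget_bits:
            return t
    return len(states) - 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    states = build_states(image)
    for budget in (8.0, 2.5, 0.5):
        print(budget, select_state(states, budget))
```

Changing the operating point only changes which precomputed state is selected; nothing is refit. In the paper the states come from a learned diffusion process rather than a fixed quantizer, so this sketch conveys only the selection mechanism.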


Cited By

  • (2024) A Unified Image Compression Method for Human Perception and Multiple Vision Tasks. Computer Vision – ECCV 2024, 342-359. DOI: 10.1007/978-3-031-73209-6_20. Online publication date: 1-Nov-2024


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN:9798400701085
    DOI:10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. diffusion
    2. image compression
    3. scalable feature representation
    4. video coding for machines

    Funding Sources

    • the Basic and Frontier Research Project of PCL
    • the National Natural Science Foundation of China

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
