Abstract
Image compression aims to reduce the size of image data without compromising perceived visual quality. However, the information lost during compression may degrade downstream machine vision tasks such as object detection and semantic segmentation. How to compress images so that they serve both human and machine vision remains an open problem. In this paper, we propose a multi-task framework for image compression and semantic segmentation. Specifically, we design an end-to-end mutual enhancement network that compresses the given image and segments its semantic content simultaneously. First, a uniform feature learning strategy jointly learns the features for image compression and semantic segmentation in the encoder, and a multi-scale aggregation module in the encoder enhances the semantic features. Then, from the transmitted quantized features, both the decompressed image features and the learned semantic features are reconstructed. Finally, the decoder uses this information for both tasks: the decompressed semantic features drive semantic segmentation, and the resulting segmentation map is in turn used to further improve the quality of the decompressed image. Experimental results show that our framework effectively supports image compression and semantic segmentation simultaneously, in both subjective and objective evaluations.
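The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' network: every function below (`encode`, `quantize`, `dequantize`, `decode_segmentation`, `decode_image`, `codec`) is a hypothetical stand-in for a learned component, the "image" is a 1-D list of intensities in [0, 1], and the averaging, thresholding, and 16-level quantizer are assumptions chosen only to keep the example self-contained.

```python
# Toy data-flow sketch only. Each function is a hand-written, hypothetical
# stand-in for a learned module; none of this is the authors' architecture.

def encode(pixels):
    """Shared encoder: produce image features and semantic features."""
    # Image features: average each pair of pixels (2x downsampling).
    image_feat = [(pixels[i] + pixels[i + 1]) / 2
                  for i in range(0, len(pixels) - 1, 2)]
    # Semantic features: stand-in for multi-scale aggregation, mixing a
    # fine view (the feature itself) with a coarse view (3-wide mean).
    semantic_feat = []
    for i in range(len(image_feat)):
        window = image_feat[max(0, i - 1): i + 2]
        coarse = sum(window) / len(window)
        semantic_feat.append((image_feat[i] + coarse) / 2)
    return image_feat, semantic_feat

def quantize(features, levels=16):
    """Map features in [0, 1] to integer symbols; only these are sent."""
    return [round(f * (levels - 1)) for f in features]

def dequantize(symbols, levels=16):
    """Recover approximate feature values from the received symbols."""
    return [s / (levels - 1) for s in symbols]

def decode_segmentation(semantic_feat, threshold=0.5):
    """Semantic branch: assign one of two labels per latent position."""
    return [1 if f >= threshold else 0 for f in semantic_feat]

def decode_image(image_feat, seg_map):
    """Image branch: refine features with the segmentation map, then upsample."""
    out = []
    for i, label in enumerate(seg_map):
        # Segmentation-guided refinement: average only over neighbours that
        # share this position's label, so region boundaries are not blurred.
        same = [image_feat[j] for j in (i - 1, i, i + 1)
                if 0 <= j < len(image_feat) and seg_map[j] == label]
        refined = sum(same) / len(same)
        out.extend([refined, refined])  # nearest-neighbour 2x upsampling
    return out

def codec(pixels):
    """Run encoder, quantizer, and both decoder branches end to end."""
    image_feat, semantic_feat = encode(pixels)
    image_symbols, semantic_symbols = quantize(image_feat), quantize(semantic_feat)
    image_hat = dequantize(image_symbols)
    semantic_hat = dequantize(semantic_symbols)
    seg_map = decode_segmentation(semantic_hat)
    return decode_image(image_hat, seg_map), seg_map

reconstruction, seg_map = codec([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
```

The point of the sketch is the ordering: a single encoder feeds both branches, only quantized symbols cross the channel, and the image branch consumes the segmentation map produced by the semantic branch.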
This work was supported by the National Natural Science Foundation of China (61972028, 61902022) and the Fundamental Research Funds for the Central Universities (2019JBM018, FRF-TP-19-015A1).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, J., Yao, C., Liu, M., Zhao, Y. (2021). An End-to-End Mutual Enhancement Network Toward Image Compression and Semantic Segmentation. In: Ma, H., et al. Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol 13020. Springer, Cham. https://doi.org/10.1007/978-3-030-88007-1_51
DOI: https://doi.org/10.1007/978-3-030-88007-1_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88006-4
Online ISBN: 978-3-030-88007-1
eBook Packages: Computer Science, Computer Science (R0)