Abstract
This paper presents a block transform for image compression, where the transform is inspired by discrete cosine transform (DCT) but achieved by training convolutional neural network (CNN) models. Specifically, we adopt the combination of convolution, nonlinear mapping, and linear transform to form a non-linear transform as well as a non-linear inverse transform. The transform, quantization, and inverse transform are jointly trained to achieve the overall rate-distortion optimization. For the training purpose, we propose to estimate the rate by the \(l_1\)-norm of the quantized coefficients. We also explore different combinations of linear/non-linear transform and inverse transform. Experimental results show that our proposed CNN-based transform achieves higher compression efficiency than fixed DCT, and also outperforms JPEG significantly at low bit rates.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
- 2.
- 3.
https://github.com/tensorflow/models/tree/master/compression. This network has no entropy coding since the authors do not provide.
References
Wallace, G.K.: The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 38(1), xviii–xxxiv (1992)
Christopoulos, C., Skodras, A., Ebrahimi, T.: The JPEG2000 still image coding system: an overview. IEEE Trans. Consum. Electron. 46(4), 1103–1127 (2000)
Wiegand, T., Sullivan, G.J., Bjontegaard, G., Luthra, A.: Overview of the H.264/AVC video coding standard. IEEE Trans. Circ. Syst. Video Technol. 13(7), 560–576 (2003)
Sullivan, G.J., Ohm, J., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circ. Syst. Video Technol. 22(12), 1649–1668 (2012)
Hu, W., Cheung, G., Ortega, A., Au, O.C.: Multiresolution graph fourier transform for compression of piecewise smooth images. IEEE Trans. Image Process. 24(1), 419–433 (2015)
Toderici, G., O’Malley, S.M., Hwang, S.J., Vincent, D., Minnen, D., Baluja, S., Covell, M., Sukthankar, R.: Variable rate image compression with recurrent neural networks. In: ICLR (2016)
Toderici, G., Vincent, D., Johnston, N., Hwang, S.J., Minnen, D., Shor, J., Covell, M.: Full resolution image compression with recurrent neural networks. In: CVPR, pp. 5306–5314 (2017)
Johnston, N., Vincent, D., Minnen, D., Covell, M., Singh, S., Chinen, T., Hwang, S.J., Shor, J., Toderici, G.: Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks. arXiv preprint arXiv:1703.10114 (2017)
Ballé, J., Laparra, V., Simoncelli, E.P.: End-to-end optimized image compression. In: ICLR (2017)
Theis, L., Shi, W., Cunningham, A., Huszár, F.: Lossy image compression with compressive autoencoders. In: ICLR (2017)
Rippel, O., Bourdev, L.: Real-time adaptive image compression. In: ICML, pp. 2922–2930 (2017)
Jiang, F., Tao, W., Liu, S., Ren, J., Guo, X., Zhao, D.: An end-to-end compression framework based on convolutional neural networks. IEEE Trans. Circ. Syst. Video Technol. (2017). https://doi.org/10.1109/TCSVT.2017.2734838
Baig, M.H., Torresani, L.: Multiple hypothesis colorization and its application to image compression. Comput. Vis. Image Underst. (2017)
Prakash, A., Moran, N., Garber, S., DiLillo, A., Storer, J.: Semantic perceptual image compression using deep convolution networks. In: DCC, pp. 250–259 (2017)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Wong, C.W., Au, O.C., Lam, H.K.: Rate control using probability of non-zero quantized coefficients. In: ICME (2004)
Candes, E.J., Tao, T.: Decoding by linear programming. IEEE Trans. Inf. Theory 51(12), 4203–4215 (2005)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted boltzmann machines. In: ICML, pp. 807–814 (2010)
Schaefer, G., Stich, M.: UCID: an uncompressed color image database. In: Electronic Imaging 2004, International Society for Optics and Photonics, pp. 472–480 (2004)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM Multimedia, pp. 675–678. ACM (2014)
Said, A.: Introduction to arithmetic coding - theory and practice. Technical report HPL-2004-76, Hewlett Packard Laboratories Palo Alto (2004)
Acknowledgment
This work was supported by the Natural Science Foundation of China (NSFC) under Grant 61772483, Grant 61390512, and Grant 61425026, and by the Fundamental Research Funds for the Central Universities under Grant WK3490000001.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Liu, D., Ma, H., Xiong, Z., Wu, F. (2018). CNN-Based DCT-Like Transform for Image Compression. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science(), vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-73600-6_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer ScienceComputer Science (R0)