CN112929666A - Method, device and equipment for training coding and decoding network and storage medium - Google Patents
- Publication number: CN112929666A (application number CN202110303982.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- network
- decoding
- coding
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H04N19/172: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
- H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The application discloses a method, a device, equipment and a storage medium for training a coding and decoding network. The method comprises the following steps: acquiring a sample image in each round of training; inputting the sample image into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value; and, after multiple rounds of training, determining that the coding and decoding network has converged when its network loss value falls within a preset network convergence range. The method ensures the sharpness of the decoded image while controlling its distortion, so that the quality of the restored image is high and the decoded image does not need to be repaired by a repair network.
Description
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training a codec network.
Background
With the popularization of smart devices, more and more images are produced. High-resolution images often have a large data volume, and such images occupy considerable resources (storage space and transmission resources) in both storage and transmission scenarios. Image compression techniques were developed to solve this problem. An image compression technique encodes the original image into compressed data with a small data volume; the compressed data is later decoded to restore the original image.
Currently, the commonly used image compression technologies are the WebP image compression technology and the BPG (Better Portable Graphics) image compression technology. WebP supports both lossy and lossless compression and adopts the VP8 encoding mode; many websites use the WebP picture format. BPG is a paid project with a very high usage cost. It adopts the HEVC (High Efficiency Video Coding) encoding mode, and a file in the BPG format is only about half the size of a file in the JPEG (Joint Photographic Experts Group) format of comparable quality.
However, the WebP and BPG image compression technologies only consider how to compress an image into compressed data with a small data amount. They do not consider how to avoid serious distortion during image recovery, so the quality of the recovered image is not high.
Disclosure of Invention
The application provides a training method, a device, equipment and a storage medium of a coding and decoding network, which aim to solve the problem of image distortion in image recovery of the existing image compression technology.
In view of the above technical problems, the present application is implemented by the following technical solutions:
The embodiment of the application provides a training method of a coding and decoding network, which comprises the following steps: acquiring a sample image in each round of training; inputting the sample image into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and, after multiple rounds of training, determining that the coding and decoding network has converged when the network loss value corresponding to the coding and decoding network is in a preset network convergence range.
Wherein the determining the decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image comprises: inputting the sample image and the decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of the decoded image corresponding to the sample image; and determining the decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
Wherein the acquiring a sample image comprises: acquiring at least one sample image; the inputting the sample image into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: generating a sample image matrix corresponding to the at least one sample image, each row vector in the sample image matrix being a one-dimensional sample image; inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the coding and decoding network to obtain a decoded image matrix, each row vector in the decoded image matrix being a one-dimensional decoded image corresponding to one sample image.
Wherein, the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device; the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image storage instruction, the image storage instruction indicating that a first target image is to be stored; inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction; and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image, and outputting the compressed data to the storage device so as to store the first target image.
Wherein the image storage instruction indicates that at least one first target image is to be stored; the inputting the first target image into the coding and decoding network comprises: generating a first target image matrix corresponding to the at least one first target image, each row vector in the first target image matrix being a one-dimensional first target image; and inputting the first target image matrix into the coding and decoding network; the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes: performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix, each row vector in the compressed data matrix being the compressed data corresponding to one first target image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image reading instruction, the image reading instruction indicating that a second target image is to be read; acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network; and selecting the image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
Wherein the image reading instruction is used for indicating to read at least one second target image; before the compressed data corresponding to the second target image is input into the codec network, the method further includes: generating a compressed data matrix according to the compressed data corresponding to the at least one second target image respectively; each row vector in the compressed data matrix corresponds to one second target image; inputting the compressed data matrix into the coding and decoding network; the decoding recovery processing is performed on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image, and the decoding recovery processing includes: utilizing the image decoding network to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
Wherein the determining, after multiple rounds of training, that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is in a preset network convergence range comprises: determining that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all in the network convergence range over a preset number of consecutive training rounds; or, after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network falls in the network convergence range for the first time.
The embodiment of the present application further provides a training apparatus for a coding/decoding network, including: the acquisition module is used for acquiring a sample image when each round of training is executed; the coding and decoding module is used for inputting the sample images into a coding and decoding network, and sequentially executing compression coding processing and decoding recovery processing on the sample images by using the coding and decoding network to obtain decoded images corresponding to the sample images; a first determining module, configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image; a second determining module, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and the third determining module is used for determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range after multiple rounds of training.
The embodiment of the application also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; a memory for storing a computer program; and a processor for implementing the steps of the method for training the codec network described in any one of the above when executing the program stored in the memory.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for training a codec network described in any one of the above.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
In the embodiment of the application, the coding and decoding network is used to perform compression coding and decoding recovery on an image. During training, attention is paid not only to the image loss value of the whole coding and decoding network but also to the decoding loss value within the coding and decoding network. The image loss value measures the coding and decoding loss, and the decoding loss value supervises the decoding accuracy. Judging the coding and decoding effect of the network by the dual criteria of image loss value and decoding loss value ensures the sharpness of the decoded image while controlling its distortion, so that the quality of the recovered image is higher and the decoded image does not need to be repaired by a repair network.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flow chart of a method of training a codec network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the operation of an autoencoder according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-layer AE network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training network structure of a codec network according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the training steps of a codec network according to an embodiment of the present invention;
FIG. 6 is a flowchart of the steps of image storage according to one embodiment of the present invention;
FIG. 7 is a flowchart of the steps of image reading according to one embodiment of the present invention;
FIG. 8 is a block diagram of a training apparatus of a codec network according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 10 is a diagram illustrating image access in a duplex communication mode according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a training method of an encoding and decoding network. Fig. 1 is a flowchart of a training method for a coding/decoding network according to an embodiment of the present disclosure.
Step S110, acquiring a sample image in each round of training.
The sample image refers to an original image used for training the coding and decoding network.
A preset sample image set containing a plurality of sample images is acquired; in each round of training, at least one sample image is obtained from the sample image set.
Step S120, inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image.
The coding and decoding network performs compression coding processing on an image to obtain compressed data corresponding to the image, and performs decoding recovery processing on that compressed data to obtain a decoded image corresponding to the image. The coding and decoding network is a deep neural network. Further, the coding and decoding network may be a CED (Compression Encode and Decode) network, a VAE (Variational Auto-Encoder), a DAE (Deep Auto-Encoder), or a feature extraction network.
Compression coding processing compresses the image; the data amount of the resulting compressed data is smaller than that of the sample image.
Decoding recovery processing restores the compressed data to an image.
The decoded image refers to the image restored by the coding and decoding network from the compressed data.
Specifically, the at least one sample image obtained from the sample image set is input to a coding and decoding network, and the coding and decoding network is used to perform compression coding processing and decoding recovery processing on the at least one sample image, so as to obtain a decoded image corresponding to each sample image.
Further, when at least one sample image is obtained, a sample image matrix corresponding to the at least one sample image is generated, in which each row vector is a one-dimensional sample image. The sample image matrix is input into the coding and decoding network, which sequentially performs compression coding processing and decoding recovery processing on the sample image matrix to obtain a decoded image matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one sample image.
Generating the sample image matrix corresponding to the at least one sample image includes: converting each of the at least one sample image into a one-dimensional image vector, and generating the sample image matrix from these one-dimensional image vectors; each row vector in the sample image matrix represents the one-dimensional image vector corresponding to one sample image. A multidimensional sample image can be converted into a one-dimensional image vector by using a preset dimension conversion algorithm. The dimension conversion algorithm includes, but is not limited to, the matrix() function.
To visualize the decoded images, after the decoded image matrix is obtained, the row vector corresponding to each sample image in the decoded image matrix may be converted back into a multidimensional array according to the dimensions of that sample image, yielding the multidimensional decoded image corresponding to the sample image.
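The conversion between multidimensional images and the rows of an image matrix can be illustrated with a minimal sketch. It assumes small 2-D grayscale images stored as nested lists and row-major flattening; the function names (flatten_image, build_image_matrix, unflatten_row) are hypothetical and are not taken from the application, which only names a matrix() function as one possible dimension conversion algorithm.

```python
from typing import List, Tuple

Image = List[List[int]]  # a 2-D grayscale image as rows of pixel values


def flatten_image(image: Image) -> List[int]:
    """Convert a 2-D image into a one-dimensional image vector (row-major)."""
    return [pixel for row in image for pixel in row]


def build_image_matrix(images: List[Image]) -> List[List[int]]:
    """Stack one flattened image per row to form the sample image matrix."""
    return [flatten_image(img) for img in images]


def unflatten_row(row: List[int], shape: Tuple[int, int]) -> Image:
    """Restore a row vector to a 2-D image with the given (height, width)."""
    height, width = shape
    if len(row) != height * width:
        raise ValueError("row length does not match the target shape")
    return [row[r * width:(r + 1) * width] for r in range(height)]


# Two hypothetical 2x2 sample images.
samples = [
    [[0, 1], [2, 3]],
    [[4, 5], [6, 7]],
]
matrix = build_image_matrix(samples)           # one row per sample image
restored = [unflatten_row(row, (2, 2)) for row in matrix]
```

In this sketch the same unflatten step would be applied to the rows of the decoded image matrix, using the dimensions of the corresponding sample image.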
Step S130, determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoding image corresponding to the sample image.
The image loss value measures the degree of difference (coding and decoding loss) between the sample image and the decoded image.
The decoding loss value measures the decoding correctness of the coding and decoding network.
The lower the coding and decoding loss and the higher the decoding accuracy, the clearer the decoded image output by the coding and decoding network and the lower its distortion rate. Conversely, the higher the coding and decoding loss and the lower the decoding accuracy, the less clear the decoded image and the higher its distortion rate.
How the image loss value and the decoding loss value are determined is described in detail below and is therefore not elaborated here.
Step S140, determining the network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
The sum or the weighted sum of the image loss value and the decoding loss value corresponding to the coding and decoding network is determined as the network loss value corresponding to the coding and decoding network.
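As a minimal sketch of this combination: the function name network_loss and the weight parameters are illustrative assumptions, since the application does not specify weight values. With both weights left at 1.0 the result is the plain sum.

```python
def network_loss(image_loss: float, decoding_loss: float,
                 image_weight: float = 1.0,
                 decoding_weight: float = 1.0) -> float:
    """Combine the two loss values into the network loss value.

    The default weights give the plain sum; other (hypothetical) weights
    give a weighted sum.
    """
    return image_weight * image_loss + decoding_weight * decoding_loss


plain_sum = network_loss(0.2, 0.1)
weighted = network_loss(0.2, 0.1, image_weight=0.7, decoding_weight=0.3)
```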
Step S150, after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is in a preset network convergence range.
The network convergence range is used for measuring whether the network loss value of the coding and decoding network meets the application requirement. The two end values of the network convergence range may be empirical values or experimental values.
The network convergence range can control the codec loss of the codec network and the decoding accuracy.
Specifically, it may be determined that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all in the network convergence range over a preset number of consecutive training rounds; or, after multiple rounds of training, it may be determined that the coding and decoding network converges when the network loss value falls in the network convergence range for the first time. When the network loss values over multiple rounds of training are all in the preset network convergence range, the loss value of the coding and decoding network meets the application requirement and tends to be stable.
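The two convergence criteria can be sketched as follows. The function names, the inclusive treatment of the range end values, and the example loss history are assumptions made for illustration only.

```python
from typing import Optional, Sequence


def in_range(loss: float, lower: float, upper: float) -> bool:
    """Check whether a loss value lies in the (inclusive) convergence range."""
    return lower <= loss <= upper


def converged_consecutive(losses: Sequence[float], lower: float,
                          upper: float, rounds: int) -> bool:
    """First criterion: the most recent `rounds` loss values are all in range."""
    if rounds <= 0 or len(losses) < rounds:
        return False
    return all(in_range(v, lower, upper) for v in losses[-rounds:])


def first_round_in_range(losses: Sequence[float], lower: float,
                         upper: float) -> Optional[int]:
    """Second criterion: index of the first round whose loss is in range."""
    for index, value in enumerate(losses):
        if in_range(value, lower, upper):
            return index
    return None


# Hypothetical network loss values, one per training round.
history = [0.9, 0.4, 0.08, 0.6, 0.07, 0.06, 0.05]
first_hit = first_round_in_range(history, 0.0, 0.1)
stable_now = converged_consecutive(history, 0.0, 0.1, rounds=3)
stable_early = converged_consecutive(history[:5], 0.0, 0.1, rounds=3)
```

In this example history the first criterion is the stricter one: the loss enters the range at round index 2 but leaves it again in the next round, and only the last three rounds stay inside the range.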
In this embodiment, the coding and decoding network is used to perform compression coding and decoding recovery on an image. During training, attention is paid not only to the image loss value of the whole coding and decoding network but also to the decoding loss value within the coding and decoding network. The image loss value measures the coding and decoding loss, and the decoding loss value supervises the decoding accuracy. Judging the coding and decoding effect of the network by the dual criteria of image loss value and decoding loss value ensures the sharpness of the decoded image while controlling its distortion, so that the quality of the recovered image is high and the decoded image does not need to be repaired by a repair network. A repair network is a network used to repair distorted images.
In this embodiment, the coding and decoding network may be deployed in a high-speed arithmetic unit. Further, in order to obtain a better compression effect and sufficiently high coding and decoding efficiency, the embodiment of the application may compress and decode images by combining a trained deep neural network with a high-speed arithmetic unit. Moreover, because a deep neural network can process matrices, the embodiment of the application can compress and decode images in batches.
In order to make the embodiments of the present application clearer, a network structure of a codec network is described below.
The coding and decoding network comprises an image coding network and an image decoding network.
The image coding network performs compression coding processing on an image to obtain compressed data.
The image decoding network performs decoding recovery processing on the compressed data to restore the compressed data to an image.
In the embodiment of the application, the coding and decoding network is a deep neural network. A deep neural network compresses an image by extracting features of the image: the fewer features extracted, the more the image is compressed; the more features extracted, the less the image is compressed. Such features include, but are not limited to: color features, texture features, shape features, and spatial relationship features.
The image encoding network includes a plurality of encoding layers and the image decoding network includes a plurality of decoding layers. The coding layer and the decoding layer are symmetric, namely: the number of the coding layers is the same as that of the decoding layers, and the architecture of the coding layers is the same as that of the decoding layers.
Codec networks may be applied in various application scenarios. For example: the codec network can be applied in an image storage scenario, and the codec network can be applied in an image transmission scenario.
In an image storage scene, the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device. Inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
In an image transmission scene, the output end of the image coding network is connected with a preset communication transmitter; and the input end of the image decoding network is connected with a preset communication receiver. Inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the communication transmitter; receiving compressed data corresponding to the sample image through the communication receiver and inputting the compressed data into the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
Specifically, the codec network may be constructed using an AE (Auto-Encoder) network. The encoding layer and the decoding layer may be AE layers. Of course, other network or encoder configurations may be chosen.
The AE network comprises two processes: compression encoding f(X) and decoding recovery g(Y).
f(X) denotes the operation that takes an image from the input layer to the feature extraction layer: Y = f(X) = S(WX + b), where X is the image input to the AE network, W and b are preset encoding parameters, S is an activation function, and Y is the extracted feature vector.
g(Y) denotes the operation that restores the feature vector to the output layer: Z = g(Y) = S'(W'Y + b'), where W' and b' are decoding parameters, S' is an activation function, and Z is the decoded image. The closer image Z is to image X, the higher the quality of image Z.
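The two operations Y = f(X) = S(WX + b) and Z = g(Y) = S'(W'Y + b') can be illustrated with a small sketch. The sigmoid activation, the 4-to-2-to-4 dimensions, and all weight values below are assumptions chosen for illustration; the embodiment does not fix them.

```python
import math


def sigmoid(value):
    """Activation function used here for both S and S' (an assumption)."""
    return 1.0 / (1.0 + math.exp(-value))


def affine(weights, vector, bias):
    """Compute weights * vector + bias with plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def encode(X, W, b):
    """Y = f(X) = S(WX + b)."""
    return [sigmoid(v) for v in affine(W, X, b)]


def decode(Y, W_prime, b_prime):
    """Z = g(Y) = S'(W'Y + b')."""
    return [sigmoid(v) for v in affine(W_prime, Y, b_prime)]


# Illustrative 4-dimensional image vector and hand-picked parameters.
X = [0.2, 0.8, 0.5, 0.1]
W = [[0.5, -0.3, 0.1, 0.7],
     [-0.2, 0.4, 0.6, -0.1]]                    # 2 x 4 encoding weights
b = [0.0, 0.1]
W_prime = [[0.3, -0.5], [0.8, 0.2],
           [-0.4, 0.6], [0.1, 0.9]]             # 4 x 2 decoding weights
b_prime = [0.0, 0.0, 0.0, 0.0]

Y = encode(X, W, b)              # lower dimension than X: compression
Z = decode(Y, W_prime, b_prime)  # same dimension as X: recovery
print(len(X), len(Y), len(Z))    # 4 2 4
```

With untrained parameters Z is not close to X; training adjusts W, b, W' and b' to bring Z closer to X.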
Fig. 2 is a schematic diagram of an auto-encoder according to an embodiment of the present application. When the dimension of Y is lower than that of X, the auto-encoder acts as an image encoding network, as shown in the left diagram of Fig. 2; when the dimension of Y is higher than that of X, the auto-encoder acts as an image decoding network, as shown in the right diagram of Fig. 2.
In order to maximize compression, multiple layers of AE networks may be used in combination to extract and restore features in an image multiple times, thereby forming a compression codec network (CED).
To illustrate the specific structure of the present embodiment, an image encoding network composed of two encoded AEs (AE1, AE2) and an image decoding network composed of two decoded AEs (AE3, AE4) will be described below. Fig. 3 is a schematic structural diagram of a multi-layer AE network according to an embodiment of the present application.
When an image is compressed and encoded, the image encoding network is used. From left to right, the first layer is the input layer of the network, which receives the X vector (a one-dimensional image vector) generated from the image. The AE1 network compresses the X vector into the h1 vector, which is the feature vector extracted by AE1. The h1 vector is input into the next layer, the AE2 network, which extracts features further and outputs a feature vector h2 of lower dimension; that is, the feature vector h1 is compressed into the feature vector h2. These steps are repeated until the low-dimensional encoded Y vector is finally obtained.
When the image is decompressed and restored, the image decoding network is used; this is the process of restoring the feature vector Y to the image X. The AE3 network decodes the low-dimensional encoded Y vector into the h3 vector, the h3 vector is input into the next layer, the AE4 network, which decodes it into the h4 vector, and the decoded image vector Z is finally obtained, thereby restoring the image X.
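The flow through stacked AE layers can be sketched by tracking vector dimensions. In this sketch the output of AE2 is taken directly as Y and the output of AE4 as Z; the layer sizes 8, 4, 2, the sigmoid activation, and the random untrained weights are all assumptions for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one AE layer as (weights, bias) with random untrained weights."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def apply_layer(layer, vector):
    """Apply S(Wv + b) with a sigmoid activation."""
    weights, bias = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, vector)) + b)))
            for row, b in zip(weights, bias)]


rng = random.Random(0)
AE1, AE2 = make_layer(8, 4, rng), make_layer(4, 2, rng)   # image encoding network
AE3, AE4 = make_layer(2, 4, rng), make_layer(4, 8, rng)   # image decoding network

X = [rng.random() for _ in range(8)]   # one-dimensional image vector
h1 = apply_layer(AE1, X)               # features extracted by AE1
Y = apply_layer(AE2, h1)               # low-dimensional code
h3 = apply_layer(AE3, Y)               # first decoding stage
Z = apply_layer(AE4, h3)               # decoded image vector

print([len(v) for v in (X, h1, Y, h3, Z)])  # [8, 4, 2, 4, 8]
```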
The following description will be made taking a storage scenario as an example.
High-resolution images tend to occupy a large storage space, and the common way to store them is to purchase a storage device with a larger capacity, which inevitably costs more. Furthermore, artificial intelligence technology plays an increasingly important role in the field of computer vision, and deep learning performs well in image classification and image reconstruction; however, training a machine vision model often requires a large number of images, which in turn require a larger storage space. Therefore, in order to save storage space and reduce capital investment in storage equipment, the coding and decoding network of the application can be applied to a storage scene.
Firstly, a coding and decoding network is built, the input of the whole coding and decoding network is an original image to be compressed (a sample image in a training stage), and the original image is coded into a low-dimensional vector by an image coding network in the coding and decoding network and is stored. The image decoding network in the codec network can restore the low-dimensional vectors to the desired image, so that storing the low-dimensional vectors is equivalent to storing the original image.
Secondly, the coding and decoding network is built in a high-speed arithmetic unit. Writing the original image into a high-speed arithmetic unit, calculating a low-dimensional vector through an encoding and decoding network and storing the low-dimensional vector; when reading the image, the low-dimensional vector is read into a high-speed arithmetic unit, and the original image is decoded by applying an encoding and decoding network. Through experimental comparison, the embodiment not only can compress and store the images in batches, but also can obtain higher compression ratio and storage and reading efficiency.
The codec network needs to be trained before application, and the training network of the codec network of the present application is further described below. Fig. 4 is a schematic diagram of a training network structure of a codec network according to an embodiment of the present invention.
The training network of the coding and decoding network comprises: an image encoding network, an image decoding network, a storage device and an enhancement network.
The output of the image coding network is connected to the input of the storage device. The input of the image decoding network is connected with the output of the storage device. The input of the image coding network and the output of the image decoding network are both connected to the input of the enhancement network.
In practical application, it is not complicated to compress an image into a low-dimensional vector, and it is more complicated to restore the image, that is, when a low-dimensional feature vector Y is decoded into a Z image from a storage device, the image is often not clear enough, and the distortion is large. In order to enhance the recovery capability of the codec network for the image, so that the image is more real and clear, the embodiment provides an enhancement network, which is used for determining the decoding loss value of the codec network.
Further, the enhancement network may employ an LSGAN (Least Squares Generative Adversarial Networks) network, but may also employ other networks.
Through the adversarial game between original images (sample images) and decoded images, the LSGAN network can improve the quality of the decoded images, thereby enhancing the images decoded by the encoding and decoding network, shortening the distance between the real distribution and the generated distribution as far as possible, and improving the decoding accuracy.
When the coding and decoding network is trained, the coding and decoding network and the enhancement network are trained simultaneously so as to pay attention to the image loss value and the decoding loss value of the coding and decoding network simultaneously.
The basic principle of this embodiment is to use the enhancement network as a discriminator that judges whether the output decoded image is a sample image. If the discriminator can tell the decoded image apart from the sample image, the decoding correctness of the image decoding network is not high and the decoded image is heavily distorted, so the codec network is adjusted. In this way the image decoding network is supervised and generates better images, until the discriminator cannot tell whether an image output by the codec network is a restored image or an original sample image. At that point a better codec network has been obtained, and the sample image can be restored to the greatest extent.
Further, when the codec network is trained, in order to prevent the discriminator from distinguishing that the image Z is a decoded and restored image, the distortion of the image Z relative to the image X needs to be as small as possible, so that by continuously training the codec network and the enhancement network at the same time, the image output by the codec network becomes clearer and clearer, and the distortion degree becomes lower and lower.
The following describes the training method of the codec network according to the embodiment of the present application with respect to the above-mentioned training network structure diagram. Fig. 5 is a flowchart illustrating training steps of a codec network according to an embodiment of the present invention.
Step S510, a sample image is acquired.
Step S520, inputting a sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the storage device.
The image coding network and the image decoding network are two parallel sub-networks in the coding and decoding network. Since the compression encoding process uses only the image encoding network and the decoding restoration process uses only the image decoding network, the image encoding network or the image decoding network can be selected among the encoding and decoding networks by sub-network parameter selection. When the sub-network parameters corresponding to the image coding network are selected (for example, all the sub-network parameters corresponding to the image decoding network are set to zero), the input data is processed only in the image coding network, and the image decoding network does not work. When the sub-network parameters corresponding to the image decoding network are selected (for example, all the sub-network parameters corresponding to the image coding network are set to zero), the input data is processed only in the image decoding network, and the image coding network does not work.
Step S530, obtaining compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
Step S540, determining a loss value between the sample image and the decoded image corresponding to the sample image as an image loss value of the codec network by using a preset loss function.
The categories of loss functions include, but are not limited to: a cross entropy loss function and an average error function.
For example: when an average error function is used, the image loss value L_CED(X, Z) can be calculated as follows:
wherein: Ω represents a penalty term, which prevents the coding and decoding network from overfitting during training; X is a sample image, Z is a decoded image, and N is the number of sample images.
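As an illustration of an averaged image loss with a penalty term, the sketch below assumes a mean-squared-error form: the squared differences between each sample image X and its decoded image Z are summed, divided by N, and the penalty Ω is added. Both the squared-error form and the treatment of Ω as a precomputed number are assumptions; the exact function used by the embodiment may differ.

```python
def image_loss(samples, decoded, omega=0.0):
    """Assumed form: (1/N) * sum over images of ||X - Z||^2, plus omega."""
    n = len(samples)
    total = 0.0
    for x, z in zip(samples, decoded):
        total += sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return total / n + omega


# Two sample images and their decoded counterparts as one-dimensional vectors.
X = [[1.0, 2.0], [3.0, 4.0]]
Z = [[1.0, 1.0], [3.0, 2.0]]
print(image_loss(X, Z))  # (1 + 4) / 2 = 2.5
```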
Step S550, inputting the sample image and the decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
The enhancement network may be built as a discriminator, and the discriminator may employ an LSGAN network. The discriminator scores the sample image and the decoded image separately. The score D represents the distortion rate between an input image (sample image or decoded image) and the real image, and lies between 0 and 1. The closer the score is to 1, the closer the input image is to the real image; the closer the score is to 0, the less the input image resembles a real image.
The real image is an image input to the codec network. In the present embodiment, the real image is a sample image.
Further, the present embodiment trains the codec network and the discriminator (enhancement network) synchronously. When training is started, the decoded image is greatly different from the sample image, the score of the discriminator on the sample image is close to or equal to 1, and the score on the decoded image is close to or equal to 0. A score of 1 for the sample image indicates that the discriminator can recognize the real image, and a score of 0 for the decoded image indicates that the decoded image is not the real image. Therefore, the discriminator can monitor the image decoding network, the discrimination result can measure the decoding accuracy of the image decoding network, the decoding capability of the image decoding network is improved, meanwhile, the decoded image output by the image decoding network can promote the training of the discriminator, and the discrimination capability of the discriminator on the input image is improved. With the training, the correctness of the image decoding network will be higher and higher, that is, the decoded image and the sample image are more and more difficult to distinguish, until the scores of the sample image and the decoded image by the discriminator are infinitely close, that is, the decoded image output by the image decoding network can be regarded as a real image, so that the image recovery capability of the encoding and decoding network is optimal.
The decoding loss value can measure the decoding correctness degree of the image decoding network. For example: the function used to calculate the decoding loss value may be as follows:
wherein D is the discriminator; X is a sample image; Z is a decoded image; D(X) is the score given by the discriminator to the sample image; D(Z) is the score given by the discriminator to the decoded image; N is the number of sample images; a and b are preset parameters.
Further, a = 1 and b = 1 may be set. The function used to calculate the decoding loss value is equivalent to the Pearson chi-squared divergence function.
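A decoding loss built from the discriminator scores D(X) and D(Z) can be sketched as below. The sketch assumes the commonly used least-squares discriminator objective, in which b is the target score for sample images and a is the target score for decoded images; this assignment of a and b, and the values used in the example call, are assumptions for illustration and may not match the embodiment's function.

```python
def decoding_loss(scores_sample, scores_decoded, a, b):
    """Assumed least-squares form:
    1/(2N) * sum (D(X) - b)^2  +  1/(2N) * sum (D(Z) - a)^2
    """
    n = len(scores_sample)
    real_term = sum((d - b) ** 2 for d in scores_sample) / (2 * n)
    fake_term = sum((d - a) ** 2 for d in scores_decoded) / (2 * n)
    return real_term + fake_term


# Hypothetical discriminator scores for two sample/decoded image pairs.
d_x = [1.0, 0.5]   # D(X): scores of the sample images
d_z = [0.0, 0.5]   # D(Z): scores of the decoded images
print(decoding_loss(d_x, d_z, a=0.0, b=1.0))  # 0.125
```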
Step S560, determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
The image loss value and the decoding loss value are combined to form a network loss value L. The network loss value L is an index for judging whether the coding and decoding network reaches the optimal value. The optimal encoding and decoding network means that the compression ratio of the image encoding network is optimal, and the recovery effect of the image decoding network is optimal.
For example: the function used to calculate the network loss value L may be as follows:
L = L_CED(X, Z) + λ · L_LSGAN(D, X, Z);
where λ denotes the weight of the decoding loss value in the network loss value, that is, the degree of importance of the decoding loss value. λ may be a preset value, which may be an empirical value or an experimental value. For example, it may be set to 0.2 in actual operation.
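The combination of the two loss values into the network loss value can be sketched in a few lines. The function name and the input values are illustrative assumptions; the default λ = 0.2 is the example value mentioned in the text.

```python
def network_loss(image_loss_value, decoding_loss_value, lam=0.2):
    """L = L_CED(X, Z) + lambda * L_LSGAN(D, X, Z)."""
    return image_loss_value + lam * decoding_loss_value


# Hypothetical loss values for one training round.
loss = network_loss(image_loss_value=2.5, decoding_loss_value=0.125)
print(round(loss, 3))  # 2.525
```

A larger λ makes the training pay more attention to the discriminator's judgment of the decoded image relative to the reconstruction error.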
Step S570, determining whether the coding and decoding network meets the convergence condition according to the network loss value corresponding to the coding and decoding network; if so, go to step S580; if not, it jumps to step S510.
The convergence conditions include: in continuous multi-round training, the network loss values corresponding to the coding and decoding networks are all in a preset network convergence range.
If the coding and decoding network does not converge, the parameters of the coding and decoding network need to be adjusted, and then the next round of training is performed on the coding and decoding network.
Step S580, determining that the codec network has converged.
The image transmission scenario may be performed with reference to the present embodiment. The only difference is that after the image coding network obtains the compressed data corresponding to the sample image, the image coding network outputs the compressed data corresponding to the sample image to the communication transmitter (at this time, the image decoding network does not work), and the communication transmitter transmits the compressed data to the receiving end device. After the communication receiver receives the compressed data, the compressed data is input into an image decoding network (at this time, the image coding network does not work), and the image decoding network performs decoding recovery processing on the compressed data to obtain a decoded image corresponding to the sample image.
After the codec network converges, the training of the codec network is stopped, and the trained codec network can be applied to a specific scene.
The following description will take an example of a storing process in an image storing scene. The execution subject of the present embodiment is an image storage system. FIG. 6 is a flowchart illustrating steps of image storage according to an embodiment of the present invention.
Step S610, receiving an image storage instruction; the image storage instruction is used for instructing to store a first target image.
The image storage instruction is used for instructing storage of at least one first target image.
After receiving the image storage instruction, the (at least one) first target image may be received or acquired.
For example: the client sends an image storage instruction to the image storage system and then sends the first target image to the image storage system; after receiving the image storage instruction, the image storage system starts to receive the first target image.
Step S620, inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction.
The sub-network parameters corresponding to the image coding network are selected according to the image storage instruction, so that the image coding network in the coding and decoding network works normally and the image decoding network in the coding and decoding network stops working. Thus, after the first target image is input into the codec network, the first target image is processed by the image coding network and the result is output.
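Selecting one sub-network according to the received instruction can be sketched as follows. The class `CodecNetwork`, the instruction names "store" and "read", and the toy encoder and decoder are hypothetical stand-ins; a real codec network would select sub-network parameters rather than Python callables.

```python
class CodecNetwork:
    """Hypothetical wrapper holding an encoding and a decoding sub-network."""

    # Assumed mapping from instruction type to the sub-network it selects.
    _INSTRUCTION_TO_SUBNET = {"store": "encode", "read": "decode"}

    def __init__(self, encoder, decoder):
        self._subnets = {"encode": encoder, "decode": decoder}
        self._active = None

    def select(self, instruction):
        """Activate exactly one sub-network; the other one stays idle."""
        self._active = self._INSTRUCTION_TO_SUBNET[instruction]

    def __call__(self, data):
        if self._active is None:
            raise RuntimeError("no sub-network selected")
        return self._subnets[self._active](data)


def toy_encoder(vector):
    """Stand-in for compression: keep every second element."""
    return vector[::2]


def toy_decoder(code):
    """Stand-in for recovery: repeat each element twice."""
    return [value for value in code for _ in range(2)]


codec = CodecNetwork(toy_encoder, toy_decoder)
codec.select("store")
compressed = codec([1, 2, 3, 4])
codec.select("read")
restored = codec(compressed)
print(compressed, restored)  # [1, 3] [1, 1, 3, 3]
```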
Specifically, when the number of the first target images is at least one, a first target image matrix corresponding to the at least one first target image is generated; each row vector in the first target image matrix is a one-dimensional first target image; and inputting the first target image matrix into the coding and decoding network.
Further, each first target image in the at least one first target image is converted into a one-dimensional image vector, and a first target image matrix is generated according to the one-dimensional image vectors corresponding to the at least one first target image; each row vector in the first target image matrix represents a one-dimensional image vector corresponding to one first target image. The multidimensional first target image may be converted into a one-dimensional image vector using a preset dimension conversion algorithm. The dimension conversion algorithm includes, but is not limited to: matrix () function algorithm.
For example: the first object image a is [1,2,3,3,2], the first object image B is [3,2,3,4,1], the first object image C is [4,2,1,4,1], and thus the first object image matrix composed of these three one-dimensional image vectors is:
[1,2,3,3,2]
[3,2,3,4,1]
[4,2,1,4,1].
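Building the first target image matrix from individual images can be sketched as below. The helper names `flatten` and `build_image_matrix` are illustrative and do not correspond to the dimension conversion algorithm named in the text; images are represented as nested Python lists.

```python
def flatten(image):
    """Convert a nested-list image of any depth into a one-dimensional vector."""
    if isinstance(image, (list, tuple)):
        return [value for part in image for value in flatten(part)]
    return [image]


def build_image_matrix(images):
    """Stack one flattened image per row; all rows must have equal length."""
    rows = [flatten(image) for image in images]
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all images must flatten to the same length")
    return rows


A = [1, 2, 3, 3, 2]
B = [3, 2, 3, 4, 1]
C = [4, 2, 1, 4, 1]
matrix = build_image_matrix([A, B, C])
print(matrix)                      # [[1, 2, 3, 3, 2], [3, 2, 3, 4, 1], [4, 2, 1, 4, 1]]
print(flatten([[1, 2], [3, 4]]))   # [1, 2, 3, 4]
```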
Step S630, performing compression encoding processing on the first target image by using the image encoding network, obtaining compressed data corresponding to the first target image, and outputting the compressed data to the storage device so as to store the first target image.
The image coding network is used to perform compression coding processing on the first target image matrix to obtain a compressed data matrix corresponding to the first target image matrix; each row vector in the compressed data matrix is the compressed data corresponding to one first target image.
After the compressed data corresponding to the first target image is output to the storage device, the storage device stores the compressed data corresponding to the first target image and returns the storage address of the first target image for subsequent use in reading the first target image.
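The interaction with the storage device, where compressed data is written and a storage address is returned for later reading, can be sketched with an in-memory stand-in. The class `StorageDevice`, its methods, and the integer addresses are assumptions for illustration only.

```python
class StorageDevice:
    """Hypothetical in-memory storage device keyed by integer addresses."""

    def __init__(self):
        self._blocks = {}
        self._next_address = 0

    def write(self, compressed):
        """Store one compressed vector and return its storage address."""
        address = self._next_address
        self._blocks[address] = list(compressed)
        self._next_address += 1
        return address

    def read(self, address):
        """Return a copy of the compressed vector stored at `address`."""
        return list(self._blocks[address])


device = StorageDevice()
# Each row of a compressed data matrix belongs to one first target image.
compressed_matrix = [[0.1, 0.9], [0.4, 0.6]]
addresses = [device.write(row) for row in compressed_matrix]
print(addresses)                    # [0, 1]
print(device.read(addresses[1]))    # [0.4, 0.6]
```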
In this embodiment, after receiving the image storage instruction, the first target image may be input into the trained codec network, and the image coding network in the codec network is used to perform compression coding on the first target image, so as to implement the dimension reduction processing on the first target image, achieve the purpose of compressing the first target image, and finally store the compressed data corresponding to the first target image in the storage device. Because the coding and decoding network also comprises the image decoding network, the storage of the compressed data of the first target image is equal to the storage of the first target image, the demand and the investment cost of the storage space are effectively reduced, and the utilization rate of the storage space is improved.
Other image compression techniques have drawbacks compared with compressing and encoding an image by using the image encoding network. Although the WebP image compression technique produces a smaller image volume, it consumes a longer encoding time, usually several times or even dozens of times that of a common compression algorithm, and is therefore not suitable for wide use. The BPG image compression technique compresses and decompresses one picture at a time, is not suitable for compressing and decompressing (decoding) a large number of images simultaneously, and is costly. The image coding network of this embodiment can compress and decompress not only a single image but also batches of images, and can therefore achieve more efficient storage.
The following description will take the reading process in the image storage scene as an example. Fig. 7 is a flowchart illustrating steps of image reading according to an embodiment of the present invention.
Step S710, receiving an image reading instruction; the image reading instruction is used for instructing to read a second target image.
The image reading instruction is used for instructing to read at least one second target image.
Information of the second target image may be carried in the image reading instruction. The information of the second target image includes, but is not limited to: a storage address of the second target image.
The storage address of the second target image is also the storage address of the compressed data corresponding to the second target image.
Step S720, obtaining the compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the codec network.
When the number of the second target images is at least one, generating a compressed data matrix according to the compressed data corresponding to the at least one second target image; each row vector in the compressed data matrix corresponds to one second target image; and inputting the compressed data matrix into the coding and decoding network.
Step S730, selecting an image decoding network in the coding and decoding networks according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
The image decoding network is used to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
In order to visualize the decoded image, a row of vectors corresponding to each second target image in the decoded image matrix may be converted into a multidimensional vector according to the number of dimensions of each second target image, so as to obtain a multidimensional decoded image.
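Converting a row vector of the decoded image matrix back into a two-dimensional image can be sketched as follows. The helper `reshape` and the 2 x 3 image size are illustrative assumptions; the actual dimensions come from the second target image.

```python
def reshape(row, height, width):
    """Turn a one-dimensional decoded image into `height` rows of `width` pixels."""
    if len(row) != height * width:
        raise ValueError("row length does not match the requested dimensions")
    return [row[r * width:(r + 1) * width] for r in range(height)]


# One row of a decoded image matrix, belonging to a 2 x 3 image.
decoded_row = [1, 2, 3, 4, 5, 6]
image = reshape(decoded_row, height=2, width=3)
print(image)  # [[1, 2, 3], [4, 5, 6]]
```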
In this embodiment, after the image reading instruction is received, the compressed data corresponding to the second target image may be obtained from the storage device and input into the trained codec network, and the image decoding network in the codec network is used to decode and recover the compressed data. This raises the dimension of the compressed data and finally restores the compressed data to the second target image.
When the image is required to be obtained, the clear image can be obtained only by decoding and restoring the stored low-dimensional characteristic vector Y. The embodiment can compress the image to a greater extent, and supports batch parallel compression and decoding processing of a plurality of images. For example: 100 images are compressed or decompressed at one time, and the compression storage efficiency is improved.
In this embodiment, the codec network may be deployed to a high-speed computing unit, which may be a device such as a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array) unit, or a GPU (Graphics Processing Unit). The high-speed computing unit may be a high-speed storage unit.
When the image needs to be written into the storage device, the image needs to be written into the high-speed computing unit firstly, and the high-speed computing unit computes the low-dimensional vector of the image by using the trained coding and decoding network and writes the low-dimensional vector into the storage equipment.
The storage device may be any of a variety of memories, such as: SRAM (Static Random-Access Memory), DRAM (Dynamic Random-Access Memory), other types of RAM (Random-Access Memory), etc., but also some flash Memory or other Memory technology, DVD (Digital Video Disc) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium.
When the image needs to be read from the storage device, the high-speed calculation unit decodes the low-dimensional vector read from the storage device, restores the image data and finishes reading the image.
For AI (Artificial Intelligence) platforms related to cloud computing and machine learning, a large number of network models and images need to be processed, and generally, high-speed computing units are provided, so that the coding and decoding network of the embodiment is more suitable for a cloud storage system, and generates greater economic benefit for the cloud storage system.
The embodiment of the application also provides a training device of the coding and decoding network. Fig. 8 is a block diagram of a training apparatus of a codec network according to an embodiment of the present application.
The training device of the coding and decoding network comprises: an obtaining module 810, a coding and decoding module 820, a first determining module 830, a second determining module 840 and a third determining module 850.
An obtaining module 810 is configured to obtain a sample image while performing each round of training.
The coding and decoding module 820 is configured to input the sample image into a coding and decoding network, and perform compression coding processing and decoding recovery processing on the sample image sequentially by using the coding and decoding network to obtain a decoded image corresponding to the sample image.
The first determining module 830 is configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image.
A second determining module 840, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
And a third determining module 850, configured to determine that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range after multiple rounds of training.
The functions of the apparatus according to the embodiments of the present invention have been described in the foregoing method embodiments, so that reference may be made to the related descriptions in the foregoing embodiments for details that are not described in the foregoing embodiments of the present invention, and further details are not described herein.
As shown in fig. 9, an electronic device according to an embodiment of the present application includes a processor 910, a communication interface 920, a memory 930, and a communication bus 940, where the processor 910, the communication interface 920, and the memory 930 communicate with one another through the communication bus 940.
The memory 930 is configured to store a computer program.
In an embodiment of the present application, the processor 910 is configured to execute the program stored in the memory 930 to implement the method for training the codec network according to any one of the foregoing method embodiments, the method including: acquiring a sample image while performing each round of training; inputting the sample image into an encoding and decoding network, and sequentially performing compression encoding processing and decoding recovery processing on the sample image by using the encoding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and, after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range.
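The training steps above can be sketched as a small loop. This is only an illustrative sketch under stated assumptions, not the patented implementation: `ToyCodec` is a hypothetical one-parameter stand-in for the coding and decoding network, mean squared error is assumed as the image loss, `mean_gap` is a placeholder decoding loss, the network loss is assumed to be a weighted sum with weight `alpha`, and the parameter update uses a finite-difference gradient purely to keep the example dependency-free.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Image = Sequence[float]  # one flattened image, pixel values as floats


def mse(a: Image, b: Image) -> float:
    """Mean squared error, used here as an assumed image loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_gap(sample: Image, decoded: Image) -> float:
    """Placeholder decoding loss: squared gap between mean intensities."""
    return (sum(sample) / len(sample) - sum(decoded) / len(decoded)) ** 2


class ToyCodec:
    """Hypothetical stand-in for the codec network: decoded = w * sample."""

    def __init__(self, w: float = 0.2) -> None:
        self.w = w

    def run(self, sample: Image) -> List[float]:
        # Stands in for compression coding followed by decoding recovery.
        return [self.w * p for p in sample]


def network_loss(codec: ToyCodec, sample: Image,
                 decoding_loss: Callable[[Image, Image], float],
                 alpha: float = 0.5) -> float:
    decoded = codec.run(sample)
    image_loss = mse(sample, decoded)
    dec_loss = decoding_loss(sample, decoded)
    # Assumption: the network loss is a weighted sum of the two losses.
    return alpha * image_loss + (1.0 - alpha) * dec_loss


def train(codec: ToyCodec, samples: Sequence[Image],
          decoding_loss: Callable[[Image, Image], float],
          convergence_range: Tuple[float, float] = (0.0, 1e-4),
          max_rounds: int = 500, lr: float = 0.5,
          eps: float = 1e-6) -> Optional[Tuple[int, float]]:
    """Return (round, loss) once the loss falls in range, else None."""
    low, high = convergence_range
    for round_index in range(max_rounds):
        sample = samples[round_index % len(samples)]  # acquire a sample image
        loss = network_loss(codec, sample, decoding_loss)
        if low <= loss <= high:
            return round_index, loss
        # Finite-difference gradient step on the single toy parameter.
        codec.w += eps
        shifted = network_loss(codec, sample, decoding_loss)
        codec.w -= eps
        codec.w -= lr * (shifted - loss) / eps
    return None
```

Calling `train(ToyCodec(), [[0.5, 1.0, 0.25]], mean_gap)` drives the toy parameter `w` toward 1.0 and stops at the first round whose loss lies inside the convergence range. A real codec network would replace the finite-difference step with backpropagation.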
Wherein, the determining the decoding loss value corresponding to the coding and decoding network according to the sample image and the decoding image corresponding to the sample image comprises: inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
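As a rough illustration of the decoding loss described above, the sketch below compares the distortion rate of the sample image with that of the decoded image. The names and formulas are assumptions: `toy_distortion_rate` is a hypothetical stand-in for the preset enhancement network (it simply counts saturated pixels), and taking the absolute difference of the two rates is one plausible way to combine them, not necessarily the one used in the embodiment.

```python
from typing import Callable, Sequence

Image = Sequence[float]  # flattened image with pixel values in [0, 1]


def toy_distortion_rate(image: Image) -> float:
    """Hypothetical stand-in for the enhancement network.

    Returns the fraction of pixels saturated at 0.0 or 1.0.
    """
    clipped = sum(1 for p in image if p <= 0.0 or p >= 1.0)
    return clipped / len(image)


def decoding_loss(
    sample: Image,
    decoded: Image,
    distortion_rate: Callable[[Image], float] = toy_distortion_rate,
) -> float:
    """Decoding loss from the two distortion rates."""
    sample_rate = distortion_rate(sample)
    decoded_rate = distortion_rate(decoded)
    # Assumption: the loss is the absolute gap between the two rates.
    return abs(decoded_rate - sample_rate)
```

Any callable that maps an image to a distortion rate can be passed as `distortion_rate`, so a trained enhancement network could be plugged in without changing `decoding_loss`.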
Wherein the obtaining a sample image comprises: acquiring at least one sample image; the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: generating a sample image matrix corresponding to the at least one sample image; each row vector in the sample image matrix is a one-dimensional sample image; inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the image coding network to obtain a decoded image matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one sample image.
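The matrix layout described above, with one flattened image per row, can be sketched as follows. The helper names `flatten`, `build_image_matrix`, and `unflatten` are hypothetical, and the sketch assumes that all images in a batch have the same number of pixels.

```python
from typing import List, Sequence

Image2D = Sequence[Sequence[float]]  # rows of pixel values


def flatten(image: Image2D) -> List[float]:
    """Turn a two-dimensional image into a one-dimensional row vector."""
    return [pixel for row in image for pixel in row]


def build_image_matrix(images: Sequence[Image2D]) -> List[List[float]]:
    """Stack flattened images so that each matrix row is one image."""
    if not images:
        raise ValueError("at least one sample image is required")
    matrix = [flatten(image) for image in images]
    if any(len(row) != len(matrix[0]) for row in matrix):
        raise ValueError("all images must have the same number of pixels")
    return matrix


def unflatten(row: Sequence[float], width: int) -> List[List[float]]:
    """Restore a two-dimensional image from one row of a decoded matrix."""
    if width <= 0 or len(row) % width:
        raise ValueError("row length must be a positive multiple of width")
    return [list(row[i:i + width]) for i in range(0, len(row), width)]
```

For example, two 2x2 images become a 2x4 matrix, and `unflatten(matrix[0], 2)` recovers the first image.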
Wherein, the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device; the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
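The data path described above (image coding network, then storage device, then image decoding network) can be illustrated with a toy round trip. Everything here is a placeholder: `encode` and `decode` are hypothetical stand-ins for the trained networks (a 2:1 pair-averaging scheme), and `StorageDevice` is an in-memory dictionary rather than a real storage backend.

```python
from typing import Dict, List, Sequence


def encode(image: Sequence[float]) -> List[float]:
    """Toy 2:1 lossy compression: average each pair of neighbouring pixels."""
    if len(image) % 2:
        raise ValueError("toy encoder expects an even number of pixels")
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def decode(compressed: Sequence[float]) -> List[float]:
    """Toy decoding recovery: repeat every compressed value twice."""
    return [value for value in compressed for _ in range(2)]


class StorageDevice:
    """In-memory stand-in for the preset storage device."""

    def __init__(self) -> None:
        self._blobs: Dict[str, List[float]] = {}

    def write(self, key: str, data: Sequence[float]) -> None:
        self._blobs[key] = list(data)

    def read(self, key: str) -> List[float]:
        return list(self._blobs[key])


def codec_round_trip(key: str, sample: Sequence[float],
                     storage: StorageDevice) -> List[float]:
    """Encode, write to storage, read back, then decode."""
    storage.write(key, encode(sample))   # encoder output goes to storage
    return decode(storage.read(key))     # decoder input comes from storage
```

Only the compressed data passes through the storage device, which is what allows the two halves of the network to be used separately once training has finished.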
After determining that the coding and decoding network converges, the method further comprises: receiving an image storage instruction; the image storage instruction is used to instruct storage of a first target image; inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction; and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image and outputting the compressed data to the storage device so as to store the first target image.
Wherein the image storage instruction is used to instruct storage of at least one first target image; the inputting the first target image into the coding and decoding network comprises: generating a first target image matrix corresponding to the at least one first target image; each row vector in the first target image matrix is a one-dimensional first target image; inputting the first target image matrix into the coding and decoding network; the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes: performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix; each row vector in the compressed data matrix is compressed data corresponding to one first target image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image; acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network; and selecting an image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
Wherein the image reading instruction is used for indicating to read at least one second target image; before the compressed data corresponding to the second target image is input into the codec network, the method further includes: generating a compressed data matrix according to the compressed data corresponding to the at least one second target image respectively; each row vector in the compressed data matrix corresponds to one second target image; inputting the compressed data matrix into the coding and decoding network; the decoding recovery processing is performed on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image, and the decoding recovery processing includes: utilizing the image decoding network to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
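The storage and reading flows described above can be sketched as a small dispatcher that selects the coding half for a storage instruction and the decoding half for a reading instruction, processing a batch as a matrix with one image per row. `ImageStore`, `encode`, and `decode` are hypothetical names; the toy pair-averaging codec merely stands in for the trained networks, and a dictionary stands in for the storage device.

```python
from typing import Dict, List, Sequence

Row = List[float]


def encode(image: Sequence[float]) -> Row:
    """Toy stand-in for the image coding network (2:1 pair averaging)."""
    if len(image) % 2:
        raise ValueError("toy encoder expects an even number of pixels")
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def decode(compressed: Sequence[float]) -> Row:
    """Toy stand-in for the image decoding network (value repetition)."""
    return [value for value in compressed for _ in range(2)]


class ImageStore:
    """Hypothetical wrapper that picks one half of the codec per instruction."""

    def __init__(self) -> None:
        self._storage: Dict[str, Row] = {}

    def store(self, images: Dict[str, Sequence[float]]) -> None:
        # Storage instruction: only the image coding network is selected.
        keys = list(images)
        image_matrix = [list(images[key]) for key in keys]  # one image per row
        compressed_matrix = [encode(row) for row in image_matrix]
        for key, row in zip(keys, compressed_matrix):
            self._storage[key] = row

    def read(self, keys: Sequence[str]) -> List[Row]:
        # Reading instruction: only the image decoding network is selected.
        compressed_matrix = [self._storage[key] for key in keys]
        return [decode(row) for row in compressed_matrix]
```

A usage example: after `store.store({"a": [1.0, 3.0, 5.0, 5.0], "b": [4.0, 6.0, 0.0, 2.0]})`, calling `store.read(["b", "a"])` returns the decoded rows in the requested order.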
Determining that the coding and decoding network converges when, after multiple rounds of training, the network loss value corresponding to the coding and decoding network is within a preset network convergence range comprises: determining that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all within the network convergence range over a preset number of consecutive rounds of training; or determining that the coding and decoding network converges when, after multiple rounds of training, the network loss value corresponding to the coding and decoding network falls within the network convergence range for the first time.
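The two convergence criteria can be compared on a recorded loss history. The sketch below is illustrative only; the function names and the representation of the convergence range as a closed interval `(low, high)` are assumptions.

```python
from typing import Optional, Sequence, Tuple

Range = Tuple[float, float]  # assumed closed interval (low, high)


def in_range(loss: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= loss <= high


def first_hit_round(losses: Sequence[float], bounds: Range) -> Optional[int]:
    """Index of the first round whose loss lies in the range, else None."""
    for index, loss in enumerate(losses):
        if in_range(loss, bounds):
            return index
    return None


def consecutive_hit_round(losses: Sequence[float], bounds: Range,
                          required_rounds: int) -> Optional[int]:
    """Index of the round that completes `required_rounds` in-range rounds."""
    if required_rounds < 1:
        raise ValueError("required_rounds must be at least 1")
    streak = 0
    for index, loss in enumerate(losses):
        streak = streak + 1 if in_range(loss, bounds) else 0
        if streak == required_rounds:
            return index
    return None
```

With the history `[0.9, 0.04, 0.2, 0.03, 0.02, 0.01]` and the range `(0.0, 0.05)`, the first-hit criterion fires at round index 1, while requiring three consecutive in-range rounds delays convergence to round index 5. The consecutive criterion is more robust to a single lucky round at the cost of extra training.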
In a typical configuration of the present application, a device includes one or more processors, input/output interfaces, and memory. Communication between the computing unit and the storage requires a communication interface. For point-to-point communication between devices, the communication mode may be simplex, half-duplex, or full-duplex, depending on the relationship between message transmission direction and time. Because read and write operations are needed between the devices, the full-duplex communication mode is adopted to increase the applicability of the device: image storage and image reading signals can be transmitted in both directions on the line at any time during communication, which improves network efficiency. Fig. 10 is a schematic diagram illustrating image access in a duplex communication mode according to an embodiment of the present application. In the duplex communication mode, images can be stored (written) and read simultaneously, which further improves image access efficiency.
The present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the training method for the codec network provided in any one of the foregoing method embodiments. The computer-readable storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; it may also include a combination of the above kinds of memory.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A method for training a coding/decoding network is characterized by comprising the following steps:
acquiring a sample image while performing each round of training;
inputting the sample image into an encoding and decoding network, and sequentially performing compression encoding processing and decoding recovery processing on the sample image by using the encoding and decoding network to obtain a decoded image corresponding to the sample image;
determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image;
determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network;
and after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range.
2. The method of claim 1, wherein the determining, according to the sample image and the decoded image corresponding to the sample image, a decoding loss value corresponding to the coding and decoding network comprises:
inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network;
determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image;
and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
3. The method of claim 1,
the acquiring a sample image includes: acquiring at least one sample image;
the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes:
generating a sample image matrix corresponding to the at least one sample image; each row vector in the sample image matrix is a one-dimensional sample image;
inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the image coding network to obtain a decoded image matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one sample image.
4. The method of claim 1,
the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device;
the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes:
inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device;
acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
5. The method of claim 4, after determining convergence of the codec network, further comprising:
receiving an image storage instruction; the image storage instruction is used to instruct storage of a first target image;
inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction;
and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image and outputting the compressed data to the storage device so as to store the first target image.
6. The method of claim 5,
the image storage instruction is used to instruct storage of at least one first target image;
the inputting the first target image into the coding and decoding network comprises:
generating a first target image matrix corresponding to the at least one first target image; each row vector in the first target image matrix is a one-dimensional first target image;
inputting the first target image matrix into the coding and decoding network;
the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes:
performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix; each row vector in the compressed data matrix is compressed data corresponding to one first target image.
7. The method of claim 4, after determining convergence of the codec network, further comprising:
receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image;
acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network;
and selecting an image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
8. The method of claim 7,
the image reading instruction is used for indicating to read at least one second target image;
before the compressed data corresponding to the second target image is input into the codec network, the method further includes:
generating a compressed data matrix according to the compressed data corresponding to the at least one second target image respectively; each row vector in the compressed data matrix corresponds to one second target image;
inputting the compressed data matrix into the coding and decoding network;
the decoding recovery processing is performed on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image, and the decoding recovery processing includes:
utilizing the image decoding network to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
9. The method according to any one of claims 1 to 8, wherein determining convergence of the codec network when the network loss value corresponding to the codec network is within a preset network convergence range after multiple rounds of training comprises:
determining that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all within the network convergence range over a preset number of consecutive rounds of training; or,
determining that the coding and decoding network converges when, after multiple rounds of training, the network loss value corresponding to the coding and decoding network is within the network convergence range for the first time.
10. An apparatus for training a codec network, comprising:
the acquisition module is used for acquiring a sample image when each round of training is executed;
the coding and decoding module is used for inputting the sample images into a coding and decoding network, and sequentially executing compression coding processing and decoding recovery processing on the sample images by using the coding and decoding network to obtain decoded images corresponding to the sample images;
a first determining module, configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image;
a second determining module, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network;
and the third determining module is used for determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range after multiple rounds of training.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method for training a codec network according to any one of claims 1 to 9 when executing a program stored in a memory.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of training a codec network according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110303982.1A CN112929666B (en) | 2021-03-22 | 2021-03-22 | Method, device and equipment for training coding and decoding network and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112929666A (en) | 2021-06-08 |
CN112929666B (en) | 2023-04-14 |
Family
ID=76175378
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110303982.1A Active CN112929666B (en) | 2021-03-22 | 2021-03-22 | Method, device and equipment for training coding and decoding network and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112929666B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108416752A (en) * | 2018-03-12 | 2018-08-17 | 中山大学 | A method of image is carried out based on production confrontation network and removes motion blur |
CN109255769A (en) * | 2018-10-25 | 2019-01-22 | 厦门美图之家科技有限公司 | The training method and training pattern and image enchancing method of image enhancement network |
CN110070174A (en) * | 2019-04-10 | 2019-07-30 | 厦门美图之家科技有限公司 | A kind of stabilization training method generating confrontation network |
CN110225350A (en) * | 2019-05-30 | 2019-09-10 | 西安电子科技大学 | Natural image compression method based on production confrontation network |
US20190354811A1 (en) * | 2017-12-07 | 2019-11-21 | Shanghai Cambricon Information Technology Co., Ltd | Image compression method and related device |
EP3633990A1 (en) * | 2018-10-02 | 2020-04-08 | Nokia Technologies Oy | An apparatus, a method and a computer program for running a neural network |
CN111462000A (en) * | 2020-03-17 | 2020-07-28 | 北京邮电大学 | Image recovery method and device based on pre-training self-encoder |
CN111565314A (en) * | 2019-02-13 | 2020-08-21 | 合肥图鸭信息科技有限公司 | Image compression method, coding and decoding network training method and device and electronic equipment |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113658282A (en) * | 2021-06-25 | 2021-11-16 | 陕西尚品信息科技有限公司 | Image compression and decompression method and device |
CN113746870A (en) * | 2021-11-05 | 2021-12-03 | 山东万网智能科技有限公司 | Intelligent data transmission method and system for Internet of things equipment |
US11711449B2 (en) | 2021-12-07 | 2023-07-25 | Capital One Services, Llc | Compressing websites for fast data transfers |
CN116095339A (en) * | 2023-01-16 | 2023-05-09 | 北京智芯微电子科技有限公司 | Image transmission method, training method, electronic device, and readable storage medium |
CN116737607A (en) * | 2023-08-16 | 2023-09-12 | 之江实验室 | Sample data caching method, system, computer device and storage medium |
CN116737607B (en) * | 2023-08-16 | 2023-11-21 | 之江实验室 | Sample data caching method, system, computer device and storage medium |
CN117459727A (en) * | 2023-12-22 | 2024-01-26 | 浙江省北大信息技术高等研究院 | Image processing method, device and system, electronic equipment and storage medium |
CN117459727B (en) * | 2023-12-22 | 2024-05-03 | 浙江省北大信息技术高等研究院 | Image processing method, device and system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112929666B (en) | 2023-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112929666B (en) | Method, device and equipment for training coding and decoding network and storage medium | |
US11310509B2 (en) | Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (VQA) | |
EP3777207B1 (en) | Content-specific neural network distribution | |
Zhou et al. | End-to-end Optimized Image Compression with Attention Mechanism. | |
EP3583777A1 (en) | A method and technical equipment for video processing | |
JP2020518191A (en) | Quantization parameter prediction maintaining visual quality using deep neural network | |
CN108780499A (en) | The system and method for video processing based on quantization parameter | |
CN107547773B (en) | Image processing method, device and equipment | |
Guarda et al. | Deep learning-based point cloud geometry coding with resolution scalability | |
CN113192147B (en) | Method, system, storage medium, computer device and application for significance compression | |
Löhdefink et al. | On low-bitrate image compression for distributed automotive perception: Higher peak snr does not mean better semantic segmentation | |
Löhdefink et al. | GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation | |
Löhdefink et al. | Focussing learned image compression to semantic classes for V2X applications | |
CN114245126B (en) | Depth feature map compression method based on texture cooperation | |
CN103716622A (en) | Image processing method and device | |
CN118741055A (en) | High-resolution image transmission method and system based on optical communication | |
US12137230B2 (en) | Method and apparatus for applying deep learning techniques in video coding, restoration and video quality analysis (VQA) | |
US20240193819A1 (en) | Learning-based point cloud compression via tearing transform | |
US20240282013A1 (en) | Learning-based point cloud compression via unfolding of 3d point clouds | |
CN117956178A (en) | Video encoding method and device, and video decoding method and device | |
Ma et al. | Activation Map-based Vector Quantization for 360-degree Image Semantic Communication | |
Yang | Image Compression and Transmission Optimization Based on Deep Learning | |
KR20240107131A (en) | Learning-based point cloud compression through adaptive point generation | |
Altaay | Developed a Method for Satellite Image Compression Using Enhanced Fixed Prediction Scheme | |
Shingala et al. | Multi-level Latent Fusion in Learning-based Image Coding |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||