
WO2024078403A1 - Image processing method, apparatus and device - Google Patents

Image processing method, apparatus and device

Info

Publication number
WO2024078403A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
network
feature map
sub
Prior art date
Application number
PCT/CN2023/123322
Other languages
English (en)
French (fr)
Inventor
李胜曦
刘铁
陈超然
张子夫
徐迈
吕卓逸
Original Assignee
Vivo Mobile Communication Co., Ltd. (维沃移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Publication of WO2024078403A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 Data rate or code amount at the encoder output

Definitions

  • the present application belongs to the field of coding and decoding technology, and specifically relates to an image processing method, device and equipment.
  • the embodiments of the present application provide an image processing method, apparatus and device, which can solve the problem in the related art that the traditional image compression method used to process the image feature map cannot guarantee the encoding efficiency and the quality of the reconstructed feature map.
  • an image processing method comprising:
  • the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the reconstructed image of the second sub-video is different from the image of the first sub-video.
  • an image processing device comprising:
  • a first acquisition module used to acquire a first image to be processed of the target object, wherein the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object;
  • a second acquisition module used to process the first to-be-processed image based on the first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the second sub-video is a partial video in the first video, and the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • an image processing device which includes a processor and a memory, wherein the memory stores a program or instruction that can be run on the processor, and when the program or instruction is executed by the processor, the steps of the method described in the first aspect are implemented.
  • an image processing device comprising a processor and a communication interface, wherein the processor is used to obtain a first image to be processed of a target object, wherein the first image to be processed includes a first feature map of the first image of the target object or includes a first sub-video in a first video of the target object;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the reconstructed image of the second sub-video is different from the image of the first sub-video.
  • a readable storage medium on which a program or instruction is stored.
  • the program or instruction is executed by a processor, the steps of the method described in the first aspect are implemented.
  • a chip comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run a program or instruction to implement the method described in the first aspect.
  • a computer program/program product is provided, wherein the computer program/program product is stored in a storage medium and is executed by at least one processor to implement the steps of the method described in the first aspect.
  • a first image to be processed of a target object is obtained, wherein the first image to be processed includes a first feature map of the first image of the target object or includes a first sub-video in a first video of the target object; the first image to be processed is processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • the reconstructed second feature map or second sub-video of the target object can be obtained through the above-mentioned first compression network, so that the second feature map or the second sub-video does not need to be encoded and transmitted, which improves the encoding efficiency, and obtaining the reconstructed second feature map or the second sub-video based on the compression network can effectively ensure the image quality.
  • FIG1 is a schematic diagram showing a flow chart of an image processing method according to an embodiment of the present application.
  • FIG2 is a schematic diagram showing the network architecture of a feature pyramid network according to an embodiment of the present application.
  • FIG3 is a schematic diagram showing a first compression network in an embodiment of the present application.
  • FIG4 is a schematic diagram showing a prediction and restoration network in an embodiment of the present application.
  • FIG5 is a schematic diagram showing the first compression network and the second compression network processing feature maps in an embodiment of the present application.
  • FIG6 is a schematic diagram showing a second compression network in an embodiment of the present application.
  • FIG7 is a schematic diagram showing a module of an image processing device according to an embodiment of the present application.
  • FIG8 is a block diagram showing a structure of an image processing device in an embodiment of the present application.
  • FIG. 9 is a block diagram showing a structure of a terminal according to an embodiment of the present application.
  • first, second, etc. in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that the terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here, and the objects distinguished by “first” and “second” are generally of the same type, and the number of objects is not limited.
  • the first object can be one or more.
  • "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
  • the image processing device corresponding to the image processing method in the embodiment of the present application may be a terminal, which may also be referred to as a terminal device or a user terminal (User Equipment, UE).
  • the terminal may be a mobile phone, a tablet computer (Tablet Personal Computer), a laptop computer (Laptop Computer) or a notebook computer, a personal digital assistant (Personal Digital Assistant, PDA), a handheld computer, a netbook, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device (Wearable Device), a vehicle-mounted device (Vehicle User Equipment, VUE), a pedestrian terminal (Pedestrian User Equipment, PUE) or another terminal-side device, and the wearable device includes: a smart watch, a bracelet, a headset, glasses, etc. It should be noted that the specific type of the terminal is not limited in the embodiments of the present application.
  • still-image compression by the Joint Photographic Experts Group (JPEG), the most widely used compression standard, applies a series of transforms, quantization and entropy coding to remove as much spatial and other coding redundancy as possible; the image is divided into small blocks and converted from the spatial domain to the frequency domain by the Discrete Cosine Transform (DCT) for a more compact representation, and quantization is the only lossy step of the whole method.
  • because the wavelet transform generalizes better to non-stationary processes, JPEG2000 uses the Discrete Wavelet Transform (DWT) instead of the DCT to reduce the information loss of quantization and achieve better compression quality.
  • in the High Efficiency Video Coding (HEVC) standard, intra-frame coding units can independently compress a single frame, and the Better Portable Graphics (BPG) image format codec was proposed for image compression based on this similarity.
  • the compression task is regarded as an encoding process and trained using an end-to-end learning method.
  • the specific process can be decomposed into an encoding process, and the corresponding reconstruction task can be regarded as a decoding process.
  • the encoder-decoder structure is widely used in learning-based compression methods.
  • for example, a compression model based on a recurrent neural network (RNN) uses an autoencoder to extract image features as the transform process.
  • the residual structure is used for feature extraction in the encoder and decoder.
  • the related art also introduced a new joint optimization of compression rate and restoration distortion, whose smoothly adjustable compression ratio made it popular in later methods.
  • to obtain better performance, the related art further proposed a compression method based on a hyperprior (super prior) to reduce spatial redundancy, and improved the entropy coding module and attached it to the hyperprior structure to further reduce coding redundancy.
  • the related paper takes advantage of the residual structure and attention mechanism, proposes a better structured autoencoder, and proposes a Gaussian mixture likelihood entropy model to improve its flexibility and accuracy.
  • the above methods can certainly change the compression ratio by adjusting parameters; however, when compressing at low bit rates their compression quality usually drops very quickly, because these methods pay no special attention to such extreme cases.
  • in the related art, low-bit-rate performance is optimized as follows.
  • for the JPEG codec, it is proposed to apply 2×2 average pooling to the image to obtain a smaller image before encoding, and to interpolate back to the original size during reconstruction, as in the sketch below.
  • the method is further optimized by designing the filters used in the downsampling and interpolation process, but the filters designed this way depend on the image content and need to be designed manually.
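To make that scheme concrete, here is a minimal sketch of the downsample-encode-interpolate idea using Pillow; the quality setting and resampling filters are illustrative assumptions, not the hand-designed filters described above.

```python
import io
from PIL import Image

def low_bitrate_jpeg(img: Image.Image, quality: int = 10) -> Image.Image:
    """Downsample 2x (box filter approximates 2x2 average pooling),
    JPEG-encode at low quality, then interpolate back to the original size."""
    w, h = img.size
    small = img.resize((w // 2, h // 2), Image.BOX)   # ~2x2 average pooling
    buf = io.BytesIO()
    small.save(buf, format="JPEG", quality=quality)   # low-bit-rate JPEG encode
    decoded = Image.open(io.BytesIO(buf.getvalue()))  # JPEG decode
    return decoded.resize((w, h), Image.BILINEAR)     # interpolate back up
```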
  • the related art also tries to apply the Generative Adversarial Network (GAN) to low-bit-rate compression.
  • a generative compression architecture is proposed to generate images from the image distribution encoded by the encoder, and a corresponding loss function is designed to balance visual quality and reconstruction quality.
  • a GAN is also used as an enhancement module of the decoder structure.
  • a pair of classic codec structures are trained by optimizing the rate-distortion loss, and the trained encoder is frozen to fine-tune the decoder to make it a generator in the GAN.
  • the decoder and generator parameters are interpolated to reduce the artifacts of compressed images at low bit rates.
  • such compression networks are optimized entirely through newly designed network structures; to further improve the results, the network structure needs to be redesigned, so there is essentially no compatibility between schemes.
  • the embodiment of the present application provides an image processing method, including:
  • Step 101 Acquire a first image to be processed of a target object, where the first image to be processed includes a first feature map of a first image of the target object or a first sub-video in a first video of the target object.
  • the first feature map is extracted from the first image or the first video using a neural network.
  • the target object is a photographed object (or photographed content) corresponding to the first video or the first image.
  • the first video is a multi-view video (multi-view) of the target object, or the first video is a scalable video of the target object.
  • the multi-view video (also described as stereoscopic video) refers to the video of each view obtained by shooting the same object (or the same scene) with multiple cameras at different viewpoints.
  • the first sub-video is the video of the target object at a certain viewpoint.
  • a scalable video includes videos of different resolutions or different frame rates from the same video source.
  • the first sub-video is a video in the first video that transmits and displays the target object at a certain resolution. That is to say, the method of the embodiment of the present application can be used not only for processing feature maps but also for processing videos of different viewing angles or different resolutions.
  • Step 102 Processing the first to-be-processed image based on the first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the reconstructed image of the second sub-video is different from the image of the first sub-video.
  • the image features include at least one of resolution and feature quantity.
  • the resolution of the first feature map is different from the resolution of the second feature map
  • the number of features corresponding to the first feature map is different from the number of features corresponding to the second feature map.
  • the first compression network is used to output a feature map having different image features from the input feature map.
  • the image features in the embodiment of the present application include but are not limited to resolution and number of features. That is to say, in the embodiment of the present application, the feature map in the image can be compressed by the first compression network, and part of the video in the multi-view video or scalable video (such as a video of a certain resolution or a video of a certain view) can also be compressed.
  • the first compression network is a learnable compression network.
  • the first compression network is trained by a rate loss function and a distortion loss function.
  • the method of the embodiment of the present application is applied to a neural network that extracts multiple feature maps from an image, that is, the first feature map is extracted by a neural network.
  • multiple feature maps extracted by a neural network contain redundant information between them.
  • this property is used to perform mutual prediction between the feature maps based on the first compression network, that is, the reconstructed second feature map is obtained from the first feature map.
  • mutual prediction is performed between different videos based on the above-mentioned first compression network.
  • a reconstructed second sub-video is obtained based on the above-mentioned first sub-video.
  • the resolutions or frame rates corresponding to the first sub-video and the second sub-video are different, or the shooting angles corresponding to the first sub-video and the second sub-video are different.
  • a first image to be processed of a target object is obtained, wherein the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object; the first image to be processed is processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • through the above-mentioned first compression network, the reconstructed second feature map or second sub-video of the target object can be obtained, so there is no need to encode and transmit the second feature map or second sub-video, which improves the encoding efficiency.
  • obtaining the reconstructed second feature map or second sub-video based on the compression network can effectively ensure the image quality.
  • obtaining a first feature map of a first image of the target object includes:
  • a feature map is selected from the multiple feature maps as the first feature map.
  • the target neural network is a feature pyramid network, for example the feature pyramid used in a fast region-based convolutional neural network (Fast Region Convolutional Neural Network, FastRCNN).
  • the feature pyramid network can be used to extract feature maps of different resolutions.
  • of course, the target neural network can also be another form of neural network that extracts multiple feature maps.
  • the embodiment of the present application is explained by using the feature pyramid network to implement the target detection task as an example.
  • the network architecture of the feature pyramid network is shown in Figure 2, where the input of the neural network is an image with a resolution of W×H consisting of three color channels (RGB).
  • P-layer feature maps are obtained from the neural network; the resolution halves from each P layer to the next (in a standard FPN, P2 through P5 correspond to strides of 4, 8, 16 and 32, i.e., W/4×H/4 down to W/32×H/32), and the number of feature channels of each layer is 256.
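As a concrete illustration of these P layers, the following minimal sketch extracts multi-scale feature maps from a pretrained torchvision detection backbone; the torchvision model and its layer naming are assumptions for illustration, not the exact network of this application.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# A detector whose backbone is ResNet-50 plus a feature pyramid network (FPN).
model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

# Dummy RGB input: 3 color channels, resolution W x H = 800 x 608.
image = torch.rand(1, 3, 608, 800)

with torch.no_grad():
    # The backbone returns an ordered dict of pyramid levels; in torchvision,
    # keys '0'..'3' correspond to P2..P5 (strides 4, 8, 16, 32), plus 'pool'.
    features = model.backbone(image)

for name, fmap in features.items():
    print(name, tuple(fmap.shape))  # each level has 256 feature channels
```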
  • the first compression network includes a first compression encoding network, a first processing unit and a first compression decoding network;
  • the step of processing the first image to be processed based on the first compression network to obtain a reconstructed second feature map or a second sub-video includes:
  • the decoded first variable is decoded based on the first compression decoding network to obtain a reconstructed second feature map or a second sub-video.
  • pixels of the first image to be processed are normalized.
  • the first compression network of the embodiment of the present application can also be described as a prediction compression module.
  • the prediction compression module includes an encoder, a first processing unit and a decoder.
  • the encoder includes a first compression encoding network
  • the decoder includes a first compression decoding network.
  • the above-mentioned first feature map is the feature map P2 obtained based on the neural network shown in FIG2
  • the above-mentioned second feature map is the feature map P3 obtained based on the neural network shown in FIG2
  • the feature maps for mutual prediction restoration are P2 and P3.
  • the feature maps for mutual prediction restoration can also be any two feature maps in P2-P5 except P2 and P3.
  • the input of the encoder is the feature map P2.
  • the pixel values of P2 are first normalized, and the input feature map P2 is then compressed (encoded) using the first compression encoding network to obtain a latent variable (the first variable), namely the variable c.
  • the variable c is then quantized and arithmetically encoded to obtain a binary bit stream.
  • after the decoder obtains the input binary bit stream, it first performs arithmetic decoding and dequantization to obtain the decoded latent variable (the decoded first variable), which is then decoded through the first compression decoding network to obtain the reconstructed second feature map, that is, the feature map P3'.
  • the first compression network shown in Figure 3 can be selected and designed according to actual needs, for example using commonly used compression networks such as Balle and Cheng; its process can be expressed as c = Enc(P2) and P3' = Dec(Q(c)), where Q(·) denotes the quantization, arithmetic encoding and arithmetic decoding operations, and Enc(·) and Dec(·) are the encoder and decoder respectively.
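The Enc/Q/Dec pipeline above can be sketched as follows; this is a minimal PyTorch illustration in which the layer sizes are arbitrary assumptions and simple rounding with a straight-through gradient stands in for the full quantization, arithmetic encoding and arithmetic decoding chain Q(·).

```python
import torch
import torch.nn as nn

class PredictionCompressionModule(nn.Module):
    """Sketch: maps feature map P2 to a reconstructed P3' at half resolution."""

    def __init__(self, channels: int = 256, latent: int = 192):
        super().__init__()
        # Enc(.): strided convolutions produce the latent variable c.
        self.enc = nn.Sequential(
            nn.Conv2d(channels, latent, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(latent, latent, 5, stride=2, padding=2),
        )
        # Dec(.): decodes the decoded latent back to a feature map at half of
        # P2's resolution, i.e. the reconstructed P3'.
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(latent, latent, 5, stride=2, padding=2,
                               output_padding=1), nn.GELU(),
            nn.Conv2d(latent, channels, 3, padding=1),
        )

    def forward(self, p2: torch.Tensor) -> torch.Tensor:
        c = self.enc(p2)                      # c = Enc(P2)
        # Q(.): rounding stands in for quantization + arithmetic codec; the
        # straight-through trick keeps gradients flowing during training.
        c_hat = c + (torch.round(c) - c).detach()
        return self.dec(c_hat)                # P3' = Dec(Q(c))

p2 = torch.rand(1, 256, 152, 200)             # a normalized feature map P2
p3_rec = PredictionCompressionModule()(p2)    # reconstructed P3', 76 x 100
```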
  • the method of the embodiment of the present application further includes:
  • target processing is performed on the reconstructed second sub-video based on the prediction restoration network to obtain the reconstructed first sub-video;
  • the prediction and restoration network is obtained by training through an enhanced loss function.
  • the target processing includes sampling processing and residual recovery processing
  • the sampling processing includes upsampling processing or downsampling processing.
  • the above-mentioned decoder may also include the above-mentioned prediction and restoration network (prediction and restoration module), which is used to predict and restore the reconstructed first feature map from the second feature map obtained by decoding and reconstruction.
  • the prediction and restoration network is designed based on residual units: the width and height of the reconstructed second feature map are first upsampled to twice the original size, which can be obtained by interpolation methods such as bilinear interpolation (not limited here); multiple stacked residual units are then used to restore the residual, which is added to the input to obtain the predicted first feature map (i.e., the reconstructed first feature map).
  • the specific prediction and restoration network can also be selected and designed according to actual needs, such as using commonly used residual networks, dense networks and other enhanced networks.
  • the process in Figure 4 can be expressed as P2' = Up(P3') + Res(Up(P3')), where Up(·) represents a 2-fold upsampling operation, the upsampling or downsampling factor is determined by the specific pair of feature maps used for mutual prediction restoration, and Res(·) represents the stacked residual units.
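A minimal sketch of the P2' = Up(P3') + Res(Up(P3')) step described above; the number of residual units and their width are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualUnit(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)

class PredictionRestorationNet(nn.Module):
    """Sketch of P2' = Up(P3') + Res(Up(P3')): 2x bilinear upsampling followed
    by stacked residual units that recover the residual."""

    def __init__(self, channels: int = 256, num_units: int = 4):
        super().__init__()
        self.res = nn.Sequential(
            *[ResidualUnit(channels) for _ in range(num_units)],
            nn.Conv2d(channels, channels, 3, padding=1),  # emit the residual
        )

    def forward(self, p3_rec: torch.Tensor) -> torch.Tensor:
        up = F.interpolate(p3_rec, scale_factor=2, mode="bilinear",
                           align_corners=False)  # Up(P3'), bilinear 2x
        return up + self.res(up)                 # add the recovered residual

p2_pred = PredictionRestorationNet()(torch.rand(1, 256, 76, 100))  # 152 x 200
```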
  • processing the first to-be-processed image based on the first compression network to obtain a reconstructed second feature map or a second sub-video includes:
  • when a first condition is met, the first to-be-processed image is processed based on the first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the first condition includes at least one of the following:
  • the network bandwidth is less than or equal to the first threshold
  • the data volume of the first image to be processed is greater than or equal to the second threshold.
  • the method of the embodiment of the present application further includes:
  • the second image to be processed of the target object is processed based on the second compression network to obtain a reconstructed third feature map or a third sub-video;
  • the second image to be processed includes at least one third feature map of the first image or includes at least one third sub-video of the first sub-video.
  • At least one third feature map of the target object is processed based on the second compression network to obtain a reconstructed third feature map, wherein image features of the third feature map are different from image features of the first feature map or the second feature map;
  • At least one third sub-video of the target object is processed based on the second compression network to obtain a reconstructed third sub-video, wherein the image of the third sub-video is different from the image of the first sub-video, or the image of the third sub-video is different from the image of the second sub-video.
  • processing the second to-be-processed image of the target object based on the second compression network to obtain a reconstructed third feature map or a third sub-video includes:
  • when a second condition is met, the second to-be-processed image of the target object is processed based on the second compression network to obtain a reconstructed third feature map or a third sub-video;
  • the second condition includes at least one of the following:
  • the network bandwidth is greater than a first threshold
  • the data volume of the second image to be processed is less than the second threshold.
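The first and second conditions amount to a simple dispatch rule between the two networks; a sketch (threshold names, units and values are assumptions):

```python
def choose_compression_network(bandwidth_mbps: float, data_volume_mb: float,
                               first_threshold: float = 10.0,
                               second_threshold: float = 5.0) -> str:
    """Pick a compression path per the first/second conditions above."""
    # First condition: low bandwidth, or a large amount of data to send ->
    # predict the feature map/sub-video with the first compression network
    # rather than encoding and transmitting it.
    if bandwidth_mbps <= first_threshold or data_volume_mb >= second_threshold:
        return "first_compression_network"
    # Second condition: ample bandwidth, or little data -> compress and
    # transmit directly with the second (basic feature) compression network.
    return "second_compression_network"

print(choose_compression_network(bandwidth_mbps=4.0, data_volume_mb=12.0))
```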
  • the second compression network includes a second compression encoding network, a second processing unit and a second compression decoding network;
  • the processing of the second to-be-processed image of the target object based on the second compression network to obtain a reconstructed third feature map or a third sub-video includes:
  • the decoded second variable is decoded based on the second compression decoding network to obtain a reconstructed third feature map or a third sub-video.
  • the second compression network is a learnable codec network
  • the target neural network is described as a feature pyramid network as an example
  • the at least one third feature map is a feature map P4 and a feature map P5.
  • by using the second compression network to compress and reconstruct the feature maps P4 and P5, the reconstruction quality of P4 and P5 can be guaranteed at a low bit rate, thereby ensuring the accuracy of the machine vision task.
  • the specific processing flow is shown in Figure 5.
  • the feature maps P2, P3, P4 and P5 are first obtained from the FastRCNN network, their pixel values are normalized, and the feature maps P2 and P3 with larger resolution are predicted and restored using the mutual prediction restoration network.
  • since the base feature maps P4 and P5 occupy a small bit rate and are important for completing the visual task, the encoding end uses the basic feature compression network (the second compression network) to compress P4 and P5 respectively.
  • the decoding end uses the decoding network of the second compression network to obtain reconstructed feature maps P4' and P5', and combines the reconstructed feature maps P2' and P3' obtained at the decoding end for visual task (eg, target detection task) analysis to obtain the final target detection result.
  • the second compression network includes an encoder, a second processing unit and a decoder, the encoder includes a second compression encoding network, and the decoder includes a second compression decoding network.
  • the second compression network is a learnable encoding and decoding network, which can be selected and designed according to actual needs, such as using commonly used compression networks such as Balle and Cheng.
  • Enc_base(·) and Dec_base(·) are the encoder and decoder of the basic feature compression network respectively.
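Putting the pieces together, the Figure 5 flow can be sketched as below; `BaseCodec` is a placeholder for the Enc_base/Dec_base pair (for example a Balle- or Cheng-style network), the `encode`/`decode` interfaces are assumed, and the normalization is illustrative.

```python
def figure5_pipeline(p2, p3, p4, p5, base_codec, pred_comp, pred_restore):
    """Sketch of the encoding/decoding flow of Figure 5."""
    # --- encoding end: normalize pixel values of every feature map ---
    norm = lambda t: (t - t.min()) / (t.max() - t.min() + 1e-8)
    p2, p3, p4, p5 = map(norm, (p2, p3, p4, p5))

    bits_p4 = base_codec.encode(p4)    # P4, P5: few bits, task-critical ->
    bits_p5 = base_codec.encode(p5)    # compressed by the second network
    bits_p2 = pred_comp.encode(p2)     # P2 drives prediction of P3'

    # --- decoding end ---
    p4_rec = base_codec.decode(bits_p4)          # P4'
    p5_rec = base_codec.decode(bits_p5)          # P5'
    p3_rec = pred_comp.decode(bits_p2)           # reconstructed P3'
    p2_rec = pred_restore(p3_rec)                # P2' restored from P3'

    # The four reconstructed maps feed the vision task, e.g. a detection head.
    return p2_rec, p3_rec, p4_rec, p5_rec
```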
  • loss functions are used to train the first compression network, the second compression network and the prediction and restoration network.
  • the specific loss functions include a rate loss function, a distortion loss function and an enhancement loss function.
  • the rate loss function measures the bit rate of the latent variables into which the encoder transforms the input features and which are written out by arithmetic coding; in the standard learned-compression form it can be written as
$$R(\theta) = \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}\left[-\log_2 p(\hat{c}_i)\right]$$
where N is the number of training samples, θ represents the network parameters, R represents the rate, E represents the expected value, and ĉ is the quantized latent variable.
  • the distortion loss function is used to measure the difference between the original feature P and the reconstructed feature P'; using the l2 distance it can be written as
$$D(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\|P_i - P'_i\right\|_2^2$$
where P' is the reconstructed feature map, θ represents the network parameters, N is the number of training samples, the l2 norm measures the difference between the original feature map and the reconstructed feature map, and D(·) represents the distortion.
  • the enhancement loss is used to measure the difference between the output feature P′2 and the original feature P2; in the same l2 form, $L_{enh}(\theta) = \left\|P_2 - P'_2\right\|_2^2$.
  • the total loss, in the usual rate-distortion form, is $L_{total}(\theta) = R(\theta) + \lambda D(\theta)$, where L_total(·) represents the total loss function; compression models with different compression rates can be obtained by adjusting λ.
  • the above-mentioned enhanced loss function can be used for training when training the prediction and restoration module to ensure the quality of the predicted and restored feature map P2'.
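Under the loss formulations reconstructed above, one training step might combine them as in this sketch; the entropy-model interface and the λ value are assumptions.

```python
import torch

def compression_losses(p_orig, p_rec, latent_likelihoods, lam: float = 0.01):
    """Rate-distortion loss for the compression networks (sketch)."""
    # Rate: expected bits of the quantized latents under the entropy model.
    rate = -torch.log2(latent_likelihoods).sum() / p_orig.numel()
    # Distortion: l2 difference between original and reconstructed features.
    distortion = torch.mean((p_orig - p_rec) ** 2)
    return rate + lam * distortion               # L_total = R + lambda * D

def enhancement_loss(p2_orig, p2_pred):
    """Enhancement loss for the prediction restoration module (sketch)."""
    return torch.mean((p2_orig - p2_pred) ** 2)  # l2(P2, P2')
```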
  • a first to-be-processed image of a target object is obtained, wherein the first to-be-processed image includes a first feature map of a first image of the target object or a first sub-video in a first video of the target object;
  • the first image to be processed is processed in a first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • the reconstructed second feature map or the second sub-video of the target object can be obtained through the above-mentioned first compression network, so there is no need to encode and transmit the second feature map or the second sub-video, which improves the encoding efficiency, and obtaining the reconstructed second feature map or the second sub-video based on the compression network can effectively ensure the image quality.
  • the image processing method provided in the embodiment of the present application can be executed by an image processing device.
  • an image processing device executing the image processing method is taken as an example to illustrate the image processing device provided in the embodiment of the present application.
  • the embodiment of the present application further provides an image processing device 700, including:
  • a first acquisition module 701 is used to acquire a first image to be processed of a target object, where the first image to be processed includes a first feature map of a first image of the target object or a first sub-video in a first video of the target object;
  • a second acquisition module 702 is used to process the first to-be-processed image based on the first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the reconstructed image of the second sub-video is different from the image of the first sub-video.
  • the first acquisition module 701 includes:
  • a first acquisition submodule used to acquire a plurality of feature maps of the first image using a target neural network, wherein the target neural network is a neural network used to extract image features;
  • the second acquisition submodule is used to select a feature map from the multiple feature maps as the first feature map.
  • the first compression network includes a first compression encoding network, a first processing unit and a first compression decoding network;
  • the second acquisition module 702 includes:
  • a third acquisition submodule is used to encode the first to-be-processed image based on the first compression coding network to obtain a first variable
  • a fourth acquisition submodule configured to perform quantization, arithmetic coding, arithmetic decoding, and inverse quantization on the first variable based on the first processing unit to obtain a decoded first variable
  • the fifth acquisition submodule is used to decode the decoded first variable based on the first compression decoding network to obtain a reconstructed second feature map or a second sub-video.
  • the first compression network is trained by a rate loss function and a distortion loss function.
  • the image processing device 700 of the embodiment of the present application further includes:
  • a third acquisition module is used to perform target processing on the reconstructed second feature map based on the prediction restoration network to obtain a reconstructed first feature map
  • the reconstructed second sub-video is subjected to target processing based on the prediction restoration network to obtain the reconstructed first sub-video.
  • the prediction and restoration network is obtained by training through an enhanced loss function.
  • the target processing includes sampling processing and residual recovery processing
  • the sampling processing includes upsampling processing or downsampling processing.
  • the first acquisition module 701 is used to process the first to-be-processed image based on a first compression network to obtain a reconstructed second feature map or a second sub-video when a first condition is met;
  • the first condition includes at least one of the following:
  • the network bandwidth is less than or equal to the first threshold
  • the data volume of the first image to be processed is greater than or equal to the second threshold.
  • the image processing device 700 of the embodiment of the present application further includes:
  • a fourth acquisition module used for processing the second to-be-processed image of the target object based on the second compression network to obtain a reconstructed third feature map or a third sub-video;
  • the second image to be processed includes at least one third feature map of the first image or includes at least one third sub-video of the first sub-video.
  • the fourth acquisition module is used to process the second to-be-processed image of the target object based on the second compression network to obtain a reconstructed third feature map or a third sub-video when the second condition is met;
  • the second condition includes at least one of the following:
  • the network bandwidth is greater than a first threshold
  • the data volume of the image to be processed is less than the second threshold.
  • the second compression network includes a second compression encoding network, a second processing unit and a second compression decoding network;
  • the fourth acquisition module includes:
  • a sixth acquisition submodule configured to encode the second object to be processed according to the second compression coding network to obtain a second variable
  • a seventh acquisition submodule configured to perform quantization, arithmetic coding, arithmetic decoding, and inverse quantization on the second variable based on the second processing unit to obtain a decoded second variable
  • An eighth acquisition submodule is used to decode the decoded second variable based on the second compression decoding network to obtain a reconstructed third feature map or a third sub-video.
  • the second compression network is trained by a rate loss function and a distortion loss function.
  • the first video is a multi-view video of the target object, or the first video is a scalable video of the target object.
  • the image features include at least one of resolution and feature quantity.
  • the image processing device of the embodiment of the present application obtains a first image to be processed of a target object, wherein the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object; processes the first image to be processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • the reconstructed second feature map or second sub-video of the target object can be obtained through the above-mentioned first compression network, so there is no need to encode and transmit the second feature map or second sub-video, which improves the encoding efficiency, and obtaining the reconstructed second feature map or second sub-video based on the compression network can effectively ensure the image quality.
  • the image processing device in the embodiment of the present application can be an electronic device, such as an electronic device with an operating system, or a component in an electronic device, such as an integrated circuit or a chip.
  • the electronic device can be a terminal, or it can be other devices other than a terminal.
  • the terminal can include but is not limited to the terminal types listed above, and the other devices can be servers, network attached storage (NAS), etc., which are not specifically limited in the embodiment of the present application.
  • the image processing device provided in the embodiment of the present application can implement each process implemented by the method embodiments of Figures 1 to 6 and achieve the same technical effect. To avoid repetition, it will not be repeated here.
  • the embodiment of the present application further provides an image processing device 800, including a processor 801 and a memory 802, wherein the memory 802 stores a program or instruction that can be run on the processor 801, and when the program or instruction is executed by the processor 801, each step of the above-mentioned image processing method embodiment is implemented, and the same technical effect can be achieved. To avoid repetition, it will not be described here.
  • the embodiment of the present application also provides an image processing device, including a processor and a communication interface, the processor is used to obtain a first image to be processed of a target object, the first image to be processed includes a first feature map of the first image of the target object or includes a first sub-video in a first video of the target object; the first image to be processed is processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; wherein the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • FIG. 9 is a schematic diagram of the hardware structure of an image processing device that implements the embodiment of the present application.
  • the image processing device is specifically a terminal 900.
  • the terminal 900 includes at least some of the following components: a radio frequency unit 901, a network module 902, an audio output unit 903, an input unit 904, a sensor 905, a display unit 906, a user input unit 907, an interface unit 908, a memory 909 and a processor 910.
  • the terminal 900 may also include a power source (such as a battery) for supplying power to each component, and the power source may be logically connected to the processor 910 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption management through the power management system.
  • the terminal structure shown in FIG9 does not constitute a limitation on the terminal, and the terminal may include more or fewer components than shown, or combine certain components, or arrange components differently, which will not be described in detail here.
  • the input unit 904 may include a graphics processing unit (GPU) 9041 and a microphone 9042.
  • the graphics processor 9041 processes the image data of a static picture or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 906 may include a display panel 9061, and the display panel 9061 may be configured in the form of a liquid crystal display, an organic light-emitting diode, etc.
  • the user input unit 907 includes a touch panel 9071 and at least one of other input devices 9072.
  • the touch panel 9071 is also called a touch screen.
  • the touch panel 9071 may include two parts: a touch detection device and a touch controller.
  • Other input devices 9072 may include, but are not limited to, a physical keyboard, a function key (such as a volume control key, a switch key, etc.), a trackball, a mouse, and a joystick, which will not be repeated here.
  • after receiving downlink data from a network-side device, the radio frequency unit 901 can transmit the data to the processor 910 for processing; in addition, the radio frequency unit 901 can send uplink data to the network-side device.
  • the RF unit 901 includes but is not limited to an antenna, an amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, etc.
  • the memory 909 can be used to store software programs or instructions and various data.
  • the memory 909 may mainly include a first storage area for storing programs or instructions and a second storage area for storing data, wherein the first storage area may store an operating system, an application program or instruction required for at least one function (such as a sound playback function, an image playback function, etc.), etc.
  • the memory 909 may include a volatile memory or a non-volatile memory, or the memory 909 may include both volatile and non-volatile memories.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be a random access memory (RAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDRSDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchronous link dynamic random access memory (SLDRAM) or a direct rambus random access memory (DRRAM).
  • the memory 909 in the embodiment of the present application includes but is not limited to these and any other suitable types of memories.
  • the processor 910 may include one or more processing units; optionally, the processor 910 integrates an application processor and a modem processor, wherein the application processor mainly processes operations related to an operating system, a user interface, and application programs, and the modem processor mainly processes wireless communication signals, such as a baseband processor. It is understandable that the modem processor may not be integrated into the processor 910.
  • the processor 910 is configured to obtain a first image to be processed of the target object, wherein the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object;
  • the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
  • the reconstructed image of the second sub-video is different from the image of the first sub-video.
  • a first image to be processed of the target object is obtained, wherein the first image to be processed includes a first feature map of the first image of the target object or a first sub-video in a first video of the target object; the first image to be processed is processed based on the first compression network to obtain the reconstructed second feature map or second sub-video; the second feature map is the feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the second sub-video is a partial video in the first video, and the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • the reconstructed second feature map or second sub-video of the target object can be obtained through the above-mentioned first compression network, so there is no need to encode and transmit the second feature map or second sub-video, which improves the encoding efficiency, and obtaining the reconstructed second feature map or second sub-video based on the compression network can effectively ensure the image quality.
  • processor 910 is further configured to:
  • a feature map is selected from the multiple feature maps as the first feature map.
  • the first compression network includes a first compression encoding network, a first processing unit and a first compression decoding network;
  • the processor 910 is further configured to:
  • the decoded first variable is decoded based on the first compression decoding network to obtain a reconstructed second feature map or a second sub-video.
  • the first compression network is trained by a rate loss function and a distortion loss function.
  • processor 910 is further configured to:
  • target processing is performed on the reconstructed second sub-video based on the prediction restoration network to obtain the reconstructed first sub-video;
  • the prediction and restoration network is obtained by training through an enhanced loss function.
  • the target processing includes sampling processing and residual recovery processing
  • the sampling processing includes upsampling processing or downsampling processing.
  • processor 910 is further configured to:
  • when a first condition is met, the first to-be-processed image is processed based on the first compression network to obtain a reconstructed second feature map or a second sub-video;
  • the first condition includes at least one of the following:
  • the network bandwidth is less than or equal to the first threshold
  • the data volume of the first image to be processed is greater than or equal to the second threshold.
  • processor 910 is further configured to:
  • the second image to be processed of the target object is processed based on the second compression network to obtain a reconstructed third feature map or a third sub-video;
  • the second image to be processed includes at least one third feature map of the first image or includes at least one third sub-video of the first sub-video.
  • processor 910 is further configured to:
  • when the second condition is met, the second to-be-processed image of the target object is processed based on the second compression network to obtain a reconstructed third feature map or a third sub-video;
  • the second condition includes at least one of the following:
  • the network bandwidth is greater than a first threshold
  • the data volume of the second image to be processed is less than the second threshold.
  • the second compression network includes a second compression encoding network, a second processing unit and a second compression decoding network;
  • the processor 910 is further configured to:
  • the decoded second variable is decoded based on the second compression decoding network to obtain a reconstructed third feature map or a third sub-video.
  • the second compression network is trained by a rate loss function and a distortion loss function.
  • the first video is a multi-view video of the target object, or the first video is a scalable video of the target object.
  • the image features include at least one of resolution and feature quantity.
  • a first image to be processed of a target object is obtained, wherein the first image to be processed includes a first feature map of the first image of the target object or includes a first sub-video in a first video of the target object; the first image to be processed is processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video.
  • the reconstructed second feature map or second sub-video of the target object can be obtained through the above-mentioned first compression network, so that the second feature map or the second sub-video does not need to be encoded and transmitted, which improves the encoding efficiency, and obtaining the reconstructed second feature map or the second sub-video based on the compression network can effectively ensure the image quality.
  • An embodiment of the present application also provides a readable storage medium, which may be volatile or non-volatile.
  • a program or instruction is stored on the readable storage medium.
  • the program or instruction is executed by a processor, the various processes of the above-mentioned image processing method embodiment are implemented and the same technical effect can be achieved. To avoid repetition, it will not be repeated here.
  • the processor is the processor in the terminal described in the above embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
  • An embodiment of the present application further provides a chip, which includes a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is used to run programs or instructions to implement the various processes of the above-mentioned image processing method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
  • the chip mentioned in the embodiments of the present application can also be called a system-level chip, a system chip, a chip system or a system-on-chip chip, etc.
  • the embodiment of the present application further provides a computer program/program product, which is stored in a storage medium, and is executed by at least one processor to implement the various processes of the above-mentioned image processing method embodiment, and can achieve the same technical effect. To avoid repetition, it will not be repeated here.
  • the technical solution of the present application can be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disk), and includes a number of instructions for enabling a terminal (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in each embodiment of the present application.
  • a storage medium such as ROM/RAM, a magnetic disk, or an optical disk
  • a terminal which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present application discloses an image processing method, apparatus and device, belonging to the field of encoding and decoding technology. The image processing method of the embodiments of the present application includes: acquiring a first image to be processed of a target object, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object; and processing the first image to be processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; wherein the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map; and the image of the reconstructed second sub-video is different from the image of the first sub-video.

Description

Image processing method, apparatus and device
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202211254689.1, filed in China on October 13, 2022, the entire contents of which are incorporated herein by reference.
Technical field
The present application belongs to the field of encoding and decoding technology, and specifically relates to an image processing method, apparatus and device.
Background
In machine vision task applications, video or images are generally compressed to avoid directly transmitting an extremely large amount of data. Traditional image compression standards, however, are designed for a wide range of image and video compression tasks: images and videos have high spatial and temporal correlation, but the feature maps of images or videos do not have this property, so directly using a traditional image compression method to process image feature maps can guarantee neither the encoding efficiency nor the quality of the reconstructed feature maps.
Summary
The embodiments of the present application provide an image processing method, apparatus and device, which can solve the problem in the related art that processing image feature maps with traditional image compression methods guarantees neither the encoding efficiency nor the quality of the reconstructed feature maps.
In a first aspect, an image processing method is provided, including:
acquiring a first image to be processed of a target object, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object;
processing the first image to be processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video;
wherein the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
and the image of the reconstructed second sub-video is different from the image of the first sub-video.
In a second aspect, an image processing apparatus is provided, including:
a first acquisition module, configured to acquire a first image to be processed of a target object, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object;
a second acquisition module, configured to process the first image to be processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video;
wherein the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
and the second sub-video is a partial video in the first video, and the image of the reconstructed second sub-video is different from the image of the first sub-video.
In a third aspect, an image processing apparatus is provided, including a processor and a memory, the memory storing a program or instruction that can be run on the processor, the program or instruction implementing the steps of the method described in the first aspect when executed by the processor.
In a fourth aspect, an image processing apparatus is provided, including a processor and a communication interface, wherein the processor is configured to acquire a first image to be processed of a target object, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object;
and to process the first image to be processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video;
wherein the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from the image features of the first feature map;
and the image of the reconstructed second sub-video is different from the image of the first sub-video.
In a fifth aspect, a readable storage medium is provided, on which a program or instruction is stored, the program or instruction implementing the steps of the method described in the first aspect when executed by a processor.
In a sixth aspect, a chip is provided, including a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to run a program or instruction to implement the method described in the first aspect.
In a seventh aspect, a computer program/program product is provided, stored in a storage medium and executed by at least one processor to implement the steps of the method described in the first aspect.
In the embodiments of the present application, a first image to be processed of a target object is obtained, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object; the first image to be processed is processed based on a first compression network to obtain a reconstructed second feature map or a second sub-video; the second feature map is a feature map of the first image, and the image features of the reconstructed second feature map are different from those of the first feature map; the image of the reconstructed second sub-video is different from the image of the first sub-video. Through the first compression network, the reconstructed second feature map or second sub-video of the target object can be obtained, so the second feature map or second sub-video does not need to be encoded and transmitted, which improves the encoding efficiency; and obtaining the reconstructed second feature map or second sub-video based on a compression network can effectively guarantee the image quality.
Brief description of the drawings
Figure 1 is a schematic flow chart of the image processing method of an embodiment of the present application;
Figure 2 is a schematic diagram of the network architecture of the feature pyramid network of an embodiment of the present application;
Figure 3 is a schematic diagram of the first compression network in an embodiment of the present application;
Figure 4 is a schematic diagram of the prediction restoration network in an embodiment of the present application;
Figure 5 is a schematic diagram of the first compression network and the second compression network processing feature maps in an embodiment of the present application;
Figure 6 is a schematic diagram of the second compression network in an embodiment of the present application;
Figure 7 is a schematic module diagram of the image processing apparatus in an embodiment of the present application;
Figure 8 is a structural block diagram of the image processing device in an embodiment of the present application;
Figure 9 is a structural block diagram of the terminal of an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be clearly described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application fall within the protection scope of the present application.
The terms "first", "second", etc. in the specification and claims of the present application are used to distinguish similar objects, and are not used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in an order other than those illustrated or described here; the objects distinguished by "first" and "second" are generally of the same type, and the number of objects is not limited, for example, the first object can be one or more. In addition, "and/or" in the specification and claims represents at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The image processing apparatus corresponding to the image processing method in the embodiments of the present application may be a terminal, which may also be referred to as a terminal device or user equipment (User Equipment, UE). The terminal may be a mobile phone, a tablet computer (Tablet Personal Computer), a laptop computer (Laptop Computer) or notebook computer, a personal digital assistant (Personal Digital Assistant, PDA), a handheld computer, a netbook, an ultra-mobile personal computer (ultra-mobile personal computer, UMPC), a mobile Internet device (Mobile Internet Device, MID), an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a robot, a wearable device (Wearable Device), a vehicle-mounted device (Vehicle User Equipment, VUE), a pedestrian terminal (Pedestrian User Equipment, PUE) or another terminal-side device; wearable devices include smart watches, bracelets, earphones, glasses, etc. It should be noted that the specific type of the terminal is not limited in the embodiments of the present application.
To enable those skilled in the art to better understand the embodiments of the present application, the following explanation is given first.
1. Traditional image encoding and decoding schemes.
Most traditional coding schemes follow three steps: transform, quantization and entropy coding. Still-image compression by the Joint Photographic Experts Group (JPEG), the most widely used compression standard, uses a series of transforms, quantization and entropy coding to remove as much spatial and other coding redundancy as possible. To reduce spatial redundancy, the image is first divided into small blocks and converted from the spatial domain to the frequency domain by the Discrete Cosine Transform (DCT) to achieve a more compact representation; the transformed image information is then quantized and fed into entropy coding. Quantization is the only lossy step in the whole compression method. Because the wavelet transform generalizes well to non-stationary processes, JPEG2000 uses the Discrete Wavelet Transform (DWT) instead of the DCT to reduce the information loss of quantization and achieve better compression quality. In the High Efficiency Video Coding (HEVC) standard, intra-frame coding units can independently compress a single frame, and the Better Portable Graphics (BPG) image format codec was proposed for image compression based on this similarity. Current compression standards are designed for a wide range of image compression tasks, so they have no advantage on some special, highly correlated image collections.
2. Image encoding and decoding based on deep learning.
With the development of deep learning, neural-network-based methods have found wide application in image processing, and image compression is one such application. In deep-learning-based schemes, the compression task is regarded as an encoding process and trained with an end-to-end learning method. The specific process can be decomposed into an encoding process, and the corresponding reconstruction task can be regarded as a decoding process; the encoder-decoder structure is widely used in learning-based compression methods. For example, a compression model based on a recurrent neural network (RNN) uses an autoencoder to extract image features as the transform process, and residual structures are used for feature extraction in the encoder and decoder. As another example, the related art introduced a new joint optimization of compression rate and restoration distortion, whose smoothly adjustable compression ratio made it popular in later methods. To obtain better performance, the related art further proposed a hyperprior-based compression method to reduce spatial redundancy, and improved the entropy coding module and attached it to the hyperprior structure to further reduce coding redundancy. The related art also takes advantage of residual structures and attention mechanisms to propose a better-structured autoencoder, together with a Gaussian mixture likelihood entropy model that improves its flexibility and accuracy. These methods can certainly change the compression ratio by adjusting parameters; however, when compressing at low bit rates their compression quality usually drops very quickly, because they pay no special attention to such extreme cases.
3. Schemes targeting low bit rates.
In the related art, low-bit-rate performance is optimized. For the JPEG codec, it was proposed to apply 2×2 average pooling to the image to obtain a smaller image and, after JPEG encoding and decoding, to interpolate back to the original size during reconstruction. The method is further optimized by designing the filters used in the downsampling and interpolation process, but the filters designed by this method depend on the image content and need to be designed manually. Meanwhile, the related art has tried to apply the Generative Adversarial Network (GAN) to low-bit-rate compression: a generative compression architecture is proposed to generate images from the image distribution encoded by the encoder, and a corresponding loss function is designed to balance visual quality and reconstruction quality. A GAN is also used as an enhancement module of the decoder structure: a pair of classic codec structures is trained by optimizing the rate-distortion loss, the trained encoder is frozen to fine-tune the decoder into the generator of the GAN, and finally the decoder and generator parameters are interpolated to reduce the artifacts of compressed images at low bit rates. Such compression networks are optimized entirely through newly designed network structures; to obtain better results, the network structure has to be redesigned, and there is essentially no compatibility between schemes.
The image processing method provided in the embodiments of the present application is described in detail below through some embodiments and their application scenarios with reference to the drawings.
As shown in Figure 1, an embodiment of the present application provides an image processing method, including:
Step 101: acquire a first image to be processed of a target object, the first image to be processed including a first feature map of a first image of the target object or a first sub-video in a first video of the target object.
The first feature map is extracted from the first image or the first video using a neural network. The target object is the photographed object (or photographed content) corresponding to the first video or the first image.
Optionally, the first video is a multi-view video of the target object, or the first video is a scalable video of the target object.
A multi-view video (also described as a stereoscopic video) refers to the videos of each view obtained by shooting the same object (or the same scene) with multiple cameras at different viewpoints. For example, the first sub-video is the video of the target object at a certain viewpoint.
A scalable video includes videos of different resolutions or different frame rates from the same video source. For example, the first sub-video is the video in the first video that transmits and displays the target object at a certain resolution. That is to say, the method of the embodiments of the present application is applicable not only to the processing of feature maps but also to the processing of videos of different viewing angles or different resolutions.
步骤102:基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;
所述重建后的第二子视频的图像与所述第一子视频的图像不同。
可选地,所述图像特征包括分辨率和特征数量中的至少一项。
例如,所述第一特征图的分辨率与所述第二特征图的分辨率不同;
或者,所述第一特征图对应的特征数量与所述第二特征图对应的特征数量不同。
上述第一压缩网络用于输出与输入的特征图具有不同图像特征的特征图。本申请实施例中的图像特征包括但不限于分辨率和特征数量。也就是说,本申请实施例中可以通过第一压缩网络对图像中的特征图进行压缩,也可以对多视角视频或可伸缩视频中的部分视频(如某一个分辨率的视频或某一视角的视频)进行压缩。
可选地,上述第一压缩网络为可学习的压缩网络。该第一压缩网络是通过速率损失函数和失真损失函数训练得到的。
在本申请的一实现方式中,本申请实施例的方法应用于对图像提取多个特征图的神经网络,即上述第一特征图是通过神经网络提取的。通过神经网络提取的多个特征图之间存在信息冗余的特点,利用该特点基于上述第一压缩网络进行特征图之间的互预测,即通过上述第一特征图得到重建后的第二特征图。
在本申请的另一实现方式中,基于上述第一压缩网络进行不同视频之间的互预测,例如,基于上述第一子视频得到重建后的第二子视频,该第一子视频与所述第二子视频对应的分辨率或帧率不同,或者,该第一子视频与第二子视频对应的拍摄视角不同。
本申请实施例中,获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;第二特征图为第一图像的特征图,且重建后的第二特征图的图像特征与第一特征图的图像特征不同;重建后的第二子视频的图像与所述第一子视频的图像不同。通过上述第一压缩网络能够获取目标对象重建后的第二特征图或第二子视频,从而无需再对第二特征图或第二子视频进行编码传输,提高了编码效率,且基于压缩网络来获取重建后的第二特征图或第二子视频能够有效保证图像质量。
可选地,获取目标对象的第一图像的第一特征图,包括:
利用目标神经网络获取所述第一图像的多个特征图,所述目标神经网络为用于提取图像特征的神经网络;
在所述多个特征图中选取一个特征图作为所述第一特征图。
在本申请的一实现方式中,上述目标神经网络为特征金字塔网络,例如,快速的基于区域的卷积神经网络(Fast Region-based Convolutional Neural Network,FastRCNN)中的特征金字塔网络。利用该特征金字塔网络能够提取不同分辨率的特征图。
当然,上述目标神经网络也可以是提取其他形式的多个特征图的神经网络,本申请实施例以使用特征金字塔网络实现目标检测任务为例进行阐述。
特征金字塔网络的网络架构如图2所示,其中,神经网络的输入是分辨率为W×H、由3个颜色通道(RGB)组成的图像,从神经网络中获得P层(P-layer)的特征图,各P层特征图的分辨率分别为P2:W/4×H/4、P3:W/8×H/8、P4:W/16×H/16、P5:W/32×H/32,特征通道数量均为256。
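为便于理解上述特征图的提取过程,下面给出一个示意性草图:假设采用torchvision中带特征金字塔骨干网的Faster R-CNN预训练模型提取P2~P5特征图,其中模型名称、weights参数与返回的键名均为基于常见torchvision版本的假设,并非本申请方案的限定:

```python
# 示意性草图:利用 torchvision 的 Faster R-CNN(含 FPN 骨干网)提取多尺度特征图。
# 注意:weights 参数与返回键名('0'~'3' 对应 P2~P5)以所用 torchvision 版本为准。
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

image = torch.rand(1, 3, 512, 512)              # 分辨率 W×H、3 个颜色通道(RGB)的输入
with torch.no_grad():
    feats = model.backbone(image)               # 返回 OrderedDict,含各层特征图
p2, p3, p4, p5 = feats['0'], feats['1'], feats['2'], feats['3']
print(p2.shape, p3.shape, p4.shape, p5.shape)   # 通道数均为 256,分辨率逐层减半
```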
可选地,所述第一压缩网络包括第一压缩编码网络、第一处理单元和第一压缩解码网络;
所述基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频,包括:
基于所述第一压缩编码网络对所述第一待处理图像进行编码,得到第一变量;
基于所述第一处理单元对所述第一变量进行量化、算术编码、算术解码和反量化,得到解码后的第一变量;
基于所述第一压缩解码网络对所述解码后的第一变量进行解码处理,得到重建后的第二特征图或第二子视频。
可选地,基于所述第一压缩编码网络对第一待处理图像进行编码之前,对第一待处理图像的像素进行归一化处理。
本申请实施例的第一压缩网络也可描述为预测压缩模块,如图3所示,该预测压缩模块包括编码器、第一处理单元和解码器,该编码器包括第一压缩编码网络,该解码器包括第一压缩解码网络。假设上述第一特征图为基于图2所示的神经网络所得到的特征图P2,上述第二特征图为基于图2所示的神经网络所得到的特征图P3,即进行互预测复原的特征图为P2和P3,当然进行互预测复原的特征图也可以是P2-P5中除上述P2和P3之外的任意两个特征图。编码器的输入为特征图P2,首先对其像素值进行归一化,然后利用第一压缩编码网络对输入的特征图P2进行压缩(编码),获得潜在变量(第一变量),即变量c,然后对变量c进行量化和算术编码,获得二进制比特流;解码器获得输入的二进制比特流后,首先进行算术解码和反量化获得解码后的潜在变量,即解码后的第一变量,然后经过第一压缩解码网络进行解码获得重建后的第二特征图,即特征图P3'。
图3所示的第一压缩网络可以根据实际需求进行选择和设计,如使用常用的Balle、Cheng等压缩网络,图3的第一压缩网络所执行的过程可以表示为如下公式:
c = Enc(P2);
ĉ = Q(c);
P3' = Dec(ĉ);
其中,Q(·)为量化、算术编码和算术解码操作,Enc(·)和Dec(·)分别为编码器和解码器。
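作为补充说明,下面给出第一压缩网络数据流的一个最小化示意草图:其中Enc/Dec采用简单卷积结构,Q(·)以训练期加性均匀噪声、推理期取整来近似,算术编解码从略;网络层数、通道数等均为示例性假设,并非本申请方案的确定实现,实际可选用Balle、Cheng等压缩网络:

```python
# 示意性草图:c = Enc(P2) -> ĉ = Q(c) -> P3' = Dec(ĉ) 的数据流。
import torch
import torch.nn as nn

class PredictiveCodec(nn.Module):
    def __init__(self, ch=256, latent=192):
        super().__init__()
        self.enc = nn.Sequential(                     # Enc(·):编码器,4 倍下采样
            nn.Conv2d(ch, latent, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(latent, latent, 5, stride=2, padding=2))
        self.dec = nn.Sequential(                     # Dec(·):解码器,输出 P2 一半分辨率的 P3'
            nn.ConvTranspose2d(latent, latent, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.Conv2d(latent, ch, 3, padding=1))

    def forward(self, p2):
        c = self.enc(p2)                              # c = Enc(P2)
        if self.training:
            c_hat = c + torch.empty_like(c).uniform_(-0.5, 0.5)  # 训练期以均匀噪声近似量化
        else:
            c_hat = torch.round(c)                    # 推理期取整,对应 ĉ = Q(c)
        return self.dec(c_hat)                        # P3' = Dec(ĉ)

codec = PredictiveCodec().eval()
p3_rec = codec(torch.rand(1, 256, 128, 128))          # 输出形状 1×256×64×64,即 P3' 的尺寸
```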
可选地,本申请实施例的方法,还包括:
基于预测复原网络对重建后的第二特征图进行目标处理,得到重建后的第一特征图;
或者,基于预测复原网络对重建后的第二子视频进行目标处理,得到重建后的第一子视频;
其中,所述预测复原网络是通过增强损失函数训练得到的。
可选地,所述目标处理包括采样处理和恢复残差处理,所述采样处理包括上采样处理或下采样处理。
在本申请的一实施例中,如图4所示,上述解码器还可包括上述预测复原网络(预测复原模块),该预测复原网络用于从解码重建获得的第二特征图预测复原重建的第一特征图。具体的,该预测复原网络基于残差单元设计,首先将重建的第二特征图的宽和高分别上采样到原尺寸的2倍,可以采用双线性插值等插值方法实现,在此不做限制;之后采用多堆叠的残差单元恢复出残差,进而与输入相加得到预测的第一特征图(即重建的第一特征图)。如图4所示,具体的预测复原网络也可以根据实际需求进行选择和设计,如使用常用的残差网络、密集网络等增强网络,图4的过程可以表示为如下公式:
P2’=Up(P3’)+Res(Up(P3’));
其中,Up(·)表示2倍上采样操作,上采样或下采样的倍数根据具体进行互预测复原的特征图决定,Res(·)表示多堆叠的残差单元。
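下面给出预测复原网络的一个示意性草图,其中残差单元的数量、通道数等均为示例性假设,具体结构可根据实际需求选择和设计:

```python
# 示意性草图:P2' = Up(P3') + Res(Up(P3')),先双线性上采样 2 倍,再用堆叠残差单元恢复残差。
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResBlock(nn.Module):
    def __init__(self, ch=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)                       # 残差单元:输出 = 输入 + 残差

class RestoreNet(nn.Module):
    def __init__(self, ch=256, n_blocks=4):
        super().__init__()
        self.res = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])

    def forward(self, p3_rec):
        up = F.interpolate(p3_rec, scale_factor=2,
                           mode='bilinear', align_corners=False)  # Up(P3'):双线性上采样
        return up + self.res(up)                                  # P2' = Up(P3') + Res(Up(P3'))

p2_pred = RestoreNet()(torch.rand(1, 256, 64, 64))    # 输出形状 1×256×128×128,即 P2' 的尺寸
```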
可选地,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频,包括:
在满足第一条件的情况下,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第一条件包括以下至少一项:
网络带宽小于或者等于第一阈值;
第一待处理图像的数据量大于或者等于第二阈值。
可选地,本申请实施例的方法,还包括:
基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
所述第二待处理图像包括所述第一图像的至少一个第三特征图或者包括所述第一子视频的至少一个第三子视频。
具体的,基于第二压缩网络对所述目标对象的至少一个第三特征图进行处理,得到重建后的第三特征图,所述第三特征图的图像特征与所述第一特征图或第二特征图的图像特征不同;
或者,基于第二压缩网络对所述目标对象的至少一个第三子视频进行处理,得到重建后的第三子视频,所述第三子视频的图像与所述第一子视频的图像不同,或者,所述第三子视频的图像与所述第二子视频的图像不同。
可选地,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频,包括:
在满足第二条件的情况下,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
其中,所述第二条件包括以下至少一项:
网络带宽大于第一阈值;
第二待处理图像的数据量小于第二阈值。
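上述第一条件与第二条件的判断逻辑可以用如下示意性草图表示,其中带宽与数据量的阈值取值均为示例性假设,需按实际部署设定:

```python
# 示意性草图:根据第一条件/第二条件决定是否启用相应的压缩网络。
# bw_thresh 对应第一阈值,size_thresh 对应第二阈值,取值仅为示例。
def meet_first_condition(bandwidth_mbps: float, data_size_mb: float,
                         bw_thresh: float = 10.0, size_thresh: float = 50.0) -> bool:
    # 第一条件(满足以下至少一项):带宽 <= 第一阈值,或 第一待处理图像数据量 >= 第二阈值
    return bandwidth_mbps <= bw_thresh or data_size_mb >= size_thresh

def meet_second_condition(bandwidth_mbps: float, data_size_mb: float,
                          bw_thresh: float = 10.0, size_thresh: float = 50.0) -> bool:
    # 第二条件(满足以下至少一项):带宽 > 第一阈值,或 第二待处理图像数据量 < 第二阈值
    return bandwidth_mbps > bw_thresh or data_size_mb < size_thresh

print(meet_first_condition(5.0, 80.0), meet_second_condition(20.0, 10.0))  # True True
```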
可选地,所述第二压缩网络包括第二压缩编码网络、第二处理单元和第二压缩解码网络;
所述基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频,包括:
根据所述第二压缩编码网络对所述第二待处理图像进行编码,得到第二变量;
基于所述第二处理单元对所述第二变量进行量化、算术编码、算术解码和反量化,得到解码后的第二变量;
基于所述第二压缩解码网络对所述解码后的第二变量进行解码处理,得到重建后的第三特征图或第三子视频。
可选地,上述第二压缩网络为可学习的编解码网络,以上述目标神经网络为特征金字塔网络为例进行阐述,上述至少一个第三特征图为特征图P4和特征图P5。利用该第二压缩网络对特征图P4和特征图P5进行压缩和重建,可以在码率较低的情况下保证P4和P5的重建质量,从而保证机器视觉任务的准确度。具体的处理流程如图5所示,在编码端,首先从FastRCNN网络获得特征图P2、P3、P4和P5,对其像素值进行归一化,使用互预测复原网络对分辨率较大的特征图P2和P3进行预测复原。然后,针对基础特征图P4和P5,考虑到其占用码率较少且对于完成视觉任务比较重要,编码端采用基础特征压缩网络(第二压缩网络)分别压缩P4和P5。解码端使用第二压缩网络的解码网络得到重建的特征图P4'和P5',并结合互预测复原分支得到的重建特征图P2'和P3',用于视觉任务(例如:目标检测任务)分析,得到最终的目标检测结果。
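结合前述各模块,整体编解码流程可以用如下示意性草图概括,其中pred_codec、restore_net、base_codec分别指代第一压缩网络、预测复原网络与第二压缩网络,均沿用前文草图中的假设性实现,并非本申请方案的确定实现:

```python
# 示意性草图:P2/P3 走互预测复原分支,P4/P5 走基础特征压缩分支。
def encode_decode(p2, p4, p5, pred_codec, restore_net, base_codec):
    p3_rec = pred_codec(p2)                 # 编码 P2,解码得到重建的特征图 P3'
    p2_rec = restore_net(p3_rec)            # 由 P3' 预测复原出 P2'
    p4_rec = base_codec(p4)                 # 第二压缩网络分别压缩并重建 P4、P5
    p5_rec = base_codec(p5)
    return p2_rec, p3_rec, p4_rec, p5_rec   # 供视觉任务(如目标检测)分析使用
```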
可选地,如图6所示,上述第二压缩网络包括编码器、第二处理单元和解码器,该编码器包括第二压缩编码网络,该解码器包括第二压缩解码网络。该第二压缩网络为可学习的编解码网络,可以根据实际需求进行选择和设计,如使用常用的Balle、Cheng等压缩网络,图6的过程可以表示为如下公式:
d = Encbase(P4);
d̂ = Q(d);
P4' = Decbase(d̂);
e = Encbase(P5);
ê = Q(e);
P5' = Decbase(ê);
其中,Encbase(·)和Decbase(·)分别为基础特征压缩网络的编码器和解码器。
在本申请的一实施例中,采用损失函数对第一压缩网络、第二压缩网络和预测复原网络进行训练。具体的损失函数包括速率损失函数、失真损失函数和增强损失函数。
其中,速率损失函数将输入特征转换为潜在变量,算术编码后可通过输出计算比特率,具体的该函数如下所示:
R(θ) = (1/N)·Σ_{i=1}^{N} E[−log2 p_{ŷ|ẑ}(ŷ_i|ẑ_i) − log2 p_{ẑ}(ẑ_i)];
其中,ŷ是编码器输出的潜在变量,ẑ是引入的额外边缘信息,p_{ŷ|ẑ}为在ẑ条件下潜在变量的像素值概率分布,p_{ẑ}为边缘信息的概率分布。此外,N是训练样本的数量,θ代表网络参数,R表示速率,E表示期望值。
失真损失函数用于测量原始特征P与重建特征P'之间的差异,该失真损失函数如下所示:
D(θ) = (1/N)·Σ_{i=1}^{N} ||P_i − P'_i||_2^2;
其中,P'是重建特征图,θ代表网络参数,N是训练样本的数量,l2范数测量原特征图与重建特征图之间的差异,D(θ)表示失真度。
增强损失用于测量输出特征P2'和原始特征P2之间的差异,其公式如下:
E(θ) = (1/N)·Σ_{i=1}^{N} ||P2_i − P2'_i||_2^2;
其中,E(θ)表示增强损失。
在训练上述第一压缩网络和第二压缩网络时采用速率损失函数和失真损失函数进行训练,具体的,采用以下公式进行训练:
Ltotal(θ)=λ·D(θ)+R(θ);
其中,Ltotal(θ)表示总的损失函数,通过调节λ可以得到不同压缩率的压缩模型。
在训练预测复原模块时可采用上述增强损失函数进行训练,以保证预测复原的特征图P2’的质量。
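上述训练中各损失项的计算可以用如下示意性草图表示,其中likelihoods假设由熵模型给出潜在变量的概率,λ的取值、熵模型与数据加载均为示例性假设:

```python
# 示意性草图:率失真损失 L_total(θ) = λ·D(θ) + R(θ) 训练压缩网络,
# 增强损失 E(θ) 训练预测复原网络。
import torch
import torch.nn.functional as F

def rate_distortion_loss(p_target, p_rec, likelihoods, lam=0.01):
    R = (-torch.log2(likelihoods)).mean()   # 速率项 R(θ):潜在变量的平均负对数似然
    D = F.mse_loss(p_rec, p_target)         # 失真项 D(θ):l2 度量
    return lam * D + R                      # 调节 lam 可得到不同压缩率的压缩模型

def enhancement_loss(p2, p2_pred):
    return F.mse_loss(p2_pred, p2)          # 增强损失 E(θ),保证预测复原的 P2' 的质量
```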
在本申请实施例中,获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;第二特征图为第一图像的特征图,且重建后的第二特征图的图像特征与第一特征图的图像特征不同;重建后的第二子视频的图像与所述第一子视频的图像不同。通过上述第一压缩网络能够获取目标对象重建后的第二特征图或第二子视频,从而无需再对第二特征图或第二子视频进行编码传输,提高了编码效率,且基于压缩网络来获取重建后的第二特征图或第二子视频能够有效保证图像质量。
本申请实施例提供的图像处理方法,执行主体可以为图像处理装置。本申请实施例中以图像处理装置执行图像处理方法为例,说明本申请实施例提供的图像处理装置。
如图7所示,本申请实施例还提供了一种图像处理装置700,包括:
第一获取模块701,用于获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;
第二获取模块702,用于基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;
所述重建后的第二子视频的图像与所述第一子视频的图像不同。
可选地,第一获取模块701包括:
第一获取子模块,用于利用目标神经网络获取所述第一图像的多个特征图,所述目标神经网络为用于提取图像特征的神经网络;
第二获取子模块,用于在所述多个特征图中选取一个特征图作为所述第一特征图。
可选地,所述第一压缩网络包括第一压缩编码网络、第一处理单元和第一压缩解码网络;
第二获取模块702包括:
第三获取子模块,用于基于所述第一压缩编码网络对所述第一待处理图像进行编码,得到第一变量;
第四获取子模块,用于基于所述第一处理单元对所述第一变量进行量化、算术编码、算术解码和反量化,得到解码后的第一变量;
第五获取子模块,用于基于所述第一压缩解码网络对所述解码后的第一变量进行解码处理,得到重建后的第二特征图或第二子视频。
可选地,所述第一压缩网络是通过速率损失函数和失真损失函数训练得到的。
可选地,本申请实施例的图像处理装置700,还包括:
第三获取模块,用于基于预测复原网络对重建后的第二特征图进行目标处理,得到重建后的第一特征图;
或者,基于预测复原网络对重建后的第二子视频进行目标处理,得到重建后的第一子视频;
其中,所述预测复原网络是通过增强损失函数训练得到的。
可选地,所述目标处理包括采样处理和恢复残差处理,所述采样处理包括上采样处理或下采样处理。
可选地,第一获取模块701用于在满足第一条件的情况下,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第一条件包括以下至少一项:
网络带宽小于或者等于第一阈值;
第一待处理图像的数据量大于或者等于第二阈值。
可选地,本申请实施例的图像处理装置700,还包括:
第四获取模块,用于基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
所述第二待处理图像包括所述第一图像的至少一个第三特征图或者包括所述第一子视频的至少一个第三子视频。
可选地,所述第四获取模块用于在满足第二条件的情况下,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
其中,所述第二条件包括以下至少一项:
网络带宽大于第一阈值;
第二待处理图像的数据量小于第二阈值。
可选地,所述第二压缩网络包括第二压缩编码网络、第二处理单元和第二压缩解码网络;
所述第四获取模块包括:
第六获取子模块,用于根据所述第二压缩编码网络对所述第二待处理图像进行编码,得到第二变量;
第七获取子模块,用于基于所述第二处理单元对所述第二变量进行量化、算术编码、算术解码和反量化,得到解码后的第二变量;
第八获取子模块,用于基于所述第二压缩解码网络对所述解码后的第二变量进行解码处理,得到重建后的第三特征图或第三子视频。
可选地,所述第二压缩网络是通过速率损失函数和失真损失函数训练得到的。
可选地,所述第一视频为所述目标对象的多视角视频,或者,所述第一视频为所述目标对象的可伸缩视频。
可选地,所述图像特征包括分辨率和特征数量中的至少一项。
本申请实施例的图像处理装置,获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;第二特征图为第一图像的特征图,且重建后的第二特征图的图像特征与第一特征图的图像特征不同;重建后的第二子视频的图像与所述第一子视频的图像不同。通过上述第一压缩网络能够获取目标对象重建后的第二特征图或第二子视频,从而无需再对第二特征图或第二子视频进行编码传输,提高了编码效率,且基于压缩网络来获取重建后的第二特征图或第二子视频能够有效保证图像质量。
本申请实施例中的图像处理装置可以是电子设备,例如具有操作系统的电子设备,也可以是电子设备中的部件,例如集成电路或芯片。该电子设备可以是终端,也可以为除终端之外的其他设备。示例性的,终端可以包括但不限于上述所列举的终端的类型,其他设备可以为服务器、网络附属存储器(Network Attached Storage,NAS)等,本申请实施例不作具体限定。
本申请实施例提供的图像处理装置能够实现图1至图6的方法实施例实现的各个过程,并达到相同的技术效果,为避免重复,这里不再赘述。
可选地,如图8所示,本申请实施例还提供一种图像处理设备800,包括处理器801和存储器802,存储器802上存储有可在所述处理器801上运行的程序或指令,该程序或指令被处理器801执行时实现上述图像处理方法实施例的各个步骤,且能达到相同的技术效果。为避免重复,这里不再赘述。
本申请实施例还提供一种图像处理设备,包括处理器和通信接口,处理器用于获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;所述重建后的第二子视频的图像与所述第一子视频的图像不同。该设备实施例与上述方法实施例对应,上述方法实施例的各个实施过程和实现方式均可适用于该设备实施例中,且能达到相同的技术效果。具体地,图9为实现本申请实施例的一种图像处理设备的硬件结构示意图。该图像处理设备具体为终端900。
该终端900包括但不限于:射频单元901、网络模块902、音频输出单元903、输入单元904、传感器905、显示单元906、用户输入单元907、接口单元908、存储器909以及处理器910等中的至少部分部件。
本领域技术人员可以理解,终端900还可以包括给各个部件供电的电源(比如电池),电源可以通过电源管理系统与处理器910逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图9中示出的终端结构并不构成对终端的限定,终端可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置,在此不再赘述。
应理解的是,本申请实施例中,输入单元904可以包括图形处理器(Graphics Processing Unit,GPU)9041和麦克风9042,图形处理器9041对在视频捕获模式或图像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。显示单元906可包括显示面板9061,可以采用液晶显示器、有机发光二极管等形式来配置显示面板9061。用户输入单元907包括触控面板9071以及其他输入设备9072中的至少一种。触控面板9071,也称为触摸屏。触控面板9071可包括触摸检测装置和触摸控制器两个部分。其他输入设备9072可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆,在此不再赘述。
本申请实施例中,射频单元901接收来自网络侧设备的下行数据后,可以传输给处理器910进行处理;另外,射频单元901可以向网络侧设备发送上行数据。通常,射频单元901包括但不限于天线、放大器、收发信机、耦合器、低噪声放大器、双工器等。
存储器909可用于存储软件程序或指令以及各种数据。存储器909可主要包括存储程序或指令的第一存储区和存储数据的第二存储区,其中,第一存储区可存储操作系统、至少一个功能所需的应用程序或指令(比如声音播放功能、图像播放功能等)等。此外,存储器909可以包括易失性存储器或非易失性存储器,或者,存储器909可以包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDRSDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synch link DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DRRAM)。本申请实施例中的存储器909包括但不限于这些和任意其它适合类型的存储器。
处理器910可包括一个或多个处理单元;可选地,处理器910集成应用处理器和调制解调处理器,其中,应用处理器主要处理涉及操作系统、用户界面和应用程序等的操作,调制解调处理器主要处理无线通信信号,如基带处理器。可以理解的是,上述调制解调处理器也可以不集成到处理器910中。
其中,处理器910,用于获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;
基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;
所述重建后的第二子视频的图像与所述第一子视频的图像不同。
在本申请实施例中,获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;第二特征图为第一图像的特征图,且重建后的第二特征图的图像特征与第一特征图的图像特征不同;第二子视频为第一视频中的部分视频,且重建后的第二子视频的图像与所述第一子视频的图像不同。通过上述第一压缩网络能够获取目标对象重建后的第二特征图或第二子视频,从而无需再对第二特征图或第二子视频进行编码传输,提高了编码效率,且基于压缩网络来获取重建后的第二特征图或第二子视频能够有效保证图像质量。
可选地,处理器910,还用于:
利用目标神经网络获取所述第一图像的多个特征图,所述目标神经网络为用于提取图像特征的神经网络;
在所述多个特征图中选取一个特征图作为所述第一特征图。
可选地,所述第一压缩网络包括第一压缩编码网络、第一处理单元和第一压缩解码网络;
处理器910,还用于:
基于所述第一压缩编码网络对所述第一待处理图像进行编码,得到第一变量;
基于所述第一处理单元对所述第一变量进行量化、算术编码、算术解码和反量化,得到解码后的第一变量;
基于所述第一压缩解码网络对所述解码后的第一变量进行解码处理,得到重建后的第二特征图或第二子视频。
可选地,所述第一压缩网络是通过速率损失函数和失真损失函数训练得到的。
可选地,处理器910,还用于:
基于预测复原网络对重建后的第二特征图进行目标处理,得到重建后的第一特征图;
或者,基于预测复原网络对重建后的第二子视频进行目标处理,得到重建后的第一子视频;
其中,所述预测复原网络是通过增强损失函数训练得到的。
可选地,所述目标处理包括采样处理和恢复残差处理,所述采样处理包括上采样处理或下采样处理。
可选地,处理器910,还用于:
在满足第一条件的情况下,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
其中,所述第一条件包括以下至少一项:
网络带宽小于或者等于第一阈值;
第一待处理图像的数据量大于或者等于第二阈值。
可选地,处理器910,还用于:
基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
可选地,处理器910,还用于:
在满足第二条件的情况下,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
其中,所述第二条件包括以下至少一项:
网络带宽大于第一阈值;
第二待处理图像的数据量小于第二阈值。
可选地,所述第二压缩网络包括第二压缩编码网络、第二处理单元和第二压缩解码网络;
处理器910,还用于:
根据所述第二压缩编码网络对所述第二待处理图像进行编码,得到第二变量;
基于所述第二处理单元对所述第二变量进行量化、算术编码、算术解码和反量化,得到解码后的第二变量;
基于所述第二压缩解码网络对所述解码后的第二变量进行解码处理,得到重建后的第三特征图或第三子视频。
可选地,所述第二压缩网络是通过速率损失函数和失真损失函数训练得到的。
可选地,所述第一视频为所述目标对象的多视角视频,或者,所述第一视频为所述目标对象的可伸缩视频。
可选地,所述图像特征包括分辨率和特征数量中的至少一项。
在本申请实施例中,获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;第二特征图为第一图像的特征图,且重建后的第二特征图的图像特征与第一特征图的图像特征不同;重建后的第二子视频的图像与所述第一子视频的图像不同。通过上述第一压缩网络能够获取目标对象重建后的第二特征图或第二子视频,从而无需再对第二特征图或第二子视频进行编码传输,提高了编码效率,且基于压缩网络来获取重建后的第二特征图或第二子视频能够有效保证图像质量。
本申请实施例还提供一种可读存储介质,该存储介质可以是易失的或非易失的,所述可读存储介质上存储有程序或指令,该程序或指令被处理器执行时实现上述图像处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
其中,所述处理器为上述实施例中所述的终端中的处理器。所述可读存储介质,包括计算机可读存储介质,如计算机只读存储器ROM、随机存取存储器RAM、磁碟或者光盘等。
本申请实施例另提供了一种芯片,所述芯片包括处理器和通信接口,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现上述图像处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
应理解,本申请实施例提到的芯片还可以称为系统级芯片,系统芯片,芯片系统或片上系统芯片等。
本申请实施例另提供了一种计算机程序/程序产品,所述计算机程序/程序产品被存储在存储介质中,所述计算机程序/程序产品被至少一个处理器执行以实现上述图像处理方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。此外,需要指出的是,本申请实施方式中的方法和装置的范围不限按示出或讨论的顺序来执行功能,还可包括根据所涉及的功能按基本同时的方式或按相反的顺序来执行功能,例如,可以按不同于所描述的次序来执行所描述的方法,并且还可以添加、省去、或组合各种步骤。另外,参照某些示例所描述的特征可在其他示例中被组合。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以计算机软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。

Claims (26)

  1. 一种图像处理方法,包括:
    获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;
    基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
    其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;
    所述重建后的第二子视频的图像与所述第一子视频的图像不同。
  2. 根据权利要求1所述的方法,其中,获取目标对象的第一图像的第一特征图,包括:
    利用目标神经网络获取所述第一图像的多个特征图,所述目标神经网络为用于提取图像特征的神经网络;
    在所述多个特征图中选取一个特征图作为所述第一特征图。
  3. 根据权利要求1所述的方法,其中,所述第一压缩网络包括第一压缩编码网络、第一处理单元和第一压缩解码网络;
    所述基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频,包括:
    基于所述第一压缩编码网络对所述第一待处理图像进行编码,得到第一变量;
    基于所述第一处理单元对所述第一变量进行量化、算术编码、算术解码和反量化,得到解码后的第一变量;
    基于所述第一压缩解码网络对所述解码后的第一变量进行解码处理,得到重建后的第二特征图或第二子视频。
  4. 根据权利要求1所述的方法,其中,所述第一压缩网络是通过速率损失函数和失真损失函数训练得到的。
  5. 根据权利要求1所述的方法,还包括:
    基于预测复原网络对重建后的第二特征图进行目标处理,得到重建后的第一特征图;
    或者,基于预测复原网络对重建后的第二子视频进行目标处理,得到重建后的第一子视频;
    其中,所述预测复原网络是通过增强损失函数训练得到的。
  6. 根据权利要求5所述的方法,其中,所述目标处理包括采样处理和恢复残差处理,所述采样处理包括上采样处理或下采样处理。
  7. 根据权利要求1所述的方法,其中,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频,包括:
    在满足第一条件的情况下,基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
    其中,所述第一条件包括以下至少一项:
    网络带宽小于或者等于第一阈值;
    第一待处理图像的数据量大于或者等于第二阈值。
  8. 根据权利要求1所述的方法,还包括:
    基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
    所述第二待处理图像包括所述第一图像的至少一个第三特征图或者包括所述第一子视频的至少一个第三子视频。
  9. 根据权利要求8所述的方法,其中,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频,包括:
    在满足第二条件的情况下,基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
    其中,所述第二条件包括以下至少一项:
    网络带宽大于第一阈值;
    第二待处理图像的数据量小于第二阈值。
  10. 根据权利要求8所述的方法,其中,所述第二压缩网络包括第二压缩编码网络、第二处理单元和第二压缩解码网络;
    所述基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频,包括:
    根据所述第二压缩编码网络对所述第二待处理图像进行编码,得到第二变量;
    基于所述第二处理单元对所述第二变量进行量化、算术编码、算术解码和反量化,得到解码后的第二变量;
    基于所述第二压缩解码网络对所述解码后的第二变量进行解码处理,得到重建后的第三特征图或第三子视频。
  11. 根据权利要求8所述的方法,其中,所述第二压缩网络是通过速率损失函数和失真损失函数训练得到的。
  12. 根据权利要求1所述的方法,其中,所述第一视频为所述目标对象的多视角视频,或者,所述第一视频为所述目标对象的可伸缩视频。
  13. 根据权利要求1所述的方法,其中,所述图像特征包括分辨率和特征数量中的至少一项。
  14. 一种图像处理装置,包括:
    第一获取模块,用于获取目标对象的第一待处理图像,所述第一待处理图像包括所述目标对象的第一图像的第一特征图或者包括所述目标对象的第一视频中的第一子视频;
    第二获取模块,用于基于第一压缩网络对所述第一待处理图像进行处理,获取重建后的第二特征图或第二子视频;
    其中,所述第二特征图为所述第一图像的特征图,且所述重建后的第二特征图的图像特征与所述第一特征图的图像特征不同;
    所述重建后的第二子视频的图像与所述第一子视频的图像不同。
  15. 根据权利要求14所述的装置,其中,所述第一获取模块包括:
    第一获取子模块,用于利用目标神经网络获取所述第一图像的多个特征图,所述目标神经网络为用于提取图像特征的神经网络;
    第二获取子模块,用于在所述多个特征图中选取一个特征图作为所述第一特征图。
  16. 根据权利要求14所述的装置,其中,所述第一压缩网络包括第一压缩编码网络、第一处理单元和第一压缩解码网络;
    所述第二获取模块包括:
    第三获取子模块,用于基于所述第一压缩编码网络对所述第一待处理图像进行编码,得到第一变量;
    第四获取子模块,用于基于所述第一处理单元对所述第一变量进行量化、算术编码、算术解码和反量化,得到解码后的第一变量;
    第五获取子模块,用于基于所述第一压缩解码网络对所述解码后的第一变量进行解码处理,得到重建后的第二特征图或第二子视频。
  17. 根据权利要求14所述的装置,其中,所述第一压缩网络是通过速率损失函数和失真损失函数训练得到的。
  18. 根据权利要求14所述的装置,其中,还包括:
    第三获取模块,用于基于预测复原网络对重建后的第二特征图进行目标处理,得到重建后的第一特征图;
    或者,基于预测复原网络对重建后的第二子视频进行目标处理,得到重建后的第一子视频;
    其中,所述预测复原网络是通过增强损失函数训练得到的。
  19. 根据权利要求18所述的装置,其中,所述目标处理包括采样处理和恢复残差处理,所述采样处理包括上采样处理或下采样处理。
  20. 根据权利要求14所述的装置,其中,还包括:
    第四获取模块,用于基于第二压缩网络对所述目标对象的第二待处理图像进行处理,得到重建后的第三特征图或第三子视频;
    所述第二待处理图像包括所述第一图像的至少一个第三特征图或者包括所述第一子视频的至少一个第三子视频。
  21. 根据权利要求20所述的装置,其中,所述第二压缩网络包括第二压缩编码网络、第二处理单元和第二压缩解码网络;
    所述第四获取模块包括:
    第六获取子模块,用于根据所述第二压缩编码网络对所述第二待处理图像进行编码,得到第二变量;
    第七获取子模块,用于基于所述第二处理单元对所述第二变量进行量化、算术编码、算术解码和反量化,得到解码后的第二变量;
    第八获取子模块,用于基于所述第二压缩解码网络对所述解码后的第二变量进行解码处理,得到重建后的第三特征图或第三子视频。
  22. 根据权利要求20所述的装置,其中,所述第二压缩网络是通过速率损失函数和失真损失函数训练得到的。
  23. 根据权利要求14所述的装置,其中,所述第一视频为所述目标对象的多视角视频,或者,所述第一视频为所述目标对象的可伸缩视频。
  24. 根据权利要求14所述的装置,其中,所述图像特征包括分辨率和特征数量中的至少一项。
  25. 一种图像处理设备,包括处理器和存储器,所述存储器存储可在所述处理器上运行的程序或指令,所述程序或指令被所述处理器执行时实现如权利要求1至13任一项所述的图像处理方法的步骤。
  26. 一种可读存储介质,所述可读存储介质上存储程序或指令,所述程序或指令被处理器执行时实现如权利要求1至13任一项所述的图像处理方法的步骤。
PCT/CN2023/123322 2022-10-13 2023-10-08 图像处理方法、装置及设备 WO2024078403A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211254689.1 2022-10-13
CN202211254689.1A CN117939157A (zh) 2022-10-13 2022-10-13 图像处理方法、装置及设备

Publications (1)

Publication Number Publication Date
WO2024078403A1 true WO2024078403A1 (zh) 2024-04-18

Family

ID=90668812

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/123322 WO2024078403A1 (zh) 2022-10-13 2023-10-08 图像处理方法、装置及设备

Country Status (2)

Country Link
CN (1) CN117939157A (zh)
WO (1) WO2024078403A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106791927A (zh) * 2016-12-23 2017-05-31 福建帝视信息科技有限公司 一种基于深度学习的视频增强与传输方法
CN111970513A (zh) * 2020-08-14 2020-11-20 成都数字天空科技有限公司 一种图像处理方法、装置、电子设备及存储介质
WO2022057837A1 (zh) * 2020-09-16 2022-03-24 广州虎牙科技有限公司 图像处理和人像超分辨率重建及模型训练方法、装置、电子设备及存储介质
CN114501013A (zh) * 2022-01-14 2022-05-13 上海交通大学 一种可变码率视频压缩方法、系统、装置及存储介质
CN114882350A (zh) * 2022-03-30 2022-08-09 北京市测绘设计研究院 图像处理方法及装置、电子设备和存储介质
CN114897711A (zh) * 2022-04-06 2022-08-12 厦门美图之家科技有限公司 一种视频中图像处理方法、装置、设备及存储介质
US20220286696A1 (en) * 2021-03-02 2022-09-08 Samsung Electronics Co., Ltd. Image compression method and apparatus

Also Published As

Publication number Publication date
CN117939157A (zh) 2024-04-26

Similar Documents

Publication Publication Date Title
US12136186B2 (en) Super resolution image processing method and apparatus
EP3571841B1 (en) Dc coefficient sign coding scheme
Chen et al. Dynamic measurement rate allocation for distributed compressive video sensing
KR101266667B1 (ko) 장치 내 제어기에서 프로그래밍되는 압축 방법 및 시스템
WO2022155974A1 (zh) 视频编解码以及模型训练方法与装置
US10277905B2 (en) Transform selection for non-baseband signal coding
WO2023279961A1 (zh) 视频图像的编解码方法及装置
Chen et al. Learning to compress videos without computing motion
CN107018416B (zh) 用于视频和图像压缩的自适应贴片数据大小编码
TW202324308A (zh) 圖像編解碼方法和裝置
CN116918329A (zh) 一种视频帧的压缩和视频帧的解压缩方法及装置
WO2024078066A1 (zh) 视频解码方法、视频编码方法、装置、存储介质及设备
WO2024078403A1 (zh) 图像处理方法、装置及设备
WO2023193629A1 (zh) 区域增强层的编解码方法和装置
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
WO2023133888A1 (zh) 图像处理方法、装置、遥控设备、系统及存储介质
WO2023133889A1 (zh) 图像处理方法、装置、遥控设备、系统及存储介质
CN116847087A (zh) 视频处理方法、装置、存储介质及电子设备
US20130308698A1 (en) Rate and distortion estimation methods and apparatus for coarse grain scalability in scalable video coding
CN115767085A (zh) 数据处理方法及其装置
WO2024131692A1 (zh) 图像处理方法、装置及设备
CN111491166A (zh) 基于内容分析的动态压缩系统及方法
WO2024222109A1 (zh) 一种编码方法、解码方法及相关设备
US20240380924A1 (en) Geometric transformations for video compression
WO2024007977A1 (zh) 图像处理方法、装置及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876605

Country of ref document: EP

Kind code of ref document: A1