CN115695803B - Inter-frame image coding method based on extreme learning machine - Google Patents
Inter-frame image coding method based on extreme learning machine
- Publication number: CN115695803B (application CN202310000697.1A)
- Authority: CN (China)
- Prior art keywords: coding, frame, division, inter, intra
- Prior art date: 2023-01-03
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an inter-frame image coding method based on an extreme learning machine, which relates to the technical field of image processing and comprises the following steps: extracting the intra-frame coding frame of the current image group in a target video sequence; extracting, for each coding unit at each coding depth of the intra-frame coding frame, the feature vector related to coding-unit division together with the corresponding division result, as a training set; training the initialized ELM classifier under the dual optimization problem with the training set; acquiring the feature vector of a target coding unit after coding division at the current coding depth of a target inter-frame image in the current image group; and deciding the division mode from the feature vector with the trained ELM classifier. According to the invention, the feature relation is learned with the extreme learning machine, so the division decision no longer requires computing the rate-distortion cost of every candidate division mode; the coding computational complexity is therefore greatly reduced, and the coding efficiency is improved.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to an inter-frame image coding method based on an extreme learning machine.
Background
With the development of video technology, the previous-generation video compression coding standard HEVC has become difficult to meet people's growing demands. The newly proposed VVC encoder still follows the mainstream hybrid block-based coding framework, which mainly comprises intra-frame prediction, inter-frame prediction, transform, quantization, entropy coding, in-loop filtering and other modules. Although the change of coding standard improves coding efficiency, it also increases coding complexity and coding time, so the overall improvement is not ideal.
With the development of video technology and the demand for real-time coding, video coding acceleration has become a research hotspot. Coding-algorithm acceleration mainly improves on existing coding algorithms: by reducing the amount of computation, the algorithm complexity is lowered at the expense of some compression performance, and such improvements mainly target the coding-unit division process. These algorithms fall into two main categories: algorithms based on statistical analysis and algorithms based on machine learning. However, the existing methods do not balance coding quality and coding computational complexity well.
Disclosure of Invention
In order to better balance coding efficiency and coding complexity, the invention provides an inter-frame image coding method based on an extreme learning machine, which comprises the following steps:
s1: extracting an intra-frame coding frame in a current image group in a target video sequence;
s2: extracting, for each coding unit at each coding depth of the intra-frame coding frame, the feature vector related to coding-unit division together with the corresponding division result, as a training set;
s3: training the initialized ELM classifier under the dual optimization problem through a training set;
s4: acquiring a characteristic vector of a target coding unit after coding division under the current coding depth of a target inter-frame image in a current image group;
s5: deciding the division mode from the feature vector with the trained ELM classifier; if the maximum coding depth has not been reached, entering the next coding depth and returning to step S4.
Further, in the step S2, the feature vector of the coding unit includes a rate distortion cost, a coding depth, and a prediction residual.
Further, the ELM classifier includes an input layer with n neurons, a hidden layer with L neurons and an output layer with m neurons, where the value of n is the number of coding units in the intra-frame coding frame, the value of L is the globally optimal value obtained by training, and the value of m is the number of coding division modes of the current video coding standard.
Further, in the step S2, the training set is expressed as $\{(x_i, t_i)\}_{i=1}^{n}$, where $x_i$ is the input set numbered i, consisting of the feature vector of the i-th coding unit in the intra-coded frame, $T = \{t_1, t_2, \dots, t_n\}$ is the expected-value output set, and $t_i$ is the expected value corresponding to $x_i$.
Further, in the step S3, the dual optimization problem of the ELM classifier is expressed as the following formula:
$$L_{D} = \frac{1}{2}\left\| \beta \right\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\left\| \xi_i \right\|^{2} - \sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{i,j}\left( h(x_i)\beta_j - t_{i,j} + \xi_{i,j} \right)$$

where $L_D$ is the Lagrangian of the dual optimization problem, i and j are indices, $\beta_{i,j}$ is the connection weight between the i-th hidden-layer neuron and the j-th output-layer neuron, with $\beta$ an L×m matrix, C is a constant, $x_i$ is the input set numbered i consisting of the feature vector of the i-th coding unit in the intra-coded frame, $\xi_i$ is the error between the actual value and the expected value of the division mode corresponding to the input set $x_i$, $\alpha_{i,j}$ is the Lagrangian multiplier (greater than zero) between the i-th input and the j-th output, $h(x_i)$ is the response of all hidden-layer neurons to the input set $x_i$, $\beta_j$ is the vector of connection weights between the hidden-layer neurons and the j-th output-layer neuron, $t_{i,j}$ is the expected value of the division mode for the i-th input and the j-th output, and $\xi_{i,j}$ is the error between the actual value of the i-th input and the expected value of the j-th output.
Further, in the step S5, the trained ELM classifier is expressed as the following formula:
$$f(x) = h(x)\beta = h(x)H^{T}\left( \frac{I}{C} + HH^{T} \right)^{-1} T$$

where $f(x)$ is the output of the ELM classifier, $H$ is the hidden-layer output matrix for the training set of n input sets $x_i$, $T$ is the expected-value set, and the superscript T on a matrix denotes transposition of that matrix.
Further, in the step S5, the division pattern is obtained by the following formula:
$$\mathrm{label}(x) = \arg\max_{i \in \{1, \dots, m\}} f_i(x)$$

where x is the feature vector of the target coding unit and $\mathrm{label}(x)$ is the division-mode label of the target coding unit.
Compared with the prior art, the invention at least has the following beneficial effects:
(1) According to the inter-frame image coding method based on the extreme learning machine, the extreme learning machine learns the feature relation among the rate-distortion cost, the prediction residual, the coding depth and the coding division mode, so that in subsequent inter-frame coding the division mode can be predicted simply by acquiring the feature vector, and the division decision no longer requires computing the rate-distortion cost of every candidate division mode; the coding computational complexity is thus greatly reduced and the coding efficiency is improved;
(2) By utilizing the characteristic that the intra-frame coding frame contains complete coding information to learn the feature relation of the extreme learning machine, the training result can be ensured, to the greatest extent, to suit the division-mode prediction of the remaining inter-frame images of the current picture group, so the coding quality is ensured.
Drawings
FIG. 1 is a step diagram of an extreme learning machine-based inter-frame image encoding method;
FIG. 2 is a schematic diagram of a partitioning mode of a VVC standard;
fig. 3 is a schematic diagram of the structure of an ELM classifier.
Detailed Description
The following are specific embodiments of the present invention and the technical solutions of the present invention will be further described with reference to the accompanying drawings, but the present invention is not limited to these embodiments.
Example 1
Similar to the HEVC coding standard, the VVC coding standard uses a block-based coding scheme, in which each frame is first divided into Coding Tree Units (CTUs), and the CTUs are then further divided into smaller Coding Units (CUs). In the intra-prediction process, the obvious difference between the two is that HEVC allows only quadtree (QT) partitioning, whereas VVC introduces a quadtree with nested multi-type tree (QTMT) partition structure. As shown in fig. 2, a CU in VVC has 6 coding division modes: non-split (NS), quadtree split (QT), horizontal binary-tree split (BTH), vertical binary-tree split (BTV), horizontal ternary-tree split (TTH) and vertical ternary-tree split (TTV); an illustrative label mapping for these six modes is sketched after the method steps below. Compared with the earlier HEVC coding standard, the VVC coding standard therefore has four more division modes, and if the division mode were still obtained with the rate-distortion cost as the decision criterion, the computational complexity would undoubtedly increase roughly threefold, reducing the overall coding efficiency. Therefore, in order to effectively balance coding quality and coding efficiency, as shown in fig. 1, the present invention proposes an inter-frame image coding method based on an extreme learning machine, comprising the steps of:
s1: extracting an intra-frame coding frame in a current image group in a target video sequence;
s2: extracting, for each coding unit at each coding depth of the intra-frame coding frame, the feature vector related to coding-unit division together with the corresponding division result, as a training set;
s3: training the initialized ELM classifier under the dual optimization problem through a training set;
s4: acquiring a characteristic vector of a target coding unit after coding division under the current coding depth of a target inter-frame image in a current image group;
s5: deciding the division mode from the feature vector with the trained ELM classifier; if the maximum coding depth has not been reached, entering the next coding depth and returning to step S4.
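For reference, the six split modes listed above can be mapped to the classifier's m = 6 output classes. The following minimal Python sketch shows one such mapping; the specific label values and the enum name are an assumed convention for illustration, not something fixed by the patent or by the VVC specification.

```python
from enum import IntEnum

class SplitMode(IntEnum):
    """Assumed label convention for the m = 6 VVC division modes (illustrative only)."""
    NS = 0   # non-split
    QT = 1   # quadtree split
    BTH = 2  # horizontal binary-tree split
    BTV = 3  # vertical binary-tree split
    TTH = 4  # horizontal ternary-tree split
    TTV = 5  # vertical ternary-tree split

NUM_SPLIT_MODES = len(SplitMode)  # m = 6 output-layer neurons
```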
The extreme learning machine (Extreme Learning Machine, ELM) is a machine learning system built on a feedforward neural network; it is an improved algorithm based on the single-hidden-layer feedforward neural network (SLFN). The greatest advantage of the ELM algorithm over the conventional SLFN algorithm is that no parameters need to be updated during training, such as the weights between the input layer and the hidden layer or the thresholds of the hidden-layer neurons. Once the number of hidden-layer nodes is determined, the ELM algorithm obtains a unique globally optimal solution, so the method has good generalization and universal approximation capability.
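To make this property concrete, the sketch below trains a minimal single-hidden-layer ELM with NumPy: the input-to-hidden weights and biases are drawn at random and never updated, and only the hidden-to-output weights are solved in one step. The sigmoid activation, the default of L = 100 hidden neurons and the function names are assumptions made for this example, not values fixed by the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_hidden_layer(d, L, seed=0):
    """Randomly draw, then freeze, the input->hidden weights and biases."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, L))   # input weights, never updated
    b = rng.standard_normal(L)        # hidden-neuron biases, never updated
    return W, b

def hidden_output(X, W, b):
    """Hidden-layer response h(x) for each row of X; shape (n, L)."""
    return sigmoid(X @ W + b)

def train_basic_elm(X, T, L=100):
    """Basic (unregularized) ELM: solve H beta = T for the output weights."""
    W, b = init_hidden_layer(X.shape[1], L)
    H = hidden_output(X, W, b)          # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T        # hidden->output weights, one-step solution
    return W, b, beta
```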
Based on these excellent characteristics of the extreme learning machine, the present invention proposes to use it for the division-mode decision of inter-frame coding. In order to better enable the extreme learning machine to capture the relation between the division-mode decision and the related information in the inter-frame images, the invention selects the intra-frame coding frame in the current image group of the target video as the acquisition source of the training parameters. Since an intra-coded reference frame omits no coding information when compressed, it can be decoded on its own into a single complete video picture. The consecutive inter-frame images within the same picture group have content continuity, so the complete coding information of the intra-frame coding frame can be fully exploited to train the extreme learning machine on the logical relation between the relevant feature parameters and the coding-division-mode decision.
In order to make the training set more targeted and to avoid analyzing unnecessary features during the training of the extreme learning machine, the invention selects the rate-distortion cost and the prediction residual associated with coding division as features, and adds the coding depth, which also influences the division-mode decision, to form the feature vector $x_i$ (throughout this embodiment i denotes an index; $x_i$ is the input set numbered i, i.e. the feature vector of the i-th coding unit in the intra-coded frame). Therefore, when the extreme learning machine is trained, the training set consists of the feature vector of each coding unit at each coding depth of the intra-frame coding frame together with the corresponding division result.
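A minimal sketch of how such a feature vector might be assembled for one coding unit is given below. The helper name and the particular residual statistics are assumptions made for illustration; the patent itself only names the three feature categories (rate-distortion cost, prediction residual and coding depth).

```python
import numpy as np

def cu_feature_vector(rd_cost, depth, residual_block):
    """Assemble the feature vector x_i of one coding unit (illustrative sketch).
    rd_cost: rate-distortion cost already available for the CU,
    depth: current coding depth of the CU,
    residual_block: 2-D array of prediction residuals of the CU.
    """
    residual = np.asarray(residual_block, dtype=np.float64)
    return np.array([
        rd_cost,
        float(depth),
        residual.mean(),         # assumed summary statistics of the residual
        residual.std(),
        np.abs(residual).sum(),  # sum of absolute residuals (SAD-like)
    ])
```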
As shown in fig. 3, the neuron configuration of the ELM classifier structure includes an input layer with n neurons, a hidden layer with L neurons and an output layer with m neurons, where the value of n depends on the number of coding units in the current intra-frame coding frame, the value of L is the globally optimal value obtained by the final training, and the value of m is the number of coding division modes of the current video coding standard (taking the VVC coding standard as an example, m = 6). For each intra-coded frame, the input set of the extreme learning machine (i.e., the feature vector set) is set to $X = \{x_1, x_2, \dots, x_n\}$, $b_i$ is the bias of the i-th hidden-layer node, $g(\cdot)$ is the hidden-layer excitation function, a linear function is taken as the output-layer excitation function, and $h(x)$ denotes the response of all hidden-layer neurons to the input x. The expression of $h(x)$ is:

$$h(x) = \left[\, g(w_1 \cdot x + b_1),\ g(w_2 \cdot x + b_2),\ \dots,\ g(w_L \cdot x + b_L) \,\right] \qquad (1)$$

In formula (1), $w_i$ is the vector of connection weights between the input-layer neurons and the i-th hidden-layer neuron. The output of the network can then be obtained as:

$$f(x) = h(x)\beta \qquad (2)$$

In formula (2), $f(x)$ is the output of the output layer and $\beta = [\beta_{i,j}]$ is the L×m matrix whose entry $\beta_{i,j}$ is the connection weight between the i-th hidden-layer neuron and the j-th output-layer neuron.
According to the training objective of the extreme learning machine, the error between the actual output of the neural network and the expected output should satisfy, as far as possible, the formula:

$$\sum_{i=1}^{n}\left\| f(x_i) - t_i \right\| = 0 \qquad (3)$$

In formula (3), $T$ is the expected-value output set and $t_i$ is the expected value corresponding to $x_i$. From this formula it can be seen that there exist suitable $\beta_i$, $w_i$ and $b_i$ that satisfy the following formula:

$$H\beta = T \qquad (4)$$

In formula (4), $H$ is the output matrix of the hidden layer and $T$ is the expected-value set. It should be noted that, in this embodiment, the superscript T always denotes transposition of the matrix to which it is attached, which is different in meaning from the expected-value set T.
According to statistical learning theory, the actual risk consists of the empirical risk and the structural risk, so both the output weights and the actual error need to be minimized during the learning of the extreme learning machine, i.e. $\beta$ and $\xi$ are minimized, expressed by the formulas:

$$\min_{\beta,\ \xi}\ L_{ELM} = \frac{1}{2}\left\| \beta \right\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\left\| \xi_i \right\|^{2} \qquad (5)$$

$$\text{s.t.}\quad h(x_i)\beta = t_i^{T} - \xi_i^{T}, \qquad i = 1, \dots, n \qquad (6)$$

In formulas (5) and (6), $L_{ELM}$ is the objective to be minimized, C is a constant, $h(x_i)$ is the response of all hidden-layer neurons to the input set $x_i$, and $\xi_i$ is the error between the actual value and the expected value of the division mode corresponding to the input set $x_i$. According to the KKT theory (computing the minimum of the function under the constraints), training the ELM is equivalent to solving the following dual optimization problem:

$$L_{D} = \frac{1}{2}\left\| \beta \right\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\left\| \xi_i \right\|^{2} - \sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{i,j}\left( h(x_i)\beta_j - t_{i,j} + \xi_{i,j} \right) \qquad (7)$$

In formula (7), $L_D$ is the Lagrangian of the dual optimization problem, $\alpha_{i,j}$ is the Lagrangian multiplier (greater than zero) between the i-th input and the j-th output, $\beta_j$ is the vector of connection weights between the hidden-layer neurons and the j-th output-layer neuron, $t_{i,j}$ is the expected value of the division mode for the i-th input and the j-th output, and $\xi_{i,j}$ is the error between the actual value of the i-th input and the expected value of the j-th output.
Setting the corresponding partial derivatives to zero (minimizing with respect to the output weights and the errors) gives:

$$\frac{\partial L_{D}}{\partial \beta_j} = 0 \ \Rightarrow\ \beta_j = \sum_{i=1}^{n}\alpha_{i,j}\, h(x_i)^{T} \ \Rightarrow\ \beta = H^{T}\alpha \qquad (8)$$

$$\frac{\partial L_{D}}{\partial \xi_i} = 0 \ \Rightarrow\ \alpha_i = C\,\xi_i \qquad (9)$$

$$\frac{\partial L_{D}}{\partial \alpha_i} = 0 \ \Rightarrow\ h(x_i)\beta - t_i^{T} + \xi_i^{T} = 0 \qquad (10)$$

In formulas (8), (9) and (10), $\alpha_i$ is the vector of Lagrangian multipliers (greater than zero) between the i-th input and each output, and $\alpha$ is the Lagrangian multiplier matrix formed by the $\alpha_i$. Writing formula (10) in matrix form gives:

$$H\beta - T + \xi = 0 \qquad (11)$$

From formulas (8), (9) and (11) it can be obtained that:

$$\beta = H^{T}\left( \frac{I}{C} + HH^{T} \right)^{-1} T \qquad (12)$$
the output of the ELM classifier can be expressed as:
in the formula (13), it is setOutput function for the ith output node, i.e.. Then, with the trained ELM classifier, the mode of division acquisition can be performed by selecting the ELM classifier output with the highest probability:
in equation (14), x is the eigenvector of the target coding unit,and (5) a partition mode label for the target coding unit.
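Formulas (12)–(14) translate directly into a few lines of linear algebra. The sketch below assumes that the hidden-layer output matrix H and the expected-value set T have already been built from the training coding units (for example with the helpers sketched earlier) and that n is small enough for the n×n solve to be practical; the function names and the default C are illustrative assumptions.

```python
import numpy as np

def solve_output_weights(H, T, C=1.0):
    """Formula (12): beta = H^T (I/C + H H^T)^(-1) T  (n x n form, sketch)."""
    n = H.shape[0]
    return H.T @ np.linalg.solve(np.eye(n) / C + H @ H.T, T)

def elm_output(h_x, beta):
    """Formula (13): f(x) = h(x) beta, one score per division mode."""
    return h_x @ beta

def predict_split_mode(h_x, beta):
    """Formula (14): pick the output node with the largest response."""
    return int(np.argmax(elm_output(h_x, beta)))
```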
In general, the intra-frame coding frame of each image group is used to train the ELM classifier through feature extraction and learning, yielding formula (13); the feature vectors of the target coding units after coding division at the different coding depths of the subsequent inter-frame images of the same image group are then extracted, and the division mode is decided with formula (14), so that inter-frame image coding is realized with higher efficiency on the premise of ensuring coding quality.
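Putting the pieces together, a hedged sketch of this per-group-of-pictures flow might look as follows. It reuses the helpers sketched earlier (SplitMode, init_hidden_layer, hidden_output, solve_output_weights, cu_feature_vector); the encoder-side hooks extract_icoded_cu_samples, encode_cu_with_mode and split_cu, as well as the gop/frame/cu attributes, are hypothetical placeholders for the host encoder's own routines rather than anything defined by the patent.

```python
import numpy as np

def code_group_of_pictures(gop, max_depth, L=100, C=1.0):
    """Sketch of steps S1-S5: train on the intra-coded frame of the group of
    pictures, then decide split modes for inter-frame CUs without a
    rate-distortion search over all candidate division modes."""
    # S1-S3: build the training set from the I-frame and train the classifier.
    X, T = extract_icoded_cu_samples(gop.intra_frame)  # features + one-hot split results
    W, b = init_hidden_layer(X.shape[1], L)
    H = hidden_output(X, W, b)
    beta = solve_output_weights(H, T, C)

    # S4-S5: for every inter frame, predict the split mode depth by depth.
    for frame in gop.inter_frames:
        pending = [(cu, 0) for cu in frame.root_coding_units]
        while pending:
            cu, depth = pending.pop()
            x = cu_feature_vector(cu.rd_cost, depth, cu.prediction_residual)
            mode = int(np.argmax(hidden_output(x[None, :], W, b) @ beta))
            encode_cu_with_mode(cu, mode)
            if mode != SplitMode.NS and depth + 1 < max_depth:
                pending.extend((child, depth + 1) for child in split_cu(cu, mode))
```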
In summary, in the inter-frame image coding method based on the extreme learning machine, the extreme learning machine learns the feature relation among the rate-distortion cost, the prediction residual, the coding depth and the coding division mode, so that in subsequent inter-frame coding the division mode can be predicted simply by acquiring the feature vector.
By utilizing the characteristic that the intra-frame coding frame contains complete coding information to learn the feature relation of the extreme learning machine, the training result can be ensured, to the greatest extent, to suit the division-mode prediction of the remaining inter-frame images of the current picture group, so the coding quality is ensured.
It should be noted that all directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are merely used to explain the relative positional relationship, movement, etc. between the components in a particular posture (as shown in the drawings), and if the particular posture is changed, the directional indicator is changed accordingly.
Furthermore, descriptions such as those referred to herein as "first," "second," "a," and the like are provided for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present invention, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
In the present invention, unless specifically stated and limited otherwise, the terms "connected," "affixed," and the like are to be construed broadly, and for example, "affixed" may be a fixed connection, a removable connection, or an integral body; can be mechanically or electrically connected; either directly or indirectly, through intermediaries, or both, may be in communication with each other or in interaction with each other, unless expressly defined otherwise. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
In addition, the technical solutions of the embodiments of the present invention may be combined with each other, but it is necessary to be based on the fact that those skilled in the art can implement the technical solutions, and when the technical solutions are contradictory or cannot be implemented, the combination of the technical solutions should be considered as not existing, and not falling within the scope of protection claimed by the present invention.
Claims (4)
1. An inter-frame image coding method based on an extreme learning machine is characterized by comprising the following steps:
s1: extracting an intra-frame coding frame in a current image group in a target video sequence;
s2: extracting, for each coding unit at each coding depth of the intra-frame coding frame, the feature vector related to coding-unit division together with the corresponding division result, as a training set;
s3: training the initialized ELM classifier under the dual optimization problem through a training set;
s4: acquiring a characteristic vector of a target coding unit after coding division under the current coding depth of a target inter-frame image in a current image group;
s5: deciding the division mode from the feature vector with the trained ELM classifier; if the maximum coding depth has not been reached, entering the next coding depth and returning to step S4;
in the step S3, the dual optimization problem of the ELM classifier is expressed as the following formula:
$$L_{D} = \frac{1}{2}\left\| \beta \right\|^{2} + \frac{C}{2}\sum_{i=1}^{n}\left\| \xi_i \right\|^{2} - \sum_{i=1}^{n}\sum_{j=1}^{m}\alpha_{i,j}\left( h(x_i)\beta_j - t_{i,j} + \xi_{i,j} \right)$$

where $L_D$ is the Lagrangian of the dual optimization problem, i and j are indices, $\beta_{i,j}$ is the connection weight between the i-th hidden-layer neuron and the j-th output-layer neuron, with $\beta$ an L×m matrix, C is a constant, $x_i$ is the input set numbered i consisting of the feature vector of the i-th coding unit in the intra-coded frame, $\xi_i$ is the error between the actual value and the expected value of the division mode corresponding to the input set $x_i$, $\alpha_{i,j}$ is the Lagrangian multiplier (greater than zero) between the i-th input and the j-th output, $h(x_i)$ is the response of all hidden-layer neurons to the input set $x_i$, $\beta_j$ is the vector of connection weights between the hidden-layer neurons and the j-th output-layer neuron, $t_{i,j}$ is the expected value of the division mode for the i-th input and the j-th output, and $\xi_{i,j}$ is the error between the actual value of the i-th input and the expected value of the j-th output;
in the step S5, the trained ELM classifier is expressed as the following formula:
$$f(x) = h(x)\beta = h(x)H^{T}\left( \frac{I}{C} + HH^{T} \right)^{-1} T$$

where $f(x)$ is the output of the ELM classifier, $H$ is the hidden-layer output matrix for the training set of n input sets $x_i$, $T$ is the expected-value set, and the superscript T on a matrix denotes transposition of that matrix;
the partition pattern is obtained by the following formula:
2. The method for coding an inter-frame image based on an extreme learning machine according to claim 1, wherein in the step S2, the feature vector of the coding unit includes a rate distortion cost, a coding depth, and a prediction residual.
3. The method for coding an inter-frame image based on an extreme learning machine according to claim 1, wherein the ELM classifier comprises an input layer with n neurons, a hidden layer with L neurons and an output layer with m neurons, wherein the value of n is the number of coding units in the intra-frame coding frame, the value of L is the globally optimal value obtained through training, and the value of m is the number of coding division modes of the current video coding standard.
4. The method for encoding an inter-frame image based on an extreme learning machine as claimed in claim 3, wherein in said step S2, the training set is expressed as $\{(x_i, t_i)\}_{i=1}^{n}$, where $x_i$ is the input set numbered i, consisting of the feature vector of the i-th coding unit in the intra-coded frame, $T = \{t_1, t_2, \dots, t_n\}$ is the expected-value output set, and $t_i$ is the expected value corresponding to $x_i$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310000697.1A CN115695803B (en) | 2023-01-03 | 2023-01-03 | Inter-frame image coding method based on extreme learning machine |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310000697.1A CN115695803B (en) | 2023-01-03 | 2023-01-03 | Inter-frame image coding method based on extreme learning machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115695803A CN115695803A (en) | 2023-02-03 |
CN115695803B true CN115695803B (en) | 2023-05-12 |
Family
ID=85057258
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310000697.1A Active CN115695803B (en) | 2023-01-03 | 2023-01-03 | Inter-frame image coding method based on extreme learning machine |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115695803B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115834885B (en) * | 2023-02-17 | 2023-06-13 | 宁波康达凯能医疗科技有限公司 | Inter-frame image coding method and system based on sparse representation |
CN116634150B (en) * | 2023-07-21 | 2023-12-12 | 宁波康达凯能医疗科技有限公司 | Inter-frame image coding method, device and storage medium based on frequent pattern classification |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020077198A1 (en) * | 2018-10-12 | 2020-04-16 | Kineticor, Inc. | Image-based models for real-time biometrics and marker-less motion tracking in imaging applications |
CN114513660A (en) * | 2022-04-19 | 2022-05-17 | 宁波康达凯能医疗科技有限公司 | Interframe image mode decision method based on convolutional neural network |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009000110A1 (en) * | 2007-06-27 | 2008-12-31 | Thomson Licensing | Method and apparatus for encoding and/or decoding video data using enhancement layer residual prediction for bit depth scalability |
EP2600531A1 (en) * | 2011-12-01 | 2013-06-05 | Thomson Licensing | Method for determining a modifiable element in a coded bit-stream and associated device |
WO2015190839A1 (en) * | 2014-06-11 | 2015-12-17 | 엘지전자(주) | Method and device for encodng and decoding video signal by using embedded block partitioning |
CN108268941B (en) * | 2017-01-04 | 2022-05-31 | 意法半导体股份有限公司 | Deep convolutional network heterogeneous architecture |
CN112291562B (en) * | 2020-10-29 | 2022-06-14 | 郑州轻工业大学 | Fast CU partition and intra mode decision method for H.266/VVC |
CN114584771B (en) * | 2022-05-06 | 2022-09-06 | 宁波康达凯能医疗科技有限公司 | Method and system for dividing intra-frame image coding unit based on content self-adaption |
Also Published As
Publication number | Publication date |
---|---|
CN115695803A (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN115695803B (en) | Inter-frame image coding method based on extreme learning machine | |
CN107396124B (en) | Video-frequency compression method based on deep neural network | |
CN106713935B (en) | A kind of HEVC block division fast method based on Bayesian decision | |
CN110087087A (en) | VVC interframe encode unit prediction mode shifts to an earlier date decision and block divides and shifts to an earlier date terminating method | |
CN112004085B (en) | Video coding method under guidance of scene semantic segmentation result | |
CN107071421B (en) | A kind of method for video coding of combination video stabilization | |
JP2004514351A (en) | Video coding method using block matching processing | |
CN112001950B (en) | Multi-target tracking algorithm based on target detection and feature extraction combined model | |
CN106454349B (en) | A kind of estimation block matching method based on H.265 Video coding | |
CN103327327B (en) | For the inter prediction encoding unit selection method of high-performance video coding HEVC | |
CN108924558A (en) | A kind of predictive encoding of video method neural network based | |
CN114079779A (en) | Image processing method, intelligent terminal and storage medium | |
CN110312130A (en) | Inter-prediction, method for video coding and equipment based on triangle model | |
CN110062239A (en) | A kind of reference frame selecting method and device for Video coding | |
CN104601992A (en) | SKIP mode quickly selecting method based on Bayesian minimum hazard decision | |
CN113507609B (en) | Interframe image parallel coding method based on time-space domain prediction | |
CN115618051A (en) | Internet-based smart campus monitoring video storage method | |
CN115955574B (en) | Method, device and storage medium for encoding intra-frame image based on weight network | |
CN117176960A (en) | Convolutional neural network chroma prediction coding method with multi-scale position information embedded | |
CN102647595A (en) | AVS (Audio Video Standard)-based sub-pixel motion estimation device | |
CN117640931A (en) | VVC intra-frame coding rapid block division method based on graph neural network | |
CN105306952B (en) | A method of it reducing side information and generates computation complexity | |
CN114513660B (en) | Interframe image mode decision method based on convolutional neural network | |
CN110139099A (en) | Inter-frame forecast mode selection method based on precoding and coding SATD value weighting | |
CN109889838A (en) | A kind of HEVC fast encoding method based on ROI region |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |