CN115174967A

CN115174967A - Code rate dynamic allocation method based on bandwidth estimation

Info

Publication number: CN115174967A
Application number: CN202210782151.1A
Authority: CN
Inventors: 王聿隽; 姜蔚; 江晓
Original assignee: Shandong Henghao Information Technology Co ltd
Current assignee: Shandong Henghao Information Technology Co ltd
Priority date: 2022-07-04
Filing date: 2022-07-04
Publication date: 2022-10-11

Abstract

The invention provides a code rate dynamic allocation method based on bandwidth estimation, which comprises: constructing a bandwidth estimation neural network and estimating the total bandwidth value required to send each coded stream to be transmitted; establishing a correspondence between bandwidth and code rate, obtaining the estimated code rate from the estimated bandwidth, and calculating a target code rate based on the estimated code rate according to the position of the current frame and the network bandwidth condition; and setting a region division rule, dividing the video picture into a plurality of region blocks, and allocating a code rate to each region according to the coding priority set by the user and the target code rate. The invention addresses the problems that existing code rate allocation methods have poor robustness, allocate too slowly, show large allocation deviation, and cannot guarantee communication transmission quality.

Description

Code rate dynamic allocation method based on bandwidth estimation

Technical Field

The invention relates to the field of computers, in particular to a code rate dynamic allocation method based on bandwidth estimation.

Background

With the continuous development of digital media technology, video communication has become an important communication mode for people to communicate remotely, and in order to guarantee the video quality during communication of each client, a video meeting the network bandwidth requirement of the client needs to be transmitted to each client.

Rate allocation is a key module in efficient video coding, with the goal of improving the quality of reconstructed video at a suitable target bit rate. At a given bit rate, the visual quality will be optimized by a reasonable allocation of bits to the blocks of each frame. The existing code rate distribution method has the problems of poor robustness and excessively slow distribution efficiency, and the code rate distribution deviation is large, so that the communication transmission quality cannot be guaranteed.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the existing code rate distribution method has the defects of poor robustness, too slow distribution efficiency and larger code rate distribution deviation, and cannot ensure the communication transmission quality. Therefore, a code rate dynamic allocation method based on bandwidth estimation is provided.

The invention discloses a code rate dynamic allocation method based on bandwidth estimation, which comprises the following steps:

S1, constructing a bandwidth prediction neural network, and predicting the total bandwidth value required to send each coded stream to be transmitted;

S2, establishing a correspondence between bandwidth and code rate, obtaining the estimated code rate from the estimated bandwidth, and calculating a target code rate based on the estimated code rate according to the position of the current frame and the network bandwidth condition;

S3, setting a region division rule, dividing the video picture into a plurality of region blocks, and allocating a code rate to each region according to the coding priority set by the user and the target code rate.

Further, the step S1 includes:

constructing a bandwidth prediction neural network, and forming influence factors such as packet loss number, packet loss rate, round-trip delay and the like of each to-be-transmitted coded stream into an input vector

N represents the number of influence factors, and I denotes the I-th set of vectors. The bandwidth prediction neural network comprises an input layer, a time fusion layer, a space fusion layer, a homogenization layer, a circulation layer and an output layer.

The input of the input layer is the input vectors $S^I$ ($I \in (0, t)$) at t consecutive time points. The input layer preprocesses the input data, including normalization, padding and deduplication; these preprocessing operations are prior art and are not elaborated here. The number of neurons in the input layer is t, and the processed data are transmitted to the time fusion layer.

The time fusion layer fuses the input data according to the time sequence, and the fusion method comprises the following steps:

for the output of the time fusion layer, the number of neurons of the time fusion layer is N, each neuron after fusion represents an influence factor fused with t time points, and the fused data are transmitted to the space fusion layer;

the space fusion layer fuses input data in a space range, the space fusion layer is fully connected with the time fusion layer, and the fusion method comprises the following steps:

$OS^I$ represents the output of the space fusion layer. The space fusion layer has 1 neuron, which after fusion represents the fused data of all influence factors, and the fused data are transmitted to the homogenization layer;

the homogenization layer performs homogenization treatment on the fused data:

$OA$ denotes the output of the homogenization layer. The homogenization includes time-level homogenization and space-level homogenization, and the homogenization layer has only one neuron. A homogenization threshold $\varepsilon_A$ is set. If the homogenized data are less than the homogenization threshold, that is, $OA < \varepsilon_A$, the homogenization layer transmits the difference between the processed data and the homogenization threshold to the time fusion layer, and the fusion calculation is carried out again after the difference is added to the data of each neuron in the time fusion layer; otherwise, the homogenization layer transmits the processed data to the circulation layer. The homogenization threshold can be set according to actual conditions;

the circulation layer has only one neuron, and the activation formula is as follows:

wherein $OC^T$ represents the circulation layer output of the current time period, T represents an arbitrary time period, each time period consists of t time points, $\omega^T$ represents the connection weight between the homogenization layer and the circulation layer, $OA^T$ represents the homogenization layer output of the current time period, $OC^{T-1}$ represents the circulation layer output of the previous time period, whose initial value is 0 or is set according to the actual situation, and $b^T$ represents the bias parameter.

Further, the step S2 includes:

The target code rate is set according to the following rule: first, determine whether the current frame is the first frame of the video. If so, the estimated code rate is taken directly as the target code rate; otherwise, determine whether the network bandwidth meets the transmission standard. If the current network bandwidth meets the transmission standard, the code rate of the previous video frame is taken as the target code rate of the current video frame; otherwise, the code rate of the previous video frame is reduced and then used as the target code rate of the current video frame.
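The rule above is a three-way branch, which the following minimal Python sketch illustrates. The function name `target_code_rate`, its parameters and the `reduce_rate` callback are hypothetical names introduced here for illustration; the reduction step is left as a caller-supplied function because it is defined by the code reduction method of the invention.

```python
from typing import Callable, Optional


def target_code_rate(
    estimated_rate: float,
    previous_rate: Optional[float],
    bandwidth_ok: bool,
    reduce_rate: Callable[[float], float],
) -> float:
    """Pick the target code rate for the current frame.

    previous_rate is None for the first frame of the video.
    reduce_rate is a placeholder for the code reduction method.
    """
    if previous_rate is None:
        # First frame: use the estimated code rate directly.
        return estimated_rate
    if bandwidth_ok:
        # Bandwidth meets the transmission standard: reuse the previous rate.
        return previous_rate
    # Otherwise reduce the previous frame's rate before reusing it.
    return reduce_rate(previous_rate)
```

For example, a caller might pass `lambda r: 0.5 * r` as a stand-in reduction while experimenting; that halving factor is purely illustrative and is not taken from the invention.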

The code reduction method comprises the following steps:

wherein $R_i$ represents the target code rate of the i-th video frame, M represents the total number of frames of the video sequence, i is the frame index, D represents a distortion metric of the channel during transmission, a represents a constant,

representing the geometric mean of the mean square errors between the reconstructed transmission channel and the original transmission channel of the i-th frame when the code rate is zero, Y represents the available channel bandwidth during the transmission time, $j \in [0, i)$, and $R_j$ represents the code rate of any video frame before the i-th frame. The code rate of the current frame is reduced according to the relationship between the ratio of the code rates of the video pictures before and after communication and the difference of their quantization parameters.

Further, the step S3 includes:

the terminal receives the coding priority order, determines the quantization parameter value of each region according to the coding priority order,

where $k \in m$, m denotes the number of divided regions, $S(k)$ is a range concept value for each region, and $\lambda_k$ is the priority ranking of each region. The allocated code rate of each region is calculated from the quantization parameter value of each region and the target code rate:

The beneficial effects of the invention are:

1. A bandwidth prediction neural network is constructed in which the time fusion layer fuses the input data in time order and the space fusion layer fuses them by spatial position, which enhances the fusion effect; a homogenization threshold and a circulation threshold are set, which improves the estimation accuracy and ensures that bandwidth information is obtained in time.

2. The target code rate is calculated from the estimated code rate according to the position of the current frame and the network bandwidth condition, which avoids the problems of a single fixed code rate allocation method and of an excessive sum of the code rates of all frames, and improves the video communication quality.

3. Code rates are allocated according to the coding priority, which reduces the concentration of most of the code rate in a single region and reduces code rate waste. This guarantees the overall quality and definition of the coded video picture and improves both user stickiness and the flexibility of code rate allocation.

Drawings

FIG. 1 is a flow chart of a dynamic code rate allocation method based on bandwidth estimation according to the present invention;

FIG. 2 is a diagram of a bandwidth prediction neural network model structure according to the present invention.

Detailed Description

The following detailed description will be provided with reference to the drawings in the present embodiment, so that how to apply the technical means to solve the technical problems and achieve the technical effects can be fully understood and implemented. It should be noted that, as long as there is no conflict, the features in the embodiments of the present invention may be combined with each other, and the formed technical solutions are within the scope of the present invention.

Referring to fig. 1, a method for dynamically allocating a code rate based on bandwidth estimation according to the present invention includes:

when each frame of video picture is encoded, in order to enable the encoded video picture to be transmitted smoothly, code rates required for encoding the video pictures of different frames may be different, that is, the code rate for encoding the video pictures is changed dynamically. Firstly, the bandwidth of a current frame video to be transmitted is estimated, and the estimated code rate total value corresponding to the current frame video to be transmitted is searched according to the corresponding relation between the bandwidth and the code rate. And setting a target code rate according to the position of the current frame and the network bandwidth condition based on the obtained estimated code rate. Then setting a region division requirement, dividing the video picture into a plurality of regions, distributing corresponding code rates for the plurality of regions according to the coding priority and the target code rate of the plurality of regions, and coding the plurality of regions according to different code rates, thereby completing the coding of the video picture of the current frame.

S1, constructing a bandwidth estimation neural network, and estimating the total bandwidth value required by sending each coded stream to be transmitted.

S11, according to the packet loss number, the packet loss rate, the round-trip delay and the like of each coded stream to be transmitted when the video is transmitted, the total bandwidth value required by sending each coded stream to be transmitted can be estimated, and the estimated bandwidth can be obtained.

As shown in FIG. 2, a bandwidth prediction neural network is constructed, and the influence factors of each coded stream to be transmitted, such as the packet loss number, the packet loss rate and the round-trip delay, are formed into an input vector

N denotes the number of influence factors and I denotes the I-th set of vectors. The bandwidth prediction neural network comprises an input layer, a time fusion layer, a space fusion layer, a homogenization layer, a circulation layer and an output layer.
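As an illustration of how the influence factors could be arranged into input vectors, the following Python sketch builds one vector per time point and normalizes a window of t vectors. The type `StreamSample`, the functions `to_input_vector` and `normalize_window`, and the choice of min-max normalization are assumptions made for this sketch; the source only states that normalization, padding and deduplication are applied as prior-art preprocessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamSample:
    """Measurements for one coded stream at one time point (hypothetical type)."""
    lost_packets: int
    loss_rate: float
    round_trip_ms: float


def to_input_vector(sample: StreamSample) -> List[float]:
    """Arrange the influence factors into one input vector of length N = 3."""
    return [float(sample.lost_packets), sample.loss_rate, sample.round_trip_ms]


def normalize_window(window: List[List[float]]) -> List[List[float]]:
    """Min-max normalize each factor across the t vectors of a window."""
    columns = list(zip(*window))
    scaled_columns = []
    for column in columns:
        low, high = min(column), max(column)
        span = high - low
        # A constant factor carries no variation, so map it to 0.0.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [list(row) for row in zip(*scaled_columns)]
```

Each row of the result corresponds to one time point and each column to one influence factor, which matches the layout in which the time fusion layer later combines values of the same factor across time.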

The processed data are transmitted to the time fusion layer.

The time fusion layer fuses input data according to a time sequence, and the fusion method comprises the following steps:

$OS^I$ represents the output of the space fusion layer. The space fusion layer has 1 neuron, which after fusion represents the fused data of the influence factors, and the fused data are transmitted to the homogenization layer;

the homogenization layer is used for carrying out homogenization treatment on the fused data:

$OA$ denotes the output of the homogenization layer. The homogenization includes time-level homogenization and space-level homogenization, and the homogenization layer has only one neuron. A homogenization threshold $\varepsilon_A$ is set. If the homogenized data are less than the homogenization threshold, that is, $OA < \varepsilon_A$, the homogenization layer transmits the difference between the processed data and the homogenization threshold to the time fusion layer, and the fusion calculation is carried out again after the difference is added to the data of each neuron in the time fusion layer; otherwise, the homogenization layer transmits the processed data to the circulation layer. The homogenization threshold can be set according to actual conditions;
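The threshold feedback described above can be sketched as a small loop. This is only an illustration under stated assumptions: the arithmetic mean stands in for the fusion and homogenization formulas, the fed-back difference is taken as threshold minus output so that re-fusion moves the output toward the threshold, and `max_rounds` is a hypothetical safety bound. None of these choices is taken from the source.

```python
from statistics import fmean
from typing import List, Tuple


def homogenize_with_feedback(
    time_fused: List[float], threshold: float, max_rounds: int = 10
) -> Tuple[float, List[float]]:
    """Return the homogenized output and the adjusted time-fusion values."""
    values = list(time_fused)
    # Assumption: the mean stands in for space fusion plus homogenization.
    oa = fmean(values)
    for _ in range(max_rounds):
        if oa >= threshold:
            # Output reaches the threshold: hand over to the circulation layer.
            break
        # Feed the gap back and add it to every time-fusion neuron.
        gap = threshold - oa
        values = [v + gap for v in values]
        oa = fmean(values)
    return oa, values
```

With the mean as stand-in, a single feedback round is enough to reach the threshold; a different homogenization function could need several rounds, which is why the loop is bounded.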

S12, the circulation layer has only one neuron, and its activation formula is as follows:

wherein $OC^T$ represents the circulation layer output of the current time period, T represents an arbitrary time period, each time period consists of t time points, $\omega^T$ represents the connection weight between the homogenization layer and the circulation layer, $OA^T$ represents the homogenization layer output of the current time period, $OC^{T-1}$ represents the circulation layer output of the previous time period, whose initial value is 0 or is set according to the actual situation, and $b^T$ represents the bias parameter. An upper circulation limit and a lower circulation limit are set. If the circulation layer output lies between the two limits, the circulation layer transmits the processed data to the output layer. If the output exceeds the upper limit, a circulation instruction is sent to the space fusion layer, the difference between the circulation layer output and the upper circulation limit is introduced into the data fused by each neuron of the space fusion layer, and the data are then sent to the homogenization layer for recalculation. If the output is below the lower limit, the lower circulation limit is passed to the output layer as the circulation layer output.

The output layer outputs $Y = OC^T$, that is, the estimated total bandwidth value required to send each coded stream to be transmitted. The output error E of the bandwidth prediction neural network is calculated against the desired output. If the error E falls within the preset error range, training of the neural network is finished.
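The handling of the circulation layer output against its upper and lower limits can be sketched as follows. The linear update used for the activation is an assumption, since the activation formula itself is not reproduced in this text; the function name `circulation_step` and the action labels `"output"` and `"recirculate"` are likewise hypothetical.

```python
from typing import Tuple


def circulation_step(
    oa_t: float,
    oc_prev: float,
    weight: float,
    bias: float,
    lower: float,
    upper: float,
) -> Tuple[str, float]:
    """Update the circulation layer once and decide where the result goes."""
    # Assumed activation: weighted homogenization output plus previous state.
    oc_t = weight * oa_t + oc_prev + bias
    if oc_t > upper:
        # Above the upper limit: send the excess back to the space fusion layer.
        return "recirculate", oc_t - upper
    if oc_t < lower:
        # Below the lower limit: the lower limit itself becomes the output.
        return "output", lower
    # Within the limits: pass the value on to the output layer.
    return "output", oc_t
```

Returning the action together with the value keeps the sketch free of any dependence on how the space fusion layer consumes the fed-back difference.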

The beneficial effects of the bandwidth estimation neural network are: the time fusion layer fuses the input data in time order and the space fusion layer fuses them by spatial position, which enhances the fusion effect; a homogenization threshold and a circulation threshold are set, which improves the estimation accuracy and ensures that bandwidth information is obtained in time.

S2, establishing a corresponding relation between the bandwidth and the code rate, obtaining the estimated code rate according to the estimated bandwidth, and calculating the target code rate according to the position of the current frame and the network bandwidth condition based on the obtained estimated code rate.

S21, according to the Nyquist theorem and the Shannon theorem, establishing a relation between the bandwidth and the code element transmission rate:

$C = Y \log_2(1 + s/p)$

$V = 2Y \log_2 K$

where C is the channel capacity, Y is the channel bandwidth, s is the signal power, p is the noise power, V is the symbol transmission rate, and K is the number of phases of the polyphase modulation. Since the code rate R equals the product of the symbol transmission rate V and the number of binary digits corresponding to a single modulation state, the correspondence between bandwidth and code rate can be obtained. The code rate to be allocated for the estimated bandwidth is thereby obtained, which improves the efficiency of matching code rate to bandwidth.
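The relations in S21 can be checked numerically with a short Python sketch. The function names are hypothetical, the capacity function uses the standard Shannon formula $C = Y \log_2(1 + s/p)$, and the number of binary digits per modulation state is assumed to be $\log_2 K$.

```python
import math


def shannon_capacity(bandwidth: float, signal_power: float, noise_power: float) -> float:
    """Channel capacity C = Y * log2(1 + s/p)."""
    return bandwidth * math.log2(1.0 + signal_power / noise_power)


def symbol_rate(bandwidth: float, phases: int) -> float:
    """Symbol transmission rate V = 2 * Y * log2(K), as stated in S21."""
    return 2.0 * bandwidth * math.log2(phases)


def code_rate(bandwidth: float, phases: int) -> float:
    """Code rate R = V * (binary digits per modulation state)."""
    # Assumption: a K-phase modulation state carries log2(K) binary digits.
    bits_per_state = math.log2(phases)
    return symbol_rate(bandwidth, phases) * bits_per_state
```

In practice the correspondence could be tabulated once for the bandwidth values of interest, so that S22 reduces to a table lookup.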

S22, if the estimated total bandwidth value required to send each video to be transmitted is Y, the code rate R corresponding to the bandwidth Y is looked up in the correspondence between bandwidth and code rate; this is the estimated code rate of each video to be transmitted. Based on the obtained estimated code rate, the target code rate is calculated according to the position of the current frame and the network bandwidth condition.

The bandwidth transmission standard is set as follows: after the video is coded, the situations of data packet blockage, packet loss and the like do not occur, the network transmission speed cannot be smaller than a network transmission speed threshold value, and the network transmission speed threshold value is set according to actual requirements.
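The transmission standard amounts to a conjunction of three conditions, as in the following Python sketch. The type `NetworkStatus`, its field names and the function `meets_transmission_standard` are hypothetical; the speed threshold is a parameter because the source leaves it to actual requirements.

```python
from dataclasses import dataclass


@dataclass
class NetworkStatus:
    """Observed network state after a frame is coded (hypothetical type)."""
    congested: bool
    packets_lost: int
    speed_kbps: float


def meets_transmission_standard(status: NetworkStatus, min_speed_kbps: float) -> bool:
    """True when there is no blockage, no packet loss and enough speed."""
    return (
        not status.congested
        and status.packets_lost == 0
        and status.speed_kbps >= min_speed_kbps
    )
```

The boolean result is what decides, for every frame after the first, whether the previous frame's code rate is reused or reduced.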

The code reduction method comprises the following steps:

wherein $R_i$ represents the target code rate of the i-th video frame, M represents the total number of frames of the video sequence, i is the frame index, D represents a distortion metric of the channel during transmission, a represents a constant,

representing the geometric mean of the mean square errors between the reconstructed transmission channel and the original transmission channel of the i-th frame when the code rate is zero, Y represents the available channel bandwidth during the transmission time, $j \in [0, i)$, and $R_j$ represents the code rate of any video frame before the i-th frame. The code rate of the current frame is reduced according to the relationship between the ratio of the code rates of the video pictures before and after communication and the difference of their quantization parameters.

The beneficial effects of the method for calculating the target code rate are: the target code rate is calculated from the estimated code rate according to the position of the current frame and the network bandwidth condition, which avoids the problems of a single fixed code rate allocation method and of an excessive sum of the code rates of all frames, and improves the video communication quality.

S31, encoding a video at different code rates yields different picture quality, so the video picture needs to be divided into a plurality of region blocks. First, a region division rule is set for each video frame; the rule is set in advance by the user according to actual requirements while using the terminal.

Taking the human body region as an embodiment of the invention: when the user sets the region division rule, if the regions to be divided include a human body region, the region where the human body is located in the video picture is identified by a human body recognition algorithm, the corresponding parameters are set according to actual conditions, and that region is determined as the human body region. The human body recognition algorithm is an existing method and is not elaborated here.

Further, a static background area, a dynamic background area, and the like may also be selected as the target division area, and the static background area and the dynamic background area are identified from the video picture by using the prior art such as a motion estimation algorithm.

S32, when setting the region division rule, the user sets a corresponding coding priority for each region according to the degree of interest in that region. The coding priority instruction includes the description information and the priority order of each region. The terminal allocates a corresponding code rate to each region according to the coding priority of each region and the target code rate:

By combining the target code rate and the allocated code rates, the different regions of each video frame are encoded at different code rates so that requirements such as transmission are met, which completes the code rate allocation.
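To make the idea of priority-driven allocation concrete, the following Python sketch splits a target code rate across regions in proportion to a per-region weight. Proportional splitting is an assumption chosen for illustration; the allocation formula of the invention, which works through per-region quantization parameter values, is not reproduced in this text. The function name `allocate_region_rates` and the example weights are hypothetical.

```python
from typing import Dict


def allocate_region_rates(
    target_rate: float, priority_weights: Dict[str, float]
) -> Dict[str, float]:
    """Split target_rate across regions in proportion to their weights."""
    total = sum(priority_weights.values())
    if total <= 0:
        raise ValueError("priority weights must sum to a positive value")
    # Higher-priority regions receive a larger share of the target rate.
    return {
        region: target_rate * weight / total
        for region, weight in priority_weights.items()
    }
```

Under this stand-in rule the allocated rates always sum to the target code rate, so no single region can absorb most of the budget unless its weight dominates.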

The beneficial effects of the code rate allocation are: code rates are allocated according to the coding priority, which reduces the concentration of most of the code rate in a single region and reduces code rate waste. This guarantees the overall quality and definition of the coded video picture and improves both user stickiness and the flexibility of code rate allocation.

In conclusion, the method for dynamically allocating the code rate based on the bandwidth estimation is completed.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A code rate dynamic allocation method based on bandwidth estimation is characterized by comprising the following steps:

S1, constructing a bandwidth estimation neural network, and estimating the total bandwidth value required to send each coded stream to be transmitted;

S2, establishing a correspondence between bandwidth and code rate, obtaining an estimated code rate from the estimated bandwidth, and calculating a target code rate based on the obtained estimated code rate according to the position of the current frame and the network bandwidth condition;

S3, setting a region division rule, dividing a video picture into a plurality of region blocks, and allocating a code rate to each region according to the coding priority set by the user and the target code rate.

2. The method for dynamically allocating code rate based on bandwidth estimation as claimed in claim 1, wherein said step S1 comprises:

constructing a bandwidth prediction neural network, and forming influence factors such as packet loss number, packet loss rate, round-trip delay and the like of each to-be-transmitted coding stream into an input vector

$n \in N$, where N represents the number of influence factors and I denotes the I-th group of vectors; the bandwidth prediction neural network comprises an input layer, a time fusion layer, a space fusion layer, a homogenization layer, a circulation layer and an output layer;

the input of the input layer is the input vectors $S^I$ ($I \in (0, t)$) at t consecutive time points; the input layer preprocesses the input data, including operations such as normalization, padding and deduplication, these preprocessing operations being prior art and not described further here; the number of neurons in the input layer is t, and the processed data are transmitted to the time fusion layer;

the homogenization layer performs homogenization treatment on the fused data:

$OA$ represents the output of the homogenization layer, the homogenization including time-level homogenization and space-level homogenization, and the homogenization layer having only one neuron; a homogenization threshold $\varepsilon_A$ is set; if the homogenized data are less than the homogenization threshold, that is, $OA < \varepsilon_A$, the homogenization layer transmits the difference between the processed data and the homogenization threshold to the time fusion layer, and the fusion calculation is carried out again after the difference is added to the data of each neuron in the time fusion layer; otherwise, the homogenization layer transmits the processed data to the circulation layer; the homogenization threshold can be set according to actual conditions;

3. The method for dynamically allocating code rate based on bandwidth estimation as claimed in claim 2, wherein said step S2 comprises:

the target code rate is set according to the following rule: first, determining whether the current frame is the first frame of the video; if so, taking the estimated code rate directly as the target code rate; otherwise, determining whether the network bandwidth meets the transmission standard; if the current network bandwidth meets the transmission standard, taking the code rate of the previous video frame as the target code rate of the current video frame; otherwise, reducing the code rate of the previous video frame and using the result as the target code rate of the current video frame;

the code reduction method comprises the following steps:

representing the geometric mean of the mean square errors between the reconstructed transmission channel and the original transmission channel of the i-th frame when the code rate is zero, Y represents the available channel bandwidth during the transmission time, $j \in [0, i)$, and $R_j$ represents the code rate of any video frame before the i-th frame; and the code rate of the current frame is reduced according to the relationship between the ratio of the code rates of the video pictures before and after communication and the difference of their quantization parameters.

4. The method for dynamically allocating code rate based on bandwidth estimation as claimed in claim 3, wherein said step S3 comprises:

where $k \in m$, m denotes the number of divided regions, $S(k)$ is a range concept value for each region, and $\lambda_k$ is the priority ranking of each region; the allocated code rate of each region is calculated from the quantization parameter value of each region and the target code rate: