WO2024139166A1 - Video coding method and apparatus, and electronic device and storage medium - Google Patents
- Publication number
- WO2024139166A1 (PCT/CN2023/106615)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- encoded
- preset
- coding
- video coding
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/146—Data rate or code amount at the encoder output
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/136—Incoming video signal characteristics or properties
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/134—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
- H04N19/142—Detection of scene cut or scene change
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/503—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/50—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
- H04N19/593—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques
Definitions
- an electronic device comprising: at least one processor; and at least one memory communicatively connected to the at least one processor, wherein the at least one memory stores a computer program, and the computer program implements the above-mentioned video encoding method when executed by the at least one processor.
- the second prediction of the rate factor is based on the rate factor of the video coding standard with the shortest encoding time in the first prediction result. Since the computational complexity of the video coding standard with the shortest encoding time is the lowest, the rate factor under this video coding standard can reflect the accuracy of the rate factors under other video coding standards with longer encoding time to a certain extent, thereby associating different video coding standards, which is conducive to further reducing the amount of calculation required to predict the rate factor.
- FIG1 shows a flowchart of a video encoding method according to some embodiments of the present disclosure
- FIG2 shows a flowchart of a method for training a first rate factor prediction model according to some embodiments of the present disclosure
- FIG3 shows a structural block diagram of a first rate factor prediction model according to some embodiments of the present disclosure
- FIG4 shows a flowchart of determining multiple target bit rate factors of a video segment to be encoded under multiple preset video coding standards according to some embodiments of the present disclosure
- FIG5 shows a structural block diagram of a video encoding device according to an embodiment of the present disclosure
- step S110, obtaining a video segment to be encoded, may include: obtaining a video to be encoded; performing scene detection on the video to be encoded; dividing the video to be encoded into one or more sub-video segments based on the scene detection result; and identifying each of the one or more sub-video segments as a video segment to be encoded.
- scene detection can be performed on the video to be encoded based on an open source encoder.
- the open source encoder x264 can be called to encode the video V, and the scene detection switch can be turned on to obtain a corresponding scene switching detection result.
- the video V can be divided into t independent single-shot video segments (V1, V2...Vt), and each single-shot video segment can be used as a video segment to be encoded.
- the content shown in a single-shot video clip is usually similar and more coherent. Therefore, when encoding the video, it is more reasonable to use a specific and identical bitrate factor for each single-shot video clip, thereby ensuring a stable picture quality experience in all scenarios.
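The segmentation step can be sketched as follows. This is a minimal illustration, assuming the scene detector has already returned the frame indices at which a new shot begins. The function name `split_by_scene_cuts` and the input format are hypothetical, and the x264 invocation itself is not shown.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def split_by_scene_cuts(frames: Sequence[Frame], cut_indices: Sequence[int]) -> List[List[Frame]]:
    """Split a frame sequence into single-shot segments.

    cut_indices holds the index of the first frame of each new shot
    (hypothetical format; index 0 is implied and may be omitted).
    """
    # Keep only valid, unique boundaries; the first segment always starts at frame 0.
    bounds = sorted({i for i in cut_indices if 0 < i < len(frames)})
    starts = [0] + bounds
    ends = bounds + [len(frames)]
    return [list(frames[s:e]) for s, e in zip(starts, ends) if e > s]
```

Each returned list corresponds to one video segment V1...Vt and would be encoded with its own bitrate factor.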
- the inter-frame coding information of each video frame pair can also be extracted. For example, for a video frame pair (f 1 , f 2 ), the inter-frame coding information of f 1 and f 2 is calculated; for a video frame pair (f 2 , f 3 ), the inter-frame coding information of f 2 and f 3 is calculated, and so on.
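The pairing in the example above can be illustrated with a short sketch. It assumes that pairs are formed from consecutive frames, as in (f1, f2), (f2, f3); any additional preprocessing applied before pairing is not shown, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_frame_pairs(frames: Sequence[Frame]) -> List[Tuple[Frame, Frame]]:
    """Return consecutive frame pairs (f1, f2), (f2, f3), ..."""
    # zip stops at the shorter input, so a single frame yields no pairs.
    return list(zip(frames, frames[1:]))
```

Inter-frame coding information would then be computed once per returned pair.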
- Examples of multiple preset video coding standards may include H.264 video coding standard, H.265 video coding standard, H.266 video coding standard, AV1 video coding standard. It will be understood that the preset video standard may also include any other existing video coding standard or video coding standard generated as technology develops. It will also be understood that the number of multiple preset video coding standards may be any integer value that meets actual needs. The scope of the subject matter claimed in the present disclosure is not limited in both aspects.
- the sample rate factor may be a real rate factor, that is, the quality of the encoded video obtained after encoding the sample video segment using the rate factor meets the target quality requirement.
- the quality of the encoded video may be evaluated, for example, by a video coding score (also known as Video Multimethod Assessment Fusion, VMAF).
- the value of VMAF normally ranges from 0 to 100; the smaller the value of VMAF, the lower the quality of the encoded video.
- the real bit rate factor can be obtained by the following steps: for the video coding standard, selecting an initial sample bit rate factor and a preset sample video coding score range; based on the initial sample bit rate factor, using a corresponding encoder in a real transcoding system (for example, an H.264 encoder, H.265 encoder, H.266 encoder, AV1 encoder, etc.) to encode the sample video clip to obtain a sample pre-encoded video clip under the video coding standard; comparing the sample video coding score of the obtained sample pre-encoded video clip with the preset sample video coding score range; if the sample video coding score is within the preset sample video coding score range (i.e., the encoded video meets the target quality requirements), it indicates that the initial sample rate factor under the video coding standard is feasible, and the initial sample rate factor can be used as the real rate factor; if the sample video coding score is outside the preset sample video coding score range (i.e., the encode
- initial parameters of a first rate factor prediction model may be selected for a plurality of video coding standards, and based on the initial parameters, a plurality of sample prediction rate factors of sample video clips under the plurality of video coding standards may be determined using the acquired sample spatiotemporal domain feature information.
- a model loss value may be calculated, for example via a loss function, based on the corresponding sample rate factor and the sample prediction rate factor.
- loss functions include, but are not limited to, a cross entropy loss function, a maximum loss function, an average loss function, a 0-1 loss function, etc. It will be understood that in the present disclosure, other suitable methods may also be used to calculate the model loss value, and the scope of the subject matter claimed in the present disclosure is not limited in this respect.
- in step S240, after the model loss value under each preset video coding standard is calculated, these model loss values can be back-propagated and the parameters of the first rate factor prediction model adjusted; steps S220-S240 are then repeated until the model training stop condition is reached.
- the model training stop condition can be that the model loss value is lower than a preset threshold, and/or the number of rounds of model training reaches a preset number of rounds. It will be understood that the present disclosure does not limit the model training stop condition, and it can be adjusted according to actual needs.
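The training loop and its stop condition can be sketched as follows. This is only an illustration under stated assumptions: a linear model with one weight vector per preset standard stands in for the actual prediction network, mean squared error stands in for the loss function, and the names `train`, `loss_threshold` and `max_rounds` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# (sample feature vector, real rate factor per standard); hypothetical layout.
Sample = Tuple[Sequence[float], Dict[str, float]]


def train(samples: List[Sample], standards: Sequence[str], lr: float = 0.01,
          loss_threshold: float = 1e-3, max_rounds: int = 1000
          ) -> Tuple[Dict[str, List[float]], float, int]:
    dim = len(samples[0][0])
    # One weight vector per preset standard (stand-in for the real network).
    weights = {s: [0.0] * dim for s in standards}
    loss = float("inf")
    rounds = 0
    while rounds < max_rounds:  # stop condition 2: preset number of rounds
        grads = {s: [0.0] * dim for s in standards}
        loss = 0.0
        for features, targets in samples:
            for s in standards:
                pred = sum(w * x for w, x in zip(weights[s], features))
                err = pred - targets[s]
                loss += err * err  # assumed loss: squared error
                for j, x in enumerate(features):
                    grads[s][j] += 2.0 * err * x
        n = len(samples) * len(standards)
        loss /= n
        if loss < loss_threshold:  # stop condition 1: loss below threshold
            break
        # Gradient step: adjust the parameters using the accumulated gradients.
        for s in standards:
            for j in range(dim):
                weights[s][j] -= lr * grads[s][j] / n
        rounds += 1
    return weights, loss, rounds
```

Either condition ends training, matching the "and/or" wording above; in practice the thresholds would be tuned to the data.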
- the screened features are input to the residual module 330, which may include multiple convolutional layers that transform the screened features; the unchanged screened features also skip these convolutional layers and are passed directly to the subsequent layers, where the two are processed together to obtain the output of the residual module, thereby reducing information loss.
- the output of the residual module 330 can be further input to the second attention mechanism module 340 to achieve further screening of the features, thereby further improving the accuracy of the prediction results.
- the input features may be aggregated via the fully connected module 350 to generate and output a predicted bit rate factor.
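The skip connection of the residual module can be illustrated with a minimal sketch. It assumes that the transformed and untransformed features are combined by element-wise addition, which is the common form of a residual connection but is not stated in the text; the layers are passed in as plain callables standing in for convolutional layers.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def residual_block(x: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Apply the layers to x, then add the untouched input back in."""
    out = list(x)
    for layer in layers:
        out = layer(out)
    if len(out) != len(x):
        raise ValueError("layers must preserve the feature length")
    # Skip path: the original features bypass the layers and are summed in.
    return [a + b for a, b in zip(x, out)]
```

Because the input is carried through unchanged, information that the layers would otherwise discard remains available to the following modules.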
- FIG4 shows a flowchart of determining multiple target rate factors of a video segment to be encoded under multiple preset video coding standards according to some embodiments of the present disclosure.
- step S140 determines, based on a first predicted rate factor of a first preset video coding standard among multiple preset video coding standards and a preset video coding score interval, multiple target rate factors of a video segment to be encoded under the multiple preset video coding standards.
- the first preset video coding standard refers to the video coding standard that takes the shortest time to perform video coding.
- the video coding standard H.264 takes the shortest time to perform video coding, so the video coding standard H.264 is the first preset video coding standard.
- the encoding parameters used when encoding the video to be encoded may also include, but are not limited to, preset, GOP (Group of Pictures) size, etc. Since the bit rate factor plays an important role in the quality of the encoded video, when encoding the video using the method of the present disclosure, the values of these encoding parameters can be set to fixed constants (for example, default preset values).
- the rate factor under this video coding standard can, to a certain extent, reflect the accuracy of the rate factors under other video coding standards with longer encoding times.
- the video coding score of the video clip encoded based on the first predicted rate factor corresponding to the first preset video coding standard (for example, the video coding standard H.264 in the above example) is within the preset video coding score range (meeting the video quality requirement)
- the accuracy of other predicted rate factors corresponding to other preset video coding standards for example, the video coding standards H.265, H.266 and AV1 in the above example
- the video coding scores of the encoded videos under the video coding standards H.265, H.266 and AV1 also have a
- the video coding scores of the video clips encoded using these predicted bitrate factors are usually also within the preset video coding score range (meeting the video quality requirements), so that different video coding standards can be associated, which is conducive to reducing the amount of calculation required for the predicted bitrate factors.
- the probability that the predicted bitrate factors under the video coding standards H.265, H.266 and AV1 meet the video quality requirements is 95%, so these predicted bitrate factors can be directly used as target bitrate factors for encoding the video to be encoded.
- multiple preset video coding standards being H.264, H.265, H.266 and AV1
- the respective predicted bitrate factors under the video coding standards H.265, H.266 and AV1 and the video coding score of the pre-encoded video clip under the video coding standard H.264 can be used as feedback information, together with the spatiotemporal domain feature information of the video clip to be encoded, to perform a second prediction of the bitrate factors under these video coding standards to obtain multiple target bitrate factors.
- step S448, based on the spatiotemporal domain feature information of the video segment to be encoded, multiple predicted rate factors and the video coding score of the first pre-encoded video segment, updating multiple predicted rate factors to obtain multiple target rate factors may include: inputting the spatiotemporal domain feature information of the video segment to be encoded, multiple predicted rate factors and the video coding score of the first pre-encoded video segment into a second rate factor prediction model, so as to determine multiple target rate factors of the video segment to be encoded under multiple preset video coding standards via the second rate factor prediction model.
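The two-stage decision described above can be sketched as follows. This is an illustration under assumptions: the second prediction is assumed to run only when the score of the first pre-encoded segment falls outside the preset interval, and `pre_encode_and_score` and `second_model` are hypothetical callables standing in for the fastest encoder plus quality scoring and for the second rate factor prediction model.

```python
from typing import Callable, Dict, Sequence, Tuple


def choose_target_factors(
    features: Sequence[float],
    predicted: Dict[str, float],
    first_standard: str,
    score_interval: Tuple[float, float],
    pre_encode_and_score: Callable[[str, float], float],
    second_model: Callable[[Sequence[float], Dict[str, float], float], Dict[str, float]],
) -> Dict[str, float]:
    low, high = score_interval
    # Pre-encode once, under the fastest standard only, and score the result.
    score = pre_encode_and_score(first_standard, predicted[first_standard])
    if low <= score <= high:
        # First prediction is trusted for every standard.
        return dict(predicted)
    # Otherwise feed the score back for a second prediction (assumed trigger).
    return second_model(features, predicted, score)
```

Only one pre-encode is performed regardless of how many standards are targeted, which is where the saving in computation comes from.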
- the second rate factor prediction model can be trained by the following operations: inputting the sample spatiotemporal feature information, the sample prediction rate factor under each preset video coding standard, and the first sample video coding score corresponding to the sample prediction rate factor under the first preset video coding standard into the second rate factor prediction model, so as to determine multiple sample target rate factors of the sample video clips under the multiple preset video coding standards through the second rate factor prediction model.
- the structural framework of the second rate factor prediction model is also the same as the structural framework of the first rate factor prediction model, so it will not be repeated here.
- the training of the second rate factor prediction model is dependent on the training results of the first rate factor prediction model. Therefore, the training of the second rate factor prediction model is performed after the training of the first rate factor prediction model is completed.
- After the multiple target bit rate factors are determined, the corresponding encoders in the transcoding system (based on the corresponding preset video coding standards) are used to encode the video segments to be encoded according to the multiple target bit rate factors. Test data show that, after the video segments are encoded using the target bit rate factors obtained by the second prediction, there is a 99% probability that the video coding score of the encoded video segments is within the preset video coding score range. Therefore, the encoding result of the video segments based on the target bit rate factors obtained by the second prediction can be directly trusted.
- method 100 may also include: for each preset video coding standard among multiple preset video coding standards, obtaining all target video segments under the preset video coding standard; and combining all target video segments based on the order of one or more sub-video segments in the video to be encoded to obtain the target video corresponding to the video to be encoded under the preset video coding standard.
- multiple target video segments for each video segment to be encoded can be obtained.
- the corresponding target video segment among the multiple target video segments for each video segment to be encoded can be written into the final video bitstream file, and the written target video segments are spliced according to the order of each video segment to be encoded in the video to be encoded, so as to obtain the target video corresponding to the video to be encoded under the preset video encoding standard.
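The ordering logic of the splicing step can be sketched as follows. This is a simplification: each encoded segment is treated as an opaque byte string keyed by its position in the source video, and plain concatenation stands in for real bitstream or container muxing, which is not shown. The function name is hypothetical.

```python
from typing import Dict


def splice_segments(encoded: Dict[int, bytes]) -> bytes:
    """Concatenate encoded segments in source order.

    Keys are the 0-based positions of the segments in the video to be encoded.
    """
    order = sorted(encoded)
    # Refuse to splice if any segment is missing, to avoid a discontinuous file.
    if order != list(range(len(order))):
        raise ValueError("segment indices must be contiguous from 0")
    return b"".join(encoded[i] for i in order)
```

The function would be called once per preset video coding standard, with the segments encoded under that standard.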
- each video segment to be encoded is a single-shot video segment showing similar and coherent content. Therefore, encoding each video segment to be encoded separately and obtaining the corresponding target video segment can ensure the quality of each encoded video segment, thereby ensuring a stable picture quality experience in various scenarios.
- the target video clips are spliced and combined based on the order of these video clips to be encoded in the video to be encoded, so that the complete video file can be restored, avoiding the discontinuity of the video file caused by the splicing errors of the target video clips.
- FIG5 shows a structural block diagram of a video encoding device 500 according to an embodiment of the present disclosure.
- the device 500 may include: an acquisition module 510, configured to acquire a video segment to be encoded, the video segment to be encoded including one or more video frames; a calculation module 520, configured to calculate the spatiotemporal feature information of the video segment to be encoded based on the one or more video frames; a prediction rate factor determination module 530, configured to determine a plurality of prediction rate factors of the video segment to be encoded under a plurality of preset video encoding standards based on the spatiotemporal feature information; a target rate factor determination module 540, configured to determine a plurality of target rate factors of the video segment to be encoded under a plurality of preset video encoding standards based on a first prediction rate factor corresponding to a first preset video encoding standard among the plurality of preset video encoding standards and a preset video encoding score interval among the plurality of prediction rate factors, wherein
- the spatiotemporal domain feature information may include coding feature information
- the calculation module 520 may include: a preprocessing module, configured to preprocess one or more video frames to generate a new video frame sequence, the new video frame sequence including a set of video frame pairs; and an intra-frame and inter-frame coding module, configured to perform intra-frame coding and inter-frame coding, respectively, for each video frame pair in the set of video frame pairs to obtain coding feature information of the video segment to be encoded.
- the intra-frame and inter-frame coding module may include: a module configured to perform intra-frame coding on the first video frame in each video frame pair in the video frame pair set to obtain intra-frame coding information; a module configured to perform inter-frame coding on each video frame pair in the video frame pair set to obtain inter-frame coding information; and a module configured to obtain coding feature information based on the intra-frame coding information and the inter-frame coding information.
- the coding feature information may be based on one or more of the following items: the number of coding bits, the proportion of intra-frame prediction modes, and the distribution information of the amplitude of inter-frame motion vectors.
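One of the listed items, the distribution of inter-frame motion vector amplitudes, can be illustrated with a short sketch. It assumes the distribution is summarized as a normalized histogram over fixed amplitude bins; the bin edges and the function name are hypothetical, and how the motion vectors are extracted from the encoder is not shown.

```python
import bisect
import math
from typing import Iterable, List, Sequence, Tuple


def mv_magnitude_histogram(motion_vectors: Iterable[Tuple[float, float]],
                           bin_edges: Sequence[float]) -> List[float]:
    """Return the fraction of motion vectors falling in each amplitude bin.

    bin_edges must be sorted ascending; len(bin_edges) + 1 bins are produced.
    """
    counts = [0] * (len(bin_edges) + 1)
    total = 0
    for dx, dy in motion_vectors:
        magnitude = math.hypot(dx, dy)
        # bisect_right puts a value equal to an edge into the upper bin.
        counts[bisect.bisect_right(bin_edges, magnitude)] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]
```

The resulting vector could be concatenated with the number of coding bits and the intra-frame prediction mode proportions to form the coding feature information.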
- the module 530 for determining the predicted bit rate factor may include: a module configured to input the spatiotemporal domain feature information of the video segment to be encoded into a first bit rate factor prediction model, so as to determine multiple predicted bit rate factors of the video segment to be encoded under multiple preset video coding standards via the first bit rate factor prediction model.
- the update module may include: a module configured to input the spatiotemporal domain feature information of the video segment to be encoded, multiple predicted bit rate factors, and the video coding score of the first pre-encoded video segment into a second bit rate factor prediction model, so as to determine multiple target bit rate factors of the video segment to be encoded under multiple preset video coding standards via the second bit rate factor prediction model.
- the second rate factor prediction model is trained by the following operations: inputting the sample spatiotemporal feature information, the sample prediction rate factor under each preset video coding standard, and a first sample video coding score corresponding to the sample prediction rate factor under the first preset video coding standard into the second rate factor prediction model, to determine multiple sample target rate factors of the sample video clips under the multiple preset video coding standards through the second rate factor prediction model; for each of the multiple preset video coding standards, calculating a second model loss value based on the sample rate factor and the sample target rate factor corresponding to the preset video coding standard; and adjusting parameters of the second rate factor prediction model based on the second model loss value under each preset video coding standard until the second model training stop condition is reached.
- modules 510-550 of the apparatus 500 shown in FIG5 may correspond to the steps S110-S150 in the method 100 described with reference to FIG1.
- the operations, features and advantages described above for the method 100 are also applicable to the apparatus 500 and the modules included therein. For the sake of brevity, some operations, features and advantages are not described in detail herein.
- modules described above with respect to FIG. 5 may be implemented in hardware or in hardware in combination with software and/or firmware.
- these modules may be implemented as computer program codes/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium.
- these modules may be implemented as hardware logic/circuits.
- one or more of the acquisition module 510, the calculation module 520, the determination prediction rate factor module 530, the determination target rate factor module 540, and the encoding module 550 may be implemented together in a system on chip (System on Chip, SoC).
- a non-transitory computer-readable storage medium storing a computer program
- the computer program implements the above-mentioned video encoding method when executed by a processor.
- the communication unit 680 allows the electronic device 600 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks, and may include but is not limited to a modem, a network card, an infrared communication device, a wireless communication transceiver and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a Wi-Fi device, a WiMAX device, a cellular communication device and/or the like.
- a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, device, or equipment.
- a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- a machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or equipment, or any suitable combination of the foregoing.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Provided in the present disclosure are a video coding method and apparatus, and an electronic device and a storage medium. The method comprises: acquiring a video clip to be coded, wherein said video clip comprises one or more video frames; calculating spatio-temporal domain feature information of said video clip on the basis of the one or more video frames; determining a plurality of predicted bitrate factors for said video clip under a plurality of preset video coding standards on the basis of the spatio-temporal domain feature information; determining a plurality of target bitrate factors for said video clip under the plurality of preset video coding standards on the basis of a first predicted bitrate factor among the plurality of predicted bitrate factors that corresponds to a first preset video coding standard among the plurality of preset video coding standards, and a preset video coding score interval, wherein the time for performing video coding by using the first preset video coding standard among the plurality of preset video coding standards is the shortest; and coding said video clip according to the plurality of target bitrate factors, so as to obtain a plurality of target video clips.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This disclosure claims priority to Chinese patent application 202211690147.9, filed on December 27, 2022, the entire contents of which are incorporated into this disclosure by reference.
The present disclosure relates to the field of computer technology, and in particular to a video encoding method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
With the development of Internet technology, video platforms have also developed rapidly, with more and more users sharing and watching videos. For a platform, as the number of users continues to grow, the bandwidth cost of video transmission also increases. To reduce this cost, a transcoding system with better compression performance is needed, one that significantly reduces the size of the compressed video while preserving video quality, thereby reducing the network traffic required for transmission.
The methods described in this section are not necessarily methods that have been previously conceived or employed. Unless otherwise indicated, it should not be assumed that any method described in this section is considered to be prior art simply because it is included in this section. Similarly, unless otherwise indicated, the issues mentioned in this section should not be considered to have been recognized in any prior art.
Summary of the invention
The present disclosure provides a video encoding method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to one aspect of the present disclosure, a video encoding method is provided, comprising: obtaining a video segment to be encoded, the video segment to be encoded comprising one or more video frames; calculating spatiotemporal feature information of the video segment to be encoded based on the one or more video frames; determining, based on the spatiotemporal feature information, a plurality of predicted rate factors of the video segment to be encoded under a plurality of preset video coding standards; determining a plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards based on a preset video coding score interval and a first predicted rate factor which, among the plurality of predicted rate factors, corresponds to a first preset video coding standard among the plurality of preset video coding standards, wherein, among the plurality of preset video coding standards, video encoding using the first preset video coding standard takes the shortest time; and encoding the video segment to be encoded according to each of the plurality of target rate factors to obtain a plurality of target video segments.
According to another aspect of the present disclosure, a video encoding apparatus is also provided, comprising: an obtaining module configured to obtain a video segment to be encoded, the video segment to be encoded comprising one or more video frames; a calculation module configured to calculate spatiotemporal feature information of the video segment to be encoded based on the one or more video frames; a predicted rate factor determining module configured to determine, based on the spatiotemporal feature information, a plurality of predicted rate factors of the video segment to be encoded under a plurality of preset video coding standards; a target rate factor determining module configured to determine a plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards based on a preset video coding score interval and a first predicted rate factor which, among the plurality of predicted rate factors, corresponds to a first preset video coding standard among the plurality of preset video coding standards, wherein, among the plurality of preset video coding standards, video encoding using the first preset video coding standard takes the shortest time; and an encoding module configured to encode the video segment to be encoded according to each of the plurality of target rate factors to obtain a plurality of target video segments.
According to another aspect of the present disclosure, an electronic device is also provided, comprising: at least one processor; and at least one memory communicatively connected to the at least one processor, wherein the at least one memory stores a computer program which, when executed by the at least one processor, implements the above video encoding method.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program is also provided, wherein the computer program, when executed by a processor, implements the above video encoding method.
According to another aspect of the present disclosure, a computer program product is also provided, comprising a computer program, wherein the computer program, when executed by a processor, implements the above video encoding method.
According to one or more embodiments of the present disclosure, a first prediction of the rate factors under a plurality of preset video coding standards is made based on the spatiotemporal feature information of the video segment to be encoded, and a second prediction of the rate factors under the plurality of preset video coding standards is made based on the predicted rate factors and a preset video coding score. This improves the accuracy of the rate factors, which in turn effectively improves the accuracy of video encoding and ensures video quality. In addition, because the rate factors under the plurality of preset video coding standards can be predicted at the same time, the encoding system can simultaneously output multiple encoded videos that conform to different video coding standards yet all meet the video quality requirement, which avoids repeating the prediction operation for each video coding standard and thus saves time. Further, the second prediction of the rate factors is based on the rate factor which, in the first prediction result, corresponds to the video coding standard with the shortest encoding time. Because the video coding standard with the shortest encoding time has the lowest computational complexity, the rate factor under that standard can reflect, to a certain extent, the accuracy of the rate factors under the other video coding standards with longer encoding times. The different video coding standards are thereby linked together, which helps to further reduce the amount of computation required to predict the rate factors.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
The accompanying drawings illustrate embodiments by way of example and constitute a part of the specification; together with the written description, they serve to explain exemplary implementations of the embodiments. The embodiments shown are for illustrative purposes only and do not limit the scope of the claims. Throughout the drawings, the same reference numerals refer to similar but not necessarily identical elements.
FIG. 1 shows a flowchart of a video encoding method according to some embodiments of the present disclosure;
FIG. 2 shows a flowchart of a method for training a first rate factor prediction model according to some embodiments of the present disclosure;
FIG. 3 shows a structural block diagram of a first rate factor prediction model according to some embodiments of the present disclosure;
FIG. 4 shows a flowchart of determining a plurality of target rate factors of a video segment to be encoded under a plurality of preset video coding standards according to some embodiments of the present disclosure;
FIG. 5 shows a structural block diagram of a video encoding apparatus according to an embodiment of the present disclosure;
FIG. 6 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", and the like to describe various elements is not intended to limit the positional, temporal, or importance relationship of these elements; such terms are only used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, while in other cases, depending on the context, they may refer to different instances.
The terminology used in the description of the various examples in the present disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of an element is not specifically limited, there may be one or more of that element. In addition, the term "and/or" as used in the present disclosure covers any one of the listed items and all possible combinations thereof.
With the development of Internet technology, video platforms have grown rapidly, and more and more users share and watch videos. For a platform, as the number of users keeps growing, the bandwidth cost of video transmission also keeps rising. To reduce this cost, a transcoding system with better compression performance is needed, one that significantly reduces the size of the compressed video while preserving video quality, thereby reducing the network traffic required for transmission.
At present, common encoding modes of transcoding systems include a fixed quantization parameter mode, a constant bit rate mode, a variable bit rate mode, and so on, of which the constant bit rate mode is the most commonly used. Although the video stream output in this mode has a stable bit rate, which can alleviate playback stuttering, all videos are encoded with the same rate factor in this mode. As a result, for videos with rich and varied scene content, neither a consistent picture quality across all videos nor the avoidance of bit rate waste can be guaranteed. For example, encoding a video with complex scenes using an average rate factor leaves the output video with an insufficient bit rate and visibly degrades picture quality, whereas encoding a video with simple scenes using the same average rate factor may waste bit rate unnecessarily.
To ensure picture quality and avoid bit rate waste, a constant quality encoding mode can be used instead. Specifically, a target quality is set, and an encoder is used to encode the video so as to obtain a video that meets the target quality. The goal of this encoding mode is to bring the quality of the encoded video as close as possible to the set target quality: the closer the two are, the better the encoding result. When the actual quality is lower than the target quality, the viewing experience may suffer; when the actual quality exceeds the target quality, bit rate may be wasted.
However, before a video is encoded in this mode, the relationship between the rate factor used for encoding and the actual quality of the encoding result is unknown; that is, an accurate rate factor cannot be derived directly from the set target quality. Common solutions are:
(1) Encoding the video multiple times and searching for the optimal rate factor a posteriori. This approach, however, consumes enormous computing resources.
(2) Predicting the rate factor. For example, machine learning can be used to predict the rate factor from the spatiotemporal complexity information of the video before the video is encoded. However, this solution usually predicts and encodes only once: according to a preset target, the rate factor is predicted from the spatiotemporal complexity information of the video, and the video is encoded with the predicted rate factor. The predicted rate factor may therefore be inaccurate, and especially for videos with complex scenes or for trending videos, it may not be possible to reliably ensure picture quality while also avoiding bit rate waste.
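Approach (1) above can be illustrated with a minimal sketch. It is not taken from the disclosure: `encode_and_score` is a hypothetical callable that encodes the video at a given rate factor and returns a quality score, and the sketch assumes an integer rate factor in the range 0 to 51 for which quality does not increase as the rate factor grows (as is the case for the CRF setting of x264). Every probe of the search costs one full encode, which is why this approach is expensive.

```python
from typing import Callable, Tuple


def search_rate_factor(
    encode_and_score: Callable[[int], float],  # hypothetical: full encode + quality measurement
    target: float,
    lo: int = 0,
    hi: int = 51,
) -> Tuple[int, int]:
    """Binary-search the largest rate factor whose quality still meets `target`.

    Returns (best_rate_factor, number_of_trial_encodes). If no rate factor in
    the range meets the target, the lowest rate factor is returned.
    """
    trials = 0
    best = lo
    while lo <= hi:
        mid = (lo + hi) // 2
        trials += 1                      # each probe is one complete encode
        if encode_and_score(mid) >= target:
            best = mid                   # quality is sufficient, try a larger factor
            lo = mid + 1
        else:
            hi = mid - 1                 # quality too low, try a smaller factor
    return best, trials
```

Even with a binary search, a 0 to 51 range needs up to six complete encodes per video and per coding standard, which motivates predicting the rate factor instead.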
In addition, the inventors have found that, in order to support a variety of terminal players, a transcoding system needs to output video bitstreams conforming to different video coding standards. However, as described above, the prior art usually predicts and encodes only once, so that for each video coding standard the transcoding system has to perform a separate rate factor prediction and video encoding, which increases both the amount of computation and the time cost.
In view of this, embodiments of the present disclosure provide a video encoding method. The method makes a first prediction of the rate factors under a plurality of preset video coding standards based on the spatiotemporal feature information of the video segment to be encoded, and makes a second prediction of the rate factors under the plurality of preset video coding standards based on the predicted rate factors and a preset video coding score. This improves the accuracy of the rate factors, which in turn effectively improves the accuracy of video encoding and ensures video quality. In addition, because the rate factors under the plurality of preset video coding standards can be predicted at the same time, the encoding system can simultaneously output multiple encoded videos that conform to different video coding standards yet all meet the video quality requirement, which avoids repeating the prediction operation for each video coding standard and thus saves time.
Further, in this method, the second prediction of the rate factors is based on the rate factor which, in the first prediction result, corresponds to the video coding standard with the shortest encoding time. Because the video coding standard with the shortest encoding time has the lowest computational complexity, the rate factor under that standard can reflect, to a certain extent, the accuracy of the rate factors under the other video coding standards with longer encoding times. The different video coding standards are thereby linked together, which helps to further reduce the amount of computation required to predict the rate factors.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
FIG. 1 shows a flowchart of a video encoding method 100 according to some embodiments of the present disclosure. As shown in FIG. 1, the method 100 may include: step S110, obtaining a video segment to be encoded, the video segment to be encoded including one or more video frames; step S120, calculating spatiotemporal feature information of the video segment to be encoded based on the one or more video frames; step S130, determining, based on the spatiotemporal feature information, a plurality of predicted rate factors of the video segment to be encoded under a plurality of preset video coding standards; step S140, determining a plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards based on a preset video coding score interval and a first predicted rate factor which, among the plurality of predicted rate factors, corresponds to a first preset video coding standard among the plurality of preset video coding standards, wherein, among the plurality of preset video coding standards, video encoding using the first preset video coding standard takes the shortest time; and step S150, encoding the video segment to be encoded according to each of the plurality of target rate factors to obtain a plurality of target video segments.
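The control flow of steps S120 to S150 can be sketched as follows. This is only an illustrative skeleton under stated assumptions, not the implementation of the disclosure: the four callables, the `encode_time` table, and all parameter names are hypothetical placeholders, and the way step S140 actually derives the target rate factors is not reproduced here.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def encode_segment(
    frames: Sequence[Any],                                   # output of S110
    standards: List[str],                                    # preset video coding standards
    compute_features: Callable[[Sequence[Any]], Any],        # S120 (hypothetical)
    predict_rate_factors: Callable[[Any], Dict[str, float]], # S130 (hypothetical)
    refine_rate_factors: Callable[
        [Dict[str, float], str, Tuple[float, float]], Dict[str, float]
    ],                                                       # S140 (hypothetical)
    encode: Callable[[Sequence[Any], str, float], bytes],    # S150 (hypothetical)
    encode_time: Dict[str, float],                           # assumed per-standard encoding time
    score_interval: Tuple[float, float],                     # preset video coding score interval
) -> Dict[str, bytes]:
    # S120: spatiotemporal feature information of the segment.
    features = compute_features(frames)
    # S130: one predicted rate factor per preset standard.
    predicted = predict_rate_factors(features)
    # The "first" preset standard is the one with the shortest encoding time.
    first_standard = min(standards, key=lambda s: encode_time[s])
    # S140: target rate factors derived from the first predicted rate factor
    # and the score interval.
    targets = refine_rate_factors(predicted, first_standard, score_interval)
    # S150: one target video segment per preset standard.
    return {s: encode(frames, s, targets[s]) for s in standards}
```

Passing the steps in as callables keeps the sketch independent of any particular feature extractor, prediction model, or encoder.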
By using the spatiotemporal feature information of the video segment to be encoded to determine the predicted rate factors under a plurality of preset video coding standards, and then using a predicted rate factor and the preset video coding score to determine the target rate factors under the plurality of preset video coding standards, the rate factors can be corrected. This improves the accuracy of the rate factors, which in turn effectively improves the accuracy of video encoding and ensures video quality. In addition, because the rate factors under the plurality of preset video coding standards can be predicted at the same time, the encoding system can simultaneously output multiple encoded videos that conform to different video coding standards yet all meet the video quality requirement, which avoids repeating the prediction operation for each video coding standard and thus saves time. Further, the correction of the rate factors is based on the rate factor which, in the first prediction result, corresponds to the video coding standard with the shortest encoding time. Because the video coding standard with the shortest encoding time has the lowest computational complexity, the rate factor under that standard can reflect, to a certain extent, the accuracy of the rate factors under the other video coding standards with longer encoding times. The different video coding standards are thereby linked together, the rate factors for the different video coding standards are predicted jointly, and the amount of computation required to predict the rate factors is further reduced.
According to some embodiments of the present disclosure, step S110 of obtaining a video segment to be encoded may include: obtaining a video to be encoded; performing scene detection on the video to be encoded; dividing the video to be encoded into one or more sub-video segments based on the scene detection result; and taking each of the one or more sub-video segments as a video segment to be encoded.
According to some embodiments, a stored or cached video to be encoded can be read from a suitable storage device (local and/or remote). Alternatively, the video to be encoded can be received from another external device via a wired or wireless communication link. The video to be encoded can be any complete video file. For example, it can be a video file recorded by the user, a video file that the user has clipped from another video file, or a video file that the user has created from multiple video files; the scope of the subject matter claimed in the present disclosure is not limited in this respect.
After the video to be encoded is obtained, scene detection can be performed on it, the video can be split according to the detection result, and each resulting video segment can be taken as a video segment to be encoded.
According to some embodiments, scene detection can be performed on the video to be encoded using an open-source encoder. For example, in a scenario where a video V uploaded by a user is received and is accordingly determined to be the video to be encoded, the open-source encoder x264 can be invoked to encode the video V with scene detection enabled, yielding a scene-change detection result. According to this result, the video V can be divided into t independent single-shot video segments (V1, V2, ..., Vt), each of which can serve as a video segment to be encoded.
According to other embodiments, the video to be encoded can be split based on a frame-splitting operation. Continuing the above example in which the video to be encoded is V, the video V can be split into a plurality of video frames. For each of these video frames, the similarity between that frame and its adjacent frame is determined, and whether the frame is a boundary frame separating adjacent shots of the video is decided based on the computed similarity. It should be understood that any suitable decoder or other technology (e.g., OpenCV, FFmpeg) can be used for the frame-splitting operation, and any suitable calculation (e.g., frame difference, grayscale histogram) can be used to determine the similarity between adjacent video frames; the scope of the subject matter claimed in the present disclosure is not limited in this respect.
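As an illustration of the similarity-based splitting described above, the following minimal sketch compares normalized grayscale histograms of consecutive frames and starts a new shot whenever their intersection falls below a threshold. The function names, the histogram-intersection measure, the bin count, and the threshold value are all assumptions made for illustration; the disclosure does not prescribe a specific similarity measure. Frames are assumed to be already decoded into 8-bit grayscale NumPy arrays.

```python
from typing import List, Sequence, Tuple

import numpy as np


def gray_histogram(frame: np.ndarray, bins: int = 64) -> np.ndarray:
    """Normalized grayscale histogram of an 8-bit frame (sums to 1)."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)


def split_into_shots(
    frames: Sequence[np.ndarray], threshold: float = 0.5
) -> List[Tuple[int, int]]:
    """Return shot ranges as (start, end) frame indices, end exclusive."""
    if len(frames) == 0:
        return []
    boundaries = [0]
    prev = gray_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i])
        # Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones.
        similarity = float(np.minimum(prev, cur).sum())
        if similarity < threshold:
            boundaries.append(i)         # frame i opens a new shot
        prev = cur
    boundaries.append(len(frames))
    return list(zip(boundaries[:-1], boundaries[1:]))
```

Each returned range corresponds to one single-shot segment that could then be treated as a video segment to be encoded.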
Compared with a complete video file, the content shown within a single-shot video segment is usually similar and more coherent. When encoding the video, it is therefore more reasonable to use one specific rate factor for each single-shot video segment, which ensures stable picture quality across scenes.
According to some embodiments of the present disclosure, the spatiotemporal feature information may include spatiotemporal complexity information, and step S120 of calculating the spatiotemporal feature information of the video segment to be encoded based on the one or more video frames may include: calculating gray-level co-occurrence information and normalization information of the video segment to be encoded based on the one or more video frames, wherein the gray-level co-occurrence information describes the spatial complexity of the video segment to be encoded and the normalization information describes its temporal complexity; and combining the gray-level co-occurrence information and the normalization information to generate the spatiotemporal complexity information of the video segment to be encoded.
Gray-level co-occurrence information (also known as the Gray Level Co-occurrence Matrix, GLCM) is a feature that describes the spatial texture complexity of a video. In some embodiments, for each of the one or more video frames of the video segment to be encoded, an m×m two-dimensional GLCM can be computed, in which each element takes a value between 0 and 1. Five feature values can then be derived from this matrix, namely information entropy, contrast, inverse difference moment, energy, and correlation. For each of these feature values, four statistics (mean, variance, skewness, and kurtosis) are computed across all video frames, for example the mean, variance, skewness, and kurtosis of the per-frame information entropy. This yields a total of 20 (5×4) feature values. Further, to account for the multi-scale characteristics of the video, each video frame can be downscaled by a factor of 4 and by a factor of 16, and the above steps repeated for each downscaled version (i.e., GLCM feature extraction is performed two more times). The video frames of the video segment to be encoded therefore finally yield 60 feature values, i.e., 60 GLCM feature values.
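The GLCM feature extraction described above can be sketched with NumPy as follows. Several details are assumptions made for illustration and are not specified in the disclosure: the GLCM uses a single horizontal offset of one pixel, frames are 8-bit grayscale arrays quantized to `levels` gray levels (so m = `levels`), and "downscaling by 4 and 16" is interpreted as reducing the pixel count by those factors through simple subsampling with strides 2 and 4.

```python
from typing import Sequence

import numpy as np


def glcm(frame: np.ndarray, levels: int = 16) -> np.ndarray:
    """Normalized levels x levels co-occurrence matrix (horizontal neighbor)."""
    q = (frame.astype(np.int64) * levels) // 256       # quantize 0..255 to 0..levels-1
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    mat = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(mat, (left, right), 1.0)                  # count co-occurring pairs
    return mat / mat.sum()                              # every element lies in [0, 1]


def glcm_features(p: np.ndarray) -> np.ndarray:
    """Entropy, contrast, inverse difference moment, energy, correlation."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    contrast = np.sum(p * (i - j) ** 2)
    idm = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sum(p ** 2)
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    denom = sd_i * sd_j
    corr = np.sum(p * (i - mu_i) * (j - mu_j)) / denom if denom > 0 else 1.0
    return np.array([entropy, contrast, idm, energy, corr])


def four_stats(x: np.ndarray) -> np.ndarray:
    """Mean, variance, skewness, kurtosis per column of x (n_frames x 5)."""
    mean, var = x.mean(axis=0), x.var(axis=0)
    std = np.sqrt(var)
    z = (x - mean) / np.where(std > 0, std, 1.0)        # zero spread gives z = 0
    return np.concatenate([mean, var, (z ** 3).mean(axis=0), (z ** 4).mean(axis=0)])


def glcm_descriptor(frames: Sequence[np.ndarray], levels: int = 16) -> np.ndarray:
    """60 values: 5 features x 4 statistics x 3 scales."""
    parts = []
    for step in (1, 2, 4):                              # full size, 1/4 and 1/16 of the pixels
        per_frame = np.stack(
            [glcm_features(glcm(f[::step, ::step], levels)) for f in frames]
        )
        parts.append(four_stats(per_frame))
    return np.concatenate(parts)
```

A production implementation would more likely use a library routine for the co-occurrence matrix and a proper resampling filter for downscaling; the sketch only makes the 5×4×3 = 60 bookkeeping concrete.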
Normalization information (also known as the Normalized Correlation Coefficient, NCC) is a feature that describes temporal complexity. In some embodiments, for each pair of adjacent video frames among the one or more video frames of the video segment to be encoded, the NCC can be computed to obtain a two-dimensional correlation coefficient matrix with the same height and width as the original video frame, in which each element takes a value between 0 and 1 and represents the temporal correlation between the corresponding positions of the two adjacent frames. A video segment to be encoded that contains n video frames has n-1 pairs of adjacent frames, so n-1 correlation coefficient matrices are obtained. Five feature values (mean, variance, kurtosis, skewness, and information entropy) are then computed for each of these n-1 matrices, and for each feature value the mean and variance are computed over time, finally giving 10 (5×2) feature values, i.e., 10 NCC feature values.
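A possible way to compute these NCC features is sketched below. The disclosure does not state how the per-position correlation is obtained, so the following choices are assumptions made for illustration: the coefficient at each position is computed over a k×k window centred on it (k odd, with reflective padding at the borders), negative coefficients are clipped to 0 so that every element lies between 0 and 1, and the information entropy of a matrix is taken as the Shannon entropy of a histogram of its values. At least two frames are required.

```python
from typing import Sequence

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_mean(x: np.ndarray, k: int) -> np.ndarray:
    """Mean over a k x k window around every pixel (k odd, same output size)."""
    padded = np.pad(x, k // 2, mode="reflect")
    return sliding_window_view(padded, (k, k)).mean(axis=(-2, -1))


def ncc_map(a: np.ndarray, b: np.ndarray, k: int = 7, eps: float = 1e-8) -> np.ndarray:
    """Per-pixel normalized correlation of two frames, clipped to [0, 1]."""
    a, b = a.astype(np.float64), b.astype(np.float64)
    ma, mb = local_mean(a, k), local_mean(b, k)
    cov = local_mean(a * b, k) - ma * mb
    va = np.maximum(local_mean(a * a, k) - ma ** 2, 0.0)
    vb = np.maximum(local_mean(b * b, k) - mb ** 2, 0.0)
    return np.clip(cov / np.sqrt(va * vb + eps), 0.0, 1.0)


def map_features(m: np.ndarray, bins: int = 32) -> np.ndarray:
    """Mean, variance, kurtosis, skewness, entropy of one correlation matrix."""
    v = m.ravel()
    mean, var = v.mean(), v.var()
    std = np.sqrt(var)
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    hist, _ = np.histogram(v, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / v.size
    entropy = -np.sum(p * np.log2(p))
    return np.array([mean, var, (z ** 4).mean(), (z ** 3).mean(), entropy])


def ncc_descriptor(frames: Sequence[np.ndarray], k: int = 7) -> np.ndarray:
    """10 values: temporal mean and variance of the 5 per-pair features."""
    per_pair = np.stack(
        [map_features(ncc_map(a, b, k)) for a, b in zip(frames[:-1], frames[1:])]
    )                                                   # shape (n - 1, 5)
    return np.concatenate([per_pair.mean(axis=0), per_pair.var(axis=0)])
```

Concatenating this 10-element vector with the 60 GLCM feature values would give the 70-element spatiotemporal complexity vector described in the text.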
After the GLCM feature values and the NCC feature values have been computed, for example as described above, the two sets can be combined to obtain 70 feature values (60 GLCM feature values plus 10 NCC feature values) as the spatiotemporal complexity information of the video segment to be encoded, which can be represented as a vector.
Because the gray-level co-occurrence information and the normalization information characterize the complexity of the video to be encoded from the spatial and temporal perspectives respectively, predicting the rate factors of the video segment to be encoded under different video coding standards from this spatiotemporal complexity information makes the predicted rate factors more accurate, which helps to ensure the quality of the output video.
According to some embodiments of the present disclosure, the spatiotemporal feature information may include coding feature information, and step S120 of calculating the spatiotemporal feature information of the video segment to be encoded based on the one or more video frames may include: preprocessing the one or more video frames to generate a new video frame sequence, wherein the new video frame sequence includes a set of video frame pairs; and performing intra-frame encoding and inter-frame encoding on each video frame pair in the set of video frame pairs to obtain the coding feature information of the video segment to be encoded.
Performing intra-frame encoding or inter-frame encoding on each video frame of the video to be encoded reflects the spatiotemporal feature information of the video segment to be encoded from another dimension, namely its spatial coding complexity and temporal coding complexity. This is because intra-frame encoding uses the image data within the video frame being encoded, whereas inter-frame encoding uses the image data of that frame together with its adjacent frames, so the video segment to be encoded is characterized more comprehensively. Predicting the rate factors of the video segment to be encoded under different video coding standards from the combination of this coding feature information and the above spatiotemporal complexity information composed of the GLCM feature values and the NCC feature values can further improve prediction accuracy, which is especially beneficial for trending videos that call for higher quality and finer encoding.
根据本公开的一些实施例,在步骤S120中,对一个或多个视频帧进行预处理,以生成新的视频帧序列可以包括:对一个或多个视频帧中除首帧和尾帧之外的视频帧进行复制,以生成一个或多个复制视频帧;以及对一个或多个视频帧和一个或多个复制视频帧进行排序,使得一个或多个复制视频帧中的每一个复制视频帧位于对应的视频帧之后。According to some embodiments of the present disclosure, in step S120, preprocessing one or more video frames to generate a new video frame sequence may include: copying video frames other than the first frame and the last frame of the one or more video frames to generate one or more copied video frames; and sorting the one or more video frames and the one or more copied video frames so that each of the one or more copied video frames is located after the corresponding video frame.
According to some embodiments, after one or more duplicate frames are generated for the video frames other than the first frame and the last frame, each duplicate frame may be placed immediately after its original video frame to generate a new video frame sequence. For example, assuming that the video segment to be encoded includes n video frames and that these n video frames form the sequence (f1, f2, f3, …, fn-2, fn-1, fn), each of the video frames f2 to fn-1 (that is, every frame except the first frame f1 and the last frame fn) is copied to generate the new video frame sequence (f1, f2, f2, f3, f3, …, fn-2, fn-2, fn-1, fn-1, fn), in which (f1, f2), (f2, f3), …, (fn-2, fn-1), (fn-1, fn) are the video frame pairs.
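The duplication and pairing described above can be sketched as follows. This is a minimal illustration only: the function name `build_frame_pairs` and the use of strings as stand-ins for frames are assumptions made for the example and do not appear in the disclosure.

```python
def build_frame_pairs(frames):
    """Duplicate every interior frame and group the new sequence into adjacent pairs.

    `frames` is any list of frame objects (hypothetical representation).
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a pair")
    sequence = [frames[0]]
    for frame in frames[1:-1]:
        # Each interior frame is followed directly by its copy.
        sequence.extend([frame, frame])
    sequence.append(frames[-1])
    # The new sequence has 2 * (n - 1) entries, so it splits cleanly into pairs.
    pairs = [(sequence[i], sequence[i + 1]) for i in range(0, len(sequence), 2)]
    return sequence, pairs


sequence, pairs = build_frame_pairs(["f1", "f2", "f3", "f4"])
print(sequence)  # ['f1', 'f2', 'f2', 'f3', 'f3', 'f4']
print(pairs)     # [('f1', 'f2'), ('f2', 'f3'), ('f3', 'f4')]
```

With n input frames, the sketch yields n-1 pairs, and every interior frame appears once as the second frame of a pair and once as the first frame of the next pair.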
与未生成复制帧的视频帧序列相比,通过对除首帧和尾帧之外的视频帧进行复制并生成一个或多个有序视频帧对,可以确保每个视频帧内以及相邻视频帧间的数据均能够得到利用,保留了数据完整性,有利于实现对待编码视频片段的每一视频帧的更为准确的帧内编码和帧间编码,从而提高预测出的码率因子的准确度。
Compared with a video frame sequence without generating duplicate frames, by copying the video frames except the first frame and the last frame and generating one or more ordered video frame pairs, it can ensure that the data within each video frame and between adjacent video frames can be utilized, retaining data integrity, and facilitating more accurate intra-frame encoding and inter-frame encoding of each video frame of the encoded video fragment, thereby improving the accuracy of the predicted bit rate factor.
根据本公开的另一些实施例,对一个或多个视频帧进行预处理还可以包括例如对每个视频帧进行裁剪以满足尺寸要求、去除视频帧中的非必要(干扰)特征等。由此,可以加快计算和预测速度,快速获得编码特征信息并且预测待编码视频的码率因子,同时也有利于提高预测出的码率因子的准确度。According to other embodiments of the present disclosure, preprocessing one or more video frames may also include, for example, cropping each video frame to meet size requirements, removing unnecessary (interference) features in the video frame, etc. Thus, the calculation and prediction speed can be accelerated, the encoding feature information can be quickly obtained, and the bit rate factor of the video to be encoded can be predicted, which is also conducive to improving the accuracy of the predicted bit rate factor.
After the video segment to be encoded has been preprocessed to obtain the set of video frame pairs, the set can be pre-encoded using, for example, a hardware H.264 encoder built into a graphics card. Pre-encoding can use, for example, a fixed quantization parameter mode, and optionally the quantization parameter can be any suitable value such as 28, 29, or 30. The coding frame structure can be, for example, an IP structure, where I denotes intra-frame coding and P denotes inter-frame coding. In this mode, every video frame of the video segment to be encoded, except the first frame and the last frame, is encoded twice (once with intra-frame coding I and once with inter-frame coding P). Intra-frame coding reflects the spatial coding complexity of each video frame, and inter-frame coding reflects its temporal coding complexity. After IP encoding has been performed on each video frame pair in the set, the coding feature information of the video segment to be encoded can be obtained.
根据本公开的一些实施例,针对视频帧对集合中的每一个视频帧对,分别进行帧内编码和帧间编码,以获得待编码视频片段的编码特征信息可以包括:对视频帧对集合中的每一个视频帧对中的第一视频帧进行帧内编码,以获得帧内编码信息;对视频帧对集合中的每一个视频帧对进行帧间编码,以获得帧间编码信息;以及基于帧内编码信息和帧间编码信息获得编码特征信息。According to some embodiments of the present disclosure, performing intra-frame coding and inter-frame coding respectively on each video frame pair in a video frame pair set to obtain coding feature information of a video segment to be encoded may include: performing intra-frame coding on the first video frame in each video frame pair in the video frame pair set to obtain intra-frame coding information; performing inter-frame coding on each video frame pair in the video frame pair set to obtain inter-frame coding information; and obtaining coding feature information based on the intra-frame coding information and the inter-frame coding information.
待编码视频片段的每个视频帧对中存在两个视频帧,在本申请中,第一视频帧指代视频帧对中在先的视频帧。例如,对于视频帧对(f1,f2),则f1为第一视频帧;对于待编码视频帧对(f2,f3),则f2为第一视频帧,以此类推。There are two video frames in each video frame pair of the video segment to be encoded. In this application, the first video frame refers to the previous video frame in the video frame pair. For example, for the video frame pair (f 1 , f 2 ), f 1 is the first video frame; for the video frame pair to be encoded (f 2 , f 3 ), f 2 is the first video frame, and so on.
根据一些示例,对每一个视频帧对中的第一视频帧进行帧内编码,以获得帧内编码信息可以包括:在对每个第一视频帧进行帧内编码后提取每个帧内编码的特征值,例如编码比特数、帧内预测模式的占比值等34个特征值;以及计算所有帧内编码的对应特征值的均值、方差、偏度、峰度和信息熵5个统计值,以得到170(即34×5)个帧内编码信息。According to some examples, intra-coding the first video frame in each video frame pair to obtain intra-coding information may include: extracting characteristic values of each intra-coding after intra-coding each first video frame, such as 34 characteristic values such as the number of coding bits and the proportion of intra-prediction mode; and calculating five statistical values of the mean, variance, skewness, kurtosis and information entropy of the corresponding characteristic values of all intra-coding to obtain 170 (i.e., 34×5) intra-coding information.
For each video frame pair, inter-frame coding information can also be extracted. For example, for the video frame pair (f1, f2), the inter-frame coding information between f1 and f2 is calculated; for the video frame pair (f2, f3), the inter-frame coding information between f2 and f3 is calculated; and so on.
According to some examples, performing inter-frame coding on each video frame pair to obtain inter-frame coding information may include: after inter-frame coding each video frame pair, extracting the feature values of each inter-frame coding result, for example 26 feature values such as the number of coding bits and the inter-frame motion vector magnitude distribution; and calculating five statistics (mean, variance, skewness, kurtosis, and information entropy) of each corresponding feature value over all inter-frame coding results, to obtain 130 (i.e., 26×5) items of inter-frame coding information.
Continuing the above example, the 170 items of intra-frame coding information and the 130 items of inter-frame coding information are concatenated to obtain 300 items of coding feature information for the video segment to be encoded. Similar to the spatial complexity information, the coding feature information can also be represented in vector form.
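The aggregation arithmetic in this example (34×5 + 26×5 = 300) can be sketched as follows. The disclosure does not define how the information entropy is computed, so the histogram-based entropy with `bins=8` is an assumption; the function names `five_stats` and `aggregate` and the placeholder feature values are likewise hypothetical. Kurtosis here is the plain (non-excess) fourth standardized moment.

```python
import math
from collections import Counter


def five_stats(values, bins=8):
    """Return [mean, variance, skewness, kurtosis, entropy] for one feature."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var)
    if std == 0:
        skew, kurt = 0.0, 0.0  # constant feature: higher moments set to 0 by convention
    else:
        skew = sum(((v - mean) / std) ** 3 for v in values) / n
        kurt = sum(((v - mean) / std) ** 4 for v in values) / n
    # Assumed entropy definition: Shannon entropy of an equal-width histogram.
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return [mean, var, skew, kurt, entropy]


def aggregate(per_frame_features):
    """Rows are coded frames (or pairs), columns are features; 5 statistics per column."""
    out = []
    for column in zip(*per_frame_features):
        out.extend(five_stats(list(column)))
    return out


# Placeholder data: 10 intra-coded frames x 34 features, 10 frame pairs x 26 features.
intra = [[float(i + j) for j in range(34)] for i in range(10)]
inter = [[float(i * j) for j in range(26)] for i in range(10)]
coding_features = aggregate(intra) + aggregate(inter)
print(len(coding_features))  # 300
```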
It will be understood that, although the intra-frame coding step is described in the above embodiments as being performed before the inter-frame coding step, this should not be interpreted as requiring that intra-frame coding and inter-frame coding be performed in that particular order. For example, the intra-frame coding step can be performed in parallel with the inter-frame coding step, or the intra-frame coding step can be performed after the inter-frame coding step.
还将理解的是,在上述实施例中,虽然帧内编码信息和帧间编码信息被描述为诸如编码比特数、帧内预测模式的占比值、帧间运动矢量幅度分布等特征值的均值、方差、偏度、峰度和信息熵,但是也可以直接确定这些特征值作为帧内编码信息或帧间编码信息。It will also be understood that, in the above-mentioned embodiments, although the intra-frame coding information and the inter-frame coding information are described as the mean, variance, skewness, kurtosis and information entropy of characteristic values such as the number of coding bits, the proportion of intra-frame prediction modes, the distribution of inter-frame motion vector amplitudes, etc., these characteristic values can also be directly determined as the intra-frame coding information or the inter-frame coding information.
根据本公开的一些实施例,步骤S130、基于时空域特征信息确定多个预设视频编码标准下待编码视频片段的多个预测码率因子可以包括:将待编码视频片段的时空域特征信息输入到第一码率因子预测模型,以经由第一码率因子预测模型确定多个预设视频编码标准下待编码视频片段的多个预测码率因子。According to some embodiments of the present disclosure, step S130, determining multiple predicted rate factors of the video segments to be encoded under multiple preset video coding standards based on the spatiotemporal domain feature information may include: inputting the spatiotemporal domain feature information of the video segments to be encoded into a first rate factor prediction model, so as to determine multiple predicted rate factors of the video segments to be encoded under multiple preset video coding standards via the first rate factor prediction model.
码率因子(Rate Factor,RF)是指恒定码率因子编码模式下的编码参数,用于控制编码文件的码率和质量。码率因子的取值通常在0-50的范围之间。码率因子的取值越大,则码率越低,编码质量也越差,反之,码率因子的取值越小,则码率越高,编码质量也越好。Rate Factor (RF) refers to the encoding parameter in the constant rate factor encoding mode, which is used to control the bit rate and quality of the encoded file. The value of the rate factor is usually in the range of 0-50. The larger the value of the rate factor, the lower the bit rate and the worse the encoding quality. Conversely, the smaller the value of the rate factor, the higher the bit rate and the better the encoding quality.
多个预设视频编码标准的示例可以包括H.264视频编码标准、H.265视频编码标准、H.266视频编码标准、AV1视频编码标准。将理解的是,预设视频标准还可以包括任何其他现有的或者随着技术的发展而生成的视频编码标准。还将理解的是,多个预设视频编码标准的数量可以为满足实际需求的任意整数值。本公开要求保护的主题的范围在这两方面均不受限制。Examples of multiple preset video coding standards may include H.264 video coding standard, H.265 video coding standard, H.266 video coding standard, AV1 video coding standard. It will be understood that the preset video standard may also include any other existing video coding standard or video coding standard generated as technology develops. It will also be understood that the number of multiple preset video coding standards may be any integer value that meets actual needs. The scope of the subject matter claimed in the present disclosure is not limited in both aspects.
根据本公开的一些实施例,在时空域特征信息仅包括时空复杂度信息的情况下,可以将待编码视频片段的灰度共生信息和归一化信息组合后的特征信息输入到第一码率因子预测模型,其中,该第一码率因子预测模型被训练用于根据时空复杂度信息来确定多个预设视频编码标准下每个待编码视频片段的预测码率因子。
According to some embodiments of the present disclosure, when the spatiotemporal domain feature information only includes spatiotemporal complexity information, the feature information obtained by combining the grayscale co-occurrence information and normalization information of the video segment to be encoded can be input into a first rate factor prediction model, wherein the first rate factor prediction model is trained to determine the predicted rate factor of each video segment to be encoded under multiple preset video coding standards based on the spatiotemporal complexity information.
根据本公开的另一些实施例,在时空域特征信息仅包括编码特征信息的情况下,可以将待编码视频片段的帧内编码信息和帧间编码信息组合后的特征信息输入到第一码率因子预测模型,其中,该第一码率因子预测模型被训练用于根据编码特征信息来确定多个预设视频编码标准下每个待编码视频片段的预测码率因子。According to some other embodiments of the present disclosure, when the spatiotemporal domain feature information only includes coding feature information, feature information obtained by combining intra-frame coding information and inter-frame coding information of the video segment to be encoded can be input into a first rate factor prediction model, wherein the first rate factor prediction model is trained to determine the predicted rate factor of each video segment to be encoded under multiple preset video coding standards based on the coding feature information.
根据本公开的又一些实施例,在时空域特征信息包括时空复杂度信息和编码特征信息两者的情况下,可以将时空复杂度信息和编码特征信息进行组合以生成以一组特征向量进行表示的特征信息。继续上述时空复杂度信息为包含有70个特征值的向量、编码特征信息为包含有300个特征值的向量的示例,则将这两个向量进行组合后将获得包含有370个特征值的特征向量,从而更为完整地表征待编码视频片段的时空特性。在这种情况下,可以将生成的一组特征向量输入到第一码率因子预测模型,其中,该第一码率因子预测模型被训练用于根据时空复杂度信息和编码特征信息两者来确定多个预设视频编码标准下每个待编码视频片段的预测码率因子。According to some other embodiments of the present disclosure, when the spatiotemporal domain feature information includes both spatiotemporal complexity information and coding feature information, the spatiotemporal complexity information and the coding feature information can be combined to generate feature information represented by a set of feature vectors. Continuing with the above example where the spatiotemporal complexity information is a vector containing 70 eigenvalues and the coding feature information is a vector containing 300 eigenvalues, a feature vector containing 370 eigenvalues will be obtained after combining these two vectors, thereby more completely characterizing the spatiotemporal characteristics of the video segment to be encoded. In this case, the generated set of feature vectors can be input into a first rate factor prediction model, wherein the first rate factor prediction model is trained to determine the predicted rate factor of each video segment to be encoded under multiple preset video coding standards based on both the spatiotemporal complexity information and the coding feature information.
通过利用经训练的第一码率因子预测模型来预测多个预设视频编码标准下每个待编码视频片段的预测码率因子,可以直接确定满足质量要求(例如,编码后视频片段的视频编码分值较高)的码率因子,而无需在每次编码时经过多次编码和搜索获得,从而节约了大量时间成本。同时,在样本数据较为充足的情况下,也可以在一定程度上确保预测出的码率因子的准确度。By using the trained first rate factor prediction model to predict the predicted rate factor of each to-be-encoded video segment under multiple preset video encoding standards, the rate factor that meets the quality requirements (for example, the video encoding score of the encoded video segment is higher) can be directly determined without having to obtain it through multiple encodings and searches each time, thereby saving a lot of time costs. At the same time, when the sample data is sufficient, the accuracy of the predicted rate factor can also be ensured to a certain extent.
图2示出了根据本公开的一些实施例的训练第一码率因子预测模型的方法200的流程图。如图2所示,第一码率因子预测模型可以通过方法200的以下步骤被训练:步骤S210、针对多个预设视频编码标准中的每一个预设视频编码标准,获取样本视频片段的样本时空域特征信息以及该预设视频编码标准下样本视频片段的样本码率因子;步骤S220、将样本时空域特征信息输入到第一码率因子预测模型,以经由第一码率因子预测模型确定该预设视频编码标准下样本视频片段的样本预测码率因子;步骤S230、基于样本码率因子和样本预测码率因子来计算第一模型损失值;以及步骤S240、基于每一个预设视频编码标准下的第一模型损失值来调整第一码率因子预测模型的参数,直到达到第一模型训练停止条件。FIG2 shows a flow chart of a method 200 for training a first rate factor prediction model according to some embodiments of the present disclosure. As shown in FIG2, the first rate factor prediction model can be trained by the following steps of the method 200: step S210, for each of a plurality of preset video coding standards, obtaining sample spatiotemporal feature information of a sample video clip and a sample rate factor of the sample video clip under the preset video coding standard; step S220, inputting the sample spatiotemporal feature information into the first rate factor prediction model to determine the sample predicted rate factor of the sample video clip under the preset video coding standard via the first rate factor prediction model; step S230, calculating a first model loss value based on the sample rate factor and the sample predicted rate factor; and step S240, adjusting the parameters of the first rate factor prediction model based on the first model loss value under each preset video coding standard until the first model training stop condition is reached.
通过利用样本数据对第一码率因子预测模型进行预训练和调整,可以提高预测出的码率因子的准确度。By pre-training and adjusting the first rate factor prediction model using sample data, the accuracy of the predicted rate factor can be improved.
According to some embodiments, in step S210, the sample video segments may be obtained by a method similar to that used to obtain the video segment to be encoded in step S110, and the sample spatiotemporal feature information may be calculated by a method similar to that used to calculate the spatiotemporal feature information of the video segment to be encoded in step S120; the details are therefore not repeated here. To ensure the accuracy of the trained first rate factor prediction model, as many sample video segments as possible can be obtained while keeping computing resources and time cost in balance. The number of sample video segments can be, for example, at least 50,000 or at least 100,000.
According to some embodiments, in step S210, the sample rate factor may be a real rate factor, that is, a rate factor for which the encoded video obtained by encoding the sample video segment meets the target quality requirement. The quality of the encoded video may be evaluated, for example, by a video coding score (Video Multimethod Assessment Fusion, VMAF). A larger VMAF value indicates higher quality of the encoded video, and a smaller VMAF value indicates lower quality; the VMAF value typically ranges from 0 to 100. The real rate factor can be obtained by the following steps: for the video coding standard in question, selecting an initial sample rate factor and a preset sample video coding score interval; based on the initial sample rate factor, encoding the sample video segment with the corresponding encoder in a real transcoding system (for example, an H.264 encoder, an H.265 encoder, an H.266 encoder, or an AV1 encoder) to obtain a sample pre-encoded video segment under that video coding standard; and comparing the sample video coding score of the sample pre-encoded video segment with the preset sample video coding score interval. If the sample video coding score is within the preset interval (i.e., the encoded video meets the target quality requirement), the initial sample rate factor is feasible under that video coding standard and can be used as the real rate factor. If the sample video coding score is outside the preset interval (i.e., the encoded video does not meet the target quality requirement), the initial sample rate factor is not feasible under that video coding standard, and a search method such as bisection can be used to adjust it until the sample video coding score falls within the preset interval. For the same sample video segment, the rate factors under different video coding standards generally differ. The above steps are therefore repeated for each video coding standard, so as to search for the sample rate factor (real rate factor) of the sample video segment under each of the multiple video coding standards.
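The search for a real rate factor can be sketched as a bisection over the rate factor, relying on the relationship stated earlier that a larger rate factor gives lower quality. This is only an illustrative sketch: `encode_and_score` is a hypothetical callback standing in for "encode with the real encoder and measure VMAF", and the bounds, iteration limit, and toy scoring function are assumptions.

```python
def search_rate_factor(encode_and_score, initial_rf, score_range,
                       rf_bounds=(0.0, 50.0), max_iters=20):
    """Bisect on the rate factor until the score lands in score_range.

    Assumes the score decreases monotonically as the rate factor increases.
    Returns the rate factor found, or None if the search does not converge.
    """
    low_score, high_score = score_range
    lo, hi = rf_bounds
    rf = initial_rf
    for _ in range(max_iters):
        score = encode_and_score(rf)
        if low_score <= score <= high_score:
            return rf
        if score > high_score:
            lo = rf  # quality above the interval: a larger rate factor is acceptable
        else:
            hi = rf  # quality below the interval: a smaller rate factor is needed
        rf = (lo + hi) / 2.0
    return None


def toy_score(rf):
    """Toy stand-in for a real encode plus VMAF measurement (not a real model)."""
    return 100.0 - 0.5 * rf


real_rf = search_rate_factor(toy_score, initial_rf=30.0, score_range=(91.0, 93.0))
print(real_rf, toy_score(real_rf))
```

In practice each call to the callback is a full encode, so the number of iterations is the dominant cost; this is the cost that the prediction model is meant to avoid at encoding time.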
在上述实施例中,预设样本视频编码分值区间例如可以为基于VMAF的、用户根据实际需求自定义的一个数值范围(例如[91,93]、[90,95]),也可以为为了使第一码率因子预测模型适配于多种目标质量要求而关于每一个VMAF整数值设置的区间(例如[91.5,92.5]、[92.5,93.5]……[95.5,96.5]等)。本公开要求保护的主题的范围在这方面不受限制。In the above embodiment, the preset sample video coding score interval may be, for example, a value range based on VMAF and customized by the user according to actual needs (e.g., [91, 93], [90, 95]), or an interval set for each VMAF integer value in order to adapt the first rate factor prediction model to multiple target quality requirements (e.g., [91.5, 92.5], [92.5, 93.5] ... [95.5, 96.5], etc.). The scope of the subject matter claimed by the present disclosure is not limited in this respect.
根据一些实施例,在步骤S220中,在初始阶段,可以针对多个视频编码标准选择第一码率因子预测模型的初始参数,并在此初始参数的基础上利用获取的样本时空域特征信息确定该多个视频编码标准下样本视频片段的多个样本预测码率因子。
According to some embodiments, in step S220, in the initial stage, initial parameters of a first rate factor prediction model may be selected for a plurality of video coding standards, and based on the initial parameters, a plurality of sample prediction rate factors of sample video clips under the plurality of video coding standards may be determined using the acquired sample spatiotemporal domain feature information.
According to some embodiments, in step S230, for each preset video coding standard, the model loss value may be calculated from the corresponding sample rate factor and sample predicted rate factor, for example via a loss function. Examples of loss functions include, but are not limited to, a cross-entropy loss function, a maximum loss function, an average loss function, and a 0-1 loss function. It will be understood that other suitable methods may also be used to calculate the model loss value, and the scope of the subject matter claimed in the present disclosure is not limited in this respect.
根据一些实施例,在步骤S240中,在计算出每个预设视频编码标准下的模型损失值后,可以反向传播这些模型损失值,并调整第一码率因子预测模型的参数,然后重复步骤S220-S240,直至达到模型训练停止条件。具体的,模型训练停止条件可以是模型损失值低于预设阈值,和/或模型训练的轮次达到了预设轮次。将理解的是,本公开对模型训练停止条件不做限定,可以根据实际需求来进行调整。According to some embodiments, in step S240, after calculating the model loss value under each preset video coding standard, these model loss values can be back-propagated, and the parameters of the first rate factor prediction model can be adjusted, and then steps S220-S240 are repeated until the model training stop condition is reached. Specifically, the model training stop condition can be that the model loss value is lower than a preset threshold, and/or the number of rounds of model training reaches a preset number of rounds. It will be understood that the present disclosure does not limit the model training stop condition, and it can be adjusted according to actual needs.
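The loop over steps S220 to S240 can be sketched as follows. To stay self-contained, the sketch replaces the prediction model with a plain linear model per coding standard and uses a mean squared error loss with batch gradient descent; both are stand-ins chosen for illustration, not the model or loss of the disclosure. The function name `train`, the learning rate, the threshold, and the toy data are all assumptions.

```python
def train(samples, num_standards, lr=0.2, loss_threshold=1e-3, max_epochs=5000):
    """samples: list of (features, targets), one target rate factor per standard."""
    dim = len(samples[0][0])
    weights = [[0.0] * dim for _ in range(num_standards)]
    biases = [0.0] * num_standards
    mean_loss = float("inf")
    for _ in range(max_epochs):  # stop condition 2: preset number of rounds
        total = 0.0
        grad_w = [[0.0] * dim for _ in range(num_standards)]
        grad_b = [0.0] * num_standards
        for features, targets in samples:
            for s in range(num_standards):
                # S220: predicted rate factor under standard s.
                pred = biases[s] + sum(w * x for w, x in zip(weights[s], features))
                # S230: squared-error loss against the sample (real) rate factor.
                err = pred - targets[s]
                total += err * err
                for k in range(dim):
                    grad_w[s][k] += 2.0 * err * features[k]
                grad_b[s] += 2.0 * err
        mean_loss = total / (len(samples) * num_standards)
        if mean_loss < loss_threshold:  # stop condition 1: loss below threshold
            break
        # S240: adjust parameters using the loss gradient of every standard.
        for s in range(num_standards):
            for k in range(dim):
                weights[s][k] -= lr * grad_w[s][k] / len(samples)
            biases[s] -= lr * grad_b[s] / len(samples)
    return weights, biases, mean_loss


# Toy data: 2 features on a grid, 2 coding standards with linear "real" rate factors.
grid = [(i / 4.0, j / 4.0) for i in range(5) for j in range(5)]
data = [([a, b], [20.0 + 5.0 * a - 3.0 * b, 24.0 + 4.0 * a - 2.0 * b]) for a, b in grid]
weights, biases, final_loss = train(data, num_standards=2)
print(final_loss)
```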
图3示出了根据本公开的一些实施例的第一码率因子预测模型300的结构框图。如图3所述,该第一码率因子预测模型300可以包括批归一化模块310、第一注意力机制模块320、残差模块330、第二注意力机制模块340和全连接模块350,并且该第一码率因子预测模型300以待编码视频片段的时空域特征作为输入,输出不同视频编码标准下该待编码视频片段的预测码率因子。Fig. 3 shows a structural block diagram of a first rate factor prediction model 300 according to some embodiments of the present disclosure. As shown in Fig. 3, the first rate factor prediction model 300 may include a batch normalization module 310, a first attention mechanism module 320, a residual module 330, a second attention mechanism module 340 and a fully connected module 350, and the first rate factor prediction model 300 takes the spatiotemporal domain features of the video segment to be encoded as input, and outputs the predicted rate factor of the video segment to be encoded under different video coding standards.
批归一化模块310用于将原始输入的特征归一化为具有均值为0、方差为1、高斯分布的特征,从而消除特征之间量纲差异带来的影响。然后,归一化后的特征被输入到第一注意力机制模块320,该模块可以为每个特征计算一个权重,其中,为具有更高辨识度的特征赋予数值更大的权重并且为信息量较低的特征赋予数值较小的权重,由此对特征进行自动筛选,从而提高预测结果的准确度。然后,经过筛选后的特征被输入到残差模块330,该模块可以包括多个卷积层,这些卷积层对作为输入的筛选后特征进行变化,同时未经变化的筛选后特征将跳过这些卷积层传输到后序层,然后对其整体进行处理以得到残差模块的输出结果,由此减少了信息丢失。残差模块330的输出可以进一步被输入到第二注意力机制模块340,以实现对特征的进一步筛选,从而进一步提高预测结果的准确度。最后,经由全连接模块350可以对输入的特征进行聚合,以生成并输出预测码率因子。The batch normalization module 310 is used to normalize the original input features into features with a mean of 0, a variance of 1, and a Gaussian distribution, thereby eliminating the impact of the dimensional differences between the features. Then, the normalized features are input to the first attention mechanism module 320, which can calculate a weight for each feature, wherein a larger weight is assigned to features with higher recognition and a smaller weight is assigned to features with lower information content, thereby automatically screening the features, thereby improving the accuracy of the prediction results. Then, the screened features are input to the residual module 330, which may include multiple convolutional layers, which change the screened features as input, and the unchanged screened features will skip these convolutional layers and be transmitted to the subsequent layers, and then be processed as a whole to obtain the output result of the residual module, thereby reducing information loss. The output of the residual module 330 can be further input to the second attention mechanism module 340 to achieve further screening of the features, thereby further improving the accuracy of the prediction results. Finally, the input features may be aggregated via the fully connected module 350 to generate and output a predicted bit rate factor.
FIG. 4 shows a flowchart of determining multiple target rate factors of the video segment to be encoded under multiple preset video coding standards according to some embodiments of the present disclosure. As shown in FIG. 4, step S140 (determining the multiple target rate factors of the video segment to be encoded under the multiple preset video coding standards, based on the preset video coding score interval and on a first predicted rate factor that, among the multiple predicted rate factors, corresponds to a first preset video coding standard among the multiple preset video coding standards) may include: step S442, performing a first encoding of the video segment to be encoded based on the first predicted rate factor, to obtain a first pre-encoded video segment under the first preset video coding standard; step S444, determining whether the video coding score of the first pre-encoded video segment is within the preset video coding score interval; and step S446, in response to determining that the video coding score of the first pre-encoded video segment is within the preset video coding score interval, determining the multiple predicted rate factors as the multiple target rate factors.
在本公开中,第一预设视频编码标准是指采用其进行视频编码时耗时最短的视频编码标准。举例而言,在视频编码标准H.264、H.265、H.266和AV1中,由于采用视频编码标准H.264进行视频编码时花费的时间最短,因此视频编码标准H.264为第一预设视频编码标准。In the present disclosure, the first preset video coding standard refers to the video coding standard that takes the shortest time to perform video coding. For example, among the video coding standards H.264, H.265, H.266, and AV1, the video coding standard H.264 takes the shortest time to perform video coding, so the video coding standard H.264 is the first preset video coding standard.
在步骤S442中,除了第一预测码率因子之外,对待编码视频进行编码时利用的编码参数还可以包括但不限于preset、GOP(Group of Picture)、size等。由于码率因子对于编码后视频的质量起到重要作用,因此在使用本公开的方法对视频进行编码时,可以设定这些编码参数的值为固定常数(例如,默认预设值)。In step S442, in addition to the first predicted bit rate factor, the encoding parameters used when encoding the video to be encoded may also include but are not limited to preset, GOP (Group of Picture), size, etc. Since the bit rate factor plays an important role in the quality of the encoded video, when encoding the video using the method of the present disclosure, the values of these encoding parameters can be set to fixed constants (for example, default preset values).
在步骤S444中,类似于预设样本视频编码分值区间的设定,预设视频编码分值区间也可以为用户根据实际需求自定义的一个数值范围,故在此不在赘述。In step S444, similar to the setting of the preset sample video encoding score interval, the preset video encoding score interval can also be a numerical range customized by the user according to actual needs, so it will not be described in detail here.
由于编码时间最短的视频编码标准的计算复杂度最低,因此该视频编码标准下的码率因子可以在一定程度上反应编码时间较长的其他视频编码标准下的码率因子的准确度。换言之,如果基于对应于第一预设视频编码标准(例如,在上述示例中为视频编码标准H.264)的第一预测码率因子进行编码的视频片段的视频编码分值在预设视频编码分值区间内(满足视频质量要求),则表明对应于其他预设视频编码标准(例如,在上述示例中为视频编码标准H.265、H.266和AV1)的其他预测码率因子的准确度也较高(例如,如果视频编码标准H.264下的经编码视频的视频编码分值有99%的概率在预设视频编码分值区间内,则视频编码标准H.265、H.266和AV1下的经编码视频的视频编码分值也有超过95%的概率在预设视频编码分值区间内)。利用这些预测码率因子编码后的视频片段的视频编码分值通常也在预设视频编码分值区间内(满足视频质量要求),从而可以将不同视频编码标准关联起来,有利于减少预测码率因子所需的计算量。在一些示例中,经过多次试验数据显示,当视频编码标准H.264下的预测码率因子满足视频质量要求时,视频编码标准H.265、H.266和AV1下各自的预测码率因子满足视频质量要求的概率为95%,因此可以将这些预测码率因子直接作为用来对待编码视频进行编码的目标码率因子。
Since the computational complexity of the video coding standard with the shortest encoding time is the lowest, the rate factor under this video coding standard can, to a certain extent, reflect the accuracy of the rate factors under other video coding standards with longer encoding times. In other words, if the video coding score of the video clip encoded based on the first predicted rate factor corresponding to the first preset video coding standard (for example, the video coding standard H.264 in the above example) is within the preset video coding score range (meeting the video quality requirement), it indicates that the accuracy of other predicted rate factors corresponding to other preset video coding standards (for example, the video coding standards H.265, H.266 and AV1 in the above example) is also high (for example, if the video coding score of the encoded video under the video coding standard H.264 has a probability of 99% being within the preset video coding score range, the video coding scores of the encoded videos under the video coding standards H.265, H.266 and AV1 also have a probability of more than 95% being within the preset video coding score range). The video coding scores of the video clips encoded using these predicted bitrate factors are usually also within the preset video coding score range (meeting the video quality requirements), so that different video coding standards can be associated, which is conducive to reducing the amount of calculation required for the predicted bitrate factors. In some examples, after multiple test data, it is shown that when the predicted bitrate factors under the video coding standard H.264 meet the video quality requirements, the probability that the predicted bitrate factors under the video coding standards H.265, H.266 and AV1 meet the video quality requirements is 95%, so these predicted bitrate factors can be directly used as target bitrate factors for encoding the video to be encoded.
继续参考图4,根据本公开的另一些实施例,步骤S140、基于多个预测码率因子中对应于多个预设视频编码标准中的第一预设视频编码标准的第一预测码率因子和预设视频编码分值区间,确定多个预设视频编码标准下待编码视频片段的多个目标码率因子还可以包括:步骤S448、响应于确定第一预编码视频片段的视频编码分值不在预设视频编码分值区间内:基于待编码视频片段的时空域特征信息、多个预测码率因子和第一预编码视频片段的视频编码分值,更新多个预测码率因子以获得多个目标码率因子。Continuing to refer to Figure 4, according to other embodiments of the present disclosure, step S140, determining multiple target rate factors for the video segment to be encoded under multiple preset video coding standards based on a first predicted rate factor among multiple predicted rate factors corresponding to a first preset video coding standard among multiple preset video coding standards and a preset video coding score range, may also include: step S448, in response to determining that the video coding score of the first pre-encoded video segment is not within the preset video coding score range: based on the spatiotemporal domain feature information of the video segment to be encoded, the multiple predicted rate factors and the video coding score of the first pre-encoded video segment, updating the multiple predicted rate factors to obtain multiple target rate factors.
Continuing the above example in which the multiple preset video coding standards are H.264, H.265, H.266 and AV1: when the video coding score of the pre-encoded video segment obtained with the predicted rate factor under H.264 is not within the preset video coding score range, the respective predicted rate factors under H.265, H.266 and AV1 and the video coding score of the pre-encoded video segment under H.264 can be used as feedback information and, together with the spatiotemporal feature information of the video segment to be encoded, used to predict the rate factors under these video coding standards a second time, so as to obtain the multiple target rate factors.
A large amount of experimental data shows that, after the first prediction and after the video segment to be encoded has been encoded with the predicted rate factor, the probability that the video coding score meets the video quality requirement is 40%. Therefore, 60% of the video segments to be encoded go through the second prediction described above to update the rate factors. When the video coding score of the first pre-encoded video segment is not within the preset video coding score range, performing a second prediction of the rate factors under the different video coding standards further improves the accuracy of the target rate factors, and avoids the situation in which large errors in the rate factors predicted for that 60% of segments make it impossible to guarantee a stable picture quality.
In addition, the predicted rate factors obtained by the first prediction are updated based on the video coding score of the first pre-encoded video segment, so the video segment to be encoded does not need to be encoded separately with the predicted rate factors under each of the other preset video coding standards, which greatly reduces the time cost and computing resources required for encoding.
According to some embodiments of the present disclosure, step S448 (updating the multiple predicted rate factors, based on the spatiotemporal feature information of the video segment to be encoded, the multiple predicted rate factors and the video coding score of the first pre-encoded video segment, to obtain the multiple target rate factors) may include: inputting the spatiotemporal feature information of the video segment to be encoded, the multiple predicted rate factors and the video coding score of the first pre-encoded video segment into a second rate factor prediction model, so as to determine, via the second rate factor prediction model, the multiple target rate factors of the video segment to be encoded under the multiple preset video coding standards.
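The decision flow described above (pre-encode only under the fastest standard, check the score, and fall back to the second model when the score is out of range) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name, the callables `pre_encode_and_score` and `second_model`, and the dictionary representation of rate factors are all assumptions introduced here.

```python
from typing import Callable, Dict, Sequence, Tuple

RateFactors = Dict[str, float]  # hypothetical layout, e.g. {"H.264": 23.0, "AV1": 30.0}


def determine_target_rate_factors(
    features: Sequence[float],
    predicted: RateFactors,
    first_standard: str,
    score_range: Tuple[float, float],
    pre_encode_and_score: Callable[[str, float], float],
    second_model: Callable[[Sequence[float], RateFactors, float], RateFactors],
) -> RateFactors:
    """Return one target rate factor per preset standard.

    Only the fastest standard (first_standard) is actually pre-encoded;
    the other standards are never encoded at this stage.
    """
    score = pre_encode_and_score(first_standard, predicted[first_standard])
    low, high = score_range
    if low <= score <= high:
        # Score is acceptable: use every first-pass prediction unchanged.
        return dict(predicted)
    # Score is out of range (step S448): refine all predictions with the
    # second model, feeding back the single measured score.
    return second_model(features, predicted, score)
```

A caller would supply a real encoder-plus-quality-metric wrapper for `pre_encode_and_score` and a trained model for `second_model`; in the sketch both are plain callables so that the control flow can be exercised with stubs.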
According to some embodiments of the present disclosure, similarly to the method for training the first rate factor prediction model, the second rate factor prediction model may be trained through the following operations: inputting the sample spatiotemporal feature information, the sample predicted rate factor under each preset video coding standard, and the first sample video coding score corresponding to the sample predicted rate factor under the first preset video coding standard into the second rate factor prediction model, so as to determine, via the second rate factor prediction model, multiple sample target rate factors of the sample video segment under the multiple preset video coding standards; for each of the multiple preset video coding standards, calculating a second model loss value based on the sample rate factor and the sample target rate factor corresponding to that preset video coding standard; and adjusting the parameters of the second rate factor prediction model based on the second model loss value until a second-model training stop condition is reached.
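To make the inputs and the per-standard loss of the second model concrete, the following sketch assembles an input vector and computes a loss for each preset standard. The text does not specify the loss function or the input layout; the squared error and the simple concatenation used here are assumptions, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence


def build_second_model_input(features: Sequence[float],
                             predicted: Dict[str, float],
                             first_score: float,
                             standards: Sequence[str]) -> List[float]:
    """Concatenate features, first-pass predictions and the measured score."""
    return list(features) + [predicted[s] for s in standards] + [first_score]


def second_model_loss(sample_rate_factors: Dict[str, float],
                      sample_target_rate_factors: Dict[str, float]) -> Dict[str, float]:
    """Per-standard squared error (assumed loss) between label and model output."""
    return {std: (sample_target_rate_factors[std] - label) ** 2
            for std, label in sample_rate_factors.items()}


# Illustrative numbers only.
labels = {"H.264": 23.0, "H.265": 25.0}    # sample rate factors (ground truth)
outputs = {"H.264": 24.0, "H.265": 25.5}   # sample target rate factors (model output)
losses = second_model_loss(labels, outputs)
total_loss = sum(losses.values())          # one possible way to aggregate
```

Summing the per-standard losses is only one aggregation choice; a weighted sum would fit the description equally well.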
It will be understood that, apart from adding the sample predicted rate factor under each preset video coding standard and the first sample video coding score as inputs to the second rate factor prediction model, the steps of the above method for training the second rate factor prediction model are similar to those of method 200 for training the first rate factor prediction model, and are therefore not repeated here.
Correspondingly, apart from taking the sample predicted rate factor under each preset video coding standard and the first sample video coding score as additional inputs, the second rate factor prediction model has the same structural framework as the first rate factor prediction model, which is therefore also not described again.
It will also be understood that training the second rate factor prediction model depends on the training results of the first rate factor prediction model. The second rate factor prediction model is therefore trained after training of the first rate factor prediction model has been completed.
After the multiple target rate factors are determined, the video segment to be encoded is encoded according to each of them using the corresponding encoder in the transcoding system (that is, the encoder based on the corresponding preset video coding standard). Data from multiple tests show that, after a video segment is encoded with the target rate factors obtained by the second prediction, the video coding score of the encoded segment has a 99% probability of being within the preset video coding score range; the encoding results based on the target rate factors obtained by the second prediction can therefore be trusted directly.
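As a rough check on what these figures imply for the first preset video coding standard, the 40% first-pass rate and the 99% second-pass rate quoted above can be combined. This is only a back-of-envelope estimate under two assumptions that the text does not state explicitly: that segments passing the first check are in range by construction (their score was measured), and that the 99% figure applies to the remaining 60% of segments.

```latex
P(\text{score in range}) \approx 0.40 \times 1 + 0.60 \times 0.99 = 0.40 + 0.594 = 0.994
```

Under those assumptions, roughly 99.4% of segments would end up within the preset video coding score range under the first standard after at most one pre-encode.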
After each video segment to be encoded has been encoded with the multiple target rate factors, method 100 may further include: for each of the multiple preset video coding standards, obtaining all target video segments under that preset video coding standard; and combining all of the target video segments, based on the order of the one or more sub-video segments in the video to be encoded, to obtain the target video that corresponds to the video to be encoded under that preset video coding standard.
After the above encoding operation has been performed on every video segment to be encoded in a video to be encoded, multiple target video segments are obtained for each video segment to be encoded. Then, for each preset video coding standard, the corresponding target video segment among the multiple target video segments of each video segment to be encoded can be written into the final video bitstream file, and the written target video segments are spliced according to the order of the video segments to be encoded in the video to be encoded, so as to obtain the target video that corresponds to the video to be encoded under that preset video coding standard.
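The per-standard splicing step can be sketched as below. The data layout (a list of `(segment index, {standard: encoded bytes})` tuples) and the function name are assumptions for illustration, and plain byte concatenation stands in for whatever container-aware muxing a real transcoding system would use.

```python
from typing import Dict, List, Tuple


def assemble_targets(encoded: List[Tuple[int, Dict[str, bytes]]]) -> Dict[str, bytes]:
    """Splice encoded segments into one output per preset standard.

    Each item is (index of the segment in the source video,
    {standard name: encoded payload for that segment}).
    """
    # Restore the original segment order before splicing.
    ordered = sorted(encoded, key=lambda item: item[0])
    parts: Dict[str, List[bytes]] = {}
    for _, per_standard in ordered:
        for std, payload in per_standard.items():
            parts.setdefault(std, []).append(payload)
    # Placeholder for real muxing: concatenate payloads in order.
    return {std: b"".join(chunks) for std, chunks in parts.items()}
```

Sorting by the source index before concatenation is what protects against the discontinuities that out-of-order splicing would cause.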
As described above, each video segment to be encoded is a single-shot video segment whose content is similar and coherent. Encoding each video segment to be encoded and obtaining the corresponding target video segment therefore ensures the quality of every encoded video segment, which in turn ensures a stable picture quality across scenes. In addition, splicing the obtained target video segments according to the order of the video segments to be encoded in the video to be encoded restores the complete video file and avoids discontinuities in the video file caused by splicing the target video segments incorrectly.
FIG. 5 shows a structural block diagram of a video encoding apparatus 500 according to an embodiment of the present disclosure. As shown in FIG. 5, the apparatus 500 may include: an acquisition module 510, configured to acquire a video segment to be encoded, the video segment to be encoded including one or more video frames; a calculation module 520, configured to calculate spatiotemporal feature information of the video segment to be encoded based on the one or more video frames; a predicted rate factor determination module 530, configured to determine, based on the spatiotemporal feature information, multiple predicted rate factors of the video segment to be encoded under multiple preset video coding standards; a target rate factor determination module 540, configured to determine multiple target rate factors of the video segment to be encoded under the multiple preset video coding standards, based on a preset video coding score range and on the first predicted rate factor that, among the multiple predicted rate factors, corresponds to the first preset video coding standard among the multiple preset video coding standards, wherein, among the multiple preset video coding standards, video encoding with the first preset video coding standard takes the shortest time; and an encoding module 550, configured to encode the video segment to be encoded according to each of the multiple target rate factors, to obtain multiple target video segments.
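One way to picture how the five modules fit together is as a small pipeline object whose fields stand for modules 510 to 550. The sketch below is purely illustrative: the class name, field names and callable signatures are assumptions, and each module is reduced to a plain function so that the data flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class VideoEncodingApparatus:
    acquire: Callable[[str], List[list]]                      # module 510
    compute_features: Callable[[list], Sequence[float]]       # module 520
    predict: Callable[[Sequence[float]], Dict[str, float]]    # module 530
    determine_targets: Callable[                              # module 540
        [list, Sequence[float], Dict[str, float]], Dict[str, float]]
    encode: Callable[[list, str, float], bytes]               # module 550

    def run(self, source: str) -> List[Dict[str, bytes]]:
        """Return, per segment, the encoded output under every standard."""
        outputs = []
        for segment in self.acquire(source):
            features = self.compute_features(segment)
            predicted = self.predict(features)
            targets = self.determine_targets(segment, features, predicted)
            outputs.append({std: self.encode(segment, std, rf)
                            for std, rf in targets.items()})
        return outputs
```

Passing the modules in as callables keeps the sketch independent of any particular encoder or model library.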
According to some embodiments of the present disclosure, the spatiotemporal feature information may include spatiotemporal complexity information, and the calculation module 520 may include: a module configured to calculate, based on the one or more video frames, gray-level co-occurrence information and normalization information of the video segment to be encoded as the spatiotemporal complexity information, wherein the gray-level co-occurrence information describes the spatial complexity of the video segment to be encoded and the normalization information describes its temporal complexity; and a module configured to combine the gray-level co-occurrence information and the normalization information to generate the spatiotemporal complexity information of the video segment to be encoded.
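For readers unfamiliar with gray-level co-occurrence statistics, the sketch below computes a normalized gray-level co-occurrence matrix for one frame and derives a contrast value from it. It is a generic textbook construction, not the specific feature used in the disclosure; the pixel offset, the number of gray levels and the choice of the contrast statistic are all assumptions, and the temporal (normalization) part is not shown.

```python
from typing import List


def glcm(gray: List[List[int]], levels: int, dx: int = 1, dy: int = 0) -> List[List[float]]:
    """Normalized co-occurrence matrix for a non-negative pixel offset (dx, dy).

    Assumes the frame is larger than the offset, so at least one pair exists.
    """
    height, width = len(gray), len(gray[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(height - dy):
        for x in range(width - dx):
            counts[gray[y][x]][gray[y + dy][x + dx]] += 1
            total += 1
    return [[c / total for c in row] for row in counts]


def glcm_contrast(p: List[List[float]]) -> float:
    """Contrast statistic: larger when neighbouring pixels differ strongly."""
    return sum(p[i][j] * (i - j) ** 2
               for i in range(len(p)) for j in range(len(p)))


frame = [[0, 0, 1],
         [1, 1, 0]]          # tiny 2-level frame, for illustration only
matrix = glcm(frame, levels=2)
contrast = glcm_contrast(matrix)
```

A flat frame yields zero contrast, while a busy texture yields a high value, which is why such statistics are a common proxy for spatial complexity.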
According to some embodiments of the present disclosure, the spatiotemporal feature information may include coding feature information, and the calculation module 520 may include: a preprocessing module, configured to preprocess the one or more video frames to generate a new video frame sequence, the new video frame sequence including a set of video frame pairs; and an intra-frame/inter-frame coding module, configured to perform intra-frame coding and inter-frame coding, respectively, on each video frame pair in the set of video frame pairs, to obtain the coding feature information of the video segment to be encoded.
According to some embodiments of the present disclosure, the preprocessing module may include: a module configured to copy those of the one or more video frames other than the first frame and the last frame, to generate one or more copied video frames; and a module configured to order the one or more video frames and the one or more copied video frames such that each of the one or more copied video frames is located immediately after its corresponding video frame.
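The copy-and-reorder preprocessing can be illustrated with a short sketch. The function name is hypothetical, and reading the resulting sequence as consecutive, non-overlapping pairs is an interpretation made here for illustration rather than something this passage states.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def build_frame_pairs(frames: Sequence[Frame]) -> Tuple[List[Frame], List[Tuple[Frame, Frame]]]:
    """Duplicate every interior frame and group the result into pairs."""
    if len(frames) < 2:
        return list(frames), []          # nothing to pair
    sequence = [frames[0]]
    for frame in frames[1:-1]:
        sequence.extend([frame, frame])  # original, then its copy right after
    sequence.append(frames[-1])
    # The new sequence has even length, so it splits cleanly into pairs.
    pairs = [(sequence[i], sequence[i + 1]) for i in range(0, len(sequence), 2)]
    return sequence, pairs
```

For frames `[a, b, c, d]` this gives the sequence `[a, b, b, c, c, d]` and the pairs `(a, b)`, `(b, c)`, `(c, d)`, so every adjacent pair of source frames appears exactly once.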
According to some embodiments of the present disclosure, the intra-frame/inter-frame coding module may include: a module configured to perform intra-frame coding on the first video frame of each video frame pair in the set of video frame pairs, to obtain intra-frame coding information; a module configured to perform inter-frame coding on each video frame pair in the set of video frame pairs, to obtain inter-frame coding information; and a module configured to obtain the coding feature information based on the intra-frame coding information and the inter-frame coding information.
According to some embodiments of the present disclosure, the coding feature information may be based on one or more of the following: the number of coded bits, the proportion of each intra-frame prediction mode, and the magnitude distribution of inter-frame motion vectors.
According to some embodiments of the present disclosure, the predicted rate factor determination module 530 may include: a module configured to input the spatiotemporal feature information of the video segment to be encoded into a first rate factor prediction model, so as to determine, via the first rate factor prediction model, the multiple predicted rate factors of the video segment to be encoded under the multiple preset video coding standards.
According to some embodiments of the present disclosure, the target rate factor determination module 540 may include: a module configured to perform a first encoding of the video segment to be encoded based on the first predicted rate factor, to obtain a first pre-encoded video segment under the first preset video coding standard; a module configured to determine whether the video coding score of the first pre-encoded video segment is within the preset video coding score range; and a module configured to determine the multiple predicted rate factors to be the multiple target rate factors in response to determining that the video coding score of the first pre-encoded video segment is within the preset video coding score range.
According to some embodiments of the present disclosure, the target rate factor determination module 540 may further include: an update module, configured to, in response to determining that the video coding score of the first pre-encoded video segment is not within the preset video coding score range, update the multiple predicted rate factors, based on the spatiotemporal feature information of the video segment to be encoded, the multiple predicted rate factors and the video coding score of the first pre-encoded video segment, to obtain the multiple target rate factors.
According to some embodiments of the present disclosure, the update module may include: a module configured to input the spatiotemporal feature information of the video segment to be encoded, the multiple predicted rate factors and the video coding score of the first pre-encoded video segment into a second rate factor prediction model, so as to determine, via the second rate factor prediction model, the multiple target rate factors of the video segment to be encoded under the multiple preset video coding standards.
According to some embodiments of the present disclosure, the first rate factor prediction model may be trained through the following operations: for each of the multiple preset video coding standards, obtaining sample spatiotemporal feature information of a sample video segment and a sample rate factor of the sample video segment under that preset video coding standard, inputting the sample spatiotemporal feature information into the first rate factor prediction model so as to determine, via the first rate factor prediction model, a sample predicted rate factor of the sample video segment under that preset video coding standard, and calculating a first model loss value based on the sample rate factor and the sample predicted rate factor; and adjusting the parameters of the first rate factor prediction model based on the first model loss value under each preset video coding standard until a first-model training stop condition is reached.
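The training procedure can be pictured with a deliberately simple stand-in model. In the sketch below, a linear predictor with one weight vector per preset standard is fitted by gradient steps on a squared-error loss, and training stops when the epoch loss falls below a tolerance or an epoch limit is hit. The model form, the loss, the learning rate and the stop condition are all assumptions made for illustration; the text does not specify them.

```python
from typing import Dict, List, Sequence, Tuple

# (sample features, {standard: sample rate factor used as the label})
Sample = Tuple[Sequence[float], Dict[str, float]]


def train_first_model(samples: List[Sample], standards: Sequence[str],
                      lr: float = 0.1, max_epochs: int = 500,
                      tol: float = 1e-6):
    """Fit a hypothetical linear rate factor predictor for every standard."""
    dim = len(samples[0][0])
    weights = {s: [0.0] * dim for s in standards}
    history = []                          # summed loss per epoch
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for x, labels in samples:
            for s in standards:
                pred = sum(w * xi for w, xi in zip(weights[s], x))
                err = pred - labels[s]
                epoch_loss += err * err   # first model loss for this standard
                # Gradient step on the squared error.
                weights[s] = [w - lr * 2.0 * err * xi
                              for w, xi in zip(weights[s], x)]
        history.append(epoch_loss)
        if epoch_loss < tol:              # assumed training stop condition
            break
    return weights, history
```

In practice the predictor would be a learned model over the full spatiotemporal feature vector; the loop structure (per-standard loss, parameter update, stop condition) is the part that mirrors the description.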
According to some embodiments of the present disclosure, the second rate factor prediction model is trained through the following operations: inputting the sample spatiotemporal feature information, the sample predicted rate factor under each preset video coding standard, and the first sample video coding score corresponding to the sample predicted rate factor under the first preset video coding standard into the second rate factor prediction model, so as to determine, via the second rate factor prediction model, multiple sample target rate factors of the sample video segment under the multiple preset video coding standards; for each of the multiple preset video coding standards, calculating a second model loss value based on the sample rate factor and the sample target rate factor corresponding to that preset video coding standard; and adjusting the parameters of the second rate factor prediction model based on the second model loss value under each preset video coding standard until a second-model training stop condition is reached.
According to some embodiments of the present disclosure, the acquisition module 510 may include: a module configured to acquire a video to be encoded; a module configured to perform scene detection on the video to be encoded; a module configured to divide the video to be encoded into one or more sub-video segments based on the scene detection result; and a module configured to identify each of the one or more sub-video segments as a video segment to be encoded.
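Splitting a video at scene boundaries can be sketched as follows. The detector shown (mean absolute pixel difference between consecutive frames compared against a threshold) is a simple stand-in chosen for illustration and is not the scene detection method of the disclosure; frames are represented as flat lists of pixel values, and all names are hypothetical.

```python
from typing import List, Sequence


def detect_cuts(frames: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Return indices i where a new scene is assumed to start at frames[i]."""
    cuts = []
    for i in range(1, len(frames)):
        prev, cur = frames[i - 1], frames[i]
        diff = sum(abs(a - b) for a, b in zip(prev, cur)) / len(cur)
        if diff > threshold:
            cuts.append(i)
    return cuts


def split_by_cuts(frames: Sequence, cuts: Sequence[int]) -> List[list]:
    """Divide the frame list into sub-video segments at the given cut indices."""
    bounds = [0] + list(cuts) + [len(frames)]
    return [list(frames[start:end])
            for start, end in zip(bounds, bounds[1:]) if end > start]
```

Each returned sub-list then plays the role of one single-shot video segment to be encoded.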
According to some embodiments of the present disclosure, the apparatus 500 may further include: a module configured to obtain, for each of the multiple preset video coding standards, all target video segments under that preset video coding standard; and a module configured to combine all of the target video segments, based on the order of the one or more sub-video segments in the video to be encoded, to obtain the target video that corresponds to the video to be encoded under that preset video coding standard.
It should be understood that the modules 510-550 of the apparatus 500 shown in FIG. 5 may correspond to the steps S110-S150 of method 100 described with reference to FIG. 1. The operations, features and advantages described above for method 100 therefore apply equally to the apparatus 500 and the modules it includes. For brevity, some operations, features and advantages are not repeated here.
It should also be understood that various techniques may be described herein in the general context of software, hardware elements or program modules. The modules described above with respect to FIG. 5 may be implemented in hardware, or in hardware combined with software and/or firmware. For example, these modules may be implemented as computer program code/instructions configured to be executed by one or more processors and stored in a computer-readable storage medium. Alternatively, these modules may be implemented as hardware logic/circuits. For example, in some embodiments, one or more of the acquisition module 510, the calculation module 520, the predicted rate factor determination module 530, the target rate factor determination module 540 and the encoding module 550 may be implemented together in a system on chip (SoC). The SoC may include an integrated circuit chip (which includes one or more of a processor (for example, a central processing unit (CPU), a microcontroller, a microprocessor, a digital signal processor (DSP), etc.), a memory, one or more communication interfaces, and/or other circuits), and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, an electronic device is further provided, including: at least one processor; and at least one memory communicatively connected to the at least one processor, wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the video encoding method described above.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing a computer program is further provided, wherein the computer program, when executed by a processor, implements the video encoding method described above.
According to another aspect of the present disclosure, a computer program product is further provided, including a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described above.
Referring to FIG. 6, a structural block diagram of an electronic device 600 that can serve as a server of the present disclosure will now be described; it is an example of a hardware device that can be applied to aspects of the present disclosure. The electronic device may be any of various types of computer device, such as a laptop computer, a desktop computer, a workstation, a personal digital assistant, a server, a blade server, a mainframe computer, or another suitable computer. The electronic device may also represent various forms of mobile device, such as a personal digital assistant, a cellular phone, a smartphone, a wearable device, or another similar computing device. The components shown herein, their connections and relationships, and their functions are examples only, and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 6, the electronic device 600 may include at least one processor 610, a working memory 620, an input unit 640, a display unit 650, a speaker 660, a storage unit 670, a communication unit 680 and other output units 690, which can communicate with one another through a system bus 630.
The processor 610 may be a single processing unit or multiple processing units, each of which may include one or more computing units or multiple cores. The processor 610 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits, and/or any device that manipulates signals based on operating instructions. The processor 610 may be configured to fetch and execute computer-readable instructions stored in the working memory 620, the storage unit 670 or other computer-readable media, such as the program code of an operating system 620a or the program code of an application 620b.
The working memory 620 and the storage unit 670 are examples of computer-readable storage media for storing instructions that are executed by the processor 610 to implement the functions described above. The working memory 620 may include both volatile and non-volatile memory (for example, RAM, ROM, etc.). The storage unit 670 may include hard disk drives, solid-state drives, removable media (including external and removable drives), memory cards, flash memory, floppy disks, optical discs (for example, CD, DVD), storage arrays, network-attached storage, storage area networks, and so on. The working memory 620 and the storage unit 670 may both be referred to herein collectively as memory or computer-readable storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code, which can be executed by the processor 610 as a particular machine configured to implement the operations and functions described in the examples herein.
The input unit 640 may be any type of device capable of inputting information to the electronic device 600. The input unit 640 may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a trackpad, a trackball, a joystick, a microphone and/or a remote control. An output unit may be any type of device capable of presenting information, and may include, but is not limited to, the display unit 650, the speaker 660 and the other output units 690, which may include, but are not limited to, a video/audio output terminal, a vibrator and/or a printer. The communication unit 680 allows the electronic device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or a chipset, such as a Bluetooth™ device, an 802.11 device, a Wi-Fi device, a WiMAX device, a cellular communication device and/or the like.
The application 620b in the working memory 620 may be loaded to execute the methods and steps described above, for example steps S110-S150 in FIG. 1, steps S210-S240 in FIG. 2 and steps S442-S448 in FIG. 4. For example, in some embodiments, the methods described above may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 670. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 600 via the storage unit 670 and/or the communication unit 680. When the computer program is loaded and executed by the processor 610, one or more steps of the methods 100 and 200 described above may be performed. Alternatively, in other embodiments, the processor 610 may be configured to perform the methods 100 and 200 in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device and at least one output device, and transmit data and instructions to that storage system, that at least one input device and that at least one output device.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes a back-end component (e.g., a data server), or a computing system that includes a middleware component (e.g., an application server), or a computing system that includes a front-end component (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other.
It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired result of the technical solutions disclosed herein can be achieved; no limitation is imposed here.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it should be understood that the methods, systems, and devices described above are merely exemplary embodiments or examples. The scope of the present invention is not limited by these embodiments or examples, but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced by equivalent elements. In addition, the steps may be performed in an order different from that described in the present disclosure. Further, the various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.
Claims (18)
- A video encoding method, comprising:
  acquiring a video segment to be encoded, the video segment to be encoded comprising one or more video frames;
  calculating spatiotemporal feature information of the video segment to be encoded based on the one or more video frames;
  determining, based on the spatiotemporal feature information, a plurality of predicted rate factors of the video segment to be encoded under a plurality of preset video coding standards;
  determining a plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards, based on a preset video coding score interval and a first predicted rate factor that, among the plurality of predicted rate factors, corresponds to a first preset video coding standard of the plurality of preset video coding standards, wherein, among the plurality of preset video coding standards, video encoding using the first preset video coding standard takes the shortest time; and
  encoding the video segment to be encoded according to each of the plurality of target rate factors, to obtain a plurality of target video segments.
- The method according to claim 1, wherein the spatiotemporal feature information comprises spatiotemporal complexity information, and wherein calculating the spatiotemporal feature information of the video segment to be encoded based on the one or more video frames comprises:
  calculating, based on the one or more video frames, gray-level co-occurrence information and normalization information of the video segment to be encoded, wherein the gray-level co-occurrence information describes the spatial complexity of the video segment to be encoded, and the normalization information describes the temporal complexity of the video segment to be encoded; and
  combining the gray-level co-occurrence information and the normalization information to generate the spatiotemporal complexity information of the video segment to be encoded.
- The method according to claim 1 or 2, wherein the spatiotemporal feature information comprises coding feature information, and wherein calculating the spatiotemporal feature information of the video segment to be encoded based on the one or more video frames comprises:
  preprocessing the one or more video frames to generate a new video frame sequence, the new video frame sequence comprising a set of video frame pairs; and
  performing intra-frame coding and inter-frame coding for each video frame pair in the set of video frame pairs, to obtain the coding feature information of the video segment to be encoded.
- The method according to claim 3, wherein preprocessing the one or more video frames to generate the new video frame sequence comprises:
  copying the video frames of the one or more video frames other than the first frame and the last frame, to generate one or more copied video frames; and
  ordering the one or more video frames and the one or more copied video frames such that each of the one or more copied video frames is located after its corresponding video frame.
- The method according to claim 3, wherein performing intra-frame coding and inter-frame coding for each video frame pair in the set of video frame pairs, to obtain the coding feature information of the video segment to be encoded, comprises:
  performing intra-frame coding on the first video frame of each video frame pair in the set of video frame pairs, to obtain intra-frame coding information;
  performing inter-frame coding on each video frame pair in the set of video frame pairs, to obtain inter-frame coding information; and
  obtaining the coding feature information based on the intra-frame coding information and the inter-frame coding information.
- The method according to any one of claims 3 to 5, wherein the coding feature information is based on one or more of the following: a number of encoded bits, a proportion of intra-frame prediction modes, and inter-frame motion vector magnitude distribution information.
- The method according to any one of claims 1 to 6, wherein determining, based on the spatiotemporal feature information, the plurality of predicted rate factors of the video segment to be encoded under the plurality of preset video coding standards comprises:
  inputting the spatiotemporal feature information of the video segment to be encoded into a first rate factor prediction model, to determine, via the first rate factor prediction model, the plurality of predicted rate factors of the video segment to be encoded under the plurality of preset video coding standards.
- The method according to claim 7, wherein determining the plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards, based on the preset video coding score interval and the first predicted rate factor corresponding to the first preset video coding standard, comprises:
  performing a first encoding of the video segment to be encoded based on the first predicted rate factor, to obtain a first pre-encoded video segment under the first preset video coding standard;
  determining whether a video coding score of the first pre-encoded video segment is within the preset video coding score interval; and
  in response to determining that the video coding score of the first pre-encoded video segment is within the preset video coding score interval:
  determining the plurality of predicted rate factors as the plurality of target rate factors.
- The method according to claim 8, wherein determining the plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards, based on the preset video coding score interval and the first predicted rate factor corresponding to the first preset video coding standard, further comprises:
  in response to determining that the video coding score of the first pre-encoded video segment is not within the preset video coding score interval:
  updating the plurality of predicted rate factors, based on the spatiotemporal feature information of the video segment to be encoded, the plurality of predicted rate factors, and the video coding score of the first pre-encoded video segment, to obtain the plurality of target rate factors.
- The method according to claim 9, wherein updating the plurality of predicted rate factors, based on the spatiotemporal feature information of the video segment to be encoded, the plurality of predicted rate factors, and the video coding score of the first pre-encoded video segment, to obtain the plurality of target rate factors comprises:
  inputting the spatiotemporal feature information of the video segment to be encoded, the plurality of predicted rate factors, and the video coding score of the first pre-encoded video segment into a second rate factor prediction model, to determine, via the second rate factor prediction model, the plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards.
- The method according to claim 10, wherein the first rate factor prediction model is trained by the following operations:
  for each preset video coding standard of the plurality of preset video coding standards:
  acquiring sample spatiotemporal feature information of a sample video segment and a sample rate factor of the sample video segment under that preset video coding standard;
  inputting the sample spatiotemporal feature information into the first rate factor prediction model, to determine, via the first rate factor prediction model, a sample predicted rate factor of the sample video segment under that preset video coding standard; and
  calculating a first model loss value based on the sample rate factor and the sample predicted rate factor; and
  adjusting parameters of the first rate factor prediction model based on the first model loss value under each preset video coding standard, until a first model training stop condition is reached.
- The method according to claim 11, wherein the second rate factor prediction model is trained by the following operations:
  inputting the sample spatiotemporal feature information, the sample predicted rate factor under each preset video coding standard, and a first sample video coding score corresponding to the sample predicted rate factor under the first preset video coding standard into the second rate factor prediction model, to determine, via the second rate factor prediction model, a plurality of sample target rate factors of the sample video segment under the plurality of preset video coding standards;
  for each preset video coding standard of the plurality of preset video coding standards, calculating a second model loss value based on the sample rate factor and the sample target rate factor corresponding to that preset video coding standard; and
  adjusting parameters of the second rate factor prediction model based on the second model loss value under each preset video coding standard, until a second model training stop condition is reached.
- The method according to claim 1, wherein acquiring the video segment to be encoded comprises:
  acquiring a video to be encoded;
  performing scene detection on the video to be encoded;
  dividing the video to be encoded into one or more sub-video segments based on a scene detection result; and
  identifying each sub-video segment of the one or more sub-video segments as a video segment to be encoded.
- The method according to claim 13, further comprising:
  for each preset video coding standard of the plurality of preset video coding standards, acquiring all target video segments under that preset video coding standard; and
  combining all the target video segments, based on the order of the one or more sub-video segments in the video to be encoded, to obtain a target video that corresponds to the video to be encoded under that preset video coding standard.
- A video encoding apparatus, comprising:
  an acquisition module configured to acquire a video segment to be encoded, the video segment to be encoded comprising one or more video frames;
  a calculation module configured to calculate spatiotemporal feature information of the video segment to be encoded based on the one or more video frames;
  a predicted rate factor determination module configured to determine, based on the spatiotemporal feature information, a plurality of predicted rate factors of the video segment to be encoded under a plurality of preset video coding standards;
  a target rate factor determination module configured to determine a plurality of target rate factors of the video segment to be encoded under the plurality of preset video coding standards, based on a preset video coding score interval and a first predicted rate factor that, among the plurality of predicted rate factors, corresponds to a first preset video coding standard of the plurality of preset video coding standards, wherein, among the plurality of preset video coding standards, video encoding using the first preset video coding standard takes the shortest time; and
  an encoding module configured to encode the video segment to be encoded according to each of the plurality of target rate factors, to obtain a plurality of target video segments.
- An electronic device, comprising:
  at least one processor; and
  at least one memory communicatively coupled to the at least one processor,
  wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the method according to any one of claims 1 to 14.
- A non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 14.
- A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 14.
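
The frame preprocessing recited in the claims above (copying every frame except the first and the last, and placing each copy directly after its original) can be illustrated with a minimal Python sketch. The function names `duplicate_inner_frames` and `to_frame_pairs` are hypothetical, and grouping the resulting sequence into consecutive, non-overlapping pairs is an assumption about how the "set of video frame pairs" is formed; the claims do not spell out the pairing rule.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def duplicate_inner_frames(frames: Sequence[Frame]) -> List[Frame]:
    """Copy every frame except the first and last; each copy follows its original."""
    out: List[Frame] = []
    last = len(frames) - 1
    for i, frame in enumerate(frames):
        out.append(frame)
        if 0 < i < last:
            out.append(frame)  # the copy sits right after the frame it duplicates
    return out


def to_frame_pairs(sequence: Sequence[Frame]) -> List[Tuple[Frame, Frame]]:
    """Assumption: pairs are consecutive, non-overlapping items of the new sequence."""
    return [(sequence[i], sequence[i + 1]) for i in range(0, len(sequence) - 1, 2)]


frames = ["f0", "f1", "f2", "f3"]
new_sequence = duplicate_inner_frames(frames)
pairs = to_frame_pairs(new_sequence)
```

Under this pairing assumption, every pair consists of two temporally adjacent frames of the original segment, so the first frame of a pair could be intra-coded and the pair as a whole inter-coded, as the dependent claim on coding feature information describes.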
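
The selection of target rate factors recited in the claims above (pre-encode once with the fastest coding standard, check the resulting score against the preset interval, then either keep the predictions or update them) can be sketched as follows. This is only an illustration under stated assumptions: `choose_target_rate_factors`, `pre_encode_and_score`, and `refine` are hypothetical names, the two callables stand in for the pre-encoding/scoring step and for the second rate factor prediction model, and treating the interval as closed at both ends is an assumption.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical representation: coding standard name -> rate factor.
RateFactors = Dict[str, float]


def choose_target_rate_factors(
    features: Sequence[float],
    predicted: RateFactors,
    fastest_standard: str,
    score_interval: Tuple[float, float],
    pre_encode_and_score: Callable[[str, float], float],
    refine: Callable[[Sequence[float], RateFactors, float], RateFactors],
) -> RateFactors:
    """Return target rate factors for all standards after one fast pre-encode."""
    low, high = score_interval
    # Only the fastest standard is actually pre-encoded and scored.
    score = pre_encode_and_score(fastest_standard, predicted[fastest_standard])
    if low <= score <= high:  # assumption: the interval is inclusive
        # Score is acceptable: the predictions become the targets as-is.
        return dict(predicted)
    # Otherwise update all predictions from features, predictions, and the score.
    return refine(features, predicted, score)
```

The design point this sketch is meant to show is that a single pre-encode under the fastest standard gates the rate factors of every standard, so the slower standards are not pre-encoded just to validate their predictions.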
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211690147.9 | 2022-12-27 | ||
CN202211690147.9A CN116055723A (en) | 2022-12-27 | 2022-12-27 | Video encoding method and device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024139166A1 true WO2024139166A1 (en) | 2024-07-04 |
Family
ID=86117486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2023/106615 WO2024139166A1 (en) | 2022-12-27 | 2023-07-10 | Video coding method and apparatus, and electronic device and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN116055723A (en) |
WO (1) | WO2024139166A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116055723A (en) * | 2022-12-27 | 2023-05-02 | 上海哔哩哔哩科技有限公司 | Video encoding method and device, electronic equipment and storage medium |
CN118524222B (en) * | 2024-07-22 | 2024-09-27 | 湖南快乐阳光互动娱乐传媒有限公司 | Video transcoding method and device, storage medium and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110876060A (en) * | 2018-08-31 | 2020-03-10 | 网宿科技股份有限公司 | Code rate adjusting method and device in coding process |
CN111246209A (en) * | 2020-01-20 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Adaptive encoding method, apparatus, electronic device, and computer storage medium |
US20210360233A1 (en) * | 2020-05-12 | 2021-11-18 | Comcast Cable Communications, Llc | Artificial intelligence based optimal bit rate prediction for video coding |
CN113973205A (en) * | 2021-10-21 | 2022-01-25 | 重庆邮电大学 | Code rate control bit distribution method based on video content characteristics and storage medium |
CN114885167A (en) * | 2022-04-29 | 2022-08-09 | 上海哔哩哔哩科技有限公司 | Video coding method and device |
CN116055723A (en) * | 2022-12-27 | 2023-05-02 | 上海哔哩哔哩科技有限公司 | Video encoding method and device, electronic equipment and storage medium |
- 2022-12-27: CN application CN202211690147.9A (patent CN116055723A), status: active, pending
- 2023-07-10: WO application PCT/CN2023/106615 (patent WO2024139166A1), status: unknown
Also Published As
Publication number | Publication date |
---|---|
CN116055723A (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102683700B1 (en) | Video processing method, apparatus, electronic device and storage medium and computer program | |
WO2024139166A1 (en) | Video coding method and apparatus, and electronic device and storage medium | |
JP6928041B2 (en) | Methods and equipment for processing video | |
EP3777207B1 (en) | Content-specific neural network distribution | |
CN108780499B (en) | System and method for video processing based on quantization parameters | |
US20230291909A1 (en) | Coding video frame key points to enable reconstruction of video frame | |
CN111670580B (en) | Progressive compressed domain computer vision and deep learning system | |
US10771807B1 (en) | System and method for compressing video using deep learning | |
US20220046261A1 (en) | Encoding method and apparatus for screen sharing, storage medium, and electronic device | |
CN103886623A (en) | Image compression method and equipment, and system | |
US9071814B1 (en) | Scene detection based on video encoding parameters | |
CN110248189B (en) | Video quality prediction method, device, medium and electronic equipment | |
WO2023207205A1 (en) | Video encoding method and apparatus | |
US10904542B2 (en) | Image transcoding method and apparatus | |
KR20170079852A (en) | Method and system for optimization of image encode quality | |
CN114245209A (en) | Video resolution determination method, video resolution determination device, video model training method, video coding device and video coding device | |
US11917163B2 (en) | ROI-based video coding method and device | |
US11164328B2 (en) | Object region detection method, object region detection apparatus, and non-transitory computer-readable medium thereof | |
US11395002B2 (en) | Prediction direction selection method and apparatus in image encoding, and storage medium | |
CN116847085A (en) | Video coding method, device and system | |
KR20240006667A (en) | Point cloud attribute information encoding method, decoding method, device and related devices | |
KR20240027618A (en) | Context-based image coding | |
TWI735297B (en) | Coding of video and audio with initialization fragments | |
WO2024109138A1 (en) | Video encoding method and apparatus and storage medium | |
US20150149578A1 (en) | Storage device and method of distributed processing of multimedia data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23909109; Country of ref document: EP; Kind code of ref document: A1 |