
CN116033183A - Video frame insertion method and device - Google Patents

Video frame insertion method and device

Info

Publication number
CN116033183A
CN116033183A (application CN202211648783.5A; granted as CN116033183B)
Authority
CN
China
Prior art keywords
frame
video frame
video
optical flow
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211648783.5A
Other languages
Chinese (zh)
Other versions
CN116033183B (en)
Inventor
邱慎杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202211648783.5A
Publication of CN116033183A
Priority to PCT/CN2023/106139 (published as WO2024131035A1)
Application granted
Publication of CN116033183B
Legal status: Active
Anticipated expiration: (date not listed)


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)
  • Image Analysis (AREA)
  • Television Systems (AREA)

Abstract

The present application provides a video frame interpolation method and device. The method includes: acquiring consecutive first and second video frames from a video to be interpolated; determining an inter-frame optical flow map corresponding to the first and second video frames, the map indicating the motion information of each pixel from the first video frame to the second video frame; determining, based on the inter-frame optical flow map, a target interpolation time between the first and second video frames; and determining a corresponding target synthesized frame from the first video frame, the second video frame and the target interpolation time, and inserting the target synthesized frame between the two. In this way, a frame can be inserted at any position between the first and second video frames, exploiting the property that interpolation artifacts decrease as the interpolated frame approaches either input frame. This improves the accuracy of synthesizing a frame from two consecutive video frames, and hence the quality and visual effect of frame interpolation.

Description

Video frame insertion method and device

Technical Field

The present application relates to the technical field of video processing, and in particular to a video frame interpolation method. The present application also relates to a video frame interpolation device, a computing device, and a computer-readable storage medium.

Background Art

With the rapid development of computer and network technology, videos of all kinds emerge endlessly, and watching videos has become an important part of people's work, leisure and entertainment. To raise the frame rate and smoothness of a video, a synthesized frame can be inserted between two consecutive video frames, shortening the display time between frames.

In the prior art, the previous or next frame can be duplicated as the synthesized frame and inserted between the two consecutive frames (frame duplication); alternatively, the two frames can be blended in a double-exposure-like manner to obtain the synthesized frame (frame blending); or frame interpolation can be performed with a deep learning model, which analyzes and models the two frames to generate optical flow, obtains a linear inter-frame mapping, and finally composes the synthesized frame.

However, the first approach merely copies the previous or next frame to raise the frame rate; it brings no visual improvement and can even make the video look stuttery. The second approach does use information from both frames, but simple double-exposure blurring causes fairly severe ghosting artifacts, and alternating sharp and blurred frames adds extra burden to video encoding and decoding. In the third approach, based on an optical flow estimation model, when a genuinely large optical flow exists between the two original frames, inaccurate flow estimation causes severe artifacts in the synthesized frame. In short, existing video frame interpolation methods synthesize frames from two consecutive frames with low accuracy in the presence of large motion, severe artifacts may appear, and the interpolation quality and effect are poor.
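The first two prior-art baselines can be sketched in a few lines. This is an illustrative reconstruction, not code from the patent; the function names and the numpy array representation of frames are assumptions:

```python
import numpy as np

def duplicate_frame(prev: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    # Frame duplication: the inserted frame is just a copy of the previous frame.
    return prev.copy()

def blend_frames(prev: np.ndarray, nxt: np.ndarray) -> np.ndarray:
    # Frame blending: a double-exposure-style 50/50 average of the two frames,
    # which tends to produce ghosting on anything that moved between them.
    mixed = 0.5 * prev.astype(np.float64) + 0.5 * nxt.astype(np.float64)
    return mixed.astype(prev.dtype)
```

The ghosting of the blend baseline is exactly the "double-exposure" artifact the text describes: a moving object appears at half intensity in both its old and new position.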

Summary of the Invention

In view of this, embodiments of the present application provide a video frame interpolation method. The present application also relates to a video frame interpolation device, a computing device, and a computer-readable storage medium, to solve the technical problems in the prior art that synthesizing frames from two consecutive video frames has low accuracy, may produce severe artifacts, and yields poor interpolation quality and effect.

According to a first aspect of the embodiments of the present application, a video frame interpolation method is provided, including:

acquiring consecutive first and second video frames from a video to be interpolated;

determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates the motion information of each pixel from the first video frame to the second video frame;

determining, based on the inter-frame optical flow map, a target interpolation time between the first video frame and the second video frame; and

determining a corresponding target synthesized frame according to the first video frame, the second video frame and the target interpolation time, and inserting the target synthesized frame between the first video frame and the second video frame.
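The claimed steps can be sketched as a small pipeline. The flow estimator and frame synthesizer are passed in as stand-ins for the models, and the mean-magnitude threshold is an illustrative assumption, not a value claimed by the patent:

```python
import numpy as np

def interpolate_pair(frame0, frame1, estimate_flow, synthesize, threshold=8.0):
    # Steps 1-2: inter-frame optical flow, an H x W x 2 array of per-pixel
    # (dx, dy) motion vectors from frame0 to frame1.
    flow = estimate_flow(frame0, frame1)
    mean_mag = float(np.linalg.norm(flow, axis=-1).mean())
    # Step 3: pick the target interpolation time t in (0, 1). With small motion
    # the midpoint is safe; with large motion t moves towards an input frame.
    t = 0.5 if mean_mag <= threshold else 0.25
    # Step 4: synthesize the target frame at time t and return it with t.
    return synthesize(frame0, frame1, t), t
```

Any flow estimator and any time-conditioned synthesis model with these call shapes could be plugged in; the sketch only fixes the control flow.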

According to a second aspect of the embodiments of the present application, a video frame interpolation device is provided, including:

an acquisition module configured to acquire consecutive first and second video frames from a video to be interpolated;

a first determination module configured to determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates the motion information of each pixel from the first video frame to the second video frame;

a second determination module configured to determine, based on the inter-frame optical flow map, a target interpolation time between the first video frame and the second video frame; and

an insertion module configured to determine a corresponding target synthesized frame according to the first video frame, the second video frame and the target interpolation time, and to insert the target synthesized frame between the first video frame and the second video frame.

According to a third aspect of the embodiments of the present application, a computing device is provided, including:

a memory and a processor;

the memory stores computer-executable instructions, and the processor executes the computer-executable instructions to implement the following method:

acquiring consecutive first and second video frames from a video to be interpolated;

determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates the motion information of each pixel from the first video frame to the second video frame;

determining, based on the inter-frame optical flow map, a target interpolation time between the first video frame and the second video frame; and

determining a corresponding target synthesized frame according to the first video frame, the second video frame and the target interpolation time, and inserting the target synthesized frame between the first video frame and the second video frame.

According to a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, storing computer-executable instructions which, when executed by a processor, implement the steps of any of the video frame interpolation methods.

The video frame interpolation method provided in the embodiments of the present application acquires consecutive first and second video frames from a video to be interpolated; determines the inter-frame optical flow map corresponding to the two frames, which indicates the motion information of each pixel from the first frame to the second; determines, based on that map, a target interpolation time between the two frames; and determines the corresponding target synthesized frame and inserts it between them.

In this way, the target interpolation time between the first and second video frames is first determined from their inter-frame optical flow map, and the target synthesized frame for that time is then generated from the two frames. The interpolation time is adaptively shifted forward or backward, closer to the first or the second video frame, avoiding a synthesized frame that differs greatly from both neighboring frames. Frames can thus be inserted at any position between the two inputs, exploiting the property that the closer the interpolated frame is to either input frame, the fewer its interpolation artifacts. This improves the accuracy of synthesizing a frame from two consecutive frames, greatly reduces the artifacts that large motion between the two frames may cause, and improves interpolation quality and effect.

Brief Description of the Drawings

FIG. 1 is a flowchart of a video frame interpolation method provided by an embodiment of the present application;

FIG. 2 is an inter-frame optical flow map provided by an embodiment of the present application;

FIG. 3a is a schematic diagram of the processing flow of a video frame interpolation method applied to a 2x interpolation scenario, provided by an embodiment of the present application;

FIG. 3b is a schematic diagram of a first video frame provided by an embodiment of the present application;

FIG. 3c is a schematic diagram of a second video frame provided by an embodiment of the present application;

FIG. 3d is a schematic diagram of a synthesized frame provided by an embodiment of the present application;

FIG. 3e is a schematic diagram of another synthesized frame provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a video frame interpolation device provided by an embodiment of the present application;

FIG. 5 is a structural block diagram of a computing device provided by an embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described here, and those skilled in the art can make similar extensions without departing from its essence; the present application is therefore not limited by the specific implementations disclosed below.

The terms used in one or more embodiments of the present application are for the purpose of describing specific embodiments only and are not intended to limit the one or more embodiments. The singular forms "a" and "the" used in one or more embodiments and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used to describe various information in one or more embodiments of the present application, the information should not be limited by these terms; they only distinguish information of the same type from one another. For example, without departing from the scope of the embodiments, "first" may also be called "second", and similarly "second" may be called "first". Depending on context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

First, the terms involved in one or more embodiments of the present application are explained.

Video frame interpolation: inserting one or more frames between every two temporally consecutive frames of a video, shortening the display time between frames and raising the frame rate and smoothness of the video. The embodiments of the present application mainly address inserting one frame between two consecutive video frames, i.e., 2x interpolation.

Optical flow: the instantaneous velocity of the pixel motion of a spatially moving object on the observed imaging plane.

Artifacts: unnatural, anomalous traces, regions, or flaws in a synthesized frame that reveal it as artificially processed.

It should be noted that three frame interpolation methods are currently common. The first duplicates the previous or next frame as the synthesized frame and inserts it between the two consecutive video frames (frame duplication). The second blurs the two frames together in a double-exposure-like manner to obtain the synthesized frame (frame blending). The third interpolates with a deep learning model: the two frames are analyzed and modeled to generate optical flow, yielding a linear inter-frame mapping from which the synthesized frame is composed.

Of these three, the first relies on copying the previous or next frame outright; in practice it brings no visual improvement and can even make the video look stuttery. The second references both frames, but simple double-exposure blurring causes fairly severe artifacts, and alternating sharp and blurred frames burdens video encoding and decoding. The third, based on a deep learning model, effectively models the mapping between the target intermediate frame and the two surrounding frames by fitting, and the restored synthesized frame is more plausible than those of the first two methods; extensive practical results also show that deep-learning-based interpolation far outperforms frame duplication and frame blending.

Although deep-learning-based interpolation can already produce satisfactory results, it has a common problem: when a genuinely large optical flow exists between the two original frames, inaccurate flow estimation causes severe artifacts in the synthesized target frame. For example, when the large flow is caused by a swinging arm, the synthesized moving arm often appears as a "broken limb"; when it is caused by fast foreground/background motion, the synthesized foreground/background often appears blurred, and so on.

The most common interpolation today is 2x interpolation, i.e., inserting a frame exactly midway between the two frames. The flow a model generates at this position is usually relatively inaccurate, because the midpoint is farthest from both the first frame (position 0) and the second frame (position 1), and a larger temporal distance means a larger optical flow and a larger difference from the original frames. When a large flow exists between the two input frames, the model's flow estimate is often inaccurate, degrading the synthesized frame and harming the overall viewing experience.

To address the generally poor performance of deep learning models when interpolating large-motion scenes, the embodiments of the present application exploit the facts that a flow-based interpolation model can insert a frame at any time position (0-1) between two frames, and that the closer the time is to either input frame, the fewer the interpolation artifacts. An additional optical flow estimation model judges the inter-frame flow magnitude to decide the interpolation time position, avoiding the severe artifacts that may arise when the model interpolates directly at the midpoint, and greatly improving the perceptual quality of the final result.
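One way to realize "larger flow, so move the interpolation time closer to an input frame" is a simple monotone mapping. The formula and threshold below are illustrative assumptions of this sketch, not the patent's claimed rule:

```python
def pick_interpolation_time(mean_flow_mag: float, threshold: float = 16.0) -> float:
    """Return t in (0, 1): 0.5 (midpoint) for small motion, biased towards
    the first input frame as the mean optical-flow magnitude grows."""
    if mean_flow_mag <= threshold:
        return 0.5
    # Shrink t inversely with the flow magnitude, clamped away from 0 so the
    # synthesized frame still differs from the input frame.
    return max(0.1, 0.5 * threshold / mean_flow_mag)
```

Any monotone decreasing map with the same endpoints would serve; the point is only that a larger estimated flow yields a time position nearer to an input frame, where artifacts are fewer.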

The present application provides a video frame interpolation method, and also relates to a video frame interpolation device, a computing device, and a computer-readable storage medium, each described in detail in the following embodiments.

It should be noted that when users edit videos on a video processing platform, they sometimes need frame interpolation to improve smoothness, or to slow a video down without introducing stutter, and so on. These functions all rely on video frame interpolation, and improving interpolation quality is crucial to the user experience.

FIG. 1 shows a flowchart of a video frame interpolation method provided by an embodiment of the present application, which specifically includes the following steps:

Step 102: acquire consecutive first and second video frames from the video to be interpolated.

It should be noted that the video to be interpolated is a video into which intermediate synthesized frames are to be inserted between temporally consecutive pairs of frames. The first and second video frames are two consecutive frames of that video, with the first video frame preceding the second in time.

In practice, the video processing platform can sample the video to be interpolated at a set frequency to obtain its individual frames, and select two consecutive frames, the earlier one as the first video frame and the later one as the second.

For example, suppose sampling the video to be interpolated yields video frame 1, video frame 2, video frame 3, ..., video frame N-1, video frame N, arranged in temporal order. Then frames 1 and 2 are consecutive, with frame 1 the first video frame and frame 2 the second; frames 2 and 3 are consecutive, with frame 2 the first and frame 3 the second; ...; frames N-1 and N are consecutive, with frame N-1 the first and frame N the second.
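The sliding pairing described above, where every frame except the last serves once as the "first" video frame, can be written directly:

```python
def consecutive_pairs(frames):
    # [f1, f2, ..., fN] -> [(f1, f2), (f2, f3), ..., (fN-1, fN)]
    return list(zip(frames, frames[1:]))
```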

In an optional implementation of this embodiment, after the consecutive first and second video frames are acquired from the video to be interpolated, they may additionally be scaled. That is, after the first and second video frames are acquired, the method further includes:

scaling the first video frame and the second video frame by a set factor to obtain a first updated video frame and a second updated video frame.

Specifically, the set factor may be a preset value indicating the scaling ratio of the first and second video frames. To reduce their size and speed up analysis, the factor can be set below 1, e.g., 0.25 or 0.5.

It should be noted that a smaller scale speeds up analysis but lowers the corresponding accuracy, so setting the factor requires balancing analysis speed against accuracy; 0.25 is a value that, relative to other factors, balances analysis speed and optical flow estimation accuracy well, so preferably the set factor may be 0.25.

In practice, the scaling can be implemented by methods such as nearest-neighbor interpolation, cubic spline interpolation, linear interpolation, or area interpolation, scaling the first and second video frames by the set factor to obtain the first and second updated video frames.

For example, suppose the first video frame is frame I0 and the second is frame I1. I0 and I1 can be scaled to 0.25 times their original size, i.e., 4x downsampled, to obtain updated frames I0′ and I1′; subsequent processing can then be based on I0′ and I1′ to determine the corresponding inter-frame optical flow map.
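The 4x (0.25) nearest-neighbour downsampling in the example can be sketched with plain array striding. This is a stand-in for a library resize call; production code would typically use one of the interpolating resizes listed above:

```python
import numpy as np

def downscale_nearest(frame: np.ndarray, factor: int = 4) -> np.ndarray:
    # Keep every `factor`-th pixel in both spatial dimensions:
    # an H x W (x C) frame becomes roughly H/factor x W/factor (x C).
    return frame[::factor, ::factor]
```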

本申请实施例中,从待插帧视频中获取连续的第一视频帧和第二视频帧之后,还可以对第一视频帧和第二视频帧进行缩放处理,以缩小第一视频帧和第二视频帧的尺寸,在保证一定准确率的基础上,提升后续进行光流分析时对视频帧的分析速度。In the embodiment of the present application, after the consecutive first and second video frames are obtained from the video to be interpolated, the two frames may also be scaled to reduce their size, which speeds up the subsequent optical flow analysis while maintaining a certain level of accuracy.

步骤104:确定第一视频帧和第二视频帧对应的帧间光流图,其中,帧间光流图用于指示各像素点从第一视频帧至第二视频帧的运动信息。Step 104: Determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map is used to indicate motion information of each pixel from the first video frame to the second video frame.

需要说明的是,获取到第一视频帧和第二视频帧后,可以确定第一视频帧和第二视频帧对应的帧间光流图,其中,光流是一个运动矢量,可以表示像素点的运动方向和运动距离,因而该帧间光流图可以指示各像素点从第一视频帧至第二视频帧的运动信息,也即可以指示各像素点的变化情况。It should be noted that after the first and second video frames are obtained, the inter-frame optical flow map corresponding to them can be determined. Optical flow is a motion vector that represents the motion direction and distance of a pixel, so the inter-frame optical flow map can indicate the motion information of each pixel from the first video frame to the second video frame, that is, how each pixel changes.

实际应用中,可以将第一视频帧和第二视频帧输入至训练完成的光流估计模型中,获得光流估计模型输出的帧间光流图。其中,该光流估计模型可以是任意基于深度学习的光流估计模型(例如:RAFT(Recurrent All Pairs Field Transforms for Optical Flow,光流场的递归全对场变换),一种新的光流深度神经架构;FlowNet(Learning Optical Flow with Convolutional Networks,神经光流网络),用卷积网络实现光流预测;……等)或者是传统光流估计算法(例如:Lucas-Kanade,一种两帧差分的光流估计算法)。In practical applications, the first and second video frames may be input into a trained optical flow estimation model to obtain the inter-frame optical flow map it outputs. The optical flow estimation model can be any deep-learning-based model (for example, RAFT (Recurrent All-Pairs Field Transforms for Optical Flow), a recent deep optical flow architecture; FlowNet (Learning Optical Flow with Convolutional Networks), which predicts optical flow with convolutional networks; etc.) or a traditional optical flow estimation algorithm (for example, Lucas-Kanade, a two-frame differential method).

其中,帧间光流图可以表示第一时间戳至第二时间戳(或者也可以是第二时间戳至第一时间戳,此时不考虑方向)的帧间光流大小,该第一时间戳为第一视频帧的时间戳,第二时间戳为第二视频帧的时间戳。图2是本申请一实施例提供的一种帧间光流图,如图2所示,帧间光流图中各像素点的像素值可以表示像素点的光流强度,如像素值越大,颜色越深,代表光流强度越大。The inter-frame optical flow map can represent the magnitude of the inter-frame optical flow from the first timestamp to the second timestamp (or from the second to the first, when direction is not considered), where the first timestamp is that of the first video frame and the second timestamp is that of the second video frame. Fig. 2 is an inter-frame optical flow map provided by an embodiment of the present application. As shown in Fig. 2, the pixel value of each pixel in the map can represent its optical flow intensity; for example, the larger the pixel value and the darker the color, the greater the optical flow intensity.

另外,光流估计模型的训练数据大都是人工合成的,例如在3D游戏制作过程中,计算机通常会生成游戏人物及场景的运动向量,即光流,此时可以将连续两帧画面和对应的光流图作为一对训练数据,此时初始模型的输入是两帧画面,标签是训练数据中的光流图,通过有监督训练的方式对初始模型进行训练,获得训练完成的光流估计模型。In addition, the training data of optical flow estimation models is mostly synthetic. For example, during 3D game production the computer usually generates motion vectors, i.e. optical flow, for game characters and scenes. Two consecutive frames and the corresponding optical flow map can then serve as one pair of training data: the input of the initial model is the two frames, the label is the optical flow map, and the initial model is trained in a supervised manner to obtain the trained optical flow estimation model.

本实施例一个可选的实施方式中,若获取到第一视频帧和第二视频帧后,还进行了缩放处理,那么确定第一视频帧和第二视频帧对应的帧间光流图,具体实现过程可以如下:In an optional implementation of this embodiment, if scaling has also been performed after the first and second video frames are obtained, the inter-frame optical flow map corresponding to the first and second video frames can be determined as follows:

将第一更新视频帧和第二更新视频帧输入至训练完成的光流估计模型中,获得光流估计模型输出的帧间光流图。The first updated video frame and the second updated video frame are input into the trained optical flow estimation model, and an inter-frame optical flow map output by the optical flow estimation model is obtained.

需要说明的是,若获取到第一视频帧和第二视频帧后,还将第一视频帧和第二视频帧缩放至设定倍数,获得了第一更新视频帧和第二更新视频帧,此时可以将第一更新视频帧和第二更新视频帧输入至训练完成的光流估计模型中,获得光流估计模型输出的帧间光流图。如此,缩小了第一视频帧和第二视频帧的尺寸,通过光流估计模型分析缩小尺寸后的第一更新视频帧和第二更新视频帧,确定出对应的帧间光流图,在保证一定准确率的基础上,提升了对视频帧的分析速度。It should be noted that if, after the first and second video frames are obtained, they are also scaled to the set multiple to obtain the first and second updated video frames, the updated frames can then be input into the trained optical flow estimation model to obtain the inter-frame optical flow map it outputs. In this way, the size of the first and second video frames is reduced, and the optical flow estimation model analyzes the downscaled updated frames to determine the corresponding inter-frame optical flow map, which improves the analysis speed while maintaining a certain level of accuracy.

步骤106:基于帧间光流图,确定第一视频帧和第二视频帧之间的目标插帧时刻。Step 106: Based on the inter-frame optical flow map, determine the target frame interpolation moment between the first video frame and the second video frame.

需要说明的是,由于帧间光流图用于指示各像素点从第一视频帧至第二视频帧的运动信息,因而可以基于帧间光流图,分析各个像素点从第一视频帧至第二视频帧的运动变化情况,确定第一视频帧和第二视频帧之间适合插入视频帧的目标插帧时刻。It should be noted that since the inter-frame optical flow map indicates the motion information of each pixel from the first video frame to the second video frame, the motion of each pixel between the two frames can be analyzed based on it to determine the target frame insertion moment between the first and second video frames that is suitable for inserting a video frame.

本申请实施例提供的视频插帧方法,可以应用于两倍插帧,即在连续的两个视频帧之间插入一帧视频帧,具体实现时,并不直接在连续的两帧视频帧的中间位置插入对应的合成帧,而是分析各个像素点从第一视频帧至第二视频帧的运动变化情况,确定第一视频帧和第二视频帧之间适合插入视频帧的目标插帧时刻,避免大幅度运动导致的伪影。The video frame interpolation method provided in the embodiment of the present application can be applied to 2x interpolation, i.e. inserting one frame between two consecutive video frames. In a specific implementation, the synthesized frame is not inserted directly at the midpoint between the two consecutive frames; instead, the motion of each pixel from the first video frame to the second video frame is analyzed to determine the target insertion moment suitable for inserting a frame, avoiding artifacts caused by large motion.

当然,实际应用中,本申请实施例提供的视频插帧方法,也可以应用于四倍插帧、八倍插帧等场景,只需要基于帧间光流图,在第一视频帧和第二视频帧之间确定出适合插入视频帧的对应数值个目标插帧时刻即可。Of course, in practical applications, the method provided by the embodiment of the present application can also be applied to 4x, 8x and similar interpolation scenarios; it is only necessary to determine, based on the inter-frame optical flow map, the corresponding number of target insertion moments suitable for inserting video frames between the first and second video frames.

本申请实施例中,确定出的目标插帧时刻可以是第一视频帧和第二视频帧之间的任意时刻位置,也即是说,本申请实施例中可以借助额外的光流估计模型判断帧间光流大小,从而决定插帧时刻位置,以在两视频帧之间任意时刻位置进行插帧,从而规避了直接在中间时刻插帧可能出现严重伪影的问题,大大提高了最终的插帧结果的感官效果。In the embodiment of the present application, the determined target insertion moment can be any position in time between the first and second video frames. That is, an additional optical flow estimation model can be used to judge the magnitude of the inter-frame optical flow and thereby decide the insertion position, so that a frame can be inserted at any moment between the two video frames. This avoids the severe artifacts that may appear when interpolating directly at the middle moment and greatly improves the perceptual quality of the final interpolation result.

本实施例一个可选的实施方式,基于帧间光流图,确定第一视频帧和第二视频帧之间的目标插帧时刻,具体实现过程可以如下:In an optional implementation manner of this embodiment, the target frame insertion time between the first video frame and the second video frame is determined based on the inter-frame optical flow diagram, and the specific implementation process may be as follows:

基于帧间光流图,确定第一视频帧和第二视频帧之间的光流强度指标;determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow graph;

根据光流强度范围和插帧时刻之间的对应关系,确定光流强度指标对应的目标插帧时刻。According to the corresponding relationship between the optical flow intensity range and the frame insertion moment, the target frame insertion moment corresponding to the optical flow intensity index is determined.

需要说明的是,视频处理平台中可以预先设置有光流强度范围和插帧时刻之间的对应关系,在基于帧间光流图,确定出第一视频帧和第二视频帧之间的光流强度指标之后,可以直接基于光流强度范围和插帧时刻之间的对应关系,确定出光流强度指标对应的目标插帧时刻。It should be noted that the correspondence between optical flow intensity ranges and insertion moments can be preset in the video processing platform. After the optical flow intensity index between the first and second video frames is determined based on the inter-frame optical flow map, the target insertion moment corresponding to the index can be determined directly from this correspondence.

实际应用中,光流强度指标越小,说明运动越小,可以选择中间时刻作为目标插帧时刻;光流强度指标越大,说明运动越大,应该尽量向前靠近第一视频帧或者向后靠近第二视频帧,因而光流强度范围和插帧时刻之间的对应关系中,一个光流强度范围可能对应有两个插帧时刻,一个插帧时刻向前偏移靠近第一视频帧,另一个插帧时刻向后偏移靠近第二视频帧,且向前偏移的数值和向后偏移的数值对称。在基于光流强度范围和插帧时刻之间的对应关系,确定光流强度指标对应的目标插帧时刻时,可以任意选择向前偏移的插帧时刻,或者向后偏移的插帧时刻,但是针对一个待插帧视频,选择规则应当一致,也即针对一个待插帧视频,在光流强度指标较大时,每次插帧均向前偏移,或者均向后偏移。In practical applications, the smaller the optical flow intensity index, the smaller the motion, and the middle moment can be chosen as the target insertion moment; the larger the index, the larger the motion, and the insertion moment should move forward toward the first video frame or backward toward the second video frame. Therefore, in the correspondence between intensity ranges and insertion moments, one range may correspond to two insertion moments: one shifted forward toward the first video frame and one shifted backward toward the second video frame, with symmetric offset values. When determining the target insertion moment from this correspondence, either the forward-shifted or the backward-shifted moment can be chosen; however, for a given video the selection rule should be consistent, i.e. when the intensity index is large, every insertion is shifted forward, or every insertion is shifted backward.

示例的,预先设置的光流强度范围和插帧时刻之间的对应关系表如下表1所示,假设基于帧间光流图,确定出的第一视频帧和第二视频帧之间的光流强度指标为30,且假设光流强度指标较大时,插帧时刻向左偏移,基于如下表1,可以确定出对应的目标插帧时刻t=0.25。As an example, the preset correspondence between optical flow intensity ranges and insertion moments is shown in Table 1 below. Suppose the optical flow intensity index between the first and second video frames determined from the inter-frame optical flow map is 30, and that when the index is large the insertion moment is shifted forward (to the left); based on Table 1 below, the corresponding target insertion moment t = 0.25 can be determined.

表1光流强度范围和插帧时刻之间的对应关系表Table 1 Correspondence table between optical flow intensity range and frame insertion time

光流强度范围 Optical flow intensity range      插帧时刻 Insertion moment
小于等于15 Less than or equal to 15            0.5
大于15小于等于20 Greater than 15, up to 20     0.4(或者0.6) 0.4 (or 0.6)
大于20小于等于25 Greater than 20, up to 25     0.3(或者0.7) 0.3 (or 0.7)
大于25 Greater than 25                         0.25(或者0.75) 0.25 (or 0.75)
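Table 1 above can be expressed as a small lookup function. The thresholds and returned moments follow the table; the function name `pick_insert_time` and the `shift_forward` flag are illustrative:

```python
def pick_insert_time(ind: float, shift_forward: bool = True) -> float:
    """Map the optical flow intensity index to the insertion moment t per
    Table 1. shift_forward selects the offset toward the first frame and
    should be used consistently within one video."""
    if ind <= 15:
        return 0.5
    if ind <= 20:
        return 0.4 if shift_forward else 0.6
    if ind <= 25:
        return 0.3 if shift_forward else 0.7
    return 0.25 if shift_forward else 0.75
```

With the example above, `pick_insert_time(30)` yields the target insertion moment t = 0.25.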

本申请实施例中,可以对帧间光流图中各像素点的像素值进行分析,以分析各个像素点的光流强度,从而确定出第一视频帧和第二视频帧之间的光流强度指标;然后,可以根据光流强度范围和插帧时刻之间的对应关系,确定出光流强度指标对应的目标插帧时刻。如此,通过预先设置的对应关系,即可确定出光流强度指标对应的目标插帧时刻,操作简单,提高了分析效率,借助了额外的光流估计模型判断帧间光流大小来决定具体的插帧时刻位置,从而规避模型直接在中间时刻插帧可能出现严重伪影的问题,大大提高了最终的插帧结果的感官效果。In the embodiment of the present application, the pixel values in the inter-frame optical flow map can be analyzed to obtain the optical flow intensity of each pixel and thus determine the optical flow intensity index between the first and second video frames; the target insertion moment corresponding to the index can then be determined from the correspondence between intensity ranges and insertion moments. In this way, the target insertion moment is obtained directly through a preset correspondence, which is simple to implement and improves analysis efficiency. The additional optical flow estimation model is used to judge the inter-frame optical flow magnitude and decide the specific insertion position, avoiding the severe artifacts that may appear when the model interpolates directly at the middle moment and greatly improving the perceptual quality of the final result.

本实施例一个可选的实施方式中,基于帧间光流图,确定第一视频帧和第二视频帧之间的光流强度指标,包括:In an optional implementation of this embodiment, determining the optical flow intensity index between the first and second video frames based on the inter-frame optical flow map includes:

确定目标像素点在横轴方向上的横轴分量,以及在纵轴方向上的纵轴分量,其中,目标像素点为帧间光流图中的任一像素点;Determine the horizontal axis component of the target pixel point in the horizontal axis direction and the vertical axis component in the vertical axis direction, wherein the target pixel point is any pixel point in the inter-frame optical flow diagram;

基于帧间光流图中各像素点的横轴分量,确定平均横轴分量,并基于帧间光流图中各像素点的纵轴分量,确定平均纵轴分量;Determine the average horizontal axis component based on the horizontal axis components of each pixel in the inter-frame optical flow diagram, and determine the average vertical axis component based on the vertical axis components of each pixel in the inter-frame optical flow diagram;

基于横轴分量、纵轴分量、平均横轴分量和平均纵轴分量,确定第一视频帧和第二视频帧之间的光流强度指标。An optical flow intensity index between the first video frame and the second video frame is determined based on the horizontal axis component, the vertical axis component, the average horizontal axis component, and the average vertical axis component.

实际应用中,可以通过如下公式(1)确定第一视频帧和第二视频帧之间的光流强度指标:In practical applications, the optical flow intensity index between the first video frame and the second video frame can be determined by the following formula (1):

Ind = max(max99(Fx)/max(1.0, mean(abs(Fx))), max99(Fy)/max(1.0, mean(abs(Fy))))        (1)

其中,Ind表示光流强度指标;max为最大值函数,min为最小值函数,mean为平均值函数,abs为绝对值函数。Fx为目标像素点的光流F在横轴方向上的横轴分量,Fy为目标像素点的光流F在纵轴方向上的纵轴分量;max99表示最大值,取各像素点的光流中99分位的值作为最大光流值,其目的是为了排除异常大值对最终结果的影响。Here, Ind denotes the optical flow intensity index; max is the maximum function, min the minimum function, mean the average function, and abs the absolute value function. Fx is the horizontal component of the optical flow F of the target pixel, and Fy is its vertical component; max99 denotes the maximum value taken as the 99th percentile of the optical flow over all pixels, which serves to exclude the influence of abnormally large values on the final result.

由上述公式(1)可知,分别计算两个分量上最大光流值和平均光流值的比值,而不直接采用光流的最大值作为最终的光流强度指标,可以减少帧间光流图中运动幅度的作用。另外,mean(abs(Fx))的值有可能小于1,因而上述公式(1)中将平均光流值的下限设置为1,可以防止平均光流值小于1时所求的光流强度指标结果被放大,也是防止异常小值对结果的影响,然后再取两个分量中最大的一个比值作为最终的光流强度指标。As can be seen from formula (1), the ratio of the maximum to the average optical flow value is computed on each component separately, rather than directly taking the maximum optical flow as the final intensity index, which reduces the influence of overall motion amplitude in the inter-frame optical flow map. In addition, since mean(abs(Fx)) may be less than 1, formula (1) sets the lower bound of the average optical flow value to 1, which prevents the index from being amplified when the average is below 1 and guards against the influence of abnormally small values; the larger of the two component ratios is then taken as the final optical flow intensity index.
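Formula (1) can be sketched in numpy as follows. This is an illustrative implementation under one assumption: the 99th percentile is taken over the raw component values, matching the formula as written; the function names are hypothetical.

```python
import numpy as np

def flow_intensity(flow: np.ndarray) -> float:
    """Optical flow intensity index Ind of formula (1).

    flow: inter-frame optical flow map of shape (h, w, 2), channel 0
    holding the horizontal component Fx, channel 1 the vertical Fy.
    """
    def component_ratio(c: np.ndarray) -> float:
        # max99: 99th percentile as a robust maximum, excluding outliers;
        # the mean |flow| is floored at 1.0 so the index is not amplified
        # when the average flow is very small.
        return float(np.percentile(c, 99)) / max(1.0, float(np.abs(c).mean()))

    return max(component_ratio(flow[..., 0]), component_ratio(flow[..., 1]))
```

For a uniform flow (every pixel moving identically), the percentile equals the mean, so the index stays small, reflecting that uniform motion is easy to interpolate.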

需要说明的是,光流是一个矢量,一般得到的帧间光流图是一个(h,w,2)的两通道、尺寸和原始视频帧相同的图像数据,假如两帧视频帧中的同一个物体在运动过程中都只是一个像素点的大小,那么此物体在两视频帧之间运动的距离和方向则是此物体或者是此像素点的光流,在水平方向的移动距离和方向就是光流在横轴方向上的横轴分量,在垂直方向上的移动距离和方向就是光流在纵轴方向上的纵轴分量,两个矢量组成最终的光流矢量。It should be noted that optical flow is a vector; the inter-frame optical flow map obtained is generally two-channel image data of shape (h, w, 2) with the same spatial size as the original video frames. If the same object in the two video frames were only one pixel in size, the distance and direction of its motion between the two frames would be the optical flow of that object (or pixel): the horizontal displacement and direction form the horizontal component of the flow, the vertical displacement and direction form the vertical component, and the two together form the final optical flow vector.

本申请实施例中,可以基于帧间光流图,确定出第一视频帧和第二视频帧之间的光流强度指标,后续可以基于该光流强度指标,确定出第一视频帧和第二视频帧之间适合插入视频帧的目标插帧时刻,借助了额外的光流估计模型判断帧间光流大小来决定具体的插帧时刻位置,从而规避模型直接在中间时刻插帧可能出现严重伪影的问题,大大提高了最终的插帧结果的感官效果。In the embodiment of the present application, the optical flow intensity index between the first and second video frames can be determined based on the inter-frame optical flow map, and the target insertion moment suitable for inserting a frame between them can then be determined from this index. The additional optical flow estimation model is used to judge the inter-frame optical flow magnitude and decide the specific insertion position, avoiding the severe artifacts that may appear when the model interpolates directly at the middle moment and greatly improving the perceptual quality of the final result.

步骤108:根据第一视频帧、第二视频帧和目标插帧时刻,确定对应的目标合成帧,并将目标合成帧插入第一视频帧和第二视频帧之间。Step 108: According to the first video frame, the second video frame and the target frame insertion time, determine the corresponding target composite frame, and insert the target composite frame between the first video frame and the second video frame.

需要说明的是,确定出的目标插帧时刻是第一视频帧和第二视频帧之间,适合插入对应合成帧的时刻,因而可以根据第一视频帧、第二视频帧和目标插帧时刻,确定对应的目标合成帧,该目标合成帧为目标插帧时刻对应的合成帧,然后可以将目标合成帧插入第一视频帧和第二视频帧之间,实现在两视频帧之间插入合成帧。It should be noted that the determined target insertion moment is the moment between the first and second video frames that is suitable for inserting the corresponding synthesized frame. Therefore, the corresponding target synthesized frame, i.e. the synthesized frame for the target insertion moment, can be determined from the first video frame, the second video frame and the target insertion moment, and then inserted between the two video frames.

具体实现时,虽然目标合成帧是目标插帧时刻对应的混合结果,但是在将目标合成帧插入第一视频帧和第二视频帧之间时,可以插入第一视频帧和第二视频帧之间的任意位置处,只需要保证目标合成帧位于第一视频帧和第二视频帧之间即可。In a specific implementation, although the target synthesized frame is the blending result corresponding to the target insertion moment, when it is inserted between the first and second video frames it can be placed at any position between them; it is only necessary to ensure that the target synthesized frame is located between the first and second video frames.

本实施例一个可选的实施方式中,根据第一视频帧、第二视频帧和目标插帧时刻,确定对应的目标合成帧,包括:In an optional implementation manner of this embodiment, determining the corresponding target composite frame according to the first video frame, the second video frame and the target frame insertion moment includes:

根据目标插帧时刻,生成插帧时刻信息;Generate frame insertion time information according to the target frame insertion time;

将第一视频帧、第二视频帧和插帧时刻信息输入训练完成的插帧模型,获得插帧模型输出的目标合成帧,其中,目标合成帧为插帧时刻信息指示的目标插帧时刻对应的合成帧。The first video frame, the second video frame and the insertion moment information are input into the trained frame interpolation model to obtain the target synthesized frame it outputs, where the target synthesized frame is the synthesized frame corresponding to the target insertion moment indicated by the insertion moment information.

具体的,插帧时刻信息是基于目标插帧时刻生成的、插帧模型可以识别分析的信息,该插帧时刻信息可以是指基于目标插帧时刻生成的一个单独向量。另外,该插帧模型可以是任何一种基于深度学习并支持任意时刻插帧的模型(例如:RIFE(Real-Time Intermediate Flow Estimation for Video Frame Interpolation,一种实时中间流估计算法),IFRNet(Intermediate Feature Refine Network for Efficient Frame Interpolation,只包含一个编解码结构的视频插帧网络),……等等),采用此插帧模型是因为其推理速度快,同时能够在两视频帧之间的任意时刻位置进行插帧。Specifically, the insertion moment information is generated from the target insertion moment in a form the frame interpolation model can recognize and analyze, for example a separate vector generated from the target insertion moment. The frame interpolation model can be any deep-learning-based model that supports interpolation at an arbitrary moment (for example, RIFE (Real-Time Intermediate Flow Estimation for Video Frame Interpolation), a real-time intermediate flow estimation algorithm, or IFRNet (Intermediate Feature Refine Network for Efficient Frame Interpolation), a frame interpolation network containing only a single encoder-decoder structure, etc.). Such a model is adopted because of its fast inference and its ability to interpolate at any moment between two video frames.

实际应用中,可以获取一段训练视频,从该训练视频中截取多个视频帧,每三个连续的视频帧作为一组三元训练数据,每组三元训练数据中的第一个视频帧和第三个视频帧作为输入数据输入初始模型,中间的视频帧作为样本标签,通过有监督训练的方式对初始模型进行训练,获得训练完成的插帧模型。In practical applications, a training video can be obtained and multiple video frames extracted from it; every three consecutive frames form one triplet of training data. In each triplet, the first and third frames are input to the initial model and the middle frame serves as the sample label, and the initial model is trained in a supervised manner to obtain the trained frame interpolation model.
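The triplet construction described above can be sketched as follows (`make_triplets` is an illustrative name; the frames would in practice be image arrays decoded from the training video):

```python
def make_triplets(frames):
    """Group consecutive frames into (input_0, label, input_1) triplets:
    the outer two frames are the model inputs, the middle frame is the
    supervision label for the interpolated result."""
    return [(frames[i], frames[i + 1], frames[i + 2])
            for i in range(len(frames) - 2)]
```

A clip of n frames thus yields n - 2 overlapping training triplets.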

需要说明的是,可以将第一视频帧、第二视频帧和插帧时刻信息输入训练完成的插帧模型,获得插帧模型针对目标插帧时刻输出的目标合成帧,利用基于光流的插帧模型可以在两帧之间任意时刻位置进行插帧、并且时间越靠近任一输入帧其插帧伪影越少的特点,借助额外的光流估计模型判断帧间光流大小来决定插帧时刻位置,从而规避模型直接在中间时刻插帧可能出现严重伪影的问题,大大提高了最终的插帧结果的感官效果。It should be noted that the first video frame, the second video frame and the insertion moment information can be input into the trained frame interpolation model to obtain the target synthesized frame it outputs for the target insertion moment. An optical-flow-based interpolation model can interpolate at any moment between two frames, and the closer the moment is to either input frame, the fewer the interpolation artifacts. By using an additional optical flow estimation model to judge the inter-frame optical flow magnitude and decide the insertion position, the severe artifacts that may appear when the model interpolates directly at the middle moment are avoided, greatly improving the perceptual quality of the final result.

另外,如果获取到第一视频帧和第二视频帧之后,对第一视频帧和第二视频帧进行了缩放,获得了第一更新视频帧和第二更新视频帧,该第一更新视频帧和第二更新视频帧仅用于光流估计模型确定帧间光流图,为了保证插帧模型有足够的参考信息,向插帧模型中输入的始终是原始获取到的第一视频帧和第二视频帧。In addition, if the first and second video frames are scaled after being obtained, yielding the first and second updated video frames, the updated frames are used only by the optical flow estimation model to determine the inter-frame optical flow map; to ensure the frame interpolation model has sufficient reference information, what is input into the frame interpolation model is always the originally obtained first and second video frames.

本实施例一个可选的实施方式中,将第一视频帧、第二视频帧和插帧时刻信息输入训练完成的插帧模型,获得插帧模型输出的目标合成帧,具体实现过程可以如下:In an optional implementation of this embodiment, the first video frame, the second video frame and the frame insertion time information are input into the trained frame insertion model to obtain the target composite frame output by the frame insertion model. The specific implementation process can be as follows:

将第一视频帧、第二视频帧和插帧时刻信息输入插帧模型的光流分析层,通过光流分析层确定第一时间戳至目标插帧时刻的第一光流,以及目标插帧时刻至第二时间戳的第二光流,其中,第一时间戳为第一视频帧的时间戳,第二时间戳为第二视频帧的时间戳;The first video frame, the second video frame and the insertion moment information are input into the optical flow analysis layer of the frame interpolation model, which determines the first optical flow from the first timestamp to the target insertion moment and the second optical flow from the target insertion moment to the second timestamp, where the first timestamp is that of the first video frame and the second timestamp is that of the second video frame;

通过插帧模型的采样层基于第一光流从第一视频帧中采样,获得第一采样结果,并基于第二光流从第二视频帧中采样,获得第二采样结果;Sampling from the first video frame based on the first optical flow through the sampling layer of the frame interpolation model to obtain a first sampling result, and sampling from the second video frame based on the second optical flow to obtain a second sampling result;

通过插帧模型的融合层基于设定融合权重,对第一采样结果和第二采样结果进行融合,获得并输出目标合成帧。Based on the set fusion weight, the fusion layer of the frame interpolation model fuses the first sampling result and the second sampling result to obtain and output the target composite frame.

实际应用中,向插帧模型输入两帧连续视频帧以及需要插帧的目标插帧时刻,插帧模型的光流分析层会先生成第一时间戳至目标插帧时刻的第一光流和目标插帧时刻至第二时间戳的第二光流,再通过采样层的映射操作(warp操作)分别从两张输入的视频帧中采样,获得第一采样结果和第二采样结果,并通过插帧模型的融合层,按插帧模型生成的融合权重对两次采样结果融合,生成最终目标插帧时刻对应的目标合成帧。In practical applications, two consecutive video frames and the target insertion moment are input to the frame interpolation model. The model's optical flow analysis layer first generates the first optical flow from the first timestamp to the target insertion moment and the second optical flow from the target insertion moment to the second timestamp; the sampling layer then samples from the two input frames via a warp operation to obtain the first and second sampling results; finally, the fusion layer fuses the two sampling results according to the fusion weights generated by the model to produce the target synthesized frame corresponding to the target insertion moment.

需要说明的是,从获得的光流中可以知道第一视频帧中某一像素点在两视频帧之间的运动方向和大小,然后通过映射操作(warp操作)将对应像素点根据获得的两个光流映射回对应的坐标位置,实现对输入的两视频帧进行采样。另外,设定融合权重是插帧模型的一个中间输出结果,插帧模型可自行计算出前后两视频帧的每个像素点的权重。例如,第一视频帧的像素点a和第二视频帧的像素点b都被映射到中间的合成帧的某一位置C,但一个位置只能放一个像素点,此时需要权衡a、b两个像素点在C中的比重,此时可以使用上述设定融合权重来平衡。It should be noted that from the obtained optical flow, the motion direction and magnitude of a pixel in the first video frame between the two frames is known; through the warp operation, the corresponding pixels can be mapped back to their coordinate positions according to the two obtained optical flows, thereby sampling the two input video frames. In addition, the set fusion weight is an intermediate output of the frame interpolation model, which computes the weight of each pixel of the preceding and following frames by itself. For example, pixel a of the first video frame and pixel b of the second video frame may both be mapped to the same position C of the intermediate synthesized frame, but only one value can occupy that position; the relative contributions of a and b at C must then be balanced, which is done using the fusion weights described above.
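The warp-then-fuse procedure described above can be sketched as follows. This is a simplified nearest-neighbor illustration under stated assumptions: `warp` and `fuse` are hypothetical names, real models use bilinear sampling, and the fusion weights are computed by the model itself rather than passed in:

```python
import numpy as np

def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Sample `frame` at positions displaced by `flow` (nearest-neighbor
    rounding for brevity), clipping coordinates to the image bounds."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def fuse(sample0: np.ndarray, sample1: np.ndarray, w0) -> np.ndarray:
    """Blend the two sampling results with (per-pixel) fusion weight w0."""
    return w0 * sample0 + (1.0 - w0) * sample1
```

With zero flow, `warp` returns the frame unchanged; with `w0` close to 1, the fused result stays close to the first sampling result, mirroring the weighting behavior discussed below.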

本申请实施例中,可以借助额外的光流估计模型得到两帧之间的光流,并作为一种先验信息自适应地指导插帧模型的插帧时刻往前或者往后移动。往前移动,目标合成帧会更接近第一视频帧,此时第一时间戳至目标插帧时刻的光流会相对较小,虽然目标插帧时刻至第二时间戳的光流会很大,但即使目标插帧时刻至第二时间戳的光流估计不准确,在融合过程中由于插帧模型机制使得目标插帧时刻至第二时间戳的光流的融合权重会很小,因此不会对最终结果造成实质性的影响,最终较小的融合权重会降低目标插帧时刻至第二时间戳的光流对结果的影响程度,保证了不会造成与前后两帧都不像的局面,而是更接近第一视频帧;同理,如果将目标合成帧时刻往后移,能够使得目标合成帧与第二视频帧更像。如此,利用了任意时刻插帧及插帧越靠近任一输入帧,其插帧伪影越少的特点,大幅改善大运动导致的中间时刻插帧出现伪影的问题。In the embodiment of the present application, the optical flow between the two frames can be obtained with an additional optical flow estimation model and used as prior information to adaptively guide the interpolation model's insertion moment forward or backward. When moved forward, the target synthesized frame is closer to the first video frame: the optical flow from the first timestamp to the target insertion moment is relatively small, while the optical flow from the target insertion moment to the second timestamp is large. However, even if the latter flow is estimated inaccurately, the interpolation model's fusion mechanism assigns it a very small fusion weight during fusion, so it has no substantial impact on the final result; the small weight reduces its influence and ensures the result does not end up resembling neither adjacent frame, but instead stays closer to the first video frame. Similarly, moving the insertion moment backward makes the target synthesized frame more similar to the second video frame. In this way, the fact that interpolation can be performed at any moment, and that artifacts decrease the closer the insertion is to either input frame, is exploited to greatly alleviate the artifacts that large motion causes when interpolating at the middle moment.

In an optional implementation of this embodiment, after the target composite frame is inserted between the first video frame and the second video frame, the method further includes:

determining whether a frame interpolation end condition is currently satisfied;

if the frame interpolation end condition is satisfied, taking the video with the target composite frame inserted as the resulting interpolated video;

if the frame interpolation end condition is not satisfied, continuing to perform the operation of determining consecutive first and second video frames from the video to be interpolated.

Specifically, the frame interpolation end condition is a preset condition that must be satisfied for interpolation of the video to be complete. For example, the end condition may be that a target composite frame has been inserted between every pair of consecutive video frames in the video to be interpolated; alternatively, it may be that a target composite frame has been inserted between two specified consecutive video frames in the video to be interpolated.

In practical applications, after the target composite frame is inserted between the first video frame and the second video frame, it can be determined whether the frame interpolation end condition is currently satisfied. If it is, interpolation is complete, and the video with the target composite frame inserted is taken as the final interpolated video. If it is not, the operation of determining consecutive first and second video frames from the video to be interpolated is performed again, and composite frames continue to be inserted at the target interpolation moment between each pair of video frames until the end condition is satisfied.
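The iterative procedure described above can be sketched as follows. The helper names (`estimate_flow`, `choose_interp_time`, `synthesize_frame`) are hypothetical stand-ins for the optical flow estimation model, the moment-selection rule and the frame interpolation model; the end condition shown here is that every pair of consecutive frames has been processed:

```python
# Sketch of the interpolation loop: insert one composite frame between each
# pair of consecutive frames until the end condition (all pairs done) holds.
def interpolate_video(frames, estimate_flow, choose_interp_time, synthesize_frame):
    out = []
    for f0, f1 in zip(frames, frames[1:]):
        out.append(f0)
        flow = estimate_flow(f0, f1)             # inter-frame optical flow
        t = choose_interp_time(flow)             # target interpolation moment in (0, 1)
        out.append(synthesize_frame(f0, f1, t))  # target composite frame
    out.append(frames[-1])                       # last original frame closes the video
    return out
```

For example, with scalar "frames", a flow given by the frame difference, a rule that moves the moment to 0.25 for large flow, and linear blending as the synthesizer, `interpolate_video([0.0, 4.0, 4.5], ...)` inserts one composite value between each original pair.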

It should be noted that the video frame interpolation method provided in the embodiments of the present application can serve as a plug-and-play extension module and can be used with any frame interpolation model that supports interpolation at an arbitrary moment.

In the video frame interpolation method provided by the embodiments of the present application, the target interpolation moment between the first video frame and the second video frame can first be determined based on the inter-frame optical flow map corresponding to the two frames; then, based on the first video frame and the second video frame, the target composite frame corresponding to that moment is determined. The interpolation moment is adaptively guided to move forward or backward, closer to the first or the second video frame, which avoids producing a target composite frame that differs substantially from both neighboring frames. In this way, interpolation can be performed at an arbitrary position between the first and second video frames, exploiting the fact that the closer the interpolated frame is to either input frame, the fewer artifacts it contains. This improves the accuracy of composite frames generated from two consecutive video frames, greatly reduces the artifacts that large motion between the two frames may cause, and improves the quality and effect of frame interpolation.

The video frame interpolation method is further described below with reference to Fig. 3a, taking its application in a 2x frame interpolation scenario as an example. Fig. 3a is a schematic diagram of the processing flow of a video frame interpolation method applied in a 2x interpolation scenario according to an embodiment of the present application; Fig. 3b is a schematic diagram of a first video frame according to an embodiment of the present application; Fig. 3c is a schematic diagram of a second video frame; Fig. 3d is a schematic diagram of a composite frame; and Fig. 3e is a schematic diagram of another composite frame. The process specifically includes the following steps:

Given two temporally consecutive video frames I0 and I1, as shown in Fig. 3b and Fig. 3c, the goal is to synthesize the target composite frame It between them, where t is any floating-point number between 0 and 1. A specific implementation may be as follows:

The two consecutive video frames I0 and I1 are input into the optical flow estimation model F, which outputs the corresponding inter-frame optical flow map; the optical flow intensity index Ind is computed from this map, and the target interpolation moment t of this interpolation is determined from Ind. Then the video frames I0 and I1, together with the target interpolation moment t, are input into the frame interpolation model M, which outputs the final target composite frame It; the target composite frame It is inserted at the corresponding position between the two video frames I0 and I1.

It should be noted that optical-flow-based frame interpolation models usually produce severe artifacts when handling complex or large motion, because the optical flow estimate is wrong. In practice, the most common setting is 2x interpolation, i.e. inserting one frame between two consecutive video frames at moment t = 0.5, the moment farthest from both endpoints (0 and 1). When there is large optical flow between the two input frames, the interpolation model's weak ability to model large motion leads to results like Fig. 3d, where the waving forearm disappears.

In the embodiments of the present application, after the optical flow intensity index is introduced to decide the concrete target interpolation moment, the interpolation model can know in advance that excessive optical flow between the two video frames may cause artifacts. It can therefore abandon t = 0.5, the most reasonable interpolation moment, in favor of t = 0.25 (the target interpolation moment computed from the optical flow intensity index), which is likely to yield a better result: the inter-frame optical flow from moment 0 to 0.5 is larger than that from 0 to 0.25, so the latter is easier for the interpolation model to fit and less prone to artifacts. The resulting target composite frame can be as shown in Fig. 3e: the swinging arm remains intact, with no "severed limb". Although the arm in the figure has only swung to the 0.25 moment, so continuity during playback may be weaker than at the ideal 0.5 moment, completeness is far higher than in the actual 0.5 result. At the cost of a small loss in smoothness, the interpolated video thus gains much higher completeness. Completeness is preserved as a priority because the human eye is more sensitive to brightness: an incomplete severed-limb artifact appears as "flicker" during playback and is easily noticed, whereas a complete arm does not flicker, so the perceived quality of the final video genuinely improves.

Corresponding to the foregoing method embodiments, the present application further provides embodiments of a video frame interpolation apparatus. Fig. 4 is a schematic structural diagram of a video frame interpolation apparatus according to an embodiment of the present application. As shown in Fig. 4, the apparatus includes:

an acquisition module 402 configured to acquire consecutive first and second video frames from a video to be interpolated;

a first determination module 404 configured to determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;

a second determination module 406 configured to determine a target interpolation moment between the first video frame and the second video frame based on the inter-frame optical flow map;

an insertion module 408 configured to determine a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation moment, and to insert the target composite frame between the first video frame and the second video frame.

Optionally, the second determination module 406 is further configured to:

determine an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map;

determine the target interpolation moment corresponding to the optical flow intensity index according to the correspondence between optical flow intensity ranges and interpolation moments.
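A minimal sketch of the range-to-moment lookup; the thresholds and moments below are illustrative assumptions only and are not values prescribed by this application:

```python
# Hypothetical correspondence between optical flow intensity ranges and
# interpolation moments: stronger flow moves the moment closer to an input frame.
RANGES = [
    (0.0, 5.0, 0.5),              # small motion: keep the mid-point moment
    (5.0, 15.0, 0.25),            # larger motion: move closer to the first frame
    (15.0, float("inf"), 0.125),  # very large motion: move even closer
]

def target_moment(ind):
    """Map an optical flow intensity index to a target interpolation moment."""
    for low, high, t in RANGES:
        if low <= ind < high:
            return t
    raise ValueError("optical flow intensity index out of range")
```

A monotone table like this realizes the behavior described in the text: the larger the measured inter-frame motion, the farther the chosen moment moves from t = 0.5.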

Optionally, the second determination module 406 is further configured to:

determine the horizontal-axis component of a target pixel in the horizontal direction and its vertical-axis component in the vertical direction, wherein the target pixel is any pixel in the inter-frame optical flow map;

determine an average horizontal-axis component based on the horizontal-axis components of the pixels in the inter-frame optical flow map, and determine an average vertical-axis component based on their vertical-axis components;

determine the optical flow intensity index between the first video frame and the second video frame based on the horizontal-axis components, the vertical-axis components, the average horizontal-axis component and the average vertical-axis component.
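One plausible instantiation of such an index is the mean magnitude of each pixel's deviation from the average flow. The exact formula here is an assumption for illustration, since this passage only names the quantities involved (per-pixel components and their averages), not how they are combined:

```python
import numpy as np

def flow_intensity_index(flow):
    """flow: H x W x 2 array of (horizontal, vertical) flow components.

    Illustrative index (assumed formula): mean magnitude of each pixel's
    deviation from the average flow over the whole map.
    """
    u, v = flow[..., 0], flow[..., 1]  # horizontal- and vertical-axis components
    du = u - u.mean()                  # deviation from average horizontal component
    dv = v - v.mean()                  # deviation from average vertical component
    return float(np.sqrt(du ** 2 + dv ** 2).mean())
```

Under this reading, a perfectly uniform flow field (pure camera pan) yields an index of 0, while spatially varied motion yields a positive index.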

Optionally, the insertion module 408 is further configured to:

generate interpolation moment information according to the target interpolation moment;

input the first video frame, the second video frame and the interpolation moment information into a trained frame interpolation model to obtain the target composite frame output by the model, wherein the target composite frame is the composite frame corresponding to the target interpolation moment indicated by the interpolation moment information.

Optionally, the insertion module 408 is further configured to:

input the first video frame, the second video frame and the interpolation moment information into an optical flow analysis layer of the frame interpolation model, and determine, through the optical flow analysis layer, a first optical flow from a first timestamp to the target interpolation moment and a second optical flow from the target interpolation moment to a second timestamp, wherein the first timestamp is the timestamp of the first video frame and the second timestamp is the timestamp of the second video frame;

sample from the first video frame based on the first optical flow through a sampling layer of the frame interpolation model to obtain a first sampling result, and sample from the second video frame based on the second optical flow to obtain a second sampling result;

fuse the first sampling result and the second sampling result based on set fusion weights through a fusion layer of the frame interpolation model to obtain and output the target composite frame.
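A minimal sketch of the sample-and-fuse steps on 1-D toy "frames" (NumPy). The integer backward warp and the fixed blend weights 1 − t and t are simplifying assumptions; a real interpolation model predicts per-pixel flows and learned fusion weights on 2-D images:

```python
import numpy as np

def warp(frame, flow):
    # Backward-warp a 1-D "frame" by a per-pixel integer flow (toy stand-in
    # for bilinear warping of a 2-D image by a dense flow field).
    idx = np.clip(np.arange(frame.size) + flow, 0, frame.size - 1)
    return frame[idx]

def synthesize(frame0, frame1, flow_t0, flow_t1, t):
    sample0 = warp(frame0, flow_t0)          # first sampling result
    sample1 = warp(frame1, flow_t1)          # second sampling result
    return (1 - t) * sample0 + t * sample1   # fusion with weights (1 - t, t)

f0 = np.array([0.0, 1.0, 2.0, 3.0])
f1 = np.array([1.0, 2.0, 3.0, 4.0])
out = synthesize(f0, f1, np.zeros(4, dtype=int), np.zeros(4, dtype=int), 0.25)
```

With zero flow the warps are identities, so the output is simply the 0.75/0.25 blend of the two frames; with t = 0.25 the first frame dominates, matching the weighting behavior described above.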

Optionally, the apparatus further includes a scaling module configured to:

scale the first video frame and the second video frame by a set factor to obtain a first updated video frame and a second updated video frame;

correspondingly, the first determination module 404 is further configured to:

input the first updated video frame and the second updated video frame into a trained optical flow estimation model to obtain the inter-frame optical flow map output by the optical flow estimation model.

Optionally, the apparatus further includes a third determination module configured to:

determine whether the frame interpolation end condition is currently satisfied;

if the frame interpolation end condition is satisfied, take the video with the target composite frame inserted as the resulting interpolated video;

if the frame interpolation end condition is not satisfied, return to and rerun the acquisition module 402.

In the video frame interpolation apparatus provided by the embodiments of the present application, the target interpolation moment between the first video frame and the second video frame can first be determined based on the inter-frame optical flow map corresponding to the two frames; then, based on the first video frame and the second video frame, the target composite frame corresponding to that moment is determined. The interpolation moment is adaptively guided to move forward or backward, closer to the first or the second video frame, which avoids producing a target composite frame that differs substantially from both neighboring frames. In this way, interpolation can be performed at an arbitrary position between the first and second video frames, exploiting the fact that the closer the interpolated frame is to either input frame, the fewer artifacts it contains. This improves the accuracy of composite frames generated from two consecutive video frames, greatly reduces the artifacts that large motion between the two frames may cause, and improves the quality and effect of frame interpolation.

The foregoing is a schematic solution of the video frame interpolation apparatus of this embodiment. It should be noted that the technical solution of this apparatus and that of the video frame interpolation method described above belong to the same concept; for details not described in the apparatus solution, refer to the description of the method solution.

Fig. 5 is a structural block diagram of a computing device according to an embodiment of the present application. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. The processor 520 is connected to the memory 510 through a bus 530, and a database 550 is used to store data.

The computing device 500 further includes an access device 540 that enables the computing device 500 to communicate via one or more networks 560. Examples of these networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of wired or wireless network interface (for example, a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (Wi-MAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.

In an embodiment of the present application, the above components of the computing device 500, as well as other components not shown in Fig. 5, may also be connected to each other, for example through a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 5 is for illustration only and does not limit the scope of the present application. Those skilled in the art may add or replace components as needed.

The computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., a tablet computer, a personal digital assistant, a laptop, a notebook, or a netbook), a mobile phone (e.g., a smartphone), a wearable computing device (e.g., a smart watch or smart glasses) or another type of mobile device, or a stationary computing device such as a desktop computer or a PC. The computing device 500 may also be a mobile or stationary server.

The processor 520 is configured to execute the following computer-executable instructions to implement the following method:

acquiring consecutive first and second video frames from a video to be interpolated;

determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;

determining a target interpolation moment between the first video frame and the second video frame based on the inter-frame optical flow map;

determining a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation moment, and inserting the target composite frame between the first video frame and the second video frame.

The foregoing is a schematic solution of the computing device of this embodiment. It should be noted that the technical solution of the computing device and that of the video frame interpolation method described above belong to the same concept; for details not described in the computing device solution, refer to the description of the method solution.

An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of any of the video frame interpolation methods described above.

The foregoing is a schematic solution of the computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and that of the video frame interpolation method described above belong to the same concept; for details not described in the storage medium solution, refer to the description of the method solution.

The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order shown, or a sequential order, to achieve the desired results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.

Computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include: any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should understand that the present application is not limited by the described order of actions, because according to the present application some steps may be performed in other orders or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily all required by the present application.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, refer to the relevant descriptions of other embodiments.

The preferred embodiments of the present application disclosed above are intended only to help explain the present application. The optional embodiments neither describe all details exhaustively nor limit the invention to the specific implementations described. Obviously, many modifications and variations can be made in light of the content of the present application. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the present application, so that those skilled in the art can well understand and use it. The present application is limited only by the claims, together with their full scope and equivalents.

Claims (10)

1. A video frame interpolation method, comprising:
acquiring consecutive first and second video frames from a video to be interpolated;
determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;
determining a target interpolation moment between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining a corresponding target composite frame according to the first video frame, the second video frame and the target interpolation moment, and inserting the target composite frame between the first video frame and the second video frame.
2. The video frame interpolation method according to claim 1, wherein determining the target interpolation moment between the first video frame and the second video frame based on the inter-frame optical flow map comprises:
determining an optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining the target interpolation moment corresponding to the optical flow intensity index according to a correspondence between optical flow intensity ranges and interpolation moments.
3. The video frame interpolation method according to claim 2, wherein determining the optical flow intensity index between the first video frame and the second video frame based on the inter-frame optical flow map comprises:
determining a horizontal-axis component of a target pixel in the horizontal direction and a vertical-axis component of the target pixel in the vertical direction, wherein the target pixel is any pixel in the inter-frame optical flow map;
determining an average horizontal-axis component based on the horizontal-axis components of the pixels in the inter-frame optical flow map, and determining an average vertical-axis component based on the vertical-axis components of the pixels in the inter-frame optical flow map;
and determining the optical flow intensity index between the first video frame and the second video frame based on the horizontal-axis components, the vertical-axis components, the average horizontal-axis component and the average vertical-axis component.
4. The video frame interpolation method according to claim 1, wherein determining the corresponding target composite frame according to the first video frame, the second video frame and the target interpolation moment comprises:
generating interpolation moment information according to the target interpolation moment;
and inputting the first video frame, the second video frame and the interpolation moment information into a trained frame interpolation model to obtain a target composite frame output by the frame interpolation model, wherein the target composite frame is the composite frame corresponding to the target interpolation moment indicated by the interpolation moment information.
5. The video frame interpolation method according to claim 4, wherein inputting the first video frame, the second video frame and the interpolation moment information into the trained frame interpolation model to obtain the target composite frame output by the frame interpolation model comprises:
inputting the first video frame, the second video frame and the interpolation moment information into an optical flow analysis layer of the frame interpolation model, and determining, through the optical flow analysis layer, a first optical flow from a first timestamp to the target interpolation moment and a second optical flow from the target interpolation moment to a second timestamp, wherein the first timestamp is the timestamp of the first video frame, and the second timestamp is the timestamp of the second video frame;
sampling from the first video frame based on the first optical flow through a sampling layer of the frame interpolation model to obtain a first sampling result, and sampling from the second video frame based on the second optical flow to obtain a second sampling result;
and fusing the first sampling result and the second sampling result based on set fusion weights through a fusion layer of the frame interpolation model to obtain and output the target composite frame.
6. The method for inserting frames according to any one of claims 1 to 5, further comprising, after the first video frame and the second video frame are obtained from the video to be inserted, the steps of:
scaling the first video frame and the second video frame to a set multiple to obtain a first updated video frame and a second updated video frame;
accordingly, the determining the inter-frame optical flow map corresponding to the first video frame and the second video frame includes:
and inputting the first updated video frame and the second updated video frame into a trained optical flow estimation model to obtain an inter-frame optical flow diagram output by the optical flow estimation model.
7. The video frame interpolation method according to any one of claims 1 to 5, further comprising, after the target composite frame is inserted between the first video frame and the second video frame:
determining whether a frame interpolation end condition is currently satisfied;
if the frame interpolation end condition is satisfied, taking the video with the target composite frame inserted as the resulting interpolated video;
and if the frame interpolation end condition is not satisfied, continuing to perform the operation of determining a consecutive first video frame and second video frame from the video to be interpolated.
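The loop in claim 7 (insert a composite frame, test the end condition, otherwise continue with the next consecutive pair) can be sketched as below, with the end condition reduced to "no more consecutive pairs" and `synthesize` standing in for the frame interpolation model:

```python
def interpolate_video(frames, synthesize):
    """Walk consecutive frame pairs, insert one synthetic frame per pair,
    and stop once no consecutive pair remains (the end condition here)."""
    out = []
    for i in range(len(frames) - 1):
        out.append(frames[i])
        out.append(synthesize(frames[i], frames[i + 1]))
    out.append(frames[-1])
    return out
```

For an input of N frames this yields 2N - 1 frames, i.e. roughly a doubled frame rate when one composite frame is inserted per pair.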
8. A video frame interpolation apparatus, comprising:
an acquisition module configured to acquire a consecutive first video frame and second video frame from a video to be interpolated;
a first determination module configured to determine an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;
a second determination module configured to determine a target frame interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map;
and an insertion module configured to determine a corresponding target composite frame according to the first video frame, the second video frame and the target frame interpolation time, and to insert the target composite frame between the first video frame and the second video frame.
9. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the following method:
acquiring a consecutive first video frame and second video frame from a video to be interpolated;
determining an inter-frame optical flow map corresponding to the first video frame and the second video frame, wherein the inter-frame optical flow map indicates motion information of each pixel from the first video frame to the second video frame;
determining a target frame interpolation time between the first video frame and the second video frame based on the inter-frame optical flow map;
and determining a corresponding target composite frame according to the first video frame, the second video frame and the target frame interpolation time, and inserting the target composite frame between the first video frame and the second video frame.
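Taken together, the steps recited in claim 9 amount to the pipeline sketched below. `estimate_flow`, `pick_time` and `synthesize` are placeholders for the optical flow estimation model, the target-time selection based on the flow map, and the frame interpolation model; none of their internals are specified here.

```python
def interpolate_pair(frame1, frame2, t1, t2, estimate_flow, pick_time, synthesize):
    """End-to-end sketch of the claimed steps for one consecutive frame pair:
    estimate the inter-frame flow, pick a target interpolation time from it,
    then synthesize the target composite frame and insert it between the two."""
    flow = estimate_flow(frame1, frame2)       # inter-frame optical flow map
    t = pick_time(flow, t1, t2)                # target frame interpolation time
    composite = synthesize(frame1, frame2, t)  # target composite frame
    return [frame1, composite, frame2], t
```

The key point the claims make is that the interpolation instant is not fixed at the midpoint but derived from the flow map, which is why `pick_time` receives the flow as input.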
10. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the video frame interpolation method of any one of claims 1 to 7.
CN202211648783.5A 2022-12-21 2022-12-21 Video frame insertion method and device Active CN116033183B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211648783.5A CN116033183B (en) 2022-12-21 2022-12-21 Video frame insertion method and device
PCT/CN2023/106139 WO2024131035A1 (en) 2022-12-21 2023-07-06 Video frame interpolation method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211648783.5A CN116033183B (en) 2022-12-21 2022-12-21 Video frame insertion method and device

Publications (2)

Publication Number Publication Date
CN116033183A true CN116033183A (en) 2023-04-28
CN116033183B CN116033183B (en) 2024-11-22

Family

ID=86071627

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211648783.5A Active CN116033183B (en) 2022-12-21 2022-12-21 Video frame insertion method and device

Country Status (2)

Country Link
CN (1) CN116033183B (en)
WO (1) WO2024131035A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024131035A1 (en) * 2022-12-21 2024-06-27 上海哔哩哔哩科技有限公司 Video frame interpolation method and apparatus

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119135989B (en) * 2024-11-04 2025-01-24 亚创光电(深圳)有限公司 A display method, device and electronic device for Mini-LED liquid crystal display screen

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488922A (en) * 2020-12-08 2021-03-12 亿景智联(北京)科技有限公司 Super-resolution processing method based on optical flow interpolation
CN112929689A (en) * 2021-02-24 2021-06-08 北京百度网讯科技有限公司 Video frame insertion method, device, equipment and storage medium
CN113837136A (en) * 2021-09-29 2021-12-24 深圳市慧鲤科技有限公司 Video frame insertion method and device, electronic equipment and storage medium
WO2022033048A1 (en) * 2020-08-13 2022-02-17 北京迈格威科技有限公司 Video frame interpolation method, model training method, and corresponding device
CN114066946A (en) * 2021-10-26 2022-02-18 联想(北京)有限公司 Image processing method and device

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090174812A1 (en) * 2007-07-06 2009-07-09 Texas Instruments Incorporated Motion-compressed temporal interpolation
US9626770B2 (en) * 2015-04-10 2017-04-18 Apple Inc. Generating synthetic video frames using optical flow
CN105828106B (en) * 2016-04-15 2019-01-04 山东大学苏州研究院 A kind of non-integral multiple frame per second method for improving based on motion information
US10489897B2 (en) * 2017-05-01 2019-11-26 Gopro, Inc. Apparatus and methods for artifact detection and removal using frame interpolation techniques
US12288346B2 (en) * 2019-01-15 2025-04-29 Portland State University Feature pyramid warping for video frame interpolation
CN111641829B (en) * 2020-05-16 2022-07-22 Oppo广东移动通信有限公司 Video processing method, device and system, storage medium and electronic equipment
CN114071223B (en) * 2020-07-30 2024-10-29 武汉Tcl集团工业研究院有限公司 Optical flow-based video plug-in frame generation method, storage medium and terminal equipment
CN112954395B (en) * 2021-02-03 2022-05-17 南开大学 A video frame insertion method and system capable of inserting any frame rate
CN116033183B (en) * 2022-12-21 2024-11-22 上海哔哩哔哩科技有限公司 Video frame insertion method and device
CN116170650B (en) * 2022-12-21 2025-04-08 上海哔哩哔哩科技有限公司 Video frame inserting method and device
CN116112707B (en) * 2023-02-01 2024-12-24 上海哔哩哔哩科技有限公司 Video processing method and device, electronic device and storage medium


Also Published As

Publication number Publication date
WO2024131035A1 (en) 2024-06-27
CN116033183B (en) 2024-11-22

Similar Documents

Publication Publication Date Title
Zhang et al. Mimicmotion: High-quality human motion video generation with confidence-aware pose guidance
Tulyakov et al. Time lens++: Event-based frame interpolation with parametric non-linear flow and multi-scale fusion
CN109064507B (en) Multi-motion-stream deep convolution network model method for video prediction
US12067659B2 (en) Generating animated digital videos utilizing a character animation neural network informed by pose and motion embeddings
Liu et al. Video frame synthesis using deep voxel flow
WO2024131035A1 (en) Video frame interpolation method and apparatus
CN110322542B (en) Reconstructing views of a real world 3D scene
JP4198608B2 (en) Interpolated image generation method and apparatus
US7899122B2 (en) Method, apparatus and computer program product for generating interpolation frame
CN116170650B (en) Video frame inserting method and device
WO2018161775A1 (en) Neural network model training method, device and storage medium for image processing
WO2023005140A1 (en) Video data processing method, apparatus, device, and storage medium
US20140267350A1 (en) Stylizing animation by example
CN114066730B (en) Video frame interpolation method based on unsupervised dual learning
CN114339030B (en) Network live video image stabilizing method based on self-adaptive separable convolution
CN111586321B (en) Video generation method, device, electronic equipment and computer readable storage medium
CN109714501B (en) Frame average noise reduction method and device
Li et al. Spatiotemporally consistent hdr indoor lighting estimation
Weng et al. Eadeblur-gs: Event assisted 3d deblur reconstruction with gaussian splatting
Hu et al. Video frame interpolation with many-to-many splatting and spatial selective refinement
JP4563982B2 (en) Motion estimation method, apparatus, program thereof, and recording medium thereof
CN118229815B (en) Video generation method, deep learning model training method, device, equipment and storage medium
CN117201840A (en) Conference video frame inserting method and system based on multi-scale space-time characteristics
CN116630744A (en) Image generation model training method, image generation device and medium
CN118843890A (en) Variable resolution variable frame rate video coding using neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant