CN115914606A - Video processing method, system, readable storage medium and equipment terminal - Google Patents
- Publication number: CN115914606A (application CN202211364949.0A)
- Authority: CN (China)
- Legal status: Pending (an assumption by Google Patents, not a legal conclusion)
Description
Technical Field
The invention belongs to the technical field of video processing, and in particular relates to a video processing method, a video processing system, a readable storage medium, and a device terminal.
Background
With the development of science and technology and the improvement of living standards, more and more users are willing to try new technology products, and naked-eye 3D technology and its associated products are among the technologies users are eager to experience.
Naked-eye 3D technology has been developing since 2013, but it remains uncommon, and its progress has mostly been on the hardware side; the conversion of ordinary 2D video into 3D video has advanced extremely slowly. Research shows that no software on the market can automatically generate the virtual right-view image required for 3D display; it can only be produced manually. As a result, the naked-eye 3D sources currently in circulation are mainly produced by vendors through frame-by-frame manual processing. For example, a 2-minute video typically contains about 4,600 frames, each of which must be manually matted, making the labor cost extremely high and the production efficiency of naked-eye 3D video extremely low. This is also why few naked-eye 3D videos are currently available for playback.
Summary of the Invention
The purpose of the embodiments of the present invention is to provide a video processing method, aiming to solve the problems of high cost and low efficiency in existing 3D video production.
An embodiment of the present invention is implemented as a video processing method, the method comprising:
separating the audio signal and the video signal in an acquired video to be converted to obtain audio data and video data;
splitting the video data frame by frame to obtain an image to be converted for each frame;
performing depth processing on the image to be converted of each frame, and performing image processing on the image to be converted according to the result of the depth processing to generate a corresponding virtual right image, the virtual right image being a virtual-viewpoint image obtained from the image to be converted after depth conversion processing;
performing image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image, the three-dimensional image being an image in which the image to be converted and the virtual right image are located on the left and right sides respectively;
merging the three-dimensional images of all frames to obtain three-dimensional video data;
merging the separated audio data with the generated three-dimensional video data to obtain a three-dimensional video.
Furthermore, the step of performing depth processing on the image to be converted of each frame, and performing image processing on the image to be converted according to the result of the depth processing to generate a corresponding virtual right image, comprises:
performing depth processing on the image to be converted of each frame to obtain a corresponding depth map;
dynamically converting the depth map of each frame into a corresponding mask;
segmenting the foreground and the background from the corresponding image to be converted according to the mask of each frame;
applying rotation, stretching, and leftward-shift transformations to the foreground extracted from each frame, and filling the transformed result into the corresponding background;
performing image inpainting on the image filled into the background for each frame to obtain the corresponding virtual right image.
Furthermore, the step of performing depth processing on the image to be converted of each frame to obtain a corresponding depth map comprises:
converting the image to be converted of each frame into a matrix containing three-primary-color information, the three-primary-color information comprising the red, green, and blue color channels;
inputting the matrix of three-primary-color information of each frame into a depth processing algorithm to obtain the corresponding depth map.
Furthermore, the step of dynamically converting the depth map of each frame into a corresponding mask comprises:
increasing the contrast ratio of the depth map of each frame;
converting the depth map of each frame into a grayscale image;
binarizing the grayscale depth map of each frame to obtain the corresponding mask.
Furthermore, the step of performing image inpainting on the image filled into the background for each frame to obtain the corresponding virtual right image comprises:
drawing a mask according to the pixel positions of the image filled into the background for each frame;
filling the holes outside the mask to obtain the corresponding virtual right image.
Furthermore, the step of performing image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image comprises:
converting the image to be converted of each frame and the corresponding virtual right image into two corresponding arrays;
broadcasting the two arrays of each frame as matrices, and merging the arrays;
converting the merged array of each frame into the corresponding three-dimensional image.
Furthermore, after the step of splitting the video data frame by frame to obtain an image to be converted for each frame, the method further comprises:
adding text subtitles to the image to be converted of each frame.
Another embodiment of the present invention further provides a video processing system, the system comprising:
a separation module, configured to separate the audio signal and the video signal in an acquired video to be converted to obtain audio data and video data;
a splitting module, configured to split the video data frame by frame to obtain an image to be converted for each frame;
a first image processing module, configured to perform depth processing on the image to be converted of each frame, and to perform image processing on the image to be converted according to the result of the depth processing to generate a corresponding virtual right image, the virtual right image being a virtual-viewpoint image obtained from the image to be converted after depth conversion processing;
a second image processing module, configured to perform image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image, the three-dimensional image being an image in which the image to be converted and the virtual right image are located on the left and right sides respectively;
a first merging module, configured to merge the three-dimensional images of all frames to obtain three-dimensional video data;
a second merging module, configured to merge the separated audio data with the generated three-dimensional video data to obtain a three-dimensional video.
Another embodiment of the present invention further provides a readable storage medium storing a program which, when executed by a processor, implements the video processing method described above.
Another embodiment of the present invention further provides a device terminal, comprising a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor implements the video processing method described above when executing the program.
In the video processing method provided by the embodiments of the present invention, the original video to be converted into a naked-eye 3D video first undergoes audio/video separation to obtain audio data and video data. The video data is then split frame by frame to obtain an image to be converted for each frame, and visual processing techniques convert each image into a virtual right image representing the virtual right-eye viewpoint. Each image to be converted is spliced with its virtual right image to generate a three-dimensional image, the three-dimensional images of all frames are merged into three-dimensional video data, and the separated audio data is merged back in to produce the final three-dimensional video. In this way, the conversion of an ordinary video into the corresponding naked-eye 3D video is fully automated, greatly reducing the high labor cost of naked-eye 3D video production, improving production efficiency, promoting the popularization of naked-eye 3D, and solving the problems of high cost and low efficiency in existing 3D video production.
Description of the Drawings
Fig. 1 is a flowchart of the video processing method provided by an embodiment of the present invention;
Fig. 2 is another flowchart of the video processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the video processing system provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the device terminal provided by an embodiment of the present invention.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the present invention, not to limit it.
In the present invention, unless otherwise expressly specified and limited, terms such as "installed", "connected", "coupled", and "fixed" should be understood broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, or an internal communication between two elements. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Embodiment One
Referring to Fig. 1, which is a schematic flowchart of the video processing method provided by the first embodiment of the present invention, only the parts related to this embodiment are shown for ease of description. The method comprises:
Step S10: separating the audio signal and the video signal in the acquired video to be converted to obtain audio data and video data.
In one embodiment of the present invention, this video processing method is used to process an ordinary existing two-dimensional video into a three-dimensional video that can be displayed in 3D. Specifically, the method may be applied on a device terminal, which can be a terminal for playing video sources, such as a smart terminal, smart tablet, television, or projector; a terminal for virtual-reality experiences, such as a VR headset or VR box; or a terminal for data processing, such as a server.
The device terminal stores the video to be converted, which may be an existing finished video or a video captured by the device terminal itself. The method first separates the audio signal and the video signal of the acquired video to obtain audio data and video data. Specifically, it decodes and demuxes the container format of the video to be converted; once an audio track is identified, it extracts the audio channels, saves the extracted video data and audio data separately, and thereby separates the audio data from the video data, leaving video data that contains no audio information. Note that when the video to be converted contains no audio, no audio data is separated, and the method proceeds directly to step S20.
In one embodiment of the present invention, the moviepy.editor open-source module in Python is used to extract the audio channels from the video to be converted. moviepy uses ffmpeg to read and export video and audio files, and VideoFileClip reads a video into a VideoFileClip object, referred to as a clip, on which arbitrary operations can then be performed, such as cutting, merging, adjusting brightness or speed, extracting audio, and concatenating with other clips.
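Since moviepy drives ffmpeg under the hood, the demuxing of step S10 can be sketched by building the equivalent ffmpeg command lines directly. All file names below are illustrative assumptions, not values taken from the patent.

```python
# Sketch of step S10 (audio/video separation). The embodiment uses
# moviepy.editor, which wraps ffmpeg; here we build the equivalent ffmpeg
# invocations directly. File names are illustrative assumptions.

def demux_commands(src: str, video_out: str, audio_out: str):
    """Return the two ffmpeg invocations that split `src` into a silent
    video stream and a standalone audio stream (both stream-copied)."""
    video_cmd = ["ffmpeg", "-i", src, "-an", "-c:v", "copy", video_out]  # -an drops audio
    audio_cmd = ["ffmpeg", "-i", src, "-vn", "-c:a", "copy", audio_out]  # -vn drops video
    return video_cmd, audio_cmd

video_cmd, audio_cmd = demux_commands("input.mp4", "video_only.mp4", "audio_only.aac")
# Run with subprocess.run(video_cmd, check=True) when ffmpeg is installed.
```

The moviepy equivalent is roughly `VideoFileClip("input.mp4").audio.write_audiofile("audio.mp3")`.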
Step S20: splitting the video data frame by frame to obtain an image to be converted for each frame.
In one embodiment of the present invention, after the video to be converted has been separated into audio data and video data, the separated video data is divided into frames, so that the video data is split into an image to be converted for each frame, and all of the resulting images are saved in a dedicated folder for subsequent processing.
In one embodiment of the present invention, after the video to be converted has been decoded with Python, the OpenCV (Open Source Computer Vision Library) open-source library is used to call the cv.VideoCapture API to open the video data and save the images frame by frame, so that the separated video data is extracted frame by frame into an image to be converted for each frame, each of which is saved in a folder for subsequent processing.
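A minimal sketch of this frame-splitting step, following the description's use of OpenCV's VideoCapture; the zero-padded naming scheme is an assumption so that frames sort in playback order.

```python
# Sketch of step S20 (frame splitting) using OpenCV's VideoCapture, as in
# the description. The frame-naming scheme is an assumption.
import os

def frame_name(idx: int) -> str:
    """Zero-padded file name so frames sort in playback order."""
    return f"frame_{idx:06d}.png"

def split_into_frames(video_path: str, out_dir: str) -> list:
    """Decode the video frame by frame and save each frame as a PNG."""
    import cv2  # OpenCV; imported lazily to keep the sketch self-contained
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    paths, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream (or decode error)
            break
        path = os.path.join(out_dir, frame_name(idx))
        cv2.imwrite(path, frame)
        paths.append(path)
        idx += 1
    cap.release()
    return paths

print(frame_name(0))  # → frame_000000.png
```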
Step S30: performing depth processing on the image to be converted of each frame, and performing image processing on the image to be converted according to the result of the depth processing to generate a corresponding virtual right image.
In one embodiment of the present invention, after the video data has been divided into an image to be converted for each frame, depth processing is performed on the images of all frames to obtain the depth map corresponding to each frame. An algorithm then performs image processing on each frame's image to be converted based on its corresponding depth map to generate the corresponding virtual right image, where the virtual right image is the virtual-viewpoint image of the image to be converted after depth conversion processing, i.e. typically the image as seen from the user's virtual right-eye viewpoint.
Specifically, in this embodiment of the present invention, the image to be converted of each frame is fed into the depth processing algorithm, which adopts the depth estimation method NeW CRFs from the top computer-vision conference CVPR 2022 together with the open-source MiDaS algorithm. After the algorithm has finished, the generated depth map of each frame is saved in a separate folder, which is used to verify that the generated depth maps are reasonable (specifically, to check that image depth extraction is correct and to prevent errors in subsequent extraction) and to facilitate the subsequent steps.
The principle of depth estimation is to obtain the distance between an object and the shooting point, ultimately yielding a depth map, also called a disparity map, which records the parallax of the same object between different images. Using the camera parameters and the positional relationship between the two shooting points, this parallax can be converted into the distance between the object and the shooting point. For example, at a distance of 100 m, a car may be roughly 3 cm high in the image; multiple objects are then used to mutually correct each other's depth to obtain relatively accurate predictions. Depth estimation is a subtask of computer vision whose purpose is to obtain the distance between objects and the shooting point, providing depth information for tasks such as 3D reconstruction, distance perception, SLAM, visual odometry, liveness detection, video frame interpolation, and image reconstruction.
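The parallax-to-distance conversion described above follows the standard pinhole-stereo relation Z = f·B/d (focal length f in pixels, baseline B, disparity d in pixels). A tiny illustration with assumed camera numbers:

```python
# Standard pinhole-stereo relation relating disparity to distance:
# depth Z = f * B / d. The camera numbers below are illustrative assumptions.

def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance to the object, in metres."""
    return f_px * baseline_m / disparity_px

# A far object shows a small disparity, hence a large depth:
z_near = depth_from_disparity(f_px=1000.0, baseline_m=0.1, disparity_px=50.0)  # 2.0 m
z_far = depth_from_disparity(f_px=1000.0, baseline_m=0.1, disparity_px=2.0)    # 50.0 m
```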
Further, after the depth estimation algorithm has run, image processing is performed on each original frame to be converted, based on the depth estimation result, to generate the virtual right image corresponding to each frame.
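The patent's own right-view procedure (mask extraction, foreground rotation/stretch/left-shift, inpainting) is detailed in later steps; as a simplified, hedged stand-in, depth-image-based rendering can be sketched as shifting each pixel left by a disparity proportional to its depth:

```python
# Simplified depth-image-based-rendering (DIBR) sketch for a virtual right
# view: nearer pixels (larger depth value here) shift further left. This is
# an illustrative stand-in, NOT the patent's exact mask/rotate/stretch/
# inpaint procedure; the un-filled holes would be inpainted in a later step.
import numpy as np

def virtual_right_view(left: np.ndarray, depth: np.ndarray, max_shift: int = 8) -> np.ndarray:
    """left: (H, W, 3) uint8 image; depth: (H, W) floats in [0, 1],
    1.0 = nearest. Returns the shifted right-eye view; holes stay 0."""
    h, w, _ = left.shape
    right = np.zeros_like(left)
    shift = (depth * max_shift).astype(int)  # per-pixel disparity
    for y in range(h):
        for x in range(w):
            nx = x - shift[y, x]             # shift left by the disparity
            if 0 <= nx < w:
                right[y, nx] = left[y, x]
    return right

left = np.full((4, 8, 3), 255, dtype=np.uint8)
depth = np.zeros((4, 8)); depth[:, 4:] = 1.0  # right half is "near"
right = virtual_right_view(left, depth, max_shift=2)
```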
Step S40: performing image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image.
In one embodiment of the present invention, after each frame's image to be converted has been turned into its corresponding virtual right image, the two are processed together to generate a three-dimensional image, in which the image to be converted and the virtual right image are located on the left and right sides respectively; that is, the image to be converted and the virtual right image are loaded side by side and spliced into a three-dimensional image. Splicing the image to be converted with the newly generated virtual right image left-to-right produces the three-dimensional image required by the left/right-format 3D video sources used in naked-eye 3D technology.
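With decoded frames represented as arrays, this left/right splicing reduces to a horizontal concatenation of two equally-sized images, e.g.:

```python
# Sketch of step S40: splicing the original (left) image and the virtual
# right image into one side-by-side frame, as used by left/right-format
# naked-eye 3D sources. Arrays stand in for decoded images.
import numpy as np

def side_by_side(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Concatenate two (H, W, 3) images into one (H, 2W, 3) frame."""
    assert left.shape == right.shape, "both views must share one size"
    return np.hstack((left, right))

left = np.zeros((720, 1280, 3), dtype=np.uint8)  # illustrative 720p frame
right = np.ones((720, 1280, 3), dtype=np.uint8)
sbs = side_by_side(left, right)
print(sbs.shape)  # → (720, 2560, 3)
```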
Step S50: merging the three-dimensional images of all frames to obtain three-dimensional video data.
In one embodiment of the present invention, after the image to be converted has been processed together with its virtual right image to generate a three-dimensional image, the method checks whether every frame has been converted. If not, it continues processing the remaining images to be converted with their virtual right images to generate three-dimensional images; if so, the three-dimensional images of all converted frames are merged to obtain a video without sound. Specifically, in this embodiment, as described above, Python's moviepy.editor open-source module is used to synthesize the three-dimensional images of all frames into a video with the original frame rate and container format.
Step S60: merging the separated audio data with the generated three-dimensional video data to obtain a three-dimensional video.
In one embodiment of the present invention, as described in step S10, when the video to be converted contains audio, the separated audio data is merged with the generated three-dimensional video data to obtain the three-dimensional video; that is, the separated audio data is imported directly into the three-dimensional video data. Since all timing parameters are identical to those of the original video, no audio/video synchronization is needed and the three-dimensional video can be generated directly.
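Steps S50 and S60 can be sketched with moviepy as the embodiment suggests; paths are assumptions, and `set_audio` is the moviepy 1.x spelling (renamed `with_audio` in moviepy 2.x). Because the frame rate is kept, the merged clip's duration is simply frame count divided by fps, which is why no re-synchronization is needed.

```python
# Sketch of steps S50/S60: merge the per-frame 3D images into a silent
# video at the original frame rate, then re-attach the separated audio.
# Uses moviepy as in the embodiment; file names are assumptions.

def expected_duration(n_frames: int, fps: float) -> float:
    """Timing is preserved, so duration is just n_frames / fps; this is
    why no audio/video re-synchronisation is needed in step S60."""
    return n_frames / fps

def merge_frames_and_audio(frame_paths, fps, audio_path, out_path):
    # moviepy imported lazily so the sketch stays self-contained
    from moviepy.editor import ImageSequenceClip, AudioFileClip
    clip = ImageSequenceClip(frame_paths, fps=fps)    # S50: silent 3D video
    clip = clip.set_audio(AudioFileClip(audio_path))  # S60: re-attach audio
    clip.write_videofile(out_path, fps=fps)

print(expected_duration(2880, 24.0))  # → 120.0 (a 2-minute clip at 24 fps)
```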
When the video to be converted contains no audio and no audio data is needed, the three-dimensional video is obtained directly in step S50 above; when audio data needs to be produced, the produced audio data is imported into the three-dimensional video data and adjusted to fit it, yielding the three-dimensional video.
Correspondingly, the converted three-dimensional video can be applied in the following scenarios:
1. Scenarios where platform video needs to be converted into naked-eye 3D video. With the video processing method provided in the embodiments of the present invention, an ordinary video can be processed to generate, for each frame, a virtual right image from the virtual right-eye viewpoint; splicing each frame of the ordinary video with its corresponding virtual right image then yields a left/right-format 3D video source for naked-eye 3D projection.
2. Producing the 3D video sources required by VR boxes; the video processing method provided in the embodiments of the present invention offers technical support for the conversion of some 3D video sources.
In summary, in the video processing method of the above embodiments of the present invention, the original video to be converted into a naked-eye 3D video first undergoes audio/video separation to obtain audio data and video data; the video data is split frame by frame into an image to be converted for each frame; visual processing techniques convert each image into a virtual right image of the virtual right-eye viewpoint; each image is spliced with its virtual right image into a three-dimensional image; the three-dimensional images of all frames are merged into three-dimensional video data; and the separated audio data is merged back in to produce the final three-dimensional video. The conversion of the original images into naked-eye 3D images is thus fully automated, greatly reducing the high labor cost of naked-eye 3D video production, improving production efficiency, promoting the popularization of naked-eye 3D, and solving the problems of high cost and low efficiency in existing 3D video production.
Embodiment Two
Referring to Fig. 2, which is a schematic flowchart of a video processing method provided by the second embodiment of the present invention, only the parts related to this embodiment are shown for ease of description. The second embodiment is broadly the same as the first; for brevity, refer to the corresponding content of the first embodiment for anything not mentioned here. Specifically, the method comprises:
Step S11: separating the audio signal and the video signal in the acquired video to be converted to obtain audio data and video data.
Step S21: splitting the video data frame by frame to obtain an image to be converted for each frame.
In one embodiment of the present invention, step S21 may be followed by:
adding text subtitles to the image to be converted of each frame.
That is, when the video to be converted includes only audio and video information but no subtitle information, text subtitles can be composited onto the split images to be converted during the video processing method of this embodiment. The text subtitle of each frame must correspond to the audio in the audio data, so that the added subtitles also undergo the subsequent image processing and the final output is a three-dimensional video with subtitles.
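Keeping each frame's caption aligned with the audio comes down to mapping frame indices to time-stamped cues; a hedged sketch (the `(start, end, text)` cue structure is an assumption, and actually drawing the text onto a frame would use e.g. cv2.putText):

```python
# Sketch of the optional subtitle step after S21: each frame's caption is
# looked up by its timestamp so the text stays aligned with the audio.
# The cue structure (start, end, text) is an assumption.

def subtitle_for_frame(frame_idx: int, fps: float, cues) -> str:
    """Return the caption active at this frame's timestamp, or ''."""
    t = frame_idx / fps
    for start, end, text in cues:
        if start <= t < end:
            return text
    return ""

cues = [(0.0, 2.0, "Hello"), (2.0, 4.5, "World")]
print(subtitle_for_frame(60, 24.0, cues))  # frame 60 at 24 fps = 2.5 s → World
```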
Step S31, performing depth processing on the image to be converted of each frame to obtain a corresponding depth map.
In one embodiment of the present invention, the above depth processing of each frame's image to be converted to obtain the corresponding depth map may be implemented through the following steps:
converting the image to be converted of each frame into a matrix containing three-primary-color information, the three-primary-color information including the red, green and blue color channels;
inputting the matrix of three-primary-color information of each frame into a depth processing algorithm to obtain the corresponding depth map.
Specifically, the image to be converted of each frame is first converted into a matrix containing RGB (red (R), green (G), blue (B)) information, and the converted matrix of each frame is then input into the depth processing algorithm. The depth processing algorithm uses a trained dataset to recognize the typical distances of different objects, and combines a conditional random field to optimize computation speed, so that object depth is identified quickly and the corresponding depth map is obtained. It can be understood that, in other embodiments of the present invention, the image to be converted may be converted into a depth map in other existing ways, which may be configured according to actual needs and are not specifically limited here.
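To make the data flow of step S31 concrete — and only the data flow, since the trained depth network itself is beyond a short sketch — the following fragment builds the RGB matrix and substitutes a simple luminance heuristic for the learned depth estimator. The names `image_to_rgb_matrix` and `estimate_depth` are illustrative, not from the patent:

```python
import numpy as np

def image_to_rgb_matrix(image):
    """Ensure the frame is an H x W x 3 uint8 RGB matrix (first sub-step)."""
    arr = np.asarray(image, dtype=np.uint8)
    assert arr.ndim == 3 and arr.shape[2] == 3
    return arr

def estimate_depth(rgb):
    """Toy stand-in for the trained depth model: depth is approximated by
    BT.601 luminance normalized to [0, 1]. A real system would feed `rgb`
    to a learned monocular depth estimator instead."""
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return lum / 255.0

frame = np.zeros((4, 4, 3), dtype=np.uint8)
frame[:, 2:] = 255                    # bright "foreground" on the right half
depth = estimate_depth(image_to_rgb_matrix(frame))
print(depth.shape)                    # depth map has one value per pixel
```

The output is a single-channel H x W map, which is exactly the input the mask-conversion step S41 expects.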
Step S41, dynamically converting the depth map of each frame into a corresponding mask.
In one embodiment of the present invention, the above dynamic conversion of each frame's depth map into the corresponding mask may be implemented through the following steps:
increasing the contrast ratio of the depth map of each frame;
converting the depth map of each frame into a grayscale image;
binarizing the grayscale depth map of each frame to obtain the corresponding mask.
Specifically, after the image to be converted has been depth-processed into a depth map, the depth map of each frame is first image-processed to increase the contrast of the main subject; in particular, the gamma value and contrast ratio of the depth map may be adjusted. The depth map is then converted into a grayscale image, which is binarized to obtain a binary image (that is, the mask).
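A minimal sketch of the three mask steps (gamma/contrast boost, grayscale, binarization) in NumPy. The gamma, contrast and threshold values are illustrative defaults, not values specified by the patent:

```python
import numpy as np

def depth_to_mask(depth, gamma=0.8, contrast=1.5, threshold=0.5):
    """Sketch of step S41: apply a gamma curve and a contrast stretch,
    collapse to an 8-bit grayscale map, then binarize into a 0/255 mask."""
    d = np.clip(depth.astype(np.float64), 0.0, 1.0)
    d = d ** gamma                                       # gamma adjustment
    d = np.clip((d - 0.5) * contrast + 0.5, 0.0, 1.0)    # contrast stretch
    gray = (d * 255).astype(np.uint8)                    # grayscale image
    return np.where(gray >= threshold * 255, 255, 0).astype(np.uint8)

depth = np.array([[0.1, 0.9], [0.2, 0.8]])  # near pixels have high depth values
mask = depth_to_mask(depth)
print(mask)  # near pixels become 255 (foreground), far pixels 0
```

The 0/255 binary output is the mask consumed by the segmentation step S51 below.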
Step S51, segmenting the foreground and background from the corresponding image to be converted according to the mask of each frame.
In one embodiment of the present invention, after the depth map has been dynamically converted into a mask, image segmentation techniques allow a particular part of the image to be extracted on its own. Specifically, a bitwise operation is performed between the mask and the image to be converted, separating the foreground from the background; the result is then inverted according to the mask so that everything except the foreground region is black, at which point the main subject of the foreground can be cut out and extracted.
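The bitwise separation can be sketched as follows. With OpenCV one would typically use `cv2.bitwise_and(image, image, mask=mask)`; a pure-NumPy equivalent keeps the example self-contained:

```python
import numpy as np

def split_foreground_background(image, mask):
    """Sketch of step S51: masking keeps the foreground, and masking with
    the inverted mask keeps the background; everything outside the kept
    region becomes black (zero)."""
    m3 = np.repeat(mask[..., None] // 255, 3, axis=2)  # mask broadcast to 3 channels
    foreground = image * m3
    background = image * (1 - m3)
    return foreground, background

img = np.full((2, 2, 3), 100, dtype=np.uint8)
mask = np.array([[255, 0], [255, 0]], dtype=np.uint8)
fg, bg = split_foreground_background(img, mask)
print(fg[0, 0].tolist())  # [100, 100, 100] -- foreground pixel kept
print(fg[0, 1].tolist())  # [0, 0, 0] -- background region blacked out
```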
Step S61, rotating, stretching and left-shifting the foreground extracted from each frame, and filling the transformed result into the corresponding background.
In one embodiment of the present invention, after the foreground and background have been segmented and the foreground extracted, all pixels of the extracted foreground are rotated, stretched and shifted left about the central axis, so that the depth change of the subject alone is emphasized; all transformed pixels are then placed back into the background.
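A sketch of the disparity shift and re-fill described above, reduced to the horizontal left shift (the rotation/stretch about the central axis is omitted). `shift_foreground_left` and `fill_into_background` are illustrative names:

```python
import numpy as np

def shift_foreground_left(fg, shift):
    """Translate the extracted foreground `shift` columns to the left,
    leaving zeros (future holes) where pixels moved away from."""
    out = np.zeros_like(fg)
    if shift == 0:
        return fg.copy()
    out[:, :-shift] = fg[:, shift:]
    return out

def fill_into_background(shifted_fg, bg):
    """Paste the shifted foreground back over the background: wherever the
    foreground has content it wins, elsewhere the background shows through."""
    fg_present = shifted_fg.any(axis=2, keepdims=True)
    return np.where(fg_present, shifted_fg, bg)

fg = np.zeros((1, 4, 3), dtype=np.uint8)
fg[0, 2] = 200                                   # one foreground pixel at column 2
bg = np.full((1, 4, 3), 50, dtype=np.uint8)
out = fill_into_background(shift_foreground_left(fg, 1), bg)
print(out[0, :, 0].tolist())  # [50, 200, 50, 50] -- foreground moved to column 1
```

In a fuller implementation the shift would vary per pixel with its depth value, which is exactly what creates the holes that step S71 repairs.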
Step S71, performing image restoration on the image filled into the background of each frame to obtain the corresponding virtual right image.
In one embodiment of the present invention, the above image restoration of each frame's filled image to obtain the corresponding virtual right image may be implemented through the following steps:
drawing a mask according to the pixel positions of the image filled into the background of each frame;
filling the holes outside the mask to obtain the corresponding virtual right image.
Since the foreground is shifted left according to its depth, the generated image will contain holes after it is filled back into the background wherever the depth information changes greatly, and these blank areas must be repaired. Specifically, a mask is drawn according to the shifted pixel positions of the image, and the hole regions are then filled, yielding a virtual right image of the same format and size as the original image to be converted but with some layers translated. At present, for reasons of computing power, the mean-filling method or nearest-neighbor interpolation is usually adopted, so that after the overall image is smoothed and small "cracks" are erased, the virtual right image of the 3D right-eye virtual viewpoint is generated.
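The mean-filling strategy mentioned above can be sketched as follows; `fill_holes_mean` is an illustrative name, and a production system might prefer nearest-neighbor interpolation or a dedicated inpainting routine:

```python
import numpy as np

def fill_holes_mean(img, hole_mask):
    """Sketch of step S71 with mean filling: pixels flagged in hole_mask
    are replaced by the per-channel mean of all valid pixels."""
    out = img.astype(np.float64).copy()
    mean = out[~hole_mask].mean(axis=0)  # per-channel mean of valid pixels
    out[hole_mask] = mean
    return out.astype(np.uint8)

img = np.array([[[10] * 3, [0] * 3],
                [[30] * 3, [20] * 3]], dtype=np.uint8)
holes = np.array([[False, True], [False, False]])  # one hole at (0, 1)
fixed = fill_holes_mean(img, holes)
print(fixed[0, 1].tolist())  # [20, 20, 20] -- the hole took the mean value
```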
Step S81, performing image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image.
In one embodiment of the present invention, the above image processing of each frame's image to be converted together with the corresponding virtual right image to generate a three-dimensional image may be implemented through the following steps:
converting the image to be converted of each frame and the corresponding virtual right image into two corresponding arrays;
broadcasting the two arrays of each frame as matrices and merging the arrays;
converting the merged array of each frame into the corresponding three-dimensional image.
By first converting the two images into multidimensional arrays, the problem of stitching the two images becomes one of merging two multidimensional arrays; the merged multidimensional array is then converted back into an image, completing the stitching of the two images into one three-dimensional image. If one now examines the left and right views in the three-dimensional image, subtle differences between the two pictures can be found, and these subtle differences are essentially like the differences between what the left and right human eyes see. Therefore, stitching the original image side by side with the newly generated virtual view simulates the three-dimensional viewpoint the human eye normally perceives.
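The array merge in step S81 can be sketched directly: each view is already an H x W x 3 array, so stitching reduces to concatenating the two arrays along the width axis, producing the side-by-side frame a naked-eye 3D display expects:

```python
import numpy as np

def stitch_side_by_side(left, right):
    """Sketch of step S81: merge the left view (original image) and the
    virtual right view along the width axis into one side-by-side frame."""
    assert left.shape == right.shape
    return np.concatenate([left, right], axis=1)

left = np.zeros((2, 3, 3), dtype=np.uint8)        # original (left-eye) view
right = np.full((2, 3, 3), 255, dtype=np.uint8)   # virtual right-eye view
sbs = stitch_side_by_side(left, right)
print(sbs.shape)  # (2, 6, 3) -- same height, doubled width
```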
Step S91, merging the three-dimensional images of all frames to obtain three-dimensional video data.
Step S101, merging the separated audio data with the generated three-dimensional video data to obtain a three-dimensional video.
In summary, the video processing method in the above embodiment of the present invention first separates the original video to be converted into a naked-eye 3D video into audio data and video data, and then splits the video data frame by frame to obtain an image to be converted for each frame. A depth map is generated from the image to be converted based on depth estimation techniques, the depth map is dynamically converted into a mask, and the mask is used to extract all foreground pixels of the image to be converted, which are rotated, stretched and shifted left. Image restoration techniques then repair the holes produced by the movement, finally generating the virtual right image under the 3D right-eye virtual viewpoint. The image to be converted is stitched with the virtual right image to generate a three-dimensional image, the three-dimensional images of all frames are merged into three-dimensional video data, and the separated audio data is incorporated into the three-dimensional video data to produce the final three-dimensional video. Because this video processing method is carried out automatically by software, it solves the problems of high cost and low efficiency in existing 3D video production.
Embodiment Three
Please refer to FIG. 3, which is a schematic structural diagram of the video processing system provided by the third embodiment of the present invention. For ease of description, only the parts related to the embodiment of the present invention are shown. The system includes:
a separation module 11, configured to separate the audio signal and the video signal in the acquired video to be converted to obtain audio data and video data;
a splitting module 21, configured to split the video data frame by frame to obtain an image to be converted for each frame;
a first image processing module 31, configured to perform depth processing on the image to be converted of each frame, and to perform image processing on the image to be converted according to the result of the depth processing to generate a corresponding virtual right image, the virtual right image being a virtual-viewpoint image obtained from the image to be converted after depth conversion processing;
a second image processing module 41, configured to perform image processing on the image to be converted of each frame together with the corresponding virtual right image to generate a three-dimensional image, the three-dimensional image being an image in which the image to be converted and the virtual right image are located on the left and right sides respectively;
a first merging module 51, configured to merge the three-dimensional images of all frames to obtain three-dimensional video data;
a second merging module 61, configured to merge the separated audio data with the generated three-dimensional video data to obtain a three-dimensional video.
Further, in one embodiment of the present invention, the first image processing module 31 includes:
a depth map processing unit, configured to perform depth processing on the image to be converted of each frame to obtain the corresponding depth map;
a mask conversion unit, configured to dynamically convert the depth map of each frame into the corresponding mask;
an image segmentation unit, configured to segment the foreground and background from the corresponding image to be converted according to the mask of each frame;
an image transformation unit, configured to rotate, stretch and left-shift the foreground extracted from each frame, and to fill the transformed result into the corresponding background;
a virtual right image processing unit, configured to perform image restoration on the image filled into the background of each frame to obtain the corresponding virtual right image.
Further, in one embodiment of the present invention, the depth map processing unit is configured to:
convert the image to be converted of each frame into a matrix containing three-primary-color information, the three-primary-color information including the red, green and blue color channels;
input the matrix of three-primary-color information of each frame into the depth processing algorithm to obtain the corresponding depth map.
Further, in one embodiment of the present invention, the mask conversion unit is configured to:
increase the contrast ratio of the depth map of each frame;
convert the depth map of each frame into a grayscale image;
binarize the grayscale depth map of each frame to obtain the corresponding mask.
Further, in one embodiment of the present invention, the virtual right image processing unit is configured to:
draw a mask according to the pixel positions of the image filled into the background of each frame;
fill the holes outside the mask to obtain the corresponding virtual right image.
Further, in one embodiment of the present invention, the second image processing module 41 includes:
an array conversion unit, configured to convert the image to be converted of each frame and the corresponding virtual right image into two corresponding arrays;
an array merging unit, configured to broadcast the two arrays of each frame as matrices and merge the arrays;
an image conversion unit, configured to convert the merged array of each frame into the corresponding three-dimensional image.
Further, in one embodiment of the present invention, the system further includes:
a subtitle adding module, configured to add text subtitles to the image to be converted of each frame.
The implementation principles and technical effects of the video processing system provided by the embodiments of the present invention are the same as those of the foregoing method embodiments; for brevity, for parts not mentioned in the system embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Embodiment Four
Another aspect of the present invention further provides a device terminal. Please refer to FIG. 4, which shows the device terminal in the fourth embodiment of the present invention, including a memory 20, a processor 10, and a program 30 stored in the memory and runnable on the processor, wherein the processor 10, when executing the program 30, implements the video processing method described in Embodiment One or Embodiment Two above.
In some embodiments, the processor 10 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, used to run the program code stored in the memory 20 or to process data, for example to execute an access restriction program.
The memory 20 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 20 may be an internal storage unit of the device terminal, such as its hard disk. In other embodiments, the memory 20 may also be an external storage device of the device terminal, such as a Smart Media Card (SMC), Secure Digital (SD) card or flash card provided on the device terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the device terminal. The memory 20 may be used not only to store the application software installed on the device terminal and various kinds of data, but also to temporarily store data that has been output or is to be output.
In summary, the device terminal in the above embodiment of the present invention first separates the original video to be converted into a naked-eye 3D video into audio data and video data, and then splits the video data frame by frame to obtain an image to be converted for each frame. Visual processing techniques then convert each image to be converted into a virtual right image representing the 3D right-eye virtual viewpoint, and the image to be converted is stitched with the virtual right image to generate a three-dimensional image. The three-dimensional images of all frames are merged into three-dimensional video data, into which the separated audio data is incorporated to produce the final three-dimensional video. In this way, the conversion of the original images into naked-eye 3D images can be automated, so that the original video is automatically converted into a corresponding naked-eye 3D video. This greatly reduces the high labor cost of naked-eye 3D video production, improves production efficiency, facilitates the popularization of naked-eye 3D, and solves the problems of high cost and low efficiency in existing 3D video production.
An embodiment of the present invention further provides a readable storage medium on which a program is stored; when the program is executed by a processor, the steps of the video processing method described in the above embodiments are implemented. The readable storage medium may be, for example, a ROM/RAM, a magnetic disk or an optical disc.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division into the above functional units and modules is used as an example; in practical applications, the above functions may be allocated to different functional units or modules as required, that is, the internal structure of the storage device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from one another and are not intended to limit the protection scope of the present application.
Those skilled in the art will understand that the logic and/or steps shown in the flowchart or otherwise described herein may, for example, be considered an ordered list of executable instructions for implementing logical functions, and may be embodied in any readable storage medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "readable storage medium" may be any means that can contain, store, communicate, propagate or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus or device.
More specific examples (a non-exhaustive list) of readable storage media include the following: an electrical connection (electronic device) with one or more wires, a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber devices, and portable compact disc read-only memory (CD-ROM). In addition, the readable storage medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a memory.
It should be understood that parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one of the following techniques known in the art, or a combination thereof: a discrete logic circuit having logic gate circuits for implementing logical functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the video processing system of the present invention, which may include more or fewer components than shown, combine certain components, or arrange the components differently; likewise, the video processing method of FIGS. 1-2 may be implemented with more or fewer components than shown in FIG. 3, with certain components combined, or with a different arrangement of components. The units, modules and the like referred to in the present invention are series of computer programs that can be executed by a processor (not shown) in the video processing system and are capable of performing specific functions, all of which may be stored in a storage device (not shown) of the video processing system.
Those skilled in the art can also understand that the structure shown in FIG. 4 does not constitute a limitation on the device terminal of the present invention, which may include more or fewer components than shown, combine certain components, or arrange the components differently; likewise, the video processing method of FIGS. 1-2 may be implemented with more or fewer components than shown in FIG. 4, with certain components combined, or with a different arrangement of components.
In the description of this specification, the description of reference terms such as "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic statements of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of the present invention. It should be pointed out that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be determined by the appended claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211364949.0A CN115914606A (en) | 2022-11-03 | 2022-11-03 | Video processing method, system, readable storage medium and equipment terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115914606A true CN115914606A (en) | 2023-04-04 |
Family
ID=86478797
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211364949.0A Pending CN115914606A (en) | 2022-11-03 | 2022-11-03 | Video processing method, system, readable storage medium and equipment terminal |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115914606A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117849587A (en) * | 2024-01-09 | 2024-04-09 | 深圳今日芯科技有限公司 | Video chip testing method and system |
CN118337978A (en) * | 2024-04-24 | 2024-07-12 | 广东金鼎移动传媒有限公司 | Naked eye 3D imaging AI conversion method based on MOST optical fiber network |
CN118474323A (en) * | 2024-07-05 | 2024-08-09 | 淘宝(中国)软件有限公司 | Three-dimensional image, three-dimensional video, monocular view, training data set generation method, training data set generation device, storage medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |