CN102542593A - Interactive video stylized rendering method based on video interpretation
Abstract
The invention relates to an interactive video stylized rendering method based on video interpretation, which uses an interactive video semantic segmentation module and a video stylization module. The segmentation method of the interactive video semantic segmentation module comprises the following steps: (1) interactive segmentation and automatic identification of key frame images; (2) matching of dense feature points among key frames; and (3) region competition segmentation. The stylization method of the video stylization module comprises the following steps: (4) non-photorealistic rendering of the key frames based on semantic analysis; (5) a stroke propagation method for the sequence frames; and (6) a damping brush system for anti-shake. The interactive video stylized rendering method based on video interpretation disclosed by the invention has the advantages of a short production cycle and low cost, and is well suited to batch production.
Description
Technical Field
The invention discloses an interactive video stylized rendering method based on video interpretation, and belongs to the technical field of video stylized rendering driven by video interpretation.
Background
With the wide spread of computers, digital cameras and digital video cameras, people's demand for producing video entertainment keeps growing, and the field of home digital entertainment has expanded rapidly as a result. More and more people enthusiastically play the role of amateur "director" to produce and edit home-made videos of all kinds. In recent years, stylized videos have gradually been accepted and have become a popular element, particularly in animation and online game production. For example, the hand-painted oil-painting short film "The Old Man and the Sea" and ink-and-wash animation such as "Tadpoles Looking for Their Mother" have attracted wide attention, and the former also won a series of awards including the Academy Award for best animated short film. Video stylized rendering requires not only professional skill but also a large amount of manpower and financial support, and the traditional video stylization technique achieves stylized rendering by drawing frame by frame. Although the visual effect of every frame produced in this way can be controlled manually, the lack of inter-frame consistency causes strong jitter of the video picture during continuous playback; moreover, such methods have a long production cycle and high cost and are not suitable for batch production. For example, although the above-mentioned oil-painting short film "The Old Man and the Sea" lasts only 22 minutes, its production cycle lasted nearly 3 years.
Disclosure of Invention
The invention aims to provide an interactive video stylized rendering method based on video interpretation, which has the advantages of short manufacturing period, low cost and benefit for batch manufacturing in consideration of the problems.
The technical scheme of the invention is as follows: the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
The stylization of a video uses the two modules in turn: first the interactive video semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. The interactive segmentation and automatic identification method for key frame images in step 1) is as follows:
the segmented semantic regions are divided into twelve classes according to their material properties, namely sky/cloud, mountain/land, rock/building, leaves/bushes, hair/fur, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features of texture, color distribution and position information are adopted for training and recognition, a region image X is given, and the conditional probability of the category c is defined as follows:
log P(c | X, θ) = Σ_i [ Ψ_i(c_i, X; θ_Ψ) + π(c_i, X; θ_π) + λ(c_i, X; θ_λ) ] − log Z(θ, X)    (*)
the latter four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential function is defined as Ψ_i(c_i, X; θ_Ψ) = log P(c_i | X, i), where P(c_i | X, i) is the normalized distribution given by the Boost classifier;
the color potential function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i); a Gaussian mixture model (GMM) in the CIELab color space is used as the color model, in which the conditional probability of a pixel color x_i in the given image under the k-th component is a Gaussian with parameters μ_k and Σ_k, the mean and variance of the k-th color cluster;
the position potential function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i); the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel is related only to the absolute position in the image;
the above model is trained for the 12 material classes; then, using the formula above, the probability of every class is computed for each pixel in a given image region, all pixels in the region are counted, and the class of each region is determined by voting; in the stylized rendering process, the selection of the brush is determined by the material identified for the object region, which lays the foundation for automatic rendering.
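The classification step can be illustrated with a small sketch. The following is a minimal, hypothetical example (not the patent's implementation): the three potentials of formula (*) are combined per pixel in the log domain and the region label is obtained by majority vote; the class names follow the twelve material classes listed above, while the toy potentials are random stand-ins.

```python
# Minimal sketch: per-pixel log-probabilities from the three potentials of (*),
# followed by majority voting over the pixels of a region.
import numpy as np

CLASSES = ["sky/cloud", "mountain/land", "rock/building", "leaves/bushes",
           "hair/fur", "flower/fruit", "skin/leather", "trunk/branch",
           "abstract background", "wood/plastic", "water", "clothes"]

def pixel_class_scores(texture_logp, color_logp, position_logp):
    """Combine the three potentials of formula (*) for one pixel (log domain)."""
    return texture_logp + color_logp + position_logp

def label_region(pixel_scores):
    """pixel_scores: (n_pixels, n_classes) combined log-probabilities.
    Each pixel votes for its most likely class; the region takes the majority."""
    votes = np.argmax(pixel_scores, axis=1)
    winner = np.bincount(votes, minlength=len(CLASSES)).argmax()
    return CLASSES[winner]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(500, len(CLASSES)))   # toy potentials for 500 pixels
    scores[:, 10] += 1.5                            # biased towards "water"
    print(label_region(scores))                     # -> water
```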
The matching method of dense feature points among the key frames in the step 2) is as follows:
after the semantic information on the key frame is obtained, mixed image-template features that integrate line-drawing (sketch), texture and color cues are constructed, providing a rich feature set and representation for the image matching problem;
11) line-drawing (sketch) features are represented with Gabor bases as:
F_sk(I_i) = ||⟨I_i, G_cos,x,θ⟩||² + ||⟨I_i, G_sin,x,θ⟩||²,
where G_sin,x,θ and G_cos,x,θ denote the sine and cosine Gabor bases at position x with orientation θ; the feature probability distribution is parameterized by θ_i, where h_sk is a sigmoid function and a normalization constraint is imposed, so the model encourages corresponding edges stronger than the background distribution;
12) texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the j-th dimension is the j-th direction of the HOG, each feature I_i has a corresponding descriptor F_txt(I_i), and the mean of F_txt over all positive samples is used as a reference; the probabilistic model of the feature, with parameter θ_i, encourages responses to feature image blocks within a relatively large set;
13) color features are described by simple pixel brightness at position x; the pixel brightness values are quantized into statistical intervals (bins), so that the model can be simplified accordingly;
similar small image features are then combined to obtain feature combinations with strong local discriminative power: the image is first over-segmented into many tiny image blocks, statistical features describing sketch, texture and color are extracted from these small blocks, and an iterative region-growing and model-learning algorithm is applied, continuously updating the feature model and iteratively growing the feature-combination region, finally yielding feature combinations with strong local discriminative power;
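As an illustration of the per-patch descriptor just described, the following hedged sketch computes a Gabor (sketch) response, a 6-bin gradient-direction histogram and a quantized brightness histogram for one small grey-level patch; the bin counts, kernel sizes and synthetic patch are illustrative assumptions, not values from the patent.

```python
# Per-patch descriptor sketch: Gabor energy (sketch cue), 6-bin gradient-direction
# histogram (simplified HOG, texture cue) and quantised brightness histogram.
import numpy as np

def gabor_energy(patch, theta, freq=0.25):
    """||<I, G_cos>||^2 + ||<I, G_sin>||^2 for one orientation theta."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ys -= h / 2.0
    xs -= w / 2.0
    u = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs**2 + ys**2) / (2 * (0.3 * min(h, w)) ** 2))
    g_cos = envelope * np.cos(2 * np.pi * freq * u)
    g_sin = envelope * np.sin(2 * np.pi * freq * u)
    return np.sum(patch * g_cos) ** 2 + np.sum(patch * g_sin) ** 2

def hog6(patch):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                 # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=6, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def brightness_hist(patch, bins=8):
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / (hist.sum() + 1e-8)

def patch_descriptor(patch):
    sketch = [gabor_energy(patch, t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
    return np.concatenate([sketch, hog6(patch), brightness_hist(patch)])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patch = rng.random((16, 16))
    patch[:, 8:] += 0.5                              # a vertical edge
    print(patch_descriptor(np.clip(patch, 0, 1)).round(3))
```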
on the basis of this representation, the matching problem of moving objects over the time and space domains is modeled as a layered (hierarchical) graph matching framework: the extracted mixed image-template features serve as graph nodes, a graph structure is built between frames, and the edge connections between graph nodes are defined from the similarity and spatial positions of the features and the object class to which they belong;
the source image and the target image are denoted Is and It, and U, V denote the mixed-template feature sets in Is and It respectively; each feature point u ∈ U carries two labels: a layer label l(u) ∈ {1, 2, ..., K} and a candidate (matching) label; the vertex set of the graph structure is built from the candidate set C of best matches of each feature point in the source image, and the edge set is constructed as E = E⁺ ∪ E⁻; negative edges indicate that the connected candidates exclude each other, and their "repulsive force" is defined accordingly;
positive edges connect spatially adjacent, mutually compatible candidate feature points and indicate the degree of cooperation between them, with the distance term denoting the spatial distance between v_i and v_j;
the graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image; taking G_s as an example, the partition is written as Π = {g_0, g_1, ..., g_K}, where g_k is a sub-graph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the corresponding sub-graph of G_T is denoted V_k; the matching relation between G_s and G_T is then expressed as a set of sub-graph correspondences, and assuming that the matches between sub-graphs are mutually independent, the matching probability factorizes over the sub-graph pairs;
a similarity measure between a matched sub-graph pair (g_k, g_k') is defined through a geometric transformation and an appearance measure; in summary, the solution of the graph-structure matching problem is configured as:
W = (K, Π = {g_0, g_1, ..., g_K}, Ψ = {ψ_k}, Φ = {Φ_k})
under the Bayesian framework, the graph-structure matching problem is described as maximizing the posterior probability:
W* = arg max p(W | G_s, G_T) = arg max p(W) p(G_s, G_T | W)
the above formula is solved with a Markov Chain Monte Carlo (MCMC) method; at the same time, for computational efficiency, the sampler converges quickly to the global optimum through efficient jumps in the solution space, thereby achieving the matching of inter-frame feature points.
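To make the candidate-graph idea concrete, here is a simplified, hypothetical sketch: each source feature keeps its top candidate matches in the target frame, and a match configuration is scored by appearance similarity plus pairwise spatial consistency, a stand-in for the posterior p(W)p(G_s, G_T | W); the patent optimizes such a score with MCMC, while this toy example simply enumerates the candidate configurations.

```python
# Candidate-graph sketch: top-k candidates per source feature, and a score that
# combines unary appearance similarity with pairwise spatial consistency.
import numpy as np
from itertools import product

def candidates(src_desc, tgt_desc, k=2):
    """Indices of the k most similar target features for every source feature."""
    d = np.linalg.norm(src_desc[:, None, :] - tgt_desc[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k], d

def score(assign, d, src_xy, tgt_xy):
    """Unary appearance term + pairwise spatial-consistency term."""
    unary = -sum(d[i, j] for i, j in enumerate(assign))
    pair = 0.0
    for (i, a), (j, b) in product(list(enumerate(assign)), repeat=2):
        if i < j:
            pair -= abs(np.linalg.norm(src_xy[i] - src_xy[j]) -
                        np.linalg.norm(tgt_xy[a] - tgt_xy[b]))
    return unary + 0.1 * pair

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    src_xy = rng.random((4, 2)); src_desc = rng.random((4, 8))
    tgt_xy = src_xy + 0.02 * rng.random((4, 2))            # slightly shifted copy
    tgt_desc = src_desc + 0.01 * rng.random((4, 8))
    cand, d = candidates(src_desc, tgt_desc)
    best = max(product(*cand), key=lambda a: score(a, d, src_xy, tgt_xy))
    print("best assignment:", [int(i) for i in best])      # expected: [0, 1, 2, 3]
```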
The region competition segmentation method of step 3) is as follows:
on the basis of the stable matching relation obtained between frames, the advantages of the region competition mechanism are exploited for video segmentation: with the hierarchical-graph image matching algorithm, the matching relation between the features of the previous frame and those of the current frame is determined, so that the semantic information of the previous frame is propagated to the current frame; then, according to the feature information of each matched region, the current frame is segmented into several semantic regions by the region competition segmentation algorithm;
given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}
where R_i denotes a segmented region with homogeneous features, θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i;
the number N of segmented regions is determined from the matching relation of the features between the previous and the current frame; let S = {S_1, S_2, ..., S_N} be the set of small feature regions corresponding to the regions; for each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)); following the MDL principle, maximizing the posterior is converted into minimizing an energy function, yielding:
where Γ_i denotes the boundary contour of region R_i; the parameters θ_i and the contours Γ are estimated in stages in an iterative manner, alternating between two stages, with the energy function decreasing in every stage, so that the final segmentation result of the whole image is continuously learned and inferred;
during region competition, each region continuously updates its feature probability distribution model while competing for the ownership of pixels according to the steepest-descent principle and updating its own boundary contour, so that every region keeps extending its range until the image segmentation result of the current frame is obtained;
the specific iteration steps are as follows: in the first stage, Γ is fixed and {θ_i} is estimated from the current region segmentation state; the parameter θ_i obtained under the current state is taken as its optimal solution so as to minimize the cost of describing each region, and the energy function therefore reduces to:
in the second stage, {θ_i} is known and steepest descent is performed on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions, so that every point on a boundary contour Γ moves along its steepest-descent direction;
here τ_k is the direction vector at the point, and the region to which a point belongs depends on how well the point is described by that region's feature probability distribution model;
to determine the membership between each pixel point and the region, the competition-based image segmentation algorithm process is described as follows:
in the initialization stage, estimating initial parameters of various models according to the matched characteristic image blocks, adding boundary points of all the characteristic image blocks into a queue to be determined, and calculating posterior probabilities of all the boundary points belonging to various types;
in a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; recalculating the posterior probabilities of all the boundary points belonging to various types by using the newly obtained feature distribution models of the regions;
in this way, the boundary point with the fastest current energy decrease is repeatedly selected from the pending queue to update the corresponding boundary, while the feature distribution probability model of each region is updated in time according to the current segmentation state; the regions constrain one another and compete simultaneously for the ownership of image regions until the energy function converges, whereby the image is segmented into regions.
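A toy numerical illustration of this two-stage loop, under simplifying assumptions (a 1-D "image", Gaussian region models, and label flips at boundary pixels standing in for the contour motion; none of these choices come from the patent):

```python
# Two-stage region competition on a 1-D signal: stage one re-estimates each
# region's Gaussian model (maximum likelihood), stage two lets regions compete
# for boundary pixels, until the labels stop changing.
import numpy as np

def neg_log_gauss(x, mu, sigma):
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

def region_competition(signal, labels, iters=50):
    labels = labels.copy()
    for _ in range(iters):
        # stage 1: ML estimate of each region model under the current segmentation
        params = {}
        for r in np.unique(labels):
            vals = signal[labels == r]
            params[r] = (vals.mean(), vals.std() + 1e-3)
        # stage 2: boundary pixels switch to the region that describes them best
        changed = False
        for i in range(1, len(signal) - 1):
            if labels[i - 1] != labels[i + 1]:               # a boundary pixel
                costs = {r: neg_log_gauss(signal[i], *params[r]) for r in params}
                best = min(costs, key=costs.get)
                if best != labels[i]:
                    labels[i], changed = best, True
        if not changed:
            break
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    signal = np.concatenate([rng.normal(0.2, 0.05, 40), rng.normal(0.8, 0.05, 60)])
    init = np.array([0] * 50 + [1] * 50)                     # boundary misplaced at 50
    print(np.argmax(np.diff(region_competition(signal, init))))  # ~39: recovered edge
```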
In step 4) of the stylization method of the video stylization module, video stylization is based on the interactive video semantic segmentation module, and the selection of the brush is determined only by the material corresponding to the identified object region;
the brushes are built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized to build a stroke library; when rendering each image region, a large brush is first used for the base layer, after which the brush size and opacity are gradually reduced to depict the detailed parts of the object finely; during drawing, an edge-first, interior-second strategy is adopted: drawing of every image layer starts from the edges, first painting along the sketch (line-drawing) edges and aligning the brush with the flow field;
in the video rendering, in order to ensure the continuity and stability of the brush in the time domain, a thin-plate spline interpolation technology is adopted to carry out the propagation of the brush strokes, and in addition, in the propagation process of the brush strokes, the deletion and addition mechanisms of the brush strokes are designed by calculating the area of the brush stroke areas; and the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
The key-frame non-photorealistic rendering method based on semantic analysis in step 4) of the stylization method of the video stylization module is as follows:
how to design stroke models with different artistic styles is one of the focuses of video stylization; works with different forms of artistic expression have their own characteristics in stroke expression, and the basic drawing strategy in video stylization is to select suitable strokes based on the image content; the stroke library is built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized; a brush B_n to be drawn contains the following information: the class label l_n of the brush, the placement area Λ_n, the color map C_n, the transparency field α_n, the height field H_n and the control points {P_ni}, i.e.:
B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}}
when the stroke model is designed, not only low-level information such as the shape and texture of the stroke is considered, but also high-level semantic information is integrated, so that every interpretation region of the image/video has a matching "pen" to rely on during rendering; when strokes are selected, the interpretation-region class is used as a keyword to quickly retrieve from the stroke library the batch of strokes with the same class, and one stroke is then selected from them at random;
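The brush record and the class-keyed selection can be sketched as follows; the field types and the tiny in-memory stroke library are illustrative assumptions rather than the patent's data format.

```python
# Sketch of the brush record B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}} and of
# selecting a random stroke whose class matches the region's material label.
import random
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np

@dataclass
class Brush:
    class_label: str                 # l_n: material class the stroke belongs to
    area: np.ndarray                 # Λ_n: boolean mask of the placement area
    color_map: np.ndarray            # C_n: RGB texture of the scanned stroke
    alpha: np.ndarray                # α_n: transparency field
    height: np.ndarray               # H_n: height field
    control_points: List[Tuple[float, float]] = field(default_factory=list)  # {P_ni}

def pick_brush(library: List[Brush], region_class: str) -> Brush:
    """Filter the library by the interpretation-region class, then choose at random."""
    candidates = [b for b in library if b.class_label == region_class]
    return random.choice(candidates)

if __name__ == "__main__":
    blank = lambda: np.zeros((8, 32))
    lib = [Brush("water", blank() > 0, np.zeros((8, 32, 3)), blank(), blank(), [(0, 4), (31, 4)]),
           Brush("sky/cloud", blank() > 0, np.zeros((8, 32, 3)), blank(), blank(), [(0, 4), (31, 4)])]
    print(pick_brush(lib, "water").class_label)    # -> water
```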
to simulate the "alignment" principle of oil painting, the primal sketch theory is drawn upon: within each region R_i its primal sketch representation SK_i is computed; the sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing; during rendering, different brushes are laid over these primitives to produce the desired artistic effect; the interpretation region R_i, with image support Λ_i, is divided into a sketchable (line-drawing) part and a non-sketchable part with homogeneous structure, and the direction field θ_i of R_i is defined such that
its initial value is given by the gradient direction of the line drawing, after which a diffusion equation is used to propagate the direction into the non-line-drawing part;
The rendering process of a key frame is a process of repeatedly selecting and placing strokes; taking an interpretation region R_i as an example, its non-line-drawing part is rendered first and its line-drawing part afterwards, which guarantees that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer; in the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its centre, is diffused to both sides along the direction field to generate a flow-shaped area; taking the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area; rendering of the line-drawing part of the region is similar.
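A minimal sketch of the stroke-placement step just described: a streamline is traced from a seed pixel in both directions along the direction field and serves as the central axis to which the selected brush would be warped; the step size, length limit and constant direction field are illustrative assumptions.

```python
# Trace one stroke axis by following the direction field θ(row, col) from a seed.
import numpy as np

def trace_streamline(theta, seed, step=1.0, max_len=20):
    """theta: (H, W) array of orientations; seed: (row, col). Returns axis points."""
    h, w = theta.shape
    pts = [np.array(seed, dtype=float)]
    for direction in (+1, -1):                       # grow towards both sides
        p = np.array(seed, dtype=float)
        for _ in range(max_len):
            r, c = int(round(p[0])), int(round(p[1]))
            if not (0 <= r < h and 0 <= c < w):
                break
            ang = theta[r, c]
            p = p + direction * step * np.array([np.sin(ang), np.cos(ang)])
            pts.append(p.copy())
    return np.array(pts)

if __name__ == "__main__":
    theta = np.full((64, 64), np.deg2rad(30.0))      # a constant 30-degree field
    axis = trace_streamline(theta, (32, 32))
    print(axis.shape, axis[0], axis[-1])
```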
In step 5) of the stylization method of the video stylization module, the stroke propagation method for sequence frames is as follows:
the rendering of non-key frames is obtained by "propagating" the rendering result of the key frame, and the basis of the propagation is the spatio-temporal correspondence of the interpretation regions; during propagation, as the interpretation region changes more and more, strokes may gradually leak outside the region while unrendered gaps appear inside it, so the stroke addition and deletion mechanisms must be considered at the same time when propagating the stroke map, otherwise the rendering result will jitter; the propagation, addition and deletion mechanisms of strokes are as follows:
(a) Stroke propagation (a sketch follows this list): let R_i(t) denote an interpretation region of the key frame at time t of the video and R_i(t+1) the region corresponding to R_i(t) at time t+1; their image supports are denoted Λ_i(t) and Λ_i(t+1) respectively, and P_ij(t), P_ij(t+1) denote the dense matching points of Λ_i(t) and Λ_i(t+1) in the time domain (computed during video interpretation); R_i(t+1) can be represented by a non-rigid transformation of R_i(t), and when strokes are propagated it is desired that the matching points P_ij(t) on Λ_i(t) be mapped onto the matching points P_ij(t+1) of the new image region Λ_i(t+1) in frame t+1; based on this consideration a thin-plate spline (TPS) interpolation model is chosen: it maps the key points P_ij(t) of Λ_i(t) onto the matching points P_ij(t+1) of Λ_i(t+1) and, for the remaining pixels of Λ_i(t), the TPS minimizes an energy function so that the pixel grid of Λ_i(t) is warped by an elastic (non-rigid) deformation.
(b) Deleting brush strokes: because the region corresponding to some brushes becomes smaller and smaller after the brushes are propagated in the video or in an occlusion relationship or when the number of frames of stroke propagation is too large, the invention eliminates the brushes when the area of the region corresponding to the brushes is smaller than a given threshold. Similarly, a propagated brush is also deleted when it falls outside the corresponding zone boundary.
(c) Stroke addition: when a new semantic region appears or an existing semantic region grows larger (for example when clothing unfolds), new brushes must be added to cover the new area, and the size and position of neighbouring brushes are simply adjusted to fill small gaps between brushes; if the area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it. Nevertheless, a stroke is not drawn on a gap immediately upon its first occurrence: a relatively high threshold is set and the rendering of newly appearing regions is delayed until they have grown large enough. The general brush-placement algorithm is then used to fill gaps that have reached the threshold, and finally these new brushes are propagated and transformed backwards to fill the gap regions that appeared earlier but were left unrendered. This backward filling avoids frequent brush changes and links smaller, fragmented brushes to larger ones, thereby reducing flickering and other undesirable visual artefacts; moreover, since the new brushes are added at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
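A hedged sketch of the stroke-propagation step in (a): the dense matching points of Λ_i(t) and Λ_i(t+1) define a thin-plate-spline warp, which then carries a stroke's control points from frame t to frame t+1. scipy's RBFInterpolator with the thin-plate-spline kernel is used here as a stand-in for the TPS model named in the text, and the point sets are synthetic.

```python
# TPS-based propagation of stroke control points between two frames.
import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_stroke(control_pts, match_src, match_dst):
    """Warp stroke control points with a TPS fitted on matched region points."""
    tps = RBFInterpolator(match_src, match_dst, kernel="thin_plate_spline")
    return tps(control_pts)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    match_src = rng.uniform(0, 100, size=(30, 2))                  # P_ij(t)
    shift = np.array([3.0, -2.0])
    match_dst = match_src + shift + rng.normal(0, 0.1, (30, 2))    # P_ij(t+1)
    stroke = np.array([[10.0, 10.0], [20.0, 12.0], [30.0, 15.0]])
    print(propagate_stroke(stroke, match_src, match_dst).round(1)) # ≈ stroke + (3, -2)
```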
The damping brush system for anti-shake in the step 6) of the stylization method of the video stylization module is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
for the i-th brush at time t, A_i,t = (x_i,t, y_i,t, s_i,t) denotes its geometric attributes, namely its center coordinates and size, and its initial value is denoted A⁰_i,t; the energy function of the damped brush system is defined as follows:
E = E_data + λ_1 E_smooth1 + λ_2 E_smooth2
where λ_1 and λ_2 are weights, set to λ_1 = 2.8 and λ_2 = 1.1;
The first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
the third term in the formula smoothly constrains adjacent brushes in both the time domain and the space domain; for any brush adjacent to the i-th brush at time t, say the j-th, the relative position and size difference between them is written as ΔA_i,j,t = A_i,t − A_j,t, and the smoothing term is defined as follows:
The energy minimization problem is solved by the Levenberg-Marquardt algorithm, with λ_1 = 2.8 and λ_2 = 1.1 as above.
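A hedged numerical sketch of this anti-shake step: the exact residual forms below are assumptions (the text only names the data term and the two smoothing terms), the weights λ_1 = 2.8 and λ_2 = 1.1 are taken from the text, and Levenberg-Marquardt is applied through scipy.optimize.least_squares(method="lm").

```python
# Damped-brush energy sketch: data term keeps each brush near its initial
# attributes, one smoothing term penalises temporal change of a brush, the other
# penalises change of the difference between neighbouring brushes.
import numpy as np
from scipy.optimize import least_squares

N_BRUSH, N_FRAME = 3, 5
L1, L2 = 2.8, 1.1

def residuals(flat, A0):
    A = flat.reshape(N_BRUSH, N_FRAME, 3)
    res = [(A - A0).ravel()]                                 # E_data
    res.append(np.sqrt(L1) * np.diff(A, axis=1).ravel())     # E_smooth1 (temporal)
    for i in range(N_BRUSH):                                 # E_smooth2 (neighbours)
        for j in range(i + 1, N_BRUSH):
            res.append(np.sqrt(L2) * np.diff(A[i] - A[j], axis=0).ravel())
    return np.concatenate(res)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    base = rng.uniform(0, 100, (N_BRUSH, 1, 3))
    A0 = base + rng.normal(0, 2.0, (N_BRUSH, N_FRAME, 3))    # jittered brush track
    fit = least_squares(residuals, A0.ravel(), args=(A0,), method="lm")
    smoothed = fit.x.reshape(N_BRUSH, N_FRAME, 3)
    print("jitter before:", np.abs(np.diff(A0, axis=1)).mean().round(2),
          "after:", np.abs(np.diff(smoothed, axis=1)).mean().round(2))
```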
By studying video segmentation and recognition and the establishment of spatio-temporal correspondences, the invention explores a semantics-driven video stylized rendering technique that achieves the expressive effect required by art. Starting from semantic analysis of the input video, an interactive, key-frame-based mode is adopted, which provides sufficient prior information for video segmentation while minimizing the user's burden; the interactive information on the key frames is then propagated to subsequent frames by establishing feature-point correspondences between frames and applying the region competition algorithm, so that the user's semantic information fully guides accurate video segmentation. Different stroke libraries are created for different styles. During rendering, the key frame is rendered according to the semantic information, and the key-frame strokes are then transferred to the sequence frames by spatial transformation, with the spatio-temporal relation of the semantic regions as a constraint, so that the "jitter" effect of the rendering result is effectively suppressed. In addition, the invention provides a system scheme convenient for interactive creation by the user, which improves the applicability of the project. The invention can be widely applied in industries such as advertising, education and entertainment, and has an important application background.
Detailed Description
Embodiment:
the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
The stylization of a video uses the two modules in turn: first the interactive video semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. Step 1) of the interactive video semantic segmentation module, the interactive segmentation and automatic identification method for key frame images, is as follows:
in the invention, the mature recognition technology TextonBoost and the interactive segmentation method GraphCut are integrated, and interactive semantic segmentation and recognition are carried out on the key frame image, so that the object region and the mutual layering and shielding relation in the image are obtained. The system of the invention classifies the segmented semantic regions into twelve categories according to different material properties of the semantic regions, including sky, water, land, rock, hair, skin, clothes and the like, as shown in table 1.
Table 1: The 12 material classes of semantic regions

Mountain/land | Water | Rock/building | Leaves/bushes
Skin/leather | Hair/fur | Flower/fruit | Sky/cloud
Clothes | Trunk/branch | Abstract background | Wood/plastic
In actual operation, the method adopts three main characteristics of texture, color distribution and position information for training and recognition. Given a region image X, the conditional probability of defining its class c is:
the last four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential function is defined as Ψ_i(c_i, X; θ_Ψ) = log P(c_i | X, i), where P(c_i | X, i) is the normalized distribution given by the Boost classifier.
The color potential function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i); here the invention uses a Gaussian mixture model (GMM) in the CIELab color space as the color model, in which the conditional probability of a pixel color x_i in the given image under the k-th component is a Gaussian with parameters μ_k and Σ_k, the mean and variance of the k-th color cluster.
The position potential function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i); the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel is related only to the absolute position in the image.
The method is used for training 12 types of materials, then the probability of each pixel in a given image region for each category is calculated by adopting the formula, all pixels in the region are counted, and the category of each region is determined by adopting a voting mode. In the stylized rendering process, the selection of the paintbrush is determined by the material identified by the object area, and a foundation is laid for realizing automatic rendering.
2) Matching of dense feature points between key frames
After obtaining the semantic information on the key frames, the present invention needs to explore a matching algorithm between frames to effectively propagate the semantic information to the sequence frames.
The invention first proposes mixed image-template features integrating line-drawing (sketch), texture and color, which provide a rich feature set and representation for the image matching problem.
(a) Line-drawing (sketch) features are represented with Gabor bases as: F_sk(I_i) = ||⟨I_i, G_cos,x,θ⟩||² + ||⟨I_i, G_sin,x,θ⟩||², where G_sin,x,θ and G_cos,x,θ denote the sine and cosine Gabor bases at position x with orientation θ. The feature probability distribution is parameterized by θ_i, where h_sk is a sigmoid function and a normalization constraint is imposed.
So the model will encourage a stronger corresponding edge than the background distribution.
(b) Texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the j-th dimension is the j-th direction of the HOG, each feature I_i has a corresponding descriptor F_txt(I_i), and the mean of F_txt over all positive samples is used as a reference. The probabilistic model of the feature, with parameter θ_i, encourages responses to feature image blocks within a relatively large set.
(c) Color features are described by simple pixel brightness at position x. The pixel brightness values are quantized into statistical intervals (bins), so that the model can be simplified accordingly.
according to the invention, by combining the similar small features of the images, a feature combination with strong discrimination can be obtained locally. Firstly, the image is segmented to obtain a plurality of tiny image blocks in the image. And extracting statistical characteristics capable of describing line drawing, texture and color from the small image block. In order to effectively obtain the feature combination, an iterative region growing and model learning algorithm is adopted, the feature combination region is iteratively grown by continuously updating the feature model, and finally the feature combination with strong local discrimination is obtained.
Based on the expression, the invention models the matching problem of the moving object in the time domain and the space domain as a layered graph matching framework on a graph representation. The extracted mixed image template features serve as graph nodes, graph structures are built among frames, and edge connection relations among the graph nodes can be defined based on similarity and spatial positions among the features and object types to which the features belong.
The source image and the target image are denoted Is and It, and U, V denote the mixed-template feature sets in Is and It respectively. Each feature point u ∈ U carries two labels: a layer label l(u) ∈ {1, 2, ..., K} and a candidate (matching) label. The vertex set of the graph structure is built from the candidate set C of best matches of each feature point in the source image, and the edge set is constructed as E = E⁺ ∪ E⁻. Negative edges indicate that the connected candidates exclude each other, and their "repulsive force" is defined accordingly.
Positive edges connect spatially adjacent, mutually compatible candidate feature points and indicate the degree of cooperation between them, with the distance term denoting the spatial distance between v_i and v_j.
The graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image. Taking G_s as an example, the partition is written as Π = {g_0, g_1, ..., g_K}, where g_k is a sub-graph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the corresponding sub-graph of G_T is denoted V_k. The matching relation between G_s and G_T is then expressed as a set of sub-graph correspondences, and assuming that the matches between sub-graphs are mutually independent, the matching probability factorizes over the sub-graph pairs.
in the invention, matching sub-graph pairs (g) are defined by geometric transformation and appearance measurek,gk') measure of similarity betweenAnd (4) showing. In summary, the solution to the graph structure matching problem can be configured as:
W=(K,∏={g0,g1,...,gk},Ψ={Φk},Φ={Φk})
Under the Bayesian framework, the invention describes the graph-structure matching problem as maximizing the posterior probability:
W* = arg max p(W | G_s, G_T) = arg max p(W) p(G_s, G_T | W)
the present invention may solve the above equation by a Markov Chain Monte Carlo (MCMC) method. Meanwhile, for efficient calculation, the method explores a cluster sampling strategy, and quickly converges to a global optimal solution through efficient jumping in a solution space so as to achieve matching of inter-frame feature points.
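A toy sketch of this stochastic search (hypothetical and much simplified: a single-site Metropolis proposal replaces the cluster-sampling moves mentioned above, and the score function is a stand-in for the posterior):

```python
# Metropolis-style search over candidate matchings.
import numpy as np

def metropolis_match(score_fn, candidates, steps=2000, temp=0.1, seed=0):
    rng = np.random.default_rng(seed)
    assign = [int(rng.choice(c)) for c in candidates]      # random initial matching
    best, best_s = list(assign), score_fn(assign)
    cur_s = best_s
    for _ in range(steps):
        i = rng.integers(len(assign))
        prop = list(assign)
        prop[i] = int(rng.choice(candidates[i]))           # single-site proposal
        s = score_fn(prop)
        # accept improvements, and worse moves with a Boltzmann probability
        if s > cur_s or rng.random() < np.exp((s - cur_s) / temp):
            assign, cur_s = prop, s
            if s > best_s:
                best, best_s = list(prop), s
    return best, best_s

if __name__ == "__main__":
    target = [0, 3, 1, 2]                                  # "ground-truth" matching
    cands = [[0, 1], [3, 0], [1, 2], [2, 3]]
    score = lambda a: -sum(x != y for x, y in zip(a, target))
    print(metropolis_match(score, cands))                  # expected: ([0, 3, 1, 2], 0)
```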
3) Region competition segmentation algorithm
On the basis of obtaining the inter-frame stable matching relation, the invention provides an inter-frame matching-based regional competition propagation algorithm by mining the advantages of a regional competition mechanism in video segmentation. By using the image matching algorithm of the layered graph structure, the invention can determine the matching relationship between the characteristics of the previous frame and the current frame, the semantic information of the previous frame is transmitted to the current frame, and then the current frame is divided into a plurality of semantic areas by using the area competition division algorithm according to the characteristic information of each matching area.
Given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}
where R_i denotes a segmented region with homogeneous features, θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i.
The number N of segmented regions is determined from the matching relation of the features between the previous and the current frame. Let S = {S_1, S_2, ..., S_N} be the set of small feature regions corresponding to the regions. For each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)). Following the MDL principle, maximizing the posterior is converted into minimizing an energy function, yielding:
where Γ_i denotes the boundary contour of region R_i. The invention estimates the parameters θ_i and the contours Γ in stages in an iterative manner, alternating between two stages, with the energy function decreasing in every stage, so that the final segmentation result of the whole image is continuously learned and inferred.
In the process of regional competition, each region continuously updates the characteristic probability distribution model of the region, simultaneously contends for ownership of pixel points according to the steepest descent principle, and updates respective boundary contour, so that each region continuously expands the range, and finally the image segmentation result of the current frame is obtained.
The specific iteration steps are as follows: in the first stage, Γ is fixed and {θ_i} is estimated from the current region segmentation state; the parameter θ_i obtained under the current state is taken as its optimal solution so as to minimize the cost of describing each region, and the energy function therefore reduces to:
In the second stage, {θ_i} is known and steepest descent is performed on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions, so that every point on a boundary contour Γ moves along its steepest-descent direction.
Here τ_k is the direction vector at the point, and the region to which a point belongs depends on how well the point is described by that region's feature probability distribution model.
In order to determine the membership between each pixel point and each region, the invention provides an image segmentation algorithm based on a competition mechanism to rapidly complete image segmentation. The specific image segmentation algorithm process based on the competition mechanism is described as follows:
in the initialization stage, the initial parameters of various models are estimated according to the matched characteristic image blocks, the boundary points of all the characteristic image blocks are added into a queue to be determined, and the posterior probability that all the boundary points belong to various types is calculated.
In a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; and recalculating the posterior probabilities of all the boundary points belonging to the various types by using the newly obtained feature distribution models of the regions.
Therefore, the boundary point with the fastest descending current energy is continuously selected from the queue to be determined to update the corresponding boundary, meanwhile, the characteristic distribution probability model of each region is updated timely according to the current region segmentation state, the regions are mutually restricted, and the ownership of the image region is simultaneously competed until the energy function converges, so that the image is segmented into the regions.
2. Video stylization module
Video stylization is based on an interactive video semantic segmentation module. The selection of the brush is determined only by the material corresponding to the identified object region. The brush of the system of the invention is based on a professional painter to draw a large number of typical strokes on paper, then scanning and parameterizing are carried out, and finally a stroke library is established. For each image region rendering, a large brush is first used for priming, and then the brush size and opacity are gradually reduced to fine-delineate detailed portions of the object. During drawing, adopting a drawing strategy of firstly drawing the edge and then drawing the inside: drawing of each layer of image the invention starts with the edge first, draws along the line-drawn edge first, and aligns the brush according to the flow field. In video rendering, in order to ensure the continuity and stability of the brush in the time domain, the invention adopts the thin-plate spline interpolation technology to carry out the propagation of the brush strokes. In addition, in the process of spreading the pen strokes, the area of the pen stroke area is calculated, and a pen stroke deleting and adding mechanism is designed. And the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
(1) Key frame non-photorealistic drawing technology based on semantic analysis
How to design stroke models with different artistic styles is one of the focuses of video stylization. Works with different forms of artistic expression have their own characteristics in stroke expression. In video stylization the basic drawing strategy of the invention is to select suitable strokes for drawing based on the image content, and the stroke library is built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized. A brush B_n to be drawn contains the following information: the class label l_n of the brush, the placement area Λ_n, the color map C_n, the transparency field α_n, the height field H_n and the control points {P_ni}, i.e.:
B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}}
When designing the stroke model, the invention considers not only low-level information such as the shape and texture of the stroke but also integrates high-level semantic information, so that every interpretation region of the image/video has a matching "pen" to rely on during rendering; this is one of the keys that distinguishes the rendering algorithm of the invention from traditional stroke-based rendering algorithms. Therefore, when strokes are selected, the interpretation-region class is used as a keyword to quickly retrieve from the stroke library the batch of strokes with the same class, and one stroke is then selected from them at random.
In order to simulate the "alignment" principle of oil painting, the invention draws on the primal sketch theory: within each region R_i the primal sketch representation SK_i is computed. The sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing. During rendering, different brushes are laid over these primitives to produce the desired artistic effect. The interpretation region R_i, with image support Λ_i, is divided into a sketchable (line-drawing) part and a non-sketchable part with homogeneous structure, and the direction field θ_i of R_i is defined such that
its initial value is given by the gradient direction of the line drawing, after which a diffusion equation is used to propagate the direction into the non-line-drawing part.
The process of rendering a key frame is a process of repeatedly selecting and placing strokes. Taking an interpretation region R_i as an example, the invention first renders its non-line-drawing part and then its line-drawing part, which guarantees that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer. In the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its centre, is diffused to both sides along the direction field to generate a flow-shaped area. Taking the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area. Rendering of the line-drawing part of the region is similar.
(2) Stroke propagation algorithm of sequence frame
In the invention, the rendering of the non-key frame is obtained by the 'propagation' of the rendering result of the key frame. The propagation basis is the spatio-temporal correspondence of the interpretation zones. In the propagation process, as the interpretation zone changes more and more, the brush strokes may gradually leak outside the zone while gaps in the zone appear as being rendered. Therefore, in propagating the stroke graph, the adding and deleting mechanisms of the strokes must be considered at the same time. Otherwise, the rendering result will have a jitter phenomenon. The following describes the mechanism of propagation, addition and deletion of strokes, respectively.
(a) Stroke propagation: let R_i(t) denote an interpretation region of the key frame at time t of the video and R_i(t+1) the region corresponding to R_i(t) at time t+1. Their image supports are denoted Λ_i(t) and Λ_i(t+1) respectively, and P_ij(t), P_ij(t+1) denote the dense matching points of Λ_i(t) and Λ_i(t+1) in the time domain (computed during video interpretation). R_i(t+1) can be represented by a non-rigid transformation of R_i(t), and when strokes are propagated the invention expects the matching points P_ij(t) on Λ_i(t) to be mapped onto the matching points P_ij(t+1) of the new image region Λ_i(t+1) in frame t+1. Based on this consideration the invention selects a thin-plate spline (TPS) interpolation model: it maps the key points P_ij(t) of Λ_i(t) onto the matching points P_ij(t+1) of Λ_i(t+1) and, for the remaining pixels of Λ_i(t), the TPS minimizes an energy function so that the pixel grid of Λ_i(t) is warped by an elastic (non-rigid) deformation.
(b) Stroke deletion: because the region corresponding to some brushes becomes smaller and smaller after the brushes are propagated through the video, for example under occlusion or when the number of frames over which a stroke has been propagated is too large, the invention eliminates a brush when the area of its corresponding region is smaller than a given threshold. Similarly, a propagated brush is also deleted when it falls outside the corresponding region boundary.
(c) Stroke addition: when a new semantic region appears or an existing semantic region grows larger (for example when clothing unfolds), the invention must add new brushes to cover the new area, and the size and position of neighbouring brushes are simply adjusted to fill small gaps between brushes; if the area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it. Nevertheless, a stroke is not drawn on a gap immediately upon its first occurrence: the invention sets a relatively high threshold and delays rendering newly appearing regions until they have grown large enough. The general brush-placement algorithm is then used to fill gaps that have reached the threshold, and finally these new brushes are propagated and transformed backwards to fill the gap regions that appeared earlier but were left unrendered. This backward filling avoids frequent brush changes and links smaller, fragmented brushes to larger ones, thereby reducing flickering and other undesirable visual artefacts; moreover, since the invention adds the new brushes at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
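A small, hypothetical sketch of the bookkeeping in (b) and (c): a stroke is dropped when its supporting region shrinks below a deletion threshold or leaves the region, and a new bottom-layer stroke is queued only once an uncovered gap has grown past a deliberately higher creation threshold; the threshold values are illustrative, not values from the patent.

```python
# Stroke deletion / addition bookkeeping during propagation.
DELETE_AREA = 50     # px²: remove strokes whose supporting region shrank below this
CREATE_AREA = 400    # px²: only fill gaps once they are comfortably large

def update_strokes(strokes, gaps):
    """strokes: dicts with 'area' and 'inside_region'; gaps: list of gap areas."""
    kept = [s for s in strokes
            if s["area"] >= DELETE_AREA and s["inside_region"]]
    new = [{"area": g, "inside_region": True, "layer": "bottom"}  # drawn under existing strokes
           for g in gaps if g >= CREATE_AREA]
    return kept, new

if __name__ == "__main__":
    strokes = [{"area": 300, "inside_region": True},
               {"area": 20,  "inside_region": True},    # shrank too much -> deleted
               {"area": 200, "inside_region": False}]   # drifted out of the region -> deleted
    print(update_strokes(strokes, gaps=[120, 650]))     # one stroke kept, one gap filled
```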
(3) Damping brush system for preventing shaking
The final step in stylizing the video is the anti-shake operation. The invention connects adjacent paintbrushes in the time domain and the space domain by springs to simulate a damping system. By minimizing the energy of the system, the effect of removing jitter is achieved.
For the i-th brush at time t, the invention uses A_i,t = (x_i,t, y_i,t, s_i,t) to denote its geometric attributes, namely its center coordinates and size, with initial value denoted A⁰_i,t. The energy function of the damped brush system is defined as follows:
E = E_data + λ_1 E_smooth1 + λ_2 E_smooth2
λ_1 and λ_2 are weights; in the experiments the invention sets them to λ_1 = 2.8 and λ_2 = 1.1.
The first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
The third term in the equation smoothly constrains adjacent brushes in both the time domain and the space domain. For any brush adjacent to the i-th brush at time t, say the j-th, the relative position and size difference between them is written as ΔA_i,j,t = A_i,t − A_j,t, and the smoothing term is defined as follows:
The energy minimization problem is solved by the Levenberg-Marquardt algorithm.
Claims (9)
1. An interactive video stylized rendering method based on video interpretation is characterized by comprising an interactive video semantic segmentation module and a video stylization module.
The segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) performing area competition segmentation;
the stylization method of the video stylization module comprises the following steps:
1) performing non-photorealistic drawing on the key frame based on semantic analysis;
2) stroke propagation of sequence frames;
3) a damping brush system for anti-shake.
The method comprises the steps of sequentially using an interactive video semantic segmentation module and a video stylization module for stylizing a video, namely performing semantic segmentation on the video by using the interactive video semantic segmentation module, and performing stylized rendering on the segmented video by using the video stylization module.
2. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the interactive segmentation and automatic identification method of key frame images of the above step 1) is as follows:
dividing the segmented semantic regions into twelve classes according to their material properties, namely sky/cloud, mountain/land, rock/building, leaves/bushes, hair/fur, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features, texture, color distribution and position information, are adopted for training and recognition; given a region image X, the conditional probability of its class c is defined as (formula 1):

log P(c | X, θ) = Σ_i [ Ψ_i(c_i, X; θ_Ψ) + π(c_i, X; θ_π) + λ(c_i, X; θ_λ) ] − log Z(θ, X)

The four terms in the formula are respectively a texture potential function, a color potential function, a position potential function and a normalization term.
the texture potential energy function is defined as $\psi_i(c_i, X; \theta_\psi) = \log P(c_i \mid X, i)$, where $P(c_i \mid X, i)$ is a normalized distribution given by the Boost classifier;
the color potential energy function is defined as $\pi(c_i, X; \theta_\pi) = \log \sum_k \theta_\pi(c_i, k)\, P(k \mid x_i)$; Gaussian Mixture Models (GMMs) in the CIELab color space represent the color model, and for a pixel color $x_i$ in a given image the conditional probability $P(k \mid x_i)$ is the corresponding Gaussian component, where $\mu_k$ and $\Sigma_k$ denote the mean and covariance of the $k$-th color cluster, respectively;
the position potential energy function is defined as $\lambda(c_i, X; \theta_\lambda) = \log \theta_\lambda(c_i, i)$; the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel depends only on its absolute position in the image;
the twelve material classes are trained with this method; formula 1 gives the probability of each class for every pixel in an image region, all pixels in the region are counted, and the class of each region is determined by voting; in the stylized rendering process, the selection of the paintbrush is determined by the material identified for the object region, laying the foundation for automatic rendering.
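A minimal sketch of the per-pixel scoring and region voting described above follows; it is not the patent's classifier. The texture term is taken as precomputed log-probabilities standing in for the Boost classifier, the color term uses scikit-learn Gaussian mixtures assumed to be fitted offline per class in CIELab, the position prior is a caller-supplied callable, and all names are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # one fitted GMM per material class (assumed)

NUM_CLASSES = 12  # the twelve material classes of the claim

def classify_region(lab_pixels, positions, color_models, position_logp, texture_logp):
    """Vote a material class index for one region.
    lab_pixels: (n, 3) CIELab colors; positions: (n, 2) normalized coordinates;
    color_models: list of NUM_CLASSES fitted GaussianMixture objects;
    position_logp: callable(positions) -> (n, NUM_CLASSES) log-probabilities;
    texture_logp: (n, NUM_CLASSES) log-probabilities from a Boost-style classifier."""
    logp = np.array(texture_logp, dtype=float)                 # texture potential
    for c in range(NUM_CLASSES):                               # color potential (GMM in CIELab)
        logp[:, c] += color_models[c].score_samples(lab_pixels)
    logp += position_logp(positions)                           # weak position potential
    votes = np.bincount(logp.argmax(axis=1), minlength=NUM_CLASSES)
    return int(votes.argmax())                                 # majority-voted class index
```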
3. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the matching method of dense feature points among the key frames of the above step 2) is as follows:
after the semantic information on the key frame is obtained, line-drawing features and mixed texture-and-color image template features are integrated, providing a rich feature set and expression for the image matching problem;
11) line-drawing features are represented by Gabor bases as $F^{sk}(I_i) = \lVert \langle I_i, G_{\cos,x,\theta} \rangle \rVert^2 + \lVert \langle I_i, G_{\sin,x,\theta} \rangle \rVert^2$, where $G_{\sin}$ and $G_{\cos}$ represent the sine and cosine Gabor bases in direction $\theta$ at position $x$; its feature probability distribution is expressed as:
with parameter $\theta_i$, where $h^{sk}$ is a sigmoid function and a normalization constraint is applied.
the model therefore encourages edge responses that are stronger than the background distribution;
12) texture features are modeled by a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the $j$-th direction of the HOG and the descriptor corresponding to the $i$-th feature are denoted accordingly, and the mean of the descriptor over all positive samples is used; the present invention represents the probabilistic model of a feature as:
with parameter $\theta_i$; it can be seen that the model encourages responses to a relatively large set of feature image blocks;
13) the color feature is described by simple pixel brightness; according to the invention, pixel brightness values are quantized into statistical intervals, so the model can be simplified as follows:
similar small image features are combined to obtain feature combinations with strong local discrimination: the image is first segmented into many tiny image blocks, statistical features describing line drawing, texture and color are extracted from these small blocks, and, to obtain the feature combinations effectively, an iterative region-growing and model-learning algorithm is adopted, continuously updating the feature models and iteratively growing the feature-combination regions until feature combinations with strong local discrimination are obtained (a sketch of the line-drawing, texture and brightness statistics of items 11) to 13) is given below);
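The sketch below illustrates the three per-block statistics of items 11) to 13): a squared sine/cosine Gabor response for the line-drawing feature, a simplified 6-bin gradient-orientation histogram for texture, and a quantized brightness histogram for color. Kernel sizes, bin counts and normalizations are arbitrary assumptions rather than the patent's exact descriptors.

```python
import numpy as np

def gabor_pair(size=17, sigma=3.0, wavelength=6.0, theta=0.0):
    """Cosine and sine Gabor bases at orientation theta (parameter values are arbitrary)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / wavelength), env * np.sin(2 * np.pi * xr / wavelength)

def sketch_response(patch, theta):
    """Line-drawing feature: |<I, G_cos>|^2 + |<I, G_sin>|^2 for one odd-sized square patch."""
    g_cos, g_sin = gabor_pair(size=patch.shape[0], theta=theta)
    return float(np.vdot(patch, g_cos) ** 2 + np.vdot(patch, g_sin) ** 2)

def simplified_hog(gray_patch, bins=6):
    """Texture feature: 6-bin histogram of gradient orientations for a small image block."""
    gy, gx = np.gradient(gray_patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientations folded into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def brightness_histogram(gray_patch, bins=8):
    """Color feature: pixel brightness quantized into statistical intervals."""
    hist, _ = np.histogram(gray_patch, bins=bins, range=(0, 255))
    return hist / (hist.sum() + 1e-8)
```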
on the basis of the above expression, the matching problem of moving targets in the temporal and spatial domains is modeled as a layered graph matching framework over the graph representation: the extracted mixed image template features serve as graph nodes, a graph structure is constructed between frames, and the edge connections between graph nodes are defined based on the similarity and spatial position between features and the class of the object to which they belong;
$I_s$ and $I_t$ denote the source image and the target image, and $U$, $V$ denote the mixed-template feature sets in $I_s$ and $I_t$; each feature point $u \in U$ carries two labels, including a hierarchy label $l(u) \in \{1, 2, \dots\}$; the vertex set of the graph structure is established from the candidate set $C$ of well-matching candidates for each feature point of the source image, and the edge set is constructed as $E = E^+ \cup E^-$; negative edges indicate that the connected candidates repel each other, and their "repulsive force" is defined as:
positive edges connect spatially adjacent, non-mutually-exclusive candidate feature points and indicate the degree of cooperation between them, based on the spatial distance between the connected candidates $v_i$ and $v_j$;
the graph structures $G_s$ and $G_T$ of the source image and the target image are each divided into $K+1$ layers, where $K$ is the number of objects in the source image; taking $G_s$ as an example, the partition is denoted $\Pi = \{g_0, g_1, \dots, g_K\}$, where $g_k$ is a sub-graph of $G_s$ whose vertex set is denoted $U_k$; similarly, the vertex set of $G_T$ is denoted $V_k$; the matching relationship between $G_s$ and $G_T$ is then expressed layer by layer, and, assuming the matches between sub-graphs are independent of each other, then:
the similarity measure between a matched sub-graph pair $(g_k, g_k')$ is defined by a geometric transformation and an appearance measure; in summary, the solution of the graph-structure matching problem can be written as:
$W = \big(K,\ \Pi = \{g_0, g_1, \dots, g_K\},\ \Psi = \{\Psi_k\},\ \Phi = \{\Phi_k\}\big)$
under the Bayesian theory framework, the graph structure matching problem is described by maximizing the posterior probability:
$W^* = \arg\max_W p(W \mid G_s, G_T) = \arg\max_W p(W)\, p(G_s, G_T \mid W)$
the above formula is solved by the Markov Chain Monte Carlo (MCMC) method; for efficient computation, the sampler converges quickly to the global optimal solution through efficient jumps in the solution space, so as to match the inter-frame feature points.
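For concreteness, here is a generic Metropolis-Hastings sketch over per-feature candidate assignments; it is not the patent's sampler, which uses structured jumps over the layered graph, and `log_posterior`, the candidate dictionary layout, the temperature and the proposal (re-matching one feature at a time) are assumptions.

```python
import numpy as np

def mcmc_match(candidates, log_posterior, n_iter=5000, temperature=1.0, rng=None):
    """Metropolis-Hastings search over feature-point assignments.
    candidates: dict keyed by integer source-feature ids, each value a list of target candidates;
    log_posterior(assign): scores a complete assignment (appearance + geometry terms)."""
    if rng is None:
        rng = np.random.default_rng(0)
    assign = {u: cand[0] for u, cand in candidates.items()}     # initial assignment
    lp = log_posterior(assign)
    best, best_lp = dict(assign), lp
    keys = list(candidates)
    for _ in range(n_iter):
        u = keys[rng.integers(len(keys))]                        # propose: re-match one point
        proposal = dict(assign)
        proposal[u] = candidates[u][rng.integers(len(candidates[u]))]
        new_lp = log_posterior(proposal)
        if np.log(rng.random()) < (new_lp - lp) / temperature:   # accept or reject
            assign, lp = proposal, new_lp
            if lp > best_lp:
                best, best_lp = dict(assign), lp
    return best
```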
4. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the region competition segmentation method of the above step 3) is as follows:
on the basis of the stable matching relation obtained between frames, by exploiting the advantages of the region competition mechanism in video segmentation and using the layered-graph image matching algorithm, the matching relation between the features of the previous frame and those of the current frame can be determined, so that the semantic information of the previous frame is propagated to the current frame; the current frame is then segmented into several semantic regions by the region competition segmentation algorithm according to the feature information of each matched region;
given image I, the corresponding image segmentation solution is defined as follows:
$W = \{(R_1, R_2, \dots, R_N),\ (\theta_1, \theta_2, \dots, \theta_N),\ (l_1, l_2, \dots, l_N)\}$
where $R_i$ denotes a segmented region with homogeneous features, $\theta_i$ denotes the parameters of the feature probability distribution model of region $R_i$, and $l_i$ denotes the label corresponding to region $R_i$;
the number $N$ of segmented regions can be determined from the matching relation of the features in the previous and current frames; the set of small feature regions corresponding to the regions is $S = \{S_1, S_2, \dots, S_N\}$, and for each region $R_i$ the initial model parameters $\theta_i$ are estimated from the small feature region $S_i$ it occupies, giving an initial posterior probability $P(\theta_i \mid I(x, y))$; according to the MDL principle, maximizing the posterior probability is converted into minimizing an energy function, which gives:
where $\Gamma_i$ represents the boundary contour of region $R_i$; the invention estimates the parameters $\theta_i$ iteratively in stages, alternating two stages and reducing the energy function in each stage, so as to continuously learn and infer the final segmentation result of the whole image;
in the regional competition process, continuously updating a characteristic probability distribution model of each region, simultaneously competing for ownership of pixel points according to the steepest descent principle, and updating respective boundary contour, so that each region continuously expands the range, and finally obtaining an image segmentation result of the current frame;
the specific iteration steps are as follows: in the first stage, $\Gamma$ is fixed and $\{\theta_i\}$ is estimated from the current region segmentation state, taking the parameters $\theta_i$ under the current state as the optimal solution that minimizes the cost of describing each region, so the energy function becomes:
in the second stage, $\{\theta_i\}$ is taken as known and steepest descent is performed on $\Gamma$; to obtain the minimum of the energy function quickly, the steepest-descent motion equation is solved for the boundaries $\Gamma$ of all regions, and for any point on a boundary contour $\Gamma$ we have:
where $\tau_k$ is the direction vector at that point, and the region to which the point belongs depends on how well the point is described by each region's feature probability distribution model;
to determine the membership between each pixel point and the region, the competition-based image segmentation algorithm process is described as follows:
in the initialization stage, estimating initial parameters of various models according to the matched characteristic image blocks, adding boundary points of all the characteristic image blocks into a queue to be determined, and calculating posterior probabilities of all the boundary points belonging to various types;
in a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; recalculating the posterior probabilities of all the boundary points belonging to various types by using the newly obtained feature distribution models of the regions;
therefore, the boundary point with the fastest descending current energy is continuously selected from the queue to be determined to update the corresponding boundary, meanwhile, the characteristic distribution probability model of each region is updated timely according to the current region segmentation state, the regions are mutually restricted, and the ownership of the image region is simultaneously competed until the energy function converges, so that the image is segmented into the regions.
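A drastically simplified sketch of this competition loop follows; it is not the patent's algorithm: each region is modeled by an independent per-channel Gaussian instead of the full feature distribution models, boundary pixels are visited in raster order rather than through a steepest-descent priority queue, and the names and iteration count are assumptions.

```python
import numpy as np

def region_competition(image, labels, n_iters=10):
    """Greatly simplified region competition: alternately re-estimate each region's
    color model (stage 1) and let regions compete for boundary pixels (stage 2).
    image: (h, w, c) array; labels: (h, w) initial integer region map."""
    lab = labels.copy()
    h, w = lab.shape
    img = image.astype(float)
    for _ in range(n_iters):
        # Stage 1: fix boundaries, estimate region parameters by maximum likelihood.
        models = {r: (img[lab == r].mean(axis=0), img[lab == r].std(axis=0) + 1e-3)
                  for r in np.unique(lab)}
        # Stage 2: fix parameters, move boundary pixels to the best-fitting adjacent region.
        changed = 0
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                cand = {lab[y - 1, x], lab[y + 1, x], lab[y, x - 1], lab[y, x + 1], lab[y, x]}
                if len(cand) == 1:
                    continue  # interior pixel, not on a region boundary
                def log_lik(r):
                    mu, sd = models[r]
                    return -0.5 * float((((img[y, x] - mu) / sd) ** 2 + 2 * np.log(sd)).sum())
                best = max(cand, key=log_lik)
                if best != lab[y, x]:
                    lab[y, x] = best
                    changed += 1
        if changed == 0:
            break  # the energy no longer decreases
    return lab
```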
5. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylization method step 4) of the video stylization module (2) is based on an interactive video semantic segmentation module, the selection of the brush being determined only by the material corresponding to the identified object region;
the stroke library is built by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized; when drawing in each image region, a big brush is used first to lay the base, and then the brush size and opacity are gradually reduced to finely depict the detailed parts of the object; during drawing, an edge-first-then-interior strategy is adopted: each image layer is drawn starting from the edges, first along the line-drawing edges, with the brush aligned according to the flow field;
in the video rendering, in order to ensure the continuity and stability of the brush in the time domain, a thin-plate spline interpolation technology is adopted to carry out the propagation of the brush strokes, and in addition, in the propagation process of the brush strokes, the deletion and addition mechanisms of the brush strokes are designed by calculating the area of the brush stroke areas; and the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
6. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the keyframe nonphotorealistic rendering method based on semantic parsing of the stylized method step 5) of the video stylized module (2) is as follows:
how to design stroke models with different artistic styles is one of the focuses of video stylization; works in different artistic forms each have their own characteristics of stroke expression, and a basic drawing strategy in video stylization is to select proper strokes based on the image content; the stroke library is established by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized; for a brush to be drawn, the following information is stored: the class information of the brush, the range of its placement area, the color mapping, the transparency field, the height field and the control points, namely:
when designing the stroke model, not only low-level information such as the shape and texture of the stroke is considered, but high-level semantic information is also integrated, so that during rendering each interpretation region of the image/video has its own brush dependency; when selecting strokes, the class of the interpretation region is used as the keyword to quickly select a batch of strokes of the same class from the stroke library, and one stroke is then chosen from them at random;
to simulate the principle of 'alignment' in oil painting, the primal sketch (reduced graph) model is used for reference: within each region, its primal sketch is computed; the sketch consists of a set of salient primitives that mark the surface characteristics of the object, such as spots, lines and folds on clothing; during rendering, different paintbrushes are overlaid on these primitives to produce the desired artistic effect; an interpretation region is divided into a line-drawing part describing the sketchable structure and a non-line-drawing part describing regions of homogeneous structure; the direction field of the region is defined as:
where the initial value of the direction field is given by the line drawing, and the direction is then propagated to the non-line-drawing region using a diffusion equation;
the rendering process of the key frame continually selects and places strokes; taking an interpretation region as an example, its non-line-drawing part is rendered first and its line-drawing part afterwards, which ensures that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer; in the non-line-drawing part, an unrendered pixel area is selected arbitrarily, the center of that area is taken as the initial point, and diffusion proceeds to both sides along the direction field to generate a flow-pattern area; taking the central axis of the area as the reference line, the selected brush is transformed into the flow-pattern area so that the central axis of the stroke aligns with the central axis of the area; rendering of the line-drawing part of the region is similar.
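The direction-field propagation mentioned above can be illustrated with a small diffusion sketch; the discretization (iterative neighborhood averaging in doubled-angle form so that orientations 0 and pi coincide), the masks and the iteration count are assumptions rather than the patent's diffusion equation.

```python
import numpy as np

def diffuse_direction_field(theta, known_mask, region_mask, n_iters=200):
    """Propagate stroke orientation from line-drawing pixels into the rest of a region
    by iterative neighborhood averaging (a discrete diffusion equation).
    theta: (h, w) orientations in radians; known_mask: pixels whose orientation is fixed
    by the line drawing; region_mask: pixels belonging to the interpretation region."""
    c = np.cos(2 * theta) * known_mask          # doubled-angle representation
    s = np.sin(2 * theta) * known_mask
    for _ in range(n_iters):
        for f in (c, s):
            avg = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                          np.roll(f, 1, 1) + np.roll(f, -1, 1))
            # keep known values fixed, diffuse inside the region, zero outside
            f[...] = np.where(known_mask, f, np.where(region_mask, avg, 0.0))
    return 0.5 * np.arctan2(s, c)
```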
7. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylized method step 5) of the video stylized module (2) is a stroke propagation method of the sequence frames as follows:
the rendering of non-key frames is obtained by 'propagating' the rendering result of the key frame, the basis of propagation being the temporal and spatial correspondence of the interpretation regions; during propagation, as the deformation of an interpretation region grows, strokes may gradually leak outside the region while unrendered gaps appear inside it, so the propagation of the stroke map must consider the stroke addition and deletion mechanisms at the same time, otherwise the rendering result will exhibit jitter; the mechanisms of stroke propagation, addition and deletion are as follows:
stroke propagation: let a certain interpretation region of the key frame at time $t$ of the video be given, together with the corresponding region at time $t+1$, their respective image regions and their dense matching points in the time domain (computed during video interpretation); it is assumed that the change between the two regions can be expressed by a non-rigid transformation; when strokes are propagated, the invention expects the matching points on the key-frame region to be mapped to their matches in the new image region of frame $t+1$; based on this consideration, the invention selects the Thin-Plate Spline (TPS) interpolation model, which computes the mapping from the key points of the key-frame region to their matches in the next frame; the TPS minimizes an energy function so that the pixel grid of the region is warped by an elastic (non-rigid) deformation (a small thin-plate-spline warping sketch follows this list of mechanisms);
stroke deletion: after brushes are propagated through the video, some brush regions become smaller and smaller, become occluded, or have been propagated for too many frames; the invention therefore removes a brush when the area of its corresponding brush region falls below a given threshold, and also deletes a brush when the propagated brush falls outside the boundary of its corresponding region;
stroke addition: when new semantic regions appear or existing semantic regions grow (such as clothing unfolding), the invention must add new brushes to cover the new areas; to fill gaps between brushes it only needs to adjust the size and position of adjacent brushes, and if an area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it; nevertheless, the present invention does not draw a stroke on a gap immediately when it first appears; it therefore sets a relatively high threshold and delays rendering newly appearing regions until they grow large enough; the invention then adopts the general brush placement algorithm to fill gaps that reach the threshold, and finally propagates and transforms these new brushes backwards to fill the gap regions that appeared earlier but were not rendered; filling brushes backwards avoids frequent brush changes and links smaller, fragmented brushes into larger ones, thereby reducing flicker and other undesirable visual artifacts; also, since the present invention adds new brushes at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
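As referenced in the stroke-propagation mechanism above, the following sketch warps one brush's control points from frame t to frame t+1 with a thin-plate spline fitted to the region's dense matching points; it uses SciPy's RBFInterpolator with the thin-plate-spline kernel as a stand-in, and the function name, argument layout and smoothing value are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_stroke_points(src_matches, dst_matches, stroke_points, smoothing=1.0):
    """Warp a key-frame stroke into frame t+1 with a thin-plate-spline map.
    src_matches, dst_matches: (P, 2) matched points of the region in frames t and t+1;
    stroke_points: (Q, 2) control points of one brush in frame t."""
    tps = RBFInterpolator(src_matches, dst_matches,
                          kernel="thin_plate_spline", smoothing=smoothing)
    return tps(stroke_points)   # (Q, 2) control points warped into frame t+1
```

Applying the same fitted spline to all strokes of a region keeps them consistent with the region's non-rigid motion.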
8. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the damping brush system for anti-shake in the stylized method step 6) of the video stylized module (2) is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
for the $i$-th brush at time $t$, the invention uses $A_{i,t} = (x_{i,t}, y_{i,t}, s_{i,t})$ to represent its geometric attributes, namely its center coordinates and size, and records its initial value; the energy function of the damped brush system is defined as follows:
the first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
the third term in the formula smoothly constrains adjacent brushes in both the time domain and the space domain; for any brush $j$ adjacent to the $i$-th brush at time $t$, the relative distance difference and size difference between them are recorded as $\Delta A_{i,j,t} = A_{i,t} - A_{j,t}$, and the smoothing term is defined as follows:
the energy minimization problem is solved by the Levenberg-Marquardt algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110302054XA CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110302054XA CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102542593A true CN102542593A (en) | 2012-07-04 |
Family
ID=46349405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110302054XA Pending CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102542593A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927372A (en) * | 2014-04-24 | 2014-07-16 | 厦门美图之家科技有限公司 | Image processing method based on user semanteme |
CN104063876A (en) * | 2014-01-10 | 2014-09-24 | 北京理工大学 | Interactive image segmentation method |
CN104346789A (en) * | 2014-08-19 | 2015-02-11 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN104867183A (en) * | 2015-06-11 | 2015-08-26 | 华中科技大学 | Three-dimensional point cloud reconstruction method based on region growing |
CN105719327A (en) * | 2016-02-29 | 2016-06-29 | 北京中邮云天科技有限公司 | Art stylization image processing method |
CN105825531A (en) * | 2016-03-17 | 2016-08-03 | 广州多益网络股份有限公司 | Method and device for dyeing game object |
CN106296567A (en) * | 2015-05-25 | 2017-01-04 | 北京大学 | The conversion method of a kind of multi-level image style based on rarefaction representation and device |
CN106485223A (en) * | 2016-10-12 | 2017-03-08 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN107277615A (en) * | 2017-06-30 | 2017-10-20 | 北京奇虎科技有限公司 | Live stylized processing method, device, computing device and storage medium |
CN109741413A (en) * | 2018-12-29 | 2019-05-10 | 北京金山安全软件有限公司 | Rendering method and device for semitransparent objects in scene and electronic equipment |
CN109816663A (en) * | 2018-10-15 | 2019-05-28 | 华为技术有限公司 | A kind of image processing method, device and equipment |
CN110288625A (en) * | 2019-07-04 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110738715A (en) * | 2018-07-19 | 2020-01-31 | 北京大学 | automatic migration method of dynamic text special effect based on sample |
CN111722896A (en) * | 2019-03-21 | 2020-09-29 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN112017179A (en) * | 2020-09-09 | 2020-12-01 | 杭州时光坐标影视传媒股份有限公司 | Method, system, electronic device and storage medium for evaluating visual effect grade of picture |
CN113128498A (en) * | 2019-12-30 | 2021-07-16 | 财团法人工业技术研究院 | Cross-domain picture comparison method and system |
CN113256484A (en) * | 2021-05-17 | 2021-08-13 | 百果园技术(新加坡)有限公司 | Method and device for stylizing image |
CN116761018A (en) * | 2023-08-18 | 2023-09-15 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101807198A (en) * | 2010-01-08 | 2010-08-18 | 中国科学院软件研究所 | Video abstraction generating method based on sketch |
CN101853517A (en) * | 2010-05-26 | 2010-10-06 | 西安交通大学 | Real image oil painting automatic generation method based on stroke limit and texture |
CN101930614A (en) * | 2010-08-10 | 2010-12-29 | 西安交通大学 | Drawing rendering method based on video sub-layer |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063876A (en) * | 2014-01-10 | 2014-09-24 | 北京理工大学 | Interactive image segmentation method |
CN104063876B (en) * | 2014-01-10 | 2017-02-01 | 北京理工大学 | Interactive image segmentation method |
CN103927372A (en) * | 2014-04-24 | 2014-07-16 | 厦门美图之家科技有限公司 | Image processing method based on user semanteme |
CN104346789A (en) * | 2014-08-19 | 2015-02-11 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN104346789B (en) * | 2014-08-19 | 2017-02-22 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN106296567A (en) * | 2015-05-25 | 2017-01-04 | 北京大学 | The conversion method of a kind of multi-level image style based on rarefaction representation and device |
CN106296567B (en) * | 2015-05-25 | 2019-05-07 | 北京大学 | A kind of conversion method and device of the multi-level image style based on rarefaction representation |
CN104867183A (en) * | 2015-06-11 | 2015-08-26 | 华中科技大学 | Three-dimensional point cloud reconstruction method based on region growing |
CN105719327B (en) * | 2016-02-29 | 2018-09-07 | 北京中邮云天科技有限公司 | A kind of artistic style image processing method |
CN105719327A (en) * | 2016-02-29 | 2016-06-29 | 北京中邮云天科技有限公司 | Art stylization image processing method |
CN105825531A (en) * | 2016-03-17 | 2016-08-03 | 广州多益网络股份有限公司 | Method and device for dyeing game object |
CN105825531B (en) * | 2016-03-17 | 2018-08-21 | 广州多益网络股份有限公司 | A kind of colouring method and device of game object |
CN106485223B (en) * | 2016-10-12 | 2019-07-12 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN106485223A (en) * | 2016-10-12 | 2017-03-08 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN107277615A (en) * | 2017-06-30 | 2017-10-20 | 北京奇虎科技有限公司 | Live stylized processing method, device, computing device and storage medium |
CN107277615B (en) * | 2017-06-30 | 2020-06-23 | 北京奇虎科技有限公司 | Live broadcast stylization processing method and device, computing device and storage medium |
CN110738715A (en) * | 2018-07-19 | 2020-01-31 | 北京大学 | automatic migration method of dynamic text special effect based on sample |
CN110738715B (en) * | 2018-07-19 | 2021-07-09 | 北京大学 | Automatic migration method of dynamic text special effect based on sample |
CN109816663A (en) * | 2018-10-15 | 2019-05-28 | 华为技术有限公司 | A kind of image processing method, device and equipment |
US12026863B2 (en) | 2018-10-15 | 2024-07-02 | Huawei Technologies Co., Ltd. | Image processing method and apparatus, and device |
CN109741413A (en) * | 2018-12-29 | 2019-05-10 | 北京金山安全软件有限公司 | Rendering method and device for semitransparent objects in scene and electronic equipment |
CN109741413B (en) * | 2018-12-29 | 2023-09-19 | 超级魔方(北京)科技有限公司 | Rendering method and device of semitransparent objects in scene and electronic equipment |
CN111722896B (en) * | 2019-03-21 | 2021-09-21 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN111722896A (en) * | 2019-03-21 | 2020-09-29 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN110288625A (en) * | 2019-07-04 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110446066B (en) * | 2019-08-28 | 2021-11-19 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN113128498A (en) * | 2019-12-30 | 2021-07-16 | 财团法人工业技术研究院 | Cross-domain picture comparison method and system |
CN112017179A (en) * | 2020-09-09 | 2020-12-01 | 杭州时光坐标影视传媒股份有限公司 | Method, system, electronic device and storage medium for evaluating visual effect grade of picture |
CN113256484A (en) * | 2021-05-17 | 2021-08-13 | 百果园技术(新加坡)有限公司 | Method and device for stylizing image |
CN113256484B (en) * | 2021-05-17 | 2023-12-05 | 百果园技术(新加坡)有限公司 | Method and device for performing stylization processing on image |
CN116761018A (en) * | 2023-08-18 | 2023-09-15 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
CN116761018B (en) * | 2023-08-18 | 2023-10-17 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102542593A (en) | Interactive video stylized rendering method based on video interpretation | |
Kelly et al. | FrankenGAN: guided detail synthesis for building mass-models using style-synchonized GANs | |
Hartmann et al. | Streetgan: Towards road network synthesis with generative adversarial networks | |
Zhao et al. | Achieving good connectivity in motion graphs | |
CN106920243A (en) | The ceramic material part method for sequence image segmentation of improved full convolutional neural networks | |
CN110222722A (en) | Interactive image stylization processing method, calculates equipment and storage medium at system | |
CN103218846B (en) | The ink and wash analogy method of Three-dimension Tree model | |
Yu et al. | Modern machine learning techniques and their applications in cartoon animation research | |
Cao et al. | Difffashion: Reference-based fashion design with structure-aware transfer by diffusion models | |
CN103854306A (en) | High-reality dynamic expression modeling method | |
Fan et al. | Structure completion for facade layouts. | |
CN108242074B (en) | Three-dimensional exaggeration face generation method based on single irony portrait painting | |
Tang et al. | Animated construction of Chinese brush paintings | |
Xie et al. | Stroke-based stylization learning and rendering with inverse reinforcement learning | |
KR20230085931A (en) | Method and system for extracting color from face images | |
CN102270345A (en) | Image feature representing and human motion tracking method based on second-generation strip wave transform | |
Tong et al. | Sketch generation with drawing process guided by vector flow and grayscale | |
Yang et al. | Brushwork master: Chinese ink painting synthesis for animating brushwork process | |
Guo | Design and development of an intelligent rendering system for new year's paintings color based on b/s architecture | |
Xie et al. | Stroke-based stylization by learning sequential drawing examples | |
CN104091318B (en) | A kind of synthetic method of Chinese Sign Language video transition frame | |
Jiang et al. | Animation scene generation based on deep learning of CAD data | |
Jia et al. | Facial expression synthesis based on motion patterns learned from face database | |
Fu et al. | PlanNet: A Generative Model for Component-Based Plan Synthesis | |
Wang et al. | AI Promotes the Inheritance and Dissemination of Chinese Boneless Painting——Research on Design Practice from Interdisciplinary Collaboration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120704 |