
CN102542593A - Interactive video stylized rendering method based on video interpretation - Google Patents


Info

Publication number
CN102542593A
Authority
CN
China
Prior art keywords
video
region
mrow
image
brush
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201110302054XA
Other languages
Chinese (zh)
Inventor
刘树郁
张新楠
江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201110302054XA priority Critical patent/CN102542593A/en
Publication of CN102542593A publication Critical patent/CN102542593A/en
Pending legal-status Critical Current


Landscapes

  • Image Analysis (AREA)

Abstract

The invention relates to an interactive video stylized rendering method based on video interpretation, which uses an interactive video semantic segmentation module and a video stylization module. The segmentation method of the interactive video semantic segmentation module comprises the following steps: (1) interactive segmentation and automatic identification of key frame images; (2) matching of dense feature points between key frames; and (3) region competition segmentation. The stylization method of the video stylization module comprises the following steps: (4) non-photorealistic rendering of key frames based on semantic analysis; (5) a stroke propagation method for sequence frames; and (6) a damped brush system for anti-shake. The interactive video stylized rendering method based on video interpretation disclosed by the invention has the advantages of a short production cycle and low cost, and is well suited to batch production.

Description

Interactive video stylized rendering method based on video interpretation
Technical Field
The invention discloses an interactive video stylized rendering method based on video interpretation, and belongs to the technical field of video stylized rendering.
Background
With the wide spread of computers, digital cameras and digital video cameras, people's demands for producing video entertainment keep growing, and the field of home digital entertainment has expanded accordingly. More and more people enthusiastically try to play the role of an amateur "director", producing and editing all kinds of home-made videos. In recent years, various stylized videos have gradually been accepted and become popular elements, particularly in animated videos and online game production. For example, hand-painted oil-painting short films such as "The Old Man and the Sea" and ink-and-wash videos such as "Little Tadpole Looking for Mother" have attracted wide attention, and the former also won a series of awards including the Oscar for best animated short film. Video stylized rendering requires not only professional skill but also a large amount of manpower and financial support: traditional video stylization achieves the rendering by drawing frame by frame. Although the visual effect of every frame of a work produced this way can be controlled manually, the lack of inter-frame consistency causes strong jitter in the video picture during continuous playback, and these methods have long production cycles and high cost, which is unfavorable for batch production. For example, although the oil-painting short film "The Old Man and the Sea" lasts only 22 minutes, its production cycle lasted nearly 3 years.
Disclosure of Invention
In view of the above problems, the invention aims to provide an interactive video stylized rendering method based on video interpretation that has a short production cycle, low cost and is well suited to batch production.
The technical scheme of the invention is as follows: the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
Video stylization uses the two modules in turn: first, the interactive semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. The interactive segmentation and automatic identification method for key frame images in step 1) is as follows:
the segmented semantic regions are divided into twelve classes according to their material properties: sky/cloud, mountain/land, rock/building, leaves/bushes, hair/fur, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features of texture, color distribution and position information are adopted for training and recognition, a region image X is given, and the conditional probability of the category c is defined as follows:
\log P(c \mid X, \theta) = \sum_i \psi_i(c_i, X; \theta_\psi) + \pi(c_i, X; \theta_\pi) + \lambda(c_i, X; \theta_\lambda) - \log Z(\theta, X)    (*)
The four terms on the right-hand side are, respectively, a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential energy function is defined as ψ_i(c_i, X; θ_ψ) = log P(c_i | X, i), where P(c_i | X, i) is a normalized distribution function given by the Boost classifier;
the color potential energy function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i), where a Gaussian Mixture Model (GMM) in CIELab color space is used to represent the color model; for a pixel color x in the given image, P(k | x) is the probability of the k-th mixture component, with μ_k and Σ_k denoting the mean and the covariance of the k-th color cluster, respectively;
the position potential energy function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i); this potential is relatively weak compared with the first two, and in its definition the class label of an image pixel is related only to its absolute position in the image;
training 12 types of materials by using the method, then calculating the probability of each pixel in a given image region for each type by using the formula, finally counting all pixels in the region, and determining the type of each region by using a voting mode; in the stylized rendering process, the selection of the paintbrush is determined by the material identified by the object area, and a foundation is laid for realizing automatic rendering.
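As an illustration of this step, the sketch below shows how the per-pixel class log-probabilities of formula (*) could be combined and a region label decided by voting. It is a minimal sketch under assumptions, not the implementation of the invention; `texture_logp`, `color_logp` and `position_logp` are hypothetical stand-ins for the trained texture, color and position potential energy functions.

```python
import numpy as np

NUM_CLASSES = 12  # sky/cloud, mountain/land, rock/building, ... (the twelve material classes)

def pixel_log_prob(x, y, pixel_color, texture_logp, color_logp, position_logp):
    """Sum the three potential energy terms of formula (*) for one pixel.

    Each *_logp argument is assumed to be a callable returning a length-12
    array of log-potentials; the normalization term -log Z is omitted because
    it does not affect the per-pixel argmax used for voting."""
    return texture_logp(x, y) + color_logp(pixel_color) + position_logp(x, y)

def classify_region(region_pixels, image, texture_logp, color_logp, position_logp):
    """Decide the material class of one segmented region by per-pixel voting."""
    votes = np.zeros(NUM_CLASSES, dtype=int)
    for (x, y) in region_pixels:
        logp = pixel_log_prob(x, y, image[y, x], texture_logp, color_logp, position_logp)
        votes[int(np.argmax(logp))] += 1   # each pixel votes for its most probable class
    return int(np.argmax(votes))           # majority vote decides the region class
```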
The matching method of dense feature points among the key frames in the step 2) is as follows:
after semantic information on the key frame is obtained, line drawing characteristics and texture and color mixed image template characteristics are integrated, and rich characteristic sets and expressions are provided for image matching problems;
11) Line-drawing features are represented with Gabor bases as F^{sk}(I_i) = ||⟨I_i, G_{cos,x,θ}⟩||² + ||⟨I_i, G_{sin,x,θ}⟩||², where G_{sin,x,θ} and G_{cos,x,θ} denote the sine and cosine Gabor bases with orientation θ at position x. The feature probability distribution is expressed as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{sk})} \exp\{\lambda_i^{sk}\, h^{sk}[F^{sk}(I_i)]\}

where λ_i^{sk} represents the parameter of θ_i, h^{sk} is a sigmoid function, and Z(λ_i^{sk}) is a normalizing constraint; the model therefore encourages edges whose response is stronger than the background distribution;
12) Texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; h_j^{txt} denotes the j-th direction of the HOG, F^{txt}(I_i) denotes the descriptor corresponding to the i-th feature I_i, and the reference value is the mean of F^{txt}(I_i) over all positive samples. The probabilistic model of the feature is represented as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{txt})} \exp\{\lambda_i^{txt} \sum_j h_j^{txt}[F^{txt}(I_i)]\}

where λ_i^{txt} is the parameter of θ_i; it can be seen that the model encourages responses to a relatively large collection of feature image blocks;
13) Color features are described in terms of simple pixel brightness; F_{xj}^{fl}(I_i) is the filter response at position x. The pixel brightness values are quantized into statistical bins, so the model can be simplified as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{fl})} \exp\{\sum_j \lambda_i^{fl}\, h_{xj}^{fl}[F_{xj}^{fl}(I_i)]\}
Similar small image features are then combined to obtain feature combinations with strong local discrimination: first, the image is over-segmented to obtain many tiny image blocks; statistical features describing line drawing, texture and color are extracted from these small blocks; finally, to obtain the feature combinations effectively, an iterative region-growing and model-learning algorithm is adopted, which continuously updates the feature model and iteratively grows the feature-combination region, yielding feature combinations with strong local discrimination;
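The following is one possible sketch of the iterative region-growing and model-learning idea just described, assuming over-segmented patches, a hypothetical `extract_descriptor` callable (line-drawing/texture/color statistics) and a cosine-similarity acceptance rule; the actual algorithm of the invention may differ.

```python
import numpy as np

def grow_feature_combination(seed_patch, patches, neighbors, extract_descriptor,
                             sim_threshold=0.8, max_iters=10):
    """Grow a locally discriminative feature combination from a seed patch.

    patches: list of small image blocks obtained by over-segmentation
    neighbors: dict mapping patch index -> indices of spatially adjacent patches
    extract_descriptor: callable returning a descriptor vector for one patch
    """
    region = {seed_patch}
    model = extract_descriptor(patches[seed_patch])          # initial feature model
    for _ in range(max_iters):
        grown = False
        frontier = {n for p in region for n in neighbors[p]} - region
        for idx in frontier:
            d = extract_descriptor(patches[idx])
            sim = float(np.dot(model, d) /
                        (np.linalg.norm(model) * np.linalg.norm(d) + 1e-8))
            if sim > sim_threshold:                          # patch fits the current model
                region.add(idx)
                grown = True
        if not grown:
            break
        # re-learn the feature model from the enlarged region
        model = np.mean([extract_descriptor(patches[i]) for i in region], axis=0)
    return region, model
```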
on the basis of this representation, the matching problem of moving objects in the time and space domains is modeled as a hierarchical graph matching framework: the extracted mixed image template features serve as graph nodes, a graph structure is constructed between frames, and the edge connections between graph nodes are defined based on the similarity and spatial positions of the features and the type of object to which they belong;
the original graph and the target graph are represented by Is and It, U, V represents Is and the mixed template feature set in It, and each feature point U belongs to U', and the two marks are arranged: hierarchy tag I (u) e {1, 2
Figure BDA0000095342860000051
Establishing a vertex set of a graph structure by using a candidate set C with higher matching degree of each feature point in an original graph, and taking E as E+∪E-And constructing an edge set. Negative sides indicate that the connected candidates repel each other, and define their "repulsive force" as:
connecting spatially adjacent and non-mutually exclusive candidate feature points by positive edges
Figure BDA0000095342860000053
Indicating the degree of closeness of cooperation between them,denotes vi,vjThe spatial distance therebetween;
The graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image. Taking G_s as an example, its partition is written Π = {g_0, g_1, ..., g_K}, where g_k is a subgraph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the k-th subgraph of G_T is denoted V_k. The matching relation between G_s and G_T is expressed as a set of subgraph correspondences, and assuming the matchings between subgraphs are mutually independent, the overall matching probability factorizes over the subgraph pairs. The similarity between a matched subgraph pair (g_k, g_k') is measured by a geometric transformation together with an appearance measure, denoted Φ_k. In summary, the solution of the graph-structure matching problem can be written as:

W = (K, Π = {g_0, g_1, ..., g_K}, Ψ = {Φ_k}, Φ = {Φ_k})
under the Bayesian theory framework, the graph structure matching problem is described by maximizing the posterior probability:
W^* = \arg\max_W p(W \mid G_s, G_T) = \arg\max_W p(W)\, p(G_s, G_T \mid W)
The above formula is solved by a Markov Chain Monte Carlo (MCMC) method; at the same time, for efficient computation, the search converges quickly to the globally optimal solution through efficient jumps in the solution space, thereby achieving the matching of feature points between frames.
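For illustration, here is a rough sketch of how the candidate vertex set and the positive/negative edge sets E⁺ and E⁻ could be assembled before the matching search; the similarity measure, `top_k` and distance threshold are illustrative assumptions, and the MCMC inference itself is omitted.

```python
import numpy as np
from itertools import combinations

def build_matching_graph(src_feats, tgt_feats, positions, top_k=5, near_dist=30.0):
    """Build candidate matches plus positive/negative edges for hierarchical graph matching.

    src_feats/tgt_feats: (N, d)/(M, d) descriptors of the mixed image-template features
    positions: (N, 2) spatial positions of the source features
    Returns the candidate list and the edge sets E+ (cooperative) and E- (exclusive)."""
    # candidate set C: for every source feature keep its top_k most similar targets
    sim = src_feats @ tgt_feats.T
    candidates = [(i, j) for i in range(len(src_feats))
                  for j in np.argsort(-sim[i])[:top_k]]

    e_pos, e_neg = [], []
    for a, b in combinations(range(len(candidates)), 2):
        (i1, j1), (i2, j2) = candidates[a], candidates[b]
        if i1 == i2 or j1 == j2:
            e_neg.append((a, b))                      # mutually exclusive candidates repel
        elif np.linalg.norm(positions[i1] - positions[i2]) < near_dist:
            e_pos.append((a, b))                      # spatially adjacent candidates cooperate
    return candidates, e_pos, e_neg
```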
The area competition segmentation method of the step 3) is as follows:
On the basis of the stable inter-frame matching relation, the advantages of the region competition mechanism in video segmentation are exploited: using the hierarchical-graph image matching algorithm, the matching relation between the features of the previous frame and those of the current frame is determined, so that the semantic information of the previous frame is propagated to the current frame; the current frame is then segmented into several semantic regions by the region competition segmentation algorithm according to the feature information of each matched region;
given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}

where R_i denotes a segmented region with homogeneous features (the regions are disjoint and together cover the image), θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i;
The number N of segmented regions is determined from the matching relation of features between the previous and the current frame. Let the set of small feature regions corresponding to the regions be S = {S_1, S_2, ..., S_N}; for each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)). According to the MDL principle, maximizing the posterior probability is converted into minimizing an energy function:
E[\Gamma, \{\theta_i\}] = -\log P(W \mid I) = \sum_{i=1}^{N} -\iint_{R_i} \log P(\theta_i \mid I(x, y))\, dx\, dy

where Γ = {Γ_1, ..., Γ_N} and Γ_i denotes the boundary contour of region R_i. The parameters {θ_i} and the boundaries Γ are estimated in an iterative manner, alternating between two stages and reducing the energy function in each stage, so that the final segmentation result of the whole image is learned and inferred step by step;
in the regional competition process, continuously updating a characteristic probability distribution model of each region, simultaneously competing for ownership of pixel points according to the steepest descent principle, and updating respective boundary contour, so that each region continuously expands the range, and finally obtaining an image segmentation result of the current frame;
the specific iteration steps are as follows: the first stage, fixing Γ, estimates { θ ] from the current region segmentation stateiGet the parameter theta under the current stateiAs its optimal solution
Figure BDA0000095342860000074
To minimize the cost of describing each region, the energy function therefore translates into:
<math><mrow> <msubsup> <mi>&theta;</mi> <mi>i</mi> <mo>*</mo> </msubsup> <mo>=</mo> <msub> <mrow> <mi>arg</mi> <mi></mi> <mi>max</mi> </mrow> <msub> <mi>&theta;</mi> <mi>i</mi> </msub> </msub> <mo>{</mo> <msub> <mrow> <mo>&Integral;</mo> <mo>&Integral;</mo> </mrow> <mi>R</mi> </msub> <mi>log</mi> <mi>P</mi> <mrow> <mo>(</mo> <msub> <mi>&theta;</mi> <mi>i</mi> </msub> <mo>|</mo> <mi>I</mi> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mi>dxdy</mi> <mo>}</mo> <mo>,</mo> <mo>&ForAll;</mo> <mi>i</mi> <mo>&Element;</mo> <mo>[</mo> <mn>1</mn> <mo>,</mo> <mi>N</mi> <mo>]</mo> </mrow></math>
In the second stage, {θ_i} is known and steepest descent is carried out on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions. For any point \vec{v} on the boundary contour Γ:

\frac{d\vec{v}}{dt} = -\frac{\delta E(\Gamma, \{\theta_i\})}{\delta \vec{v}} = \sum_{k \in Q(\vec{v})} \log P(\theta_k \mid I(\vec{v})) \cdot \vec{n}_k(\vec{v})

where Q(\vec{v}) is the set of regions whose boundary passes through \vec{v} and \vec{n}_k(\vec{v}) is the normal direction vector of Γ_k at the point \vec{v}; which region the point \vec{v} belongs to depends on how well \vec{v} is described by each region's feature probability distribution model;
to determine the membership between each pixel point and the region, the competition-based image segmentation algorithm process is described as follows:
in the initialization stage, estimating initial parameters of various models according to the matched characteristic image blocks, adding boundary points of all the characteristic image blocks into a queue to be determined, and calculating posterior probabilities of all the boundary points belonging to various types;
in a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; recalculating the posterior probabilities of all the boundary points belonging to various types by using the newly obtained feature distribution models of the regions;
In this way, the boundary point whose move decreases the current energy the fastest is repeatedly selected from the pending queue and the corresponding boundaries are updated; at the same time, the feature distribution probability model of each region is updated according to the current segmentation state. The regions constrain one another and compete simultaneously for ownership of the image, until the energy function converges and the image is segmented into the regions.
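A highly simplified sketch of the two-stage region-competition loop (estimate {θ_i} with Γ fixed, then re-assign boundary pixels by steepest descent of the energy) is given below; the per-region Gaussian model and the hypothetical `boundary_of` helper are assumptions made only for illustration.

```python
import numpy as np

def region_competition(image, regions, boundary_of, max_iters=10000):
    """Iteratively re-assign boundary pixels to the region that explains them best.

    regions: list of sets of (x, y) pixels, initialised from the matched feature patches
    boundary_of: hypothetical helper returning the boundary pixels of a region"""
    def fit_model(pixels):                       # stage 1: estimate theta_i by max. likelihood
        vals = np.array([image[y, x] for (x, y) in pixels], dtype=float)
        return vals.mean(axis=0), vals.var(axis=0) + 1e-6

    def log_post(value, model):                  # log P(theta_i | I(x, y)) up to a constant
        mu, var = model
        return float(np.sum(-0.5 * np.log(2 * np.pi * var) - (value - mu) ** 2 / (2 * var)))

    models = [fit_model(r) for r in regions]
    for _ in range(max_iters):
        # pick the boundary pixel whose re-assignment lowers the energy the most
        best = None
        for k, r in enumerate(regions):
            for p in boundary_of(r):
                scores = [log_post(image[p[1], p[0]], m) for m in models]
                j = int(np.argmax(scores))
                gain = scores[j] - scores[k]
                if j != k and (best is None or gain > best[0]):
                    best = (gain, p, k, j)
        if best is None or best[0] <= 0:
            break                                # energy has converged
        _, p, k, j = best
        regions[k].discard(p)
        regions[j].add(p)
        models[k], models[j] = fit_model(regions[k]), fit_model(regions[j])
    return regions
```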
In step 4) of the stylization method of the video stylization module, video stylization builds on the interactive video semantic segmentation module, and the brush is selected solely according to the material identified for each object region;
the brushes are based on a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized to establish a stroke library. In each image region, drawing first uses a large brush to lay down the base, then gradually reduces the brush size and opacity to depict the detailed parts of the object. Drawing follows an edge-first, interior-second strategy: each layer of the image is drawn starting from the edges, first along the line-drawing edges, with the brush aligned according to the flow field;
in video rendering, to ensure the temporal continuity and stability of the brushes, thin-plate spline interpolation is used to propagate the brush strokes; in addition, during stroke propagation, stroke deletion and addition mechanisms are designed based on the area of each stroke region, and a simulated damped spring system is used to reduce the "jitter" of the rendering result.
The key frame non-photorealistic rendering method based on semantic analysis in step 4) of the stylization method of the video stylization module is as follows:
How to design stroke models with different artistic styles is one of the focuses of video stylization; works with different forms of artistic expression have their own characteristics of stroke expression. The basic drawing strategy in video stylization is to select suitable strokes based on the image content; the stroke library is built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized. A brush B_n to be drawn contains the following information: the class label l_n of the brush, the covered area Λ_n, the color map C_n, the transparency field α_n, the height field H_n, and the control points {P_ni}:

B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}}
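For illustration only, the brush record B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}} could be represented roughly as the following data structure; the field types are assumptions, not the patent's storage format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class Brush:
    """One parameterized stroke B_n from the stroke library."""
    category: str                          # l_n: semantic class of the stroke (e.g. "sky/cloud")
    area: np.ndarray                       # Λ_n: binary mask of the area the stroke covers
    color_map: np.ndarray                  # C_n: color texture of the stroke
    alpha: np.ndarray                      # α_n: transparency field
    height: np.ndarray                     # H_n: height field (for lighting / relief effects)
    control_points: List[Tuple[float, float]] = field(default_factory=list)  # {P_ni}
```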
When designing the stroke model, not only low-level information such as the shape and texture of the stroke is considered, but also high-level semantic information is incorporated, so that each interpretation region of the image/video has suitable strokes to rely on during rendering. When selecting strokes, the interpretation-region category is used as the keyword to quickly retrieve a batch of strokes of the same category from the stroke library, and one stroke is then picked from them at random;
To simulate the principle of "following the form" (alignment) in oil painting, the primal sketch (reduced-graph) theory is used for reference: inside each region R_i, its primal sketch representation SK_i is computed. The sketch consists of a group of salient primitives that mark the surface features of an object, such as spots, lines and folds on clothing; during rendering, different brushes are overlaid on these primitives to produce the desired artistic effect. An interpretation region R_i, with pixel domain Λ_i, is divided into a line-drawing part, which describes the sketch, and a non-line-drawing part, which describes structurally homogeneous areas. The direction field Θ_i of R_i is defined as:

\Theta_i = \{\theta(x, y) \mid \theta(x, y) \in [0, \pi), \; \forall (x, y) \in \Lambda_i\}

where the initial values of the direction field Θ_i on the line-drawing part are the gradient directions of the line drawing; a diffusion equation then propagates these directions into the non-line-drawing part.
The rendering process of the key frame is a process of continuously selecting strokes and placing strokes; to interpret the region RiFor example, first render its non-line-drawn part
Figure BDA0000095342860000101
Then rendering the line-drawn part
Figure BDA0000095342860000102
This is to ensure that when rendered regions overlap, the pen strokes of the line-drawn portion can be on the upper layer; in the non-line drawing part, optionally selecting an unrendered pixel area, taking the center of the area as an initial point, diffusing to two sides along the direction field, and generating a flow pattern area; taking the central axis of the area as a reference line, and transforming the selected brush into the flow pattern area to align the central axis of the pen touch with the central axis of the area; rendering of the line-drawn portion of the region is similar.
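A sketch of this select-and-place rendering loop for one interpretation region is shown below; `trace_streamline` (diffusion along the direction field to a flow-shaped area with its medial axis) and `warp_brush_to_streamline` (medial-axis alignment) are hypothetical helpers, and the sketch only illustrates the control flow described above.

```python
import random
import numpy as np

def render_region(region_mask, direction_field, category, stroke_library,
                  trace_streamline, warp_brush_to_streamline, canvas):
    """Repeatedly select strokes and place them along the direction field of one region."""
    # strokes of the same semantic category as the region, one picked at random each time
    candidates = [b for b in stroke_library if b.category == category]
    unrendered = set(map(tuple, np.argwhere(region_mask)))
    while unrendered:
        seed = next(iter(unrendered))                       # any still-unrendered pixel
        # diffuse from the seed to both sides along the direction field -> flow-shaped area
        streamline_area, medial_axis = trace_streamline(seed, direction_field, region_mask)
        brush = random.choice(candidates)
        # warp the brush so that its medial axis aligns with the area's medial axis
        warp_brush_to_streamline(brush, medial_axis, streamline_area, canvas)
        unrendered -= set(map(tuple, streamline_area))
    return canvas
```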
In step 5) of the stylization method of the video stylization module, the stroke propagation method for sequence frames is as follows:
The rendering of non-key frames is obtained by "propagating" the rendering result of the key frame, the propagation being based on the spatio-temporal correspondence of the interpretation regions. During propagation, as the interpretation regions change more and more, strokes may gradually leak outside a region while unrendered gaps appear inside it; therefore, stroke addition and deletion mechanisms must also be considered in the propagated stroke map, otherwise the rendering result will exhibit jitter. The stroke propagation, addition and deletion mechanisms are as follows:
(a) Stroke propagation: let R_i(t) denote an interpretation region of the key frame at time t of the video, and R_i(t+1) the region corresponding to R_i(t) at time t+1; their image areas are denoted Λ_i(t) and Λ_i(t+1), respectively. P_ij(t) and P_ij(t+1) denote the dense matching points of Λ_i(t) and Λ_i(t+1) in the time domain (computed during video interpretation), and R_i(t+1) can be regarded as a non-rigid transformation of R_i(t). When propagating strokes, the matching points P_ij(t) on Λ_i(t) should be mapped onto the matching points P_ij(t+1) of the new image region Λ_i(t+1) in frame t+1. Based on this, a Thin-Plate Spline (TPS) interpolation model is chosen: it maps the key points P_ij(t) of Λ_i(t) onto the matching points P_ij(t+1) of Λ_i(t+1), and for the remaining pixels of Λ_i(t) the TPS minimizes its energy function so that the pixel grid of Λ_i(t) is warped by an elastic (non-rigid) deformation.
(b) Stroke deletion: after strokes have been propagated through the video, the region corresponding to a brush may keep shrinking, become occluded, or the stroke may have been propagated over too many frames; the invention therefore removes a brush when the area of its corresponding region falls below a given threshold. Similarly, a propagated brush is deleted when it falls outside the boundary of its corresponding region.
(c) Stroke addition: when new semantic regions appear or existing semantic regions grow (for example, clothing unfolding), new brushes must be added to cover the new areas, and the size and position of neighboring brushes are simply adjusted to fill the gaps between brushes. If an area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it. Nevertheless, a stroke is not drawn on a gap the moment it first appears: the invention sets a relatively high threshold and delays rendering newly appearing regions until they have grown large enough. A general brush-placement algorithm then fills gaps that have reached the threshold, and finally these new brushes are propagated and transformed backwards to fill gap regions that appeared earlier but were left unrendered. Filling brushes backwards avoids frequent brush changes and links small, fragmentary brushes to larger ones, thereby reducing flickering and other undesirable artificial visual effects. Moreover, since the new brushes are added at the bottom layer, they are drawn under the existing brushes, which further reduces visual flicker.
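A sketch of stroke propagation with the deletion and addition rules above follows; the thin-plate spline is fitted with SciPy's RBF interpolator (thin-plate-spline kernel), the thresholds are illustrative, and `covered_area`, `inside_region`, `find_uncovered_gaps` and `make_brush` are hypothetical helpers rather than parts of the patented method.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator   # thin-plate-spline kernel is available here

MIN_AREA = 40        # delete a brush whose region shrinks below this (illustrative threshold)
GAP_AREA = 120       # create a new brush once an uncovered gap exceeds this (illustrative)

def propagate_strokes(brushes, pts_t, pts_t1, region_mask_t1,
                      covered_area, inside_region, find_uncovered_gaps, make_brush):
    """Warp key-frame brushes to frame t+1 with a TPS fitted on the dense matches.

    pts_t, pts_t1: (N, 2) matched points P_ij(t) and P_ij(t+1) of the region
    region_mask_t1: boolean mask of the region in frame t+1"""
    tps = RBFInterpolator(pts_t, pts_t1, kernel='thin_plate_spline')
    surviving = []
    for b in brushes:
        # (a) propagation: warp the brush control points with the fitted TPS
        b.control_points = [tuple(p) for p in tps(np.asarray(b.control_points))]
        area = covered_area(b, region_mask_t1)
        # (b) deletion: drop brushes that shrank too much or left the region boundary
        if area >= MIN_AREA and inside_region(b, region_mask_t1):
            surviving.append(b)
    # (c) addition: cover gaps only once they have grown large enough
    for gap in find_uncovered_gaps(surviving, region_mask_t1):
        if gap.size >= GAP_AREA:                 # gap assumed to be an array of uncovered pixels
            surviving.append(make_brush(gap))
    return surviving
```

Delaying brush creation until a gap exceeds GAP_AREA mirrors the high-threshold rule described in (c) and avoids creating short-lived brushes.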
The damping brush system for anti-shake in the step 6) of the stylization method of the video stylization module is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
For the i-th brush at time t, A_{i,t} = (x_{i,t}, y_{i,t}, s_{i,t}) denotes its geometric attributes (center coordinates and size), and its initial value is denoted A^0_{i,t}. The energy function of the damped brush system is defined as follows:

E = E_{data} + \lambda_1 E_{smooth1} + \lambda_2 E_{smooth2}

λ_1 and λ_2 are weights, with λ_1 = 2.8 and λ_2 = 1.1.

The first term constrains the position of each brush not to drift too far from its initial position:

E_{data} = \sum_{i,t} \left( A_{i,t} - A^0_{i,t} \right)^2
The second term is the smoothness constraint on brush i in the time domain:

E_{smooth1} = \sum_{i,t} \left( A_{i,t+1} - 2A_{i,t} + A_{i,t-1} \right)^2

The third term smoothly constrains adjacent brushes in both the time domain and the space domain. For any brush j that is adjacent to the i-th brush at time t, the difference in their relative positions and sizes is written ΔA_{i,j,t} = A_{i,t} − A_{j,t}, and the smoothing term is defined as:

E_{smooth2} = \sum_{i,j,t} \left( \Delta A_{i,j,t+1} - 2\Delta A_{i,j,t} + \Delta A_{i,j,t-1} \right)^2
The energy minimization problem is solved by the Levenberg-Marquardt algorithm.
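A compact sketch of minimizing E = E_data + λ1·E_smooth1 + λ2·E_smooth2 with a Levenberg-Marquardt solver (SciPy's least_squares with method='lm') is given below; brush trajectories are stored as a (num_brushes, num_frames, 3) array of (x, y, s), and the neighbour list of adjacent brush pairs is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import least_squares

LAMBDA1, LAMBDA2 = 2.8, 1.1

def residuals(flat_A, A0, neighbors):
    """Stack the residuals of E_data, E_smooth1 and E_smooth2 for least-squares."""
    A = flat_A.reshape(A0.shape)                    # (num_brushes, num_frames, 3)
    res = [(A - A0).ravel()]                        # E_data: stay close to initial attributes
    # E_smooth1: second difference of each brush trajectory over time
    res.append(np.sqrt(LAMBDA1) * (A[:, 2:] - 2 * A[:, 1:-1] + A[:, :-2]).ravel())
    for i, j in neighbors:                          # E_smooth2: smooth adjacent brush pairs
        dA = A[i] - A[j]
        res.append(np.sqrt(LAMBDA2) * (dA[2:] - 2 * dA[1:-1] + dA[:-2]).ravel())
    return np.concatenate(res)

def stabilize_brushes(A0, neighbors):
    """Solve the damped brush system with Levenberg-Marquardt."""
    sol = least_squares(residuals, A0.ravel(), args=(A0, neighbors), method='lm')
    return sol.x.reshape(A0.shape)
```

Because every energy term is a sum of squares, stacking square-rooted weighted residuals lets the least-squares solver minimize exactly the weighted energy E.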
The invention explores the semantic-driven video stylized rendering technology by researching the segmentation and identification of the video and the establishment of the space-time corresponding relation, and achieves the expression effect required by the art. The invention starts from the semantic analysis research of the input video, adopts an interactive mode based on key frames, provides sufficient prior information for video segmentation while reducing the burden of a user to the maximum extent, and then propagates the interactive information on the key frames to subsequent frames by establishing the corresponding relation of characteristic points between the frames and adopting a regional competition algorithm, so that the semantic information of the user can sufficiently guide accurate video segmentation. And different stroke libraries are created for different styles. During rendering, the key frame is rendered according to the semantic information, and then the stroke of the key frame is transmitted to the sequence frame through spatial transformation by taking the spatial-temporal relationship of the semantic region as constraint, so that the 'jitter' effect of the rendering result is effectively inhibited. In addition, the invention further provides a system scheme convenient for user interactive creation, thereby improving the applicability of the project. The invention can be widely applied to various industries such as advertisement, education, entertainment and the like, and has important application background.
Detailed Description
Example (b):
the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
The stylization of the video uses the two modules in turn: first, the interactive semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. In module 1, the interactive video semantic segmentation module, the interactive segmentation and automatic identification method for key frame images of step 1) is as follows:
in the invention, the mature recognition technology TextonBoost and the interactive segmentation method GraphCut are integrated, and interactive semantic segmentation and recognition are carried out on the key frame image, so that the object region and the mutual layering and shielding relation in the image are obtained. The system of the invention classifies the segmented semantic regions into twelve categories according to different material properties of the semantic regions, including sky, water, land, rock, hair, skin, clothes and the like, as shown in table 1.
Table 1: The 12 material classes of the semantic regions

Mountain/land | Water | Rock/building | Leaves/bushes
Skin/leather | Hair/fur | Flower/fruit | Sky/cloud
Clothes | Trunk/branch | Abstract background | Wood/plastic
In actual operation, the method adopts three main characteristics of texture, color distribution and position information for training and recognition. Given a region image X, the conditional probability of defining its class c is:
\log P(c \mid X, \theta) = \sum_i \psi_i(c_i, X; \theta_\psi) + \pi(c_i, X; \theta_\pi) + \lambda(c_i, X; \theta_\lambda) - \log Z(\theta, X)
the last four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential energy function is defined as ψ_i(c_i, X; θ_ψ) = log P(c_i | X, i), where P(c_i | X, i) is a normalized distribution function given by the Boost classifier.
The color potential energy function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i). Here, the invention uses a Gaussian Mixture Model (GMM) in CIELab color space to represent the color model: for a pixel color x in the given image, P(k | x) is the probability of the k-th mixture component, where μ_k and Σ_k denote the mean and the covariance of the k-th color cluster, respectively.
The position potential energy function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i). This potential is relatively weak compared with the first two potential energy functions; in its definition the class label of an image pixel is related only to its absolute position in the image.
The method is used for training 12 types of materials, then the probability of each pixel in a given image region for each category is calculated by adopting the formula, all pixels in the region are counted, and the category of each region is determined by adopting a voting mode. In the stylized rendering process, the selection of the paintbrush is determined by the material identified by the object area, and a foundation is laid for realizing automatic rendering.
2) Matching of dense feature points between key frames
After obtaining the semantic information on the key frames, the present invention needs to explore a matching algorithm between frames to effectively propagate the semantic information to the sequence frames.
The invention firstly provides comprehensive line drawing and texture and color mixed image template characteristics, and provides rich characteristic set and expression for image matching problems.
(a) Line-drawing features are represented with Gabor bases as F^{sk}(I_i) = ||⟨I_i, G_{cos,x,θ}⟩||² + ||⟨I_i, G_{sin,x,θ}⟩||², where G_{sin,x,θ} and G_{cos,x,θ} represent the sine and cosine Gabor bases with orientation θ at position x, respectively. The feature probability distribution is expressed as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{sk})} \exp\{\lambda_i^{sk}\, h^{sk}[F^{sk}(I_i)]\}

where λ_i^{sk} represents the parameter of θ_i, h^{sk} is a sigmoid function, and Z(λ_i^{sk}) is a normalizing constraint. The model therefore encourages edges whose response is stronger than the background distribution.
(b) Texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; h_j^{txt} represents the j-th direction of the HOG, F^{txt}(I_i) represents the descriptor corresponding to the i-th feature I_i, and the reference value is the mean of F^{txt}(I_i) over all positive samples. The present invention represents the probabilistic model of the feature as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{txt})} \exp\{\lambda_i^{txt} \sum_j h_j^{txt}[F^{txt}(I_i)]\}

where λ_i^{txt} is the parameter of θ_i. It can be seen that the model encourages responses to a relatively large collection of feature image blocks.
(c) Color features are described in terms of simple pixel brightness; F_{xj}^{fl}(I_i) is the filter response at position x. According to the invention, the pixel brightness values are quantized into statistical bins, so the model can be simplified as:

\frac{P(I_i \mid B_i; \theta_i)}{q(I_i)} = \frac{1}{Z(\lambda_i^{fl})} \exp\{\sum_j \lambda_i^{fl}\, h_{xj}^{fl}[F_{xj}^{fl}(I_i)]\}
according to the invention, by combining the similar small features of the images, a feature combination with strong discrimination can be obtained locally. Firstly, the image is segmented to obtain a plurality of tiny image blocks in the image. And extracting statistical characteristics capable of describing line drawing, texture and color from the small image block. In order to effectively obtain the feature combination, an iterative region growing and model learning algorithm is adopted, the feature combination region is iteratively grown by continuously updating the feature model, and finally the feature combination with strong local discrimination is obtained.
Based on the expression, the invention models the matching problem of the moving object in the time domain and the space domain as a layered graph matching framework on a graph representation. The extracted mixed image template features serve as graph nodes, graph structures are built among frames, and edge connection relations among the graph nodes can be defined based on similarity and spatial positions among the features and object types to which the features belong.
The source graph and the target graph are denoted I_s and I_t, and U, V denote the mixed-template feature sets in I_s and I_t, respectively. Each feature point u ∈ U carries two labels: a layer label l(u) ∈ {1, 2, ..., K} and a matching (correspondence) label.

The vertex set of the graph structure is established from the candidate set C of high-scoring matches of each feature point in the source graph, and the edge set is constructed as E = E⁺ ∪ E⁻. Negative edges indicate that the connected candidates are mutually exclusive, and a "repulsive force" is defined between them; positive edges connect spatially adjacent, non-mutually-exclusive candidate feature points, with a weight that indicates how closely they cooperate and that depends on the spatial distance between v_i and v_j.
The graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K represents the number of objects in the source image. Taking G_s as an example, its partition is written Π = {g_0, g_1, ..., g_K}, where g_k is a subgraph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the k-th subgraph of G_T is denoted V_k. The matching relation between G_s and G_T is then expressed as a set of subgraph correspondences, and assuming the matchings between subgraphs are mutually independent, the overall matching probability factorizes over the subgraph pairs.

In the invention, the similarity between a matched subgraph pair (g_k, g_k') is measured by a geometric transformation together with an appearance measure, denoted Φ_k. In summary, the solution of the graph-structure matching problem can be written as:

W = (K, Π = {g_0, g_1, ..., g_K}, Ψ = {Φ_k}, Φ = {Φ_k})
under the Bayes theory framework, the invention describes the graph structure matching problem with the maximum posterior probability:
W^* = \arg\max_W p(W \mid G_s, G_T) = \arg\max_W p(W)\, p(G_s, G_T \mid W)
the present invention may solve the above equation by a Markov Chain Monte Carlo (MCMC) method. Meanwhile, for efficient calculation, the method explores a cluster sampling strategy, and quickly converges to a global optimal solution through efficient jumping in a solution space so as to achieve matching of inter-frame feature points.
3) Region competition segmentation algorithm
On the basis of obtaining the inter-frame stable matching relation, the invention provides an inter-frame matching-based regional competition propagation algorithm by mining the advantages of a regional competition mechanism in video segmentation. By using the image matching algorithm of the layered graph structure, the invention can determine the matching relationship between the characteristics of the previous frame and the current frame, the semantic information of the previous frame is transmitted to the current frame, and then the current frame is divided into a plurality of semantic areas by using the area competition division algorithm according to the characteristic information of each matching area.
Given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}

where R_i denotes a segmented region with homogeneous features (the regions are disjoint and together cover the image), θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i.
The number N of segmented regions is determined from the matching relation of features between the previous and the current frame. Let the set of small feature regions corresponding to the regions be S = {S_1, S_2, ..., S_N}; for each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)). According to the MDL principle, maximizing the posterior probability is converted into minimizing an energy function:
E[\Gamma, \{\theta_i\}] = -\log P(W \mid I) = \sum_{i=1}^{N} -\iint_{R_i} \log P(\theta_i \mid I(x, y))\, dx\, dy

where Γ = {Γ_1, ..., Γ_N} and Γ_i denotes the boundary contour of region R_i. The invention estimates the parameters {θ_i} and the boundaries Γ in an iterative manner, alternating between two stages and reducing the energy function in each stage, so that the final segmentation result of the whole image is learned and inferred step by step.
In the process of regional competition, each region continuously updates the characteristic probability distribution model of the region, simultaneously contends for ownership of pixel points according to the steepest descent principle, and updates respective boundary contour, so that each region continuously expands the range, and finally the image segmentation result of the current frame is obtained.
The specific iteration steps are as follows. In the first stage, Γ is fixed and {θ_i} is estimated from the current region segmentation state; the parameter θ_i under the current state is taken as its optimal solution θ_i^*, which minimizes the cost of describing each region, so the energy function translates into:

\theta_i^* = \arg\max_{\theta_i} \left\{ \iint_{R_i} \log P(\theta_i \mid I(x, y))\, dx\, dy \right\}, \quad \forall i \in [1, N]
In the second stage, {θ_i} is known and steepest descent is carried out on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions. For any point \vec{v} on the boundary contour Γ:

\frac{d\vec{v}}{dt} = -\frac{\delta E(\Gamma, \{\theta_i\})}{\delta \vec{v}} = \sum_{k \in Q(\vec{v})} \log P(\theta_k \mid I(\vec{v})) \cdot \vec{n}_k(\vec{v})

where Q(\vec{v}) is the set of regions whose boundary passes through \vec{v} and \vec{n}_k(\vec{v}) is the normal direction vector of Γ_k at the point \vec{v}. Which region the point \vec{v} belongs to depends on how well \vec{v} is described by each region's feature probability distribution model.
In order to determine the membership between each pixel point and each region, the invention provides an image segmentation algorithm based on a competition mechanism to rapidly complete image segmentation. The specific image segmentation algorithm process based on the competition mechanism is described as follows:
in the initialization stage, the initial parameters of various models are estimated according to the matched characteristic image blocks, the boundary points of all the characteristic image blocks are added into a queue to be determined, and the posterior probability that all the boundary points belong to various types is calculated.
In a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; and recalculating the posterior probabilities of all the boundary points belonging to the various types by using the newly obtained feature distribution models of the regions.
In this way, the boundary point whose move decreases the current energy the fastest is repeatedly selected from the pending queue and the corresponding boundaries are updated; at the same time, the feature distribution probability model of each region is updated according to the current segmentation state. The regions constrain one another and compete simultaneously for ownership of the image, until the energy function converges and the image is segmented into the regions.
2. Video stylization module
Video stylization is based on an interactive video semantic segmentation module. The selection of the brush is determined only by the material corresponding to the identified object region. The brush of the system of the invention is based on a professional painter to draw a large number of typical strokes on paper, then scanning and parameterizing are carried out, and finally a stroke library is established. For each image region rendering, a large brush is first used for priming, and then the brush size and opacity are gradually reduced to fine-delineate detailed portions of the object. During drawing, adopting a drawing strategy of firstly drawing the edge and then drawing the inside: drawing of each layer of image the invention starts with the edge first, draws along the line-drawn edge first, and aligns the brush according to the flow field. In video rendering, in order to ensure the continuity and stability of the brush in the time domain, the invention adopts the thin-plate spline interpolation technology to carry out the propagation of the brush strokes. In addition, in the process of spreading the pen strokes, the area of the pen stroke area is calculated, and a pen stroke deleting and adding mechanism is designed. And the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
(1) Key frame non-photorealistic drawing technology based on semantic analysis
How to design stroke models with different artistic styles is one of the central concerns of video stylization; works in different artistic forms have their own characteristics of stroke expression. In video stylization, the basic drawing strategy of the invention is to select suitable strokes based on image content, and the stroke library is established by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized. A brush $B_n$ to be drawn contains the following information: the class label $l_n$ of the brush, the placement area $\Lambda_n$, the color map $C_n$, the transparency field $\alpha_n$, the height field $H_n$, and the control points $\{P_{ni}\}$; that is:
$$B_n=\{l_n,\Lambda_n,C_n,\alpha_n,H_n,\{P_{ni}\}\}$$
When designing the stroke model, the invention considers not only low-level information such as the shape and texture of the stroke but also incorporates high-level semantic information, so that each interpretation region of the image/video has a "pen" to rely on during rendering. This is one of the keys that distinguishes the rendering algorithm of the invention from traditional stroke-based rendering algorithms. When strokes are selected, the interpretation region category is used as a keyword, so that a batch of strokes of the same category can be selected simply and quickly from the stroke library, and one of them is then chosen at random.
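As a purely illustrative sketch (not from the original disclosure), the brush record B_n and the category-keyed random selection described above could be organized as follows; the field names and the Brush/pick_brush identifiers are assumptions, not the patent's API.

import random
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Brush:
    """One parameterized stroke B_n from the stroke library."""
    category: str                 # l_n: semantic class the stroke was drawn for, e.g. "sky"
    area: np.ndarray              # Lambda_n: binary mask of the placement area
    color_map: np.ndarray         # C_n: scanned color texture of the stroke
    alpha: np.ndarray             # alpha_n: transparency field
    height: np.ndarray            # H_n: height field used for lighting/impasto
    control_points: list = field(default_factory=list)  # {P_ni}: backbone control points

def pick_brush(library, region_category, rng=random):
    """Select one stroke at random among those whose class matches the region's material."""
    candidates = [b for b in library if b.category == region_category]
    if not candidates:
        raise ValueError(f"no strokes of class '{region_category}' in the library")
    return rng.choice(candidates)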
To simulate the "alignment" principle of oil painting, the invention borrows from the primal sketch model: within each region $R_i$, its primal sketch $SK_i$ is computed. The sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing. During rendering, different brushes are overlaid on these primitives to produce the desired artistic effect. An interpretation region $R_i$ with image domain $\Lambda_i$ is divided into a line-drawing part $\Lambda_i^{sk}$, which describes the sketch, and a non-line-drawing part $\Lambda_i^{nsk}$, which describes the remaining homogeneous structural region. The direction field $\Theta_i$ of $R_i$ is defined as
$$\Theta_i=\{\theta(x,y)\mid\theta(x,y)\in[0,\pi),\ \forall(x,y)\in\Lambda_i\}$$
where the initial values of the direction field $\Theta_i$ are taken along the gradient directions of the line-drawing part $\Lambda_i^{sk}$; the directions are then propagated to the non-line-drawing part $\Lambda_i^{nsk}$ with a diffusion equation.
The process of rendering a key frame is a process of repeatedly selecting strokes and placing strokes. Taking an interpretation region $R_i$ as an example, the invention first renders its non-line-drawing part $\Lambda_i^{nsk}$ and then its line-drawing part $\Lambda_i^{sk}$; this guarantees that where rendered regions overlap, the strokes of the line-drawing part stay on top. In the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its center, is diffused to both sides along the direction field to generate a flow-shaped area. With the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area. Rendering of the line-drawing part of the region proceeds similarly.
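Purely as an illustration (not part of the original disclosure), the following sketch shows one simple way to initialize a direction field on the line-drawing pixels and propagate it into the rest of the region. An explicit neighbour-averaging iteration stands in for the diffusion equation, and orientations are averaged on the doubled angle so that 0 and π are identified; all names and the iteration count are assumptions.

import numpy as np

def diffuse_direction_field(theta_sketch, sketch_mask, region_mask, n_iters=200):
    """Propagate orientations from the line-drawing pixels into the whole region.

    theta_sketch: orientations in [0, pi), valid where sketch_mask is True.
    """
    # Represent each orientation as a unit vector of the doubled angle.
    u = np.where(sketch_mask, np.cos(2 * theta_sketch), 0.0)
    v = np.where(sketch_mask, np.sin(2 * theta_sketch), 0.0)
    free = region_mask & ~sketch_mask        # pixels to be filled by diffusion
    for _ in range(n_iters):
        # 4-neighbour average: one explicit diffusion step.
        u_avg = 0.25 * (np.roll(u, 1, 0) + np.roll(u, -1, 0) + np.roll(u, 1, 1) + np.roll(u, -1, 1))
        v_avg = 0.25 * (np.roll(v, 1, 0) + np.roll(v, -1, 0) + np.roll(v, 1, 1) + np.roll(v, -1, 1))
        # Keep sketch pixels fixed (Dirichlet condition), update only the free pixels.
        u = np.where(free, u_avg, u)
        v = np.where(free, v_avg, v)
    theta = 0.5 * np.mod(np.arctan2(v, u), 2 * np.pi)   # back to [0, pi)
    return np.where(region_mask, theta, 0.0)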
(2) Stroke propagation algorithm for sequence frames
In the invention, the rendering of non-key frames is obtained by "propagating" the rendering result of the key frame. The basis of this propagation is the spatio-temporal correspondence of the interpretation regions. As an interpretation region changes more and more during propagation, strokes may gradually leak outside the region, while unrendered gaps may appear inside it. Therefore, when propagating the stroke map, the mechanisms for adding and deleting strokes must be considered at the same time; otherwise the rendering result will exhibit jitter. The propagation, deletion and addition mechanisms of strokes are described in turn below.
(d) Stroke propagation: let $R_i(t)$ denote an interpretation region of the key frame at time t of the video, and $R_i(t+1)$ the region corresponding to $R_i(t)$ at time t+1; their image areas are denoted $\Lambda_i(t)$ and $\Lambda_i(t+1)$ respectively. Let $P_{ij}(t)$ and $P_{ij}(t+1)$ denote the dense matching points of $\Lambda_i(t)$ and $\Lambda_i(t+1)$ in the time domain (computed during video interpretation), and assume that $R_i(t+1)$ can be expressed as a non-rigid transformation of $R_i(t)$. When propagating strokes, the invention requires that every matching point $P_{ij}(t)$ on $\Lambda_i(t)$ be mapped to the corresponding matching point $P_{ij}(t+1)$ of the new image region $\Lambda_i(t+1)$ in frame t+1. Based on this consideration, the invention selects a thin-plate spline (TPS) interpolation model: it maps the key points $P_{ij}(t)$ of $\Lambda_i(t)$ onto the matching points $P_{ij}(t+1)$ of $\Lambda_i(t+1)$ and, for the remaining pixels of $\Lambda_i(t)$, the TPS minimizes an energy function so that the pixel grid of $\Lambda_i(t)$ is warped by an elastic (non-rigid) deformation.
(e) Stroke deletion: because the region corresponding to some strokes keeps shrinking as they are propagated through the video, or because of occlusion relationships, or because a stroke has been propagated over too many frames, the invention removes a stroke when the area of its corresponding region falls below a given threshold. Likewise, a propagated stroke is deleted when it falls outside the boundary of its corresponding region.
(f) Stroke addition: when a new semantic region appears, or an existing semantic region grows (for example, clothing unfolding), the invention must add new strokes to cover the new area; to fill small gaps between strokes it suffices to slightly change the size and position of neighbouring strokes. If an area not covered by any stroke grows beyond a given threshold, the system automatically creates a new stroke to cover it. Even so, the invention does not draw a stroke on a gap the moment it first appears; instead a relatively high threshold is set and rendering of newly appearing regions is delayed until they have grown large enough. A general stroke placement algorithm is then used to fill the gaps that have reached the threshold, and finally these new strokes are propagated and transformed backwards in time to fill the earlier, not-yet-rendered gap regions. Filling strokes backwards avoids frequent stroke changes and links small, fragmented strokes to larger ones, thereby reducing flicker and other undesirable visual artefacts. Moreover, since the new strokes are added at the bottom layer and drawn underneath existing strokes, the visual flicker effect is reduced further.
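The stroke-propagation step can be illustrated with the following sketch (not part of the original disclosure), which fits a thin-plate spline to the dense matches with SciPy's RBFInterpolator and drops strokes whose warped control points mostly leave the target region. The inside-ratio test is a simplification of the area-threshold deletion rule, and all names and thresholds are assumptions.

import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_strokes(matches_t, matches_t1, strokes_t, region_mask_t1, min_inside=0.5):
    """Warp stroke control points from frame t to frame t+1 with a thin-plate spline.

    matches_t, matches_t1 : (N, 2) arrays of dense matched points P_ij(t), P_ij(t+1), as (x, y).
    strokes_t             : list of (M_i, 2) arrays of control points of each stroke in frame t.
    region_mask_t1        : boolean mask (H, W) of the interpretation region in frame t+1.
    Strokes whose warped control points mostly leave the region are dropped (deletion rule).
    """
    # TPS mapping fitted on the matched points; slight smoothing tolerates match noise.
    tps = RBFInterpolator(matches_t, matches_t1, kernel='thin_plate_spline', smoothing=1.0)
    h, w = region_mask_t1.shape
    kept = []
    for pts in strokes_t:
        warped = tps(pts)                                   # control points in frame t+1
        xs = np.clip(np.round(warped[:, 0]).astype(int), 0, w - 1)
        ys = np.clip(np.round(warped[:, 1]).astype(int), 0, h - 1)
        inside_ratio = region_mask_t1[ys, xs].mean()
        if inside_ratio >= min_inside:                      # otherwise delete the stroke
            kept.append(warped)
    return kept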
(3) Damping brush system for preventing shaking
The final step in stylizing the video is the anti-shake operation. The invention connects adjacent paintbrushes in the time domain and the space domain by springs to simulate a damping system. By minimizing the energy of the system, the effect of removing jitter is achieved.
For the i-th brush at time t, the invention uses $A_{i,t}=(x_{i,t},y_{i,t},s_{i,t})$ to represent the geometric attributes of its center coordinates and size, and denotes its initial value by $A_{i,t}^0$. The energy function of the damped brush system is defined as follows:
$$E=E_{data}+\lambda_1 E_{smooth1}+\lambda_2 E_{smooth2}$$
$\lambda_1$ and $\lambda_2$ are weights; in the experiments the invention sets $\lambda_1=2.8$ and $\lambda_2=1.1$.
The first term constrains the position of the brush not to be too far from the initial position:
$$E_{data}=\sum_{i,t}\big(A_{i,t}-A_{i,t}^0\big)^2$$
the second term in the equation is the smoothing constraint on brush i in the time domain:
$$E_{smooth1}=\sum_{i,t}\big(A_{i,t+1}-2A_{i,t}+A_{i,t-1}\big)^2$$
The third term smoothly constrains adjacent brushes in both the time domain and the space domain. Let $A_{j,t}$ denote any brush adjacent to the i-th brush at time t, and record the relative distance and size difference between them as $\Delta A_{i,j,t}=A_{i,t}-A_{j,t}$; the smoothing term is then defined as:
$$E_{smooth2}=\sum_{i,j,t}\big(\Delta A_{i,j,t+1}-2\Delta A_{i,j,t}+\Delta A_{i,j,t-1}\big)^2$$
The energy minimization problem is solved by the Levenberg-Marquardt algorithm.
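As an illustration only (not part of the original disclosure), the damped-brush energy above can be minimized with a standard Levenberg-Marquardt least-squares solver by stacking the three terms as residuals; the array layout, the neighbour list and the helper name below are assumptions, and at least three frames are assumed so that second differences exist.

import numpy as np
from scipy.optimize import least_squares

def smooth_brushes(A0, neighbours, lam1=2.8, lam2=1.1):
    """Anti-jitter smoothing of brush attributes by damped-spring energy minimization.

    A0         : (T, B, 3) initial attributes (x, y, s) of B brushes over T frames (T >= 3).
    neighbours : list of (i, j) index pairs of spatially adjacent brushes.
    Returns the smoothed attributes, found with Levenberg-Marquardt.
    """
    T, B, _ = A0.shape

    def residuals(flat):
        A = flat.reshape(T, B, 3)
        res = [(A - A0).ravel()]                                            # data term: stay near A0
        res.append(np.sqrt(lam1) * (A[2:] - 2 * A[1:-1] + A[:-2]).ravel())  # temporal smoothness
        for i, j in neighbours:                                             # pairwise spring term
            d = A[:, i] - A[:, j]
            res.append(np.sqrt(lam2) * (d[2:] - 2 * d[1:-1] + d[:-2]).ravel())
        return np.concatenate(res)

    sol = least_squares(residuals, A0.ravel(), method='lm')
    return sol.x.reshape(T, B, 3)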

Claims (9)

1. An interactive video stylized rendering method based on video interpretation is characterized by comprising an interactive video semantic segmentation module and a video stylization module.
The segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) performing area competition segmentation;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) stroke propagation of sequence frames;
6) anti-shake processing with a damped brush system.
The method comprises the steps of sequentially using an interactive video semantic segmentation module and a video stylization module for stylizing a video, namely performing semantic segmentation on the video by using the interactive video semantic segmentation module, and performing stylized rendering on the segmented video by using the video stylization module.
2. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the interactive segmentation and automatic identification method of key frame images of the above step 1) is as follows:
dividing the divided semantic regions into twelve classes according to different material properties of the semantic regions, wherein the classes comprise sky/cloud, mountain/land, rock/building, leaves/tree bundle, hair/hair, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features of texture, color distribution and position information are adopted for training and recognition, a region image X is given, and the conditional probability of the category c is defined as follows:
$$\log P(c\mid X,\theta)=\sum_i\Big[\psi_i(c_i,X;\theta_\psi)+\pi(c_i,X;\theta_\pi)+\lambda(c_i,X;\theta_\lambda)\Big]-\log Z(\theta,X)$$
(formula 1)
The last four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential energy function is defined as $\psi_i(c_i,X;\theta_\psi)=\log P(c_i\mid X,i)$, where $P(c_i\mid X,i)$ is a normalized distribution given by the Boost classifier;
the color potential energy function is defined as $\pi(c_i,X;\theta_\pi)=\log\sum_k\theta_\pi(c_i,k)P(k\mid x_i)$, using a Gaussian mixture model (GMM) in CIELab color space to represent the color model; for a pixel color x in a given image, the conditional probability of the k-th color cluster is
$$P(k\mid x)\propto\mathcal{N}(x\mid\mu_k,\Sigma_k)$$
where $\mu_k$ and $\Sigma_k$ respectively denote the mean and covariance of the k-th color cluster;
the position potential energy function is defined as $\lambda(c_i,X;\theta_\lambda)=\log\theta_\lambda(c_i,i)$; this potential is relatively weak compared with the first two, and in its definition the class label of an image pixel depends only on its absolute position in the image;
training 12 types of materials by using the method, giving the probability of each pixel in an image region for each type by adopting formula 1, counting all pixels in the region, and determining the type of each region by adopting a voting mode; in the stylized rendering process, the selection of the paintbrush is determined by the material identified by the object area, and a foundation is laid for realizing automatic rendering.
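For illustration only (this sketch is not part of the claim), the CIELab color model of claim 2 can be exercised as follows: a shared Gaussian mixture supplies P(k | x), and the color potential combines it with learned class-cluster weights θ_π. The helper names are assumptions, the conversion of pixels to CIELab is not shown, and the learning of θ_π itself is omitted.

import numpy as np
from sklearn.mixture import GaussianMixture

def fit_color_clusters(lab_pixels, n_clusters=32, seed=0):
    """Fit the shared GMM over CIELab pixel colors (the color clusters k)."""
    return GaussianMixture(n_components=n_clusters, covariance_type='full',
                           random_state=seed).fit(lab_pixels)

def color_potential(lab_pixels, gmm, theta_pi):
    """pi(c, x) = log sum_k theta_pi(c, k) * P(k | x) for every pixel and class.

    lab_pixels : (N, 3) CIELab colors (conversion from RGB done elsewhere).
    theta_pi   : (n_classes, n_clusters) learned class-cluster weights (rows sum to 1).
    Returns an (N, n_classes) array of color potentials.
    """
    resp = gmm.predict_proba(lab_pixels)            # P(k | x), shape (N, n_clusters)
    mix = resp @ theta_pi.T                         # sum_k theta_pi(c, k) * P(k | x)
    return np.log(mix + 1e-12)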
3. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the matching method of dense feature points among the key frames of the above step 2) is as follows:
after semantic information on the key frame is obtained, line drawing characteristics and texture and color mixed image template characteristics are integrated, and rich characteristic sets and expressions are provided for image matching problems;
11) the line-drawing features are represented with a Gabor basis as $F^{sk}(I_i)=\|\langle I_i,G_{\cos,x,\theta}\rangle\|^2+\|\langle I_i,G_{\sin,x,\theta}\rangle\|^2$, where $G_{\sin}$ and $G_{\cos}$ respectively denote the sine and cosine Gabor bases at position x in direction θ; the feature probability distribution is expressed as an exponential model in which a coefficient plays the role of the parameter $\theta_i$, $h^{sk}$ is a sigmoid function and a partition function provides the normalization constraint, so the model encourages corresponding edges that are stronger than under the background distribution;
12) the texture features are modeled with a simplified histogram of oriented gradients (HOG) whose 6 feature dimensions represent different gradient directions; $F^t_j$ denotes the j-th direction of the HOG, $F^t_i$ the descriptor corresponding to the i-th feature, and $\bar F^t$ the mean of $F^t$ over all positive samples; the probability model of the feature is again an exponential model with parameter $\theta_i$, and it encourages the responses over the collection of feature image blocks to be relatively large;
13) the color features are described by simple pixel brightness; the brightness values are quantized into statistical intervals, so that the corresponding model simplifies to a histogram over those intervals;
similar small image features are combined to obtain feature combinations with strong local discrimination: the image is first segmented into many tiny image blocks, statistical features describing line drawing, texture and color are extracted from these blocks, and the feature combinations are then obtained efficiently with an iterative region-growing and model-learning algorithm that alternately updates the feature models and grows the feature-combination regions;
on the basis of this representation, the matching problem of moving targets over the time and space domains is modeled as a layered graph matching framework on a graph representation: the extracted mixed image template features are taken as graph nodes, a graph structure is constructed between frames, and the edge connections between graph nodes are defined according to the similarity and spatial position of the features and the category of the object to which they belong;
let $I_s$ and $I_t$ denote the source image and the target image, and U, V the sets of mixed template features in $I_s$ and $I_t$; each feature point $u\in U$ carries two labels: a layer label $l(u)\in\{1,2,\dots\}$ and a label indicating its matching candidate in V;
the vertex set of the graph structure is established from the candidate set C of the best-matching candidates of each feature point in the source image, and the edge set is constructed as $E=E^+\cup E^-$; negative edges indicate that the connected candidates exclude each other, and a "repulsive force" is defined between them; positive edges connect spatially adjacent and non-mutually-exclusive candidate feature points, with a weight indicating how closely they cooperate, which depends on the spatial distance between $v_i$ and $v_j$;
the graph structures $G_s$ and $G_T$ of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image; taking $G_s$ as an example, the partition is written $\Pi=\{g_0,g_1,\dots,g_K\}$, where $g_k$ is a subgraph of $G_s$ whose vertex set is denoted $U_k$; similarly, the vertex set of the corresponding layer of $G_T$ is denoted $V_k$; the matching relation between $G_s$ and $G_T$ is then expressed layer by layer and, assuming the matchings between subgraphs are mutually independent, the overall matching probability factorizes over the layers;
the similarity between a matched subgraph pair $(g_k,g_k')$ is measured through a geometric transformation and an appearance measure $\Phi_k$; in summary, the solution of the graph structure matching problem can be configured as
$$W=(K,\ \Pi=\{g_0,g_1,\dots,g_K\},\ \Psi=\{\Psi_k\},\ \Phi=\{\Phi_k\})$$
under the Bayesian framework, the graph structure matching problem is described as maximizing the posterior probability
$$W^*=\arg\max p(W\mid G_s,G_T)=\arg\max p(W)\,p(G_s,G_T\mid W)$$
the above formula is solved with a Markov chain Monte Carlo (MCMC) method; for efficient computation, efficient jumps in the solution space make it converge quickly to the global optimum, thereby achieving the matching of inter-frame feature points.
4. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the region competition segmentation method of the above step 3) is as follows:
on the basis of obtaining a stable matching relation between frames, the matching relation between the characteristics of a previous frame and the characteristics of a current frame can be determined by mining the advantages of a region competition mechanism in video segmentation and utilizing an image matching algorithm of a layered graph structure, so that the semantic information of the previous frame is spread to the current frame, and then the current frame is segmented into a plurality of semantic regions by utilizing a region competition segmentation algorithm according to the characteristic information of each matching region;
given an image I, the corresponding image segmentation solution is defined as:
$$W=\{(R_1,R_2,\dots,R_N),(\theta_1,\theta_2,\dots,\theta_N),(l_1,l_2,\dots,l_N)\}$$
where $R_i$ denotes a segmented region with homogeneous features, $\theta_i$ denotes the parameters of the feature probability distribution model of region $R_i$, and $l_i$ denotes the label of region $R_i$;
the number N of segmented regions is determined from the matching relation of the features between the previous and the current frame; let $S=\{S_1,S_2,\dots,S_N\}$ be the set of feature patches corresponding to the regions; for each region $R_i$, the initial model parameters $\theta_i$ are estimated from the feature patches $S_i$ it occupies, giving an initial posterior probability $P(\theta_i\mid I(x,y))$; according to the MDL principle, maximizing the posterior probability is converted into minimizing an energy function $E(\Gamma,\{\theta_i\})$ that sums, over all regions, the coding cost of the boundary contour $\Gamma_i$ of region $R_i$ and the term $-\iint_{R_i}\log P(\theta_i\mid I(x,y))\,dx\,dy$; the parameters $\{\theta_i\}$ are estimated in stages in an iterative manner, the two stages alternating and the energy function decreasing in each stage, so that the final segmentation result of the whole image is continuously learned and inferred;
in the regional competition process, continuously updating a characteristic probability distribution model of each region, simultaneously competing for ownership of pixel points according to the steepest descent principle, and updating respective boundary contour, so that each region continuously expands the range, and finally obtaining an image segmentation result of the current frame;
the specific iteration steps are as follows: in the first stage, Γ is fixed and $\{\theta_i\}$ is estimated from the current region segmentation state; the parameter $\theta_i^*$ obtained in the current state is taken as the optimal solution, so that the cost of describing each region is minimized and the energy function reduces, for each region, to a maximum-likelihood estimation of $\theta_i$ over the pixels of $R_i$;
in the second stage, $\{\theta_i\}$ is known and steepest descent is performed on Γ; to obtain the minimum of the energy function quickly, the steepest-descent motion equation is solved for the boundaries Γ of all regions; for any point $\vec{v}$ on the boundary contour Γ,
$$\frac{d\vec{v}}{dt}=-\frac{\delta E(\Gamma,\{\theta_i\})}{\delta\vec{v}}=\sum_{k\in Q(\vec{v})}\log P\big(\theta_k\mid I(\vec{v})\big)\cdot\vec{n}_k(\vec{v})$$
where $Q(\vec{v})$ is the set of regions whose boundary $\Gamma_k$ passes through $\vec{v}$ and $\vec{n}_k(\vec{v})$ is the normal direction vector of $\Gamma_k$ at the point $\vec{v}$; which region the point $\vec{v}$ belongs to depends on how well $\vec{v}$ fits the feature probability distribution model of each region;
to determine the membership between each pixel point and the regions, the competition-based image segmentation algorithm proceeds as follows:
in the initialization stage, the initial parameters of the region models are estimated from the matched feature image blocks, the boundary points of all feature image blocks are added to a pending queue, and the posterior probability of every boundary point belonging to each class is computed;
in the loop iteration stage, the boundary point i producing the steepest decrease of the current energy is selected from the pending queue and every boundary on which it lies is updated; then, under the current segmentation state, the model parameters of each region are re-estimated by maximum likelihood, and the posterior probabilities of all boundary points with respect to each class are recomputed with the newly obtained region feature distribution models;
in this way, the boundary point with the steepest current energy decrease is repeatedly selected from the pending queue to update the corresponding boundary, while the feature distribution probability model of each region is updated in step with the current segmentation state; the regions constrain one another and compete simultaneously for ownership of the image until the energy function converges, whereby the image is segmented into regions.
5. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylization method step 4) of the video stylization module (2) is based on an interactive video semantic segmentation module, the selection of the brush being determined only by the material corresponding to the identified object region;
the brushes are based on a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized to establish a stroke library; each image region is rendered by first laying an underpainting with a large brush and then gradually reducing the brush size and opacity to depict the detailed parts of the object finely; during drawing, an edge-first strategy is adopted: each image layer starts from the edges, drawing first along the line-drawing edges and aligning the brush with the flow field;
in the video rendering, in order to ensure the continuity and stability of the brush in the time domain, a thin-plate spline interpolation technology is adopted to carry out the propagation of the brush strokes, and in addition, in the propagation process of the brush strokes, the deletion and addition mechanisms of the brush strokes are designed by calculating the area of the brush stroke areas; and the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
6. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the key frame non-photorealistic rendering method based on semantic parsing in step 4) of the stylization method of the video stylization module (2) is as follows:
how to design stroke models with different artistic styles is one of the central concerns of video stylization, and works in different artistic forms have their own characteristics of stroke expression; in video stylization the basic drawing strategy is to select suitable strokes based on image content, and the stroke library is established by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized; a brush $B_n$ to be drawn contains the following information: the class label $l_n$ of the brush, the placement area $\Lambda_n$, the color map $C_n$, the transparency field $\alpha_n$, the height field $H_n$ and the control points $\{P_{ni}\}$, namely:
$$B_n=\{l_n,\Lambda_n,C_n,\alpha_n,H_n,\{P_{ni}\}\}$$
when designing the stroke model, not only low-level information such as the shape and texture of the stroke is considered but high-level semantic information of the stroke is also incorporated, so that each interpretation region of the image/video has a "pen" to rely on during rendering; when strokes are selected, the category of the interpretation region is used as a keyword, a batch of strokes of the same category is selected simply and quickly from the stroke library, and one stroke is then chosen from them at random;
to simulate the "alignment" principle of oil painting, the primal sketch model is borrowed: within each region $R_i$ its primal sketch $SK_i$ is computed; the sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing; during rendering, different brushes are overlaid on these primitives to produce the desired artistic effect; an interpretation region $R_i$ with image domain $\Lambda_i$ is divided into a line-drawing part $\Lambda_i^{sk}$, which describes the sketch, and a non-line-drawing part $\Lambda_i^{nsk}$, which describes the remaining homogeneous structural region; the direction field $\Theta_i$ of $R_i$ is defined as
$$\Theta_i=\{\theta(x,y)\mid\theta(x,y)\in[0,\pi),\ \forall(x,y)\in\Lambda_i\}$$
where the initial values of the direction field $\Theta_i$ are taken along the gradient directions of the line-drawing part $\Lambda_i^{sk}$, and the directions are then propagated to the non-line-drawing part $\Lambda_i^{nsk}$ with a diffusion equation;
the rendering process of a key frame is a process of repeatedly selecting and placing strokes; taking an interpretation region $R_i$ as an example, its non-line-drawing part $\Lambda_i^{nsk}$ is rendered first and its line-drawing part $\Lambda_i^{sk}$ afterwards, which guarantees that where rendered regions overlap the strokes of the line-drawing part stay on top; in the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its center, is diffused to both sides along the direction field to generate a flow-shaped area; with the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area; rendering of the line-drawing part of the region proceeds similarly.
7. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylized method step 5) of the video stylized module (2) is a stroke propagation method of the sequence frames as follows:
the rendering of non-key frames is obtained by "propagating" the rendering result of the key frame, the basis of the propagation being the spatio-temporal correspondence of the interpretation regions; during propagation, as the interpretation region changes more and more, strokes may gradually leak outside the region while unrendered gaps appear inside it, so when propagating the stroke map the mechanisms for adding and deleting strokes must be considered at the same time, otherwise the rendering result will exhibit jitter; the propagation, addition and deletion mechanisms of strokes are as follows:
stroke propagation: let $R_i(t)$ denote an interpretation region of the key frame at time t of the video and $R_i(t+1)$ the region corresponding to $R_i(t)$ at time t+1, their image areas being denoted $\Lambda_i(t)$ and $\Lambda_i(t+1)$ respectively; let $P_{ij}(t)$ and $P_{ij}(t+1)$ denote the dense matching points of $\Lambda_i(t)$ and $\Lambda_i(t+1)$ in the time domain (computed during video interpretation); assume that $R_i(t+1)$ can be expressed as a non-rigid transformation of $R_i(t)$; when propagating strokes, every matching point $P_{ij}(t)$ on $\Lambda_i(t)$ is required to be mapped to the corresponding matching point $P_{ij}(t+1)$ of the new image region $\Lambda_i(t+1)$ in frame t+1; based on this consideration, a thin-plate spline (TPS) interpolation model is selected, which maps the key points $P_{ij}(t)$ of $\Lambda_i(t)$ onto the matching points $P_{ij}(t+1)$ of $\Lambda_i(t+1)$ and, for the remaining pixels of $\Lambda_i(t)$, minimizes an energy function so that the pixel grid of $\Lambda_i(t)$ is warped by an elastic (non-rigid) deformation;
stroke deletion: because the region corresponding to some strokes keeps shrinking as they are propagated through the video, or because of occlusion relationships, or because a stroke has been propagated over too many frames, a stroke is removed when the area of its corresponding region falls below a given threshold, and a propagated stroke is likewise deleted when it falls outside the boundary of its corresponding region;
adding strokes, when new semantic areas appear or existing semantic areas become larger and larger (such as unfolding of clothes), the invention must add new brushes to cover the new areas, and in order to fill gaps among the brushes, the invention only needs to simply change the size and the position of the adjacent brushes, and if the area which is not covered by the brushes becomes larger and exceeds a given threshold, the system automatically creates a new brush to cover the new brushes; nevertheless, it is not possible with the present invention to draw a stroke on the gap immediately when it first appears; thus, the present invention sets a relatively high threshold and delays rendering newly appearing regions until they grow large enough; then, the invention adopts a general brush placement algorithm to fill the large enough gaps reaching the threshold, and finally reversely propagates and transforms the new brushes to fill the gap areas which appear previously but are not rendered; the process of filling the paintbrush backwards can avoid frequently changing paintbrushes, and can link smaller and fragmented paintbrushes into larger paintbrushes, thereby reducing flicker effects and other undesirable visual effects caused by human factors; also, since the present invention adds new brushes at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
8. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the damping brush system for anti-shake in the stylized method step 6) of the video stylized module (2) is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
for the ith brush at time t, the invention uses
Figure RE-126810DEST_PATH_IMAGE103
Geometric attributes representing its center coordinates and size, and its initial values are noted(ii) a The energy function of the damped brush system is defined as follows:
Figure RE-DEST_PATH_IMAGE106
Figure RE-416977DEST_PATH_IMAGE107
and
Figure RE-DEST_PATH_IMAGE108
in order to be the weight, the weight is,
Figure RE-DEST_PATH_IMAGE110
the first term constrains the position of the brush not to be too far from the initial position:
Figure RE-DEST_PATH_IMAGE112
the second term in the equation is the smoothing constraint on brush i in the time domain:
Figure RE-DEST_PATH_IMAGE114
the third term in the formula smoothly constrains adjacent brush in both time domain and space domain; note the book
Figure RE-485875DEST_PATH_IMAGE115
For any adjacent brush, i.e. the ith brush at time t
Figure RE-DEST_PATH_IMAGE116
The relative distance difference and size difference between them are recorded as
Figure RE-844175DEST_PATH_IMAGE117
And the smoothing term is defined as follows:
Figure RE-316745DEST_PATH_IMAGE119
the energy minimization problem is solved by the Levenbergy-Marquard algorithm.
9. The interactive video stylized rendering method based on video interpretation according to claim 8, characterized in that the weights are set to $\lambda_1=2.8$ and $\lambda_2=1.1$.
CN201110302054XA 2011-09-30 2011-09-30 Interactive video stylized rendering method based on video interpretation Pending CN102542593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110302054XA CN102542593A (en) 2011-09-30 2011-09-30 Interactive video stylized rendering method based on video interpretation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110302054XA CN102542593A (en) 2011-09-30 2011-09-30 Interactive video stylized rendering method based on video interpretation

Publications (1)

Publication Number Publication Date
CN102542593A true CN102542593A (en) 2012-07-04

Family

ID=46349405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110302054XA Pending CN102542593A (en) 2011-09-30 2011-09-30 Interactive video stylized rendering method based on video interpretation

Country Status (1)

Country Link
CN (1) CN102542593A (en)



Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807198A (en) * 2010-01-08 2010-08-18 中国科学院软件研究所 Video abstraction generating method based on sketch
CN101853517A (en) * 2010-05-26 2010-10-06 西安交通大学 Real image oil painting automatic generation method based on stroke limit and texture
CN101930614A (en) * 2010-08-10 2010-12-29 西安交通大学 Drawing rendering method based on video sub-layer

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063876A (en) * 2014-01-10 2014-09-24 北京理工大学 Interactive image segmentation method
CN104063876B (en) * 2014-01-10 2017-02-01 北京理工大学 Interactive image segmentation method
CN103927372A (en) * 2014-04-24 2014-07-16 厦门美图之家科技有限公司 Image processing method based on user semanteme
CN104346789A (en) * 2014-08-19 2015-02-11 浙江工业大学 Fast artistic style study method supporting diverse images
CN104346789B (en) * 2014-08-19 2017-02-22 浙江工业大学 Fast artistic style study method supporting diverse images
CN106296567A (en) * 2015-05-25 2017-01-04 北京大学 The conversion method of a kind of multi-level image style based on rarefaction representation and device
CN106296567B (en) * 2015-05-25 2019-05-07 北京大学 A kind of conversion method and device of the multi-level image style based on rarefaction representation
CN104867183A (en) * 2015-06-11 2015-08-26 华中科技大学 Three-dimensional point cloud reconstruction method based on region growing
CN105719327B (en) * 2016-02-29 2018-09-07 北京中邮云天科技有限公司 A kind of artistic style image processing method
CN105719327A (en) * 2016-02-29 2016-06-29 北京中邮云天科技有限公司 Art stylization image processing method
CN105825531A (en) * 2016-03-17 2016-08-03 广州多益网络股份有限公司 Method and device for dyeing game object
CN105825531B (en) * 2016-03-17 2018-08-21 广州多益网络股份有限公司 A kind of colouring method and device of game object
CN106485223B (en) * 2016-10-12 2019-07-12 南京大学 The automatic identifying method of rock particles in a kind of sandstone microsection
CN106485223A (en) * 2016-10-12 2017-03-08 南京大学 The automatic identifying method of rock particles in a kind of sandstone microsection
CN107277615A (en) * 2017-06-30 2017-10-20 北京奇虎科技有限公司 Live stylized processing method, device, computing device and storage medium
CN107277615B (en) * 2017-06-30 2020-06-23 北京奇虎科技有限公司 Live broadcast stylization processing method and device, computing device and storage medium
CN110738715A (en) * 2018-07-19 2020-01-31 北京大学 automatic migration method of dynamic text special effect based on sample
CN110738715B (en) * 2018-07-19 2021-07-09 北京大学 Automatic migration method of dynamic text special effect based on sample
CN109816663A (en) * 2018-10-15 2019-05-28 华为技术有限公司 A kind of image processing method, device and equipment
US12026863B2 (en) 2018-10-15 2024-07-02 Huawei Technologies Co., Ltd. Image processing method and apparatus, and device
CN109741413A (en) * 2018-12-29 2019-05-10 北京金山安全软件有限公司 Rendering method and device for semitransparent objects in scene and electronic equipment
CN109741413B (en) * 2018-12-29 2023-09-19 超级魔方(北京)科技有限公司 Rendering method and device of semitransparent objects in scene and electronic equipment
CN111722896B (en) * 2019-03-21 2021-09-21 华为技术有限公司 Animation playing method, device, terminal and computer readable storage medium
CN111722896A (en) * 2019-03-21 2020-09-29 华为技术有限公司 Animation playing method, device, terminal and computer readable storage medium
CN110288625A (en) * 2019-07-04 2019-09-27 北京字节跳动网络技术有限公司 Method and apparatus for handling image
CN110446066A (en) * 2019-08-28 2019-11-12 北京百度网讯科技有限公司 Method and apparatus for generating video
CN110446066B (en) * 2019-08-28 2021-11-19 北京百度网讯科技有限公司 Method and apparatus for generating video
CN113128498A (en) * 2019-12-30 2021-07-16 财团法人工业技术研究院 Cross-domain picture comparison method and system
CN112017179A (en) * 2020-09-09 2020-12-01 杭州时光坐标影视传媒股份有限公司 Method, system, electronic device and storage medium for evaluating visual effect grade of picture
CN113256484A (en) * 2021-05-17 2021-08-13 百果园技术(新加坡)有限公司 Method and device for stylizing image
CN113256484B (en) * 2021-05-17 2023-12-05 百果园技术(新加坡)有限公司 Method and device for performing stylization processing on image
CN116761018A (en) * 2023-08-18 2023-09-15 湖南马栏山视频先进技术研究院有限公司 Real-time rendering system based on cloud platform
CN116761018B (en) * 2023-08-18 2023-10-17 湖南马栏山视频先进技术研究院有限公司 Real-time rendering system based on cloud platform

Similar Documents

Publication Publication Date Title
CN102542593A (en) Interactive video stylized rendering method based on video interpretation
Kelly et al. FrankenGAN: guided detail synthesis for building mass-models using style-synchonized GANs
Hartmann et al. Streetgan: Towards road network synthesis with generative adversarial networks
Zhao et al. Achieving good connectivity in motion graphs
CN106920243A (en) The ceramic material part method for sequence image segmentation of improved full convolutional neural networks
CN110222722A (en) Interactive image stylization processing method, calculates equipment and storage medium at system
CN103218846B (en) The ink and wash analogy method of Three-dimension Tree model
Yu et al. Modern machine learning techniques and their applications in cartoon animation research
Cao et al. Difffashion: Reference-based fashion design with structure-aware transfer by diffusion models
CN103854306A (en) High-reality dynamic expression modeling method
Fan et al. Structure completion for facade layouts.
CN108242074B (en) Three-dimensional exaggeration face generation method based on single irony portrait painting
Tang et al. Animated construction of Chinese brush paintings
Xie et al. Stroke-based stylization learning and rendering with inverse reinforcement learning
KR20230085931A (en) Method and system for extracting color from face images
CN102270345A (en) Image feature representing and human motion tracking method based on second-generation strip wave transform
Tong et al. Sketch generation with drawing process guided by vector flow and grayscale
Yang et al. Brushwork master: Chinese ink painting synthesis for animating brushwork process
Guo Design and development of an intelligent rendering system for new year's paintings color based on b/s architecture
Xie et al. Stroke-based stylization by learning sequential drawing examples
CN104091318B (en) A kind of synthetic method of Chinese Sign Language video transition frame
Jiang et al. Animation scene generation based on deep learning of CAD data
Jia et al. Facial expression synthesis based on motion patterns learned from face database
Fu et al. PlanNet: A Generative Model for Component-Based Plan Synthesis
Wang et al. AI Promotes the Inheritance and Dissemination of Chinese Boneless Painting——Research on Design Practice from Interdisciplinary Collaboration

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120704