CN102542593A - Interactive video stylized rendering method based on video interpretation
Abstract
The invention relates to an interactive video stylized rendering method based on video interpretation, which uses an interactive video semantic segmentation module and a video stylization module. The segmentation method of the interactive video semantic segmentation module comprises the following steps: (1) interactive segmentation and automatic identification of key frame images; (2) matching of dense feature points among key frames; and (3) region competition segmentation. The stylization method of the video stylization module comprises the following steps: (4) non-photorealistic rendering of the key frames based on semantic analysis; (5) a stroke propagation method for the sequence frames; and (6) a damping brush system for anti-shake. The interactive video stylized rendering method based on video interpretation disclosed by the invention has the advantages of a short production cycle and low cost, and is well suited to batch production.
Description
Technical Field
The invention discloses an interactive video stylized rendering method based on video interpretation, and belongs to the technical field of video stylized rendering driven by video interpretation.
Background
With the wide spread of computers, digital cameras and digital video cameras, people's demand for producing video entertainment keeps growing, and the field of home digital entertainment has expanded rapidly as a result. More and more people enthusiastically play the role of amateur "director" to produce and edit home-made videos of all kinds. In recent years, stylized videos have gradually been accepted and have become a popular element, particularly in animation and online game production. For example, the hand-painted oil-painting short film "The Old Man and the Sea" and ink-and-wash animation such as "Tadpoles Looking for Their Mother" have attracted wide attention, and the former also won a series of awards including the Academy Award for best animated short film. Video stylized rendering requires not only professional skill but also a large amount of manpower and financial support, and the traditional video stylization technique achieves stylized rendering by drawing frame by frame. Although the visual effect of every frame produced in this way can be controlled manually, the lack of inter-frame consistency causes strong jitter of the video picture during continuous playback; moreover, such methods have a long production cycle and high cost and are not suitable for batch production. For example, although the above-mentioned oil-painting short film "The Old Man and the Sea" lasts only 22 minutes, its production cycle lasted nearly 3 years.
Disclosure of Invention
The invention aims to provide an interactive video stylized rendering method based on video interpretation, which has the advantages of short manufacturing period, low cost and benefit for batch manufacturing in consideration of the problems.
The technical scheme of the invention is as follows: the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
The stylization of a video uses the two modules in turn: first the interactive video semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. The interactive segmentation and automatic identification method for key frame images in step 1) is as follows:
the segmented semantic regions are divided into twelve classes according to their material properties, namely sky/cloud, mountain/land, rock/building, leaves/bushes, hair/fur, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features of texture, color distribution and position information are adopted for training and recognition, a region image X is given, and the conditional probability of the category c is defined as follows:
log P(c | X, θ) = Σ_i [ Ψ_i(c_i, X; θ_Ψ) + π(c_i, X; θ_π) + λ(c_i, X; θ_λ) ] − log Z(θ, X)    (*)
the latter four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential function is defined as Ψ_i(c_i, X; θ_Ψ) = log P(c_i | X, i), where P(c_i | X, i) is the normalized distribution given by the Boost classifier;
the color potential function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i); a Gaussian mixture model (GMM) in the CIELab color space is used as the color model, in which the conditional probability of a pixel color x_i in the given image under the k-th component is a Gaussian with parameters μ_k and Σ_k, the mean and variance of the k-th color cluster;
the position potential function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i); the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel is related only to the absolute position in the image;
the above model is trained for the 12 material classes; then, using the formula above, the probability of every class is computed for each pixel in a given image region, all pixels in the region are counted, and the class of each region is determined by voting; in the stylized rendering process, the selection of the brush is determined by the material identified for the object region, which lays the foundation for automatic rendering.
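The classification step can be illustrated with a small sketch. The following is a minimal, hypothetical example (not the patent's implementation): the three potentials of formula (*) are combined per pixel in the log domain and the region label is obtained by majority vote; the class names follow the twelve material classes listed above, while the toy potentials are random stand-ins.

```python
# Minimal sketch: per-pixel log-probabilities from the three potentials of (*),
# followed by majority voting over the pixels of a region.
import numpy as np

CLASSES = ["sky/cloud", "mountain/land", "rock/building", "leaves/bushes",
           "hair/fur", "flower/fruit", "skin/leather", "trunk/branch",
           "abstract background", "wood/plastic", "water", "clothes"]

def pixel_class_scores(texture_logp, color_logp, position_logp):
    """Combine the three potentials of formula (*) for one pixel (log domain)."""
    return texture_logp + color_logp + position_logp

def label_region(pixel_scores):
    """pixel_scores: (n_pixels, n_classes) combined log-probabilities.
    Each pixel votes for its most likely class; the region takes the majority."""
    votes = np.argmax(pixel_scores, axis=1)
    winner = np.bincount(votes, minlength=len(CLASSES)).argmax()
    return CLASSES[winner]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=(500, len(CLASSES)))   # toy potentials for 500 pixels
    scores[:, 10] += 1.5                            # biased towards "water"
    print(label_region(scores))                     # -> water
```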
The matching method of dense feature points among the key frames in the step 2) is as follows:
after the semantic information on the key frame is obtained, mixed image-template features that integrate line-drawing (sketch), texture and color cues are constructed, providing a rich feature set and representation for the image matching problem;
11) line-drawing (sketch) features are represented with Gabor bases as:
F_sk(I_i) = ||⟨I_i, G_cos,x,θ⟩||² + ||⟨I_i, G_sin,x,θ⟩||²,
where G_sin,x,θ and G_cos,x,θ denote the sine and cosine Gabor bases at position x with orientation θ; the feature probability distribution is parameterized by θ_i, where h_sk is a sigmoid function and a normalization constraint is imposed, so the model encourages corresponding edges stronger than the background distribution;
12) texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the j-th dimension is the j-th direction of the HOG, each feature I_i has a corresponding descriptor F_txt(I_i), and the mean of F_txt over all positive samples is used as a reference; the probabilistic model of the feature, with parameter θ_i, encourages responses to feature image blocks within a relatively large set;
13) color features are described by simple pixel brightness at position x; the pixel brightness values are quantized into statistical intervals (bins), so that the model can be simplified accordingly;
similar small image features are then combined to obtain feature combinations with strong local discriminative power: the image is first over-segmented into many tiny image blocks, statistical features describing sketch, texture and color are extracted from these small blocks, and an iterative region-growing and model-learning algorithm is applied, continuously updating the feature model and iteratively growing the feature-combination region, finally yielding feature combinations with strong local discriminative power;
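As an illustration of the per-patch descriptor just described, the following hedged sketch computes a Gabor (sketch) response, a 6-bin gradient-direction histogram and a quantized brightness histogram for one small grey-level patch; the bin counts, kernel sizes and synthetic patch are illustrative assumptions, not values from the patent.

```python
# Per-patch descriptor sketch: Gabor energy (sketch cue), 6-bin gradient-direction
# histogram (simplified HOG, texture cue) and quantised brightness histogram.
import numpy as np

def gabor_energy(patch, theta, freq=0.25):
    """||<I, G_cos>||^2 + ||<I, G_sin>||^2 for one orientation theta."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    ys -= h / 2.0
    xs -= w / 2.0
    u = xs * np.cos(theta) + ys * np.sin(theta)
    envelope = np.exp(-(xs**2 + ys**2) / (2 * (0.3 * min(h, w)) ** 2))
    g_cos = envelope * np.cos(2 * np.pi * freq * u)
    g_sin = envelope * np.sin(2 * np.pi * freq * u)
    return np.sum(patch * g_cos) ** 2 + np.sum(patch * g_sin) ** 2

def hog6(patch):
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi                 # orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=6, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def brightness_hist(patch, bins=8):
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / (hist.sum() + 1e-8)

def patch_descriptor(patch):
    sketch = [gabor_energy(patch, t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
    return np.concatenate([sketch, hog6(patch), brightness_hist(patch)])

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patch = rng.random((16, 16))
    patch[:, 8:] += 0.5                              # a vertical edge
    print(patch_descriptor(np.clip(patch, 0, 1)).round(3))
```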
on the basis of this representation, the matching problem of moving objects over the time and space domains is modeled as a layered (hierarchical) graph matching framework: the extracted mixed image-template features serve as graph nodes, a graph structure is built between frames, and the edge connections between graph nodes are defined from the similarity and spatial positions of the features and the object class to which they belong;
the source image and the target image are denoted Is and It, and U, V denote the mixed-template feature sets in Is and It respectively; each feature point u ∈ U carries two labels: a layer label l(u) ∈ {1, 2, ..., K} and a candidate (matching) label; the vertex set of the graph structure is built from the candidate set C of best matches of each feature point in the source image, and the edge set is constructed as E = E⁺ ∪ E⁻; negative edges indicate that the connected candidates exclude each other, and their "repulsive force" is defined accordingly;
positive edges connect spatially adjacent, mutually compatible candidate feature points and indicate the degree of cooperation between them, with the distance term denoting the spatial distance between v_i and v_j;
the graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image; taking G_s as an example, the partition is written as Π = {g_0, g_1, ..., g_K}, where g_k is a sub-graph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the corresponding sub-graph of G_T is denoted V_k; the matching relation between G_s and G_T is then expressed as a set of sub-graph correspondences, and assuming that the matches between sub-graphs are mutually independent, the matching probability factorizes over the sub-graph pairs;
a similarity measure between a matched sub-graph pair (g_k, g_k') is defined through a geometric transformation and an appearance measure; in summary, the solution of the graph-structure matching problem is configured as:
W = (K, Π = {g_0, g_1, ..., g_K}, Ψ = {ψ_k}, Φ = {Φ_k})
under the Bayesian framework, the graph-structure matching problem is described as maximizing the posterior probability:
W* = arg max p(W | G_s, G_T) = arg max p(W) p(G_s, G_T | W)
the above formula is solved with a Markov Chain Monte Carlo (MCMC) method; at the same time, for computational efficiency, the sampler converges quickly to the global optimum through efficient jumps in the solution space, thereby achieving the matching of inter-frame feature points.
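To make the candidate-graph idea concrete, here is a simplified, hypothetical sketch: each source feature keeps its top candidate matches in the target frame, and a match configuration is scored by appearance similarity plus pairwise spatial consistency, a stand-in for the posterior p(W)p(G_s, G_T | W); the patent optimizes such a score with MCMC, while this toy example simply enumerates the candidate configurations.

```python
# Candidate-graph sketch: top-k candidates per source feature, and a score that
# combines unary appearance similarity with pairwise spatial consistency.
import numpy as np
from itertools import product

def candidates(src_desc, tgt_desc, k=2):
    """Indices of the k most similar target features for every source feature."""
    d = np.linalg.norm(src_desc[:, None, :] - tgt_desc[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k], d

def score(assign, d, src_xy, tgt_xy):
    """Unary appearance term + pairwise spatial-consistency term."""
    unary = -sum(d[i, j] for i, j in enumerate(assign))
    pair = 0.0
    for (i, a), (j, b) in product(list(enumerate(assign)), repeat=2):
        if i < j:
            pair -= abs(np.linalg.norm(src_xy[i] - src_xy[j]) -
                        np.linalg.norm(tgt_xy[a] - tgt_xy[b]))
    return unary + 0.1 * pair

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    src_xy = rng.random((4, 2)); src_desc = rng.random((4, 8))
    tgt_xy = src_xy + 0.02 * rng.random((4, 2))            # slightly shifted copy
    tgt_desc = src_desc + 0.01 * rng.random((4, 8))
    cand, d = candidates(src_desc, tgt_desc)
    best = max(product(*cand), key=lambda a: score(a, d, src_xy, tgt_xy))
    print("best assignment:", [int(i) for i in best])      # expected: [0, 1, 2, 3]
```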
The region competition segmentation method of step 3) is as follows:
on the basis of the stable matching relation obtained between frames, the advantages of the region competition mechanism are exploited for video segmentation: with the hierarchical-graph image matching algorithm, the matching relation between the features of the previous frame and those of the current frame is determined, so that the semantic information of the previous frame is propagated to the current frame; then, according to the feature information of each matched region, the current frame is segmented into several semantic regions by the region competition segmentation algorithm;
given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}
where R_i denotes a segmented region with homogeneous features, θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i;
the number N of segmented regions is determined from the matching relation of the features between the previous and the current frame; let S = {S_1, S_2, ..., S_N} be the set of small feature regions corresponding to the regions; for each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)); following the MDL principle, maximizing the posterior is converted into minimizing an energy function, yielding:
where Γ_i denotes the boundary contour of region R_i; the parameters θ_i and the contours Γ are estimated in stages in an iterative manner, alternating between two stages, with the energy function decreasing in every stage, so that the final segmentation result of the whole image is continuously learned and inferred;
during region competition, each region continuously updates its feature probability distribution model while competing for the ownership of pixels according to the steepest-descent principle and updating its own boundary contour, so that every region keeps extending its range until the image segmentation result of the current frame is obtained;
the specific iteration steps are as follows: in the first stage, Γ is fixed and {θ_i} is estimated from the current region segmentation state; the parameter θ_i obtained under the current state is taken as its optimal solution so as to minimize the cost of describing each region, and the energy function therefore reduces to:
in the second stage, {θ_i} is known and steepest descent is performed on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions, so that every point on a boundary contour Γ moves along its steepest-descent direction;
here τ_k is the direction vector at the point, and the region to which a point belongs depends on how well the point is described by that region's feature probability distribution model;
to determine the membership between each pixel point and the region, the competition-based image segmentation algorithm process is described as follows:
in the initialization stage, estimating initial parameters of various models according to the matched characteristic image blocks, adding boundary points of all the characteristic image blocks into a queue to be determined, and calculating posterior probabilities of all the boundary points belonging to various types;
in a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; recalculating the posterior probabilities of all the boundary points belonging to various types by using the newly obtained feature distribution models of the regions;
in this way, the boundary point with the fastest current energy decrease is repeatedly selected from the pending queue to update the corresponding boundary, while the feature distribution probability model of each region is updated in time according to the current segmentation state; the regions constrain one another and compete simultaneously for the ownership of image regions until the energy function converges, whereby the image is segmented into regions.
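A toy numerical illustration of this two-stage loop, under simplifying assumptions (a 1-D "image", Gaussian region models, and label flips at boundary pixels standing in for the contour motion; none of these choices come from the patent):

```python
# Two-stage region competition on a 1-D signal: stage one re-estimates each
# region's Gaussian model (maximum likelihood), stage two lets regions compete
# for boundary pixels, until the labels stop changing.
import numpy as np

def neg_log_gauss(x, mu, sigma):
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

def region_competition(signal, labels, iters=50):
    labels = labels.copy()
    for _ in range(iters):
        # stage 1: ML estimate of each region model under the current segmentation
        params = {}
        for r in np.unique(labels):
            vals = signal[labels == r]
            params[r] = (vals.mean(), vals.std() + 1e-3)
        # stage 2: boundary pixels switch to the region that describes them best
        changed = False
        for i in range(1, len(signal) - 1):
            if labels[i - 1] != labels[i + 1]:               # a boundary pixel
                costs = {r: neg_log_gauss(signal[i], *params[r]) for r in params}
                best = min(costs, key=costs.get)
                if best != labels[i]:
                    labels[i], changed = best, True
        if not changed:
            break
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    signal = np.concatenate([rng.normal(0.2, 0.05, 40), rng.normal(0.8, 0.05, 60)])
    init = np.array([0] * 50 + [1] * 50)                     # boundary misplaced at 50
    print(np.argmax(np.diff(region_competition(signal, init))))  # ~39: recovered edge
```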
In step 4) of the stylization method of the video stylization module, video stylization is based on the interactive video semantic segmentation module, and the selection of the brush is determined only by the material corresponding to the identified object region;
the brushes are built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized to build a stroke library; when rendering each image region, a large brush is first used for the base layer, after which the brush size and opacity are gradually reduced to depict the detailed parts of the object finely; during drawing, an edge-first, interior-second strategy is adopted: drawing of every image layer starts from the edges, first painting along the sketch (line-drawing) edges and aligning the brush with the flow field;
in the video rendering, in order to ensure the continuity and stability of the brush in the time domain, a thin-plate spline interpolation technology is adopted to carry out the propagation of the brush strokes, and in addition, in the propagation process of the brush strokes, the deletion and addition mechanisms of the brush strokes are designed by calculating the area of the brush stroke areas; and the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
The key-frame non-photorealistic rendering method based on semantic analysis in step 4) of the stylization method of the video stylization module is as follows:
how to design stroke models with different artistic styles is one of the focuses of video stylization; works with different forms of artistic expression have their own characteristics in stroke expression, and the basic drawing strategy in video stylization is to select suitable strokes based on the image content; the stroke library is built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized; a brush B_n to be drawn contains the following information: the class label l_n of the brush, the placement area Λ_n, the color map C_n, the transparency field α_n, the height field H_n and the control points {P_ni}, i.e.:
B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}}
when the stroke model is designed, not only low-level information such as the shape and texture of the stroke is considered, but also high-level semantic information is integrated, so that every interpretation region of the image/video has a matching "pen" to rely on during rendering; when strokes are selected, the interpretation-region class is used as a keyword to quickly retrieve from the stroke library the batch of strokes with the same class, and one stroke is then selected from them at random;
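The brush record and the class-keyed selection can be sketched as follows; the field types and the tiny in-memory stroke library are illustrative assumptions rather than the patent's data format.

```python
# Sketch of the brush record B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}} and of
# selecting a random stroke whose class matches the region's material label.
import random
from dataclasses import dataclass, field
from typing import List, Tuple

import numpy as np

@dataclass
class Brush:
    class_label: str                 # l_n: material class the stroke belongs to
    area: np.ndarray                 # Λ_n: boolean mask of the placement area
    color_map: np.ndarray            # C_n: RGB texture of the scanned stroke
    alpha: np.ndarray                # α_n: transparency field
    height: np.ndarray               # H_n: height field
    control_points: List[Tuple[float, float]] = field(default_factory=list)  # {P_ni}

def pick_brush(library: List[Brush], region_class: str) -> Brush:
    """Filter the library by the interpretation-region class, then choose at random."""
    candidates = [b for b in library if b.class_label == region_class]
    return random.choice(candidates)

if __name__ == "__main__":
    blank = lambda: np.zeros((8, 32))
    lib = [Brush("water", blank() > 0, np.zeros((8, 32, 3)), blank(), blank(), [(0, 4), (31, 4)]),
           Brush("sky/cloud", blank() > 0, np.zeros((8, 32, 3)), blank(), blank(), [(0, 4), (31, 4)])]
    print(pick_brush(lib, "water").class_label)    # -> water
```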
to simulate the "alignment" principle of oil painting, the primal sketch theory is drawn upon: within each region R_i its primal sketch representation SK_i is computed; the sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing; during rendering, different brushes are laid over these primitives to produce the desired artistic effect; the interpretation region R_i, with image support Λ_i, is divided into a sketchable (line-drawing) part and a non-sketchable part with homogeneous structure, and the direction field θ_i of R_i is defined such that
its initial value is given by the gradient direction of the line drawing, after which a diffusion equation is used to propagate the direction into the non-line-drawing part;
The rendering process of a key frame is a process of repeatedly selecting and placing strokes; taking an interpretation region R_i as an example, its non-line-drawing part is rendered first and its line-drawing part afterwards, which guarantees that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer; in the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its centre, is diffused to both sides along the direction field to generate a flow-shaped area; taking the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area; rendering of the line-drawing part of the region is similar.
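A minimal sketch of the stroke-placement step just described: a streamline is traced from a seed pixel in both directions along the direction field and serves as the central axis to which the selected brush would be warped; the step size, length limit and constant direction field are illustrative assumptions.

```python
# Trace one stroke axis by following the direction field θ(row, col) from a seed.
import numpy as np

def trace_streamline(theta, seed, step=1.0, max_len=20):
    """theta: (H, W) array of orientations; seed: (row, col). Returns axis points."""
    h, w = theta.shape
    pts = [np.array(seed, dtype=float)]
    for direction in (+1, -1):                       # grow towards both sides
        p = np.array(seed, dtype=float)
        for _ in range(max_len):
            r, c = int(round(p[0])), int(round(p[1]))
            if not (0 <= r < h and 0 <= c < w):
                break
            ang = theta[r, c]
            p = p + direction * step * np.array([np.sin(ang), np.cos(ang)])
            pts.append(p.copy())
    return np.array(pts)

if __name__ == "__main__":
    theta = np.full((64, 64), np.deg2rad(30.0))      # a constant 30-degree field
    axis = trace_streamline(theta, (32, 32))
    print(axis.shape, axis[0], axis[-1])
```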
In step 5) of the stylization method of the video stylization module, the stroke propagation method for sequence frames is as follows:
the rendering of non-key frames is obtained by "propagating" the rendering result of the key frame, and the basis of the propagation is the spatio-temporal correspondence of the interpretation regions; during propagation, as the interpretation region changes more and more, strokes may gradually leak outside the region while unrendered gaps appear inside it, so the stroke addition and deletion mechanisms must be considered at the same time when propagating the stroke map, otherwise the rendering result will jitter; the propagation, addition and deletion mechanisms of strokes are as follows:
(a) Stroke propagation (a sketch follows this list): let R_i(t) denote an interpretation region of the key frame at time t of the video and R_i(t+1) the region corresponding to R_i(t) at time t+1; their image supports are denoted Λ_i(t) and Λ_i(t+1) respectively, and P_ij(t), P_ij(t+1) denote the dense matching points of Λ_i(t) and Λ_i(t+1) in the time domain (computed during video interpretation); R_i(t+1) can be represented by a non-rigid transformation of R_i(t), and when strokes are propagated it is desired that the matching points P_ij(t) on Λ_i(t) be mapped onto the matching points P_ij(t+1) of the new image region Λ_i(t+1) in frame t+1; based on this consideration a thin-plate spline (TPS) interpolation model is chosen: it maps the key points P_ij(t) of Λ_i(t) onto the matching points P_ij(t+1) of Λ_i(t+1) and, for the remaining pixels of Λ_i(t), the TPS minimizes an energy function so that the pixel grid of Λ_i(t) is warped by an elastic (non-rigid) deformation.
(b) Deleting brush strokes: because the region corresponding to some brushes becomes smaller and smaller after the brushes are propagated in the video or in an occlusion relationship or when the number of frames of stroke propagation is too large, the invention eliminates the brushes when the area of the region corresponding to the brushes is smaller than a given threshold. Similarly, a propagated brush is also deleted when it falls outside the corresponding zone boundary.
(c) Stroke addition: when a new semantic region appears or an existing semantic region grows larger (for example when clothing unfolds), new brushes must be added to cover the new area, and the size and position of neighbouring brushes are simply adjusted to fill small gaps between brushes; if the area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it. Nevertheless, a stroke is not drawn on a gap immediately upon its first occurrence: a relatively high threshold is set and the rendering of newly appearing regions is delayed until they have grown large enough. The general brush-placement algorithm is then used to fill gaps that have reached the threshold, and finally these new brushes are propagated and transformed backwards to fill the gap regions that appeared earlier but were left unrendered. This backward filling avoids frequent brush changes and links smaller, fragmented brushes to larger ones, thereby reducing flickering and other undesirable visual artefacts; moreover, since the new brushes are added at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
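A hedged sketch of the stroke-propagation step in (a): the dense matching points of Λ_i(t) and Λ_i(t+1) define a thin-plate-spline warp, which then carries a stroke's control points from frame t to frame t+1. scipy's RBFInterpolator with the thin-plate-spline kernel is used here as a stand-in for the TPS model named in the text, and the point sets are synthetic.

```python
# TPS-based propagation of stroke control points between two frames.
import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_stroke(control_pts, match_src, match_dst):
    """Warp stroke control points with a TPS fitted on matched region points."""
    tps = RBFInterpolator(match_src, match_dst, kernel="thin_plate_spline")
    return tps(control_pts)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    match_src = rng.uniform(0, 100, size=(30, 2))                  # P_ij(t)
    shift = np.array([3.0, -2.0])
    match_dst = match_src + shift + rng.normal(0, 0.1, (30, 2))    # P_ij(t+1)
    stroke = np.array([[10.0, 10.0], [20.0, 12.0], [30.0, 15.0]])
    print(propagate_stroke(stroke, match_src, match_dst).round(1)) # ≈ stroke + (3, -2)
```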
The damping brush system for anti-shake in the step 6) of the stylization method of the video stylization module is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
for the i-th brush at time t, A_i,t = (x_i,t, y_i,t, s_i,t) denotes its geometric attributes, namely its center coordinates and size, and its initial value is denoted A⁰_i,t; the energy function of the damped brush system is defined as follows:
E = E_data + λ_1 E_smooth1 + λ_2 E_smooth2
where λ_1 and λ_2 are weights, set to λ_1 = 2.8 and λ_2 = 1.1;
The first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
the third term in the formula smoothly constrains adjacent brushes in both the time domain and the space domain; for any brush adjacent to the i-th brush at time t, say the j-th, the relative position and size difference between them is written as ΔA_i,j,t = A_i,t − A_j,t, and the smoothing term is defined as follows:
The energy minimization problem is solved by the Levenberg-Marquardt algorithm, with λ_1 = 2.8 and λ_2 = 1.1 as above.
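A hedged numerical sketch of this anti-shake step: the exact residual forms below are assumptions (the text only names the data term and the two smoothing terms), the weights λ_1 = 2.8 and λ_2 = 1.1 are taken from the text, and Levenberg-Marquardt is applied through scipy.optimize.least_squares(method="lm").

```python
# Damped-brush energy sketch: data term keeps each brush near its initial
# attributes, one smoothing term penalises temporal change of a brush, the other
# penalises change of the difference between neighbouring brushes.
import numpy as np
from scipy.optimize import least_squares

N_BRUSH, N_FRAME = 3, 5
L1, L2 = 2.8, 1.1

def residuals(flat, A0):
    A = flat.reshape(N_BRUSH, N_FRAME, 3)
    res = [(A - A0).ravel()]                                 # E_data
    res.append(np.sqrt(L1) * np.diff(A, axis=1).ravel())     # E_smooth1 (temporal)
    for i in range(N_BRUSH):                                 # E_smooth2 (neighbours)
        for j in range(i + 1, N_BRUSH):
            res.append(np.sqrt(L2) * np.diff(A[i] - A[j], axis=0).ravel())
    return np.concatenate(res)

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    base = rng.uniform(0, 100, (N_BRUSH, 1, 3))
    A0 = base + rng.normal(0, 2.0, (N_BRUSH, N_FRAME, 3))    # jittered brush track
    fit = least_squares(residuals, A0.ravel(), args=(A0,), method="lm")
    smoothed = fit.x.reshape(N_BRUSH, N_FRAME, 3)
    print("jitter before:", np.abs(np.diff(A0, axis=1)).mean().round(2),
          "after:", np.abs(np.diff(smoothed, axis=1)).mean().round(2))
```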
By studying video segmentation and recognition and the establishment of spatio-temporal correspondences, the invention explores a semantics-driven video stylized rendering technique that achieves the expressive effect required by art. Starting from semantic analysis of the input video, an interactive, key-frame-based mode is adopted, which provides sufficient prior information for video segmentation while minimizing the user's burden; the interactive information on the key frames is then propagated to subsequent frames by establishing feature-point correspondences between frames and applying the region competition algorithm, so that the user's semantic information fully guides accurate video segmentation. Different stroke libraries are created for different styles. During rendering, the key frame is rendered according to the semantic information, and the key-frame strokes are then transferred to the sequence frames by spatial transformation, with the spatio-temporal relation of the semantic regions as a constraint, so that the "jitter" effect of the rendering result is effectively suppressed. In addition, the invention provides a system scheme convenient for interactive creation by the user, which improves the applicability of the project. The invention can be widely applied in industries such as advertising, education and entertainment, and has an important application background.
Detailed Description
Embodiment:
the invention relates to an interactive video stylized rendering method based on video interpretation, which comprises an interactive video semantic segmentation module and a video stylized module, wherein the segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) a region competition segmentation algorithm;
the stylization method of the video stylization module comprises the following steps:
4) performing non-photorealistic drawing on the key frame based on semantic analysis;
5) a stroke propagation method of the sequence frame;
6) a damping brush system for anti-shake.
The stylization of a video uses the two modules in turn: first the interactive video semantic segmentation module performs semantic segmentation of the video, and then the video stylization module performs stylized rendering on the segmented video. Step 1) of the interactive video semantic segmentation module, the interactive segmentation and automatic identification method for key frame images, is as follows:
in the invention, the mature recognition technology TextonBoost and the interactive segmentation method GraphCut are integrated, and interactive semantic segmentation and recognition are carried out on the key frame image, so that the object region and the mutual layering and shielding relation in the image are obtained. The system of the invention classifies the segmented semantic regions into twelve categories according to different material properties of the semantic regions, including sky, water, land, rock, hair, skin, clothes and the like, as shown in table 1.
Table 1: The 12 material classes of semantic regions

Mountain/land | Water | Rock/building | Leaves/bushes
Skin/leather | Hair/fur | Flower/fruit | Sky/cloud
Clothes | Trunk/branch | Abstract background | Wood/plastic
In actual operation, the method adopts three main characteristics of texture, color distribution and position information for training and recognition. Given a region image X, the conditional probability of defining its class c is:
the last four terms in the formula are respectively a texture potential energy function, a color potential energy function, a position potential energy function and a normalization term.
The texture potential function is defined as Ψ_i(c_i, X; θ_Ψ) = log P(c_i | X, i), where P(c_i | X, i) is the normalized distribution given by the Boost classifier.
The color potential function is defined as π(c_i, X; θ_π) = log Σ_k θ_π(c_i, k) P(k | x_i); here the invention uses a Gaussian mixture model (GMM) in the CIELab color space as the color model, in which the conditional probability of a pixel color x_i in the given image under the k-th component is a Gaussian with parameters μ_k and Σ_k, the mean and variance of the k-th color cluster.
The position potential function is defined as λ(c_i, X; θ_λ) = log θ_λ(c_i, i); the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel is related only to the absolute position in the image.
The method is used for training 12 types of materials, then the probability of each pixel in a given image region for each category is calculated by adopting the formula, all pixels in the region are counted, and the category of each region is determined by adopting a voting mode. In the stylized rendering process, the selection of the paintbrush is determined by the material identified by the object area, and a foundation is laid for realizing automatic rendering.
2) Matching of dense feature points between key frames
After obtaining the semantic information on the key frames, the present invention needs to explore a matching algorithm between frames to effectively propagate the semantic information to the sequence frames.
The invention first proposes mixed image-template features integrating line-drawing (sketch), texture and color, which provide a rich feature set and representation for the image matching problem.
(a) Line-drawing (sketch) features are represented with Gabor bases as: F_sk(I_i) = ||⟨I_i, G_cos,x,θ⟩||² + ||⟨I_i, G_sin,x,θ⟩||², where G_sin,x,θ and G_cos,x,θ denote the sine and cosine Gabor bases at position x with orientation θ. The feature probability distribution is parameterized by θ_i, where h_sk is a sigmoid function and a normalization constraint is imposed.
So the model will encourage a stronger corresponding edge than the background distribution.
(b) Texture features are modeled with a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the j-th dimension is the j-th direction of the HOG, each feature I_i has a corresponding descriptor F_txt(I_i), and the mean of F_txt over all positive samples is used as a reference. The probabilistic model of the feature, with parameter θ_i, encourages responses to feature image blocks within a relatively large set.
(c) Color features are described by simple pixel brightness at position x. The pixel brightness values are quantized into statistical intervals (bins), so that the model can be simplified accordingly.
according to the invention, by combining the similar small features of the images, a feature combination with strong discrimination can be obtained locally. Firstly, the image is segmented to obtain a plurality of tiny image blocks in the image. And extracting statistical characteristics capable of describing line drawing, texture and color from the small image block. In order to effectively obtain the feature combination, an iterative region growing and model learning algorithm is adopted, the feature combination region is iteratively grown by continuously updating the feature model, and finally the feature combination with strong local discrimination is obtained.
Based on the expression, the invention models the matching problem of the moving object in the time domain and the space domain as a layered graph matching framework on a graph representation. The extracted mixed image template features serve as graph nodes, graph structures are built among frames, and edge connection relations among the graph nodes can be defined based on similarity and spatial positions among the features and object types to which the features belong.
The source image and the target image are denoted Is and It, and U, V denote the mixed-template feature sets in Is and It respectively. Each feature point u ∈ U carries two labels: a layer label l(u) ∈ {1, 2, ..., K} and a candidate (matching) label. The vertex set of the graph structure is built from the candidate set C of best matches of each feature point in the source image, and the edge set is constructed as E = E⁺ ∪ E⁻. Negative edges indicate that the connected candidates exclude each other, and their "repulsive force" is defined accordingly.
Positive edges connect spatially adjacent, mutually compatible candidate feature points and indicate the degree of cooperation between them, with the distance term denoting the spatial distance between v_i and v_j.
The graph structures G_s and G_T of the source image and the target image are each divided into K+1 layers, where K is the number of objects in the source image. Taking G_s as an example, the partition is written as Π = {g_0, g_1, ..., g_K}, where g_k is a sub-graph of G_s whose vertex set is denoted U_k; similarly, the vertex set of the corresponding sub-graph of G_T is denoted V_k. The matching relation between G_s and G_T is then expressed as a set of sub-graph correspondences, and assuming that the matches between sub-graphs are mutually independent, the matching probability factorizes over the sub-graph pairs.
in the invention, matching sub-graph pairs (g) are defined by geometric transformation and appearance measurek,gk') measure of similarity betweenAnd (4) showing. In summary, the solution to the graph structure matching problem can be configured as:
W=(K,∏={g0,g1,...,gk},Ψ={Φk},Φ={Φk})
Under the Bayesian framework, the invention describes the graph-structure matching problem as maximizing the posterior probability:
W* = arg max p(W | G_s, G_T) = arg max p(W) p(G_s, G_T | W)
the present invention may solve the above equation by a Markov Chain Monte Carlo (MCMC) method. Meanwhile, for efficient calculation, the method explores a cluster sampling strategy, and quickly converges to a global optimal solution through efficient jumping in a solution space so as to achieve matching of inter-frame feature points.
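A toy sketch of this stochastic search (hypothetical and much simplified: a single-site Metropolis proposal replaces the cluster-sampling moves mentioned above, and the score function is a stand-in for the posterior):

```python
# Metropolis-style search over candidate matchings.
import numpy as np

def metropolis_match(score_fn, candidates, steps=2000, temp=0.1, seed=0):
    rng = np.random.default_rng(seed)
    assign = [int(rng.choice(c)) for c in candidates]      # random initial matching
    best, best_s = list(assign), score_fn(assign)
    cur_s = best_s
    for _ in range(steps):
        i = rng.integers(len(assign))
        prop = list(assign)
        prop[i] = int(rng.choice(candidates[i]))           # single-site proposal
        s = score_fn(prop)
        # accept improvements, and worse moves with a Boltzmann probability
        if s > cur_s or rng.random() < np.exp((s - cur_s) / temp):
            assign, cur_s = prop, s
            if s > best_s:
                best, best_s = list(prop), s
    return best, best_s

if __name__ == "__main__":
    target = [0, 3, 1, 2]                                  # "ground-truth" matching
    cands = [[0, 1], [3, 0], [1, 2], [2, 3]]
    score = lambda a: -sum(x != y for x, y in zip(a, target))
    print(metropolis_match(score, cands))                  # expected: ([0, 3, 1, 2], 0)
```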
3) Region competition segmentation algorithm
On the basis of obtaining the inter-frame stable matching relation, the invention provides an inter-frame matching-based regional competition propagation algorithm by mining the advantages of a regional competition mechanism in video segmentation. By using the image matching algorithm of the layered graph structure, the invention can determine the matching relationship between the characteristics of the previous frame and the current frame, the semantic information of the previous frame is transmitted to the current frame, and then the current frame is divided into a plurality of semantic areas by using the area competition division algorithm according to the characteristic information of each matching area.
Given image I, the corresponding image segmentation solution is defined as follows:
W = {(R_1, R_2, ..., R_N), (θ_1, θ_2, ..., θ_N), (l_1, l_2, ..., l_N)}
where R_i denotes a segmented region with homogeneous features, θ_i denotes the parameters of the feature probability distribution model of region R_i, and l_i denotes the label of region R_i.
The number N of segmented regions is determined from the matching relation of the features between the previous and the current frame. Let S = {S_1, S_2, ..., S_N} be the set of small feature regions corresponding to the regions. For each region R_i, the initial model parameters θ_i are estimated from the feature patch S_i it occupies, giving an initial posterior probability P(θ_i | I(x, y)). Following the MDL principle, maximizing the posterior is converted into minimizing an energy function, yielding:
where Γ_i denotes the boundary contour of region R_i. The invention estimates the parameters θ_i and the contours Γ in stages in an iterative manner, alternating between two stages, with the energy function decreasing in every stage, so that the final segmentation result of the whole image is continuously learned and inferred.
In the process of regional competition, each region continuously updates the characteristic probability distribution model of the region, simultaneously contends for ownership of pixel points according to the steepest descent principle, and updates respective boundary contour, so that each region continuously expands the range, and finally the image segmentation result of the current frame is obtained.
The specific iteration steps are as follows: in the first stage, Γ is fixed and {θ_i} is estimated from the current region segmentation state; the parameter θ_i obtained under the current state is taken as its optimal solution so as to minimize the cost of describing each region, and the energy function therefore reduces to:
In the second stage, {θ_i} is known and steepest descent is performed on Γ; to obtain the minimum of the energy function quickly, the steepest-descent equation of motion is solved for the boundaries Γ of all regions, so that every point on a boundary contour Γ moves along its steepest-descent direction.
Here τ_k is the direction vector at the point, and the region to which a point belongs depends on how well the point is described by that region's feature probability distribution model.
In order to determine the membership between each pixel point and each region, the invention provides an image segmentation algorithm based on a competition mechanism to rapidly complete image segmentation. The specific image segmentation algorithm process based on the competition mechanism is described as follows:
in the initialization stage, the initial parameters of various models are estimated according to the matched characteristic image blocks, the boundary points of all the characteristic image blocks are added into a queue to be determined, and the posterior probability that all the boundary points belong to various types is calculated.
In a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; and recalculating the posterior probabilities of all the boundary points belonging to the various types by using the newly obtained feature distribution models of the regions.
Therefore, the boundary point with the fastest descending current energy is continuously selected from the queue to be determined to update the corresponding boundary, meanwhile, the characteristic distribution probability model of each region is updated timely according to the current region segmentation state, the regions are mutually restricted, and the ownership of the image region is simultaneously competed until the energy function converges, so that the image is segmented into the regions.
2. Video stylization module
Video stylization is based on an interactive video semantic segmentation module. The selection of the brush is determined only by the material corresponding to the identified object region. The brush of the system of the invention is based on a professional painter to draw a large number of typical strokes on paper, then scanning and parameterizing are carried out, and finally a stroke library is established. For each image region rendering, a large brush is first used for priming, and then the brush size and opacity are gradually reduced to fine-delineate detailed portions of the object. During drawing, adopting a drawing strategy of firstly drawing the edge and then drawing the inside: drawing of each layer of image the invention starts with the edge first, draws along the line-drawn edge first, and aligns the brush according to the flow field. In video rendering, in order to ensure the continuity and stability of the brush in the time domain, the invention adopts the thin-plate spline interpolation technology to carry out the propagation of the brush strokes. In addition, in the process of spreading the pen strokes, the area of the pen stroke area is calculated, and a pen stroke deleting and adding mechanism is designed. And the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
(1) Key frame non-photorealistic drawing technology based on semantic analysis
How to design stroke models with different artistic styles is one of the focuses of video stylization. Works with different forms of artistic expression have their own characteristics in stroke expression. In video stylization the basic drawing strategy of the invention is to select suitable strokes for drawing based on the image content, and the stroke library is built from a large number of typical strokes drawn on paper by professional painters, which are then scanned and parameterized. A brush B_n to be drawn contains the following information: the class label l_n of the brush, the placement area Λ_n, the color map C_n, the transparency field α_n, the height field H_n and the control points {P_ni}, i.e.:
B_n = {l_n, Λ_n, C_n, α_n, H_n, {P_ni}}
When designing the stroke model, the invention considers not only low-level information such as the shape and texture of the stroke but also integrates high-level semantic information, so that every interpretation region of the image/video has a matching "pen" to rely on during rendering; this is one of the keys that distinguishes the rendering algorithm of the invention from traditional stroke-based rendering algorithms. Therefore, when strokes are selected, the interpretation-region class is used as a keyword to quickly retrieve from the stroke library the batch of strokes with the same class, and one stroke is then selected from them at random.
In order to simulate the "alignment" principle of oil painting, the invention draws on the primal sketch theory: within each region R_i the primal sketch representation SK_i is computed. The sketch consists of a set of salient elements that mark the surface features of the object, such as spots, lines and folds on clothing. During rendering, different brushes are laid over these primitives to produce the desired artistic effect. The interpretation region R_i, with image support Λ_i, is divided into a sketchable (line-drawing) part and a non-sketchable part with homogeneous structure, and the direction field θ_i of R_i is defined such that
its initial value is given by the gradient direction of the line drawing, after which a diffusion equation is used to propagate the direction into the non-line-drawing part.
The process of rendering a key frame is a process of repeatedly selecting and placing strokes. Taking an interpretation region R_i as an example, the invention first renders its non-line-drawing part and then its line-drawing part, which guarantees that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer. In the non-line-drawing part, an unrendered pixel area is chosen arbitrarily and, starting from its centre, is diffused to both sides along the direction field to generate a flow-shaped area. Taking the central axis of this area as the reference line, the selected brush is transformed into the flow-shaped area so that the central axis of the stroke is aligned with the central axis of the area. Rendering of the line-drawing part of the region is similar.
(2) Stroke propagation algorithm of sequence frame
In the invention, the rendering of the non-key frame is obtained by the 'propagation' of the rendering result of the key frame. The propagation basis is the spatio-temporal correspondence of the interpretation zones. In the propagation process, as the interpretation zone changes more and more, the brush strokes may gradually leak outside the zone while gaps in the zone appear as being rendered. Therefore, in propagating the stroke graph, the adding and deleting mechanisms of the strokes must be considered at the same time. Otherwise, the rendering result will have a jitter phenomenon. The following describes the mechanism of propagation, addition and deletion of strokes, respectively.
(a) Stroke propagation: let R_i(t) denote an interpretation region of the key frame at time t of the video and R_i(t+1) the region corresponding to R_i(t) at time t+1. Their image supports are denoted Λ_i(t) and Λ_i(t+1) respectively, and P_ij(t), P_ij(t+1) denote the dense matching points of Λ_i(t) and Λ_i(t+1) in the time domain (computed during video interpretation). R_i(t+1) can be represented by a non-rigid transformation of R_i(t), and when strokes are propagated the invention expects the matching points P_ij(t) on Λ_i(t) to be mapped onto the matching points P_ij(t+1) of the new image region Λ_i(t+1) in frame t+1. Based on this consideration the invention selects a thin-plate spline (TPS) interpolation model: it maps the key points P_ij(t) of Λ_i(t) onto the matching points P_ij(t+1) of Λ_i(t+1) and, for the remaining pixels of Λ_i(t), the TPS minimizes an energy function so that the pixel grid of Λ_i(t) is warped by an elastic (non-rigid) deformation.
(b) Stroke deletion: because the region corresponding to some brushes becomes smaller and smaller after the brushes are propagated through the video, for example under occlusion or when the number of frames over which a stroke has been propagated is too large, the invention eliminates a brush when the area of its corresponding region is smaller than a given threshold. Similarly, a propagated brush is also deleted when it falls outside the corresponding region boundary.
(c) Stroke addition: when a new semantic region appears or an existing semantic region grows larger (for example when clothing unfolds), the invention must add new brushes to cover the new area, and the size and position of neighbouring brushes are simply adjusted to fill small gaps between brushes; if the area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it. Nevertheless, a stroke is not drawn on a gap immediately upon its first occurrence: the invention sets a relatively high threshold and delays rendering newly appearing regions until they have grown large enough. The general brush-placement algorithm is then used to fill gaps that have reached the threshold, and finally these new brushes are propagated and transformed backwards to fill the gap regions that appeared earlier but were left unrendered. This backward filling avoids frequent brush changes and links smaller, fragmented brushes to larger ones, thereby reducing flickering and other undesirable visual artefacts; moreover, since the invention adds the new brushes at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
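A small, hypothetical sketch of the bookkeeping in (b) and (c): a stroke is dropped when its supporting region shrinks below a deletion threshold or leaves the region, and a new bottom-layer stroke is queued only once an uncovered gap has grown past a deliberately higher creation threshold; the threshold values are illustrative, not values from the patent.

```python
# Stroke deletion / addition bookkeeping during propagation.
DELETE_AREA = 50     # px²: remove strokes whose supporting region shrank below this
CREATE_AREA = 400    # px²: only fill gaps once they are comfortably large

def update_strokes(strokes, gaps):
    """strokes: dicts with 'area' and 'inside_region'; gaps: list of gap areas."""
    kept = [s for s in strokes
            if s["area"] >= DELETE_AREA and s["inside_region"]]
    new = [{"area": g, "inside_region": True, "layer": "bottom"}  # drawn under existing strokes
           for g in gaps if g >= CREATE_AREA]
    return kept, new

if __name__ == "__main__":
    strokes = [{"area": 300, "inside_region": True},
               {"area": 20,  "inside_region": True},    # shrank too much -> deleted
               {"area": 200, "inside_region": False}]   # drifted out of the region -> deleted
    print(update_strokes(strokes, gaps=[120, 650]))     # one stroke kept, one gap filled
```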
(3) Damping brush system for preventing shaking
The final step in stylizing the video is the anti-shake operation. The invention connects adjacent paintbrushes in the time domain and the space domain by springs to simulate a damping system. By minimizing the energy of the system, the effect of removing jitter is achieved.
For the i-th brush at time t, the invention uses A_i,t = (x_i,t, y_i,t, s_i,t) to denote its geometric attributes, namely its center coordinates and size, with initial value denoted A⁰_i,t. The energy function of the damped brush system is defined as follows:
E = E_data + λ_1 E_smooth1 + λ_2 E_smooth2
λ_1 and λ_2 are weights; in the experiments the invention sets them to λ_1 = 2.8 and λ_2 = 1.1.
The first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
The third term in the equation smoothly constrains adjacent brushes in both the time domain and the space domain. For any brush adjacent to the i-th brush at time t, say the j-th, the relative position and size difference between them is written as ΔA_i,j,t = A_i,t − A_j,t, and the smoothing term is defined as follows:
The energy minimization problem is solved by the Levenberg-Marquardt algorithm.
Claims (9)
1. An interactive video stylized rendering method based on video interpretation is characterized by comprising an interactive video semantic segmentation module and a video stylization module.
The segmentation method of the interactive video semantic segmentation module comprises the following steps:
1) interactive segmentation and automatic identification of key frame images;
2) matching dense feature points among the key frames;
3) performing area competition segmentation;
the stylization method of the video stylization module comprises the following steps:
1) performing non-photorealistic drawing on the key frame based on semantic analysis;
2) stroke propagation of sequence frames;
3) a damping brush system for anti-shake.
The method comprises the steps of sequentially using an interactive video semantic segmentation module and a video stylization module for stylizing a video, namely performing semantic segmentation on the video by using the interactive video semantic segmentation module, and performing stylized rendering on the segmented video by using the video stylization module.
2. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the interactive segmentation and automatic identification method of key frame images of the above step 1) is as follows:
dividing the segmented semantic regions into twelve classes according to their material properties, namely sky/cloud, mountain/land, rock/building, leaves/bushes, hair/fur, flower/fruit, skin/leather, trunk/branch, abstract background, wood/plastic, water and clothes;
in actual operation, three main features, texture, color distribution and position information, are adopted for training and recognition; given a region image X, the conditional probability of its class c is defined as (formula 1):

log P(c | X, θ) = Σ_i [ Ψ_i(c_i, X; θ_Ψ) + π(c_i, X; θ_π) + λ(c_i, X; θ_λ) ] − log Z(θ, X)

The four terms in the formula are respectively a texture potential function, a color potential function, a position potential function and a normalization term.
the texture potential energy function is defined as $\psi_i(c_i, X; \theta_\psi) = \log P(c_i \mid X, i)$, where $P(c_i \mid X, i)$ is a normalized distribution given by the Boost classifier;
the color potential energy function is defined as $\pi(c_i, X; \theta_\pi) = \log \sum_k \theta_\pi(c_i, k)\, P(k \mid x_i)$; Gaussian Mixture Models (GMMs) in the CIELab color space represent the color model, and for a pixel color $x_i$ in a given image the conditional probability $P(k \mid x_i)$ is the corresponding Gaussian component, where $\mu_k$ and $\Sigma_k$ denote the mean and covariance of the $k$-th color cluster, respectively;
the position potential energy function is defined as $\lambda(c_i, X; \theta_\lambda) = \log \theta_\lambda(c_i, i)$; the position potential is relatively weak compared with the first two potentials, and in its definition the class label of an image pixel depends only on its absolute position in the image;
the twelve material classes are trained with this method; formula 1 gives the probability of each class for every pixel in an image region, all pixels in the region are counted, and the class of each region is determined by voting; in the stylized rendering process, the selection of the paintbrush is determined by the material identified for the object region, laying the foundation for automatic rendering.
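A minimal sketch of the per-pixel scoring and region voting described above follows; it is not the patent's classifier. The texture term is taken as precomputed log-probabilities standing in for the Boost classifier, the color term uses scikit-learn Gaussian mixtures assumed to be fitted offline per class in CIELab, the position prior is a caller-supplied callable, and all names are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # one fitted GMM per material class (assumed)

NUM_CLASSES = 12  # the twelve material classes of the claim

def classify_region(lab_pixels, positions, color_models, position_logp, texture_logp):
    """Vote a material class index for one region.
    lab_pixels: (n, 3) CIELab colors; positions: (n, 2) normalized coordinates;
    color_models: list of NUM_CLASSES fitted GaussianMixture objects;
    position_logp: callable(positions) -> (n, NUM_CLASSES) log-probabilities;
    texture_logp: (n, NUM_CLASSES) log-probabilities from a Boost-style classifier."""
    logp = np.array(texture_logp, dtype=float)                 # texture potential
    for c in range(NUM_CLASSES):                               # color potential (GMM in CIELab)
        logp[:, c] += color_models[c].score_samples(lab_pixels)
    logp += position_logp(positions)                           # weak position potential
    votes = np.bincount(logp.argmax(axis=1), minlength=NUM_CLASSES)
    return int(votes.argmax())                                 # majority-voted class index
```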
3. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the matching method of dense feature points among the key frames of the above step 2) is as follows:
after the semantic information on the key frame is obtained, line-drawing features and mixed texture-and-color image template features are integrated, providing a rich feature set and expression for the image matching problem;
11) line-drawing features are represented by Gabor bases as $F^{sk}(I_i) = \lVert \langle I_i, G_{\cos,x,\theta} \rangle \rVert^2 + \lVert \langle I_i, G_{\sin,x,\theta} \rangle \rVert^2$, where $G_{\sin}$ and $G_{\cos}$ represent the sine and cosine Gabor bases in direction $\theta$ at position $x$; its feature probability distribution is expressed as:
with parameter $\theta_i$, where $h^{sk}$ is a sigmoid function and a normalization constraint is applied.
the model therefore encourages edge responses that are stronger than the background distribution;
12) texture features are modeled by a simplified histogram of oriented gradients (HOG), whose 6 feature dimensions represent different gradient directions; the $j$-th direction of the HOG and the descriptor corresponding to the $i$-th feature are denoted accordingly, and the mean of the descriptor over all positive samples is used; the present invention represents the probabilistic model of a feature as:
with parameter $\theta_i$; it can be seen that the model encourages responses to a relatively large set of feature image blocks;
13) the color feature is described by simple pixel brightness; according to the invention, pixel brightness values are quantized into statistical intervals, so the model can be simplified as follows:
similar small image features are combined to obtain feature combinations with strong local discrimination: the image is first segmented into many tiny image blocks, statistical features describing line drawing, texture and color are extracted from these small blocks, and, to obtain the feature combinations effectively, an iterative region-growing and model-learning algorithm is adopted, continuously updating the feature models and iteratively growing the feature-combination regions until feature combinations with strong local discrimination are obtained (a sketch of the line-drawing, texture and brightness statistics of items 11) to 13) is given below);
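The sketch below illustrates the three per-block statistics of items 11) to 13): a squared sine/cosine Gabor response for the line-drawing feature, a simplified 6-bin gradient-orientation histogram for texture, and a quantized brightness histogram for color. Kernel sizes, bin counts and normalizations are arbitrary assumptions rather than the patent's exact descriptors.

```python
import numpy as np

def gabor_pair(size=17, sigma=3.0, wavelength=6.0, theta=0.0):
    """Cosine and sine Gabor bases at orientation theta (parameter values are arbitrary)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    env = np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2))
    return env * np.cos(2 * np.pi * xr / wavelength), env * np.sin(2 * np.pi * xr / wavelength)

def sketch_response(patch, theta):
    """Line-drawing feature: |<I, G_cos>|^2 + |<I, G_sin>|^2 for one odd-sized square patch."""
    g_cos, g_sin = gabor_pair(size=patch.shape[0], theta=theta)
    return float(np.vdot(patch, g_cos) ** 2 + np.vdot(patch, g_sin) ** 2)

def simplified_hog(gray_patch, bins=6):
    """Texture feature: 6-bin histogram of gradient orientations for a small image block."""
    gy, gx = np.gradient(gray_patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # orientations folded into [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

def brightness_histogram(gray_patch, bins=8):
    """Color feature: pixel brightness quantized into statistical intervals."""
    hist, _ = np.histogram(gray_patch, bins=bins, range=(0, 255))
    return hist / (hist.sum() + 1e-8)
```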
on the basis of the above expression, the matching problem of moving targets in the temporal and spatial domains is modeled as a layered graph matching framework over the graph representation: the extracted mixed image template features serve as graph nodes, a graph structure is constructed between frames, and the edge connections between graph nodes are defined based on the similarity and spatial position between features and the class of the object to which they belong;
$I_s$ and $I_t$ denote the source image and the target image, and $U$, $V$ denote the mixed-template feature sets in $I_s$ and $I_t$; each feature point $u \in U$ carries two labels, including a hierarchy label $l(u) \in \{1, 2, \dots\}$; the vertex set of the graph structure is established from the candidate set $C$ of well-matching candidates for each feature point of the source image, and the edge set is constructed as $E = E^+ \cup E^-$; negative edges indicate that the connected candidates repel each other, and their "repulsive force" is defined as:
positive edges connect spatially adjacent, non-mutually-exclusive candidate feature points and indicate the degree of cooperation between them, based on the spatial distance between the connected candidates $v_i$ and $v_j$;
the graph structures $G_s$ and $G_T$ of the source image and the target image are each divided into $K+1$ layers, where $K$ is the number of objects in the source image; taking $G_s$ as an example, the partition is denoted $\Pi = \{g_0, g_1, \dots, g_K\}$, where $g_k$ is a sub-graph of $G_s$ whose vertex set is denoted $U_k$; similarly, the vertex set of $G_T$ is denoted $V_k$; the matching relationship between $G_s$ and $G_T$ is then expressed layer by layer, and, assuming the matches between sub-graphs are independent of each other, then:
the similarity measure between a matched sub-graph pair $(g_k, g_k')$ is defined by a geometric transformation and an appearance measure; in summary, the solution of the graph-structure matching problem can be written as:
$W = \big(K,\ \Pi = \{g_0, g_1, \dots, g_K\},\ \Psi = \{\Psi_k\},\ \Phi = \{\Phi_k\}\big)$
under the Bayesian theory framework, the graph structure matching problem is described by maximizing the posterior probability:
$W^* = \arg\max_W p(W \mid G_s, G_T) = \arg\max_W p(W)\, p(G_s, G_T \mid W)$
the above formula is solved by the Markov Chain Monte Carlo (MCMC) method; for efficient computation, the sampler converges quickly to the global optimal solution through efficient jumps in the solution space, so as to match the inter-frame feature points.
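For concreteness, here is a generic Metropolis-Hastings sketch over per-feature candidate assignments; it is not the patent's sampler, which uses structured jumps over the layered graph, and `log_posterior`, the candidate dictionary layout, the temperature and the proposal (re-matching one feature at a time) are assumptions.

```python
import numpy as np

def mcmc_match(candidates, log_posterior, n_iter=5000, temperature=1.0, rng=None):
    """Metropolis-Hastings search over feature-point assignments.
    candidates: dict keyed by integer source-feature ids, each value a list of target candidates;
    log_posterior(assign): scores a complete assignment (appearance + geometry terms)."""
    if rng is None:
        rng = np.random.default_rng(0)
    assign = {u: cand[0] for u, cand in candidates.items()}     # initial assignment
    lp = log_posterior(assign)
    best, best_lp = dict(assign), lp
    keys = list(candidates)
    for _ in range(n_iter):
        u = keys[rng.integers(len(keys))]                        # propose: re-match one point
        proposal = dict(assign)
        proposal[u] = candidates[u][rng.integers(len(candidates[u]))]
        new_lp = log_posterior(proposal)
        if np.log(rng.random()) < (new_lp - lp) / temperature:   # accept or reject
            assign, lp = proposal, new_lp
            if lp > best_lp:
                best, best_lp = dict(assign), lp
    return best
```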
4. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the region competition segmentation method of the above step 3) is as follows:
on the basis of the stable matching relation obtained between frames, by exploiting the advantages of the region competition mechanism in video segmentation and using the layered-graph image matching algorithm, the matching relation between the features of the previous frame and those of the current frame can be determined, so that the semantic information of the previous frame is propagated to the current frame; the current frame is then segmented into several semantic regions by the region competition segmentation algorithm according to the feature information of each matched region;
given image I, the corresponding image segmentation solution is defined as follows:
$W = \{(R_1, R_2, \dots, R_N),\ (\theta_1, \theta_2, \dots, \theta_N),\ (l_1, l_2, \dots, l_N)\}$
where $R_i$ denotes a segmented region with homogeneous features, $\theta_i$ denotes the parameters of the feature probability distribution model of region $R_i$, and $l_i$ denotes the label corresponding to region $R_i$;
the number $N$ of segmented regions can be determined from the matching relation of the features in the previous and current frames; the set of small feature regions corresponding to the regions is $S = \{S_1, S_2, \dots, S_N\}$, and for each region $R_i$ the initial model parameters $\theta_i$ are estimated from the small feature region $S_i$ it occupies, giving an initial posterior probability $P(\theta_i \mid I(x, y))$; according to the MDL principle, maximizing the posterior probability is converted into minimizing an energy function, which gives:
where $\Gamma_i$ represents the boundary contour of region $R_i$; the invention estimates the parameters $\theta_i$ iteratively in stages, alternating two stages and reducing the energy function in each stage, so as to continuously learn and infer the final segmentation result of the whole image;
in the regional competition process, continuously updating a characteristic probability distribution model of each region, simultaneously competing for ownership of pixel points according to the steepest descent principle, and updating respective boundary contour, so that each region continuously expands the range, and finally obtaining an image segmentation result of the current frame;
the specific iteration steps are as follows: in the first stage, $\Gamma$ is fixed and $\{\theta_i\}$ is estimated from the current region segmentation state, taking the parameters $\theta_i$ under the current state as the optimal solution that minimizes the cost of describing each region, so the energy function becomes:
in the second stage, $\{\theta_i\}$ is taken as known and steepest descent is performed on $\Gamma$; to obtain the minimum of the energy function quickly, the steepest-descent motion equation is solved for the boundaries $\Gamma$ of all regions, and for any point on a boundary contour $\Gamma$ we have:
where $\tau_k$ is the direction vector at that point, and the region to which the point belongs depends on how well the point is described by each region's feature probability distribution model;
to determine the membership between each pixel point and the region, the competition-based image segmentation algorithm process is described as follows:
in the initialization stage, estimating initial parameters of various models according to the matched characteristic image blocks, adding boundary points of all the characteristic image blocks into a queue to be determined, and calculating posterior probabilities of all the boundary points belonging to various types;
in a loop iteration stage, selecting a boundary point i with the steepest descending current energy from an undetermined queue, and further updating all boundaries where the boundary point i is located; then under the current segmentation state, recalculating the model parameters of each region by using maximum likelihood estimation; recalculating the posterior probabilities of all the boundary points belonging to various types by using the newly obtained feature distribution models of the regions;
therefore, the boundary point with the fastest descending current energy is continuously selected from the queue to be determined to update the corresponding boundary, meanwhile, the characteristic distribution probability model of each region is updated timely according to the current region segmentation state, the regions are mutually restricted, and the ownership of the image region is simultaneously competed until the energy function converges, so that the image is segmented into the regions.
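A drastically simplified sketch of this competition loop follows; it is not the patent's algorithm: each region is modeled by an independent per-channel Gaussian instead of the full feature distribution models, boundary pixels are visited in raster order rather than through a steepest-descent priority queue, and the names and iteration count are assumptions.

```python
import numpy as np

def region_competition(image, labels, n_iters=10):
    """Greatly simplified region competition: alternately re-estimate each region's
    color model (stage 1) and let regions compete for boundary pixels (stage 2).
    image: (h, w, c) array; labels: (h, w) initial integer region map."""
    lab = labels.copy()
    h, w = lab.shape
    img = image.astype(float)
    for _ in range(n_iters):
        # Stage 1: fix boundaries, estimate region parameters by maximum likelihood.
        models = {r: (img[lab == r].mean(axis=0), img[lab == r].std(axis=0) + 1e-3)
                  for r in np.unique(lab)}
        # Stage 2: fix parameters, move boundary pixels to the best-fitting adjacent region.
        changed = 0
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                cand = {lab[y - 1, x], lab[y + 1, x], lab[y, x - 1], lab[y, x + 1], lab[y, x]}
                if len(cand) == 1:
                    continue  # interior pixel, not on a region boundary
                def log_lik(r):
                    mu, sd = models[r]
                    return -0.5 * float((((img[y, x] - mu) / sd) ** 2 + 2 * np.log(sd)).sum())
                best = max(cand, key=log_lik)
                if best != lab[y, x]:
                    lab[y, x] = best
                    changed += 1
        if changed == 0:
            break  # the energy no longer decreases
    return lab
```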
5. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylization method step 4) of the video stylization module (2) is based on an interactive video semantic segmentation module, the selection of the brush being determined only by the material corresponding to the identified object region;
the stroke library is built by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized; when drawing in each image region, a big brush is used first to lay the base, and then the brush size and opacity are gradually reduced to finely depict the detailed parts of the object; during drawing, an edge-first-then-interior strategy is adopted: each image layer is drawn starting from the edges, first along the line-drawing edges, with the brush aligned according to the flow field;
in the video rendering, in order to ensure the continuity and stability of the brush in the time domain, a thin-plate spline interpolation technology is adopted to carry out the propagation of the brush strokes, and in addition, in the propagation process of the brush strokes, the deletion and addition mechanisms of the brush strokes are designed by calculating the area of the brush stroke areas; and the 'shaking' effect of the rendering result is reduced by using a simulated damping spring system.
6. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the keyframe nonphotorealistic rendering method based on semantic parsing of the stylized method step 5) of the video stylized module (2) is as follows:
how to design stroke models with different artistic styles is one of the focuses of video stylization; works in different artistic forms each have their own characteristics of stroke expression, and a basic drawing strategy in video stylization is to select proper strokes based on the image content; the stroke library is established by having professional painters draw a large number of typical strokes on paper, which are then scanned and parameterized; for a brush to be drawn, the following information is stored: the class information of the brush, the range of its placement area, the color mapping, the transparency field, the height field and the control points, namely:
when designing the stroke model, not only low-level information such as the shape and texture of the stroke is considered, but high-level semantic information is also integrated, so that during rendering each interpretation region of the image/video has its own brush dependency; when selecting strokes, the class of the interpretation region is used as the keyword to quickly select a batch of strokes of the same class from the stroke library, and one stroke is then chosen from them at random;
to simulate the principle of 'alignment' in oil painting, the primal sketch (reduced graph) model is used for reference: within each region, its primal sketch is computed; the sketch consists of a set of salient primitives that mark the surface characteristics of the object, such as spots, lines and folds on clothing; during rendering, different paintbrushes are overlaid on these primitives to produce the desired artistic effect; an interpretation region is divided into a line-drawing part describing the sketchable structure and a non-line-drawing part describing regions of homogeneous structure; the direction field of the region is defined as:
where the initial value of the direction field is given by the line drawing, and the direction is then propagated to the non-line-drawing region using a diffusion equation;
the rendering process of the key frame continually selects and places strokes; taking an interpretation region as an example, its non-line-drawing part is rendered first and its line-drawing part afterwards, which ensures that when rendered regions overlap, the strokes of the line-drawing part stay on the upper layer; in the non-line-drawing part, an unrendered pixel area is selected arbitrarily, the center of that area is taken as the initial point, and diffusion proceeds to both sides along the direction field to generate a flow-pattern area; taking the central axis of the area as the reference line, the selected brush is transformed into the flow-pattern area so that the central axis of the stroke aligns with the central axis of the area; rendering of the line-drawing part of the region is similar.
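The direction-field propagation mentioned above can be illustrated with a small diffusion sketch; the discretization (iterative neighborhood averaging in doubled-angle form so that orientations 0 and pi coincide), the masks and the iteration count are assumptions rather than the patent's diffusion equation.

```python
import numpy as np

def diffuse_direction_field(theta, known_mask, region_mask, n_iters=200):
    """Propagate stroke orientation from line-drawing pixels into the rest of a region
    by iterative neighborhood averaging (a discrete diffusion equation).
    theta: (h, w) orientations in radians; known_mask: pixels whose orientation is fixed
    by the line drawing; region_mask: pixels belonging to the interpretation region."""
    c = np.cos(2 * theta) * known_mask          # doubled-angle representation
    s = np.sin(2 * theta) * known_mask
    for _ in range(n_iters):
        for f in (c, s):
            avg = 0.25 * (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                          np.roll(f, 1, 1) + np.roll(f, -1, 1))
            # keep known values fixed, diffuse inside the region, zero outside
            f[...] = np.where(known_mask, f, np.where(region_mask, avg, 0.0))
    return 0.5 * np.arctan2(s, c)
```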
7. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the stylized method step 5) of the video stylized module (2) is a stroke propagation method of the sequence frames as follows:
the rendering of non-key frames is obtained by 'propagating' the rendering result of the key frame, the basis of propagation being the temporal and spatial correspondence of the interpretation regions; during propagation, as the deformation of an interpretation region grows, strokes may gradually leak outside the region while unrendered gaps appear inside it, so the propagation of the stroke map must consider the stroke addition and deletion mechanisms at the same time, otherwise the rendering result will exhibit jitter; the mechanisms of stroke propagation, addition and deletion are as follows:
stroke propagation: let a certain interpretation region of the key frame at time $t$ of the video be given, together with the corresponding region at time $t+1$, their respective image regions and their dense matching points in the time domain (computed during video interpretation); it is assumed that the change between the two regions can be expressed by a non-rigid transformation; when strokes are propagated, the invention expects the matching points on the key-frame region to be mapped to their matches in the new image region of frame $t+1$; based on this consideration, the invention selects the Thin-Plate Spline (TPS) interpolation model, which computes the mapping from the key points of the key-frame region to their matches in the next frame; the TPS minimizes an energy function so that the pixel grid of the region is warped by an elastic (non-rigid) deformation (a small thin-plate-spline warping sketch follows this list of mechanisms);
stroke deletion: after brushes are propagated through the video, some brush regions become smaller and smaller, become occluded, or have been propagated for too many frames; the invention therefore removes a brush when the area of its corresponding brush region falls below a given threshold, and also deletes a brush when the propagated brush falls outside the boundary of its corresponding region;
stroke addition: when new semantic regions appear or existing semantic regions grow (such as clothing unfolding), the invention must add new brushes to cover the new areas; to fill gaps between brushes it only needs to adjust the size and position of adjacent brushes, and if an area not covered by any brush grows beyond a given threshold, the system automatically creates a new brush to cover it; nevertheless, the present invention does not draw a stroke on a gap immediately when it first appears; it therefore sets a relatively high threshold and delays rendering newly appearing regions until they grow large enough; the invention then adopts the general brush placement algorithm to fill gaps that reach the threshold, and finally propagates and transforms these new brushes backwards to fill the gap regions that appeared earlier but were not rendered; filling brushes backwards avoids frequent brush changes and links smaller, fragmented brushes into larger ones, thereby reducing flicker and other undesirable visual artifacts; also, since the present invention adds new brushes at the bottom layer, they are drawn under the existing brushes, which further reduces the visual flicker effect.
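As referenced in the stroke-propagation mechanism above, the following sketch warps one brush's control points from frame t to frame t+1 with a thin-plate spline fitted to the region's dense matching points; it uses SciPy's RBFInterpolator with the thin-plate-spline kernel as a stand-in, and the function name, argument layout and smoothing value are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def propagate_stroke_points(src_matches, dst_matches, stroke_points, smoothing=1.0):
    """Warp a key-frame stroke into frame t+1 with a thin-plate-spline map.
    src_matches, dst_matches: (P, 2) matched points of the region in frames t and t+1;
    stroke_points: (Q, 2) control points of one brush in frame t."""
    tps = RBFInterpolator(src_matches, dst_matches,
                          kernel="thin_plate_spline", smoothing=smoothing)
    return tps(stroke_points)   # (Q, 2) control points warped into frame t+1
```

Applying the same fitted spline to all strokes of a region keeps them consistent with the region's non-rigid motion.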
8. The interactive video stylized rendering method based on video interpretation according to claim 1, characterized in that the damping brush system for anti-shake in the stylized method step 6) of the video stylized module (2) is as follows:
the last step of stylized rendering of the video is anti-shake operation, in which adjacent paintbrushes in the time domain and the space domain are connected by springs to simulate a damping system; by minimizing the energy of the system, the effect of removing the jitter can be achieved;
for the $i$-th brush at time $t$, the invention uses $A_{i,t} = (x_{i,t}, y_{i,t}, s_{i,t})$ to represent its geometric attributes, namely its center coordinates and size, and records its initial value; the energy function of the damped brush system is defined as follows:
the first term constrains the position of the brush not to be too far from the initial position:
the second term in the equation is the smoothing constraint on brush i in the time domain:
the third term in the formula smoothly constrains adjacent brushes in both the time domain and the space domain; for any brush $j$ adjacent to the $i$-th brush at time $t$, the relative distance difference and size difference between them are recorded as $\Delta A_{i,j,t} = A_{i,t} - A_{j,t}$, and the smoothing term is defined as follows:
the energy minimization problem is solved by the Levenberg-Marquardt algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110302054XA CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110302054XA CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102542593A true CN102542593A (en) | 2012-07-04 |
Family
ID=46349405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110302054XA Pending CN102542593A (en) | 2011-09-30 | 2011-09-30 | Interactive video stylized rendering method based on video interpretation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102542593A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927372A (en) * | 2014-04-24 | 2014-07-16 | 厦门美图之家科技有限公司 | Image processing method based on user semanteme |
CN104063876A (en) * | 2014-01-10 | 2014-09-24 | 北京理工大学 | Interactive image segmentation method |
CN104346789A (en) * | 2014-08-19 | 2015-02-11 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN104867183A (en) * | 2015-06-11 | 2015-08-26 | 华中科技大学 | Three-dimensional point cloud reconstruction method based on region growing |
CN105719327A (en) * | 2016-02-29 | 2016-06-29 | 北京中邮云天科技有限公司 | Art stylization image processing method |
CN105825531A (en) * | 2016-03-17 | 2016-08-03 | 广州多益网络股份有限公司 | Method and device for dyeing game object |
CN106296567A (en) * | 2015-05-25 | 2017-01-04 | 北京大学 | The conversion method of a kind of multi-level image style based on rarefaction representation and device |
CN106485223A (en) * | 2016-10-12 | 2017-03-08 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN107277615A (en) * | 2017-06-30 | 2017-10-20 | 北京奇虎科技有限公司 | Live stylized processing method, device, computing device and storage medium |
CN109741413A (en) * | 2018-12-29 | 2019-05-10 | 北京金山安全软件有限公司 | Rendering method and device for semitransparent objects in scene and electronic equipment |
CN109816663A (en) * | 2018-10-15 | 2019-05-28 | 华为技术有限公司 | A kind of image processing method, device and equipment |
CN110288625A (en) * | 2019-07-04 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110738715A (en) * | 2018-07-19 | 2020-01-31 | 北京大学 | automatic migration method of dynamic text special effect based on sample |
CN111722896A (en) * | 2019-03-21 | 2020-09-29 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN112017179A (en) * | 2020-09-09 | 2020-12-01 | 杭州时光坐标影视传媒股份有限公司 | Method, system, electronic device and storage medium for evaluating visual effect grade of picture |
CN113128498A (en) * | 2019-12-30 | 2021-07-16 | 财团法人工业技术研究院 | Cross-domain picture comparison method and system |
CN113256484A (en) * | 2021-05-17 | 2021-08-13 | 百果园技术(新加坡)有限公司 | Method and device for stylizing image |
CN116761018A (en) * | 2023-08-18 | 2023-09-15 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101807198A (en) * | 2010-01-08 | 2010-08-18 | 中国科学院软件研究所 | Video abstraction generating method based on sketch |
CN101853517A (en) * | 2010-05-26 | 2010-10-06 | 西安交通大学 | Real image oil painting automatic generation method based on stroke limit and texture |
CN101930614A (en) * | 2010-08-10 | 2010-12-29 | 西安交通大学 | Drawing rendering method based on video sub-layer |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063876A (en) * | 2014-01-10 | 2014-09-24 | 北京理工大学 | Interactive image segmentation method |
CN104063876B (en) * | 2014-01-10 | 2017-02-01 | 北京理工大学 | Interactive image segmentation method |
CN103927372A (en) * | 2014-04-24 | 2014-07-16 | 厦门美图之家科技有限公司 | Image processing method based on user semanteme |
CN104346789A (en) * | 2014-08-19 | 2015-02-11 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN104346789B (en) * | 2014-08-19 | 2017-02-22 | 浙江工业大学 | Fast artistic style study method supporting diverse images |
CN106296567A (en) * | 2015-05-25 | 2017-01-04 | 北京大学 | The conversion method of a kind of multi-level image style based on rarefaction representation and device |
CN106296567B (en) * | 2015-05-25 | 2019-05-07 | 北京大学 | A kind of conversion method and device of the multi-level image style based on rarefaction representation |
CN104867183A (en) * | 2015-06-11 | 2015-08-26 | 华中科技大学 | Three-dimensional point cloud reconstruction method based on region growing |
CN105719327B (en) * | 2016-02-29 | 2018-09-07 | 北京中邮云天科技有限公司 | A kind of artistic style image processing method |
CN105719327A (en) * | 2016-02-29 | 2016-06-29 | 北京中邮云天科技有限公司 | Art stylization image processing method |
CN105825531A (en) * | 2016-03-17 | 2016-08-03 | 广州多益网络股份有限公司 | Method and device for dyeing game object |
CN105825531B (en) * | 2016-03-17 | 2018-08-21 | 广州多益网络股份有限公司 | A kind of colouring method and device of game object |
CN106485223B (en) * | 2016-10-12 | 2019-07-12 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN106485223A (en) * | 2016-10-12 | 2017-03-08 | 南京大学 | The automatic identifying method of rock particles in a kind of sandstone microsection |
CN107277615A (en) * | 2017-06-30 | 2017-10-20 | 北京奇虎科技有限公司 | Live stylized processing method, device, computing device and storage medium |
CN107277615B (en) * | 2017-06-30 | 2020-06-23 | 北京奇虎科技有限公司 | Live broadcast stylization processing method and device, computing device and storage medium |
CN110738715A (en) * | 2018-07-19 | 2020-01-31 | 北京大学 | automatic migration method of dynamic text special effect based on sample |
CN110738715B (en) * | 2018-07-19 | 2021-07-09 | 北京大学 | Automatic migration method of dynamic text special effect based on sample |
CN109816663A (en) * | 2018-10-15 | 2019-05-28 | 华为技术有限公司 | A kind of image processing method, device and equipment |
US12026863B2 (en) | 2018-10-15 | 2024-07-02 | Huawei Technologies Co., Ltd. | Image processing method and apparatus, and device |
CN109741413A (en) * | 2018-12-29 | 2019-05-10 | 北京金山安全软件有限公司 | Rendering method and device for semitransparent objects in scene and electronic equipment |
CN109741413B (en) * | 2018-12-29 | 2023-09-19 | 超级魔方(北京)科技有限公司 | Rendering method and device of semitransparent objects in scene and electronic equipment |
CN111722896B (en) * | 2019-03-21 | 2021-09-21 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN111722896A (en) * | 2019-03-21 | 2020-09-29 | 华为技术有限公司 | Animation playing method, device, terminal and computer readable storage medium |
CN110288625A (en) * | 2019-07-04 | 2019-09-27 | 北京字节跳动网络技术有限公司 | Method and apparatus for handling image |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110446066B (en) * | 2019-08-28 | 2021-11-19 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN113128498A (en) * | 2019-12-30 | 2021-07-16 | 财团法人工业技术研究院 | Cross-domain picture comparison method and system |
CN112017179A (en) * | 2020-09-09 | 2020-12-01 | 杭州时光坐标影视传媒股份有限公司 | Method, system, electronic device and storage medium for evaluating visual effect grade of picture |
CN113256484A (en) * | 2021-05-17 | 2021-08-13 | 百果园技术(新加坡)有限公司 | Method and device for stylizing image |
CN113256484B (en) * | 2021-05-17 | 2023-12-05 | 百果园技术(新加坡)有限公司 | Method and device for performing stylization processing on image |
CN116761018A (en) * | 2023-08-18 | 2023-09-15 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
CN116761018B (en) * | 2023-08-18 | 2023-10-17 | 湖南马栏山视频先进技术研究院有限公司 | Real-time rendering system based on cloud platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102542593A (en) | Interactive video stylized rendering method based on video interpretation | |
Kelly et al. | FrankenGAN: guided detail synthesis for building mass-models using style-synchonized GANs | |
Hartmann et al. | Streetgan: Towards road network synthesis with generative adversarial networks | |
Zhao et al. | Achieving good connectivity in motion graphs | |
CN106920243A (en) | The ceramic material part method for sequence image segmentation of improved full convolutional neural networks | |
CN110222722A (en) | Interactive image stylization processing method, calculates equipment and storage medium at system | |
CN103218846B (en) | The ink and wash analogy method of Three-dimension Tree model | |
Yu et al. | Modern machine learning techniques and their applications in cartoon animation research | |
Cao et al. | Difffashion: Reference-based fashion design with structure-aware transfer by diffusion models | |
CN103854306A (en) | High-reality dynamic expression modeling method | |
Fan et al. | Structure completion for facade layouts. | |
CN108242074B (en) | Three-dimensional exaggeration face generation method based on single irony portrait painting | |
Tang et al. | Animated construction of Chinese brush paintings | |
Xie et al. | Stroke-based stylization learning and rendering with inverse reinforcement learning | |
KR20230085931A (en) | Method and system for extracting color from face images | |
CN102270345A (en) | Image feature representing and human motion tracking method based on second-generation strip wave transform | |
Tong et al. | Sketch generation with drawing process guided by vector flow and grayscale | |
Yang et al. | Brushwork master: Chinese ink painting synthesis for animating brushwork process | |
Guo | Design and development of an intelligent rendering system for new year's paintings color based on b/s architecture | |
Xie et al. | Stroke-based stylization by learning sequential drawing examples | |
CN104091318B (en) | A kind of synthetic method of Chinese Sign Language video transition frame | |
Jiang et al. | Animation scene generation based on deep learning of CAD data | |
Jia et al. | Facial expression synthesis based on motion patterns learned from face database | |
Fu et al. | PlanNet: A Generative Model for Component-Based Plan Synthesis | |
Wang et al. | AI Promotes the Inheritance and Dissemination of Chinese Boneless Painting——Research on Design Practice from Interdisciplinary Collaboration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120704 |