
CN117221466A - Video stitching method and system based on grid transformation - Google Patents

Video stitching method and system based on grid transformation

Info

Publication number
CN117221466A
CN117221466A (application CN202311482383.6A)
Authority
CN
China
Prior art keywords
image
matched
characteristic point
grid
determining
Prior art date
Legal status
Granted
Application number
CN202311482383.6A
Other languages
Chinese (zh)
Other versions
CN117221466B (en)
Inventor
刘卫华 (Liu Weihua)
周舟 (Zhou Zhou)
陈虹旭 (Chen Hongxu)
Current Assignee
Beijing Smart Yunzhou Technology Co., Ltd.
Original Assignee
Beijing Smart Yunzhou Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Beijing Smart Yunzhou Technology Co., Ltd.
Priority to CN202311482383.6A
Publication of CN117221466A
Application granted
Publication of CN117221466B
Legal status: Active
Anticipated expiration


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00: Road transport of goods or passengers
    • Y02T 10/10: Internal combustion engine [ICE] based vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Image Processing (AREA)

Abstract

The invention provides a video stitching method and system based on grid transformation, relating to the technical field of video stitching. The method comprises the following steps: acquiring an image to be matched; according to the image to be matched and a reference image, determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm, performing feature point matching in combination with a feature point matching strategy, and determining a feature point matching result; according to the feature point matching result, determining polar coordinate values corresponding to the image to be matched through a grid deformation constraint algorithm, and determining a grid deformation graph corresponding to the image to be matched in combination with feature point constraint items in the image to be matched; according to the grid deformation graph, obtaining an artifact candidate set through a target tracking algorithm and judging, in combination with the current suture line of the grid deformation graph, whether the current suture line needs to be updated; if so, eliminating the splice joint generated by the suture line through a splice joint fusion algorithm to obtain and output a spliced image; if not, outputting the grid deformation graph as the spliced image.

Description

Video stitching method and system based on grid transformation
Technical Field
The invention relates to the technical field of video stitching, in particular to a video stitching method and system based on grid transformation.
Background
Video stitching is the process of combining multiple video clips into a single overall video; driven by demands in fields such as detection and the military, it has become an increasingly popular research area.
In the related art, CN107734268A discloses a structure-preserving wide-baseline video stitching method. In the video frame synchronization stage, video frames are extracted from the input wide-baseline videos, added to a buffer queue, and frame-synchronized. In the video frame registration stage, for synchronized video frames, feature matching is first performed with a three-stage strategy combining point matching, line matching, and contour matching; image deformation is then performed with a structure-preserving grid optimization model, and the alignment error, color error, and salient structure are comprehensively considered to obtain an optimal suture line, generate a panoramic image, and initialize a stitching template. In the video stitching stage, synchronized frames are extracted from the buffer queue and stitched frame by frame in combination with the stitching template, yielding a panoramic video composed of panoramic frames. Compared with video stitching methods based on traditional global homography transformation or local similarity transformation, this method is more effective, reduces projection distortion and perspective distortion, and obtains a better stitching effect; applied to intelligent security monitoring systems, it enlarges the field of view of the monitoring picture and improves monitoring efficiency.
CN106412461A discloses a video stitching method that searches for an optimal seam in the overlapping area of the k-th frames of two videos using dynamic programming, with a first preset number of lines of the overlapping area as the processing unit. For color correction, the relationship between the width of the overlapping area and a preset threshold is considered: if the width of the overlapping area is larger than the threshold, no color correction is needed; otherwise, color correction is performed with a second preset number of lines as the processing unit. During image synthesis, all lines of the current second preset number of lines of the k-th frames of the two videos are synthesized. Because the processing units for optimal seam searching, color correction, and image synthesis differ, computational complexity is reduced and the required storage space is smaller; smooth color transitions between frames can be achieved and more image detail is retained in the synthesized overlapping area, effectively avoiding video jumping and flickering.
In summary, although the prior art can realize the stitching of video images, in video scenes with relatively complex scene structures and moving objects, problems such as a poor image stitching effect may occur.
Disclosure of Invention
The embodiment of the invention provides a video stitching method and system based on grid transformation, which automatically stitch video images to be matched and improve the stitching quality and effect under deformation, motion, and viewing-angle changes between images.
In a first aspect of an embodiment of the present invention, a video stitching method based on grid transformation is provided, including:
acquiring an image to be matched, determining a first characteristic point corresponding to the image to be matched and a second characteristic point corresponding to a reference image through a characteristic point extraction algorithm according to the image to be matched and a preset reference image, performing characteristic point matching according to the first characteristic point and the second characteristic point and combining a preset characteristic point matching strategy, and determining a characteristic point matching result;
determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm according to the characteristic point matching result, and determining a grid deformation graph corresponding to the image to be matched according to the polar coordinate value and combining a characteristic point constraint item in the image to be matched;
according to the grid deformation graph, an artifact candidate set is obtained through a target tracking algorithm, according to the artifact candidate set, a current suture line of the grid deformation graph is combined, whether the current suture line needs to be updated or not is judged, if so, a splicing seam generated by the suture line is eliminated through a splicing seam fusion algorithm, a splicing image is obtained and output, and if not, the grid deformation graph is output as the splicing image.
In an alternative embodiment of the present invention,
according to the image to be matched and a preset reference image, determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm, and according to the first feature point and the second feature point, carrying out feature point matching in combination with a preset feature point matching strategy, wherein the determining of a feature point matching result comprises the following steps:
the first feature points and the second feature points are geometric-structure feature points of macroscopic structural objects in the image to be matched and the reference image, such as inflection points and intersections of building contours, road intersections, street lamp poles, curve inflection points, and inflection points of the sky-horizon boundary;
determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm;
acquiring the first characteristic point and the second characteristic point, generating two characteristic point sets according to the first characteristic point and the second characteristic point, and respectively marking the two characteristic point sets as a first reference set and a first set to be matched;
traversing all the reference feature points in the first reference set, determining first matching points corresponding to the reference feature points in the first set to be matched, and recording the first matching points as a first feature point set;
Traversing all the feature points to be matched in the first set to be matched by adopting the same strategy, determining second matching points corresponding to the feature points to be matched in the first reference set, and marking the second matching points as a second feature point set;
and comparing all the matching point pairs in the first feature point set and the second feature point set; if a mapping relation of the same elements exists, the correctly matched feature point pairs are all retained, and feature point pairs that are not correctly matched are deleted, so as to obtain the feature point matching result.
In an alternative embodiment of the present invention,
according to the feature point matching result, determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm, and according to the polar coordinate value, combining a feature point constraint item in the image to be matched, determining a grid deformation graph corresponding to the image to be matched comprises:
dividing the image to be matched into grid units according to the characteristic point matching result, determining coordinates of each grid in the image to be matched after the homography matrix corresponding to each grid is projected, traversing the coordinates of each grid after the homography matrix corresponding to each grid is projected, and determining an initial grid deformation map corresponding to the grid units;
Based on alignment constraint items corresponding to all feature points in the initial grid deformation graph, combining a global similarity item and a local similarity item which are constructed in advance, and performing grid deformation operation on the image to be matched to obtain a first deformation graph;
based on the first deformation map, introducing a geometric structure maintaining item, and determining a grid deformation map according to the geometric structure maintaining item and a preset grid deformation constraint algorithm;
wherein the mesh deformation constraint algorithm is constructed from a modified adaptive block image alignment algorithm.
In an alternative embodiment of the present invention,
based on alignment constraint items corresponding to all feature points in the initial grid deformation graph, combining a global similarity item and a local similarity item which are constructed in advance, performing grid deformation operation on the image to be matched, and obtaining a first deformation graph comprises:
initializing a local affine transformation matrix on nodes of each grid cell according to the grid cells;
calculating local affine transformation parameters according to the local affine transformation matrix and combining characteristic point matching results;
and deforming the grid cells of the image to be matched according to the local affine transformation parameters to obtain the first deformation graph.
In an alternative embodiment of the present invention,
the expression of mesh deformation after the introduction of the geometric structure retention term is:

$E(V) = \lambda_a E_a(V) + \lambda_{ls} E_{ls}(V) + \lambda_{gs} E_{gs}(V) + \lambda_g E_g(V)$

wherein $V$ represents the set of mesh vertices of all images; $\lambda_a$ represents the weight of the alignment constraint term and $E_a(V)$ the alignment constraint term; $\lambda_{ls}$ represents the weight of the local similarity term and $E_{ls}(V)$ the local similarity term; $\lambda_{gs}$ represents the weight of the geometric retention term and $E_{gs}(V)$ the geometric retention term; $\lambda_g$ represents the weight of the global retention term and $E_g(V)$ the global retention term;
the expression of the global hold term is:
$E_g(V) = \sum_{i=1}^{n} \left\| \left( L \, (V_{deformed} - V_0) \right)_i \right\|^2$

wherein $n$ represents the number of vertices, $L$ represents the Laplacian matrix of the mesh, $V_{deformed}$ represents the vertex positions after deformation, $V_0$ represents the initial vertex positions of the mesh, and $(V_{deformed} - V_0)_i$ represents the difference between the deformed coordinates and the initial coordinates of the $i$-th vertex.
In an alternative embodiment of the present invention,
according to the grid deformation graph, an artifact candidate set is obtained through a target tracking algorithm, and according to the artifact candidate set, the step of judging whether the suture line needs to be updated according to the current suture line of the grid deformation graph obtained through a background filtering algorithm comprises the following steps:
detecting all objects in the grid deformation graph through a target detection algorithm, tracking all objects in the grid deformation graph through a target tracking algorithm, determining a moving track of each object, screening the objects positioned in an overlapping area according to the moving track, and determining a synthetic artifact candidate set;
Determining a foreground image of an object in the synthetic artifact candidate set based on a preset background filtering algorithm, determining a current suture line of the grid deformation graph according to the foreground image, judging, according to the artifact candidate set, the foreground image, and the current suture line of the grid deformation graph, whether the current suture line needs to be updated, if so, eliminating the splice joint generated by the suture line through a splice joint fusion algorithm to obtain and output a splice image, and if not, outputting the grid deformation graph as the splice image.
In an alternative embodiment of the present invention,
the specific method for determining the suture line is as follows: the suture line minimizes, over the overlapping area, the cost

$C(e,f) = \theta_1 \, \xi_I(e,f) + \theta_2 \, \xi_{\nabla}(e,f)$

wherein $\xi_I(e,f)$ is the intensity difference of the foreground image and the background image at the same pixel position in the overlapping area, $\xi_{\nabla}(e,f)$ is the gradient difference of the foreground image and the background image at the same pixel position in the overlapping area, $\theta_1$ is the weight of the intensity difference, $\theta_2$ is the weight of the gradient difference, $e$ represents the foreground image, and $f$ represents the background image.
In a second aspect of an embodiment of the present invention, there is provided a video stitching system based on grid transformation, including:
the first unit is used for acquiring an image to be matched, determining a first characteristic point corresponding to the image to be matched and a second characteristic point corresponding to the reference image through a characteristic point extraction algorithm according to the image to be matched and a preset reference image, carrying out characteristic point matching according to the first characteristic point and the second characteristic point and combining a preset characteristic point matching strategy, and determining a characteristic point matching result;
The second unit is used for determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm according to the characteristic point matching result, and determining a grid deformation graph corresponding to the image to be matched according to the polar coordinate value and combining a characteristic point constraint item in the image to be matched;
and the third unit is used for obtaining an artifact candidate set through a target tracking algorithm according to the grid deformation graph, judging whether the current suture line needs to be updated or not according to the artifact candidate set and combining the current suture line of the grid deformation graph, if so, eliminating the splice joint generated by the suture line through a splice joint fusion algorithm to obtain and output a splice image, and if not, outputting the grid deformation graph as the splice image.
In a third aspect of an embodiment of the present invention,
there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method described previously.
In a fourth aspect of an embodiment of the present invention,
there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method as described above.
The beneficial effects of the embodiments of the present invention may refer to the effects corresponding to technical features in the specific embodiments, and are not described herein.
Drawings
FIG. 1 is a flow chart of a video stitching method based on grid transformation according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a video stitching system based on grid transformation according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flow chart of a video stitching method based on grid transformation according to an embodiment of the present invention, as shown in fig. 1, the method includes:
S1, acquiring an image to be matched, determining a first characteristic point corresponding to the image to be matched and a second characteristic point corresponding to a reference image through a characteristic point extraction algorithm according to the image to be matched and a preset reference image, performing characteristic point matching according to the first characteristic point and the second characteristic point and combining a preset characteristic point matching strategy, and determining a characteristic point matching result;
the feature point extraction algorithm is used to extract salient, distinctive feature points from an image to facilitate subsequent matching and positioning; in this scheme, the feature point extraction algorithm may be the scale-invariant feature transform (SIFT). The feature point matching strategy is used to match feature points in two images so as to establish a correspondence between them; it performs matching using the distance information between feature descriptors, by setting a threshold or adopting adaptive methods.
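As an illustration of this step, the following is a minimal sketch assuming OpenCV's SIFT implementation; the function name and the ratio threshold are illustrative choices, not part of the patent.

```python
import cv2

def extract_and_match(image_to_match_path, reference_path, ratio=0.75):
    # Load both images and convert to grayscale, as described above
    img1 = cv2.imread(image_to_match_path, cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread(reference_path, cv2.IMREAD_GRAYSCALE)

    # Detect feature points and compute local descriptors (SIFT)
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors by distance; Lowe's ratio test serves as the
    # threshold-based matching strategy mentioned in the text
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    good = []
    for pair in knn:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return kp1, kp2, good
```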
In an alternative embodiment of the present invention,
according to the image to be matched and a preset reference image, determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm, and according to the first feature point and the second feature point, carrying out feature point matching in combination with a preset feature point matching strategy, wherein the determining of a feature point matching result comprises the following steps:
Determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm;
loading an image to be matched and a reference image into a memory, converting a color image into a gray image, and detecting feature points in the image by using a selected feature point extraction algorithm;
for each detected feature point, a descriptor is calculated, which is used to describe the local image information around the point.
Acquiring the first characteristic point and the second characteristic point, generating two characteristic point sets according to the first characteristic point and the second characteristic point, and respectively marking the two characteristic point sets as a first reference set and a first set to be matched;
traversing all the reference feature points in the first reference set, determining first matching points corresponding to the reference feature points in the first set to be matched, and recording the first matching points as a first feature point set;
traversing each reference feature point in the first reference set, finding a corresponding first matching point in the first set to be matched for each reference feature point, and adding the found first matching point into the first feature point set;
traversing all the feature points to be matched in the first set to be matched by adopting the same strategy, determining second matching points corresponding to the feature points to be matched in the first reference set, and marking the second matching points as a second feature point set;
Traversing each feature point to be matched in the first set to be matched by adopting the same strategy, finding a corresponding second matching point in the first reference set for each feature point to be matched, and adding the found second matching point into the second feature point set;
and comparing all the matching point pairs in the first feature point set and the second feature point set; if a mapping relation of the same elements exists, the correctly matched feature point pairs are all retained, and feature point pairs that are not correctly matched are deleted, so as to obtain the feature point matching result.
Specifically, for each matching point pair in the first feature point set, search whether the same matching point pair exists in the second feature point set; if it exists, mark the pair as correctly matched and retain it; if it does not exist, mark the pair as incorrectly matched and delete it from the matching result;
Illustratively, if the same matching relationship is found in the comparison, it may be determined that there is a mapping relationship of the same element;
if there is a matching pair in which A belongs to the first feature point set and B belongs to the second feature point set, and A and B are matched to each other in both directions, then a mapping relationship is considered to exist between elements A and B.
Creating a new list (or other data structure) for storing pairs of matching correct feature points, adding each pair of matching correct feature points to the new list;
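A sketch of the bidirectional traversal and cross-check described above, assuming descriptors computed in the previous step with OpenCV; the helper name is hypothetical.

```python
import cv2

def cross_check_match(des_ref, des_query):
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    # First feature point set: best match in the set to be matched
    # for every reference feature point
    fwd = matcher.match(des_ref, des_query)

    # Second feature point set: same strategy in the reverse direction
    bwd = matcher.match(des_query, des_ref)
    bwd_map = {m.queryIdx: m.trainIdx for m in bwd}

    # Keep a pair only if the same element mapping exists in both sets;
    # all other pairs are deleted as incorrect matches
    correct_pairs = [m for m in fwd
                     if bwd_map.get(m.trainIdx) == m.queryIdx]
    return correct_pairs
```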
in the step, the feature point extraction algorithm can accurately extract feature points from the image to be matched and the reference image. This ensures that the subsequent matching process is based on good characteristics;
and through the characteristic point matching strategy, the accuracy of matching point pairs in the first characteristic point set and the second characteristic point set is ensured. The two feature point sets are traversed by adopting a similar strategy, so that the matching robustness is ensured, and a better matching effect can be obtained even under the conditions of some shielding, illumination change and the like;
and deleting the feature point pairs which are not matched correctly by comparing the first feature point set with the second feature point set, and reserving the feature point pairs which are matched correctly. The step ensures that the accurately matched characteristic points are used in the subsequent image splicing process, thereby being beneficial to realizing more accurate and more stable grid deformation;
In conclusion, the method realizes accurate matching of the feature points in video stitching, and provides reliable constraint for subsequent grid deformation. This has a positive impact on the overall effect of the video splice, the nature of the transition, and the seamless nature of the splice area.
S2, determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm according to the characteristic point matching result, and determining a grid deformation graph corresponding to the image to be matched according to the polar coordinate value and combining a characteristic point constraint item in the image to be matched;
the preset grid deformation constraint algorithm is generally used in image alignment or deformation operation, wherein a grid is used as a control point, the shape of an image is adjusted in a grid deformation mode, and the core is to realize the deformation of the grid while maintaining a global or local geometric structure;
the feature point constraint item is commonly used for maintaining the corresponding relation of specific points or features in two images in the field of image processing, and generally plays a role in tasks such as image alignment, deformation, splicing and the like;
in an alternative embodiment of the present invention,
according to the feature point matching result, determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm, and according to the polar coordinate value, combining a feature point constraint item in the image to be matched, determining a grid deformation graph corresponding to the image to be matched comprises:
Dividing the image to be matched into grid units according to the characteristic point matching result, determining coordinates of each grid in the image to be matched after the homography matrix corresponding to each grid is projected, traversing the coordinates of each grid after the homography matrix corresponding to each grid is projected, and determining an initial grid deformation map corresponding to the grid units;
the homography matrix, namely the homography transformation matrix, describes the projection relation between two planes, is generally used for mapping points on one plane to the other plane, and common applications include image registration, stitching, point projection on a plane map and the like;
calculating a homography matrix through a standard homography matrix estimation algorithm according to the characteristic point matching result;
illustratively, assuming that matched point pairs A and B already exist, the coordinates of the matched point pairs are combined into two matrices, one representing the points in the first image and the other representing the points in the second image, and are converted into homogeneous coordinates;
the normalized coordinate matrices are then stacked into a coefficient matrix, singular value decomposition is performed on this matrix, and the homography matrix is extracted from the singular value decomposition result.
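The following is a minimal sketch of this SVD-based estimation (the standard direct linear transform), assuming NumPy and at least four matched point pairs; in practice a RANSAC wrapper such as OpenCV's `cv2.findHomography` would typically be used instead.

```python
import numpy as np

def estimate_homography(pts_src, pts_dst):
    # pts_src / pts_dst: N x 2 arrays of matched coordinates, N >= 4
    A = []
    for (x, y), (u, v) in zip(pts_src, pts_dst):
        # Two linear constraints per correspondence, in homogeneous form
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    A = np.asarray(A)

    # The homography is the right singular vector associated with the
    # smallest singular value of the coefficient matrix
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize so that H[2, 2] = 1
```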
Dividing the original picture into grid cells according to the preset grid size and layout, and calculating boundary coordinates of each grid cell;
For example, assuming that there is an image 800 pixels wide and 600 pixels high and the preset grid size is 100×100 pixels, the grid rows and columns are calculated as 8 grids in the horizontal direction and 6 grids in the vertical direction, with each grid 100 pixels wide and 100 pixels high;
for each grid cell, applying a homography matrix to the boundary coordinates, determining coordinates projected by the homography matrix, and mapping the grid cells in the original image into the image to be matched;
and determining the color value of each pixel in the initial grid deformation map by an interpolation method according to the projected coordinates to obtain the initial grid deformation map.
For example, given an original image, the projective transformation is applied to obtain the transformed coordinates of each pixel; for each transformed coordinate, bilinear interpolation is used to find the four nearest neighboring pixels around it, and the weighted average of their color values gives the interpolated color value in the initial grid deformation map; performing this operation for every pixel finally yields the initial grid deformation map.
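A minimal, unoptimized sketch of this inverse mapping with bilinear interpolation, assuming NumPy; the function name is illustrative.

```python
import numpy as np

def warp_bilinear(src, H, out_h, out_w):
    out = np.zeros((out_h, out_w, 3), dtype=src.dtype)
    Hinv = np.linalg.inv(H)  # inverse mapping: output pixel -> source pixel
    for yo in range(out_h):
        for xo in range(out_w):
            x, y, w = Hinv @ np.array([xo, yo, 1.0])
            x, y = x / w, y / w
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            if 0 <= x0 < src.shape[1] - 1 and 0 <= y0 < src.shape[0] - 1:
                dx, dy = x - x0, y - y0
                # Weighted average of the four nearest neighbours
                out[yo, xo] = ((1 - dx) * (1 - dy) * src[y0, x0]
                               + dx * (1 - dy) * src[y0, x0 + 1]
                               + (1 - dx) * dy * src[y0 + 1, x0]
                               + dx * dy * src[y0 + 1, x0 + 1])
    return out
```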
Based on alignment constraint items corresponding to all feature points in the initial grid deformation graph, combining a global similarity item and a local similarity item which are constructed in advance, and performing grid deformation operation on the image to be matched to obtain a first deformation graph;
the alignment constraint item corresponding to the feature points is used for keeping the position alignment of the feature points in the image after transformation, and is usually realized by minimizing the distance or error between the feature points;
the local similarity term is typically used to achieve local constraints, allowing different deformations, such as bends, twists or deformations, of the image of different regions to adapt to local features.
The global similarity term is typically used to apply the same transformation to the entire image, scaling, rotation and translation of the image as a whole.
Defining a global similarity term and a local similarity term, creating a target image according to the original picture, wherein the size of the target image is the same as that of the original picture, mapping the initial grid deformation map to the target image, and meeting the global similarity term and the local similarity term;
the optimization problem is solved using an optimization algorithm to find the optimal grid deformation operation, to align the initial grid deformation map as much as possible with the target image, and to satisfy the constraint. The optimization algorithm can be selected according to factors such as complexity of the problem, calculation resources and the like, and common optimization algorithms comprise a gradient descent method, a quasi-Newton method and the like;
Mapping the initial grid deformation map to a target image by applying the optimal grid deformation operation found in the optimization to obtain the first deformation map;
based on the first deformation map, introducing a geometric structure maintaining item, and determining a grid deformation map according to the geometric structure maintaining item and a preset grid deformation constraint algorithm;
the geometry-preserving item is a constraint for ensuring that the deformed image still maintains the geometry of the original image, and the geometry-preserving item is an optical flow field describing the motion of each pixel in the image;
selecting an optical flow field calculation method, calculating a motion vector between two images for each pixel to describe a displacement of the pixel from the first deformation map to a target image;
and mapping the first deformation map to a target image according to the optimal grid deformation operation to obtain the grid deformation map.
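A sketch of the per-pixel motion field computation mentioned above, assuming OpenCV's Farneback dense optical flow; the parameter values are illustrative defaults.

```python
import cv2

def dense_motion_field(first_deform_map, target_image):
    g1 = cv2.cvtColor(first_deform_map, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(target_image, cv2.COLOR_BGR2GRAY)
    # One motion vector per pixel, describing its displacement from the
    # first deformation map to the target image
    flow = cv2.calcOpticalFlowFarneback(
        g1, g2, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    return flow  # shape (H, W, 2): per-pixel (dx, dy)
```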
Wherein the grid deformation constraint algorithm is constructed according to an improved adaptive block image alignment algorithm;
the self-adaptive block image alignment algorithm is a computer vision algorithm for image segmentation and clustering, combines the thought of self-adaptive block, and aims to automatically discover different textures and areas in an image and divide the textures and areas into different subareas.
In this embodiment, the image to be matched is divided into grid cells according to the feature point matching result. For each grid, an initial grid deformation map is created by determining coordinates after homography matrix projection. This helps to ensure that the underlying mesh is aligned when deformed, taking into account the local geometry;
and performing grid deformation operation on the image to be matched based on the alignment constraint item, the global similarity item and the local similarity item corresponding to the feature points by using the initial grid deformation graph. This step aims at comprehensively considering the global and local geometries to better align the image to be matched with the reference image.
Geometry-preserving items are introduced to ensure that the geometry of the image is preserved while the mesh deformation is taking place. This is critical to video stitching and other tasks, as it can prevent distortion or deformation of the image when deformed, maintaining a more realistic appearance;
in summary, the embodiment comprehensively considers the global and local geometric structures when grid deformation is performed by a comprehensive and self-adaptive deformation method, so as to realize a more accurate and natural alignment effect.
In an alternative embodiment of the present invention,
based on alignment constraint items corresponding to all feature points in the initial grid deformation graph, combining a global similarity item and a local similarity item which are constructed in advance, performing grid deformation operation on the image to be matched, and obtaining a first deformation graph comprises:
Initializing a local affine transformation matrix on nodes of each grid cell according to the grid cells;
calculating local affine transformation parameters according to the local affine transformation matrix and combining characteristic point matching results;
and deforming the grid cells of the image to be matched according to the local affine transformation parameters to obtain the first deformation graph.
For each grid cell, the initialized affine transformation parameters are applied to all pixels within the cell, specifically:
traversing each pixel in the grid unit, transforming the coordinates of each pixel through an affine transformation matrix to obtain new coordinates, obtaining transformed pixel values by using an interpolation method, and filling new coordinate positions;
repeating the application of affine transformation parameters until all grid cells are processed, each grid cell having a local affine transformation matrix applied to pixels therein;
after the initialization and transformation of all grid cells are completed, obtaining the first deformation graph;
For example, suppose a grid cell includes four nodes A, B, C, and D; an identity matrix is initialized on each of the nodes A, B, C, and D as the initial local affine transformation matrix. For each grid cell, the correspondence between feature points in the image to be matched and feature points in the reference image is obtained from the feature point matching result, and the local affine transformation parameters are calculated using the least squares method; each grid cell of the image to be matched is then deformed using the calculated affine transformation parameters.
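A sketch of the per-cell least-squares estimation described in this example, assuming NumPy; the matched coordinates that fall inside one grid cell are given as N×2 arrays (N ≥ 3).

```python
import numpy as np

def fit_local_affine(cell_src, cell_dst):
    # Solve [x y 1] @ M = [x' y'] for the 3x2 affine parameter matrix M
    n = len(cell_src)
    A = np.hstack([cell_src, np.ones((n, 1))])
    M, _, _, _ = np.linalg.lstsq(A, cell_dst, rcond=None)
    return M  # 2x2 linear part in the top rows, translation in the last row

def apply_local_affine(points, M):
    # Deform the grid cell's nodes / pixels with the estimated parameters
    A = np.hstack([points, np.ones((len(points), 1))])
    return A @ M
```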
In this embodiment, global and local geometric structures can be comprehensively considered when grid deformation is performed on an image, so that more accurate alignment and deformation are realized. This is critical for video stitching and other application scenarios, because images may have different global and local geometric features, thus requiring a comprehensive morphing approach.
In an alternative embodiment of the present invention,
the expression of mesh deformation after the introduction of the geometric structure retention term is:

$E(V) = \lambda_a E_a(V) + \lambda_{ls} E_{ls}(V) + \lambda_{gs} E_{gs}(V) + \lambda_g E_g(V)$

wherein $V$ represents the set of mesh vertices of all images; $\lambda_a$ represents the weight of the alignment constraint term and $E_a(V)$ the alignment constraint term; $\lambda_{ls}$ represents the weight of the local similarity term and $E_{ls}(V)$ the local similarity term; $\lambda_{gs}$ represents the weight of the geometric retention term and $E_{gs}(V)$ the geometric retention term; $\lambda_g$ represents the weight of the global retention term and $E_g(V)$ the global retention term;
by introducing the geometric construction maintaining item, the constraint on grid deformation can be realized, more accurate deformation and alignment effect can be realized in image processing and computer vision tasks, the selection and balance of different weight parameters can be adjusted according to specific tasks and application requirements, and more accurate and vivid image processing results can be realized;
the expression of the global hold term is:
$E_g(V) = \sum_{i=1}^{n} \left\| \left( L \, (V_{deformed} - V_0) \right)_i \right\|^2$

wherein $n$ represents the number of vertices, $L$ represents the Laplacian matrix of the mesh, $V_{deformed}$ represents the vertex positions after deformation, $V_0$ represents the initial vertex positions of the mesh, and $(V_{deformed} - V_0)_i$ represents the difference between the deformed coordinates and the initial coordinates of the $i$-th vertex.
In this function, the combination of multiple constraint terms enables more precise mesh deformation and alignment; meanwhile, the weight parameters can be selected according to the specific task and application requirements, so that the method is suitable for different image processing and computer vision applications and can be customized according to actual needs.
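A numerical sketch of the combined objective, assuming NumPy; the alignment, local-similarity, and geometry-preserving terms are passed in as placeholder callables, while the global retention term follows the Laplacian expression above. The weight values are illustrative.

```python
import numpy as np

def global_hold_term(L, V_deformed, V_0):
    # E_g(V) = sum_i || (L (V_deformed - V_0))_i ||^2
    d = L @ (V_deformed - V_0)  # n x 2: filtered per-vertex offsets
    return float(np.sum(d ** 2))

def total_energy(V, V_0, L, E_a, E_ls, E_gs,
                 lam_a=1.0, lam_ls=0.5, lam_gs=0.5, lam_g=0.1):
    # E_a, E_ls, E_gs are callables for the alignment, local-similarity,
    # and geometric retention terms; weights are illustrative defaults
    return (lam_a * E_a(V) + lam_ls * E_ls(V)
            + lam_gs * E_gs(V) + lam_g * global_hold_term(L, V, V_0))
```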
S3, according to the grid deformation graph, an artifact candidate set is obtained through a target tracking algorithm, according to the artifact candidate set, a current suture line of the grid deformation graph is combined, whether the current suture line needs to be updated or not is judged, if so, a splicing seam generated by the suture line is eliminated through a splicing seam fusion algorithm, a splicing image is obtained and output, and if not, the grid deformation graph is output as the splicing image.
The suture line is the dividing line between two adjacent images; in image stitching, these lines typically arise from differences in camera viewpoint, illumination, color, or object edges. The splice joint is the area along the boundary between two adjacent images: a transition region containing portions of both images that must be handled reasonably to achieve a smooth stitching effect. The splice joint fusion algorithm eliminates or mitigates the effect of the splice joint so that the stitched region appears more natural and seamless.
In an alternative embodiment of the present invention,
according to the grid deformation graph, an artifact candidate set is obtained through a target tracking algorithm, and according to the artifact candidate set, the step of judging whether the suture line needs to be updated according to the current suture line of the grid deformation graph obtained through a background filtering algorithm comprises the following steps:
detecting all objects in the grid deformation graph through a target detection algorithm, tracking all objects in the grid deformation graph through a target tracking algorithm, determining a moving track of each object, screening the objects positioned in an overlapping area according to the moving track, and determining a synthetic artifact candidate set;
Illustratively, assume YOLO is used for target detection and DeepSORT for target tracking: YOLO detects the objects on the grid deformation graph and yields the position and class of each object; a DeepSORT tracker is initialized for each detected object and target tracking is performed on successive frames; the tracking results are analyzed to obtain the movement track of each object; objects that appear in the overlapping region for a certain period of time are screened according to the track analysis, and objects in the overlapping region are added to the synthetic artifact candidate set.
The object detection algorithm is a computer vision task, and aims to identify objects existing in an image or video, and is usually expressed in the form of a boundary box, and the main object of the object detection algorithm is to find an interesting object in the image and distinguish the interesting object from a background;
Object tracking is a computer vision task aimed at tracking the position and motion of objects in successive image frames. The target tracking algorithm is generally used in the fields of video analysis, monitoring systems, automatic driving and the like;
processing the input image or video frame using a target detection algorithm to detect all objects in the image, for each detected object, activating a target tracker to track the position of the object in successive image frames;
each object is continuously tracked and their movement trajectories between successive frames are recorded. These trajectories can be represented as sequences of positions of objects, one for each sequence;
for each pair of object trajectories, their overlap region is calculated over a time window. The size of the time window can be adjusted according to application requirements, if the tracks of two objects overlap in the time window, which means that the two objects share the same area at a certain moment, namely, artifact can occur, and the information of the overlapping areas is recorded;
based on the information of the overlapping region, a set of potential artifact candidates is determined. The candidate set comprises object pairs which overlap in a time window, a rule for judging the synthetic artifact is defined, and artifact candidate items are determined if two objects overlap in the time window in at least N continuous frames or the speed difference of the two objects is smaller than a certain threshold value, so that the synthetic artifact candidate set is obtained.
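A sketch of the candidate rule just described (overlap in at least N consecutive frames, or a speed difference below a threshold), assuming per-track box lists over a shared time window; the data layout and threshold values are illustrative.

```python
import numpy as np

def iou(a, b):
    # Boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def mean_speed(boxes):
    # Mean per-frame displacement of the box centre
    c = np.array([[(x1 + x2) / 2, (y1 + y2) / 2]
                  for x1, y1, x2, y2 in boxes])
    if len(c) < 2:
        return 0.0
    return float(np.linalg.norm(np.diff(c, axis=0), axis=1).mean())

def artifact_candidates(tracks, n_consecutive=5, speed_eps=2.0):
    # tracks: {track_id: list of boxes over a shared time window}
    ids, out = list(tracks), set()
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            a, b = tracks[ids[i]], tracks[ids[j]]
            run = best = 0
            for box_a, box_b in zip(a, b):
                run = run + 1 if iou(box_a, box_b) > 0 else 0
                best = max(best, run)
            # Rule from the text: overlap in at least N consecutive
            # frames, or a speed difference below the threshold
            if (best >= n_consecutive
                    or abs(mean_speed(a) - mean_speed(b)) < speed_eps):
                out.add((ids[i], ids[j]))
    return out
```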
Determining a foreground image of an object in the synthetic artifact candidate set based on a preset background filtering algorithm, determining a current suture line of the grid deformation graph according to the foreground image, judging, according to the artifact candidate set, the foreground image, and the current suture line of the grid deformation graph, whether the current suture line needs to be updated, if so, eliminating the splice joint generated by the suture line through a splice joint fusion algorithm to obtain and output a splice image, and if not, outputting the grid deformation graph as the splice image.
For each frame, the foreground object is separated from the background using a preset background filtering algorithm, a differential-based method or a background modeling-based method. This will result in a binary mask in which foreground objects are marked as foreground regions (white) and background as background regions (black);
based on the foreground detection result, screening the objects in the synthetic artifact candidate set, and only reserving the objects overlapped with the foreground region in the foreground mask;
for the remaining synthetic artifact candidates, they are extracted from the current frame to generate the foreground image. This can be achieved by extracting the foreground region from the current frame and then placing it in a separate image;
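A sketch of the background filtering step, assuming OpenCV's MOG2 background subtractor as the background modeling method; the history, threshold, and morphology settings are illustrative.

```python
import cv2

back_sub = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                              detectShadows=True)

def foreground_mask(frame):
    # Binary mask: foreground regions white (255), background black (0);
    # MOG2 marks shadows as 127, which the threshold below suppresses
    mask = back_sub.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)
    # Remove small noise before extracting foreground objects
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return mask
```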
On the foreground images, the suture is detected using a suitable algorithm or technique. The stitching line refers to a line or boundary used for seamlessly splicing the foreground image and the background image, and the method for detecting the stitching line can be an algorithm based on the characteristics of brightness, color, gradient or texture of the image; stitching lines are typically located between the foreground and background for making the composite image appear coherent.
Using the foreground image and the detected stitch line, the quality of the composite image is assessed. Whether the synthesized image needs to update the suture line can be judged according to some quality measurement standards, and if the quality measurement is lower than a certain threshold value or meets a certain condition, the suture line needs to be dynamically updated is judged;
in this embodiment, the object in the mesh deformation map is detected by the target detection algorithm, and then the object is tracked by the target tracking algorithm. This process enables the system to understand objects moving in the video, providing accurate object trajectory information for subsequent processing;
from the object located in the overlap region and its trajectory of movement, a candidate set of synthetic artefacts is formed. This set contains artifact objects that need to be processed, providing an explicit target for subsequent operations;
And determining a foreground image of the object in the synthetic artifact candidate set through a preset background filtering algorithm. This step helps to separate the object from the background, providing accurate object shape information for subsequent suture processing;
based on the foreground image, a current stitch line is determined. The suture is judged by the artifact candidate set, and if necessary, the suture is updated. The dynamic suture updating ensures that the suture can be adjusted in real time along with the movement of the object in the process of the movement of the object, and the discontinuity of the splicing area is reduced.
In summary, in this embodiment, by detecting, tracking and artifact processing on the target, accurate stitching of the object and dynamic adjustment of the stitching line in video stitching are ensured, and the overall video stitching effect is improved.
In an alternative embodiment, the suture line is defined by minimizing the following cost over the overlapping area:

$C(e,f) = \theta_1 \, \xi_I(e,f) + \theta_2 \, \xi_{\nabla}(e,f)$

wherein $\xi_I(e,f)$ is the intensity difference of the foreground image and the background image at the same pixel position in the overlapping area, $\xi_{\nabla}(e,f)$ is the gradient difference of the foreground image and the background image at the same pixel position in the overlapping area, $\theta_1$ is the weight of the intensity difference, $\theta_2$ is the weight of the gradient difference, $e$ represents the foreground image, and $f$ represents the background image.
The function can control the sensitivity of the stitching line to color or brightness change by considering the intensity difference, and can control the sensitivity of the stitching line to texture change by considering the gradient difference, and the stitching line can be more accurately and smoothly determined by balancing the intensity difference and the gradient difference of the images in the overlapping area.
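A sketch of this cost, assuming OpenCV and NumPy: per-pixel intensity and gradient differences over the overlap, combined with the weights θ1 and θ2 (illustrative values); a seam-search step such as dynamic programming would then run over the resulting cost map.

```python
import cv2
import numpy as np

def seam_cost(e, f, theta1=0.5, theta2=0.5):
    # e: foreground image, f: background image (overlap region, same size)
    ge = cv2.cvtColor(e, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gf = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # xi_I(e, f): intensity difference at the same pixel position
    xi_i = np.abs(ge - gf)

    # xi_grad(e, f): gradient difference, via Sobel derivatives
    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        return cv2.magnitude(gx, gy)

    xi_g = np.abs(grad_mag(ge) - grad_mag(gf))
    return theta1 * xi_i + theta2 * xi_g  # low-cost pixels attract the seam
```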
Fig. 2 is a schematic structural diagram of a video stitching system based on grid transformation according to an embodiment of the present invention, as shown in fig. 2, where the system includes:
the first unit is used for acquiring an image to be matched, determining a first characteristic point corresponding to the image to be matched and a second characteristic point corresponding to the reference image through a characteristic point extraction algorithm according to the image to be matched and a preset reference image, carrying out characteristic point matching according to the first characteristic point and the second characteristic point and combining a preset characteristic point matching strategy, and determining a characteristic point matching result;
the second unit is used for determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm according to the characteristic point matching result, and determining a grid deformation graph corresponding to the image to be matched according to the polar coordinate value and combining a characteristic point constraint item in the image to be matched;
And the third unit is used for obtaining an artifact candidate set through a target tracking algorithm according to the grid deformation graph, judging whether the current suture line needs to be updated or not according to the artifact candidate set and combining the current suture line of the grid deformation graph, if so, eliminating the splice joint generated by the suture line through a splice joint fusion algorithm to obtain and output a splice image, and if not, outputting the grid deformation graph as the splice image.
The present invention may be a method, apparatus, system, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present invention.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. A method of video stitching based on a grid transformation, comprising:
acquiring an image to be matched, determining a first characteristic point corresponding to the image to be matched and a second characteristic point corresponding to a reference image through a characteristic point extraction algorithm according to the image to be matched and a preset reference image, performing characteristic point matching according to the first characteristic point and the second characteristic point and combining a preset characteristic point matching strategy, and determining a characteristic point matching result;
determining a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm according to the characteristic point matching result, and determining a grid deformation graph corresponding to the image to be matched according to the polar coordinate value and combining a characteristic point constraint item in the image to be matched;
according to the grid deformation graph, an artifact candidate set is obtained through a target tracking algorithm, according to the artifact candidate set, a current suture line of the grid deformation graph is combined, whether the current suture line needs to be updated or not is judged, if so, a splicing seam generated by the suture line is eliminated through a splicing seam fusion algorithm, a splicing image is obtained and output, and if not, the grid deformation graph is output as the splicing image.
2. The method according to claim 1, wherein determining, according to the image to be matched and a preset reference image, a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image by a feature point extraction algorithm, and performing feature point matching according to the first feature point and the second feature point in combination with a preset feature point matching policy, and determining a feature point matching result includes:
determining a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm;
acquiring the first characteristic point and the second characteristic point, generating two characteristic point sets according to the first characteristic point and the second characteristic point, and respectively marking the two characteristic point sets as a first reference set and a first set to be matched;
traversing all the reference feature points in the first reference set, determining first matching points corresponding to the reference feature points in the first set to be matched, and recording the first matching points as a first feature point set;
traversing all the feature points to be matched in the first set to be matched by adopting the same strategy, determining second matching points corresponding to the feature points to be matched in the first reference set, and marking the second matching points as a second feature point set;
And comparing all the matching point pairs in the first feature point set and the second feature point set; if a mapping relation of the same elements exists, the correctly matched feature point pairs are all retained, and feature point pairs that are not correctly matched are deleted, so as to obtain the feature point matching result.
3. The method according to claim 1, wherein determining, according to the feature point matching result, a polar coordinate value corresponding to the image to be matched through a preset grid deformation constraint algorithm, and determining, according to the polar coordinate value, a grid deformation graph corresponding to the image to be matched in combination with feature point constraint items in the image to be matched includes:
dividing the image to be matched into grid units according to the characteristic point matching result, determining coordinates of each grid in the image to be matched after the homography matrix corresponding to each grid is projected, traversing the coordinates of each grid after the homography matrix corresponding to each grid is projected, and determining an initial grid deformation map corresponding to the grid units;
based on alignment constraint items corresponding to all feature points in the initial grid deformation graph, combining a global similarity item and a local similarity item which are constructed in advance, and performing grid deformation operation on the image to be matched to obtain a first deformation graph;
Based on the first deformation map, introducing a geometric structure maintaining item, and determining a grid deformation map according to the geometric structure maintaining item and a preset grid deformation constraint algorithm;
wherein the mesh deformation constraint algorithm is constructed from a modified adaptive block image alignment algorithm.
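Adaptive block image alignment belongs to the APAP family of warps, where each grid cell carries its own homography and the cell vertices are pushed through it. A rough sketch of that projection step follows; the callable H_per_cell supplying each cell's homography is a hypothetical placeholder, and how those homographies are estimated is outside this sketch:

```python
import numpy as np
import cv2

def project_grid(width, height, cells_x, cells_y, H_per_cell):
    """H_per_cell(row, col) -> 3x3 homography for that vertex's cell
    (hypothetical callable; the claims only say each grid has one)."""
    xs = np.linspace(0, width, cells_x + 1)
    ys = np.linspace(0, height, cells_y + 1)
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).astype(np.float32)  # (rows, cols, 2)
    warped = np.empty_like(grid)
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            H = np.asarray(H_per_cell(r, c), np.float64)
            pt = grid[r, c].reshape(1, 1, 2)
            warped[r, c] = cv2.perspectiveTransform(pt, H)[0, 0]
    return grid, warped   # initial mesh and its projected counterpart
```

Passing a single global homography as `lambda r, c: H` is a convenient way to sanity-check the plumbing before plugging in per-cell estimates.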
4. The method according to claim 3, wherein performing the grid deformation operation on the image to be matched based on the alignment constraint terms corresponding to all feature points in the initial grid deformation graph, in combination with the pre-constructed global similarity term and local similarity term, to obtain the first deformation graph comprises:
initializing a local affine transformation matrix on the nodes of each grid cell;
calculating local affine transformation parameters from the local affine transformation matrix in combination with the feature point matching result; and
deforming the grid cells of the image to be matched according to the local affine transformation parameters to obtain the first deformation graph.
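A minimal sketch of the per-cell affine step, assuming OpenCV's estimateAffine2D as the parameter estimator (the claims do not prescribe one) and an identity fallback for cells with too few matches:

```python
import numpy as np
import cv2

def local_affine(src_pts, dst_pts):
    """Affine for one grid cell, fitted from the matches inside it."""
    if len(src_pts) < 3:                        # an affine needs >= 3 point pairs;
        return np.eye(2, 3)                     # identity fallback (assumption)
    A, _inliers = cv2.estimateAffine2D(
        np.asarray(src_pts, np.float32), np.asarray(dst_pts, np.float32))
    return A if A is not None else np.eye(2, 3)

def deform_vertices(vertices, A):
    """Apply x' = A[:, :2] @ x + A[:, 2] to an (N, 2) vertex array."""
    return np.asarray(vertices) @ A[:, :2].T + A[:, 2]
```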
5. The method according to claim 3, wherein the expression of the mesh deformation energy after introducing the geometric structure preservation term is:

E(V) = \lambda_a E_a(V) + \lambda_{ls} E_{ls}(V) + \lambda_{gs} E_{gs}(V) + \lambda_g E_g(V)

wherein V represents the set of mesh vertices of all images; \lambda_a represents the weight of the alignment constraint term and E_a(V) the alignment constraint term; \lambda_{ls} represents the weight of the local similarity term and E_{ls}(V) the local similarity term; \lambda_{gs} represents the weight of the geometric structure preservation term and E_{gs}(V) the geometric structure preservation term; and \lambda_g represents the weight of the global hold term and E_g(V) the global hold term;
the expression of the global hold term is:

E_g(V) = \sum_{i=1}^{n} \left\| \big( L\,(V_{deformed} - V_0) \big)_i \right\|^2

wherein n represents the number of vertices, L represents the Laplacian matrix of the mesh, V_{deformed} represents the vertex positions after deformation, V_0 represents the initial vertex positions of the grid, and (V_{deformed} - V_0)_i represents the difference between the deformed coordinates and the initial coordinates of the i-th vertex.
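Energies of this quadratic form are minimized by solving a sparse linear least-squares system. The sketch below keeps only the global hold term plus soft alignment targets and builds a uniform graph Laplacian; both choices are assumptions for illustration, since the claims specify neither how L is constructed nor which solver is used:

```python
import numpy as np
from scipy.sparse import lil_matrix, vstack
from scipy.sparse.linalg import lsqr

def solve_mesh(V0, edges, targets, lam_g=1.0, lam_a=1.0):
    """V0: (n, 2) initial vertices; edges: iterable of (i, j) index pairs;
    targets: {vertex index: desired 2D position} from the alignment term."""
    n = V0.shape[0]
    L = lil_matrix((n, n))                       # uniform graph Laplacian (assumption)
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    L = L.tocsr()
    S = lil_matrix((len(targets), n))            # selector for constrained vertices
    t = np.zeros((len(targets), 2))
    for row, (idx, pos) in enumerate(targets.items()):
        S[row, idx] = 1.0
        t[row] = pos
    # Stacking sqrt-weighted blocks makes ||A V - b||^2 reproduce
    # lam_g * ||L (V - V0)||^2 + lam_a * ||S V - t||^2.
    A = vstack([np.sqrt(lam_g) * L, np.sqrt(lam_a) * S.tocsr()])
    b = np.vstack([np.sqrt(lam_g) * (L @ V0), np.sqrt(lam_a) * t])
    # Solve the x and y coordinates independently.
    V = np.column_stack([lsqr(A, b[:, k])[0] for k in range(2)])
    return V
```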
6. The method according to claim 1, wherein obtaining the artifact candidate set through the target tracking algorithm according to the grid deformation graph, and judging, according to the artifact candidate set in combination with the current suture line of the grid deformation graph obtained through a background filtering algorithm, whether the suture line needs to be updated comprises:
detecting all objects in the grid deformation graph through a target detection algorithm; tracking all of these objects through the target tracking algorithm to determine a movement track of each object; and screening, according to the movement tracks, the objects located in the overlapping area to determine a synthesis artifact candidate set; and
determining foreground images of the objects in the synthesis artifact candidate set based on a preset background filtering algorithm; determining the current suture line of the grid deformation graph according to the foreground images; judging, according to the foreground images and the current suture line of the grid deformation graph, whether the current suture line needs to be updated; if so, eliminating the stitching seam generated by the suture line through the seam fusion algorithm to obtain and output the stitched image; and if not, outputting the grid deformation graph as the stitched image.
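A minimal sketch of the update test, assuming OpenCV's MOG2 subtractor as the background filtering algorithm (the claims do not name one) and axis-aligned boxes for the tracked artifact candidates:

```python
import cv2
import numpy as np

# MOG2 stands in for the unspecified "background filtering algorithm";
# keeping it module-level lets its background model accumulate over frames.
subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=False)

def suture_needs_update(overlap_frame, suture_mask, candidate_boxes):
    """candidate_boxes: (x, y, w, h) boxes of tracked objects in the
    overlap area, i.e. the artifact candidate set of claim 6."""
    foreground = subtractor.apply(overlap_frame) > 0
    suture = suture_mask > 0
    for x, y, w, h in candidate_boxes:
        if np.any(foreground[y:y + h, x:x + w] & suture[y:y + h, x:x + w]):
            return True        # a moving object crosses the current suture line
    return False               # keep the suture line; output the frame as-is
```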
7. The method according to claim 6, wherein the suture line is determined in particular by:

E(e, f) = \theta_1 \xi_I(e, f) + \theta_2 \xi_{\nabla}(e, f)

wherein \xi_I(e, f) is the intensity difference between the foreground image and the background image at the same pixel position in the overlapping area; \xi_{\nabla}(e, f) is the gradient difference between the foreground image and the background image at the same pixel position in the overlapping area; \theta_1 is the weight of the intensity difference; \theta_2 is the weight of the gradient difference; e represents the foreground image; and f represents the background image.
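A hedged sketch of evaluating this per-pixel cost over the overlap area; the Sobel gradients and the equal weights \theta_1 = \theta_2 = 0.5 are assumptions, as the claim fixes only the form of the expression:

```python
import cv2
import numpy as np

def suture_cost(e, f, theta1=0.5, theta2=0.5):
    """Per-pixel cost E(e, f) = theta1*xi_I + theta2*xi_grad; e is the
    foreground image, f the background image (grayscale, same size)."""
    e32, f32 = np.float32(e), np.float32(f)
    xi_I = np.abs(e32 - f32)                         # intensity difference
    def grad_mag(img):
        gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)
        return cv2.magnitude(gx, gy)
    xi_grad = np.abs(grad_mag(e32) - grad_mag(f32))  # gradient difference
    return theta1 * xi_I + theta2 * xi_grad
```

A minimum-cost path through the returned map (dynamic programming or a graph cut both work) then yields the suture line; the claim constrains the cost, not the path search.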
8. A grid transformation-based video stitching system, comprising:
a first unit, configured to acquire an image to be matched; determine, according to the image to be matched and a preset reference image, a first feature point corresponding to the image to be matched and a second feature point corresponding to the reference image through a feature point extraction algorithm; and perform feature point matching according to the first feature point and the second feature point in combination with a preset feature point matching strategy to determine a feature point matching result;
a second unit, configured to determine, according to the feature point matching result, polar coordinate values corresponding to the image to be matched through a preset grid deformation constraint algorithm, and determine, according to the polar coordinate values in combination with the feature point constraint terms in the image to be matched, a grid deformation graph corresponding to the image to be matched; and
a third unit, configured to obtain an artifact candidate set through a target tracking algorithm according to the grid deformation graph; judge, according to the artifact candidate set in combination with the current suture line of the grid deformation graph, whether the current suture line needs to be updated; if so, eliminate the stitching seam generated by the suture line through a seam fusion algorithm to obtain and output a stitched image; and if not, output the grid deformation graph as the stitched image.
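Read as software architecture, claim 8 is a three-stage pipeline. The following structural sketch names the stages only; every method body is a hypothetical placeholder for the algorithms of claims 2 to 7, not the patented implementation:

```python
class GridStitcher:
    """Structural sketch of the three units of claim 8."""

    def first_unit(self, frame, reference):
        """Feature extraction and bidirectional matching (claim 2)."""
        raise NotImplementedError

    def second_unit(self, frame, match_result):
        """Mesh deformation under the constraint terms (claims 3-5)."""
        raise NotImplementedError

    def third_unit(self, warped_frame):
        """Artifact tracking, suture update and seam fusion (claims 6-7)."""
        raise NotImplementedError

    def stitch(self, frame, reference):
        match_result = self.first_unit(frame, reference)
        warped = self.second_unit(frame, match_result)
        return self.third_unit(warped)
```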
9. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 1 to 7.
CN202311482383.6A 2023-11-09 2023-11-09 Video stitching method and system based on grid transformation Active CN117221466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311482383.6A CN117221466B (en) 2023-11-09 2023-11-09 Video stitching method and system based on grid transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311482383.6A CN117221466B (en) 2023-11-09 2023-11-09 Video stitching method and system based on grid transformation

Publications (2)

Publication Number Publication Date
CN117221466A true CN117221466A (en) 2023-12-12
CN117221466B CN117221466B (en) 2024-01-23

Family

ID=89048510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311482383.6A Active CN117221466B (en) 2023-11-09 2023-11-09 Video stitching method and system based on grid transformation

Country Status (1)

Country Link
CN (1) CN117221466B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622732A (en) * 2012-03-14 2012-08-01 上海大学 Front-scan sonar image splicing method
CN110211043A (en) * 2019-05-11 2019-09-06 复旦大学 A kind of method for registering based on grid optimization for Panorama Mosaic
WO2019214568A1 (en) * 2018-05-07 2019-11-14 清华大学深圳研究生院 Depth-based light field splicing method
CN111784576A (en) * 2020-06-11 2020-10-16 长安大学 Image splicing method based on improved ORB feature algorithm
CN113160048A (en) * 2021-02-02 2021-07-23 重庆高新区飞马创新研究院 Suture line guided image splicing method

Also Published As

Publication number Publication date
CN117221466B (en) 2024-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant