
CN113724290B - Multi-level template self-adaptive matching target tracking method for infrared image - Google Patents

Multi-level template self-adaptive matching target tracking method for infrared image

Info

Publication number
CN113724290B
CN113724290B (application CN202110830766.2A)
Authority
CN
China
Prior art keywords
image
target
error
template
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110830766.2A
Other languages
Chinese (zh)
Other versions
CN113724290A (en)
Inventor
吕梅柏
刘晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110830766.2A priority Critical patent/CN113724290B/en
Publication of CN113724290A publication Critical patent/CN113724290A/en
Application granted granted Critical
Publication of CN113724290B publication Critical patent/CN113724290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-level template self-adaptive matching target tracking method for infrared images, which belongs to the field of image processing and comprises the following steps: acquiring an infrared image, and performing image enhancement preprocessing on the infrared image to obtain the first layer of an image feature pyramid; manually marking a target position on the infrared image, namely giving the target frame of the first frame; extracting images of different layers within the target frame on the infrared image by means of a feature pyramid algorithm; performing a target search on each layer with an SSDA template matching algorithm, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information; judging the confidence through an error function: if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area. The method provided by the invention preserves the effective features of the target under scale change, morphological change and occlusion, and maintains good tracking.

Description

Multi-level template self-adaptive matching target tracking method for infrared image
Technical Field
The invention belongs to the field of image processing, and particularly relates to a multi-level template self-adaptive matching target tracking method for infrared images.
Background
With the development of intelligent technology, images have become an important means of acquiring information. Image processing is a very important link in image acquisition, and image processing technology has been widely used in fields such as industry, security, and medical management. Within image processing, the detection and tracking of moving targets is a key focus, generally realized with an image processing tracking algorithm.
Traditional image processing tracking methods currently adopt template matching algorithms. Conventional template matching algorithms tend to lose the target when faced with scale changes, morphological changes, and occlusion. The existing template matching tracking algorithms have the following three defects: the target cannot be continuously tracked when its morphology changes; the target cannot be continuously tracked when its scale changes; the target cannot be continuously tracked when it is occluded.
Therefore, the invention provides a multi-level template self-adaptive matching target tracking method for infrared images.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a multi-level template self-adaptive matching target tracking method for infrared images.
In order to achieve the above object, the present invention provides the following technical solutions:
a multi-level template self-adaptive matching target tracking method for infrared images comprises the following steps:
acquiring an infrared image, and performing image enhancement preprocessing on the infrared image to obtain the first layer of an image feature pyramid;
manually marking a target position on the infrared image, namely giving the target frame of the first frame;
extracting images of different layers in a target frame on the infrared image by utilizing a feature pyramid algorithm:
aiming at the images of the different layers obtained through the feature pyramid, performing a target search on each layer with an improved SSDA template matching algorithm, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information;
judging the confidence through an error function, and judging through the confidence whether the target has undergone a scale change, a morphological change or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template; if occlusion occurs, enlarging the search area.
Preferably, the infrared image is a single-channel 8-bit image whose pixel value is x_in. The image enhancement preprocessing includes: traversing the image, enhancing the gray value of each pixel, and outputting a pixel value x_out,
where k is the enhancement ratio.
Preferably, extracting images of different levels from the infrared image by means of the feature pyramid algorithm means convolving the target with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, to obtain blurred images of different degrees; different target features are reflected on different feature images, and image information of different dimensions is detected from images of different dimensions.
Preferably, assuming that S(x, y) is an mxn search graph and T(x, y) is an MxN template graph, the search is completed by sliding over the graph to be searched; the target search process specifically includes:
error definition:

e(i, j) = | S_ij(x, y) − mean(S_ij) − T(x, y) + mean(T) |

wherein S_ij is the subgraph of the search graph whose upper-left corner starts at (i, j), T is the template graph, mean(T) is the average value of the template graph, and the average of the subgraph is

mean(S_ij) = (1/(M·N)) · Σ_{x=1..M} Σ_{y=1..N} S_ij(x, y);
setting an initial threshold Th0: during random-point matching in the SSDA algorithm, once the accumulated error exceeds the threshold, the point is considered unlikely to be the target and is directly discarded; the threshold is set to an empirical value;
randomly selecting non-repeating pixel points in the area to be searched as the center of the tracking frame; calculating the current error and accumulating it, and recording the current accumulation count H when the accumulated error exceeds the threshold Th0; then traversing all subgraphs;
in the traversal process, if the error value is larger than the threshold Th0 under the frequency smaller than H, the operation of selecting the random point to calculate the error is not continued, and the next sub-graph is directly switched;
if a subgraph exists in the traversal process, after H times are calculated, the accumulated error is Th1; if Th1< Th0, then Th0 is updated to Th1;
recording the matching count H and the accumulated error sum for every subgraph match during the traversal, and calculating the average error; after traversing all the subgraphs, outputting the average error rate and the center coordinates of the subgraph with the smallest error.
Preferably, the specific process of discriminating the confidence coefficient through the error function is as follows:
when a morphological change occurs in the target:
the minimum matching errors of the several layers obtained by the improved SSDA template matching algorithm are E_0, E_1, E_2, E_3 and E_4 respectively; each error is normalized and mapped to the 0-1 interval so as to conform to positive logic, giving the matching correct probability:
wherein TH_i is the threshold each time an error is obtained;
the confidence of each layer at time t is taken as the average matching probability over the last 5 frames of pictures:
Let the confidence of layer i obtained at time t be P_t^i, and set the error function L_t as a function of these five parameters:
when the target morphology changes, the error function L_t will increase rapidly; when the error exceeds the threshold, the template is updated, wherein the error judging method is:
ΔL_t = |L_t − L_(t−1)|
If ΔL_t ≥ 10%, the updated template is the image at the position of the previous frame's template; namely, the returned frame information (x, y, h, w) obtained from the last template matching is used to select the current-frame template for matching the next frame;
when the target undergoes a dimensional change:
when the error discrimination function judges that the target is undergoing a scale change, three frames are generated with the normally tracked target of the previous frame as reference, sized 75%, 120% and 150% of the target respectively; each is scaled back to the original size and compared with the template of the previous frame, and the transformation result with the highest matching degree is taken as the new template.
The multi-level template self-adaptive matching target tracking method for the infrared image has the following beneficial effects:
the invention uses image feature pyramid algorithm to carry out different convolution fuzzy operation on the target, obtains feature images with different scales of the image, carries out matching algorithm on the feature images, not only can quicken the searching speed and improve the frame rate, but also can respectively calculate the value of the matching function of each layer of the feature pyramid when the scale of the target changes, select the best matching feature image and carry out template switching. The method ensures that the effective characteristics of the target can be still saved when the target is subjected to scale change. The method provided by the invention can keep good tracking when the size, the shape and the shielding of the target are changed.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some of the embodiments of the present invention and other drawings may be made by those skilled in the art without the exercise of inventive faculty.
FIG. 1 is a flow chart of a multi-level template adaptive matching target tracking method for infrared images according to embodiment 1 of the present invention;
FIG. 2 is a specific implementation process of the multi-level template adaptive matching target tracking method for infrared images according to embodiment 1 of the present invention;
FIG. 3 is an infrared image enhancement contrast map;
FIG. 4 is a diagram illustrating the extraction of different levels of image structure using feature pyramids;
FIG. 5 is a graph of the effect of processing an infrared image directly using the Roberts convolution kernel;
FIG. 6 is a graph of the effect of processing an infrared image after superimposing the results of two convolution kernels;
FIG. 7 is an effect diagram of the superimposed image with small noise removed;
FIG. 8 is an effect graph of a 3x3 convolution of an enhanced image;
FIG. 9 is a graph of the effect of processing an enhanced image using a modified Laplace operator;
FIG. 10 is an effect diagram of expanding receptive field area while reducing weight parameters using a multiple convolution kernel superposition approach;
FIG. 11 is a graph showing the effect of convolving layers 3 and 4 with 5x5 and 7x7 convolutions, respectively;
FIG. 12(a) to FIG. 12(h) show the changes on each image when the scale of the target changes;
FIG. 13 is a schematic diagram of a search;
FIG. 14 is a flow chart of random matching;
FIG. 15 is an image when a change in scale of the target occurs;
FIG. 16 is a diagram of an adaptive search area variation;
fig. 17 is a schematic diagram of a search area.
Detailed Description
The present invention will be described in detail below with reference to the drawings and the embodiments, so that those skilled in the art can better understand the technical scheme of the present invention and can implement the same. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
The invention provides a multi-level template self-adaptive matching target tracking method for infrared images, which is particularly shown in fig. 1 and 2 and comprises the following steps:
s1, acquiring an infrared image, and performing image enhancement pretreatment on the infrared image to obtain the first layer of the image characteristic golden character tower.
The infrared image reflects the temperature distribution of the scene, and the tracking target is an object whose temperature differs markedly from the surrounding environment. To make the target more visible, image enhancement is performed on the infrared image to facilitate the subsequent tracking algorithm.
The infrared image in this embodiment is a single-channel 8-bit image whose pixel value is x_in. The enhancement preprocessing of the infrared image comprises: traversing the image, enhancing the gray value of each pixel, and outputting a pixel value x_out,
where k is the enhancement ratio, empirically chosen to be 1.3.
The enhanced image effect is shown in fig. 3, and it can be seen that the object is more clearly distinguished from the background.
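As a concrete illustration, the per-pixel enhancement can be sketched as follows. The exact formula appears only as a figure in the original, so this sketch assumes a simple linear gain x_out = clip(k · x_in) with k = 1.3, consistent with the description of enhancing each gray value by a ratio k:

```python
import numpy as np

def enhance(image: np.ndarray, k: float = 1.3) -> np.ndarray:
    """Gray-value enhancement of a single-channel 8-bit infrared image.

    Assumed form: x_out = clip(round(k * x_in), 0, 255); the patent's
    exact formula is not reproduced in the text, so this is a sketch.
    """
    out = np.rint(image.astype(np.float32) * k)
    return np.clip(out, 0, 255).astype(np.uint8)

frame = np.array([[100, 200], [50, 250]], dtype=np.uint8)
print(enhance(frame))  # hot (bright) pixels saturate at 255
```

With k = 1.3 a pixel of 100 becomes 130, while 200 and 250 saturate at 255, which widens the gap between a warm target and its cooler background.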
S2, manually marking a target position on the infrared image, namely giving the target frame (ground truth) of the first frame.
S3, extracting images of different layers from the infrared image by means of the feature pyramid algorithm.
The feature pyramid algorithm extracts images of different levels from the infrared image: the target is convolved with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, yielding blurred images of different degrees. Different target features are reflected on different feature images, and image information of different dimensions is detected on images of different dimensions; the image structure is shown in fig. 4.
The resolution of the infrared camera used in the present invention is 640x512.
1. Layer 0-enhanced image
The enhanced image is taken as layer 0. The size remains 640x512 from the original image.
2. Layer 1-2 x2 convolution image
The 2x2 convolution image uses a modified version of the Roberts convolution kernel with stride 1; padding is used so the output image size remains consistent with the original, 640x512. The Roberts kernel is a first-order difference operator with a small amount of calculation and sensitivity to details.
The Roberts operator comprises the following two convolution kernels (the standard Roberts cross):

[ 1  0 ]    [ 0  1 ]
[ 0 −1 ]    [ −1 0 ]
because the infrared image is used in the invention, after the infrared image is directly used, the overall brightness of the obtained image is reduced, and the effect is not ideal, as shown in fig. 5.
The kernel is therefore changed to a modified structure,
and the results of the two convolution kernels are superimposed, the effect of which is shown in fig. 6:
the final superimposed image is used as an image on operation, small noise points are removed, the effect is shown in fig. 7, the image quality is clear, and the main contour features of the target are obvious.
3. Layer 2-3 x3 convolution image
A 3x3 convolution is performed on the enhanced image, where the convolution kernel adopts the Laplacian operator, namely:
the result of such convolution is that the overall brightness is low, as shown in fig. 8:
thus using the modified laplace operator:
the brightness of the improved image is significantly enhanced, and the result is shown in fig. 9.
It can be seen that the image quality details after the 3x3 convolution are more rich.
Since the 3x3 convolution uses a stride of 2, the output size follows from the convolution image size formula:

OH = (H + 2P − FH)/S + 1,  OW = (W + 2P − FW)/S + 1

where the size of the input image is H x W, the convolution kernel size is FH x FW, the stride is S, and the padding is P.
Thus, by 3x3 convolution and clipping out surrounding pixels, the resulting image size is 320 x 256.
When the image size is smaller, the target search during later template matching is faster, so the speed is improved.
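The size bookkeeping of this layer can be checked with the standard output-size formula. Assuming a padding of 1 (one consistent reading of the "clipping out surrounding pixels" step), a 640x512 input with a 3x3 kernel at stride 2 lands exactly on the 320x256 stated above:

```python
def conv_output_size(h: int, w: int, fh: int, fw: int,
                     stride: int, pad: int) -> tuple:
    """OH = (H + 2P - FH) / S + 1, OW = (W + 2P - FW) / S + 1."""
    oh = (h + 2 * pad - fh) // stride + 1
    ow = (w + 2 * pad - fw) // stride + 1
    return oh, ow

# 512x640 input (H x W), 3x3 kernel, stride 2, padding 1
print(conv_output_size(512, 640, 3, 3, 2, 1))  # (256, 320)
```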
4. Layer 3-5 x5 convolution image
5. Layer 4-7 x7 convolution image
The 3rd and 4th layers are convolved with 5x5 and 7x7 kernels respectively. However, conventional 5x5 and 7x7 convolutions complicate the computation of padding and the programming of stride, and carry more parameters; designing a reasonable kernel directly would waste a great deal of time. The method therefore borrows the convolution-kernel stacking idea from deep learning and obtains the same receptive field with several 3x3 kernels, as shown schematically in fig. 10.
Wherein the 5x5 convolved image is cropped to a size of 160x128 and the 7x7 convolved image is cropped to a size of 80x64. The convolved image is shown in fig. 11 (enlarged to the same view for ease of viewing, not to full size).
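The receptive-field equivalence that motivates the stacking (fig. 10) is easy to verify: n stacked k x k, stride-1 convolutions see an n·(k−1)+1 window, so two 3x3 kernels give a 5x5 field and three give a 7x7 field:

```python
def stacked_receptive_field(n_layers: int, k: int = 3) -> int:
    """Receptive field of n stacked k x k, stride-1 convolutions:
    RF = n * (k - 1) + 1."""
    return n_layers * (k - 1) + 1

print(stacked_receptive_field(2))  # 5 -> equivalent to one 5x5 kernel
print(stacked_receptive_field(3))  # 7 -> equivalent to one 7x7 kernel
```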
In this way, when the target moves from far to near and fine detail need not be traced, the approximate position can still be judged; a later error-function discrimination method comprehensively exploits the characteristics and advantages of each image, and the matching method is designed accordingly.
The situation on the individual images when the scale change of the object occurs is shown in fig. 12.
S4, running the improved SSDA template matching algorithm on the images of the different layers obtained through the feature pyramid in S3, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information.
Let S(x, y) be an mxn search graph and T(x, y) an MxN template graph, and let S_ij be a subgraph of the search graph whose upper-left corner starts at (i, j). The search is completed by sliding over the graph to be searched, as shown in fig. 13. The target search process specifically includes:
S4.1, error definition:

e(i, j) = | S_ij(x, y) − mean(S_ij) − T(x, y) + mean(T) |

wherein S_ij is the subgraph of the search graph whose upper-left corner starts at (i, j), T is the template graph, mean(T) is the average value of the template graph, and the average of the subgraph is

mean(S_ij) = (1/(M·N)) · Σ_{x=1..M} Σ_{y=1..N} S_ij(x, y);
S4.2, setting an initial threshold Th0: during random-point matching in the SSDA algorithm, once the accumulated error exceeds the threshold, the point is considered unlikely to be the target and is directly discarded. The threshold is set to an empirical value, typically 30-40.
S4.3 random matching method
Randomly select non-repeating pixel points in the area to be searched as the center of the tracking frame. Calculate the current error and accumulate it; when the accumulated error exceeds the threshold Th0, record the current accumulation count H. Then traverse all subgraphs.
In the traversal process, if the error value is larger than the threshold Th0 under the frequency smaller than H, the operation of selecting the random point to calculate the error is not continued, and the next sub-graph is directly switched.
If there is a sub-graph during traversal, the accumulated error is Th1 after H times are calculated. If Th1< Th0, then Th0 is updated to Th1.
To guard against false detections caused by chance during the random calculation, a lower limit should be set for the random number H, generally not less than 40% of the number of subgraph pixels.
The matching count H and the accumulated error sum are recorded for every subgraph match during the traversal, and the average error sum Ea is calculated. After traversing all subgraphs, the average error rate and the center coordinates of the subgraph with the smallest error are output; the random matching flow is shown in fig. 14.
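The random-matching loop of steps S4.1-S4.3 can be sketched as below; the traversal order, tie-breaking, and threshold bookkeeping are assumptions where the text leaves them open:

```python
import numpy as np

def ssda_match(search: np.ndarray, tmpl: np.ndarray, th0: float = 35.0,
               rng=None):
    """Sketch of the improved SSDA loop: for each candidate subgraph,
    accumulate the zero-mean absolute error over randomly ordered pixels
    and abandon the subgraph once the running sum exceeds the current
    threshold. The subgraph surviving the most comparisons wins, and the
    threshold is tightened whenever a subgraph finishes under it."""
    rng = rng if rng is not None else np.random.default_rng(0)
    M, N = tmpl.shape
    t_zero = tmpl.astype(np.float64) - tmpl.mean()
    coords = [(y, x) for y in range(M) for x in range(N)]
    th, best_h, best_pos = th0, -1, (0, 0)
    for i in range(search.shape[0] - M + 1):
        for j in range(search.shape[1] - N + 1):
            sub = search[i:i + M, j:j + N].astype(np.float64)
            s_zero = sub - sub.mean()
            rng.shuffle(coords)
            acc, h = 0.0, 0
            for (y, x) in coords:
                acc += abs(s_zero[y, x] - t_zero[y, x])
                if acc > th:
                    break           # unlikely to be the target: abandon
                h += 1              # comparisons survived under threshold
            if h > best_h:
                best_h, best_pos = h, (i, j)
            if h == M * N and acc < th:
                th = acc            # tighten the threshold (Th1 < Th0)
    return best_pos, best_h
```

The subgraph that survives the most random comparisons before its accumulated error exceeds the threshold is taken as the match; tightening the threshold whenever a subgraph finishes all pixels under it is what lets later poor candidates be abandoned after only a few comparisons.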
In fig. 2, the line labeled a in panel (a) and the line labeled a in panel (b) are the same line, as with the line labeled b in panel (d); the lines labeled c and d in panel (b) are the same lines as c and d in panel (c); and the lines labeled e, f and g in panel (c) are the same lines as e, f and g in panel (d).
In fig. 14, the lines labeled a, b, c and d in panel (a) are the same lines as a, b, c and d in panel (b).
S5, judging, through an error function, the confidence obtained by the improved SSDA algorithm on each layer in S4, and judging through the confidence whether the target has undergone a scale change, a morphological change or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area. Specifically:
the object is information GT (x, y, h, w) according to the object given in the first frame of the original image, which is the upper left corner coordinates (x, y) of the object frame, the height h and the width w of the object frame, respectively. And obtaining the target frame information of the first frame on other feature graphs according to the image size proportion of the convolution operation. The corresponding target frame information of the first frame at the image enhancement layer, 2x2, 3x3, 5x5 and 7x7 layers is GT0, GT1, GT2, GT3, GT4, respectively. The target match results M (x, y, P) at each layer can be derived by performing a separate modified version of the SSDA matching algorithm on each layer, where (x, y) is the center coordinates of the box of the target match results and P is the confidence level.
Since the enhancement layer reflects the original information of the image, the 2x2 convolution layer reflects the contour information of the target, the 3x3 convolution layer reflects the detail texture information of the target, and the 5x5 and 7x7 convolution layers reflect the approximate position information of the target, the following error function discrimination template updating method is designed by integrating the image features:
1. when the target changes morphology
The minimum matching errors of the several layers obtained by the improved SSDA template matching algorithm are E_0, E_1, E_2, E_3 and E_4. To facilitate subsequent calculation, they are normalized and mapped to the 0-1 interval so as to conform to positive logic, giving the matching correct probability:
wherein TH_i is the threshold each time an error is obtained.
the confidence of each layer at time t is taken as the average matching probability over the last 5 frames of pictures:
The matching term with the smallest error carries the highest weight, 80% according to the empirical formula, while each of the remaining layers carries 5%. Let the confidence of layer i obtained at time t be P_t^i, and set the error function L_t as the resulting function of these five parameters:

L_t = 0.8 · P_t^(i*) + 0.05 · Σ_{i ≠ i*} P_t^i

where i* is the layer with the smallest matching error.
When the target morphology changes, the error function L_t will increase rapidly. When the judged error exceeds the threshold, the template is updated. The error judging method is:
ΔL_t = |L_t − L_(t−1)|
If ΔL_t ≥ 10%, the template is updated to the image at the position of the previous frame's template; namely, the returned frame information (x, y, h, w) obtained from the last template matching is used to select the current-frame template for matching the next frame.
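The morphology-update rule can be sketched as a single comparison; since L_t is built from probabilities in [0, 1], the 10% criterion is read here as an absolute jump of 0.1 (an assumption):

```python
def should_update_template(l_prev: float, l_curr: float,
                           threshold: float = 0.10) -> bool:
    """Update the template when Delta L_t = |L_t - L_(t-1)| >= 10%.
    L_t is built from matching probabilities in [0, 1], so the 10%
    criterion is read as an absolute jump of 0.1 (an assumption)."""
    return abs(l_curr - l_prev) >= threshold

print(should_update_template(0.90, 0.70))  # sudden drop -> update
print(should_update_template(0.90, 0.88))  # small drift -> keep template
```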
2. When the target is scaled
Considering that the target is likely to change in scale, a scale-change evaluation is performed each time the template is updated; if the error stays within 10%, a scale-change evaluation is additionally performed on the enhanced image every 10 frames, and if a scale change exists, the changed template size is assigned proportionally to the other layers. The idea of evaluating the scale change is as follows:
and evaluating the scale change at the time t. Because the error of the t-1 moment is within 10%, the tracking frame of the t-1 moment truly reflects the target position, so that the tracking frame of the previous frame can be used as a group trunk. And on the image at the time t, performing multi-scale selection by taking the group trunk information of the previous frame as a basis. The selection method comprises the following steps:
an additional 3 reference frames referenced to the group trunk are selected at time t-1, 2 larger reference frames and 1 smaller reference frame, respectively. The length and width are 150%, 120% and 75% of groundtrunk, respectively. As shown in fig. 15, the first layer frame, the second layer frame, and the innermost layer frame are respectively from the outside to the inside, wherein the third layer frame is a group trunk frame.
When the target approaches the infrared camera, it can be seen that the outermost box fits the target better.
The specific implementation method is as follows:
assume that at time t it is determined that a template scale update is required.
The tracking frame at time t−1 is selected as the ground truth, and three tracking frames — tracking box1, tracking box2 and tracking box3 — are selected in the image at time t with length and width at 150%, 120% and 75% of the ground truth respectively. As shown in fig. 15, from the outside in these are the first-layer frame, the second-layer frame and the innermost frame, where the third-layer frame is the ground-truth frame.
According to the ground-truth coordinate position, the images framed by tracking box1, tracking box2 and tracking box3 are selected at the same center position at time t, and their length and width are scaled to 66%, 83% and 133% respectively, obtaining images 1, 2 and 3.
The improved SSDA matching algorithm is then run comparing the three images with the ground truth. The original size of the image with the smallest error is transmitted to all layers as the new template size parameter, and the template size is updated.
The step of size updating is performed only on the enhanced image layer (layer 0), the 3x3 convolution layer (layer 2), and the 5x5 convolution layer (layer 3), taking into account the difference in image characteristics of the respective layers.
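The three-candidate scale evaluation can be sketched as follows; the box format (x, y, h, w) with (x, y) the upper-left corner follows the patent, while the helper names are illustrative:

```python
def scale_candidates(box, scales=(1.5, 1.2, 0.75)):
    """Generate tracking box1/box2/box3 at 150%, 120% and 75% of the
    previous frame's ground-truth box, sharing the same center.
    Box format: (x, y, h, w), (x, y) being the upper-left corner."""
    x, y, h, w = box
    cx, cy = x + w / 2.0, y + h / 2.0
    return [(cx - w * s / 2.0, cy - h * s / 2.0, h * s, w * s)
            for s in scales]

boxes = scale_candidates((10, 10, 40, 40))
print(boxes[0])  # 150% box: (0.0, 0.0, 60.0, 60.0)
```

Each crop is then resized by 1/s (about 66%, 83% and 133%, matching the figures in the text) back to the ground-truth size before the improved SSDA comparison, and the best-matching candidate's original size becomes the new template size.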
3. Adaptive search area variation
The template matching search mode of this method is an 8-neighborhood search: taking the center of the ground truth as the center, a search with a step of one pixel is performed over a region 3 times the length and 3 times the width of the ground truth, and no padding is applied during the sliding-window process.
Each time, the center position P1(x, y) of the highest-confidence matched frame is compared with the center position P0(x, y) of the previous frame to obtain the direction in which the target moves; the result is classified into 8 categories (up, down, left, right, up-left, up-right, down-left, down-right), and the classification results of the latest 10 frames are stored, as shown in fig. 16.
However, the target is sometimes occluded or lost due to environmental factors, which manifests as a sudden large drop in the matching confidence of a frame.
The strategy of this method is to switch to a larger-scope search while stopping template updates.
Assume the tracking confidence of the target at time t drops below 20% (an empirical value). At this point the adaptive search-area change algorithm is started. The search region is enlarged to 25 times the original area, i.e. the length and width are each 5 times those of the previous frame's ground truth.
At the same time, the possible position of the target is predicted from the directions in which the target moved over the last 10 frames, and a search area of 3x3 times the ground-truth area is added there. The prediction takes the most frequent of the 8 direction categories (up, down, left, right, up-left, up-right, down-left, down-right) over the last 10 frames as the result.
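The direction classification and majority-vote prediction described above can be sketched as follows (the direction labels and helper names are illustrative, not from the patent; image coordinates are assumed, so y grows downward):

```python
from collections import Counter

# The 8 leave-direction categories used by the tracker.
DIRS = ["up", "down", "left", "right",
        "up-left", "up-right", "down-left", "down-right"]

def leave_direction(p0, p1):
    """Classify the motion from previous center p0 to current center p1
    into one of the 8 directions (y grows downward in image coords)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    horiz = "left" if dx < 0 else "right" if dx > 0 else ""
    vert = "up" if dy < 0 else "down" if dy > 0 else ""
    if horiz and vert:
        return f"{vert}-{horiz}"
    return vert or horiz or "right"  # arbitrary fallback when there is no motion

def predict_direction(history):
    """Majority vote over the last 10 recorded leave directions."""
    return Counter(history[-10:]).most_common(1)[0][0]
```

The predicted direction then decides where the extra 3x3 ground-truth-sized search patch is attached to the enlarged search region.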
In summary, when the tracking confidence of the target at time t drops below 20% (an empirical value), tracking is paused, the possible position of the target is predicted, and the search area is expanded; the search area is shown schematically in fig. 17.
The large central area is the expanded search area; the area protruding at the top is the additional search area when the target left upward, and the area protruding at the lower right is the additional search area when the target left toward the lower right.
For efficiency, the adaptive search mode is performed only on the 7x7 and 5x5 convolution layers; when the resolution of the infrared camera is 640x512, the image sizes of these layers are 80x64 and 160x128. Even with an enlarged search area, this does not consume much time.
The invention introduces the idea of the SSDA (sequential similarity detection algorithm) into image template matching and updates the template at appropriate times through a carefully designed error-function discrimination method, so that when the target changes form during rolling and yawing motions the template can be updated in time for continuous tracking. When matching on each level of the feature pyramid, the error-function discrimination method judges from the error value whether the target is lost; if so, the target is considered occluded, and by saving the templates of the latest frames the approximate position of the target is predicted from the change of the regression-frame center, the search domain is updated, and the target is searched for in the new domain. When the error discrimination function judges that the target is lost, the approximate range where the target will reappear is predicted from the center coordinates of the tracking frames of the several frames before the loss, and the search domain is changed adaptively. The method maintains good tracking when the target undergoes scale change, morphological change, or occlusion.
The above embodiments are merely preferred embodiments of the present invention, the protection scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention disclosed in the present invention belong to the protection scope of the present invention.

Claims (2)

1. A multi-level template self-adaptive matching target tracking method for infrared images is characterized by comprising the following steps:
acquiring an infrared image, and performing image-enhancement preprocessing on the infrared image to obtain the first layer of the image feature pyramid;
manually marking the target position on the infrared image, i.e., giving the target frame of the first frame;
extracting images of different layers within the target frame on the infrared image by using a feature pyramid algorithm;
for the images of different layers obtained through the feature pyramid, performing target search on each layer using an SSDA template matching algorithm, framing the most probable target position, and recording the confidence and target movement position information;
judging the confidence through an error function, and judging from the confidence whether the target has undergone a scale change, a morphological change, or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area;
extracting images of different levels from the infrared image using a feature pyramid algorithm means convolving the target with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, obtaining images blurred to different degrees; different target features are reflected in different feature images, and image information of different dimensions is detected on images of different scales; the image structure is specifically as follows;
layer 0 is an enhanced image, its size remains 640x512 from the original image;
layer 1 is a 2x2 convolution image; the 2x2 convolution uses a modified version of the Robert convolution kernel with a stride of 1, and with padding the output image size is 640x512;
the Robert convolution kernel comprises two kernels whose results are superimposed; a morphological opening operation is applied to the superimposed image to remove small noise points; the Robert convolution kernel structure is as follows:
layer 2 is a 3x3 convolution image; the 3x3 convolution applies a 3x3 convolution to the enhanced image, where the kernel uses an improved Laplacian operator:
since the 3x3 convolution uses a stride of 2, the convolved image size is computed as OH = (H + 2P - FH)/S + 1 and OW = (W + 2P - FW)/S + 1,
where the input image size is H x W, the convolution kernel size is FH x FW, the stride is S, and the padding is P; the surrounding pixels are cropped after the 3x3 convolution, and the resulting image size is 320 x 256;
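The output-size relation just stated is the standard convolution size formula; a tiny helper (the name and signature are assumed, not from the patent) makes the 640x512 to 320x256 case concrete:

```python
def conv_out_size(h, w, fh, fw, stride, pad):
    """Standard convolution output size:
    OH = (H + 2P - FH)//S + 1, OW = (W + 2P - FW)//S + 1."""
    oh = (h + 2 * pad - fh) // stride + 1
    ow = (w + 2 * pad - fw) // stride + 1
    return oh, ow
```

For a 640x512 input, a 3x3 kernel, stride 2 and padding 1, this gives 320x256, matching the layer-2 size in the claim.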
layer 3 is a 5x5 convolution image, and the size of the 5x5 convolution image is 160x128 after clipping;
layer 4 is a 7x7 convolution image, and the size of the 7x7 convolution image is 80x64 after being cut;
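The layer structure above can be sketched with a naive convolution. Note the patent's "improved" Robert and Laplacian kernels are not reproduced in the text, so the classic kernels below are placeholders, and only layers 0-2 are shown (layers 3-4 are analogous with larger kernels):

```python
import numpy as np

def conv2d(img, kernel, stride=1, pad=0):
    """Naive 2-D cross-correlation; for illustration only."""
    if pad:
        img = np.pad(img, pad)
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * kernel).sum()
    return out

# Placeholder kernels (classic Robert / Laplacian, not the patent's variants).
ROBERT_X = np.array([[1, 0], [0, -1]], dtype=float)
ROBERT_Y = np.array([[0, 1], [-1, 0]], dtype=float)
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def build_pyramid(enhanced):
    """Layers 0-2 of the pyramid: enhanced image, superimposed 2x2 Robert
    responses at full size, and a stride-2 3x3 Laplacian at half size."""
    layer1 = (np.abs(conv2d(enhanced, ROBERT_X, pad=1))[:-1, :-1]
              + np.abs(conv2d(enhanced, ROBERT_Y, pad=1))[:-1, :-1])
    layer2 = conv2d(enhanced, LAPLACIAN, stride=2, pad=1)
    return {0: enhanced, 1: layer1, 2: layer2}
```

On a 640x512 enhanced image this yields 640x512 for layer 1 and 320x256 for layer 2, consistent with the sizes stated above.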
assuming that S(x, y) is a search image of size MxN and T(x, y) is a template image of size mxn, the search is completed by sliding the template over the image to be searched; the target search process specifically includes:
error definition:
e(i, j) = Σ_{x=1..m} Σ_{y=1..n} | S_{i,j}(x, y) - S̄_{i,j} - T(x, y) + T̄ |
wherein S_{i,j} is the subgraph of the search image whose upper-left corner starts at (i, j), T is the template image, and T̄ is the mean of the template image; the subgraph mean is
S̄_{i,j} = (1/(m·n)) Σ_{x=1..m} Σ_{y=1..n} S_{i,j}(x, y);
setting an initial threshold Th0: in the SSDA algorithm, points are matched in random order, and when the accumulated error exceeds the designed threshold, the point is considered unlikely to be the target and is discarded directly; the threshold is set as an empirical value;
randomly selecting non-repeating pixel points in the area to be searched as the center of the tracking frame; calculating the current error and accumulating it, and when the accumulated error exceeds the threshold Th0, recording the current accumulation count H; then traversing all subgraphs;
during the traversal, if the accumulated error exceeds the threshold Th0 in fewer than H steps, no further random points are evaluated for that subgraph, and the next subgraph is processed directly;
if during the traversal a subgraph has accumulated error Th1 after H calculations, and Th1 < Th0, then Th0 is updated to Th1;
recording the matching count H and the accumulated error sum of each subgraph matching during the traversal, and calculating the average error; after all subgraphs have been traversed, outputting the average error rate and the center coordinates of the subgraph with the smallest error;
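Under the assumptions that the threshold and the random visiting order follow the scheme above (the function name, the th0 value, and the exact tie-breaking rule are illustrative, not specified by the patent), the traversal with early abandonment might be sketched as:

```python
import numpy as np

def ssda_match(search, template, th0, seed=0):
    """SSDA sketch: for each subgraph, accumulate |S - S̄ - T + T̄| at
    randomly ordered, non-repeating pixels and abandon the subgraph as soon
    as the sum exceeds the threshold; the subgraph surviving the most checks
    (then with the lowest average error) wins, and its center is returned."""
    rng = np.random.default_rng(seed)
    m, n = template.shape
    tz = template.astype(float) - template.mean()   # zero-mean template
    best_center, best_key = None, (-1, -np.inf)
    for i in range(search.shape[0] - m + 1):
        for j in range(search.shape[1] - n + 1):
            sub = search[i:i + m, j:j + n].astype(float)
            sz = sub - sub.mean()                   # zero-mean subgraph
            err, checks = 0.0, 0
            for k in rng.permutation(m * n):        # random, non-repeating pixels
                err += abs(sz.flat[k] - tz.flat[k])
                checks += 1
                if err > th0:                       # early abandonment
                    break
            key = (checks, -err / checks)           # more checks, then lower error
            if key > best_key:
                best_center, best_key = (i + m // 2, j + n // 2), key
    return best_center, best_key
```

A perfect match accumulates zero error and thus survives every check, so it dominates any subgraph abandoned early.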
the specific process for judging the confidence through the error function is as follows:
when a morphological change occurs in the target:
the minimum matching errors of the several layers obtained with the SSDA template matching algorithm are E_0, E_1, E_2, E_3, E_4; each error is normalized and mapped to the interval 0-1, conforming to positive logic, to obtain the probability of a correct match:
wherein TH_i is the threshold for each error;
the confidence at time t, taken as the average matching probability over the last 5 frames of pictures, is the evaluation index for each layer:
denoting the confidence of layer i obtained at time t accordingly, an error function L_t is set as a function of these five results:
when the target changes form, the error function L_t increases rapidly; when the error exceeds the threshold, the template is updated, and the error judgment method is:
ΔL_t = |L_t - L_{t-1}|
if ΔL_t ≥ 10%, the updated template is the image at the position of the previous frame's template; that is, the returned frame information (x, y, h, w) from the last template match is used to select the template in the current frame for matching the next frame;
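A minimal sketch of this update trigger and of the 5-frame rolling confidence used above (the names, the class interface, and expressing the 10% threshold as the fraction 0.10 are assumptions):

```python
from collections import deque

def should_update_template(L_prev, L_curr, tol=0.10):
    """ΔL_t = |L_t - L_{t-1}|; trigger a template update when the error
    function jumps by at least 10% between consecutive frames."""
    return abs(L_curr - L_prev) >= tol

class LayerConfidence:
    """Rolling mean of the last 5 per-frame match probabilities for one layer."""
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)

    def push(self, p):
        """Record a new match probability and return the current rolling mean."""
        self.buf.append(p)
        return sum(self.buf) / len(self.buf)
```

When `should_update_template` fires, the box returned by the last match would be re-cropped from the current frame as the new template.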
when the target undergoes a dimensional change:
when the error discrimination function judges that the target is undergoing a scale change, three frames are generated based on the normally tracked target of the previous frame, with sizes 75%, 120% and 150% of the target; each is scaled back to the original size, compared with the template of the previous frame, and the transformation result with the highest matching degree is used as the new template.
2. The multi-level template adaptive matching target tracking method for infrared images according to claim 1, wherein the infrared image is a single-channel 8-bit image with pixel value x_in, and the image enhancement preprocessing comprises: traversing the image, enhancing the gray value of each pixel, and outputting pixel value x_out:
Where k is the enhancement ratio.
CN202110830766.2A 2021-07-22 2021-07-22 Multi-level template self-adaptive matching target tracking method for infrared image Active CN113724290B (en)
