
CN113724290B - Multi-level template self-adaptive matching target tracking method for infrared image - Google Patents

Multi-level template self-adaptive matching target tracking method for infrared image

Info

Publication number
CN113724290B
CN113724290B (application CN202110830766.2A)
Authority
CN
China
Prior art keywords
image
target
error
template
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110830766.2A
Other languages
Chinese (zh)
Other versions
CN113724290A (en)
Inventor
吕梅柏
刘晓东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University filed Critical Northwestern Polytechnical University
Priority to CN202110830766.2A priority Critical patent/CN113724290B/en
Publication of CN113724290A publication Critical patent/CN113724290A/en
Application granted granted Critical
Publication of CN113724290B publication Critical patent/CN113724290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20016Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-level template self-adaptive matching target tracking method for infrared images, which belongs to the field of image processing and comprises the following steps: acquiring an infrared image, and performing image enhancement preprocessing on the infrared image to obtain the first layer of an image feature pyramid; manually marking a target position on the infrared image, namely giving the target frame of the first frame; extracting images of different layers within the target frame on the infrared image by means of a feature pyramid algorithm; performing a target search on each layer with an SSDA template matching algorithm, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information; judging the confidence through an error function: if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area. The method provided by the invention preserves the effective features of the target under scale change, morphological change and occlusion, and maintains good tracking.

Description

Multi-level template self-adaptive matching target tracking method for infrared image
Technical Field
The invention belongs to the field of image processing, and particularly relates to a multi-level template self-adaptive matching target tracking method for infrared images.
Background
With the development of intelligent technology, images have become an important means of acquiring information. Image processing is a very important link in image acquisition, and image processing technology has been widely used in fields such as industry, security, and medical management. Within image processing, the detection and tracking of moving targets is a key focus, generally realized with an image processing tracking algorithm.
Traditional image processing tracking methods currently adopt template matching algorithms. Conventional template matching algorithms tend to lose the target when faced with scale changes, morphological changes, and occlusion. The existing template matching tracking algorithms have the following three defects: the target cannot be continuously tracked when its morphology changes; the target cannot be continuously tracked when its scale changes; the target cannot be continuously tracked when it is occluded.
Therefore, the invention provides a multi-level template self-adaptive matching target tracking method for infrared images.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a multi-level template self-adaptive matching target tracking method for infrared images.
In order to achieve the above object, the present invention provides the following technical solutions:
a multi-level template self-adaptive matching target tracking method for infrared images comprises the following steps:
acquiring an infrared image, and performing image enhancement preprocessing on the infrared image to obtain the first layer of an image feature pyramid;
manually marking a target position on the infrared image, namely giving the target frame of the first frame;
extracting images of different layers in a target frame on the infrared image by utilizing a feature pyramid algorithm:
aiming at the images of the different layers obtained through the feature pyramid, performing a target search on each layer with an improved SSDA template matching algorithm, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information;
judging the confidence through an error function, and judging through the confidence whether the target has undergone a scale change, a morphological change or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template; if occlusion occurs, enlarging the search area.
Preferably, the infrared image is a single-channel 8-bit image whose pixel value is x_in. The image enhancement preprocessing includes: traversing the image, enhancing the gray value of each pixel, and outputting a pixel value x_out,
where k is the enhancement ratio.
Preferably, extracting images of different levels from the infrared image by means of the feature pyramid algorithm means convolving the target with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, to obtain blurred images of different degrees; different target features are reflected on different feature images, and image information of different dimensions is detected from images of different dimensions.
Preferably, assuming that S(x, y) is an mxn search graph and T(x, y) is an MxN template graph, the search is completed by sliding over the graph to be searched; the target search process specifically includes:
error definition:

e(i, j) = | S_ij(x, y) − mean(S_ij) − T(x, y) + mean(T) |

wherein S_ij is the subgraph of the search graph whose upper-left corner starts at (i, j), T is the template graph, mean(T) is the average value of the template graph, and the average of the subgraph is

mean(S_ij) = (1/(M·N)) · Σ_{x=1..M} Σ_{y=1..N} S_ij(x, y);
setting an initial threshold Th0: during random-point matching in the SSDA algorithm, once the accumulated error exceeds the threshold, the point is considered unlikely to be the target and is directly discarded; the threshold is set to an empirical value;
randomly selecting non-repeating pixel points in the area to be searched as the center of the tracking frame; calculating the current error and accumulating it, and recording the current accumulation count H when the accumulated error exceeds the threshold Th0; then traversing all subgraphs;
in the traversal process, if the error value is larger than the threshold Th0 under the frequency smaller than H, the operation of selecting the random point to calculate the error is not continued, and the next sub-graph is directly switched;
if a subgraph exists in the traversal process, after H times are calculated, the accumulated error is Th1; if Th1< Th0, then Th0 is updated to Th1;
recording the matching count H and the accumulated error sum for every subgraph match during the traversal, and calculating the average error; after traversing all the subgraphs, outputting the average error rate and the center coordinates of the subgraph with the smallest error.
Preferably, the specific process of discriminating the confidence coefficient through the error function is as follows:
when a morphological change occurs in the target:
the minimum matching errors of the several layers obtained by the improved SSDA template matching algorithm are E_0, E_1, E_2, E_3 and E_4 respectively; each error is normalized and mapped to the 0-1 interval so as to conform to positive logic, giving the matching correct probability:
wherein TH_i is the threshold each time an error is obtained;
the confidence of each layer at time t is taken as the average matching probability over the last 5 frames of pictures:
Let the confidence of layer i obtained at time t be P_t^i, and set the error function L_t as a function of these five parameters:
when the target morphology changes, the error function L_t will increase rapidly; when the error exceeds the threshold, the template is updated, wherein the error judging method is:
ΔL_t = |L_t − L_(t−1)|
If ΔL_t ≥ 10%, the updated template is the image at the position of the previous frame's template; namely, the returned frame information (x, y, h, w) obtained from the last template matching is used to select the current-frame template for matching the next frame;
when the target undergoes a dimensional change:
when the error discrimination function judges that the target is undergoing a scale change, three frames are generated with the normally tracked target of the previous frame as reference, sized 75%, 120% and 150% of the target respectively; each is scaled back to the original size and compared with the template of the previous frame, and the transformation result with the highest matching degree is taken as the new template.
The multi-level template self-adaptive matching target tracking method for the infrared image has the following beneficial effects:
the invention uses image feature pyramid algorithm to carry out different convolution fuzzy operation on the target, obtains feature images with different scales of the image, carries out matching algorithm on the feature images, not only can quicken the searching speed and improve the frame rate, but also can respectively calculate the value of the matching function of each layer of the feature pyramid when the scale of the target changes, select the best matching feature image and carry out template switching. The method ensures that the effective characteristics of the target can be still saved when the target is subjected to scale change. The method provided by the invention can keep good tracking when the size, the shape and the shielding of the target are changed.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the design thereof, the drawings required for the embodiments will be briefly described below. The drawings in the following description are only some of the embodiments of the present invention and other drawings may be made by those skilled in the art without the exercise of inventive faculty.
FIG. 1 is a flow chart of a multi-level template adaptive matching target tracking method for infrared images according to embodiment 1 of the present invention;
FIG. 2 is a specific implementation process of the multi-level template adaptive matching target tracking method for infrared images according to embodiment 1 of the present invention;
FIG. 3 is an infrared image enhancement contrast map;
FIG. 4 is a diagram illustrating the extraction of different levels of image structure using feature pyramids;
FIG. 5 is a graph of the effect of processing an infrared image directly using the Roberts convolution kernel;
FIG. 6 is a graph of the effect of processing an infrared image after superimposing the results of two convolution kernels;
FIG. 7 is an effect diagram of the superimposed image with small noise removed;
FIG. 8 is an effect graph of a 3x3 convolution of an enhanced image;
FIG. 9 is a graph of the effect of processing an enhanced image using a modified Laplace operator;
FIG. 10 is an effect diagram of expanding receptive field area while reducing weight parameters using a multiple convolution kernel superposition approach;
FIG. 11 is a graph showing the effect of convolving layers 3 and 4 with 5x5 and 7x7 convolutions, respectively;
FIG. 12(a) to FIG. 12(h) show the changes on each image when the scale of the target changes;
FIG. 13 is a schematic diagram of a search;
FIG. 14 is a flow chart of random matching;
FIG. 15 is an image when a change in scale of the target occurs;
FIG. 16 is a diagram of an adaptive search area variation;
fig. 17 is a schematic diagram of a search area.
Detailed Description
The present invention will be described in detail below with reference to the drawings and the embodiments, so that those skilled in the art can better understand the technical scheme of the present invention and can implement the same. The following examples are only for more clearly illustrating the technical aspects of the present invention, and are not intended to limit the scope of the present invention.
Example 1
The invention provides a multi-level template self-adaptive matching target tracking method for infrared images, which is particularly shown in fig. 1 and 2 and comprises the following steps:
s1, acquiring an infrared image, and performing image enhancement pretreatment on the infrared image to obtain the first layer of the image characteristic golden character tower.
The infrared image reflects the temperature distribution of the scene, and the tracking target is an object whose temperature differs markedly from the surrounding environment. To make the target more visible, image enhancement is performed on the infrared image to facilitate the subsequent tracking algorithm.
The infrared image in this embodiment is a single-channel 8-bit image whose pixel value is x_in. The enhancement preprocessing of the infrared image comprises: traversing the image, enhancing the gray value of each pixel, and outputting a pixel value x_out,
where k is the enhancement ratio, empirically chosen to be 1.3.
The enhanced image effect is shown in fig. 3, and it can be seen that the object is more clearly distinguished from the background.
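As a concrete illustration, the per-pixel enhancement can be sketched as follows. The exact formula appears only as a figure in the original, so this sketch assumes a simple linear gain x_out = clip(k · x_in) with k = 1.3, consistent with the description of enhancing each gray value by a ratio k:

```python
import numpy as np

def enhance(image: np.ndarray, k: float = 1.3) -> np.ndarray:
    """Gray-value enhancement of a single-channel 8-bit infrared image.

    Assumed form: x_out = clip(round(k * x_in), 0, 255); the patent's
    exact formula is not reproduced in the text, so this is a sketch.
    """
    out = np.rint(image.astype(np.float32) * k)
    return np.clip(out, 0, 255).astype(np.uint8)

frame = np.array([[100, 200], [50, 250]], dtype=np.uint8)
print(enhance(frame))  # hot (bright) pixels saturate at 255
```

With k = 1.3 a pixel of 100 becomes 130, while 200 and 250 saturate at 255, which widens the gap between a warm target and its cooler background.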
S2, manually marking a target position on the infrared image, namely giving the target frame (ground truth) of the first frame.
S3, extracting images of different layers from the infrared image by means of the feature pyramid algorithm.
The feature pyramid algorithm extracts images of different levels from the infrared image: the target is convolved with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, yielding blurred images of different degrees. Different target features are reflected on different feature images, and image information of different dimensions is detected on images of different dimensions; the image structure is shown in fig. 4.
The resolution of the infrared camera used in the present invention is 640x512.
1. Layer 0-enhanced image
The enhanced image is taken as layer 0. The size remains 640x512 from the original image.
2. Layer 1-2 x2 convolution image
The 2x2 convolution image uses a modified version of the Roberts convolution kernel with stride 1; padding is used so the output image size remains consistent with the original, 640x512. The Roberts kernel is a first-order difference operator with a small amount of calculation and sensitivity to details.
The Roberts operator comprises the following two convolution kernels (the standard Roberts cross):

[ 1  0 ]    [ 0  1 ]
[ 0 −1 ]    [ −1 0 ]
because the infrared image is used in the invention, after the infrared image is directly used, the overall brightness of the obtained image is reduced, and the effect is not ideal, as shown in fig. 5.
The kernel is therefore changed to a modified structure,
and the results of the two convolution kernels are superimposed, the effect of which is shown in fig. 6:
the final superimposed image is used as an image on operation, small noise points are removed, the effect is shown in fig. 7, the image quality is clear, and the main contour features of the target are obvious.
3. Layer 2-3 x3 convolution image
A 3x3 convolution is performed on the enhanced image, where the convolution kernel adopts the Laplacian operator, namely:
the result of such convolution is that the overall brightness is low, as shown in fig. 8:
thus using the modified laplace operator:
the brightness of the improved image is significantly enhanced, and the result is shown in fig. 9.
It can be seen that the image quality details after the 3x3 convolution are more rich.
Since the 3x3 convolution uses a stride of 2, the output size follows from the convolution image size formula:

OH = (H + 2P − FH)/S + 1,  OW = (W + 2P − FW)/S + 1

where the size of the input image is H x W, the convolution kernel size is FH x FW, the stride is S, and the padding is P.
Thus, by 3x3 convolution and clipping out surrounding pixels, the resulting image size is 320 x 256.
When the image size is smaller, the target search during later template matching is faster, so the speed is improved.
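The size bookkeeping of this layer can be checked with the standard output-size formula. Assuming a padding of 1 (one consistent reading of the "clipping out surrounding pixels" step), a 640x512 input with a 3x3 kernel at stride 2 lands exactly on the 320x256 stated above:

```python
def conv_output_size(h: int, w: int, fh: int, fw: int,
                     stride: int, pad: int) -> tuple:
    """OH = (H + 2P - FH) / S + 1, OW = (W + 2P - FW) / S + 1."""
    oh = (h + 2 * pad - fh) // stride + 1
    ow = (w + 2 * pad - fw) // stride + 1
    return oh, ow

# 512x640 input (H x W), 3x3 kernel, stride 2, padding 1
print(conv_output_size(512, 640, 3, 3, 2, 1))  # (256, 320)
```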
4. Layer 3-5 x5 convolution image
5. Layer 4-7 x7 convolution image
The 3rd and 4th layers are convolved with 5x5 and 7x7 kernels respectively. However, conventional 5x5 and 7x7 convolutions complicate the computation of padding and the programming of stride, and carry more parameters; designing a reasonable kernel directly would waste a great deal of time. The method therefore borrows the convolution-kernel stacking idea from deep learning and obtains the same receptive field with several 3x3 kernels, as shown schematically in fig. 10.
Wherein the 5x5 convolved image is cropped to a size of 160x128 and the 7x7 convolved image is cropped to a size of 80x64. The convolved image is shown in fig. 11 (enlarged to the same view for ease of viewing, not to full size).
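The receptive-field equivalence that motivates the stacking (fig. 10) is easy to verify: n stacked k x k, stride-1 convolutions see an n·(k−1)+1 window, so two 3x3 kernels give a 5x5 field and three give a 7x7 field:

```python
def stacked_receptive_field(n_layers: int, k: int = 3) -> int:
    """Receptive field of n stacked k x k, stride-1 convolutions:
    RF = n * (k - 1) + 1."""
    return n_layers * (k - 1) + 1

print(stacked_receptive_field(2))  # 5 -> equivalent to one 5x5 kernel
print(stacked_receptive_field(3))  # 7 -> equivalent to one 7x7 kernel
```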
In this way, when the target moves from far to near and fine detail need not be traced, the approximate position can still be judged; a later error-function discrimination method comprehensively exploits the characteristics and advantages of each image, and the matching method is designed accordingly.
The situation on the individual images when the scale change of the object occurs is shown in fig. 12.
S4, running the improved SSDA template matching algorithm on the images of the different layers obtained through the feature pyramid in S3, framing the most probable position of the target, and simultaneously recording the confidence and the target movement position information.
Let S(x, y) be an mxn search graph and T(x, y) an MxN template graph, and let S_ij be a subgraph of the search graph whose upper-left corner starts at (i, j). The search is completed by sliding over the graph to be searched, as shown in fig. 13. The target search process specifically includes:
S4.1, error definition:

e(i, j) = | S_ij(x, y) − mean(S_ij) − T(x, y) + mean(T) |

wherein S_ij is the subgraph of the search graph whose upper-left corner starts at (i, j), T is the template graph, mean(T) is the average value of the template graph, and the average of the subgraph is

mean(S_ij) = (1/(M·N)) · Σ_{x=1..M} Σ_{y=1..N} S_ij(x, y);
S4.2, setting an initial threshold Th0: during random-point matching in the SSDA algorithm, once the accumulated error exceeds the threshold, the point is considered unlikely to be the target and is directly discarded. The threshold is set to an empirical value, typically 30-40.
S4.3 random matching method
Randomly select non-repeating pixel points in the area to be searched as the center of the tracking frame. Calculate the current error and accumulate it; when the accumulated error exceeds the threshold Th0, record the current accumulation count H. Then traverse all subgraphs.
In the traversal process, if the error value is larger than the threshold Th0 under the frequency smaller than H, the operation of selecting the random point to calculate the error is not continued, and the next sub-graph is directly switched.
If there is a sub-graph during traversal, the accumulated error is Th1 after H times are calculated. If Th1< Th0, then Th0 is updated to Th1.
To guard against false detections caused by chance during the random calculation, a lower limit should be set for the random number H, generally not less than 40% of the number of subgraph pixels.
The matching count H and the accumulated error sum are recorded for every subgraph match during the traversal, and the average error sum Ea is calculated. After traversing all subgraphs, the average error rate and the center coordinates of the subgraph with the smallest error are output; the random matching flow is shown in fig. 14.
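The random-matching loop of steps S4.1-S4.3 can be sketched as below; the traversal order, tie-breaking, and threshold bookkeeping are assumptions where the text leaves them open:

```python
import numpy as np

def ssda_match(search: np.ndarray, tmpl: np.ndarray, th0: float = 35.0,
               rng=None):
    """Sketch of the improved SSDA loop: for each candidate subgraph,
    accumulate the zero-mean absolute error over randomly ordered pixels
    and abandon the subgraph once the running sum exceeds the current
    threshold. The subgraph surviving the most comparisons wins, and the
    threshold is tightened whenever a subgraph finishes under it."""
    rng = rng if rng is not None else np.random.default_rng(0)
    M, N = tmpl.shape
    t_zero = tmpl.astype(np.float64) - tmpl.mean()
    coords = [(y, x) for y in range(M) for x in range(N)]
    th, best_h, best_pos = th0, -1, (0, 0)
    for i in range(search.shape[0] - M + 1):
        for j in range(search.shape[1] - N + 1):
            sub = search[i:i + M, j:j + N].astype(np.float64)
            s_zero = sub - sub.mean()
            rng.shuffle(coords)
            acc, h = 0.0, 0
            for (y, x) in coords:
                acc += abs(s_zero[y, x] - t_zero[y, x])
                if acc > th:
                    break           # unlikely to be the target: abandon
                h += 1              # comparisons survived under threshold
            if h > best_h:
                best_h, best_pos = h, (i, j)
            if h == M * N and acc < th:
                th = acc            # tighten the threshold (Th1 < Th0)
    return best_pos, best_h
```

The subgraph that survives the most random comparisons before its accumulated error exceeds the threshold is taken as the match; tightening the threshold whenever a subgraph finishes all pixels under it is what lets later poor candidates be abandoned after only a few comparisons.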
In fig. 2, the line labeled a in panel (a) and the line labeled a in panel (b) are the same line, as with the line labeled b in panel (d); the lines labeled c and d in panel (b) are the same lines as c and d in panel (c); and the lines labeled e, f and g in panel (c) are the same lines as e, f and g in panel (d).
In fig. 14, the lines labeled a, b, c and d in panel (a) are the same lines as a, b, c and d in panel (b).
S5, judging, through an error function, the confidence obtained by the improved SSDA algorithm on each layer in S4, and judging through the confidence whether the target has undergone a scale change, a morphological change or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area. Specifically:
the object is information GT (x, y, h, w) according to the object given in the first frame of the original image, which is the upper left corner coordinates (x, y) of the object frame, the height h and the width w of the object frame, respectively. And obtaining the target frame information of the first frame on other feature graphs according to the image size proportion of the convolution operation. The corresponding target frame information of the first frame at the image enhancement layer, 2x2, 3x3, 5x5 and 7x7 layers is GT0, GT1, GT2, GT3, GT4, respectively. The target match results M (x, y, P) at each layer can be derived by performing a separate modified version of the SSDA matching algorithm on each layer, where (x, y) is the center coordinates of the box of the target match results and P is the confidence level.
Since the enhancement layer reflects the original information of the image, the 2x2 convolution layer reflects the contour information of the target, the 3x3 convolution layer reflects the detail texture information of the target, and the 5x5 and 7x7 convolution layers reflect the approximate position information of the target, the following error function discrimination template updating method is designed by integrating the image features:
1. when the target changes morphology
The minimum matching errors of the several layers obtained by the improved SSDA template matching algorithm are E_0, E_1, E_2, E_3 and E_4. To facilitate subsequent calculation, they are normalized and mapped to the 0-1 interval so as to conform to positive logic, giving the matching correct probability:
wherein TH_i is the threshold each time an error is obtained.
the confidence of each layer at time t is taken as the average matching probability over the last 5 frames of pictures:
The matching term with the smallest error carries the highest weight, 80% according to the empirical formula, while each of the remaining layers carries 5%. Let the confidence of layer i obtained at time t be P_t^i, and set the error function L_t as the resulting function of these five parameters:

L_t = 0.8 · P_t^(i*) + 0.05 · Σ_{i ≠ i*} P_t^i

where i* is the layer with the smallest matching error.
When the target morphology changes, the error function L_t will increase rapidly. When the judged error exceeds the threshold, the template is updated. The error judging method is:
ΔL_t = |L_t − L_(t−1)|
If ΔL_t ≥ 10%, the template is updated to the image at the position of the previous frame's template; namely, the returned frame information (x, y, h, w) obtained from the last template matching is used to select the current-frame template for matching the next frame.
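The morphology-update rule can be sketched as a single comparison; since L_t is built from probabilities in [0, 1], the 10% criterion is read here as an absolute jump of 0.1 (an assumption):

```python
def should_update_template(l_prev: float, l_curr: float,
                           threshold: float = 0.10) -> bool:
    """Update the template when Delta L_t = |L_t - L_(t-1)| >= 10%.
    L_t is built from matching probabilities in [0, 1], so the 10%
    criterion is read as an absolute jump of 0.1 (an assumption)."""
    return abs(l_curr - l_prev) >= threshold

print(should_update_template(0.90, 0.70))  # sudden drop -> update
print(should_update_template(0.90, 0.88))  # small drift -> keep template
```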
2. When the target is scaled
Considering that the target is likely to change in scale, a scale-change evaluation is performed each time the template is updated; if the error stays within 10%, a scale-change evaluation is additionally performed on the enhanced image every 10 frames, and if a scale change exists, the changed template size is assigned proportionally to the other layers. The idea of evaluating the scale change is as follows:
and evaluating the scale change at the time t. Because the error of the t-1 moment is within 10%, the tracking frame of the t-1 moment truly reflects the target position, so that the tracking frame of the previous frame can be used as a group trunk. And on the image at the time t, performing multi-scale selection by taking the group trunk information of the previous frame as a basis. The selection method comprises the following steps:
an additional 3 reference frames referenced to the group trunk are selected at time t-1, 2 larger reference frames and 1 smaller reference frame, respectively. The length and width are 150%, 120% and 75% of groundtrunk, respectively. As shown in fig. 15, the first layer frame, the second layer frame, and the innermost layer frame are respectively from the outside to the inside, wherein the third layer frame is a group trunk frame.
When the target approaches the infrared camera, it can be seen that the outermost box fits the target better.
The specific implementation method is as follows:
assume that at time t it is determined that a template scale update is required.
The tracking frame at time t−1 is selected as the ground truth, and three tracking frames — tracking box1, tracking box2 and tracking box3 — are selected in the image at time t with length and width at 150%, 120% and 75% of the ground truth respectively. As shown in fig. 15, from the outside in these are the first-layer frame, the second-layer frame and the innermost frame, where the third-layer frame is the ground-truth frame.
According to the ground-truth coordinate position, the images framed by tracking box1, tracking box2 and tracking box3 are selected at the same center position at time t, and their length and width are scaled to 66%, 83% and 133% respectively, obtaining images 1, 2 and 3.
The improved SSDA matching algorithm is then run comparing the three images with the ground truth. The original size of the image with the smallest error is transmitted to all layers as the new template size parameter, and the template size is updated.
The step of size updating is performed only on the enhanced image layer (layer 0), the 3x3 convolution layer (layer 2), and the 5x5 convolution layer (layer 3), taking into account the difference in image characteristics of the respective layers.
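The three-candidate scale evaluation can be sketched as follows; the box format (x, y, h, w) with (x, y) the upper-left corner follows the patent, while the helper names are illustrative:

```python
def scale_candidates(box, scales=(1.5, 1.2, 0.75)):
    """Generate tracking box1/box2/box3 at 150%, 120% and 75% of the
    previous frame's ground-truth box, sharing the same center.
    Box format: (x, y, h, w), (x, y) being the upper-left corner."""
    x, y, h, w = box
    cx, cy = x + w / 2.0, y + h / 2.0
    return [(cx - w * s / 2.0, cy - h * s / 2.0, h * s, w * s)
            for s in scales]

boxes = scale_candidates((10, 10, 40, 40))
print(boxes[0])  # 150% box: (0.0, 0.0, 60.0, 60.0)
```

Each crop is then resized by 1/s (about 66%, 83% and 133%, matching the figures in the text) back to the ground-truth size before the improved SSDA comparison, and the best-matching candidate's original size becomes the new template size.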
3. Adaptive search area variation
The template matching search mode of this method is an 8-neighborhood search: taking the center of the ground truth as the center, a search with a step of one pixel is performed over a region 3 times the length and 3 times the width of the ground truth, and no padding is applied during the sliding-window process.
Each time, the center position P1(x, y) of the highest-confidence matched frame is compared with the center position P0(x, y) of the previous frame to obtain the direction in which the target moves; the result is classified into 8 categories (up, down, left, right, up-left, up-right, down-left, down-right), and the classification results of the latest 10 frames are stored, as shown in fig. 16.
However, the target is sometimes occluded or lost due to environmental factors, which manifests as a sudden large drop in the matching confidence of a frame.
The strategy of this method is to switch to a larger-scope search while stopping template updates.
Assume the tracking confidence of the target at time t drops below 20% (an empirical value). At this point the adaptive search-area change algorithm is started. The search region is enlarged to 25 times the original area, i.e. the length and width are each 5 times those of the previous frame's ground truth.
At the same time, the possible position of the target is predicted from the directions in which the target moved over the last 10 frames, and a search area of 3x3 times the ground-truth area is added there. The prediction takes the most frequent of the 8 direction categories (up, down, left, right, up-left, up-right, down-left, down-right) over the last 10 frames as the result.
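The direction classification and majority-vote prediction described above can be sketched as follows (the direction labels and helper names are illustrative, not from the patent; image coordinates are assumed, so y grows downward):

```python
from collections import Counter

# The 8 leave-direction categories used by the tracker.
DIRS = ["up", "down", "left", "right",
        "up-left", "up-right", "down-left", "down-right"]

def leave_direction(p0, p1):
    """Classify the motion from previous center p0 to current center p1
    into one of the 8 directions (y grows downward in image coords)."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    horiz = "left" if dx < 0 else "right" if dx > 0 else ""
    vert = "up" if dy < 0 else "down" if dy > 0 else ""
    if horiz and vert:
        return f"{vert}-{horiz}"
    return vert or horiz or "right"  # arbitrary fallback when there is no motion

def predict_direction(history):
    """Majority vote over the last 10 recorded leave directions."""
    return Counter(history[-10:]).most_common(1)[0][0]
```

The predicted direction then decides where the extra 3x3 ground-truth-sized search patch is attached to the enlarged search region.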
In summary, when the tracking confidence of the target at time t drops below 20% (an empirical value), tracking is paused, the possible position of the target is predicted, and the search area is expanded; the search area is shown schematically in fig. 17.
The large central area is the expanded search area; the area protruding at the top is the additional search area when the target left upward, and the area protruding at the lower right is the additional search area when the target left toward the lower right.
For efficiency, the adaptive search mode is performed only on the 7x7 and 5x5 convolution layers; when the resolution of the infrared camera is 640x512, the image sizes of these layers are 80x64 and 160x128. Even with an enlarged search area, this does not consume much time.
The invention introduces the idea of the SSDA (sequential similarity detection algorithm) into image template matching and updates the template at appropriate times through a carefully designed error-function discrimination method, so that when the target changes form during rolling and yawing motions the template can be updated in time for continuous tracking. When matching on each level of the feature pyramid, the error-function discrimination method judges from the error value whether the target is lost; if so, the target is considered occluded, and by saving the templates of the latest frames the approximate position of the target is predicted from the change of the regression-frame center, the search domain is updated, and the target is searched for in the new domain. When the error discrimination function judges that the target is lost, the approximate range where the target will reappear is predicted from the center coordinates of the tracking frames of the several frames before the loss, and the search domain is changed adaptively. The method maintains good tracking when the target undergoes scale change, morphological change, or occlusion.
The above embodiments are merely preferred embodiments of the present invention, the protection scope of the present invention is not limited thereto, and any simple changes or equivalent substitutions of technical solutions that can be obviously obtained by those skilled in the art within the technical scope of the present invention disclosed in the present invention belong to the protection scope of the present invention.

Claims (2)

1. A multi-level template self-adaptive matching target tracking method for infrared images is characterized by comprising the following steps:
acquiring an infrared image, and performing image-enhancement preprocessing on the infrared image to obtain the first layer of the image feature pyramid;
manually marking the target position on the infrared image, i.e., giving the target frame of the first frame;
extracting images of different layers within the target frame on the infrared image by using a feature pyramid algorithm;
for the images of different layers obtained through the feature pyramid, performing target search on each layer using an SSDA template matching algorithm, framing the most probable target position, and recording the confidence and target movement position information;
judging the confidence through an error function, and judging from the confidence whether the target has undergone a scale change, a morphological change, or occlusion; if a scale change occurs, determining a new template; if a morphological change occurs, updating the template in time; if occlusion occurs, enlarging the search area;
extracting images of different levels from the infrared image using a feature pyramid algorithm means convolving the target with 2x2, 3x3, 5x5 and 7x7 convolution kernels respectively, obtaining images blurred to different degrees; different target features are reflected in different feature images, and image information of different dimensions is detected on images of different scales; the image structure is specifically as follows;
layer 0 is an enhanced image, its size remains 640x512 from the original image;
layer 1 is a 2x2 convolution image; the 2x2 convolution uses a modified version of the Robert convolution kernel with a stride of 1, and with padding the output image size is 640x512;
the Robert convolution kernel comprises two kernels whose results are superimposed; a morphological opening operation is applied to the superimposed image to remove small noise points; the Robert convolution kernel structure is as follows:
layer 2 is a 3x3 convolution image; the 3x3 convolution applies a 3x3 convolution to the enhanced image, where the kernel uses an improved Laplacian operator:
since the 3x3 convolution uses a stride of 2, the convolved image size is computed as OH = (H + 2P - FH)/S + 1 and OW = (W + 2P - FW)/S + 1,
where the input image size is H x W, the convolution kernel size is FH x FW, the stride is S, and the padding is P; the surrounding pixels are cropped after the 3x3 convolution, and the resulting image size is 320 x 256;
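The output-size relation just stated is the standard convolution size formula; a tiny helper (the name and signature are assumed, not from the patent) makes the 640x512 to 320x256 case concrete:

```python
def conv_out_size(h, w, fh, fw, stride, pad):
    """Standard convolution output size:
    OH = (H + 2P - FH)//S + 1, OW = (W + 2P - FW)//S + 1."""
    oh = (h + 2 * pad - fh) // stride + 1
    ow = (w + 2 * pad - fw) // stride + 1
    return oh, ow
```

For a 640x512 input, a 3x3 kernel, stride 2 and padding 1, this gives 320x256, matching the layer-2 size in the claim.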
layer 3 is a 5x5 convolution image, and the size of the 5x5 convolution image is 160x128 after clipping;
layer 4 is a 7x7 convolution image, and the size of the 7x7 convolution image is 80x64 after being cut;
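The layer structure above can be sketched with a naive convolution. Note the patent's "improved" Robert and Laplacian kernels are not reproduced in the text, so the classic kernels below are placeholders, and only layers 0-2 are shown (layers 3-4 are analogous with larger kernels):

```python
import numpy as np

def conv2d(img, kernel, stride=1, pad=0):
    """Naive 2-D cross-correlation; for illustration only."""
    if pad:
        img = np.pad(img, pad)
    kh, kw = kernel.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = img[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = (patch * kernel).sum()
    return out

# Placeholder kernels (classic Robert / Laplacian, not the patent's variants).
ROBERT_X = np.array([[1, 0], [0, -1]], dtype=float)
ROBERT_Y = np.array([[0, 1], [-1, 0]], dtype=float)
LAPLACIAN = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=float)

def build_pyramid(enhanced):
    """Layers 0-2 of the pyramid: enhanced image, superimposed 2x2 Robert
    responses at full size, and a stride-2 3x3 Laplacian at half size."""
    layer1 = (np.abs(conv2d(enhanced, ROBERT_X, pad=1))[:-1, :-1]
              + np.abs(conv2d(enhanced, ROBERT_Y, pad=1))[:-1, :-1])
    layer2 = conv2d(enhanced, LAPLACIAN, stride=2, pad=1)
    return {0: enhanced, 1: layer1, 2: layer2}
```

On a 640x512 enhanced image this yields 640x512 for layer 1 and 320x256 for layer 2, consistent with the sizes stated above.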
assuming that S(x, y) is a search image of size MxN and T(x, y) is a template image of size mxn, the search is completed by sliding the template over the image to be searched; the target search process specifically includes:
error definition:
e(i, j) = Σ_{x=1..m} Σ_{y=1..n} | S_{i,j}(x, y) - S̄_{i,j} - T(x, y) + T̄ |
wherein S_{i,j} is the subgraph of the search image whose upper-left corner starts at (i, j), T is the template image, and T̄ is the mean of the template image; the subgraph mean is
S̄_{i,j} = (1/(m·n)) Σ_{x=1..m} Σ_{y=1..n} S_{i,j}(x, y);
setting an initial threshold Th0: in the SSDA algorithm, points are matched in random order, and when the accumulated error exceeds the designed threshold, the point is considered unlikely to be the target and is discarded directly; the threshold is set as an empirical value;
randomly selecting non-repeating pixel points in the area to be searched as the center of the tracking frame; calculating the current error and accumulating it, and when the accumulated error exceeds the threshold Th0, recording the current accumulation count H; then traversing all subgraphs;
during the traversal, if the accumulated error exceeds the threshold Th0 in fewer than H steps, no further random points are evaluated for that subgraph, and the next subgraph is processed directly;
if during the traversal a subgraph has accumulated error Th1 after H calculations, and Th1 < Th0, then Th0 is updated to Th1;
recording the matching count H and the accumulated error sum of each subgraph matching during the traversal, and calculating the average error; after all subgraphs have been traversed, outputting the average error rate and the center coordinates of the subgraph with the smallest error;
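Under the assumptions that the threshold and the random visiting order follow the scheme above (the function name, the th0 value, and the exact tie-breaking rule are illustrative, not specified by the patent), the traversal with early abandonment might be sketched as:

```python
import numpy as np

def ssda_match(search, template, th0, seed=0):
    """SSDA sketch: for each subgraph, accumulate |S - S̄ - T + T̄| at
    randomly ordered, non-repeating pixels and abandon the subgraph as soon
    as the sum exceeds the threshold; the subgraph surviving the most checks
    (then with the lowest average error) wins, and its center is returned."""
    rng = np.random.default_rng(seed)
    m, n = template.shape
    tz = template.astype(float) - template.mean()   # zero-mean template
    best_center, best_key = None, (-1, -np.inf)
    for i in range(search.shape[0] - m + 1):
        for j in range(search.shape[1] - n + 1):
            sub = search[i:i + m, j:j + n].astype(float)
            sz = sub - sub.mean()                   # zero-mean subgraph
            err, checks = 0.0, 0
            for k in rng.permutation(m * n):        # random, non-repeating pixels
                err += abs(sz.flat[k] - tz.flat[k])
                checks += 1
                if err > th0:                       # early abandonment
                    break
            key = (checks, -err / checks)           # more checks, then lower error
            if key > best_key:
                best_center, best_key = (i + m // 2, j + n // 2), key
    return best_center, best_key
```

A perfect match accumulates zero error and thus survives every check, so it dominates any subgraph abandoned early.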
the specific process for judging the confidence through the error function is as follows:
when a morphological change occurs in the target:
the minimum matching errors of the several layers obtained with the SSDA template matching algorithm are E_0, E_1, E_2, E_3, E_4; each error is normalized and mapped to the interval 0-1, conforming to positive logic, to obtain the probability of a correct match:
wherein TH_i is the threshold for each error;
the confidence at time t, taken as the average matching probability over the last 5 frames of pictures, is the evaluation index for each layer:
denoting the confidence of layer i obtained at time t accordingly, an error function L_t is set as a function of these five results:
when the target changes form, the error function L_t increases rapidly; when the error exceeds the threshold, the template is updated, and the error judgment method is:
ΔL_t = |L_t - L_{t-1}|
if ΔL_t ≥ 10%, the updated template is the image at the position of the previous frame's template; that is, the returned frame information (x, y, h, w) from the last template match is used to select the template in the current frame for matching the next frame;
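A minimal sketch of this update trigger and of the 5-frame rolling confidence used above (the names, the class interface, and expressing the 10% threshold as the fraction 0.10 are assumptions):

```python
from collections import deque

def should_update_template(L_prev, L_curr, tol=0.10):
    """ΔL_t = |L_t - L_{t-1}|; trigger a template update when the error
    function jumps by at least 10% between consecutive frames."""
    return abs(L_curr - L_prev) >= tol

class LayerConfidence:
    """Rolling mean of the last 5 per-frame match probabilities for one layer."""
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)

    def push(self, p):
        """Record a new match probability and return the current rolling mean."""
        self.buf.append(p)
        return sum(self.buf) / len(self.buf)
```

When `should_update_template` fires, the box returned by the last match would be re-cropped from the current frame as the new template.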
when the target undergoes a dimensional change:
when the error discrimination function judges that the target is undergoing a scale change, three frames are generated based on the normally tracked target of the previous frame, with sizes 75%, 120% and 150% of the target; each is scaled back to the original size, compared with the template of the previous frame, and the transformation result with the highest matching degree is used as the new template.
2. The multi-level template adaptive matching target tracking method for infrared images according to claim 1, wherein the infrared image is a single-channel 8-bit image with pixel value x_in, and the image enhancement preprocessing comprises: traversing the image, enhancing the gray value of each pixel, and outputting pixel value x_out:
Where k is the enhancement ratio.
CN202110830766.2A 2021-07-22 2021-07-22 Multi-level template self-adaptive matching target tracking method for infrared image Active CN113724290B (en)
