
CN119339296A - A target detection and tracking system and method based on artificial intelligence and machine vision - Google Patents


Info

Publication number
CN119339296A
Authority
CN
China
Prior art keywords
target
image
data
video
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202411533556.7A
Other languages
Chinese (zh)
Inventor
段丽英
贾梦
孟惜
李燕
董倩
韩明
刘智国
向存真
范秀川
段英华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shijiazhuang University
Original Assignee
Shijiazhuang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shijiazhuang University filed Critical Shijiazhuang University
Priority to CN202411533556.7A priority Critical patent/CN119339296A/en
Publication of CN119339296A publication Critical patent/CN119339296A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a target detection and tracking system and method based on artificial intelligence and machine vision, belonging to the technical field of image recognition. The system comprises a target module, a data acquisition module, a labeling module, a model training module and a detection tracking module. The target module is used for clearly defining the target object to be monitored and tracked and determining the target application scene of the system; the data acquisition module is used for collecting original video data in the target application scene and preprocessing the original video data to obtain standard frame sequences; the labeling module is used for analyzing the boundary characteristics of the targets of all standard frame sequences and identifying all targets of the standard frame sequences to obtain labeling data; the model training module is used for training an initial model to obtain a final model; and the detection tracking module is used for inputting a real-time video stream into the final model, identifying the targets of video frames and tracking the target positions in real time. The system improves the accuracy and real-time performance of target recognition and tracking.

Description

Target detection tracking system and method based on artificial intelligence and machine vision
Technical Field
The invention relates to the technical field of image recognition, in particular to a target detection tracking system and method based on artificial intelligence and machine vision.
Background
In recent years, the rapid development of deep learning technology has greatly improved the accuracy and efficiency of image processing and target recognition, and the growing demand for urban safety and public-place monitoring continues to drive the application of target detection and tracking technology. However, recognition may fail under varying environmental conditions (such as illumination, weather and occlusion), causing targets to be lost.
Therefore, the invention provides a target detection tracking system and method based on artificial intelligence and machine vision.
Disclosure of Invention
The invention provides a target detection and tracking system and method based on artificial intelligence and machine vision: after images are acquired by image acquisition equipment, targets in the images are identified and labeled, a model is trained, and targets in an actual environment are detected and tracked based on the model.
In one aspect, the present invention provides an artificial intelligence and machine vision based target detection tracking system comprising:
The target module is used for clearly defining the target object to be monitored and tracked, and determining the target application scene of the system;
the data acquisition module is used for collecting original video data in a target application scene by using video acquisition equipment and preprocessing the original video data to obtain a standard frame sequence;
The labeling module analyzes boundary characteristics of targets of all standard frame sequences, identifies all targets of the standard frame sequences and obtains labeling data;
The model training module is used for training an initial model according to the labeling data and the historical data of the historical library to obtain a final model;
And the detection tracking module is used for inputting the real-time video stream into the final model, identifying the target of the video frame and tracking the target position in real time.
In another aspect, the target module includes:
The characteristic unit is used for defining the characteristics of the target object according to the monitored and tracked target object;
And the scene unit is used for defining the monitoring and tracking environment of the target object and determining the target application scene of the system.
In another aspect, the data acquisition module includes:
the corresponding unit is used for selecting video acquisition equipment corresponding to the identification characteristic according to the characteristic of the target object, configuring a unique first number for the video acquisition equipment and configuring a unique second number for the video acquisition equipment installation position of the target application scene;
The installation unit is used for installing the video acquisition equipment to the corresponding video acquisition equipment installation position according to the matching relation between the first number of any video acquisition equipment and the second number of the corresponding video acquisition equipment installation position;
And the configuration unit is used for determining configuration parameters of the video acquisition equipment according to the target application scene, carrying out configuration starting on all the video acquisition equipment and acquiring original video data in real time.
In another aspect, the data acquisition module further includes:
The video processing unit unifies the data format of any original video data according to a preset standard, determines a preset frame number according to a frequency-video duration-video rate mapping table, processes the original video frame by frame according to the preset frame number to obtain a plurality of images, and obtains an original frame sequence;
the frame processing unit is used for scaling any frame of the original frame sequence according to a bilinear method to unify standard sizes of all frames of the original frame sequence;
all standard frames of the original frame sequence constitute a standard frame sequence.
In another aspect, the labeling module further includes:
The edge acquisition unit is used for acquiring any frame image of any standard frame sequence, performing Gaussian filtering to obtain an edge-clear image, and performing grayscale conversion on each pixel point of the edge-clear image to form a gray image;
Setting the gray value of the i-th pixel point on the gray image as $g(x_i, y_i)$, the L-operator absolute value of each pixel point is acquired and threshold processing is performed:

$$\lvert L_i \rvert = \lvert g_x(i) \rvert + \lvert g_y(i) \rvert, \qquad T_L = \frac{1}{n}\sum_{i=1}^{n} \lvert L_i \rvert + \frac{\lvert L \rvert_{\max} - \lvert L \rvert_{\min}}{2}$$

wherein $T_L$ represents the edge L-operator threshold of the current gray image; $\frac{1}{n}\sum_{i=1}^{n}\lvert L_i\rvert$ represents the average of all L-operator absolute values of the current gray image; $n$ represents the total number of pixel points in the current gray image; $\lvert L\rvert_{\max}$ represents the maximum of all L-operator absolute values of the current gray image; $\lvert L\rvert_{\min}$ represents the minimum of all L-operator absolute values of the current gray image; $\lvert L_i\rvert$ represents the L-operator absolute value corresponding to the i-th pixel point; $g_x(i)$ represents the lateral gray-value gradient at the i-th pixel point; and $g_y(i)$ represents the longitudinal gray-value gradient at the i-th pixel point;
If the edge L operator of any pixel point is larger than the edge L operator threshold, the corresponding pixel point belongs to the edge point, and if the edge L operator of any pixel point is smaller than or equal to the edge L operator threshold, the corresponding pixel point belongs to the internal point;
And the labeling unit is used for labeling the feature names of the closed image targets by using a labeling tool according to the closed image targets formed by the edge points of any frame of image, identifying all the closed image targets of the standard frame sequence and forming an image labeling part to obtain labeling data, wherein the labeling data comprises an image labeling part and a target motion change part.
In another aspect, the model training module comprises:
the training unit divides the labeling data with the same feature name into a test set, and divides the historical data with the same feature name from the historical library into a training set;
selecting a detection and tracking algorithm as a model framework, wherein all image labeling parts are used as detection training, all target motion change parts of images are used as tracking training, and setting initial model parameters to construct an initial model;
and training the initial model by using a training set to obtain a first model, verifying the first model by using a test set to obtain model parameter deviation, and adjusting the first model based on the model parameter deviation to obtain a final model.
In another aspect, the detection tracking module includes:
the docking unit deploys the final model into an actual environment and docks with a real-time video stream interface;
and the monitoring unit is used for analyzing and identifying the target of the real-time video stream in real time, tracking the analysis tracking position of the real-time video stream in real time, comparing the analysis tracking position with the actual standard position, and updating and retraining the final model.
In another aspect, the present invention provides a target detection tracking method based on artificial intelligence and machine vision, including:
Step 1, clearly defining the target object to be monitored and tracked, and determining the target application scene of the system;
step 2, collecting original video data in a target application scene by using video acquisition equipment, and preprocessing the original video data to obtain a standard frame sequence;
Step 3, analyzing boundary characteristics of targets of all standard frame sequences, and identifying all targets of the standard frame sequences to obtain labeling data;
Step 4, training an initial model according to the labeling data and the historical data of the historical library to obtain a final model;
Step 5, inputting the real-time video stream into the final model, identifying the targets of the video frames, and tracking the target positions in real time.
Compared with the prior art, the invention has the beneficial effects that:
The invention provides a target detection and tracking system and method based on artificial intelligence and machine vision: after images are acquired by image acquisition equipment, targets in the images are identified and labeled, a model is trained, and targets in an actual environment are detected and tracked based on the model.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a target detection tracking system based on artificial intelligence and machine vision according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a target detection tracking method based on artificial intelligence and machine vision according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1:
as shown in fig. 1, an object detection tracking system based on artificial intelligence and machine vision according to an embodiment of the present invention includes:
The target module is used for clearly defining the target object to be monitored and tracked, and determining the target application scene of the system;
the data acquisition module is used for collecting original video data in a target application scene by using video acquisition equipment and preprocessing the original video data to obtain a standard frame sequence;
The labeling module analyzes boundary characteristics of targets of all standard frame sequences, identifies all targets of the standard frame sequences and obtains labeling data;
The model training module is used for training an initial model according to the labeling data and the historical data of the historical library to obtain a final model;
And the detection tracking module is used for inputting the real-time video stream into the final model, identifying the target of the video frame and tracking the target position in real time.
In this embodiment, the target object refers to a particular object that needs to be identified and tracked in the video monitoring and tracking system, such as a person, a vehicle, an animal, etc.
In this embodiment, the target application scenario includes public places, home environments, industrial environments, and the like.
In this embodiment, the video capture device is a hardware tool for collecting raw video data, such as a camera, a drone, a cell phone, etc.
In this embodiment, the raw video data refers to raw video content recorded by a video capture device (such as a camera) in a target application scene.
In this embodiment, preprocessing is an important step in the data processing process, aiming at cleaning and converting the original video data.
In this embodiment, the standard frame sequence refers to frames having a uniform format and characteristics extracted from the original video data after preprocessing.
In this embodiment, the target analysis boundary features refer to features in an image or video frame that are used to describe and identify the boundary and shape of a target object, such as boundaries, contours, keypoints, etc.
In this embodiment, the annotation data refers to information for recording the object in the image or video frame, including the object position, the object type, the time, and the like.
In this embodiment, the historical library refers to a database that stores past data and model training information.
In this embodiment, historical data refers to relevant data collected during past monitoring and tracking.
In this embodiment, the initial model refers to a first version of the model created based on the annotation data and the history data during model training.
In this embodiment, the final model is a trained and optimized machine-learning or deep-learning model.
In this embodiment, the real-time video stream refers to a continuous image data stream generated by a video capture device (e.g., a camera).
In this embodiment, a video frame refers to a single still image in a video, and is a basic constituent unit of a video stream.
The technical scheme has the advantages that the high-efficiency identification and tracking of the monitored object are realized through the clear target, the data acquisition, the target marking, the model training and the real-time detection tracking, and the monitoring capability and the real-time response performance of the system are improved.
Example 2:
on the basis of the above embodiment 1, the target module includes:
The characteristic unit is used for defining the characteristics of the target object according to the monitored and tracked target object;
And the scene unit is used for defining the monitoring and tracking environment of the target object and determining the target application scene of the system.
In this embodiment, the features are key attributes for describing the target object, including color, shape, texture, etc.
In this embodiment, the environment refers to the physical space in which the target object exists and is active.
The technical scheme has the advantages that by defining the target characteristics and the monitoring environment, the system can accurately identify and track the target object, the monitoring efficiency is improved, the system is suitable for different application scenes, and the flexibility and the accuracy of the system are enhanced.
Example 3:
on the basis of embodiment 2 above, the data acquisition module includes:
the corresponding unit is used for selecting video acquisition equipment corresponding to the identification characteristic according to the characteristic of the target object, configuring a unique first number for the video acquisition equipment and configuring a unique second number for the video acquisition equipment installation position of the target application scene;
The installation unit is used for installing the video acquisition equipment to the corresponding video acquisition equipment installation position according to the matching relation between the first number of any video acquisition equipment and the second number of the corresponding video acquisition equipment installation position;
And the configuration unit is used for determining configuration parameters of the video acquisition equipment according to the target application scene, carrying out configuration starting on all the video acquisition equipment and acquiring original video data in real time.
In this embodiment, the first number refers to a unique identifier assigned to each video capture device.
In this embodiment, the second number refers to a unique identifier assigned to each video capture device installation location in the target application scenario.
In this embodiment, the matching relationship refers to an association between a unique identifier (first number) of the video capture device and a unique identifier (second number) of its installation location.
In this embodiment, the installation location refers to a specific place or area where a specific video capture device is installed in a target application scene.
In this embodiment, the configuration parameters refer to various parameters and options required in determining the settings of the video capture device.
The technical scheme has the advantages that by matching the target characteristics with the video equipment, the equipment is ensured to be installed and configured correctly, the accuracy of data acquisition is improved, the high-quality original video is acquired in real time, and a foundation is laid for subsequent analysis.
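The first-number/second-number matching described in this embodiment amounts to a lookup from device identifiers to installation-position identifiers. Below is a minimal sketch; the identifiers and the dictionary standing in for the matching relation are illustrative assumptions, not part of the patent:

```python
def assign_devices(device_ids, position_ids, matching):
    """Map each video capture device (first number) to its installation
    position (second number) using the predefined matching relation.
    Raises ValueError so a misconfigured device surfaces before installation."""
    assignments = {}
    for dev in device_ids:
        pos = matching.get(dev)
        if pos is None or pos not in position_ids:
            raise ValueError(f"device {dev} has no valid installation position")
        assignments[dev] = pos
    return assignments

cameras = ["CAM-01", "CAM-02"]    # hypothetical first numbers
positions = ["GATE-A", "HALL-B"]  # hypothetical second numbers
plan = assign_devices(cameras, positions,
                      {"CAM-01": "GATE-A", "CAM-02": "HALL-B"})
```

Failing fast on an unmatched device mirrors the embodiment's goal of ensuring every device is installed at its corresponding position before configuration and startup.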
Example 4:
on the basis of the above embodiment 3, the data acquisition module further includes:
The video processing unit unifies the data format of any original video data according to a preset standard, determines a preset frame number according to a frequency-video duration-video rate mapping table, processes the original video frame by frame according to the preset frame number to obtain a plurality of images, and obtains an original frame sequence;
the frame processing unit is used for scaling any frame of the original frame sequence according to a bilinear method to unify standard sizes of all frames of the original frame sequence;
all standard frames of the original frame sequence constitute a standard frame sequence.
In this embodiment, the preset criteria refers to a series of specifications and requirements for unifying the data format and processing the original video data during the video processing, including data format, resolution, frame rate, etc.
In this embodiment, the data format refers to the specific structure and coding scheme employed in the storage and transmission of video and image data.
In this embodiment, the frequency-video duration-video rate map is a table for determining the relationship between the number of video frames and the video duration and playback rate.
In this embodiment, the predetermined frame number refers to a desired frame number calculated according to parameters such as a total duration of video, a play rate, and a set frame rate during video processing.
In this embodiment, the frame-by-frame processing refers to a process of analyzing and processing each frame image in the video individually.
In this embodiment, the bilinear method is an image scaling and interpolation technique, in which four adjacent pixels around a target pixel are acquired, and the values of the four pixels are calculated by weighting, so as to generate a new pixel value.
In this embodiment, the pixel value refers to color and luminance information represented by each pixel in the image.
In this embodiment, the normalization process refers to a process of uniformly adjusting pixel values in images so as to have uniform brightness, contrast and color distribution among different images.
In this embodiment, the standard frame refers to an image frame that meets a preset standard size and a standardized pixel value after being processed.
The working principle and the beneficial effects of the technical scheme are that the unified standard frame sequence is generated by unifying the video format, determining the preset frame number and standardizing the processed frames, the comparability and the processing efficiency of video data are improved, and a reliable basis is provided for subsequent analysis.
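The bilinear method used by the frame processing unit can be illustrated with a small pure-Python sketch; a real pipeline would use an image library, and the function name and 2-D-list image format here are assumptions:

```python
def bilinear_resize(img, new_h, new_w):
    """Scale a 2-D list `img` (rows of gray values) to (new_h, new_w):
    each output pixel is the weighted average of the four nearest source
    pixels, weighted by distance to the mapped source coordinate."""
    h, w = len(img), len(img[0])
    out = [[0.0] * new_w for _ in range(new_h)]
    for y in range(new_h):
        for x in range(new_w):
            # Map output coordinates back into the source image.
            sy = y * (h - 1) / max(new_h - 1, 1)
            sx = x * (w - 1) / max(new_w - 1, 1)
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = sy - y0, sx - x0
            out[y][x] = (img[y0][x0] * (1 - dy) * (1 - dx)
                         + img[y0][x1] * (1 - dy) * dx
                         + img[y1][x0] * dy * (1 - dx)
                         + img[y1][x1] * dy * dx)
    return out

# Upscale a 2x2 patch to 3x3; the new center pixel averages the four corners.
scaled = bilinear_resize([[0, 10], [10, 20]], 3, 3)
```

Applying the same target size to every frame of the original frame sequence yields the standard frame sequence of uniform dimensions described above.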
Example 5:
on the basis of the foregoing embodiment 1, the labeling module further includes:
The edge acquisition unit is used for acquiring any frame image of any standard frame sequence, performing Gaussian filtering to obtain an edge-clear image, and performing grayscale conversion on each pixel point of the edge-clear image to form a gray image;
Setting the gray value of the i-th pixel point on the gray image as $g(x_i, y_i)$, the L-operator absolute value of each pixel point is acquired and threshold processing is performed:

$$\lvert L_i \rvert = \lvert g_x(i) \rvert + \lvert g_y(i) \rvert, \qquad T_L = \frac{1}{n}\sum_{i=1}^{n} \lvert L_i \rvert + \frac{\lvert L \rvert_{\max} - \lvert L \rvert_{\min}}{2}$$

wherein $T_L$ represents the edge L-operator threshold of the current gray image; $\frac{1}{n}\sum_{i=1}^{n}\lvert L_i\rvert$ represents the average of all L-operator absolute values of the current gray image; $n$ represents the total number of pixel points in the current gray image; $\lvert L\rvert_{\max}$ represents the maximum of all L-operator absolute values of the current gray image; $\lvert L\rvert_{\min}$ represents the minimum of all L-operator absolute values of the current gray image; $\lvert L_i\rvert$ represents the L-operator absolute value corresponding to the i-th pixel point; $g_x(i)$ represents the lateral gray-value gradient at the i-th pixel point; and $g_y(i)$ represents the longitudinal gray-value gradient at the i-th pixel point;
If the edge L operator of any pixel point is larger than the edge L operator threshold, the corresponding pixel point belongs to the edge point, and if the edge L operator of any pixel point is smaller than or equal to the edge L operator threshold, the corresponding pixel point belongs to the internal point;
And the labeling unit is used for labeling the feature names of the closed image targets by using a labeling tool according to the closed image targets formed by the edge points of any frame of image, identifying all the closed image targets of the standard frame sequence and forming an image labeling part to obtain labeling data, wherein the labeling data comprises an image labeling part and a target motion change part.
In this embodiment, the gaussian filter processing achieves a smoothing effect by convolving an image with a gaussian function.
In this embodiment, the edge-clear image refers to an image that can significantly highlight edge features of an object in the image after processing.
In this embodiment, gradation processing converts a color image into a grayscale image, mapping each pixel from color to a single luminance value.
In this embodiment, a grayscale image refers to the result of converting a color image into a single-channel image, where the value of each pixel represents its brightness or grayscale level.
In this embodiment, the gradation value is a numerical value used to represent the luminance of each pixel in image processing.
In this embodiment, the L operator absolute value is an operator used for edge detection, which is mainly detected by calculating the gradient of pixels in the image.
In this embodiment, the edge L operator threshold is a key parameter for determining edge points in an image, and a standard value of all L operator absolute values in the image is obtained to represent edge intensity.
In this embodiment, the edge points refer to positions in the image where pixel values change significantly, representing the boundary or shape characteristics of the object.
In this embodiment, the interior points refer to those pixels in the image that do not belong to an edge.
In this embodiment, the closed image object is an object that forms a complete and closed outline or region in the image, as identified by the edge detection and marking tool.
In this embodiment, the target motion changing portion refers to a change in displacement, rotation, scaling, or the like of the target over time in the image sequence.
The technical scheme has the working principle and beneficial effects that the image edge is extracted through Gaussian filtering and gray level processing, the edge point is judged by using the threshold value of the L operator, then target labeling is carried out, labeling data containing characteristic names and motion changes are generated, and the accuracy and the practicability of image identification are improved.
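A compact sketch of the edge/interior classification in this embodiment: gradient magnitudes are computed per pixel and thresholded by a statistic of all magnitudes. The simple forward-difference operator and the mean-plus-half-range threshold are assumptions standing in for the patent's L operator:

```python
def l_operator_edges(gray):
    """Classify each pixel of a 2-D gray image as edge (True) or interior
    (False). |L_i| = |lateral diff| + |longitudinal diff|; the threshold
    is the mean of all magnitudes plus half their range (an assumption)."""
    h, w = len(gray), len(gray[0])
    mags = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][x]  # lateral difference
            gy = gray[min(y + 1, h - 1)][x] - gray[y][x]  # longitudinal difference
            mags[y][x] = abs(gx) + abs(gy)
    flat = [m for row in mags for m in row]
    thresh = sum(flat) / len(flat) + (max(flat) - min(flat)) / 2
    return [[mags[y][x] > thresh for x in range(w)] for y in range(h)]

# A vertical step from 0 to 255: only the boundary column is marked as edge.
edges = l_operator_edges([[0, 0, 255, 255] for _ in range(4)])
```

Pixels marked True are the edge points that form the closed image targets handed to the labeling unit; the remaining pixels are interior points.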
Example 6:
on the basis of embodiment 5 above, the model training module includes:
the training unit divides the labeling data with the same feature name into a test set, and divides the historical data with the same feature name from the historical library into a training set;
selecting a detection and tracking algorithm as a model framework, wherein all image labeling parts are used as detection training, all target motion change parts of images are used as tracking training, and setting initial model parameters to construct an initial model;
and training the initial model by using a training set to obtain a first model, verifying the first model by using a test set to obtain model parameter deviation, and adjusting the first model based on the model parameter deviation to obtain a final model.
In this embodiment, the test set is a data set for evaluating and verifying the performance of the model.
In this embodiment, the training set is a data set for training a machine learning model.
In this embodiment, model architecture refers to the configuration of the framework and components used to define the model architecture, including algorithm selection, hierarchy, parameters, etc., in machine learning.
In this embodiment, detection and tracking algorithms refer to computer vision techniques for processing image and video data, such as SSD (Single Shot Detector) and feature tracking algorithms.
In this embodiment, the initial model parameters refer to initial values set at the time of model construction, including weights, learning rates, regularization parameters, and the like.
In this embodiment, the first model is a model that is obtained by preliminary training according to training set data in the training process.
In this embodiment, the model parameter deviation refers to the difference between the performance of the model on the test set and the actual expectation.
The technical scheme has the working principle and beneficial effects that the detection and tracking model is trained by dividing the training set and the testing set and using the marking data, the model parameters are adjusted to optimize the performance, and finally the accuracy and the robustness of target detection and tracking are improved.
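The split performed by the training unit can be sketched as follows; the dictionary record format, the field names, and the bounding-box tuples are illustrative assumptions:

```python
def build_sets(annotations, history, feature_name):
    """Group records by feature name: labeling data with the given feature
    name becomes the test set, matching historical-library records the
    training set (as described in Embodiment 6)."""
    test_set = [r for r in annotations if r["feature"] == feature_name]
    train_set = [r for r in history if r["feature"] == feature_name]
    return train_set, test_set

annotations = [{"feature": "person", "box": (4, 5, 20, 40)},
               {"feature": "vehicle", "box": (0, 0, 50, 30)}]
history = [{"feature": "person", "box": (6, 7, 18, 38)}]
train_set, test_set = build_sets(annotations, history, "person")
```

The training set drives the initial fit to a first model; the held-out test set then yields the model parameter deviation used to adjust it into the final model.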
Example 7:
On the basis of the above embodiment 1, the detection tracking module includes:
the docking unit deploys the final model into an actual environment and docks with a real-time video stream interface;
and the monitoring unit is used for analyzing and identifying the target of the real-time video stream in real time, tracking the analysis tracking position of the real-time video stream in real time, comparing the analysis tracking position with the actual standard position, and updating and retraining the final model.
In this embodiment, the actual environment refers to the real world scenario in which the final model is deployed and run.
In this embodiment, an interface refers to a protocol or connection manner in which communication and data exchange between different systems, devices, or components occurs.
In this embodiment, analyzing the tracking location refers to the location of the target identified by the model in the real-time video stream.
In this embodiment, the actual standard position refers to a predefined target position in a specific application scenario.
The technical scheme has the working principle and beneficial effects that the final model is in butt joint with the real-time video stream to realize real-time identification and tracking of the target, and compared with the standard position, the model is updated and retrained in time, so that the real-time performance and accuracy of the system are improved.
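The monitoring unit's comparison of analyzed tracking positions against actual standard positions can be sketched as a simple drift check; the 1-D positions and the tolerance value are illustrative assumptions:

```python
def needs_retraining(tracked, standard, tol=5.0):
    """Flag the final model for updating/retraining when the mean absolute
    deviation between model-tracked positions and the reference
    ('actual standard') positions exceeds the tolerance."""
    deviations = [abs(t - s) for t, s in zip(tracked, standard)]
    return sum(deviations) / len(deviations) > tol

ok = needs_retraining([10.0, 12.0], [10.0, 11.0])       # small drift
drifted = needs_retraining([10.0, 30.0], [10.0, 11.0])  # large drift
```

In a deployed system this check would run continuously against the real-time video stream, triggering the retraining loop described above when drift is detected.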
Example 8:
As shown in fig. 2, the object detection tracking method based on artificial intelligence and machine vision provided by the embodiment of the invention includes:
Step 1, clearly defining the target object to be monitored and tracked, and determining the target application scene of the system;
step 2, collecting original video data in a target application scene by using video acquisition equipment, and preprocessing the original video data to obtain a standard frame sequence;
Step 3, analyzing boundary characteristics of targets of all standard frame sequences, and identifying all targets of the standard frame sequences to obtain labeling data;
Step 4, training an initial model according to the labeling data and the historical data of the historical library to obtain a final model;
Step 5, inputting the real-time video stream into the final model, identifying the targets of the video frames, and tracking the target positions in real time.
The technical scheme has the advantages that the high-efficiency identification and tracking of the monitored object are realized through the clear target, the data acquisition, the target marking, the model training and the real-time detection tracking, and the monitoring capability and the real-time response performance of the system are improved.
It should be noted that the above embodiments are merely intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art should understand that the technical solutions described in the above embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. An artificial intelligence and machine vision based target detection tracking system, comprising:
the target module, which is used for clearly defining the target object to be monitored and tracked and determining the target application scene of the system;
the data acquisition module is used for collecting original video data in a target application scene by using video acquisition equipment and preprocessing the original video data to obtain a standard frame sequence;
The labeling module analyzes boundary characteristics of targets of all standard frame sequences, identifies all targets of the standard frame sequences and obtains labeling data;
The model training module is used for training an initial model according to the labeling data and the historical data of the historical library to obtain a final model;
And the detection tracking module is used for inputting the real-time video stream into the final model, identifying the target of the video frame and tracking the target position in real time.
2. The artificial intelligence and machine vision based target detection tracking system according to claim 1, wherein the target module comprises:
The characteristic unit is used for defining the characteristics of the target object according to the monitored and tracked target object;
And the scene unit is used for defining the monitoring and tracking environment of the target object and determining the target application scene of the system.
3. The artificial intelligence and machine vision based target detection tracking system according to claim 2, wherein the data acquisition module comprises:
the corresponding unit, which is used for selecting, according to the characteristics of the target object, video acquisition equipment capable of identifying those characteristics, configuring a unique first number for each video acquisition device, and configuring a unique second number for each video acquisition device installation position in the target application scene;
The installation unit is used for installing the video acquisition equipment to the corresponding video acquisition equipment installation position according to the matching relation between the first number of any video acquisition equipment and the second number of the corresponding video acquisition equipment installation position;
And the configuration unit is used for determining configuration parameters of the video acquisition equipment according to the target application scene, carrying out configuration starting on all the video acquisition equipment and acquiring original video data in real time.
4. The artificial intelligence and machine vision based target detection tracking system according to claim 3, wherein the data acquisition module further comprises:
the video processing unit, which unifies the data format of any original video data according to a preset standard, determines a preset frame number according to a frequency-video duration-video rate mapping table, and processes the original video frame by frame according to the preset frame number to obtain a plurality of images, forming an original frame sequence;
the frame processing unit, which is used for scaling any frame of the original frame sequence by bilinear interpolation, so as to unify all frames of the original frame sequence to a standard size;
all standard frames of the original frame sequence constitute a standard frame sequence.
5. The artificial intelligence and machine vision based target detection tracking system according to claim 1, wherein the labeling module comprises:
the edge acquisition unit, which is used for acquiring any frame image of any standard frame sequence, performing Gaussian filtering processing to obtain an image with clear edges, and performing gray-scale processing on each pixel point of that image to form a gray image;
setting the gray value of any pixel point on the gray image, the L operator absolute value of the i-th pixel point is obtained as

L_i = |Gx_i| + |Gy_i|,

where Gx_i and Gy_i respectively represent the lateral and longitudinal gray values at the i-th pixel point; threshold processing is then performed using the edge L operator threshold of the current gray image,

T = L_avg + (L_max - L_min) / 2, with L_avg = (1/n) * Σ_{i=1..n} L_i,

where T represents the edge L operator threshold of the current gray image, L_avg represents the average of all L operator absolute values of the current gray image, n represents the number of pixel points in the current gray image, and L_max and L_min respectively represent the maximum and minimum of all L operator absolute values of the current gray image;
if the L operator absolute value of any pixel point is larger than the edge L operator threshold, the corresponding pixel point belongs to the edge points; if the L operator absolute value of any pixel point is smaller than or equal to the edge L operator threshold, the corresponding pixel point belongs to the internal points;
And the labeling unit is used for labeling the feature names of the closed image targets by using a labeling tool according to the closed image targets formed by the edge points of any frame of image, identifying all the closed image targets of the standard frame sequence and forming an image labeling part to obtain labeling data, wherein the labeling data comprises an image labeling part and a target motion change part.
6. The artificial intelligence and machine vision based target detection tracking system according to claim 5, wherein the model training module comprises:
the training unit, which divides the labeling data of the same feature name into a test set and divides the historical data of the same feature name in the history library into a training set;
selecting a detection and tracking algorithm as the model framework, taking all image labeling parts as detection training data and all target motion change parts of the images as tracking training data, and setting initial model parameters to construct an initial model;
and training the initial model by using a training set to obtain a first model, verifying the first model by using a test set to obtain model parameter deviation, and adjusting the first model based on the model parameter deviation to obtain a final model.
7. The artificial intelligence and machine vision based target detection tracking system according to claim 1, wherein the detection tracking module comprises:
the docking unit deploys the final model into an actual environment and docks with a real-time video stream interface;
and the monitoring unit, which is used for identifying the target in the real-time video stream, tracking its analysis tracking position in real time, comparing the analysis tracking position with the actual standard position, and updating and retraining the final model accordingly.
8. An artificial intelligence and machine vision based target detection tracking method is characterized by comprising the following steps:
Step 1, clearly defining the target object to be monitored and tracked, and determining the target application scene of the system;
step 2, collecting original video data in a target application scene by using video acquisition equipment, and preprocessing the original video data to obtain a standard frame sequence;
Step 3, analyzing boundary characteristics of targets of all standard frame sequences, and identifying all targets of the standard frame sequences to obtain labeling data;
Step 4, training an initial model according to the labeling data and the historical data of the history library to obtain a final model;
Step 5, inputting the real-time video stream into the final model, identifying the target in each video frame, and tracking the target position in real time.
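The edge classification of claim 5 can be illustrated with the following sketch, which computes a per-pixel gradient magnitude as the L operator absolute value and thresholds it. The exact combination of mean, maximum, and minimum in the threshold is an assumption for illustration, as is the use of simple forward differences for the lateral and longitudinal gray values.

```python
import numpy as np

def l_operator_edges(gray):
    """Classify pixels as edge points (True) or internal points (False)
    via a gradient-magnitude ("L operator") threshold.

    The threshold form T = mean(L) + (max(L) - min(L)) / 2 is an assumed
    reconstruction, not a verbatim formula from the claims."""
    g = gray.astype(float)
    gx = np.abs(np.diff(g, axis=1, append=g[:, -1:]))  # lateral gray difference
    gy = np.abs(np.diff(g, axis=0, append=g[-1:, :]))  # longitudinal gray difference
    L = gx + gy                                        # L operator absolute value per pixel
    T = L.mean() + (L.max() - L.min()) / 2.0           # assumed edge threshold
    return L > T
```

On a synthetic image with a sharp vertical step, only the pixels along the step are classified as edge points; the closed regions these edge points form are what the labeling unit would then annotate with feature names.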
CN202411533556.7A 2024-10-30 2024-10-30 A target detection and tracking system and method based on artificial intelligence and machine vision Pending CN119339296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411533556.7A CN119339296A (en) 2024-10-30 2024-10-30 A target detection and tracking system and method based on artificial intelligence and machine vision

Publications (1)

Publication Number Publication Date
CN119339296A true CN119339296A (en) 2025-01-21


