CN116310359A - Intelligent detection method for photoelectric imaging weak and small target in complex environment - Google Patents
Intelligent detection method for photoelectric imaging weak and small target in complex environment
- Publication number
- CN116310359A CN116310359A CN202310196144.8A CN202310196144A CN116310359A CN 116310359 A CN116310359 A CN 116310359A CN 202310196144 A CN202310196144 A CN 202310196144A CN 116310359 A CN116310359 A CN 116310359A
- Authority
- CN
- China
- Prior art keywords
- feature map
- scale
- feature
- layer
- residual error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an intelligent detection method for weak and small targets in photoelectric imaging under a complex environment, and belongs to the technical field of target detection. The method comprises the following steps: performing size normalization on the image to be detected, performing group convolution, performing downsampling, and performing multi-scale feature extraction on the second feature map through four stacked residual module layers; performing multi-scale fusion on the output feature maps of the second, third and fourth residual module layers in a ladder-type feature fusion structure to obtain a fused feature map; and finally performing weak and small target detection on the fused feature map through a classification detection determiner to obtain the detection result for the weak and small target. The invention improves the detection network so that the network model carries richer feature information. The invention effectively improves detection accuracy, reduces the false detection rate, and reduces the complexity of the model.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to an intelligent detection method for weak and small targets in photoelectric imaging under a complex environment.
Background
In various application scenarios, algorithm software is needed to imitate the human eye in intelligently detecting, identifying and classifying targets in image files, and to handle more complex tasks on that basis. Over the long course of biological evolution, the visual system that takes the human eye as information input and the human brain as information processor has become extremely capable, with high processing speed and strong resistance to interference. Early computer vision developed slowly, and reaching the level of the human visual system was very difficult. With the continuous development of machine learning research, after deep learning methods were introduced into computer vision, computers have achieved better performance than the human eye in many vision tasks.
Classification, detection and segmentation constitute the basic tasks of computer vision, and target detection and recognition, as an advanced visual task, has a long research history. The detection and recognition task must not only determine whether the picture contains the object, but also give the object's detailed coordinate information in the form of a rectangular prediction box. After the position of the weak and small target is detected, the target category is classified and judged and the result is output. With the continuous progress of computer vision theory and technology in recent years, weak and small target detection in complex environments has gradually become a new research hotspot in the field. However, existing neural network algorithms still suffer from false alarms, missed detections and similar problems when detecting weak and small targets in complex environments.
Traditional target detection algorithms mainly rely on manually designed features and a classifier applied under a sliding window. Traditional algorithms fall into two main categories: those based on spatial filtering and those based on the human visual system. However, traditional methods consume a great deal of expert knowledge and labor to design templates or detection rules, and they also suffer from a large computational load, poor generalization, and high false-alarm and miss rates in complex environments. With the advent of deep learning, weak and small target detection has increasingly migrated to deep neural networks because of their strong feature extraction and information abstraction capabilities. Convolutional neural networks in particular achieve low false-alarm and miss rates for weak and small target detection in complex environments, so more and more researchers use deep learning methods for this task. Convolutional neural network detectors for target detection can be divided by stage into two-stage networks and single-stage networks. Representative two-stage target detectors are RCNN (Regions with CNN features) and the subsequently optimized Faster RCNN. Representative single-stage object detectors are SSD (Single Shot MultiBox Detector) and YOLO. However, for weak and small targets in complex environments, existing target detection algorithms still lack generalization ability, have a high miss rate, and recognize targets poorly.
Disclosure of Invention
Aiming at the technical problem that detection accuracy is low in the task of detecting weak and small targets in photoelectric imaging under complex environments, owing to small target size, weak feature information, difficult identification and complex backgrounds, the invention provides an intelligent detection method for weak and small targets in photoelectric imaging under complex environments.
The invention adopts the following technical scheme:
the intelligent detection method for the photoelectric imaging weak and small target in the complex environment comprises the following steps:
step S1: performing size normalization processing on the image to be detected to obtain the expected image size;
step S2: performing group convolution on the image processed in step S1 with a depthwise convolution whose kernel size is 5×5 to obtain a first feature map;
step S3: performing a first downsampling of the first feature map with a convolution whose stride is 2 and kernel size is 3×3 to obtain a second feature map;
step S4: performing multi-scale feature extraction on the second feature map through four stacked residual module layers;
the residual module layers are arranged as follows: the second feature map generated in step S3 serves as the input feature map of the first residual module layer, and the input feature map of each of the second, third and fourth residual module layers is the output feature map of the previous residual module layer; that is, the output feature map of the preceding layer serves as the input feature map of the following layer;
each residual module layer has the same network structure and adopts the lightweight convolution structure ShuffleNetV2 as its basic convolution module; the input feature map of a residual module layer passes through the basic convolution module, and the result is spliced by channel with the input feature map of the current residual module layer to obtain the output feature map of the current residual module layer;
step S5: performing multi-scale fusion on the output feature maps of the second, third and fourth residual module layers in a ladder-type feature fusion structure to obtain a fused feature map;
the ladder-type feature fusion structure comprises the following steps:
defining the output feature maps of the second, third and fourth residual module layers as the first-scale, second-scale and third-scale feature maps respectively, and defining their feature map dimensions as the first, second and third scales respectively;
converting the feature map dimension of the third-scale feature map into the second scale and then channel-splicing it with the second-scale feature map to obtain a first splicing result; converting the feature map dimension of the first splicing result into the second scale to obtain a first splicing result at the second scale;
converting the feature map dimension of the second-scale feature map into the first scale and then channel-splicing it with the first-scale feature map to obtain a second splicing result; converting the feature map dimension of the second splicing result into the first scale to obtain a second splicing result at the first scale;
converting the feature map dimension of the first splicing result at the second scale into the first scale, then channel-splicing it with the second splicing result at the first scale to obtain a third splicing result, and converting the feature map dimension of the third splicing result into the first scale to obtain a third splicing result at the first scale;
performing a global average pooling operation on the third-scale feature map, converting the feature map dimension of the pooling result into the first scale, and fusing it with the third splicing result at the first scale to obtain the fused feature map;
step S6: performing weak and small target detection on the fused feature map through a classification detection determiner to obtain the detection result for the weak and small target;
a weak and small target is a target whose size is smaller than or equal to a specified size.
Further, step S6 includes:
step S601, extracting candidate boxes from the fused feature map, and obtaining a plurality of final candidate boxes after screening and non-maximum suppression of the extracted candidate boxes;
step S602, extracting the features of each final candidate box from the fused feature map based on the final candidate boxes obtained in step S601, and pooling the candidate box features to obtain candidate box features of the expected feature size;
step S603, performing weak and small target detection on the candidate box features with the classification detection determiner.
Further, the classification detection determiner consists of a fully connected layer and a softmax function layer.
Further, in step S1, the expected image size is 512×512 pixels.
The technical scheme provided by the invention has at least the following beneficial effects:
Aiming at the technical problem of low accuracy in detecting weak and small targets in photoelectric imaging under complex environments, caused by small target size, weak feature information, difficult identification and complex backgrounds, the invention improves the detection network so that the network model carries richer feature information. The invention effectively improves detection accuracy, reduces the false detection rate and reduces the complexity of the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram showing the fusion of U-shaped structural features in an embodiment of the present invention;
FIG. 2 is a schematic diagram showing step-like feature fusion in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a residual module structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a co-scale residual module according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a process of an intelligent detection method for a small target in a complex environment photoelectric imaging according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The problem of weak and small target detection has long been an important research problem in the fields of computer vision and artificial intelligence. It is not only a precondition for advanced visual tasks but is also widely applicable in real scenarios such as satellite remote sensing, military reconnaissance and airport scene detection. With the deepening of deep learning research, especially research on neural network algorithms, neural-network-based methods are increasingly applied to the detection of weak and small targets in complex environments. Aiming at the characteristics of weak and small targets in complex environments, namely weak feature information that is easily lost and strong image interference, the invention improves the feature extraction and fusion mode and introduces the residual idea, thereby improving detection efficiency.
The task of recognizing weak and small targets in photoelectric imaging under complex environments is common in social production and daily life, for example small objects such as nuts, screws, nails and fuses on an airport runway. If such small problems can be detected intelligently from surveillance images, the workload of airport staff can be reduced effectively. For automatic driving, intelligent detection from surveillance images makes it possible to judge road congestion and the probability of traffic accidents. For materials produced in factories, foreign objects and flaws can be identified with a weak and small target detection algorithm designed for complex environments. For military reconnaissance, if enemy deployment and movements can be detected accurately from satellite remote sensing images, combat efficiency can be greatly improved and casualties reduced. For public safety management, human behavior can be analyzed from surveillance-view data to give early warning of possible dangerous behaviors such as stampedes and violent attacks, improving public safety. Intelligent detection and recognition of weak and small targets is also an important support for tracking and predicting the routes of criminal suspects.
A complex environment is one in which difficulties such as target overlap and target occlusion exist. Photoelectric imaging refers to the electronic image information acquired by a photoelectric imaging device. A small target is a target whose size is smaller than a specified size, for example a target of no more than 9×9 pixels in a 256×256-pixel image.
As a possible implementation, the embodiment of the invention takes Faster RCNN as the basic framework and improves it with a fusion method that combines same-scale and multi-scale features, that is, feature maps of the same size at the same level are fused on the convolution layers using a residual structure and channel splicing, thereby improving detection accuracy. The lightweight convolution module ShuffleNetV2 replaces the original convolution module to improve algorithm efficiency.
In the Faster RCNN network structure, features are first extracted from the picture to be detected by a convolutional network to obtain a feature map. The convolutional network contains convolution layers (conv layers), ReLU activation layers and pooling layers; each convolution layer is followed by a ReLU activation layer, giving 13 convolution layers, 13 ReLU activation layers and 4 pooling layers in total. The first two pooling layers are each placed after two convolution + ReLU layers, and the last two pooling layers are each placed after three convolution + ReLU layers. According to the convolution and pooling formulas, the feature map size is unchanged after each conv + ReLU layer, and after each pooling layer the feature map width and height become half of what they were. For example, the feature map this network generates from an M×N picture has size (M/16)×(N/16).
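For illustration only, the following is a minimal sketch of such a backbone in PyTorch (the patent does not name a framework); the channel widths follow the standard VGG-16-style configuration and are an assumption here. It reproduces the stated behaviour: 13 convolution + ReLU layers, 4 pooling layers, and an M×N input reduced to (M/16)×(N/16).

```python
import torch
import torch.nn as nn

def vgg_like_backbone():
    # (number of conv+ReLU layers, output channels) per stage; a pooling layer closes each stage
    cfg = [(2, 64), (2, 128), (3, 256), (3, 512)]
    layers, in_ch = [], 3
    for n_convs, out_ch in cfg:
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(2))           # each pooling halves the width and height
    for _ in range(3):                           # three final conv+ReLU layers, no pooling (13 convs total)
        layers += [nn.Conv2d(in_ch, 512, 3, padding=1), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

x = torch.randn(1, 3, 512, 512)                  # an M x N = 512 x 512 picture
print(vgg_like_backbone()(x).shape)              # torch.Size([1, 512, 32, 32]), i.e. (M/16) x (N/16)
```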
The resulting feature map is then input to the RPN (Region Proposal Network) to extract candidate boxes, which is the main difference between two-stage and single-stage algorithms. After the feature map passes through the RPN, candidate boxes are obtained and classified by SVM, and a number of highest-scoring candidate boxes (e.g. 2000) are retained through screening and non-maximum suppression.
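A hedged sketch of this proposal screening step is given below; torchvision's nms and the 0.7 IoU threshold are illustrative assumptions, not details fixed by the patent.

```python
import torch
from torchvision.ops import nms

def select_proposals(boxes: torch.Tensor, scores: torch.Tensor,
                     iou_thresh: float = 0.7, top_k: int = 2000):
    """boxes: (N, 4) candidate boxes as (x1, y1, x2, y2); scores: (N,) objectness scores."""
    keep = nms(boxes, scores, iou_thresh)   # indices of kept boxes, sorted by decreasing score
    keep = keep[:top_k]                     # retain the top_k highest-scoring survivors
    return boxes[keep], scores[keep]
```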
Next, the features of the corresponding candidate boxes are taken from the feature map and pooled so that their sizes match the expected size. ROI pooling has a preset width and height, meaning that every proposal feature is unified into a feature map of that size. In this processing, the pre-selected box coordinates, given on the M×N scale, are mapped back to the (M/16)×(N/16) scale. The region corresponding to each pre-selected box is then divided into a grid of the preset size, each grid cell is max-pooled, and the output vectors are unified.
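The ROI pooling step can be sketched as follows; the 7×7 output size is an illustrative assumption, and spatial_scale=1/16 expresses the mapping from M×N image coordinates back to the (M/16)×(N/16) feature map.

```python
import torch
from torchvision.ops import roi_pool

feat = torch.randn(1, 512, 32, 32)                  # the (M/16) x (N/16) feature map for M = N = 512
rois = torch.tensor([[0., 48., 48., 112., 112.]])   # [batch index, x1, y1, x2, y2] on the M x N image
pooled = roi_pool(feat, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
print(pooled.shape)                                 # torch.Size([1, 512, 7, 7])
```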
Finally, based on the generated multi-dimensional feature vectors, target recognition, classification and prediction-box position regression are completed for the candidate boxes. That is, the fully connected layers and softmax classify all pre-selected boxes into their specific categories (usually multiple categories), and bounding-box regression of the pre-selected boxes yields final boxes with higher positional accuracy.
In multi-scale feature map fusion, the conventional method adopts a top-down U-shaped structure, as shown in fig. 1, where 8s, 16s and 32s denote three feature maps of different scales: the 8s scale is 64×64×128, the 16s scale is 32×32×256 and the 32s scale is 16×16×512, where 128, 256 and 512 are the corresponding channel numbers. That is, the 32s feature map is first transformed into a 32×32×256 feature map and fused with the 16s feature map, the fusion result is transformed into a 64×64×128 feature map and fused with the 8s feature map, and finally a 64×64×128 fusion feature map is obtained. The feature map formed by the deep network lacks sufficient spatial location information for weak and small target identification and therefore needs to be supplemented with feature information from the shallow network. However, this traditional connection structure is rather simple, and the shallower the layer, the less its feature information participates in aggregation, which is not enough to supplement the spatial position information of weak and small targets. Meanwhile, the information abstraction degree of shallow features is low during fusion. In summary, for the problem of weak and small target detection, the multi-scale feature fusion mode of the U-shaped structure needs to be improved.
As shown in fig. 2, the embodiment of the invention provides a ladder-type feature fusion structure, which specifically comprises the following steps:
Step 1, the 32s feature map is upsampled along indication direction (1) and fused with the 16s feature map along indication direction (2) to obtain a new 16s feature map.
Step 2, the 8s feature map is fused, along indication direction (4), with the feature map formed by upsampling the 16s feature map along indication direction (3), to obtain a new 8s feature map. The new 16s feature map obtained through indication direction (2) is then upsampled along indication direction (5) and fused with the 8s feature map of indication direction (6) to obtain a new 8s feature map.
Step 3, a global average pooling operation is performed on the 32s feature map along indication direction (7), and the pooled feature map is expanded as a vector along indication direction (8) to generate an 8s feature map. This 8s feature map is fused with the 8s feature map generated in step 2 to obtain the final feature map, whose size is 64×64×128.
This feature fusion proceeds from deep to shallow and finally outputs a feature map of the expected size. Compared with the U-shaped structure, the fusion mode provided by the invention adds fusion of features between adjacent layers. This nonlinear design makes the feature fusion more thorough, so the generated feature map carries richer features and can represent the complete information of the picture. Meanwhile, the invention also performs global average pooling on the original 32s feature map (16×16×512), expands it as a vector and adds it into the fusion process. This step is computationally simple, yet it enlarges the receptive field through global information enhancement.
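A minimal sketch of this ladder-type fusion is shown below, assuming PyTorch, assuming the "scale conversion" steps are realised with 1×1 convolutions plus bilinear upsampling, and assuming the global-information branch is fused by broadcast addition; the patent does not fix these implementation details. With the 8s/16s/32s sizes given above, the output is the expected 64×64×128 fused feature map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LadderFusion(nn.Module):
    """Hedged sketch of the ladder-type fusion over 8s, 16s and 32s feature maps."""
    def __init__(self, c1=128, c2=256, c3=512):
        super().__init__()
        self.c3_to_s2 = nn.Conv2d(c3, c2, 1)           # third scale -> second scale channels
        self.splice1_to_s2 = nn.Conv2d(2 * c2, c2, 1)  # first splicing result -> second scale
        self.c2_to_s1 = nn.Conv2d(c2, c1, 1)           # second scale -> first scale channels
        self.splice2_to_s1 = nn.Conv2d(2 * c1, c1, 1)  # second splicing result -> first scale
        self.splice1_to_s1 = nn.Conv2d(c2, c1, 1)      # first splicing result -> first scale channels
        self.splice3_to_s1 = nn.Conv2d(2 * c1, c1, 1)  # third splicing result -> first scale
        self.gap_to_s1 = nn.Conv2d(c3, c1, 1)          # pooled 32s map -> first scale channels

    def forward(self, f1, f2, f3):                     # f1, f2, f3: 8s, 16s, 32s feature maps
        up = lambda x, ref: F.interpolate(x, size=ref.shape[-2:], mode="bilinear",
                                          align_corners=False)
        s1 = self.splice1_to_s2(torch.cat([up(self.c3_to_s2(f3), f2), f2], dim=1))
        s2 = self.splice2_to_s1(torch.cat([up(self.c2_to_s1(f2), f1), f1], dim=1))
        s3 = self.splice3_to_s1(torch.cat([up(self.splice1_to_s1(s1), f1), s2], dim=1))
        g = self.gap_to_s1(F.adaptive_avg_pool2d(f3, 1))   # global average pooling of the 32s map
        return s3 + g                                      # broadcast over all 64 x 64 positions

f1, f2, f3 = (torch.randn(1, c, s, s) for c, s in [(128, 64), (256, 32), (512, 16)])
print(LadderFusion()(f1, f2, f3).shape)                 # torch.Size([1, 128, 64, 64])
```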
To reduce the computational load of the model, the invention selects the lightweight convolution module ShuffleNetV2. ShuffleNetV2 is modified from ShuffleNetV1. ShuffleNetV1 introduced group convolution and the channel rearrangement (channel shuffle) algorithm to optimize the convolutional neural network module, so as to reduce the number of arithmetic operations required during convolution.
The group convolution principle used by ShuffleNetV1 is similar to the depthwise convolution principle. The depthwise convolution algorithm designs a separate convolution kernel for each feature channel. The group convolution algorithm divides the channels of the feature map into several groups according to a set value and then lets a convolution kernel process the features of each group. When the number of channels per group is set to 1, the two algorithms are equivalent.
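In PyTorch terms (an illustrative assumption, not part of the patent), this relationship is captured by the groups argument of a convolution:

```python
import torch.nn as nn

in_ch = 32
grouped = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=4)        # 4 groups of 8 channels each
depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch)  # one channel per group, i.e. depthwise
```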
ShuffleNetV2 mainly increases the ratio of arithmetic operations to memory-access operations and improves the parallelism of the model, specifically as follows (a code sketch follows this list):
(1) A channel split is added at the beginning, dividing the input feature channels into two groups, and the subsequent group convolution operation is cancelled.
(2) The element-wise addition operation (adding corresponding feature maps) is replaced with channel splicing.
(3) The channel rearrangement (channel shuffle) operation is moved to after the channel splicing and merged with it.
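The sketch below illustrates these three points on a stride-1 unit; the branch layout (1×1 and depthwise 3×3 convolutions) follows the published ShuffleNetV2 design rather than anything spelled out in the patent.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    n, c, h, w = x.shape
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

class ShuffleV2Unit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            nn.Conv2d(half, half, 3, padding=1, groups=half), nn.BatchNorm2d(half),  # depthwise 3x3
            nn.Conv2d(half, half, 1), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)                      # (1) channel split at the unit input
        out = torch.cat([x1, self.branch(x2)], dim=1)   # (2) channel splicing instead of addition
        return channel_shuffle(out)                     # (3) channel shuffle after the splice
```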
Compared with other common lightweight convolution modules, ShuffleNetV2 not only runs faster but also achieves better accuracy on ImageNet, while being only slightly inferior to MobileNetV2 in computational cost. Since the processing task of the invention is to improve the detection accuracy of weak and small targets in photoelectric imaging under complex environments, the invention uses ShuffleNetV2 as the lightweight convolution module to complete the algorithm optimization.
Aiming at the gradient vanishing problem in weak and small target detection, the invention designs a co-scale residual module. The module fuses feature maps at the same scale mainly through a residual module and channel splicing. In this way, the gradient vanishing and model degradation problems of the algorithm model are alleviated, and the richness of the extracted features is improved.
The specific structure of the residual module is shown in fig. 3: the input feature map first passes through the basic convolution module with a stride of 1 to obtain the output feature map of the basic convolution module, which is then channel-spliced with the input feature map to obtain the output feature map of the residual module. The same-scale residual-module feature fusion structure combines the ideas of densely connected networks and residual networks. Dense connection combines shallow and deep features, improving the richness of the extracted features. Meanwhile, the residual module improves backward propagation and alleviates, to a certain extent, the gradient vanishing problem in deep convolution. The structure of the co-scale residual module is shown in fig. 4: its input is the generated feature map; the rounded rectangle is the residual module; the intersections of the arrows represent channel splicing of feature maps, and the number of feature map channels is kept unchanged by convolution-kernel compression. The feature map first passes through the residual module, and then the feature map and the output of the residual module are channel-spliced. Through the residual structure and channel splicing, this design efficiently accomplishes feature learning, feature reuse and feature selection, and it can increase the number of channels while reducing the feature map size so as to keep the features rich and diverse.
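A minimal sketch of the residual module of fig. 3 follows; the basic convolution module is abstracted as an arbitrary stride-1 module (for example the ShuffleNetV2-style unit above), and the 1×1 compression convolution that keeps the channel count unchanged is an assumed realisation of the convolution-kernel compression mentioned for fig. 4.

```python
import torch
import torch.nn as nn

class CoScaleResidual(nn.Module):
    def __init__(self, channels: int, basic_conv: nn.Module):
        super().__init__()
        self.basic_conv = basic_conv                          # stride-1 basic convolution module
        self.compress = nn.Conv2d(2 * channels, channels, 1)  # keep the channel count unchanged

    def forward(self, x):
        spliced = torch.cat([x, self.basic_conv(x)], dim=1)   # channel splicing with the input
        return self.compress(spliced)
```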
For weak and small targets in complex environments, such as small objects on the ground, the ground objects are relatively small; in order to preserve the original features of small ground objects as much as possible, the embodiment of the invention may use a relatively large size of 512×512 pixels as the input picture size. The network structure acquires feature maps with residual module layers that embody the residual-module and channel-splicing ideas, and finally fuses the feature maps of different sizes generated by the different layers with a multi-scale fusion layer. Meanwhile, to enlarge the receptive field and reduce computation, the image is downsampled repeatedly as it passes through the convolutional network and the feature map size decreases gradually; to reduce the loss of features in this process and keep them rich and diverse, each downsampling halves the feature map size and doubles the number of channels.
Referring to fig. 5, in the embodiment of the invention, the intelligent detection method for weak and small targets in photoelectric imaging under a complex environment is implemented through the following steps:
step S1: the picture to be predicted is normalized so that the input size can be unified, and in this embodiment, the input size of the picture is 512×512 pixels.
Step S2: for the input pictures, first, the input 3-channel (RGB three-channel) packets are convolved using the depth convolution of 5*5, resulting in a feature map of 512×512×3. The large-size convolution kernel is used in this step to obtain a relatively large receptive field, and the group convolution is used to reduce the calculation amount.
Step S3: the first downsampling was performed by a step size of 2 of 3*3 convolutions, resulting in a 256 x 32 feature map. This step is convolved with conventional 3*3 because it is the most commonly used convolution kernel size, which allows efficient feature extraction with relatively little computational effort, with increased receptive fields. The convolutional layer is added in the shallow layer of the network, so that the quality of the network extracted features can be ensured, and the stability of the network is improved.
Step S4: the characteristic diagrams with different scales are mainly obtained through a residual error module layer (4 layers). The module layer mainly uses a lightweight convolution structure in the ShuffleNet2 as a basic convolution module, and simultaneously adds a residual module and a channel splicing idea.
Step S5: and carrying out feature fusion on the obtained multi-scale feature map by using the ladder-type feature fusion method provided by the invention, and finally obtaining the feature map with the size of 64 x 128.
Step S6: and detecting and classifying the finally obtained feature map by using a classification detection determiner.
Aiming at the problem that the weak feature information of small-sized targets is easily lost in deep networks, shallow-network feature information is supplemented through feature fusion, thereby improving the detection accuracy of weak and small targets.
To further verify the detection performance of the method of the invention, detection performance was analyzed on the VisDrone dataset and the VEDAI dataset. The VisDrone dataset contains six thousand training images, five hundred validation images and one thousand test images. It consists of pictures of people and vehicles collected from the perspective of unmanned aerial vehicles, with ten target categories. The targets in this dataset are densely distributed and the environment is complex. The VEDAI dataset consists of satellite aerial images with very small target sizes and relatively complex backgrounds. Its pictures are 1024×1024 pixels, and it contains one thousand training images, eight hundred validation images and one hundred test images. The detection targets are 11 kinds of vehicles; the image backgrounds are rich, covering vehicle targets in residential areas, city streets, expressways and other scenes. Because the original pictures are large, the targets are extremely small under the satellite viewing angle, making detection difficult. Most target sizes in both the VEDAI and VisDrone datasets are 20×20 pixels or less, and few targets exceed 60×60 pixels, so both datasets as a whole meet the definition of weak and small targets. Both datasets also contain target occlusion and target overlap and have complex backgrounds, matching the task requirements of the invention.
The accuracy of the method of the invention is compared with the Faster RCNN and SSD algorithm models in Table 1. The evaluation index is AP (Average Precision: the average precision over a dataset, computed as the area under the Precision-Recall curve and used to measure the quality of a trained target detection model on each target class), reported as the average of the mAP (mean AP over all categories) over IoU (Intersection over Union) thresholds from 0.50 to 0.95 in steps of 0.05. The data in the table show that the invention clearly improves detection accuracy, which demonstrates its effectiveness and efficiency.
Table 1: Algorithm model accuracy (AP) comparison

Model | VEDAI | VisDrone
---|---|---
Faster RCNN | 16.7 | 8.6
SSD512 | 22.8 | 9.1
The present invention | 23.4 | 12.2
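For reference, the sketch below shows how the IoU between two boxes is computed and how the reported AP averages the per-threshold values over IoU thresholds 0.50 to 0.95 in steps of 0.05; the per-threshold AP values themselves come from a full precision-recall evaluation, which is not reproduced here.

```python
import numpy as np

def iou(a, b):
    """a, b: boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

iou_thresholds = np.arange(0.50, 1.00, 0.05)          # 0.50, 0.55, ..., 0.95
ap_per_threshold = {t: 0.0 for t in iou_thresholds}   # to be filled by a precision-recall evaluation
ap = float(np.mean(list(ap_per_threshold.values())))  # the reported AP@[0.50:0.05:0.95]
```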
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.
Claims (4)
1. The intelligent detection method for the photoelectric imaging weak and small target in the complex environment is characterized by comprising the following steps of:
step S1: performing size normalization processing on the image to be detected to obtain the expected image size;
step S2: performing group convolution on the image processed in step S1 with a depthwise convolution whose kernel size is 5×5 to obtain a first feature map;
step S3: performing a first downsampling of the first feature map with a convolution whose stride is 2 and kernel size is 3×3 to obtain a second feature map;
step S4: performing multi-scale feature extraction on the second feature map through four stacked residual module layers;
the residual module layers are arranged as follows: the second feature map generated in step S3 serves as the input feature map of the first residual module layer, and the input feature map of each of the second, third and fourth residual module layers is the output feature map of the previous residual module layer; that is, the output feature map of the preceding layer serves as the input feature map of the following layer;
each residual module layer has the same network structure and adopts the lightweight convolution structure ShuffleNetV2 as its basic convolution module; the input feature map of a residual module layer passes through the basic convolution module, and the result is spliced by channel with the input feature map of the current residual module layer to obtain the output feature map of the current residual module layer;
step S5: performing multi-scale fusion on the output feature maps of the second, third and fourth residual module layers in a ladder-type feature fusion structure to obtain a fused feature map;
the ladder-type feature fusion structure comprises the following steps:
defining the output feature maps of the second, third and fourth residual module layers as the first-scale, second-scale and third-scale feature maps respectively, and defining their feature map dimensions as the first, second and third scales respectively;
converting the feature map dimension of the third-scale feature map into the second scale and then channel-splicing it with the second-scale feature map to obtain a first splicing result; converting the feature map dimension of the first splicing result into the second scale to obtain a first splicing result at the second scale;
converting the feature map dimension of the second-scale feature map into the first scale and then channel-splicing it with the first-scale feature map to obtain a second splicing result; converting the feature map dimension of the second splicing result into the first scale to obtain a second splicing result at the first scale;
converting the feature map dimension of the first splicing result at the second scale into the first scale, then channel-splicing it with the second splicing result at the first scale to obtain a third splicing result, and converting the feature map dimension of the third splicing result into the first scale to obtain a third splicing result at the first scale;
performing a global average pooling operation on the third-scale feature map, converting the feature map dimension of the pooling result into the first scale, and fusing it with the third splicing result at the first scale to obtain the fused feature map;
step S6: performing weak and small target detection on the fused feature map through a classification detection determiner to obtain the detection result for the weak and small target;
a weak and small target is a target whose size is smaller than or equal to a specified size.
2. The method of claim 1, wherein step S6 comprises:
step S601, extracting candidate boxes from the fused feature map, and obtaining a plurality of final candidate boxes after screening and non-maximum suppression of the extracted candidate boxes;
step S602, extracting the features of each final candidate box from the fused feature map based on the final candidate boxes obtained in step S601, and pooling the candidate box features to obtain candidate box features of the expected feature size;
step S603, performing weak and small target detection on the candidate box features with the classification detection determiner.
3. The method of claim 1, wherein the classification detection determiner consists of a fully connected layer and a softmax function layer.
4. The method of claim 1, wherein in step S1, the expected image size is 512×512 pixels.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310196144.8A CN116310359A (en) | 2023-03-03 | 2023-03-03 | Intelligent detection method for photoelectric imaging weak and small target in complex environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116310359A (en) | 2023-06-23
Family
ID=86800764
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310196144.8A Pending CN116310359A (en) | 2023-03-03 | 2023-03-03 | Intelligent detection method for photoelectric imaging weak and small target in complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116310359A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237746A (en) * | 2023-11-13 | 2023-12-15 | 光宇锦业(武汉)智能科技有限公司 | Small target detection method, system and storage medium based on multi-intersection edge fusion |
CN117237746B (en) * | 2023-11-13 | 2024-03-15 | 光宇锦业(武汉)智能科技有限公司 | Small target detection method, system and storage medium based on multi-intersection edge fusion |
Legal Events
Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 