CN117557774A - Unmanned aerial vehicle image small target detection method based on improved YOLOv8 - Google Patents
Info
- Publication number
- CN117557774A (application number CN202311456286.XA)
- Authority
- CN
- China
- Prior art keywords
- unmanned aerial vehicle
- image
- convolution
- yolov8
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/454 — Local feature extraction; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/82 — Image or video recognition or understanding using neural networks
- G06V20/17 — Terrestrial scenes taken from planes or by drones
- Y02T10/40 — Engine management systems
Abstract
The invention provides an unmanned aerial vehicle image small target detection method based on improved YOLOv8. The method collects and labels various unmanned-aerial-vehicle-captured images to establish an unmanned aerial vehicle image data set. Starting from the original YOLOv8 network, it introduces the backbone network ITNet, replaces Conv convolutions with the dynamic convolution ODConv, adopts the neck module SGFPN, introduces the feature fusion module CSF, and replaces nearest-neighbor sampling with the CARAFE up-sampling method. The improved YOLOv8 network structure serves as the unmanned aerial vehicle image recognition network; a deep learning model for unmanned aerial vehicle small target recognition and detection is obtained through training and used to detect unmanned aerial vehicle images, achieving high-accuracy detection.
Description
Technical Field
The invention belongs to the technical field of deep learning target detection, and particularly relates to an unmanned aerial vehicle image small target detection method based on improved YOLOv8.
Background
In recent years, target detection algorithms based on convolutional neural networks have been widely applied and developed in fields such as remote sensing image processing, unmanned aerial vehicle navigation, automatic driving, medical diagnosis, face recognition and defect detection. Conventional target detection algorithms can basically meet the requirements of various scenes, but they mainly address large and medium targets; for the small targets in an unmanned aerial vehicle's aerial view, effective features are few and sufficient feature information is difficult to extract, so the results are unsatisfactory. Even the most advanced detectors show a large performance gap when detecting small and medium-sized objects.
Currently popular object detectors typically comprise a backbone network and a detection head, and the decisions of the latter depend on the representations output by the former; this design has proven effective. However, small targets carry little feature information to begin with, and hardly any of it survives multiple rounds of downsampling, so the network can scarcely learn useful information and the detection head cannot make correct decisions, which is fatal for small target detection. As a result, current detectors have low detection accuracy for small targets in unmanned aerial vehicle images.
Disclosure of Invention
The invention aims to provide an unmanned aerial vehicle small target recognition and detection method based on improved YOLOv8, so as to solve the technical problem of low accuracy in unmanned aerial vehicle picture detection.
To solve this technical problem, the specific technical scheme of the unmanned aerial vehicle small target recognition and detection method based on improved YOLOv8 is as follows:
An unmanned aerial vehicle small target recognition and detection method based on improved YOLOv8 comprises the following steps:
step 1, obtaining data from pictures shot by an unmanned aerial vehicle in a real living environment, labeling ten categories of targets such as people and vehicles, establishing an unmanned aerial vehicle picture data set, and applying the Mosaic data enhancement method to the data set;
step 2, taking the YOLOv8 network structure as the reference network, introducing the backbone network ITNet (Inverted Triangle Net), replacing Conv convolutions with the dynamic convolution ODConv, using the neck module SGFPN, introducing the feature fusion module CSF, and replacing nearest-neighbor sampling with the CARAFE up-sampling method; the improved YOLOv8 network structure is used as the unmanned aerial vehicle small target recognition network, and a deep learning model for unmanned aerial vehicle small target recognition and detection is obtained through training;
and step 3, inputting the unmanned aerial vehicle image to be detected into the deep learning model for unmanned aerial vehicle small target recognition and detection.
Further, the detection network is modified based on the YOLOv8 network structure and comprises 4 C2f modules, 1 SPPF module, 6 ODConv modules, 7 CSF modules, 7 Concat modules, 3 upsampling modules and 6 Conv modules.
Further, the C2f module comprises a 3×3 convolution layer, a BN (Batch Normalization) layer and a SiLU activation function layer, which are sequentially cascaded;
the SPPF module comprises sequentially cascaded 5×5 max-pooling layers, whose results are spliced through concat;
the Conv module comprises a 1×1 convolution layer, a BN layer and a ReLU activation function layer, which are sequentially cascaded;
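For illustration, a minimal PyTorch sketch of an SPPF-style block as described above (one 5×5 max-pooling layer applied three times in sequence, with all intermediate results spliced through concat) might look as follows; the 1×1 projections and the hidden channel width are common conventions assumed here, not values taken from the patent:

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: one 5x5 max-pool applied three
    times in sequence, with all intermediate results concatenated."""
    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_hidden = c_in // 2                      # assumed channel reduction
        self.cv1 = nn.Conv2d(c_in, c_hidden, 1, 1)
        self.cv2 = nn.Conv2d(c_hidden * 4, c_out, 1, 1)
        # stride-1 max pooling with padding keeps the spatial size unchanged
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```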
the ODConv module is represented as
y=(α w1 ⊙α f1 ⊙α c1 ⊙α s1 ⊙W 1 +…+α wn ⊙α fn ⊙α cn ⊙α sn ⊙W n )*x
Wherein x ε R (h x ω x c_in) and y ε R (h x ω x c_out) represent the input and output features, respectively (channel number c_in/c_out, width and height of feature h, ω, respectively), W i Representing an ith convolution kernel consisting of a c_out filter (w_i∈r (kxkxc_in), m=1, …, c_out); x 0_wi×1r represents the attention scalar of the convolution kernel w_i; alpha_si epsilon R (k x k), alpha_ci epsilon R (c_in) and alpha_fi epsilon R (c_out) represent three newly introduced notes, calculated along the spatial dimension, input channel dimension and output channel dimension of the convolution kernel W_i, respectively; x 2 represents multiplication operations along different dimensions of the kernel space.
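As a hedged illustration of this formula, the following PyTorch sketch combines n candidate kernels with the four attentions α_w, α_s, α_c and α_f predicted from a pooled summary of the input; the attention-head layout and hyper-parameters are readability assumptions, not the patent's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConv2d(nn.Module):
    """Simplified omni-dimensional dynamic convolution sketch: n candidate
    kernels W_1..W_n weighted by per-kernel, spatial, input-channel and
    output-channel attentions predicted from a pooled input summary."""
    def __init__(self, c_in, c_out, k=3, n=4, reduction=4):
        super().__init__()
        self.c_in, self.c_out, self.k, self.n = c_in, c_out, k, n
        self.weight = nn.Parameter(torch.randn(n, c_out, c_in, k, k) * 0.02)
        hidden = max(c_in // reduction, 4)
        self.fc = nn.Linear(c_in, hidden)
        self.fc_w = nn.Linear(hidden, n)       # kernel attention (alpha_w)
        self.fc_s = nn.Linear(hidden, k * k)   # spatial attention (alpha_s)
        self.fc_c = nn.Linear(hidden, c_in)    # input-channel attention (alpha_c)
        self.fc_f = nn.Linear(hidden, c_out)   # output-channel attention (alpha_f)

    def forward(self, x):
        b = x.size(0)
        ctx = F.relu(self.fc(x.mean(dim=(2, 3))))             # (b, hidden)
        a_w = F.softmax(self.fc_w(ctx), dim=1)                # (b, n)
        a_s = torch.sigmoid(self.fc_s(ctx)).view(b, 1, 1, 1, self.k, self.k)
        a_c = torch.sigmoid(self.fc_c(ctx)).view(b, 1, 1, self.c_in, 1, 1)
        a_f = torch.sigmoid(self.fc_f(ctx)).view(b, 1, self.c_out, 1, 1, 1)
        # y = (sum_i alpha_wi . alpha_fi . alpha_ci . alpha_si . W_i) * x
        w = self.weight.unsqueeze(0) * a_s * a_c * a_f        # (b,n,co,ci,k,k)
        w = (w * a_w.view(b, self.n, 1, 1, 1, 1)).sum(dim=1)  # (b,co,ci,k,k)
        # grouped-conv trick: fold the batch into groups so each sample
        # is convolved with its own dynamically assembled kernel
        x = x.reshape(1, b * self.c_in, *x.shape[2:])
        w = w.reshape(b * self.c_out, self.c_in, self.k, self.k)
        y = F.conv2d(x, w, padding=self.k // 2, groups=b)
        return y.reshape(b, self.c_out, *y.shape[2:])
```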
The CSF module comprises three branches: the first branch is a sequentially cascaded 3×3 RepConv convolution layer, the second branch is a PConv module followed by a Conv module, and the third branch is a Conv module; the outputs of the three branches are spliced through a concat layer.
further, the method for preprocessing the unmanned aerial vehicle image dataset comprises the following steps: the xml file generated using the VOC annotation mode is converted into txt file required for YOLO training.
Further, the data set dividing method is as follows: 60% of the data is used as the training set, 20% as the validation set, and 20% as the test set.
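A plain sketch of such a 60/20/20 split, assuming a single image directory and a fixed random seed for reproducibility:

```python
import random
from pathlib import Path

def split_dataset(image_dir: str, seed: int = 0):
    """Return (train, val, test) file lists in a 60/20/20 ratio."""
    files = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    return (files[:n_train],
            files[n_train:n_train + n_val],
            files[n_train + n_val:])
```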
Further, the model training parameters are set as follows: the initial learning rate is 0.01, the momentum is 0.937, the weight decay is 0.0005, the training threshold is 0.2, the picture size is normalized to 640×640, the number of iterations is 300, and the batch size is 16.
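Assuming the improved network were registered as a custom Ultralytics model file, the stated hyper-parameters would map onto a training call roughly as follows; `improved-yolov8.yaml` and `uav.yaml` are hypothetical file names, the stock Ultralytics release does not include ITNet, ODConv, SGFPN, CSF or CARAFE, and the patent's "training threshold" of 0.2 has no direct stock counterpart:

```python
from ultralytics import YOLO

# Hypothetical model file that would first have to register the
# custom ITNet / ODConv / SGFPN / CSF / CARAFE modules.
model = YOLO("improved-yolov8.yaml")
model.train(
    data="uav.yaml",       # hypothetical dataset config
    epochs=300,            # number of iterations stated above
    batch=16,
    imgsz=640,             # pictures normalized to 640x640
    lr0=0.01,              # initial learning rate
    momentum=0.937,
    weight_decay=0.0005,
)
```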
compared with the original YOLOv8 target detection network, the improved YOLOv8 network provided by the invention can realize accurate detection of small target objects under a complex background on the detection task of small targets of an unmanned aerial vehicle, and reduces the parameter quantity and the calculation quantity. Firstly, a trunk which increases the number of the convolution of the shallow extraction features is designed, the extraction of the shallow information by the network is enhanced, the full-dimensional dynamic convolution is utilized for encoding, and the extraction capability of the network to the features of the small target is effectively improved. Secondly, a feature fusion module is provided to further enhance the multi-layer and feature fusion capability of the network. Thirdly, a neck structure is designed, shallow information extraction is increased, and the mining capability of the network on small target position information is enhanced.
The method detects small targets of the unmanned aerial vehicle in low-altitude scenes with heavy ground-object occlusion and complex backgrounds, and the deep learning approach reduces the manpower and time cost of manually collecting and processing data. Data enhancement is used to obtain more comprehensive, higher-quality data.
Drawings
FIG. 1 is a schematic flow chart of the overall architecture of the present invention;
FIG. 2 is a block diagram of the method of the present invention;
FIG. 3 is a diagram of the improved YOLOv8 network of the present invention;
FIG. 4 is a structural diagram of the CSF module of the present invention;
FIG. 5 is a graph comparing the evaluation indexes before and after the model improvement.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the protection scope of the present invention.
A method for detecting low-slow-small targets based on edge computing, as shown in fig. 2, comprises the following steps:
s1, collecting image sets of different targets under different exposure degrees, and processing the image sets to obtain a low-small slow target data set;
specifically, images of different targets under different exposure degrees are acquired through a camera, low and slow targets in the images are marked by a marking tool, so that in order to enhance the generalization, the mosaine and mixup combined data of yolov4 are referenced, the data dimension is enhanced, and the image fuzzy data of different degrees are increased according to different fuzzification of the small targets, so that the detection precision of the fuzzy data is improved.
S2, designing a backbone that enhances the network's extraction of shallow information and encodes with full-dimensional dynamic convolution.
Specifically, through extensive experiments we find that downsampling improves translation invariance, avoids overfitting and reduces computational cost. However, small objects occupy very few pixels, and downsampling may remove the very features that identify them. The only way to preserve information about small features is to encode them with convolution filters in the earliest layers and pass this information on to subsequent layers. In existing backbones, however, the number of shallow convolution filters is kept to a minimum to reduce the computational burden, which may lose the key discriminative features of small targets.
The original CSPDarkNet53 reduces the feature map size by a factor of 4 within 2 convolution layers. Using such a backbone for tiny object detection may cause tiny object information to disappear from the feature map before it is completely extracted. To solve this problem, we propose ITNet. Compared with the original backbone, the number of convolution kernels for feature extraction is increased in the shallow layers, while the number of kernels is decreased in the deep layers to improve computational efficiency. Furthermore, we use the full-dimensional dynamic convolution ODConv for downsampling, so as to preserve the full-dimensional information of objects.
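A minimal sketch of this inverted-triangle idea (wider shallow stages, narrower deep stages) is given below; the width schedule is an assumption, and the actual ITNet would use ODConv rather than plain convolution for its downsampling steps, as stated above:

```python
import torch.nn as nn

def itnet_stem(widths=(128, 96, 64, 48)):
    """Inverted-triangle stem sketch: the filter count *decreases* with
    depth instead of increasing, keeping more shallow-layer capacity
    for small-object details. Widths are illustrative assumptions."""
    layers, c_in = [], 3
    for c_out in widths:
        layers += [nn.Conv2d(c_in, c_out, 3, stride=2, padding=1),
                   nn.BatchNorm2d(c_out),
                   nn.SiLU()]
        c_in = c_out
    return nn.Sequential(*layers)
```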
S3, designing a feature fusion module CSF based on upper and lower layers, and designing a neck structure SGFPN that retains more shallow, high-resolution information.
Specifically, YOLOv8 uses PANet for feature fusion, fusing top-down and bottom-up feature layers of different scales. The PAN structure in YOLOv8 uses top-down and bottom-up paths with lateral connections: the top-down path upsamples spatially coarser but semantically stronger feature maps to yield higher-resolution features, which are then enhanced by fusing them through lateral connections with features at the same level. Each lateral connection merges feature maps of the same spatial size from the bottom-up path and the top-down path.
Shallow feature maps have lower-level semantics, but their activations are more accurately localized because they are downsampled fewer times; therefore this layer's feature map is also fused when multi-scale features are acquired, and a detection head is added. Adding this extra detection head brings a very significant performance improvement for small target detection, although computation and memory costs increase.
In addition, P2 is downsampled only four times relative to the input picture and contains much interference information, so we use a feature fusion module to better extract features.
GFPN enhances feature interaction through queen-fusion, but it also introduces a large number of additional upsampling and downsampling operations, which are disadvantageous for small targets because their features are easily lost during sampling. Skip-layer connections pass information from early nodes to later stages, yet that information almost entirely reaches the subsequent layers through lateral transmission anyway; keeping such connections produces redundant information transfer while introducing more parameters and computation, reducing model efficiency. To further investigate effective multi-scale feature fusion and achieve a better target detection effect, the connection scheme of the feature fusion layers is improved: the structure adds cross-scale links and uses a modified giraffe feature pyramid network for feature fusion.
The SGFPN of the invention retains more small target information in the upper layers by adding fusion of upper-layer features. It can integrate more features, realize multi-scale feature fusion, and obtain a larger receptive field and accurate object positions. After adding the P2 layer, an upsampling step is added on the P3 layer and connected laterally to the P2 layer; the F3, F4 and F5 nodes are connected to the P2, P3 and P4 nodes respectively, and the N3, N4 and N5 nodes are connected to the F2, F3 and F4 nodes respectively. Adding these connections fuses the features better. The final improved structure is shown in figure 3.
The fusion module used in the present invention is CSF (Cross-Scale Fusion), which fuses the incoming multi-layer feature maps; the structure of the CSF module is shown in fig. 4. The original feature fusion module adopts simple channel concatenation, merely stacking the features. To introduce context information and refine the feature maps, we propose the feature fusion module CSF, applied to each scale feature at level k,
where Concat() refers to the concatenation of the feature maps generated in all previous layers, and Conv1() represents a 3×3 convolution,
and Conv2() represents a 1×1 convolution, and
BasicBlock(P1) = Conv1(RepConv(P1))
where RepConv is a convolution block that combines a 3×3 convolution, a 1×1 convolution and an identity mapping in one convolution layer; its structure is shown in fig. 4. RepConv can learn rich features: it is a multi-branch structure, so performance is improved through the multiple branches during training, while for inference, structural re-parameterization converts it into a plain straight-through structure of 3×3 convolution plus ReLU activation, accelerating inference.
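The training-time/inference-time duality described here can be sketched as follows; BatchNorm folding, which a full RepConv also performs, is omitted, so this is a simplified illustration rather than the patent's exact block:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RepConv(nn.Module):
    """Train with three branches (3x3 conv, 1x1 conv, identity), then
    fold them into a single 3x3 kernel for inference."""
    def __init__(self, c: int):
        super().__init__()
        self.conv3 = nn.Conv2d(c, c, 3, padding=1, bias=True)
        self.conv1 = nn.Conv2d(c, c, 1, bias=True)

    def forward(self, x):
        return F.relu(self.conv3(x) + self.conv1(x) + x)

    def fuse(self) -> nn.Conv2d:
        """Merge the branches: pad the 1x1 kernel to 3x3 and add an
        identity kernel, so one conv reproduces the three-branch sum."""
        c = self.conv3.out_channels
        w = self.conv3.weight.clone()
        w += F.pad(self.conv1.weight, [1, 1, 1, 1])   # 1x1 -> center of 3x3
        eye = torch.zeros_like(w)
        for i in range(c):
            eye[i, i, 1, 1] = 1.0                     # identity as a 3x3 kernel
        fused = nn.Conv2d(c, c, 3, padding=1, bias=True)
        fused.weight.data = w + eye
        fused.bias.data = self.conv3.bias.data + self.conv1.bias.data
        return fused
```

After calling fuse(), the single returned 3×3 convolution reproduces the three-branch sum exactly, which is what allows the multi-branch training structure to be collapsed into the straight-through inference structure.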
Finally, the gradient flow is truncated to prevent the different layers from learning duplicate gradient information:
Pout = Concat(P1, P2, P3, BasicBlock(P1), BasicBlock^2(P1), BasicBlock^3(P1))
where BasicBlock^n(P1) denotes n cascaded applications of BasicBlock(). PConv refers to a depthwise convolution with a 3×3 kernel, used to capture the important local spatial regions of each channel.
The CSF module retains the advantages of RepConv feature reuse and structural re-parameterization while truncating the gradient flow to prevent excessive duplicate gradient information; it fuses the various feature maps well and accelerates inference.
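Putting the pieces together, a simplified CSF forward pass consistent with the Pout formula above might look like this; the channel bookkeeping is an assumption, and the RepConv branch is represented by its fused inference form for brevity:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """BasicBlock(P1) = Conv1(RepConv(P1)); the RepConv branch is stood
    in for by its fused inference form (3x3 conv + ReLU)."""
    def __init__(self, c: int):
        super().__init__()
        self.rep = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)   # Conv1: 3x3 convolution

    def forward(self, x):
        return self.conv1(self.rep(x))

class CSF(nn.Module):
    """Pout = Concat(P1, P2, P3, B(P1), B^2(P1), B^3(P1))."""
    def __init__(self, c: int, depth: int = 3):
        super().__init__()
        self.blocks = nn.ModuleList(BasicBlock(c) for _ in range(depth))
        self.out = nn.Conv2d(c * (depth + 3), c, 1)  # fuse back to c channels

    def forward(self, p1, p2, p3):
        feats, y = [p1, p2, p3], p1
        for block in self.blocks:
            y = block(y)          # B(P1), then B(B(P1)), then B(B(B(P1)))
            feats.append(y)
        return self.out(torch.cat(feats, dim=1))
```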
In summary, these modules together form the Backbone part of the YOLOv8 network structure and are used to extract and fuse multi-scale feature information, supporting the accuracy and robustness of the target detection task.
A virtual environment for model training is built on a GPU server, and the training set is input into the improved YOLOv8 network structure for target detection model training. After training is completed, a deep learning model for unmanned aerial vehicle image recognition and detection is obtained; the validation set is then input into this model for verification, the model is optimized according to the verification results, and the best-performing deep learning model for unmanned aerial vehicle image recognition and detection is finally obtained.
In one embodiment, a 3×3 convolution and a 1×1 convolution are used as the final output module of the YOLOv8 network. The detected feature maps at three different pixel scales are each input into a YOLO Head for decoding: global features are extracted through the 3×3 convolution layer, the 1×1 convolution layer acts as a fully connected layer, and the prediction bounding box, confidence value and category are finally computed. After the YOLO Head, the loss function value of the detection model is minimized through iterative calculation, and when the training iterations are completed, the model with the highest detection precision is selected as the final detection model.
The modified structures and modules are stacked in sequence following the original YOLOv8 network structure form, so as to obtain the improved YOLOv8 network structure. The model training parameters include:
an initial learning rate of 0.01, momentum set to 0.937, weight decay set to 0.0005, a training threshold of 0.2, picture sizes all normalized to 640×640, 300 iterations, and a batch size of 16.
The data set dividing method is as follows: 60% of the data is used as the training set, 20% as the validation set, and 20% as the test set.
The bounding box loss is calculated using CIoU, the objectness and category losses are calculated using cross entropy, and back-propagation updates the model parameters.
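As a small illustration of these loss terms: torchvision ships a CIoU loss (assuming torchvision >= 0.15), and the objectness/category terms can be computed as binary cross entropy on logits; the tensors below are toy values:

```python
import torch
from torchvision.ops import complete_box_iou_loss

# CIoU bounding-box loss; boxes are in (x1, y1, x2, y2) format.
pred_boxes = torch.tensor([[10., 10., 50., 50.]], requires_grad=True)
true_boxes = torch.tensor([[12., 8., 48., 52.]])
box_loss = complete_box_iou_loss(pred_boxes, true_boxes, reduction="mean")

# objectness / class losses as binary cross entropy on raw logits
obj_logits = torch.tensor([2.0], requires_grad=True)
obj_target = torch.tensor([1.0])
obj_loss = torch.nn.functional.binary_cross_entropy_with_logits(
    obj_logits, obj_target)

(box_loss + obj_loss).backward()   # back-propagate the combined loss
```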
It should be emphasized that the examples described herein are illustrative rather than limiting, and therefore the invention includes, but is not limited to, the examples described in the detailed description, as other embodiments derived from the technical solutions of the invention by a person skilled in the art are equally within the scope of the invention.
It will be understood that the invention has been described in terms of several embodiments, and that various changes and equivalents may be made to these features and embodiments by those skilled in the art without departing from the spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (5)
1. An unmanned aerial vehicle image small target detection method based on improved YOLOv8 is characterized by comprising the following steps:
step 1, obtaining data from pictures shot by an unmanned aerial vehicle in a real living environment, labeling ten categories of targets such as people and vehicles, establishing an unmanned aerial vehicle picture data set, and applying the Mosaic data enhancement method to the data set;
step 2, taking the YOLOv8 network structure as the reference network, introducing the backbone network ITNet (Inverted Triangle Net), replacing Conv convolutions with the dynamic convolution ODConv, using the neck module SGFPN, introducing the feature fusion module CSF, and replacing nearest-neighbor sampling with the CARAFE up-sampling method; the improved YOLOv8 network structure is used as the unmanned aerial vehicle image recognition network, and a deep learning model for unmanned aerial vehicle small target recognition and detection is obtained through training;
and step 3, inputting the unmanned aerial vehicle image to be detected into the deep learning model for unmanned aerial vehicle small target recognition and detection.
2. The unmanned aerial vehicle image small target detection method based on improved YOLOv8 of claim 1, wherein step 1 is specifically implemented as follows:
step 1.1, acquiring images shot by an unmanned aerial vehicle in a real environment through a mobile terminal device, and labeling the acquired images using the LabelImg tool;
step 1.2, performing data enhancement on the data set using the Mosaic data enhancement method, and establishing the unmanned aerial vehicle image data set.
3. The unmanned aerial vehicle image small target detection method based on improved YOLOv8 of claim 2, wherein the Mosaic data augmentation method performs a series of image processing operations on a given image file, including randomly selecting a plurality of pictures, random scaling, random arrangement, stitching, cropping, horizontal flipping, 90-degree rotation, decreasing the image brightness, increasing the image brightness, blurring the image, adding salt-and-pepper noise, and adding Gaussian noise.
4. The unmanned aerial vehicle image small target detection method based on improved YOLOv8 of claim 1, wherein step 2 is specifically implemented as follows: the number of convolution kernels for feature extraction is increased in the shallow layers, while the number of kernels is decreased in the deep layers to improve computational efficiency; furthermore, the full-dimensional dynamic convolution ODConv is used for downsampling, so as to preserve the full-dimensional information of objects.
5. The unmanned aerial vehicle image small target detection method based on improved YOLOv8 of claim 1, wherein step 2 is further specifically implemented as follows:
the fusion module used is CSF (Cross-Scale Fusion), which fuses the incoming multi-layer feature maps; for each scale feature at level k,
where Concat() refers to the concatenation of the feature maps generated in all previous layers, and Conv1() represents a 3×3 convolution,
and Conv2() represents a 1×1 convolution, and
BasicBlock(P1) = Conv1(RepConv(P1))
where RepConv is a convolution block that combines a 3×3 convolution, a 1×1 convolution and an identity mapping in one convolution layer;
finally, the gradient flow is truncated to prevent the different layers from learning duplicate gradient information:
Pout = Concat(P1, P2, P3, BasicBlock(P1), BasicBlock^2(P1), BasicBlock^3(P1))
where BasicBlock^n(P1) denotes n cascaded applications of BasicBlock(); PConv refers to a depthwise convolution with a 3×3 kernel, used to capture the important local spatial regions of each channel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202311456286.XA | 2023-11-03 | 2023-11-03 | Unmanned aerial vehicle image small target detection method based on improved YOLOv8
Publications (1)
Publication Number | Publication Date |
---|---|
CN117557774A | 2024-02-13
Family
ID=89813819
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311456286.XA Pending CN117557774A (en) | 2023-11-03 | 2023-11-03 | Unmanned aerial vehicle image small target detection method based on improved YOLOv8 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117557774A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118247581A (en) * | 2024-05-23 | 2024-06-25 | 中国科学技术大学 | Method and device for labeling and analyzing gestures of key points of animal images |
CN118658047A (en) * | 2024-08-20 | 2024-09-17 | 成都唐源电气股份有限公司 | Small target detection method based on improved YOLOv model |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |