Search Results (3)

Search Parameters:
Keywords = YOLOv5s-ViT-BiFPN

17 pages, 20371 KiB  
Article
YOLOv8 Model for Weed Detection in Wheat Fields Based on a Visual Converter and Multi-Scale Feature Fusion
by Yinzeng Liu, Fandi Zeng, Hongwei Diao, Junke Zhu, Dong Ji, Xijie Liao and Zhihuan Zhao
Sensors 2024, 24(13), 4379; https://doi.org/10.3390/s24134379 - 5 Jul 2024
Viewed by 1297
Abstract
Accurate weed detection is essential for the precise control of weeds in wheat fields, but weeds and wheat occlude each other and have no consistent size, making it difficult to detect weeds in wheat accurately. To achieve precise weed identification, wheat weed datasets were constructed, and a wheat-field weed detection model, YOLOv8-MBM, based on an improved YOLOv8s, was proposed. In this study, a lightweight vision transformer (MobileViTv3) was introduced into the C2f module to enhance the detection accuracy of the model by integrating input, local (CNN), and global (ViT) features. Secondly, a bidirectional feature pyramid network (BiFPN) was introduced to enhance the performance of multi-scale feature fusion. Furthermore, to address the weak generalization and slow convergence of the CIoU loss function for detection tasks, the MPDIoU bounding-box regression loss was used instead of CIoU to improve the convergence speed of the model and further enhance detection performance. Finally, the model performance was tested on the wheat weed datasets. The experiments show that the YOLOv8-MBM proposed in this paper is superior to Fast R-CNN, YOLOv3, YOLOv4-tiny, YOLOv5s, YOLOv7, YOLOv9, and other mainstream models in terms of detection performance. The accuracy of the improved model reaches 92.7%. Compared with the original YOLOv8s model, the precision, recall, mAP1, and mAP2 are increased by 10.6%, 8.9%, 9.7%, and 9.3%, respectively. In summary, the YOLOv8-MBM model successfully meets the requirements for accurate weed detection in wheat fields.
(This article belongs to the Special Issue Sensor and AI Technologies in Intelligent Agriculture: 2nd Edition)
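As context for the loss-function change described in the abstract (MPDIoU replacing CIoU), below is a minimal PyTorch sketch of the generic MPDIoU bounding-box regression loss: IoU penalised by the squared distances between matching corners of the predicted and ground-truth boxes, normalised by the input-image size. The function name and signature are illustrative only; the exact variant used inside YOLOv8-MBM is not given on this page.

```python
# Minimal sketch of an MPDIoU-style bounding-box regression loss, assuming the
# generic MPDIoU formulation (IoU penalised by the normalised squared distances
# between the top-left and bottom-right corners). The paper's variant may differ.
import torch

def mpdiou_loss(pred, target, img_w, img_h, eps=1e-7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2); img_w, img_h: input image size."""
    # Intersection area
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)

    # Union area and IoU
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps
    iou = inter / union

    # Squared distances between matching corners, normalised by the image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2  # top-left
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2  # bottom-right
    norm = img_w ** 2 + img_h ** 2

    mpdiou = iou - d1 / norm - d2 / norm
    return (1.0 - mpdiou).mean()
```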
Figures

Figure 1. Image data captured. (a) Near-distance; (b) long-distance.
Figure 2. Example of weed labelling (bounding box).
Figure 3. Image data expansion. (a) Original images; (b) 45° rotation; (c) 90° rotation; (d) horizontal mirroring; (e) vertical mirroring; (f) Gaussian noise; (g) impulse noise.
Figure 4. YOLOv8-MBM network architecture.
Figure 5. C2f and C2f-MobileViTv3 module structure. (a) C2f module structure; (b) C2f-MobileViTv3 module structure.
Figure 6. MobileViTv3 structure diagram.
Figure 7. Structure of FPN, PANet, and BiFPN. (a) FPN; (b) PANet; (c) BiFPN.
Figure 8. Performance index curves for different combinations of optimisation algorithms.
Figure 9. Loss curves for different optimisation algorithms.
Figure 10. Comparison of heat maps of target areas. Note: Row 1 shows the original collection image, where the blue circle marks the weed.
Figure 11. Effectiveness of the YOLOv8-MBM algorithm for weed detection. (a-1) Original image with good near-distance light; (a-2) detection result with good near-distance light; (b-1) original image with poor near-distance light; (b-2) detection result with poor near-distance light; (c-1) original image with good long-distance light; (c-2) detection result with good long-distance light; (d-1) original image with poor long-distance light; (d-2) detection result with poor long-distance light. Note: Row 1 shows the original image, where the blue circles mark the weed areas.
25 pages, 10110 KiB  
Article
Dam Extraction from High-Resolution Satellite Images Combined with Location Based on Deep Transfer Learning and Post-Segmentation with an Improved MBI
by Yafei Jing, Yuhuan Ren, Yalan Liu, Dacheng Wang and Linjun Yu
Remote Sens. 2022, 14(16), 4049; https://doi.org/10.3390/rs14164049 - 19 Aug 2022
Cited by 2 | Viewed by 2661
Abstract
Accurate mapping of dams can provide useful information about geographical locations and boundaries and can help improve public dam datasets. However, when applied to disaster emergency management, it is often difficult to completely determine the distribution of dams due to the incompleteness of the available data. Thus, we propose an automatic and intelligent extraction method that combines location with post-segmentation for dam detection. First, we constructed a dataset named RSDams and proposed an object detection model, YOLOv5s-ViT-BiFPN (You Only Look Once version 5s-Vision Transformer-Bi-Directional Feature Pyramid Network), with a training method using deep transfer learning to generate graphical locations for dams. After retraining the model on the RSDams dataset, its precision for dam detection reached 88.2% and showed a 3.4% improvement over learning from scratch. Second, based on the graphical locations, we utilized an improved Morphological Building Index (MBI) algorithm for dam segmentation to derive dam masks. The average overall accuracy and Kappa coefficient of the model applied to 100 images reached 97.4% and 0.7, respectively. Finally, we applied the dam extraction method to two study areas, namely, Yangbi County of Yunnan Province and Changping District of Beijing in China, and the recall rates reached 69.2% and 81.5%, respectively. The results show that our method has high accuracy and good potential to serve as an automatic and intelligent method for the establishment of a public dam dataset on a regional or national scale.
(This article belongs to the Special Issue Information Retrieval from Remote Sensing Images)
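The deep transfer learning step described in the abstract retrains YOLOv5s-ViT-BiFPN from COCO-pretrained YOLOv5s weights while reusing the shared early layers (layers 1–9; see Figure 5 below). Here is a minimal PyTorch sketch of freezing the first N top-level modules of such a model before fine-tuning; the module layout (an indexable `.model` attribute, as in YOLOv5) and the optimiser settings are assumptions for illustration, not the authors' code.

```python
# Minimal sketch of freezing the first N layers of a pretrained detector for
# transfer learning, in the spirit of the "transferred layers 1-9" setup shown in
# Figure 5. The .model layout and optimiser hyperparameters are assumptions.
import torch

def freeze_early_layers(model, n_frozen=9):
    """Disable gradients for the first n_frozen top-level modules of model.model."""
    for idx, module in enumerate(model.model):   # YOLOv5-style models expose their
        if idx < n_frozen:                       # layers as a sequential .model attribute
            for p in module.parameters():
                p.requires_grad = False
    # Hand only the unfrozen parameters to the optimiser for fine-tuning
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=0.01, momentum=0.937)
```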
Figures

Figure 1. The workflow of the proposed method for dam extraction.
Figure 2. Study areas: Yangbi County in Yunnan Province and Changping District in Beijing, China.
Figure 3. Examples of the visualization of bounding boxes for dams in the RSDams dataset.
Figure 4. An example of different results for the same dam with no NMS, NMS, and Adaptive-SDT-NMS. (a) There were 21 bounding boxes without NMS; (b) two bounding boxes were left after NMS removed the redundant ones; (c) the bounding box with the highest score was selected by our Adaptive-SDT-NMS.
Figure 5. Sketch map of the transferred layers 1–9 of YOLOv5s-ViT-BiFPN in this study. The source domain is the COCO dataset trained on the YOLOv5s network, whose first nine layers are the same as those of YOLOv5s-ViT-BiFPN; the target domain is the RSDams dataset.
Figure 6. An example of pre-processing of high-resolution satellite imagery and post-processing of the dam detection and segmentation results. (a) The original image; (b) one image patch; (c) the image patch with a 10% overlap adjacent to (b); (d) dam extraction result for (b), which contains a dam: the red rectangle is the dam detection result, and the yellow irregular polygon is the dam segmentation result; (e) dam extraction result for (c), which contains no dam targets; (f) dam extraction result for the original image.
Figure 7. Construction of dam candidate areas based on the JRC-GSW raster. (a) Original image; (b) JRC-GSW raster; (c) water polygon; (d) water points; (e) water buffer; (f) dam candidate areas.
Figure 8. Dam image overlaid with built-up and bare land from the ESA Global Land Cover dataset.
Figure 9. Performance of deep transfer learning using different frozen layers on the RSDams validation set. (a) Column graphs of training time for different frozen layers. (b) Line-and-symbol graphs of accuracy for different frozen layers: black circles, precision; red squares, recall; green diamonds, F1 score; blue triangles, mAP.
Figure 10. Training loss and precision curves of YOLOv5s-ViT-BiFPN based on learning from scratch and on transfer learning with weights pretrained on the COCO dataset. (a) Loss curves for learning from scratch (blue) and deep transfer learning (red). (b) Precision curves for learning from scratch (blue) and deep transfer learning (red).
Figure 11. Confusion matrices for different NMS algorithms on the RSDams (left) and DIOR Dams (right) test sets.
Figure 12. Comparison of the post-processing results of the NMS and Adaptive-SDT-NMS algorithms. The first row contains false positives from the original NMS of YOLOv5s; the second row shows the true positives obtained with the Adaptive-SDT-NMS algorithm.
Figure 13. Processes of dam segmentation. (a) Original images, randomly selected from the RSDams validation samples. (b) Dam detection results: the bounding boxes visualize the detection results. (c) MBI feature images: MBI values increase from black to bright white. (d) SLIC images: visualization of the superpixels produced by the SLIC operation. (e) Dam segmentation results: the white areas are dam bodies, and the black areas are background. (f) Visual-interpretation reference: the blue areas are dam bodies, and the black areas are background.
Figure 14. Performance of dam segmentation. The blue dots indicate the evaluation results for 100 test images, and the dark red dotted lines are the trend lines for (a) overall accuracy, (b) Kappa, (c) omission errors, and (d) commission errors.
Figure 15. Results of dam extraction in the two study areas. (a) Dam extraction results in Yangbi; (b) dam extraction results in Changping; (c) examples of dam segmentation in Yangbi; (d) examples of dam segmentation in Changping.
Figure 16. False positives in Yangbi (a,b) and Changping (c–e): (a) riverbank, (b) bridge, (c) building, (d) levee, and (e) bridge.
Figure 17. Grad-CAM maps of several critical layers in the dam detection process. The first column shows the original images; the middle three columns are the Grad-CAM maps of the Backbone+ViT and the BiFPN convolution layers; the last column visualizes the bounding boxes.
Figure 18. Performance of dam segmentation without the SLIC algorithm. The blue dots indicate the evaluation results for the 100 test images, and the dark red dotted lines are the trend lines for (a) overall accuracy, (b) Kappa, (c) omission errors, and (d) commission errors.
Figure 19. Comparison of dam segmentation without and with the SLIC algorithm. (a) Original images; (b) dam detection results; (c) dam segmentation results without the SLIC algorithm; (d) dam segmentation results with the SLIC algorithm.
Figure 20. Examples of OSM Dams overlapped by satellite images. (a–c) Well-matched dam masks; (d–f) poorly matched dam masks.
22 pages, 9176 KiB  
Article
Automatic Extraction of Damaged Houses by Earthquake Based on Improved YOLOv5: A Case Study in Yangbi
by Yafei Jing, Yuhuan Ren, Yalan Liu, Dacheng Wang and Linjun Yu
Remote Sens. 2022, 14(2), 382; https://doi.org/10.3390/rs14020382 - 14 Jan 2022
Cited by 36 | Viewed by 4801
Abstract
Efficiently and automatically acquiring information on earthquake damage through remote sensing has posed great challenges because the classical methods of detecting houses damaged by destructive earthquakes are often both time consuming and low in accuracy. A series of deep-learning-based techniques have been developed, and recent studies have demonstrated their high intelligence for automatic target extraction in natural and remote sensing images. For the detection of small artificial targets, current studies show that You Only Look Once (YOLO) performs well on aerial and Unmanned Aerial Vehicle (UAV) images. However, less work has been conducted on the extraction of damaged houses. In this study, we propose a YOLOv5s-ViT-BiFPN-based neural network for the detection of rural houses. Specifically, to enhance the feature information of damaged houses from the global information of the feature map, we introduce the Vision Transformer into the feature extraction network. Furthermore, regarding the scale differences for damaged houses in UAV images due to changes in flying height, we apply the Bi-Directional Feature Pyramid Network (BiFPN) for multi-scale feature fusion to aggregate features with different resolutions and test the model. We took the 2021 Yangbi earthquake, with a surface wave magnitude (Ms) of 6.4, in Yunnan, China, as an example; the results show that the proposed model presents better performance, with the average precision (AP) increased by 9.31% and 1.23% compared to YOLOv3 and YOLOv5s, respectively, and a detection speed of 80 FPS, which is 2.96 times faster than YOLOv3. In addition, a transferability test for five other areas showed an average accuracy of 91.23% and a total processing time of 4 min, whereas professional visual interpreters needed 100 min. The experimental results demonstrate that the YOLOv5s-ViT-BiFPN model can automatically detect rural houses damaged by destructive earthquakes in UAV images with good accuracy and timeliness, as well as being robust and transferable.
(This article belongs to the Special Issue Intelligent Damage Assessment Systems Using Remote Sensing Data)
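This paper (like the dam-extraction paper above) replaces PANet with BiFPN for multi-scale feature fusion. As a generic illustration of the underlying idea rather than the authors' network code, here is a minimal PyTorch sketch of BiFPN's fast normalised fusion node from EfficientDet, in which each input feature map receives a learnable, non-negative weight before summation.

```python
# Minimal sketch of BiFPN's "fast normalized fusion" node (from EfficientDet),
# i.e. the weighted multi-scale aggregation the abstract refers to. This is a
# generic illustration, not the layer layout of YOLOv5s-ViT-BiFPN itself.
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    """Fuse same-shaped feature maps with learnable, ReLU-constrained weights."""
    def __init__(self, num_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = torch.relu(self.w)               # keep the fusion weights non-negative
        w = w / (w.sum() + self.eps)         # fast normalized fusion
        return sum(wi * f for wi, f in zip(w, feats))

# Usage: fuse an upsampled deep feature with a lateral feature of matching shape.
fuse = WeightedFusion(num_inputs=2)
p4 = torch.randn(1, 256, 40, 40)
p5_up = torch.randn(1, 256, 40, 40)
out = fuse([p4, p5_up])                      # -> (1, 256, 40, 40)
```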
Figures

Graphical abstract.
Figure 1. The flow of the automatic extraction of damaged houses based on YOLOv5s-ViT-BiFPN.
Figure 2. The study area.
Figure 3. Samples of the types of damaged houses. The green dots are the locations of the field investigation.
Figure 4. The UAV orthophotos acquired after the Yangbi Earthquake in Cangshanxi Town, Yangbi County, Yunnan Province: (a) Beiyinpo; (b) Jinniu; (c) Xiajie; (d) Baiyang; (e) Cunwei; (f) Baimu; (g) Hetaoyuan; and (h) Longjing.
Figure 5. Samples of houses damaged by the Yangbi Earthquake. The red boxes are the bounding boxes for damaged houses.
Figure 6. The network architecture of YOLOv5.
Figure 7. Examples of Mosaic data enhancement. Mosaic scales four different images and arranges them to fit the desired output size; the red boxes are the bounding boxes for damaged houses. (A minimal code sketch of this tiling idea follows this figure list.)
Figure 8. The Focus structure of YOLOv5.
Figure 9. The structure of the Vision Transformer.
Figure 10. Replacing PANet with BiFPN to improve the feature fusion network. PANet adopts a top-down and a bottom-up pathway to fuse multi-scale features; BiFPN uses bidirectional top-down and bottom-up feature fusion and then repeats the same block.
Figure 11. The improved network architecture of YOLOv5s-ViT-BiFPN. The Vision Transformer is inserted behind the Backbone, and PANet is replaced by BiFPN to fuse the multi-scale features (repeated only once in this study for efficiency). The blue boxes are the outputs at different scales.
Figure 12. Comparison of the loss function and AP curves for the three models: (a) loss function curves and (b) AP curves.
Figure 13. Accuracy comparison of the four models based on Precision (%), Recall (%), F1 (%), and AP (%).
Figure 14. The test results of YOLOv5s-ViT-BiFPN for the five test areas: (d) Baiyang; (e) Cunwei; (f) Baimu; (g) Hetaoyuan; and (h) Longjing. The red blocks are the damaged houses, and the green blocks are the missed targets.
Figure 15. Examples of detection results from YOLOv5s-ViT-BiFPN.
Figure 16. Samples of UAV images of different types of houses damaged by the Ya'an Ms 7.0 earthquake on 20 April. The red vertical bounding boxes are the results of our method; the red irregular polygons are the annotations from references [8,9].
Figure 17. Visualization of the feature maps. (a) The feature maps of the first layer; (b) the feature maps of the Vision Transformer layer; (c) the feature maps of the BiFPN structure; (d) the feature maps of the output layer.
Figure 18. Visualization of the heatmaps. (a) The original images; (b) the heatmaps of the first layer; (c) the heatmaps of the Vision Transformer layer; (d) the heatmaps of the BiFPN structure; (e) the heatmaps of the output layer; (f) the final results.
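Figure 7 above describes Mosaic data enhancement as scaling four images and arranging them into one output of the desired size. The following is a minimal sketch of that tiling idea only; bounding-box remapping and YOLOv5's random mosaic-centre placement are omitted, and OpenCV (`cv2`) is assumed to be available.

```python
# Minimal sketch of Mosaic data enhancement as described in the Figure 7 caption:
# four images are scaled and tiled into one output canvas. Box handling and the
# exact YOLOv5 implementation details are intentionally left out.
import numpy as np
import cv2  # assumed available, used only for resizing

def mosaic(images, out_size=640):
    """images: list of four HxWx3 uint8 arrays -> one (out_size, out_size, 3) mosaic."""
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)   # grey padding
    corners = [(0, 0), (0, half), (half, 0), (half, half)]           # 2x2 grid (y, x)
    for img, (y, x) in zip(images, corners):
        tile = cv2.resize(img, (half, half))
        canvas[y:y + half, x:x + half] = tile
    return canvas
```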