Search Results (12,677)

Search Parameters:
Keywords = mapping accuracy

18 pages, 13718 KiB  
Article
BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection
by Ruicheng Cao, Ruiteng Zhang, Xinyue Yan and Jian Zhang
Sensors 2024, 24(22), 7411; https://doi.org/10.3390/s24227411 (registering DOI) - 20 Nov 2024
Abstract
Degraded underwater images decrease the accuracy of underwater object detection. Existing research uses image enhancement methods to improve the visual quality of images, which may not be beneficial for underwater object detection and can lead to serious degradation in detector performance. To alleviate this problem, we propose a bidirectional-guided method for underwater object detection, referred to as BG-YOLO. In the proposed method, a network is organized by constructing an image enhancement branch and an object detection branch in a parallel manner. The image enhancement branch consists of a cascade of an image enhancement subnet and an object detection subnet. The object detection branch only consists of a detection subnet. A feature-guided module connects the shallow convolution layers of the two branches. When training the image enhancement branch, the object detection subnet in the enhancement branch guides the image enhancement subnet to be optimized towards the direction that is most conducive to the detection task. The shallow feature map of the trained image enhancement branch is output to the feature-guided module, constraining the optimization of the object detection branch through a consistency loss and prompting the object detection branch to learn more detailed information about the objects, which enhances the detection performance. During detection, only the object detection branch is retained, so no additional computational cost is introduced. Extensive experiments demonstrate that the proposed method significantly improves the detection performance of the YOLOv5s object detection network (the mAP is increased by up to 2.9%) and maintains the same inference speed as YOLOv5s (132 fps). Full article
(This article belongs to the Special Issue Machine Learning in Image/Video Processing and Sensing)
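The feature-guided consistency idea described in the abstract above can be illustrated with a short sketch. This is not the authors' implementation; the L2 distance, the frozen (detached) guidance features, and the weighting factor lambda_c are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def feature_consistency_loss(det_feat: torch.Tensor,
                             enh_feat: torch.Tensor) -> torch.Tensor:
    """L2 consistency between the shallow feature map of the detection branch
    (trainable) and that of the already-trained enhancement branch (guidance)."""
    # The enhancement branch acts only as a teacher, so no gradients flow into it.
    return F.mse_loss(det_feat, enh_feat.detach())

def total_loss(det_loss: torch.Tensor,
               det_feat: torch.Tensor,
               enh_feat: torch.Tensor,
               lambda_c: float = 0.5) -> torch.Tensor:
    # Hypothetical training objective: detection loss plus a weighted consistency term.
    return det_loss + lambda_c * feature_consistency_loss(det_feat, enh_feat)
```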
21 pages, 2229 KiB  
Article
LH-YOLO: A Lightweight and High-Precision SAR Ship Detection Model Based on the Improved YOLOv8n
by Qi Cao, Hang Chen, Shang Wang, Yongqiang Wang, Haisheng Fu, Zhenjiao Chen and Feng Liang
Remote Sens. 2024, 16(22), 4340; https://doi.org/10.3390/rs16224340 (registering DOI) - 20 Nov 2024
Abstract
Synthetic aperture radar is widely applied to ship detection due to generating high-resolution images under diverse weather conditions and its penetration capabilities, making SAR images a valuable data source. However, detecting multi-scale ship targets in complex backgrounds leads to issues of false positives and missed detections, posing challenges for lightweight and high-precision algorithms. There is an urgent need to improve accuracy of algorithms and their deployability. This paper introduces LH-YOLO, a YOLOv8n-based, lightweight, and high-precision SAR ship detection model. We propose a lightweight backbone network, StarNet-nano, and employ element-wise multiplication to construct a lightweight feature extraction module, LFE-C2f, for the neck of LH-YOLO. Additionally, a reused and shared convolutional detection (RSCD) head is designed using a weight sharing mechanism. These enhancements significantly reduce model size and computational demands while maintaining high precision. LH-YOLO features only 1.862 M parameters, representing a 38.1% reduction compared to YOLOv8n. It exhibits a 23.8% reduction in computational load while achieving a mAP50 of 96.6% on the HRSID dataset, which is 1.4% higher than YOLOv8n. Furthermore, it demonstrates strong generalization on the SAR-Ship-Dataset with a mAP50 of 93.8%, surpassing YOLOv8n by 0.7%. LH-YOLO is well-suited for environments with limited resources, such as embedded systems and edge computing platforms. Full article
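The weight-sharing mechanism mentioned for the RSCD head can be sketched as a single convolutional stack reused across all pyramid levels, so its parameters are counted only once. The actual head in the paper is more elaborate; the module layout and names below are assumptions.

```python
import torch
import torch.nn as nn

class SharedConvHead(nn.Module):
    """Toy weight-sharing detection head: one conv stack is applied to every
    feature-map scale, instead of one private head per scale."""
    def __init__(self, in_ch: int, num_outputs: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(in_ch, num_outputs, kernel_size=1),
        )

    def forward(self, feats):
        # feats: list of feature maps from different pyramid levels with the same channel count.
        return [self.shared(f) for f in feats]
```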
24 pages, 3462 KiB  
Article
Underutilized Feature Extraction Methods for Burn Severity Mapping: A Comprehensive Evaluation
by Linh Nguyen Van and Giha Lee
Remote Sens. 2024, 16(22), 4339; https://doi.org/10.3390/rs16224339 (registering DOI) - 20 Nov 2024
Abstract
Wildfires increasingly threaten ecosystems and infrastructure, making accurate burn severity mapping (BSM) essential for effective disaster response and environmental management. Machine learning (ML) models utilizing satellite-derived vegetation indices are crucial for assessing wildfire damage; however, incorporating many indices can lead to multicollinearity, reducing classification accuracy. While principal component analysis (PCA) is commonly used to address this issue, its effectiveness relative to other feature extraction (FE) methods in BSM remains underexplored. This study aims to enhance ML classifier accuracy in BSM by evaluating various FE techniques that mitigate multicollinearity among vegetation indices. Using composite burn index (CBI) data from the 2014 Carlton Complex fire in the United States as a case study, we extracted 118 vegetation indices from seven Landsat-8 spectral bands. We applied and compared 13 different FE techniques—including linear and nonlinear methods such as PCA, t-distributed stochastic neighbor embedding (t-SNE), linear discriminant analysis (LDA), Isomap, uniform manifold approximation and projection (UMAP), factor analysis (FA), independent component analysis (ICA), multidimensional scaling (MDS), truncated singular value decomposition (TSVD), non-negative matrix factorization (NMF), locally linear embedding (LLE), spectral embedding (SE), and neighborhood components analysis (NCA). The performance of these techniques was benchmarked against six ML classifiers to determine their effectiveness in improving BSM accuracy. Our results show that alternative FE techniques can outperform PCA, improving classification accuracy and computational efficiency. Techniques like LDA and NCA effectively capture nonlinear relationships critical for accurate BSM. The study contributes to the existing literature by providing a comprehensive comparison of FE methods, highlighting the potential benefits of underutilized techniques in BSM. Full article
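A hedged scikit-learn sketch of the kind of benchmark described above: several feature extraction methods feeding the same classifier under cross-validation. The placeholder inputs X (per-pixel vegetation-index vectors) and y (CBI-derived severity classes), the choice of a k-NN classifier, and the component counts are assumptions, not the paper's configuration.

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import NeighborhoodComponentsAnalysis, KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Three of the thirteen feature extraction methods compared in the paper.
extractors = {
    "PCA": PCA(n_components=5),
    "LDA": LinearDiscriminantAnalysis(n_components=3),  # at most n_classes - 1 components
    "NCA": NeighborhoodComponentsAnalysis(n_components=5, random_state=0),
}

def benchmark(X, y):
    """X: (n_pixels, n_indices) vegetation-index matrix; y: severity class labels."""
    scores = {}
    for name, fe in extractors.items():
        pipe = make_pipeline(fe, KNeighborsClassifier(n_neighbors=5))
        scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="accuracy").mean()
    return scores
```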
Figures:
Figure 1: Visualization of dimensionality reduction techniques. Each plot represents a 3D data projection using three main components.
Figure 2: Location of the 2014 Carlton Complex wildfire used in this study.
Figure 3: Heatmaps representing the performance of thirteen feature extraction methods on four different metrics, namely (a) overall accuracy (OA), (b) precision, (c) recall, and (d) F1-score, across six machine learning classifiers. The x-axis of each heatmap lists the FR methods, while the y-axis lists the classifiers. The color intensity in each heatmap indicates the mean performance score of 1000 simulations.
Figure 4: Relationship between the number of components used in thirteen feature reduction methods and the performance (overall accuracy, OA) of six classifiers (RF, LR, KNN, SVM, AB, and MLP). The x-axis in each plot shows the number of components, while the y-axis represents the OA. Each line corresponds to one of the classifiers fitted by quadratic polynomial regression models.
Figure 5: Performance comparison of PCA, LDA, and NCA across four wildfire severity categories: (a) no burn, (b) low, (c) moderate, and (d) high severity. The performance is evaluated using six classifiers, and the y-axis shows the F1-score value. Error bars representing interquartile ranges indicate the variability in model performance across each severity level.
29 pages, 8399 KiB  
Article
Automatic Modulation Recognition Based on Multimodal Information Processing: A New Approach and Application
by Wenna Zhang, Kailiang Xue, Aiqin Yao and Yunqiang Sun
Electronics 2024, 13(22), 4568; https://doi.org/10.3390/electronics13224568 - 20 Nov 2024
Abstract
Automatic modulation recognition (AMR) has wide applications in the fields of wireless communications, radar systems, and intelligent sensor networks. The existing deep learning-based modulation recognition models often focus on temporal features while overlooking the interrelations and spatio-temporal relationships among different types of signals. To overcome these limitations, a hybrid neural network based on a multimodal parallel structure, called the multimodal parallel hybrid neural network (MPHNN), is proposed to improve the recognition accuracy. The algorithm first preprocesses the data by parallelly processing the multimodal forms of the modulated signals before inputting them into the network. Subsequently, by combining Convolutional Neural Networks (CNN) and Bidirectional Gated Recurrent Unit (Bi-GRU) models, the CNN is used to extract spatial features of the received signals, while the Bi-GRU transmits previous state information of the time series to the current state to capture temporal features. Finally, the Convolutional Block Attention Module (CBAM) and Multi-Head Self-Attention (MHSA) are introduced as two attention mechanisms to handle the temporal and spatial correlations of the signals through an attention fusion mechanism, achieving the calibration of the signal feature maps. The effectiveness of this method is validated using various datasets, with the experimental results demonstrating that the proposed approach can fully utilize the information of multimodal signals. The experimental results show that the recognition accuracy of MPHNN on multiple datasets reaches 93.1%, and it has lower computational complexity and fewer parameters than other models. Full article
(This article belongs to the Section Artificial Intelligence)
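A toy PyTorch sketch of the CNN plus bidirectional GRU combination mentioned in the abstract, with the CBAM and multi-head self-attention stages omitted. Layer sizes and the I/Q input format are assumptions rather than the MPHNN's actual configuration.

```python
import torch
import torch.nn as nn

class CnnBiGru(nn.Module):
    """A 1-D CNN captures local structure of the I/Q sequence and a
    bidirectional GRU captures temporal dependencies in both directions."""
    def __init__(self, n_classes: int = 11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.bigru = nn.GRU(input_size=64, hidden_size=64,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, iq):                          # iq: (batch, 2, seq_len) I/Q samples
        feat = self.cnn(iq)                         # (batch, 64, seq_len)
        seq, _ = self.bigru(feat.transpose(1, 2))   # (batch, seq_len, 128)
        return self.head(seq.mean(dim=1))           # average over time, then classify

logits = CnnBiGru()(torch.randn(4, 2, 128))         # example forward pass
```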
Figures:
Figure 1: Visualization of instantaneous amplitude, instantaneous phase, instantaneous frequency, and IQ time-domain plots for 11 modulation modes.
Figure 2: Overall architecture of the MPHNN.
Figure 3: Structure of CBAM.
Figure 4: Working mechanism of the CBAM.
Figure 5: Structure of the Multi-Head Self-Attention (MHSA) module.
Figure 6: Scaled dot-product attention.
Figure 7: Structure of the attention fusion mechanism.
Figure 8: Bi-GRU information flow transfer diagram.
Figure 9: Changes during training: (a) accuracy and (b) loss values.
Figure 10: Recognition accuracy of the dataset RadioML2016.10A on several models.
Figure 11: Confusion matrix at an SNR of 18 dB for (a) 1D-CNN, (b) 2D-CNN, (c) CLDNN, (d) DenseNet, (e) LSTM, (f) ResNet, and (g) the proposed model.
Figure 12: Confusion matrix at full SNR: (a) 1D-CNN, (b) 2D-CNN, (c) CLDNN, (d) DenseNet, (e) LSTM, (f) ResNet, and (g) the proposed model.
Figure 13: Recognition accuracy for each modulated signal in the range of −20 to 18 dB for all seven methods: (a) 1D-CNN; (b) 2D-CNN; (c) CLDNN; (d) DenseNet; (e) LSTM; (f) ResNet; (g) the proposed model.
Figure 14: Validation on other datasets: (a) RadioML2016.10B and (b) RadioML2018.01A-sample.
24 pages, 21738 KiB  
Article
New Method to Correct Vegetation Bias in a Copernicus Digital Elevation Model to Improve Flow Path Delineation
by Gabriel Thomé Brochado and Camilo Daleles Rennó
Remote Sens. 2024, 16(22), 4332; https://doi.org/10.3390/rs16224332 - 20 Nov 2024
Abstract
Digital elevation models (DEM) are widely used in many hydrologic applications, providing key information about the topography, which is a major driver of water flow in a landscape. Several open access DEMs with near-global coverage are currently available, however, they represent the elevation of the earth’s surface including all its elements, such as vegetation cover and buildings. These features introduce a positive elevation bias that can skew the water flow paths, impacting the extraction of hydrological features and the accuracy of hydrodynamic models. Many attempts have been made to reduce the effects of this bias over the years, leading to the generation of improved datasets based on the original global DEMs, such as MERIT DEM and, more recently, FABDEM. However, even after these corrections, the remaining bias still affects flow path delineation in a significant way. Aiming to improve on this aspect, a new vegetation bias correction method is proposed in this work. The method consists of subtracting from the Copernicus DEM elevations their respective forest height but adjusted by correction factors to compensate for the partial penetration of the SAR pulses into the vegetation cover during the Copernicus DEM acquisition process. These factors were calculated by a new approach where the slope around the pixels at the borders of each vegetation patch were analyzed. The forest height was obtained from a global dataset developed for the year 2019. Moreover, to avoid temporal vegetation cover mismatch between the DEM and the forest height dataset, we introduced a process where the latter is automatically adjusted to best match the Copernicus acquisition year. The correction method was applied for regions with different forest cover percentages and topographic characteristics, and the result was compared to the original Copernicus DEM and FABDEM, which was used as a benchmark for vegetation bias correction. The comparison method was hydrology-based, using drainage networks obtained from topographic maps as reference. The new corrected DEM showed significant improvements over both the Copernicus DEM and FABDEM in all tested scenarios. Moreover, a qualitative comparison of these DEMs was also performed through exhaustive visual analysis, corroborating these findings. These results suggest that the use of this new vegetation bias correction method has the potential to improve DEM-based hydrological applications worldwide. Full article
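The core correction step described above, subtracting the forest height scaled by a penetration factor, reduces to a one-line raster operation. The factor value below is an arbitrary placeholder, not the value estimated in the paper from vegetation-patch borders.

```python
import numpy as np

def correct_vegetation_bias(dem: np.ndarray,
                            forest_height: np.ndarray,
                            penetration_factor: float = 0.6) -> np.ndarray:
    """Subtract the forest height from the DEM, scaled by a correction factor
    that accounts for partial penetration of the SAR signal into the canopy
    (1.0 would remove the full canopy height, 0.0 would remove none).
    forest_height is zero outside vegetated pixels, so only those pixels change.
    The factor may also be supplied as a per-pixel array; broadcasting handles both."""
    return dem - penetration_factor * forest_height
```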
Figures:
Figure 1: Position of the study areas overlaid on a natural color Sentinel-2 cloud-free composite of South America for the year 2020.
Figure 2: Sentinel-2 cloud-free composite of the year 2020 and color representation of Copernicus DEM elevations of the study areas. The numbers in the top left corner of each panel refer to the study area depicted on it.
Figure 3: Comparison between the forest height datasets. The figure presents a natural color Sentinel-2 cloud-free composite of the year 2020 of the entire Area 1, with the subset area marked by the red rectangle (top left); the Sentinel-2 image of the subset area (top center); a grayscale representation of Copernicus DEM elevations on the subset area (top right); and the Sentinel-2 composite overlaid by a color representation of the Potapov et al. [44] (bottom left), Lang et al. [45] (bottom center), and Tolan et al. [46] (bottom right) forest height datasets, where heights equal to zero are transparent.
Figure 4: Effect of forest height overestimation and canopy elevation underestimation on the estimated ground elevation. The illustration represents the difference (Δh1) between the estimated and actual forest heights, the original and corrected DEM elevation profiles (dotted lines), the differences between the actual and estimated canopy elevations (Δh2), and the ground elevations (Δh1 + Δh2).
Figure 5: Copernicus DEM vegetation bias correction workflow.
Figure 6: Stream flow paths comparison workflow.
Figure 7: Flow path displacement area calculation. The illustration shows the drainage network overlaid by an initial point from which the reference and DEM-extracted flow paths are traced until they reach the circle with radius r centered around the point. The flow path displacement area, highlighted in gray, is the sum of the areas located between these lines.
Figure 8: Example of flow path selection. The panels present the reference drainage network for Area 1 (left), the set of flow paths extracted from it using a 2000 m radius (center), and the flow paths selected from the latter (right). The lines are represented in yellow in all panels, with the Sentinel-2 composite of the year 2020 in the background.
Figure 9: Comparison between DEMs and vertical profiles in Area 1. The figure presents color representations of Copernicus, FABDEM, and the new corrected DEM elevation data over the study area (top); its natural color Sentinel-2 cloud-free composite of the year 2020, overlaid by the elevation profile lines identified by their respective numbers (bottom left); and charts showing the observed DEM elevations along the profile lines, with the background colored gray in areas covered by vegetation, according to the adjusted forest height obtained for the area (bottom right).
Figure 10: Example of a region within Area 1 where the blurring effect was identified. The figure presents a natural color Sentinel-2 cloud-free composite of the year 2020 of the study area, overlaid by a red rectangle highlighting the region featured in the other panels (top left), and color representations of the elevations of the Copernicus DEM (top right), FABDEM (bottom left), and the new corrected DEM (bottom right), showing the different levels of degradation of the finer topographic features visible in the original DEM.
Figure 11: Comparison between DEMs and vertical profiles in Area 2, with the same panel layout as Figure 9.
Figure 12: Comparison between DEMs and vertical profiles in Area 3, with the same panel layout as Figure 9.
Figure 13: Example of a region within Area 3 where the blurring effect was identified, with the same panel layout as Figure 10.
Figure 14: Comparison between DEMs and vertical profiles in Area 4, with the same panel layout as Figure 9.
Figure 15: Comparison of drainage networks extracted from the DEMs. The figure is composed of the natural color Sentinel-2 cloud-free composite of the year 2020 of the study areas, overlaid by a red rectangle highlighting the regions featured in the panels below (first row), and the Sentinel-2 composite of the highlighted regions, overlaid by the reference drainage lines and the ones extracted from the Copernicus DEM, FABDEM, and the new corrected DEM, all in yellow and placed side by side, organized in rows per study area.
27 pages, 10743 KiB  
Article
Comparative Validation and Misclassification Diagnosis of 30-Meter Land Cover Datasets in China
by Xiaolin Xu, Dan Li, Hongxi Liu, Guang Zhao, Baoshan Cui, Yujun Yi, Wei Yang and Jizeng Du
Remote Sens. 2024, 16(22), 4330; https://doi.org/10.3390/rs16224330 - 20 Nov 2024
Viewed by 54
Abstract
Land cover maps with high accuracy are essential for environmental protection and climate change research. The 30-meter-resolution maps, with their better resolution and longer historical records, are extensively utilized to assess changes in land cover and their effects on carbon storage, land–atmosphere energy balance, and water cycle processes. However, current data products use different classification methods, resulting in significant classification inconsistency and triggering serious disagreements among related studies. Here, we compared four mainstream land cover products in China, namely GLC_FCS30, CLCD, Globeland30, and CNLUCC. The result shows that only 50.34% of the classification results were consistent across the four datasets. The differences between pairs of datasets ranged from 21.10% to 37.53%. Importantly, most inconsistency occurs in transitional zones among land cover types sensitive to climate change and human activities. Based on the accuracy evaluation, CLCD is the most accurate land cover product, with an overall accuracy reaching 86.98 ± 0.76%, followed by CNLUCC (81.38 ± 0.87%) and GLC_FCS30 (77.83 ± 0.80%). Globeland30 had the lowest accuracy (75.24 ± 0.91%), primarily due to misclassification between croplands and forests. Misclassification diagnoses revealed that vegetation-related spectral confusion among land cover types contributed significantly to misclassifications, followed by slope, cloud cover, and landscape fragmentation, which affected satellite observation angles, data availability, and mixed pixels. Automated classification methods using the random forest algorithm can perform better than those that depend on traditional human–machine interactive interpretation or object-based approaches. However, their classification accuracy depends more on selecting training samples and feature variables. Full article
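A small NumPy sketch of the agreement statistics reported above (pairwise consistency between two products and consistency across all four). The nodata convention (class code 0) is an assumption; the maps are assumed to be co-registered on the same grid with a common legend.

```python
import numpy as np

def pairwise_agreement(map_a: np.ndarray, map_b: np.ndarray) -> float:
    """Fraction of pixels with identical class labels in two land cover maps."""
    valid = (map_a > 0) & (map_b > 0)          # ignore pixels coded 0 as nodata
    return float(np.mean(map_a[valid] == map_b[valid]))

def all_agree(maps: list[np.ndarray]) -> float:
    """Fraction of pixels where every product assigns the same class."""
    stacked = np.stack(maps)
    same = np.all(stacked == stacked[0], axis=0)
    return float(np.mean(same))
```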
Figures:
Figure 1: The composition of land cover types and spatial distribution of validation samples. (a) Validation samples used in assessing the classification accuracy of land cover products; (b) the total area of land cover types for products; (c) the number of validation samples for land cover types; (d) validation samples are generated by stratified random sampling based on land cover types.
Figure 2: The difference in four typical points between the identification in the Globeland30 and the high-resolution images and the high-definition real picture from Google Earth Pro. (a–d) The land cover map of Globeland30; (e–h) the high-resolution images from Google Earth Pro; (i–l) the high-definition real pictures from Google Earth Pro and street view from Gaode Maps (Version 14.02).
Figure 3: Examples of manual visual interpretation are shown here. (a,d) High-resolution images from Google Earth Pro; (b,e) land use types regardless of pixel; (c,f) the box represents the footprint of an interpretation unit and includes 36 grid cells. We divided the box into polygons based on land cover classification, identified grid cell type by area of polygons, and assigned the land cover type with the most grid cells as the box's land cover type. When the two types of land cover have the largest and the same number of grid cells, we set the land cover type of the box according to the priority order of glacier, wetland, impervious surface, cropland, forest, grassland, and bareland.
Figure 4: The spatial pattern among four land cover products at 30 m spatial resolution in 2020. (a–f) The inconsistent pixels between each two land cover products; (g) the consistent pixels of four land cover products.
Figure 5: Pixel ratio-based confusion matrix of land cover types between land cover products (a–f). (g) Proportion of the consistent land cover for four land cover products; (h) proportion of land cover inconsistencies between four land cover products. Proportions in (g,h) are calculated based on average area of land cover type or land cover combination area in four products. CL: cropland, FL: forest, GL: grassland, WL: wetland, IS: impervious, BL: bareland, GS: glaciers.
Figure 6: (a,c,e,g) The spatial pattern of the misclassification sample of land cover products; (b,d,f,h) the number of samples ranked in descending order of quantity. The legends only show the top 12 types of misclassification in terms of sample size.
Figure 7: (a–d) The confusion proportions, a key factor in understanding the classification accuracy, for each of the land cover types in the validation dataset; (e) classification accuracy of land cover types in products. CL: cropland, FL: forest, GL: grassland, WL: wetland, IS: impervious, BL: bareland, GS: glaciers.
Figure 8: (a–d) The relationship between classification accuracy and NDVI; (e–i) the probability density distribution of NDVI for land cover types in each land cover product and validation dataset.
Figure 9: The relationship of classification accuracy with degree of topographic relief (a–d), slope (e–h), hill shade (i–l), and aspect (m–p). * means that the significance level is less than 0.1 (α < 0.1), ** (α < 0.05), *** (α < 0.01).
Figure 10: The relationship of classification accuracy with land cover fragmentation (LULC Fragmentation, a–d), land cover change (LULC Change, e–h), and cloud cover (i–l). * means that the significance level is less than 0.1 (α < 0.1), ** (α < 0.05), *** (α < 0.01).
Figure 11: The partial correlation coefficients between land cover types and impact factors in different land cover products. * means that the significance level is less than 0.05 (α < 0.05), ** (α < 0.01).
19 pages, 29968 KiB  
Article
Ripe Tomato Detection Algorithm Based on Improved YOLOv9
by Yan Wang, Qianjie Rong and Chunhua Hu
Plants 2024, 13(22), 3253; https://doi.org/10.3390/plants13223253 - 20 Nov 2024
Viewed by 53
Abstract
Recognizing ripe tomatoes is a crucial aspect of tomato picking. To ensure the accuracy of inspection results, You Only Look Once version 9 (YOLOv9) has been explored as a fruit detection algorithm. To tackle the challenge of identifying tomatoes and the low accuracy of small object detection in complex environments, we propose a ripe tomato recognition algorithm based on an enhanced YOLOv9-C model. After collecting tomato data, we used Mosaic for data augmentation, which improved model robustness and enriched the experimental data. Improvements were made to the feature extraction and down-sampling modules, integrating HGBlock and SPD-ADown modules into the YOLOv9 model. These measures resulted in high detection performance, with precision and recall rates of 97.2% and 92.3% in horizontal and vertical experimental comparisons, respectively. The module-integrated model improved accuracy and recall by 1.3% and 1.1%, respectively, and also reduced inference time by 1 ms compared to the original model. The inference time of this model was 14.7 ms, which is 16 ms faster than the RetinaNet model. The model achieved an mAP@0.5 of up to 98%, which is 9.6% higher than RetinaNet. Its increased speed and accuracy make it more suitable for practical applications. Overall, this model provides a reliable technique for recognizing ripe tomatoes during the picking process. Full article
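The space-to-depth (SPD) idea behind the SPD-ADown module can be sketched as a pixel-unshuffle followed by a 1x1 convolution; the paper's module combines this with YOLOv9's ADown block, which is not reproduced here, so the snippet only shows the information-preserving downsampling step.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-depth downsampling: rearrange each 2x2 spatial block into channels
    (no information is discarded, unlike strided conv or pooling), then mix the
    channels with a 1x1 conv. Small objects keep fine detail through downsampling."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.spd = nn.PixelUnshuffle(2)              # (B, C, H, W) -> (B, 4C, H/2, W/2)
        self.conv = nn.Conv2d(4 * in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.conv(self.spd(x))

x = torch.randn(1, 64, 80, 80)
print(SPDConv(64, 128)(x).shape)                     # torch.Size([1, 128, 40, 40])
```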
Figures:
Figure 1: Tomato data picking scene.
Figure 2: Partial dataset of tomato sampling in different environments: (a) backlight; (b) light; (c) tomatoes covered by leaves.
Figure 3: Data augmentation: (a) horizontal flip; (b) vertical flip; (c) deformation; (d) brightness adjustment; (e) brightness adjustment.
Figure 4: An example of Mosaic data augmentation.
Figure 5: The framework of the YOLOv9 network.
Figure 6: The structure of RepNCSPELANA4.
Figure 7: Modules in YOLOv9: (a) SPEELAN module; (b) ADown module.
Figure 8: Structure of the HGBlock module.
Figure 9: SPD-conv module.
Figure 10: SPD-ADown module.
Figure 11: Improved YOLOv9 model.
Figure 12: The performance of each model under different metrics: (a) mAP@0.5; (b) mAP@0.5:0.95.
Figure 13: Comparison results of different models: (a) mAP@0.5; (b) mAP@0.5:0.95; (c) PR curve.
Figure 14: SSD algorithm performance: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
Figure 15: Faster-RCNN algorithm performance: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
Figure 16: RetinaNet algorithm performance: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
Figure 17: YOLOv8 algorithm performance: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
Figure 18: RT-DETR algorithm performance: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
Figure 19: Algorithm performance in this article: (a) micro-ripening tomato; (b) multi-fruit overlapping; (c) sunlight; (d) immature tomato; (e) leaf occlusion; (f) tomato stem occlusion.
19 pages, 7893 KiB  
Article
AI-Driven Crack Detection for Remanufacturing Cylinder Heads Using Deep Learning and Engineering-Informed Data Augmentation
by Mohammad Mohammadzadeh, Gül E. Okudan Kremer, Sigurdur Olafsson and Paul A. Kremer
Automation 2024, 5(4), 578-596; https://doi.org/10.3390/automation5040033 - 20 Nov 2024
Viewed by 89
Abstract
Detecting cracks in cylinder heads traditionally relies on manual inspection, which is time-consuming and susceptible to human error. As an alternative, automated object detection utilizing computer vision and machine learning models has been explored. However, these methods often face challenges due to a lack of sufficiently annotated training data, limited image diversity, and the inherently small size of cracks. Addressing these constraints, this paper introduces a novel automated crack-detection method that enhances data availability through a synthetic data generation technique. Unlike general data augmentation practices, our method involves copying cracks from one location to another, guided by both random and informed engineering decisions about likely crack formations due to cyclic thermomechanical loads. The innovative aspect of our approach lies in the integration of domain-specific engineering knowledge into the synthetic generation process, which substantially improves detection accuracy. We evaluate our method’s effectiveness using two metrics: the F2 score, which emphasizes recall to prioritize detecting all potential cracks, and mean average precision (MAP), a standard measure in object detection. Experimental results demonstrate that, without engineering insights, our method increases the F2 score from 0.40 to 0.65, while maintaining a stable MAP. Incorporating detailed engineering knowledge further enhances the F2 score to 0.70 and improves MAP to 0.57, representing increases of 63% and 43%, respectively. These results confirm that our approach not only mitigates the limitations of traditional data augmentation but also significantly advances the reliability and precision of crack detection in industrial settings. Full article
(This article belongs to the Special Issue Smart Remanufacturing)
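A hedged sketch of the copy-paste style synthetic crack generation and the F2 metric described above. The region coordinates, the naive paste (no edge blending), and the grayscale image assumption are illustrative only and not the paper's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def paste_crack(image: np.ndarray, crack_patch: np.ndarray,
                regions: list[tuple[int, int, int, int]]):
    """Copy a crack patch into one of the engineering-informed candidate regions
    (x, y, w, h) where thermomechanical cracks are expected, and return the
    augmented image plus the new bounding box for the training label.
    Assumes a grayscale image and a patch smaller than the chosen region."""
    x, y, w, h = regions[rng.integers(len(regions))]
    ph, pw = crack_patch.shape[:2]
    px = x + int(rng.integers(0, max(1, w - pw)))
    py = y + int(rng.integers(0, max(1, h - ph)))
    out = image.copy()
    out[py:py + ph, px:px + pw] = crack_patch        # naive paste; real pipelines blend edges
    return out, (px, py, pw, ph)

def f2_score(precision: float, recall: float) -> float:
    """F-beta with beta = 2, weighting recall higher than precision."""
    return 5 * precision * recall / (4 * precision + recall)

print(round(f2_score(0.50, 0.80), 3))                # recall-heavy score for a sample detector
```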
Figures:
Figure 1: Overview of the proposed methods and evaluation procedures.
Figure 2: Sample images and cracks (small red squares) obtained in different methods: (a) lab setting and (b) less controlled setting.
Figure 3: Sample cracks (small red squares) in original image vs. augmented images.
Figure 4: Sample crack patches extracted from training set.
Figure 5: Potential (known) areas where cracks may occur.
Figure 6: Sample of an image before and after adding synthetic cracks to known areas.
Figure 7: Sample of an image before and after adding synthetic cracks to uncommon areas.
Figure 8: Sample of training images after randomly placing synthetic cracks.
Figure 9: Sample of predicted images using YOLOv8x before (a) and after (b) adding 300 synthetic cracks.
19 pages, 8554 KiB  
Article
RSNC-YOLO: A Deep-Learning-Based Method for Automatic Fine-Grained Tuna Recognition in Complex Environments
by Wenjie Xu, Hui Fang, Shengchi Yu, Shenglong Yang, Haodong Yang, Yujia Xie and Yang Dai
Appl. Sci. 2024, 14(22), 10732; https://doi.org/10.3390/app142210732 - 20 Nov 2024
Viewed by 119
Abstract
Tuna accounts for 20% of the output value of global marine capture fisheries, and it plays a crucial role in maintaining ecosystem stability, ensuring global food security, and supporting economic stability. However, improper management has led to significant overfishing, resulting in a sharp decline in tuna populations. For sustainable tuna fishing, it is essential to accurately identify the species of tuna caught and to count their numbers, as these data are the foundation for setting scientific catch quotas. The traditional manual identification method suffers from several limitations and is prone to errors during prolonged operations, especially due to factors like fatigue, high-intensity workloads, or adverse weather conditions, which ultimately compromise its accuracy. Furthermore, the lack of transparency in the manual process may lead to intentional underreporting, which undermines the integrity of fisheries' data. In contrast, an intelligent, real-time identification system can reduce the need for human labor, assist in more accurate identification, and enhance transparency in fisheries' management. This system not only provides reliable data for refined management but also enables fisheries' authorities to dynamically adjust fishing strategies in real time, issue timely warnings when catch limits are approached or exceeded, and prevent overfishing, thus ultimately contributing to sustainable tuna management. In light of this need, this article proposes the RSNC-YOLO algorithm, an intelligent model designed for recognizing tuna in complex scenarios on fishing vessels. Based on YOLOv8s-seg, RSNC-YOLO integrates Reparameterized C3 (RepC3), Selective Channel Down-sampling (SCDown), a Normalization-based Attention Module (NAM), and C2f-DCNv3-DLKA modules. By utilizing a subset of images selected from the Fishnet Open Image Database, the model achieves a 2.7% improvement in mAP@0.5 and a 0.7% improvement in mAP@0.5:0.95. Additionally, the number of parameters is reduced by approximately 30%, and the model's weight size is reduced by 9.6 MB, while maintaining an inference speed comparable to that of YOLOv8s-seg. Full article
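The re-parameterization idea behind RepC3/RepConv, where parallel 3x3 and 1x1 branches used during training are merged into a single 3x3 kernel at inference, can be verified numerically in a few lines of PyTorch. Batch-norm fusion, which the real block also performs, is omitted here, and the tensor sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

# Training-time block: a 3x3 conv and a 1x1 conv run in parallel and are summed.
k3 = torch.randn(16, 8, 3, 3)   # 3x3 branch weights (out_ch=16, in_ch=8)
k1 = torch.randn(16, 8, 1, 1)   # 1x1 branch weights
b3 = torch.randn(16)
b1 = torch.randn(16)

x = torch.randn(2, 8, 32, 32)
y_train = F.conv2d(x, k3, b3, padding=1) + F.conv2d(x, k1, b1, padding=0)

# Inference-time fusion: pad the 1x1 kernel to 3x3 (value at the centre tap)
# and add the kernels and biases, so the extra branch costs nothing at run time.
k_fused = k3 + F.pad(k1, (1, 1, 1, 1))
b_fused = b3 + b1
y_infer = F.conv2d(x, k_fused, b_fused, padding=1)

print(torch.allclose(y_train, y_infer, atol=1e-5))   # True: identical outputs, one conv
```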
Figures:
Figure 1: Samples of selected images from the Fishnet Open Image Database.
Figure 2: The architecture of RSNC-YOLO.
Figure 3: The structure of RepC3 in detail. (a) The structure of RepC3. (b) The structure of RepConv during training. (c) The structure of RepConv during inference.
Figure 4: The structure of the C2f-DCNv3-DLKA.
Figure 5: The structure of the NAMAttention and SCDown block. (a) NAMAttention. (b) SCDown.
Figure 6: Results of the RSNC-YOLO. (a) Loss curves for the RSNC-YOLO training process. (b) Loss curves for the RSNC-YOLO validation process. (c) Comparison of mAP@0.5 of YOLOv8-seg and RSNC-YOLO. (d) Comparison of mAP@0.5:0.95 of YOLOv8-seg and RSNC-YOLO.
Figure 7: Precision–Recall curves of RSNC-YOLO and YOLOv8. (a) The curve represents the Precision–Recall curve of RSNC-YOLO. (b) The curve shows the Precision–Recall curve of the original YOLOv8-seg.
Figure 8: Comparison of the RSNC-YOLO's and the original model's recognition results.
17 pages, 8599 KiB  
Article
Att-BEVFusion: An Object Detection Algorithm for Camera and LiDAR Fusion Under BEV Features
by Peicheng Shi, Mengru Zhou, Xinlong Dong and Aixi Yang
World Electr. Veh. J. 2024, 15(11), 539; https://doi.org/10.3390/wevj15110539 - 20 Nov 2024
Viewed by 110
Abstract
To improve the accuracy of detecting small and long-distance objects while self-driving cars are in motion, in this paper we propose a 3D object detection method, Att-BEVFusion, which fuses camera and LiDAR data in a bird's-eye view (BEV). First, the transformation from the camera view to the BEV space is achieved through an implicit supervision-based method, and then the LiDAR point cloud is voxelized and converted into BEV features. Then, a channel attention mechanism is introduced to design a BEV feature fusion network that realizes the fusion of the camera BEV feature space and the LiDAR BEV feature space. Finally, to address the issue of insufficient global reasoning in the BEV fusion features generated by the channel attention mechanism, as well as the challenge of inadequate interaction between features, we further develop a BEV self-attention mechanism that applies global operations on the features. This paper evaluates the effectiveness of the Att-BEVFusion fusion algorithm on the nuScenes dataset, and the results demonstrate that the algorithm achieved 72.0% mean average precision (mAP) and a 74.3% nuScenes detection score (NDS), with detection accuracies of 88.9% and 91.8% for single-class detection of the automotive and pedestrian categories, respectively. Full article
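A minimal PyTorch sketch of a squeeze-and-excitation style channel attention gate applied to concatenated camera and LiDAR BEV features. The concatenate-then-gate layout and the reduction ratio are assumptions, not the paper's exact fusion network.

```python
import torch
import torch.nn as nn

class ChannelAttentionFusion(nn.Module):
    """Concatenate camera-BEV and LiDAR-BEV features, then reweight channels
    with a squeeze-and-excitation style gate (reduction ratio r)."""
    def __init__(self, cam_ch: int, lidar_ch: int, r: int = 16):
        super().__init__()
        ch = cam_ch + lidar_ch
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                   # squeeze: global average per channel
            nn.Conv2d(ch, ch // r, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),   # excitation: per-channel weights in (0, 1)
        )

    def forward(self, cam_bev, lidar_bev):
        fused = torch.cat([cam_bev, lidar_bev], dim=1)  # (B, C_cam + C_lidar, H, W)
        return fused * self.gate(fused)                  # channel-wise reweighting

out = ChannelAttentionFusion(80, 256)(torch.randn(1, 80, 180, 180),
                                      torch.randn(1, 256, 180, 180))
```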
Figures:
Figure 1: Comparison between BEVFusion and our proposed method, Att-BEVFusion, which shows that our method is able to effectively detect both distant and occluded objects.
Figure 2: Overall structure diagram of Att-BEVFusion.
Figure 3: Extraction of image features.
Figure 4: Transformation of LiDAR point cloud data to BEV features.
Figure 5: Structure of the channel attention mechanism (where r is the ratio of compression).
Figure 6: Structure of the self-attention mechanism.
Figure 7: Att-BEVFusion qualitative detection results.
15 pages, 6086 KiB  
Article
Improved Visual SLAM Algorithm Based on Dynamic Scenes
by Jinxing Niu, Ziqi Chen, Tao Zhang and Shiyu Zheng
Appl. Sci. 2024, 14(22), 10727; https://doi.org/10.3390/app142210727 - 20 Nov 2024
Viewed by 159
Abstract
This work presents a novel RGB-D dynamic simultaneous localization and mapping (SLAM) method that improves accuracy, stability, and efficiency of localization while relying on deep learning in a dynamic environment, in contrast to traditional static scene-based visual SLAM methods. Based on the classic framework of traditional visual SLAM, we propose a method that replaces the traditional feature extraction method with a convolutional neural network approach, aiming to enhance the accuracy of feature extraction and localization, as well as to improve the algorithm’s ability to capture and represent the characteristics of the entire scene. Subsequently, the semantic segmentation thread was utilized in a target detection network combined with geometric methods to identify potential dynamic areas in the image and generate masks for dynamic objects. Finally, the standard deviation of the depth information of potential dynamic points was calculated to identify true dynamic feature points, to guarantee that static feature points were used for position estimation. We performed experiments based on the public datasets to validate the feasibility of the proposed algorithm. The experimental results indicate that the improved SLAM algorithm, which boasts a reduction in absolute trajectory error (ATE) by approximately 97% compared to traditional static visual SLAM and about 20% compared to traditional dynamic visual SLAM, also exhibited a 68% decrease in computation time compared to well-known dynamic visual SLAM, thereby possessing absolute advantages in both positioning accuracy and operational efficiency. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
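The depth-based check for true dynamic points described above can be sketched as a simple standard-deviation threshold over the depth values inside a candidate mask; the threshold and minimum sample count below are placeholders, not values from the paper.

```python
import numpy as np

def is_dynamic_region(depths: np.ndarray,
                      std_thresh: float = 0.15,
                      min_samples: int = 10) -> bool:
    """Decide whether a candidate region is truly dynamic from the spread of its
    depth readings: a large standard deviation suggests a moving object standing
    in front of the background, while a small spread suggests a static surface."""
    valid = depths[np.isfinite(depths) & (depths > 0)]   # drop missing depth readings
    if valid.size < min_samples:                          # too few samples to decide
        return False
    return float(np.std(valid)) > std_thresh

print(is_dynamic_region(np.array([1.8, 1.9, 3.5, 3.6, 1.85, 3.55])))  # True: bimodal depths
```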
Figures:
Figure 1: Overview of the enhanced SLAM system. The framework of the algorithm comprises four threads: semantic segmentation, tracking, local mapping, and loop closing.
Figure 2: GCNv2 feature extraction network structure, with channel numbers listed below each convolutional layer.
Figure 3: YOLOv5's network architecture diagram.
Figure 4: (a,b) and (c,d) are the semantic segmentation results based on the modified SLAM. Red indicates the detection boxes from YOLOv5x for object detection, while green represents the extracted feature points.
Figure 5: Comparison of feature point distribution between ORB and GCNv2. The scenes in (a,b) are cluttered with various objects, including computer screens, which made it difficult to obtain features. The images in (c,d) were taken from the corner of a table where the camera was moving, resulting in significant changes in viewpoint.
Figure 6: Comparison of the ATE of the improved SLAM and ORB-SLAM2 across five dynamic scene sequences from the fr3 dataset: (a–e) the trajectory maps of ORB-SLAM2; (f–j) the trajectory maps of the improved SLAM.
Figure 7: Results for the fr3_walking_xyz sequence. Panels (a,b) illustrate the estimated trajectories compared to the ground truth, as well as the errors along the x, y, and z axes for ORB-SLAM2 and the improved SLAM. Panel (c) displays the time consumption for each method.
26 pages, 10461 KiB  
Article
Accuracy and Precision of Shallow-Water Photogrammetry from the Sea Surface
by Elisa Casella, Giovanni Scicchitano and Alessio Rovere
Remote Sens. 2024, 16(22), 4321; https://doi.org/10.3390/rs16224321 - 19 Nov 2024
Viewed by 339
Abstract
Mapping shallow-water bathymetry and morphology represents a technical challenge. In fact, acoustic surveys are limited by water depths reachable by boat, and airborne surveys have high costs. Photogrammetric approaches (either via drone or from the sea surface) have opened up the possibility to perform shallow-water surveys easily and at accessible costs. This work presents a simple, low-cost, and highly portable platform that allows gathering sequential photos and echosounder depth values of shallow-water sites (up to 5 m depth). The photos are then analysed in conjunction with photogrammetric techniques to obtain digital bathymetric models and orthomosaics of the seafloor. The workflow was tested on four repeated surveys of the same area in the Western Mediterranean and allowed obtaining digital bathymetric models with centimetric average accuracy and precision and root mean square errors within a few decimetres. The platform presented in this work can be employed to obtain first-order bathymetric products, enabling the contextual establishment of the depth accuracy of the final products. Full article
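The reported accuracy (average difference) and RMSE against echosounder control points amount to a few lines of NumPy; the depth values in the example below are made up for illustration.

```python
import numpy as np

def dbm_accuracy(dbm_depths: np.ndarray, echo_depths: np.ndarray):
    """Mean difference (accuracy/bias) and RMSE between depths sampled from the
    digital bathymetric model and control echosounder depths at the same points."""
    diff = dbm_depths - echo_depths
    return float(np.mean(diff)), float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical control points (metres): a small bias and decimetre-level scatter.
dbm = np.array([1.02, 2.51, 3.48, 4.10, 0.95])
echo = np.array([1.00, 2.60, 3.40, 4.25, 0.90])
bias, rmse = dbm_accuracy(dbm, echo)
print(f"mean difference = {bias:.3f} m, RMSE = {rmse:.3f} m")
```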
Figures:
Figure 1: (A) Map of the Italian Peninsula; the star indicates the study site. The site where the test area (dashed line) is located, as seen in (B) an orthomosaic of the area (background image from Google Earth, 2022) and (C) an oblique drone photo.
Figure 2: Field setup used in this study. An operator working in snorkelling drags a diver's buoy, on top of which are fixed a dry case with a GNSS receiver (1) and a mobile phone (2). Fixed on the underwater part of the diver's buoy are a GoPro camera (3) and a portable echosounder (4). See text for details. The drawing is not to scale.
Figure 3: Example of results obtained using the workflow outlined in the main text. (A) Grid pattern followed by the snorkelling operator. (B) Orthomosaic (with hillshade in the background). (C) Digital bathymetric model (DBM) and echosounder points. Panels A, B, and C refer to the survey performed on 13 August 2020; the same results for all surveys are shown in Figure A2. (D–G) show an example of a picture for each survey date. The location pin (also shown in panel B) helps orient the image and place it in the reconstructed scene.
Figure 4: Percentage of points and corresponding confidence calculated by Agisoft Metashape. Note that the surveys of 28 July and 13 August have higher confidence than the other two surveys, for which fewer photos were aligned by the program.
Figure 5: Histograms showing the depth differences between DBM depths and control echosounder points (which represent the accuracy of each DBM), with the average difference and RMSE for each survey date (panels A–D). For a plot of echosounder depths versus DBM depths, see Figure A4.
Figure A1: (A) Screenshot of the echosounder during data collection. The upper part shows the map location, while the lower part shows the sonogram surveyed by the echosounder. (B) Picture of the GNSS screen. These data are needed to synchronise the pictures taken with the GoPro camera with GNSS time.
Figure A2: Same as in Figure 3, but for all survey dates. The orthomosaics and DBMs shown here are not aligned to the 13 August one.
Figure A3: Heatmap showing the RMSE between echosounder control points and DBM depths, divided by survey date and depth bin. Darker blue colors represent higher RMSE.
Figure A4: Scatterplots of DBM depths (x-axis) versus echosounder point depths (y-axis) for each survey date (panels A–D).
Figure A5: Maps of the differences between DBMs from surveys performed on different dates.
Figure A6: Histograms showing the differences between DBMs from surveys performed on different dates.
22 pages, 5386 KiB  
Article
A Novel Multi-Sensor Nonlinear Tightly-Coupled Framework for Composite Robot Localization and Mapping
by Lu Chen, Amir Hussain, Yu Liu, Jie Tan, Yang Li, Yuhao Yang, Haoyuan Ma, Shenbing Fu and Gun Li
Sensors 2024, 24(22), 7381; https://doi.org/10.3390/s24227381 - 19 Nov 2024
Viewed by 205
Abstract
Composite robots often encounter difficulties due to changes in illumination, external disturbances, reflective surface effects, and cumulative errors. These challenges significantly hinder their environmental perception capabilities and the accuracy and reliability of pose estimation. We propose a nonlinear optimization approach to overcome these issues and develop an integrated localization and navigation framework, IIVL-LM (IMU, Infrared, Vision, and LiDAR Fusion for Localization and Mapping). This framework achieves tightly coupled integration at the data level using inputs from an IMU (Inertial Measurement Unit), an infrared camera, an RGB (Red, Green and Blue) camera, and LiDAR. We propose a real-time luminance calculation model and verify its conversion accuracy. Additionally, we design a fast approximation method for the nonlinear weighted fusion of features from infrared and RGB frames based on luminance values. Finally, we optimize the VIO (Visual-Inertial Odometry) module in the R3LIVE++ (Robust, Real-time, Radiance Reconstruction with LiDAR-Inertial-Visual state Estimation) framework by exploiting the infrared camera's ability to acquire depth information. In a controlled study on a simulated indoor rescue scenario dataset, the IIVL-LM system demonstrated significant performance gains under challenging luminance conditions, particularly in low-light environments: the average RMSE ATE (Root Mean Square Error of the Absolute Trajectory Error) improved by 23% to 39%, with reductions ranging from 0.006 to 0.013. In comparative experiments on the publicly available TUM-VI (Technical University of Munich Visual-Inertial) dataset, which provides no infrared image input, the system achieved no leading results, confirming the importance of infrared image fusion. By keeping at least three sensors actively engaged at all times, the IIVL-LM system significantly boosts its robustness in both unknown and expansive environments while ensuring high precision. This enhancement is particularly critical for applications in complex environments, such as indoor rescue operations. Full article
(This article belongs to the Special Issue New Trends in Optical Imaging and Sensing Technologies)
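The abstract above describes a real-time luminance model and a nonlinear, luminance-weighted fusion of infrared and RGB features, but not its exact formulation. The sketch below is only a rough illustration of that idea under stated assumptions: luminance is approximated with the standard Rec. 709 coefficients, and the nonlinear weight is a generic sigmoid; the function names and parameters are hypothetical and are not the IIVL-LM implementation.

```python
import numpy as np

def normalized_luminance(rgb):
    """Approximate scene luminance in [0, 1] from an RGB image in [0, 1].
    Uses Rec. 709 luma coefficients as a generic stand-in for the paper's
    real-time luminance model."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    lum = (0.2126 * r + 0.7152 * g + 0.0722 * b).mean()
    return float(np.clip(lum, 0.0, 1.0))

def fusion_weight(lum, k=10.0, mid=0.5):
    """Nonlinear (sigmoid-shaped) weight for the RGB branch: bright scenes
    favour RGB features, dark scenes favour infrared features.
    k and mid are illustrative tuning parameters."""
    return 1.0 / (1.0 + np.exp(-k * (lum - mid)))

def fuse_features(rgb_feat, ir_feat, lum):
    """Weighted blend of same-shaped RGB and infrared feature maps."""
    w = fusion_weight(lum)
    return w * rgb_feat + (1.0 - w) * ir_feat

# Toy example: a dim frame pushes the weight towards the infrared features.
rgb_frame = np.random.rand(480, 640, 3) * 0.2
rgb_feat = np.random.rand(64, 80, 128)
ir_feat = np.random.rand(64, 80, 128)
lum = normalized_luminance(rgb_frame)
fused = fuse_features(rgb_feat, ir_feat, lum)
print(f"luminance = {lum:.3f}, RGB weight = {fusion_weight(lum):.3f}")
```

The design intent illustrated here is simply that dark scenes shift the fused representation towards the infrared branch, which still carries usable structure when RGB features degrade.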
Show Figures

Figure 1. IIVL-LM system framework applied to the composite robot.
Figure 2. Schematic diagram of each module of the IIVL-LM system.
Figure 3. Feature extraction performance of RGB and infrared images under extreme illuminance values. (a) Feature extraction performance of RGB images at a normalized illuminance value of 0.148. (b) Feature extraction performance of infrared images at a normalized illuminance value of 0.853.
Figure 4. Weight-based nonlinear interpolation frame method.
Figure 5. VIO (Optimized Visual-Inertial Odometry) module.
Figure 6. Schematic diagram of the IIVL-LM system and sensors deployed on the composite robots. (a) Multi-sensor setup. (b) Composite robots.
Figure 7. Comparison of X/Y-axis data and the actual trajectory of the composite robots under the IIVL-LM system. (a) Comparison of X-axis data. (b) Comparison of Y-axis data. (c) Actual testing and running trajectory of the composite robots.
Figure 8. Feature extraction results of the VIO module using RGB, infrared, and depth images under different lighting conditions in a small-scale indoor simulated environment. (a) Extraction of environmental features from RGB frames during the day. (b) Extraction of environmental features from infrared frames during the day. (c) Extraction of environmental features from infrared frames during the night. (d) Feature coordinates with depth values in the depth image.
Figure 9. Real-time reconstruction process and radiance map of the small-scale indoor environment. (a) Real-time reconstruction process of the map. (b) Reconstructed radiance map of the small-scale indoor environment.
Figure 10. Test results and comparison under different illuminances. (a) RMSE ATE of all methods under different illuminance values. (b) Comparison between the various methods and the overall average.
Figure 11. Test results and comparison across multiple sequences of the TUM-VI dataset. (a) RMSE ATE of all methods across multiple sequences of the TUM-VI dataset. (b) Comparison between the various methods and the overall average.
Figure 12. The test scenario on ORB-SLAM3.
17 pages, 9544 KiB  
Article
Recognition of Maize Tassels Based on Improved YOLOv8 and Unmanned Aerial Vehicles RGB Images
by Jiahao Wei, Ruirui Wang, Shi Wei, Xiaoyan Wang and Shicheng Xu
Drones 2024, 8(11), 691; https://doi.org/10.3390/drones8110691 - 19 Nov 2024
Viewed by 349
Abstract
The tasseling stage of maize, as a critical period of maize cultivation, is essential for predicting maize yield and understanding the normal condition of maize growth. During seedling growth, the branches overlap each other and cannot be used as an identifying feature, whereas during the tasseling stage the apical ear blooms and shows distinctive features that can serve as one. However, maize tassels are small, the background is complex, and existing networks exhibit obvious recognition errors. Therefore, in this paper, unmanned aerial vehicle (UAV) RGB images and an improved YOLOv8 target detection network are used to enhance the recognition accuracy of maize tassels. In the new network, a microscale target detection head is added to increase the ability to perceive small-sized maize tassels. In addition, Spatial Pyramid Pooling-Fast (SPPF) is replaced by the Spatial Pyramid Pooling with Efficient Layer Aggregation Network (SPPELAN) in the backbone to connect different levels of detailed features and semantic information. Moreover, a dual-attention module combining GAM and CBAM is added to the neck to reduce the loss of maize tassel features, thus improving the network's detection ability. We also labeled a new maize tassel dataset in VOC format for training and validating the network model. In the final model tests, the new network reached a precision of 93.6% and a recall of 92.5%, an improvement of 2.8–12.6 and 3.6–15.2 percentage points over the mAP50 and F1-score values of the other models, respectively. The experimental results show that the improved YOLOv8 network, with its high performance and robustness in small-sized maize tassel recognition, can accurately recognize maize tassels in UAV images, providing technical support for automated counting, precise cultivation, and large-scale intelligent cultivation of maize seedlings. Full article
(This article belongs to the Special Issue Advances of UAV in Precision Agriculture)
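For context, the precision (93.6%) and recall (92.5%) reported above imply an F1-score of roughly 0.930, since F1 is the harmonic mean of the two. The short snippet below only reproduces that arithmetic; it is not code from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values reported in the abstract above
print(f"F1 = {f1_score(0.936, 0.925):.3f}")  # -> F1 = 0.930
```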
19 pages, 18572 KiB  
Article
MSG-YOLO: A Lightweight Detection Algorithm for Clubbing Finger Detection
by Zhijie Wang, Qiao Meng, Feng Tang, Yuelin Qi, Bingyu Li, Xin Liu, Siyuan Kong and Xin Li
Electronics 2024, 13(22), 4549; https://doi.org/10.3390/electronics13224549 - 19 Nov 2024
Viewed by 295
Abstract
Clubbing finger is a significant clinical indicator, and its early detection is essential for the diagnosis and treatment of associated diseases. However, traditional diagnostic methods rely heavily on the clinician’s subjective assessment, which can be prone to biases and may lack standardized tools. [...] Read more.
Clubbing finger is a significant clinical indicator, and its early detection is essential for the diagnosis and treatment of associated diseases. However, traditional diagnostic methods rely heavily on the clinician’s subjective assessment, which can be prone to biases and may lack standardized tools. Unlike other diagnostic challenges, the characteristic changes of clubbing finger are subtle and localized, necessitating high-precision feature extraction. Existing models often fail to capture these delicate changes accurately, potentially missing crucial diagnostic features or generating false positives. Furthermore, these models are often not suited for accurate clinical diagnosis in resource-constrained settings. To address these challenges, we propose MSG-YOLO, a lightweight clubbing finger detection model based on YOLOv8n, designed to enhance both detection accuracy and efficiency. The model first employs a multi-scale dilated residual module, which expands the receptive field using dilated convolutions and residual connections, thereby improving the model’s ability to capture features across various scales. Additionally, we introduce a Selective Feature Fusion Pyramid Network (SFFPN) that dynamically selects and enhances critical features, optimizing the flow of information while minimizing redundancy. To further refine the architecture, we reconstruct the YOLOv8 detection head with group normalization and shared-parameter convolutions, significantly reducing the model’s parameter count and increasing computational efficiency. Experimental results indicate that the model maintains high detection accuracy with reduced parameter and computational requirements. Compared to YOLOv8n, MSG-YOLO achieves a 48.74% reduction in parameter count and a 24.17% reduction in computational load, while improving the mAP0.5 score by 2.86%, reaching 93.64%. This algorithm strikes a balance between accuracy and lightweight design, offering efficient and reliable clubbing finger detection even in resource-constrained environments. Full article
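The multi-scale dilated residual module described above (parallel dilated convolutions plus a residual connection; the Figure 2 caption below specifies dilation rates of 1, 3, and 5) can be sketched in PyTorch roughly as follows. This is an illustrative interpretation under those assumptions, not the authors' MDR implementation; the channel counts and the 1x1 fusion layer are placeholders.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedResidual(nn.Module):
    """Illustrative multi-scale dilated residual block: three parallel 3x3
    convolutions with dilation rates 1, 3, and 5, concatenated, fused by a
    1x1 convolution, and added back to the input (residual connection)."""

    def __init__(self, channels: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.SiLU(),
            )
            for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(multi_scale)  # residual connection

# Quick shape check on a dummy feature map
block = MultiScaleDilatedResidual(64)
out = block(torch.randn(1, 64, 80, 80))
print(out.shape)  # torch.Size([1, 64, 80, 80])
```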
Show Figures

Figure 1. Overall architecture of the improved YOLO model, comprising three primary components: Backbone, Neck, and Head. The Backbone module extracts multi-level features from the input image, the Neck module merges and processes multi-scale features, and the Head module produces the target detection results. The SPPF module aggregates multi-scale information through repeated MaxPool2d operations and concatenation, and the Conv module consists of Conv2d, BatchNorm2d, and the SiLU activation function.
Figure 2. Structure diagram of the C2f_MDR module. (a) Overall structure of the C2f_MDR module: the input feature map undergoes convolution, splitting, and multiple MDR modules, followed by concatenation and convolution to restore the channel count and generate the output feature map. (b) Detailed structure of the MDR module, which consists of three convolutional layers with different dilation rates (1, 3, and 5) to capture contextual information at different scales.
Figure 3. Schematic diagram of the SFFPN module. In the Feature Selection module, channel attention (CA) and element-wise multiplication (⊗) adaptively adjust the weights of the input features, followed by a convolution layer (kernel size k = 1). The Feature Selection Fusion module then upsamples features at different scales through transposed convolution (ConvTranspose), followed by concatenation (Concat) to fuse multi-scale features. The fused feature map is passed to the C2f module for further optimization, providing higher-quality features for object detection.
Figure 4. Structure diagram of the GNSCD module. Feature maps at different scales are first processed with group-normalization convolutions (Conv_GN), followed by shared convolution layers (kernel sizes k = 1 and k = 5) to extract multi-scale features, thereby enhancing the model's ability to detect objects at different scales.
Figure 5. Dataset example: image (a) shows a clubbed finger, while image (b) shows a normal finger. Characteristics of clubbed fingers include nail-fold angles greater than 180°, whereas normal fingers generally have nail-fold angles less than 180°.
Figure 6. Comparison of detection results between the MSG-YOLO and YOLOv8n models. (a) Performance of the MSG-YOLO model on the clubbed finger detection task. (b) Results from the YOLOv8n model on the same task.
Figure 7. Detection results of MSG-YOLO on different samples. (a) Bounding boxes for normal and clubbed finger samples, with each box labeled with the detected category and confidence score. (b) Corresponding heatmaps highlighting the areas the model focuses on during detection; the high-intensity red and yellow regions indicate significant features identified by the model in these areas.