Search Results (583)

Search Parameters:
Keywords = dilated Convolution

21 pages, 24146 KiB  
Article
SMEP-DETR: Transformer-Based Ship Detection for SAR Imagery with Multi-Edge Enhancement and Parallel Dilated Convolutions
by Chushi Yu and Yoan Shin
Remote Sens. 2025, 17(6), 953; https://doi.org/10.3390/rs17060953 - 7 Mar 2025
Abstract
Synthetic aperture radar (SAR) serves as a pivotal remote sensing technology, offering critical support for ship monitoring, environmental observation, and national defense. Although optical detection methods have achieved good performance, SAR imagery still faces challenges, including speckle, complex backgrounds, and small, dense targets. Reducing false alarms and missed detections while improving detection performance remains a key objective in the field. To address these issues, we propose SMEP-DETR, a transformer-based model with multi-edge enhancement and parallel dilated convolutions. This model integrates a speckle denoising module, a multi-edge information enhancement module, and a parallel dilated convolution and attention pyramid network. Experimental results demonstrate that SMEP-DETR achieves a high mAP of 98.6% on SSDD, 93.2% on HRSID, and 80.0% on LS-SSDD-v1.0, surpassing several state-of-the-art algorithms. Visualization results validate the model’s capability to effectively mitigate the impact of speckle noise while preserving valuable information in both inshore and offshore scenarios. Full article
(This article belongs to the Special Issue Remote Sensing Image Thorough Analysis by Advanced Machine Learning)
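The parallel dilated convolution idea in the abstract can be illustrated with a short, generic PyTorch sketch: several 3×3 convolutions with different dilation rates run in parallel over the same feature map, and their outputs are concatenated and fused by a 1×1 convolution. The dilation rates, channel counts, and module name below are illustrative assumptions, not the exact layers of SMEP-DETR.

```python
import torch
import torch.nn as nn

class ParallelDilatedBlock(nn.Module):
    """Run 3x3 convolutions with several dilation rates in parallel and fuse them.

    Illustrative sketch only; rates and channels are assumptions, not SMEP-DETR's design.
    """
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                # padding == dilation keeps the spatial size for a 3x3 kernel
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for r in rates
        ])
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, 1)  # 1x1 conv after concat

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.randn(1, 64, 80, 80)
print(ParallelDilatedBlock(64, 64)(x).shape)  # torch.Size([1, 64, 80, 80])
```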
Show Figures

Figure 1. The architecture of the proposed SMEP-DETR. Ⓒ denotes the concatenate operation and ⊕ denotes the element-wise add operation.
Figure 2. The structure of the multi-edge information enhancement module.
Figure 3. Diagram of the parallel dilated convolution and attention pyramid network.
Figure 4. Visualization of SMEP-DETR and comparison detectors on SSDD: (a) inshore scene with large-scale ship targets, (b) inshore scene with both large and small ships, (c) offshore scene with significant speckle interference, (d) offshore scene with multiple targets. Red bounding boxes represent predicted ships, yellow ellipses indicate missing detections, and blue ellipses denote false alarms.
Figure 5. Visualization of SMEP-DETR and comparison detectors on HRSID and LS-SSDD-v1.0. (a,b) Samples from HRSID, (c,d) samples of LS-SSDD-v1.0. (a) Offshore scene with closely spaced targets, (b) inshore scene with docked objects near the shoreline, (c) offshore scene containing extremely small targets, (d) inshore scene with extensive background information. Red bounding boxes represent predicted ships, yellow ellipses indicate missing detections, and blue ellipses denote false alarms.
19 pages, 5899 KiB  
Article
DGBL-YOLOv8s: An Enhanced Object Detection Model for Unmanned Aerial Vehicle Imagery
by Chonghao Wang and Huaian Yi
Appl. Sci. 2025, 15(5), 2789; https://doi.org/10.3390/app15052789 - 5 Mar 2025
Viewed by 169
Abstract
Unmanned aerial vehicle (UAV) imagery often suffers from significant object scale variations, high target density, and varying distances due to shooting conditions and environmental factors, leading to reduced robustness and low detection accuracy in conventional models. To address these issues, this study proposes DGBL-YOLOv8s, an improved object detection model tailored for UAV perspectives based on YOLOv8s. First, a Dilated Wide Residual (DWR) module is introduced to replace the C2f module in the backbone network of YOLOv8, enhancing the model’s capability to capture fine-grained features and contextual information. Second, the neck structure is redesigned by incorporating a Global-to-Local Spatial Aggregation (GLSA) module combined with a Bidirectional Feature Pyramid Network (BiFPN), which strengthens feature fusion. Third, a lightweight shared convolution detection head is proposed, incorporating shared convolution and batch normalization techniques. Additionally, to further improve small object detection, a dedicated small-object detection head is introduced. Results from experiments on the VisDrone dataset reveal that DGBL-YOLOv8s enhances detection accuracy by 8.5% relative to the baseline model, alongside a 34.8% reduction in parameter count. The overall performance exceeds that of most current detection models, confirming the advantages of the proposed improvements. Full article
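The DWR module above is motivated by enlarging the receptive field with dilated convolutions. The receptive-field arithmetic behind such comparisons (see Figures 7 and 8 below) follows a standard formula: a k×k convolution with dilation d behaves like a kernel of effective size k + (k−1)(d−1), and stacking layers grows the receptive field accordingly. The small Python helper below applies that generic formula; the example layer stack is illustrative, not DGBL-YOLOv8s's actual backbone.

```python
def effective_kernel(k, d):
    """Effective kernel size of a k x k convolution with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) layers.

    Standard recursion: r <- r + (k_eff - 1) * jump; jump <- jump * stride.
    """
    r, jump = 1, 1
    for k, d, s in layers:
        r += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return r

# e.g. three 3x3 convolutions with dilations 1, 3, 5 (illustrative rates only)
print(receptive_field([(3, 1, 1), (3, 3, 1), (3, 5, 1)]))  # 19
```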
Show Figures

Figure 1. Architecture of the DGBL-YOLOv8.
Figure 2. C2f_DWR.
Figure 3. Architecture of the DWR module.
Figure 4. Architecture of the GLBA-BiFPN.
Figure 5. Architecture of the GLSA module.
Figure 6. Architecture of the LSCDH.
Figure 7. Comparison of the changes in the receptive field of the backbone network.
Figure 8. Line graph of receptive field contrast.
Figure 9. Daytime detection results: (a) unimproved detection results and (b) improved detection results.
Figure 10. (I) Night detection results: (a) unimproved detection results and (b) improved detection results. (II) Night detection results: (a) unimproved detection results and (b) improved detection results.
Figure 11. Comparison of false detection: (a) unimproved detection results and (b) improved detection results.
Figure 12. Comparison of missed detection: (a) unimproved detection results and (b) improved detection results.
Figure 13. mAP comparison between different model categories.
20 pages, 3815 KiB  
Article
A Benchmark for Water Surface Jet Segmentation with MobileHDC Method
by Yaojie Chen, Qing Quan, Wei Wang and Yunhan Lin
Appl. Sci. 2025, 15(5), 2755; https://doi.org/10.3390/app15052755 - 4 Mar 2025
Viewed by 138
Abstract
Intelligent jet systems are widely used in various fields, including firefighting, marine operations, and underwater exploration. Accurate extraction and prediction of jet trajectories are essential for optimizing their performance, but challenges arise due to environmental factors such as climate, wind direction, and suction efficiency. To address these issues, we introduce two novel jet segmentation datasets, Libary and SegQinhu, which cover both indoor and outdoor environments under varying weather conditions and temporal intervals. These datasets present significant challenges, including occlusions and strong light reflections, making them ideal for evaluating jet trajectory segmentation methods. Through empirical evaluation of several state-of-the-art (SOTA) techniques on these datasets, we observe that general methods struggle with highly imbalanced pixel distributions in jet trajectory images. To overcome this, we propose a data-driven pipeline for jet trajectory extraction and segmentation. At its core is MobileHDC, a new baseline model that leverages the MobileNetV2 architecture and integrates dilated convolutions to enhance the receptive field without increasing computational cost. Additionally, we introduce a parallel convolutional block and a decoder to fuse multi-level features, enabling a better capture of contextual information and improving the continuity and accuracy of jet segmentation. The experimental results show that our method outperforms existing SOTA techniques on both jet-specific datasets, highlighting the effectiveness of our approach. Full article
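MobileHDC stacks dilated convolutions on top of MobileNetV2 features (the figure captions below mention hybrid dilated convolution layers with C = 160 and C1 = 256 channels). A common way to stack dilated convolutions without the gridding artifact is to vary the rates, e.g. 1, 2, 5, as in hybrid dilated convolution. The sketch below assumes such a rate schedule and those channel counts; it is an illustration, not the paper's exact MobileHDC head.

```python
import torch
import torch.nn as nn

def dilated_bn_relu(in_ch, out_ch, rate):
    """3x3 dilated conv + BN + ReLU; padding = rate keeps H x W."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=rate, dilation=rate, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class HDCHead(nn.Module):
    """Hybrid dilated convolution stack: rates such as (1, 2, 5) avoid the
    gridding artifact of repeating a single rate. Rates/channels are assumptions."""
    def __init__(self, in_ch=160, mid_ch=256, rates=(1, 2, 5)):
        super().__init__()
        layers, ch = [], in_ch
        for r in rates:
            layers.append(dilated_bn_relu(ch, mid_ch, r))
            ch = mid_ch
        self.body = nn.Sequential(*layers)

    def forward(self, x):  # x: features from a MobileNetV2-style backbone
        return self.body(x)

feat = torch.randn(1, 160, 32, 32)   # stand-in for backbone features
print(HDCHead()(feat).shape)         # torch.Size([1, 256, 32, 32])
```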
Show Figures

Figure 1. Segmentation performance of the SAM model on the Libary and SegQinhu datasets, revealing issues in segmentation where the model tends to misclassify background as jet flow.
Figure 2. Relative frequency of annotated jet pixels within an image over the 1300 images in the Libary dataset (a) and the 823 images in the SegQinhu dataset (b), respectively. Here, the fraction of jet pixels serves as proxy for the size of the objects of interest within an image. (a) Libary, (b) SegQinhu.
Figure 3. Sample images of jet states from the Libary dataset under various conditions, including strong lighting and minimal pixel coverage.
Figure 4. Sample images of jet morphologies from the SegQinhu dataset under various conditions, including occlusion, partial coverage, and reflective scenarios.
Figure 5. An overview of the basic architecture of our proposed model. Here, we set the parameters N₁, N₂, N₃ for the repeated times as N₁ = 6, N₂ = 4 and N₃ = 2. The operation ⊕ represents the concatenation operation.
Figure 6. Diagram of dilated convolution. When the dilation rate is 1, it behaves identically to a standard convolution.
Figure 7. Diagram of hybrid dilated convolution layers, where C and C1 represent the number of channels, with C = 160 and C1 = 256, and r = a indicates the dilation rate = a. Additionally, xₛ represents the feature maps from the 7th layer of the MobileNetV2 network.
Figure 8. Visualization of the jet segmentation results of the different methods on the Libary testing dataset.
Figure 9. Visualization of the jet segmentation results of the different methods on the SegQinhu testing dataset.
21 pages, 3926 KiB  
Article
S4Det: Breadth and Accurate Sine Single-Stage Ship Detection for Remote Sense SAR Imagery
by Mingjin Zhang, Yingfeng Zhu, Longyi Li, Jie Guo, Zhengkun Liu and Yunsong Li
Remote Sens. 2025, 17(5), 900; https://doi.org/10.3390/rs17050900 - 4 Mar 2025
Viewed by 178
Abstract
Synthetic Aperture Radar (SAR) is a remote sensing technology that can realize all-weather, all-day monitoring, and it is widely used in ocean ship monitoring tasks. Recently, many oriented detectors have been used for ship detection in SAR images. However, these methods often find it difficult to balance detection accuracy and speed, and noise around targets in inshore SAR scenes degrades detection network performance. In addition, the rotation representation still suffers from boundary discontinuity. To address these issues, we propose S4Det, a Sinusoidal Single-Stage SAR image detection method that enables real-time oriented ship target detection. Two key mechanisms were designed to address inshore scene processing and angle regression challenges. Specifically, a Breadth Search Compensation Module (BSCM) resolved the limited detection capability issue observed within inshore scenarios. Neural Discrete Codebook Learning was strategically integrated with Multi-scale Large Kernel Attention, capturing context information around the target and mitigating the information loss inherent in dilated convolutions. To tackle the boundary discontinuity arising from the periodic nature of the target regression angle, we developed a Sine Fourier Transform Coding (SFTC) technique. The angle is represented using diverse sine components, and the discrete Fourier transform is applied to convert these periodic components to the frequency domain for processing. Finally, our S4Det achieved 92.2% mAP and 31+ FPS on the RSSDD dataset with an RTX A5000 GPU, outperforming prevalent mainstream oriented detection networks. The robustness of the proposed S4Det was also verified on another public dataset, RSDD. Full article
(This article belongs to the Section AI Remote Sensing)
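The SFTC idea encodes the regression angle θ ∈ [−π/2, π/2) with four phase-shifted sine components (initial phases 0°, 90°, 180°, 270°, see Figure 4 below) so that the loss no longer jumps at the angular boundary. Below is a minimal NumPy sketch of one plausible encode/decode pair of that kind; the exact component definition is an assumption, and the discrete Fourier transform step used in the paper is omitted.

```python
import numpy as np

PHASES = np.array([0.0, 0.5, 1.0, 1.5]) * np.pi  # 0, 90, 180, 270 degrees

def sine_encode(theta):
    """Encode an angle as four phase-shifted sine components."""
    return np.sin(theta + PHASES)

def sine_decode(components):
    """Recover the angle from the four components (four-step phase shift)."""
    s0, s1, s2, s3 = components
    return np.arctan2((s0 - s2) / 2.0, (s1 - s3) / 2.0)

theta = -0.4 * np.pi                         # some angle in [-pi/2, pi/2)
code = sine_encode(theta)
print(np.isclose(sine_decode(code), theta))  # True
```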
Show Figures

Figure 1. The overall framework of the proposed S4Det. BSCM (comprising MLKA and NDCL) is integrated into the top layer of the neck network, incorporating a convolutional attention mechanism for noise reduction and utilizing SFTC to encode angle information for loss calculation and training.
Figure 2. Comparison of the network design of the existing method (a) and our method (b). In our approach, the backbone and neck share a similar structure, while introducing the BSCM, attention detection head strategy, and angle SFTC.
Figure 3. Feature heatmap visualization of the outputs from MLKA, NDCL, and BSCM.
Figure 4. Coding and decoding process of SFTC. Sine encoding is applied to the predicted angle θ ∈ [−π/2, π/2), using a four-step phase shift method with initial phases set at 0, 90, 180, and 270 degrees.
Figure 5. Target area distributions.
Figure 6. Learning curves of the loss values for different dilation designs.
Figure 7. Learning curves of the loss values for different modules.
Figure 8. Detection performance for different M values in all scenes.
Figure 9. Learning curves of the loss values for different methods.
Figure 10. AP curves on RSDD. (a) AP₅₀ curve. (b) AP₇₅ curve. (c) mAP curve.
Figure 11. Visualization of the detection results of different methods on RSSDD. Red rectangles indicate the actual ship targets. Green and purple rectangles represent the detection results of five comparative methods and our method, respectively.
Figure 12. Visualization of the detection results of different methods on RSDD. The red rectangular detection boxes represent the actual ground truth annotation. The green and blue rectangular detection boxes denote the detection results of the five comparison methods and the proposed method, respectively. The red ellipses in the picture represent false detection, and the yellow ellipses represent missed detection.
21 pages, 21254 KiB  
Article
Lightweight Explicit 3D Human Digitization via Normal Integration
by Jiaxuan Liu, Jingyi Wu, Ruiyang Jing, Han Yu, Jing Liu and Liang Song
Sensors 2025, 25(5), 1513; https://doi.org/10.3390/s25051513 - 28 Feb 2025
Viewed by 165
Abstract
In recent years, generating 3D human models from images has gained significant attention in 3D human reconstruction. However, deploying large neural network models in practical applications remains challenging, particularly on resource-constrained edge devices. This problem arises primarily because large neural network models require significantly higher computational power, which imposes greater demands on hardware capabilities and increases inference time. To address this issue, we can optimize the network architecture to reduce the number of model parameters, thereby alleviating the heavy reliance on hardware resources. We propose a lightweight and efficient 3D human reconstruction model that balances reconstruction accuracy and computational cost. Specifically, our model integrates Dilated Convolutions and the Cross-Covariance Attention mechanism into its architecture to construct a lightweight generative network. This design effectively captures multi-scale information while significantly reducing model complexity. Additionally, we introduce an innovative loss function tailored to the geometric properties of normal maps. This loss function provides a more accurate measure of surface reconstruction quality and enhances the overall reconstruction performance. Experimental results show that, compared with existing methods, our approach reduces the number of training parameters by approximately 80% while maintaining the generated model’s quality. Full article
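The model pairs Dilated Convolutions with Cross-Covariance Attention, an attention variant that forms a C×C attention map over channels instead of an N×N map over tokens, so its cost scales linearly with the number of tokens. The sketch below follows the commonly published cross-covariance attention formulation (L2-normalized queries and keys, learnable temperature); head count and dimensions are illustrative and may differ from the paper's blocks.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossCovarianceAttention(nn.Module):
    """Cross-covariance attention: the C x C channel attention map makes the
    cost linear in the number of tokens N. Dimensions here are illustrative."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (B, N, C) token features
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 4, 1)   # each: (B, heads, C_head, N)
        q = F.normalize(q, dim=-1)             # normalize along the token axis
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature  # (B, heads, C_head, C_head)
        out = attn.softmax(dim=-1) @ v         # (B, heads, C_head, N)
        out = out.permute(0, 3, 1, 2).reshape(B, N, C)
        return self.proj(out)

x = torch.randn(2, 256, 64)
print(CrossCovarianceAttention(64)(x).shape)  # torch.Size([2, 256, 64])
```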
Show Figures

Figure 1. This figure demonstrates our method’s reconstruction capability. Our approach successfully generates detailed 3D human models. The top row shows the input images, the middle row presents the generated normal maps, and the bottom row displays the reconstructed 3D human models.
Figure 2. Overview. (A) Pipeline for 3D human reconstruction; (B) architectural framework of the loss function for normal map generation.
Figure 3. Overview of Lite-GN. The generative network is designed using an encoder–decoder architecture. Each module comprises Mₙ Dilated Convolution Blocks and a Cross-Covariance Attention Block.
Figure 4. The detailed architectures of the Dilated Convolution Block and Cross-Covariance Attention Block are illustrated.
Figure 5. Examples from the THuman2.0 dataset.
Figure 6. Existing methods exhibit varying limitations in 3D human reconstruction: PIFuHD [13] demonstrates competent clothing reconstruction capabilities but encounters challenges with complex pose estimation. While ICON [39] achieves reasonable overall performance, its ability to reconstruct intricate clothing details remains constrained. Similarly, PaMIR [14] shows insufficient reconstruction fidelity, particularly in capturing fine-grained surface details. In contrast, both ECON [19] and our proposed method demonstrate superior reconstruction quality, effectively addressing these limitations through advanced architectural designs.
Figure 7. User preference. Our method demonstrated higher user preference compared with the baseline approach. The user study results further validate our approach’s effectiveness in meeting the demand for high-quality human body reconstruction. In the figure, the orange color represents our method, while the blue color indicates the comparative method.
Figure 8. To enhance the reconstruction quality, the facial and hand regions of the generated model are replaced with the corresponding components from the SMPL-X model.
Figure 9. This figure demonstrates the generated texture maps from various viewpoints using different text prompts.
Figure 10. We developed a pose-parameter-driven avatar using generated human models based on the SCANimate [60] framework.
21 pages, 5606 KiB  
Article
CE-RoadNet: A Cascaded Efficient Road Network for Road Extraction from High-Resolution Satellite Images
by Ke-Nan Cheng, Weiping Ni, Han Zhang, Junzheng Wu, Xiao Xiao and Zhigang Yang
Remote Sens. 2025, 17(5), 831; https://doi.org/10.3390/rs17050831 - 27 Feb 2025
Viewed by 129
Abstract
The reconstruction of road networks from high-resolution satellite images is of significant importance across a range of disciplines, including traffic management, vehicle navigation and urban planning. However, existing models are computationally demanding and memory-intensive due to their high model complexity, rendering them impractical in many real-world applications. In this work, we present Cascaded Efficient Road Network (CE-RoadNet), a novel neural network architecture which emphasizes the elegance and simplicity of its design, while also retaining a noteworthy level of performance in road extraction tasks. First, a simple encoder–decoder architecture (Effi-RoadNet) is proposed, which leverages smoothed dilated convolutions combined with an attention-guided feature fusion module to aggregate features from multiple levels. Subsequently, an extended variant termed CE-RoadNet is designed in a cascaded architecture to enhance the feature representation ability of the model. Benefiting from the concise network design and the prominent representational ability of the stacking mechanism, our network achieves a better trade-off between accuracy and efficiency. Extensive experiments on public road datasets demonstrate that our approach achieves state-of-the-art results with lower complexity. All codes and models will be released soon to facilitate reproduction of our results. Full article
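Effi-RoadNet uses "smoothed" dilated convolutions. One published way to smooth a dilated convolution is to let neighbouring positions interact through a small depthwise convolution before the dilated kernel samples its sparse grid, which reduces gridding artifacts. The sketch below implements that generic idea; whether CE-RoadNet uses this exact variant, or these kernel sizes, is an assumption.

```python
import torch
import torch.nn as nn

class SmoothedDilatedConv(nn.Module):
    """Dilated 3x3 convolution preceded by a small depthwise 'smoothing'
    convolution, one generic way to reduce gridding artifacts. Whether
    CE-RoadNet uses exactly this variant is an assumption."""
    def __init__(self, channels, dilation):
        super().__init__()
        # a depthwise (2*d - 1) filter lets the holes of the dilated kernel
        # see their neighbours before the sparse sampling happens
        k = 2 * dilation - 1
        self.smooth = nn.Conv2d(channels, channels, k, padding=k // 2,
                                groups=channels, bias=False)
        self.dilated = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.dilated(self.smooth(x))) + x)  # residual

x = torch.randn(1, 64, 64, 64)
print(SmoothedDilatedConv(64, dilation=4)(x).shape)  # torch.Size([1, 64, 64, 64])
```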
Show Figures

Graphical abstract
Figure 1. Natural image and remote sensing image segmentation task.
Figure 2. The overall framework of the proposed method.
Figure 3. Smoothed dilated convolution residual block.
Figure 4. Illustration of attention-guided feature fusion sub-module.
Figure 5. Qualitative evaluations between CE-RoadNet and comparative methods on the DeepGlobe road dataset. Maps in the first line are 1024 × 1024 while the rest are 256 × 256.
Figure 6. Qualitative evaluations between CE-RoadNet and comparative methods on the Massachusetts Road dataset. Maps in the first line are 768 × 768 while the rest are 256 × 256.
Figure 7. Feature activation visualization. Attention maps and the feature embeddings from stage 1 and stage 2. We visualize the normalized activation map for each feature channel in small squares. Notably, we display the first 64 channels in each visualization.
Figure 8. APLS results on the DeepGlobe dataset of the different models. Each bubble’s area is proportional to the params of the corresponding model. All models here take 256² images as the input.
19 pages, 3572 KiB  
Article
MOSSNet: A Lightweight Dual-Branch Multiscale Attention Neural Network for Bryophyte Identification
by Haixia Luo, Xiangfen Zhang, Feiniu Yuan, Jing Yu, Hao Ding, Haoyu Xu and Shitao Hong
Symmetry 2025, 17(3), 347; https://doi.org/10.3390/sym17030347 - 25 Feb 2025
Viewed by 161
Abstract
Bryophytes, including liverworts, mosses, and hornworts, play an irreplaceable role in soil moisture retention, erosion prevention, and pollution monitoring. The precise identification of bryophyte species enhances our understanding and utilization of their ecological functions. However, their complex morphology and structural symmetry make identification difficult. Although deep learning improves classification efficiency, challenges remain due to limited datasets and the inadequate adaptation of existing methods to multi-scale features, causing poor performance in fine-grained multi-classification. Thus, we propose MOSSNet, a lightweight neural network for bryophyte feature detection. It has a four-stage architecture that efficiently extracts multi-scale features using a modular design with symmetry consideration in feature representation. At the input stage, the Convolutional Patch Embedding (CPE) module captures representative features through a two-layer convolutional structure. In each subsequent stage, Dual-Branch Multi-scale (DBMS) modules are employed, with one branch utilizing convolutional operations and the other utilizing the Dilated Convolution Enhanced Attention (DCEA) module for multi-scale feature fusion. The DBMS module extracts fine-grained and coarse-grained features by a weighted fusion of the outputs from the two branches. Evaluating MOSSNet on the self-constructed dataset BryophyteFine reveals a Top-1 accuracy of 99.02% in classifying 26 bryophyte species, 7.13% higher than the best existing model, while using only 1.58 M parameters and 0.07 G FLOPs. Full article
(This article belongs to the Section Computer)
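The DBMS module fuses a convolutional branch with a dilated-convolution attention branch through a weighted combination of the two outputs. A minimal sketch of such a dual-branch weighted fusion is given below, using simple learnable normalized weights; the branch contents and the weighting scheme are assumptions rather than MOSSNet's exact design.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Weighted fusion of a local conv branch and a dilated-conv branch.
    Branches and the normalized-weight scheme are illustrative assumptions."""
    def __init__(self, channels, dilation=3):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.context = nn.Conv2d(channels, channels, 3, padding=dilation,
                                 dilation=dilation, bias=False)
        self.w = nn.Parameter(torch.ones(2))   # learnable branch weights

    def forward(self, x):
        w = torch.relu(self.w)
        w = w / (w.sum() + 1e-4)               # normalize so the weights sum to ~1
        return w[0] * self.local(x) + w[1] * self.context(x)

x = torch.randn(1, 32, 56, 56)
print(DualBranchFusion(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```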
Show Figures

Figure 1. Demonstration of interclass similarity and intraclass variability.
Figure 2. The overall MOSSNet framework.
Figure 3. DBMS module detailed structure.
Figure 4. Image types in the BryophyteFine dataset.
Figure 5. Heat map of model classification confusion matrix.
Figure 6. Distribution of model parameters and Mean Average Precision.
20 pages, 3901 KiB  
Article
Design and Implementation of a Lightweight and Energy-Efficient Semantic Segmentation Accelerator for Embedded Platforms
by Hui Li, Jinyi Li, Bowen Li, Zhengqian Miao and Shengli Lu
Micromachines 2025, 16(3), 258; https://doi.org/10.3390/mi16030258 - 25 Feb 2025
Viewed by 220
Abstract
With the rapid development of lightweight network models and efficient hardware deployment techniques, the demand for real-time semantic segmentation in areas such as autonomous driving and medical image processing has increased significantly. However, realizing efficient semantic segmentation on resource-constrained embedded platforms still faces many challenges. As a classical lightweight semantic segmentation network, ENet has attracted much attention due to its low computational complexity. In this study, we optimize the ENet semantic segmentation network to significantly reduce its computational complexity through structural simplification and 8-bit quantization and improve its hardware compatibility through the optimization of on-chip data storage and data transfer, while maintaining 51.18% mIoU. The optimized network is successfully deployed on a hardware accelerator and SoC system based on the Xilinx ZYNQ ZCU104 FPGA. In addition, we optimize the computational units for transposed convolution and dilated convolution and improve the on-chip data storage and data transfer design. The optimized system achieves a frame rate of 130.75 FPS, which meets the real-time processing requirements in areas such as autonomous driving and medical imaging. Meanwhile, the power consumption of the accelerator is 3.479 W, the throughput reaches 460.8 GOPS, and the energy efficiency reaches 132.2 GOPS/W. These results fully demonstrate the effectiveness of the optimization and deployment strategies in achieving a balance between computational efficiency and accuracy, which makes the system well suited for resource-constrained embedded platform applications. Full article
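The accelerator relies on 8-bit quantization of the network. As a point of reference, a generic symmetric per-tensor INT8 scheme looks like the sketch below (scale = max|x|/127, values rounded and clamped); the paper's actual quantization parameters and calibration procedure are not reproduced here.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 codes and the scale."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3).astype(np.float32)
q, s = quantize_int8(w)
# rounding error of symmetric quantization stays within half a quantization step
print(np.max(np.abs(dequantize(q, s) - w)) <= s / 2 + 1e-6)  # True
```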
Show Figures

Figure 1. Optimization of network structure.
Figure 2. The overall architecture of the proposed accelerator.
Figure 3. Flowchart of accelerator data stream.
Figure 4. Schematic design for dealing with discontinuities between dilation convolution columns.
Figure 5. Schematic representation of optimized row-caching convolution sliding window for dilation convolution.
Figure 6. Overview of line buffer module.
Figure 7. Overview of convolution window with delay cell.
Figure 8. Overview of weight window generation module.
Figure 9. Overview of feature map read-state machine.
Figure 10. Overview of configurable computing array.
Figure 11. Overview of the array adder tree.
Figure 12. The input–output situation of the array addition tree when running transposed convolution.
Figure 13. The input–output situation of the second row of PE arrays during transposed convolution.
Figure 14. The input–output situation of the first and third rows of PE arrays during transposed convolution.
Figure 15. Switching of input and output buffers.
Figure 16. Internal structure diagram of the buffer group.
Figure 17. Lightweight semantic segmentation model test image. (a) Original image; (b) labeled image; (c) 8-bit quantized lightweight network recognition result.
Figure 18. System block design diagram.
Figure 19. Overall functional simulation diagram.
Figure 20. Overall accelerator power consumption.
22 pages, 11312 KiB  
Article
Multi-Scale Kolmogorov-Arnold Network (KAN)-Based Linear Attention Network: Multi-Scale Feature Fusion with KAN and Deformable Convolution for Urban Scene Image Semantic Segmentation
by Yuanhang Li, Shuo Liu, Jie Wu, Weichao Sun, Qingke Wen, Yibiao Wu, Xiujuan Qin and Yanyou Qiao
Remote Sens. 2025, 17(5), 802; https://doi.org/10.3390/rs17050802 - 25 Feb 2025
Viewed by 212
Abstract
The introduction of an attention mechanism in remote sensing image segmentation improves segmentation accuracy. In this paper, a novel multi-scale KAN-based linear attention (MKLA) segmentation network, MKLANet, is developed to promote better segmentation results. A hybrid global–local attention mechanism in the feature decoder is designed to enhance the aggregation of global–local context and avoid potential blocking artifacts during feature extraction and segmentation. The local attention channel adopts the MKLA block, which brings the merits of KAN convolution into a Mamba-like linear attention block to improve the handling of linear and nonlinear features and complex function approximation with few extra computations. The global attention channel uses a long-range cascade encoder–decoder block, which mainly employs a 7 × 7 depth-wise convolution token mixer and a lightweight 7 × 7 dilated deep convolution to capture long-distance spatial features and retain key spatial information. In addition, to enrich the input of the attention block, a deformable convolution module is developed between the encoder output and the corresponding scale decoder, which improves the expressive ability of the segmentation model without increasing the depth of the network. Experimental results on the Vaihingen dataset (83.68% mIoU, 92.98% OA, and 91.08% mF1), the UAVid dataset (69.78% mIoU and 96.51% OA), the LoveDA dataset (51.53% mIoU, 86.42% OA, and 67.19% mF1), and the Potsdam dataset (97.14% mIoU, 92.64% OA, and 93.8% mF1) outperform other advanced attention-based approaches in terms of small target and edge segmentation. Full article
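The global attention channel described above is built around a 7 × 7 depth-wise convolution token mixer followed by a lightweight 7 × 7 dilated convolution. The sketch below shows one way such a large-kernel mixer can be written; treating the dilated convolution as depth-wise, the dilation rate, and the final 1×1 projection are assumptions, not the paper's exact layers.

```python
import torch
import torch.nn as nn

class LargeKernelTokenMixer(nn.Module):
    """7x7 depthwise conv followed by a 7x7 dilated depthwise conv and a 1x1
    pointwise conv. Depthwise (groups=channels) keeps the cost low.
    The dilation rate (3) and the pointwise projection are assumptions."""
    def __init__(self, channels, dilation=3):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 7, padding=3,
                            groups=channels, bias=False)
        self.dw_dilated = nn.Conv2d(channels, channels, 7,
                                    padding=3 * dilation, dilation=dilation,
                                    groups=channels, bias=False)
        self.pw = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        return self.pw(self.dw_dilated(self.dw(x)))

x = torch.randn(1, 96, 32, 32)
print(LargeKernelTokenMixer(96)(x).shape)  # torch.Size([1, 96, 32, 32])
```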
Show Figures

Graphical abstract
Figure 1. The overall pipeline and module of the proposed MKLANet; (a) the pipeline of the proposed MKLA network; (b) channel fusion (CF) block structure; (c) KAN convolution representation theorem; (d) illustration of MKLA decoder block; (e) the diagram of DC blocks; (f) segmentation head.
Figure 2. The flowchart of DC.
Figure 3. Illustration of the SSM and its equivalent form.
Figure 4. Structural comparison of (a) Mamba and (b) MKLA.
Figure 5. Notations of activations of the KAN function and activation function ϕ(x).
Figure 6. The experimental results of the ISPRS Vaihingen dataset.
Figure 7. Semantic segmentation results of MKLANet and comparison on UAVid dataset.
Figure 8. Semantic segmentation results of LoveDA dataset.
Figure 9. Visualization comparisons of the Potsdam dataset.
Figure 10. Experimental results of ablation study on different datasets. Red and gray boxes are added to all subfigures to highlight the differences of different methods.
20 pages, 6191 KiB  
Article
Research on High-Precision Gas Concentration Inversion for Imaging Fourier Transform Spectroscopy Based on Multi-Scale Feature Attention Model
by Jianhao Luo, Wei Zhao, Feipeng Ouyang, Kaiyang Sheng and Shurong Wang
Appl. Sci. 2025, 15(5), 2438; https://doi.org/10.3390/app15052438 - 25 Feb 2025
Viewed by 203
Abstract
The accurate monitoring of greenhouse gas (GHG) concentrations is crucial in mitigating global warming. The imaging Fourier transform spectrometer (IFTS) is an effective tool for measuring GHG concentrations, offering high throughput and a wide spectral measurement range. To address the issue of spectral inconsistency during target gas detection, which is influenced by external environmental factors and makes it difficult to achieve high-precision gas concentration inversion, this paper proposes a multi-scale feature attention (MDISE) model. The model uses a multi-scale dilated convolution (MD) module to retain both global and local shallow features of the spectra; introduces the one-dimensional Inception (1D Inception) module to further extract multi-scale deep features; and incorporates the channel attention mechanism (SE) module to enhance attention to important spectral wavelengths, suppressing redundant and interfering information. A target gas detection system was built in the laboratory, and the proposed model was tested on gas samples collected by two channels of a short and medium-wavelength infrared imaging Fourier transform spectrometer (SMWIR-IFTS). The experimental results show that the MDISE model reduces the root mean square error (RMSE) by 79.14%, 76.59%, and 69.80% in the first channel and by 81.45%, 82.65%, and 74.01% in the second channel, compared to the partial least squares regression (PLSR), support vector regression (SVR), and conventional one-dimensional convolutional neural network (1D-CNN) models, respectively. Additionally, the MDISE model achieved average coefficient of determination (R²) values of 0.997 and 0.995 for the concentration intervals in both channels. The MDISE model demonstrates excellent performance and significantly improves the accuracy of GHG concentration inversion. Full article
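The MDISE model combines a multi-scale dilated convolution (MD) module with SE channel attention on 1D spectra. The sketch below wires generic versions of those two pieces together for a one-dimensional spectrum; kernel sizes, dilation rates, and the reduction ratio are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class SE1d(nn.Module):
    """Squeeze-and-excitation over the channel dimension of a 1D feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, L)
        w = self.fc(x.mean(dim=-1))            # squeeze over the spectral axis
        return x * w.unsqueeze(-1)             # re-weight channels

class MDSEBlock(nn.Module):
    """Multi-scale dilated 1D convolutions followed by SE channel attention."""
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.se = SE1d(out_ch * len(rates))

    def forward(self, x):                      # x: (B, in_ch, L) spectrum
        return self.se(torch.cat([b(x) for b in self.branches], dim=1))

spectrum = torch.randn(4, 1, 256)              # batch of 1-channel spectra
print(MDSEBlock(1, 16)(spectrum).shape)        # torch.Size([4, 48, 256])
```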
Show Figures

Figure 1. (a) Internal core structure of SMWIR-IFTS; (b) external 3D structure of SMWIR-IFTS.
Figure 2. Flowchart of the overall processing.
Figure 3. (a) Interferogram data cube; (b) interference image sequence; (c) baseline of interferometric intensity sequence; (d) apodization of interferometric intensity sequence; (e) phase correction of interferometric intensity sequence; (f) reconstructed radiance spectrum of each temperature sampling point.
Figure 4. Schematic diagram of three-layer radiance transfer model.
Figure 5. Overall structure of MDISE model.
Figure 6. Structure of (a) MD module; (b) 1D Inception module; and (c) SE module.
Figure 7. Target gas detection system (a) schematic diagram; (b) experimental scene.
Figure 8. (a) Gas radiance spectra; (b) 90% CO₂ absorbance spectrum collected by the 1st filter channel at a blackbody temperature of 383 K.
Figure 9. (a) Gas radiance spectra; (b) 90% CO₂ absorbance spectrum collected by the 2nd filter channel at a blackbody temperature of 408 K.
Figure 10. CO₂ absorbance spectra collected by 1st filter channel at blackbody temperatures of (a) 353 K; (b) 373 K; and (c) 383 K.
Figure 11. CO₂ absorbance spectra collected by 2nd filter channel at blackbody temperatures of (a) 353 K; (b) 373 K; and (c) 408 K.
Figure 12. Regression prediction scatter plots of (a) PLSR; (b) SVR; (c) 1D-CNN; and (d) MDISE models in 1st filter channel at a blackbody temperature of 353 K.
Figure 13. Regression prediction scatter plots of (a) PLSR; (b) SVR; (c) 1D-CNN; and (d) MDISE models in 2nd filter channel at a blackbody temperature of 353 K.
Figure 14. Visualization of the feature weights generated by (a) Base model; (b) 1D Inception model; (c) MD-1D Inception model; and (d) MDISE model.
26 pages, 15621 KiB  
Article
Integrated Convolution and Attention Enhancement-You Only Look Once: A Lightweight Model for False Estrus and Estrus Detection in Sows Using Small-Target Vulva Detection
by Yongpeng Duan, Yazhi Yang, Yue Cao, Xuan Wang, Riliang Cao, Guangying Hu and Zhenyu Liu
Animals 2025, 15(4), 580; https://doi.org/10.3390/ani15040580 - 18 Feb 2025
Viewed by 299
Abstract
Accurate estrus detection and optimal insemination timing are crucial for improving sow productivity and enhancing farm profitability in intensive pig farming. However, sows’ estrus typically lasts only 48.4 ± 1.0 h, and interference from false estrus further complicates detection. This study proposes an enhanced YOLOv8 model, Integrated Convolution and Attention Enhancement (ICAE), for vulvar detection to identify the estrus stages. This model innovatively divides estrus into three phases (pre-estrus, estrus, and post-estrus) and distinguishes five different estrus states, including pseudo-estrus. ICAE-YOLO integrates the Convolution and Attention Fusion Module (CAFM) and Dual Dynamic Token Mixing (DDTM) for improved feature extraction, Dilation-wise Residual (DWR) for expanding the receptive field, and Focaler-Intersection over Union (Focaler-IoU) for boosting the performance across various detection tasks. To validate the model, it was trained and tested on a dataset of 6402 sow estrus images and compared with YOLOv8n, YOLOv5n, YOLOv7tiny, YOLOv9t, YOLOv10n, YOLOv11n, and the Faster R-CNN. The results show that ICAE-YOLO achieves an mAP of 93.4%, an F1-Score of 92.0%, GFLOPs of 8.0, and a model size of 4.97 M, reaching the highest recognition accuracy among the compared models, while maintaining a good balance between model size and performance. This model enables accurate, real-time estrus monitoring in complex, all-weather farming environments, providing a foundation for automated estrus detection in intensive pig farming. Full article
(This article belongs to the Special Issue Animal Health and Welfare Assessment of Pigs)
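Focaler-IoU, used in the loss above, re-maps the IoU onto an interval [d, u] so training can emphasize easy or hard samples before a 1 − IoU style loss is applied. The snippet below shows one published formulation of that re-mapping; the interval bounds are illustrative, and how ICAE-YOLO combines it with its box regression loss is not shown.

```python
import torch

def focaler_iou(iou, d=0.0, u=0.95):
    """Linearly re-map IoU onto [d, u]: values below d go to 0, above u go to 1.
    Interval bounds here are illustrative, not ICAE-YOLO's tuned values."""
    return torch.clamp((iou - d) / (u - d), min=0.0, max=1.0)

iou = torch.tensor([0.10, 0.50, 0.97])
print(focaler_iou(iou))          # tensor([0.1053, 0.5263, 1.0000])
print(1.0 - focaler_iou(iou))    # a simple loss built from the re-mapped IoU
```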
Show Figures

Figure 1. Data acquisition process. (a) Structure of the pig farm; (b) data collection process.
Figure 2. Data enhancement. (a) Original image; (b) brightness enhancement; (c) contrast enhancement; (d) random flipping; (e) random rotation (with cropping).
Figure 3. Vulva images of sows in true and pseudo-estrus. (a) Pre-estrus sow vulva; (b) vulva of sow in pseudo-estrus due to ZEN intoxication.
Figure 4. Partial images in estrus dataset of sows.
Figure 5. C3 and C2f modules. (a) Module C3; (b) module C2f.
Figure 6. Structure of DDTM Attention module.
Figure 7. Structure of CAFM Attention module.
Figure 8. Structure of DWR module.
Figure 9. ICAEM-YOLOv8 network structure diagram.
Figure 10. Performance improvement effect of each module combination on the algorithm. (a) Comparison of recognition performances; (b) comparison of GFLOPs.
Figure 11. Precision curve, Recall curve, and PR curve of ICAE-YOLO. (a) Precision–confidence curve; (b) Recall–confidence curve; (c) precision–Recall curve.
Figure 12. ICAE-YOLO results.
Figure 13. Effectiveness of ICAE-YOLO in recognizing sows in different estrous states.
Figure 14. Comparison of performances of different models of algorithm. (a) mAP; (b) precision; (c) Recall; (d) F1-Score; (e) GFLOPs; (f) parameters.
Figure 15. Precision, Recall and mAP curves for eight algorithmic iterative processes. (a) Precision; (b) Recall; (c) mAP.
Figure 16. Average recognition accuracy of eight algorithms in each category.
Figure 17. Variation in loss curves during iterations of several algorithms. (a) Box loss; (b) dfl loss; (c) cls loss.
Figure 18. Performance radargrams of the eight algorithms. (a) Comparison of radar map performances; (b) area occupied by each algorithm in radargram.
Figure 19. Sow estrus data collection and identification system workflow.
Figure 20. Conditions leading to model restriction. (a) Extreme light exposure; (b) shading; (c) bacterial vaginitis.
18 pages, 510 KiB  
Article
MCDCNet: Mask Classification Combined with Adaptive Dilated Convolution for Image Semantic Segmentation
by Geng Wei, Junbo Wang, Bingxian Shi, Xiaolin Zhu, Bo Cao and Tong Liu
Appl. Sci. 2025, 15(4), 2012; https://doi.org/10.3390/app15042012 - 14 Feb 2025
Viewed by 293
Abstract
Effectively classifying each pixel in an image is an important research topic in semantic segmentation. Existing methods typically require the network to directly generate a feature map of the same size as the original image and classify each pixel, which makes it difficult for the network to fully leverage the representations from the backbone. To handle this challenge, this paper proposes a method named mask classification combined with an adaptive dilated convolution network (MCDCNet). Firstly, a Vision Transformer (ViT)-based module is employed to capture contextual features as the backbone. Secondly, the Spatial Extraction Module (SEM) is proposed to extract multi-scale spatial information through adaptive dilated convolution while preserving the original feature size. This spatial information is then integrated into the corresponding contextual features to enhance the representation. Finally, a novel inference process is proposed that incorporates the instance activation map (IAM)-based decoder for semantic segmentation, thereby enhancing the network’s capability to capture and comprehend semantic features. The experimental results demonstrate that our network significantly outperforms other per-pixel classification networks across several semantic segmentation datasets. In particular, on Cityscapes, MCDCNet achieves 80.3 mIoU with 11.8 M parameters, demonstrating that the network is able to deliver a strong segmentation performance while maintaining a relatively low parameter count. Full article
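The SEM extracts multi-scale spatial information with adaptive dilated convolutions "while preserving the original feature size"; for a stride-1 convolution with an odd kernel size k and dilation d, this only requires padding p = d·(k − 1)/2. The quick check below verifies that rule in PyTorch; it is a generic property of dilated convolutions, not MCDCNet's actual module.

```python
import torch
import torch.nn as nn

def same_size_dilated_conv(channels, kernel_size, dilation):
    """Stride-1 dilated conv whose output keeps the input H x W:
    padding = dilation * (kernel_size - 1) // 2 (odd kernel sizes)."""
    pad = dilation * (kernel_size - 1) // 2
    return nn.Conv2d(channels, channels, kernel_size, padding=pad, dilation=dilation)

x = torch.randn(1, 16, 40, 40)
for d in (1, 2, 3, 6, 12):
    assert same_size_dilated_conv(16, 3, d)(x).shape == x.shape
print("spatial size preserved for all dilation rates")
```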
Show Figures

Figure 1. The mIoU versus parameters on Cityscapes. Seg50, Seg75, and Seg100 represent input image resolutions of 1536 × 768, 1024 × 512, and 2048 × 1024, respectively. The orange triangle represents our model, while the blue circles represent others.
Figure 2. The proposed architecture of MCDCNet. The bottom shows the Spatial Extraction Module (SEM) and inference structure proposed in this paper.
Figure 3. Comparison of inference structures of three types of networks. (a–c) refer to the inference processes used in the traditional semantic segmentation network, mask classification network, and MCDCNet, respectively.
Figure 4. A ViT-based module with MSCA and FFN as the core components.
Figure 5. Visualization results on ADE20k. We compare MCDCNet with TopFormer, SegNeXt-tiny, and SegNeXt-small in the two images above. The top image shows that MCDCNet predicts large areas more accurately. The bottom image demonstrates that MCDCNet classifies categories with similar features more accurately.
Figure 6. Visualization results on ADE20k. We visualized the ablation experiments for the network architecture component. There are three network structures in total: MSCA, MSCA + SEM, and MSCA + SEM + IAM. The visualization results effectively demonstrate that the network structure of MCDCNet is optimal.
46 pages, 13796 KiB  
Review
Measurement Techniques for Interfacial Rheology of Surfactant, Asphaltene, and Protein-Stabilized Interfaces in Emulsions and Foams
by Ronald Marquez and Jean-Louis Salager
Colloids Interfaces 2025, 9(1), 14; https://doi.org/10.3390/colloids9010014 - 14 Feb 2025
Viewed by 617
Abstract
This work provides a comprehensive review of experimental methods used to measure rheological properties of interfacial layers stabilized by surfactants, asphaltenes, and proteins that are relevant to systems with large interfacial areas, such as emulsions and foams. Among the shear methods presented, the deep channel viscometer, bicone rheometer, and double-wall ring rheometers are the most utilized. On the other hand, the main dilational rheology techniques discussed are surface waves, capillary pressure, oscillating Langmuir trough, oscillating pendant drop, and oscillating spinning drop. Recent developments—including machine learning and artificial intelligence (AI) models, such as artificial neural networks (ANN) and convolutional neural networks (CNN)—to calculate interfacial tension from drop shape analysis in shorter times and with higher precision are critically analyzed. Additionally, configurations involving an Atomic Force Microscopy (AFM) cantilever contacting a bubble, a microtensiometer platform, rectangular and radial Langmuir troughs, and high-frequency oscillation drop setups are presented. The significance of Gibbs–Marangoni effects and interfacial rheological parameters on the (de)stabilization of emulsions is also discussed. Finally, a critical review of the recent literature on the measurement of interfacial rheology is presented. Full article
(This article belongs to the Special Issue Rheology of Complex Fluids and Interfaces)
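For the oscillating-area techniques surveyed here (oscillating pendant drop, oscillating spinning drop, oscillating Langmuir trough), the measured quantity is the complex dilational viscoelastic modulus obtained from the interfacial tension response to a small sinusoidal area perturbation. The standard definition is recalled below for orientation; the notation is generic and not taken from this review.

```latex
% Complex dilational modulus from an area oscillation A(t) = A_0 + \Delta A\,\sin(\omega t),
% with the tension response \gamma(t) lagging the area by a phase angle \delta:
E^{*}(\omega) \;=\; \frac{\mathrm{d}\gamma}{\mathrm{d}\ln A}
            \;=\; E'(\omega) + i\,E''(\omega)
            \;=\; |E|\cos\delta \;+\; i\,|E|\sin\delta ,
\qquad |E| \;=\; \frac{\Delta\gamma}{\Delta A / A_{0}} .
```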
Show Figures

Graphical abstract

Graphical abstract
Full article ">Figure 1
<p>The two types of interfacial rheology measurement in a limited flat region: (<b>A</b>) Shear interfacial rheology—characterized by a constant interfacial area during deformation—and (<b>B</b>) dilatational (simplified as dilational or compression) interfacial rheology, with a variable interfacial area while maintaining shape. Arrows indicate the change after deformation is imposed.</p>
Full article ">Figure 2
<p>Shear strain σ, the stress response τ, and δ for an oscillatory shear test.</p>
Full article ">Figure 3
<p>Oscillatory variation of the interfacial area and the interfacial tension for periodic measurement of oscillatory deformation.</p>
Full article ">Figure 4
<p>Deep channel viscometer. The internal and external cylinders are separated by a distance y<sub>0</sub>, and the base rotates with an angular velocity ω<sub>0</sub>, creating a gap δ through which the liquid flows. The inner fixed cylinder and the outer fixed cylinder define the channel where the liquid is located. The rotating shaft at the base, turning at angular velocity ω<sub>0</sub>, generates shear flow within the liquid as it moves through the gap δ. Reproduced from [<a href="#B73-colloids-09-00014" class="html-bibr">73</a>].</p>
Full article ">Figure 5
<p>Schematic configuration of the bicone interfacial rheometer [<a href="#B76-colloids-09-00014" class="html-bibr">76</a>]. The rotating bicone probe is placed at the interface between Liquid 1 and Liquid 2, allowing for the characterization of the shear and dilational properties at the liquid–liquid interface. The height vertical positions z = 0, z = H<sub>1</sub>, and z = H<sub>2</sub> represent the levels of the two liquid phases, while the radii R<sub>1</sub> and R<sub>2</sub> denote the distances corresponding to the container and the bicone, respectively. The motor transducer controls the angular velocity and detects the response of the interfacial film. α is the cone angle in the range of 5° to 10° for most configurations. Reproduced from [<a href="#B104-colloids-09-00014" class="html-bibr">104</a>].</p>
Full article ">Figure 6
<p>Schematic configuration of the ring viscometer. The Du Noüy ring is placed in the liquid sample, and the device calculates surface tension based on the force required to detach the ring from the liquid interface (Red represents undefined conditions).</p>
Full article ">Figure 7
<p>Double-ring interfacial shear rheometer configuration: (<b>A</b>) Laboratory setup. (<b>B</b>) Geometry with a double-wall and circular ring. Reproduced from [<a href="#B109-colloids-09-00014" class="html-bibr">109</a>].</p>
Full article ">Figure 8
<p>Main components of the magnetic rod interfacial stress rheometer setup. (<b>A</b>) A magnetized rod is suspended at the air–water interface within a Langmuir trough, where surface tension holds it in place. (<b>B</b>) Surrounding the trough are Helmholtz coils, which generate a magnetic field gradient that exerts a force on the rod, causing it to shear the interfacial film. The rod’s movement is primarily influenced by shear stress interactions between the rod and the adjacent glass slides. Reproduced from [<a href="#B95-colloids-09-00014" class="html-bibr">95</a>].</p>
Full article ">Figure 9
<p>Frequency ranges for different methods used in measuring dilational surface elasticity (E). Adapted from [<a href="#B4-colloids-09-00014" class="html-bibr">4</a>].</p>
Full article ">Figure 10
<p>Longitudinal wave measurement equipment. Waves are generated through an electric or mechanical generator, and the motion is recorded via an optical system [<a href="#B101-colloids-09-00014" class="html-bibr">101</a>]. A barrier and a tracer particle, observed through a microscope, help track the movement of the waves. Reproduced from [<a href="#B28-colloids-09-00014" class="html-bibr">28</a>].</p>
Full article ">Figure 11
<p>(<b>A</b>) Oscillating pendant drop interfacial rheology equipment. (<b>B</b>) The drop volume is controlled to vary the area in a sinusoidal manner, the drop profile is recorded with a digital camera, and results are obtained using data analysis software [<a href="#B27-colloids-09-00014" class="html-bibr">27</a>]. Reproduced from [<a href="#B123-colloids-09-00014" class="html-bibr">123</a>].</p>
Full article ">Figure 12
<p>Configurations of the capillary pressure method (<b>A</b>,<b>B</b>). In both configurations, a piezo piston is used to control the liquid flow, and a pressure sensor measures the capillary pressure generated between two immiscible liquids (Liquid 1 and Liquid 2) or between a liquid and air. Reproduced from [<a href="#B124-colloids-09-00014" class="html-bibr">124</a>].</p>
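For reference, the measured capillary pressure is related to the interfacial tension through the Young–Laplace equation; for a spherical meniscus of radius R between the two phases (a standard relation, not specific to the configurations shown):
\[
\Delta P = P_{\mathrm{inside}} - P_{\mathrm{outside}} = \frac{2\gamma}{R} .
\]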
Full article ">Figure 13
<p>Deep neural network architecture employed in pendant drop tensiometry for predicting surface tension from the shape of a pendant drop. The network takes as input a vectorized representation of the drop’s shape, consisting of discrete points sampled along its contour, with both radial and vertical coordinates. The network architecture begins with 226 input neurons, processing the shape information through a series of fully connected hidden layers with varying neuron counts (512, 1024, 256, and 16). Reproduced from [<a href="#B126-colloids-09-00014" class="html-bibr">126</a>].</p>
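A minimal sketch of a network with the layer sizes quoted in this caption is given below. PyTorch, ReLU activations, and a single-output regression head are assumptions for illustration; none of these details are specified in the listing.

import torch
import torch.nn as nn

# Hedged sketch of the fully connected architecture quoted in the caption:
# 226 inputs (sampled radial and vertical contour coordinates) followed by
# hidden layers of 512, 1024, 256, and 16 neurons. The activations and the
# single-output regression head are illustrative assumptions.
drop_shape_mlp = nn.Sequential(
    nn.Linear(226, 512), nn.ReLU(),
    nn.Linear(512, 1024), nn.ReLU(),
    nn.Linear(1024, 256), nn.ReLU(),
    nn.Linear(256, 16), nn.ReLU(),
    nn.Linear(16, 1),  # predicted surface tension (or a related shape parameter)
)

contour = torch.randn(8, 226)          # batch of vectorized drop contours
prediction = drop_shape_mlp(contour)   # shape: (8, 1)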
Full article ">Figure 14
<p>Convolutional neural network architecture for axisymmetric drop shape analysis and surface tension prediction. Reproduced from [<a href="#B127-colloids-09-00014" class="html-bibr">127</a>].</p>
Full article ">Figure 15
<p>Workflow for predicting pendant drop surface tension using a convolutional neural network (CNN), starting from the generation of drop profiles with the Young–Laplace equation. (<b>a</b>–<b>h</b>) are described in the body of the article. Reproduced from [<a href="#B125-colloids-09-00014" class="html-bibr">125</a>].</p>
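The profile-generation step of such a workflow can be sketched by integrating the dimensionless Young–Laplace (Bashforth–Adams) equations for an axisymmetric drop. The snippet below is a minimal illustration under assumed conventions (sign of the gravity term, apex curvature scaled to 1), not the exact procedure of the cited work.

import numpy as np
from scipy.integrate import solve_ivp

def drop_profile(bond, s_max=4.0, n=400):
    """Integrate the dimensionless Young-Laplace equations along the arc
    length s to obtain a synthetic axisymmetric drop contour for a given
    Bond number. Lengths are in units of the apex radius of curvature."""
    def rhs(s, y):
        x, z, phi = y
        # Apex limit: sin(phi)/x -> 1 as s -> 0 in these dimensionless units.
        curvature = np.sin(phi) / x if x > 1e-8 else 1.0
        return [np.cos(phi), np.sin(phi), 2.0 - bond * z - curvature]
    s_eval = np.linspace(1e-8, s_max, n)
    sol = solve_ivp(rhs, (1e-8, s_max), [1e-8, 0.0, 0.0],
                    t_eval=s_eval, rtol=1e-8)
    return sol.y[0], sol.y[1]   # radial (x) and vertical (z) coordinates

# Profiles computed for a range of Bond numbers could be rasterized into
# synthetic drop images and paired with their known parameters to build a
# training set for the CNN.
x, z = drop_profile(bond=0.3)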
Full article ">Figure 16
<p>Recent version of the oscillating spinning drop rheometer ULA. (<b>A</b>) Digital camera; (<b>B</b>) Microscope; (<b>C</b>) Rotating chamber housing the capillary; (<b>D</b>) Main body with digital display for rotational speed and capillary temperature control; (<b>E</b>) Computer and software for data acquisition and processing. Reproduced from [<a href="#B88-colloids-09-00014" class="html-bibr">88</a>].</p>
Full article ">Figure 17
<p>(<b>a</b>) Schematic diagram of the experimental setup: an air bubble is deposited onto a PMMA (polymethyl methacrylate) substrate that is immersed in an aqueous solution containing surfactant molecules (Triton X-100). The bubble is stabilized and a cantilever from an AFM (Atomic Force Microscope) makes contact with the bubble at its apex. (<b>b</b>) Thermal spectrum of the bubble, immersed in a 3 × 10<sup>−4</sup> mM Triton X-100 solution. Reproduced from [<a href="#B137-colloids-09-00014" class="html-bibr">137</a>].</p>
Full article ">Figure 18
<p>Methods used to analyze the interfacial stress contributions for asphaltene nanoaggregates at oil–water interfaces. The left panel presents a representation of the asphaltene nanoaggregates adsorbed at the interface between hexadecane and Milli-Q water. Three experimental setups are depicted: planar compression using a Langmuir trough, compressional and dilational rheology using a radial trough, and shear rheology with a double-wall ring (DWR) apparatus. Reproduced from [<a href="#B139-colloids-09-00014" class="html-bibr">139</a>].</p>
Full article ">Figure 19
<p>(<b>a</b>) Microtensiometer platform that uses a capillary tensiometer method. The setup includes an inline pressure transducer capable of measuring pressures below 5000 Pa. It is equipped with a DC motor that oscillates the pressure, enabling the study of dilational rheology. (<b>b</b>) Microbutton device designed for experiments involving water-on-oil interfaces. The water droplet sits on a cover glass, which is placed over oil in a confined environment for precise control of the interface. Reproduced from [<a href="#B138-colloids-09-00014" class="html-bibr">138</a>].</p>
Full article ">Figure 20
<p>Interfacial rheology at the crude oil–water interface in the presence of salts. (<b>a</b>) Formation of a viscoelastic film at the interface after 24 h of contact with deionized water. (<b>b</b>) Dilational modulus (E<sup>s′</sup>, squares, left axis) and shear modulus (G<sup>s′</sup>, solid lines, right axis) evolve as a function of time, representing the reorganization and formation of the interfacial “skin”. Reproduced from [<a href="#B136-colloids-09-00014" class="html-bibr">136</a>].</p>
Full article ">Figure 21
<p>Property changes near the optimum formulation (HLD = 0). (<b>a</b>) Interfacial tension, (<b>b</b>) emulsion type and inversion, (<b>c</b>) emulsion stability, (<b>d</b>) droplet size, (<b>e</b>) emulsion viscosity, and (<b>f</b>) dilational modulus. Reproduced from [<a href="#B162-colloids-09-00014" class="html-bibr">162</a>].</p>
Full article ">Figure 22
<p>Interfacial dilational elasticity (E) and phase behavior as a function of salinity and time for the sodium dodecyl sulfate (SDS) (1 wt%)/n-pentanol (3 wt%)/kerosene/brine system. (<b>A</b>) The salinity scan shows dilational modulus E across Winsor I, Winsor III, and Winsor II regions, with a pronounced minimum at the optimum formulation (5.2 wt% NaCl, HLD = 0), coinciding with the formation of a bicontinuous microemulsion middle phase M. (<b>B</b>) Dynamic E measurements for oil-in-water (O-W) and microemulsion-in-water (M-W) configurations at 5.2 wt% NaCl showing the fast exchanges of interfacial properties in M-W systems, attributable to the microemulsion acting as a surfactant reservoir which makes Marangoni effects almost negligible. Adapted from [<a href="#B166-colloids-09-00014" class="html-bibr">166</a>].</p>
Full article ">Figure 23
<p>Schematic of foam stability as a function of pressure, showing the behavior of CO<sub>2</sub> foam transitioning from a gaseous phase to a dense-phase regime. In the dense-phase CO<sub>2</sub> region (shaded zone), foam stability decreases significantly due to lamella destabilization and bubble coalescence, according to the literature reports [<a href="#B190-colloids-09-00014" class="html-bibr">190</a>].</p>
Full article ">
18 pages, 1807 KiB  
Article
3DVT: Hyperspectral Image Classification Using 3D Dilated Convolution and Mean Transformer
by Xinling Su and Jingbo Shao
Photonics 2025, 12(2), 146; https://doi.org/10.3390/photonics12020146 - 11 Feb 2025
Viewed by 434
Abstract
Hyperspectral imaging and laser technology both rely on different wavelengths of light to analyze the characteristics of materials, revealing their composition, state, or structure through precise spectral data. In hyperspectral image (HSI) classification tasks, the limited number of labeled samples and the lack [...] Read more.
Hyperspectral imaging and laser technology both rely on different wavelengths of light to analyze the characteristics of materials, revealing their composition, state, or structure through precise spectral data. In hyperspectral image (HSI) classification tasks, the limited number of labeled samples and the lack of feature extraction diversity often lead to suboptimal classification performance. Furthermore, traditional convolutional neural networks (CNNs) primarily focus on local features in hyperspectral data, neglecting long-range dependencies and global context. To address these challenges, this paper proposes a novel model that combines CNNs with an average pooling Vision Transformer (ViT) for hyperspectral image classification. The model utilizes three-dimensional dilated convolution and two-dimensional convolution to extract multi-scale spatial–spectral features, while the ViT captures global features and long-range dependencies in the hyperspectral data. Unlike the traditional ViT encoder, which uses linear projection, our model replaces it with average pooling projection. This change strengthens local feature extraction and compensates for the ViT encoder’s limitations in this respect. The hybrid approach combines the local feature extraction strengths of CNNs with the long-range dependency modeling of Transformers, significantly improving overall performance in hyperspectral image classification tasks. The proposed method also holds promise for the classification of fiber laser spectra, where high precision is crucial for distinguishing between different fiber laser characteristics. Experimental results demonstrate that the CNN-Transformer model substantially improves classification accuracy on three benchmark hyperspectral datasets, with overall accuracies of 99.35%, 99.31%, and 99.66% on the IP, PU, and SV datasets, respectively. These advancements offer potential benefits for applications such as high-performance optical fiber sensing, laser medicine, and environmental monitoring, where accurate spectral classification is essential. Full article
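A minimal PyTorch sketch of the kind of hybrid described here is given below. It illustrates the general pattern only (a 3D dilated convolution over the spectral–spatial cube, a 2D convolution, an average-pooling patch embedding in place of linear projection, and a Transformer encoder); all layer sizes, dilation rates, and the pooling configuration are illustrative assumptions and do not reproduce the published 3DVT architecture.

import torch
import torch.nn as nn

class DilatedConvViTSketch(nn.Module):
    """Illustrative CNN + Transformer hybrid for HSI patch classification."""
    def __init__(self, bands=30, patch=13, embed_dim=64, num_classes=16):
        super().__init__()
        # 3D dilated convolution over (spectral, H, W); depth padding chosen
        # so the number of bands is preserved.
        self.conv3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), dilation=(2, 1, 1),
                      padding=(6, 1, 1)),
            nn.BatchNorm3d(8), nn.ReLU(),
        )
        self.conv2d = nn.Sequential(
            nn.Conv2d(8 * bands, embed_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(embed_dim), nn.ReLU(),
        )
        # Average-pooling projection instead of the usual linear patch embedding.
        self.pool_embed = nn.AvgPool2d(kernel_size=3, stride=2, padding=1)
        encoder_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=4,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):                 # x: (B, 1, bands, patch, patch)
        f = self.conv3d(x)                # (B, 8, bands, patch, patch)
        b, c, d, h, w = f.shape
        f = self.conv2d(f.reshape(b, c * d, h, w))   # (B, embed_dim, H, W)
        f = self.pool_embed(f)                       # downsampled token grid
        tokens = f.flatten(2).transpose(1, 2)        # (B, N, embed_dim)
        z = self.encoder(tokens).mean(dim=1)         # global token average
        return self.head(z)

logits = DilatedConvViTSketch()(torch.randn(2, 1, 30, 13, 13))  # (2, 16)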
(This article belongs to the Special Issue Advanced Fiber Laser Technology and Its Application)
Show Figures

Figure 1

Figure 1
<p>Overall framework of the 3DVT network model.</p>
Full article ">Figure 2
<p>ViT encoder with average pooling projection.</p>
Full article ">
30 pages, 16247 KiB  
Article
A Scale-Invariant Looming Detector for UAV Return Missions in Power Line Scenarios
by Jiannan Zhao, Qidong Zhao, Chenggen Wu, Zhiteng Li and Feng Shuang
Biomimetics 2025, 10(2), 99; https://doi.org/10.3390/biomimetics10020099 - 10 Feb 2025
Viewed by 413
Abstract
Unmanned aerial vehicles (UAVs) offer an efficient solution for power grid maintenance, but collision avoidance during return flights is challenged by crossing power lines, especially for small drones with limited computational resources. Conventional visual systems struggle to detect thin, intricate power lines, which [...] Read more.
Unmanned aerial vehicles (UAVs) offer an efficient solution for power grid maintenance, but collision avoidance during return flights is challenged by crossing power lines, especially for small drones with limited computational resources. Conventional visual systems struggle to detect thin, intricate power lines, which are often overlooked or misinterpreted. While deep learning methods have improved static power line detection in images, they still struggle in dynamic scenarios where collision risks must be detected in real time. Inspired by the hypothesis that the Lobula Giant Movement Detector (LGMD) distinguishes sparse, incoherent background motion from the continuous, clustered motion contours of a looming object, we propose a Scale-Invariant Looming Detector (SILD). SILD detects motion by preprocessing video frames, enhances motion regions using attention masks, and simulates biological arousal to recognize looming threats while suppressing noise. It also predicts impending collisions during high-speed flight and overcomes the limitations of motion vision to maintain consistent sensitivity to looming objects at different scales. We compare SILD with existing static power line detection techniques, including the Hough transform and D-LinkNet with a dilated convolution-based encoder–decoder architecture. Our results show that SILD strikes an effective balance between detection accuracy and real-time processing efficiency, making it well suited for UAV-based power line detection, where high precision and low latency are essential. Furthermore, we evaluated the model under various conditions and successfully deployed it on a UAV-embedded board for collision avoidance testing at power lines. This approach provides a novel perspective for UAV obstacle avoidance in power line scenarios. Full article
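A minimal NumPy/SciPy sketch of the generic excitation–inhibition structure that LGMD-style detectors build on is given below. It illustrates the idea only (frame differencing, lateral inhibition, optional attention masking, and a summed membrane potential); it is not the SILD implementation, and all parameter values and the attention mask are assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter

def lgmd_like_response(prev_frame, curr_frame, attention_mask=None,
                       inhibition_sigma=1.5, inhibition_gain=0.6,
                       threshold=0.1):
    """Illustrative LGMD-style looming cue computed from two grayscale frames.

    Frame differencing stands in for the P layer, a Gaussian blur stands in
    for the lateral-inhibition (DPC-like) spreading, an optional attention
    mask emphasizes line-like regions, and the normalized sum acts as a
    membrane potential that rises for clustered, expanding motion contours.
    """
    # Luminance change between consecutive frames (excitation)
    p = np.abs(curr_frame.astype(float) - prev_frame.astype(float))

    # Lateral inhibition: subtract a spatially spread copy so that sparse,
    # incoherent motion cancels while clustered contours survive.
    s = np.maximum(p - inhibition_gain * gaussian_filter(p, inhibition_sigma), 0.0)

    # Optional attention mask (e.g., emphasizing elongated, line-like regions).
    if attention_mask is not None:
        s = s * attention_mask

    mp = s.sum() / s.size          # membrane-potential-like scalar
    return mp, mp > threshold      # warning flag if the potential is high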
Show Figures

Figure 1

Figure 1
<p>(<b>a</b>) Example of a complex power line detection scenario in the real world. (<b>b</b>) A simplified abstraction of the complex power line detection scenario for unmanned aerial vehicles (UAVs). Key elements are split into four categories: The tracking lines (green), obstacles (blue), the horizontal lines (red), and the background (unlabeled).</p>
Full article ">Figure 2
<p>Flow diagram of the proposed visual system sensible to normal-size objects and small targets. It mainly consists of three modules: a preprocessing module (<b>top left</b>), an attention module (<b>top right</b>), and a Lobula Giant Movement Detector (LGMD)-based module (<b>bottom right</b>). There are two signal processing loops; the solid black line represents the workflow of the main processing loop, which extracts looming threat information via image velocity. The dotted red line denotes the attention loop to acquire the preferred attention to power lines. At the start, the feedback loop is processed before the looming detection loop. Note that the power lines are at the left top of images within the preprocessing module, and the small red cubes in the LGMD with the introduction of the distributed presynaptic connection mechanism (D-LGMD) module denote the kernel designed for looming power lines, which was proposed in our previous work [<a href="#B25-biomimetics-10-00099" class="html-bibr">25</a>]. * Sequence: Capturing image motion by recording changes in luminance within the field of view.</p>
Full article ">Figure 3
<p>Illustration of location-induced uneven sensibility. One object with off-center distance <span class="html-italic">e</span> approaches the camera (locust) with forward velocity <span class="html-italic">v</span>. The initial state of the looming object is <math display="inline"><semantics> <mrow> <mo>(</mo> <mi>θ</mi> <mo>,</mo> <mi>d</mi> <mo>)</mo> </mrow> </semantics></math>, and <math display="inline"><semantics> <mrow> <mo>(</mo> <mi>θ</mi> <mo>(</mo> <mi>t</mi> <mo>)</mo> <mo>,</mo> <mi>d</mi> <mo>(</mo> <mi>t</mi> <mo>)</mo> </mrow> </semantics></math>) is the state of the looming object at time <span class="html-italic">t</span>.</p>
Full article ">Figure 4
<p>Neural network schematic of the proposed visual system. The network consists of four neural layers in sequence: retina, photoreceptor, DPC, and G layer.</p>
Full article ">Figure 5
<p>(<b>a</b>) Three-dimensional and (<b>b</b>) vertical view of an attention kernel <math display="inline"><semantics> <mrow> <msub> <mi>W</mi> <mi>A</mi> </msub> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>,</mo> <mi>σ</mi> <mo>,</mo> <mi>θ</mi> <mo>,</mo> <mi>ξ</mi> <mo>)</mo> </mrow> </mrow> </semantics></math>, where <math display="inline"><semantics> <mrow> <mi>A</mi> <mo>=</mo> <mn>2</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>B</mi> <mo>=</mo> <mn>3</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>σ</mi> <mo>=</mo> <mn>2.0</mn> </mrow> </semantics></math>, <math display="inline"><semantics> <mrow> <mi>θ</mi> <mo>=</mo> </mrow> </semantics></math> 45°, and <math display="inline"><semantics> <mrow> <mi>ξ</mi> <mo>=</mo> <mn>0.5</mn> </mrow> </semantics></math>.</p>
Full article ">Figure 6
<p>Experiment to verify location-induced uneven sensibility. (<b>a</b>) An experimental scene built on AirSim. (<b>b</b>) Two-dimensional illustrational description of the experimental scene settings in (<b>a</b>). (<b>c</b>) P-layer output heatmap of the original D-LGMD model. (<b>d</b>) Original D-LGMD model output <math display="inline"><semantics> <mrow> <mover accent="true"> <mi>G</mi> <mo stretchy="false">^</mo> </mover> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>,</mo> <mi>t</mi> <mo>)</mo> </mrow> </mrow> </semantics></math> heatmap.</p>
Full article ">Figure 7
<p>Experiment settings built on AirSim, where a drone flies toward three black square objects of the same size. Note that the square in the center also moves toward the drone to generate strong looming stimuli, while the off-center cubes are static.</p>
Full article ">Figure 8
<p>Ablation experiment of location correction. (<b>a</b>) Input signal <math display="inline"><semantics> <mrow> <mi>L</mi> <mo>(</mo> <mi>x</mi> <mo>,</mo> <msub> <mi>y</mi> <mn>0</mn> </msub> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>. Comparative results of (<b>b</b>) P-layer output <math display="inline"><semantics> <mrow> <mi>P</mi> <mo>(</mo> <mi>x</mi> <mo>,</mo> <msub> <mi>y</mi> <mn>0</mn> </msub> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>, (<b>c</b>) DPC-layer output <math display="inline"><semantics> <mrow> <mi>S</mi> <mo>(</mo> <mi>x</mi> <mo>,</mo> <msub> <mi>y</mi> <mn>0</mn> </msub> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>, and (<b>d</b>) G-layer output <math display="inline"><semantics> <mrow> <mover accent="true"> <mi>G</mi> <mo stretchy="false">^</mo> </mover> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <msub> <mi>y</mi> <mn>0</mn> </msub> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> with and without location correction.</p>
Full article ">Figure 9
<p>Comparison of G-layer output <math display="inline"><semantics> <mrow> <mover accent="true"> <mi>G</mi> <mo stretchy="false">^</mo> </mover> <mrow> <mo>(</mo> <mi>x</mi> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math> between the D-LGMD model (<b>a</b>) without and (<b>b</b>) with the correction function. Without location correction, the intense response to the non-central squares misleads the D-LGMD into false alarms.</p>
Full article ">Figure 10
<p>Input image at time <math display="inline"><semantics> <msub> <mi>t</mi> <mn>0</mn> </msub> </semantics></math>, where there are two power lines at the front and two cubes in the background as interfering noise, and a UAV flies towards the scene.</p>
Full article ">Figure 11
<p>Comparison between the proposed model with and without the attention module. (<b>a</b>) Input signal <math display="inline"><semantics> <mrow> <mi>L</mi> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>. (<b>b</b>) Comparison of retina output <math display="inline"><semantics> <mrow> <mi>R</mi> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>. (<b>c</b>) Comparison of the P-layer output <math display="inline"><semantics> <mrow> <mi>P</mi> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>. (<b>d</b>) P-layer output <math display="inline"><semantics> <mrow> <mi>P</mi> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math> of the proposed model without the attention module solely. (<b>e</b>) Comparison of the DPC-layer output <math display="inline"><semantics> <mrow> <mi>S</mi> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </semantics></math>. (<b>f</b>) Comparison of the G-layer output <math display="inline"><semantics> <mrow> <mover accent="true"> <mi>G</mi> <mo stretchy="false">^</mo> </mover> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mn>0</mn> </msub> <mo>,</mo> <mi>y</mi> <mo>,</mo> <msub> <mi>t</mi> <mn>0</mn> </msub> <mo>)</mo> </mrow> </mrow> </semantics></math>.</p>
Full article ">Figure 12
<p>Neural response comparison between the D-LGMD model (left green box) and the proposed SILD (right orange box) in the experiment. (<b>a</b>) Grayscale input samples. The comparative output samples of the (<b>b</b>) P layer, (<b>c</b>) DPC layer, and (<b>d</b>) G layer. The whole process lasts 68 frames in the experiment, and the sampled images are at frames 31, 44, 56, and 68.</p>
Full article ">Figure 13
<p>Comparative results of (<b>a</b>) normalized output MP and (<b>b</b>) unnormalized output MP between the proposed model and the original D-LGMD model.</p>
Full article ">Figure 14
<p>Input images of three comparison groups under different conditions (<b>top</b>) with the thermogram output from the proposed model (<b>bottom</b>). Group (<b>a</b>) shows two background disturbances of city and snow; group (<b>b</b>) shows two image noise disturbances of falling leaves and rain; and group (<b>c</b>) shows two low-image-texture disturbances of foggy and dark.</p>
Full article ">Figure 15
<p>MP output variation curves for the three comparison groups in <a href="#biomimetics-10-00099-f014" class="html-fig">Figure 14</a>. Figure (<b>a</b>) shows the MP output variation under three background disturbances: city, snow, and plain. Figure (<b>b</b>) illustrates the MP output variation under two image noise disturbances: falling leaves and rain. Figure (<b>c</b>) presents the MP output variation under two low-image-texture disturbances: foggy and dark. In all three groups, the dashed line marks the frame at which the UAV begins to gradually decelerate, and the red triangle marks the spurious signal produced when the model is disturbed by falling leaves.</p>
Full article ">Figure 16
<p>Example results of the four models. The inputs of the first three lines are from real-world datasets, and the input of the last line is generated on the AirSim [<a href="#B34-biomimetics-10-00099" class="html-bibr">34</a>] platform.</p>
Full article ">Figure 17
<p>Experimental results of the proposed system. (<b>a</b>) Experimental site; (<b>b</b>) quadrotor obstacle avoidance system for power line; (<b>c</b>) MP output variation curves during actual flight. The red pentagram shows when the MP value exceeds the threshold (set to 4000), triggering the drone to autonomously hover and land.</p>
Full article ">