Search Results (184)

Search Parameters:
Keywords = adaptive receptive field

18 pages, 2156 KiB  
Article
LHGCN: A Laminated Heterogeneous Graph Convolutional Network for Modeling User–Item Interaction in E-Commerce
by Kang Liu, Mengtao Kang, Xinyu Li and Wenqing Dai
Symmetry 2024, 16(12), 1695; https://doi.org/10.3390/sym16121695 - 21 Dec 2024
Abstract
The e-commerce data structure is a typical multiplex graph network structure, which allows multiple types of edges between node pairs. However, existing methods that rely on message-passing frameworks are not sufficient to fully exploit the rich information in multiplex graphs. To improve the performance of link prediction, we propose a novel laminated heterogeneous graph convolutional network (LHGCN) consisting of three core modules: a laminate generation module (LGM), an adaptive convolution module (ACM), and a laminate fusion module (LFM). More specifically, the LGM generates symmetric laminates that cover diverse semantics to create rich node representations. Then, the ACM dynamically adjusts the node receptive field and flexibly captures local information, thereby enhancing the representation ability of the node. Through symmetric information propagation across laminates, the LFM combines multiple laminated features to optimize the global representation, which enables our model to accurately predict links. Moreover, an elaborate loss function, consisting of positive sample loss, negative sample loss, and L2 regularization loss, drives the network to preserve critical information. Extensive experiments on various benchmarks demonstrate the superiority of our method over state-of-the-art alternatives in terms of link prediction.
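The listing does not include the ACM's equations, but the abstract's description (a gated mechanism that adaptively adjusts each node's receptive field) can be illustrated with a small PyTorch sketch. The module below mixes 1-hop and 2-hop neighbourhood aggregations with a learned sigmoid gate; the class name, the single-gate design, and the dense adjacency format are assumptions for illustration, not the paper's implementation.

import torch
import torch.nn as nn

class GatedAdaptiveGraphConv(nn.Module):
    """Illustrative sketch: a learned gate blends 1-hop and 2-hop
    neighbourhood aggregations so each node's effective receptive
    field is chosen adaptively (module name and gate design are hypothetical)."""
    def __init__(self, dim):
        super().__init__()
        self.lin1 = nn.Linear(dim, dim)   # transform for 1-hop features
        self.lin2 = nn.Linear(dim, dim)   # transform for 2-hop features
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, x, adj):
        # adj: row-normalised dense adjacency (N, N); x: node features (N, D)
        h1 = self.lin1(adj @ x)            # 1-hop aggregation
        h2 = self.lin2(adj @ (adj @ x))    # 2-hop aggregation (wider field)
        z = torch.sigmoid(self.gate(torch.cat([h1, h2], dim=-1)))
        return z * h1 + (1.0 - z) * h2     # per-node receptive-field mix

# toy usage
x = torch.randn(5, 16)
adj = torch.rand(5, 5)
adj = adj / adj.sum(dim=1, keepdim=True)
out = GatedAdaptiveGraphConv(16)(x, adj)   # -> (5, 16)

According to the Figure 4 caption the paper's gate produces two signals (z and r); the single-gate version above is only meant to show how the receptive field can be selected per node.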
(This article belongs to the Topic Advances in Computational Materials Sciences)
Show Figures

Figure 1: Alibaba dataset structure pattern (left) and feature distribution (right).
Figure 2: The overall architecture of LHGCN.
Figure 3: Laminate generation module.
Figure 4: Adaptive convolution module (the gate outputs z and r control two types of laminates with different receptive fields); the detailed implementation of the attention and gate is shown in Figure 5.
Figure 5: Information flow in convolutional layers, exemplified by laminate U.
Figure 6: Laminate fusion module (the AC layer is the adaptive convolution layer).
Figure 7: Effects of the module design: full indicates the full model, -laminate excludes the LGM, -gate excludes the ACM, and base omits both modules.
Figure 8: Comparison of experimental results between GRU and LSTM.
Figure 9: Multiplex effectiveness analysis.
Figure 10: Parameter sensitivity analysis.
15 pages, 3905 KiB  
Article
Conditional Skipping Mamba Network for Pan-Sharpening
by Yunxuan Tang, Huaguang Li, Peng Liu and Tong Li
Symmetry 2024, 16(12), 1681; https://doi.org/10.3390/sym16121681 - 19 Dec 2024
Viewed by 141
Abstract
Pan-sharpening aims to generate high-resolution multispectral (HRMS) images by combining high-resolution panchromatic (PAN) images with low-resolution multispectral (LRMS) data, while maintaining the symmetry of spatial and spectral characteristics. Traditional convolutional neural networks (CNNs) struggle with global dependency modeling due to local receptive fields, and Transformer-based models are computationally expensive. Recent Mamba models offer linear complexity and effective global modeling. However, existing Mamba-based methods lack sensitivity to local feature variations, leading to suboptimal fine-detail preservation. To address this, we propose a Conditional Skipping Mamba Network (CSMN), which enhances global-local feature fusion symmetrically through two modules: (1) the Adaptive Mamba Module (AMM), which improves global perception using adaptive spatial-frequency integration; and (2) the Cross-domain Mamba Module (CDMM), optimizing cross-domain spectral-spatial representation. Experimental results on the IKONOS and WorldView-2 datasets demonstrate that CSMN surpasses existing state-of-the-art methods in achieving superior spectral consistency and preserving spatial details, with performance that is more symmetric in fine-detail preservation.
(This article belongs to the Section Computer)
Show Figures

Figure 1: The proposed CSMN architecture features multiple iterative blocks, each containing key sub-blocks: the AMM and the CDM. These components collectively enhance local adaptivity and cross-domain feature integration, improving the overall model's capability for pan-sharpening tasks.
Figure 2: Illustrative breakdown of the components in the AMM.
Figure 3: Illustrative breakdown of the components in the CDM.
Figure 4: Qualitative results on reduced-resolution IKONOS datasets. The top row shows the fused outputs, while the bottom row depicts the error maps between the fused results and reference images.
Figure 5: Qualitative results on reduced-resolution WV-2 datasets. The top row shows the fused outputs, while the bottom row depicts the error maps between the fused results and reference images.
Figure 6: Qualitative analysis for the full-scale evaluation on the IKONOS datasets. The red and blue frames highlight details at different positions within the image.
Figure 7: Qualitative analysis for the full-scale evaluation on the WV-2 datasets. The red and blue frames highlight details at different positions within the image.
Figure 8: Ablation study on different framework combinations. (a) AMM combined with Mamba, (b) CDM combined with Mamba, (c) our complete model architecture.
Figure 9: Visual comparison from the ablation study across two datasets. The content within the red box exhibits significant differences.
24 pages, 5004 KiB  
Article
SymSwin: Multi-Scale-Aware Super-Resolution of Remote Sensing Images Based on Swin Transformers
by Dian Jiao, Nan Su, Yiming Yan, Ying Liang, Shou Feng, Chunhui Zhao and Guangjun He
Remote Sens. 2024, 16(24), 4734; https://doi.org/10.3390/rs16244734 - 18 Dec 2024
Viewed by 234
Abstract
Despite the successful application of remote sensing images in agriculture, meteorology, and geography, their relatively low spatial resolution hinders further applications. Super-resolution technology is introduced to overcome this dilemma. It is a challenging task due to the variations in object size and textures in remote sensing images. To address this problem, we present SymSwin, a super-resolution model based on the Swin transformer that aims to capture a multi-scale context. The symmetric multi-scale window (SyMW) mechanism is proposed and integrated into the backbone, which is capable of perceiving features of various sizes. First, the SyMW mechanism is proposed to capture discriminative contextual features from multi-scale representations using corresponding attentive window sizes. Subsequently, a cross-receptive field-adaptive attention (CRAA) module is introduced to model the relations among multi-scale contexts and to realize adaptive fusion. Furthermore, RS data exhibit poor spatial resolution, leading to insufficient visual information when merely spatial supervision is applied. Therefore, a U-shape wavelet transform (UWT) loss is proposed to facilitate the training process from the frequency domain. Extensive experiments demonstrate that our method achieves superior performance in both quantitative metrics and visual quality compared with existing algorithms.
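The UWT loss itself is not spelled out in the listing; as a rough illustration of frequency-domain supervision, the sketch below computes a single-level Haar wavelet decomposition with fixed strided convolutions and penalizes the L1 distance between the sub-bands of the super-resolved and reference luminance images. The Haar filters, single decomposition level, and L1 penalty are assumptions; the paper's U-shape wavelet transform loss is presumably more elaborate.

import torch
import torch.nn.functional as F

def haar_dwt(x):
    """Single-level Haar DWT of a single-channel image batch (B,1,H,W)
    implemented with fixed stride-2 convolutions; returns the four
    sub-bands stacked along the channel dimension (LL, LH, HL, HH)."""
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    k = torch.stack([ll, lh, hl, hh]).unsqueeze(1)      # (4, 1, 2, 2)
    return F.conv2d(x, k.to(x.dtype), stride=2)         # (B, 4, H/2, W/2)

def wavelet_loss(sr, hr):
    """L1 penalty on all sub-bands: supervises both low- and high-frequency
    content of the super-resolved image against the reference."""
    return F.l1_loss(haar_dwt(sr), haar_dwt(hr))

# toy usage on luminance channels
sr = torch.rand(2, 1, 64, 64)
hr = torch.rand(2, 1, 64, 64)
print(wavelet_loss(sr, hr))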
Show Figures

Figure 1: (a) Overall architecture of SymSwin, containing three main functional stages; the chief deep feature extraction stage involves SyMWBs and CRAAs. (b) Detailed illustration of the SyMWB composition. (c) Detailed illustration of the CRAA module. (d) Detailed illustration of the Swin-DCFF layer; SW-SA denotes conventional shifted-window self-attention. (e) Detailed illustration of DCFF.
Figure 2: Indication of the SyMW mechanism (colored markers denote the window for SyMWB_i, the window for SyMWB_i+1, and the feature map of SyMWB_i). Each feature map represents the extraction of a whole block, and the grid denotes the window size used on each feature map, intuitively demonstrating that SyMW provides multi-scale context.
Figure 3: Illustration of the CRAA module, containing two main functional stages. During the CRA stage, the correlation between contexts with different receptive fields is calculated to achieve flexible fusion; during the AFF stage, the fused feature is adaptively enhanced.
Figure 4: Illustration of the SWT process. The color space conversion converts an image from RGB to YCrCb, and the Y-band value, representing the luminance information, is selected. LF denotes the low-frequency sub-band, and HF denotes the high-frequency sub-bands; the HF sketches directly depict the horizontal, vertical, and diagonal edges.
Figure 5: Visualization examples of the ×4 super-resolution reconstruction results for the algorithms in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets. PSNR and SSIM values are listed below each patch; the best performance is highlighted in bold red and the second-ranked in blue. The inset on the right magnifies the region enclosed by the red bounding box in the main image.
Figure 6: Visualization examples of the ×3 super-resolution reconstruction results for the algorithms in the quantitative experiments on the NWPU-RESISC45 and DIOR datasets, annotated in the same way as Figure 5.
Figure 7: A comparison of the visualized feature maps extracted by each layer of the backbone with and without multi-scale representations, illustrating the different regions of interest the networks tend to focus on; colors closer to red denote stronger attention.
16 pages, 3143 KiB  
Article
DGA Domain Detection Based on Transformer and Rapid Selective Kernel Network
by Jisheng Tang, Yiling Guan, Shenghui Zhao, Huibin Wang and Yinong Chen
Electronics 2024, 13(24), 4982; https://doi.org/10.3390/electronics13244982 - 18 Dec 2024
Viewed by 257
Abstract
Botnets pose a significant challenge in network security by leveraging Domain Generation Algorithms (DGA) to evade traditional security measures. Extracting DGA domain samples is inherently complex, and the current DGA detection models often struggle to capture domain features effectively when facing limited training data. This limitation results in suboptimal detection performance and an imbalance between model accuracy and complexity. To address these challenges, this paper introduces a novel multi-scale feature fusion model that integrates the Transformer architecture with the Rapid Selective Kernel Network (R-SKNet). The proposed model employs the Transformer’s encoder to couple the single-domain character elements with the multiple types of relationships within the global domain block. This paper proposes integrating R-SKNet into DGA detection and developing an efficient channel attention (ECA) module. By enhancing the branch information guidance in the SKNet architecture, the approach achieves adaptive receptive field selection, multi-scale feature capture, and lightweight yet efficient multi-scale convolution. Moreover, the improved Feature Pyramid Network (FPN) architecture, termed EFAM, is utilized to adjust channel weights for outputs at different stages of the backbone network, leading to achieving multi-scale feature fusion. Experimental results demonstrate that, in tasks with limited training samples, the proposed method achieves lower computational complexity and higher detection accuracy compared to mainstream detection models.
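Based only on the abstract, the R-SKNet idea appears to combine SKNet-style adaptive receptive-field selection with ECA-style channel interaction. The sketch below is one plausible 1D (character-sequence) version: two convolution branches with different kernel sizes are fused, an ECA 1D convolution refines the channel descriptor, and a softmax selects between the branches per channel. All layer sizes and the exact wiring are assumptions, not the paper's module.

import torch
import torch.nn as nn

class ECASelectiveKernel(nn.Module):
    """Sketch of SK-style adaptive receptive-field selection in which the
    channel descriptor passes through an ECA-style 1D convolution before
    the per-branch softmax (parameter choices are illustrative)."""
    def __init__(self, channels, k_eca=3):
        super().__init__()
        self.branch3 = nn.Conv1d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv1d(channels, channels, 5, padding=2)
        self.eca = nn.Conv1d(1, 1, k_eca, padding=k_eca // 2, bias=False)
        self.select = nn.Conv1d(channels, 2 * channels, 1)

    def forward(self, x):                        # x: (B, C, L) character embeddings
        u3, u5 = self.branch3(x), self.branch5(x)
        s = (u3 + u5).mean(dim=-1)               # (B, C) global descriptor
        s = self.eca(s.unsqueeze(1)).squeeze(1)  # cross-channel interaction
        w = self.select(s.unsqueeze(-1))         # (B, 2C, 1)
        w = w.view(x.size(0), 2, x.size(1), 1).softmax(dim=1)
        return w[:, 0] * u3 + w[:, 1] * u5       # adaptive kernel mixture

out = ECASelectiveKernel(64)(torch.randn(8, 64, 40))   # -> (8, 64, 40)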
Show Figures

Figure 1: Overall framework.
Figure 2: Sample domain length.
Figure 3: Transformer encoder module.
Figure 4: R-SK convolution structure.
Figure 5: ECA module.
Figure 6: Band matrix.
Figure 7: EFAM structure.
Figure 8: Binary classification results and model parameter comparisons.
22 pages, 7963 KiB  
Article
WTSM-SiameseNet: A Wood-Texture-Similarity-Matching Method Based on Siamese Networks
by Yizhuo Zhang, Guanlei Wu, Shen Shi and Huiling Yu
Information 2024, 15(12), 808; https://doi.org/10.3390/info15120808 - 16 Dec 2024
Viewed by 284
Abstract
In tasks such as wood defect repair and the production of high-end wooden furniture, ensuring the consistency of the texture in repaired or jointed areas is crucial. This paper proposes the WTSM-SiameseNet model for wood-texture-similarity matching and introduces several improvements to address the issues present in traditional methods. First, to address the issue that fixed receptive fields cannot adapt to textures of different sizes, a multi-receptive field fusion feature extraction network was designed. This allows the model to autonomously select the optimal receptive field, enhancing its flexibility and accuracy when handling wood textures at different scales. Secondly, the interdependencies between layers in traditional serial attention mechanisms limit performance. To address this, a concurrent attention mechanism was designed, which reduces interlayer interference by using a dual-stream parallel structure that enhances the ability to capture features. Furthermore, to overcome the issues of existing feature fusion methods that disrupt spatial structure and lack interpretability, this study proposes a feature fusion method based on feature correlation. This approach not only preserves the spatial structure of texture features but also improves the interpretability and stability of the fused features and the model. Finally, by introducing depthwise separable convolutions, the issue of a large number of model parameters is addressed, significantly improving training efficiency while maintaining model performance. Experiments were conducted using a wood texture similarity dataset consisting of 7588 image pairs. The results show that WTSM-SiameseNet achieved an accuracy of 96.67% on the test set, representing a 12.91% improvement in accuracy and a 14.21% improvement in precision compared to the pre-improved SiameseNet. Compared to CS-SiameseNet, accuracy increased by 2.86%, and precision improved by 6.58%.
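The parameter saving from the depthwise separable convolutions mentioned in the abstract is easy to quantify with a minimal PyTorch comparison; the channel counts below are generic, not the paper's.

import torch.nn as nn

def standard_conv(c_in, c_out, k=3):
    return nn.Conv2d(c_in, c_out, k, padding=k // 2)

def depthwise_separable(c_in, c_out, k=3):
    # depthwise (per-channel spatial filter) followed by 1x1 pointwise mixing
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, k, padding=k // 2, groups=c_in),
        nn.Conv2d(c_in, c_out, 1),
    )

def n_params(m):
    return sum(p.numel() for p in m.parameters())

print(n_params(standard_conv(128, 256)))        # 295168
print(n_params(depthwise_separable(128, 256)))  # 34304

For a 128-to-256-channel 3x3 layer this is roughly an 8.6x reduction, the kind of saving that makes the Siamese branches cheaper to train and deploy.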
Show Figures

Figure 1: Diagram of the SiameseNet architecture.
Figure 2: Diagram of the WTSM-SiameseNet architecture.
Figure 3: Diagram of the MRF-Resnet architecture.
Figure 4: Multi-scale receptive field fusion.
Figure 5: Concurrent attention.
Figure 6: CBAM attention.
Figure 7: Texture feature aggregation and matching module.
Figure 8: Sample dataset.
Figure 9: Training loss.
Figure 10: Wood-texture-similarity matching example.
23 pages, 13802 KiB  
Article
Underwater-Yolo: Underwater Object Detection Network with Dilated Deformable Convolutions and Dual-Branch Occlusion Attention Mechanism
by Zhenming Li, Bing Zheng, Dong Chao, Wenbo Zhu, Haibing Li, Jin Duan, Xinming Zhang, Zhongbo Zhang, Weijie Fu and Yunzhi Zhang
J. Mar. Sci. Eng. 2024, 12(12), 2291; https://doi.org/10.3390/jmse12122291 - 12 Dec 2024
Viewed by 479
Abstract
Underwater object detection is critical for marine ecological monitoring and biodiversity research, yet existing algorithms struggle in detecting densely packed objects of varying sizes, particularly in occluded and complex underwater environments. This study introduces Underwater-Yolo, a novel detection network that enhances performance in these challenging scenarios by integrating a dual-branch occlusion-handling attention mechanism (GLOAM) and a Cross-Stage Partial Dilated Deformable Convolution (CSP-DDC) backbone. The dilated deformable convolutions (DDCs) in the backbone and neck expand the receptive field, thereby improving the detection of small objects, while the deformable convolutions enhance the model’s adaptive feature extraction capabilities for unstructured objects. Additionally, the CARAFE up-sampling operator in the neck aggregates contextual information across a broader spatial domain. The GLOAM, consisting of a global branch (using a Vision Transformer to capture global features and object–background relationships) and a local branch (enhancing the detection of occluded objects through depthwise–pointwise convolutions), further optimizes performance. By incorporating these innovations, the model effectively addresses the challenges of detecting small and occluded objects in dense underwater environments. The evaluation on the CLfish-V1 dataset shows significant improvements over state-of-the-art algorithms, with an AP50 of 93.8%, an AP75 of 88.9%, and an AP-small of 76.4%, marking gains of 4.7%, 16.7%, and 6%, respectively. These results demonstrate the model’s effectiveness in complex underwater scenarios.
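The exact DDC layer is not given in the listing; the sketch below shows one conventional way to build a dilated deformable convolution with torchvision's DeformConv2d, where a plain convolution predicts the sampling offsets and dilation 2 widens the receptive field. The kernel size, dilation rate, and the absence of a modulation mask are assumptions rather than the paper's configuration.

import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DilatedDeformBlock(nn.Module):
    """Sketch of a dilated deformable convolution: a plain conv predicts
    per-location sampling offsets, and the deformable conv samples with
    dilation 2 to enlarge the receptive field (configuration assumed)."""
    def __init__(self, c_in, c_out, k=3, dilation=2):
        super().__init__()
        pad = dilation * (k // 2)
        self.offset = nn.Conv2d(c_in, 2 * k * k, k, padding=pad, dilation=dilation)
        self.deform = DeformConv2d(c_in, c_out, k, padding=pad, dilation=dilation)

    def forward(self, x):
        return self.deform(x, self.offset(x))

y = DilatedDeformBlock(32, 64)(torch.randn(1, 32, 80, 80))  # -> (1, 64, 80, 80)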
(This article belongs to the Section Ocean Engineering)
Show Figures

Figure 1: Overall network architecture of Underwater-Yolo; * denotes multiplication.
Figure 2: CSP-DDC module; pathways with different dilation rates are indicated in different colors.
Figure 3: Dual-branch occlusion-handling attention mechanism (GLOAM); local and global branches are indicated in different colors.
Figure 4: Small-Object Detection layer (SOD-layer); * denotes multiplication.
Figure 5: Distribution of annotation boxes; each color represents the size range of one class of annotation box, and x marks the central anchor point selected for that box.
Figure 6: Data statistics of the CLfish-V1 dataset.
Figure 7: Target feature representation of the CLfish-V1 dataset: (a) characteristics of dense-target groups; (b) features of small targets. Red boxes represent the position and size of the target annotation boxes in the figure.
Figure 8: Heatmap visualization of CSP-DDC compared with other backbone networks; darker colors indicate better feature extraction for the corresponding image area.
Figure 9: Visualization of detection results from Underwater-Yolo compared with other single-stage object detection networks.
Figure 10: Visualization of detection results for Underwater-Yolo in different underwater environments (a–c): (a) a low-contrast scene, (b) a color-biased scene, and (c) an atomized scene.
Figure 11: Limitations of Underwater-Yolo: (a) a highly dynamic scene and (b) a high-noise scene.
Figure 12: Visualization of heatmaps for the ablation study of various models; darker colors indicate better feature extraction for the corresponding image area, and (1)–(3) denote successive generations of the ablation model.
18 pages, 15492 KiB  
Article
D3-YOLOv10: Improved YOLOv10-Based Lightweight Tomato Detection Algorithm Under Facility Scenario
by Ao Li, Chunrui Wang, Tongtong Ji, Qiyang Wang and Tianxue Zhang
Agriculture 2024, 14(12), 2268; https://doi.org/10.3390/agriculture14122268 - 11 Dec 2024
Viewed by 461
Abstract
Accurate and efficient tomato detection is one of the key techniques for intelligent automatic picking in the area of precision agriculture. However, under the facility scenario, existing detection algorithms still have challenging problems such as weak feature extraction ability for occlusion conditions and different fruit sizes, low accuracy on edge location, and heavy model parameters. To address these problems, this paper proposed D3-YOLOv10, a lightweight YOLOv10-based detection framework. Initially, a compact dynamic faster network (DyFasterNet) was developed, where multiple adaptive convolution kernels are aggregated to extract local effective features for fruit size adaption. Additionally, the deformable large kernel attention mechanism (D-LKA) was designed for the terminal phase of the neck network by adaptively adjusting the receptive field to focus on irregular tomato deformations and occlusions. Then, to further improve detection boundary accuracy and convergence, a dynamic FM-WIoU regression loss with a scaling factor was proposed. Finally, a knowledge distillation scheme using semantic frequency prompts was developed to optimize the model for lightweight deployment in practical applications. We evaluated the proposed framework using a self-made tomato dataset and designed a two-stage category balancing method based on diffusion models to address the sample class-imbalanced issue. The experimental results demonstrated that the D3-YOLOv10 model achieved an mAP0.5 of 91.8%, with a substantial reduction of 54.0% in parameters and 64.9% in FLOPs, compared to the benchmark model. Meanwhile, the detection speed of 80.1 FPS more effectively meets the demand for real-time tomato detection. This study can effectively contribute to the advancement of smart agriculture research on the detection of fruit targets.
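DyFasterNet's "multiple adaptive convolution kernels aggregated" step resembles standard dynamic convolution. The sketch below shows that generic mechanism: K candidate kernels are mixed with input-dependent softmax weights and applied via a grouped-convolution trick. The number of kernels and the attention design are assumptions, not the paper's exact module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of dynamic convolution: K candidate kernels are mixed with
    input-dependent attention weights, so the effective filter adapts per
    sample. K and layer sizes are illustrative."""
    def __init__(self, c_in, c_out, k=3, num_kernels=4):
        super().__init__()
        self.k, self.c_in, self.c_out = k, c_in, c_out
        self.weight = nn.Parameter(torch.randn(num_kernels, c_out, c_in, k, k) * 0.02)
        self.attn = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(c_in, num_kernels))

    def forward(self, x):
        b = x.size(0)
        a = F.softmax(self.attn(x), dim=1)                    # (B, K)
        w = torch.einsum('bk,koihw->boihw', a, self.weight)   # per-sample kernel
        x = x.reshape(1, b * self.c_in, *x.shape[2:])         # grouped-conv trick
        w = w.reshape(b * self.c_out, self.c_in, self.k, self.k)
        out = F.conv2d(x, w, padding=self.k // 2, groups=b)
        return out.reshape(b, self.c_out, *out.shape[2:])

y = DynamicConv2d(16, 32)(torch.randn(2, 16, 40, 40))  # -> (2, 32, 40, 40)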
Show Figures

Figure 1: The workflow diagram for this paper.
Figure 2: Tomato occlusion in the facility scenario, including no occlusion, mutual occlusion, leaf occlusion, and facility occlusion.
Figure 3: Structure diagram of the two-stage class balancing method based on the diffusion model.
Figure 4: Overview of the experimental framework. Stage 1: with the teacher model's guidance, the learnable frequency prompts interact with the frequency bands. Stage 2: the feature maps distilled from both the student and the teacher are first transformed into the frequency domain; the frequency prompts from Stage 1 are then applied, with the frozen prompts multiplied by the teacher's frequency bands to generate points of interest (PoIs). Finally, the spatial weights for each channel are determined by the teacher and student spatial gates. Process (1) in the figure identifies the distillation locations, while Process (2) measures the distillation extent.
Figure 5: Architecture of the DyFasterNet module: (a) DyFasterNet; (b) dynamic convolution.
Figure 6: Deformable large kernel attention mechanism structure.
Figure 7: The proposed Inner-FM-WIoU.
Figure 8: YOLOv10s model performance for class-balanced vs. class-imbalanced datasets.
Figure 9: The loss curve and mAP0.5 under different scaling ratios.
Figure 10: Examples of detection capabilities of comparative models in the facility environment.
Figure 11: Robustness experiment in various scenarios: area (a) demonstrates the performance of the model in a facility agriculture environment with occlusion; areas (b,c,e) show tomatoes partially occluded by branches or leaves; area (d) highlights small target recognition; and area (f) presents cases of mutual occlusion between fruits.
26 pages, 11259 KiB  
Article
Axial-UNet++ Power Line Detection Network Based on Gated Axial Attention Mechanism
by Ding Hu, Zihao Zheng, Yafei Liu, Chengkang Liu and Xiaoguo Zhang
Remote Sens. 2024, 16(23), 4585; https://doi.org/10.3390/rs16234585 - 6 Dec 2024
Viewed by 408
Abstract
The segmentation and recognition of power lines are crucial for the UAV-based inspection of overhead power lines. To address the issues of class imbalance, low sample quantity, and long-range dependency in images, a specialized semantic segmentation network for power line segmentation called Axial-UNet++ is proposed. Firstly, to tackle the issue of long-range dependencies in images and low sample quantity, a gated axial attention mechanism is introduced to expand the receptive field and improve the capture of relative positional biases in small datasets, thereby proposing a novel feature extraction module termed axial-channel local normalization module. Secondly, to address the imbalance in training samples, a new loss function is developed by combining traditional binary cross-entropy loss with focal loss, enhancing the precision of image semantic segmentation. Lastly, ablation and comparative experiments on the PLDU and Mendeley datasets demonstrate that the proposed model achieves 54.7% IoU and 80.1% recall on the PLDU dataset, and 79.3% IoU and 93.1% recall on the Mendeley dataset, outperforming other listed models. Additionally, robustness experiments show the adaptability of the Axial-UNet++ model under extreme conditions and the augmented image dataset used in this study has been open sourced.
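The combined loss described in the abstract (binary cross-entropy plus focal loss) can be written compactly; the sketch below is a generic version in which the two terms are blended with a weight lam, and the gamma, alpha, and lam values are assumed rather than taken from the paper.

import torch
import torch.nn.functional as F

def bce_focal_loss(logits, target, gamma=2.0, alpha=0.25, lam=0.5):
    """Sketch of a combined loss: plain BCE plus a focal term that
    down-weights easy pixels (gamma, alpha, lam are assumed values)."""
    bce = F.binary_cross_entropy_with_logits(logits, target, reduction='none')
    p = torch.sigmoid(logits)
    p_t = p * target + (1 - p) * (1 - target)            # prob of the true class
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    focal = alpha_t * (1 - p_t) ** gamma * bce
    return lam * bce.mean() + (1 - lam) * focal.mean()

logits = torch.randn(4, 1, 128, 128)
mask = (torch.rand(4, 1, 128, 128) > 0.95).float()       # sparse power-line pixels
print(bce_focal_loss(logits, mask))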
Show Figures

Figure 1: The overall structure of UNet++. The superscript of X^(i,j) denotes the j-th module in the i-th layer.
Figure 2: Convolutional block of UNet++.
Figure 3: The overall structure of the axial-channel local normalization module.
Figure 4: Gated axial attention mechanism. The matrices W_Q, W_K, and W_V correspond to the parameter matrices for the query term q_0, the key term k_0, and the value term v_0, respectively.
Figure 5: The overall structure of Axial-UNet++. The superscripts of X^(i,j) and A^(i,j) denote the j-th module in the i-th layer.
Figure 6: Example images of power lines and their corresponding ground truth annotations from the PLDU and Mendeley datasets.
Figure 7: Comparison of experimental results with different attention feature extraction modules.
Figure 8: Comparison of experimental results with general semantic segmentation models.
Figure 9: Comparison of experimental results with power line segmentation specialized models.
Figure 10: Original images.
Figure 11: Experimental results in a foggy environment. In the predicted results, FN pixels are represented in red, while FP pixels are indicated in green.
Figure 12: Experimental results in a snowy environment. In the predicted results, FN pixels are represented in red, while FP pixels are indicated in green.
Figure 13: Experimental results in a strong-light environment. In the predicted results, FN pixels are represented in red, while FP pixels are indicated in green.
Figure 14: Experimental results in a motion-blur environment. In the predicted results, FN pixels are represented in red, while FP pixels are indicated in green.
19 pages, 12869 KiB  
Article
Cotton Weed-YOLO: A Lightweight and Highly Accurate Cotton Weed Identification Model for Precision Agriculture
by Jinghuan Hu, He Gong, Shijun Li, Ye Mu, Ying Guo, Yu Sun, Tianli Hu and Yu Bao
Agronomy 2024, 14(12), 2911; https://doi.org/10.3390/agronomy14122911 - 5 Dec 2024
Viewed by 515
Abstract
Precise weed recognition is an important step towards achieving intelligent agriculture. In this paper, a novel weed recognition model, Cotton Weed-YOLO (CW-YOLO), is proposed to improve the accuracy and efficiency of weed detection. CW-YOLO is based on YOLOv8 and introduces a dual-branch structure combining a Vision Transformer and a Convolutional Neural Network to address the problems of the small receptive field of the CNN and the high computational complexity of the Transformer. The Receptive Field Enhancement (RFE) module is proposed to enable the feature pyramid network to adapt to the feature information of different receptive fields. A Scale-Invariant Shared Convolutional Detection (SSCD) head is proposed to fully utilize the advantages of shared convolution and significantly reduce the number of parameters in the detection head. The experimental results show that the CW-YOLO model outperforms existing methods in terms of detection accuracy and speed. Compared with the original YOLOv8n, the detection accuracy, mAP value, and recall rate are improved by 1.45%, 0.7%, and 0.6%, respectively, the floating-point operations are reduced by 2.5 G, and the number of parameters is reduced by 1.52 × 10⁶. The proposed CW-YOLO model provides powerful technical support for smart agriculture and is expected to promote the development of agricultural production in the direction of intelligence and precision.
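The SSCD head is described only at a high level; the sketch below shows the general shared-convolution pattern it alludes to: one convolution stack is reused across all pyramid levels, and a learnable per-level scalar rescales the box regression so the shared weights remain scale-invariant. The head layout, activation, and output channels are assumptions, not the paper's design.

import torch
import torch.nn as nn

class SharedConvHead(nn.Module):
    """Sketch of a shared-convolution detection head: one conv stack serves
    all pyramid levels, and a learnable per-level scalar rescales the box
    regression (layout assumed)."""
    def __init__(self, channels, num_classes, num_levels=3):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.SiLU())
        self.cls = nn.Conv2d(channels, num_classes, 1)
        self.reg = nn.Conv2d(channels, 4, 1)
        self.scales = nn.Parameter(torch.ones(num_levels))

    def forward(self, feats):                      # list of (B, C, Hi, Wi)
        outs = []
        for i, f in enumerate(feats):
            f = self.stem(f)                       # same weights on every level
            outs.append((self.cls(f), self.reg(f) * self.scales[i]))
        return outs

feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
outs = SharedConvHead(64, num_classes=3)(feats)

Because the convolutions are shared, the parameter count of the head no longer grows with the number of pyramid levels, which is the saving the abstract highlights.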
Show Figures

Figure 1: Example images from the cotton weeds dataset.
Figure 2: Cotton Weed-YOLO structure chart.
Figure 3: Two-channel structure diagram.
Figure 4: HFF branch composition.
Figure 5: T_branch encoder system.
Figure 6: MHSA working principle.
Figure 7: MHSA calculation process.
Figure 8: Comparison of ConvGLU with traditional GLU: (a) CGLU; (b) GLU.
Figure 9: HFF model architecture.
Figure 10: EMRF structure diagram.
Figure 11: RFE, EUCB, MSCB, and EMRF structure diagram.
Figure 12: SSCD detection head.
Figure 13: Training accuracy and loss curves of deep learning models on the training dataset.
Figure 14: Performance comparison of the five models.
Figure 15: Comparison of composite indicators of the five models.
Figure 16: Example of a weed image with a predicted bounding box.
Full article ">
14 pages, 4606 KiB  
Article
Research on Multi-Scale Spatio-Temporal Graph Convolutional Human Behavior Recognition Method Incorporating Multi-Granularity Features
by Yulin Wang, Tao Song, Yichen Yang and Zheng Hong
Sensors 2024, 24(23), 7595; https://doi.org/10.3390/s24237595 - 28 Nov 2024
Viewed by 458
Abstract
Aiming at the problem that existing human skeleton behavior recognition methods are insensitive to local human movements and inaccurate in distinguishing similar behaviors, a multi-scale spatio-temporal graph convolution method incorporating multi-granularity features is proposed for human behavior recognition. Firstly, a skeleton fine-grained partitioning strategy is proposed, which initializes the skeleton data into data streams of different granularities. An adaptive cross-scale feature fusion layer is designed using a normalized Gaussian function to perform feature fusion among different granularities, guiding the model to focus on discriminative feature representations among similar behaviors through fine-grained features. Secondly, a sparse multi-scale adjacency matrix is introduced to solve the biased weighting problem that arises in multi-scale spatial domain modeling under multi-granularity conditions. Finally, an end-to-end graph convolutional neural network is constructed to improve the feature expression ability of spatio-temporal receptive field information and enhance the robustness of recognition between similar behaviors. The feasibility of the proposed algorithm was verified on the public behavior recognition dataset MSR Action 3D, with an accuracy of 95.67%, which is superior to existing behavior recognition methods.
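The sparse multi-scale adjacency matrix is not defined in the listing; a common construction that removes the bias toward close neighbours is to keep, at scale k, only the joints first reachable in exactly k hops. The sketch below implements that generic variant (the paper's actual construction may differ).

import torch

def multi_scale_adjacency(adj, max_hops=3):
    """Sketch: build k-hop adjacency matrices where scale k keeps only nodes
    first reachable in exactly k steps, so close joints are not re-counted
    at every scale (one common remedy for the bias the abstract describes)."""
    n = adj.size(0)
    eye = torch.eye(n)
    reach = (adj + eye).clamp(max=1)        # nodes reachable within 1 hop
    seen = reach.clone()
    scales = [reach]
    power = reach
    for _ in range(2, max_hops + 1):
        power = (power @ (adj + eye)).clamp(max=1)
        new = (power - seen).clamp(min=0)   # exactly-k-hop neighbours only
        scales.append(new)
        seen = (seen + new).clamp(max=1)
    # row-normalise each scale before use in a graph convolution
    return [s / s.sum(dim=1, keepdim=True).clamp(min=1) for s in scales]

# toy 5-joint chain skeleton
A = torch.zeros(5, 5)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1
for k, s in enumerate(multi_scale_adjacency(A), start=1):
    print(k, s.shape)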
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems)
Show Figures

Figure 1: Adjacency matrix topology diagram. (a–c) respectively represent the topological graphs of the first-order, second-order, and third-order adjacency matrices used to connect human skeletal joints, while (d–f) respectively represent the topological graphs after constructing multi-scale adjacency matrices.
Figure 2: Framework of the multi-scale spatio-temporal graph convolutional network model incorporating multi-granularity features: (a) the overall framework of the proposed network; (b) the framework of the multi-scale spatio-temporal convolutional module.
Figure 3: Three granularity representation methods for MSR Action 3D. The blue nodes represent the original coarse-grained joints, and the red nodes represent the newly added fine-grained joints.
Figure 4: The structure of the cross-scale feature fusion layer (CSFL).
Figure 5: Skeleton graphs of different granularities.
Figure 6: The confusion matrix of the MSR Action 3D dataset. The darker the background color of each grid, the higher the recognition rate it represents.
Full article ">
19 pages, 8633 KiB  
Article
CNN-CBAM-LSTM: Enhancing Stock Return Prediction Through Long and Short Information Mining in Stock Prediction
by Peijie Ye, Hao Zhang and Xi Zhou
Mathematics 2024, 12(23), 3738; https://doi.org/10.3390/math12233738 - 27 Nov 2024
Viewed by 653
Abstract
Deep learning, a foundational technology in artificial intelligence, facilitates the identification of complex associations between stock prices and various influential factors through comprehensive data analysis. Stock price data exhibits unique time-series characteristics; models emphasizing long-term data may miss short-term fluctuations, while those focusing solely on short-term data may not capture cyclical trends. Existing models that integrate long short-term memory (LSTM) and convolutional neural networks (CNNs) face limitations in capturing both long- and short-term dependencies due to LSTM’s gated transmission mechanism and CNNs’ limited receptive field. This study introduces an innovative deep learning model, CNN-CBAM-LSTM, which integrates the convolutional block attention module (CBAM) to enhance the extraction of both long- and short-term features. The model’s performance is assessed using the Australian Standard & Poor’s 200 Index (AS51), showing improvement over traditional models across metrics such as RMSE, MAE, R2, and RETURN. To further confirm its robustness and generalizability, Diebold–Mariano (DM) tests and model confidence set experiments are conducted, with results indicating the consistently high performance of the CNN-CBAM-LSTM model. Additional tests on six globally recognized stock indices reinforce the model’s predictive strength and adaptability, establishing it as a reliable tool for forecasting in the stock market.
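CBAM itself is a standard module, so a compact reference-style sketch is easy to give; note that how it is wired between the CNN and LSTM stages of the proposed model is not described in the listing, and the reduction ratio and spatial kernel size below are typical defaults rather than the paper's settings.

import torch
import torch.nn as nn

class CBAM(nn.Module):
    """CBAM-style block (channel attention followed by spatial attention),
    shown in isolation from the surrounding CNN/LSTM pipeline."""
    def __init__(self, channels, reduction=8, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                              # (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),    # spatial attention
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

y = CBAM(32)(torch.randn(4, 32, 20, 20))               # -> (4, 32, 20, 20)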
Show Figures

Figure 1: Architecture of the method proposed in this paper.
Figure 2: Feature distribution and outlier analysis for stock data.
Figure 3: Channel attention mechanism diagram.
Figure 4: Feature fusion module.
Figure 5: Actual vs. predicted price changes.
Figure 6: Opening price change predictions (100-day subset).
Figure 7: Forecasting results of different models on the S&P/ASX200 index for the 100-day test period (look-back = 20 days).
Figure 8: Prediction results of different models for the AS51 index on the whole test set (look-back = 20 days).
Figure 9: CNN-CBAM-LSTM model's forecasting capability for HSI indices.
Figure 10: CNN-CBAM-LSTM model's forecasting capability for N225 indices.
Figure 11: CNN-CBAM-LSTM model's forecasting capability for SPX indices.
Figure 12: CNN-CBAM-LSTM model's forecasting capability for FTSE indices.
Figure 13: CNN-CBAM-LSTM model's forecasting capability for IXIC indices.
Figure 14: CNN-CBAM-LSTM model's forecasting capability for TWII indices.
18 pages, 4823 KiB  
Article
ME-FCN: A Multi-Scale Feature-Enhanced Fully Convolutional Network for Building Footprint Extraction
by Hui Sheng, Yaoteng Zhang, Wei Zhang, Shiqing Wei, Mingming Xu and Yasir Muhammad
Remote Sens. 2024, 16(22), 4305; https://doi.org/10.3390/rs16224305 - 19 Nov 2024
Viewed by 680
Abstract
The precise extraction of building footprints using remote sensing technology is increasingly critical for urban planning and development amid growing urbanization. However, considering the complexity of building backgrounds, diverse scales, and varied appearances, accurately and efficiently extracting building footprints from various remote sensing images remains a significant challenge. In this paper, we propose a novel network architecture called ME-FCN, specifically designed to perceive and optimize multi-scale features to effectively address the challenge of extracting building footprints from complex remote sensing images. We introduce a Squeeze-and-Excitation U-Block (SEUB), which cascades multi-scale semantic information exploration in shallow feature maps and incorporates channel attention to optimize features. In the network’s deeper layers, we implement an Adaptive Multi-scale feature Enhancement Block (AMEB), which captures large receptive field information through concatenated atrous convolutions. Additionally, we develop a novel Dual Multi-scale Attention (DMSA) mechanism to further enhance the accuracy of cascaded features. DMSA captures multi-scale semantic features across both channel and spatial dimensions, suppresses redundant information, and realizes multi-scale feature interaction and fusion, thereby improving the overall accuracy and efficiency. Comprehensive experiments on three datasets demonstrate that ME-FCN outperforms mainstream segmentation methods.
Show Figures

Figure 1: Architecture of the proposed ME-FCN method (blue represents the encoder and orange represents the decoder).
Figure 2: Overall architecture of the SEUB. SEUB adopts a simple U-shaped structure, with the bottom layer utilizing atrous convolution to achieve a larger receptive field, while the skip connections incorporate squeeze-and-excitation attention to enhance network stability.
Figure 3: Residual structure schematic of the SEUB.
Figure 4: Schematic diagram of the Adaptive Multi-scale feature Enhancement Block. The AMEB consists of atrous convolutions with multiple dilation rates and introduces an auxiliary branch, allowing high-level feature maps to autonomously select information utilization, reducing the randomness of feature utilization.
Figure 5: Dual Multi-scale Attention structure. DMSA performs feature alignment at three scales to alleviate semantic differences and achieve feature interaction and fusion.
Figure 6: Predicted results obtained from different algorithms on the WHU aerial dataset (blue indicates the number of false negatives, red represents the number of false positives, white indicates the correct identification of positive samples, and black shows the correct identification of negative samples).
Figure 7: Predicted results obtained from different algorithms on the Massachusetts dataset (blue indicates the number of false negatives, red represents the number of false positives, white indicates the correct identification of positive samples, and black shows the correct identification of negative samples).
Figure 8: Predicted results obtained from different algorithms on the GF-2 building dataset (blue indicates the number of false positives, red represents the number of false negatives, white indicates the correct identification of positive samples, and black shows the correct identification of negative samples).
Figure 9: Ablation experiment results with different module combinations: (a) image, (b) ground truth, (c) baseline, (d) baseline + SEUB, (e) baseline + SEUB + DMSA, and (f) baseline + SEUB + DMSA + AMEB.
18 pages, 5553 KiB  
Article
LI-YOLO: An Object Detection Algorithm for UAV Aerial Images in Low-Illumination Scenes
by Songwen Liu, Hao He, Zhichao Zhang and Yatong Zhou
Drones 2024, 8(11), 653; https://doi.org/10.3390/drones8110653 - 7 Nov 2024
Viewed by 962
Abstract
With the development of unmanned aerial vehicle (UAV) technology, deep learning is becoming more and more widely used in object detection in UAV aerial images; however, detecting and identifying small objects in low-illumination scenes is still a major challenge. Aiming at the problems of low brightness, high noise, and obscure details in low-illumination images, an object detection algorithm, LI-YOLO (Low-Illumination You Only Look Once), for UAV aerial images in low-illumination scenes is proposed. Specifically, in the feature extraction section, this paper proposes a feature enhancement block (FEB) to realize global receptive field and context information learning through lightweight operations and embeds it into the C2f module at the end of the backbone network to alleviate the problems of high noise and detail blur caused by low illumination at very little parameter cost. In the feature fusion part, aiming to improve the detection performance for small objects in UAV aerial images, a shallow feature fusion network and a small object detection head are added. In addition, the adaptive spatial feature fusion structure (ASFF) is also introduced, which adaptively fuses information from different levels of feature maps by optimizing the feature fusion strategy so that the network can more accurately identify and locate objects of various scales. The experimental results show that the mAP50 of LI-YOLO reaches 76.6% on the DroneVehicle dataset and 90.8% on the LLVIP dataset. Compared with other current algorithms, LI-YOLO improves the mAP50 by 3.1% on the DroneVehicle dataset and 6.9% on the LLVIP dataset. Experimental results show that the proposed algorithm can effectively improve object detection performance in low-illumination scenes.
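ASFF is an established design: every pyramid level is resized to the target level's resolution and fused with per-pixel softmax weights. The sketch below shows that mechanism for one output level; the 1x1 weight convolutions and nearest-neighbour resizing are simplifications of the published ASFF rather than LI-YOLO's exact code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ASFF(nn.Module):
    """Sketch of adaptive spatial feature fusion for one output level: the
    other pyramid levels are resized to this level's shape and fused with
    per-pixel softmax weights learned by 1x1 convolutions (sizes assumed)."""
    def __init__(self, channels, num_levels=3):
        super().__init__()
        self.weight_convs = nn.ModuleList(
            [nn.Conv2d(channels, 1, 1) for _ in range(num_levels)])

    def forward(self, feats, target_idx=0):
        h, w = feats[target_idx].shape[2:]
        resized = [F.interpolate(f, size=(h, w), mode='nearest') for f in feats]
        logits = torch.cat([conv(f) for conv, f in zip(self.weight_convs, resized)],
                           dim=1)                      # (B, L, H, W)
        weights = logits.softmax(dim=1)
        return sum(weights[:, i:i + 1] * resized[i] for i in range(len(feats)))

feats = [torch.randn(1, 64, s, s) for s in (80, 40, 20)]
fused = ASFF(64)(feats, target_idx=0)                  # -> (1, 64, 80, 80)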
Show Figures

Figure 1: The network structure of YOLOv8, containing the backbone, neck, and head.
Figure 2: Comparison of the C3 module and the C2f module: (a) the structure of C3; (b) the structure of C2f; the structure of the Bottleneck module is shown in (c).
Figure 3: Illustration of the adaptive spatial feature fusion mechanism. For each level, the features of all the other levels are resized to the same shape and spatially fused according to the learned weight maps.
Figure 4: LI-YOLO's framework, containing the backbone, neck, and head. The gray cells represent the original modules of the baseline algorithm YOLOv8, and the colored bold cells represent the improved or newly added modules; the improved parts (C2fFE, the improved feature fusion network, and ASFF) are bordered in black.
Figure 5: The structure of C2fFE. The Bottleneck structure is the same as in Figure 2c.
Figure 6: The structures of MHSA and FEB, where (a) shows MHSA and (b) shows FEB; n·d, d·n, and n·n denote the matrix dimensions.
Figure 7: Adaptive spatial feature fusion structure diagram. The formula in the light blue dotted box represents the fusion algorithm for one layer; the other layers are similar.
Figure 8: The amount of data for all labels in the DroneVehicle training set: (a) position distribution of the labels; (b) width and height distribution of the labels.
Figure 9: mAP50 and mAP50:95 for the ablation experiment, where (a) shows mAP50 and (b) shows mAP50:95.
Figure 10: Comparison of experimental results with and without Retinex low-illumination image enhancement, where (a) shows results without the enhancement and (b) shows results with it.
Figure 11: Comparison of experimental results in different low-illumination scenes, where (a) shows the detection results of YOLOv8 in foggy scenes, (b) LI-YOLO in foggy scenes, (c) YOLOv8 in night scenes, and (d) LI-YOLO in night scenes.
22 pages, 46624 KiB  
Article
Autonomous Extraction Technology for Aquaculture Ponds in Complex Geological Environments Based on Multispectral Feature Fusion of Medium-Resolution Remote Sensing Imagery
by Zunxun Liang, Fangxiong Wang, Jianfeng Zhu, Peng Li, Fuding Xie and Yifei Zhao
Remote Sens. 2024, 16(22), 4130; https://doi.org/10.3390/rs16224130 - 5 Nov 2024
Viewed by 774
Abstract
Coastal aquaculture plays a crucial role in global food security and the economic development of coastal regions, but it also causes environmental degradation in coastal ecosystems. Therefore, the automation, accurate extraction, and monitoring of coastal aquaculture areas are crucial for the scientific management of coastal ecological zones. This study proposes a novel deep learning- and attention-based median adaptive fusion U-Net (MAFU-Net) procedure aimed at precisely extracting individually separable aquaculture ponds (ISAPs) from medium-resolution remote sensing imagery. Initially, this study analyzes the spectral differences between aquaculture ponds and interfering objects such as saltwater fields in four typical aquaculture areas along the coast of Liaoning Province, China. It innovatively introduces a difference index for saltwater field aquaculture zones (DIAS) and integrates this index as a new band into remote sensing imagery to increase the expressiveness of features. A median augmented adaptive fusion module (MEA-FM), which adaptively selects channel receptive fields at various scales, integrates the information between channels, and captures multiscale spatial information to achieve improved extraction accuracy, is subsequently designed. Experimental and comparative results reveal that the proposed MAFU-Net method achieves an F1 score of 90.67% and an intersection over union (IoU) of 83.93% on the CHN-LN4-ISAPS-9 dataset, outperforming advanced methods such as U-Net, DeepLabV3+, SegNet, PSPNet, SKNet, UPS-Net, and SegFormer. This study’s results provide accurate data support for the scientific management of aquaculture areas, and the proposed MAFU-Net method provides an effective method for semantic segmentation tasks based on medium-resolution remote sensing images.
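The DIAS formula is not reproduced in the listing, but the abstract's idea of integrating the index as a new band can be illustrated generically: compute a normalised-difference index from two spectral bands and stack it onto the image as an extra channel. The band indices below are placeholders, not the bands DIAS actually uses, and the normalised-difference form is an assumption.

import numpy as np

def add_index_band(image, band_a=3, band_b=11, eps=1e-6):
    """Sketch: compute a normalised-difference index from two bands and stack
    it as an extra channel, mirroring how the abstract describes adding DIAS
    as a new band (band choice and formula are placeholders)."""
    a = image[..., band_a].astype(np.float32)
    b = image[..., band_b].astype(np.float32)
    index = (a - b) / (a + b + eps)
    return np.concatenate([image, index[..., None]], axis=-1)

img = np.random.rand(256, 256, 13).astype(np.float32)   # 13-band Sentinel-2-like tile
aug = add_index_band(img)
print(aug.shape)                                         # (256, 256, 14)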
Show Figures

Figure 1: Study area (A represents the coastal aquaculture area of Qingduizi Bay, Zhuanghe City, Liaoning Province; B represents the coastal aquaculture area north of Maya Island, Pulandian District, Dalian City; C represents the coastal aquaculture area of Yingkou City, east of Liaodong Bay; and D represents the Calabash Island Changshan Temple Bay coastal aquaculture area).
Figure 2: True colour images of the Yingkou area in January, April, July, and October.
Figure 3: Spectral analysis charts of the Yingkou area in January, April, July, and October (the green line represents the spectral values of the aquaculture ponds across the 13 bands, the red line represents the spectral values of the saltwater fields, and the orange line represents the spectral values of the embankments).
Figure 4: Extraction results for saltwater fields using DIAS with different thresholds.
Figure 5: Adaptive attention U-Net (MAFU-Net).
Figure 6: Median-enhanced adaptive fusion module (MEA-FM).
Figure 7: Visual comparison results of different network models on the CHN-LN4-ISAPs-9 dataset. (a–h) depict various areas, with the first two columns showing the dataset images and the labeled images, while the subsequent eight columns display the extraction results from different models for each scene. (The black areas represent backgrounds, the white areas represent aquaculture areas, the red elliptical areas indicate misidentified water body regions, the yellow rectangular areas indicate misidentified saltwater fields and other land feature regions, the blue rectangular areas indicate misidentified fallow aquaculture ponds, the orange areas indicate omissions, and the green areas indicate edge adhesion.)
Figure 8: Locations of the verification areas (a represents the Zhangxia Bay coastal aquaculture area in Dalian, Liaoning Province; b represents the Pulandian Bay coastal aquaculture area in Dalian, Liaoning Province; c represents the Taiping Bay coastal aquaculture area in Dalian, Liaoning Province; d represents the southern coastal aquaculture area in Jinzhou, Liaoning Province).
14 pages, 16241 KiB  
Article
Seismic Random Noise Attenuation Using DARE U-Net
by Tara P. Banjade, Cong Zhou, Hui Chen, Hongxing Li, Juzhi Deng, Feng Zhou and Rajan Adhikari
Remote Sens. 2024, 16(21), 4051; https://doi.org/10.3390/rs16214051 - 30 Oct 2024
Viewed by 730
Abstract
Seismic data processing plays a pivotal role in extracting valuable subsurface information for various geophysical applications. However, seismic records often suffer from inherent random noise, which obscures meaningful geological features and reduces the reliability of interpretations. In recent years, deep learning methodologies have shown promising results in performing noise attenuation tasks on seismic data. In this research, we propose modifications to the standard U-Net structure by integrating dense and residual connections, which serve as the foundation of our approach named the dense and residual (DARE U-Net) network. Dense connections enhance the receptive field and ensure that information from different scales is considered during the denoising process. Our model implements local residual connections between layers within the encoder, which allows earlier layers to directly connect with deep layers. This promotes the flow of information, allowing the network to utilize both filtered and unfiltered input. The combined network mechanisms prevent spatial information loss during the contraction process, so that the decoder can locate features more accurately by retaining the high-resolution features, enabling precise localization in seismic image denoising. We evaluate this adapted architecture on synthetic and real data sets and calculate the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM). The effectiveness of this method is well noted.
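The dense and residual connections described in the abstract follow the familiar residual-dense-block pattern; the sketch below shows that pattern in isolation (growth rate, layer count, and activation are illustrative, not the DARE U-Net configuration). Each convolution sees the concatenation of all earlier outputs, a 1x1 convolution fuses them back to the input width, and a local residual passes the unfiltered input through.

import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Sketch of the dense-plus-residual idea: dense concatenation of
    intermediate features, a 1x1 fusion conv, and a local residual that
    preserves the unfiltered input."""
    def __init__(self, channels, growth=32, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                nn.ReLU(inplace=True)))
        self.fuse = nn.Conv2d(channels + num_layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))   # dense connections
        return x + self.fuse(torch.cat(feats, dim=1))      # local residual

y = ResidualDenseBlock(64)(torch.randn(1, 64, 48, 48))     # -> (1, 64, 48, 48)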
Show Figures

Graphical abstract
Figure 1: DARE U-Net architecture.
Figure 2: Residual connection.
Figure 3: Local residual connection within each layer of an encoder.
Figure 4: Structure of the residual dense block.
Figure 5: A sample of the training data: (a) noise-free data; (b) noisy data.
Figure 6: Test on four sets of seismic data. First to fifth column: noise-free data, noisy data, denoised by wavelet, U-Net, and DARE U-Net.
Figure 7: (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 8: FK spectrum comparisons. (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 9: Real data test. (a) Noise-free data. (b) Noisy data. (c) Denoised by wavelet. (d) Denoised by U-Net. (e) Denoised by DARE U-Net.
Figure 10: Residual section of denoised real data: (a) wavelet; (b) U-Net; (c) DARE U-Net.