Search Results (414)

Search Parameters:
Keywords = Swin-Transformer

20 pages, 6303 KiB  
Article
Progressive Transmission Line Image Transmission and Recovery Algorithm Based on Hybrid Attention and Feature Fusion for Signal-Free Regions of Transmission Lines
by Xiu Ji, Xiao Yang, Zheyu Yue, Hongliu Yang and Haiyang Guo
Electronics 2024, 13(23), 4605; https://doi.org/10.3390/electronics13234605 - 22 Nov 2024
Viewed by 120
Abstract
In this paper, a progressive image transmission and recovery algorithm based on a hybrid attention mechanism and feature fusion is proposed, aiming to solve the challenge of monitoring signal-free regions of transmission lines. The method combines the wavelet transform, the Swin Transformer, and a hybrid attention module with a Pixel Shuffle upsampling mechanism to balance image transmission quality and efficiency in low-bandwidth environments. An initial preview is obtained by prioritizing the transmission of low-frequency subbands from the wavelet transform; the weight allocation of key features is then dynamically optimized using hybrid attention and a local-window multi-scale self-attention mechanism, and the resolution of the decoded image is further enhanced through Pixel Shuffle upsampling. Experimental results show that the algorithm significantly outperforms existing methods in image quality (PSNR, SSIM), transmission efficiency, and bandwidth utilization, demonstrating its adaptability and effectiveness for surveillance in signal-free regions.
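The abstract pairs Swin Transformer features with Pixel Shuffle upsampling to raise the resolution of the decoded image. As a rough illustration of the sub-pixel upsampling step only (not the authors' exact module; the channel counts and the 3×3 convolution are assumptions), a PyTorch sketch might look like this:

```python
import torch
import torch.nn as nn

class PixelShuffleUpsample(nn.Module):
    """Minimal sub-pixel upsampling block: expand channels with a conv,
    then fold them into spatial resolution with nn.PixelShuffle.
    Channel counts are illustrative, not values from the paper."""
    def __init__(self, in_channels: int, scale: int = 2):
        super().__init__()
        # Conv produces scale**2 times the channels that PixelShuffle folds into space.
        self.conv = nn.Conv2d(in_channels, in_channels * scale ** 2, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (B, C*r^2, H, W) -> (B, C, H*r, W*r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.conv(x))

# Example: upsample a 64-channel feature map from 32x32 to 64x64.
feat = torch.randn(1, 64, 32, 32)
up = PixelShuffleUpsample(64, scale=2)
print(up(feat).shape)  # torch.Size([1, 64, 64, 64])
```

Pixel Shuffle trades channels for resolution, which is one common reason it is preferred over transposed convolutions in image-recovery decoders.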
Figure 1. Hybrid attention module.
Figure 2. Swin Transformer module: (a) Swin Transformer structure; (b) Feed Forward structure.
Figure 3. Pixel Shuffle upsampling module.
Figure 4. Network structure.
Figure 5. Images taken by the group in a signal-free area in Jilin.
Figure 6. Comparison of feature extraction between the traditional algorithm and this paper's algorithm.
Figure 7. Visualization of image recovery by different methods.
Figure 8. The dataset of images taken by the group in the unsignalized area of Jilin Province was recovered by different methods and then input into YOLOv10 for detection.
Figure 9. Visualization of image recovery by different methods: (a–d) Progressive Disentangling; (e–h) MPRNet; (i–l) the method of this paper.
12 pages, 6649 KiB  
Article
Masked Image Modeling Meets Self-Distillation: A Transformer-Based Prostate Gland Segmentation Framework for Pathology Slides
by Haoyue Zhang, Sushant Patkar, Rosina Lis, Maria J. Merino, Peter A. Pinto, Peter L. Choyke, Baris Turkbey and Stephanie Harmon
Cancers 2024, 16(23), 3897; https://doi.org/10.3390/cancers16233897 - 21 Nov 2024
Viewed by 220
Abstract
Detailed evaluation of prostate cancer glands is an essential yet labor-intensive step in grading prostate cancer. Gland segmentation can serve as a valuable preliminary step for machine-learning-based downstream tasks, such as Gleason grading, patient classification, cancer biomarker building, and survival analysis. Despite its importance, there is currently a lack of a reliable gland segmentation model for prostate cancer. Without accurate gland segmentation, researchers rely on cell-level or human-annotated regions of interest for pathomic and deep feature extraction. This approach is sub-optimal, as the extracted features are not explicitly tailored to gland information. Although foundational segmentation models have gained considerable interest, we demonstrate the limitations of that approach. This work proposes a prostate gland segmentation framework that utilizes a dual-path Swin Transformer UNet structure and leverages Masked Image Modeling for large-scale self-supervised pretraining. A tumor-guided self-distillation step further fuses the binary tumor labels of each patch into the encoder to ensure the encoders are suitable for the gland segmentation step. We united heterogeneous data sources for self-supervised training, including biopsy and surgical specimens, to reflect the diversity of benign and cancerous pathology features. We evaluated the segmentation performance on two publicly available prostate cancer datasets, achieving state-of-the-art performance with a test mDice of 0.947 on the PANDA dataset and a test mDice of 0.664 on the SICAPv2 dataset.
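The framework relies on Masked Image Modeling for self-supervised pretraining. A minimal sketch of the random patch-masking step such a pipeline typically starts from is shown below; the 16-pixel patch size and 0.6 mask ratio are assumptions, and the paper's actual masking and reconstruction targets may differ:

```python
import torch

def random_patch_mask(images: torch.Tensor, patch: int = 16, mask_ratio: float = 0.6):
    """Randomly mask a fraction of non-overlapping patches, as in masked image modeling.
    images: (B, C, H, W) with H and W divisible by `patch`.
    Patch size and mask ratio are assumed defaults, not the paper's settings."""
    b, c, h, w = images.shape
    gh, gw = h // patch, w // patch
    n_patches = gh * gw
    n_mask = int(mask_ratio * n_patches)

    # Random rank per patch (double argsort turns noise into a permutation rank),
    # then mark the n_mask lowest-ranked patches as masked.
    ranks = torch.rand(b, n_patches, device=images.device).argsort(dim=1).argsort(dim=1)
    mask = ranks < n_mask

    # Zero out masked patches (a learnable mask token could be used instead).
    mask_2d = mask.view(b, 1, gh, gw).repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    masked = images.masked_fill(mask_2d, 0.0)
    return masked, mask  # mask marks which patches the decoder must reconstruct

masked, mask = random_patch_mask(torch.randn(2, 3, 224, 224))
print(masked.shape, mask.float().mean().item())  # roughly 0.6 of patches masked
```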
(This article belongs to the Section Methods and Technologies Development)
Figure 1. Sample slides from the three data cohorts. The top slide is from SICAPv2. Note that the SICAPv2 dataset is provided in a patch form, so the sample shown in this figure was stitched back based on the given coordinates. The bottom-left slide is from the PANDA cohort. The bottom-right slide is a whole-mount slide from our in-house dataset NCI.
Figure 2. Overview of the proposed model for prostate gland segmentation. Section (A) shows the architecture of our proposed dual-path segmentation architecture. Section (B) shows our preprocessing, self-supervised learning, and self-distillation schema for the self-supervised learning step.
Figure 3. Sample segmentation results for different Gleason grade glands across different methods. Compared with other methods, many small spots were removed by the tumor classification head in our network, which yielded a better visual representation without any post-processing smoothing methods.
21 pages, 12271 KiB  
Article
Detection of Marine Oil Spill from PlanetScope Images Using CNN and Transformer Models
by Jonggu Kang, Chansu Yang, Jonghyuk Yi and Yangwon Lee
J. Mar. Sci. Eng. 2024, 12(11), 2095; https://doi.org/10.3390/jmse12112095 - 19 Nov 2024
Viewed by 357
Abstract
The contamination of marine ecosystems by oil spills poses a significant threat to the marine environment, necessitating the prompt and effective implementation of measures to mitigate the associated damage. Satellites offer a spatial and temporal advantage over aircraft and unmanned aerial vehicles (UAVs) in oil spill detection due to their wide-area monitoring capabilities. While oil spill detection has traditionally relied on synthetic aperture radar (SAR) images, the combined use of optical satellite sensors alongside SAR can significantly enhance monitoring capabilities, providing improved spatial and temporal coverage. The advent of deep learning methodologies, particularly convolutional neural networks (CNNs) and Transformer models, has generated considerable interest in their potential for oil spill detection. In this study, we conducted a comprehensive and objective comparison to evaluate the suitability of CNN and Transformer models for marine oil spill detection. High-resolution optical satellite images were used to optimize DeepLabV3+, a widely utilized CNN model; Swin-UPerNet, a representative Transformer model; and Mask2Former, which employs a Transformer-based architecture for both encoding and decoding. Cross-validation yielded a mean Intersection over Union (mIoU) of 0.740, 0.840, and 0.804 for the three models, respectively, indicating their potential for detecting oil spills in the ocean. Additionally, we performed a histogram analysis on the predicted oil spill pixels, which allowed us to classify the types of oil. These findings highlight the considerable promise of Swin Transformer models for oil spill detection in the context of future marine disaster monitoring.
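The reported mIoU values are per-class IoU scores averaged over the classes present. A small sketch of how mIoU can be computed from predicted and reference class maps (a generic implementation, not the authors' evaluation code) is given below:

```python
import numpy as np

def mean_iou(pred: np.ndarray, label: np.ndarray, num_classes: int) -> float:
    """Compute mean Intersection over Union from arrays of class indices.
    Generic metric sketch, not the paper's evaluation code."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:                      # ignore classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

# Toy example with two classes (0 = sea, 1 = oil spill).
label = np.array([[0, 0, 1, 1], [0, 1, 1, 1]])
pred  = np.array([[0, 0, 1, 0], [0, 1, 1, 1]])
print(mean_iou(pred, label, num_classes=2))  # 0.775
```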
(This article belongs to the Special Issue Remote Sensing Applications in Marine Environmental Monitoring)
Figure 1. Examples of image processing steps: (a) original satellite images, (b) images after gamma correction and histogram adjustment, and (c) labeled images.
Figure 2. Flowchart of this study, illustrating the processes of labeling, modeling, optimization, and evaluation using the DeepLabV3+, Swin-UPerNet, and Mask2Former models [23,24,25].
Figure 3. Concept of the 5-fold cross-validation in this study.
Figure 4. Examples of image data augmentation using the Albumentations library, including random 90-degree rotation, horizontal flip, vertical flip, optical distortion, grid distortion, RGB shift, and random brightness/contrast adjustment.
Figures 5–9. Randomly selected examples from folds 1–5, including PlanetScope RGB images, segmentation labels, and predictions from DeepLabV3+ (DL), Swin-UPerNet (Swin), and Mask2Former (M2F).
Figures 10–12. Histogram distribution graphs and box plots of oil spill pixels extracted from the labels, DeepLabV3+, Swin-UPerNet, and Mask2Former, for thick oil layers with a dark black tone (Figure 10), thin oil layers with a bright silver tone (Figure 11), and thin oil layers with a bright rainbow tone (Figure 12). The x-axis values represent the digital numbers (DNs) from PlanetScope images. (a) Oil mask, (b) histogram, and (c) box plot.
15 pages, 10336 KiB  
Technical Note
Multi-Scenario Remote Sensing Image Forgery Detection Based on Transformer and Model Fusion
by Jinmiao Zhao, Zelin Shi, Chuang Yu and Yunpeng Liu
Remote Sens. 2024, 16(22), 4311; https://doi.org/10.3390/rs16224311 - 19 Nov 2024
Viewed by 258
Abstract
Recently, remote sensing image forgery detection has received widespread attention. To improve detection accuracy, we build a novel scheme based on Transformers and model fusion. Specifically, we model this task as a binary classification task that focuses on global information. First, we explore the performance of various strong feature extraction networks on this task under a unified classification framework. On this basis, we select three high-performance Transformer-based networks that focus on global information, namely Swin Transformer V1, Swin Transformer V2, and Twins, as the backbone networks and fuse them. Second, considering the small number of samples, we use the public ImageNet-1K dataset to pre-train the networks to learn more stable feature representations. At the same time, a circular data-divide strategy is proposed, which fully utilizes all samples to improve accuracy in the competition. Finally, to promote network optimization, we explore multiple loss functions and select label smoothing loss, which reduces the model's excessive dependence on the training data, and we construct a combined learning-rate optimization strategy that first uses step decay and then cosine annealing, which reduces the risk of the network falling into local optima. Extensive experiments show that the proposed scheme performs well. This scheme won seventh place in the "Forgery Detection in Multi-scenario Remote Sensing Images of Typical Objects" track of the 2024 ISPRS TC I contest on Intelligent Interpretation for Multi-modal Remote Sensing Application.
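Two of the training choices described here, the label smoothing loss and the step-decay-then-cosine-annealing learning-rate schedule, map directly onto standard PyTorch components. The sketch below is a hedged illustration; the epoch counts, decay factors, and optimizer are assumptions rather than the paper's settings:

```python
import torch
import torch.nn as nn

model = nn.Linear(768, 2)                                   # placeholder classifier head
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)        # label smoothing loss
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Step decay for the first 30 epochs, then cosine annealing for the remaining 70.
# All schedule hyperparameters here are assumptions, not the competition settings.
step = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=70)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[step, cosine], milestones=[30])

for epoch in range(100):
    # ... forward pass, criterion(logits, targets), backward, optimizer.step() ...
    scheduler.step()   # one scheduler step per epoch
```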
(This article belongs to the Special Issue Geospatial Artificial Intelligence (GeoAI) in Remote Sensing)
Figure 1. Overall structure of the proposed scheme.
Figure 2. High-performance forgery detection network architecture.
Figure 3. Circular data divide strategy.
Figure 4. Some samples from the dataset. Samples on the left side of the dotted line are labeled 0, representing real images; samples on the right side are labeled 1, representing fake images.
Figure 5. Detailed classification comparison between the model prediction results and true labels.
Figure 6. Some detection results. Top left: the label is true, but the prediction is false. Top right: the label is true, and the prediction is true. Bottom left: the label is false, but the prediction is true. Bottom right: the label is false, and the prediction is false.
21 pages, 10435 KiB  
Article
SG-LPR: Semantic-Guided LiDAR-Based Place Recognition
by Weizhong Jiang, Hanzhang Xue, Shubin Si, Chen Min, Liang Xiao, Yiming Nie and Bin Dai
Electronics 2024, 13(22), 4532; https://doi.org/10.3390/electronics13224532 - 18 Nov 2024
Viewed by 272
Abstract
Place recognition plays a crucial role in tasks such as loop closure detection and re-localization in robotic navigation. As a high-level representation within scenes, semantics enables models to effectively distinguish geometrically similar places, thereby enhancing robustness to environmental changes. Unlike most existing semantic-based LiDAR place recognition (LPR) methods, which adopt a multi-stage and relatively segregated data-processing and storage pipeline, we propose a novel end-to-end LPR model guided by semantic information, SG-LPR. This model introduces a semantic segmentation auxiliary task to guide the model in autonomously capturing high-level semantic information from the scene, implicitly integrating these features into the main LPR task and thus providing a unified "segmentation-while-describing" framework that avoids additional intermediate data-processing and storage steps. Moreover, the semantic segmentation auxiliary task operates only during model training, so it adds no time overhead during the testing phase. The model also combines the advantages of the Swin Transformer and U-Net to address the shortcomings of current semantic-based LPR methods in capturing global contextual information and extracting fine-grained features. Extensive experiments conducted on multiple sequences from the KITTI and NCLT datasets validate the effectiveness, robustness, and generalization ability of our proposed method, which achieves notable performance improvements over state-of-the-art methods.
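The "segmentation-while-describing" idea amounts to adding an auxiliary segmentation loss that is only computed during training. A schematic training step under assumed components (a triplet loss for the place descriptor, cross-entropy for segmentation, and a weighting factor lambda_seg) might look like this; the actual SG-LPR losses and batch structure may differ:

```python
import torch.nn.functional as F

def training_step(backbone, lpr_head, seg_head, batch, lambda_seg=0.5):
    """Joint 'segmentation-while-describing' step: the segmentation head is
    only used here, at training time; inference calls backbone + lpr_head alone.
    The batch layout, loss choices, and lambda_seg are assumptions."""
    bev, pos_desc, neg_desc, seg_labels = batch       # hypothetical batch layout
    feats = backbone(bev)                             # shared feature extractor

    desc = lpr_head(feats)                            # global place descriptor
    lpr_loss = F.triplet_margin_loss(desc, pos_desc, neg_desc)

    seg_logits = seg_head(feats)                      # auxiliary branch (training only)
    seg_loss = F.cross_entropy(seg_logits, seg_labels)

    return lpr_loss + lambda_seg * seg_loss
```

At test time only the backbone and LPR head are executed, which is why the auxiliary branch adds no inference cost.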
(This article belongs to the Collection Advance Technologies of Navigation for Intelligent Vehicles)
Figure 1. Comparison of system frameworks for semantic-based LPR methods. (a) The prevalent "segmentation-then-describing" framework employed by most existing semantic-based LPR methods, comprising multiple distinct stages. (b) Our proposed "segmentation-while-describing" framework, which implicitly provides high-level semantic features to the primary LPR task through an auxiliary semantic segmentation task (indicated by the yellow dashed box, active only during model training).
Figure 2. Overview of the proposed SG-LPR architecture. It consists of a shared feature extractor (blue area), followed by two parallel branches: one for the LPR task (yellow area) and another for the semantic segmentation task (gray area). These branches are jointly trained to implement the "segmentation-while-describing" framework. Notably, the semantic segmentation branch is active only during training and incurs no additional computational cost during testing.
Figure 3. Architecture of the Feature Extraction Module. We construct this module based on Swin-Unet [47], with the semantic segmentation task branch guiding it to extract feature tensors that are rich in high-level semantic information from raw BEV images.
Figure 4. Architecture of the LPR task branch.
Figure 5. The Precision–Recall curves on multiple sequences of the KITTI dataset.
Figure 6. Qualitative performance at top-1 retrieval of SG-LPR on multiple KITTI sequences along the trajectory. Red: true positives; black: false negatives; blue: true negatives.
Figure 7. Qualitative performance of the auxiliary semantic segmentation task in SG-LPR. (a) The original BEV image generated from the 3D LiDAR point cloud; (b) the ground-truth semantic map constructed from Semantic-KITTI [46]; (c) the predicted semantic map produced by SG-LPR, guided by the auxiliary semantic segmentation task during training.
Figure 8. Feature heatmap comparison of SG-LPR outputs with and without the semantic segmentation auxiliary task. SwinUnetVLAD is the SG-LPR variant without the auxiliary branch. The heatmaps illustrate differences in feature activation patterns, highlighting the influence of the auxiliary task on the model's ability to capture regions with high-level semantic features.
Figure 9. Ablation study on the number of semantic categories used for training the SG-LPR model.
22 pages, 7431 KiB  
Article
EDH-STNet: An Evaporation Duct Height Spatiotemporal Prediction Model Based on Swin-Unet Integrating Multiple Environmental Information Sources
by Hanjie Ji, Lixin Guo, Jinpeng Zhang, Yiwen Wei, Xiangming Guo and Yusheng Zhang
Remote Sens. 2024, 16(22), 4227; https://doi.org/10.3390/rs16224227 - 13 Nov 2024
Viewed by 512
Abstract
Given the significant spatial non-uniformity of marine evaporation ducts, accurately predicting the regional distribution of evaporation duct height (EDH) is crucial for ensuring the stable operation of radio systems. While machine-learning-based EDH prediction models have been extensively developed, they fail to provide the EDH distribution over large-scale regions in practical applications. To address this limitation, we have developed a novel spatiotemporal prediction model for EDH that integrates multiple environmental information sources, termed the EDH Spatiotemporal Network (EDH-STNet). This model is based on the Swin-Unet architecture, employing an Encoder–Decoder framework that utilizes consecutive Swin-Transformers. This design effectively captures complex spatial correlations and temporal characteristics. The EDH-STNet model also incorporates nonlinear relationships between various hydrometeorological parameters (HMPs) and EDH. In contrast to existing models, it introduces multiple HMPs to enhance these relationships. By adopting a data-driven approach that integrates these HMPs as prior information, the accuracy and reliability of spatiotemporal predictions are significantly improved. Comprehensive testing and evaluation demonstrate that the EDH-STNet model, which merges an advanced deep learning algorithm with multiple HMPs, yields accurate predictions of EDH for both immediate and future timeframes. This development offers a novel solution to ensure the stable operation of radio systems.
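Feeding multiple hydrometeorological parameters (HMPs) into the network as prior information can be pictured as stacking the gridded fields into input channels. The snippet below is only a sketch of that data-preparation step, with random placeholder grids and an assumed per-channel normalization:

```python
import numpy as np

# Hypothetical gridded hydrometeorological parameters (HMPs), each of shape (H, W):
# air temperature (AT), air pressure (AP), sea surface temperature (SST),
# wind speed (WS), and relative humidity (RH). Values here are random placeholders.
H, W = 128, 256
at, ap, sst, ws, rh = (np.random.rand(H, W).astype(np.float32) for _ in range(5))

# Stack the HMPs as input channels and normalize each channel independently,
# so the network receives a (C, H, W) tensor per time step.
x = np.stack([at, ap, sst, ws, rh], axis=0)
x = (x - x.mean(axis=(1, 2), keepdims=True)) / (x.std(axis=(1, 2), keepdims=True) + 1e-6)
print(x.shape)  # (5, 128, 256)
```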
(This article belongs to the Section Atmospheric Remote Sensing)
Figure 1. M-profile obtained by the NPS model.
Figure 2. Spatial distributions of (a) AT, (b) AP, (c) SST, (d) WS, and (e) RH in June 2023.
Figure 3. Spatial distributions of calculated EDH in June 2023.
Figure 4. Schematic of the EDH-STNet model.
Figure 5. Partial EDH distribution in (a) Test2022 and prediction results for the models: (b) Unet, (c) Swin-Transformer, (d) Swin-Unet, (e) SwinUnet-5, and (f) EDH-STNet.
Figure 6. Partial EDH distribution in (a) Test2023 and prediction results for the models: (b) Unet, (c) Swin-Transformer, (d) Swin-Unet, (e) SwinUnet-5, and (f) EDH-STNet.
Figure 7. Partial absolute prediction errors of the (a) Unet, (b) Swin-Transformer, (c) Swin-Unet, (d) SwinUnet-5, and (e) EDH-STNet models on Test2022.
Figure 8. Partial absolute prediction errors of the (a) Unet, (b) Swin-Transformer, (c) Swin-Unet, (d) SwinUnet-5, and (e) EDH-STNet models on Test2023.
Figure 9. Predictions of all models for measured EDH.
18 pages, 4997 KiB  
Article
Robotic Grasping Detection Algorithm Based on 3D Vision Dual-Stream Encoding Strategy
by Minglin Lei, Pandong Wang, Hua Lei, Jieyun Ma, Wei Wu and Yongtao Hao
Electronics 2024, 13(22), 4432; https://doi.org/10.3390/electronics13224432 - 12 Nov 2024
Viewed by 420
Abstract
The automatic generation of stable robotic grasping postures is crucial for the application of computer vision algorithms in real-world settings. This task becomes especially challenging in complex environments, where accurately identifying the geometric shapes and spatial relationships between objects is essential. To enhance the capture of object pose information in 3D visual scenes, we propose a planar robotic grasping detection algorithm named SU-Grasp, which simultaneously focuses on local regions and long-distance relationships. Built upon a U-shaped network, SU-Grasp introduces a novel dual-stream encoding strategy using the Swin Transformer combined with spatial semantic enhancement. Compared to existing baseline methods, our algorithm achieves superior performance across public datasets, simulation tests, and real-world scenarios, highlighting its robust understanding of complex spatial environments.
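A dual-stream encoding strategy with cross-modal fusion can be sketched as two parallel encoders whose features are concatenated and fused. The toy module below uses plain convolutional stems as stand-ins for the Swin Transformer stages, so it illustrates the wiring rather than SU-Grasp itself; the channel sizes and the depth/normal-angle channel split are assumptions:

```python
import torch
import torch.nn as nn

class DualStreamEncoder(nn.Module):
    """Sketch of a dual-stream encoder: one branch for the RGB image, one for the
    depth/normal-vector-angle image, fused along the channel dimension.
    Generic conv blocks stand in for the Swin Transformer stages used in the paper."""
    def __init__(self, feat_dim: int = 64):
        super().__init__()
        def stem(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU())
        self.rgb_branch = stem(3)       # color modality
        self.spatial_branch = stem(4)   # assumed: depth (1) + normal vector angle image (3)
        self.fuse = nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1)  # cross-modal fusion

    def forward(self, rgb, spatial):
        f = torch.cat([self.rgb_branch(rgb), self.spatial_branch(spatial)], dim=1)
        return self.fuse(f)

enc = DualStreamEncoder()
out = enc(torch.randn(1, 3, 224, 224), torch.randn(1, 4, 224, 224))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```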
(This article belongs to the Special Issue Advances in Computer Vision and Deep Learning and Its Applications)
Figure 1. Structural diagram of SU-Grasp. We encode the color modality (RGB image) and the spatial modality (depth image and normal vector angle image, abbreviated as DI image) separately, followed by cross-modal fusion. The decoder outputs the grasp confidence and grasp parameters for each pixel. Red boxes in the figure indicate the top three results with the highest confidence.
Figure 2. Example of each channel in the normal vector angle image.
Figure 3. Overview of the SU-Grasp framework. First, a normal vector angle image is derived from the depth image. A dual-stream encoder based on the Swin Transformer is then employed to separately extract multi-scale features for color and spatial information. Finally, a U-shaped structure decodes these features into three pixel-level prediction heatmaps for the grasping parameters, specifically representing grasp confidence, grasp width, and rotation angle.
Figure 4. (a) A simplified diagram of the original Swin-Unet. (b) The modified structure featuring a dual-stream cross-modal fusion encoder. This modification can be understood as halving the first part of the original encoder.
Figure 5. Training loss variation curves of three models (SU-Grasp, Swin-Unet, and GRCNN) on the Cornell and Jacquard datasets.
Figure 6. Visualization of SU-Grasp's prediction results and output heatmaps. The blue boxes in (a) represent the predicted poses with the highest success probability. The heatmaps in (b) represent the predicted values of success probability, grasp width, and rotation angle.
Figure 7. Changes in total loss values in the ablation experiment of the normal vector angle image.
Figure 8. Changes in total loss values in the ablation experiment of the normal vector angle image.
Figure 9. Changes in total loss values in the ablation experiment of the dual-stream encoder structure.
Figure 10. Simulation test results of SU-Grasp.
Figure 11. Illustration of grasping tests in real-world scenarios.
15 pages, 4396 KiB  
Article
Breast Cancer Classification Using Fine-Tuned SWIN Transformer Model on Mammographic Images
by Oluwatosin Tanimola, Olamilekan Shobayo, Olusogo Popoola and Obinna Okoyeigbo
Analytics 2024, 3(4), 461-475; https://doi.org/10.3390/analytics3040026 - 11 Nov 2024
Viewed by 546
Abstract
Breast cancer is the most prevalent cancer among women and has become one of the foremost causes of death among women globally. Early detection plays a significant role in administering personalized treatment and improving patient outcomes. Mammography is often used to detect early-stage cancer cells, but while valuable, it has limitations, including the potential for false positives and negatives, patient discomfort, and radiation exposure. More accurate techniques for detecting breast cancer are therefore needed, which has led to exploring the potential of machine learning for classifying diagnostic images due to its efficiency and accuracy. This study conducted a comparative analysis of pre-trained CNNs (ResNet50 and VGG16) and vision transformers (ViT-base and the SWIN transformer), together with a ViT-base model trained from scratch, to classify mammographic breast cancer images into benign and malignant cases. The SWIN transformer exhibits superior performance, with 99.9% accuracy and a precision of 99.8%. These findings demonstrate the efficiency of deep learning in accurately classifying mammographic breast cancer images for the diagnosis of breast cancer, leading to improvements in patient outcomes.
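Fine-tuning a pretrained SWIN transformer for the benign/malignant task boils down to swapping the classification head and training on the mammogram dataset. A minimal sketch using torchvision's Swin-T weights as a stand-in (the paper's exact backbone variant, input size, and layer-freezing policy are not specified here and are assumptions) could be:

```python
import torch
import torch.nn as nn
from torchvision import models

# torchvision's Swin-T is used as a stand-in for the paper's SWIN backbone.
model = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, 2)   # benign vs. malignant

# Optionally freeze the backbone and fine-tune only the new head at first (an assumption).
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("head")

optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3)

logits = model(torch.randn(4, 3, 224, 224))   # (batch, 2) class scores
print(logits.shape)
```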
Figure 1. SWIN transformer architecture with equation [29].
Figure 2. Experimental development framework.
Figure 3. Plot of class distribution of training dataset.
Figure 4. Plot of class distribution of training dataset.
Figure 5. Random samples of mammogram image dataset.
Figure 6. Performance plot of training loss and accuracy for ViT-base pre-trained model.
Figure 7. Performance plot of training loss and accuracy for SWIN transformer pre-trained model.
Figure 8. Performance plot of training loss and accuracy for ResNet50 pre-trained model.
Figure 9. Performance plot of training loss and accuracy for VGG16 pre-trained model.
Figure 10. Confusion matrix of VGG16.
Figure 11. Confusion matrix of ResNet50.
Figure 12. Confusion matrix of ViT-base.
Figure 13. Confusion matrix of SWIN transformer.
22 pages, 12107 KiB  
Article
Deep Learning-Based Classification of Macrofungi: Comparative Analysis of Advanced Models for Accurate Fungi Identification
by Sifa Ozsari, Eda Kumru, Fatih Ekinci, Ilgaz Akata, Mehmet Serdar Guzel, Koray Acici, Eray Ozcan and Tunc Asuroglu
Sensors 2024, 24(22), 7189; https://doi.org/10.3390/s24227189 - 9 Nov 2024
Viewed by 630
Abstract
This study focuses on the classification of six macrofungi species using advanced deep learning techniques. Fungi species such as Amanita pantherina, Boletus edulis, Cantharellus cibarius, Lactarius deliciosus, Pleurotus ostreatus, and Tricholoma terreum were chosen based on their ecological importance and distinct morphological characteristics. The research employed 5 machine learning techniques and 12 deep learning models, including DenseNet121, MobileNetV2, ConvNeXt, EfficientNet, and Swin Transformers, to evaluate their performance in identifying fungi from images. The DenseNet121 model demonstrated the highest accuracy (92%) and AUC score (95%), making it the most effective at distinguishing between species. The study also revealed that Transformer-based models, particularly the Swin Transformer, were less effective, suggesting room for improvement in their application to this task. Further advancements in macrofungi classification could be achieved by expanding datasets, incorporating additional data types such as biochemical, electron microscopy, and RNA/DNA sequence data, and using ensemble methods to enhance model performance. The findings contribute valuable insights into both the use of deep learning for biodiversity research and the ecological conservation of macrofungi species.
Figure 1. Overview of datasets utilized for training AI algorithms, presented from a macroscopic perspective.
Figure 2. Validation accuracy.
Figure 3. ROC curve.
Figure 4. Images without Grad-CAM visualization.
Figure 5. ConvNeXt Grad-CAM visualization.
Figure 6. EfficientNet Grad-CAM visualization.
Figure 7. DenseNet121, InceptionV3, and InceptionResNetV2 Grad-CAM visualization.
Figure 8. MobileNetV2, ResNet152, and Xception Grad-CAM visualization.
Figure 9. Different levels of Gaussian white noise [40].
Figure 10. DenseNet121 and MobileNetV2 Grad-CAM visualization on SNR-10 noisy images.
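Figures 9 and 10 refer to robustness tests with Gaussian white noise at different SNR levels. A small sketch of adding zero-mean Gaussian noise at a target SNR, assuming the signal power is estimated from the image itself, is shown below:

```python
import numpy as np

def add_gaussian_noise_snr(image: np.ndarray, snr_db: float) -> np.ndarray:
    """Add zero-mean Gaussian white noise so the result has the requested SNR in dB.
    `image` is a float array; signal power is estimated from the image itself
    (an assumption about how the noise level was defined)."""
    signal_power = np.mean(image.astype(np.float64) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = np.random.normal(0.0, np.sqrt(noise_power), size=image.shape)
    return image + noise

img = np.random.rand(224, 224, 3)                 # placeholder image in [0, 1]
noisy = add_gaussian_noise_snr(img, snr_db=10)    # e.g., the SNR-10 setting in Figure 10
```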
12 pages, 1905 KiB  
Article
An Algorithmic Study of Transformer-Based Road Scene Segmentation in Autonomous Driving
by Hao Cui and Juyang Lei
World Electr. Veh. J. 2024, 15(11), 516; https://doi.org/10.3390/wevj15110516 - 8 Nov 2024
Viewed by 481
Abstract
Applications such as autonomous driving require high-precision semantic image segmentation to identify and understand the content of each pixel in an image. Compared with traditional deep convolutional neural networks, the Transformer model is based on pure attention mechanisms, without convolutional or recurrent neural network layers. In this paper, we propose a new network structure called SwinLab, which is an improvement upon the Swin Transformer. Experimental results demonstrate that the improved SwinLab model achieves segmentation accuracy comparable to that of deep convolutional neural network models in applications such as autonomous driving, with an mIoU of 77.61. Comparative experiments on the Cityscapes dataset further validate the effectiveness and generalization of this structure. In conclusion, by refining the Swin Transformer, this paper simplifies the model structure, improves training and inference speed, and maintains high accuracy, providing a more reliable semantic image segmentation solution for applications such as autonomous driving.
Figure 1. Label map in P mode.
Figure 2. Overall model structure of the network.
Figure 3. Diagram of the encoder structure.
Figure 4. Diagram of the MLP structure.
Figure 5. Diagram of the decoder structure.
Figure 6. Pascal VOC2012 dataset.
Figure 7. Cityscapes dataset.
24 pages, 10567 KiB  
Article
Dual-Modal Fusion PRI-SWT Model for Eddy Current Detection of Cracks, Delamination, and Impact Damage in Carbon Fiber-Reinforced Plastic Materials
by Rongyan Wen, Chongcong Tao, Hongli Ji and Jinhao Qiu
Appl. Sci. 2024, 14(22), 10282; https://doi.org/10.3390/app142210282 - 8 Nov 2024
Viewed by 564
Abstract
Carbon fiber-reinforced plastic (CFRP) composites are prone to damage during both manufacturing and operation, making the classification and identification of defects critical for maintaining structural integrity. This paper presents a novel dual-modal feature classification approach for the eddy current detection of CFRP defects, utilizing a Parallel Real–Imaginary/Swin Transformer (PRI-SWT) model. Built on the Transformer architecture, the PRI-SWT model effectively integrates the real and imaginary components of sinusoidal voltage signals, demonstrating a significant performance improvement over classification methods such as the Support Vector Machine (SVM) and Vision Transformer (ViT). The proposed model achieved a classification accuracy exceeding 95%, highlighting its superior capability in addressing the complexities of defect detection. Furthermore, the influence of key factors, including the real–imaginary fusion layer, the number of layers, the window shift size, and the model's scale, on the classification performance of the PRI-SWT model was systematically evaluated.
Figure 1. Eddy current testing (ECT) results for the detection of cracks, delamination, and low-velocity impact damage in carbon fiber-reinforced polymer (CFRP) materials, obtained using a nine-array ECT probe. The sinusoidal voltage signal generated by the output coil of the probe is processed through a lock-in amplifier, yielding two output signals; subsequent data processing of these signals produces amplitude and phase images of the scanning process. (a) Crack defect; (b) impact defect; (c) delamination defect.
Figure 2. Basic framework of multi-modal learning, where data from various modalities, such as images, text, and speech, are encoded into tokens and input into the multi-modal learning model. The output tokens are then converted back into their respective natural data types.
Figure 3. One specific type of cross-modal alignment is explicit expanded modal alignment. The figure shows a schematic diagram illustrating the correspondence between the image modality and the text data modality.
Figure 4. Different types of multi-modal fusion. (a) Data-level fusion; (b) feature-level fusion; (c) decision-level fusion.
Figure 5. SVM processing of linearly non-separable datasets, with the high-dimensional hyperplane projected as a curve in the original space.
Figure 6. The basic framework of ViT, obtaining the final classification result through steps such as linear projection and multi-head attention computation. The figure also illustrates the CFRP eddy current nondestructive testing (NDT) system, which primarily consists of key components such as a function signal generator, an eddy current probe, a lock-in amplifier, a low-voltage DC power supply, a motion controller, a data acquisition card, and an industrial control computer. Additionally, the figure shows several CFRP test specimens with defects, including CFRP samples with delamination and crack defects. The upper-left figure shows the Chinese software interface of the CFRP eddy current testing system.
Figure 7. The mathematical computation process used for self-attention. Here, X represents the input image tensor, derived by segmenting the input image into patches and concatenating these patches along the feature channel dimension; the width and height dimensions of the three-dimensional feature tensor are flattened into one dimension, yielding a feature matrix X of size N × d_in. After a positional encoding matrix of the same dimensions is added, the query (Q), key (K), and value (V) matrices are computed. The query and key matrices give the self-attention score matrix, which is multiplied by the value matrix to produce the output feature map. The dimensions of Q, K, and V are typically N × d_s (and can be set to N × d_in depending on the application); the linear mapping matrices W^Q, W^K, and W^V are all of size d_in × d_s; the attention score matrix has dimensions N × N, so multiplying it with the value matrix (N × d_s) yields an output feature map of size N × d_s. If d_s = d_in, the output feature map's dimensions match those of the input feature map.
Figure 8. The left side presents a schematic of the PRI-SWT model, detailing its structures and operational processes, including the separation layer, the patch embedding layer, the patch merging layer, the Swin Transformer module, and the fusion layer, together with the dimensions of the feature map tensors at the outputs of the different modules. As the layer depth increases, the resolution of the feature maps gradually decreases while the number of feature channels steadily increases. The right side illustrates the internal details of the Swin Transformer module, which comprises the window multi-head self-attention module, the shifted-window multi-head self-attention module, the normalization layer, and the linear mapping MLP layer.
Figure 9. The working mechanism of the Sliding Window Transformer. The image is segmented into numbered windows based on varying offset values and window sizes. These windows are then shifted and padded horizontally and vertically, forming structured regions for self-attention computation.
Figure 10. Detailed confusion matrices for several classification models, together with the ROC curves for the typical ViT and PRI-SWT models; the AUC values for the classification of the three CFRP defect types are indicated in the ROC curve images. Labels 0, 1, and 2 correspond to crack, delamination, and impact damage defects, respectively. The results demonstrate the relative superiority of the PRI-SWT model.
Figure 11. Eddy current testing results for three types of CFRP defect: (a) crack of small size; (b) crack of medium size; (c) low-velocity impact of large size. The three defect types vary in shape and scale.
Figure 12. Accuracy curves of the ViT and PRI-SWT models for the first 50 training epochs.
Figure 13. Model accuracy curves and confusion matrix when using different fusion layers. (a) Confusion matrix; (b) training curves. The performance of the linear fusion layer surpasses that of the CNN fusion layer; however, increasing the number of fusion layers tends to decrease the model's overall performance.
Figure 14. Model accuracy curves and confusion matrix obtained using different shift values. (a) Confusion matrix; (b) accuracy curves. A smaller window size results in slightly higher final classification performance compared to the other cases.
Figure 15. Model accuracy curves and confusion matrix using different window sizes. (a) Confusion matrix; (b) training curves.
Figure 16. Model confusion matrix using different depths of PRI-SWT. Increasing the network depth can enhance the performance of the PRI-SWT model; however, it also leads to higher computational costs, requiring a balance between the two.
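The Figure 7 caption describes the self-attention computation in prose. Written out, with the conventional scaling by the square root of the key dimension (an assumption, since the caption does not state the scaling factor), the computation is:

```latex
\[
Q = X W^{Q}, \qquad K = X W^{K}, \qquad V = X W^{V},
\qquad W^{Q}, W^{K}, W^{V} \in \mathbb{R}^{d_{\mathrm{in}} \times d_{s}},
\]
\[
\operatorname{Attention}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_{s}}}\right) V
  \in \mathbb{R}^{N \times d_{s}}.
\]
```

With d_s = d_in, the output has the same shape as the input feature matrix X, matching the caption's remark.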
17 pages, 5121 KiB  
Article
Study on the Evolutionary Characteristics of Post-Fire Forest Recovery Using Unmanned Aerial Vehicle Imagery and Deep Learning: A Case Study of Jinyun Mountain in Chongqing, China
by Deli Zhu and Peiji Yang
Sustainability 2024, 16(22), 9717; https://doi.org/10.3390/su16229717 - 7 Nov 2024
Viewed by 491
Abstract
Forest fires pose a significant threat to forest ecosystems, with severe impacts on both the environment and human society. Understanding post-fire forest recovery processes is crucial for developing strategies for species diversity conservation and ecological restoration and for preventing further damage. The present study proposes the EAswin-Mask2former model, a deep-learning semantic segmentation approach applied to visible-band imagery, to better monitor the evolution of burned areas in forests after fires. This model improves upon the classical semantic segmentation model Mask2former and is better adapted to the complex environment of burned forest areas. It employs the Swin Transformer as the backbone for feature extraction, which is particularly advantageous for processing high-resolution images. It also includes the Contextual Transformer (CoT) Block to better capture contextual information and incorporates the Efficient Multi-Scale Attention (EMA) Block into the Efficiently Adaptive (EA) Block to enhance the model's ability to learn key features and long-range dependencies. The experimental results demonstrate that the EAswin-Mask2former model achieves a mean Intersection-over-Union (mIoU) of 76.35% in segmenting complex forest burn areas across different seasons, improvements of 3.26 and 0.58 percentage points over the Mask2former models using ResNet and Swin Transformer backbones, respectively. Moreover, this method surpasses the DeepLabV3+ and Segformer models by 4.04 and 1.75 percentage points, respectively. The proposed model offers excellent segmentation performance for both forest and burned areas and can effectively track the evolution of burned forests when combined with unmanned aerial vehicle (UAV) remote sensing images.
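Tracking the evolution of a burn area from the segmentation output reduces to counting class pixels per image and converting them to areas. The sketch below assumes hypothetical class indices and a 0.5 m ground sampling distance; both are placeholders, not values from the study:

```python
import numpy as np

# Hypothetical class indices in the predicted segmentation mask (not from the paper).
FOREST, BURNED = 1, 2

def area_fractions(mask: np.ndarray, pixel_area_m2: float = 1.0):
    """Turn an (H, W) class mask into burned/forest areas and proportions,
    the quantities tracked over time in the recovery analysis."""
    total = mask.size
    burned_px = int((mask == BURNED).sum())
    forest_px = int((mask == FOREST).sum())
    return {
        "burned_area_m2": burned_px * pixel_area_m2,
        "burned_fraction": burned_px / total,
        "forest_fraction": forest_px / total,
    }

mask = np.random.randint(0, 3, size=(512, 512))   # placeholder prediction
print(area_fractions(mask, pixel_area_m2=0.25))   # assumed 0.5 m ground sampling distance
```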
(This article belongs to the Section Sustainable Forestry)
Figure 1. Location of the study area. Sources: https://www.google.com.hk/maps/, accessed on 13 October 2024; http://www.bigemap.com/, accessed on 13 October 2024.
Figure 2. Image data of the same forest area.
Figure 3. Overall architecture of Mask2former. The Pixel Decoder obtains the outputs of all stages in the feature extraction network and converts them into pixel-level prediction results, obtaining output features with multiple sizes. The largest output feature is used to calculate the mask, while the smaller output features are used as inputs to the Transformer Decoder.
Figure 4. (a) Swin Transformer network architecture; (b) Swin Transformer Block structure. The right figure shows two Swin Transformer Blocks connected in series; in the network architecture, this structure appears in pairs, with at least two blocks grouped together. W-MSA denotes window multi-head self-attention, and SW-MSA denotes shifted-window multi-head self-attention.
Figure 5. (a) Window Multi-Head Self-Attention (W-MSA) and (b) Shifted Window Multi-Head Self-Attention (SW-MSA).
Figure 6. The approximate calculation process of the adaptive EA Block. A "+" inside a circle indicates that the inputs to that node are added together; a "*" inside a circle indicates that the inputs are multiplied together.
Figure 7. Structure of EAswin-Mask2former.
Figure 8. Some segmentation results of EAswin-Mask2former and other models.
Figure 9. Comparison of mIoU between EAswin-Mask2former and other models. DLV3+, SEG, R-M, and S-M represent DeepLabV3+, Segformer, Resnet50-Mask2former, and Swin-Mask2former, respectively.
Figure 10. Satellite remote sensing images of the forest area from 2022 to 2024.
Figure 11. Unmanned aerial vehicle images of Region A at different times and their segmentation results. The corresponding shooting times from top to bottom are October 2022, March 2023, March 2023, and February 2024.
Figure 12. The trend over time of the burned and damaged area and the proportion of forest area in Region A.
22 pages, 5584 KiB  
Article
Enhanced Magnetic Resonance Imaging-Based Brain Tumor Classification with a Hybrid Swin Transformer and ResNet50V2 Model
by Abeer Fayez Al Bataineh, Khalid M. O. Nahar, Hayel Khafajeh, Ghassan Samara, Raed Alazaidah, Ahmad Nasayreh, Ayah Bashkami, Hasan Gharaibeh and Waed Dawaghreh
Appl. Sci. 2024, 14(22), 10154; https://doi.org/10.3390/app142210154 - 6 Nov 2024
Viewed by 535
Abstract
Brain tumors can be serious; consequently, rapid and accurate detection is crucial. Nevertheless, a variety of obstacles, such as poor imaging resolution, doubts over the accuracy of data, a lack of diverse tumor classes and stages, and the possibility of misinterpretation, present challenges to achieving an accurate and final diagnosis. Effective brain cancer detection is crucial for patients' safety and health, and deep learning systems can assist radiologists in quickly and accurately detecting diagnoses. This study presents an innovative deep learning approach that integrates the Swin Transformer with the pretrained deep learning model ResNet50V2, called SwT+Resnet50V2. The objective of this combination is to decrease memory utilization, enhance classification accuracy, and reduce training complexity. The self-attention mechanism of the Swin Transformer identifies distant relationships and captures the overall context, while ResNet50V2 improves both accuracy and training speed by extracting adaptive features from the Swin Transformer's dependencies. We evaluate the proposed framework using two publicly accessible brain magnetic resonance imaging (MRI) datasets, containing two and four classes, respectively. Data augmentation and transfer learning techniques enhance model performance, leading to more dependable and cost-effective training. The suggested model achieves an impressive accuracy of 99.9% on the binary-labeled dataset and 96.8% on the four-labeled dataset, outperforming the VGG16, MobileNetV2, ResNet50V2, EfficientNetV2B3, ConvNeXtTiny, and convolutional neural network (CNN) algorithms used for comparison. This demonstrates that the Swin Transformer, when combined with ResNet50V2, is capable of accurately diagnosing brain tumors, and the SwT+Resnet50V2 combination yields an innovative diagnostic tool with which radiologists can accelerate and improve the detection of brain tumors, leading to improved patient outcomes and reduced risks.
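One plausible reading of the SwT+Resnet50V2 hybrid is a late fusion of the two backbones' pooled features followed by a small classification head. The sketch below is an assumption about that wiring, with torchvision's resnet50 standing in for ResNet50V2 and arbitrary head sizes; it is not the authors' implementation:

```python
import torch
import torch.nn as nn
from torchvision import models

class SwinResNetHybrid(nn.Module):
    """Sketch of one way to combine Swin Transformer and ResNet features for MRI
    classification: run both backbones, concatenate their pooled embeddings, and
    classify with a small head. torchvision's resnet50 stands in for ResNet50V2."""
    def __init__(self, num_classes: int = 4):
        super().__init__()
        self.swin = models.swin_t(weights=models.Swin_T_Weights.IMAGENET1K_V1)
        swin_dim = self.swin.head.in_features
        self.swin.head = nn.Identity()            # expose 768-d pooled features

        self.resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
        res_dim = self.resnet.fc.in_features
        self.resnet.fc = nn.Identity()            # expose 2048-d pooled features

        # Head sizes and dropout are arbitrary choices for illustration.
        self.classifier = nn.Sequential(
            nn.Linear(swin_dim + res_dim, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes))

    def forward(self, x):
        feats = torch.cat([self.swin(x), self.resnet(x)], dim=1)
        return self.classifier(feats)

model = SwinResNetHybrid(num_classes=4)
print(model(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 4])
```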
(This article belongs to the Special Issue Advances in Bioinformatics and Biomedical Engineering)
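As a concrete illustration of how such a hybrid could be wired, the sketch below concatenates pooled global features from a pretrained Swin-T and a pretrained ResNet backbone and feeds them to a small classification head in PyTorch/torchvision. This is a minimal sketch, not the authors' SwT+Resnet50V2 implementation: torchvision does not ship ResNet50V2, so plain ResNet50 stands in for it, and the fusion head, dropout rate, and binary class count are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import swin_t, resnet50, Swin_T_Weights, ResNet50_Weights

class HybridSwinResNet(nn.Module):
    """Toy hybrid classifier: pooled Swin-T and ResNet features are concatenated
    and classified. The paper's actual SwT+Resnet50V2 fusion may differ."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Pretrained backbones with their original classification heads removed.
        self.swin = swin_t(weights=Swin_T_Weights.IMAGENET1K_V1)
        swin_dim = self.swin.head.in_features            # 768 for Swin-T
        self.swin.head = nn.Identity()

        self.resnet = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
        res_dim = self.resnet.fc.in_features              # 2048 for ResNet50
        self.resnet.fc = nn.Identity()

        # Small fusion head over the concatenated global feature vectors.
        self.classifier = nn.Sequential(
            nn.Linear(swin_dim + res_dim, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.3),
            nn.Linear(512, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([self.swin(x), self.resnet(x)], dim=1)
        return self.classifier(feats)

# Binary tumor / no-tumor setup, as for the two-class dataset described in the abstract.
model = HybridSwinResNet(num_classes=2)
logits = model(torch.randn(4, 3, 224, 224))               # -> shape (4, 2)
```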
Show Figures

Figure 1: Workflow diagram of the proposed brain tumor detection method.
Figure 2: Architecture of Swin Transformer.
Figure 3: Architecture of ResNet50V2.
Figure 4: Instances of the types of brain tumor in MRI images.
Figure 5: Performance evaluation on the Bra35H dataset.
Figure 6: Training and validation metrics (accuracy and loss) for (SwT+Resnet50V2) on the Bra35H dataset.
Figure 7: Comparison of confusion matrices for all models using the Bra35H dataset.
Figure 8: Performance evaluation on the Kaggle dataset.
Figure 9: Training and validation metrics (accuracy and loss) for (SwT+Resnet50V2) on the Kaggle dataset.
Figure 10: Comparison of confusion matrices for all models using the Kaggle dataset.
23 pages, 5919 KiB  
Article
Research on Soybean Seedling Stage Recognition Based on Swin Transformer
by Kai Ma, Jinkai Qiu, Ye Kang, Liqiang Qi, Wei Zhang, Song Wang and Xiuying Xu
Agronomy 2024, 14(11), 2614; https://doi.org/10.3390/agronomy14112614 - 6 Nov 2024
Viewed by 642
Abstract
Accurate identification of the second and third compound leaf stages of soybean seedlings is a prerequisite for ensuring that post-seedling chemical weeding is carried out at the optimal application time. However, accurate identification of the soybean seedling stage is affected by natural light and complex field backgrounds. A transfer learning-based Swin-T (Swin Transformer) network is proposed to recognize the different growth stages of soybean seedlings. A drone was used to collect images of soybeans at the true leaf stage, the first compound leaf stage, the second compound leaf stage, and the third compound leaf stage, and data augmentation methods such as image rotation and brightness enhancement were used to expand the dataset, simulate image collection at different shooting angles and under different weather conditions, and enhance the adaptability of the model. Because the field environment and shooting equipment directly affect the quality of the captured images, the Gaussian blur method was used to blur the test-set images to different degrees in order to test the anti-interference ability of the different models. The Swin-T model was optimized by introducing transfer learning combined with hyperparameter combination and optimizer selection experiments. The performance of the optimized Swin-T model was compared with the MobileNetV2, ResNet50, AlexNet, GoogleNet, and VGG16Net models. The results show that the optimized Swin-T model has an average accuracy of 98.38% on the test set, an improvement of 11.25%, 12.62%, 10.75%, 1.00%, and 0.63% over the MobileNetV2, ResNet50, AlexNet, GoogleNet, and VGG16Net models, respectively. The optimized Swin-T model also performs best in terms of recall and F1 score. In the motion blur degradation test, the maximum degradation accuracy, overall degradation index, and average degradation index of the optimized Swin-T model were 87.77%, 6.54%, and 2.18%, respectively; the maximum degradation accuracy was 7.02%, 7.48%, 10.15%, 3.56%, and 2.5% higher than that of the MobileNetV2, ResNet50, AlexNet, GoogleNet, and VGG16Net models, respectively. In the Gaussian blur degradation test, the maximum degradation accuracy, overall degradation index, and average degradation index of the optimized Swin-T model were 94.3%, 3.85%, and 1.285%, respectively; the maximum degradation accuracy was 12.13%, 15.98%, 16.7%, 2.2%, and 1.5% higher than that of the MobileNetV2, ResNet50, AlexNet, GoogleNet, and VGG16Net models, respectively. Taking the various degradation indicators into account, the optimized Swin-T model maintains high recognition accuracy and good anti-interference ability even when the input images are blurred by shooting interference. It can recognize the different growth stages of soybean seedlings in complex environments, providing a basis for post-seedling chemical weed control during the second and third compound leaf stages of soybeans. Full article
(This article belongs to the Section Precision and Digital Agriculture)
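A rough sketch of the two ingredients described above, transfer learning on a pretrained Swin-T and a blur-based robustness sweep, is given below in PyTorch. It is illustrative only: the authors' training pipeline, transforms, and degradation indices are not reproduced here, and the `make_test_loader` helper is a hypothetical stand-in for building a DataLoader over the test images.

```python
import torch
import torch.nn as nn
from torchvision import transforms
from torchvision.models import swin_t, Swin_T_Weights
from PIL import ImageFilter

# Transfer learning: reuse ImageNet weights, replace the head with a 4-class
# classifier (true leaf, first/second/third compound leaf stages).
model = swin_t(weights=Swin_T_Weights.IMAGENET1K_V1)
model.head = nn.Linear(model.head.in_features, 4)

def blur_transform(radius: int) -> transforms.Compose:
    """Preprocessing that first Gaussian-blurs each PIL image with the given radius."""
    return transforms.Compose([
        transforms.Lambda(lambda img: img.filter(ImageFilter.GaussianBlur(radius))),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

@torch.no_grad()
def accuracy(net: nn.Module, loader) -> float:
    net.eval()
    correct = total = 0
    for images, labels in loader:
        preds = net(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / max(total, 1)

# Hypothetical degradation sweep over blur radii r = 0 (clean) to r = 4,
# mirroring the radii mentioned in the abstract. `make_test_loader(t)` is an
# assumed helper that wraps the test images in a DataLoader using transform t.
# baseline = accuracy(model, make_test_loader(blur_transform(0)))
# for r in range(1, 5):
#     acc = accuracy(model, make_test_loader(blur_transform(r)))
#     print(f"radius={r}: accuracy={acc:.4f}, drop={baseline - acc:.4f}")
```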
Show Figures

Figure 1: Partially acquired visible spectral images of soybean seedlings. (a) Image of the true leaf period of soybeans. (b) Image of the first compound leaf stage of soybeans. (c) Image of the second compound leaf stage of soybeans. (d) Image of the third compound leaf stage of soybeans.
Figure 2: Image samples of soybean seedlings at different growth stages. (a) Sample of the true leaf stage of soybeans. (b) Sample of the first compound leaf stage of soybeans. (c) Sample of the second compound leaf stage of soybeans. (d) Sample of the third compound leaf stage of soybeans.
Figure 3: Example of data augmentation for the second compound leaf period of soybeans. (a) Original figure. (b) HSV data enhancement. (c) Rotated image. (d) Contrast enhancement. (e) Brightness adjustment.
Figure 4: Processing effects of different blur radii r: (a) original figure; (b) r = 1; (c) r = 2; (d) r = 3; and (e) r = 4.
Figure 5: Motion blur processing effects at different blur levels: (a) original figure; (b) f = 1; (c) f = 2; (d) f = 3; and (e) f = 4.
Figure 6: General framework diagram of the Swin-T network.
Figure 7: The implementation process of the patch merging layer in the Swin-T architecture.
Figure 8: Swin-T Block structure in the Swin-T architecture.
Figure 9: MSA and W-MSA window partitioning mechanisms. (a) MSA. (b) W-MSA.
Figure 10: Description of the shifted window process. (a) SW-MSA. (b) Cyclic shift. (c) Masked MSA. (d) Reverse cyclic shift.
Figure 11: Comparison of the accuracy and loss values of different optimizers. (a) Optimizer accuracy. (b) Optimizer loss.
Figure 12: Comparison of accuracy and loss values on the training set for different models. (a) Training set accuracy. (b) Training set loss.
Figure 13: Confusion matrices of the recognition results from different classification models. (a) MobileNetV2; (b) ResNet50; (c) AlexNet; (d) GoogleNet; (e) VGG16Net; and (f) optimized Swin-T.
Figure 14: The influence of different levels of motion blur on the classification accuracy of different models.
Figure 15: Classification accuracy of different models under different blur radii.
Figure 16: Comparison of the recognition results of the Swin-T model trained on different data. (a) Recognition results of the Swin-T model trained without data augmentation. (b) Recognition results of the Swin-T model trained with data augmentation.
Figure 17: Comparison of the heat maps of different models at different soybean seedling stages.
20 pages, 3531 KiB  
Article
Sea Surface Temperature Prediction Using ConvLSTM-Based Model with Deformable Attention
by Benyun Shi, Conghui Ge, Hongwang Lin, Yanpeng Xu, Qi Tan, Yue Peng and Hailun He
Remote Sens. 2024, 16(22), 4126; https://doi.org/10.3390/rs16224126 - 5 Nov 2024
Viewed by 614
Abstract
Sea surface temperature (SST) prediction has received increasing attention in recent years due to its paramount importance in various fields of oceanography. Existing studies have shown that neural networks are particularly effective in making accurate SST predictions by efficiently capturing spatiotemporal dependencies in SST data. Among various models, the ConvLSTM framework is notably prominent. This model skillfully combines convolutional neural networks (CNNs) with recurrent neural networks (RNNs), enabling it to simultaneously capture spatiotemporal dependencies within a single computational framework. To overcome the limitation that CNNs primarily capture local spatial information, in this paper we propose a novel model named DatLSTM that integrates a deformable attention transformer (DAT) module into the ConvLSTM framework, thereby enhancing its ability to process more complex spatial relationships effectively. Specifically, the DAT module adaptively focuses on salient features in space, while ConvLSTM further captures the temporal dependencies of spatial correlations in the SST data. In this way, DatLSTM can adaptively capture complex spatiotemporal dependencies between the preceding and current states within ConvLSTM. To evaluate the performance of the DatLSTM model, we conducted short-term SST forecasts in the Bohai Sea region with forecast lead times ranging from 1 to 10 days and compared its efficacy against several benchmark models, including ConvLSTM, PredRNN, TCTN, and SwinLSTM. Our experimental results show that the proposed model outperforms all of these models on multiple evaluation metrics for short-term SST prediction. The proposed model offers a new predictive learning method for improving the accuracy of spatiotemporal predictions in various domains, including meteorology, oceanography, and climate science. Full article
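Since DatLSTM builds on the ConvLSTM framework, a minimal PyTorch sketch of a generic ConvLSTM cell is shown below: the LSTM gates are computed with 2-D convolutions over the concatenated input and hidden state, so spatial structure is preserved across time steps, and predictions can be fed back autoregressively as in the warm-up/prediction procedure described for this model. This is the standard ConvLSTM formulation, not the paper's DatLSTM; the deformable attention transformer module that DatLSTM inserts is omitted, and the channel sizes, grid size, and the `to_frame` output projection are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Generic ConvLSTM cell: LSTM gates computed with 2-D convolutions."""

    def __init__(self, in_channels: int, hidden_channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # A single convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_channels + hidden_channels,
                               4 * hidden_channels,
                               kernel_size, padding=padding)

    def forward(self, x, state):
        h, c = state                                      # hidden and cell states
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c_next = f * c + i * torch.tanh(g)
        h_next = o * torch.tanh(c_next)
        return h_next, c_next

# Warm-up on observed SST frames, then autoregressive prediction.
cell = ConvLSTMCell(in_channels=1, hidden_channels=32)
to_frame = nn.Conv2d(32, 1, kernel_size=1)                # hidden state -> SST frame
obs = torch.randn(8, 5, 1, 64, 64)                        # (batch, time, C, H, W)
h = torch.zeros(8, 32, 64, 64)
c = torch.zeros(8, 32, 64, 64)
for t in range(obs.shape[1]):                             # warm-up stage: feed observations
    h, c = cell(obs[:, t], (h, c))
pred = to_frame(h)                                        # first predicted frame
preds = [pred]
for _ in range(2):                                        # prediction stage: feed outputs back
    h, c = cell(pred, (h, c))
    pred = to_frame(h)
    preds.append(pred)
forecast = torch.stack(preds, dim=1)                      # (8, 3, 1, 64, 64)
```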
Show Figures

Figure 1: The procedure of spatiotemporal SST forecasting using the ConvLSTM-based model with deformable attention (DatLSTM). The procedure consists of two stages: the warm-up stage and the prediction stage. During the warm-up stage, the sequence of SST observations is taken as input to the model, while during the prediction stage the output of the model at the previous time step is used as the input to the model at the current time step.
Figure 2: Detailed structure of the DatLSTM cell; DAT denotes the deformable attention transformer module and LP denotes linear projection.
Figure 3: Illustration of the deformable attention module. In the left part, a set of reference points (four colored points for illustration) is uniformly distributed across the study area, with their offsets learned from the queries through the offset network (shown in the right part). Subsequently, the deformed keys and values are projected from the sampled positions based on these deformed points. Additionally, a relative position bias is calculated using the deformed points to enhance the multihead attention mechanism, which then outputs the transformed features. Figure adapted from Xia et al. [34]. (A simplified code sketch of this mechanism follows the figure list below.)
Figure 4: SST forecast snapshots from 28 March 2013 based on different neural network models. The lead time ranges from 1 to 10 days. The color bar represents the temperature values.
Figure 5: Spatial distribution of R² averaged over all testing samples for different lead times with respect to the benchmark models, namely, ConvLSTM, PredRNN, TCTN, SwinLSTM, and the proposed DatLSTM.
Figure 6: Spatial distribution of MAE (units: °C) averaged over all testing samples for different lead times with respect to the benchmark models, namely, ConvLSTM, PredRNN, TCTN, SwinLSTM, and the proposed DatLSTM.
Figure 7: Spatial distribution of RMSE (units: °C) averaged over all testing samples for different lead times with respect to the benchmark models, namely, ConvLSTM, PredRNN, TCTN, SwinLSTM, and the proposed DatLSTM.
Figure 8: Effective lead time according to RMSE ≤ 1.0 °C and R² ≥ 0.986. The numbers shown in the figure indicate the number of lead-time days for which DatLSTM can accurately make predictions.
Figure 9: Two-dimensional scatter plot of the observed and predicted SSTs in all grid cells over all testing samples with lead times of 1 day, 5 days, and 10 days. The colors in the plot indicate the number of data points in each bin. The root mean square error (RMSE; units: °C) and coefficient of determination (R²) between the predicted and observed values are shown in the upper left corner.
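As referenced in the Figure 3 caption above, a heavily simplified, single-head sketch of deformable attention is given below in PyTorch: a small offset network deforms a uniform grid of reference points, keys and values are sampled at the deformed locations with `grid_sample`, and ordinary attention follows. It omits the relative position bias, the multi-head split, and the exact offset scaling of the DAT module described by Xia et al., so it should be read as an assumption-laden illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDeformableAttention(nn.Module):
    """Single-head sketch of deformable attention: an offset network deforms a
    uniform grid of reference points, keys/values are sampled at the deformed
    locations, and standard attention follows. Relative position bias and the
    multi-head split of the original DAT module are omitted."""

    def __init__(self, dim: int, n_ref: int = 8):
        super().__init__()
        self.to_q = nn.Conv2d(dim, dim, kernel_size=1)
        self.to_kv = nn.Conv2d(dim, 2 * dim, kernel_size=1)
        self.offset_net = nn.Sequential(                  # offsets predicted from queries
            nn.Conv2d(dim, dim, kernel_size=3, padding=1), nn.GELU(),
            nn.Conv2d(dim, 2, kernel_size=1), nn.Tanh(),  # offsets bounded in [-1, 1]
        )
        self.n_ref = n_ref
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, C, H, W = x.shape
        q = self.to_q(x)                                              # (B, C, H, W)
        # Uniform grid of n_ref x n_ref reference points in [-1, 1] coordinates.
        ys = torch.linspace(-1, 1, self.n_ref, device=x.device)
        xs = torch.linspace(-1, 1, self.n_ref, device=x.device)
        ref = torch.stack(torch.meshgrid(ys, xs, indexing="ij"), dim=-1).flip(-1)
        # Offsets learned from the queries, pooled to the reference resolution.
        off = F.adaptive_avg_pool2d(self.offset_net(q), self.n_ref)   # (B, 2, n, n)
        pos = ref.unsqueeze(0) + off.permute(0, 2, 3, 1) / self.n_ref
        # Deformed keys and values sampled at the shifted reference points.
        k, v = F.grid_sample(self.to_kv(x), pos, align_corners=True).chunk(2, dim=1)
        q = q.flatten(2).transpose(1, 2)                              # (B, HW, C)
        attn = torch.softmax(q @ k.flatten(2) * self.scale, dim=-1)   # (B, HW, n*n)
        out = attn @ v.flatten(2).transpose(1, 2)                     # (B, HW, C)
        return out.transpose(1, 2).reshape(B, C, H, W)

# One pass over a toy feature map.
deform_attn = SimpleDeformableAttention(dim=32)
y = deform_attn(torch.randn(2, 32, 16, 16))                           # -> (2, 32, 16, 16)
```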