Search Results (403)

Search Parameters:
Keywords = fine-grained classification

20 pages, 42222 KiB  
Article
WGAN-GP for Synthetic Retinal Image Generation: Enhancing Sensor-Based Medical Imaging for Classification Models
by Héctor Anaya-Sánchez, Leopoldo Altamirano-Robles, Raquel Díaz-Hernández and Saúl Zapotecas-Martínez
Sensors 2025, 25(1), 167; https://doi.org/10.3390/s25010167 - 31 Dec 2024
Viewed by 317
Abstract
Accurate synthetic image generation is crucial for addressing data scarcity challenges in medical image classification tasks, particularly in sensor-derived medical imaging. In this work, we propose a novel method using a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) and nearest-neighbor interpolation to generate high-quality synthetic images for diabetic retinopathy classification. Our approach enhances training datasets by generating realistic retinal images that retain critical pathological features. We evaluated the method across multiple retinal image datasets, including Retinal-Lesions, Fine-Grained Annotated Diabetic Retinopathy (FGADR), Indian Diabetic Retinopathy Image Dataset (IDRiD), and the Kaggle Diabetic Retinopathy dataset. The proposed method outperformed traditional generative models, such as conditional GANs and PathoGAN, achieving the best performance on key metrics: a Fréchet Inception Distance (FID) of 15.21, a Mean Squared Error (MSE) of 0.002025, and a Structural Similarity Index (SSIM) of 0.89 in the Kaggle dataset. Additionally, expert evaluations revealed that only 56.66% of synthetic images could be distinguished from real ones, demonstrating the high fidelity and clinical relevance of the generated data. These results highlight the effectiveness of our approach in improving medical image classification by generating realistic and diverse synthetic datasets. Full article
(This article belongs to the Collection Medical Applications of Sensor Systems and Devices)
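For readers unfamiliar with the WGAN-GP objective mentioned above, the PyTorch sketch below shows the gradient-penalty term that distinguishes WGAN-GP training from a standard GAN critic loss. It is a minimal illustration, not the authors' code; the critic interface, batch shapes, and the conventional penalty weight of 10 are assumptions.

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """WGAN-GP penalty: encourages unit gradient norm on interpolated samples."""
    # Random convex combination of real and generated retinal images
    eps = torch.rand(real.size(0), 1, 1, 1, device=device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()

# Critic loss (sketch): D(fake) - D(real) + lambda * GP, with lambda = 10 as the usual default (an assumption here)
# d_loss = critic(fake).mean() - critic(real).mean() + 10.0 * gradient_penalty(critic, real, fake)
```

The nearest-neighbor resizing the abstract mentions corresponds to `torch.nn.functional.interpolate(x, size=..., mode="nearest")` in this framework.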
Show Figures
Figure 1. Methodology diagram.
Figure 2. Diagram of the lesion extraction technique.
Figure 3. Style Transfer diagrams. (a) Diagram illustrating the perceptual loss process, utilizing VGG19 for feature extraction. (b) Diagram depicting the severity loss process, where a pretrained CNN is employed for retinal classification.
Figure 4. Images of each configuration where the real image does not have lesions. The image with PathoGAN label is the implementation of [13]. The others are using WGAN-GP with different resizing algorithms, except cGAN. Underlined is the best FID result.
Figure 5. Comparison of generated images across different configurations, where the real image does not contain lesions. The images generated by WGAN-GP and PathoGAN exhibit smoothing effects, while the cGAN successfully transfers the noise from the original image. Underlined is the best FID result.
Figure 6. Comparison with generated and real image samples. The images generated using the proposed method exhibit colors and textures that are more similar to the real image. In contrast, the images generated by the cGAN and PathoGAN show color variations in areas where the real image does not present them. Underlined is the best FID result.
Figure 7. Comparison with generated and real image samples. The proposed method successfully extracts and preserves the color and texture of the original image, while the cGAN method displays different tones. Underlined is the best FID result.
Figure 8. Comparison with generated and real image samples. The proposed method successfully transfers lesions from the original images.
Figure 9. Sample images with lesions from the Retinal-Lesions database. Underlined is the best FID result.
Figure 10. Sample images with lesions from the FGADR database. Underlined is the best FID result.
Figure 11. Sample images with lesions from the IDRiD database. Underlined is the best FID result.
Figure 12. Sample images with lesions from the Kaggle database. Underlined is the best FID result.
16 pages, 1698 KiB  
Article
EEG-RegNet: Regressive Emotion Recognition in Continuous VAD Space Using EEG Signals
by Hyo Jin Jon, Longbin Jin, Hyuntaek Jung, Hyunseo Kim and Eun Yi Kim
Mathematics 2025, 13(1), 87; https://doi.org/10.3390/math13010087 - 29 Dec 2024
Viewed by 258
Abstract
Electroencephalogram (EEG)-based emotion recognition has garnered significant attention in brain–computer interface research and healthcare applications. While deep learning models have been extensively studied, most are designed for classification tasks and struggle to accurately predict continuous emotional scores in regression settings. In this paper, we introduce EEG-RegNet, a novel deep neural network tailored for precise emotional score prediction across the continuous valence–arousal–dominance (VAD) space. EEG-RegNet tackles two core challenges: extracting subject-independent, emotion-relevant EEG features and mapping these features to fine-grained, continuous emotional scores. The model leverages 2D convolutional neural networks (CNNs) for spatial feature extraction and a 1D CNN for temporal dynamics, providing robust spatiotemporal modeling. A key innovation is the hybrid loss function, which integrates mean squared error (MSE) and cross-entropy (CE) with a Bernoulli penalty to enhance probability estimation and address sparsity in the emotional space. Extensive experiments on the DEAP dataset show that EEG-RegNet achieves state-of-the-art results in continuous emotional score prediction and attains 95% accuracy in fine-grained emotion classification, highlighting its scalability and precision in emotion recognition. Full article
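The hybrid loss described above combines a regression term with a classification term. The sketch below illustrates one way such a loss can be assembled in PyTorch; the exact form of the Bernoulli penalty is not given in the abstract, so the p(1 − p) term here is only a plausible reading, and the weight on it is a placeholder. The balance parameter tau is presumably the τ varied in the paper's Figure 5.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred_score, pred_logits, true_score, true_bin, tau=0.5):
    """Illustrative hybrid objective: MSE on continuous VAD scores plus cross-entropy
    on discretized emotion bins, balanced by tau. The Bernoulli-style penalty below,
    p * (1 - p) summed over bins, is only one plausible reading of the abstract,
    not the authors' definition."""
    mse = F.mse_loss(pred_score, true_score)                   # regression term on continuous scores
    ce = F.cross_entropy(pred_logits, true_bin)                # classification term on discretized bins
    p = torch.softmax(pred_logits, dim=-1)
    bernoulli_pen = (p * (1.0 - p)).sum(dim=-1).mean()         # discourages diffuse bin probabilities
    return tau * mse + (1.0 - tau) * ce + 0.1 * bernoulli_pen  # 0.1 weight is an arbitrary placeholder
```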
Show Figures
Figure 1. Overview of EEG-RegNet's emotion recognition processes, including preprocessing, spatiotemporal feature extraction, and regressive emotion recognition.
Figure 2. Detailed framework of EEG-RegNet, illustrating the internal architecture for spatiotemporal feature extraction and the two-stage regressive emotion recognition process.
Figure 3. Emotional spaces for representing emotions. (a) Russell's valence–arousal space, and (b) Mehrabian's valence–arousal–dominance space.
Figure 4. Confusion matrix of EEG-RegNet predictions for nonary emotion classification across valence, arousal, and dominance dimensions on the DEAP dataset.
Figure 5. Impact of hybrid loss balance: (a) Average RMSE for emotional score prediction; (b) Average error rates for fine-grained nonary classification across valence, arousal, and dominance as τ values vary.
Figure 6. Probability distribution for emotional score prediction across valence, arousal, and dominance on the DEAP dataset: (a) Baseline predictions using only MSE loss; (b) Predictions generated by EEG-RegNet using hybrid loss.
18 pages, 5460 KiB  
Article
CoCM: Conditional Cross-Modal Learning for Vision-Language Models
by Juncheng Yang, Shuai Xie, Shuxia Li, Zengyu Cai, Yijia Li and Weiping Zhu
Electronics 2025, 14(1), 26; https://doi.org/10.3390/electronics14010026 - 25 Dec 2024
Viewed by 323
Abstract
Parameter-tuning-based adapter methods have achieved notable success in vision-language models (VLMs). However, they face challenges in scenarios with insufficient training samples or limited resources. While leveraging image-modality caching and retrieval techniques can reduce resource requirements, these approaches often overlook the significance of the textual modality and cross-modal cues in VLMs. To address this, we propose a Conditional Cross-Modal learning model, abbreviated as CoCM. CoCM builds separate cache models for both the text and image modalities and embeds textual knowledge conditioned on image information. It dynamically adjusts the cross-modal fusion affinity ratio and disentangles similarity measures across different modalities. Additionally, CoCM incorporates an intra-batch image similarity loss as a regularization term to identify hard samples and enhance fine-grained classification performance. CoCM surpasses existing methods in terms of accuracy, generalization ability, and efficiency, achieving a 0.28% accuracy improvement over XMAdapter across 11 datasets and demonstrating 44.79% generalization performance on four cross-domain datasets. Full article
(This article belongs to the Section Artificial Intelligence)
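CoCM extends cache-based adapters such as Tip-Adapter (compared in the paper's Figure 1). The sketch below shows the basic key-value cache lookup that this family of methods builds on, blended with the frozen VLM's zero-shot logits; CoCM's additional text-modality cache and dynamically adjusted fusion ratio are not reproduced, and alpha and beta are placeholder hyperparameters.

```python
import torch

def cache_logits(test_feat, cache_keys, cache_values, clip_logits, alpha=1.0, beta=5.5):
    """Tip-Adapter-style key-value cache lookup, the mechanism CoCM builds on.
    test_feat:    (N, D) L2-normalised image features of test samples
    cache_keys:   (K, D) stored few-shot features (keys)
    cache_values: (K, C) one-hot labels of the cached samples
    clip_logits:  (N, C) zero-shot logits from the frozen VLM
    alpha, beta:  fusion and sharpness hyperparameters (placeholders here)."""
    affinity = test_feat @ cache_keys.t()                       # cosine affinity, features pre-normalised
    cache_pred = torch.exp(-beta * (1.0 - affinity)) @ cache_values
    return clip_logits + alpha * cache_pred                     # blend cached knowledge with the VLM prior
```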
Show Figures
Figure 1. Illustration comparing CoCM with CLIP [1], CLIP-Adapter [9] and Tip-Adapter [10].
Figure 2. Illustration of the proposed CoCM. The (reddish orange) line depicts the flow of image features, while the (pea green) line represents the flow of text features. The model first constructs a key-value cache model and then builds a cross-modal cache by integrating image and text features. It uses similarity loss among images to identify hard samples. Finally, the model combines the knowledge from the original VLM to enhance the accuracy of its predictions.
Figure 3. The performance comparison of our CoCM with the SOTA method on cross label generalization, including 1-/2-/4-/8-/16-shots on 11 benchmark datasets.
17 pages, 395 KiB  
Article
WePred: Edge Weight-Guided Contrastive Learning for Bipartite Link Prediction
by Linlin Ding, Yiming Han, Mo Li, Yinghao Gu, Tingting Liu and Shidong Yu
Electronics 2025, 14(1), 20; https://doi.org/10.3390/electronics14010020 - 25 Dec 2024
Viewed by 290
Abstract
Bipartite networks are common in real-world applications, where link prediction helps understand network evolution and make recommendations. Traditional methods have two major limitations; they often ignore edge weight information by treating links as binary connections, and they struggle to capture complex interaction patterns in sparse networks. We propose WePred, a weight-guided contrastive learning method for link prediction which addresses both challenges through three key components: (1) a weight-guided edge attention mechanism that incorporates edge weights into neighbor aggregation using dynamic attention scores, enabling the fine-grained capture of interaction strengths; (2) a dual-level contrastive learning approach that combines edge-level and node-level contrasts to capture both local weighted patterns and global structural dependencies, which is particularly effective in sparse regions; and (3) a unified learning framework that integrates classification and contrastive objectives. We evaluate WePred on five real-world datasets. The experimental results show that WePred consistently outperformed state-of-the-art methods, achieving improvements of 7.27% in the AUC on ML-1M and 3.18% in the AUC on the sparsest dataset of Amazon-Book (density: 0.001%). Ablation studies confirm the effectiveness of each component, while parameter analyses provide deployment guidance. Full article
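A minimal PyTorch sketch of the weight-guided attention idea, in which edge weights modulate the attention scores used for neighbor aggregation, is given below. It follows the general description in the abstract and the Figure 1 caption; the exact MLP/BMM arrangement and dimensions in WePred may differ.

```python
import torch
import torch.nn as nn

class WeightGuidedAttention(nn.Module):
    """Illustrative neighbor aggregation in which edge weights steer attention scores.
    This is a sketch of the general mechanism, not the authors' implementation."""
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.w_mlp = nn.Sequential(nn.Linear(1, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, center, neighbors, edge_weights):
        # center: (B, D), neighbors: (B, M, D), edge_weights: (B, M)
        q = self.q(center).unsqueeze(1)                                  # (B, 1, D) query from center node
        k = self.k(neighbors)                                            # (B, M, D) keys from neighbors
        att = (q * k).sum(-1) / k.size(-1) ** 0.5                        # (B, M) scaled dot-product scores
        att = att + self.w_mlp(edge_weights.unsqueeze(-1)).squeeze(-1)   # inject edge-weight signal
        att = torch.softmax(att, dim=-1)
        return torch.bmm(att.unsqueeze(1), neighbors).squeeze(1)         # weighted aggregation, (B, D)
```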
Show Figures
Figure 1. The architecture of WePred consists of three main layers: (1) an embedding layer, which separates embedding tables for user and item nodes that generate initial node representations; (2) a weight-guided edge attention layer, which processes node embeddings through an attention mechanism and weight-guided aggregation, where Q represents center node embedding and Keys represent neighbor embeddings, combined with the edge weights W through MLP and BMM operations; and (3) a dual-level contrastive learning layer, which includes node-level contrast using LSH for identifying structurally similar nodes and edge-level contrast employing weight-degree balanced sampling (WDBS), with d_v × k negative samples for each positive edge.
Figure 2. Ablation study results.
Figure 3. The impact of hyperparameters on WePred's performance.
19 pages, 6995 KiB  
Article
A Classification Model for Fine-Grained Silkworm Cocoon Images Based on Bilinear Pooling and Adaptive Feature Fusion
by Mochen Liu, Xin Hou, Mingrui Shang, Eunice Oluwabunmi Owoola, Guizheng Zhang, Wei Wei, Zhanhua Song and Yinfa Yan
Agriculture 2024, 14(12), 2363; https://doi.org/10.3390/agriculture14122363 - 22 Dec 2024
Viewed by 484
Abstract
The quality of silkworm cocoons affects the quality and cost of silk processing. It is necessary to sort silkworm cocoons prior to silk production. Cocoon images consist of fine-grained images with large intra-class differences and small inter-class differences. The subtle intra-class features pose a serious challenge in accurately locating the effective areas and classifying silkworm cocoons. To improve the perception of intra-class features and the classification accuracy, this paper proposes a bilinear pooling classification model (B-Res41-ASE) based on adaptive multi-scale feature fusion and enhancement. B-Res41-ASE consists of three parts: a feature extraction module, a feature fusion module, and a feature enhancement module. Firstly, the backbone network, ResNet41, is constructed based on the bilinear pooling algorithm to extract complete cocoon features. Secondly, the adaptive spatial feature fusion module (ASFF) is introduced to fuse different semantic information to solve the problem of fine-grained information loss in the process of feature extraction. Finally, the squeeze and excitation module (SE) is used to suppress redundant information, enhance the weight of distinguishable regions, and reduce classification bias. Compared with the widely used classification network, the proposed model achieves the highest classification performance in the test set, with accuracy of 97.0% and an F1-score of 97.5%. The accuracy of B-Res41-ASE is 3.1% and 2.6% higher than that of the classification networks AlexNet and GoogLeNet, respectively, while the F1-score is 2.5% and 2.2% higher, respectively. Additionally, the accuracy of B-Res41-ASE is 1.9% and 7.7% higher than that of the Bilinear CNN and HBP, respectively, while the F1-score is 1.6% and 5.7% higher. The experimental results show that the proposed classification model without complex labelling outperforms other cocoon classification algorithms in terms of classification accuracy and robustness, providing a theoretical basis for the intelligent sorting of silkworm cocoons. Full article
(This article belongs to the Section Digital Agriculture)
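The model is built on bilinear pooling, whose standard form is sketched below in PyTorch: outer products of backbone feature maps pooled over spatial locations, followed by signed square-root and L2 normalization. This is the generic operation the paper starts from, not the authors' B-Res41-ASE implementation.

```python
import torch
import torch.nn.functional as F

def bilinear_pool(feat_a, feat_b):
    """Classic bilinear pooling of two feature maps (a sketch, not the authors' code).
    feat_a, feat_b: (B, C, H, W) feature maps from the backbone."""
    b, c, h, w = feat_a.shape
    a = feat_a.reshape(b, c, h * w)
    bb = feat_b.reshape(b, c, h * w)
    x = torch.bmm(a, bb.transpose(1, 2)) / (h * w)         # (B, C, C) spatially pooled outer products
    x = x.reshape(b, -1)
    x = torch.sign(x) * torch.sqrt(torch.abs(x) + 1e-10)   # signed square-root normalisation
    return F.normalize(x, dim=1)                           # L2 normalisation of the bilinear vector
```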
Show Figures
Figure 1. Images of silkworm cocoons captured by top and bottom camera. (a) Cocoon image captured by camera (top). (b) Cocoon image captured by camera (bottom).
Figure 2. Images of reelable cocoons and different types of waste cocoons.
Figure 3. Cocoon image classification model architecture.
Figure 4. Bilinear pooling-based image classification model for silkworm cocoon images.
Figure 5. Silkworm cocoon image classification model based on bilinear pooling with feature fusion.
Figure 6. Bilinear pooling classification model for silkworm cocoon images based on feature fusion and enhancement.
Figure 7. The training curves of different fusion algorithms. (a) The training accuracy curves of different fusion algorithms. (b) The training loss curves of different fusion algorithms.
Figure 8. Confusion matrix for different fusion algorithms.
Figure 9. The fine-grained classification precision of the silkworm cocoon for different fusion algorithms. A. Cocoon polluted by oil. B. Stained cocoon. C. Cocoon pressed by cocooning frame. D. Crushed cocoon. E. Double cocoon. F. Reelable cocoon. G. Yellow spotted cocoon. H. Decayed cocoon. I. Malformed cocoon.
Figure 10. The training curves of different feature fusion and enhancement methods. (a) The training accuracy curves. (b) The training loss curves.
Figure 11. The confusion matrix for different feature enhancements.
Figure 12. The fine-grained classification precision of the silkworm cocoon for different feature enhancements. A. Cocoon polluted by oil. B. Stained cocoon. C. Cocoon pressed by cocooning frame. D. Crushed cocoon. E. Double cocoon. F. Reelable cocoon. G. Yellow spotted cocoon. H. Decayed cocoon. I. Malformed cocoon.
Figure 13. Comparison of different models with Grad-CAM visualization.
Figure 14. Adaptive spatial feature map before and after fusion visualization.
Figure 15. Accuracy and loss value change curve for each model. (a) The training accuracy curves of different algorithms. (b) The training loss curves of different algorithms.
Figure 16. Experimental images of different varieties of silkworm cocoons.
23 pages, 1149 KiB  
Article
MGAFN-ISA: Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis
by Yifan Huo, Ming Liu, Junhong Zheng and Lili He
Electronics 2024, 13(24), 4905; https://doi.org/10.3390/electronics13244905 - 12 Dec 2024
Viewed by 429
Abstract
Although significant progress has been made in sentiment analysis tasks based on image–text data, existing methods still have limitations in capturing cross-modal correlations and detailed information. To address these issues, we propose a Multi-Granularity Attention Fusion Network for Implicit Sentiment Analysis (MGAFN-ISA). MGAFN-ISA leverages neural networks and attention mechanisms to effectively reduce noise interference between different modalities and captures distinct, fine-grained visual and textual features. The model includes two key feature extraction modules: a multi-scale attention fusion-based visual feature extractor and a hierarchical attention mechanism-based textual feature extractor, each designed to extract detailed and discriminative visual and textual representations. Additionally, we introduce an image translator engine to produce accurate and detailed image descriptions, further narrowing the semantic gap between the visual and textual modalities. A bidirectional cross-attention mechanism is also incorporated to utilize correlations between fine-grained local regions across modalities, extracting complementary information from heterogeneous visual and textual data. Finally, we design an adaptive multimodal classification module that dynamically adjusts the contribution of each modality through an adaptive gating mechanism. Extensive experimental results demonstrate that MGAFN-ISA achieves a significant performance improvement over nine state-of-the-art methods across multiple public datasets, validating the effectiveness and advancement of our proposed approach. Full article
(This article belongs to the Section Artificial Intelligence)
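The bidirectional cross-attention component can be sketched with two standard multi-head attention layers, as below: text tokens attend to image regions and image regions attend to text tokens. Dimensions and the subsequent fusion step are illustrative assumptions rather than details from the paper.

```python
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Sketch of the bidirectional cross-attention idea described for MGAFN-ISA."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.t2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # text queries, visual keys/values
        self.v2t = nn.MultiheadAttention(dim, heads, batch_first=True)  # visual queries, text keys/values

    def forward(self, text_feats, visual_feats):
        # text_feats: (B, T, D) token features, visual_feats: (B, R, D) region features
        text_enriched, _ = self.t2v(text_feats, visual_feats, visual_feats)
        visual_enriched, _ = self.v2t(visual_feats, text_feats, text_feats)
        return text_enriched, visual_enriched
```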
Show Figures
Figure 1. Examples of multimodal sentiment analysis.
Figure 2. The overall structure of the proposed MGAFN-ISA method. The visual–text feature extraction utilizes ResNet-101 to extract multi-scale image features, combined with ViT-GPT-2 to generate image descriptions, while BERT is used to extract textual features. The multimodal fine-grained correlation fusion module adopts a bidirectional cross-attention mechanism to capture fine-grained associations between visual and textual modalities. Furthermore, multi-head attention and layer stacking are used to enhance feature interactions. The classification module employs an adaptive gating mechanism to dynamically fuse visual and textual features, achieving accurate classification of multimodal implicit sentiment by integrating the improved focal loss function.
Figure 3. The performance of MGAFN-ISA under different proportions of training data on each dataset.
17 pages, 2272 KiB  
Article
Convolutional Neural Network–Vision Transformer Architecture with Gated Control Mechanism and Multi-Scale Fusion for Enhanced Pulmonary Disease Classification
by Okpala Chibuike and Xiaopeng Yang
Diagnostics 2024, 14(24), 2790; https://doi.org/10.3390/diagnostics14242790 - 12 Dec 2024
Viewed by 658
Abstract
Background/Objectives: Vision Transformers (ViTs) and convolutional neural networks (CNNs) have demonstrated remarkable performances in image classification, especially in the domain of medical imaging analysis. However, ViTs struggle to capture high-frequency components of images, which are critical in identifying fine-grained patterns, while CNNs have difficulties in capturing long-range dependencies due to their local receptive fields, which makes it difficult to fully capture the spatial relationship across lung regions. Methods: In this paper, we proposed a hybrid architecture that integrates ViTs and CNNs within a modular component block(s) to leverage both local feature extraction and global context capture. In each component block, the CNN is used to extract the local features, which are then passed through the ViT to capture the global dependencies. We implemented a gated attention mechanism that combines the channel-, spatial-, and element-wise attention to selectively emphasize the important features, thereby enhancing overall feature representation. Furthermore, we incorporated a multi-scale fusion module (MSFM) in the proposed framework to fuse the features at different scales for more comprehensive feature representation. Results: Our proposed model achieved an accuracy of 99.50% in the classification of four pulmonary conditions. Conclusions: Through extensive experiments and ablation studies, we demonstrated the effectiveness of our approach in improving the medical image classification performance, while achieving good calibration results. This hybrid approach offers a promising framework for reliable and accurate disease diagnosis in medical imaging. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
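A rough sketch of a gate that combines channel-, spatial-, and element-wise attention, as described in the Methods summary, is shown below; the layer sizes and the multiplicative way the three maps are merged are assumptions, not the authors' exact design.

```python
import torch.nn as nn

class GatedAttention(nn.Module):
    """Illustrative gate combining channel-, spatial- and element-wise attention maps."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(                          # squeeze-and-excitation style channel gate
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(), nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        self.element = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        # x: (B, C, H, W); the three gates broadcast to the full feature map
        gate = self.channel(x) * self.spatial(x) * self.element(x)
        return x * gate + x                                    # gated residual emphasis of important features
```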
Show Figures
Figure 1. The proposed hybrid architecture.
Figure 2. Gated mechanism with attention.
Figure 3. Inception-styled multi-scale fusion module proposed in this study.
Figure 4. A confusion matrix for the proposed model.
Figure 5. Impact of different augmentation methods on original images.
Figure 6. Impact of gated mechanism and multi-scale fusion using LIME explainability analysis.
21 pages, 7882 KiB  
Article
Multi-Scale Gross Ecosystem Product (GEP) Valuation for Wetland Ecosystems: A Case Study of Lishui City
by Zhixin Zhu, Keyue Wu, Shuyue Zhou, Zhe Wang and Weiya Chen
Water 2024, 16(24), 3554; https://doi.org/10.3390/w16243554 - 10 Dec 2024
Viewed by 514
Abstract
Traditional gross ecosystem product (GEP) accounting methods often operate at macro scales, failing to reflect the localized and nuanced values of wetland ecosystems. This study addresses these challenges by introducing a fine-grained classification system based on a localized adaptation of international standards. The framework integrates high-precision national land surveys and remote sensing quantitative analysis while incorporating fisheries resource models, climate regulation beneficiary mapping, and visitor interpolation to address data scarcity related to human activities. This approach refines the spatial calculation methods for functional quantity accounting at fine scales. The results demonstrate that the refined classification maintains consistency with traditional methods in total value while adapting to multi-scale accounting, filling gaps at small and medium scales and providing a more accurate representation of localized wetland characteristics. Additionally, the study highlights the dominance of cultural services in GEP, emphasizing the need to balance cultural and regulatory services to ensure fairness in decision-making. Finally, a village-scale decision-support model is proposed, offering actionable guidance for wetland management and sustainable development planning. Full article
(This article belongs to the Special Issue Hydro-Economic Models for Sustainable Water Resources Management)
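At its core, GEP accounting sums functional quantities multiplied by unit prices over ecosystem services and accounting units. The toy example below illustrates only that arithmetic; the service names and numbers are hypothetical placeholders, not values from the study.

```python
# Hypothetical functional quantities and unit prices for one wetland accounting unit.
services = {
    "water_regulation": {"quantity": 1.2e6, "unit_price": 0.8},    # m^3, CNY per m^3 (placeholder)
    "climate_regulation": {"quantity": 3.4e4, "unit_price": 1.5},  # regulation units, CNY per unit (placeholder)
    "tourism": {"quantity": 5.0e4, "unit_price": 60.0},            # visits, CNY per visit (placeholder)
}

# GEP for the unit is the sum over services of functional quantity x unit price.
gep = sum(s["quantity"] * s["unit_price"] for s in services.values())
print(f"GEP for this accounting unit: {gep:,.0f} CNY")
```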
Show Figures
Graphical abstract
Figure 1. Layout of study area.
Figure 2. Wetland classification mapping process.
Figure 3. Spatial calculation methods for functional quantities maps based on interpolation optimization. (a) Fishery suitability mapping and adjustment. (b) Wetland climate regulation beneficiaries analysis and (c) wetland tourism distribution.
Figure 4. GEP accounting process: in the ecosystem product amount part, the “·” symbol in the figure indicates data calculations based on coefficients or models specified in the standard, while the “+” symbol denotes the spatial calculations or adjustment coefficients added for optimizing functional quantities in the research design.
Figure 5. Lishui wetland classification map.
Figure 6. Multi-scale wetland waterbody area statistics.
Figure 7. Gross ecosystem product results at multiple scales.
Figure 8. Wetland area vs. GEP value with Z-score selections.
Figure 9. Cluster analysis of regulatory vs. cultural service contributions.
Figure 10. Village-level wetland development strategy map.
20 pages, 16663 KiB  
Article
Evaluating the Spatial Variations in Bed Sediment and Their Depositional Environments Using Particle-Size Analysis of Wadi Fatima, Saudi Arabia
by Mohd Yawar Ali Khan
Water 2024, 16(24), 3553; https://doi.org/10.3390/w16243553 - 10 Dec 2024
Viewed by 399
Abstract
The assessment of grain size and sediment output is crucial for analyzing the pace of sediment erosion, engineering dams and reservoirs, anticipating the impact of climate change and human activities on river systems, and comprehending the presence of trace and heavy metal pathogens and micropollutants. In July 2024, 16 samples of bed sediments were collected from the mainstream of Wadi Fatima and its tributaries in the Makkah region of Saudi Arabia to identify the depositional environments and the hydrodynamic conditions using the Passega diagram, Linear Discriminant Function (LDF) and bivariate plots. The results indicate that the sediments being studied exhibit polymodal properties in both the upstream and midstream regions of the main channel of Wadi Fatima. However, in the downstream region, the samples show trimodal properties. Regarding tributaries, the WFT1, WFT2, and WFT4 sediments exhibit polymodal properties, except for WFT3, which is bimodal. Folk’s classification system categorizes the samples into four distinct classes/facies: gravel, sandy gravel, gravelly sand, and sand, with respective proportions of 13%, 62%, 6%, and 19%. The sediments found in Wadi Fatima contain a range of graphic mean (MZ) values, from −3.34 (indicating medium gravel) to 2.48 (indicating fine sand). On average, the MZ value is −0.79, which indicates extremely fine gravel. The standard deviation (sorting, σi) values of the samples analyzed from Wadi Fatima vary between 0.71 (moderately sorted) and 3.44 (very poorly sorted), with an average of 2.00 (very poorly sorted). The skewness (Sk) values range from −0.41 (indicating a very coarse Sk) to 0.82 (indicating a very fine Sk). On average, the Sk value is −0.02, indicating a symmetrical distribution. The kurtosis (K) values span from 0.51 (indicating a very platykurtic distribution) to 2.65 (indicating a very leptokurtic distribution), with an average of 0.95 (indicating a mesokurtic distribution). Full article
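The graphic mean, sorting, skewness, and kurtosis reported above are the standard Folk and Ward (1957) graphic parameters. The sketch below computes them from a cumulative grain-size curve; it assumes a cumulative-percent curve over ascending phi values and is an illustration, not the author's workflow.

```python
import numpy as np

def folk_ward(phi, weights):
    """Folk and Ward (1957) graphic grain-size parameters.
    phi: grain sizes in phi units; weights: corresponding weight percentages.
    Percentiles are read off the cumulative distribution by linear interpolation."""
    order = np.argsort(phi)
    phi, weights = np.asarray(phi, float)[order], np.asarray(weights, float)[order]
    cum = np.cumsum(weights) / np.sum(weights) * 100.0
    p = lambda q: np.interp(q, cum, phi)                     # phi value at cumulative percent q
    p5, p16, p25, p50, p75, p84, p95 = (p(q) for q in (5, 16, 25, 50, 75, 84, 95))
    mz = (p16 + p50 + p84) / 3.0                             # graphic mean (MZ)
    sigma_i = (p84 - p16) / 4.0 + (p95 - p5) / 6.6           # inclusive graphic standard deviation (sorting)
    sk = ((p16 + p84 - 2 * p50) / (2 * (p84 - p16))
          + (p5 + p95 - 2 * p50) / (2 * (p95 - p5)))         # inclusive graphic skewness (Sk)
    k = (p95 - p5) / (2.44 * (p75 - p25))                    # graphic kurtosis (K)
    return mz, sigma_i, sk, k
```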
Show Figures
Figure 1. Map of the study area and sampling locations.
Figure 2. Geological map of the study area (modified after [31]).
Figure 3. Geological cross-section of three profiles (shown in Figure 2) of Wadi Fatima: (a) A-B between Jabal Daf and Jabal Kassab in the southwestern part, (b) C-D between Jabal Mukassar and Jabal AlSail AlKabeer, in the central part of the basin, and (c) E-F between the Haddat Ash-Sham area and the Ashayrah area in the northern part of the basin (modified after [33]).
Figure 4. (a) Slope and (b) aspect ratio map of the study area.
Figure 5. Frequency plot of Wadi Fatima samples.
Figure 6. Cumulative curve of Wadi Fatima samples.
Figure 7. A plot of the grain size distributions of the samples (gravel, sand, mud) according to Folk's classification system.
Figure 8. Textural components of the Wadi Fatima samples (note: the blue line represents upstream, the green line midstream, and the red line downstream samples).
Figure 9. Lateral variation for the samples.
Figure 10. Bivariate plots of: (a) σi against MZ, (b) Sk against MZ, (c) K against MZ, (d) σi against Sk and (e) K against Sk.
Figure 11. LDF binary plots: (a) Y2/Y1, (b) Y3/Y2 and (c) Y4/Y3.
Figure 12. C-M plot showing (a) depositional environment, and (b) transporting mechanism of the Wadi Fatima bed sediments.
Figure 13. Bivariate plot of (a) Sk versus inclusive σi, (b) inclusive σi versus median, and (c) MZ versus inclusive σi, after Friedman [50,65], Stewart [64] and Moiola and Weiser [47].
25 pages, 5732 KiB  
Article
Analyzing the Impact of Binaural Beats on Anxiety Levels by a New Method Based on Denoised Harmonic Subtraction and Transient Temporal Feature Extraction
by Devika Rankhambe, Bharati Sanjay Ainapure, Bhargav Appasani, Avireni Srinivasulu and Nicu Bizon
Bioengineering 2024, 11(12), 1251; https://doi.org/10.3390/bioengineering11121251 - 10 Dec 2024
Viewed by 618
Abstract
Anxiety is a widespread mental health issue, and binaural beats have been explored as a potential non-invasive treatment. EEG data reveal changes in neural oscillation and connectivity linked to anxiety reduction; however, harmonics introduced during signal acquisition and processing often distort these findings. Existing methods struggle to effectively reduce harmonics and capture the fine-grained temporal dynamics of EEG signals, leading to inaccurate feature extraction. Hence, a novel Denoised Harmonic Subtraction and Transient Temporal Feature Extraction is proposed to improve the analysis of the impact of binaural beats on anxiety levels. Initially, a novel Wiener Fused Convo Filter is introduced to capture spatial features and eliminate linear noise in EEG signals. Next, an Intrinsic Harmonic Subtraction Network is employed, utilizing the Attentive Weighted Least Mean Square (AW-LMS) algorithm to capture nonlinear summation and resonant coupling effects, effectively eliminating the misinterpretation of brain rhythms. To address the challenge of fine-grained temporal dynamics, an Embedded Transfo XL Recurrent Network is introduced to detect and extract relevant parameters associated with transient events in EEG data. Finally, EEG data undergo harmonic reduction and temporal feature extraction before classification with a cross-correlated Markov Deep Q-Network (DQN). This facilitates anxiety level classification into normal, mild, moderate, and severe categories. The model demonstrated a high accuracy of 95.6%, precision of 90%, sensitivity of 93.2%, and specificity of 96% in classifying anxiety levels, outperforming previous models. This integrated approach enhances EEG signal processing, enabling reliable anxiety classification and offering valuable insights for therapeutic interventions. Full article
(This article belongs to the Special Issue Adaptive Neurostimulation: Innovative Strategies for Stimulation)
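The AW-LMS algorithm builds on least-mean-square adaptive filtering. The sketch below shows only the plain LMS core used to subtract a harmonic reference from an EEG segment; the attentive weighting that the paper adds is not reproduced, and the filter order and step size are placeholders.

```python
import numpy as np

def lms_filter(desired, reference, order=8, mu=0.01):
    """Plain LMS adaptive filtering: estimates the harmonic component of `desired`
    from `reference` and returns the residual (signal with the harmonic removed)."""
    desired = np.asarray(desired, float)
    reference = np.asarray(reference, float)
    w = np.zeros(order)                                     # adaptive filter taps
    out = np.zeros(len(desired))
    for i in range(order, len(desired)):
        x = reference[i - order:i][::-1]                    # most recent reference samples
        y = np.dot(w, x)                                    # estimated harmonic component
        e = desired[i] - y                                  # error = cleaned sample
        w = w + 2 * mu * e * x                              # LMS weight update
        out[i] = e
    return out
```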
Show Figures
Figure 1. Block diagram of the proposed system.
Figure 2. Wiener Fused Convo Filter.
Figure 3. Flowchart of the Hilbert–Huang transformation process of the proposed system.
Figure 4. Attentive Weighted Least Mean Square (AW-LMS) algorithm of the proposed model.
Figure 5. Schematic representation of a Transformer-XL.
Figure 6. Long Short-Term Memory.
Figure 7. Input EEG of the proposed model for (a) delta, (b) theta, (c) alpha, (d) beta, and (e) gamma.
Figure 8. Pre-processed EEG of the proposed model for (a) delta, (b) theta, (c) alpha, (d) beta, and (e) gamma.
Figure 9. Normalized frequencies of the proposed model for (a) delta, (b) theta, (c) alpha, (d) beta, and (e) gamma.
Figure 10. Brain rhythm of the proposed model for (a) delta, (b) theta, (c) alpha, (d) beta, and (e) gamma.
Figure 11. The loss rate of the proposed system.
Figure 12. Mean square error (MSE) of the proposed model.
Figure 13. Confusion matrix of the proposed method.
Figure 14. Accuracy, precision, sensitivity, specificity, and F1 score of the proposed model.
Figure 15. FPR, FNR, and MAE of the proposed model.
Figure 16. NPV, PSNR, and MCC of the proposed model.
Figure 17. Comparison of key metrics such as accuracy, precision, sensitivity, specificity, and F1 score of the proposed model.
Figure 18. Comparison of the NPV and MCC of the proposed model.
Figure 19. Comparison of the FNR, FPR, and FDR of the proposed model.
20 pages, 6009 KiB  
Article
Satellite Image Time-Series Classification with Inception-Enhanced Temporal Attention Encoder
by Zheng Zhang, Weixiong Zhang, Yu Meng, Zhitao Zhao, Ping Tang and Hongyi Li
Remote Sens. 2024, 16(23), 4579; https://doi.org/10.3390/rs16234579 - 6 Dec 2024
Viewed by 473
Abstract
In this study, we propose a one-branch IncepTAE network to extract local and global hybrid temporal attention simultaneously and congruously for fine-grained satellite image time series (SITS) classification. Transformer and the temporal self-attention mechanism have been the research focus of SITS classification in recent years. However, its effectiveness seems to diminish in the scenario of fine-grained classification among similar categories, for example, different crop types. Theoretically, most of the existing methods focus on only one type of temporal attention, either global attention or local attention, but actually, both of them are required to achieve fine-grained classification. Even though some works adopt two-branch architecture to extract hybrid attention, they usually lack congruity between different types of temporal attention and hinder the expected discriminating ability. Compared with the existing methods, IncepTAE exhibits multiple methodological novelties. Firstly, we insert average/maximum pooling layers into the calculation of multi-head attention to extract hybrid temporal attention. Secondly, IncepTAE adopts one-branch architecture, which reinforces the interaction and congruity of different temporal information. Thirdly, the proposed IncepTAE is more lightweight due to the use of group convolutions. IncepTAE achieves 95.65% and 97.84% overall accuracy on two challenging datasets, TimeSen2Crop and Ghana. The comparative results with existing state-of-the-art methods demonstrate that IncepTAE is able to achieve superior classification performance and faster inference speed, which is conducive to the large-area application of SITS classification. Full article
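One way to picture the abstract's idea of inserting average/maximum pooling into the multi-head attention computation, with keys produced by a grouped 1-D convolution, is sketched below. It is an interpretation of the description, not the IncepTAE implementation; layer sizes are assumptions, and the embedding dimension is assumed divisible by the number of heads.

```python
import torch.nn as nn

class HybridTemporalKeys(nn.Module):
    """Sketch: attention keys from a grouped 1-D convolution (one group per head),
    mixed with average- and max-pooled views along the temporal axis so that both
    smoothed and peak responses feed the subsequent attention scores."""
    def __init__(self, dim, heads=4, kernel_size=3):
        super().__init__()
        # grouped convolution is cheaper than a full linear layer; dim % heads == 0 assumed
        self.key_conv = nn.Conv1d(dim, dim, kernel_size=1, groups=heads)
        self.avg = nn.AvgPool1d(kernel_size, stride=1, padding=kernel_size // 2)
        self.max = nn.MaxPool1d(kernel_size, stride=1, padding=kernel_size // 2)

    def forward(self, x):
        # x: (B, T, D) embedded satellite image time series
        k = self.key_conv(x.transpose(1, 2))                 # (B, D, T)
        k_hybrid = k + self.avg(k) + self.max(k)             # mix raw, averaged and peak responses
        return k_hybrid.transpose(1, 2)                      # back to (B, T, D)
```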
Show Figures
Graphical abstract
Figure 1. Overall architecture of the inception-enhanced temporal attention encoder (IncepTAE). The meanings of the abbreviations, colors and symbols involved are given as follows: green block: tensor of data; red text: shape of tensor; blue block: modules of neural networks; ⟶: the direction in which data flows; ⊕: element-wise add; ⊗: matrix multiplication; X: input time series; D: dates of time series; E: embedding of time series; O: output of self-attention mechanism; Õ: final output of hybrid temporal feature; @1, @k: kernel size of convolutions and pooling layers.
Figure 2. Comparison of the computation of keys for multi-head attention in (a) LTAE, (b) IncepTAE. IncepTAE has less computation than LTAE because IncepTAE implements multi-head by group convolution rather than a full convolution linear layer.
Figure 3. The confusion proportions of (a) IncepTAE and (b) GLTAE for each class in TimeSen2Crop plotted with different colors. Note the axis of proportions starts at 60%. IncepTAE yields better results in most classes, especially on Legumes and Winter Caraway.
Figure 4. The confusion proportions of (a) IncepTAE and (b) GLTAE for each class in Ghana plotted with different colors. Note the axis of proportions starts at 80%. Both IncepTAE and GLTAE tend to misclassify Non-Intercrop as Intercrop, but one intuitive fact is that IncepTAE has fewer misclassified proportions as Intercrop for most classes than GLTAE, resulting in a better performance.
Figure 5. Performance in OA, mIoU, κ and mF1 of IncepTAE with different kernel_sizes. The best results are achieved by setting kernel_size = 5 for (a) the TimeSen2Crop dataset and kernel_size = 9 for (b) the Ghana dataset, respectively, which are slightly better than setting kernel_size = 3 for both datasets. In practice, setting kernel_size = 3 is enough for extracting enhanced global and local information with less computation.
Figure 6. Classification performance (in mIoU (%)) of IncepTAE, GLTAE and LTAE on (a) TimeSen2Crop and (b) Ghana plotted with respect to the number of parameters. Note the axis of parameters is given on a logarithmic scale. The proposed IncepTAE outperforms other models across model sizes.
Figure 7. Visualization of average attention maps of the (a) Spring Cereal, (b) Winter Wheat and (c) Winter Triticale classes generated by IncepTAE and LTAE. The red arrows indicate the temporal phases of cloud interference.
22 pages, 1599 KiB  
Article
Single-Stage Entity–Relation Joint Extraction of Pesticide Registration Information Based on HT-BES Multi-Dimensional Labeling Strategy
by Chenyang Dong, Shiyu Xi, Yinchao Che, Shufeng Xiong, Xinming Ma, Lei Xi and Shuping Xiong
Algorithms 2024, 17(12), 559; https://doi.org/10.3390/a17120559 - 6 Dec 2024
Viewed by 366
Abstract
Pesticide registration information is an essential part of the pesticide knowledge base. However, the large amount of unstructured text data that it contains pose significant challenges for knowledge storage, retrieval, and utilization. To address the characteristics of pesticide registration text such as high information density, complex logical structures, large spans between entities, and heterogeneous entity lengths, as well as to overcome the challenges faced when using traditional joint extraction methods, including triplet overlap, exposure bias, and redundant computation, we propose a single-stage entity–relation joint extraction model based on HT-BES multi-dimensional labeling (MD-SERel). First, in the encoding layer, to address the complex structural characteristics of pesticide registration texts, we employ RoBERTa combined with a multi-head self-attention mechanism to capture the deep semantic features of the text. Simultaneously, syntactic features are extracted using a syntactic dependency tree and graph neural networks to enhance the model’s understanding of text structure. Subsequently, we integrate semantic and syntactic features, enriching the character vector representations and thus improving the model’s ability to represent complex textual data. Secondly, in the multi-dimensional labeling framework layer, we use HT-BES multi-dimensional labeling, where the model assigns multiple labels to each character. These labels include entity boundaries, positions, and head–tail entity association information, which naturally resolves overlapping triplets. Through utilizing a parallel scoring function and fine-grained classification components, the joint extraction of entities and relations is transformed into a multi-label sequence labeling task based on relation dimensions. This process does not involve interdependent steps, thus enabling single-stage parallel labeling, preventing exposure bias and reducing computational redundancy. Finally, in the decoding layer, entity–relation triplets are decoded based on the predicted labels from the fine-grained classification. The experimental results demonstrate that the MD-SERel model performs well on both the Pesticide Registration Dataset (PRD) and the general DuIE dataset. On the PRD, compared to the optimal baseline model, the training time is 1.2 times faster, the inference time is 1.2 times faster, and the F1 score is improved by 1.5%, demonstrating its knowledge extraction capabilities in pesticide registration documents. On the DuIE dataset, the MD-SERel model also achieved better results compared to the baseline, demonstrating its strong generalization ability. These findings will provide technical support for the construction of pesticide knowledge bases. Full article
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))
Show Figures
Figure 1. MD-SERel model.
Figure 2. Self-attention mechanism architecture diagram.
Figure 3. Syntactic dependency matrix. (a) The result of semantic analysis of the example sentences; (b) the semantic adjacency matrix constructed from (a).
Figure 4. HT-BES interactive annotation strategy.
Figure 5. The type and quantity distribution of entities and relations.
Figure 6. Entity lengths.
Figure 7. The results for different overlapping patterns of triples.
Figure 8. The results of different self-attention head numbers.
13 pages, 839 KiB  
Article
An Unbiased Feature Estimation Network for Few-Shot Fine-Grained Image Classification
by Jiale Wang, Jin Lu, Junpo Yang, Meijia Wang and Weichuan Zhang
Sensors 2024, 24(23), 7737; https://doi.org/10.3390/s24237737 - 3 Dec 2024
Viewed by 462
Abstract
Few-shot fine-grained image classification (FSFGIC) aims to classify subspecies with similar appearances under conditions of very limited data. In this paper, we observe an interesting phenomenon: different types of image data augmentation techniques have varying effects on the performance of FSFGIC methods. This indicates that there may be biases in the features extracted from the input images. The bias of the acquired feature may cause deviation in the calculation of similarity, which is particularly detrimental to FSFGIC tasks characterized by low inter-class variation and high intra-class variation, thus affecting the classification accuracy. To address the problems mentioned, we propose an unbiased feature estimation network. The designed network has the capability to significantly optimize the quality of the obtained feature representations and effectively reduce the feature bias from input images. Furthermore, our proposed architecture can be easily integrated into any contextual training mechanism. Extensive experiments on the FSFGIC tasks demonstrate the effectiveness of the proposed algorithm, showing a notable improvement in classification accuracy. Full article
Show Figures
Figure 1. The impact of bias on obtaining feature representations. DA represents data augmentation operations. The numbers on the far right indicate the probability values of the predicted categories, with the red border highlighting the prediction with the highest probability.
Figure 2. An overview of the proposed UFENet for a 5-way 1-shot FSFGIC task.
Figure 3. The heatmaps of nine images visualized by the FRN and the proposed UFENet.
Figure 4. Performance comparison when the proposed method is applied to ProtoNet.
19 pages, 8201 KiB  
Article
Wavelet Texture Descriptor for Steel Surface Defect Classification
by Djilani Belila, Belal Khaldi and Oussama Aiadi
Materials 2024, 17(23), 5873; https://doi.org/10.3390/ma17235873 - 29 Nov 2024
Viewed by 445
Abstract
The accurate and efficient classification of steel surface defects is critical for ensuring product quality and minimizing production costs. This paper proposes a novel method based on wavelet transform and texture descriptors for the robust and precise classification of steel surface defects. By leveraging the multiscale analysis capabilities of wavelet transforms, our method extracts both broad and fine-grained textural features. It involves decomposing images using multi-level wavelet transforms, extracting a series set of statistical and textural features from the resulting coefficients, and employing Recursive Feature Elimination (RFE) to select the most discriminative features. A comprehensive series of experiments was conducted on two datasets, NEU-CLS and X-SDD, to evaluate the proposed method. The results highlight the effectiveness of the method in accurately classifying steel surface defects, outperforming the state-of-the-art techniques. Our method achieved an accuracy of 99.67% for the NEU-CLS dataset and 98.24% for the X-SDD dataset. Furthermore, we demonstrate the robustness of our method in scenarios with limited data, maintaining high accuracy, making it well-suited for practical industrial applications where obtaining large datasets can be challenging. Full article
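A compact sketch of the pipeline described above (multi-level wavelet decomposition, simple per-sub-band statistics, and Recursive Feature Elimination) is given below using PyWavelets and scikit-learn. The statistics shown are an illustrative subset of the paper's feature set, and the selected feature count is a placeholder.

```python
import numpy as np
import pywt
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def wavelet_features(image, wavelet="haar", level=2):
    """Multi-level 2-D wavelet decomposition followed by simple statistics per sub-band."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    # coeffs[0] is the approximation (LL); each later entry is the (LH, HL, HH) detail triple of a level
    bands = [coeffs[0]] + [band for detail in coeffs[1:] for band in detail]
    feats = []
    for band in bands:
        band = np.asarray(band, dtype=np.float64)
        feats += [band.mean(), band.std(), np.abs(band).mean(), (band ** 2).mean()]
    return np.array(feats)

# Recursive Feature Elimination with a linear SVM keeps the most discriminative features.
# X: (n_samples, n_features) stacked descriptors, y: defect labels (placeholders, not shown here).
# selector = RFE(SVC(kernel="linear"), n_features_to_select=32).fit(X, y)
```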
Show Figures
Figure 1. General scheme of the proposed WTD method for steel defect classification. The process involves image acquisition, wavelet-based image decomposition across multiple levels to extract both approximation (LL) and detail coefficients (LH, HL, HH), followed by feature extraction and selection. The selected features are then used to train a classifier to accurately identify various defect types.
Figure 2. Illustration of the wavelet decomposition process using the Haar wavelet. The figure demonstrates the multi-level decomposition of an original image into its approximation (LL) and detail coefficients (LH, HL, HH) across two levels.
Figure 3. Representative samples from each defect class of the employed X-SDD and NEU-CLS datasets. (a) X-SDD: Classes from left to right are finishing roll printing, iron sheet ash, oxide scale of plate system, oxide scale of temperature system, red iron, slag inclusion, and surface scratch. (b) NEU-CLS: Classes shown from left to right are crazing, inclusions, patches, pitted surface, rolled-in scale, and scratches.
Figure 4. Confusion matrices illustrating the performance of the proposed method on each steel defect class for the datasets: (a) NEU-CLS and (b) X-SDD.
Figure 5. Average classification accuracy of wavelets across datasets, shown in ascending order. The Haar wavelet demonstrates the highest effectiveness for defect classification.
18 pages, 8899 KiB  
Article
Feature Coding and Graph via Transformer: Different Granularities Classification for Aircraft
by Jianghao Rao, Senlin Qin, Zongyan An, Jianlin Zhang, Qiliang Bao and Zhenming Peng
Aerospace 2024, 11(12), 976; https://doi.org/10.3390/aerospace11120976 - 26 Nov 2024
Viewed by 418
Abstract
Against the background of the sky, imaging and perception of aircraft are crucial for various vision applications. Thanks to the ever-evolving nature of the convolutional neural network (CNN), it has become easier to distinguish and recognize different types of aircraft. Nevertheless, accurate classification for sub-categories of aircraft still poses great challenges. On one hand, fine-grained recognition focuses on exploring and studying such problems. On the other hand, aircraft under different sub-categories and granularities put forward higher requirements for feature representation to classify, which led us to rethink the in-depth application of features. We noticed that information in the swin-transformer effectively represents the features in neural network layers, fully showcasing encoding and indexing for information. Through further research based on this, we proposed a better understanding of encoding and reuse for features, and innovatively performed feature coding graphically for classification. In this paper, our approach shows the effects on aircraft feature representation and classification, manifested from the flexible recognition effect at different aircraft category granularities, and outperforms other famous fine-grained classification models on this vision task. Not only did the approach we proposed demonstrate adaptability to aircraft at different classification granularities, but it also revealed the mechanisms and characteristics of feature encoding under different sample space partitions for classification. The relationship between the oriented representation of aircraft features and various classification granularities, which is manifested through different classification criteria, shows that feature coding and graph construction via the transformer opens a new door for specific defined classification tasks where objects are divided under various partition criteria, and provides another perspective on calculation and feature extraction in fine-grained classification. Full article
(This article belongs to the Section Aeronautics)
Show Figures
Figure 1. The appearance and structure of aircraft are influenced by many human design factors, rather than being naturally generated and uniformly divided.
Figure 2. Data partition of sample space under different criteria and labels in fine-grained classification. Each point represents the mapping of a sample datum in the feature space. When the classification granularity is more detailed, it requires higher and finer discrimination of features and the division of space.
Figure 3. The overall flow and schematic diagram of the fine-grained aircraft classification.
Figure 4. New nodes formed by weighing nodes in different regions at the graph level and node level.
Figure 5. Graph data correlation matrix multiplication. We adopt 1-dimensional convolution operations, as shown in the left figure, to generate tokens.
Figure 6. Joints generated from graph data and the final response.
Figure 7. Loss function calculation.
Figure 8. Class labels for the same data under different classification criteria.
Figure 9. Further study of information encoding under various data partition criteria.
Figure 10. Heatmaps of our proposed approach in the process of test data inference. The test data, which our model has never seen before, are not provided for model training, and are used to test the performance of our model. In each instance, the color image on the right is the input image, the one on the left is the response heatmap, and the middle one is the result of overlaying the heatmap on the original color image.