Search Results (9,973)

Search Parameters:
Keywords = Convolutional Neural Networks (CNN)

12 pages, 933 KiB  
Article
Deep Learning-Based Periapical Lesion Detection on Panoramic Radiographs
by Viktor Szabó, Kaan Orhan, Csaba Dobó-Nagy, Dániel Sándor Veres, David Manulis, Matvey Ezhov, Alex Sanders and Bence Tamás Szabó
Diagnostics 2025, 15(4), 510; https://doi.org/10.3390/diagnostics15040510 - 19 Feb 2025
Abstract
Background/Objectives: Our study aimed to determine the accuracy of the artificial intelligence-based Diagnocat system (DC) in detecting periapical lesions (PL) on panoramic radiographs (PRs). Methods: 616 teeth were selected from 357 panoramic radiographs, including 308 teeth with clearly visible periapical radiolucency and 308 without any periapical lesion. Three groups were generated: teeth with radiographic signs of caries (Group 1), teeth with coronal restoration (Group 2), and teeth with root canal filling (Group 3). The PRs were uploaded to the Diagnocat system for evaluation. The performance of the convolutional neural network in detecting PLs was assessed by its sensitivity, specificity, and positive and negative predictive values, as well as the diagnostic accuracy value. We investigated the possible effect of the palatoglossal air space (PGAS) on the evaluation of the AI tool. Results: DC identified periapical lesions in 240 (77.9%) cases out of the 308 teeth with PL and detected no PL in 68 (22.1%) teeth with PL. The AI-based system detected no PL in any of the groups without PL. The overall sensitivity, specificity, and diagnostic accuracy of DC were 0.78, 1.00, and 0.89, respectively. Considering these parameters for each group, Group 2 showed the highest values at 0.84, 1.00, and 0.95, respectively. Fisher’s Exact test showed that PGAS does not significantly affect (p = 1) the detection of PL in the upper teeth. The AI-based system showed lower probability values for detecting PL in the case of central incisors, wisdom teeth, and canines. The sensitivity and diagnostic accuracy of DC for detecting PL on canines showed lower values at 0.27 and 0.64, respectively. Conclusions: The CNN-based Diagnocat system can support the diagnosis of PL on PRs and serves as a decision-support tool during radiographic assessments. Full article
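The abstract above quotes sensitivity, specificity, and diagnostic accuracy derived from the counts of lesion and lesion-free teeth. As a quick companion, here is a minimal sketch of how those 2x2 confusion-matrix metrics are computed; the function is generic illustration code, not the authors' pipeline, and the counts plugged in simply mirror the 240/68/308/0 split quoted in the abstract.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics used in diagnostic accuracy studies."""
    sensitivity = tp / (tp + fn)                 # recall on teeth with a lesion
    specificity = tn / (tn + fp)                 # recall on lesion-free teeth
    ppv = tp / (tp + fp)                         # positive predictive value
    npv = tn / (tn + fn)                         # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, ppv, npv, accuracy

# Counts mirroring the abstract: 240 of 308 lesion teeth flagged, no false positives.
print(diagnostic_metrics(tp=240, fp=0, tn=308, fn=68))
# -> roughly (0.78, 1.00, 1.00, 0.82, 0.89), matching the reported values
```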
29 pages, 34115 KiB  
Article
Sliding-Window CNN + Channel-Time Attention Transformer Network Trained with Inertial Measurement Units and Surface Electromyography Data for the Prediction of Muscle Activation and Motion Dynamics Leveraging IMU-Only Wearables for Home-Based Shoulder Rehabilitation
by Aoyang Bai, Hongyun Song, Yan Wu, Shurong Dong, Gang Feng and Hao Jin
Sensors 2025, 25(4), 1275; https://doi.org/10.3390/s25041275 - 19 Feb 2025
Abstract
Inertial Measurement Units (IMUs) are widely utilized in shoulder rehabilitation due to their portability and cost-effectiveness, but their reliance on spatial motion data restricts their use in comprehensive musculoskeletal analyses. To overcome this limitation, we propose SWCTNet (Sliding Window CNN + Channel-Time Attention Transformer Network), an advanced neural network specifically tailored for multichannel temporal tasks. SWCTNet integrates IMU and surface electromyography (sEMG) data through sliding window convolution and channel-time attention mechanisms, enabling the efficient extraction of temporal features. This model enables the prediction of muscle activation patterns and kinematics using exclusively IMU data. The experimental results demonstrate that the SWCTNet model achieves recognition accuracies ranging from 87.93% to 91.03% on public temporal datasets and an impressive 98% on self-collected datasets. Additionally, SWCTNet exhibits remarkable precision and stability in generative tasks: the normalized DTW distance was 0.12 for the normal group and 0.25 for the patient group when using the self-collected dataset. This study positions SWCTNet as an advanced tool for extracting musculoskeletal features from IMU data, paving the way for innovative applications in real-time monitoring and personalized rehabilitation at home. This approach demonstrates significant potential for long-term musculoskeletal function monitoring in non-clinical or home settings, advancing the capabilities of IMU-based wearable devices. Full article
(This article belongs to the Special Issue Wearable Devices for Physical Activity and Healthcare Monitoring)
Figures:
Figure 1: Experimental setup for IMU and sEMG data acquisition: (a) placement of the IMU sensor; (b) data acquisition checkpoints for IMU and sEMG sensors; (c) a subset of raw signals captured by the system.
Figure 2: Data process result. The Euler angles obtained from IMU transformation and the processed multi-channel EMG signals are plotted for three preset SJ movements.
Figure 3: Dataset organization structure.
Figure 4: SWCTNet model architecture. The model consists of the SW-CNN Block, CTAT Block, and Downstream Task Block.
Figure 5: Structure of the SW-CNN Block.
Figure 6: Structure of the CTAT Block.
Figure 7: Results of the ablation study conducted on four public datasets.
Figure 8: Radar chart comparing the performance of different models on the feature prediction task, normalized to the range [0, 1]: (a) results for the SWIFTIES dataset; (b) results for the personal dataset.
Figure 9: Visualization of the sequence generation task, showing the actual and predicted EMG feature time-series signals, where (a,b) are the results of the RMS feature, (c,d) are the results of the MPF feature, (a,c) are the results for healthy individuals, and (b,d) are the results for patients.
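For readers who want a concrete picture of the SWCTNet idea, the toy module below combines a strided (sliding-window) 1D convolution over multichannel IMU input with attention across the resulting windows. It is only a rough sketch under assumed sizes: the class name, channel counts, kernel size, and output dimension are invented here, and the published SWCTNet architecture with its channel-time attention blocks is described in the paper itself.

```python
import torch
import torch.nn as nn

class SlidingWindowAttentionSketch(nn.Module):
    """Toy sliding-window CNN followed by attention over window embeddings."""
    def __init__(self, in_channels=6, hidden=64, heads=4, out_dim=2):
        super().__init__()
        # A strided 1D convolution acts as a learnable window sliding over the time axis.
        self.window_conv = nn.Conv1d(in_channels, hidden, kernel_size=32, stride=8)
        # Attention lets the window embeddings exchange information across time.
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)   # e.g., predicted muscle-activation features

    def forward(self, x):                         # x: (batch, channels, time)
        h = self.window_conv(x).transpose(1, 2)   # (batch, windows, hidden)
        h, _ = self.attn(h, h, h)
        return self.head(h.mean(dim=1))           # pool over windows and predict

model = SlidingWindowAttentionSketch()
imu = torch.randn(4, 6, 512)                      # 4 clips, 6 IMU channels, 512 samples
print(model(imu).shape)                           # torch.Size([4, 2])
```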
33 pages, 3144 KiB  
Article
CNN-Based Optimization for Fish Species Classification: Tackling Environmental Variability, Class Imbalance, and Real-Time Constraints
by Amirhosein Mohammadisabet, Raza Hasan, Vishal Dattana, Salman Mahmood and Saqib Hussain
Information 2025, 16(2), 154; https://doi.org/10.3390/info16020154 - 19 Feb 2025
Abstract
Automated fish species classification is essential for marine biodiversity monitoring, fisheries management, and ecological research. However, challenges such as environmental variability, class imbalance, and computational demands hinder the development of robust classification models. This study investigates the effectiveness of convolutional neural network (CNN)-based models and hybrid approaches to address these challenges. Eight CNN architectures, including DenseNet121, MobileNetV2, and Xception, were compared alongside traditional classifiers like support vector machines (SVMs) and random forest. DenseNet121 achieved the highest accuracy (90.2%), leveraging its superior feature extraction and generalization capabilities, while MobileNetV2 balanced accuracy (83.57%) with computational efficiency, processing images in 0.07 s, making it ideal for real-time deployment. Advanced preprocessing techniques, such as data augmentation, turbidity simulation, and transfer learning, were employed to enhance dataset robustness and address class imbalance. Hybrid models combining CNNs with traditional classifiers achieved intermediate accuracy with improved interpretability. Optimization techniques, including pruning and quantization, reduced model size by 73.7%, enabling real-time deployment on resource-constrained devices. Grad-CAM visualizations further enhanced interpretability by identifying key image regions influencing predictions. This study highlights the potential of CNN-based models for scalable, interpretable fish species classification, offering actionable insights for sustainable fisheries management and biodiversity conservation. Full article
(This article belongs to the Special Issue Machine Learning and Data Mining: Innovations in Big Data Analytics)
Figures:
Graphical abstract
Figure 1: Framework for the research methodology in fish species classification.
Figure 2: Example of augmented images.
Figure 3: Confusion matrix for DenseNet121 performance.
Figure 4: Training and validation loss for DenseNet121.
Figure 5: Grad-CAM heatmap analysis for a single class.
Figure 6: Comparative Grad-CAM visualizations across multiple classes.
Figure 7: Turbidity simulation results and model predictions.
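The fish-classification study fine-tunes pretrained backbones such as MobileNetV2 with transfer learning. The snippet below sketches the usual torchvision recipe (freeze the pretrained features, swap in a new classification head); NUM_SPECIES is a placeholder, the weight enum assumes a reasonably recent torchvision, and nothing here reproduces the paper's exact training setup.

```python
import torch.nn as nn
from torchvision import models

NUM_SPECIES = 9   # placeholder; use the actual class count of the fish dataset

# Load an ImageNet-pretrained MobileNetV2 backbone.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)

# Freeze the convolutional feature extractor so only the new head is trained.
for p in model.features.parameters():
    p.requires_grad = False

# Replace the final classifier layer with one sized for the fish dataset.
model.classifier[1] = nn.Linear(model.last_channel, NUM_SPECIES)

# Only the new head's parameters would be handed to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
```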
32 pages, 10807 KiB  
Article
A Comprehensive Evaluation of Monocular Depth Estimation Methods in Low-Altitude Forest Environment
by Jiwen Jia, Junhua Kang, Lin Chen, Xiang Gao, Borui Zhang and Guijun Yang
Remote Sens. 2025, 17(4), 717; https://doi.org/10.3390/rs17040717 - 19 Feb 2025
Abstract
Monocular depth estimation (MDE) is a critical computer vision task that enhances environmental perception in fields such as autonomous driving and robot navigation. In recent years, deep learning-based MDE methods have achieved notable progress in these fields. However, achieving robust monocular depth estimation in low-altitude forest environments remains challenging, particularly in scenes with dense and cluttered foliage, which complicates applications in environmental monitoring, agriculture, and search and rescue operations. This paper presents a comprehensive evaluation of state-of-the-art deep learning-based MDE methods on low-altitude forest datasets. The evaluated models include both self-supervised and supervised approaches, employing different network structures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). We assessed the generalization of these approaches across diverse low-altitude scenarios, specifically focusing on forested environments. A systematic set of evaluation criteria is employed, comprising traditional image-based global statistical metrics as well as geometry-aware metrics, to provide a more comprehensive evaluation of depth estimation performance. The results indicate that most Transformer-based models, such as DepthAnything and Metric3D, outperform traditional CNN-based models in complex forest environments by capturing detailed tree structures and depth discontinuities. Conversely, CNN-based models like MiDas and Adabins struggle with handling depth discontinuities and complex occlusions, yielding less detailed predictions. On the Mid-Air dataset, the Transformer-based DepthAnything demonstrates a 54.2% improvement in RMSE for the global error metric compared to the CNN-based Adabins. On the LOBDM dataset, the CNN-based MiDas has the depth edge completeness error of 93.361, while the Transformer-based Metric3D demonstrates the significantly lower error of only 5.494. These findings highlight the potential of Transformer-based approaches for monocular depth estimation in low-altitude forest environments, with implications for high-throughput plant phenotyping, environmental monitoring, and other forest-specific applications. Full article
(This article belongs to the Special Issue Image Analysis for Forest Environmental Monitoring)
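The evaluation above leans on global statistical metrics such as RMSE. For orientation, here is a small generic sketch of error measures commonly used in monocular depth benchmarks (absolute relative error, RMSE, and the delta < 1.25 accuracy); it illustrates the definitions only, on synthetic depth maps, and is not the authors' evaluation code.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common global depth-estimation errors over valid (positive-depth) pixels."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)        # fraction of pixels within 25% of ground truth
    return {"abs_rel": abs_rel, "rmse": rmse, "delta<1.25": delta1}

# Toy example with random depth maps.
rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 50.0, size=(240, 320))
pred = gt * rng.normal(1.0, 0.1, size=gt.shape)
print(depth_metrics(pred, gt))
```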
22 pages, 3475 KiB  
Article
Uncertainty-Aware Adaptive Multiscale U-Net for Low-Contrast Cardiac Image Segmentation
by A. S. M. Sharifuzzaman Sagar, Muhammad Zubair Islam, Jawad Tanveer and Hyung Seok Kim
Appl. Sci. 2025, 15(4), 2222; https://doi.org/10.3390/app15042222 - 19 Feb 2025
Abstract
Medical image analysis is critical for diagnosing and planning treatments, particularly in addressing heart disease, a leading cause of mortality worldwide. Precise segmentation of the left atrium, a key structure in cardiac imaging, is essential for detecting conditions such as atrial fibrillation, heart failure, and stroke. However, its complex anatomy, subtle boundaries, and inter-patient variations make accurate segmentation challenging for traditional methods. Recent advancements in deep learning, especially semantic segmentation, have shown promise in addressing these limitations by enabling detailed, pixel-wise classification. This study proposes a novel segmentation framework Adaptive Multiscale U-Net (AMU-Net) combining Convolutional Neural Networks (CNNs) and transformer-based encoder–decoder architectures. The framework introduces a Contextual Dynamic Encoder (CDE) for extracting multi-scale features and capturing long-range dependencies. An Adaptive Feature Decoder Block (AFDB), leveraging an Adaptive Feature Attention Block (AFAB) improves boundary delineation. Additionally, a Spectral Synthesis Fusion Head (SFFH) synthesizes spectral and spatial features, enhancing segmentation performance in low-contrast regions. To ensure robustness, data augmentation techniques such as rotation, scaling, and flipping are applied. Laplacian approximation is employed for uncertainty estimation, enabling interpretability and identifying regions of low confidence. Our proposed model achieves a Dice score of 93.35, a Precision of 94.12, and a Recall of 92.78, outperforming existing methods. Full article
Figures:
Figure 1: Overall structure of AMU-Net for medical image analysis.
Figure 2: Overall structure of the CDE encoder block, along with the Modulated Predictive Coding Module (MPCM), used in our model.
Figure 3: The overall structure of the proposed DMSA module used in the encoder block.
Figure 4: Overall structure of the AFDB used in our proposed AMU-Net.
Figure 5: Illustration of the Adaptive Fusion Attention Block.
Figure 6: An illustration of the overall framework of the SFFH.
Figure 7: Acquired loss and Dice score during the training process of AMU-Net.
Figure 8: The visualization results of AMU-Net to evaluate the performance of the model.
Figure 9: The visualization results of FPs and FNs on challenging images.
Figure 10: The visualization results of different models along with FPs and FNs.
Figure 11: Uncertainty estimation of the predicted results using Laplacian approximation.
Figure 12: Calibration error at different data shift intensities for the baseline and Bayesian models. Diamonds represent outliers.
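AMU-Net is scored with Dice, Precision, and Recall on left-atrium masks. The short sketch below shows how those overlap metrics fall out of the true/false positive and negative pixel counts of two binary masks; the toy masks and the epsilon guard are invented for illustration and are not the authors' evaluation code.

```python
import numpy as np

def segmentation_scores(pred, gt, eps=1e-8):
    """Dice, precision, and recall for binary segmentation masks (0/1 arrays)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return dice, precision, recall

# Toy 2D masks: prediction overlaps most but not all of the ground-truth square.
gt = np.zeros((8, 8), dtype=int); gt[2:6, 2:6] = 1
pred = np.zeros_like(gt); pred[3:7, 2:6] = 1
print(segmentation_scores(pred, gt))   # (0.75, 0.75, 0.75)
```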
15 pages, 10730 KiB  
Article
An Efficient Forest Smoke Detection Approach Using Convolutional Neural Networks and Attention Mechanisms
by Quy-Quyen Hoang, Quy-Lam Hoang and Hoon Oh
J. Imaging 2025, 11(2), 67; https://doi.org/10.3390/jimaging11020067 - 19 Feb 2025
Abstract
This study explores a method of detecting smoke plumes effectively as the early sign of a forest fire. Convolutional neural networks (CNNs) have been widely used for forest fire detection; however, they have not been customized or optimized for smoke characteristics. This paper proposes a CNN-based forest smoke detection model featuring novel backbone architecture that can increase detection accuracy and reduce computational load. Since the proposed backbone detects the plume of smoke through different views using kernels of varying sizes, it can better detect smoke plumes of different sizes. By decomposing the traditional square kernel convolution into a depth-wise convolution of the coordinate kernel, it can not only better extract the features of the smoke plume spreading along the vertical dimension but also reduce the computational load. An attention mechanism was applied to allow the model to focus on important information while suppressing less relevant information. The experimental results show that our model outperforms other popular ones by achieving detection accuracy of up to 52.9 average precision (AP) and significantly reduces the number of parameters and giga floating-point operations (GFLOPs) compared to the popular models. Full article
Figures:
Figure 1: The architecture of the forest fire detection model.
Figure 2: Reduction in the number of parameters by using different sized kernels.
Figure 3: The smoke features tend to be vertically distributed through the layers.
Figure 4: The proposed Backbone structure for forest fire detection.
Figure 5: CBAM architecture.
Figure 6: The Neck architecture.
Figure 7: The Head architecture.
Figure 8: Qualitative test results for 15 forest fire images numbered 1 to 15, with the class name and confidence value given at the top of each bounding box.
Figure 9: Heat maps of the images to which different models are applied.
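A key point of the abstract is replacing a square-kernel convolution with depth-wise coordinate (kx1 and 1xk) convolutions to capture vertically spreading smoke with far fewer parameters. The PyTorch fragment below is an illustrative reconstruction of that general decomposition, with made-up channel and kernel sizes, and simply compares parameter counts; it is not the paper's actual backbone.

```python
import torch.nn as nn

C, K = 64, 7   # illustrative channel count and kernel size

# Standard square-kernel convolution.
square = nn.Conv2d(C, C, kernel_size=K, padding=K // 2)

# Depth-wise decomposition into a vertical (Kx1) and horizontal (1xK) pass,
# followed by a 1x1 point-wise convolution to mix channels.
decomposed = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(K, 1), padding=(K // 2, 0), groups=C),
    nn.Conv2d(C, C, kernel_size=(1, K), padding=(0, K // 2), groups=C),
    nn.Conv2d(C, C, kernel_size=1),
)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(square), count(decomposed))   # roughly 200.8k vs. 5.2k parameters
```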
32 pages, 124914 KiB  
Article
CNN–Transformer Hybrid Architecture for Underwater Sonar Image Segmentation
by Juan Lei, Huigang Wang, Zelin Lei, Jiayuan Li and Shaowei Rong
Remote Sens. 2025, 17(4), 707; https://doi.org/10.3390/rs17040707 - 19 Feb 2025
Abstract
The salient object detection (SOD) of forward-looking sonar images plays a crucial role in underwater detection and rescue tasks. However, the existing SOD algorithms find it difficult to effectively extract salient features and spatial structure information from images with scarce semantic information, uneven intensity distribution, and high noise. Convolutional neural networks (CNNs) have strong local feature extraction capabilities, but they are easily constrained by the receptive field and lack the ability to model long-range dependencies. Transformers, with their powerful self-attention mechanism, are capable of modeling the global features of a target, but they tend to lose a significant amount of local detail. Mamba effectively models long-range dependencies in long sequence inputs through a selection mechanism, offering a novel approach to capturing long-range correlations between pixels. However, since the saliency of image pixels does not exhibit sequential dependencies, this somewhat limits Mamba’s ability to fully capture global contextual information during the forward pass. Inspired by multimodal feature fusion learning, we propose a hybrid CNN–Transformer–Mamba architecture, termed FLSSNet. FLSSNet is built upon a CNN and Transformer backbone network, integrating four core submodules to address various technical challenges: (1) The asymmetric dual encoder–decoder (ADED) is capable of simultaneously extracting features from different modalities and systematically modeling both local contextual information and global spatial structure. (2) The Transformer feature converter (TFC) module optimizes the multimodal feature fusion process through feature transformation and channel compression. (3) The long-range correlation attention (LRCA) module enhances CNN’s ability to model long-range dependencies through the collaborative use of convolutional kernels, selective sequential scanning, and attention mechanisms, while effectively suppressing noise interference. (4) The recursive contour refinement (RCR) model refines edge contour information through a layer-by-layer recursive mechanism, achieving greater precision in boundary details. The experimental results show that FLSSNet exhibits outstanding competitiveness among 25 state-of-the-art SOD methods, achieving MAE and Eξ values of 0.04 and 0.973, respectively. Full article
(This article belongs to the Special Issue Ocean Remote Sensing Based on Radar, Sonar and Optical Techniques)
Figures:
Figure 1: Examples of FLS images containing various noise sources. The red region indicates the ground truth of the salient target, the purple region represents areas with intensity inconsistency, the blue region indicates multipath noise, and the yellow region represents shadow noise.
Figure 2: The overall structure of the proposed FLSSNet. The method employs a two-stage strategy: the first stage utilizes an asymmetric dual encoder–decoder structure for saliency feature extraction, while the second stage further refines the feature maps using the recursive refine module. In the first stage, the image is input into two encoders to obtain feature information from different modalities. Simultaneously, the Transformer feature converter module is used to transform and compress information from these modalities. Next, the long-range correlation attention module integrates multi-level features and reduces feature redundancy. Finally, the recursive refine module is employed to further enhance the precision of feature prediction.
Figure 3: The overall structure of the proposed Transformer feature converter (TFC) module. The TFC module mainly consists of the residual channel attention module (RCAM) and the multi-scale dual self-attention mechanism module. MHSA stands for multi-head attention mechanism.
Figure 4: The overall structure of the proposed long-range correlation attention (LRCA) module. The LRCA module primarily consists of a multi-directional convolution module (MDC), an omnidirectional selective scan module (OSSM), and an attention module.
Figure 5: The overall structure of the proposed recursive block (a) and recursive contour extraction (RECM) module (b). N denotes the number of RECMs, and m represents the sequence number of recursive blocks in the hierarchy of RCR.
Figure 6: Different types of samples: (a) bottle; (b) can; (c) tire; (d) chain; (e) hook; (f) standing bottle; (g) drink carton; (h) shampoo bottle; (i) valve; (j) propeller; (k) wall.
Figure 7: Examples of noise in FLS images. The red areas indicate the ground truth. The blue areas represent small targets that are easily lost due to spatial positioning. The yellow areas show shadow noise caused by occlusions. The cyan areas depict scattering noise caused by sound waves encountering suspended particles, bubbles, and other media. The purple areas illustrate pseudo-target noise caused by reflection noise from water waves.
Figure 8: Comparison between FLSSNet and the comparison models on the PR curve (a) and F-measure curve (b).
Figure 9: Visual display of FLSSNet and comparison models. The red box indicates significant differences.
Figure 10: Visualization results of side outputs at different levels of the recursive block.
Figure 11: (a,b) Quantitative comparison of the variant models within the CNN–Transformer hybrid backbone architecture in the PR (precision–recall) and F-measure curves.
Figure 12: Visualization results of variant models in the CNN–Transformer hybrid backbone architecture.
Figure 13: The visualization results of a single module in a pure CNN backbone architecture.
Figure 14: Visualization of ablation experiments using MDC and OSSM in the CNN–Transformer hybrid architecture and the pure CNN architecture.
Figure 15: At different levels of RCR, the visualization results of X^l_(n-1) and X^c_(n-1) in the second layer of RCEM are presented. Here, (s1, s2, ..., s5) represent the hierarchical sequence of RCEM, while L and C, respectively, denote the specific visualization results of X^l_(n-1) and X^c_(n-1).
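FLSSNet is reported with an MAE of 0.04, i.e., the mean absolute error between the predicted saliency map and the binary ground truth. The tiny snippet below illustrates just that metric on synthetic maps (the E-measure reported alongside it is more involved and is omitted); it is generic example code, not the authors' evaluation script.

```python
import numpy as np

def saliency_mae(pred, gt):
    """Mean absolute error between a predicted saliency map and its ground truth.

    Both inputs are expected in [0, 1] with the same spatial size.
    """
    return float(np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64))))

rng = np.random.default_rng(1)
gt = (rng.random((128, 128)) > 0.9).astype(float)          # sparse salient pixels
pred = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)   # noisy prediction
print(round(saliency_mae(pred, gt), 4))
```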
29 pages, 11018 KiB  
Article
Impact on Classification Process Generated by Corrupted Features
by Simona Moldovanu, Dan Munteanu and Carmen Sîrbu
Big Data Cogn. Comput. 2025, 9(2), 45; https://doi.org/10.3390/bdcc9020045 - 18 Feb 2025
Abstract
The topic of this study is the testing of the robustness of machine learning (ML) and neural network (NN) models with a new idea based on corrupted data. Typically, ML and NN classifiers are trained on real feature data; however, a portion of the features may be false, with noise, or incorrect. The undesired content was analyzed in eight experiments with false data, six with feature noise, and six with label noise. These tests were all conducted on the public Breast Cancer Wisconsin Dataset (BCWD). Throughout this, the false and noise data were gradually corrupted in a random way, generating new data and replacing raw features that belonged to the BCWD. Artificial Intelligence (AI) should be properly selected while categorizing different diseases using medical data. The Pearson correlation coefficient (PCC) applied between features monitored their correlation in each experiment, and a correlation matrix between both true and false features was used. Four machine learning (ML) algorithms—Random Forest (RF), XGBClassifier (XGB), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM)—were used, as well as for the analysis of important features (IF) and the binary classification. The study was completed using three deep neural networks—a simple Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Transformer Neural Network (TNN). In the context of a binary classification, the accuracy, F1-score, Area Under the Curve (AUC), and Matthews correlation coefficient (MCC) metrics of the performance of classification in malignant versus benign breast cancer (BC) was computed. The results demonstrated the robustness of some methods and the sensitivity of other machine learning algorithms in the context of corrupted data, computational cost, and hyperparameters optimization. Full article
Figures:
Figure 1: The flowchart of the proposed method.
Figure 2: BCWD in academic publishing platforms.
Figures 3-10: Experiments #1-#8: (a) correlation matrix; (b) loss and accuracy curves for training the DNN; and (c) feature importance.
Figure 11: Graph of ACC, F1-score, and AUC along eight experiments with false data for KNN, XGB, RF, SVM, DNN, CNN, and Transformer.
Figure 12: Graph of ACC, F1-score, and AUC along six experiments with noise on feature data for KNN, XGB, RF, SVM, DNN, CNN, and Transformer.
Figure 13: Graph of ACC, F1-score, and AUC along six experiments with noise on target data for KNN, XGB, RF, SVM, DNN, CNN, and Transformer.
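The robustness tests above gradually corrupt BCWD features and labels with false values and noise. The sketch below shows one simple way such feature noise and label flipping can be injected before training; the corruption fractions, noise scale, and synthetic data are arbitrary placeholders rather than the study's settings.

```python
import numpy as np

rng = np.random.default_rng(42)

def corrupt_features(X, fraction=0.2, noise_scale=0.5):
    """Add Gaussian noise to a randomly chosen fraction of feature columns."""
    X = X.copy()
    n_cols = max(1, int(fraction * X.shape[1]))
    cols = rng.choice(X.shape[1], size=n_cols, replace=False)
    X[:, cols] += rng.normal(0.0, noise_scale * X[:, cols].std(axis=0), X[:, cols].shape)
    return X

def flip_labels(y, fraction=0.1):
    """Flip a random fraction of binary labels (label noise)."""
    y = y.copy()
    idx = rng.choice(len(y), size=int(fraction * len(y)), replace=False)
    y[idx] = 1 - y[idx]
    return y

X = rng.normal(size=(569, 30))          # shaped like the 30-feature BCWD table
y = rng.integers(0, 2, size=569)
X_noisy, y_noisy = corrupt_features(X), flip_labels(y)
```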
18 pages, 5677 KiB  
Article
Computer Vision-Based Concrete Crack Identification Using MobileNetV2 Neural Network and Adaptive Thresholding
by Li Hui, Ahmed Ibrahim and Riyadh Hindi
Infrastructures 2025, 10(2), 42; https://doi.org/10.3390/infrastructures10020042 - 18 Feb 2025
Abstract
Concrete is widely used in different types of buildings and bridges; however, one of the major issues for concrete structures is crack formation and propagation during its service life. These cracks can potentially introduce harmful agents into concrete, resulting in a reduction in the overall lifespan of concrete structures. Traditional methods for crack detection primarily hinge on manual visual inspection, which relies on the experience and expertise of inspectors using tools such as magnifying glasses and microscopes. To address this issue, computer vision is one of the most innovative solutions for concrete cracking evaluation, and its application has been an area of research interest in the past few years. This study focuses on the utilization of the lightweight MobileNetV2 neural network for concrete crack detection. A dataset including 40,000 images was adopted and preprocessed using various thresholding techniques, of which adaptive thresholding was selected for developing the crack evaluation algorithm. While both the convolutional neural network (CNN) and MobileNetV2 indicated comparable accuracy levels in crack detection, the MobileNetV2 model’s significantly smaller size makes it a more efficient selection for crack detection using mobile devices. In addition, an advanced algorithm was developed to detect cracks and evaluate crack widths in high-resolution images. The effectiveness and reliability of both the selected method and the developed algorithm were subsequently assessed through experimental validation. Full article
(This article belongs to the Special Issue Advances in Artificial Intelligence for Infrastructures)
Figures:
Figure 1: Sample dataset of concrete surfaces with crack (a–c) and without crack (d–f).
Figure 2: Issues in the dataset.
Figure 3: Grayscale Processing for the Dataset.
Figure 4: Thresholding using Different Approaches: (a) Global; (b) OTSU; (c) Adaptive; (d) Triangle; (e) Isodata; (f) Gaussian.
Figure 5: Adaptive Thresholding with Different Block Sizes and Constant C.
Figure 6: Example of Normalization Process for Image after Thresholding.
Figure 7: MobileNetV2 Neural Network Architecture.
Figure 8: CNN Neural Network.
Figure 9: Comparison of Training Accuracy.
Figure 10: Comparison of Validation Accuracy.
Figure 11: Framework for Crack Identification.
Figure 12: Sliding Windows Technique for High-Resolution Image.
Figure 13: NMS Process to Eliminate Redundancy.
Figure 14: Crack Radius during the Calculation of Crack Width.
Figure 15: Relationship between Image Width, Camera Sensor, and Working Distance.
Figure 16: Original Image Used for Experimental Validation.
Figure 17: Image after Denoise and Lighting Adjustment.
Figure 18: Crack Tracking during Sliding Windows Process and Mask Image.
Figure 19: Crack Width Measurement using MobileNetV2 and Validation.
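Adaptive thresholding was the preprocessing step selected for the crack pipeline above. The OpenCV fragment below is a minimal sketch of mean-based adaptive thresholding on a grayscale crack image; the file paths, block size, and constant C are placeholders, since the study tunes block size and C separately (Figure 5), and this is not the authors' implementation.

```python
import cv2

# Placeholder path; any grayscale concrete-surface image works for the sketch.
gray = cv2.imread("crack_sample.jpg", cv2.IMREAD_GRAYSCALE)

# Each pixel is compared against the mean of its local blockSize x blockSize
# neighborhood minus a constant C, which copes with uneven lighting far better
# than a single global threshold; dark crack pixels become white foreground.
binary = cv2.adaptiveThreshold(gray, 255,
                               cv2.ADAPTIVE_THRESH_MEAN_C,
                               cv2.THRESH_BINARY_INV,
                               11,   # placeholder block size (must be odd)
                               2)    # placeholder constant C

cv2.imwrite("crack_binary.png", binary)
```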
21 pages, 1850 KiB  
Review
Deep Learning for Automatic Detection of Volcanic and Earthquake-Related InSAR Deformation
by Xu Liu, Yingfeng Zhang, Xinjian Shan, Zhenjie Wang, Wenyu Gong and Guohong Zhang
Remote Sens. 2025, 17(4), 686; https://doi.org/10.3390/rs17040686 - 18 Feb 2025
Abstract
Interferometric synthetic aperture radar (InSAR) technology plays a crucial role in monitoring surface deformation and has become widely used in volcanic and earthquake research. With the rapid advancement of satellite technology, InSAR now generates vast volumes of deformation data. Deep learning has revolutionized data analysis, offering exceptional capabilities for processing large datasets. Leveraging these advancements, automatic detection of volcanic and earthquake deformation from extensive InSAR datasets has emerged as a major research focus. In this paper, we first introduce several representative deep learning architectures commonly used in InSAR data analysis, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), and Transformer networks. Each architecture offers unique advantages for addressing the challenges of InSAR data. We then systematically review recent progress in the automatic detection and identification of volcanic and earthquake deformation signals from InSAR images using deep learning techniques. This review highlights two key aspects: the design of network architectures and the methodologies for constructing datasets. Finally, we discuss the challenges in automatic detection and propose potential solutions. This study aims to provide a comprehensive overview of the current applications of deep learning for extracting InSAR deformation features, with a particular focus on earthquake and volcanic monitoring. Full article
Figures:
Figure 1: InSAR data processing based on deep learning. (a) The primary deep learning architectures utilized in InSAR data processing, including CNNs, RNNs, GANs, and Transformers. (b) DL is applied to various stages of InSAR data processing, including deformation detection, atmospheric correction, phase filtering, and phase unwrapping.
Figure 2: The main architectures of CNNs, RNNs, GANs, and Transformer networks. (a) CNNs primarily consist of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. (b) RNNs consist of input layers, recurrent hidden layers, and an output layer for sequence tasks. (c) GANs consist of a generator and a discriminator, which are trained together in a competitive manner. (d) Transformers consist of an encoder and a decoder, both using self-attention and feed-forward layers.
Figure 3: Different learning processes between traditional machine learning and transfer learning. (a) Traditional machine learning approaches learn each task independently, starting from scratch. (b) Transfer learning utilizes knowledge gained from previous tasks and applies it to a target task.
Figure 4: Data augmentation methods. (a) Geometric transformation-based data augmentation involves techniques like zoom, rotation, mirroring, and flipping to expand the training datasets. (b) Pixel-level transformation-based data augmentation modifies individual pixel values, such as brightness, contrast, and color, to enhance the datasets. (c) Filtering-based data augmentation involves applying filters like blurring, sharpening, and noise to diversify the training datasets. The original InSAR interferogram data were downloaded from the COMET-LiCS Sentinel-1 InSAR portal (https://comet.nerc.ac.uk/comet-lics-portal/ (accessed on 1 December 2024)).
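Figure 4 of this review groups augmentation into geometric, pixel-level, and filtering-based transformations of InSAR patches. The snippet below illustrates only the geometric family (flips and right-angle rotations) with NumPy on a synthetic patch; it is generic example code, not drawn from any reviewed method.

```python
import numpy as np

def geometric_augmentations(patch):
    """Yield simple geometric variants of a 2D interferogram patch."""
    yield np.fliplr(patch)          # horizontal mirror
    yield np.flipud(patch)          # vertical flip
    for k in (1, 2, 3):             # 90/180/270-degree rotations
        yield np.rot90(patch, k)

patch = np.random.rand(128, 128)    # stand-in for a wrapped-phase patch
augmented = list(geometric_augmentations(patch))
print(len(augmented), augmented[0].shape)   # 5 (128, 128)
```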
27 pages, 5597 KiB  
Article
Smart Organization of Imbalanced Traffic Datasets for Long-Term Traffic Forecasting
by Mustafa M. Kara, H. Irem Turkmen and M. Amac Guvensan
Sensors 2025, 25(4), 1225; https://doi.org/10.3390/s25041225 - 18 Feb 2025
Abstract
Predicting traffic speed is an important issue, especially in urban regions. Precise long-term forecasts would enable individuals to conserve time and financial resources while diminishing air pollution. Despite extensive research on this subject, to our knowledge, no publications investigate or tackle the issue of imbalanced datasets in traffic speed prediction. Traffic speed data are often biased toward high numbers because low traffic speeds are infrequent. The temporal aspect of traffic carries two important factors for low-speed value. The daily population movement, captured by the time of day, and the weather data, recorded by month, are both considered in this study. Hour-wise Pattern Organization and Month-wise Pattern Organization techniques were devised, which organize the speed data using these two factors as a metric with a view to providing a superior representation of data characteristics that are in the minority. In addition to these two methods, a Speed-wise Pattern Organization strategy is proposed, which arranges train and test samples by setting boundaries on speed while taking the volatile nature of traffic into consideration. We evaluated these strategies using four popular model types: long short-term memory (LSTM), gated recurrent unit networks (GRUs), bi-directional LSTM, and convolutional neural networks (CNNs). GRU had the best performance, achieving a MAPE (Mean Absolute Percentage Error) of 13.51%, whereas LSTM demonstrated the lowest performance, with a MAPE of 13.74%. We validated their robustness through our studies and observed improvements in model accuracy across all categories. While the average improvement was approximately 4%, our methodologies demonstrated superior performance in low-traffic speed scenarios, augmenting model prediction accuracy by 11.2%. The presented methodologies in this study are applied in the pre-processing steps, allowing their application with various models and additional pre-processing procedures to attain comparable performance improvements. Full article
(This article belongs to the Section Navigation and Positioning)
Figures:
Figure 1: An overview of the general flow of the proposed methodology.
Figure 2: Average speed of 441 segments separated into 5 different groups based on time of day.
Figure 3: Percentage of data belonging to each time group.
Figure 4: An example of how model creation using Hour-wise Pattern Organization works.
Figure 5: The upper graphics present the MAPE scores of the LSTM model with and without Hour-wise Pattern Organization. The lower part displays the MAPE difference between the two models. Green segments correspond to time regions where the applied method performs better, while red segments belong to regions where the base LSTM model is superior.
Figure 6: Distribution of speed values within the dataset.
Figure 7: Percentage of data regarding traffic density.
Figure 8: An example of how model creation using the Speed-wise Pattern Organization method works.
Figure 9: Grouping speed values with the help of Speed-wise Pattern Organization. The upper side of the figure shows the arrangement of the test samples, while the lower side shows the arrangement of the training samples.
Figure 10: Comparison of the base LSTM model with the model applying Speed-wise Pattern Organization. The upper figure presents the MAPE values of the two methods, whereas the lower figure illustrates the MAPE difference between them.
Figure 11: Hourly average speed data of 5 consecutive months belonging to all 441 road segments in Istanbul.
Figure 12: The average monthly MAPE values obtained from a base LSTM model trained without any pattern organization scheme. Each color represents a different season.
Figure 13: An illustration of the model development process with Month-wise Pattern Organization.
Figure 14: Speed histograms of April and August, presenting the speed values of the respective months in bins.
Figure 15: Comparison of month similarities obtained from the proposed methodology. In this heat map, a row depicts the similarity between the month it represents and the twelve previous months shown in the columns. The column designated February indicates the resemblance of the current year's January and February to February of the preceding year, while March of this year has similarities to February of this year. The color gradient progresses from dark red, representing greater distance values, to bright green, denoting lesser distance values.
Figure 16: MAPE ratings obtained from four base models. The base models exhibit the MAPE scores obtained without the application of additional methods; Hour-wise, Speed-wise, and Month-wise Pattern Organization present the outcomes obtained using the methodologies outlined in Section 3-A, Section 3-B, and Section 3-C, respectively; the Multiple Method presents the outcomes obtained using the technique outlined in Section 4-E.
Figure 17: Hourly MAPE difference between models trained with no additional process and models trained with the Hour-wise Pattern Organization method.
Figure 18: Hourly MAPE difference between models trained with no additional process and models trained using the Speed-wise Pattern Organization method.
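Model quality above is compared by MAPE, broken down by hour of day. Here is a short generic illustration of computing an overall and an hour-wise MAPE; the synthetic speeds, hours, and error level are invented for the example and unrelated to the Istanbul data.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (assumes non-zero actual speeds)."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Toy hour-wise breakdown: group prediction errors by the hour each sample was taken.
rng = np.random.default_rng(7)
hours = rng.integers(0, 24, size=1000)
actual = rng.uniform(10, 90, size=1000)            # speeds in km/h
predicted = actual * rng.normal(1.0, 0.12, 1000)   # predictions with ~12% noise

overall = mape(actual, predicted)
per_hour = {h: mape(actual[hours == h], predicted[hours == h]) for h in range(24)}
print(round(overall, 2), round(per_hour[8], 2))    # overall vs. morning-rush MAPE
```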
11 pages, 939 KiB  
Proceeding Paper
CNN-Based Image Segmentation Approach in Brain Tumor Classification: A Review
by Nurul Huda and Ku Ruhana Ku-Mahamud
Eng. Proc. 2025, 84(1), 66; https://doi.org/10.3390/engproc2025084066 - 17 Feb 2025
Abstract
This study explores the application of Convolutional Neural Networks (CNNs) for brain tumor segmentation, leveraging their ability to automatically extract hierarchical features from medical images. CNN architectures like U-Net, V-Net, and ResNet have shown significant promise in brain tumor classification, offering high precision in detecting tumor boundaries and classifying tumor types. Various benchmark datasets, such as BraTS, TCIA, Harvard, and Kaggle, provide annotated MRI images to evaluate these models. Performance metrics including Dice Similarity Coefficient (DSC), Intersection over Union, and accuracy are employed to assess the models’ effectiveness. The results demonstrate that CNN-based models, particularly U-Net, perform exceptionally well, with DSC scores exceeding 0.90 in most cases. However, challenges such as data imbalance, the need for large datasets, and high computational demands persist. Despite these limitations, CNNs, when combined with advanced techniques like transfer learning and data augmentation, offer robust solutions for brain tumor segmentation, showing promise for real-time clinical deployment. Further advancements are necessary to address generalization issues and enhance model efficiency, ensuring broader applicability in clinical settings. Full article
Figures:
Figure 1: Segmentation Techniques from 2013–2024.
Figure 2: Recent advances and future trend of image segmentation for brain tumor classification.
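The review evaluates models with the Dice Similarity Coefficient (DSC) and Intersection over Union side by side; for binary masks the two are interconvertible via IoU = DSC / (2 - DSC). The small worked sketch below checks that identity on toy masks; it is a generic example, not code from any surveyed model.

```python
import numpy as np

def dsc_and_iou(pred, gt):
    """Dice and IoU for binary masks, plus a check of IoU = DSC / (2 - DSC)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dsc = 2 * inter / (pred.sum() + gt.sum())
    iou = inter / union
    assert np.isclose(iou, dsc / (2 - dsc))   # the two metrics carry the same information
    return dsc, iou

gt = np.zeros((64, 64), dtype=int); gt[10:40, 10:40] = 1
pred = np.zeros_like(gt); pred[15:45, 10:40] = 1
print(dsc_and_iou(pred, gt))   # roughly (0.83, 0.71)
```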
32 pages, 4102 KiB  
Article
A Multimodal Pain Sentiment Analysis System Using Ensembled Deep Learning Approaches for IoT-Enabled Healthcare Framework
by Anay Ghosh, Saiyed Umer, Bibhas Chandra Dhara and G. G. Md. Nawaz Ali
Sensors 2025, 25(4), 1223; https://doi.org/10.3390/s25041223 - 17 Feb 2025
Abstract
This study introduces a multimodal sentiment analysis system to assess and recognize human pain sentiments within an Internet of Things (IoT)-enabled healthcare framework. This system integrates facial expressions and speech-audio recordings to evaluate human pain intensity levels. This integration aims to enhance the recognition system’s performance and enable a more accurate assessment of pain intensity. Such a multimodal approach supports improved decision making in real-time patient care, addressing limitations inherent in unimodal systems for measuring pain sentiment. So, the primary contribution of this work lies in developing a multimodal pain sentiment analysis system that integrates the outcomes of image-based and audio-based pain sentiment analysis models. The system implementation contains five key phases. The first phase focuses on detecting the facial region from a video sequence, a crucial step for extracting facial patterns indicative of pain. In the second phase, the system extracts discriminant and divergent features from the facial region using deep learning techniques, utilizing some convolutional neural network (CNN) architectures, which are further refined through transfer learning and fine-tuning of parameters, alongside fusion techniques aimed at optimizing the model’s performance. The third phase performs the speech-audio recording preprocessing; the extraction of significant features is then performed through conventional methods followed by using the deep learning model to generate divergent features to recognize audio-based pain sentiments in the fourth phase. The final phase combines the outcomes from both image-based and audio-based pain sentiment analysis systems, improving the overall performance of the multimodal system. This fusion enables the system to accurately predict pain levels, including ‘high pain’, ‘mild pain’, and ‘no pain’. The performance of the proposed system is tested with the three image-based databases such as a 2D Face Set Database with Pain Expression, the UNBC-McMaster database (based on shoulder pain), and the BioVid database (based on heat pain), along with the VIVAE database for the audio-based dataset. Extensive experiments were performed using these datasets. Finally, the proposed system achieved accuracies of 76.23%, 84.27%, and 38.04% for two, three, and five pain classes, respectively, on the 2D Face Set Database with Pain Expression, UNBC, and BioVid datasets. The VIVAE audio-based system recorded a peak performance of 97.56% and 98.32% accuracy for varying training–testing protocols. These performances were compared with some state-of-the-art methods that show the superiority of the proposed system. By combining the outputs of both deep learning frameworks on image and audio datasets, the proposed multimodal pain sentiment analysis system achieves accuracies of 99.31% for the two-class, 99.54% for the three-class, and 87.41% for the five-class pain problems. Full article
(This article belongs to the Section Physical Sensors)
Figures:
Figure 1: Pictorial representation of the proposed multimodal pain sentiment analysis system (PSAS) for the smart healthcare framework.
Figure 2: Detecting facial regions in input images for the image-based PSAS.
Figure 3: Demonstration of the CNN_A architecture for the image-based PSAS.
Figure 4: Illustration of the CNN_B architecture.
Figure 5: Executed CNN_1 framework.
Figure 6: Examples of some image samples from the UNBC-McMaster database [60].
Figure 7: Examples of some image samples from the 2DFPE database [61].
Figure 8: Samples of some image specimens from the BioVid Heat Pain Database [62].
Figure 9: Demonstration of the Scheme_1 experiments, exploring the effect of batch size vs. epochs on the proposed system's performance.
Figure 10: Demonstration of the Scheme_1 experiments performing multi-resolution image analysis on the performance of the proposed system.
Figure 11: Demonstration of some image samples of the AffectNet dataset [64] with ethnic diversity and variations in age among the subjects to validate the robustness of the proposed methodology.
Figure 12: The performance outcome of the proposed pain SAS using audio features with (a) 50–50% training–testing and (b) 75–25% training–testing sets.
Figure 13: Performance of the proposed pain sentiment analysis system using the performance reported in Table 11 and Figure 12.
Figure 14: Performance of the proposed multimodal pain SAS (MSAS_1) using the 2-class 2DFPE and VIVAE databases.
Figure 15: Performance of the proposed multimodal pain SAS (MSAS_2) using the 3-class UNBC-McMaster and VIVAE databases.
Figure 16: Performance of the proposed multimodal pain SAS (MSAS_3) using the 4-class BioVid and VIVAE databases.
16 pages, 2242 KiB  
Article
Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks
by Wided Bouchelligua, Reham Al-Dayil and Areej Algaith
Appl. Sci. 2025, 15(4), 2114; https://doi.org/10.3390/app15042114 - 17 Feb 2025
Viewed by 248
Abstract
This paper investigates the effectiveness of various data augmentation techniques for enhancing Arabic speech emotion recognition (SER) using convolutional neural networks (CNNs). Utilizing the Saudi Dialect and BAVED datasets, we address the challenges of limited and imbalanced data commonly found in Arabic SER. To improve model performance, we apply augmentation techniques such as noise addition, time shifting, increasing volume, and reducing volume. Additionally, we examine the optimal number of augmentations required to achieve the best results. Our experiments reveal that these augmentations significantly enhance the CNN’s ability to recognize emotions, with certain techniques proving more effective than others. Furthermore, the number of augmentations plays a critical role in model accuracy. On the Saudi Dialect dataset, the best results were achieved with two augmentations (increasing volume and decreasing volume), reaching an accuracy of 96.81%. Similarly, the BAVED dataset showed optimal performance with a combination of three augmentations (noise addition, increasing volume, and reducing volume), achieving an accuracy of 92.60%. These findings indicate that carefully selected augmentation strategies can greatly improve the performance of CNN-based SER systems, particularly in the context of Arabic speech. This research underscores the importance of tailored augmentation techniques for enhancing SER performance and sets a foundation for future advancements in this field. Full article
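The four augmentation techniques evaluated in this abstract (noise addition, time shifting, increasing volume, and reducing volume) can be approximated with simple waveform operations. The sketch below uses NumPy only; the noise level, shift range, and volume factors are illustrative assumptions and not the values reported by the authors.

```python
import numpy as np

def add_noise(y, noise_factor=0.005):
    """Add white Gaussian noise to the waveform (assumed noise level)."""
    return y + noise_factor * np.random.randn(len(y))

def time_shift(y, shift_max=0.2, sr=16000):
    """Circularly shift the waveform by a random offset of up to shift_max seconds."""
    shift = np.random.randint(-int(shift_max * sr), int(shift_max * sr))
    return np.roll(y, shift)

def change_volume(y, factor):
    """Scale amplitude; factor > 1 increases volume, factor < 1 reduces it."""
    return np.clip(y * factor, -1.0, 1.0)

# Example with a synthetic 1-second tone at 16 kHz standing in for a real recording.
sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
augmented = [add_noise(y), time_shift(y, sr=sr), change_volume(y, 1.5), change_volume(y, 0.5)]
```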
(This article belongs to the Special Issue Natural Language Processing: Novel Methods and Applications)
Show Figures

Figure 1: Original Saudi Dialect dataset distribution.
Figure 2: Original BAVED dataset distribution.
Figure 3: The flow of data preparation for the SER.
Figure 4: Examples of the audio files with data augmentation: (a) original audio for an angry emotion (01), (b) noise addition, (c) time shift, (d) increasing volume, and (e) reducing volume.
Figure 5: The block diagram of the MFCC computation (a minimal extraction sketch follows this figure list).
Figure 6: The proposed SER architecture.
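Figure 5 refers to the MFCC computation that produces the input features for the CNN. A minimal extraction sketch with librosa is given below; the sampling rate, the number of coefficients, and the fixed frame count used for padding are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np
import librosa

def extract_mfcc(path, sr=16000, n_mfcc=40, max_frames=200):
    """Load an audio file and return a fixed-size MFCC matrix (n_mfcc x max_frames)."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Pad or truncate along the time axis so every clip yields the same shape.
    if mfcc.shape[1] < max_frames:
        mfcc = np.pad(mfcc, ((0, 0), (0, max_frames - mfcc.shape[1])))
    else:
        mfcc = mfcc[:, :max_frames]
    return mfcc
```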
24 pages, 2289 KiB  
Article
A Non-Invasive Approach for Facial Action Unit Extraction and Its Application in Pain Detection
by Mondher Bouazizi, Kevin Feghoul, Shengze Wang, Yue Yin and Tomoaki Ohtsuki
Bioengineering 2025, 12(2), 195; https://doi.org/10.3390/bioengineering12020195 - 17 Feb 2025
Viewed by 152
Abstract
A significant challenge that hinders advancements in medical research is the sensitive and confidential nature of patient data in available datasets. In particular, sharing patients’ facial images poses considerable privacy risks, especially with the rise of generative artificial intelligence (AI), which could misuse such data if accessed by unauthorized parties. However, facial expressions are a valuable source of information for doctors and researchers, which creates a need for methods to derive them without compromising patient privacy or safety by exposing identifiable facial images. To address this, we present a quick, computationally efficient method for detecting action units (AUs) and their intensities, key indicators of health and emotion, using only 3D facial landmarks. Our proposed framework extracts 3D face landmarks from video recordings and employs a lightweight neural network (NN) to identify AUs and estimate AU intensities based on these landmarks. Our proposed method reaches a 79.25% F1-score in detecting the main AUs and a Root Mean Square Error (RMSE) of 0.66 in AU intensity estimation. This performance shows that researchers can share 3D landmarks, which are far less intrusive, instead of facial images while maintaining high accuracy in AU detection. Moreover, to showcase the usefulness of our AU detection model, we trained state-of-the-art Deep Learning (DL) models to detect pain from the detected AUs and estimated intensities. Our method reaches 91.16% accuracy in pain detection, which is not far behind the 93.14% accuracy obtained when employing a convolutional neural network (CNN) with residual blocks trained on actual images, or the 92.11% accuracy obtained when employing all the ground-truth AUs. Full article
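The lightweight neural network described in this abstract maps 3D face landmarks to AU detections and intensity estimates. The PyTorch sketch below shows one possible layout under stated assumptions: the 478-point landmark mesh, the 12 main AUs, the hidden sizes, and the two output heads (sigmoid presence, linear intensity) are illustrative guesses, not the authors' configuration.

```python
import torch
import torch.nn as nn

class LandmarkAUNet(nn.Module):
    """Two-layer fully connected network: flattened 3D landmarks -> AU outputs."""

    def __init__(self, n_landmarks=478, n_aus=12, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(n_landmarks * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.presence = nn.Linear(hidden, n_aus)   # logits for AU detection
        self.intensity = nn.Linear(hidden, n_aus)  # regression head for AU intensity

    def forward(self, landmarks):                  # landmarks: (batch, n_landmarks, 3)
        h = self.backbone(landmarks.flatten(1))
        return torch.sigmoid(self.presence(h)), self.intensity(h)

# Example forward pass on random landmarks standing in for one video frame.
model = LandmarkAUNet()
probs, intensities = model(torch.randn(1, 478, 3))
```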
(This article belongs to the Section Biosignal Processing)
Show Figures

Figure 1: An example of the human face mesh superimposed on the human face itself. Areas around the eyes, the nose, and the mouth have higher landmark density than the remaining parts of the face.
Figure 2: A flowchart of the proposed framework. The framework is composed of three main components: an anonymizer, an AU detector, and a pain detector.
Figure 3: A block diagram of the proposed framework: upon generating the 3D face landmarks, a 2-layer FCNN with multiple outputs is used to detect the AUs. The sequence of detected AUs is then processed through a Transformer encoder to identify the class (pain); a sketch of this encoder stage follows the figure list.
Figure 4: The structure of the Transformer encoder used in our work.
Figure 5: Example of consecutive frames from the dataset (a few seconds apart) along with their detected face landmarks.
Figure 6: Distribution in percent of the different AUs in our dataset.
Figure 7: Precision, recall, and F1-scores of the detection of the secondary AUs.
Figure 8: Distribution of the intensity level for each action unit in our dataset.
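Figure 3 above notes that the per-frame AU vectors are passed as a sequence to a Transformer encoder for pain classification. The sketch below assembles that stage from the standard PyTorch encoder modules; the model dimension, head count, layer count, mean pooling, and sequence length are assumptions, not the configuration used in the article.

```python
import torch
import torch.nn as nn

class AUSequencePainClassifier(nn.Module):
    """Transformer encoder over a sequence of per-frame AU vectors -> pain / no pain."""

    def __init__(self, n_aus=12, d_model=64, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        self.embed = nn.Linear(n_aus, d_model)  # project AU vectors to the model dimension
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, au_seq):                  # au_seq: (batch, frames, n_aus)
        h = self.encoder(self.embed(au_seq))
        return self.head(h.mean(dim=1))         # mean-pool over time, then classify

# Example: a batch of one clip with 64 frames of 12 AU intensities.
logits = AUSequencePainClassifier()(torch.randn(1, 64, 12))
```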