Search Results (17,206)

Search Parameters:
Keywords = convolutional neural networks

20 pages, 3034 KiB  
Article
HDCTfusion: Hybrid Dual-Branch Network Based on CNN and Transformer for Infrared and Visible Image Fusion
by Wenqing Wang, Lingzhou Li, Yifei Yang, Han Liu and Runyuan Guo
Sensors 2024, 24(23), 7729; https://doi.org/10.3390/s24237729 (registering DOI) - 3 Dec 2024
Abstract
The purpose of infrared and visible image fusion is to combine the advantages of both and generate a fused image that contains target information and has rich detail and contrast. However, existing fusion algorithms often overlook the importance of incorporating both local and global feature extraction, leading to missing key information in the fused image. To address these challenges, this paper proposes a dual-branch fusion network combining a convolutional neural network (CNN) and a Transformer, which enhances feature extraction capability and encourages the fused image to retain more information. First, a local feature extraction module with a CNN at its core is constructed. Specifically, a residual gradient module enhances the network's ability to extract texture information, and skip connections together with coordinate attention relate shallow features to deeper ones. In addition, a global feature extraction module based on the Transformer is constructed; its ability to capture global context allows global features to be fully extracted. The effectiveness of the proposed method is verified on several experimental datasets, where it outperforms most current advanced fusion algorithms. Full article
Show Figures

Figure 1. General framework of the proposed network.
Figure 2. Local feature extraction module.
Figure 3. Coordinate attention module.
Figure 4. Global feature extraction module.
Figure 5. Subjective comparison of three pairs of images on the MSRS dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 6. Objective comparison of eight indicators on ten image pairs from the MSRS dataset.
Figure 7. Subjective comparison of three pairs of images on the TNO dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 8. Objective comparison of eight indicators on ten image pairs from the TNO dataset.
Figure 9. Subjective comparison of three pairs of images on the RoadScene dataset. (a) Infrared, (b) Visible, (c) DeepFuse, (d) DenseFuse, (e) RFN-nest, (f) SeAFusion, (g) SwinFuse, (h) U2, (i) ITFuse, (j) Ours.
Figure 10. Objective comparison of eight indicators on ten image pairs from the RoadScene dataset.
Figure 11. Network framework for the LFEM ablation experiments. (a) The network with only convolutional layers, (b) the network with the residual gradient module, (c) the network with the residual gradient module and CA.
Figure 12. Network framework for the GFEM ablation experiments. (a) The proposed network without the GFEM, (b) the proposed network.
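For readers unfamiliar with the dual-branch pattern the HDCTfusion abstract describes, the following is a minimal PyTorch sketch, not the authors' implementation: a small CNN branch extracts local features, a Transformer encoder branch over patch tokens captures global context, and the two feature maps are concatenated and reconstructed into a fused image. All layer sizes, the patch size, and the module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DualBranchFusion(nn.Module):
    """Toy dual-branch extractor: CNN for local features, Transformer for global context."""
    def __init__(self, channels=32, patch=8):
        super().__init__()
        # Local branch: plain convolutional stack (stand-in for a residual gradient module).
        self.local = nn.Sequential(
            nn.Conv2d(2, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Global branch: patch embedding followed by a small Transformer encoder.
        self.embed = nn.Conv2d(2, channels, kernel_size=patch, stride=patch)
        enc_layer = nn.TransformerEncoderLayer(d_model=channels, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        # Reconstruction: fuse both branches back into a single-channel image.
        self.fuse = nn.Conv2d(2 * channels, 1, 3, padding=1)

    def forward(self, infrared, visible):
        x = torch.cat([infrared, visible], dim=1)          # stack the two modalities
        local_feat = self.local(x)                         # (B, C, H, W)
        tokens = self.embed(x)                             # (B, C, H/p, W/p)
        b, c, h, w = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)            # (B, N, C) token sequence
        glob = self.encoder(seq).transpose(1, 2).reshape(b, c, h, w)
        glob = nn.functional.interpolate(glob, size=local_feat.shape[-2:],
                                         mode="bilinear", align_corners=False)
        return torch.sigmoid(self.fuse(torch.cat([local_feat, glob], dim=1)))

ir = torch.rand(1, 1, 128, 128)
vis = torch.rand(1, 1, 128, 128)
print(DualBranchFusion()(ir, vis).shape)   # torch.Size([1, 1, 128, 128])
```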
15 pages, 23802 KiB  
Article
Vision-Based Prediction of Flashover Using Transformers and Convolutional Long Short-Term Memory Model
by M. Hamed Mozaffari, Yuchuan Li, Niloofar Hooshyaripour and Yoon Ko
Electronics 2024, 13(23), 4776; https://doi.org/10.3390/electronics13234776 - 3 Dec 2024
Abstract
The prediction of fire growth is crucial for effective firefighting and rescue operations. Recent advances in vision-based techniques using RGB vision and infrared (IR) thermal imaging data, coupled with artificial intelligence and deep learning, have shown promise for detecting fire and predicting its behavior. This study introduces Convolutional Long Short-Term Memory (ConvLSTM) network models for predicting room fire growth by analyzing spatiotemporal IR thermal imaging data acquired from full-scale room fire tests. Our findings revealed that SwinLSTM, an enhanced version of ConvLSTM that combines it with Transformers (a deep learning architecture built on multi-head attention) for computer vision, can be used to predict the occurrence of room fire flashover. Notably, Transformer-based ConvLSTM deep learning models such as SwinLSTM demonstrate superior prediction capability, suggesting a new vision-based smart solution for future fire growth prediction tasks. The main focus of this work is a feasibility study on using a purely vision-based deep learning model to analyze video data and anticipate fire growth behavior in room fire incidents. Full article
(This article belongs to the Special Issue Deep Learning for Computer Vision Application)
Show Figures

Figure 1. The architecture of the ConvLSTM network used in this study, key components of the network, and their functions.
Figure 2. (a) The architecture of the SwinLSTM recurrent cell; STB and LP denote Swin Transformer blocks and Linear Projection. (b) The architecture of the SwinLSTM-B model, which contains a single SwinLSTM cell. (c) The architecture of the deeper version of SwinLSTM with multiple SwinLSTM cells, named SwinLSTM-D.
Figure 3. Two randomly selected data samples from the entire dataset. For the sake of illustration, every 5 s is shown. Red and yellow colors are the lower and higher temperature ranges, respectively.
Figure 4. Training and validation of SwinLSTM for room fire IR data.
Figure 5. Results of SwinLSTM applied on the test dataset.
Figure 6. Selected frames from the test dataset for better illustration of prediction details by the ConvLSTM model. The temperature colormap is also provided for better comparison between ground truth images and the predicted images.
Figure 7. Quantitative comparison analysis of the ground truth image and the prediction result image of SwinLSTM applied on the IR video data shown in Figure 5.
Figure 8. Performance of SwinLSTM in prediction of one frame in the future for 20 randomly selected room fire test videos.
Figure A1. Schematic of (A) a simple standard recurrent layer, (B) an unrolled version of a simple recurrent layer, (C) detailed components of a recurrent neuron or cell.
Figure A2. Schematic of the original LSTM architecture with forget gate.
Figure A3. Details of the recurrent structure of a ConvLSTM module.
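As background to the ConvLSTM models discussed above, here is a compact PyTorch sketch of a generic ConvLSTM cell, in which the matrix multiplications of a standard LSTM are replaced by 2D convolutions so the hidden state keeps its spatial layout. This is the textbook formulation only, not the SwinLSTM code evaluated in the paper; channel counts and frame sizes are arbitrary.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: LSTM gates computed with 2D convolutions."""
    def __init__(self, in_ch, hidden_ch, kernel=3):
        super().__init__()
        pad = kernel // 2
        # One convolution produces all four gates (input, forget, output, candidate).
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, kernel, padding=pad)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c_next = f * c + i * g
        h_next = o * torch.tanh(c_next)
        return h_next, c_next

# Roll the cell over a short sequence of thermal frames (batch, time, channel, H, W).
cell = ConvLSTMCell(in_ch=1, hidden_ch=16)
frames = torch.rand(2, 5, 1, 64, 64)
h = torch.zeros(2, 16, 64, 64)
c = torch.zeros(2, 16, 64, 64)
for t in range(frames.size(1)):
    h, c = cell(frames[:, t], (h, c))
print(h.shape)  # torch.Size([2, 16, 64, 64])
```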
11 pages, 5505 KiB  
Proceeding Paper
Combining Deep Learning and Street View Images for Urban Building Color Research
by Wenjing Li, Qian Ma and Zhiyong Lin
Proceedings 2024, 110(1), 7; https://doi.org/10.3390/proceedings2024110007 - 3 Dec 2024
Abstract
The color of a cityscape plays a significant role in its atmosphere; however, traditional city color analysis methods cover a wide range but are not precise enough, requiring field sampling and extensive manual comparison while lacking quantitative analysis of color. With the development of artificial intelligence, deep learning and computer vision technology show great potential in urban environment research. In this paper, we focus on building color and present a deep learning-based framework that combines geospatial big data with AI technology to extract and analyze urban color information. The framework is composed of two phases: deep learning and quantitative analysis. In the deep learning phase, a deep convolutional neural network (DCNN)-based color extraction model is designed to automatically learn building color information from street view images; in the quantitative analysis phase, building color is quantitatively analyzed at the overall and local levels, and a color clustering model is designed to quantitatively display the color relationships, giving a comprehensive picture of the current status of urban building color. The method and results of this paper offer an effective way to combine geospatial big data with GeoAI, supporting the collection and analysis of urban color and providing direction for the construction of urban color information management. Full article
Show Figures

Figure 1. Distribution of buildings in the study area.
Figure 2. The process of street view image acquisition, processing, and training dataset construction.
Figure 3. Framework for research on architectural colors.
Figure 4. (a) Scatter diagram of building hue-lightness distribution in Jiangan district; (b) scatter diagram of building hue-saturation distribution in Jiangan district. The horizontal axis indicates the hue value, the vertical axis indicates the lightness or saturation value, and the size of each scatter point indicates the number of samples with the corresponding value. When the hue is N, the color is an achromatic gray: its hue value is 0 and it has no saturation value.
Figure 5. The color tone analysis diagram of the buildings in the Jiangan district.
Figure 6. The main colors of various buildings and their proportions.
Figure 7. The relative intensity of the dominant colors of various buildings.
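The quantitative analysis phase above relies on clustering extracted building colors. As a hedged illustration of that kind of step (not the paper's clustering model), the sketch below runs k-means over synthetic facade pixels and reports dominant colors and their proportions; the pixel data and cluster count are stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Stand-in for pixels of a segmented building facade (RGB in [0, 255]).
pixels = np.vstack([
    rng.normal([200, 180, 150], 10, size=(500, 3)),   # beige-ish wall
    rng.normal([90, 90, 95], 8, size=(300, 3)),       # gray concrete
    rng.normal([150, 60, 50], 12, size=(200, 3)),     # brick red
]).clip(0, 255)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
counts = np.bincount(km.labels_, minlength=3)
for center, count in zip(km.cluster_centers_, counts):
    print("dominant color (RGB):", center.round(1),
          "proportion:", round(count / len(pixels), 2))
```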
25 pages, 44855 KiB  
Article
Burned Olive Trees Identification with a Deep Learning Approach in Unmanned Aerial Vehicle Images
by Christos Vasilakos and Vassilios S. Verykios
Remote Sens. 2024, 16(23), 4531; https://doi.org/10.3390/rs16234531 - 3 Dec 2024
Viewed by 82
Abstract
Olive tree orchards are suffering from wildfires in many Mediterranean countries. Following a wildfire event, identifying damaged olive trees is crucial for developing effective management and restoration strategies, while rapid damage assessment can support potential compensation for producers. Moreover, the implementation of real-time health monitoring in olive groves allows producers to carry out targeted interventions, reducing production losses and preserving crop health. This research examines the use of deep learning methodologies on true-color images from Unmanned Aerial Vehicles (UAVs) to detect damaged trees, including withering and desiccation of branches and leaf scorching. More specifically, the object detection and image classification computer vision techniques are applied and compared. In the object detection approach, the algorithm aims to localize and identify burned/dry and unburned/healthy olive trees, while in the image classification approach, the classifier categorizes an image showing a tree as burned/dry or unburned/healthy. Training data included true-color UAV images of olive trees damaged by fire, obtained with multiple cameras at multiple flight heights and therefore at various resolutions. For object detection, a Residual Neural Network was used as the backbone of a Single-Shot Detector (SSD). In the image classification application, two approaches were evaluated: in the first, a new shallow network was developed, while in the second, transfer learning from pre-trained networks was applied. According to the results, the object detection approach identified healthy trees with an average accuracy of 74%, while for trees with drying the average accuracy was 69%. However, the optimal network identified olive trees (healthy or unhealthy) that the user did not detect during data collection. In the image classification approach, convolutional neural networks achieved significantly better results, with an F1-score above 0.94, both when training the new network and when applying transfer learning. In conclusion, the use of computer vision techniques on UAV images identified damaged olive trees, and the image classification approach performed significantly better than object detection. Full article
(This article belongs to the Section Remote Sensing in Agriculture and Vegetation)
Show Figures

Figure 1. Burn severity map of the study area.
Figure 2. Aerial image of unburned, partial, and fully burned olive trees.
Figure 3. Masking and labeling unburned (healthy) and burned (dry) trees to be used as training data in the object detection approach.
Figure 4. Flowchart of the methodology.
Figure 5. Architecture of a shallow CNN developed for image classification of burned olive trees.
Figure 6. Schematic workflow of transfer learning and fine-tuning.
Figure 7. Average precision and Log Average Miss Rate of the testing dataset for three anchors in the SSD model.
Figure 8. Average precision and Log Average Miss Rate of the testing dataset for four anchors in the SSD model.
Figure 9. Average precision and Log Average Miss Rate of the testing dataset for five anchors in the SSD model.
Figure 10. Average precision and Log Average Miss Rate of the testing dataset for six anchors in the SSD model.
Figure 11. Average precision and Log Average Miss Rate of the testing dataset for seven anchors in the SSD model.
Figure 12. Ground truth data (left) and object detection (right).
Figure 13. Ground truth data (left) and object detection (right).
Figure 14. Ground truth data (left) and object detection (right).
Figure 15. Ground truth data (left) and object detection (right).
Figure 16. Confusion matrices of the testing dataset for the seven trained models in the image classification approach.
Figure 17. Actual class and predicted class with the corresponding score for a subset of images from the testing dataset.
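The abstract mentions a newly developed shallow network for the burned/unburned image classification task. The sketch below shows what such a shallow CNN could look like in PyTorch; the layer widths, input size, and two-class head are illustrative assumptions, not the architecture in Figure 5.

```python
import torch
import torch.nn as nn

class ShallowBurnClassifier(nn.Module):
    """Small CNN for binary burned/dry vs. unburned/healthy tree image classification."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

model = ShallowBurnClassifier()
print(model(torch.rand(4, 3, 224, 224)).shape)  # torch.Size([4, 2])
```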
27 pages, 21616 KiB  
Article
Integrating Convolutional Attention and Encoder–Decoder Long Short-Term Memory for Enhanced Soil Moisture Prediction
by Jingfeng Han, Jian Hong, Xiao Chen, Jing Wang, Jinlong Zhu, Xiaoning Li, Yuguang Yan and Qingliang Li
Water 2024, 16(23), 3481; https://doi.org/10.3390/w16233481 - 3 Dec 2024
Viewed by 102
Abstract
Soil moisture is recognized as a crucial variable in land–atmosphere interactions. This study introduces the Convolutional Attention Encoder–Decoder Long Short-Term Memory (CAEDLSTM) model to address the uncertainties and limitations inherent in traditional soil moisture prediction methods, especially in capturing complex temporal dynamics across diverse environmental conditions. Unlike existing approaches, this model integrates convolutional layers, an encoder–decoder framework, and multi-head attention mechanisms for the first time in soil moisture prediction. The convolutional layers capture local spatial features, while the encoder–decoder architecture effectively manages temporal dependencies. Additionally, the multi-head attention mechanism enhances the model’s ability to simultaneously focus on multiple key influencing factors, ensuring a comprehensive understanding of complex environmental variables. This synergistic combination significantly improves predictive performance, particularly in challenging climatic conditions. The model was validated using the LandBench1.0 dataset, which includes multiple high-resolution datasets, such as ERA5-land, ERA5 atmospheric variables, and SoilGrids, covering various climatic regions, including high latitudes, temperate zones, and tropical areas. The superior performance of the CAEDLSTM model is evidenced by comparisons with advanced models such as AEDLSTM, CNNLSTM, EDLSTM, and AttLSTM. Relative to the traditional LSTM model, CAEDLSTM achieved an average increase of 5.01% in R2, a 12.89% reduction in RMSE, a 16.67% decrease in bias, and a 4.35% increase in KGE. Moreover, it effectively addresses the limitations of traditional deep learning methods in challenging climates, including tropical Africa, the Tibetan Plateau, and Southeast Asia, resulting in significant enhancements in predictive accuracy within these regions, with R2 values improving by as much as 20%. These results underscore the capabilities of CAEDLSTM in capturing complex soil moisture dynamics, demonstrating its considerable potential for applications in agriculture and water resource monitoring across diverse climates. Full article
(This article belongs to the Special Issue Methods and Tools for Sustainable Agricultural Water Management)
Show Figures

Figure 1. The correlation of the input feature for predicting soil moisture: the volume of the soil water layer (0–7 cm).
Figure 2. The structure of the LSTM model.
Figure 3. The structure of the CAEDLSTM model.
Figure 4. Box plots of the predictive performance of LSTM, CNNLSTM, EDLSTM, AttLSTM, AEDLSTM, and CAEDLSTM models for predicting soil moisture measured by R², KGE, bias, and RMSE (m³/m³).
Figure 5. CDF plot analysis of all models for R² (a,b), KGE (c,d), RMSE (m³/m³) (e,f), and bias (g,h). Images (b,d,f,h) are magnified views of the red boxes in the corresponding left figures (a,c,e,g), respectively.
Figure 6. Global soil moisture predictions for six models (LSTM, CNNLSTM, EDLSTM, AttLSTM, AEDLSTM, and the proposed CAEDLSTM) at a 1-degree spatial resolution and 1-day lead time.
Figure 7. The global distribution map of the CAEDLSTM model highlights improvements across three key metrics, R², RMSE (m³/m³), and bias, compared to five other models.
Figure 8. Time series of soil moisture predictions from the CAEDLSTM, AEDLSTM, EDLSTM, AttLSTM, CNNLSTM, and LSTM models across five global locations marked on the world map (a). Subgraphs (b–f) show predictions for the Andes region, Argentina (b); Texas, USA (c); Nigeria (d); Ukraine (e); and Northeast China (f), representing different climatic zones.
Figure 9. Spectral comparison of CAEDLSTM and LSTM models for soil moisture prediction.
Figure 10. R² score comparison between proposed deep learning models and the Random Forest model for global soil moisture prediction.
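A rough PyTorch sketch of the kind of pipeline the CAEDLSTM abstract outlines: a 1D convolution over the input time series, an LSTM encoder-decoder, and multi-head attention over the encoder outputs, ending in a single soil-moisture value. The dimensions, the one-step decoder, and the attention wiring are simplifying assumptions rather than the published model.

```python
import torch
import torch.nn as nn

class ConvAttnEncDecLSTM(nn.Module):
    """Conv front-end + LSTM encoder-decoder + multi-head attention for sequence regression."""
    def __init__(self, n_features, hidden=64, heads=4):
        super().__init__()
        self.conv = nn.Conv1d(n_features, hidden, kernel_size=3, padding=1)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # predicted soil moisture for the next step

    def forward(self, x):                  # x: (batch, time, features)
        z = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)  # local temporal features
        enc_out, (h, c) = self.encoder(z)
        dec_out, _ = self.decoder(enc_out[:, -1:, :], (h, c))         # one-step decoder
        ctx, _ = self.attn(dec_out, enc_out, enc_out)                 # attend over encoder states
        return self.head(ctx.squeeze(1))

model = ConvAttnEncDecLSTM(n_features=8)
print(model(torch.rand(16, 30, 8)).shape)  # torch.Size([16, 1])
```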
16 pages, 7431 KiB  
Article
Deep Learning-Based Model for Effective Classification of Ziziphus jujuba Using RGB Images
by Yu-Jin Jeon, So Jin Park, Hyein Lee, Ho-Youn Kim and Dae-Hyun Jung
AgriEngineering 2024, 6(4), 4604-4619; https://doi.org/10.3390/agriengineering6040263 (registering DOI) - 3 Dec 2024
Viewed by 91
Abstract
Ensuring the quality of medicinal herbs in the herbal market is crucial. However, the genetic and physical similarities among medicinal materials have led to issues of mixing and counterfeit distribution, posing significant challenges to quality assurance. Recent advancements in deep learning technology, widely applied in the field of computer vision, have demonstrated the potential to classify images quickly and accurately, even those that can only be distinguished by experts. This study aimed to develop a classification model based on deep learning technology to distinguish RGB images of seeds from Ziziphus jujuba Mill. var. spinosa, Ziziphus mauritiana Lam., and Hovenia dulcis Thunb. Using three advanced convolutional neural network (CNN) architectures—ResNet-50, Inception-v3, and DenseNet-121—all models demonstrated a classification performance above 98% on the test set, with classification times as low as 23 ms. These results validate that the models and methods developed in this study can effectively distinguish Z. jujuba seeds from morphologically similar species. Furthermore, the strong performance and speed of these models make them suitable for practical use in quality inspection settings. Full article
Show Figures

Figure 1. Sample images of three species: (a) Ziziphus jujuba Mill., (b) Ziziphus mauritiana Lam., and (c) Hovenia dulcis Thunb.
Figure 2. Examples of data from three species: (a) central cropped, (b) background-removed, and (c) images rotated at six different angles.
Figure 3. Comparative architectural diagrams of (a) ResNet, (b) Inception, and (c) DenseNet models.
Figure 4. Transfer learning framework using pre-trained convolutional neural networks for distinguishing Z. jujuba Mill., Z. mauritiana Lam., and H. dulcis Thunb.
Figure 5. Training accuracy and loss curves for three deep learning models: (a) ResNet-50, (b) Inception-v3, and (c) DenseNet-121.
Figure 6. Confusion matrix for three deep learning models: (a) ResNet-50, (b) Inception-v3, and (c) DenseNet-121.
Figure 7. ROC curve with AUC for different species from three deep learning models.
Figure 8. Three-dimensional t-SNE visualization of training data features and features from three models: (a) training data, (b) ResNet-50, (c) Inception-v3, and (d) DenseNet-121. In the plots, the colors represent different species: orange for Ziziphus jujuba Mill., green for Ziziphus mauritiana Lam., and blue for Hovenia dulcis Thunb.
Figure 9. Future work: development of a portable medicinal materials classifier device.
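As an illustration of the transfer-learning setup sketched in Figure 4, the snippet below loads an ImageNet-pretrained ResNet-50 from torchvision, freezes the backbone, and replaces the final layer with a three-class head (one class per species). The optimizer, learning rate, and dummy data are placeholders; the paper's training details may differ.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load ImageNet weights and freeze the convolutional backbone.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():
    p.requires_grad = False

# Replace the classifier head with a 3-class output (Z. jujuba, Z. mauritiana, H. dulcis).
model.fc = nn.Linear(model.fc.in_features, 3)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One dummy training step on random tensors, standing in for seed RGB images.
images, labels = torch.rand(8, 3, 224, 224), torch.randint(0, 3, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```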
14 pages, 1185 KiB  
Article
Monitoring Substance Use with Fitbit Biosignals: A Case Study on Training Deep Learning Models Using Ecological Momentary Assessments and Passive Sensing
by Shizhe Li, Chunzhi Fan, Ali Kargarandehkordi, Yinan Sun, Christopher Slade, Aditi Jaiswal, Roberto M. Benzo, Kristina T. Phillips and Peter Washington
AI 2024, 5(4), 2725-2738; https://doi.org/10.3390/ai5040131 (registering DOI) - 3 Dec 2024
Viewed by 171
Abstract
Substance use disorders affect 17.3% of Americans. Digital health solutions that use machine learning to detect substance use from wearable biosignal data can eventually pave the way for real-time digital interventions. However, difficulties in addressing severe between-subject data heterogeneity have hampered the adaptation of machine learning approaches for substance use detection, necessitating more robust technological solutions. We tested the utility of personalized machine learning using participant-specific convolutional neural networks (CNNs) enhanced with self-supervised learning (SSL) to detect drug use. In a pilot feasibility study, we collected data from 9 participants using Fitbit Charge 5 devices, supplemented by ecological momentary assessments to collect real-time labels of substance use. We implemented a baseline 1D-CNN model with traditional supervised learning and an experimental SSL-enhanced model to improve individualized feature extraction under limited label conditions. Results: Among the 9 participants, we achieved an average area under the receiver operating characteristic curve score across participants of 0.695 for the supervised CNNs and 0.729 for the SSL models. Strategic selection of an optimal threshold enabled us to optimize either sensitivity or specificity while maintaining reasonable performance for the other metric. Conclusion: These findings suggest that Fitbit data have the potential to enhance substance use monitoring systems. However, the small sample size in this study limits its generalizability to diverse populations, so we call for future research that explores SSL-powered personalization at a larger scale. Full article
(This article belongs to the Section Medical & Healthcare AI)
Show Figures

Figure 1. Study overview. We recruited participants and equipped them with Fitbits collecting various biosensor data, including HR, steps taken, BR, sleep patterns, and SpO2. Concurrently, participants completed EMAs via a custom mobile app, recording each substance use event over the monitoring period. We then analyzed these data using personalized deep learning models to detect substance use based on biosensor data from the Fitbit. To protect patient privacy and to avoid asking participants to self-report illegal activity, we gave participants the option to record fruit code names rather than substance names, and the participants eligible for our analysis chose this option.
Figure 2. Distribution of each feature's utilization. Features are ranked by their selection count using Gini impurity.
Figure 3. An SSL-enhanced transfer learning framework for drug use classification, utilizing selected biometric features from each participant. A CNN pre-trained with SSL, outlined with a dotted line around the 1D convolutional, pooling, and flatten layers, is fine-tuned with new dense layers to predict drug use from biometric features. The dotted line indicates the layers transferred for the task-specific model.
Figure 4. Mean bootstrapped sensitivity and specificity at different decision threshold cutoffs across 9 participants, each denoted by distinct colors.
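A hedged sketch of a small 1D-CNN over fixed windows of wearable features (heart rate, steps, breathing rate, sleep stage, SpO2), loosely in the spirit of the supervised baseline described above. The window length, channel count, and layer sizes are assumptions, and the SSL pre-training stage is omitted.

```python
import torch
import torch.nn as nn

class Biosignal1DCNN(nn.Module):
    """1D-CNN over fixed-length windows of wearable features -> use / no-use probability."""
    def __init__(self, n_channels=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, 1),
        )

    def forward(self, x):              # x: (batch, channels, time)
        return torch.sigmoid(self.net(x))

model = Biosignal1DCNN()
windows = torch.rand(32, 5, 60)        # 32 windows of 5 Fitbit-derived features, 60 time steps
print(model(windows).shape)            # torch.Size([32, 1])
```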
9 pages, 6603 KiB  
Proceeding Paper
Spatially Seamless Downscaling of a SMAP Soil Moisture Product Through a CNN-Based Approach with Integrated Multi-Source Remote Sensing Data
by Yan Jin, Haoyu Fan, Zeshuo Li and Yaojie Liu
Proceedings 2024, 110(1), 8; https://doi.org/10.3390/proceedings2024110008 - 3 Dec 2024
Viewed by 49
Abstract
Surface soil moisture (SSM) is crucial for understanding terrestrial hydrological processes. Despite its widespread use since 2015, the Soil Moisture Active and Passive (SMAP) SSM dataset faces challenges due to its inherent low spatial resolution and data gaps. This study addresses these limitations through a deep learning approach aimed at interpolating missing values and downscaling soil moisture data. The result is a seamless, daily 1 km resolution SSM dataset for China, spanning from 1 January 2016 to 31 December 2022. For the original 9 km daily SMAP products, a convolutional neural network (CNN) with residual connections was employed to achieve the spatially seamless 9 km SSM data, integrating multi-source remote sensing data. Subsequently, auxiliary data including land cover, land surface temperatures, vegetation indices, vegetation temperature drought indices, elevation, and soil texture were integrated into the CNN-based downscaling model to generate the spatially seamless 1 km SSM. Comparative analysis of the spatially seamless 9 km and 1 km SSM datasets with ground observations yielded unbiased root mean square error values of 0.09 cm3/cm3 for both, demonstrating the effectiveness of the downscaling method. This approach provides a promising solution for generating high-resolution, spatially seamless soil moisture data to meet the needs of hydrological, meteorological, and agricultural applications. Full article
Show Figures

Figure 1. Study area and locations of the ground stations. The label in the figure represents the network name, which refers to the specific monitoring networks included in ISMN. Each network consists of multiple sites.
Figure 2. Model structure of the developed TsSMN.
Figure 3. SSM images for 25 June 2018: original SMAP data, spatially seamless 9 km SSM predictions, and downscaled spatially seamless 1 km SSM data.
Figure 4. Scatter plots comparing the ground observations with three different 9 km SSM datasets. The dashed line represents the situation where the predicted value equals the actual value, the solid line is the equation obtained by linear regression of the scatter plot, and the values in the figure represent the vertical distance from each point to the regression line.
Figure 5. Time series comparison of SSM data derived at four ground stations: ground observations (In Situ), original SMAP, TsSMN-based 9 km predictions, SMN-based 9 km predictions, and downscaled 1 km predictions.
Figure 6. Scatter plot comparing ground observations with the downscaled 1 km predictions.
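A simplified PyTorch sketch of a CNN with residual connections that maps stacked coarse soil-moisture and auxiliary predictor grids to a finer grid, in the general spirit of the downscaling model described above. The bilinear pre-upsampling, channel counts, and 9x scale factor are assumptions, not the TsSMN of Figure 2.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))   # residual (skip) connection

class DownscaleCNN(nn.Module):
    """Map coarse SSM + auxiliary grids (LST, NDVI, elevation, ...) to a finer SSM grid."""
    def __init__(self, in_ch=6, ch=32, scale=9):
        super().__init__()
        self.up = nn.Upsample(scale_factor=scale, mode="bilinear", align_corners=False)
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.blocks = nn.Sequential(ResidualBlock(ch), ResidualBlock(ch))
        self.out = nn.Conv2d(ch, 1, 3, padding=1)

    def forward(self, x):                      # x: coarse grids (B, in_ch, H, W)
        x = self.up(x)                         # naive 9 km -> 1 km upsampling first
        return self.out(self.blocks(torch.relu(self.head(x))))

print(DownscaleCNN()(torch.rand(1, 6, 16, 16)).shape)  # torch.Size([1, 1, 144, 144])
```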
24 pages, 2138 KiB  
Article
A Multimodal Machine Learning Model in Pneumonia Patients Hospital Length of Stay Prediction
by Anna Annunziata, Salvatore Cappabianca, Salvatore Capuozzo, Nicola Coppola, Camilla Di Somma, Ludovico Docimo, Giuseppe Fiorentino, Michela Gravina, Lidia Marassi, Stefano Marrone, Domenico Parmeggiani, Giorgio Emanuele Polistina, Alfonso Reginelli, Caterina Sagnelli and Carlo Sansone
Big Data Cogn. Comput. 2024, 8(12), 178; https://doi.org/10.3390/bdcc8120178 - 3 Dec 2024
Viewed by 191
Abstract
Hospital overcrowding, driven by both structural management challenges and widespread medical emergencies, has prompted extensive research into machine learning (ML) solutions for predicting patient length of stay (LOS) to optimize bed allocation. While many existing models simplify the LOS prediction problem to a classification task, predicting broad ranges of hospital days, an exact day-based regression model is often crucial for precise planning. Additionally, available data are typically limited and heterogeneous, often collected from a small patient cohort. To address these challenges, we present a novel multimodal ML framework that combines imaging and clinical data to enhance LOS prediction accuracy. Specifically, our approach uses the following: (i) feature extraction from chest CT scans via a convolutional neural network (CNN), (ii) their integration with clinically relevant tabular data from patient exams, refined through a feature selection system to retain only significant predictors. As a case study, we applied this framework to pneumonia patient data collected during the COVID-19 pandemic at two hospitals in Naples, Italy—one specializing in infectious diseases and the other general-purpose. Under our experimental setup, the proposed system achieved an average prediction error of only three days, demonstrating its potential to improve patient flow management in critical care environments. Full article
Show Figures

Figure 1. (a) Complete dataset distribution on outcome output, where patients with value 0 have been dismissed, with value 1 are deceased without being in intensive care unit, with value 2 are deceased after being in intensive care unit. (b) Complete dataset distribution on LOS output.
Figure 2. Complete dataset distribution on range output.
Figure 3. (a) Vanvitelli dataset distribution on gender feature. (b) Cotugno dataset distribution on gender feature.
Figure 4. (a) Vanvitelli dataset distribution on age feature. (b) Cotugno dataset distribution on age feature.
Figure 5. (a) Vanvitelli dataset distribution on CT machine model feature. (b) Cotugno dataset distribution on CT machine model feature.
Figure 6. Vanvitelli dataset correlation matrix.
Figure 7. Cotugno dataset correlation matrix.
Figure 8. Complete dataset correlation matrix.
Figure 9. The original Vanvitelli and Cotugno datasets consist of patients' lung CT scans and raw clinical data. To extract relevant features, we process 3D CT volumes using a 3D-CNN to obtain three-dimensional tabular features. Additionally, the slice with the highest count of non-zero pixels is identified from each CT volume and processed via a 2D-CNN to extract two-dimensional tabular features. Concurrently, the raw clinical tabular data are pre-processed to generate clinically useful features. The final, consolidated dataset (referred to as the Complete dataset) is formed by concatenating patient data from both Vanvitelli and Cotugno sources across all feature sets. Features are color-coded by dataset origin: green for Vanvitelli, orange for Cotugno, and sky blue for the Complete dataset.
Figure 10. Models training workflow. The last block contains the list of all the state-of-the-art ML models, in particular classification ones for Outcome and Days Range prediction and regression ones for Length of Stay estimation. The prediction task can be Length of Stay estimation, Outcome, or Days Range prediction; the extracted features can be 3D, 2D, or Tabular, according to the extraction process shown in Figure 9; and the dataset source can be Vanvitelli, Cotugno, or Complete.
Figure 11. Architecture of the ResNet50 model with the extra fully connected layer at the end, marked in orange.
Figure 12. Distributions of a subset sample over days range and LOS before and after applying SMOTE on days range.
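As a minimal illustration of the early-fusion idea in Figure 9 (CNN-derived image features concatenated with clinical tabular features, then fed to a regressor that predicts length of stay in days), the sketch below uses random stand-in features and a scikit-learn gradient-boosting regressor; the real pipeline's feature extractors, feature selection, and model choices differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_patients = 200

# Stand-ins: 128 CNN-derived image features per patient plus 12 clinical variables.
img_feats = rng.normal(size=(n_patients, 128))
clin_feats = rng.normal(size=(n_patients, 12))
los_days = rng.integers(1, 40, size=n_patients).astype(float)

X = np.hstack([img_feats, clin_feats])         # simple early fusion by concatenation
X_tr, X_te, y_tr, y_te = train_test_split(X, los_days, test_size=0.25, random_state=0)

reg = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("MAE (days):", mean_absolute_error(y_te, reg.predict(X_te)))
```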
15 pages, 2366 KiB  
Article
Gas Leakage Detection Using Tiny Machine Learning
by Majda El Barkani, Nabil Benamar, Hanae Talei and Miloud Bagaa
Electronics 2024, 13(23), 4768; https://doi.org/10.3390/electronics13234768 - 2 Dec 2024
Viewed by 252
Abstract
Gas leakage detection is a critical concern in both industrial and residential settings, where real-time systems are essential for quickly identifying potential hazards and preventing dangerous incidents. Traditional detection systems often rely on centralized data processing, which can lead to delays and scalability issues. To overcome these limitations, in this study, we present a solution based on tiny machine learning (TinyML) to process data directly on devices. TinyML has the potential to execute machine learning algorithms locally, in real time, and using tiny devices, such as microcontrollers, ensuring faster and more efficient responses to potential dangers. Our approach combines an MLX90640 thermal camera with two optimized convolutional neural networks (CNNs), MobileNetV1 and EfficientNet-B0, deployed on the Arduino Nano 33 BLE Sense. The results show that our system not only provides real-time analytics but does so with high accuracy—88.92% for MobileNetV1 and 91.73% for EfficientNet-B0—while achieving inference times of 1414 milliseconds and using just 124.8 KB of memory. Compared to existing solutions, our edge-based system overcomes common challenges related to latency and scalability, making it a reliable, fast, and efficient option. This work demonstrates the potential for low-cost, scalable gas detection systems that can be deployed widely to enhance safety in various environments. By integrating cutting-edge machine learning models with affordable IoT devices, we aim to make safety more accessible, regardless of financial limitations, and pave the way for further innovation in environmental monitoring solutions. Full article
Show Figures

Figure 1. Experimental setup for data collection (reprinted, with permission, from [20] © 2022 MDPI).
Figure 2. The four dataset categories: (a) Mixture; (b) No Gas; (c) Perfume; (d) Smoke.
Figure 3. Confusion matrix for the MobileNetV1 model.
Figure 4. Confusion matrix for the BO configuration.
23 pages, 3424 KiB  
Article
Automated Detection of Gastrointestinal Diseases Using Resnet50*-Based Explainable Deep Feature Engineering Model with Endoscopy Images
by Veysel Yusuf Cambay, Prabal Datta Barua, Abdul Hafeez Baig, Sengul Dogan, Mehmet Baygin, Turker Tuncer and U. R. Acharya
Sensors 2024, 24(23), 7710; https://doi.org/10.3390/s24237710 (registering DOI) - 2 Dec 2024
Viewed by 200
Abstract
This work aims to develop a novel convolutional neural network (CNN) named ResNet50* to detect various gastrointestinal diseases using a new ResNet50*-based deep feature engineering model with endoscopy images. The novelty of this work is the development of ResNet50*, a new variant of the ResNet model, featuring convolution-based residual blocks and a pooling-based attention mechanism similar to PoolFormer. Using ResNet50*, a gastrointestinal image dataset was trained, and an explainable deep feature engineering (DFE) model was developed. This DFE model comprises four primary stages: (i) feature extraction, (ii) iterative feature selection, (iii) classification using shallow classifiers, and (iv) information fusion. The DFE model is self-organizing, producing 14 different outcomes (8 classifier-specific and 6 voted) and selecting the most effective result as the final decision. During feature extraction, heatmaps are identified using gradient-weighted class activation mapping (Grad-CAM) with features derived from these regions via the final global average pooling layer of the pretrained ResNet50*. Four iterative feature selectors are employed in the feature selection stage to obtain distinct feature vectors. The classifiers k-nearest neighbors (kNN) and support vector machine (SVM) are used to produce specific outcomes. Iterative majority voting is employed in the final stage to obtain voted outcomes using the top result determined by the greedy algorithm based on classification accuracy. The presented ResNet50* was trained on an augmented version of the Kvasir dataset, and its performance was tested using Kvasir, Kvasir version 2, and wireless capsule endoscopy (WCE) curated colon disease image datasets. Our proposed ResNet50* model demonstrated a classification accuracy of more than 92% for all three datasets and a remarkable 99.13% accuracy for the WCE dataset. These findings affirm the superior classification ability of the ResNet50* model and confirm the generalizability of the developed architecture, showing consistent performance across all three distinct datasets. Full article
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)
Show Figures

Figure 1. Sample images used in this work. (a) Dyed lifted polyps. (b) Dyed resection margins. (c) Esophagitis. (d) Normal cecum. (e) Normal pylorus. (f) Normal z-line. (g) Polyps. (h) Ulcerative colitis.
Figure 2. Block designs for ResNet and ResNet*. F: number of filters.
Figure 3. Graphical demonstration of the proposed ResNet50*. F: number of filters, BN: batch normalization, ReLU: rectified linear unit, Avg. Pool: average pooling, Max Pool: maximum pooling, GAP: global average pooling, FC: fully connected.
Figure 4. Graphical overview of the proposed ResNet50*-based DFE model. Here, f: selected feature vector, c: classifier-based outcome, v: voted outcome.
Figure 5. Graph of training and validation accuracies/losses versus number of epochs.
Figure 6. Confusion matrices obtained for the proposed ResNet50*-based DFE model. Cells with a blue background represent the correctly predicted observations (true positives) for each class. Cells with a white background represent zeros, indicating no predictions for those combinations. Cells with a beige background represent the number of falsely predicted observations (misclassifications) between classes.
Figure 7. Sample images and their corresponding heatmaps. The colors in the heatmap typically represent varying levels of intensity or importance: blue tones indicate areas that are not important, yellow tones represent moderately important areas, and red tones highlight regions that are highly important for feature extraction.
Figure 8. Validation accuracies (%) obtained for the ResNet50 and ResNet50* models.
Figure 9. Classification accuracies (%) obtained using various feature selectors and classifiers. The red lines represent the median of the accuracies.
Figure 10. Number of times classifiers and feature selectors were used to obtain the best outcomes.
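A hedged scikit-learn sketch of the shallow-classifier stage of a DFE-style pipeline: deep features (random stand-ins here) are reduced by a feature selector and classified with kNN and SVM, whose predictions are combined by voting. It illustrates the general pattern only; the paper's iterative selectors, Grad-CAM-guided feature extraction, and greedy selection of the best outcome are not reproduced.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import VotingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 2048))        # stand-in for pooled deep features
y = rng.integers(0, 8, size=400)        # 8 endoscopy classes

# Select a subset of features, then vote over two shallow classifiers.
pipe = make_pipeline(
    SelectKBest(f_classif, k=256),
    VotingClassifier([("knn", KNeighborsClassifier(n_neighbors=5)),
                      ("svm", SVC(kernel="rbf", probability=True))], voting="soft"),
)
print(cross_val_score(pipe, X, y, cv=3).mean())
```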
22 pages, 29223 KiB  
Article
Risk Assessment of Bridge Damage Due to Heavy Rainfall Considering Landslide Risk and Driftwood Generation Potential Using Convolutional Neural Networks and Conventional Machine Learning
by Fudong Ren, Koichi Isobe and Miku Ando
Water 2024, 16(23), 3471; https://doi.org/10.3390/w16233471 - 2 Dec 2024
Viewed by 316
Abstract
This study addresses the assessment of bridge damage risks associated with heavy rainfall, focusing on landslide susceptibility and driftwood generation potential. By integrating convolutional neural networks (CNNs) with traditional machine learning methods, the research develops an advanced predictive framework for estimating driftwood accumulation at river bridges—a recognized challenge in disaster management. Concentrating on the Tokachi River basin in Hokkaido, Japan, the research utilizes diverse environmental and geographical data from authoritative sources. The findings demonstrate that the innovative approach not only enhances the accuracy of driftwood volume predictions but also distinguishes the effectiveness of CNNs compared to conventional methods. Crucially, areas prone to landslides are identified as significant contributors to driftwood generation, impacting bridge safety. The study underscores the potential of machine learning models in improving disaster risk assessment, while suggesting further exploration into real-time data integration and model refinement to adapt to changing climate conditions and ensure long-term infrastructure safety. Full article
Show Figures

Figure 1. Bridge damage caused by driftwood, as reported in survey findings from the Civil Engineering Bureau, Road Division, Hokkaido Prefectural Government. (a) River channel blockage due to driftwood and embankment erosion behind the abutment. (b) Subsidence of bridge piers due to riverbed scouring and channel blockage from driftwood.
Figure 2. (a) Location of study area, (b) Distribution of major rivers, (c) Distributions of historical landslides, (d) Distribution of bridges in the study area.
Figure 3. Maps of landslide features in the study area. (a) Elevation (m), (b) Slope (degrees), (c) Aspect, (d) Curvature, (e) Average rainfall (mm), (f) Distance to rivers (m), (g) Distance to roads (m), (h) Distance to faults (m).
Figure 4. Flowchart of landslide risk assessment.
Figure 5. Flowchart for assessing the risk of river-crossing bridges during heavy rain.
Figure 6. Distribution map of (a) vegetation reclassification and (b) landslide susceptibility area with slopes exceeding 30 degrees.
Figure 7. Method for calculating the driftwood capture amount.
Figure 8. Method for calculating the travel distance of sediment from landslide collapses [4,20].
Figure 9. Diagram of CNN model structure.
Figure 10. Diagram of cross-validation.
Figure 11. Information gain ratio of the features used for the landslide risk assessment.
Figure 12. Information gain ratio of the features used for the bridge disaster risk assessment.
Figure 13. Landslide risk prediction AUC for different models.
Figure 14. Landslide risk assessment using CNN and MLP.
Figure 15. Calculation results of driftwood generation potential (DGP) and driftwood capture amount (DCA) based on CNN.
Figure 16. Comparison of AUC for bridge disaster risk prediction in training groups using different models based on various bridge structures. (a) Span = 1, (b) Span ≥ 2.
Figure 17. Prediction results for bridge disaster risk. (a) Bridge disaster prediction results, (b) Distribution map of high disaster risk bridges.
Figure A1. Maps showing parts of landslide features: (a) Land use, (b) Soil, (c) Lithology, and (d) Vegetation.
16 pages, 2225 KiB  
Article
Multimodal MRI Deep Learning for Predicting Central Lymph Node Metastasis in Papillary Thyroid Cancer
by Xiuyu Wang, Heng Zhang, Hang Fan, Xifeng Yang, Jiansong Fan, Puyeh Wu, Yicheng Ni and Shudong Hu
Cancers 2024, 16(23), 4042; https://doi.org/10.3390/cancers16234042 - 2 Dec 2024
Viewed by 183
Abstract
Background: Central lymph node metastasis (CLNM) in papillary thyroid cancer (PTC) significantly influences surgical decision-making strategies. Objectives: This study aims to develop a predictive model for CLNM in PTC patients using magnetic resonance imaging (MRI) and clinicopathological data. Methods: By incorporating deep learning (DL) algorithms, the model seeks to address the challenges in diagnosing CLNM and reduce overtreatment. The results were compared with traditional machine learning (ML) models. In this retrospective study, preoperative MRI data from 105 PTC patients were divided into training and testing sets. A radiologist manually outlined the region of interest (ROI) on MRI images. Three classic ML algorithms (support vector machine [SVM], logistic regression [LR], and random forest [RF]) were employed across different data modalities. Additionally, an AMMCNet utilizing convolutional neural networks (CNNs) was proposed to develop DL models for CLNM. Predictive performance was evaluated using receiver operator characteristic (ROC) curve analysis, and clinical utility was assessed through decision curve analysis (DCA). Results: Lesion diameter was identified as an independent risk factor for CLNM. Among ML models, the RF-(T1WI + T2WI, T1WI + T2WI + Clinical) models achieved the highest area under the curve (AUC) at 0.863. The DL fusion model surpassed all ML fusion models with an AUC of 0.891. Conclusions: A fusion model based on the AMMCNet architecture using MRI images and clinicopathological data was developed, effectively predicting CLNM in PTC patients. Full article
(This article belongs to the Special Issue Recent Advances in Oncology Imaging: 2nd Edition)
Show Figures

Figure 1. Inclusion and exclusion flowchart.
Figure 2. Segmentation of the ROI in axial T1 and T2 images. The red arrows in the MRI images indicate the location of the primary lesion.
Figure 3. Distribution of LASSO coefficients for T1 features, T2 features, and combined T1 + T2 features. (a,b) represent T1 features, (c,d) represent T2 features, and (e,f) represent the combined T1 + T2 features.
Figure 4. Deep learning model workflow.
Figure 5. Histograms of best feature coefficients for T1, T2, and T1 + T2. (a) The best feature coefficients for T1; (b) the best feature coefficients for T2; (c) the best feature coefficients for T1 + T2.
Figure 6. The ROC curves of ML and DL models on the training set and test set. (a,b) The ROC curves of the SVM models, (c,d) the ROC curves of the LR models, (e,f) the ROC curves of the RF models, (g,h) the ROC curves of the DL models, each on the training set and test set, respectively.
Figure 7. ML and DL models' DCA curves on the test set. (a) The DCA curves of the SVM models, (b) the DCA curves of the LR models, (c) the DCA curves of the RF models, and (d) the DCA curves of the DL models.
16 pages, 952 KiB  
Article
CGFTNet:Content-Guided Frequency Domain Transform Network for Face Super-Resolution
by Yeerlan Yekeben, Shuli Cheng and Anyu Du
Information 2024, 15(12), 765; https://doi.org/10.3390/info15120765 (registering DOI) - 2 Dec 2024
Viewed by 228
Abstract
Recent advancements in face super-resolution (FSR) have been propelled by deep learning techniques based on convolutional neural networks (CNNs). However, existing methods still struggle to capture global facial structure information effectively, leading to reduced fidelity in reconstructed images, and they often require additional manual data annotation. To overcome these challenges, we introduce a content-guided frequency domain transform network (CGFTNet) for face super-resolution tasks. The network features a channel-attention-linked encoder-decoder architecture with two key components: the Frequency Domain and Reparameterized Focus Convolution Feature Enhancement module (FDRFEM) and the Content-Guided Channel Attention Fusion (CGCAF) module. FDRFEM enhances feature representation through transform-domain techniques and reparameterized focus convolution (RefConv), capturing detailed facial features and improving image quality. CGCAF dynamically adjusts feature fusion based on image content, enhancing detail restoration. Extensive evaluations across multiple datasets demonstrate that the proposed CGFTNet consistently outperforms other state-of-the-art methods. Full article
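The CGCAF module is described as channel-attention-driven fusion. Its exact design is not given in this listing, so the following is only a minimal sketch of a generic squeeze-and-excitation-style channel attention block in PyTorch; the class name, reduction ratio, and layer choices are illustrative assumptions, not the authors' CA module.

# Minimal sketch of a generic channel attention block (squeeze-and-excitation
# style), assuming PyTorch. Illustrative only; not the CA module used in CGFTNet.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # squeeze: global average pooling
        self.fc = nn.Sequential(                 # excitation: per-channel weights in (0, 1)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.fc(self.pool(x))                # (B, C, 1, 1) attention weights
        return x * w                             # reweight feature channels

feats = torch.randn(1, 64, 32, 32)               # placeholder feature map
print(ChannelAttention(64)(feats).shape)         # torch.Size([1, 64, 32, 32])

A content-guided fusion would additionally condition these weights on the image content before reweighting, and a frequency-domain branch such as FDRFEM could operate on a transform of the feature map (e.g., torch.fft.fft2); both are beyond this sketch.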
Show Figures

Figure 1. Overview of the proposed CGFTNet. FDRFEM and CGCAF are the two core modules proposed in this work.
Figure 2. Architecture of the proposed CGCAF.
Figure 3. Architecture of the proposed channel attention (CA), a component of CGCAF.
Figure 4. Architecture of the proposed FDRFEM.
Figure 5. Visual comparisons of multiple methods for ×8 super-resolution on the CelebA test set.
Figure 6. Visual comparisons of multiple methods for ×8 super-resolution on the Helen test set.
Figure 7. Visual comparisons of multiple methods for ×8 super-resolution on real-world images.
Figure 8. Model complexity study for ×8 SR on the CelebA test set. CGFTNet achieves a better balance between model size, performance, and execution time.
29 pages, 1618 KiB  
Article
Optimization of Deep Neural Networks Using a Micro Genetic Algorithm
by Ricardo Landa, David Tovias-Alanis and Gregorio Toscano
AI 2024, 5(4), 2651-2679; https://doi.org/10.3390/ai5040127 (registering DOI) - 2 Dec 2024
Viewed by 213
Abstract
This work proposes the use of a micro genetic algorithm to optimize the architecture of the fully connected layers in convolutional neural networks, with the aim of reducing model complexity without sacrificing performance. Our approach applies the transfer learning paradigm, enabling training without the need for extensive datasets. A micro genetic algorithm requires fewer computational resources due to its reduced population size, while still preserving much of the search capability of algorithms with larger populations. By exploring different representations and objective functions, including classification accuracy, hidden neuron ratio, and minimum redundancy and maximum relevance for feature selection, eight algorithmic variants were developed, six of which perform both hidden-layer reduction and feature selection. Experimental results indicate that the proposed algorithm effectively reduces the architecture of the fully connected layers of the convolutional neural network. The variant achieving the greatest reduction used only 44% of the convolutional features in the input layer and only 9.7% of the neurons in the hidden layers, without negatively impacting classification accuracy (statistically confirmed) compared to a network based on the full reference architecture and a representative method from the literature. Full article
(This article belongs to the Section AI Systems: Theory and Applications)
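As a concrete illustration of the micro genetic algorithm described above, the toy sketch below evolves a tiny population with elitism and restarts it around the best individual whenever the population converges, decoding each chromosome into two hidden-layer sizes and a learning rate in the spirit of the μGA-1 representation (Figure 4 below). The block widths, fitness function, and GA parameters here are invented stand-ins, not the paper's CNN-based evaluation.

# Toy micro-GA sketch: population of 5, tournament selection, uniform crossover,
# elitist restart on convergence. Chromosome layout assumed as 9 + 9 + 17 bits for
# (L1, L2, learning rate); the fitness function is a stand-in, not the paper's
# CNN accuracy objective.
import random

L_BITS, LR_BITS = 9, 17
CHROM_LEN = 2 * L_BITS + LR_BITS

def decode(bits):
    l1 = int("".join(map(str, bits[:L_BITS])), 2) + 1
    l2 = int("".join(map(str, bits[L_BITS:2 * L_BITS])), 2) + 1
    lr = int("".join(map(str, bits[2 * L_BITS:])), 2) / (2 ** LR_BITS - 1) * 0.1
    return l1, l2, lr

def fitness(bits):
    l1, l2, lr = decode(bits)
    # Stand-in objective: favour small hidden layers and a moderate learning rate.
    return -(l1 + l2) - abs(lr - 0.01) * 1000

def tournament(pop):
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def micro_ga(generations=200, pop_size=5):
    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best]                              # elitism: carry the best forward
        while len(children) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            children.append([random.choice(g) for g in zip(p1, p2)])  # uniform crossover
        pop = children
        best = max(pop, key=fitness)
        if all(ind == best for ind in pop):            # nominal convergence: restart
            pop = [best] + [[random.randint(0, 1) for _ in range(CHROM_LEN)]
                            for _ in range(pop_size - 1)]
    return decode(best), fitness(best)

print(micro_ga())

Micro GAs typically omit mutation and rely on this restart mechanism to reintroduce diversity, which is what keeps the population size (and hence the number of objective evaluations) small.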
Show Figures

Figure 1. General scheme of the proposed method. Top: CNN model trained on a source domain. Center: transfer learning of the pre-trained model parameters and tuning of the model weights of a DNN (FC layers) using the μGA-DNN algorithm. Bottom: schematic illustrating the operation of the proposed method.
Figure 2. Example of an FC layer architecture. The input layer corresponds to the d features automatically extracted by the convolutional layers of the CNN model. There are m hidden layers, where L1, L2, …, Lm denote the number of neurons in each. The output layer provides a response (prediction) zi, with i = 1, …, c, for each of the c classes of the input dataset.
Figure 3. Example of an FC layer architecture. The input layer is related to the features of the dataset, assuming a dimensionality of d = 2048; there are m = 2 hidden layers, each with 512 neurons, and the output layer has c neurons, where c is the number of classes of the problem.
Figure 4. Example of a chromosome using the binary representation μGA-1, composed of three binary blocks representing the first hidden layer (L1), the second hidden layer (L2), and the learning rate (lr).
Figure 5. Example of a chromosome using the binary representation μGA-FS, composed of 2051 binary blocks: the first two blocks of 9 bits each, the third of 17 bits, and the last 2048 blocks of 1 bit each.
Figure 6. Example of a chromosome using the binary representation μGA-MRMR, composed of four binary blocks.
Figure 7. Boxplots of the ACC indicator obtained by each method, with the mean ACC and the p-value of the Wilcoxon rank sum test shown at the top of each plot; best values in bold (a minimal sketch of this test follows the figure list).
Figure 8. Mean FSR obtained by each method, with the p-value of the Wilcoxon rank sum test comparing each method against F^R (the best variant of the proposed method for this indicator); p < 0.05 in bold.
Figure 9. Mean HNR obtained by each method, with the p-value of the Wilcoxon rank sum test comparing each method against F^R (the best variant of the proposed method for this indicator); p < 0.05 in bold.
Figure 10. Average MC obtained by each method; best values in bold.
Figure 11. Average number of objective-function evaluations for each method; best values in bold.
Figure 12. Average runtime (in seconds) of each method.
Figure 13. Comparison of classification accuracy for the proposed models and the complete reference model.
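Figures 7–9 above report p-values from a Wilcoxon rank sum test comparing each method against the F^R variant. A minimal sketch of that kind of comparison with SciPy follows; the two accuracy samples are invented placeholders, not the paper's results.

# Wilcoxon rank sum test between per-run accuracies of two methods
# (illustrative placeholder data, not the paper's results).
from scipy.stats import ranksums

acc_variant_fr = [0.91, 0.93, 0.92, 0.90, 0.94, 0.92, 0.91, 0.93, 0.92, 0.90]
acc_reference  = [0.92, 0.91, 0.93, 0.92, 0.90, 0.93, 0.92, 0.91, 0.92, 0.93]

stat, p_value = ranksums(acc_variant_fr, acc_reference)
print(f"Wilcoxon rank sum p-value: {p_value:.3f}")   # p < 0.05 would indicate a significant difference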