Mach. Learn. Knowl. Extr., Volume 6, Issue 3 (September 2024) – 40 articles

Cover Story: This study explores the impact of climate change on soil health by focusing on the temperature sensitivity of soil microbial respiration (Q10). Leveraging Explainable Artificial Intelligence (XAI), the research uncovers the key chemical, physical, and microbiological soil factors that influence Q10 values. Our findings reveal the pivotal role of the soil microbiome in driving soil respiration responses to warming. By identifying these critical variables, the study provides essential insights into soil carbon dynamics, informing the development of innovative strategies for climate change mitigation and sustainable soil management.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the table of contents of newly released issues.
  • PDF is the official format for papers, which are published in both HTML and PDF forms. To view a paper in PDF format, click on the "PDF Full-text" link and open it with the free Adobe Reader.
19 pages, 3565 KiB  
Article
A Multi-Objective Framework for Balancing Fairness and Accuracy in Debiasing Machine Learning Models
by Rashmi Nagpal, Ariba Khan, Mihir Borkar and Amar Gupta
Mach. Learn. Knowl. Extr. 2024, 6(3), 2130-2148; https://doi.org/10.3390/make6030105 - 20 Sep 2024
Viewed by 2219
Abstract
Machine learning algorithms significantly impact decision-making in high-stakes domains, necessitating a balance between fairness and accuracy. This study introduces an in-processing, multi-objective framework that leverages the Reject Option Classification (ROC) algorithm to simultaneously optimize fairness and accuracy while safeguarding protected attributes such as age and gender. Our approach seeks a multi-objective optimization solution that balances accuracy, group fairness loss, and individual fairness loss. The framework integrates fairness objectives without relying on a weighted summation method, instead focusing on directly optimizing the trade-offs. Empirical evaluations on publicly available datasets, including German Credit, Adult Income, and COMPAS, reveal several significant findings: the ROC-based approach demonstrates superior performance, achieving an accuracy of 94.29%, an individual fairness loss of 0.04, and a group fairness loss of 0.06 on the German Credit dataset. These results underscore the effectiveness of our framework, particularly the ROC component, in enhancing both the fairness and performance of machine learning models.
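The three objectives being balanced can be made concrete with toy metrics. A minimal sketch, assuming NumPy arrays of binary predictions and a binary protected attribute; the statistical-parity difference and neighbor-consistency losses below are illustrative stand-ins, not the paper's exact formulations:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def group_fairness_loss(y_pred, protected):
    """Statistical parity difference between two protected groups (illustrative)."""
    return abs(y_pred[protected == 0].mean() - y_pred[protected == 1].mean())

def individual_fairness_loss(X, y_pred, n_neighbors=5):
    """Fraction of nearest neighbors whose prediction disagrees (illustrative)."""
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)                      # idx[:, 0] is the point itself
    return (y_pred[idx[:, 1:]] != y_pred[:, None]).mean()

# Each candidate model is scored on (accuracy, group loss, individual loss),
# and the search keeps the non-dominated (Pareto-optimal) trade-offs rather
# than collapsing them into a single weighted sum.
```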
Figures:
Figure 1: Baseline model—German Credit dataset.
Figure 2: Baseline model—COMPAS dataset.
Figure 3: Baseline model—Adult Income dataset.
Figure 4: Proposed multi-objective optimization approach.
Figure 5: Running time analysis across various datasets.
19 pages, 3484 KiB  
Article
Efficient Visual-Aware Fashion Recommendation Using Compressed Node Features and Graph-Based Learning
by Umar Subhan Malhi, Junfeng Zhou, Abdur Rasool and Shahbaz Siddeeq
Mach. Learn. Knowl. Extr. 2024, 6(3), 2111-2129; https://doi.org/10.3390/make6030104 - 15 Sep 2024
Cited by 1 | Viewed by 1221
Abstract
In fashion e-commerce, predicting item compatibility using visual features remains a significant challenge. Current recommendation systems often struggle to incorporate high-dimensional visual data into graph-based learning models effectively. This limitation presents a substantial opportunity to enhance the precision and effectiveness of fashion recommendations. In this paper, we present the Visual-aware Graph Convolutional Network (VAGCN), a novel framework that improves how visual features are incorporated into graph-based learning systems for fashion item compatibility prediction. The VAGCN framework employs a deep-stacked autoencoder to convert the input image's high-dimensional raw CNN visual features into more manageable low-dimensional representations. Beyond improving feature representation, this compression is what enables the GCN to reason effectively about predictions. The GCN encoder processes nodes in the graph to capture structural and feature correlations. Following the GCN encoder, the refined embeddings are input to a multi-layer perceptron (MLP) to calculate compatibility scores. A key characteristic of our model is that neighborhood information is used only during the testing phase, which aids training efficiency and generalizability in practical scenarios. By leveraging its ability to capture latent visual features and neighborhood-based learning, VAGCN thoroughly investigates item compatibility across various categories. This method significantly improves predictive accuracy, consistently outperforming existing benchmarks. These contributions tackle significant scalability and computational efficiency challenges, showcasing how enhanced feature representation can transform recommendation systems and paving the way for further innovations in the fashion domain.
(This article belongs to the Special Issue Machine Learning in Data Science)
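A compact sketch of the stack described above, assuming PyTorch, a dense row-normalized adjacency matrix, and the layer sizes quoted in the figures (4096-dimensional CNN features, a 256-dimensional latent space, graph convolutions down to 64). The module is illustrative, not the authors' released code:

```python
import torch
import torch.nn as nn

class VAGCNSketch(nn.Module):
    def __init__(self, feat_dim=4096, latent_dim=256, gcn_dim=64):
        super().__init__()
        # (b) stacked autoencoder: compress raw CNN features to a latent space
        self.encoder = nn.Sequential(
            nn.Linear(feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, latent_dim), nn.ReLU(),
        )
        # (d) two graph convolutions: 256 -> 128 -> 64 features with ReLU
        self.gcn1 = nn.Linear(latent_dim, 128)
        self.gcn2 = nn.Linear(128, gcn_dim)
        # edge scorer: MLP over concatenated node-pair embeddings
        self.scorer = nn.Sequential(
            nn.Linear(2 * gcn_dim, 32), nn.ReLU(), nn.Linear(32, 1)
        )

    def forward(self, x, adj_norm, edge_pairs):
        z = self.encoder(x)                      # latent node features
        h = torch.relu(self.gcn1(adj_norm @ z))  # neighborhood aggregation
        h = torch.relu(self.gcn2(adj_norm @ h))
        pair = torch.cat([h[edge_pairs[:, 0]], h[edge_pairs[:, 1]]], dim=1)
        return torch.sigmoid(self.scorer(pair)).squeeze(-1)  # scores in [0, 1]
```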
Figures:
Figure 1: Model overview for predicting fashion-item compatibility scores: (a) CNN-F feature extraction produces a 4096-dimensional vector x per image; (b) a deep-stacked autoencoder maps these features to a latent space y; (c) a relational graph merges item interactions ('also viewed', 'also bought', 'bought together') with node latent features; (d) a tailored GCN encoder refines the graph and an edge-prediction layer computes compatibility scores.
Figure 2: GCN workflow: graph convolution layers reduce the 256 input node features to 64 with ReLU activations; an MLP decoder then computes compatibility scores from the node embeddings.
Figure 3: ROC curves for Women's and Men's category interactions across different k values, with AUC metrics.
Figure 4: Training and validation loss trends for the VAGCN.
Figure 5: Test accuracy versus neighborhood size (k) for Men's and Women's categories; accuracy stabilizes beyond k = 20.
Figure 6: Training loss and accuracy of decreasing layer sizes (256, 128, 64) versus a uniform GCN (256 × 3).
Figure 7: Learning-rate comparison; 0.01 balances rapid convergence and robust generalization.
Figure 8: Link predictions on a women's fashion test set; compatibility scores are normalized to [0, 1].
15 pages, 5455 KiB  
Article
Show Me Once: A Transformer-Based Approach for an Assisted-Driving System
by Federico Pacini, Pierpaolo Dini and Luca Fanucci
Mach. Learn. Knowl. Extr. 2024, 6(3), 2096-2110; https://doi.org/10.3390/make6030103 - 13 Sep 2024
Viewed by 970
Abstract
Operating a powered wheelchair involves significant risks and requires considerable cognitive effort to maintain effective awareness of the surrounding environment. People with significant disabilities are therefore at a higher risk, leading to a decrease in their social interactions, which can impact their overall health and well-being. Thus, we propose an intelligent driving-assistance system that innovatively uses Transformers, typically employed in Natural Language Processing, for navigation, together with a retrieval mechanism that allows users to specify their destinations in natural language. The system records the areas visited and enables users to pinpoint these locations through descriptions, which are considered later in the retrieval phase. Starting from a foundational model, the system is fine-tuned with simulated data. The preliminary results demonstrate the system's effectiveness compared to non-assisted solutions and its readiness for deployment on edge devices.
(This article belongs to the Special Issue Advances in Machine and Deep Learning)
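The "describe a place once, return to it later" retrieval step can be pictured as embedding-based nearest-neighbor search over stored descriptions. A minimal sketch assuming the sentence-transformers library (the paper does not name its text encoder, and the memory entries here are hypothetical):

```python
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, not the paper's

# Hypothetical memory of visited places: (user description, map coordinate)
visited = [
    ("the kitchen next to the big window", (2.0, 5.5)),
    ("charging dock by the front door", (0.0, 0.0)),
]
desc_emb = model.encode([d for d, _ in visited], normalize_embeddings=True)

def resolve_destination(query: str):
    """Return the stored location whose description best matches the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    best = int(np.argmax(desc_emb @ q))  # cosine similarity via dot product
    return visited[best][1]

print(resolve_destination("take me to the kitchen"))  # -> (2.0, 5.5)
```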
Figures:
Figure 1: Wheelchair's geometrical representation.
Figure 2: Schematic representation of the system overview.
Figure 3: Show Me Once System structure with submodule details.
Figure 4: Environments used for SMOS fine-tuning.
Figure 5: Points A and B mark the starting position and the location where the target image was captured; the system must navigate from A to B, with tests ordered by increasing difficulty.
Figure 6: Performance against the topological graph experiment: the Y-axis shows performance relative to a "rich" topological graph, the X-axis the time spent building it; the orange line marks the reference at t = Ẽ and the green line t = E*, where the system reaches 80% of the "rich"-graph performance.
Figure 7: Points A and B(x) mark the start and target-capture locations; the objective is collision avoidance, with moving objects following the arrows and multiple goals in the ramp environment.
Figure 8: (a) Moving-obstacles environment, where collisions diverge as speed exceeds roughly 4 m/s; (b) trajectories in the ramp environment, where the red trajectory ends in an overturn and the orange one uses banned areas around curbs.
Figure A1: Sample of images collected during training.
Figure A2: Trajectory samples comparing the fine-tuned system (W_FT) with the vanilla one (WO_FT).
22 pages, 1814 KiB  
Article
A Data Science and Sports Analytics Approach to Decode Clutch Dynamics in the Last Minutes of NBA Games
by Vangelis Sarlis, Dimitrios Gerakas and Christos Tjortjis
Mach. Learn. Knowl. Extr. 2024, 6(3), 2074-2095; https://doi.org/10.3390/make6030102 - 13 Sep 2024
Viewed by 2640
Abstract
This research investigates clutch performance in the National Basketball Association (NBA) with a focus on the final minutes of contested games. By employing advanced data science techniques, we aim to identify key factors that enhance winning probabilities during these critical moments. The study introduces the Estimation of Clutch Competency (EoCC) metric, a novel formula designed to evaluate players' impact under pressure. Examining player performance statistics over twenty seasons, this research addresses a significant gap in the literature regarding the quantification of clutch moments and challenges conventional wisdom in basketball analytics. Our findings offer valuable insights into player efficiency during the final minutes and its impact on the probability of a positive outcome. The EoCC metric's validation through comparison with the NBA Clutch Player of the Year voting results demonstrates its effectiveness in identifying top performers in high-pressure situations. Leveraging state-of-the-art data science techniques and algorithms, this study analyzes play data to uncover key factors contributing to a team's success in pivotal moments. This research not only enhances the theoretical understanding of clutch dynamics but also provides practical insights for coaches, analysts, and the broader sports community. It contributes to more informed decision making in high-stakes basketball environments, advancing the field of sports analytics.
Figures:
Figure 1: Data preprocessing steps for NBA clutch performance analysis.
Figure 2: Sample of the final aggregated dataset.
25 pages, 8181 KiB  
Article
A Novel Integration of Data-Driven Rule Generation and Computational Argumentation for Enhanced Explainable AI
by Lucas Rizzo, Damiano Verda, Serena Berretta and Luca Longo
Mach. Learn. Knowl. Extr. 2024, 6(3), 2049-2073; https://doi.org/10.3390/make6030101 - 12 Sep 2024
Cited by 1 | Viewed by 998
Abstract
Explainable Artificial Intelligence (XAI) is a research area that clarifies AI decision-making processes to build user trust and promote responsible AI. Hence, a key scientific challenge in XAI is the development of methods that generate transparent and interpretable explanations while maintaining scalability and effectiveness in complex scenarios. Rule-based methods in XAI generate rules that can potentially explain AI inferences, yet they can also become convoluted in large scenarios, hindering their readability and scalability. Moreover, they often lack contrastive explanations, leaving users uncertain why specific predictions are preferred. To address this scientific problem, we explore the integration of computational argumentation—a sub-field of AI that models reasoning processes through defeasibility—into rule-based XAI systems. Computational argumentation enables arguments modelled from rules to be retracted based on new evidence. This makes it a promising approach to enhancing rule-based methods for creating more explainable AI systems. Nonetheless, research on their integration remains limited despite the appealing properties of rule-based systems and computational argumentation. Therefore, this study also addresses the applied challenge of implementing such an integration within practical AI tools. The study employs the Logic Learning Machine (LLM), a specific rule-extraction technique, and presents a modular design that integrates input rules into a structured argumentation framework using state-of-the-art computational argumentation methods. Experiments conducted on binary classification problems using various datasets from the UCI Machine Learning Repository demonstrate the effectiveness of this integration. The LLM technique excelled in producing a manageable number of if-then rules with a small number of premises while maintaining high inferential capacity for all datasets. In turn, argument-based models achieved comparable results to those derived directly from if-then rules, leveraging a concise set of rules and excelling in explainability. In summary, this paper introduces a novel approach for efficiently and automatically generating arguments and their interactions from data, addressing both scientific and applied challenges in advancing the application and deployment of argumentation systems in XAI.
(This article belongs to the Section Data)
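The argumentation half of the pipeline can be made concrete. A minimal sketch assuming one argument per extracted if-then rule and mutual attacks between arguments predicting different classes, as in the paper's multipartite graphs; the routine below is the standard grounded-extension fixed point, not the authors' implementation:

```python
def grounded_extension(arguments, attacks):
    """Standard fixed-point computation of the grounded extension.

    arguments: set of argument labels; attacks: set of (attacker, target) pairs.
    """
    accepted, rejected = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments - accepted - rejected:
            attackers = {x for (x, t) in attacks if t == a}
            if attackers <= rejected:          # all attackers defeated -> accept
                accepted.add(a); changed = True
            elif attackers & accepted:         # attacked by an accepted argument
                rejected.add(a); changed = True
    return accepted

# Two rules predicting class 1 (a, b), one predicting class 0 (c);
# arguments across classes attack each other.
args = {"a", "b", "c"}
atk = {("a", "c"), ("b", "c"), ("c", "a"), ("c", "b")}
print(grounded_extension(args, atk))  # -> set(): mutual attacks leave all undecided
```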
Figures:
Figure 1: Integration of a data-driven rule generator (Logic Learning Machine) with a rule aggregator based on non-monotonic logic (structured argumentation).
Figure 2: A multipartite argumentation graph: each node is an if-then argument following Equation (1); arguments a–c share one output class, d–f another, and every argument attacks all arguments in the other partite.
Figure 3: Elicitation of arguments and their dialectical status for four example graphs: attacks are removed to respect the inconsistency budget, and the resulting grounded extensions, preferred extensions, and categoriser rankings are compared.
Figure 4: Comparative experiment design: (a) selection and preprocessing of four binary-classification datasets; (b) automatic formation of if-then rules with the Logic Learning Machine (LLM); (c) final inferences via the Standard Applied Procedure and computational argumentation; (d) comparison using standard binary classification metrics and the percentage of undecided cases (NAs: when a model cannot lead to a final inference).
Figures 5–8: Results for the CARS, CENSUS, BANK, and MYOCARDIAL datasets, grouped by error threshold per rule (10% and 25%) and divided by inconsistency budget variation (25%, 50%, 90%, 100%).
Figures A1–A2: Argumentation graphs generated from the if-then rules extracted for the CENSUS dataset (10% error threshold): (a) all arguments and attacks with no input data; (b,c) accepted (green) and rejected (red) arguments for sample inputs under the preferred semantics.
Figure A3: The open-source ArgFrame framework [38] instantiated with argumentation graphs for the CENSUS dataset; nodes can be hovered to inspect their internal structure, and imported data enables case-by-case inference visualization.
16 pages, 2094 KiB  
Article
Graph Convolutional Networks for Predicting Cancer Outcomes and Stage: A Focus on cGAS-STING Pathway Activation
by Mateo Sokač, Borna Skračić, Danijel Kučak and Leo Mršić
Mach. Learn. Knowl. Extr. 2024, 6(3), 2033-2048; https://doi.org/10.3390/make6030100 - 11 Sep 2024
Viewed by 1900
Abstract
The study presented in this paper evaluated gene expression profiles from The Cancer Genome Atlas (TCGA). To reduce complexity, we focused on genes in the cGAS–STING pathway, crucial for cytosolic DNA detection and immune response. The study analyzes three clinical variables: disease-specific survival (DSS), overall survival (OS), and tumor stage. To effectively utilize the high-dimensional gene expression data, we needed to find a way to project these data meaningfully. Since gene pathways can be represented as graphs, a novel method of presenting genomics data using a graph data structure was employed, rather than the conventional tabular format. To leverage the gene expression data represented as graphs, we utilized a graph convolutional network (GCN) machine learning model in conjunction with the genetic algorithm optimization technique. This allowed us to obtain an optimal graph representation topology and capture important activations within the pathway for each use case, enabling a more insightful analysis of the cGAS–STING pathway and its activations across different cancer types and clinical variables. To tackle the problem of unexplainable AI, graph visualization alongside the integrated gradients method was employed to explain the GCN model's decision-making process, identifying key nodes (genes) in the cGAS–STING pathway. This approach revealed distinct molecular mechanisms, enhancing interpretability. This study demonstrates the potential of GCNs combined with explainable AI to analyze gene expression, providing insights into cancer progression. Further research with more data is needed to validate these findings.
(This article belongs to the Section Network)
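The attribution step can be sketched generically. A minimal integrated-gradients routine, assuming a PyTorch model whose forward pass takes node features and a fixed adjacency and returns one vector of class scores per graph; this is the textbook Riemann-sum form, not the study's exact code:

```python
import torch

def integrated_gradients(model, x, adj, target_class, steps=50):
    """Approximate integrated gradients from a zero baseline to node features x."""
    baseline = torch.zeros_like(x)
    total = torch.zeros_like(x)
    for i in range(1, steps + 1):
        # interpolate between baseline and the actual expression profile
        xi = (baseline + (i / steps) * (x - baseline)).requires_grad_(True)
        score = model(xi, adj)[target_class]      # assumed: per-graph class scores
        total += torch.autograd.grad(score, xi)[0]
    return (x - baseline) * total / steps         # per-gene attribution scores
```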
Figures:
Figure 1: Study overview: (A) projecting gene expression data across cancer cohorts as graphs; (B) heuristic search using the genetic algorithm for each use case; (C) training the models and visualizing inference by highlighting the most important gene activations in the pathway.
Figure 2: Heuristic search and training: (A) fitness values over generations, improving as the search progresses; (B) categorical cross-entropy loss over training epochs.
Figure 3: Model performance: (A) F1 scores for each use case (OS, DSS, and stage) across all cancer types; (B) average F1 scores per cancer type.
Figure 4: Consensus feature-importance graphs: (A) OS = false versus OS = true; (B) DSS = false versus DSS = true; (C) early versus late tumor stage, identifying critical genes in each comparison.
15 pages, 5499 KiB  
Article
Correlating Histopathological Microscopic Images of Creutzfeldt–Jakob Disease with Clinical Typology Using Graph Theory and Artificial Intelligence
by Carlos Martínez, Susana Teijeira, Patricia Domínguez, Silvia Campanioni, Laura Busto, José A. González-Nóvoa, Jacobo Alonso, Eva Poveda, Beatriz San Millán and César Veiga
Mach. Learn. Knowl. Extr. 2024, 6(3), 2018-2032; https://doi.org/10.3390/make6030099 - 7 Sep 2024
Viewed by 1115
Abstract
Creutzfeldt–Jakob disease (CJD) is a rare, degenerative, and fatal brain disorder caused by abnormal proteins called prions. This research introduces a novel approach combining AI and graph theory to analyze histopathological microscopic images of brain tissues affected by CJD. The detection and quantification of spongiosis, characterized by the presence of vacuoles in the brain tissue, play a crucial role in aiding the accurate diagnosis of CJD. The proposed methodology employs image processing techniques to identify these pathological features in high-resolution medical images. By developing an automatic pipeline for the detection of spongiosis, we aim to overcome some limitations of manual feature extraction. The results demonstrate that our method correctly identifies and characterizes spongiosis and allows the extraction of features that will help to better understand spongiosis patterns in different CJD patients.
(This article belongs to the Topic Applications in Image Analysis and Pattern Recognition)
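The filtering stage described in the figures (a roundness criterion plus a minimum size) can be sketched with standard contour analysis. A minimal example assuming OpenCV and an already-binarized tissue image; the thresholds are illustrative placeholders, not the paper's values:

```python
import cv2
import numpy as np

def detect_vacuoles(binary_img, min_area=50, min_roundness=0.7):
    """Keep white regions that are large and round enough.

    Roundness is circularity: 4*pi*area / perimeter^2 (1.0 for a perfect circle).
    """
    contours, _ = cv2.findContours(binary_img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vacuoles = []
    for c in contours:
        area = cv2.contourArea(c)
        perimeter = cv2.arcLength(c, closed=True)
        if area < min_area or perimeter == 0:
            continue  # fails the minimum-size criterion
        roundness = 4 * np.pi * area / perimeter ** 2
        if roundness >= min_roundness:
            vacuoles.append(c)  # passes both criteria
    return vacuoles
```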
Figures:
Figure 1: Examples of cerebellar vermis (a) and striatum nucleus (b) images, each with a zoomed region.
Figure 2: Zoom on a striatum image showing CJD brain tissue with spongiosis indicated by arrows.
Figure 3: Pipeline of the image processing method used to obtain the final set of vacuoles.
Figure 4: Zoomed regions of the ToI before (a) and after (b) artifact removal; yellow pixels belong to the dark region and blue ones to the white region.
Figure 5: Gaussian filter illustration: (a) the original image; (b) the effective area (non-black pixels) after filtering.
Figure 6: Final vacuoles (green), candidates failing the roundness criterion (red), and candidates below the minimum size (blue).
Figure 7: Scheme of the feature extraction process.
Figure 8: Boxplots of the normalized areas of the patient samples from Section 2.1, comparing the control group (blue) and CJD cases (orange).
Figure 9: Heatmap of the Pearson correlation between the two clinical variables and the metrics obtained with the proposed algorithm.
Figure 10: Radar charts of feature means for panencephalic form (a; orange = panencephalic, violet = non-panencephalic) and sex (b; orange = women, violet = men).
21 pages, 748 KiB  
Systematic Review
Tertiary Review on Explainable Artificial Intelligence: Where Do We Stand?
by Frank van Mourik, Annemarie Jutte, Stijn E. Berendse, Faiza A. Bukhsh and Faizan Ahmed
Mach. Learn. Knowl. Extr. 2024, 6(3), 1997-2017; https://doi.org/10.3390/make6030098 - 30 Aug 2024
Cited by 2 | Viewed by 1891
Abstract
Research into explainable artificial intelligence (XAI) methods has exploded over the past five years. It is essential to synthesize and categorize this research and, for this purpose, multiple systematic reviews on XAI mapped out the landscape of the existing methods. To understand how these methods have developed and been applied and what evidence has been accumulated through model training and analysis, we carried out a tertiary literature review that takes as input systematic literature reviews published between 1992 and 2023. We evaluated 40 systematic literature review papers and presented binary tabular overviews of researched XAI methods and their respective characteristics, such as the scope, scale, input data, explanation data, and machine learning models researched. We identified seven distinct characteristics and organized them into twelve specific categories, culminating in the creation of comprehensive research grids. Within these research grids, we systematically documented the presence or absence of research mentions for each pairing of characteristic and category. We identified 14 combinations that are open to research. Our findings reveal a significant gap, particularly in categories like the cross-section of feature graphs and numerical data, which appear to be notably absent or insufficiently addressed in the existing body of research and thus represent a future research road map.
(This article belongs to the Special Issue Machine Learning in Data Science)
Figures:
Figure 1: Hierarchy of evidence synthesis methods, based on Fusar-Poli and Radua [9].
Figure 2: PRISMA flow diagram of the tertiary review.
Figure 3: Overview of included articles per publication year.
Figure 4: Visual overview of intrinsic (ante hoc) and post hoc explainability [17].
28 pages, 1736 KiB  
Article
Black Box Adversarial Reprogramming for Time Series Feature Classification in Ball Bearings’ Remaining Useful Life Classification
by Alexander Bott, Felix Schreyer, Alexander Puchta and Jürgen Fleischer
Mach. Learn. Knowl. Extr. 2024, 6(3), 1969-1996; https://doi.org/10.3390/make6030097 - 27 Aug 2024
Viewed by 1175
Abstract
Standard ML relies on ample data, but limited availability poses challenges. Transfer learning offers a solution by leveraging pre-existing knowledge. Yet many methods require access to the model’s internal aspects, limiting applicability to white box models. To address this, Tsai, Chen and Ho introduced Black Box Adversarial Reprogramming for transfer learning with black box models. While tested primarily in image classification, this paper explores its potential in time series classification, particularly predictive maintenance. We develop an adversarial reprogramming concept tailored to black box time series classifiers. Our study focuses on predicting the Remaining Useful Life of rolling bearings. We construct a comprehensive ML pipeline, encompassing feature engineering and model fine-tuning, and compare results with traditional transfer learning. We investigate the impact of hyperparameters and training parameters on model performance, demonstrating the successful application of Black Box Adversarial Reprogramming to time series data. The method achieved a weighted F1-score of 0.77, although it exhibited significant stochastic fluctuations, with scores ranging from 0.3 to 0.77 due to randomness in gradient estimation.
(This article belongs to the Section Learning)
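The heart of black box reprogramming is estimating gradients from queries alone. A minimal sketch of an averaged random-direction estimator governed by the number of perturbations q and the smoothing parameter δ that the experiments vary, with the learning rate α applied outside; the paper's exact estimator may differ in form:

```python
import numpy as np

def estimate_gradient(loss_fn, theta, q=10, delta=1e-2):
    """Zeroth-order gradient estimate of loss_fn at theta from q random directions."""
    d = theta.size
    grad = np.zeros(d)
    base = loss_fn(theta)                       # one query at the current point
    for _ in range(q):
        u = np.random.randn(d)
        u /= np.linalg.norm(u)                  # unit direction on the sphere
        grad += (loss_fn(theta + delta * u) - base) / delta * u
    return d * grad / q                         # scaling for the one-sided estimator

# One update of the reprogramming parameters with learning rate alpha:
# theta -= alpha * estimate_gradient(black_box_loss, theta, q=10, delta=1e-2)
```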
Figures:
Figure 1: Conceptual differences between maintenance strategies (after [41]).
Figure 2: Methodology of this paper.
Figure 3: Structure of the datasets used in this paper.
Figure 4: Principle of operation of the BAR algorithm.
Figure 5: Structure of the BAR training process.
Figure 6: Confusion matrices of the baseline model in the source domain.
Figure 7: Confusion matrices of the baseline model in the target domain.
Figure 8: Histogram of macro F1 scores on target training data over 360 runs with identical hyperparameters.
Figure 9: Average macro F1 scores over learning rate α (x-axis) and vector size (legend).
Figure 10: Performance variance broken down by learning rate α (x-axis) and by q and δ (legend).
Figure 11: Effect of δ (x-axis) on the average macro F1 score, broken down by q and α (legend).
Figure 12: Effect of q (x-axis) on the average macro F1 score (a) and its variance (b), broken down by δ and α (legend).
Figure 13: Effect of vector size (x-axis) on the average macro F1 score (a) and its variance (b), broken down by q (legend).
16 pages, 956 KiB  
Article
Assessing Fine-Tuned NER Models with Limited Data in French: Automating Detection of New Technologies, Technological Domains, and Startup Names in Renewable Energy
by Connor MacLean and Denis Cavallucci
Mach. Learn. Knowl. Extr. 2024, 6(3), 1953-1968; https://doi.org/10.3390/make6030096 - 27 Aug 2024
Viewed by 3714
Abstract
Achieving carbon neutrality by 2050 requires unprecedented technological, economic, and sociological changes. With time as a scarce resource, it is crucial to base decisions on relevant facts and information to avoid misdirection. This study aims to help decision makers quickly find relevant information related to companies and organizations in the renewable energy sector. We propose fine-tuning five RNN and transformer models trained for French on a new category, “TECH”, used to classify technological domains and new products. In addition, as the models are fine-tuned on news related to startups, we note an improvement in the detection of startup and company names in the “ORG” category. We further explore the capacity of the most effective model to accurately predict entities using a small amount of training data, showing the progression of the model as it is trained on several hundred to several thousand annotations. This analysis demonstrates the potential of these models to extract insights without large corpora, shortening the long process of annotating custom training data. The approach is used to automatically extract new company mentions as well as the technologies and technology domains currently being discussed in the news, in order to better analyze industry trends. It further allows mentions of specific energy domains to be grouped with the companies actively developing new technologies in those fields.
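A minimal sketch of adding the new label and fine-tuning, assuming spaCy 3 with the French pipeline fr_core_news_md; the single annotated sentence is a hypothetical stand-in for the study's startup-news corpus:

```python
import random
import spacy
from spacy.training import Example

nlp = spacy.load("fr_core_news_md")
ner = nlp.get_pipe("ner")
ner.add_label("TECH")  # new category for technologies and technological domains

# Hypothetical annotated sentence: (text, {"entities": [(start, end, label)]})
TRAIN = [("La startup développe des électrolyseurs à haute température.",
          {"entities": [(25, 59, "TECH")]})]

optimizer = nlp.resume_training()
for _ in range(20):
    random.shuffle(TRAIN)
    for text, ann in TRAIN:
        example = Example.from_dict(nlp.make_doc(text), ann)
        nlp.update([example], sgd=optimizer, drop=0.2)
```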
Figures:
Figure 1: Pipeline.
Figure 2: spaCy's language processing pipeline [3].
Figure 3: Training spaCy's included models [3].
Figure 4: Correct annotations outside the training data predicted by the trained model.
Figure 5: Energy domains correctly annotated by the model.
Figure 6: Co-occurrence of organizations and technological domains in the same article.
17 pages, 2683 KiB  
Article
Forecasting the Right Crop Nutrients for Specific Crops Based on Collected Data Using an Artificial Neural Network (ANN)
by Sairoel Amertet and Girma Gebresenbet
Mach. Learn. Knowl. Extr. 2024, 6(3), 1936-1952; https://doi.org/10.3390/make6030095 - 26 Aug 2024
Viewed by 1203
Abstract
In farming, it is difficult to provide the right crop nutrients for each crop, and this causes farmers enormous problems. Although various machine learning approaches (deep learning and convolutional neural networks) have been used to identify crop diseases and to classify crops from images, they have failed to forecast crop nutrients accurately, because crop nutrients are numerical rather than visual. Neural networks offer the precision agriculture sector an opportunity to forecast crop nutrition more accurately: recent advances have brought greater precision and a wide range of pattern recognition capabilities, and neural networks are well suited to numerical data problems. The aim of the current study is to estimate the right crop nutrients for the right crops from collected data using an artificial neural network. The crop data were collected from the MNIST dataset. To forecast the precise nutrients for the crops, ANN models were developed, and the entire system was simulated in a MATLAB environment. The obtained accuracies for forecasting nutrients were 99.997%, 99.996%, and 99.997% for validation, training, and testing, respectively. The proposed algorithm is therefore suitable for forecasting accurate crop nutrients.
(This article belongs to the Section Network)
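The study runs in MATLAB; as a language-neutral illustration of the same idea, here is a minimal scikit-learn sketch of a small feed-forward network regressing nutrient targets from numeric crop features, using synthetic stand-in data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((500, 6))          # stand-in numeric crop features
y = X @ rng.random((6, 3))        # stand-in nutrient targets (e.g., N, P, K)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000,
                     random_state=0).fit(X_tr, y_tr)
print("R^2 on held-out data:", model.score(X_te, y_te))
```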
Figures:
Figure 1: Precision agriculture process [16].
Figure 2: Flow chart of an artificial neural network (ANN) [13].
Figure 3: An artificial neuron model [13].
Figure 4: ANN performance with various hidden-layer node counts (learning rate = 0.2).
Figure 5: Impact of different element terms on ANN performance (learning rate = 0.2, hidden layers = 2).
Figure 6: Effect of various learning rates on ANN performance (hidden layers = 2).
Figure 7: Regression of the system.
Figure 8: Performance of the system.
Figure 9: Gradient of the system.
Figure 10: Performance of the system at 1000 epochs.
Figure 11: Parameter values at 1000 epochs.
Figure 12: Regression of the system at 1000 epochs.
15 pages, 1283 KiB  
Article
Optimal Knowledge Distillation through Non-Heuristic Control of Dark Knowledge
by Darian Onchis, Codruta Istin and Ioan Samuila
Mach. Learn. Knowl. Extr. 2024, 6(3), 1921-1935; https://doi.org/10.3390/make6030094 - 22 Aug 2024
Viewed by 1438
Abstract
In this paper, a method is introduced to control the dark knowledge values, also known as soft targets, with the purpose of improving training by knowledge distillation for multi-class classification tasks. Knowledge distillation effectively transfers knowledge from a larger model to a smaller model to achieve efficient, fast, and generalizable performance while retaining much of the original accuracy. The majority of deep neural models used for classification tasks append a SoftMax layer to generate output probabilities, and it is usual to take the highest score as the model's inference while the remaining probability values are ignored. The focus here is on those probabilities as carriers of dark knowledge, and our aim is to quantify the relevance of dark knowledge not heuristically, as provided in the literature so far, but with an inductive proof on the SoftMax operational limits. These limits are further pushed by using an incremental decision tree with an information gain split. The user can set a desired precision and accuracy level to obtain a maximal temperature setting for a continual classification process. Moreover, by fitting both the hard targets and the soft targets, one obtains an optimal knowledge distillation effect that better mitigates catastrophic forgetting. The strengths of our method come from the possibility of controlling the amount of distillation transferred non-heuristically and from the agnostic, model-independent nature of this study.
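The quantities under control are the temperature-scaled SoftMax outputs and a loss that fits hard and soft targets together. A minimal PyTorch sketch of that standard distillation objective (the Hinton-style weighting is shown for illustration; the paper's contribution, choosing the temperature non-heuristically, is not reproduced here):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=10.0, alpha=0.5):
    """Blend hard-target cross-entropy with soft-target KL at temperature T."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)    # the dark knowledge
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_loss = F.kl_div(log_soft_student, soft_teacher,
                         reduction="batchmean") * T * T    # rescale gradients
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```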
Figures:
Figure 1: Table 2 values for 10 classes at temperatures 1, 2, and 20 (x-axis), color-coding the number of clusters and the number of elements per cluster.
Figure 2: SoftMax oscillations shrink as the number of classes grows; once Tmax is reached for the set epsilon precision, the variation between SoftMax values stays below 0.004 (x-axis: SoftMax values; y-axis: number of classes).
Figure 3: Classical situation (above) versus the proposed one (below).
Figure 4: Input: the Hoeffding tree H for the almost-plateau values at high temperatures. Output: maximum information-gain splitting, with Information Gain = 1 − Entropy.
Figures 5–7: Test results on 1000 classes grouped by 10 (partly cropped).
Figure 8: Comparison of iCaRL configurations: original and SoftMax variants at temperatures 10 and 40, with varying classes per increment, numbers of increments, and epochs, on a fixed 100-class dataset or on 100 classes randomly sampled from 1000.
Figure 9: Two scenarios comparing the effect of changing the number of epochs.
Figure 10: Three scenarios: top-10 accuracy for model training over 60 epochs.
28 pages, 7677 KiB  
Article
Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges
by Mohammed Elhenawy, Ahmad Abutahoun, Taqwa I. Alhadidi, Ahmed Jaber, Huthaifa I. Ashqar, Shadi Jaradat, Ahmed Abdelhay, Sebastien Glaser and Andry Rakotonirainy
Mach. Learn. Knowl. Extr. 2024, 6(3), 1894-1920; https://doi.org/10.3390/make6030093 - 13 Aug 2024
Cited by 3 | Viewed by 1390
Abstract
Multimodal Large Language Models (MLLMs) harness comprehensive knowledge spanning text, images, and audio to adeptly tackle complex problems. This study explores the ability of MLLMs to visually solve the Traveling Salesman Problem (TSP) and Multiple Traveling Salesman Problem (mTSP) using images that portray point distributions on a two-dimensional plane. We introduce a novel approach employing multiple specialized agents within the MLLM framework, each dedicated to optimizing solutions for these combinatorial challenges. We benchmarked our multi-agent solutions against Google OR-Tools, which served as the baseline for comparison. The results demonstrated that both multi-agent models—Multi-Agent 1, which includes the initializer, critic, and scorer agents, and Multi-Agent 2, which comprises only the initializer and critic agents—significantly improved the solution quality for TSP and mTSP problems. Multi-Agent 1 excelled in environments requiring detailed route refinement and evaluation, providing a robust framework for sophisticated optimizations. In contrast, Multi-Agent 2, focusing on iterative refinements by the initializer and critic, proved effective for rapid decision-making scenarios. These experiments yield promising outcomes, showcasing the robust visual reasoning capabilities of MLLMs in addressing diverse combinatorial problems. The findings underscore the potential of MLLMs as powerful tools in computational optimization, offering insights that could inspire further advancements in this promising field.
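Both strategies reduce to an iterate-and-judge loop. A schematic sketch in which ask_mllm and render are hypothetical helpers standing in for a multimodal-model call and a route-plotting step; no specific API is implied:

```python
def solve_tsp_multi_agent(points_image, ask_mllm, render, max_iters=5):
    """Multi-Agent 2 pattern: an initializer proposes a tour, a critic refines it."""
    route = ask_mllm("Propose a short tour visiting every point.", points_image)
    for _ in range(max_iters):
        improved = ask_mllm("Find crossings or detours and return an improved tour.",
                            render(route))
        if improved == route:   # critic sees nothing left to improve
            break
        route = improved
    return route
```

Multi-Agent 1 adds a third call per iteration, a scorer that rates each candidate route's visual quality and keeps the best-scoring one.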
Figures:
Figure 1: Multi-Agent 1 strategy: a three-phase visual-reasoning approach in which an initializer agent proposes routes, a critic agent refines them, and a scorer agent evaluates their visual quality.
Figure 2: Multi-Agent 2 strategy: a streamlined two-agent loop in which the initializer proposes routes and the critic refines them until a maximum iteration count.
Figure 3: Zero-shot versus Multi-Agent 1 for one, two, and three salesmen: mean gap and standard deviation by problem size.
Figure 4: Seven solutions proposed by the critic agent for two salesmen, with scores assigned by the scorer agent, showing the iterative improvement from initial to final solution.
Figure 5: Evaluation of proposed solutions in the third iteration for three salesmen, with scores indicating the completeness of node coverage in a 30-node network.
Figure 6: Zero-shot versus Multi-Agent 2 for one, two, and three salesmen: mean gap and standard deviation by problem size.
Figure 7: Mean gap percentage reductions of Multi-Agent 1 and Multi-Agent 2 relative to zero-shot across problem sizes and salesman configurations.
Figure 8: Example Multi-Agent 2 solution for one salesman in a 30-node network.
Figure 9: Example Multi-Agent 2 solution for three salesmen in a 30-node network; line colors distinguish the journeys.
23 pages, 4393 KiB  
Article
Balancing Results from AI-Based Geostatistics versus Fuzzy Inference by Game Theory Bargaining to Improve a Groundwater Monitoring Network
by Masoumeh Hashemi, Richard C. Peralta and Matt Yost
Mach. Learn. Knowl. Extr. 2024, 6(3), 1871-1893; https://doi.org/10.3390/make6030092 - 9 Aug 2024
Cited by 1 | Viewed by 1700
Abstract
An artificial intelligence-based geostatistical optimization algorithm was developed to upgrade a test Iranian aquifer’s existing groundwater monitoring network. For that aquifer, a preliminary study revealed that a Multi-Layer Perceptron Artificial Neural Network (MLP-ANN) more accurately determined temporally average water table elevations than geostatistical [...] Read more.
An artificial intelligence-based geostatistical optimization algorithm was developed to upgrade a test Iranian aquifer’s existing groundwater monitoring network. For that aquifer, a preliminary study revealed that a Multi-Layer Perceptron Artificial Neural Network (MLP-ANN) more accurately determined temporally average water table elevations than geostatistical kriging, spline, and inverse distance weighting. Because kriging is usually used in that area for water table estimation, the developed algorithm used MLP-ANN to guide kriging, and Genetic Algorithm (GA) to determine locations for new monitoring well location(s). For possible annual fiscal budgets allowing 1–12 new wells, 12 sets of optimal new well locations are reported. Each set has the locations of new wells that would minimize the squared difference between the time-averaged heads developed by kriging versus MLP-ANN. Also, to simultaneously consider local expertise, the algorithm used fuzzy inference to quantify an expert’s satisfaction with the number of new wells. Then, the algorithm used symmetric bargaining (Nash, Kalai–Smorodinsky, and area monotonic) to present an upgradation strategy that balanced professional judgment and heuristic optimization. In essence, the algorithm demonstrates the systematic application of relatively new computational practices to a common situation worldwide. Full article
(This article belongs to the Special Issue Sustainable Applications for Machine Learning)
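For readers unfamiliar with the bargaining step, a symmetric Nash solution over a discrete Pareto front (such as the Ef-versus-SOE curve in Figure 13 below) picks the point maximizing the product of utility gains over a disagreement point. The sketch below assumes normalized utilities and a disagreement point at the origin; it is illustrative, not the authors' implementation.

```python
# Minimal sketch: choose the Pareto point maximizing the Nash product
# (Ef - d1) * (SOE - d2) over an assumed disagreement point d = (0, 0).
import numpy as np

def nash_bargaining(pareto_points: np.ndarray, d=(0.0, 0.0)) -> int:
    """Return the index of the Pareto point maximizing the Nash product."""
    gains = pareto_points - np.asarray(d)     # utility gains over disagreement
    gains = np.clip(gains, 0.0, None)         # only points dominating d count
    nash_product = gains[:, 0] * gains[:, 1]
    return int(np.argmax(nash_product))

# Example: normalized (Ef, SOE) pairs for candidate numbers of new wells.
points = np.array([[0.2, 0.9], [0.5, 0.7], [0.8, 0.4], [1.0, 0.1]])
print("Chosen NOAW index:", nash_bargaining(points))
```

The Kalai–Smorodinsky and area monotonic solutions mentioned in the abstract select points by different criteria over the same front, but the input (a discrete Pareto set) is identical.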
Figure 1: Groundwater level contours and observation wells in Qazvin Aquifer, Qazvin Province, Iran.
Figure 2: Conceptual architecture of the employed hidden layer perceptron (NFL = number of neurons in the first layer; NSL = number of neurons in the second layer; NAF = number of activation functions).
Figure 3: The check point locations (spaced 280 m apart).
Figure 4: Algorithm for improving existing monitoring networks (NOAW: Number of Additional Observation Well(s); MNAWs: Maximum Number of Added Wells; GA: Genetic Algorithm; SOE: Satisfaction of the Expert; FIS: Fuzzy Inference System).
Figure 5: Flowchart of the Genetic Algorithm model (BFV: the best fitness value; BPFV: the best previous fitness value).
Figure 6: Qazvin Aquifer candidate additional observation well locations (search space of the Genetic Algorithm).
Figure 7: Fuzzy Inference System (FIS) process.
Figure 8: Membership function for NOAWs.
Figure 9: Membership function for the installation cost of one well.
Figure 10: Membership function for satisfaction of the expert.
Figure 11: Ef and RMSE as functions of NOAWs.
Figure 12: Fuzzification, inference, and defuzzification processes in determining the Satisfaction of the Expert (SOE) for each NOAW (for NOAWs = 9 and unit well cost = USD 4000, SOE = 61%).
Figure 13: Normalized Pareto optimum curve of Ef versus SOE for USD 4000 unit well cost (labels show NOAWs).
Figure 14: The groundwater level contour maps based on bargaining game results.
14 pages, 7188 KiB  
Article
Accuracy Improvement of Debonding Damage Detection Technology in Composite Blade Joints for 20 kW Class Wind Turbine
by Hakgeun Kim, Hyeongjin Kim and Kiweon Kang
Mach. Learn. Knowl. Extr. 2024, 6(3), 1857-1870; https://doi.org/10.3390/make6030091 - 7 Aug 2024
Viewed by 1042
Abstract
Securing the structural safety of blades has become crucial, owing to the increasing size and weight of blades resulting from the recent development of large wind turbines. Composites are primarily used for blade manufacturing because of their high specific strength and specific stiffness. [...] Read more.
Securing the structural safety of blades has become crucial, owing to the increasing size and weight of blades resulting from the recent development of large wind turbines. Composites are primarily used for blade manufacturing because of their high specific strength and specific stiffness. However, in composite blades, joints may experience fractures from the loads generated during wind turbine operation, leading to deformation caused by changes in structural stiffness. In this study, 7132 debonding damage data, classified by damage type, position, and size, were selected to predict debonding damage based on natural frequency. The change in the natural frequency caused by debonding damage was acquired through finite element (FE) modeling and modal analysis. Synchronization between the FE analysis model and manufactured blades was achieved through modal testing and data analysis. Finally, the relationship between debonding damage and the change in natural frequency was examined using artificial neural network techniques. Full article
(This article belongs to the Section Network)
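The damage-prediction step pairs natural-frequency shifts with an artificial neural network. Below is a minimal sketch of that idea using scikit-learn's MLPClassifier on synthetic data; the number of modes, damage types, and network size are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch: classify debonding damage type from natural-frequency shifts.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_samples, n_modes = 1000, 6                       # shifts of the first 6 natural frequencies
X = rng.normal(size=(n_samples, n_modes))          # synthetic frequency-shift features
y = np.argmax(X @ rng.normal(size=(n_modes, 3)), axis=1)  # 3 damage types, as in Figure 12

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)
print(f"Damage-type accuracy: {clf.score(X_te, y_te):.2f}")
```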
Figure 1: Modeling method of the wind turbine blade.
Figure 2: Analysis process of damage prediction for the wind turbine blade.
Figure 3: Location of damages on the blade cross section.
Figure 4: Blade damage types.
Figure 5: Damage selection criteria by type.
Figure 6: Schematic diagram of the modal test.
Figure 7: Position of the PRT sensor and high-speed camera.
Figure 8: Modal test of the composite blade.
Figure 9: Vibration response data using (a) the PRT sensor and (b) the high-speed camera.
Figure 10: Results of modal analysis of the blade and its motion.
Figure 11: Damage prediction accuracy method based on the ANN.
Figure 12: Damage prediction accuracy by type: (a) Type 1 (97%); (b) Type 2 (86%); (c) Type 3 (86%).
Figure 13: Damage prediction accuracy by type after learning model improvement: (a) Type 1 (91%); (b) Type 2 (99%); (c) Type 3 (99%).
Figure 14: Damage prediction accuracy of manufactured blades.
Figure 15: Accuracy improvement of the damage prediction model.
17 pages, 786 KiB  
Article
A Parallel Approach to Enhance the Performance of Supervised Machine Learning Realized in a Multicore Environment
by Ashutosh Ghimire and Fathi Amsaad
Mach. Learn. Knowl. Extr. 2024, 6(3), 1840-1856; https://doi.org/10.3390/make6030090 - 2 Aug 2024
Cited by 1 | Viewed by 2419
Abstract
Machine learning models play a critical role in applications such as image recognition, natural language processing, and medical diagnosis, where accuracy and efficiency are paramount. As datasets grow in complexity, so too do the computational demands of classification techniques. Previous research has achieved [...] Read more.
Machine learning models play a critical role in applications such as image recognition, natural language processing, and medical diagnosis, where accuracy and efficiency are paramount. As datasets grow in complexity, so too do the computational demands of classification techniques. Previous research has achieved high accuracy but required significant computational time. This paper proposes a parallel architecture for Ensemble Machine Learning Models, harnessing multicore CPUs to expedite performance. The primary objective is to enhance machine learning efficiency without compromising accuracy through parallel computing. This study focuses on benchmark ensemble models including Random Forest, XGBoost, ADABoost, and K Nearest Neighbors. These models are applied to tasks such as wine quality classification and fraud detection in credit card transactions. The results demonstrate that, compared to single-core processing, machine learning tasks run 1.7 times and 3.8 times faster for small and large datasets on quad-core CPUs, respectively. Full article
(This article belongs to the Section Learning)
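As a concrete, hedged illustration of the multicore idea (not the authors' code), scikit-learn's ensemble estimators expose an n_jobs parameter that distributes tree construction across CPU cores; the snippet below times a serial fit against a four-core fit on a synthetic dataset.

```python
# Minimal sketch: the same Random Forest fit, serial vs. parallel.
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=50_000, n_features=30, random_state=0)

for n_jobs in (1, 4):                        # single core vs. quad core
    clf = RandomForestClassifier(n_estimators=200, n_jobs=n_jobs, random_state=0)
    t0 = time.perf_counter()
    clf.fit(X, y)
    print(f"n_jobs={n_jobs}: {time.perf_counter() - t0:.1f} s")
```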
Figure 1: Hierarchical structure of machine learning algorithms.
Figure 2: Overview of the architecture of the Random Forest classifier machine learning model.
Figure 3: Schematic diagram of the proposed parallel processing technique for an ensemble model.
Figure 4: Execution time for the various models with Dataset 1, tested on Device 1.
Figure 5: Execution time for Random Forest (left) and XGBoost (right) on various devices.
Figure 6: Accuracy of the various models with Dataset 2, tested on Device 2.
Figure 7: Execution time for the various models with Dataset 2, tested on Device 2.
Figure 8: Performance speed for Random Forest (left) and XGBoost (right) on two datasets.
22 pages, 2817 KiB  
Article
Enhanced Graph Representation Convolution: Effective Inferring Gene Regulatory Network Using Graph Convolution Network with Self-Attention Graph Pooling Layer
by Duaa Mohammad Alawad, Ataur Katebi and Md Tamjidul Hoque
Mach. Learn. Knowl. Extr. 2024, 6(3), 1818-1839; https://doi.org/10.3390/make6030089 - 1 Aug 2024
Cited by 1 | Viewed by 1720
Abstract
Studying gene regulatory networks (GRNs) is paramount for unraveling the complexities of biological processes and their associated disorders, such as diabetes, cancer, and Alzheimer’s disease. Recent advancements in computational biology have aimed to enhance the inference of GRNs from gene expression data, a [...] Read more.
Studying gene regulatory networks (GRNs) is paramount for unraveling the complexities of biological processes and their associated disorders, such as diabetes, cancer, and Alzheimer’s disease. Recent advancements in computational biology have aimed to enhance the inference of GRNs from gene expression data, a non-trivial task given the networks’ intricate nature. The challenge lies in accurately identifying the myriad interactions among transcription factors and target genes, which govern cellular functions. This research introduces a cutting-edge technique, EGRC (Effective GRN Inference applying Graph Convolution with Self-Attention Graph Pooling), which innovatively conceptualizes GRN reconstruction as a graph classification problem, where the task is to discern the links within subgraphs that encapsulate pairs of nodes. By leveraging Spearman’s correlation, we generate potential subgraphs that bring nonlinear associations between transcription factors and their targets to light. We use mutual information to enhance this, capturing a broader spectrum of gene interactions. Our methodology bifurcates these subgraphs into ‘Positive’ and ‘Negative’ categories. ‘Positive’ subgraphs are those where a transcription factor and its target gene are connected, including interactions among their neighbors. ‘Negative’ subgraphs, conversely, denote pairs without a direct connection. EGRC utilizes dual graph convolution network (GCN) models that exploit node attributes from gene expression profiles and graph embedding techniques to classify these. The performance of EGRC is substantiated by comprehensive evaluations using the DREAM5 datasets. Notably, EGRC attained an AUROC of 0.856 and an AUPR of 0.841 on the E. coli dataset. In contrast, the in silico dataset achieved an AUROC of 0.5058 and an AUPR of 0.958. Furthermore, on the S. cerevisiae dataset, EGRC recorded an AUROC of 0.823 and an AUPR of 0.822. These results underscore the robustness of EGRC in accurately inferring GRNs across various organisms. The advanced performance of EGRC represents a substantial advancement in the field, promising to deepen our comprehension of the intricate biological processes and their implications in both health and disease. Full article
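The skeleton-construction step described above can be illustrated with a small Python sketch: a TF-gene pair is assigned to a 'positive' subgraph when the absolute Spearman correlation of their expression profiles clears the 0.8 threshold named in Figure 1. The data below are synthetic, and the snippet is illustrative rather than the authors' pipeline.

```python
# Minimal sketch: label a TF-gene pair by thresholding Spearman correlation.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
tf_expr = rng.normal(size=100)                                # TF expression across 100 samples
gene_expr = 0.9 * tf_expr + rng.normal(scale=0.3, size=100)   # correlated target gene

rho, _ = spearmanr(tf_expr, gene_expr)
label = "positive" if abs(rho) >= 0.8 else "negative"
print(f"Spearman rho = {rho:.2f} -> {label} subgraph")
```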
Figure 1: Noisy skeletons derived from Spearman's correlation generate two subgraphs: positive (left) and negative (right). The positive subgraph is a bipartite graph with centers A and B. Here, A symbolizes the transcription factor, while B denotes its associated target gene. A link exists between A and B if their Spearman correlation exceeds a threshold set at 0.8. Conversely, the negative subgraph is a bipartite graph with centers C and D. Here, C represents the transcription factor, while D denotes its associated target gene. This negative subgraph is characterized by the lack of a link between C and D due to a Spearman correlation below the 0.8 threshold.
Figure 2: The EGRC framework. Initial noisy skeletons are created through heuristic methods, such as Spearman's correlation and mutual information, which are employed to identify relationships between transcription factors (TFs) and their target genes from gene expression data. These identified associations form bipartite graphs, each featuring two central nodes representing TF-G and TF-TF relationships. A positive label is assigned to a bipartite graph if the central nodes are connected, while a negative label indicates unconnected nodes. Following this, a feature vector is generated for each node, incorporating two types of features: explicit features and structural embeddings. All the bipartite graphs and node features are inputs for the graph convolutional neural network.
Figure 3: Comparative AUROC scores for GRN prediction algorithms on three DREAM5 datasets: (a) in silico dataset, (b) E. coli dataset, and (c) S. cerevisiae dataset.
Figure 4: Comparative AUPR scores for GRN prediction algorithms on three DREAM5 datasets: (a) in silico dataset, (b) E. coli dataset, and (c) S. cerevisiae dataset.
20 pages, 671 KiB  
Article
Learning Optimal Dynamic Treatment Regime from Observational Clinical Data through Reinforcement Learning
by Seyum Abebe, Irene Poli, Roger D. Jones and Debora Slanzi
Mach. Learn. Knowl. Extr. 2024, 6(3), 1798-1817; https://doi.org/10.3390/make6030088 - 30 Jul 2024
Viewed by 1682
Abstract
In medicine, dynamic treatment regimes (DTRs) have emerged to guide personalized treatment decisions for patients, accounting for their unique characteristics. However, existing methods for determining optimal DTRs face limitations, often due to reliance on linear models unsuitable for complex disease analysis and a [...] Read more.
In medicine, dynamic treatment regimes (DTRs) have emerged to guide personalized treatment decisions for patients, accounting for their unique characteristics. However, existing methods for determining optimal DTRs face limitations, often due to reliance on linear models unsuitable for complex disease analysis and a focus on outcome prediction over treatment effect estimation. To overcome these challenges, decision tree-based reinforcement learning approaches have been proposed. Our study aims to evaluate the performance and feasibility of such algorithms: tree-based reinforcement learning (T-RL), DTR-Causal Tree (DTR-CT), DTR-Causal Forest (DTR-CF), stochastic tree-based reinforcement learning (SL-RL), and Q-learning with Random Forest. Using real-world clinical data, we conducted experiments to compare algorithm performances. Evaluation metrics included the proportion of correctly assigned patients to recommended treatments and the empirical mean with standard deviation of expected counterfactual outcomes based on estimated optimal treatment strategies. This research not only highlights the potential of decision tree-based reinforcement learning for dynamic treatment regimes but also contributes to advancing personalized medicine by offering nuanced and effective treatment recommendations. Full article
(This article belongs to the Section Learning)
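One of the compared baselines, Q-learning with Random Forest, can be sketched in a batch (fitted-Q-iteration) style that suits observational data. The snippet below is a minimal illustration under synthetic stand-ins for patient covariates, treatments, and outcomes; it is not the study's implementation.

```python
# Minimal sketch of fitted Q-iteration with a Random Forest Q-function.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n, d, n_actions, gamma = 2000, 5, 2, 0.9
S = rng.normal(size=(n, d))              # patient covariates
A = rng.integers(0, n_actions, n)        # observed treatments
R = rng.normal(size=n)                   # observed outcomes
S_next = S + rng.normal(scale=0.1, size=(n, d))

q = RandomForestRegressor(n_estimators=100, random_state=0)
q.fit(np.column_stack([S, A]), R)        # initialize Q with immediate outcomes
for _ in range(5):                       # fitted Q-iteration sweeps
    q_next = np.max(
        [q.predict(np.column_stack([S_next, np.full(n, a)])) for a in range(n_actions)],
        axis=0,
    )
    q.fit(np.column_stack([S, A]), R + gamma * q_next)

# Recommended regime: the action maximizing the learned Q at each state.
best_action = np.argmax(
    [q.predict(np.column_stack([S, np.full(n, a)])) for a in range(n_actions)], axis=0
)
```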
16 pages, 1999 KiB  
Article
Insights from Augmented Data Integration and Strong Regularization in Drug Synergy Prediction with SynerGNet
by Mengmeng Liu, Gopal Srivastava, J. Ramanujam and Michal Brylinski
Mach. Learn. Knowl. Extr. 2024, 6(3), 1782-1797; https://doi.org/10.3390/make6030087 - 29 Jul 2024
Cited by 1 | Viewed by 1195
Abstract
SynerGNet is a novel approach to predicting drug synergy against cancer cell lines. In this study, we discuss in detail the construction process of SynerGNet, emphasizing its comprehensive design tailored to handle complex data patterns. Additionally, we investigate a counterintuitive phenomenon when integrating [...] Read more.
SynerGNet is a novel approach to predicting drug synergy against cancer cell lines. In this study, we discuss in detail the construction process of SynerGNet, emphasizing its comprehensive design tailored to handle complex data patterns. Additionally, we investigate a counterintuitive phenomenon in which integrating more augmented data into the training set results in an increase in testing loss alongside improved predictive accuracy. This sheds light on the nuanced dynamics of model learning. Further, we demonstrate the effectiveness of strong regularization techniques in mitigating overfitting, ensuring the robustness and generalization ability of SynerGNet. Finally, the continuous performance enhancements achieved through the integration of augmented data are highlighted. By gradually increasing the amount of augmented data in the training set, we observe substantial improvements in model performance. For instance, compared to models trained exclusively on the original data, the integration of the augmented data can lead to a 5.5% increase in the balanced accuracy and a 7.8% decrease in the false positive rate. Through rigorous benchmarks and analyses, our study contributes valuable insights into the development and optimization of predictive models in biomedical research. Full article
(This article belongs to the Special Issue Machine Learning in Data Science)
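The 'strong regularization' discussed above typically combines in-network dropout with L2 weight decay in the optimizer. The sketch below shows those two levers in PyTorch on a tiny stand-in classifier; the real SynerGNet is a graph network, so this is illustrative only.

```python
# Minimal sketch: dropout plus weight decay as overfitting controls.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),        # strong dropout to curb overfitting
    nn.Linear(64, 2),         # synergistic vs. antagonistic
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
loss_fn = nn.CrossEntropyLoss()

x, y = torch.randn(32, 128), torch.randint(0, 2, (32,))
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```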
Figure 1: SynerGNet architecture layout. (A) SynerGNet takes a graph-based representation of a pair of drugs and a target cancer cell line as the input. (B) This graph is processed by two sequential graph convolution modules designed for feature propagation and aggregation within the graph. Following each convolutional module, a new, updated embedding is produced for each node. (C) Embeddings from different convolution modules are merged using the jumping knowledge network (JK-Net). A readout mechanism employing a pooling layer extracts graph-level features from these node-level features. Finally, the graph embeddings are input into a prediction module, which determines whether the drug pair has a synergistic or antagonistic effect on the cancer cell line.
Figure 2: Model performance during initial training with augmented data. The training set includes 16 times more augmented data than the original data. Model performance is evaluated (A,B) before and (C,D) after increasing the regularization strength. Plots show (A,C) train (red) and test (navy blue) loss, and (B,D) predictive accuracy (purple) and balanced accuracy (blue) as a function of the epoch number.
Figure 3: Results of GNN training with gradually increased augmented data. Models are evaluated by (A) balanced accuracy and (B) testing loss. The x-axis represents the factor n in the formula 2^n, which determines the ratio of augmented data size to the original training data size. GNN models are trained on 2^n augmented data combined with original AZ-DREAM Challenges data and tested against the validation set from the original data. Circles represent average values over five different samples of the augmented data for each n, while boxes indicate the corresponding standard deviation.
20 pages, 1138 KiB  
Article
Diverse Machine Learning for Forecasting Goal-Scoring Likelihood in Elite Football Leagues
by Christina Markopoulou, George Papageorgiou and Christos Tjortjis
Mach. Learn. Knowl. Extr. 2024, 6(3), 1762-1781; https://doi.org/10.3390/make6030086 - 28 Jul 2024
Cited by 3 | Viewed by 2668
Abstract
The field of sports analytics has grown rapidly, with a primary focus on performance forecasting, enhancing the understanding of player capabilities, and indirectly benefiting team strategies and player development. This work aims to forecast and comparatively evaluate players’ goal-scoring likelihood in four elite [...] Read more.
The field of sports analytics has grown rapidly, with a primary focus on performance forecasting, enhancing the understanding of player capabilities, and indirectly benefiting team strategies and player development. This work aims to forecast and comparatively evaluate players’ goal-scoring likelihood in four elite football leagues (Premier League, Bundesliga, La Liga, and Serie A) by mining advanced statistics from 2017 to 2023. Six types of machine learning (ML) models were developed and tested individually through experiments on the comprehensive datasets collected for these leagues. We also tested the upper 30th percentile of the best-performing players based on their performance in the last season, with varied features evaluated to enhance prediction accuracy in distinct scenarios. The results offer insights into the forecasting abilities of those leagues, identifying the best forecasting methodologies and the factors that most significantly contribute to the prediction of players’ goal-scoring. XGBoost consistently outperformed other models in most experiments, yielding the most accurate results and leading to a well-generalized model. Notably, when applied to Serie A, it achieved a mean absolute error (MAE) of 1.29. This study provides insights into ML-based performance prediction, advancing the field of player performance forecasting. Full article
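A minimal sketch of the best-performing setup reported above, an XGBoost regressor evaluated by MAE, might look as follows; the feature matrix is a synthetic placeholder for the mined player statistics, and the hyperparameters are illustrative assumptions.

```python
# Minimal sketch: XGBoost regression for next-season goal counts, scored by MAE.
import numpy as np
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                             # e.g., shots, xG, minutes, assists
y = np.clip(X[:, 0] * 3 + rng.normal(size=500), 0, None)   # goals scored (non-negative)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = XGBRegressor(n_estimators=300, learning_rate=0.05, max_depth=4)
model.fit(X_tr, y_tr)
print(f"MAE: {mean_absolute_error(y_te, model.predict(X_te)):.2f}")
```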
Figure 1: Flowchart of the proposed methodology.
Figure 2: Performance prediction of players from all leagues.
42 pages, 16635 KiB  
Article
Towards AI Dashboards in Financial Services: Design and Implementation of an AI Development Dashboard for Credit Assessment
by Mustafa Pamuk and Matthias Schumann
Mach. Learn. Knowl. Extr. 2024, 6(3), 1720-1761; https://doi.org/10.3390/make6030085 - 27 Jul 2024
Viewed by 2297
Abstract
Financial institutions are increasingly turning to artificial intelligence (AI) to improve their decision-making processes and gain a competitive edge. Due to the iterative process of AI development, it is mandatory to have a structured process in place, from the design to the deployment [...] Read more.
Financial institutions are increasingly turning to artificial intelligence (AI) to improve their decision-making processes and gain a competitive edge. Due to the iterative process of AI development, it is mandatory to have a structured process in place, from the design to the deployment of AI-based services in the finance industry. This process must include the required validation and coordination with regulatory authorities. An appropriate dashboard can help to shape and structure the process of model development, e.g., for credit assessment in the finance industry. In addition, the analysis of datasets must be included as an important part of the dashboard to understand the reasons for changes in model performance. Furthermore, a dashboard can undertake documentation tasks to make the process of model development traceable, explainable, and transparent, as required by regulatory authorities in the finance industry. This can offer a comprehensive solution for financial companies to optimize their models, improve regulatory compliance, and ultimately foster sustainable growth in an increasingly competitive market. In this study, we investigate the requirements and provide a prototypical dashboard to create, manage, compare, and validate AI models to be used in the credit assessment of private customers. Full article
(This article belongs to the Special Issue Sustainable Applications for Machine Learning)
Figure 1: Process of the problem-centered design science research approach in Sections 4.1–4.5.
Figure 2: CRISP-DM process model.
Figure 3: The use-case diagram of the artifact.
Figure 4: Data upload module in AIDash.
Figure 5: Plot module in AIDash.
Figure 6: Model module in AIDash.
Figure 7: Dashboard module in AIDash.
Figure 8: Data analysis in AIDash.
Figure 9: Evaluation of new models.
Figure A1: UML activity diagram of the upload module.
Figure A2: UML activity diagram of the plot module.
Figure A3: UML activity diagram of the model module.
Figure A4: UML activity diagram of the dashboard module.
Figure A5: Change and selection of columns.
Figure A6: Coding features.
Figure A7: Bar plot comparing two datasets.
Figure A8: Pie plot comparing two datasets.
Figure A9: Boxplot to clean outliers in the selected dataset.
Figure A10: Violin plots.
Figure A11: Confusion matrix of the dataset from Appendix A.
Figure A12: Creating a new model with the selected dataset.
Figure A13: Evaluation functions in a comparison view of multiple models.
Figure A14: Testing model decisions for credit applications directly in AIDash.
Figure A15: An example report of a trained model.
Figure A16: Testing a model with another dataset in AIDash.
21 pages, 3362 KiB  
Article
Assessing the Value of Transfer Learning Metrics for Radio Frequency Domain Adaptation
by Lauren J. Wong, Braeden P. Muller, Sean McPherson and Alan J. Michaels
Mach. Learn. Knowl. Extr. 2024, 6(3), 1699-1719; https://doi.org/10.3390/make6030084 - 25 Jul 2024
Viewed by 966
Abstract
The use of transfer learning (TL) techniques has become common practice in fields such as computer vision (CV) and natural language processing (NLP). Leveraging prior knowledge gained from data with different distributions, TL offers higher performance and reduced training time, but has yet [...] Read more.
The use of transfer learning (TL) techniques has become common practice in fields such as computer vision (CV) and natural language processing (NLP). Leveraging prior knowledge gained from data with different distributions, TL offers higher performance and reduced training time, but has yet to be fully utilized in applications of machine learning (ML) and deep learning (DL) techniques and applications related to wireless communications, a field loosely termed radio frequency machine learning (RFML). This work examines whether existing transferability metrics, used in other modalities, might be useful in the context of RFML. Results show that the two existing metrics tested, Log Expected Empirical Prediction (LEEP) and Logarithm of Maximum Evidence (LogME), correlate well with post-transfer accuracy and can therefore be used to select source models for radio frequency (RF) domain adaptation and to predict post-transfer accuracy. Full article
(This article belongs to the Section Learning)
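For context, the LEEP score admits a compact implementation: from the source model's soft predictions on the target data, estimate an empirical source-to-target label mapping, then average the log-likelihood of the true target labels under the resulting 'empirical predictor'. The sketch below follows the published definition of LEEP (Nguyen et al., 2020); the random inputs are placeholders for real model outputs.

```python
# Minimal sketch of the LEEP transferability score.
import numpy as np

def leep(source_probs: np.ndarray, target_labels: np.ndarray, n_target: int) -> float:
    n = source_probs.shape[0]
    # Empirical joint distribution P(y, z) over target labels y and source labels z.
    joint = np.zeros((n_target, source_probs.shape[1]))
    for probs, y in zip(source_probs, target_labels):
        joint[y] += probs / n
    cond = joint / joint.sum(axis=0, keepdims=True)       # P(y | z)
    # LEEP = mean log-likelihood of the true labels under the empirical predictor.
    pred = source_probs @ cond.T                          # P(y | x) for each sample
    return float(np.mean(np.log(pred[np.arange(n), target_labels])))

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=500)   # source-model softmax outputs (10 classes)
labels = rng.integers(0, 4, 500)               # target labels (4 classes)
print(f"LEEP: {leep(probs, labels, n_target=4):.3f}")
```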
Figure 1: A system overview of a radio communications system. In a radio communications system, the transmitter and receiver hardware and synchronization will be imperfect, causing non-zero values of $\alpha_\Delta[t]$, $\omega_\Delta[t]$, and $\theta_\Delta[t]$. The wireless channel provides additive noise, $\nu[t]$.
Figure 2: In traditional ML (a), a new model is trained from random initialization for each domain/task pairing. TL (b) utilizes prior knowledge learned on one domain/task, in the form of a pre-trained model, to improve performance on a second domain and/or task. A concrete example for environmental adaptation to signal-to-noise ratio (SNR) is given in blue.
Figure 3: A system overview of the (a) dataset creation, (b) model pre-training and TL, and (c) model evaluation and transferability metric calculation processes used in this work.
Figure 4: The LEEP (a) and LogME (b) scores versus post-transfer top-1 accuracy for the sweep over SNR. The dashed lines present the linear fits for all target domains.
Figure 5: The LEEP (a) and LogME (b) scores versus post-transfer top-1 accuracy for the sweep over FO. The dashed lines present the linear fits for all target domains.
Figure 6: The LEEP (a) and LogME (b) scores versus post-transfer top-1 accuracy for the sweep over both SNR and FO. The dashed lines present the linear fits for all target domains.
Figure 7: The LEEP versus LogME scores for the sweep over SNR, FO, and both SNR and FO. The dashed lines present the linear fit.
Figure 8: The error in the predicted post-transfer accuracy using a linear fit to the LEEP scores (x-axis) and LogME scores (y-axis) for the sweep over SNR.
Figure 9: The error in the predicted post-transfer accuracy using a linear fit to the LEEP scores (x-axis) and LogME scores (y-axis) for the sweep over FO. Note the change in scale compared to Figures 8 and 10.
Figure 10: The error in the predicted post-transfer accuracy using a linear fit to the LEEP scores (x-axis) and LogME scores (y-axis) for the sweep over both SNR and FO.
26 pages, 3308 KiB  
Article
Enhancing Visitor Forecasting with Target-Concatenated Autoencoder and Ensemble Learning
by Ray-I Chang, Chih-Yung Tsai and Yu-Wei Chang
Mach. Learn. Knowl. Extr. 2024, 6(3), 1673-1698; https://doi.org/10.3390/make6030083 - 25 Jul 2024
Viewed by 1134
Abstract
Accurate forecasting of inbound visitor numbers is crucial for effective planning and resource allocation in the tourism industry. Preceding forecasting algorithms primarily focused on time series analysis, often overlooking influential factors such as economic conditions. Regression models, on the other hand, face challenges [...] Read more.
Accurate forecasting of inbound visitor numbers is crucial for effective planning and resource allocation in the tourism industry. Preceding forecasting algorithms primarily focused on time series analysis, often overlooking influential factors such as economic conditions. Regression models, on the other hand, face challenges when dealing with high-dimensional data. Previous autoencoders for feature selection do not incorporate feature and target information simultaneously, potentially limiting their effectiveness in improving predictive performance. This study presents a novel approach that combines a target-concatenated autoencoder (TCA) with ensemble learning to enhance the accuracy of tourism demand predictions. The TCA method integrates the prediction target into the training process, ensuring that the learned feature representations are optimized for specific forecasting tasks. Extensive experiments conducted on the Taiwan and Hawaii datasets demonstrate that the proposed TCA method significantly outperforms traditional feature selection techniques and other advanced algorithms in terms of the mean absolute percentage error (MAPE), mean absolute error (MAE), and coefficient of determination (R2). The results show that TCA combined with XGBoost achieves MAPE values of 3.3947% and 4.0059% for the Taiwan and Hawaii datasets, respectively, indicating substantial improvements over existing methods. Additionally, the proposed approach yields better R2 and MAE metrics than existing methods, further demonstrating its effectiveness. This study highlights the potential of TCA in providing reliable and accurate forecasts, thereby supporting strategic planning, infrastructure development, and sustainable growth in the tourism sector. Future research is advised to explore real-time data integration, expanded feature sets, and hybrid modeling approaches to further enhance the capabilities of the proposed framework. Full article
(This article belongs to the Special Issue Sustainable Applications for Machine Learning)
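A minimal sketch of one plausible reading of the TCA idea, written in PyTorch: the autoencoder's reconstruction target is the feature vector concatenated with the forecasting target, so the learned code is shaped by the prediction task. Layer sizes and training details are assumptions, not the authors' exact architecture.

```python
# Minimal sketch: autoencoder trained to reconstruct [features; target].
import torch
import torch.nn as nn

n_features, code_dim = 50, 8
encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(), nn.Linear(32, code_dim))
decoder = nn.Sequential(nn.Linear(code_dim, 32), nn.ReLU(), nn.Linear(32, n_features + 1))

opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
mse = nn.MSELoss()

X = torch.randn(256, n_features)   # placeholder economic/tourism indicators
y = torch.randn(256, 1)            # placeholder visitor arrivals (scaled)

for _ in range(100):
    recon = decoder(encoder(X))
    loss = mse(recon, torch.cat([X, y], dim=1))   # reconstruct features AND target
    opt.zero_grad()
    loss.backward()
    opt.step()

codes = encoder(X).detach()        # compact features for a downstream XGBoost model
```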
Figure 1: Flowchart of the TCA and ensemble learning approach for enhanced visitor forecasting.
Figure 2: The TCA architecture for enhanced visitor forecasting.
Figure 3: Visualization of feature importance calculation in the TCA.
Figure 4: Time series of visitor arrivals per month: (a) visitor arrivals per month in Taiwan (2001–2023); (b) visitor arrivals per month in Hawaii (1999–2024).
Figure 5: Metrics vs. encoding dimension of the target-concatenated autoencoder: (a) on the Taiwan dataset for various models; (b) on the Hawaii dataset for various models.
Figure 6: Metrics vs. learning rate for the TCA: (a) on the Taiwan dataset for various models; (b) on the Hawaii dataset for various models.
Figure 7: Metrics vs. ablation configuration for the TCA: (a) on the Taiwan dataset for various models; (b) on the Hawaii dataset for various models.
Figure 8: (a) True vs. predicted values for the TCA + XGBoost model on the Taiwan dataset. (b) True vs. forecasted values over the testing period for the Taiwan dataset.
3 pages, 500 KiB  
Reply
Reply to Damaševičius, R. Comment on “Novozhilova et al. More Capable, Less Benevolent: Trust Perceptions of AI Systems across Societal Contexts. Mach. Learn. Knowl. Extr. 2024, 6, 342–366”
by Ekaterina Novozhilova, Kate Mays, Sejin Paik and James Katz
Mach. Learn. Knowl. Extr. 2024, 6(3), 1670-1672; https://doi.org/10.3390/make6030082 - 22 Jul 2024
Viewed by 756
Abstract
We would like to thank Dr [...] Full article
(This article belongs to the Special Issue Fairness and Explanation for Trustworthy AI)
3 pages, 498 KiB  
Comment
Comment on Novozhilova et al. More Capable, Less Benevolent: Trust Perceptions of AI Systems across Societal Contexts. Mach. Learn. Knowl. Extr. 2024, 6, 342–366
by Robertas Damaševičius
Mach. Learn. Knowl. Extr. 2024, 6(3), 1667-1669; https://doi.org/10.3390/make6030081 - 22 Jul 2024
Cited by 1 | Viewed by 800
Abstract
The referenced article [...] Full article
(This article belongs to the Section Learning)
16 pages, 2887 KiB  
Article
Global and Local Interpretable Machine Learning Allow Early Prediction of Unscheduled Hospital Readmission
by Rafael Ruiz de San Martín, Catalina Morales-Hernández, Carmen Barberá, Carlos Martínez-Cortés, Antonio Jesús Banegas-Luna, Francisco José Segura-Méndez, Horacio Pérez-Sánchez, Isabel Morales-Moreno and Juan José Hernández-Morante
Mach. Learn. Knowl. Extr. 2024, 6(3), 1653-1666; https://doi.org/10.3390/make6030080 - 17 Jul 2024
Viewed by 1202
Abstract
Nowadays, most of the health expenditure is due to chronic patients who are readmitted several times for their pathologies. Personalized prevention strategies could be developed to improve the management of these patients. The aim of the present work was to develop local predictive [...] Read more.
Nowadays, most of the health expenditure is due to chronic patients who are readmitted several times for their pathologies. Personalized prevention strategies could be developed to improve the management of these patients. The aim of the present work was to develop local predictive models using interpretable machine learning techniques to early identify individual unscheduled hospital readmissions. To do this, a retrospective, case-control study, based on information regarding patient readmission in 2018–2019, was conducted. After curation of the initial dataset (n = 76,210), the final number of participants was n = 29,026. A machine learning analysis was performed following several algorithms using unscheduled hospital readmissions as the dependent variable. Local model-agnostic interpretability methods were also performed. We observed a 13% rate of unscheduled hospital readmissions cases. There were statistically significant differences regarding age and days of stay (p < 0.001 in both cases). A logistic regression model revealed chronic therapy (odds ratio: 3.75), diabetes mellitus history (odds ratio: 1.14), and days of stay (odds ratio: 1.02) as relevant factors. Machine learning algorithms yielded better results regarding sensitivity and other metrics. Following this procedure, days of stay and age were the most important factors to predict unscheduled hospital readmissions. Interestingly, other variables like allergies and adverse drug reaction antecedents were relevant. Individualized prediction models also revealed a high sensitivity. In conclusion, our study identified significant factors influencing unscheduled hospital readmissions, emphasizing the impact of age and length of stay. We introduced a personalized risk model for predicting hospital readmissions with notable accuracy. Future research should include more clinical variables to refine this model further. Full article
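The odds-ratio analysis reported above follows directly from a fitted logistic regression, since exponentiating each coefficient yields that factor's odds ratio. The sketch below illustrates this on synthetic stand-ins for the clinical variables named in the abstract; it is not the study's code or data.

```python
# Minimal sketch: logistic regression odds ratios for readmission risk factors.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
features = ["chronic_therapy", "diabetes_history", "days_of_stay"]
X = np.column_stack([
    rng.integers(0, 2, 1000),        # chronic therapy (yes/no)
    rng.integers(0, 2, 1000),        # diabetes mellitus history (yes/no)
    rng.poisson(5, 1000),            # days of stay
])
logits = 1.3 * X[:, 0] + 0.13 * X[:, 1] + 0.02 * X[:, 2] - 2.0
y = rng.random(1000) < 1 / (1 + np.exp(-logits))   # simulated readmission outcome

model = LogisticRegression(max_iter=1000).fit(X, y)
for name, coef in zip(features, model.coef_[0]):
    print(f"{name}: odds ratio = {np.exp(coef):.2f}")
```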
Figure 1: Overview of the study design.
Figure 2: Characteristics of the population according to the presence or absence of unscheduled hospital readmission. (a) shows the distribution of ages depending on whether patients were readmitted or not. The distribution of readmissions by age is shown in (b). The association with sex is shown in (c). The distribution of patients' days of stay based on readmission is shown in (d). Patients who were readmitted were older (t-test p < 0.001) and stayed longer (t-test p < 0.0001).
Figure 3: Observed (A) and machine learning (B) unscheduled hospital readmission (UHR) probability according to ZIP code. The color scale represents the probability of unscheduled hospital readmission, with green showing the lowest probability and red the highest.
Figure 4: The impact of the input features on unscheduled hospital readmission predictions. Each dot represents the effect of a feature on the prediction for one patient. Redder dots correspond to higher feature values and bluer dots to lower feature values. Dots to the left on the x-axis represent patients whose feature values decrease the predicted readmission risk, and dots to the right represent patients whose feature values increase it.
Figure 5: Examples of individualized unscheduled hospital readmission (UHR) prediction. (a) is an example of personalized risk factor analysis for a patient in the test set identified as UHR. (b) is an example of personalized risk factor analysis for an individual identified as no-UHR.
20 pages, 4689 KiB  
Article
Extending Multi-Output Methods for Long-Term Aboveground Biomass Time Series Forecasting Using Convolutional Neural Networks
by Efrain Noa-Yarasca, Javier M. Osorio Leyton and Jay P. Angerer
Mach. Learn. Knowl. Extr. 2024, 6(3), 1633-1652; https://doi.org/10.3390/make6030079 - 17 Jul 2024
Cited by 1 | Viewed by 1499
Abstract
Accurate aboveground vegetation biomass forecasting is essential for livestock management, climate impact assessments, and ecosystem health. While artificial intelligence (AI) techniques have advanced time series forecasting, a research gap in predicting aboveground biomass time series beyond single values persists. This study introduces RECMO [...] Read more.
Accurate aboveground vegetation biomass forecasting is essential for livestock management, climate impact assessments, and ecosystem health. While artificial intelligence (AI) techniques have advanced time series forecasting, a research gap in predicting aboveground biomass time series beyond single values persists. This study introduces RECMO and DirRecMO, two multi-output methods for forecasting aboveground vegetation biomass. Using convolutional neural networks, their efficacy is evaluated across short-, medium-, and long-term horizons on six Kenyan grassland biomass datasets, and compared with that of existing single-output methods (Recursive, Direct, and DirRec) and multi-output methods (MIMO and DIRMO). The results indicate that single-output methods are superior for short-term predictions, while both single-output and multi-output methods exhibit a comparable effectiveness in long-term forecasts. RECMO and DirRecMO outperform established multi-output methods, demonstrating a promising potential for biomass forecasting. This study underscores the significant impact of multi-output size on forecast accuracy, highlighting the need for optimal size adjustments and showcasing the proposed methods’ flexibility in long-term forecasts. Short-term predictions show less significant differences among methods, complicating the identification of the best performer. However, clear distinctions emerge in medium- and long-term forecasts, underscoring the greater importance of method choice for long-term predictions. Moreover, as the forecast horizon extends, errors escalate across all methods, reflecting the challenges of predicting distant future periods. This study suggests advancing hybrid models (e.g., RECMO and DirRecMO) to improve extended horizon forecasting. Future research should enhance adaptability, investigate multi-output impacts, and conduct comparative studies across diverse domains, datasets, and AI algorithms for robust insights. Full article
(This article belongs to the Section Network)
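The supervised-windowing step shown in Figure 2 below, which all of the compared strategies build on, can be sketched in a few lines: slide a fixed-size input window over the series and pair it with the next horizon values as a multi-output target. Window and horizon sizes here are illustrative, and the series is synthetic.

```python
# Minimal sketch: convert a time series into (window, multi-output horizon) pairs,
# the supervised form that MIMO-style strategies such as RECMO train on.
import numpy as np

def make_supervised(series: np.ndarray, window: int, horizon: int):
    X, Y = [], []
    for t in range(len(series) - window - horizon + 1):
        X.append(series[t : t + window])                     # predictor subsequence
        Y.append(series[t + window : t + window + horizon])  # multi-output target
    return np.array(X), np.array(Y)

biomass = np.sin(np.linspace(0, 20, 300)) * 500 + 1500       # synthetic kg/ha series
X, Y = make_supervised(biomass, window=24, horizon=6)
print(X.shape, Y.shape)                                      # (271, 24) (271, 6)
```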
Figure 1: Time series of aboveground vegetation biomass (kg/ha) derived from calibrated PHYGROW model simulations at six rangeland locations in Kenya. Each subfigure (a–f) represents time series data from a representative location.
Figure 2: Conversion of time series data into a supervised dataset. Blue squares: predictor subsequences inputted to the model (window size w). Orange squares: predicted values.
Figure 3: Overview of the examined and prospective forecasting methods. Blue squares: predictor subsequences inputted to the model (window size w). Orange squares: predicted values.
Figure 4: Architecture of the Convolutional Neural Network.
Figure 5: Average RMSE values across different forecasting methods for aboveground vegetation biomass (kg/ha) across a horizon (H) of 24. Each subfigure (a–f) displays the results for the time series from a representative study location.
Figure 6: Relative RMSE values of aboveground vegetation biomass (kg/ha) across different forecasting methods in relation to forecast horizons. Relative RMSE is calculated as a percentage by dividing the average RMSE by the time series mean.
14 pages, 3044 KiB  
Article
Examining the Global Patent Landscape of Artificial Intelligence-Driven Solutions for COVID-19
by Fabio Mota, Luiza Amara Maciel Braga, Bernardo Pereira Cabral, Natiele Carla da Silva Ferreira, Cláudio Damasceno Pinto, José Aguiar Coelho and Luiz Anastacio Alves
Mach. Learn. Knowl. Extr. 2024, 6(3), 1619-1632; https://doi.org/10.3390/make6030078 - 16 Jul 2024
Viewed by 1572
Abstract
Artificial Intelligence (AI) technologies have been widely applied to tackle Coronavirus Disease 2019 (COVID-19) challenges, from diagnosis to prevention. Patents are a valuable source for understanding the AI technologies used in the COVID-19 context, allowing the identification of the current technological scenario, fields [...] Read more.
Artificial Intelligence (AI) technologies have been widely applied to tackle Coronavirus Disease 2019 (COVID-19) challenges, from diagnosis to prevention. Patents are a valuable source for understanding the AI technologies used in the COVID-19 context, allowing the identification of the current technological scenario, fields of application, and research, development, and innovation trends. This study aimed to analyze the global patent landscape of AI applications related to COVID-19. To do so, we analyzed AI-related COVID-19 patent metadata collected in the Derwent Innovations Index using systematic review, bibliometrics, and network analysis. Our results show diagnosis as the most frequent application field, followed by prevention. Deep Learning algorithms, such as Convolutional Neural Network (CNN), were predominantly used for diagnosis, while Machine Learning algorithms, such as Support Vector Machine (SVM), were mainly used for prevention. The most frequent International Patent Classification Codes were related to computing arrangements based on specific computational models, information, and communication technology for detecting, monitoring, or modeling epidemics or pandemics, and methods or arrangements for pattern recognition using electronic means. The most central algorithms of the two-mode network were CNN, SVM, and Random Forest (RF), while the most central application fields were diagnosis, prevention, and forecast. The most significant connection between algorithms and application fields occurred between CNN and diagnosis. Our findings contribute to a better understanding of the technological landscape involving AI and COVID-19, and we hope they can inform future research and development’s decision making and planning. Full article
(This article belongs to the Section Data)
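The two-mode network analysis behind Figure 5 can be illustrated with networkx: algorithms and application fields form the two node sets, edge weights count co-occurring patent records, and weighted degree ranks the most central nodes. The counts below are made up for illustration and are not the study's data.

```python
# Minimal sketch: weighted degree centrality on a two-mode (bipartite) network.
import networkx as nx

G = nx.Graph()
edges = [("CNN", "diagnosis", 40), ("SVM", "prevention", 15),
         ("RF", "forecast", 9), ("CNN", "prevention", 6)]
for algo, field, w in edges:
    G.add_edge(algo, field, weight=w)

weighted_degree = dict(G.degree(weight="weight"))
for node, wd in sorted(weighted_degree.items(), key=lambda kv: -kv[1]):
    print(node, wd)
```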
Figure 1: Patent records' classification into AI subgroups. All 142 records were classified as AI-related for citing AI in general, algorithms such as chatbots and NLP, or ML or DL. Of these, 124 records were classified as ML-related for citing ML, DL, or their related algorithms. Moreover, 78 records were classified as DL-related for specifically citing DL or its related algorithms.
Figure 2: Patent records' classification into AI-related algorithms and application fields. (a) Most frequent AI-related algorithms (frequency greater than or equal to three). (b) Most frequent application fields. Algorithm abbreviations: Convolutional Neural Network (CNN), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Artificial Neural Network (ANN), K-nearest neighbors (K-NN), Natural Language Processing (NLP), Multilayer Perceptron (MLP), Deep Neural Network (DNN), Fully Convolutional Network (FCN), Principal Component Analysis (PCA), and Recurrent Neural Network (RNN).
Figure 3: IPC8 analysis. (a) Most frequent IPC codes (frequency greater than or equal to five). (b) Sunburst of IPC codes (all records), showing the hierarchical structure of the IPC codes from the most aggregate (the innermost layer) to the least aggregate (the outermost layer). Segment sizes reflect their share of the data.
Figure 4: Network of IPC codes with a frequency greater than or equal to five. The network consists of 30 nodes and 237 edges. Node sizes and colors and edge thicknesses are given by the weighted degree centrality. The network layout was given by the Fruchterman–Reingold algorithm.
Figure 5: Two-mode network of AI-related algorithms and application fields. The network consists of 25 nodes and 144 edges. Gray nodes are algorithms (17), and blue nodes are application fields (8). Node sizes and edge thicknesses are given by the weighted degree centrality. The network layout was given by the Fruchterman–Reingold algorithm. Algorithm abbreviations: Convolutional Neural Network (CNN), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), Artificial Neural Network (ANN), K-nearest neighbors (K-NN), Natural Language Processing (NLP), Multilayer Perceptron (MLP), Deep Neural Network (DNN), Fully Convolutional Network (FCN), Principal Component Analysis (PCA), and Recurrent Neural Network (RNN).
22 pages, 1933 KiB  
Article
Learning Effective Good Variables from Physical Data
by Giulio Barletta, Giovanni Trezza and Eliodoro Chiavazzo
Mach. Learn. Knowl. Extr. 2024, 6(3), 1597-1618; https://doi.org/10.3390/make6030077 - 12 Jul 2024
Cited by 2 | Viewed by 1091
Abstract
We assume that a sufficiently large database is available, where a physical property of interest and a number of associated ruling primitive variables or observables are stored. We introduce and test two machine learning approaches to discover possible groups or combinations of primitive [...] Read more.
We assume that a sufficiently large database is available, where a physical property of interest and a number of associated ruling primitive variables or observables are stored. We introduce and test two machine learning approaches to discover possible groups or combinations of primitive variables, regardless of data origin, be it numerical or experimental: the first approach is based on regression models, whereas the second on classification models. The variable group (here referred to as the new effective good variable) can be considered as successfully found when the physical property of interest is characterized by the following effective invariant behavior: in the first method, invariance of the group implies invariance of the property up to a given accuracy; in the other method, upon partition of the physical property values into two or more classes, invariance of the group implies invariance of the class. For the sake of illustration, the two methods are successfully applied to two popular empirical correlations describing the convective heat transfer phenomenon and to the Newton’s law of universal gravitation. Full article
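To make the notion of an invariant group concrete: if a property f depends on (x1, x2) only through the group g = x1^a · x2^b, then f must stay constant along any level set of g. The sketch below checks this numerically for a toy f with known exponents, whereas the paper's method learns the exponents from data; it is illustrative only.

```python
# Minimal sketch: verify that f is constant along a level set of g = x1^a * x2^b.
import numpy as np

def f(x1, x2):
    return np.log(x1**2 * x2) + 1.0          # invariant under g = x1^2 * x2

def invariant_along_level_set(f, x1, x2, a, b, scale=1.3, tol=1e-8):
    # Rescale so that x1^a * x2^b is unchanged: x1 -> s*x1, x2 -> s^(-a/b)*x2.
    x1_new, x2_new = scale * x1, scale ** (-a / b) * x2
    return abs(f(x1, x2) - f(x1_new, x2_new)) < tol

print(invariant_along_level_set(f, 2.0, 3.0, a=2, b=1))   # True
```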
Show Figures

Graphical abstract

Graphical abstract
Figure 1">
Figure 1: Overview of the protocol used to detect possible symmetries of a target property of interest with respect to its input variables, utilizing only data and ignoring the analytical functional dependence. Two distinct methodologies are presented: the former identifies, in regression tasks, invariant groups of the form $x_i^{\alpha_1} x_j^{\alpha_2} \cdots x_m^{\alpha_p}$, among others; the latter identifies, in classification tasks, one or several mixed features, as power combinations of the input variables, that achieve an optimal class separation.

Figure 2: Overview of the procedure for identifying invariant groups/sets. A regression model is trained on the physical data and used to compute the gradient of the objective function at a point $\mathbf{x}_0$. The matrix $\mathbf{B}$ is constructed according to the functional structure of the investigated group/set, and its kernel $\mathbf{K}$ is computed. Finally, the condition of invariance between the gradient and the kernel is coupled with the normalization conditions on the coefficients. If the resulting non-linear system is satisfied by the same coefficients over the domain of $f(\mathbf{x})$, the group/set is an intrinsic variable and $f(\mathbf{x})$ is invariant with respect to it.
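The gradient condition in Figure 2 can be sketched numerically. In the toy check below (our illustration, not the authors' code), Newton's law stands in for the trained regression model, finite differences replace automatic differentiation, and the candidate exponents are supplied rather than solved for: f is invariant along every direction that keeps the group constant exactly when the gradient of f is parallel to the gradient of the group.

```python
import numpy as np

def f(x):
    # Physical property: Newton's gravitation with G = 1 (illustrative
    # stand-in for a trained regression model), x = (m1, m2, r).
    m1, m2, r = x
    return m1 * m2 / r**2

def grad(func, x, h=1e-6):
    # Central finite differences in place of automatic differentiation.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (func(x + e) - func(x - e)) / (2 * h)
    return g

def grad_log_group(alpha, x):
    # Gradient of log g for g(x) = prod_i x_i**alpha_i, i.e. alpha_i / x_i;
    # it is parallel to the gradient of g itself, and only direction matters.
    return alpha / x

x0 = np.array([2.0, 3.0, 1.5])
alpha = np.array([1.0, 1.0, -2.0])  # candidate exponents for m1^1 m2^1 r^-2

gf = grad(f, x0)
gg = grad_log_group(alpha, x0)

# The invariance condition: grad f parallel to grad g, so the cosine of the
# angle between the two vectors should be (close to) 1 across the domain.
cosine = gf @ gg / (np.linalg.norm(gf) * np.linalg.norm(gg))
print(f"cosine(grad f, grad g) = {cosine:.6f}")  # ~1.0 -> invariant group
```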
Figure 3">
Figure 3: Overview of the procedure to identify optimal mixed variables for class separation. Threshold values are chosen to divide the physical data into classes. A Pareto optimization is performed to construct a reduced set of synthetic features that simultaneously maximizes the Bhattacharyya distance between the classes and (i) minimizes the variances of the class distributions in the one-dimensional case, or (ii) minimizes the determinants of the covariance matrices of the class distributions in the multi-dimensional case.
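For intuition about the separation objective, here is a small sketch under a Gaussian simplification (ours; the paper works with binned PDFs and GEV fits): the closed-form Bhattacharyya distance between two normal class distributions, scanned over the exponent of a candidate mixed feature u*d**p on synthetic data with a known power-law structure.

```python
import numpy as np

def bhattacharyya_gauss(a, b):
    # Closed-form Bhattacharyya distance between two 1D Gaussians,
    # each summarized by its sample mean and variance.
    mu1, var1 = np.mean(a), np.var(a)
    mu2, var2 = np.mean(b), np.var(b)
    return (0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
            + 0.5 * np.log((var1 + var2) / (2.0 * np.sqrt(var1 * var2))))

rng = np.random.default_rng(0)
u = rng.uniform(0.5, 2.0, 4000)       # stand-in primitive variable, e.g. velocity
d = rng.uniform(0.5, 2.0, 4000)       # stand-in primitive variable, e.g. diameter
target = u ** 0.8 * d ** (-0.2)       # toy "property" with a known power structure
labels = target >= np.median(target)  # binary classes from a threshold

# Scan the exponent p of the mixed feature y = u * d**p and keep the value
# that best separates the two classes (largest Bhattacharyya distance).
best = max(
    ((bhattacharyya_gauss((u * d ** p)[labels], (u * d ** p)[~labels]), p)
     for p in np.linspace(-1.0, 1.0, 81)),
    key=lambda t: t[0],
)
print(f"best exponent p = {best[1]:+.3f}, Bhattacharyya distance = {best[0]:.3f}")
```

On this toy target $u^{0.8} d^{-0.2}$, the scan peaks near $p = -0.25$, the exponent ratio that collapses the two primitive variables into a single good variable.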
Figure 4">
Figure 4: Results of the DNN regression model for the noised Nusselt number $\overline{\mathrm{Nu}}$ in the Dittus–Boelter correlation. (a) Predictions over the testing set, and (b) the corresponding loss curves for the DNN model. Model performance is reported in terms of the coefficient of determination $R^2$, the mean absolute error (MAE), and the root mean squared error (RMSE).

Figure 5: One-dimensional classification example on the Dittus–Boelter correlation. (a) PDFs over binned training data for the two classes ($\overline{\mathrm{Nu}} < 395$ and $\overline{\mathrm{Nu}} \geq 395$), plotted against the normalized flow velocity. (b) PDFs over binned training data for the two classes, plotted against the mixed feature $\tilde{y}_1$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap between the two classes according to the Bhattacharyya distance, along with a GEV analytical fit of the two binnings. (c) PDFs over binned testing data for the two classes, plotted against the same mixed feature $\tilde{y}_1$ together with the GEV fits from panel (b). The mixed variable $\tilde{y}_1$ shown here refers exclusively to this Dittus–Boelter one-dimensional optimization.

Figure 6: Two-dimensional classification example on the Dittus–Boelter correlation. (a) PDFs over binned training data for the two classes ($\overline{\mathrm{Nu}} < 395$ and $\overline{\mathrm{Nu}} \geq 395$), plotted against the normalized flow velocity $u_{\mathrm{norm}}$ and the normalized hydraulic diameter $d_{\mathrm{norm}}$. (b) PDFs over binned training data for the two classes, plotted against the mixed features $\tilde{y}_1, \tilde{y}_2$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap between the two classes according to the Bhattacharyya distance. (c) PDFs over binned testing data for the two classes, plotted against the same mixed features $\tilde{y}_1, \tilde{y}_2$. The mixed variables $\tilde{y}_1, \tilde{y}_2$ shown here refer exclusively to this Dittus–Boelter two-dimensional optimization.

Figure 7: One-dimensional ternary-classification example on the Dittus–Boelter correlation. (a) PDFs over binned training data for the three classes ($\overline{\mathrm{Nu}} < 197.5$, $197.5 \leq \overline{\mathrm{Nu}} < 395$, and $\overline{\mathrm{Nu}} \geq 395$), plotted against the normalized flow velocity $u_{\mathrm{norm}}$. (b) PDFs over binned training data for the three classes, plotted against the mixed feature $\tilde{y}_1$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap among the three classes according to the Bhattacharyya distance, along with a GEV analytical fit of the three binnings. (c) PDFs over binned testing data for the three classes, plotted against the same mixed feature $\tilde{y}_1$ together with the GEV fits from panel (b). The mixed variable $\tilde{y}_1$ shown here refers exclusively to this Dittus–Boelter one-dimensional optimization.

Figure 8: Results of the DNN regression model for the noised Nusselt number $\overline{\mathrm{Nu}}$ in the Gnielinski correlation. (a) Predictions over the testing set, and (b) the corresponding loss curves for the DNN model. Model performance is reported in terms of the coefficient of determination $R^2$, the mean absolute error (MAE), and the root mean squared error (RMSE).

Figure 9: One-dimensional classification example on the Gnielinski correlation. (a) PDFs over binned training data for the two classes ($\overline{\mathrm{Nu}} < 500$ and $\overline{\mathrm{Nu}} \geq 500$), plotted against the normalized flow velocity. (b) PDFs over binned training data for the two classes, plotted against the mixed feature $\tilde{y}_1$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap between the two classes according to the Bhattacharyya distance, along with a GEV analytical fit of the two binnings. (c) PDFs over binned testing data for the two classes, plotted against the same mixed feature $\tilde{y}_1$ together with the GEV fits from panel (b). The mixed variable $\tilde{y}_1$ shown here refers exclusively to this Gnielinski one-dimensional optimization.

Figure 10: Two-dimensional classification example on the Gnielinski correlation. (a) PDFs over binned training data for the two classes ($\overline{\mathrm{Nu}} < 500$ and $\overline{\mathrm{Nu}} \geq 500$), plotted against the normalized flow velocity $u_{\mathrm{norm}}$ and the friction factor $f$. (b) PDFs over binned training data for the two classes, plotted against the mixed features $\tilde{y}_1, \tilde{y}_2$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap between the two classes according to the Bhattacharyya distance. (c) PDFs over binned testing data for the two classes, plotted against the same mixed features $\tilde{y}_1, \tilde{y}_2$. The mixed variables $\tilde{y}_1, \tilde{y}_2$ shown here refer exclusively to this Gnielinski two-dimensional optimization.

Figure 11: One-dimensional ternary-classification example on the Gnielinski correlation. (a) PDFs over binned training data for the three classes ($\overline{\mathrm{Nu}} < 400$, $400 \leq \overline{\mathrm{Nu}} < 900$, and $\overline{\mathrm{Nu}} \geq 900$), plotted against the normalized flow velocity $u_{\mathrm{norm}}$. (b) PDFs over binned training data for the three classes, plotted against the mixed feature $\tilde{y}_1$ constructed according to Equation (13) by choosing the point of the Pareto front with the least overlap among the three classes according to the Bhattacharyya distance, along with a GEV analytical fit of the three binnings. (c) PDFs over binned testing data for the three classes, plotted against the same mixed feature $\tilde{y}_1$ together with the GEV fits from panel (b). The mixed variable $\tilde{y}_1$ shown here refers exclusively to this Gnielinski one-dimensional optimization.

Figure 12: Results of the DNN regression model for the noised gravitational force $F_{\mathrm{g}}$. (a) Predictions over the testing set, and (b) the corresponding loss curves for the DNN model. Model performance is reported in terms of the coefficient of determination $R^2$, the mean absolute error (MAE), and the root mean squared error (RMSE).
18 pages, 31818 KiB  
Article
Deep Learning-Powered Optical Microscopy for Steel Research
by Šárka Mikmeková, Martin Zouhar, Jan Čermák, Ondřej Ambrož, Patrik Jozefovič, Ivo Konvalina, Eliška Materna Mikmeková and Jiří Materna
Mach. Learn. Knowl. Extr. 2024, 6(3), 1579-1596; https://doi.org/10.3390/make6030076 - 11 Jul 2024
Cited by 1 | Viewed by 1326
Abstract
The success of machine learning (ML) models in object and pattern recognition naturally leads to ML being employed to classify the microstructure of steel surfaces. Light optical microscopy (LOM) is the traditional imaging process in this field. However, the increasing use of ML to extract or relate more aspects of the aforementioned materials, together with the limitations of LOM, motivated us to improve the established image acquisition process. In essence, we perform style transfer from LOM to scanning electron microscopy (SEM), combined with “intelligent” upscaling. This is achieved by employing an ML model trained on a multimodal dataset to generate an SEM-like image from the corresponding LOM image. In our opinion, which is corroborated by a detailed analysis of the source, target, and prediction, this transformation successfully pushes the limits of LOM in the case of steel surfaces. The expected consequence is an improvement in the precise characterization of the structure of advanced multiphase steels based on these transformed LOM images.
(This article belongs to the Section Learning)
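The abstract's core mechanics (predict an SEM-like, higher-resolution image from a LOM image, tile by tile, then stitch) can be sketched as below. The tile size, the integer upscaling factor, and the predict_tile placeholder are our own illustrative assumptions; as Figure 11 later notes, naive stitching of predicted tiles can leave minor seam artifacts.

```python
import numpy as np

TILE, SCALE = 512, 4  # assumed input tile size and model upscaling factor

def predict_tile(tile):
    # Placeholder for the trained LOM -> SEM-like generator; a real model
    # (e.g. the paper's U-Net/GAN) would map a grayscale LOM tile to a
    # higher-resolution SEM-like tile. Here: plain pixel repetition.
    return np.repeat(np.repeat(tile, SCALE, axis=0), SCALE, axis=1)

def lom_to_sem_like(lom):
    # Pad the LOM image to a multiple of the tile size, predict tile by tile,
    # and stitch the outputs back together (naive, seam-prone stitching).
    h, w = lom.shape
    ph, pw = -h % TILE, -w % TILE
    padded = np.pad(lom, ((0, ph), (0, pw)), mode="reflect")
    out = np.zeros((padded.shape[0] * SCALE, padded.shape[1] * SCALE),
                   dtype=lom.dtype)
    for i in range(0, padded.shape[0], TILE):
        for j in range(0, padded.shape[1], TILE):
            pred = predict_tile(padded[i:i + TILE, j:j + TILE])
            out[i * SCALE:(i + TILE) * SCALE,
                j * SCALE:(j + TILE) * SCALE] = pred
    return out[:h * SCALE, :w * SCALE]  # crop back to the LOM field of view

sem_like = lom_to_sem_like(np.zeros((2056, 2464), dtype=np.float32))
print(sem_like.shape)  # (8224, 9856): 4x the LOM pixel count in each direction
```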
Show Figures

Graphical abstract
Figure 1">
Figure 1: Schematic of the CBS and ETD detectors’ arrangement in the HR mode for collecting backscattered electrons (BSEs) emitted from the sample.

Figure 2: Illustrations of different navigation grids: (a) a TEM grid glued onto the sample (early data only), and (b) a picosecond laser-engraved navigation grid, with a subgrid and one of its individual cells highlighted in the top-left corner; a single square subgrid and its elemental square are highlighted in red. Republished from Ref. [19] with permission.

Figure 3: Step-by-step workflow employed to generate the final dataset of correlative images of a bulk metallographic sample captured using the SEM, CLSM, and LOM modalities. The process begins with the engraving of a navigation grid and culminates in the creation of the final dataset. Republished from Ref. [19] with permission.

Figure 4: The U-Net neural network architecture. Both input and output are grayscale images, and their three dimensions (pixels in both directions and the number of channels) are given explicitly. All other single values beside individual components represent the number of features/filters in the corresponding convolutional layers, except where otherwise noted. The number of pixels follows from the operations performed; every convolution is padded at the edges (TensorFlow’s Conv2D with its padding parameter set to ‘same’) to prevent pixel-count reduction. Two inputs are joined by simple layer concatenation, indicated by a circle.
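A minimal sketch of a U-Net-style generator consistent with this description (grayscale input and output, ‘same’-padded convolutions, skip connections via concatenation); the depth, filter counts, and tile size are our illustrative assumptions, not the paper's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_unet(size=512, base=32):
    # Grayscale in, grayscale out; 'same' padding keeps pixel counts intact.
    inp = layers.Input((size, size, 1))
    skips, x = [], inp
    for f in (base, base * 2, base * 4):          # contracting path
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Conv2D(base * 8, 3, padding="same", activation="relu")(x)
    for f, skip in zip((base * 4, base * 2, base), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skip])       # the "circle" in Figure 4
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    out = layers.Conv2D(1, 1, activation="sigmoid")(x)  # SEM-like grayscale tile
    return tf.keras.Model(inp, out)

model = build_unet()
model.summary()
```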
Figure 5">
Figure 5: Visualization of the discriminator part of the GAN architecture. The U-Net serves as the generator (see Figure 4) and a basic CNN (displayed) as the discriminator. The meaning of the symbols is as in Figure 4. The “zoom” in the left-most part indicates an internal batching process.
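A correspondingly minimal sketch of the adversarial counterpart (again with assumed depth and filter counts; the caption only states that a basic CNN serves as the discriminator):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_discriminator(size=512, base=32):
    # A basic CNN mapping a grayscale tile to a single real/fake probability.
    inp = layers.Input((size, size, 1))
    x = inp
    for f in (base, base * 2, base * 4, base * 8):  # progressive downsampling
        x = layers.Conv2D(f, 3, strides=2, padding="same")(x)
        x = layers.LeakyReLU(0.2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # P(input is a real SEM tile)
    return tf.keras.Model(inp, out)
```

How the two networks are trained against each other (losses, weighting, scheduling) is specified in the article itself; this sketch only mirrors the generator/discriminator split described in the caption.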
Figure 6">
Figure 6: The first row displays as-measured preprocessed data; the second row comprises transformed LOM images. As the labels indicate, the first column represents the U-Net results and the rest are GAN results. Some occurrences of several phases are marked, which may include ferrite (red “F”), pearlite (red “P”), and (other) secondary phases. The displayed field of view represents one corner-aligned 512 × 512 tile, a quarter of a single 1024 × 1024 image in the dataset. The material is construction steel S355J2.

Figure 7: The same as Figure 6 in the case of TRIP1 steel. Yellow “SP” markers are used, and each highlighted example is indicated by an arrow. Some occurrences of several phases are marked, which may include bainite (red “B”) and ferrite (red “F”).

Figure 8: The same as Figure 6 in the case of TRIP2 steel. Yellow “SP” markers are used, and each highlighted example is indicated by an arrow.

Figure 9: The same as Figure 6 in the case of TRIP2 steel (another dataset). Yellow “SP” markers are used, and each highlighted example is indicated by an arrow.

Figure 10: The same as Figure 6 in the case of USIBOR boron steel for hot stamping.

Figure 11: Comparison of an original RGB LOM image (2464 × 2056 pixels) with the CBS-like prediction (8069 × 6745 pixels). Both images were cropped to the same field of view in order to remove minor artifacts in the prediction (due to inelastic stitching of the individual predicted tiles).

Figure 12: Zoom into a region of interest in the case of S355J2 steel (top-right corner of the segments in Figure 6).

Figure 13: The same as Figure 12 in the case of TRIP1 steel (top-right corner of the segments in Figure 7). The arrows highlight visual improvements over LOM. Note that the pure U-Net ETD prediction is missing, as this model was not trained on ETD data.

Figure 14: The same as Figure 12 in the case of TRIP2 steel (slightly below the center of Figure 9). The arrows highlight visual improvements over LOM.