Search Results (2,363)

Search Parameters:
Keywords = XGBoost model

38 pages, 3147 KiB  
Article
A Risk-Optimized Framework for Data-Driven IPO Underperformance Prediction in Complex Financial Systems
by Mazin Alahmadi
Systems 2025, 13(3), 179; https://doi.org/10.3390/systems13030179 - 6 Mar 2025
Abstract
Accurate predictions of Initial Public Offerings (IPOs) aftermarket performance are essential for making informed investment decisions in the financial sector. This paper attempts to predict IPO short-term underperformance during a month post-listing. The current research landscape lacks modern models that address the needs of small and imbalanced datasets relevant to emerging markets, as well as the risk preferences of investors. To fill this gap, we present a practical framework utilizing tree-based ensemble learning, including Bagging Classifier (BC), Random Forest (RF), AdaBoost (Ada), Gradient Boosting (GB), XGBoost (XG), Stacking Classifier (SC), and Extra Trees (ET), with Decision Tree (DT) as a base estimator. The framework leverages data-driven methodologies to optimize decision-making in complex financial systems, integrating ANOVA F-value for feature selection, Randomized Search for hyperparameter optimization, and SMOTE for class balance. The framework’s effectiveness is assessed using a hand-collected dataset that includes features from both pre-IPO prospectus and firm-specific financial data. We thoroughly evaluate the results using single-split evaluation and 10-fold cross-validation analysis. For the single-split validation, ET achieves the highest accuracy of 86%, while for the 10-fold validation, BC achieves the highest accuracy of 70%. Additionally, we compare the results of the proposed framework with deep-learning models such as MLP, TabNet, and ANN to assess their effectiveness in handling IPO underperformance predictions. These results demonstrate the framework’s capability to enable robust data-driven decision-making processes in complex and dynamic financial environments, even with limited and imbalanced datasets. The framework also proposes a dynamic methodology named Investor Preference Prediction Framework (IPPF) to match tree-based ensemble models to investors’ risk preferences when predicting IPO underperformance. 
It concludes that different models may be suitable for various risk profiles. For the dataset at hand, ET and Ada are more appropriate for risk-averse investors, while BC is suitable for risk-tolerant investors. The results underscore the framework’s importance in improving IPO underperformance predictions, which can better inform investment strategies and decision-making processes.
(This article belongs to the Special Issue Data-Driven Decision Making for Complex Systems)
Figures:
Figure 1. The Proposed Framework.
Figure 2. ROC curves of all the classifiers during testing.
Figure 3. Comparison with existing studies using the test dataset [37].
Figure 4. Representation of model selection adjusted for investor’s risk level for single-split validation.
Figure 5. Representation of model selection adjusted for investor’s risk level for 10-fold validation.
Figure 6. Robustness Ratio Curves for Both Single-Split and 10-Fold Validations.
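The abstract above names SMOTE as its class-balancing step. As a rough illustration of the interpolation idea behind SMOTE, and not the authors' implementation, the sketch below creates synthetic minority samples on the segment between a minority point and one of its nearest minority neighbours. The function name `smote_sketch`, the value of `k`, and the sample points are all assumptions made for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # the new point lies on the segment between base and neighbour
        u = rng.random()
        synthetic.append(tuple(b + u * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Made-up minority-class feature vectors, for illustration only
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority, n_new=5)
```

Because every synthetic point is a convex combination of two real minority points, it stays inside the region the minority class already occupies.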
39 pages, 9925 KiB  
Article
Dynamic Workload Management System in the Public Sector: A Comparative Analysis
by Konstantinos C. Giotopoulos, Dimitrios Michalopoulos, Gerasimos Vonitsanos, Dimitris Papadopoulos, Ioanna Giannoukou and Spyros Sioutas
Future Internet 2025, 17(3), 119; https://doi.org/10.3390/fi17030119 - 6 Mar 2025
Abstract
Efficient human resource management is critical to public sector performance, particularly in dynamic environments where traditional systems struggle to adapt to fluctuating workloads. The increasing complexity of public sector operations and the need for equitable task allocation highlight the limitations of conventional evaluation methods, which often fail to account for variations in employee performance and workload demands. This study addresses these challenges by optimizing load distribution through predicting employee capability using data-driven approaches, ensuring efficient resource utilization and enhanced productivity. Using a dataset encompassing public/private sector experience, educational history, and age, we evaluate the effectiveness of seven machine learning algorithms: Linear Regression, Artificial Neural Networks (ANNs), Adaptive Neuro-Fuzzy Inference System (ANFIS), Support Vector Machine (SVM), Gradient Boosting Machine (GBM), Bagged Decision Trees, and XGBoost in predicting employee capability and optimizing task allocation. Performance is assessed through ten evaluation metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE), ensuring a comprehensive assessment of accuracy, robustness, and bias. The results demonstrate ANFIS as the superior model, consistently outperforming other algorithms across all metrics. By synergizing fuzzy logic’s capacity to model uncertainty with neural networks’ adaptive learning, ANFIS effectively captures non-linear relationships and variations in employee performance, enabling precise capability predictions in dynamic environments. This research highlights the transformative potential of machine learning in public sector workforce management, underscoring the role of data-driven decision-making in improving task allocation, operational efficiency, and resource utilization.
Figures:
Figure 1. Employee profile.
Figure 2. Task allocation and CF calculation.
Figure 3. Workflow for determining the capacity factor.
Figure 4. Linear Regression analysis Time Factor.
Figure 5. Configuration 3. ANN Time Factor performance for three layers of 8 × 4 × 4 neurons.
Figure 6. Configuration 3. ANN Training.
Figure 7. Configuration 3. ANN validation performance.
Figure 8. ANFIS performance.
Figure 9. ANFIS Time Factor performance.
Figure 10. GBM Time Factor performance.
Figure 11. Bagged Decision Tree Time Factor performance.
Figure 12. SVM Time Factor performance.
Figure 13. XGBoost Time Factor performance.
Figure 14. Load Control based on CF produced from algorithms. (a) Load Control: ANFIS; (b) Load Control: Regression Analysis; (c) Load Control: ANN; (d) Load Control: Gradient Boosting Machine; (e) Load Control: Bagged Decision Trees; (f) Load Control: Support Vector Machines; (g) Load Control: XGBoost.
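Four of the evaluation metrics named in the abstract above (MSE, RMSE, MAE, MAPE) have standard definitions that can be written out in a few lines. The sketch below uses made-up values; the function name `regression_errors` is an assumption for this example and is not taken from the article.

```python
import math


def regression_errors(y_true, y_pred):
    """Compute MSE, RMSE, MAE and MAPE (in percent) for paired values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(r * r for r in residuals) / n
    mae = sum(abs(r) for r in residuals) / n
    # MAPE is undefined when a true value is zero
    mape = 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape}


# Made-up true and predicted values, for illustration only
errors = regression_errors([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
```

MSE and RMSE penalise large errors more heavily than MAE, while MAPE expresses the error relative to the size of each true value, which is why such metrics are often reported together.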
19 pages, 4910 KiB  
Article
A Novel SHAP-GAN Network for Interpretable Ovarian Cancer Diagnosis
by Jingxun Cai, Zne-Jung Lee, Zhihxian Lin and Ming-Ren Yang
Mathematics 2025, 13(5), 882; https://doi.org/10.3390/math13050882 - 6 Mar 2025
Abstract
Ovarian cancer stands out as one of the most formidable adversaries in women’s health, largely due to its typically subtle and nonspecific early symptoms, which pose significant challenges to early detection and diagnosis. Although existing diagnostic methods, such as biomarker testing and imaging, can help with early diagnosis to some extent, these methods still have limitations in sensitivity and accuracy, often leading to misdiagnosis or missed diagnosis. Ovarian cancer’s high heterogeneity and complexity increase diagnostic challenges, especially in disease progression prediction and patient classification. Machine learning (ML) has outperformed traditional methods in cancer detection by processing large datasets to identify patterns missed by conventional techniques. However, existing AI models still struggle with accuracy in handling imbalanced and high-dimensional data, and their “black-box” nature limits clinical interpretability. To address these issues, this study proposes SHAP-GAN, an innovative diagnostic model for ovarian cancer that integrates Shapley Additive exPlanations (SHAP) with Generative Adversarial Networks (GANs). The SHAP module quantifies each biomarker’s contribution to the diagnosis, while the GAN component optimizes medical data generation. This approach tackles three key challenges in medical diagnosis: data scarcity, model interpretability, and diagnostic accuracy. Results show that SHAP-GAN outperforms traditional methods in sensitivity, accuracy, and interpretability, particularly with high-dimensional and imbalanced ovarian cancer datasets. The top three influential features identified are PRR11, CIAO1, and SMPD3, which exhibit wide SHAP value distributions, highlighting their significant impact on model predictions. 
The SHAP-GAN network has demonstrated an impressive accuracy rate of 99.34% on the ovarian cancer dataset, significantly outperforming baseline algorithms, including Support Vector Machines (SVM), Logistic Regression (LR), and XGBoost. Specifically, SVM achieved an accuracy of 72.78%, LR achieved 86.09%, and XGBoost achieved 96.69%. These results highlight the superior performance of SHAP-GAN in handling high-dimensional and imbalanced datasets. Furthermore, SHAP-GAN significantly alleviates the challenges associated with intricate genetic data analysis, empowering medical professionals to tailor personalized treatment strategies for individual patients.
Figures:
Figure 1. The distribution of ovarian cancer data.
Figure 2. Pearson correlation heatmap of features in the ovarian cancer dataset.
Figure 3. Principal component analysis (PCA) distribution plot for ovarian cancer data.
Figure 4. The basic architecture of the ACGAN.
Figure 5. The flowchart of the proposed method.
Figure 6. The architecture of the SHAP-GAN network.
Figure 7. Distribution of ovarian cancer samples after data augmentation.
Figure 8. The relationship between model performance and the number of selected features.
Figure 9. Shapley values of selected features.
Figure 10. The confusion matrix for SVM model performance.
Figure 11. The confusion matrix for LR model performance.
Figure 12. The confusion matrix for XGBoost model performance.
Figure 13. The confusion matrix for the proposed SHAP-GAN network performance.
Figure 14. The ROC curve for the proposed SHAP-GAN network performance.
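The abstract above relies on Shapley values to attribute a prediction to individual features. The quantity itself can be illustrated with an exact computation on a toy problem: each feature's Shapley value is its average marginal contribution over all orderings of the features. This is a minimal sketch of the definition only; it is not the SHAP library and not the SHAP-GAN model, and the feature names `a`, `b`, `c` and the function `toy_value` are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical toy "model": features "a" and "b" contribute additively and
# earn an extra bonus only when both are present; "c" contributes nothing.
def toy_value(coalition):
    score = 2.0 * ("a" in coalition) + 1.0 * ("b" in coalition)
    if "a" in coalition and "b" in coalition:
        score += 1.0
    return score


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy case the interaction bonus is split equally between `a` and `b`, and the values sum to the score of the full feature set. The exact enumeration grows factorially with the number of features, which is why practical tools approximate it.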
15 pages, 1166 KiB  
Article
Combining Environmental Variables and Machine Learning Methods to Determine the Most Significant Factors Influencing Honey Production
by Johanna Ramirez-Diaz, Arianna Manunza, Tiago Almeida de Oliveira, Tania Bobbo, Francesco Nutini, Mirco Boschetti, Maria Grazia De Iorio, Giulio Pagnacco, Michele Polli, Alessandra Stella and Giulietta Minozzi
Insects 2025, 16(3), 278; https://doi.org/10.3390/insects16030278 - 6 Mar 2025
Abstract
Bees are crucial for food production and biodiversity. However, extreme weather variation and harsh winters are the leading causes of colony losses and low honey yields. This study aimed to identify the most important features and predict Total Honey Harvest (THH) by combining machine learning (ML) methods with climatic conditions and environmental factors recorded from the winter before and during the harvest season. The initial dataset included 598 THH records collected from five apiaries in Lombardy (Italy) during spring and summer from 2015 to 2019. Colonies were classified into medium-low or high production using the 75th percentile as a threshold. A total of 38 features related to temperature, humidity, precipitation, pressure, wind, and the enhanced vegetation index (EVI) were used. Three ML models were trained: Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost). Model performance was evaluated using accuracy, sensitivity, specificity, precision, and area under the ROC curve (AUC). All models reached a prediction accuracy greater than 0.75 in both the training and the testing sets. Results indicate that winter climatic conditions are important predictors of THH. Understanding the impact of climate can help beekeepers develop strategies to prevent colony decline and low production.
(This article belongs to the Section Social Insects)
Figures:
Figure 1. Flowchart illustrating the steps in the full ML pipeline, including pre-processing, feature selection, and model building.
Figure 2. Feature importance based on Random Forest. The top 10 important features for total honey harvest prediction are reported. PS2: Average of surface pressure at the surface of the earth (kPa) in February; WS2M_1: Mean wind speed (m/s) at 2 m in January; T2M_2: Mean temperature (°C) at 2 m in February; WS2M_MAX_12: Maximum wind speed (m/s) at 2 m in December; PS_1: Average of surface pressure at the surface of the earth (kPa) in January; T2M_MAX_2: Maximum temperature (°C) at 2 m in February (2); WS2M_MIN_2: Minimum wind speed (m/s) at 2 m in February (2); RH2M_2: Mean relative humidity (%) at 2 m in February; EVI3: Enhanced Vegetation Index in March; QV2M_1: Mean specific humidity (g/kg) at 2 m in January (1).
Figure 3. Prediction density plots for the test set showing the distribution of predicted probabilities for high and low honey production levels across the tree algorithms Decision Trees, Random Forest and Extreme Gradient Boosting.
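The abstract above turns a continuous harvest value into two classes by thresholding at the 75th percentile. A minimal sketch of that labelling step follows. The harvest values are made up, the function name `label_by_percentile` is hypothetical, and both the percentile interpolation method and the treatment of values exactly on the threshold are assumptions, since the abstract does not specify them.

```python
import statistics


def label_by_percentile(values, percentile=75):
    """Label each value 'high' if it lies above the given percentile, else 'medium-low'."""
    # statistics.quantiles with n=100 returns the 1st..99th percentile cut points
    cut = statistics.quantiles(values, n=100, method="inclusive")[percentile - 1]
    labels = ["high" if v > cut else "medium-low" for v in values]
    return cut, labels


# Made-up honey harvest values, for illustration only
harvest_kg = [12.0, 18.5, 20.0, 25.0, 31.0, 40.0, 44.5, 52.0, 60.0]
threshold, classes = label_by_percentile(harvest_kg)
```

A threshold at the 75th percentile leaves roughly a quarter of the records in the high class, so the resulting classification problem is imbalanced by construction.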
23 pages, 5525 KiB  
Article
Automatic Identification and Segmentation of Overlapping Fog Droplets Using XGBoost and Image Segmentation
by Dongde Liao, Xiongfei Chen, Muhua Liu, Yihan Zhou, Peng Fang, Jinlong Lin, Zhaopeng Liu and Xiao Wang
Appl. Sci. 2025, 15(5), 2847; https://doi.org/10.3390/app15052847 - 6 Mar 2025
Viewed by 129
Abstract
Water-sensitive paper (WSP) has been widely used to assess the quality of pesticide sprays. However, fog droplets tend to overlap on WSP. In order to accurately measure the droplet size and grasp the droplet distribution pattern, this study proposes a method based on the optimized XGBoost classification model combined with improved concave-point matching to achieve multi-level overlapping-droplet segmentation. For different types of overlapping droplets, the corresponding improved segmentation algorithm is used to improve the segmentation accuracy. For parallel overlapping droplets, the centre-of-mass segmentation method is used; for non-parallel overlapping droplets, the minimum-distance segmentation method is used; and for strong overlapping of a single concave point, the vertical-linkage segmentation method is used. Complex overlapping droplets were gradually segmented by loop iteration until a single droplet was obtained or no further segmentation was possible, and then ellipse fitting was used to obtain the final single-droplet profile. Up to 105 WSPs were obtained in an orchard field through drone spraying experiments, and were used to validate the effectiveness of the method. The experimental results show that the classification model proposed in this paper achieves an average accuracy of 98% in identifying overlapping-droplet types, which effectively meets the needs of subsequent segmentation. The overall segmentation accuracy of the method is 91.35%, which is significantly better than the contour-solidity and watershed-based algorithm (76.19%) and the improved-concave-point-segmentation algorithm (68.82%). In conclusion, the method proposed in this paper provides an efficient and accurate new approach for pesticide spraying quality assessment.
(This article belongs to the Special Issue Advances in Image Recognition and Processing Technologies)
Figures:
Figure 1. Overall flow chart.
Figure 2. Test sample collection. The UAV was operated at a flight speed of 3–5 m/s and at a flight height of 1.5–3 m above the crop canopy. WSPs were placed on the leaves, and WSPs were placed on the sampling plane facing the direction of the UAV spray at each point.
Figure 3. WSP image preprocessing: (a) the original WSP image captured in the orchard; (b) the image after denoising, binarisation, and filling.
Figure 4. Schematic diagram of droplet overlapping types: (a) a single concave strongly overlapping droplet (SC); (b) a parallel overlapping droplet (PA); (c) a non-parallel overlapping droplet (NP).
Figure 5. Overlapping-droplet-splitting flow chart.
Figure 6. Pre-segmentation of overlapping images.
Figure 7. Segmentation of strongly overlapping droplets in a single concave point: (a) Binary Image Extraction; (b) Concave Defect Detection & Key Points Localization; (c) Key Points Connection & Slope Calculation; (d) Baseline L Generation & Intersection Detection; (e) Perpendicular Segmentation Line AD Identification.
Figure 8. Segmentation process for non-parallel overlapping droplets: (a) Binary Image Extraction; (b) Convex Hull Construction; (c) Concave Points Filtering & Candidate Selection; (d) Non-Parallel Segmentation Line Generation.
Figure 9. Segmentation process for parallel overlapping droplets: (a) Binary Image Extraction; (b) Convex Hull Construction; (c) Concave Points Detection; (d) Centre of mass calculation; (e) Parallel Segmentation Line Generation.
Figure 10. Feature Analysis and Optimization Algorithm Performance Evaluation. (a) Feature Importance Ranking (XGBoost-based); (b) Bayesian Iterative Optimization Process.
Figure 11. Confusion matrix based on improved XGBoost experimental results.
Figure 12. Changes in performance metrics of XGBoost model before and after optimisation.
Figure 13. Segmentation effect of different types of overlapping droplets: (a) the original overlapping droplet on WSP; (b) the segmentation results of the proposed method; (c) the result of segmentation based on the contour-solidity and watershed algorithm; (d) the segmentation results using the classic concave-point-matching algorithm.
Figure 14. Single WSP overlapping-droplet segmentation results: (a) the original overlapping-droplet layer extracted from the sample; (b) the segmentation results of the manual calibration.
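The segmentation method in the abstract above depends on locating concave points on the contour of overlapping droplets. One common way to find such points on a polygonal contour, shown here as a generic sketch and not as the authors' improved concave-point matching, is to check the sign of the cross product of consecutive edges: on a counter-clockwise contour, a negative sign marks a vertex where the outline bends inward. The function name `concave_vertices` and the outline coordinates are assumptions for this example.

```python
def concave_vertices(polygon):
    """Return vertices of a counter-clockwise polygon where the contour turns clockwise."""
    n = len(polygon)
    concave = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        # z-component of the cross product of the incoming and outgoing edges
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross < 0:
            concave.append((x1, y1))
    return concave


# Hypothetical outline: two overlapping squares merged into one contour,
# listed counter-clockwise. The notches where they meet are concave points.
outline = [(0, 0), (4, 0), (4, 2), (6, 2), (6, 6), (2, 6), (2, 4), (0, 4)]
points = concave_vertices(outline)
```

For this outline the two notch vertices, (4, 2) and (2, 4), are reported. A line joining such a pair is the kind of candidate cut that concave-point methods use to separate overlapping shapes.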
17 pages, 954 KiB  
Article
Leveraging Explainable Artificial Intelligence in Solar Photovoltaic Mappings: Model Explanations and Feature Selection
by Eduardo Gomes, Augusto Esteves, Hugo Morais and Lucas Pereira
Energies 2025, 18(5), 1282; https://doi.org/10.3390/en18051282 - 5 Mar 2025
Viewed by 178
Abstract
This work explores the effectiveness of explainable artificial intelligence in mapping solar photovoltaic power outputs based on weather data, focusing on short-term mappings. We analyzed the impact values provided by the Shapley additive explanation method when applied to two algorithms designed for tabular data (XGBoost and TabNet) and conducted a comprehensive evaluation of the overall model and across seasons. Our findings revealed that the impact of selected features remained relatively consistent throughout the year, underscoring their uniformity across seasons. Additionally, we propose a feature selection methodology utilizing the explanation values to produce more efficient models, by reducing data requirements while maintaining performance within a threshold of the original model. The effectiveness of the proposed methodology was demonstrated through its application to a residential dataset in Madeira, Portugal, augmented with weather data sourced from SolCast.
(This article belongs to the Topic Smart Energy Systems, 2nd Edition)
Figures:
Figure 1. Proposed methodology for explaining PV production mappings using SHAP values.
Figure 2. Proposed methodology for feature selection using SHAP values.
Figure 3. Examples of domain and exogenous features for a period of 24 h.
Figure 4. XGBoost and TabNet overall SHAP impact values. Each point represents an individual training example, with its color indicating the magnitude of a specific feature’s value. The horizontal position of each point reflects the impact of that feature on the model’s output.
Figure 5. Model performances on a summer day for the testing set (4 August 2020).
19 pages, 5256 KiB  
Article
Comparison of Machine Learning Models for Real-Time Flow Forecasting in the Semi-Arid Bouregreg Basin
by Fatima Zehrae Elhallaoui Oueldkaddour, Fatima Wariaghli, Hassane Brirhet, Ahmed Yahyaoui and Hassane Jaziri
Limnol. Rev. 2025, 25(1), 6; https://doi.org/10.3390/limnolrev25010006 (registering DOI) - 5 Mar 2025
Viewed by 59
Abstract
Morocco is geographically located between two distinct climatic zones: temperate in the north and tropical in the south. This situation is the reason for the temporal and spatial variability of the Moroccan climate. In recent years, the increasing scarcity of water resources, exacerbated by climate change, has underscored the critical role of dams as essential water reservoirs. These dams serve multiple purposes, including flood management, hydropower generation, irrigation, and drinking water supply. Accurate estimation of reservoir flow rates is vital for effective water resource management, particularly in the context of climate variability. The prediction of monthly runoff time series is a key component of water resources planning and development projects. In this study, we employ Machine Learning (ML) techniques, specifically Random Forest (RF), Support Vector Regression (SVR), and XGBoost, to predict monthly river flows in the Bouregreg basin, using data collected from the Sidi Mohamed Ben Abdellah (SMBA) Dam between 2010 and 2020. The primary objective of this paper is to comparatively evaluate the applicability of these three ML models for flow forecasting in the Bouregreg River. The models’ performance was assessed using three key criteria: the correlation coefficient (R²), Akaike Information Criterion (AIC), and Bayesian Information Criterion (BIC). The results demonstrate that the SVR model outperformed the RF and XGBoost models, achieving high accuracy in flow prediction. These findings are highly encouraging and highlight the potential of machine learning approaches for hydrological forecasting in semi-arid regions. Notably, the models used in this study are less data-intensive compared to traditional methods, addressing a significant challenge in hydrological modeling.
This research opens new avenues for the application of ML techniques in water resource management and suggests that these methods could be generalized to other basins in Morocco, promoting efficient, effective, and integrated water resource management strategies.
Show Figures

Figure 1

Figure 1
<p>DEM map of Bouregreg Watershed.</p>
Full article ">Figure 2
<p>SMBA Dam on the Bouregreg Watershed [<a href="#B10-limnolrev-25-00006" class="html-bibr">10</a>].</p>
Full article ">Figure 3
<p>Hydrographic map of Bouregreg Watershed.</p>
Full article ">Figure 4
<p>Precipitation variability at SMBA Dam station (2010 to 2020).</p>
Full article ">Figure 5
<p>Inflow variability at SMBA Dam station (2010 to 2020).</p>
Full article ">Figure 6
<p>Flow chart for the development of machine learning models.</p>
Full article ">Figure 7
<p>Random Forest Model [<a href="#B23-limnolrev-25-00006" class="html-bibr">23</a>].</p>
Full article ">Figure 8
<p>Structure of the SVR model [<a href="#B28-limnolrev-25-00006" class="html-bibr">28</a>] (ξ = Slack variable to denote the deviation from the point to the positive edge of the hyperplane, ξ* = Slack variable to denote the deviation from the point to the negative edge of the hyperplane).</p>
Full article ">Figure 9
<p>XGBoost flowchart [<a href="#B37-limnolrev-25-00006" class="html-bibr">37</a>].</p>
Full article ">Figure 10
<p>Monthly comparison of observed and predicted data from 2010 to 2020 with RF.</p>
Full article ">Figure 11
<p>Monthly comparison of observed and predicted data from 2010 to 2020 with SVR.</p>
Full article ">Figure 12
<p>Monthly comparison of observed and predicted data from 2010 to 2020 with XGBoost.</p>
Full article ">
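The abstract above compares models using R², AIC, and BIC. The sketch below shows one common way to compute the three from observed and predicted values. It assumes R² is taken as the coefficient of determination and uses the Gaussian-error form of AIC and BIC with additive constants dropped; the abstract does not say which variants the authors used. The function name `fit_criteria`, the parameter count, and the flow values are made up for illustration.

```python
import math


def fit_criteria(y_true, y_pred, n_params):
    """Return (R^2, AIC, BIC) for a fitted model with n_params parameters."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1.0 - rss / tss
    # Gaussian-error form without additive constants; requires rss > 0
    aic = n * math.log(rss / n) + 2 * n_params
    bic = n * math.log(rss / n) + n_params * math.log(n)
    return r2, aic, bic


# Made-up monthly flows, for illustration only
observed = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0]
predicted = [11.0, 12.0, 14.0, 10.0, 9.0, 15.0]
r2, aic, bic = fit_criteria(observed, predicted, n_params=2)
```

Higher R² and lower AIC or BIC indicate a better fit. AIC and BIC share the same goodness-of-fit term and differ only in how strongly they penalise the number of parameters.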
27 pages, 2950 KiB  
Article
Enhancing Nickel Matte Grade Prediction Using SMOTE-Based Data Augmentation and Stacking Ensemble Learning for Limited Dataset
by Jehyeung Yoo
Processes 2025, 13(3), 754; https://doi.org/10.3390/pr13030754 - 5 Mar 2025
Viewed by 132
Abstract
To address the limited data availability and low predictive accuracy of nickel matte grade models in the early stages of facility operation, this study introduces a unique stepwise prediction methodology that integrates data augmentation and ensemble learning, specifically tailored for limited industrial datasets. Predicting matte nickel grade accurately is critical for nickel sulfate production, a key precursor in cathode manufacturing. However, in newly adopted facilities, operational data are scarce, posing a major challenge for conventional machine learning models that require large, well-balanced datasets to generalize effectively. Moreover, the nonlinear dependencies between raw material composition, operational conditions, and metallurgical reactions further complicate the prediction task, often leading to high errors in traditional regression models. To overcome these challenges, this study introduces an innovative approach that integrates feature engineering, Gaussian noise augmentation, SMOTE regression, and a stacking ensemble model, using XGBoost (2.0.3) and CatBoost (1.2.7). First, input variables were refined through feature engineering, followed by data augmentation to enhance dataset diversity and improve model robustness. Next, a stacking ensemble framework was implemented to mitigate overfitting and enhance predictive accuracy. Finally, SHAP, an XAI technique that quantifies the impact of each input variable on the model’s predictions based on cooperative game theory, was employed to interpret key process variables, offering deeper insights into the factors influencing nickel grade. The experimental results demonstrate a substantial improvement in prediction accuracy, with the R² coefficient increasing from 0.3050 to 0.9245, alongside significant reductions in RMSE, MAE, and MAPE.
The proposed methodology not only enhances predictive performance in data-scarce industrial environments but also provides an interpretable framework for real-world process optimization. These findings validate its applicability to nickel matte operations, offering a scalable and explainable machine learning approach for metallurgical industries with limited data availability.
(This article belongs to the Section Materials Processes)
Figure 1. Overall framework of the matte grade prediction model with data augmentation and stacking ensemble model.
Figure 2. Probability density curve of the real data and the augmented data.
Figure 3. Stacking ensemble model workflow.
Figure 4. Matte nickel prediction results of stacked ensemble model with data augmentation.
Figure 5. Validation of stacked ensemble model.
Figure 6. Mean of SHAP values.
Figure 7. SHAP values with different variables through violin plot.
25 pages, 5717 KiB  
Article
Risk Assessment of Extreme Drought and Extreme Wetness During Growth Stages of Major Crops in China
by Mingyang Sun, Yongjiu Dai, Shulei Zhang and Hongbin Liang
Sustainability 2025, 17(5), 2221; https://doi.org/10.3390/su17052221 (registering DOI) - 4 Mar 2025
Viewed by 111
Abstract
Climate change has increased the frequency of extreme droughts and floods in China, threatening agricultural production and food security. However, the impacts of these extreme precipitation events on crops (maize, wheat, and rice) during key growth stages remain poorly understood. To address this, we developed a three-step analytical framework: First, we used transpiration data to identify critical crop growth stages across China. Then, we applied a 10-day standardized precipitation evapotranspiration index (SPEI) to quantify drought and extreme wetness conditions during each growth phase. Finally, we integrated these data into an XGBoost model to assess the relationship between extreme weather and crop yield fluctuations. The results show that maize is most sensitive to water variability during both development and mid-season stages, while wheat is particularly vulnerable to drought during development and rice is mainly affected by water stress during the mid-season. Extreme drought risks are highest in the Northeast Plain, North China Plain, and southern China, while extreme wetness risks are concentrated in the middle and lower Yangtze River basin and southeastern coastal regions. Notably, extreme drought risks are significantly more pronounced than those associated with extreme wetness. These findings highlight the urgent need for targeted agricultural strategies to promote sustainable agricultural development. Full article
(This article belongs to the Section Air, Climate Change and Sustainability)
Figure 1. Climate types and distribution of major crop cultivation in China. (a) Distribution of climate types; (b) maize; (c) wheat; (d) rice. (First letter: A = tropical, B = dry, C = mid-temperate, D = snow, E = polar; second letter: f = humid, m = monsoon, s = dry summer, w = dry winter, W = desert, S = steppe, T = tundra, F = frost; third letter: h = hot arid, k = cold arid, a = hot summer, b = warm summer, c = cool summer, d = cold summer.)
Figure 2. Flowchart for identifying crop growth stages and assessing extreme drought and extreme wetness risks using multisource data.
Figure 3. Transpiration dynamics and mid-season stage division of major crops at different latitudes ((a) single-cropping maize; (b) double-cropping summer maize; (c) double-cropping winter wheat; (d) single-cropping rice).
Figure 4. Spatial distribution of the onset and end of the mid-season stage for major crops in China ((a) onset of the mid-season stage for maize; (b) end of the mid-season stage for maize; (c) onset of the mid-season stage for wheat; (d) end of the mid-season stage for wheat; (e) onset of the mid-season stage for rice; (f) end of the mid-season stage for rice).
Figure 5. Spatial distribution of the silhouette scores and geographic clustering for major crops in China ((a) silhouette score for maize; (b) geographic clustering distribution for maize; (c) silhouette score for wheat; (d) geographic clustering distribution for wheat; (e) silhouette score for rice; (f) geographic clustering distribution for rice).
Figure 6. Residual scatter plots of the observed and predicted values from the XGBoost model ((a) residual plot for maize; (b) residual plot for wheat; (c) residual plot for rice).
Figure 7. Partial dependence plots showing the effects of extreme drought (A) and extreme wetness (B) on different crop growth stages.
Figure 8. Impacts of extreme drought and extreme wetness on crop yield across different growth stages ((a) maize under dry conditions; (b) maize under wet conditions; (c) wheat under dry conditions; (d) wheat under wet conditions; (e) rice under dry conditions; (f) rice under wet conditions).
Figure 9. Spatial distributions of extreme drought and extreme wetness risks for major crops in China ((a) maize extreme drought risk; (b) maize extreme wetness risk; (c) wheat extreme drought risk; (d) wheat extreme wetness risk; (e) rice extreme drought risk; (f) rice extreme wetness risk).
23 pages, 10500 KiB  
Article
Advanced Default Risk Prediction in Small and Medium-Sized Enterprises Using Large Language Models
by Haonan Huang, Jing Li, Chundan Zheng, Sikang Chen, Xuanyin Wang and Xingyan Chen
Appl. Sci. 2025, 15(5), 2733; https://doi.org/10.3390/app15052733 - 4 Mar 2025
Viewed by 120
Abstract
Predicting default risk in commercial bills for small and medium-sized enterprises (SMEs) is crucial, as these enterprises represent one of the largest components of a nation’s economic structure, and their financial stability can impact systemic financial risk. However, data on the commercial bills of SMEs are scarce and challenging to gather, which has impeded research on risk prediction for these businesses. This study aims to address this gap by leveraging 38 multi-dimensional, non-financial features collected from 1972 real SMEs in China to predict bill default risk. We identified the most influential factors among these 38 features and introduced a novel prompt-based learning framework using large language models for risk assessment, benchmarking against seven mainstream machine learning algorithms. In the experiments, the XGBoost algorithm achieved the best performance on the Z-Score standardized dataset, with an accuracy of 81.42% and an F1 score of 80%. Additionally, we tested both the standard and fine-tuned versions of the large language model, which yielded accuracies of 75% and 82.1%, respectively. These results indicate that the proposed framework has significant potential for predicting risks in SMEs and offers new insights for related research. Full article
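The Z-score standardization and the accuracy and F1 metrics reported in this abstract can be sketched as follows. The function names, the use of the population standard deviation, and the toy labels are assumptions for illustration only; the abstract does not describe the authors' preprocessing code or their averaging convention for F1.

```python
import statistics


def z_score(values):
    """Standardize a feature column to zero mean and unit variance.

    Assumption: the population standard deviation is used; a constant
    column is mapped to all zeros to avoid dividing by zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, with F1 on the `positive` class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels (1 = default, 0 = no default); not data from the study.
toy_true = [1, 0, 1, 1, 0, 0]
toy_pred = [1, 0, 0, 1, 1, 0]
toy_accuracy, toy_f1 = accuracy_and_f1(toy_true, toy_pred)
```

Reporting F1 next to accuracy matters when defaults are rare, because a classifier that never predicts a default can still reach a high accuracy.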
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)
Figure 1. Commercial paper default research framework.
Figure 2. Heatmap of the raw data dimensions without standardization.
Figure 3. Heatmap of the data dimensions after standardization.
Figure 4. Template for sentiment analysis of news articles.
Figure 5. ROC curves of the optimal performance of the model under different standardizations. The yellow curve (ROC curve) represents the classification performance of the model, illustrating the relationship between the True Positive Rate (TPR) and the False Positive Rate (FPR) at various thresholds. A curve closer to the top-left corner indicates better performance. The blue dashed line represents the baseline of a random classifier, which assumes the model has no classification ability and makes random predictions.
Figure 6. Template for default risk prediction.
Figure 7. Specific process of LLM fine-tuning.
23 pages, 571 KiB  
Article
Primary Determinants and Strategic Implications for Customer Loyalty in Pet-Related Vertical E-Commerce: A Machine Learning Approach
by YongHyun Lee, Kwangtek Na, Jungwook Rhim and Eunchan Kim
Systems 2025, 13(3), 175; https://doi.org/10.3390/systems13030175 - 4 Mar 2025
Viewed by 104
Abstract
In the contemporary and dynamic business landscape, the establishment of a loyal customer base is a fundamental imperative for long-term organizational viability. This research undertakes a comprehensive exploration into the formation of customer loyalty within the niche of pet-related vertical e-commerce, focusing on South Korea, and leverages advanced machine learning methodologies. We identify key factors that significantly impact customer loyalty development using various machine learning models, including logistic regression analysis, decision trees, support vector machines, random forests, and XGBoost. Our empirical study shows that encouraging customer transactions plays a crucial and transformative role in building loyalty regardless of the day of the week. Furthermore, the strategic promotion of mobile application notifications and the active encouragement of customer participation through product reviews are indispensable strategies for strengthening and solidifying customer loyalty. These findings have crucial implications not only for enterprises within the pet-related e-commerce sector but also for the broader e-commerce domain. We hereby propose a methodology to identify loyal customers and systematically analyze the key factors that influence their formation using machine learning in the vertical e-commerce pet industry. Full article
Figure 1. The process of the consumer decision journey.
Figure 2. Visual example of the RFM model.
Figure 3. Flowchart for pet-related vertical e-commerce loyal customer prediction and analysis.
Figure 4. (a) Distribution comparison of purchase records by date of purchase between loyal and non-loyal customers. (b) Comparison of purchase records by mobile notification agreement.
Figure 5. Comparison of the number of reviews between loyal and non-loyal customers.
18 pages, 6652 KiB  
Article
Tensile Strength Predictive Modeling of Natural-Fiber-Reinforced Recycled Aggregate Concrete Using Explainable Gradient Boosting Models
by Celal Cakiroglu, Farnaz Ahadian, Gebrail Bekdaş and Zong Woo Geem
J. Compos. Sci. 2025, 9(3), 119; https://doi.org/10.3390/jcs9030119 - 4 Mar 2025
Viewed by 70
Abstract
Natural fiber composites have gained significant attention in recent years due to their environmental benefits and unique mechanical properties. These materials combine natural fibers with polymer matrices to create sustainable alternatives to traditional synthetic composites. In addition to natural fiber reinforcement, the usage of recycled aggregates in concrete has been proposed as a remedy to combat the rapidly increasing amount of construction and demolition waste in recent years. However, the accurate prediction of the structural performance metrics, such as tensile strength, remains a challenge for concrete composites reinforced with natural fibers and containing recycled aggregates. This study aims to develop predictive models of natural-fiber-reinforced recycled aggregate concrete based on experimental results collected from the literature. The models have been trained on a dataset consisting of 482 data points. Each data point consists of the amounts of cement, fine and coarse aggregate, water-to-binder ratio, percentages of recycled coarse aggregate and natural fiber, and the fiber length. The output feature of the dataset is the splitting tensile strength of the concrete. Extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM) and extra trees regressor models were trained to predict the tensile strength of the specimens. For optimum performance, the hyperparameters of these models were optimized using the blended search strategy (BlendSearch) and cost-related frugal optimization (CFO). The tensile strength could be predicted with a coefficient of determination greater than 0.95 by the XGBoost model. To make the predictive models accessible, an online graphical user interface was also made available on the Streamlit platform. A feature importance analysis was carried out using the Shapley additive explanations (SHAP) approach. Full article
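The coefficient of determination cited in this abstract follows the standard definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it together with RMSE; the function names and the sample strength values are made up for illustration and are not taken from the study's dataset.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical splitting tensile strengths in MPa (illustrative only).
measured = [2.0, 3.0, 4.0, 5.0]
predicted = [2.1, 2.9, 4.2, 4.8]
print(round(r_squared(measured, predicted), 4), round(rmse(measured, predicted), 4))
```

Note that `r_squared` is undefined when all measured values are identical, because the total sum of squares is then zero.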
Figure 1. (a) Coir [30], (b) ramie [16], (c) jute [31] fibers.
Figure 2. Distribution of the input and output features.
Figure 3. Parallel coordinates plot of the dataset.
Figure 4. Isolation of data points.
Figure 5. Predictive model development and interpretation.
Figure 6. Explained variance ratios.
Figure 7. Outliers for (a) contamination = 0.1, (b) contamination = 0.06, (c) contamination = 0.02, (d) contamination = 0.01.
Figure 8. Model performances with respect to contamination.
Figure 9. Extra trees model performance fluctuations on the test set.
Figure 10. Hyperparameter optimization steps.
Figure 11. Predicted and true values for (a) extra trees, (b) LightGBM, (c) XGBoost.
Figure 12. Online graphical user interface.
Figure 13. SHAP feature importances.
Figure 14. SHAP summary plot.
Figure 15. SHAP heatmap plot.
26 pages, 11713 KiB  
Article
Assessing and Forecasting Natural Regeneration in Mediterranean Landscapes After Wildfires
by Paraskevi Oikonomou, Vassilia Karathanassi, Vassilis Andronis and Ioannis Papoutsis
Remote Sens. 2025, 17(5), 897; https://doi.org/10.3390/rs17050897 - 4 Mar 2025
Viewed by 240
Abstract
Forest ecosystems in the Mediterranean basin are significantly affected by summer wildfires. Drought, extreme temperatures, and strong winds increase the fire risk in Greece. This study explores the potential of NDVI for assessing and forecasting post-fire regeneration in burnt areas of the Peloponnese (2007) and Evros (2011). NDVI data from Landsat 7 and 9 were analyzed to identify the stages of the regeneration process and the dominant vegetation species at each stage. Comparing pre-fire and post-fire values highlighted the recovery rate, while the trendline slope indicated the regeneration rate. This combined analysis forms a methodology for drawing conclusions about the vegetation type that prevails after the fire. Validation was conducted using photointerpretation techniques and CORINE land cover data. The findings suggest that sclerophyllous species regenerate faster, while fir forests recover slowly and may be replaced by sclerophylls. To predict vegetation regrowth, two time series models (ARMA, VARIMA) and two machine learning-based ones (random forest, XGBoost) were tested. Their performance was evaluated by comparing the predicted and actual numerical values, calculating error metrics (RMSE, MAPE), and analyzing how the predicted patterns align with the observed ones. The results showed that the multivariate models performed better and pointed to the need to introduce additional variables, such as soil characteristics and the effect of climate change on weather parameters, to improve predictions. Full article
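The two indicators described in this abstract, a comparison of pre-fire and post-fire NDVI and the slope of the post-fire trendline, can be sketched as below. NDVI is computed with its standard definition, (NIR − Red) / (NIR + Red). The function names, the use of an ordinary least-squares line for the trendline, the post-fire to pre-fire ratio as the recovery measure, and the sample values are all assumptions for illustration; the abstract does not give the authors' exact formulas.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def trendline_slope(years, values):
    """Slope of the ordinary least-squares line through (year, value) pairs."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


def recovery_ratio(pre_fire_ndvi, post_fire_ndvi):
    """Post-fire NDVI as a fraction of the pre-fire value (assumed measure)."""
    return post_fire_ndvi / pre_fire_ndvi


# Made-up October NDVI values for five post-fire years (illustrative only).
years = [2008, 2009, 2010, 2011, 2012]
october_ndvi = [0.30, 0.35, 0.40, 0.45, 0.50]
slope = trendline_slope(years, october_ndvi)        # NDVI gained per year
ratio = recovery_ratio(0.62, october_ndvi[-1])      # share of pre-fire NDVI regained
```

Under these assumptions, a steeper slope indicates faster regrowth, while a ratio near 1 indicates that greenness has returned to its pre-fire level.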
Figure 1. Pre-fire land cover map of the Peloponnese and Evros, demonstrating the burnt areas.
Figure 2. Creation of time series data and delineation of the study areas.
Figure 3. Workflow for monitoring post-fire vegetation regeneration and predicting regrowth.
Figure 4. Comparison of the fire extent of 2023 and 2011 in Evros (28 August 2023), RGB (12, 8, 4); red: burnt area, green: healthy vegetation, black: water bodies.
Figure 5. Post-fire NDVI time series of October for study areas, categorized by stages of natural regeneration for herbaceous and woody species.
Figure 6. NDVI time series of October after the first 5 years after the fire across all study areas. Solid line: NDVI values, dashed line: trendline, square symbol: pre-fire value.
Figure 7. Sclerophyllous vegetation regeneration during summer in South. Ilia (September 2003, September 2010, August 2023).
Figure 8. Sclerophyllous vegetation regeneration during fall in Arkadia (September 2003, October 2012, May 2020).
Figure 9. Coniferous vegetation (fir) regeneration during spring in Messinia (October 2003, September 2013, April 2017).
Figure 10. Coniferous vegetation (fir) regeneration during fall in Lakonia (July 2007, October 2013, October 2022).
Figure 11. Sclerophyllous vegetation regeneration during fall in Evros (October 2002, September 2018, October 2022).
Figure 12. Temperature and precipitation for October across all study areas after the first 5 post-fire years.
Figure 13. October NDVI time series (2012–2022) and predicted values (2023–2025) for all study areas of the Peloponnese.
16 pages, 8656 KiB  
Article
What Is the Predictive Capacity of Sesamum indicum L. Bioparameters Using Machine Learning with Red–Green–Blue (RGB) Images?
by Edimir Xavier Leal Ferraz, Alan Cezar Bezerra, Raquele Mendes de Lira, Elizeu Matos da Cruz Filho, Wagner Martins dos Santos, Henrique Fonseca Elias de Oliveira, Josef Augusto Oberdan Souza Silva, Marcos Vinícius da Silva, José Raliuson Inácio da Silva, Jhon Lennon Bezerra da Silva, Antônio Henrique Cardoso do Nascimento, Thieres George Freire da Silva and Ênio Farias de França e Silva
AgriEngineering 2025, 7(3), 64; https://doi.org/10.3390/agriengineering7030064 - 3 Mar 2025
Viewed by 143
Abstract
The application of machine learning techniques to determine bioparameters, such as the leaf area index (LAI) and chlorophyll content, has shown significant potential, particularly with the use of unmanned aerial vehicles (UAVs). This study evaluated the use of RGB images obtained from UAVs to estimate bioparameters in sesame crops, utilizing machine learning techniques and data selection methods. The experiment was conducted at the Federal Rural University of Pernambuco and involved using a portable AccuPAR ceptometer to measure the LAI and spectrophotometry to determine photosynthetic pigments. Field images were captured using a DJI Mavic 2 Enterprise Dual remotely piloted aircraft equipped with RGB and thermal cameras. To manage the high dimensionality of the data, CRITIC and Pearson correlation methods were applied to select the most relevant indices for the XGBoost model. The data were divided into training, testing, and validation sets to ensure model generalization, with performance assessed using the R², MAE, and RMSE metrics. XGBoost effectively estimated the LAI, chlorophyll a, total chlorophyll, and carotenoids (R² > 0.7) but had limited performance for chlorophyll b. Pearson correlation was found to be the most effective data selection method for the algorithm. Full article
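Correlation-based selection of vegetation indices, as mentioned in this abstract, can be sketched as follows. The figure captions indicate that the authors used a significance cutoff (p < 0.05); this sketch substitutes a simpler absolute-correlation threshold, which is an assumption, as are the function names, the `min_abs_r` parameter, the index names, and the sample values. The CRITIC weighting method is not shown.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_by_correlation(features, target, min_abs_r=0.7):
    """Keep features whose |r| with the target reaches `min_abs_r`.

    Hypothetical stand-in for a significance cutoff: a fixed threshold on
    the absolute correlation is used instead of a p-value test.
    """
    scores = {name: pearson_r(column, target) for name, column in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= min_abs_r}


# Made-up index values and LAI measurements (illustrative only).
indices = {
    "index_a": [1.0, 2.0, 3.0, 4.0],
    "index_b": [2.0, 4.0, 1.0, 3.0],
    "index_c": [4.0, 3.0, 2.0, 1.0],
}
lai = [1.1, 1.9, 3.2, 3.9]
selected = select_by_correlation(indices, lai)  # keeps index_a and index_c
```

Filtering on the absolute value retains strongly negative predictors such as `index_c`, which a tree-based model can use just as well as positive ones.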
Figure 1. Spatial location of the experimental area, Serra Talhada, Pernambuco, Brazil.
Figure 2. Details of the Mavic 2 Enterprise Dual: (A) General view of the complete equipment, and detailed view of the integrated RGB and thermal sensors (B).
Figure 3. Plots delineated by the shapefile layer (A), soil removal (B), and applied vegetation index (C).
Figure 4. Correlation analysis of vegetation indices and leaf area index (A); application of the CRITIC method to vegetation indices (B). The asterisk (*) represents a statistically significant difference (p < 0.05).
Figure 5. Correlation analysis of vegetation indices and photosynthetic pigments (A); and application of the CRITIC method to vegetation indices (B). The asterisk (*) represents a statistically significant difference (p < 0.05).
Figure 6. Leaf area index (LAI) estimated by the XGBoost algorithm using the significance cutoff correlation method (A) and the CRITIC weighting method (B).
Figure 7. Chlorophyll a (A,B), chlorophyll b (C,D), total chlorophyll (E,F), and carotenoids (G,H) estimated by the XGBoost algorithm using the significance cutoff correlation method and the CRITIC weighting method.
14 pages, 5732 KiB  
Article
Data-Driven Energy Consumption Analysis and Prediction of Real-World Electric Vehicles at Low Temperatures: A Case Study Under Dynamic Driving Cycles
by Yifei Zhao, Hang Liu, Jinsong Li, Hongli Liu and Bin Li
Energies 2025, 18(5), 1239; https://doi.org/10.3390/en18051239 - 3 Mar 2025
Viewed by 173
Abstract
Accurate analysis and prediction of low-temperature energy consumption in pure electric vehicles can provide a reliable reference for energy optimization strategies, thereby alleviating range anxiety. Here, we propose a data-driven energy consumption analysis and prediction approach for real-world electric vehicles in cold conditions. Specifically, the dataset was divided into multiple kinematic segments by the fixed-step intercept method, and principal component analysis was applied on segment parameters, showing the average speed and acceleration time had the greatest impact on energy consumption at −7 °C. Then, a Bayesian optimized XGBoost model, with the two factors above as input, was constructed to predict the cumulative driving and total energy consumption. This method was validated with two different types of pure electric vehicles under different dynamic driving cycles. The results demonstrated that the model could predict low-temperature energy consumption accurately, with all mean relative errors less than 3%. Full article
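The segment-based features and the error measure described in this abstract can be illustrated with a minimal sketch. The abstract names a fixed-step intercept method, average speed, acceleration time, and mean relative error, but gives no implementation details, so the function names, the one-second sampling interval, the rule that any speed increase counts as acceleration, and the sample values are all assumptions.

```python
def split_fixed_step(samples, step):
    """Cut a sequence into consecutive segments of exactly `step` samples.

    Assumption: a trailing segment shorter than `step` is discarded.
    """
    return [samples[i:i + step] for i in range(0, len(samples) - step + 1, step)]


def segment_features(speeds, dt=1.0):
    """Return (average speed, acceleration time) for one segment.

    Assumption: every interval in which the speed rises counts as
    acceleration; a real analysis may apply a minimum threshold.
    """
    avg_speed = sum(speeds) / len(speeds)
    accel_time = sum(dt for a, b in zip(speeds, speeds[1:]) if b > a)
    return avg_speed, accel_time


def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| over all pairs."""
    return sum(abs(p - a) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up speed trace sampled once per second (illustrative only).
speed_trace = [0.0, 1.0, 2.0, 2.0, 1.0, 0.0, 1.0, 3.0]
segments = split_fixed_step(speed_trace, step=4)
features = [segment_features(seg) for seg in segments]
error = mean_relative_error([100.0, 200.0], [102.0, 196.0])
```

In this setting, the per-segment pairs in `features` would serve as model inputs, and `mean_relative_error` would compare predicted and measured energy consumption.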
Figure 1. (a) The experimental platform for proposed low-temperature test, (b) the vehicle technical information, (c) the sample data from the power analyzer, and (d) the procedure of the test.
Figure 2. (a) The CLTC-P driving cycle and (b) the common interurban cycle.
Figure 3. (a) The principal component matrix for car W and (b) the matrix for car B.
Figure 4. Each blue point represents a segment, and the red curves are quadratic fits to the energy consumption of all segments. (a) The energy consumption curve with the average speed and (b) the energy consumption curve with acceleration time.
Figure 5. Each blue point represents a segment, and the red curves are quadratic fits to the energy consumption of all segments. (a) The energy consumption curve with the average speed and (b) the energy consumption curve with deceleration time.
Figure 6. Curve of the elbow method.
Figure 7. Visualization of clustering results. The title of the z-axis is “total energy consumption/Wh·(km)⁻¹”. Red points represent the segment EC under working condition 1, blue points under condition 2, yellow points under condition 3, and green points under condition 4.
Figure 8. (a) Prediction results of the driving energy consumption of car W, (b) prediction results of the total energy consumption of car W, and (c) prediction results of total energy consumption of car B.
Figure 9. (a) Prediction results of the cumulative driving energy consumption of car W, (b) the cumulative total energy consumption of car W, and (c) the cumulative total energy consumption of car B; (d) an enlarged part of the cumulative total energy consumption prediction for car W.
Figure 10. (a) Absolute error of cumulative driving energy consumption prediction from car W, (b) relative error of cumulative driving energy consumption prediction from car W, (c) absolute error of cumulative total energy consumption prediction from car W, (d) relative error of cumulative total energy consumption prediction from car W, (e) absolute error of cumulative total energy consumption prediction from car B, and (f) relative error of cumulative total energy consumption prediction from car B.