Search Results (18)

Search Parameters:
Keywords = sparse group lasso

23 pages, 5712 KiB  
Article
Sparse Fuzzy C-Means Clustering with Lasso Penalty
by Shazia Parveen and Miin-Shen Yang
Symmetry 2024, 16(9), 1208; https://doi.org/10.3390/sym16091208 - 13 Sep 2024
Viewed by 1160
Abstract
Clustering is a technique for grouping data into a homogeneous structure according to similarity or dissimilarity measures between objects. The fuzzy c-means (FCM) algorithm is the best-known and most commonly used clustering method; it is a fuzzy extension of k-means and has been widely applied in various fields. Although FCM is a good clustering algorithm, it treats all feature components as equally important and has drawbacks in handling high-dimensional data. The rapid development of social media and data acquisition techniques has led to advanced methods for collecting and processing larger, more complex, and higher-dimensional data. In such high-dimensional data, however, many dimensions are immaterial or irrelevant. To make the feature weights sparse, a Lasso penalty can be applied to them. A solution for FCM with sparsity is sparse FCM (S-FCM) clustering. In this paper, we propose a new S-FCM, called S-FCM-Lasso, which is a new type of S-FCM based on the Lasso penalty. The proposed S-FCM-Lasso shrinks irrelevant features toward exactly zero, assigning zero weights to unnecessary characteristics. Based on various clustering performance measures, we compare S-FCM-Lasso with S-FCM and other existing sparse clustering algorithms on several numerical and real-life datasets. Comparisons and experimental results demonstrate that, in terms of these performance measures, the proposed S-FCM-Lasso performs better than S-FCM and existing sparse clustering algorithms. This validates the efficiency and usefulness of the proposed S-FCM-Lasso algorithm for high-dimensional datasets with sparsity.
(This article belongs to the Special Issue Symmetry in Intelligent Algorithms)
Figures:
Figure 1. The graphic illustration of a comparison of different algorithms.
Figure 2. Three-dimensional graph of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performance of the S-FCM-Lasso algorithm with respect to true and predicted labels, respectively (e,f).
Figure 3. Performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso in the form of bar graphs for an increasing number of dimensions (100–1000), based on average evaluation measures (a) AR, (b) RI, (c) NMI, (d) JI, (e) FMI, and (f) RT.
Figure 4. Performances of FCM, S-FCM, S-PCM1, S-PCM2, and S-FCM-Lasso in the form of bar graphs for increasing sample sizes (100–1000), based on average evaluation measures (a) AR, (b) RI, (c) NMI, (d) JI, (e) FMI, and (f) RT.
Figure 5. Comparison of different algorithms vs. the number of iterations.
Figure 6. Three-dimensional graph of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performances of S-FCM-Lasso with respect to true and predicted labels, respectively (e,f).
Figure 7. Number of iterations for the different algorithms on the iris dataset.
Figure 8. Three-dimensional graphs of the first three features for FCM, S-FCM, S-PCM1, and S-PCM2 (a–d); performance of the S-FCM-Lasso algorithm with respect to true and predicted labels (e,f) based on the iris experiment.
14 pages, 731 KiB  
Article
First-Order Sparse TSK Nonstationary Fuzzy Neural Network Based on the Mean Shift Algorithm and the Group Lasso Regularization
by Bingjie Zhang, Jian Wang, Xiaoling Gong, Zhanglei Shi, Chao Zhang, Kai Zhang, El-Sayed M. El-Alfy and Sergey V. Ablameyko
Mathematics 2024, 12(1), 120; https://doi.org/10.3390/math12010120 - 29 Dec 2023
Cited by 3 | Viewed by 1184
Abstract
Nonstationary fuzzy inference systems (NFIS) are able to tackle uncertainties and avoid the difficulty of the type-reduction operation. Combining NFIS with a neural network, a first-order sparse TSK nonstationary fuzzy neural network (SNFNN-1) is proposed in this paper to improve the interpretability/translatability of neural networks and the self-learning ability of fuzzy rules/sets. The whole architecture of SNFNN-1 can be considered an integrated model of multiple sub-networks with a variation in center, width, or noise. Thus, it is able to model both “intraexpert” and “interexpert” variability. Two techniques are adopted in this network: Mean Shift-based fuzzy partition and Group Lasso-based rule selection, which adaptively generate a suitable number of clusters and select important fuzzy rules, respectively. Quantitative experiments on six UCI datasets demonstrate the effectiveness and robustness of the proposed model.
(This article belongs to the Section D2: Operations Research and Fuzzy Decision Making)
Figures:
Figure 1. Structure of the SNFNN-1 model.
Figure 2. An instantiation of NFS based on Gaussian functions with variation in center.
Figure 3. L2-norm curves of fuzzy rules from a single training run of SFNN-1 on different datasets (each curve represents a fuzzy rule): (a) Iris. (b) Balance. (c) Liver. (d) Vertebral. (e) Glass (there are too many legends to display). (f) Vehicle.
Figure 4. Robustness comparison (box plot) between SFNN-1 and SNFNN-1 on different datasets (in parentheses: the standard deviations of C-noise, W-noise, and Co-noise): (a) Iris (1.2; 1.2; 0.3). (b) Balance (0.6; 0.6; 0.4). (c) Liver (0.8; 0.8; 0.4). (d) Vertebral (0.8; 0.8; 0.2). (e) Glass (0.6; 0.6; 0.1). (f) Vehicle (1.2; 1.2; 0.1).
21 pages, 1457 KiB  
Article
Variable Selection for Sparse Logistic Regression with Grouped Variables
by Mingrui Zhong, Zanhua Yin and Zhichao Wang
Mathematics 2023, 11(24), 4979; https://doi.org/10.3390/math11244979 - 17 Dec 2023
Viewed by 1494
Abstract
We present a new penalized method for estimation in sparse logistic regression models with a group structure. Group sparsity implies that we should consider the Group Lasso penalty. In contrast to penalized log-likelihood estimation, our method can be viewed as a penalized weighted score function method. Under some mild conditions, we provide non-asymptotic oracle inequalities promoting the group sparsity of predictors. A modified block coordinate descent algorithm based on a weighted score function is also employed. The net advantage of our algorithm over existing Group Lasso-type procedures is that the tuning parameter can be pre-specified. The simulations show that this algorithm is considerably faster and more stable than competing methods. Finally, we illustrate our methodology with two real data sets.
(This article belongs to the Section D1: Probability and Statistics)
Figures:
Figure 1. Average TPR, Accur, Time, and BNE plots for 500 repetitions of the three algorithms in Model I and Model II.
Figure 2. Average TPR, Accur, Time, and BNE plots for 500 repetitions of the three algorithms in Model III and Model IV.
16 pages, 4172 KiB  
Article
A 2D-DOA Sparse Estimation Method with Total Variation Regularization for Spatially Extended Sources
by Zhihong Liu, Qingyu Liu, Zunmin Liu, Chao Li and Qixin Xu
Appl. Sci. 2023, 13(17), 9565; https://doi.org/10.3390/app13179565 - 24 Aug 2023
Viewed by 1195
Abstract
In this paper, a novel two-dimensional direction-of-arrival (2D-DOA) estimation method with total variation regularization is proposed to deal with the problem of sparse DOA estimation for spatially extended sources. In a general sparse framework, the sparse 2D-DOA estimation problem is formulated with regularization reflecting extended-source characteristics, including spatial position grouping, block sparsity of the acoustic signal, and correlation features. An extended-source acoustic model, a two-dimensional array manifold and its complete representation, a total variation regularization penalty term, and the regularization equation are built and utilized to seek solutions in which the non-zero coefficients are grouped together with optimal sparseness. A total variation sparse 2D-DOA estimation model is constructed by combining total variation regularization with LASSO. The model can be easily solved by a convex optimization algorithm, and the solving process promotes sparsity both in the spatial derivatives of the solution and in the solution itself. Theoretical analysis shows that the decorrelation and angle-matching steps of traditional 2D-DOA estimation methods can be avoided with the proposed method. The proposed method has better robustness to noise, better sparsity, and faster estimation with higher resolution than traditional methods. It is a promising way to provide a sparse representation of coherent sources in a non-strictly sparse field.
(This article belongs to the Section Acoustics and Vibrations)
Figures:
Figure 1. Acoustic incidence model of extended sources in a planar array.
Figure 2. In Case 1, the 2D-DOA estimation with and without the TV-norm. (a) Spatial spectrum of 2DTV-CES, (b) DOA map of 2DTV-CES, (c) DOA map of LASSO.
Figure 3. In Case 2, the 2D-DOA estimation with and without the TV-norm. (a) Spatial spectrum of 2DTV-CES, (b) DOA map of 2DTV-CES, (c) DOA map of LASSO.
Figure 4. The detection probability of different algorithms versus sparsity K.
Figure 5. The detection probability of different algorithms versus SNR.
Figure 6. Experimental setup: (a) Continuous sources and array, (b) Dipole source and array, (c) Angle measurement coordinates.
Figure 7. The continuous sources' coefficient maps with three different source amplitude distributions: (a) 2DTV-CES, (b) LASSO, (c) Improved ESPRIT.
Figure 8. The continuous sources' detection probability with SNR = 10 and K = 8: (a) 2DTV-CES, (b) LASSO, (c) Improved ESPRIT.
Figure 9. The dipole source's detection probability with SNR = −5 and K = 4: (a) 2DTV-CES, (b) LASSO, (c) Improved ESPRIT.
13 pages, 1170 KiB  
Article
Preoperative Prediction of Optimal Femoral Implant Size by Regularized Regression on 3D Femoral Bone Shape
by Adriaan Lambrechts, Christophe Van Dijck, Roel Wirix-Speetjens, Jos Vander Sloten, Frederik Maes and Sabine Van Huffel
Appl. Sci. 2023, 13(7), 4344; https://doi.org/10.3390/app13074344 - 29 Mar 2023
Viewed by 1470
Abstract
Preoperative determination of implant size for total knee arthroplasty surgery has numerous clinical and logistical benefits. Currently, surgeons use X-ray-based templating to estimate implant size, but this method has low accuracy. Our study aims to improve accuracy by developing a machine learning approach that predicts the required implant size based on a 3D femoral bone mesh, the key factor in determining the correct implant size. A linear regression framework imposing group sparsity on the 3D bone mesh vertex coordinates was proposed based on a dataset of 446 MRI scans. The group sparse regression method was further regularized based on the connectivity of the bone mesh to enforce neighbouring vertices to have similar importance to the model. Our hypergraph regularized group lasso had an accuracy of 70.1% in predicting femoral implant size while the initial implant size prediction provided by the instrumentation manufacturer to the surgeon has an accuracy of 23.1%. Furthermore, our method was capable of predicting the implant size up to one size smaller or larger with an accuracy of 99.1%, thereby surpassing other state-of-the-art methods. The hypergraph regularized group lasso was able to obtain a significantly higher accuracy compared to the implant size prediction provided by the instrumentation manufacturer.
(This article belongs to the Special Issue Novel Advances in Computer-Assisted Surgery)
Figures:
Figure 1. A mesh of the distal femur.
Figure 2. Graph and hypergraph structure for a mesh in 3D space. (a) Graph structure. (b) Hypergraph structure.
Figure 3. The average model performance over the cross-validation folds for different values of λG, λL, and λR in terms of accuracy and MSE. (a) Accuracy as a function of λL and λG with λR = 1. (b) MSE as a function of λL and λG with λR = 1. (c) Accuracy as a function of λR with λG = 100 and λL = 10. (d) MSE as a function of λR with λG = 100 and λL = 10.
Figure 4. A medial, anterior, lateral, posterior, and distal view of the vertex importance for the model.
Figure 5. The learning curve obtained from training the model with the optimal hyperparameters.
Figure 6. A medial, anterior, lateral, posterior, and distal view of the directions of the βg vectors for the vertices with non-zero coefficients.
17 pages, 4441 KiB  
Article
Research on Aviation Safety Prediction Based on Variable Selection and LSTM
by Hang Zeng, Jiansheng Guo, Hongmei Zhang, Bo Ren and Jiangnan Wu
Sensors 2023, 23(1), 41; https://doi.org/10.3390/s23010041 - 21 Dec 2022
Cited by 6 | Viewed by 2128 | Correction
Abstract
Accurate prediction of aviation safety levels is significant for the efficient early warning and prevention of incidents. However, the causal mechanism and temporal character of aviation accidents are complex and not fully understood, which increases the operational cost of accurate aviation safety prediction. This paper adopts an innovative statistical method involving the least absolute shrinkage and selection operator (LASSO) and long short-term memory (LSTM). We compiled and calculated 138 months of aviation unsafe events collected from the Aviation Safety Reporting System (ASRS) and took minor accidents as the predictor. First, this paper introduced group variables and a weight matrix into LASSO to realize adaptive variable selection. Furthermore, it fed the selected variables into a multistep stacked LSTM (MSSLSTM) to predict the monthly accidents in 2020. Finally, the proposed method was compared with multiple existing variable selection and prediction methods. The results demonstrate that the RMSE (root mean square error) of the MSSLSTM is reduced by 41.98% compared with the original model; on the other hand, the key variables selected by the adaptive sparse group lasso (ADSGL) reduce the elapsed time by 42.67% (13 s). This shows that aviation safety prediction based on ADSGL and MSSLSTM can improve the prediction efficiency of the model while retaining excellent generalization ability and robustness.
(This article belongs to the Special Issue Vehicle Autonomy, Safety, and Security via Mobile Crowdsensing)
Figures:
Figure 1. Flow chart of the ADSGL-MSSLSTM method.
Figure 2. The network architecture of LSTM.
Figure 3. Sequence diagram for white noise.
Figure 4. Aviation safety prediction model based on MSSLSTM.
Figure 5. Changes in the validation MSE under different relaxation variables.
Figure 6. Lasso fitting coefficient trace diagram with α set to 0.1.
Figure 7. The absolute values of the compressed regression coefficients with α set to (a) 0.1, (b) 0.3, (c) 0.5, (d) 0.8.
Figure 8. Sequence diagrams for (a) MA, (b) A, (c) CP, (d) P.
Figure 9. Changes in training accuracy under different numbers of layers.
Figure 10. Changes in training accuracy under different numbers of nodes in the (a) 1st and 2nd hidden layers, (b) 3rd and 4th hidden layers.
Figure 11. Comparison of the predictive model results.
Figure 12. The RMSE distribution of the ten repeated experiments.
28 pages, 1213 KiB  
Article
A Novel Algorithm for Feature Selection Using Penalized Regression with Applications to Single-Cell RNA Sequencing Data
by Bhavithry Sen Puliparambil, Jabed H. Tomal and Yan Yan
Biology 2022, 11(10), 1495; https://doi.org/10.3390/biology11101495 - 12 Oct 2022
Cited by 2 | Viewed by 3206
Abstract
With the emergence of single-cell RNA sequencing (scRNA-seq) technology, scientists are able to examine gene expression at single-cell resolution. Analysis of scRNA-seq data has its own challenges, which stem from its high dimensionality. Machine learning methods offer the potential for gene (feature) selection from high-dimensional scRNA-seq data. Even though multiple machine learning methods appear to be suitable for feature selection, such as penalized regression, there is no rigorous comparison of their performances across data sets, where each poses its own challenges. Therefore, in this paper, we analyzed and compared multiple penalized regression methods for scRNA-seq data. Given the scRNA-seq data sets we analyzed, the results show that sparse group lasso (SGL) outperforms the other six methods (ridge, lasso, elastic net, drop lasso, group lasso, and big lasso) on the metrics of area under the receiver operating curve (AUC) and computation time. Building on these findings, we propose a new algorithm for feature selection using penalized regression methods. The proposed algorithm works by selecting a small subset of genes and applying SGL to select the differentially expressed genes in scRNA-seq data. By using hierarchical clustering to group genes, the proposed method bypasses the need for domain-specific knowledge of gene grouping information. In addition, the proposed algorithm provided consistently better AUC on the data sets used.
Figures:
Figure 1. A confusion matrix, from which performance metrics are usually calculated.
Figure 2. Schematic of the proposed algorithm for feature selection using penalized regression methods. There is a considerable reduction in the number of genes prior to the execution of SGL. The top important genes selected by SGL are used to cluster cell groups using K-means clustering.
Figure 3. The gene pool of data set GSE71585, formed by taking the union of the top important genes from ridge, lasso, elastic net, and drop lasso. For this data set, the top important genes from drop lasso had no intersection with the top genes from the other three methods. In the proposed algorithm, the gene pool is formed as the union of the top important genes from the four methods rather than the intersection, because an intersection may not always exist due to differences in the regularization used.
Figure 4. The gene pool of data set GSE123818, formed by taking the union of the top important genes from ridge, lasso, elastic net, and drop lasso.
Figure 5. Average cross-validation AUC across all four data sets. Even though group lasso has better AUC than SGL, SGL is better in terms of gene selection. Selecting the differentially expressed genes matters more for an scRNA-seq application than better prediction.
Figure 6. Genes versus coefficients plot for the data set GSE60749. Here, piRNA 44441 is the top important gene.
Figure 7. Data set GSE60749: cells are clustered using the selected genes. The top gene (piRNA 44441) alone can perfectly differentiate the two cell groups, as there is no overlap of cell clusters.
Figure 8. Genes versus coefficients plot for the data set GSE71585. Calm2 and Snap25 are the two most important genes.
Figure 9. Data set GSE71585: cells are clustered using the selected set of genes. The two most important genes (Calm2 and Snap25) can differentiate the two cell groups very well, as there is minimal overlap of cell clusters.
Figure 10. Genes versus coefficients plot for the data set GSE123818. Here, AT2G43610 and AT4G05320 are the two most important genes.
Figure 11. Data set GSE123818: cells are clustered with the final set of selected genes. The top two important genes (AT2G43610 and AT4G05320) can differentiate the two cell groups with some overlap.
Figure 12. Genes versus coefficients plot for the data set GSE81861. FABP1 and SAT1 are the two most important genes.
Figure 13. Data set GSE81861: cells are clustered with the final set of selected genes. The two most important genes (FABP1 and SAT1) can differentiate the two cell groups well, with some overlap of cell clusters.
Figures 14 and 15. GSE157997 data: cells clustered (K-means) with the final set of selected genes. The top two important genes are TMSB4X and S100A6. There is some overlap between the cell clusters from the healthy donor and the IPF patient.
12 pages, 5738 KiB  
Article
Group Class Residual ℓ1-Minimization on Random Projection Sparse Representation Classifier for Face Recognition
by Susmini Indriani Lestariningati, Andriyan Bayu Suksmono, Ian Joseph Matheus Edward and Koredianto Usman
Electronics 2022, 11(17), 2723; https://doi.org/10.3390/electronics11172723 - 30 Aug 2022
Cited by 2 | Viewed by 1746
Abstract
Sparse Representation-based Classification (SRC) has been shown to be a reliable face recognition technique. The ℓ1 Bayesian approach based on the Lasso algorithm has proven to be most effective in class identification and computational complexity. In this paper, we revisit the classification algorithm and recommend group-based classification. The proposed modified algorithm, called Group Class Residual Sparse Representation-based Classification (GCR-SRC), extends the coherency of the test sample to the whole training set of the identified class rather than only to the nearest training sample. Our method is based on the nearest coherency between a test sample and the identified training samples. To reduce the dimension of the training samples, we choose random projection for feature extraction. This method is selected to reduce the computational cost without increasing the algorithm's complexity. The simulation results show that a reduction factor (ρ) of 64 can achieve a maximum recognition rate about 10% higher than the original SRC using the downscaling method. Our proposed method's feasibility and effectiveness are tested on four popular face databases, namely AT&T, Yale B, Georgia Tech, and the AR dataset. GCR-SRC and GCR-RP-SRC achieved up to 4% higher accuracy than SRC with random projection and class-specific residuals. The experimental results show that face recognition based on random projection and group classes not only reduces the dimension of the face data but also increases the recognition accuracy, indicating that it is a feasible method for face recognition.
(This article belongs to the Special Issue Compressed Sensing in Signal Processing and Imaging)
Figures:
Figure 1. SRC before dimensionality reduction.
Figure 2. Dimensionality reduction using random projection.
Figure 3. Underdetermined system after dimensionality reduction.
Figure 4. AT&T dataset.
Figure 5. Yale B dataset.
Figure 6. Georgia Tech dataset.
Figure 7. Aleix Martinez and Robert Benavente (AR) dataset.
30 pages, 828 KiB  
Article
MobilePrune: Neural Network Compression via ℓ0 Sparse Group Lasso on the Mobile System
by Yubo Shao, Kaikai Zhao, Zhiwen Cao, Zhehao Peng, Xingang Peng, Pan Li, Yijie Wang and Jianzhu Ma
Sensors 2022, 22(11), 4081; https://doi.org/10.3390/s22114081 - 27 May 2022
Cited by 3 | Viewed by 2843
Abstract
It is hard to directly deploy deep learning models on today's smartphones due to the substantial computational costs introduced by millions of parameters. To compress the model, we develop an ℓ0-based sparse group lasso model called MobilePrune, which can generate extremely compact neural network models for both desktop and mobile platforms. We adopt the group lasso penalty to enforce sparsity at the group level to benefit General Matrix Multiply (GEMM) and develop the very first algorithm that can optimize the ℓ0 norm in an exact manner and achieve a global convergence guarantee in the deep learning context. MobilePrune also allows complicated group structures to be applied to the group penalty (i.e., trees and overlapping groups) to suit DNN models with more complex architectures. Empirically, we observe a substantial reduction in compression ratio and computational costs for various popular deep learning models on multiple benchmark datasets compared to state-of-the-art methods. More importantly, the compressed models are deployed on the Android system to confirm that our approach achieves lower response delay and battery consumption on mobile phones.
(This article belongs to the Section Physical Sensors)
Figures:
Figure 1. Observations of different strategies' pruned filter matrices for hardware acceleration with the software implementation of convolution in cuDNN. (a) General Matrix Multiply (GEMM) as applied in cuDNN. (b) Different pruning strategies for the filter matrix: no pruning, individual sparsity, column-wise group sparsity, and combined individual and column-wise group sparsity. (c) The pruned filter matrix implemented in cuDNN, with a determination of whether it can be used for hardware acceleration.
Figure 2. Overview of the proposed MobilePrune method. (a) Group sparsity for the weights of a neuron in fully connected layers. (b) Sparsity on individual weights in fully connected layers. (c) Pruning strategy for fully connected layers and its effect, where sparsity is induced on both neuron-wise groups and individual weights. (d) Group and individual sparsity for convolutional layers.
Figure A1. Impact of different numbers of epochs (a–c) and prune thresholds (d–f) on the proposed ℓ0 sparse group lasso approach on each HAR dataset: WISDM (a,d), UCI-HAR (b,e), and PAMAP2 (c,f).
24 pages, 2149 KiB  
Article
A Holistic Strategy for Classification of Sleep Stages with EEG
by Sunil Kumar Prabhakar, Harikumar Rajaguru, Semin Ryu, In cheol Jeong and Dong-Ok Won
Sensors 2022, 22(9), 3557; https://doi.org/10.3390/s22093557 - 7 May 2022
Cited by 6 | Viewed by 2310
Abstract
Manual sleep stage scoring is usually implemented by sleep specialists through visual inspection of the patient's neurophysiological signals. As this is a laborious task, automated sleep stage classification systems were developed in the past, and researchers continue to advance them. These automated systems identify the various stages of sleep, an important step in assisting doctors with the diagnosis of sleep-related disorders. In this work, a holistic strategy named clustering and dimensionality reduction with feature extraction cum selection for classification along with deep learning (CDFCD) is proposed for the classification of sleep stages with EEG signals. Though the methodology follows a structural flow similar to past works, many advanced and novel techniques are proposed under each category of this workflow. Initially, clustering is applied with the help of hierarchical clustering, spectral clustering, and the proposed principal component analysis (PCA)-based subspace clustering. The dimensionality is then reduced with the help of the proposed singular value decomposition (SVD)-based spectral algorithm and the standard variational Bayesian matrix factorization (VBMF) technique. Features are then extracted and selected with two novel proposed techniques: the sparse group lasso technique with dual-level implementation (SGL-DLI) and the ridge regression technique with a limiting weight scheme (RR-LWS). Finally, classification is performed with the less-explored multiclass Gaussian process classification (MGC), the proposed random arbitrary collective classification (RACC), and a deep learning technique using long short-term memory (LSTM), along with other conventional machine learning techniques. This methodology is validated on the Sleep-EDF database, and the results surpass those of previous studies, reporting a high accuracy of 93.51% even on the six-class classification problem.
Figures:
Figure 1. Pictorial representation of the work.
Figure 2. An LSTM representation.
Figure 3. Illustration of the LSTM implementation.
19 pages, 11717 KiB  
Article
Sparse Damage Detection with Complex Group Lasso and Adaptive Complex Group Lasso
by Vasileios Dimopoulos, Wim Desmet and Elke Deckers
Sensors 2022, 22(8), 2978; https://doi.org/10.3390/s22082978 - 13 Apr 2022
Cited by 5 | Viewed by 2308
Abstract
Sparsity-based methods have recently come to the foreground of damage detection applications, posing a robust and efficient alternative to traditional approaches. At the same time, low-frequency inspection is known to enable global monitoring with waves propagating over large distances. In this paper, a single-sensor complex Group Lasso methodology for the problem of structural defect localization by means of compressive sensing and complex low-frequency response functions is presented. The complex Group Lasso methodology is evaluated on composite plates with induced scatterers. An adaptive setting of the methodology is also proposed to further enhance resolution. Results from both approaches are compared with a full-array, super-resolution MUSIC technique of the same signal model. Both algorithms are shown to demonstrate high and competitive performance.
(This article belongs to the Section Fault Diagnosis & Sensors)
Figures:
Figure 1. Flowchart of the adaptive complex Group Lasso for damage detection.
Figure 2. Experimental setup. (a) The composite plate with the selected imaging points and damage configurations M1: 1 point-like mass, M4: 4 point-like masses, ME: 1 extended mass. (b) The implementation of a point-like mass. (c) The implementation of the extended mass.
Figure 3. Damage detection with complex Group Lasso. (a) M1: 1 point-like mass. (b) M4: 4 point-like masses. (c) ME: 1 extended mass.
Figure 4. Damage detection with adaptive complex Group Lasso. (a) M1: 1 point-like mass. (b) M4: 4 point-like masses. (c) ME: 1 extended mass.
Figure 5. Peak-to-average ratio with complex Group Lasso and adaptive complex Group Lasso for the three damage configurations.
Figure 6. Damage detection with single-sensor MUSIC. (a) M1: 1 point-like mass. (b) M4: 4 point-like masses. (c) ME: 1 extended mass.
Figure 7. Damage detection with full-array MUSIC. (a) M1: 1 point-like mass. (b) M4: 4 point-like masses. (c) ME: 1 extended mass.
Figure 8. Peak-to-average ratio with complex Group Lasso, adaptive complex Group Lasso, and full-array MUSIC for the three damage configurations.
Figure 9. Noise-polluted FRFs for damage configuration M1 and excitation E1. (a) SNR: 40 dB. (b) SNR: 30 dB. (c) SNR: 20 dB. (d) SNR: 10 dB.
Figure 10. Peak-to-average ratio for damage configuration M1 with varying SNR. (a) Complex Group Lasso. (b) Adaptive complex Group Lasso.
Figure 11. Peak-to-average ratio for damage configuration M4 with varying SNR. (a) Complex Group Lasso. (b) Adaptive complex Group Lasso.
Figure 12. Peak-to-average ratio for damage configuration ME with varying SNR. (a) Complex Group Lasso. (b) Adaptive complex Group Lasso.
Figure 13. Damage detection for M1: 1 point-like mass and an SNR of 15 dB. (a) Complex Group Lasso. (b) Adaptive complex Group Lasso.
18 pages, 1175 KiB  
Article
Multitask Learning Radiomics on Longitudinal Imaging to Predict Survival Outcomes following Risk-Adaptive Chemoradiation for Non-Small Cell Lung Cancer
by Parisa Forouzannezhad, Dominic Maes, Daniel S. Hippe, Phawis Thammasorn, Reza Iranzad, Jie Han, Chunyan Duan, Xiao Liu, Shouyi Wang, W. Art Chaovalitwongse, Jing Zeng and Stephen R. Bowen
Cancers 2022, 14(5), 1228; https://doi.org/10.3390/cancers14051228 - 26 Feb 2022
Cited by 24 | Viewed by 4606
Abstract
Medical imaging provides quantitative and spatial information for evaluating treatment response in the management of patients with non-small cell lung cancer (NSCLC). High-throughput extraction of radiomic features from these images can potentially phenotype tumors non-invasively and support risk stratification based on survival outcome prediction. The prognostic value of radiomics from different imaging modalities and time points prior to and during chemoradiation therapy of NSCLC, relative to conventional imaging biomarker or delta radiomics models, remains uncharacterized. We investigated the utility of multitask learning of multi-time-point radiomic features, as opposed to single-task learning, for improving survival outcome prediction relative to conventional clinical imaging feature model benchmarks. Survival outcomes were prospectively collected for 45 patients with unresectable NSCLC enrolled on the FLARE-RT phase II trial of risk-adaptive chemoradiation and optional consolidation PD-L1 checkpoint blockade (NCT02773238). FDG-PET, CT, and perfusion SPECT imaging was performed pretreatment and at week 3 mid-treatment, and 110 IBSI-compliant pyradiomics shape-/intensity-/texture-based features were extracted from the metabolic tumor volume. Outcome modeling consisted of a fused Laplacian sparse group LASSO with component-wise gradient boosting survival regression in a multitask learning framework. Testing performance under stratified 10-fold cross-validation was evaluated for multitask learning radiomics of different imaging modalities and time points. Multitask learning models were benchmarked against conventional clinical imaging and delta radiomics models and evaluated with the concordance index (c-index) and index of prediction accuracy (IPA). FDG-PET radiomics had higher prognostic value for overall survival in test folds (c-index 0.71 [0.67, 0.75]) than CT radiomics (c-index 0.64 [0.60, 0.71]) or perfusion SPECT radiomics (c-index 0.60 [0.57, 0.63]). Multitask learning of pre-/mid-treatment FDG-PET radiomics (c-index 0.71 [0.67, 0.75]) outperformed benchmark clinical imaging (c-index 0.65 [0.59, 0.71]) and FDG-PET delta radiomics (c-index 0.52 [0.48, 0.58]) models. Similarly, the IPA for multitask learning FDG-PET radiomics (30%) was higher than for the clinical imaging (26%) and delta radiomics (15%) models. Radiomics models performed consistently under different voxel resampling conditions. Multitask learning radiomics for outcome modeling provides a clinical decision support platform that leverages longitudinal imaging information. This framework can reveal the relative importance of different imaging modalities and time points when designing risk-adaptive cancer treatment strategies.
(This article belongs to the Special Issue Medical Imaging and Machine Learning)
Figures:
Figure 1. FDG-PET/CT images for an example PET non-responder patient (a,c) and PET responder patient (b,d), acquired pretreatment (a,b) and mid-treatment (c,d). Tumor volumes are displayed as blue/green contours.
Figure 2. Overall schematic of the survival outcome prediction pipeline using multitask feature selection across time points from single/multimodality radiomics (left) and the steps inside the stratified cross-validation folds for multitask and gradient boosting survival (right). Note that feature selection and the nested grid search for hyperparameter tuning were constrained to training folds and blinded to test folds, in order to prevent data leakage and ensure unbiased performance evaluation.
Figure 3. Receiver operating characteristic (ROC) curves and c-index values for different modalities.
Figure 4. Kaplan–Meier curves of overall survival in test folds stratified by high-risk (>median prediction) versus low-risk (<median prediction) groups with models using the (a) FDG-PET, (b) CT, (c) perfusion SPECT radiomic features, and (d) clinical imaging variables.
Figure 5. Heatmap of c-index values of overall survival prediction for different feature selection and survival analysis algorithms using FDG-PET radiomics (DR—delta radiomics; Coxnet—Cox net survival model; RR-RFE—ridge regression recursive feature elimination; RF—random forest; RSF—random survival forest; GBS—gradient boosting survival; SSVM—survival support vector machine).
Figure 6. Heatmap of IPA values of overall survival prediction for different feature selection and survival analysis algorithms using FDG-PET radiomics (abbreviations as in Figure 5).
24 pages, 477 KiB  
Article
Multivariate Functional Kernel Machine Regression and Sparse Functional Feature Selection
by Joseph Naiman and Peter Xuekun Song
Entropy 2022, 24(2), 203; https://doi.org/10.3390/e24020203 - 28 Jan 2022
Cited by 1 | Viewed by 2587
Abstract
Motivated by mobile devices that record data at a high frequency, we propose a new methodological framework for analyzing a semi-parametric regression model that allows us to study a nonlinear relationship between a scalar response and multiple functional predictors in the presence of scalar covariates. Utilizing functional principal component analysis (FPCA) and the least-squares kernel machine method (LSKM), we substantially extend the framework of semi-parametric regression models of scalar responses on scalar predictors by allowing multiple functional predictors to enter the nonlinear model. Regularization is established for feature selection in the setting of reproducing kernel Hilbert spaces. Our method performs model fitting and variable selection on functional features simultaneously. For the implementation, we propose an effective algorithm in which iterations alternate between fitting linear mixed-effects models and a variable selection method (e.g., sparse group lasso). We show algorithmic convergence results and theoretical guarantees for the proposed methodology, and illustrate its performance through simulation experiments and an analysis of accelerometer data.
Figures:
Figure 1. Activity counts over 7 days from a tri-axis (X-, Y-, and Z-axis) accelerometer of one subject.
Figure 2. Five marginal estimates of important feature functions with 95% shaded confidence bands, evaluated at 100 grid points while holding all other components equal to 0.5, in Scenario 2.
Figure 3. The 24 h minute-by-minute medians of 7 days of activity counts for one subject.
14 pages, 540 KiB  
Article
Iterative Variable Selection for High-Dimensional Data: Prediction of Pathological Response in Triple-Negative Breast Cancer
by Juan C. Laria, M. Carmen Aguilera-Morillo, Enrique Álvarez, Rosa E. Lillo, Sara López-Taruella, María del Monte-Millán, Antonio C. Picornell, Miguel Martín and Juan Romo
Mathematics 2021, 9(3), 222; https://doi.org/10.3390/math9030222 - 23 Jan 2021
Cited by 2 | Viewed by 2705
Abstract
Over the last decade, regularized regression methods have offered alternatives for performing multi-marker analysis and feature selection in a whole genome context. The process of defining a list of genes that will characterize an expression profile remains unclear. It currently relies upon advanced statistics and can use an agnostic point of view or include some a priori knowledge, but overfitting remains a problem. This paper introduces a methodology to deal with the variable selection and model estimation problems in the high-dimensional set-up, which can be particularly useful in the whole genome context. Results are validated using simulated data and a real dataset from a triple-negative breast cancer study.
(This article belongs to the Special Issue Analysis and Comparison of Probabilistic Models)
Figures:
Figure 1. Simulated sample from three random variables, illustrating the grouping based on principal component analysis (PCA).
Figure 2. Sorted importance index obtained from Algorithm 2, with R = 150, on a simulated data sample with N = 100 observations and p = 400 variables.
Figure 3. Elapsed time in seconds to obtain a solution (left panel) and accuracy (right panel) of Algorithm 2 (Alg. 2) and lasso with grid search (lasso-GS), over 30 independent runs for each K. The simulation design was ZH_d, with n = 100, p = 1000, R = 30, and K varying between 1 and 40. These experiments were run sequentially.
Figure 4. Average elapsed time in seconds needed to obtain a solution with Algorithm 2 (Alg. 2) and lasso with grid search (lasso-GS), over 30 independent runs for each R. The simulation design was ZH_d, with n = 50 and p = 200 both fixed, and R varying between 1 and 300. These experiments were run sequentially.
Figure 5. Sorted importance indexes, according to the criterion given in (9), after running Algorithm 2 with R = 200. The cutoff value was set to K = ⌈√(N/2)⌉ = 7, as described in Section 2.5.
Figure 6. Power index (10), measured over R = 200 models, in decreasing order, with the corresponding correct classification rate (ccr) of each model in the validation sample.
Figure 7. Number of included variables in the final model, by groups (top) and in total (bottom). Eighteen out of 82 groups were included.
15 pages, 3962 KiB  
Article
A Deep Neural Network for Accurate and Robust Prediction of the Glass Transition Temperature of Polyhydroxyalkanoate Homo- and Copolymers
by Zhuoying Jiang, Jiajie Hu, Babetta L. Marrone, Ghanshyam Pilania and Xiong (Bill) Yu
Materials 2020, 13(24), 5701; https://doi.org/10.3390/ma13245701 - 14 Dec 2020
Cited by 20 | Viewed by 3897
Abstract
The purpose of this study was to develop a data-driven machine learning model to predict the performance properties of polyhydroxyalkanoates (PHAs), a group of biosourced polyesters featuring excellent performance, to guide future design and synthesis experiments. A deep neural network (DNN) machine learning model was built to predict the glass transition temperature, Tg, of PHA homo- and copolymers. Molecular fingerprints were used to capture the structural and atomic information of PHA monomers. The other input variables included the molecular weight, the polydispersity index, and the percentage of each monomer in the homo- and copolymers. The results indicate that the DNN model achieves high accuracy in estimating the glass transition temperature of PHAs. In addition, the symmetry of the DNN model is ensured by incorporating symmetry data in the training process. The DNN model achieved better performance than the support vector machine (SVM), a nonlinear ML model, and the least absolute shrinkage and selection operator (LASSO), a sparse linear regression model. The relative importance of the factors affecting the DNN model's predictions was analyzed. The sensitivity of the DNN model, including strategies to deal with missing data, was also investigated. Compared with commonly used machine learning models incorporating quantitative structure–property relationships (QSPR), it does not require an explicit descriptor selection step but shows comparable performance. The machine learning model framework can be readily extended to predict other properties.
Figures:
Figure 1. An example illustration of the steps in encoding a chemical with a digital representation to generate a molecular fingerprint.
Figure 2. Illustration of the deep neural network (DNN) model structure.
Figure 3. Scatter plot of the glass transition temperature Tg predicted by the DNN model vs. experimental measurements on the test data.
Figure 4. Scatter plot of the predicted vs. experimental Tg on the symmetry data set for the pretrained DNN model.
Figure 5. Scatter plot of the predicted vs. experimental Tg after retraining the DNN model with the combined initial and symmetry data set: (a) results on the initial data set; (b) results on the symmetry data set.
Figure 6. Bar plot of the importance factors of each input variable to the DNN model.
Figure 7. Predicted vs. experimental Tg using (a) SVM and (b) LASSO (results for the DNN are shown in Figure 5).
Figure 8. Scatter plots of the predicted vs. experimental Tg after different inputs were removed from the DNN model: (a) fractional composition of A; (b) molecular weight; (c) polydispersity index (PDI); (d) both molecular weight and PDI.
Figure 9. Statistical distribution of the input data: (a) fractional composition of A; (b) molecular weight; (c) PDI.
Figure 10. Scatter plots of the DNN-predicted vs. experimental Tg with different input factors estimated with their mean values: (a) fractional composition of A; (b) molecular weight; (c) PDI; (d) molecular weight and PDI.
Figure 11. Scatter plots of the predicted vs. experimental Tg with different input factors estimated with their minimum values: (a) fractional composition of A; (b) molecular weight; (c) PDI; (d) molecular weight and PDI.
Figure 12. Scatter plots of the predicted vs. experimental Tg with different input factors estimated with their maximum values: (a) fractional composition of A; (b) molecular weight; (c) PDI; (d) molecular weight and PDI.