A Robust Variable Selection Method for Sparse Online Regression via the Elastic Net Penalty
Figure 1. I. Absolute loss; II. ε-insensitive loss; III. canal loss.
Figure 2. An illustration of the online regression learning procedure.
Figure 3. Simulation results with a noisy explanatory variable x and a noisy response variable y in the presence of data multicollinearity. (a) The noisy explanatory variable x. (b) The noisy response variable y.
Figure 4. Simulation results for the noisy explanatory variable x. (a) n = 5000. (b) n = 10,000.
Figure 5. Simulation results for the noisy response variable y. (a) n = 5000. (b) n = 10,000.
Figure 6. Box plots of the four benchmark datasets. (a) Letters. (b) Kin. (c) Abalone. (d) Pendigits.
Figure 7. Experimental results on the "Kin" and "Abalone" datasets. (a) Kin. (b) Abalone.
Figure 8. Experimental results on the "Letters" and "Pendigits" datasets. (a) Letters. (b) Pendigits.
Abstract
1. Introduction
- The model handles streaming data efficiently. The proposed canal-adaptive elastic net dynamically updates the regression coefficients of the regularized linear model in real time: each time a new batch of data arrives, the online gradient descent (OGD) framework updates the current model, so streaming data are handled more effectively than by batch refitting.
- The model has a sparse representation. As illustrated in Figure 1, only the small subset of samples whose absolute residuals fall between the insensitive-zone width and the truncation threshold of the canal loss is used to adjust the regression parameters. As a result, the model scales well and reduces computational cost.
- The modified loss function gives the model substantial noise resistance. By dynamically adjusting the truncation threshold, noisy samples whose absolute errors exceed it are recognized and excluded from the coefficient updates (a minimal sketch of this truncated loss follows this list).
- Both the ℓ1-norm and ℓ2-norm penalties are employed, so the model performs automatic variable selection and continuous shrinkage simultaneously and can select groups of correlated variables, better handling highly correlated predictors and overcoming the effects of data multicollinearity.
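To make the sparse, noise-filtering update concrete, the sketch below implements a truncated ε-insensitive loss with the shape shown in Figure 1. The names eps (insensitive-zone width) and sigma (truncation threshold) and the default values are our own illustrative labels; the loss the paper actually uses is defined in Equation (5).

```python
import numpy as np

def canal_loss(r, eps=0.1, sigma=1.9):
    """Truncated epsilon-insensitive loss of the shape shown in Figure 1.

    |r| <= eps         : inside the insensitive zone -> zero loss
    eps < |r| <= sigma : linear (absolute) loss, shifted by eps
    |r| > sigma        : capped at sigma - eps -> outliers stop
                         contributing to the gradient
    """
    a = np.abs(r)
    return np.where(a <= eps, 0.0, np.where(a <= sigma, a - eps, sigma - eps))

def canal_loss_subgrad(r, eps=0.1, sigma=1.9):
    """Subgradient w.r.t. the residual r; nonzero only for eps < |r| <= sigma,
    i.e., exactly the subset of samples used to update the coefficients."""
    a = np.abs(r)
    return np.where((a > eps) & (a <= sigma), np.sign(r), 0.0)
```

With this shape, samples deep inside the tube and samples flagged as noise both yield a zero subgradient, which is what produces the sparse, noise-resilient updates described above.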
2. Related Works
3. Method
3.1. Canal-Adaptive Elastic Net
3.2. Online Learning Algorithm for Canal-Adaptive Elastic Net
Algorithm 1 Noise-Resilient Online Canal-Adaptive Elastic Net Algorithm.
Input: Initial estimate, the number of examples, and the instance sequence.
Output: Predicted values.
1: Initialize the coefficient vector with the initial estimate
2: for each round t do
3: Receive instance x_t
4: Predict value ŷ_t
5: Receive true value y_t
6: Update the canal loss parameters according to Equation (5)
7: Compute the residual error r_t = y_t − ŷ_t
8: if |r_t| lies in the active range of the canal loss then
9: Update the coefficients with the canal loss subgradient and the elastic net penalty, according to Equation (4)
10: else
11: Update the coefficients with the elastic net penalty only (the canal loss subgradient vanishes; samples beyond the truncation threshold are discarded as noise), according to Equation (4)
12: end if
13: end for
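For concreteness, the following is a runnable Python sketch of the loop in Algorithm 1, assuming a plain online subgradient step on the canal loss plus an elastic net penalty with a 1/√t step size. The exact update of Equation (4) (e.g., adaptive penalty weights) and the parameter schedule of Equation (5) are simplified away, and every name here (eps, sigma, lam1, lam2, eta0) is our own illustrative label.

```python
import numpy as np

def online_canal_elastic_net(X, y, eps=0.1, sigma=1.9,
                             lam1=0.01, lam2=0.01, eta0=0.1):
    """Sketch of Algorithm 1: online (sub)gradient descent on the canal
    loss with an elastic net penalty. Returns the final coefficient
    vector and the number of samples discarded as noise."""
    T, d = X.shape
    w = np.zeros(d)                        # line 1: initial estimate
    n_discarded = 0
    for t in range(T):                     # line 2
        x_t = X[t]                         # line 3: receive instance
        y_hat = w @ x_t                    # line 4: predict value
        r = y[t] - y_hat                   # lines 5 and 7: residual error
        eta = eta0 / np.sqrt(t + 1)        # illustrative step-size schedule
        if eps < abs(r) <= sigma:          # line 8: informative sample
            grad_loss = -np.sign(r) * x_t  # canal loss subgradient in w
        else:                              # lines 10-11: zero loss gradient
            grad_loss = np.zeros(d)
            if abs(r) > sigma:
                n_discarded += 1           # flagged as noise, not used
        # elastic net subgradient: lam1 * sign(w) + 2 * lam2 * w
        w -= eta * (grad_loss + lam1 * np.sign(w) + 2.0 * lam2 * w)
    return w, n_discarded
```

A call such as `w, k = online_canal_elastic_net(X, y)` on a standardized stream returns the learned coefficients together with the number of samples the canal loss filtered out, mirroring the "Discarded Samples" column reported in the experiments below.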
4. Experiments
4.1. Simulation Settings
4.1.1. The Case of Both Multicollinearity and Noise
4.1.2. The Case of Noise
4.2. Benchmark Data Sets
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
| Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|
| 0 | Lasso | 1.7274 ± 0.2074 | 9.6486 ± 2.2035 | 0 | 0.00% | 0.0011 |
| | Elastic Net | 1.7739 ± 0.2324 | 9.8651 ± 3.6694 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.6251 ± 0.2599 | 7.7511 ± 1.6699 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.6109 ± 0.1826 | 7.0687 ± 1.8752 | 0 | 0.00% | 0.0014 |
| 0.1 | Lasso | 2.1554 ± 0.3275 | 15.2898 ± 4.9195 | 0 | 0.00% | 0.0011 |
| | Elastic Net | 1.7073 ± 0.2796 | 9.3282 ± 2.8530 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.6610 ± 0.2693 | 8.2822 ± 2.5765 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.5942 ± 0.2684 | 7.2797 ± 2.5177 | 26.000 ± 2.000 | 13.00% | 0.0015 |
| 0.2 | Lasso | 2.3174 ± 0.2899 | 18.0449 ± 4.7368 | 0 | 0.00% | 0.0012 |
| | Elastic Net | 2.1440 ± 0.3066 | 15.3115 ± 4.7714 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.5664 ± 0.1476 | 7.7189 ± 1.4264 | 0 | 0.00% | 0.0011 |
| | Canal-Adaptive Elastic Net | 1.4850 ± 0.1959 | 7.0048 ± 1.3591 | 44.000 ± 4.000 | 22.00% | 0.0015 |
| 0.3 | Lasso | 2.3753 ± 0.3360 | 18.8495 ± 4.6064 | 0 | 0.00% | 0.0012 |
| | Elastic Net | 2.2583 ± 0.2608 | 16.8451 ± 3.7473 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.7206 ± 0.1009 | 9.0645 ± 1.7891 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.6157 ± 0.1340 | 8.0100 ± 2.1617 | 72.000 ± 7.000 | 36.00% | 0.0015 |
| Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|
| 0 | Lasso | 2.1554 ± 0.3275 | 15.2898 ± 4.9195 | 0 | 0.00% | 0.0011 |
| | Elastic Net | 1.7739 ± 0.2324 | 9.8651 ± 3.6694 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.6251 ± 0.2599 | 8.0043 ± 1.6699 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.6109 ± 0.1826 | 7.6687 ± 1.8752 | 0 | 0.00% | 0.0014 |
| 0.1 | Lasso | 2.2082 ± 0.3966 | 14.8886 ± 5.2407 | 0 | 0.00% | 0.0012 |
| | Elastic Net | 1.8817 ± 0.2330 | 10.2875 ± 3.9768 | 0 | 0.00% | 0.0015 |
| | Ridge Regression | 1.7057 ± 0.1853 | 9.0843 ± 2.6754 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.6013 ± 0.1743 | 7.0020 ± 2.3720 | 76.000 ± 7.000 | 38.00% | 0.0016 |
| 0.2 | Lasso | 2.3659 ± 0.3966 | 18.9535 ± 3.2407 | 0 | 0.00% | 0.0012 |
| | Elastic Net | 1.9372 ± 0.2960 | 13.1065 ± 4.0957 | 0 | 0.00% | 0.0013 |
| | Ridge Regression | 1.8668 ± 0.2369 | 9.8637 ± 2.8419 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.6585 ± 0.2178 | 7.7834 ± 2.0399 | 84.000 ± 8.000 | 42.00% | 0.0015 |
| 0.3 | Lasso | 2.4068 ± 0.2157 | 19.7197 ± 3.5115 | 0 | 0.00% | 0.0011 |
| | Elastic Net | 2.0314 ± 0.2787 | 14.2434 ± 4.9863 | 0 | 0.00% | 0.0014 |
| | Ridge Regression | 2.0668 ± 0.1779 | 14.4463 ± 1.6615 | 0 | 0.00% | 0.0012 |
| | Canal-Adaptive Elastic Net | 1.7624 ± 0.3057 | 8.6620 ± 2.7414 | 90.000 ± 0.800 | 45.00% | 0.0015 |
| n | Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|---|
| 5000 | 0 | Lasso | 0.1618 ± 0.0018 | 0.8085 ± 0.0170 | 0 | 0.00% | 0.1787 |
| | | Elastic Net | 0.1627 ± 0.0014 | 0.8165 ± 0.0153 | 0 | 0.00% | 0.3423 |
| | | Ridge Regression | 0.1626 ± 0.0021 | 0.8154 ± 0.0229 | 0 | 0.00% | 0.1951 |
| | | Canal-Adaptive Elastic Net | 0.1621 ± 0.0031 | 0.8122 ± 0.0260 | 694 ± 32 | 13.89% | 0.2663 |
| | 0.1 | Lasso | 0.1955 ± 0.0101 | 1.1882 ± 0.1182 | 0 | 0.00% | 0.1702 |
| | | Elastic Net | 0.1885 ± 0.0106 | 1.1054 ± 0.1214 | 0 | 0.00% | 0.3072 |
| | | Ridge Regression | 0.1697 ± 0.0047 | 0.8966 ± 0.0509 | 0 | 0.00% | 0.186 |
| | | Canal-Adaptive Elastic Net | 0.1693 ± 0.0059 | 0.8896 ± 0.0620 | 903 ± 25 | 18.07% | 0.2528 |
| | 0.2 | Lasso | 0.2073 ± 0.0103 | 1.3278 ± 0.1382 | 0 | 0.00% | 0.1706 |
| | | Elastic Net | 0.2036 ± 0.0091 | 1.2797 ± 0.1239 | 0 | 0.00% | 0.2939 |
| | | Ridge Regression | 0.1841 ± 0.0063 | 1.0430 ± 0.0704 | 0 | 0.00% | 0.1736 |
| | | Canal-Adaptive Elastic Net | 0.1809 ± 0.0045 | 1.0084 ± 0.0548 | 1118 ± 18 | 22.37% | 0.2442 |
| | 0.3 | Lasso | 0.2151 ± 0.0109 | 1.4320 ± 0.1292 | 0 | 0.00% | 0.1685 |
| | | Elastic Net | 0.2099 ± 0.0046 | 1.3597 ± 0.0636 | 0 | 0.00% | 0.3036 |
| | | Ridge Regression | 0.1941 ± 0.0107 | 1.1614 ± 0.1392 | 0 | 0.00% | 0.1655 |
| | | Canal-Adaptive Elastic Net | 0.1884 ± 0.0061 | 1.0968 ± 0.0707 | 1325 ± 29 | 26.52% | 0.2329 |
| 10,000 | 0 | Lasso | 0.1354 ± 0.0010 | 0.8009 ± 0.0159 | 0 | 0.00% | 0.466 |
| | | Elastic Net | 0.1355 ± 0.0011 | 0.8010 ± 0.0127 | 0 | 0.00% | 0.7631 |
| | | Ridge Regression | 0.1358 ± 0.0017 | 0.8044 ± 0.0159 | 0 | 0.00% | 0.7302 |
| | | Canal-Adaptive Elastic Net | 0.1358 ± 0.0011 | 0.8054 ± 0.0138 | 1353 ± 50 | 13.54% | 0.6464 |
| | 0.1 | Lasso | 0.1669 ± 0.0110 | 1.2193 ± 0.1625 | 0 | 0.00% | 0.4322 |
| | | Elastic Net | 0.1581 ± 0.0106 | 1.0950 ± 0.1466 | 0 | 0.00% | 0.819 |
| | | Ridge Regression | 0.1449 ± 0.0033 | 0.9172 ± 0.0476 | 0 | 0.00% | 0.4735 |
| | | Canal-Adaptive Elastic Net | 0.1434 ± 0.0035 | 0.8981 ± 0.0466 | 1889 ± 44 | 18.90% | 0.6209 |
| | 0.2 | Lasso | 0.1758 ± 0.0048 | 1.3500 ± 0.0746 | 0 | 0.00% | 0.4313 |
| | | Elastic Net | 0.1737 ± 0.0075 | 1.3187 ± 0.1231 | 0 | 0.00% | 0.778 |
| | | Ridge Regression | 0.1560 ± 0.0087 | 1.0660 ± 0.1086 | 0 | 0.00% | 0.4408 |
| | | Canal-Adaptive Elastic Net | 0.1517 ± 0.0069 | 1.0067 ± 0.0921 | 2300 ± 34 | 23.00% | 0.587 |
| | 0.3 | Lasso | 0.1825 ± 0.0060 | 1.4521 ± 0.0948 | 0 | 0.00% | 0.5801 |
| | | Elastic Net | 0.1785 ± 0.0053 | 1.3895 ± 0.0854 | 0 | 0.00% | 0.7243 |
| | | Ridge Regression | 0.1656 ± 0.0064 | 1.1980 ± 0.0919 | 0 | 0.00% | 0.4343 |
| | | Canal-Adaptive Elastic Net | 0.1603 ± 0.0045 | 1.1198 ± 0.0599 | 2661 ± 51 | 26.61% | 0.5867 |
| n | Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|---|
| 5000 | 0 | Lasso | 0.1618 ± 0.0018 | 0.8085 ± 0.0170 | 0 | 0.00% | 0.1787 |
| | | Elastic Net | 0.1627 ± 0.0014 | 0.8165 ± 0.0153 | 0 | 0.00% | 0.3423 |
| | | Ridge Regression | 0.1626 ± 0.0021 | 0.8154 ± 0.0229 | 0 | 0.00% | 0.1951 |
| | | Canal-Adaptive Elastic Net | 0.1621 ± 0.0031 | 0.8122 ± 0.0260 | 694 ± 32 | 13.89% | 0.2663 |
| | 0.1 | Lasso | 0.6225 ± 0.0767 | 12.3388 ± 3.0485 | 0 | 0.00% | 0.1875 |
| | | Elastic Net | 0.4406 ± 0.0607 | 6.7411 ± 1.8332 | 0 | 0.00% | 0.3103 |
| | | Ridge Regression | 0.2215 ± 0.0532 | 1.5738 ± 0.7018 | 0 | 0.00% | 0.1983 |
| | | Canal-Adaptive Elastic Net | 0.1641 ± 0.0031 | 0.8260 ± 0.0350 | 1126 ± 21 | 22.54% | 0.2627 |
| | 0.2 | Lasso | 0.8503 ± 0.0511 | 23.1978 ± 2.9234 | 0 | 0.00% | 0.1789 |
| | | Elastic Net | 0.6047 ± 0.0648 | 13.0583 ± 3.1128 | 0 | 0.00% | 0.3212 |
| | | Ridge Regression | 0.2678 ± 0.0511 | 2.2870 ± 0.8511 | 0 | 0.00% | 0.1956 |
| | | Canal-Adaptive Elastic Net | 0.1643 ± 0.0040 | 0.8354 ± 0.0382 | 1539 ± 25 | 30.79% | 0.2648 |
| | 0.3 | Lasso | 0.9959 ± 0.0761 | 31.8982 ± 5.2155 | 0 | 0.00% | 0.1769 |
| | | Elastic Net | 0.6990 ± 0.0836 | 17.7096 ± 4.2660 | 0 | 0.00% | 0.2947 |
| | | Ridge Regression | 0.2583 ± 0.0447 | 2.1066 ± 0.7001 | 0 | 0.00% | 0.2001 |
| | | Canal-Adaptive Elastic Net | 0.1668 ± 0.0030 | 0.8548 ± 0.0363 | 1973 ± 29 | 39.47% | 0.2608 |
| 10,000 | 0 | Lasso | 0.1354 ± 0.0010 | 0.8009 ± 0.0159 | 0 | 0.00% | 0.466 |
| | | Elastic Net | 0.1355 ± 0.0011 | 0.8010 ± 0.0127 | 0 | 0.00% | 0.7631 |
| | | Ridge Regression | 0.1358 ± 0.0017 | 0.8044 ± 0.0159 | 0 | 0.00% | 0.7302 |
| | | Canal-Adaptive Elastic Net | 0.1358 ± 0.0011 | 0.8054 ± 0.0138 | 1353 ± 50 | 13.54% | 0.6464 |
| | 0.1 | Lasso | 0.5265 ± 0.0631 | 12.4175 ± 2.8731 | 0 | 0.00% | 0.4823 |
| | | Elastic Net | 0.3636 ± 0.0573 | 6.3646 ± 2.0023 | 0 | 0.00% | 0.7922 |
| | | Ridge Regression | 0.1549 ± 0.0172 | 1.0593 ± 0.2253 | 0 | 0.00% | 0.5049 |
| | | Canal-Adaptive Elastic Net | 0.1360 ± 0.0019 | 0.8052 ± 0.0173 | 2274 ± 29 | 22.74% | 0.641 |
| | 0.2 | Lasso | 0.6783 ± 0.0658 | 20.8429 ± 4.3086 | 0 | 0.00% | 0.4626 |
| | | Elastic Net | 0.4885 ± 0.0612 | 11.9550 ± 3.0504 | 0 | 0.00% | 0.7855 |
| | | Ridge Regression | 0.1638 ± 0.0224 | 1.1977 ± 0.3214 | 0 | 0.00% | 0.4963 |
| | | Canal-Adaptive Elastic Net | 0.1370 ± 0.0017 | 0.8204 ± 0.0242 | 3093 ± 29 | 30.94% | 0.6472 |
| | 0.3 | Lasso | 0.8442 ± 0.0727 | 32.6705 ± 5.6883 | 0 | 0.00% | 0.4569 |
| | | Elastic Net | 0.6067 ± 0.0485 | 18.8505 ± 2.9166 | 0 | 0.00% | 0.7683 |
| | | Ridge Regression | 0.1840 ± 0.0378 | 1.5151 ± 0.5767 | 0 | 0.00% | 0.4911 |
| | | Canal-Adaptive Elastic Net | 0.1402 ± 0.0025 | 0.8608 ± 0.0298 | 3967 ± 23 | 39.67% | 0.6422 |
| n | Method | Noise Level 0 | Noise Level 0.1 | Noise Level 0.2 | Noise Level 0.3 |
|---|---|---|---|---|---|
| 5000 | Lasso | 6 | 6 | 7 | 8 |
| | Elastic Net | 6 | 8 | 11 | 12 |
| | Canal-Adaptive Elastic Net | 6 | 7 | 9 | 10 |
| 10,000 | Lasso | 6 | 6 | 6 | 7 |
| | Elastic Net | 7 | 8 | 9 | 11 |
| | Canal-Adaptive Elastic Net | 7 | 7 | 8 | 9 |
| n | Method | Noise Level 0 | Noise Level 0.1 | Noise Level 0.2 | Noise Level 0.3 |
|---|---|---|---|---|---|
| 5000 | Lasso | 6 | 8 | 9 | 10 |
| | Elastic Net | 8 | 10 | 12 | 14 |
| | Canal-Adaptive Elastic Net | 7 | 10 | 11 | 11 |
| 10,000 | Lasso | 6 | 8 | 9 | 9 |
| | Elastic Net | 7 | 10 | 10 | 14 |
| | Canal-Adaptive Elastic Net | 7 | 9 | 10 | 11 |
| Dataset | #Samples | #Features | #Train Samples | #Test Samples |
|---|---|---|---|---|
| Kin | 3000 × 3 | 8 | 2100 × 3 | 900 × 3 |
| Abalone | 4177 × 3 | 7 | 2924 × 3 | 1253 × 3 |
| Letters | 5000 × 3 | 15 | 3500 × 3 | 1500 × 3 |
| Pendigits | 7129 × 3 | 14 | 4990 × 3 | 2139 × 3 |
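The "× 3" entries indicate three independent repetitions, each with an (approximately) 70%/30% train/test split (2100/3000, 2924/4177, 3500/5000, 4990/7129). The following is a minimal sketch of that protocol under our reading of the table:

```python
import numpy as np

def repeated_splits(X, y, n_rep=3, train_frac=0.7, seed=0):
    """Three independent ~70/30 train/test splits, matching the
    'x 3' convention in the benchmark table (our interpretation)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    for _ in range(n_rep):
        idx = rng.permutation(n)        # fresh random split per repetition
        tr, te = idx[:n_train], idx[n_train:]
        yield X[tr], y[tr], X[te], y[te]
```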
| Dataset | Kin | Abalone | Letters | Pendigits |
|---|---|---|---|---|
| Canal loss parameter (insensitive-zone width; Equation (5)) | 0.1 | 0.1 | 0.1 | 0.1 |
| Canal loss parameter (truncation threshold; Equation (5)) | 1.9 | 2.0 | 1.5 | 1.9 |
| Dataset | Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|---|
| Kin | 0 | Lasso | 0.0683 ± 0.0008 | 0.1951 ± 0.0052 | 0 | 0.00% | 0.2258 |
| | | Elastic Net | 0.0684 ± 0.0007 | 0.1946 ± 0.0045 | 0 | 0.00% | 0.2821 |
| | | Ridge Regression | 0.0696 ± 0.0019 | 0.2016 ± 0.0111 | 0 | 0.00% | 0.2659 |
| | | Canal-Adaptive Elastic Net | 0.0681 ± 0.0011 | 0.1677 ± 0.0051 | 1982 ± 126 | 22.02% | 0.3297 |
| | 0.1 | Lasso | 0.1074 ± 0.0265 | 0.4981 ± 0.2223 | 0 | 0.00% | 0.2222 |
| | | Elastic Net | 0.0911 ± 0.0221 | 0.3552 ± 0.1579 | 0 | 0.00% | 0.2891 |
| | | Ridge Regression | 0.0683 ± 0.0015 | 0.1951 ± 0.0081 | 0 | 0.00% | 0.2631 |
| | | Canal-Adaptive Elastic Net | 0.0662 ± 0.0014 | 0.1914 ± 0.0074 | 2466 ± 153 | 27.40% | 0.3321 |
| | 0.2 | Lasso | 0.1365 ± 0.0230 | 0.8678 ± 0.3194 | 0 | 0.00% | 0.2184 |
| | | Elastic Net | 0.1036 ± 0.0156 | 0.4722 ± 0.1389 | 0 | 0.00% | 0.2877 |
| | | Ridge Regression | 0.0692 ± 0.0023 | 0.2502 ± 0.0124 | 0 | 0.00% | 0.2675 |
| | | Canal-Adaptive Elastic Net | 0.0673 ± 0.0026 | 0.1972 ± 0.0135 | 2620 ± 110 | 29.11% | 0.3300 |
| | 0.3 | Lasso | 0.1695 ± 0.0170 | 1.3323 ± 0.2765 | 0 | 0.00% | 0.2232 |
| | | Elastic Net | 0.1242 ± 0.0130 | 0.6804 ± 0.1548 | 0 | 0.00% | 0.2809 |
| | | Ridge Regression | 0.0746 ± 0.0029 | 0.2322 ± 0.0140 | 0 | 0.00% | 0.2622 |
| | | Canal-Adaptive Elastic Net | 0.0693 ± 0.0050 | 0.2000 ± 0.0301 | 2931 ± 420 | 32.57% | 0.3317 |
| Abalone | 0 | Lasso | 0.1898 ± 0.0032 | 1.6091 ± 0.0466 | 0 | 0.00% | 0.3906 |
| | | Elastic Net | 0.1932 ± 0.0021 | 1.6553 ± 0.0492 | 0 | 0.00% | 0.4951 |
| | | Ridge Regression | 0.2015 ± 0.0053 | 1.6560 ± 0.0513 | 0 | 0.00% | 0.4873 |
| | | Canal-Adaptive Elastic Net | 0.1939 ± 0.0024 | 1.7259 ± 0.0415 | 758 ± 131 | 6.00% | 0.5314 |
| | 0.1 | Lasso | 0.4633 ± 0.06548 | 11.1963 ± 2.6828 | 0 | 0.00% | 0.3707 |
| | | Elastic Net | 0.3807 ± 0.0552 | 8.4261 ± 2.4375 | 0 | 0.00% | 0.4934 |
| | | Ridge Regression | 0.2201 ± 0.0034 | 2.0897 ± 0.0523 | 0 | 0.00% | 0.4642 |
| | | Canal-Adaptive Elastic Net | 0.2010 ± 0.0024 | 1.8987 ± 0.0867 | 1305 ± 98 | 10.40% | 0.5315 |
| | 0.2 | Lasso | 0.6084 ± 0.0468 | 19.1165 ± 2.9824 | 0 | 0.00% | 0.3763 |
| | | Elastic Net | 0.5164 ± 0.0808 | 15.3841 ± 4.4489 | 0 | 0.00% | 0.4929 |
| | | Ridge Regression | 0.2672 ± 0.0041 | 3.5966 ± 0.0945 | 0 | 0.00% | 0.4571 |
| | | Canal-Adaptive Elastic Net | 0.2248 ± 0.0037 | 2.3715 ± 0.0963 | 6161 ± 562 | 28.81% | 0.5253 |
| | 0.3 | Lasso | 0.7372 ± 0.0543 | 28.0564 ± 4.3451 | 0 | 0.00% | 0.3821 |
| | | Elastic Net | 0.6355 ± 0.0751 | 23.1508 ± 6.1709 | 0 | 0.00% | 0.4924 |
| | | Ridge Regression | 0.3372 ± 0.0070 | 6.1546 ± 0.1947 | 0 | 0.00% | 0.4484 |
| | | Canal-Adaptive Elastic Net | 0.2646 ± 0.0056 | 3.1383 ± 0.1093 | 4351 ± 860 | 34.70% | 0.5209 |
| Dataset | Noise Level | Method | RMSE | MAE | Discarded Samples | Discarded Rate | Time (s) |
|---|---|---|---|---|---|---|---|
| Letters | 0 | Lasso | 0.3463 ± 0.0028 | 6.5160 ± 0.1389 | 0 | 0.00% | 0.5545 |
| | | Elastic Net | 0.3478 ± 0.0035 | 6.3841 ± 0.0890 | 0 | 0.00% | 0.7206 |
| | | Ridge Regression | 0.3503 ± 0.0024 | 6.3821 ± 0.1006 | 0 | 0.00% | 0.6559 |
| | | Canal-Adaptive Elastic Net | 0.3507 ± 0.0030 | 6.3834 ± 0.0802 | 2558 ± 211 | 17.05% | 0.7862 |
| | 0.1 | Lasso | 0.4708 ± 0.0556 | 12.1065 ± 2.8283 | 0 | 0.00% | 0.5841 |
| | | Elastic Net | 0.3905 ± 0.0435 | 8.2491 ± 1.7055 | 0 | 0.00% | 0.7142 |
| | | Ridge Regression | 0.3219 ± 0.0069 | 5.6313 ± 0.2547 | 0 | 0.00% | 0.6690 |
| | | Canal-Adaptive Elastic Net | 0.3162 ± 0.0023 | 5.4247 ± 0.0973 | 3385 ± 162 | 22.57% | 0.7994 |
| | 0.2 | Lasso | 0.5841 ± 0.0665 | 19.5762 ± 4.6320 | 0 | 0.00% | 0.5656 |
| | | Elastic Net | 0.4731 ± 0.0614 | 12.7691 ± 3.1076 | 0 | 0.00% | 0.7180 |
| | | Ridge Regression | 0.3345 ± 0.0097 | 6.0297 ± 0.2622 | 0 | 0.00% | 0.6621 |
| | | Canal-Adaptive Elastic Net | 0.3296 ± 0.0029 | 5.8493 ± 0.0884 | 4245 ± 143 | 28.30% | 0.8144 |
| | 0.3 | Lasso | 0.7574 ± 0.0666 | 33.8068 ± 6.3160 | 0 | 0.00% | 0.5730 |
| | | Elastic Net | 0.5822 ± 0.0780 | 20.2497 ± 5.1140 | 0 | 0.00% | 0.7283 |
| | | Ridge Regression | 0.3658 ± 0.0151 | 7.1164 ± 0.5715 | 0 | 0.00% | 0.6688 |
| | | Canal-Adaptive Elastic Net | 0.3453 ± 0.0052 | 6.3131 ± 0.1868 | 4950 ± 116 | 33.00% | 0.7812 |
| Pendigits | 0 | Lasso | 0.1806 ± 0.0014 | 2.0619 ± 0.0623 | 0 | 0.00% | 1.0078 |
| | | Elastic Net | 0.1823 ± 0.0017 | 1.9752 ± 0.0340 | 0 | 0.00% | 1.3111 |
| | | Ridge Regression | 0.1839 ± 0.0023 | 1.9378 ± 0.0369 | 0 | 0.00% | 1.2064 |
| | | Canal-Adaptive Elastic Net | 0.1843 ± 0.0014 | 1.9436 ± 0.0239 | 4271 ± 445 | 19.97% | 1.4879 |
| | 0.1 | Lasso | 0.2569 ± 0.0356 | 4.3298 ± 0.9613 | 0 | 0.00% | 0.976 |
| | | Elastic Net | 0.2041 ± 0.0219 | 2.6950 ± 0.5324 | 0 | 0.00% | 1.2822 |
| | | Ridge Regression | 0.1890 ± 0.0021 | 2.0031 ± 0.0424 | 0 | 0.00% | 1.2495 |
| | | Canal-Adaptive Elastic Net | 0.1809 ± 0.0018 | 1.8885 ± 0.0431 | 5143 ± 356 | 24.05% | 1.4029 |
| | 0.2 | Lasso | 0.3089 ± 0.0425 | 6.2770 ± 1.3394 | 0 | 0.00% | 0.9855 |
| | | Elastic Net | 0.2477 ± 0.0366 | 3.9761 ± 1.0450 | 0 | 0.00% | 1.2542 |
| | | Ridge Regression | 0.2085 ± 0.0018 | 2.8034 ± 0.0480 | 0 | 0.00% | 1.2359 |
| | | Canal-Adaptive Elastic Net | 0.1813 ± 0.0014 | 1.9111 ± 0.0327 | 6161 ± 562 | 28.81% | 1.3915 |
| | 0.3 | Lasso | 0.3736 ± 0.0358 | 9.5432 ± 2.4227 | 0 | 0.00% | 0.9999 |
| | | Elastic Net | 0.2625 ± 0.0244 | 4.4683 ± 0.7597 | 0 | 0.00% | 1.2752 |
| | | Ridge Regression | 0.2320 ± 0.0024 | 3.5514 ± 0.0530 | 0 | 0.00% | 1.2178 |
| | | Canal-Adaptive Elastic Net | 0.1825 ± 0.0023 | 1.9490 ± 0.0373 | 7803 ± 429 | 36.48% | 1.4096 |
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).