The One Standard Error Rule for Model Selection: Does It Work?
Figures
- Figures 1 and 2. Probability that the model with λ_min outperforms the model with λ_1se for estimation.
- Figures 3–5. Difference in proportions of selecting the true model: P_c(λ_1se) − P_c(λ_min) (positive means the 1se rule is better).
- Figure A1. Histogram of Table A1 and Table A2.
- Figure A2. Histogram of Table A3 and Table A4.
- Figure A3. Boston Housing Price: ∣S ▽ Ŝ∣; S = var_min; (a) Ŝ = var_min; (b) Ŝ = var_1se.
- Figure A4. Boston Housing Price: ∣S ▽ Ŝ∣; S = var_1se; (a) Ŝ = var_min; (b) Ŝ = var_1se.
- Figure A5. Bardet–Biedl data: ∣S ▽ Ŝ∣; S = var_min; (a) Ŝ = var_min; (b) Ŝ = var_1se.
- Figure A6. Bardet–Biedl data: ∣S ▽ Ŝ∣; S = var_1se; (a) Ŝ = var_min; (b) Ŝ = var_1se.
Abstract
1. Background
1.1. Regularization Methods
1.2. Tuning Parameter Selection
1.3. Goal for the One Standard Error Rule
2. Theoretical Result on the One-Standard-Error Rule
2.1. When Is the Standard Error Formula Valid?
2.2. An Illustrative Example
3. Numerical Research Objectives and Setup
- (1) Does the 1se itself provide a good estimate of the standard deviation of the cross validation error, as intended?
- (2) Does the model selected by the 1se rule (the model with λ_1se) typically outperform the model selected by minimizing the CV error (the model with λ_min) in variable selection?
- (3) What if estimating the regression function or prediction is the goal? (A sketch of how λ_min and λ_1se are computed follows this list.)
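Throughout, λ_min denotes the penalty value minimizing the K-fold CV error and λ_1se the largest penalty whose CV error is within one standard error of that minimum. As a concrete illustration, here is a minimal Python/scikit-learn sketch that extracts both values from a lasso CV path; the helper name and the synthetic data are assumptions for illustration, not the code used in the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def lasso_lambda_min_1se(X, y, K=10):
    """Return (lambda_min, lambda_1se) from K-fold cross validation for the lasso.

    lambda_min minimizes the mean CV error; lambda_1se is the largest penalty
    whose CV error is within one standard error of that minimum.
    """
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    mean_cv = cv.mse_path_.mean(axis=1)                    # CV error at each penalty
    se_cv = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)  # its standard error
    i_min = mean_cv.argmin()
    lam_min = cv.alphas_[i_min]
    lam_1se = cv.alphas_[mean_cv <= mean_cv[i_min] + se_cv[i_min]].max()
    return lam_min, lam_1se

# toy usage on synthetic data: 5 active predictors out of 30
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 30))
y = X[:, :5].sum(axis=1) + rng.standard_normal(100)
print(lasso_lambda_min_1se(X, y))
```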
3.1. Simulation Settings
3.2. Regression Estimation
3.3. Variable Selection
4. Simulation Results
4.1. Is 1se on Target of Estimating the Standard Deviation?
4.1.1. Data Generating Processes (DGP)
- DGP 1: Y is generated by 5 predictors (out of 30).
- DGP 2: Y is generated by 20 predictors (out of 30).
- DGP 3: Y is generated by 20 predictors (out of 1000).
4.1.2. Procedure
- (1) Run the j-th simulation: simulate a data set of sample size n given the parameter values.
- (2) K-fold cross validation: randomly split the data set into K equal-sized parts and record the cross validation error CV_k on each part, k = 1, ..., K.
- (3) Calculate the total cross validation error CV(λ) by taking the mean of these K numbers. Find the λ that minimizes the total CV error and fix λ = λ_min for the remaining steps. Calculate the standard error of the cross validation error as used by the one standard error rule: se = sd(CV_1, ..., CV_K)/√K.
- (4) Repeat steps 1 to 3 N times. Calculate the standard deviation, over the N replications, of the cross validation error CV(λ_min) (the simulated true standard deviation) and the mean of the standard errors from step 3 (the claimed standard error used by the 1se rule).
- (5) Performance assessment: calculate the ratio of the (claimed) standard error over the (simulated true) standard deviation; a ratio close to 1 means the 1se formula is on target. A code sketch of these steps follows.
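A minimal Python sketch of steps (1)–(5), loosely patterned after DGP 1; the sample size, coefficient values, error level, and number of replications are illustrative assumptions, and scikit-learn's LassoCV stands in for the implementation actually used in the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p, s, sigma = 100, 30, 5, 1.0          # illustrative values, not the paper's settings
beta = np.r_[np.ones(s), np.zeros(p - s)]
N, K = 200, 10

cv_err, claimed_se = [], []
for _ in range(N):                                   # steps (1)-(3), repeated N times
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    fit = LassoCV(cv=K, max_iter=10000).fit(X, y)
    i_min = fit.mse_path_.mean(axis=1).argmin()      # index of lambda_min
    fold_err = fit.mse_path_[i_min]                  # the K fold-wise CV errors at lambda_min
    cv_err.append(fold_err.mean())                   # total CV error at lambda_min
    claimed_se.append(fold_err.std(ddof=1) / np.sqrt(K))  # SE claimed by the 1se rule

true_sd = np.std(cv_err, ddof=1)                     # step (4): simulated true SD
ratio = np.mean(claimed_se) / true_sd                # step (5): on target if close to 1
print(f"claimed SE / true SD = {ratio:.3f}")
```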
4.2. Is 1se Rule Better for Regression Estimation?
4.2.1. Model Specification
4.2.2. Procedure
- Simulate a fixed validation set of 500 observations of the predictors for estimating the loss.
- Each time, randomly simulate a training set of n observations from the same data generating process.
- Apply the 1se rule (10-fold cross validation) over the training set and record the two selected models: the model with λ_min and the model with λ_1se. Calculate the estimation losses of these two models over the validation observations, which are independently generated from the same distribution used to generate X.
- Repeat this process M times and calculate the fraction of replications in which the model with λ_min has the smaller loss (see the sketch below).
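A sketch of this comparison in Python; the DGP, the number of replications, and the use of scikit-learn's LassoCV/Lasso are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(1)
n, p, s, sigma, M, K = 100, 30, 5, 1.0, 200, 10      # illustrative values
beta = np.r_[np.ones(s), np.zeros(p - s)]
X_val = rng.standard_normal((500, p))                # fixed validation predictors
f_val = X_val @ beta                                 # true regression function there

wins_min = 0
for _ in range(M):
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    mean_cv = cv.mse_path_.mean(axis=1)
    se_cv = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)
    i_min = mean_cv.argmin()
    lam_min = cv.alphas_[i_min]
    lam_1se = cv.alphas_[mean_cv <= mean_cv[i_min] + se_cv[i_min]].max()
    loss = {}
    for name, lam in (("min", lam_min), ("1se", lam_1se)):
        est = Lasso(alpha=lam, max_iter=10000).fit(X, y).predict(X_val)
        loss[name] = np.mean((est - f_val) ** 2)     # estimation loss of the fitted function
    wins_min += loss["min"] < loss["1se"]

print("fraction of runs where the lambda_min model estimates better:", wins_min / M)
```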
4.2.3. Case 1: Large
4.2.4. Case 2: Small
4.3. Is 1se Rule Better for Variable Selection?
4.3.1. Procedure
- Randomly simulate a data set of sample size n given the parameter values.
- Perform variable selection with λ_min and with λ_1se over the simulated data set. If the two returned models are the same, discard the result and go back to step 1; otherwise, check their variable selection results.
- Repeat the above process M times and calculate the fraction of correct variable selection for each choice, denoted by P_c(λ_min) and P_c(λ_1se), respectively. We report the proportion difference P_c(λ_1se) − P_c(λ_min); a positive value means the 1se rule is better (a sketch follows this list).
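A corresponding Python sketch for the variable selection comparison; the DGP settings are again illustrative, and replications with identical selected sets are discarded, as in the procedure.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(2)
n, p, s, sigma, M, K = 100, 30, 5, 1.0, 200, 10      # illustrative values
beta = np.r_[np.ones(s), np.zeros(p - s)]
true_set = set(range(s))

def cv_lambdas(X, y):
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    m = cv.mse_path_.mean(axis=1)
    se = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)
    i = m.argmin()
    return cv.alphas_[i], cv.alphas_[m <= m[i] + se[i]].max()

def support(coef, tol=1e-8):
    return set(np.flatnonzero(np.abs(coef) > tol))

correct = {"min": 0, "1se": 0}
kept = 0
while kept < M:
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    lam_min, lam_1se = cv_lambdas(X, y)
    sel_min = support(Lasso(alpha=lam_min, max_iter=10000).fit(X, y).coef_)
    sel_1se = support(Lasso(alpha=lam_1se, max_iter=10000).fit(X, y).coef_)
    if sel_min == sel_1se:                           # same model: discard and redraw
        continue
    kept += 1
    correct["min"] += sel_min == true_set
    correct["1se"] += sel_1se == true_set

print("P_c(lambda_1se) - P_c(lambda_min) =", (correct["1se"] - correct["min"]) / M)
```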
4.3.2. Case 1: Constant Coefficients
4.3.3. Case 2: Decaying Coefficients
4.3.4. Case 3: Hybrid Case
5. Data Examples
5.1. Regression Estimation
5.1.1. Procedure for Cross Validation
- We randomly select observations from the data set as the training set and use the rest as the validation set (the training sizes differ between the Boston Housing Price and Bardet–Biedl data).
- Apply K-fold cross validation over the training set (the choices of K also differ between the two data sets) and compute the mean square prediction errors of the model with λ_min and the model with λ_1se over the validation set.
- Repeat the above process 500 times and compute the proportion of times each method gives the better prediction (a sketch is given below).
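The same comparison can be sketched in Python; the bundled diabetes data below is only a stand-in for the Boston Housing Price and Bardet–Biedl data, and the training size, K, and number of splits are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)                 # stand-in for the real data sets

def cv_lambdas(X, y, K):
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    m = cv.mse_path_.mean(axis=1)
    se = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)
    i = m.argmin()
    return cv.alphas_[i], cv.alphas_[m <= m[i] + se[i]].max()

R, K, n_train = 100, 10, 300                          # 500 repetitions in the paper
wins_min = 0
for r in range(R):
    X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=n_train, random_state=r)
    lam_min, lam_1se = cv_lambdas(X_tr, y_tr, K)
    mspe = {}
    for name, lam in (("min", lam_min), ("1se", lam_1se)):
        m = Lasso(alpha=lam, max_iter=10000).fit(X_tr, y_tr)
        mspe[name] = np.mean((m.predict(X_va) - y_va) ** 2)   # MSPE on the validation set
    wins_min += mspe["min"] < mspe["1se"]

print("proportion of splits where the lambda_min model predicts better:", wins_min / R)
```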
5.1.2. Procedure for DGS
- Obtain the model with λ_min and the model with λ_1se by K-fold cross validation over the data, along with the corresponding coefficient estimates, residual standard deviation estimates, and estimated responses (fitted values).
- Simulate new responses under two scenarios: Scenario 1 adds noise to the fitted values of the λ_min model, and Scenario 2 adds noise to the fitted values of the λ_1se model, in each case using the corresponding standard deviation estimate.
- Apply Lasso with λ_min and with λ_1se (using the same K-fold CV) on the new data set (i.e., the new response and the original design matrix) and obtain the new estimated responses for each of the two scenarios.
- Calculate the mean square estimation errors of the two refitted models against the scenario's generating fit, for Scenario 1 and for Scenario 2.
- Repeat the above resampling process 500 times and compute the proportion of times each method gives the better estimate (a sketch is given after this list).
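A sketch of the DGS resampling for estimation in Python; the synthetic stand-in data, the number of replications, and the reading that Scenario 1 takes the λ_min fit (and Scenario 2 the λ_1se fit) as the generating regression function are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

def cv_lambdas(X, y, K=10):
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    m = cv.mse_path_.mean(axis=1)
    se = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)
    i = m.argmin()
    return cv.alphas_[i], cv.alphas_[m <= m[i] + se[i]].max()

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 30))                    # stand-in for the real design matrix
y = X[:, :5].sum(axis=1) + rng.standard_normal(200)   # stand-in for the real response

lam = dict(zip(("min", "1se"), cv_lambdas(X, y)))
fit = {k: Lasso(alpha=a, max_iter=10000).fit(X, y) for k, a in lam.items()}
yhat = {k: f.predict(X) for k, f in fit.items()}      # fitted values of each model
sd = {k: np.std(y - yhat[k], ddof=1) for k in fit}    # residual SD estimates

B, wins_min = 50, 0                                   # 500 replications in the paper
for _ in range(B):
    # Scenario 1 (assumed): the lambda_min fit plays the role of the true regression function
    y_new = yhat["min"] + sd["min"] * rng.standard_normal(len(y))
    lmn, l1s = cv_lambdas(X, y_new)
    err = {}
    for name, a in (("min", lmn), ("1se", l1s)):
        refit = Lasso(alpha=a, max_iter=10000).fit(X, y_new).predict(X)
        err[name] = np.mean((refit - yhat["min"]) ** 2)   # estimation error vs the generating fit
    wins_min += err["min"] < err["1se"]

print("Scenario 1: proportion where the lambda_min model estimates better:", wins_min / B)
```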
5.2. Variable Selection: DGS
Procedure
- Apply the 10-fold (default) cross validation over the real data set and select the "true" set of variables S: either var_min, from the model with λ_min, or var_1se, from the model with λ_1se.
- Do least squares estimation by regressing the response on this set of variables and obtain the estimated response (fitted values) and the residual standard error.
- Simulate a new response by adding error terms, randomly generated from a normal distribution with the estimated residual standard error, to the fitted values. Apply K-fold cross validation over the simulated data set (i.e., the new response and the original design matrix) and select the set of variables Ŝ: var_min or var_1se, by the model with λ_min or with λ_1se. Repeat this process 500 times.
- Calculate the symmetric difference ∣S ▽ Ŝ∣ (see the sketch below).
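The symmetric-difference computation can be sketched as follows in Python; the stand-in data and the reduced replication count are assumptions, while the paper applies this to the two real data sets with 500 replications.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, LinearRegression

def cv_lambdas(X, y, K=10):
    cv = LassoCV(cv=K, max_iter=10000).fit(X, y)
    m = cv.mse_path_.mean(axis=1)
    se = cv.mse_path_.std(axis=1, ddof=1) / np.sqrt(K)
    i = m.argmin()
    return cv.alphas_[i], cv.alphas_[m <= m[i] + se[i]].max()

def support(coef, tol=1e-8):
    return set(np.flatnonzero(np.abs(coef) > tol))

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 30))                    # stand-in for the real design matrix
y = X[:, :5].sum(axis=1) + rng.standard_normal(200)   # stand-in for the real response

lam_min, lam_1se = cv_lambdas(X, y)
S = support(Lasso(alpha=lam_min, max_iter=10000).fit(X, y).coef_)   # here S = var_min

cols = sorted(S)
ols = LinearRegression().fit(X[:, cols], y)           # least squares on the selected variables
fitted = ols.predict(X[:, cols])
resid_sd = np.std(y - fitted, ddof=len(cols) + 1)     # residual standard error

B, sym_diff = 50, []                                  # 500 replications in the paper
for _ in range(B):
    y_new = fitted + resid_sd * rng.standard_normal(len(y))
    lmn, l1s = cv_lambdas(X, y_new)
    S_hat = support(Lasso(alpha=lmn, max_iter=10000).fit(X, y_new).coef_)  # S_hat = var_min; use l1s for var_1se
    sym_diff.append(len(S ^ S_hat))

print("mean |S ▽ S_hat| =", np.mean(sym_diff))
```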
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
Appendix A. Additional Numerical Results
|  | DGP1: 2-fold | DGP1: 5-fold | DGP1: 10-fold | DGP1: 20-fold | DGP2: 2-fold | DGP2: 5-fold | DGP2: 10-fold | DGP2: 20-fold | DGP3: 2-fold | DGP3: 5-fold | DGP3: 10-fold | DGP3: 20-fold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lasso | 1.489 | 1.305 | 1.275 | 1.271 | 1.068 | 1.185 | 1.183 | 1.127 | 1.015 | 1.217 | 1.266 | 1.222 |
| Ridge | 1.187 | 1.104 | 0.975 | 0.924 | 1.370 | 1.171 | 1.082 | 1.031 | 0.918 | 1.190 | 1.252 | 1.194 |
| Lasso | 1.759 | 1.376 | 1.145 | 1.088 | 1.247 | 1.181 | 1.148 | 1.134 | 0.886 | 1.094 | 0.906 | 0.728 |
| Ridge | 1.499 | 1.350 | 1.231 | 1.129 | 1.512 | 1.310 | 1.202 | 1.102 | 1.003 | 1.092 | 1.093 | 1.049 |
| Lasso | 1.260 | 1.162 | 1.067 | 1.071 | 1.333 | 1.202 | 1.164 | 1.119 | 1.121 | 0.580 | 0.319 | 0.229 |
| Ridge | 1.540 | 1.332 | 1.217 | 1.108 | 1.553 | 1.360 | 1.252 | 1.142 | 1.080 | 0.969 | 0.968 | 0.949 |
| Lasso | 1.221 | 1.226 | 1.123 | 1.134 | 1.876 | 1.100 | 1.028 | 0.997 | 1.519 | 1.166 | 1.101 | 1.111 |
| Ridge | 1.725 | 1.011 | 0.880 | 0.853 | 1.490 | 0.882 | 0.780 | 0.777 | 1.385 | 1.299 | 1.254 | 1.251 |
| Lasso | 1.960 | 1.233 | 1.068 | 1.031 | 1.725 | 1.219 | 1.073 | 0.959 | 1.853 | 0.889 | 0.711 | 0.637 |
| Ridge | 2.173 | 1.243 | 1.060 | 1.029 | 2.069 | 1.188 | 1.010 | 0.978 | 1.346 | 1.224 | 1.183 | 1.177 |
| Lasso | 1.650 | 1.031 | 0.955 | 0.873 | 2.122 | 1.184 | 1.003 | 0.880 | 1.868 | 0.831 | 0.668 | 0.598 |
| Ridge | 2.139 | 1.218 | 1.032 | 0.996 | 2.154 | 1.244 | 1.057 | 1.020 | 1.180 | 1.104 | 1.085 | 1.079 |
|  | DGP1: 2-fold | DGP1: 5-fold | DGP1: 10-fold | DGP1: 20-fold | DGP2: 2-fold | DGP2: 5-fold | DGP2: 10-fold | DGP2: 20-fold | DGP3: 2-fold | DGP3: 5-fold | DGP3: 10-fold | DGP3: 20-fold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lasso | 1.275 | 1.225 | 1.130 | 1.077 | 1.029 | 1.087 | 1.110 | 1.128 | 1.221 | 0.952 | 0.581 | 0.379 |
| Ridge | 0.973 | 0.987 | 0.918 | 0.909 | 1.035 | 1.012 | 0.964 | 0.905 | 1.152 | 1.130 | 1.143 | 1.108 |
| Lasso | 1.532 | 1.197 | 1.055 | 0.999 | 1.121 | 1.054 | 1.132 | 1.030 | 1.243 | 0.978 | 0.651 | 0.459 |
| Ridge | 1.039 | 0.989 | 0.936 | 0.924 | 1.083 | 1.043 | 0.977 | 0.946 | 1.049 | 0.986 | 1.004 | 0.967 |
| Lasso | 1.569 | 1.241 | 1.074 | 1.026 | 1.109 | 1.125 | 1.061 | 1.041 | 1.236 | 1.033 | 0.767 | 0.584 |
| Ridge | 1.027 | 0.990 | 0.938 | 0.933 | 1.050 | 1.030 | 0.963 | 0.946 | 1.006 | 0.951 | 0.963 | 0.930 |
| Lasso | 1.126 | 0.914 | 0.748 | 0.668 | 1.499 | 1.046 | 0.864 | 0.820 | 1.855 | 0.961 | 0.789 | 0.741 |
| Ridge | 1.509 | 1.108 | 0.964 | 0.909 | 1.677 | 1.007 | 0.900 | 0.854 | 1.203 | 1.127 | 1.100 | 1.091 |
| Lasso | 1.100 | 0.749 | 0.641 | 0.597 | 1.624 | 1.181 | 0.984 | 0.888 | 2.065 | 1.026 | 0.823 | 0.747 |
| Ridge | 1.264 | 1.037 | 0.975 | 0.959 | 1.327 | 0.994 | 0.944 | 0.926 | 1.003 | 0.962 | 0.962 | 0.964 |
| Lasso | 1.864 | 1.193 | 0.964 | 0.879 | 1.866 | 0.965 | 0.797 | 0.710 | 2.102 | 1.042 | 0.836 | 0.756 |
| Ridge | 1.145 | 1.004 | 0.963 | 0.960 | 1.131 | 0.975 | 0.944 | 0.938 | 0.955 | 0.935 | 0.937 | 0.941 |
|  | DGP1: 2-fold | DGP1: 5-fold | DGP1: 10-fold | DGP1: 20-fold | DGP2: 2-fold | DGP2: 5-fold | DGP2: 10-fold | DGP2: 20-fold | DGP3: 2-fold | DGP3: 5-fold | DGP3: 10-fold | DGP3: 20-fold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lasso | 1.452 | 1.346 | 1.322 | 1.307 | 0.958 | 1.132 | 1.140 | 1.103 | 0.953 | 1.155 | 1.214 | 1.161 |
| Ridge | 1.211 | 1.279 | 1.068 | 1.047 | 1.573 | 1.391 | 1.246 | 1.184 | 1.046 | 1.370 | 1.399 | 1.356 |
| Lasso | 1.595 | 1.284 | 1.170 | 1.134 | 0.955 | 1.186 | 1.195 | 1.136 | 0.947 | 1.178 | 1.231 | 1.203 |
| Ridge | 1.307 | 1.280 | 1.144 | 1.074 | 1.783 | 1.428 | 1.384 | 1.290 | 1.068 | 1.397 | 1.440 | 1.380 |
| Lasso | 1.529 | 1.415 | 1.176 | 1.070 | 0.998 | 1.280 | 1.260 | 1.254 | 1.039 | 1.312 | 1.290 | 1.270 |
| Ridge | 1.432 | 1.340 | 1.221 | 1.130 | 1.566 | 1.312 | 1.208 | 1.111 | 1.092 | 1.422 | 1.471 | 1.389 |
| Lasso | 1.193 | 1.187 | 1.110 | 1.119 | 1.084 | 0.804 | 0.773 | 0.744 | 1.492 | 1.128 | 1.122 | 1.079 |
| Ridge | 2.087 | 1.123 | 0.970 | 0.912 | 1.742 | 1.448 | 1.423 | 1.301 | 1.726 | 1.596 | 1.509 | 1.482 |
| Lasso | 1.995 | 1.469 | 1.330 | 1.292 | 1.389 | 1.035 | 0.950 | 0.948 | 1.573 | 1.225 | 1.154 | 1.159 |
| Ridge | 2.017 | 1.196 | 1.029 | 1.011 | 2.315 | 1.448 | 1.200 | 1.088 | 1.744 | 1.599 | 1.505 | 1.486 |
| Lasso | 2.200 | 1.023 | 0.883 | 0.806 | 2.128 | 1.375 | 1.162 | 1.127 | 2.144 | 1.385 | 1.235 | 1.204 |
| Ridge | 2.163 | 1.260 | 1.078 | 1.054 | 1.991 | 1.261 | 1.050 | 1.008 | 1.783 | 1.612 | 1.500 | 1.490 |
|  | DGP1: 2-fold | DGP1: 5-fold | DGP1: 10-fold | DGP1: 20-fold | DGP2: 2-fold | DGP2: 5-fold | DGP2: 10-fold | DGP2: 20-fold | DGP3: 2-fold | DGP3: 5-fold | DGP3: 10-fold | DGP3: 20-fold |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Lasso | 1.180 | 1.248 | 1.178 | 1.158 | 0.921 | 1.082 | 1.160 | 1.109 | 1.197 | 1.172 | 0.599 | 0.386 |
| Ridge | 0.918 | 1.149 | 1.099 | 1.104 | 1.121 | 1.089 | 1.009 | 0.851 | 1.103 | 1.112 | 1.096 | 1.050 |
| Lasso | 1.311 | 1.241 | 1.133 | 1.075 | 0.930 | 1.110 | 1.195 | 1.129 | 1.371 | 1.166 | 0.594 | 0.380 |
| Ridge | 0.953 | 0.980 | 0.898 | 0.886 | 1.116 | 0.997 | 0.858 | 0.740 | 1.122 | 1.116 | 1.096 | 1.054 |
| Lasso | 1.366 | 1.202 | 1.087 | 1.057 | 0.966 | 1.104 | 1.151 | 1.123 | 1.387 | 1.155 | 0.597 | 0.382 |
| Ridge | 1.018 | 0.985 | 0.917 | 0.906 | 1.135 | 1.016 | 0.896 | 0.799 | 1.159 | 1.107 | 1.092 | 1.052 |
| Lasso | 0.939 | 0.978 | 0.842 | 0.763 | 1.149 | 1.030 | 1.093 | 1.003 | 1.068 | 1.050 | 1.057 | 1.067 |
| Ridge | 1.475 | 0.969 | 0.768 | 0.719 | 1.101 | 1.045 | 0.942 | 0.896 | 1.543 | 1.446 | 1.412 | 1.396 |
| Lasso | 1.311 | 1.003 | 0.833 | 0.788 | 0.991 | 1.001 | 1.065 | 0.954 | 1.181 | 1.027 | 1.063 | 1.017 |
| Ridge | 1.511 | 1.086 | 0.924 | 0.874 | 1.362 | 0.572 | 0.460 | 0.433 | 1.370 | 1.185 | 1.154 | 1.142 |
| Lasso | 0.996 | 0.824 | 0.650 | 0.631 | 1.317 | 0.971 | 0.909 | 0.798 | 1.293 | 1.000 | 1.014 | 0.957 |
| Ridge | 1.405 | 1.099 | 0.974 | 0.934 | 1.552 | 0.793 | 0.679 | 0.652 | 1.303 | 1.157 | 1.136 | 1.131 |
| Low-Dimension |  |  |  | High-Dimension |  |  |  |
|---|---|---|---|---|---|---|---|
| Decay |  |  |  | Decay |  |  |  |
| 0.52 | 0.62 | 1.00 | 0.52 | 0.46 | 0.64 | 0.84 | 0.54 |
| 0.42 | 0.52 | 0.52 | 0.54 | 0.58 | 0.50 | 0.50 | 0.52 |
| 0.54 | 0.83 | 0.92 | 0.52 | 0.58 | 0.70 | 0.90 | 0.58 |
| 0.44 | 0.56 | 0.54 | 0.58 | 0.38 | 0.60 | 0.60 | 0.60 |
| 0.56 | 0.88 | 1.00 | 0.60 | 0.54 | 0.84 | 0.96 | 0.54 |
| 0.54 | 0.56 | 0.56 | 0.54 | 0.57 | 0.52 | 0.52 | 0.46 |
| 0.58 | 0.57 | NaN | 0.58 | 0.64 | 0.71 | NaN | 0.74 |
| 0.45 | 0.58 | 0.56 | 0.52 | 0.57 | 0.66 | 0.66 | 0.72 |
| 0.50 | NaN | NaN | 0.54 | 0.68 | NaN | NaN | 0.68 |
| 0.52 | 0.50 | 0.52 | 0.54 | 0.64 | 0.68 | 0.68 | 0.70 |
| 0.56 | NaN | NaN | 0.60 | 0.66 | NaN | NaN | 0.70 |
| 0.56 | 0.54 | 0.58 | 0.54 | 0.68 | 0.66 | 0.64 | 0.66 |
| Low-Dimension |  |  |  | High-Dimension |  |  |  |
|---|---|---|---|---|---|---|---|
| Decay |  |  |  | Decay |  |  |  |
| 0.56 | 0.88 | 0.89 | 0.53 | 0.54 | 0.84 | 1.00 | 0.64 |
| 0.53 | 0.56 | 0.54 | 0.52 | 0.61 | 0.52 | 0.56 | 0.60 |
| 0.60 | 1.00 | 1.00 | 0.70 | 0.56 | 0.82 | 0.90 | 0.70 |
| 0.58 | 0.54 | 0.62 | 0.62 | 0.62 | 0.58 | 0.64 | 0.64 |
| 0.50 | NaN | NaN | 0.50 | 0.74 | NaN | NaN | 0.78 |
| 0.50 | 0.42 | 0.50 | 0.44 | 0.72 | 0.74 | 0.76 | 0.78 |
| 0.48 | NaN | NaN | 0.54 | 0.74 | NaN | NaN | 0.78 |
| 0.52 | 0.50 | 0.52 | 0.46 | 0.72 | 0.74 | 0.76 | 0.76 |
| Low-Dimension |  |  |  | High-Dimension |  |  |  |
|---|---|---|---|---|---|---|---|
| Decay |  |  |  | Decay |  |  |  |
| 0.13 | 0.13 | 0 | 0.06 | 0 | 0 | 0.02 | NaN |
| 0 | 0.12 | 0.11 | 0.01 | 0 | 0 | 0 | 0.90 |
| 0.45 | 0 | 0 | 0 | 0.25 | 0.01 | 0 | NaN |
| 0 | 0.40 | 0.43 | 0.05 | 0 | 0.16 | 0.27 | 0.87 |
| 0.40 | 0 | 0 | 0 | 0.61 | 0 | 0 | NaN |
| 0 | 0.14 | 0.39 | 0.01 | 0 | 0.25 | 0.62 | 0.7 |
| 0.72 | 0.10 | NaN | 0.70 | 0.27 | 0.16 | NaN | NaN |
| 0 | 0.71 | 0.70 | 0 | 0 | 0.25 | 0.25 | 0.52 |
| 0.86 | NaN | NaN | 0.03 | 0.86 | NaN | NaN | NaN |
| 0 | 0.85 | 0.85 | 0 | 0 | 0.85 | 0.85 | 0.95 |
| 0.56 | NaN | NaN | 0 | 0.76 | NaN | NaN | NaN |
| 0 | 0.67 | 0.36 | 0 | 0 | 0.77 | 0.17 | 0.65 |
| Low-Dimension |  |  |  | High-Dimension |  |  |  |
|---|---|---|---|---|---|---|---|
| Decay |  |  |  | Decay |  |  |  |
| 0 | 0 | 0.1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | NaN | NaN | 0.01 | 0 | NaN | NaN | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | NaN | NaN | 0.01 | 0 | NaN | NaN | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Appendix B. Proof of the Main Result
Appendix B.1. A Lemma
Appendix B.2. Proof of Lemma A1
Appendix B.3. Proof of Theorem 1
References
- Nan, Y.; Yang, Y. Variable selection diagnostics measures for high-dimensional regression. J. Comput. Graph. Stat. 2014, 23, 636–656.
- Yu, Y.; Yang, Y.; Yang, Y. Performance assessment of high-dimensional variable identification. arXiv 2017, arXiv:1704.08810.
- Ye, C.; Yang, Y.; Yang, Y. Sparsity oriented importance learning for high-dimensional linear regression. J. Am. Stat. Assoc. 2018, 113, 1797–1812.
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
- Tikhonov, A.N. On the stability of inverse problems. Dokl. Akad. Nauk SSSR 1943, 39, 195–198.
- Hoerl, A.E. Applications of ridge analysis to regression problems. Chem. Eng. Prog. 1962, 58, 54–59.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
- Hurvich, C.M.; Tsai, C.L. Regression and time series model selection in small samples. Biometrika 1989, 76, 297–307.
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
- Chen, J.; Chen, Z. Extended Bayesian information criteria for model selection with large model spaces. Biometrika 2008, 95, 759–771.
- Wang, H.; Li, R.; Tsai, C.L. Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 2007, 94, 553–568.
- Allen, D.M. The relationship between variable selection and data agumentation and a method for prediction. Technometrics 1974, 16, 125–127.
- Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B (Methodol.) 1974, 36, 111–133.
- Geisser, S. The predictive sample reuse method with applications. J. Am. Stat. Assoc. 1975, 70, 320–328.
- Zhang, Y.; Yang, Y. Cross-validation for selecting a model selection procedure. J. Econom. 2015, 187, 95–112.
- Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; Routledge: Oxfordshire, UK, 2017.
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2009.
- Yang, Y. Comparing learning methods for classification. Stat. Sin. 2006, 16, 635–657.
- Harrison, D., Jr.; Rubinfeld, D.L. Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag. 1978, 5, 81–102.
- Scheetz, T.E.; Kim, K.Y.; Swiderski, R.E.; Philp, A.R.; Braun, T.A.; Knudtson, K.L.; Dorrance, A.M.; DiBona, G.F.; Huang, J.; Casavant, T.L.; et al. Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 2006, 103, 14429–14434.
- Meinshausen, N.; Bühlmann, P. Stability selection. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2010, 72, 417–473.
- Lim, C.; Yu, B. Estimation stability with cross-validation (ESCV). J. Comput. Graph. Stat. 2016, 25, 464–492.
- Yang, W.; Yang, Y. Toward an objective and reproducible model choice via variable selection deviation. Biometrics 2017, 73, 20–30.
| Parameter | Values |
|---|---|
| constant coefficients |  |
| decaying coefficients |  |
| n |  |
| AR(1) covariance |  |
| compound symmetry covariance |  |
Proportion of Being Better in Prediction

|  | 5-fold | 10-fold | 20-fold |
|---|---|---|---|
| train set | 0.554 | 0.526 | 0.538 |
| train set | 0.560 | 0.556 | 0.564 |
| train set | 0.552 | 0.564 | 0.558 |
Proportion of Being Better in Prediction

|  | 5-fold | 10-fold |
|---|---|---|
| train set | 0.582 | 0.566 |
| train set | 0.580 | 0.594 |
Proportion of Being Better in Estimation

|  | Boston Housing Price: 5-fold | 10-fold | 20-fold | Bardet–Biedl: 5-fold | 10-fold |
|---|---|---|---|---|---|
| Scenario 1 | 0.478 | 0.486 | 0.468 | 0.528 | 0.523 |
| Scenario 2 | 0.515 | 0.489 | 0.487 | 0.494 | 0.480 |
|  | 5-fold | 10-fold | 20-fold |
|---|---|---|---|
|  | 12.744 | 12.672 | 12.636 |
|  | 10.564 | 10.608 | 10.560 |
|  | 10.792 | 10.824 | 10.772 |
|  | 8.250 | 8.254 | 8.246 |
|  | 5-fold | 10-fold | 20-fold |
|---|---|---|---|
|  | 32.272 | 33.604 | 35.500 |
|  | 21.064 | 21.864 | 22.404 |
|  | 24.208 | 25.426 | 26.350 |
|  | 19.852 | 19.992 | 20.274 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).