Abstract
An efficient optimization algorithm for identifying the best least squares regression model under the constraint of non-negative coefficients is proposed. The algorithm presents a novel solution via the unrestricted least squares and builds on the regression-tree and branch-and-bound techniques for computing the best subset regression. The aim is to fill a gap in computationally tractable solutions to the non-negative least squares problem and to model selection. The proposed method is illustrated with a real dataset. Experimental results on real and artificial random datasets confirm the computational efficacy of the new strategy and demonstrate its ability to solve large model selection problems that are subject to non-negativity constraints.
References
Armstrong, R.D., Frome, E.L.: A branch-and-bound solution of a restricted least squares problem. Technometrics 18, 447–450 (1976)
Burks, A.W., Warren, D.W., Wright, J.B.: An analysis of a logical machine using parenthesis-free notation. Math. Tables Other Aids Comput. 8(46), 53–57 (1954)
Cutler, A.: A branch and bound algorithm for constrained least squares. Commun. Stat., Simul. Comput. 2, 305–321 (1993)
Furnival, G.M., Wilson, R.W.: Regression by leaps and bounds. Technometrics 16(4), 499–511 (1974). Republished in Technometrics 42, 69–79 (2000)
Gatu, C., Kontoghiorghes, E.J.: Parallel algorithms for computing all possible subset regression models using the QR decomposition. Parallel Comput. 29(4), 505–521 (2003)
Gatu, C., Kontoghiorghes, E.J.: Branch-and-bound algorithms for computing the best-subset regression models. J. Comput. Graph. Stat. 15, 139–156 (2006)
Gatu, C., Yanev, P.I., Kontoghiorghes, E.J.: A graph approach to generate all possible regression submodels. Comput. Stat. Data Anal. 52, 799–815 (2007)
Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)
Hofmann, M., Gatu, C., Kontoghiorghes, E.J.: Efficient algorithms for computing the best subset regression models for large-scale problems. Comput. Stat. Data Anal. 52, 16–29 (2007)
Judge, G.G., Takayama, T.: Inequality restrictions in regression analysis. J. Am. Stat. Assoc. 61, 166–181 (1966)
Kontoghiorghes, E.J.: Parallel Algorithms for Linear Models: Numerical Methods and Estimation Problems. Advances in Computational Economics, vol. 15. Kluwer Academic, Boston (2000)
Kozlov, M.K., Tarasov, S.P., Khachiyan, L.G.: The polynomial solvability of convex quadratic programming. U.S.S.R. Comput. Math. Math. Phys. 20(5), 223–228 (1980)
Lawson, C.L., Hanson, R.J.: Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs (1974)
Mantel, N.: Restricted least squares regression and convex quadratic programming. Technometrics 11, 763–773 (1969)
McDonald, G.C., Schwing, R.C.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15, 463–482 (1973)
Miller, A.J.: Subset Selection in Regression. Monographs on Statistics and Applied Probability, vol. 40. Chapman and Hall, London (1990)
Potra, F.A., Wright, S.J.: Interior-point methods. J. Comput. Appl. Math. 124, 281–302 (2000)
Rueda, C., Salvador, B., Fernández, M.A.: Simultaneous estimation in a restricted linear model. J. Multivar. Anal. 61, 61–66 (1997)
Sen, A., Srivastava, M.: Regression Analysis. Theory, Methods and Applications. Springer Texts in Statistics. Springer, New York (1990)
Smith, D.M., Bremner, J.M.: All possible subset regressions using the QR decomposition. Comput. Stat. Data Anal. 7, 217–235 (1989)
Waterman, M.S.: A restricted least squares problem. Technometrics 16, 135–136 (1974)
Waterman, M.S.: Least squares with nonnegative regression coefficients. J. Stat. Comput. Simul. 6, 67–70 (1977)
Wolfe, P.: The simplex method for quadratic programming. Econometrica 27, 382–398 (1959)
Acknowledgements
The authors are grateful to the Editor and referees for their valuable comments and suggestions. This work was supported in part by the University of Cyprus project UCY-31030, the Cyprus University of Technology project E.X. 200079 and the Romanian project PN-II-RU-TE-2011-3-0242.
Appendices
Appendix A: Selection criteria equivalence
Lemma 2
Let V={1,2,…,n} denote a set of variables. Finding the best-subset regression with respect to some selection criterion f∈{RSS,AIC,BIC,…} reduces to finding the n-list of the best subset models corresponding to each model size p=1,…,n.
Proof
Without loss of generality, it is assumed that the selection criterion f is to be minimized. The problem of finding the best submodel with respect to f can be written as
\[ W^{*} = \mathop{\mathrm{arg\,min}}_{W \subseteq V,\ W \neq \emptyset} f(W). \qquad (6) \]
Note that
\[ \{ W : W \subseteq V,\ W \neq \emptyset \} = \bigcup_{p=1}^{n} \{ W : W \subseteq V,\ |W| = p \}. \]
Now, consider that the computed n-list of the best submodels of each size p is given by
\[ W_{p}^{*} = \mathop{\mathrm{arg\,min}}_{W \subseteq V,\ |W| = p} f(W), \qquad p = 1,\dots,n. \]
The problem (6) then becomes
\[ W^{*} = \mathop{\mathrm{arg\,min}}_{W \in \{W_{1}^{*},\dots,W_{n}^{*}\}} f(W). \]
This completes the proof. □
There are \(\binom{n}{p}\) candidate submodels that select p variables out of n. The total number of submodels of V, excluding the empty model, is \(\sum_{p=1}^{n} \binom{n}{p} = 2^{n} - 1\). Lemma 2 states that once the n-list of the best submodels corresponding to each model size has been computed, the best submodel of V is obtained as the optimum of this n-best list.
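As an illustration only (this is a brute-force sketch, not the paper's branch-and-bound algorithm), the following Python code enumerates all non-empty submodels, forms the n-list of per-size optima under a selection criterion, and verifies that its optimum coincides with the overall best submodel. The random data and the particular AIC form used are assumptions made for the example.

```python
# Brute-force illustration of Lemma 2 (not the paper's algorithm): the best
# submodel over all 2^n - 1 candidates equals the best entry of the n-list
# of per-size optima. Data and the AIC form below are illustrative choices.
from itertools import combinations

import numpy as np


def rss(X, y, subset):
    """Residual sum of squares of the least squares fit on the given columns."""
    Xs = X[:, list(subset)]
    beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
    r = y - Xs @ beta
    return float(r @ r)


def aic(X, y, subset):
    """One common AIC form: m*log(RSS/m) + 2p, with m observations and p variables."""
    m = X.shape[0]
    return m * np.log(rss(X, y, subset) / m) + 2 * len(subset)


rng = np.random.default_rng(0)
m, n = 50, 6
X = rng.standard_normal((m, n))
y = X[:, [0, 2]] @ np.array([1.5, -2.0]) + 0.1 * rng.standard_normal(m)

# n-list of the best submodel of each size p = 1, ..., n.
n_list = [
    min(combinations(range(n), p), key=lambda s: aic(X, y, s))
    for p in range(1, n + 1)
]

# Best submodel over all 2^n - 1 non-empty subsets.
all_subsets = [s for p in range(1, n + 1) for s in combinations(range(n), p)]
assert len(all_subsets) == 2**n - 1
best_overall = min(all_subsets, key=lambda s: aic(X, y, s))

# Lemma 2: the overall optimum is attained within the n-list.
assert best_overall == min(n_list, key=lambda s: aic(X, y, s))
```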
Property 1
Let V={1,2,…,n} denote a set of variables. A selection criterion f is said to be monotone in the RSS for equal-size subsets if, for all \(W_{1}, W_{2} \subseteq V\) with \(|W_{1}| = |W_{2}|\),
\[ \mathrm{RSS}(W_{1}) \le \mathrm{RSS}(W_{2}) \;\Longrightarrow\; f(W_{1}) \le f(W_{2}). \]
Standard selection criteria such as AIC and BIC satisfy the above property, since for equal-size subsets the penalty term is constant.
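For example, with a commonly used form of AIC (stated here for illustration; the exact definition used in the paper may differ), for a submodel W with p variables and m observations,
\[ \mathrm{AIC}(W) = m \log\bigl(\mathrm{RSS}(W)/m\bigr) + 2p . \]
For two submodels of equal size the penalty term 2p is identical and the logarithm is increasing, so \(\mathrm{RSS}(W_{1}) \le \mathrm{RSS}(W_{2})\) implies \(\mathrm{AIC}(W_{1}) \le \mathrm{AIC}(W_{2})\).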
Lemma 3
Let V={1,2,…,n} denote a set of variables and let p be such that 1≤p≤n. If f is a selection criterion that satisfies the monotonicity Property 1, then
\[ \min_{W \subseteq V,\ |W| = p} f(W) = f(\widehat{W}_{p}), \qquad \text{where } \widehat{W}_{p} = \mathop{\mathrm{arg\,min}}_{W \subseteq V,\ |W| = p} \mathrm{RSS}(W). \]
Proof
Let
\[ W_{p}^{*} = \mathop{\mathrm{arg\,min}}_{W \subseteq V,\ |W| = p} f(W) \qquad \text{and} \qquad \widehat{W}_{p} = \mathop{\mathrm{arg\,min}}_{W \subseteq V,\ |W| = p} \mathrm{RSS}(W). \]
It follows that
\[ f(W_{p}^{*}) \le f(\widehat{W}_{p}) \qquad \text{and} \qquad \mathrm{RSS}(\widehat{W}_{p}) \le \mathrm{RSS}(W_{p}^{*}). \]
Using the monotonicity Property 1, the latter becomes
\[ f(\widehat{W}_{p}) \le f(W_{p}^{*}), \]
which implies \(f(W_{p}^{*}) = f(\widehat{W}_{p})\). This completes the proof. □
Lemma 3 states that if the model size p is fixed, then the same best submodel optimizes the RSS as well as any selection criterion that satisfies the monotonicity-in-RSS Property 1.
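A small numerical check of this equivalence (illustrative only; it reuses the AIC form assumed above together with random data) could look as follows.

```python
# Numerical check of Lemma 3 (illustrative): for a fixed subset size p, the
# submodel minimising the RSS also minimises a monotone criterion such as AIC.
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
m, n, p = 40, 6, 3
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)


def rss(subset):
    Xs = X[:, list(subset)]
    beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
    return float(np.sum((y - Xs @ beta) ** 2))


def aic(subset):
    return m * np.log(rss(subset) / m) + 2 * len(subset)


size_p_subsets = list(combinations(range(n), p))
assert min(size_p_subsets, key=rss) == min(size_p_subsets, key=aic)
```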
Appendix B: The Survival data
Details and a description of the data can be found in Armstrong and Frome (1976) and Cutler (1993). With the original ordering of the variables, the best-subset solution is obtained in the root node of the regression tree. Thus, for illustration purposes, in the case study the first and last independent variables (\(a_{2}\) and \(a_{4}\)) have been interchanged to give the dataset below. The constant variable \(a_{1}=1\) has been added. The variable labels as they appear in the original data are indicated in the last column.
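For readers who want to reproduce a single constrained fit, a minimal sketch of a non-negative least squares estimation is given below. It uses synthetic data and SciPy's nnls routine purely for illustration; it does not reproduce the Survival data or the paper's model-selection algorithm.

```python
# A minimal sketch of a single non-negative least squares fit (not the
# paper's model-selection algorithm). The data below are synthetic and
# purely illustrative; see Armstrong and Frome (1976) for the Survival data.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
m = 30
# Columns a_1 (constant), a_2, a_3, a_4 of a hypothetical design matrix.
A = np.column_stack([np.ones(m), rng.random((m, 3))])
b = A @ np.array([1.0, 0.5, 0.0, 2.0]) + 0.05 * rng.standard_normal(m)

# Interchange the first and last independent variables (columns a_2 and a_4),
# mirroring the reordering described above, then fit with coefficients >= 0.
A[:, [1, 3]] = A[:, [3, 1]]
coef, rnorm = nnls(A, b)
print(coef, rnorm)
```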