Abstract
A column generation based approach is proposed for solving the cluster-wise regression problem. The proposed strategy relies firstly on several efficient heuristic strategies to insert columns into the restricted master problem. If these heuristics fail to identify an improving column, an exhaustive search is performed starting with incrementally larger ending subsets, all the while iteratively performing heuristic optimization to ensure a proper balance of exact and heuristic optimization. Additionally, observations are sequenced by their dual variables and by their inclusion in joint pair branching rules. The proposed strategy is shown to outperform the best known alternative (BBHSE) when the number of clusters is greater than three. Additionally, the current work further demonstrates and expands the successful use of the new paradigm of using incrementally larger ending subsets to strengthen the lower bounds of a branch and bound search as pioneered by Brusco's Repetitive Branch and Bound Algorithm (RBBA).
Similar content being viewed by others
References
ALOISE, D., CAFIERI, S., CAPOROSSI G., HANSEN, P., PERRON, S., LIBERTI, L. (2010a), "Column Generation Algorithms for Exact Modularity Maximization in Networks," Physical Review E, 82, 046112.
ALOISE, D., HANSEN, P., and LIBERTI, L. (2010b), "An Improved Column Generation Algorithm for Minimum Sum-of-Squares Clustering", Mathematical Programming, 1–26.
AURIFEILLE, J.M. (2000), "A Bio-Mimetic Approach to Marketing Segmentation: Principles and Comparative Analysis," European Journal of Economic and Social Systems, 14, 93–108.
AURIFEILLE, J.M., and MEDLIN, C. (2001), "A Dyadic Segmentation Approach to Business Partnerships," European Journal of Economic and Social Systems, 15, 3–16.
AURIFEILLE, J.M., and QUESTER, P.G. (2003), "Predicting Business Ethical Tolerance in International Markets: A Concomitant Clusterwise Regression Analysis," International Business Review, 12, 253–272.
BARNHART, C., JOHNSON, E.L., NEMHAUSER, G.L., SAVELSBERGH, M.W.P., and VANCE, P.H. (1998), "Branch-and-Price: Column Generation for Solving Huge Integer Programs," Operations Research, 46, 316–329.
BERKELAAR, M., EIKLAND, K., and NOTEBAERT, P. (2010), "Lp_Solve: Open Source (Mixed-Integer) Linear Programming System," http://lpsolve.sourceforge.net/5.5/
BRUSCO, M.J. (2006), "A Repetitive Branch-and-Bound Procedure for Minimum Within-Cluster Sums of Squares Partitioning," Psychometrika, 71, 347–363.
BRUSCO, M.J., CRADIT, J.D., STEINLEY, D., and FOX, G.L. (2008), "Cautionary Remarks on the Use of Clusterwise Regression," Multivariate Behavioral Research, 43, 29–49.
BRUSCO, M.J., and STAHL, S. (2005), Branch-and-Bound Applications in Combinatorial Data Analysis, New York: Springer Verlag.
CAPOROSSI, G., and HANSEN, P. (2005), "Variable Neighborhood Search for Least Squares Clusterwise Regression," Les Cahiers du GERAD, G-2005-61.
CARBONNEAU, R.A., CAPOROSSI, G., and HANSEN, P. (2011a), "Extensions to the Repetitive Branch and Bound Algorithm for Globally Optimal Clusterwise Regression," GERAD, G-2011-10.
CARBONNEAU, R.A., CAPOROSSI, G., and HANSEN, P. (2011b), "Globally Optimal Clusterwise Regression by Mixed Logical-Quadratic Programming," European Journal of Operational Research, 212, 213–222.
CHARLES, C. (1977), "Régression Typologique Et Reconnaissance Des Formes," Thèse de doctorat 3ième cycle, Université de Paris IX.
CHVÁTAL, V. (1983), Linear Programming, New York: WH Freeman.
CIAMPI, A., RICH, B., DYACHENKO, A., ANTONIANO, I., MURIE, C., AND NADON, R. (2007), "Locally Linear Regression and the Calibration Problem for Micro-Array Analysis," in Selected Contributions in Data Analysis and Classification, eds. P. Brito, G. Cucumel, P. Bertrand and F. de Carvalho, Berlin: Springer, pp. 549–555.
DANTZIG, G.B., and WOLFE, P. (1960), "Decomposition Principle for Linear Programs," Operations Research, 8, 101–111.
DESARBO, W. (1988), "A Maximum Likelihood Methodology for Clusterwise Linear Regression," Journal of Classification, 5, 249–282.
DESARBO, W., OLIVER, R., and RANGASWAMY, A. (1989), "A Simulated Annealing Methodology for Clusterwise Linear Regression," Psychometrika, 54, 707–736.
DESAULNIERS, G., DESROSIERS, J., and SOLOMON, M.M. (eds.) (2005), Column Generation (Vol. 5), Berlin: Springer Verlag.
DIDAY, E. (1979), Optimization En Classification Automatique, Le Chesnay: INRIA.
GENTLEMAN, W.M. (1973), "Least Squares Computations by Givens Transformations without Square Roots," IMA Journal of Applied Mathematics, 12, 329–336.
HANSEN, P., and MEYER, C. (2011), "A New Column Generation Algorithm for Logical Analysis of Data," Annals of Operations Research, 188, 215–249.
HENNIG, C. (2000), "Identifiablity of Models for Clusterwise Linear Regression," Journal of Classification, 17, 273–296.
HOOKER, J.N. (2002), "Logic, Optimization, and Constraint Programming," INFORMS Journal on Computing, 14, 295–321.
HOOKER, J.N. (2007), Integrated Methods for Optimization, New York: Springer.
HOOKER, J.N., and OSORIO, M.A. (1999), "Mixed Logical-Linear Programming," Discrete Applied Mathematics, 96, 395–442.
HOOKER, J.N., OTTOSSON, G., THORSTEINSSON, E.S., and KIM, H.J. (2000), "A Scheme for Unifying Optimization and Constraint Satisfaction Methods," The Knowledge Engineering Review, 15, 11–30.
IBM (2009a), Ibm Ilog Cplex 12.1.0, Armonk, NY: IBM.
IBM (2009b), Ibm Ilog Opl 6.3, Armonk, NY: IBM.
KNUTH, D.E. (1997), "Art of Computer Programming, Volume 2: Seminumerical Algorithms" (3rd ed.), Addison-Wesley Professional.
LAU, KN., LEUNG, P.L., and TSE, K.K. (1999), "A Mathematical Programming Approach to Clusterwise Regression Model and Its Extensions," European Journal of Operational Research, 116, 640–652.
LÜBBECKE, M.E., and DESROSIERS, J. (2005), "Selected Topics in Column Generation," Operations Research, 53, 1007–1023.
MIRKIN, B. (2005), Clustering for Data Mining, London: Chapman & Hall/CRC.
RYAN, D.M., and FOSTER, B.A. (1981), "An Integer Programming Approach to Scheduling," in Computer Scheduling of Public Transport Urban Passenger Vehicle and Crew Scheduling, Amsterdam: North-Holland, pp. 269–280.
SPÄTH, H. (1979), "Algorithm 39 Clusterwise Linear Regression," Computing, 22, 367–373.
SPÄTH, H. (1981), "Correction to Algorithm 39 Clusterwise Linear Regression," Computing, 26, 275–275.
SPÄTH, H. (1982), "A Fast Algorithm for Clusterwise Linear Regression," Computing, 29, 175–181.
VAN HENTENRYCK, P., LUSTIG, I., MICHEL, L., and PUGET, J.F. (1999), The Opl Optimization Programming Language, Cambridge: MIT Press.
WEDEL, M. (1990), "Clusterwise Regression and Market Segmentation. Developments and Applications," doctoral thesis, Landbouwuniversiteit Wageningen.
WEDEL, M. (1998), Glimmix: Simultaneous Estimation of Latent Classes and Generalized Models within Each Latent Class, User's Manual, Version 1.0, Groningen: ProGAMMA.
WEDEL, M., and DESARBO, W.S. (1995), "A Mixture Likelihood Approach for Generalized Linear Models," Journal of Classification, 12, 21–55.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Carbonneau, R.A., Caporossi, G. & Hansen, P. Globally Optimal Clusterwise Regression By Column Generation Enhanced with Heuristics, Sequencing and Ending Subset Optimization. J Classif 31, 219–241 (2014). https://doi.org/10.1007/s00357-014-9155-x
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00357-014-9155-x