Trust is a major issue when deploying empirical models in the real world, since changes in the underlying system, or use of the model in new regions of parameter space, can produce incorrect (and potentially dangerous) predictions. The trepidation involved in model usage can be mitigated by assembling ensembles of diverse models and using their consensus as a trust metric: the models are constrained to agree in the data region used for model development and, because they are diverse, are expected to disagree outside that region. The problem is to define an appropriate model complexity (since the ensemble should consist of models of similar complexity) and to identify diverse models within the candidate model set.
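The abstract does not say how consensus is measured. The Python sketch below assumes the spread (population standard deviation) of the member predictions serves as the disagreement measure. The toy models, the `bump` term and the `max_spread` threshold are hypothetical illustrations, not taken from the chapter.

```python
from statistics import fmean, pstdev

# Sketch only: the spread measure and the threshold are assumptions, not the chapter's.


def ensemble_predict(models, x, max_spread):
    """Return (consensus, spread, trusted) for a single input x.

    consensus: mean of the member predictions
    spread:    population standard deviation of the predictions (disagreement)
    trusted:   True when the members agree to within max_spread
    """
    preds = [m(x) for m in models]
    spread = pstdev(preds)
    return fmean(preds), spread, spread <= max_spread


def bump(x):
    # Hypothetical term that is tiny on [0, 1] but grows quickly outside it.
    return 0.5 * (x * (x - 1)) ** 2


# Hypothetical toy ensemble: every member tracks y = x on the development
# region [0, 1], yet the members are free to diverge away from it.
models = [
    lambda x: x,
    lambda x: x + bump(x),
    lambda x: x - bump(x),
]

for x in (0.5, 3.0):
    mean, spread, ok = ensemble_predict(models, x, max_spread=0.1)
    print(f"x={x}: prediction={mean:.3f}, spread={spread:.3f}, trusted={ok}")
```

Inside the development region (x = 0.5) the members stay within about 0.03 of the consensus, so the prediction is flagged as trusted. At x = 3.0 they range from -15 to 21 and the spread is roughly 14.7, so the prediction is flagged as untrusted even though the consensus value (3.0) happens to look plausible.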
In this chapter we discuss strategies for the development and selection of robust models and model ensembles, and demonstrate those strategies on industrial data sets. An important benefit of this approach is that all available data may be used for model development, rather than being partitioned into training, test and validation subsets. As a result, the constituent models are more accurate without risk of over-fitting, and the ensemble predictions are both more accurate and accompanied by a meaningful trust metric.
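The selection step can be sketched in a similar way. This is one possible heuristic, not the chapter's procedure. It assumes each candidate carries a complexity score, an error computed on all available data and its residual vector. It keeps candidates inside a hypothetical complexity band whose error is within a tolerance of the best, then greedily adds models whose residuals are weakly correlated with those already chosen. The `Candidate` fields, the band limits, `err_tol` and `max_corr` are all illustrative assumptions.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List

# Hypothetical sketch; not the chapter's selection procedure.


@dataclass
class Candidate:
    name: str
    complexity: float        # hypothetical complexity score (e.g. expression size)
    error: float             # fit error measured on ALL available data
    residuals: List[float]   # residuals on those same data points


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def select_ensemble(cands: List[Candidate], min_c: float, max_c: float,
                    size: int, err_tol: float = 0.5,
                    max_corr: float = 0.9) -> List[Candidate]:
    """Greedily pick accurate, mutually diverse models of similar complexity."""
    # 1. Similar complexity: keep only candidates inside the complexity band.
    band = [c for c in cands if min_c <= c.complexity <= max_c]
    if not band:
        return []
    # 2. Accuracy: drop candidates much worse than the best model in the band.
    best = min(c.error for c in band)
    pool = sorted((c for c in band if c.error <= best * (1 + err_tol)),
                  key=lambda c: c.error)
    # 3. Diversity: add a model only if its residuals are weakly correlated
    #    with the residuals of every model already chosen.
    chosen: List[Candidate] = []
    for c in pool:
        if len(chosen) == size:
            break
        if all(abs(pearson(c.residuals, m.residuals)) < max_corr for m in chosen):
            chosen.append(c)
    return chosen


cands = [
    Candidate("A", 10, 0.10, [0.1, -0.1, 0.1, -0.1]),
    Candidate("B", 12, 0.11, [0.1, -0.1, 0.1, -0.1]),  # duplicates A's errors
    Candidate("C", 11, 0.12, [0.1, 0.1, -0.1, -0.1]),  # uncorrelated with A
    Candidate("D", 40, 0.05, [0.0, 0.1, 0.0, -0.1]),   # too complex
    Candidate("E", 9, 0.30, [0.3, -0.3, 0.3, -0.3]),   # too inaccurate
]
print([c.name for c in select_ensemble(cands, min_c=8, max_c=15, size=3)])
```

With these toy candidates the call selects A and C. B is rejected because its residuals duplicate A's, D falls outside the complexity band, and E is too inaccurate. Residual correlation is only one possible diversity measure; the chapter's own criteria may differ.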
Copyright information
© 2008 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kotanchek, M., Smits, G., Vladislavleva, E. (2008). Trustable symbolic regression models: using ensembles, interval arithmetic and pareto fronts to develop robust and trust-aware models. In: Riolo, R., Soule, T., Worzel, B. (eds) Genetic Programming Theory and Practice V. Genetic and Evolutionary Computation Series. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-76308-8_12
DOI: https://doi.org/10.1007/978-0-387-76308-8_12
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-76307-1
Online ISBN: 978-0-387-76308-8
eBook Packages: Computer Science, Computer Science (R0)