Abstract
This paper presents robust weighted variants of batch and online standard Gaussian processes (GPs) that effectively reduce the negative impact of outliers on the corresponding GP models. This is achieved by introducing robust data weighers based on robust and quasi-robust weight functions derived from robust M-estimators. Our robust GPs are compared to various GP models on four datasets. We show that our batch and online robust weighted GPs are indeed robust to outliers, significantly outperforming the corresponding standard GPs and the recently proposed heteroscedastic GP method GPz. Our experiments also show that our methods are comparable to, and sometimes better than, a state-of-the-art robust GP that uses a Student-t likelihood.
References
Agostinelli C, Greco L (2013) A weighted strategy to handle likelihood uncertainty in Bayesian inference. Comput Stat 28:319–339
Almosallam IA, Jarvis MJ, Roberts SJ (2016) GPz: non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts. Mon Not R Astron Soc 462(1):726–739
Bentley JL (1980) Multidimensional divide and conquer. Commun ACM 23(4):214–229
Bernholt T, Fischer P (2004) The complexity of computing the MCD-estimator. Theor Comput Sci 326:383–398
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Buta R (1987) The structure and dynamics of ringed galaxies, III: surface photometry and kinematics of the ringed nonbarred spiral NGC7531. Astrophys J Suppl Ser 64:1–37
Csató L (2002) Gaussian processes—iterative sparse approximations. PhD thesis. Aston University, Birmingham, UK. http://publications.aston.ac.uk/id/eprint/1327/
Csató L, Opper M (2002) Sparse on-line Gaussian processes. Neural Comput 14(3):641–668
de Boor C (2001) A practical guide to splines. Applied mathematical sciences, vol 27, revised edn. Springer, New York
Dennis JE Jr, Welsch RE (1978) Techniques for nonlinear least squares and robust regression. Commun Stat Simul Comput 7(4):345–359
Drucker H, Burges CJ, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Advances in neural information processing systems, pp 155–161
Geweke J (1993) Bayesian treatment of the independent Student-t linear model. J Appl Econom 8(S1):S19–S40
Girden ER (1992) ANOVA: repeated measures. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-084, Sage University, Newbury Park, CA
Greco L, Racugno W, Ventura L (2008) Robust likelihood functions in Bayesian analysis. J Stat Plan Inference 138(5):1258–1270
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley, New York
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73–101
Huber PJ, Ronchetti EM (1981) Robust statistics. Wiley, New York
Jylänki P, Vanhatalo J, Vehtari A (2011) Robust Gaussian process regression with a Student-t likelihood. J Mach Learn Res 12:3227–3257
Kemmler M, Rodner E, Denzler J (2010) One-class classification with Gaussian processes. In: Proceedings of the Asian conference on computer vision. Lecture notes in computer science, vol 6493. Springer, pp 489–500
Kuss M (2006) Gaussian process models for robust regression, classification, and reinforcement learning. Doctoral dissertation, Technische Universität Darmstadt, Germany. http://hdl.handle.net/11858/00-001M-0000-0013-D2CD-C
Kuss M, Pfingsten T, Csató L, Rasmussen CE (2005) Approximate inference for robust Gaussian process regression. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Technical Report 136. http://hdl.handle.net/11858/00-001M-0000-0013-D703-4
Le QV, Smola AJ, Canu S (2005) Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 489–496
MacKay DJC (1998) Introduction to Gaussian processes. In: Bishop CM (ed) Neural networks and machine learning. NATO ASI series, vol 168. Springer, Berlin, pp 133–165
Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India 2(1):49–55
Maronna RA, Martin DR, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, Chichester
Mattos CLC, Santos JDA, Barreto GA (2015) An empirical evaluation of robust Gaussian process models for system identification. In: International conference on intelligent data engineering and automated learning. Springer, Cham, pp 172–180
Minka TP (2001) Expectation propagation for approximate Bayesian Inference. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp 362–369
Murphy L, Martin S, Corke P (2012) Creating and using probabilistic cost maps from vehicle experience. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems, intelligent robots and systems (IROS). IEEE, pp 4689–4694
Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report 9702, Department of Statistics and Department of Computer Science, University of Toronto. arXiv:physics/9701026
Opper M (1998) A Bayesian approach to on-line learning. In: Saad D (ed) On-line learning in neural networks. Cambridge University Press, Cambridge
Ramirez-Padron R (2015) Batch and online implicit weighted Gaussian processes for robust Novelty detection. Doctoral dissertation, University of Central Florida. http://purl.fcla.edu/fcla/etd/CFE0005869
Ramirez-Padron R, Mederos B, Gonzalez AJ (2013) Novelty detection using sparse online Gaussian processes for visual object recognition. In: International FLAIRS conference. St. Pete Beach, FL, USA, pp 124–129
Ranjan R, Huang B, Fatehi A (2016) Robust Gaussian process modeling using EM algorithm. J Process Control 42:125–136
Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Rey WJJ (1983) Introduction to robust and quasi-robust statistical methods. Springer, Berlin
Rottmann A, Burgard W (2010) Learning non-stationary system dynamics online using Gaussian processes. In: Proceedings of 32nd DAGM symposium, Darmstadt, Germany, pp 192–201
Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79(388):871–880
Schmidt G, Mattern R, Schüler F (1981) Biomechanical investigation to determine physical and traumatological differentiation criteria for the maximum load capacity of head and vertebral column with and without protective helmet under effects of impact. EEC Research Program on Biomechanics of Impacts, Final Report Phase III, Project 65, Institut für Rechtsmedizin, Universität Heidelberg, Germany
Seeger M (2004) Gaussian processes for machine learning. Int J Neural Syst 14(2):69–106
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric curve fitting. J R Stat Soc Ser B (Methodol) 47(1):1–52
Sugiyama M, Krauledat M, Müller KR (2007) Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 8:985–1005
Tipping ME (2000) The relevance vector machine. In: Advances in neural information processing systems, pp 652–658
Tipping ME, Lawrence ND (2005) Variational inference for Student-t models: robust Bayesian interpolation and generalised component analysis. Neurocomputing 69(1–3):123–141
Verboven S, Hubert M (2005) LIBRA: a MATLAB Library for robust analysis. Chemometr Intell Lab Syst 75:127–136
Wald I, Havran V (2006) On building fast kd-trees for ray tracing, and on doing that in O(N log N). In: Proceedings of the 2006 IEEE symposium on interactive ray tracing, pp 61–69
Wang B, Mao Z (2019) Outlier detection based on Gaussian process with application to industrial processes. Appl Soft Comput J 76:505–516
West M (1984) Outlier models and prior distributions in Bayesian linear regression. J R Stat Soc (Ser B) 46(3):431–439
Williams CKI, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351
Yeh C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28(12):1797–1808
Appendix: Estimation of weighted GP hyperparameters
Here we derive the expressions for the derivatives of the log marginal likelihood of the batch RWGP with respect to the GP hyperparameters \(\varvec{\theta } = ({\theta }_{0}, \varvec{\theta }_k)^T = ({\sigma }^{2},{\varvec{\theta }}_k)^T\), where \(\varvec{\theta }_k = (\theta _1, \theta _2, \dots , \theta _{l})\), with \(l \ge 0\) (i.e. \(\varvec{\theta }_k\) might be empty), denotes the kernel hyperparameters. Given the training set D, the MML method consists of finding the hyperparameter values that maximize the log marginal likelihood given by (25):

$$\mathcal {L}(\varvec{\theta }) = \log p(\textit{\textbf{y}} \mid D, \varvec{\theta }) = -\frac{1}{2}\, \textit{\textbf{y}}^{T} \textit{\textbf{K}}_p^{-1}\, \textit{\textbf{y}} - \frac{1}{2} \log \left| \textit{\textbf{K}}_p \right| - \frac{n}{2} \log (2\pi ),$$
where \(\textit{\textbf{K}}_p = \textit{\textbf{K}}_D + \textit{\textbf{W}}\). The derivatives of \(\mathcal {L}\) with respect to each \({\theta }_i,\ i = 0, 1, \dots , l\) are written as:

$$\frac{\partial \mathcal {L}}{\partial {\theta }_i} = \frac{1}{2}\, \textit{\textbf{y}}^{T} \textit{\textbf{K}}_p^{-1} \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} \textit{\textbf{K}}_p^{-1}\, \textit{\textbf{y}} - \frac{1}{2}\, \mathrm {tr}\!\left( \textit{\textbf{K}}_p^{-1} \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} \right),$$
where the derivatives of \(\textit{\textbf{K}}_p\) are as follows: because \(\textit{\textbf{W}}\) depends only on \({\sigma }^{2}\) and \(\textit{\textbf{K}}_D\) only on the kernel hyperparameters,

$$\frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_0} = \frac{\partial \textit{\textbf{W}}}{\partial {\sigma }^{2}}, \qquad \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} = \frac{\partial \textit{\textbf{K}}_D}{\partial {\theta }_i}, \quad i = 1, \dots , l.$$
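As a concrete illustration of the MML computation above, the following is a minimal NumPy sketch of \(\mathcal {L}\) and its gradient for a weighted GP. The RBF kernel with a single lengthscale and the weight matrix \(\textit{\textbf{W}} = \sigma ^2 \,\mathrm {diag}(1/w_1, \dots , 1/w_n)\) are assumptions chosen for illustration, not the paper's specific weighing scheme; the gradient expressions themselves follow the derivative formulas given above.

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    """K_D(i, j) = exp(-||x_i - x_j||^2 / (2 l^2))  (assumed kernel)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def log_marginal_likelihood(X, y, w, sigma2, lengthscale):
    """L = -1/2 y^T K_p^{-1} y - 1/2 log|K_p| - n/2 log(2 pi),
    with K_p = K_D + W and (assumed) W = sigma2 * diag(1 / w)."""
    n = len(y)
    K_p = rbf_kernel(X, lengthscale) + sigma2 * np.diag(1.0 / w)
    L_chol = np.linalg.cholesky(K_p)          # stable log-determinant
    alpha = np.linalg.solve(K_p, y)           # K_p^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L_chol)))
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

def lml_gradient(X, y, w, sigma2, lengthscale):
    """dL/dtheta_i = 1/2 y^T K_p^{-1} (dK_p/dtheta_i) K_p^{-1} y
                     - 1/2 tr(K_p^{-1} dK_p/dtheta_i)."""
    K_D = rbf_kernel(X, lengthscale)
    K_p = K_D + sigma2 * np.diag(1.0 / w)
    K_inv = np.linalg.inv(K_p)
    alpha = K_inv @ y
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dK = {
        "sigma2": np.diag(1.0 / w),                 # dK_p/dsigma2 = dW/dsigma2
        "lengthscale": K_D * sq / lengthscale ** 3, # dK_p/dl = dK_D/dl
    }
    return {name: 0.5 * alpha @ dKi @ alpha - 0.5 * np.trace(K_inv @ dKi)
            for name, dKi in dK.items()}
```

In practice the analytic gradient is handed to a generic optimizer (e.g. conjugate gradients or L-BFGS); checking it against finite differences of `log_marginal_likelihood` is a standard sanity test.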
Ramirez-Padron, R., Mederos, B. & Gonzalez, A.J. Robust weighted Gaussian processes. Comput Stat 36, 347–373 (2021). https://doi.org/10.1007/s00180-020-01011-0