Robust weighted Gaussian processes

  • Original paper
  • Published:
Computational Statistics

Abstract

This paper presents robust weighted variants of batch and online standard Gaussian processes (GPs) to effectively reduce the negative impact of outliers in the corresponding GP models. This is done by introducing robust data weighers that rely on robust and quasi-robust weight functions that come from robust M-estimators. Our robust GPs are compared to various GP models on four datasets. It is shown that our batch and online robust weighted GPs are indeed robust to outliers, significantly outperforming the corresponding standard GPs and the recently proposed heteroscedastic GP method GPz. Our experiments also show that our methods are comparable to and sometimes better than a state-of-the-art robust GP that uses a Student-t likelihood.


References

  • Agostinelli C, Greco L (2013) A weighted strategy to handle likelihood uncertainty in Bayesian inference. Comput Stat 28:319–339

  • Almosallam IA, Jarvis MJ, Roberts SJ (2016) GPz: non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts. Mon Not R Astron Soc 462(1):726–739

  • Bentley JL (1980) Multidimensional divide and conquer. Commun ACM 23(4):214–229

  • Bernholt T, Fischer P (2004) The complexity of computing the MCD-estimator. Theor Comput Sci 326:383–398

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  • Buta R (1987) The structure and dynamics of ringed galaxies, III: surface photometry and kinematics of the ringed nonbarred spiral NGC7531. Astrophys J Suppl Ser 64:1–37

  • Csató L (2002) Gaussian processes—iterative sparse approximations. PhD thesis. Aston University, Birmingham, UK. http://publications.aston.ac.uk/id/eprint/1327/

  • Csató L, Opper M (2002) Sparse on-line Gaussian processes. Neural Comput 14(3):641–668

  • de Boor C (2001) A practical guide to splines. Applied mathematical sciences, vol 27, revised edn. Springer, New York

  • Dennis JE Jr, Welsch RE (1978) Techniques for nonlinear least squares and robust regression. Commun Stat Simul Comput 7(4):345–359

  • Drucker H, Burges CJ, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Advances in neural information processing systems, pp 155–161

  • Geweke J (1993) Bayesian treatment of the independent Student-t linear model. J Appl Econom 8(S1):S19–S40

  • Girden ER (1992) ANOVA: repeated measures. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-084, Sage University, Newbury Park, CA

  • Greco L, Racugno W, Ventura L (2008) Robust likelihood functions in Bayesian analysis. J Stat Plan Inference 138(5):1258–1270

  • Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley, New York

  • Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73–101

  • Huber PJ, Ronchetti EM (1981) Robust statistics. Wiley, New York

  • Jylänki P, Vanhatalo J, Vehtari A (2011) Robust Gaussian process regression with a Student-t likelihood. J Mach Learn Res 12:3227–3257

  • Kemmler M, Rodner E, Denzler J (2010) One-class classification with Gaussian processes. In: Proceedings of the Asian conference on computer vision. Lecture notes in computer science, vol 6493. Springer, pp 489–500

  • Kuss M (2006) Gaussian process models for robust regression, classification, and reinforcement learning. Doctoral dissertation, Technische Universität Darmstadt, Germany. http://hdl.handle.net/11858/00-001M-0000-0013-D2CD-C

  • Kuss M, Pfingsten T, Csató L, Rasmussen CE (2005) Approximate inference for robust Gaussian process regression. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Technical Report 136. http://hdl.handle.net/11858/00-001M-0000-0013-D703-4

  • Le QV, Smola AJ, Canu S (2005) Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 489–496

  • MacKay DJC (1998) Introduction to Gaussian processes. In: Bishop CM (ed) Neural networks and machine learning. NATO ASI series, vol 168. Springer, Berlin, pp 133–165

  • Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India 2(1):49–55

  • Maronna RA, Martin DR, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, Chichester

  • Mattos CLC, Santos JDA, Barreto GA (2015) An empirical evaluation of robust Gaussian process models for system identification. In: International conference on intelligent data engineering and automated learning. Springer, Cham, pp 172–180

  • Minka TP (2001) Expectation propagation for approximate Bayesian Inference. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp 362–369

  • Murphy L, Martin S, Corke P (2012) Creating and using probabilistic cost maps from vehicle experience. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems, intelligent robots and systems (IROS). IEEE, pp 4689–4694

  • Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report 9702, Department of Statistics and Department of Computer Science, University of Toronto. arXiv:physics/9701026

  • Opper M (1998) A Bayesian approach to on-line learning. In: Saad D (ed) On-line learning in neural networks. Cambridge University Press, Cambridge

  • Ramirez-Padron R (2015) Batch and online implicit weighted Gaussian processes for robust Novelty detection. Doctoral dissertation, University of Central Florida. http://purl.fcla.edu/fcla/etd/CFE0005869

  • Ramirez-Padron R, Mederos B, Gonzalez AJ (2013) Novelty detection using sparse online Gaussian processes for visual object recognition. In: International FLAIRS conference. St. Pete Beach, FL, USA, pp 124–129

  • Ranjan R, Huang B, Fatehi A (2016) Robust Gaussian process modeling using EM algorithm. J Process Control 42:125–136

  • Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, Cambridge

  • Rey WJJ (1983) Introduction to robust and quasi-robust statistical methods. Springer, Berlin

  • Rottmann A, Burgard W (2010) Learning non-stationary system dynamics online using Gaussian processes. In: Proceedings of 32nd DAGM symposium, Darmstadt, Germany, pp 192–201

  • Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79(388):871–880

  • Schmidt G, Mattern R, Schüler F (1981) Biomechanical investigation to determine physical and traumatological differentiation criteria for the maximum load capacity of head and vertebral column with and without protective helmet under effects of impact. EEC Research Program on Biomechanics of Impacts, Final Report Phase III, Project 65, Institut für Rechtsmedizin, Universität Heidelberg, Germany

  • Seeger M (2004) Gaussian processes for machine learning. Int J Neural Syst 14(2):69–106

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge

  • Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric curve fitting. J R Stat Soc Ser B (Methodol) 47(1):1–52

  • Sugiyama M, Krauledat M, Müller KR (2007) Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 8:985–1005

  • Tipping ME (2000) The relevance vector machine. In: Advances in neural information processing systems, pp 652–658

  • Tipping ME, Lawrence ND (2005) Variational inference for Student-t models: robust Bayesian interpolation and generalised component analysis. Neurocomputing 69(1–3):123–141

  • Verboven S, Hubert M (2005) LIBRA: a MATLAB Library for robust analysis. Chemometr Intell Lab Syst 75:127–136

  • Wald I, Havran V (2006) On building fast kd-trees for ray tracing, and on doing that in O(N log N). In: Proceedings of the 2006 IEEE symposium on interactive ray tracing, pp 61–69

  • Wang B, Mao Z (2019) Outlier detection based on Gaussian process with application to industrial processes. Appl Soft Comput J 76:505–516

  • West M (1984) Outlier models and prior distributions in Bayesian linear regression. J R Stat Soc (Ser B) 46(3):431–439

  • Williams CKI, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351

  • Yeh IC (1998) Modeling of strength of high-performance concrete using artificial neural networks. Cem Concr Res 28(12):1797–1808


Corresponding author

Correspondence to Ruben Ramirez-Padron.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Estimation of weighted GP hyperparameters

Here we derive the expressions for the derivatives of the log marginal likelihood of the batch RWGP with respect to the GP hyperparameters \(\varvec{\theta } = ({\theta }_{0}, \varvec{\theta }_k)^T = ({\sigma }^{2}, \varvec{\theta }_k)^T\), where \(\varvec{\theta }_k = (\theta _1, \theta _2, \dots , \theta _{l})\), with \(l \ge 0\) (i.e. \(\varvec{\theta }_k\) may be empty), denotes the kernel hyperparameters. Given the training set D, the MML method consists of finding the hyperparameter values that maximize the log marginal likelihood given by (25):

$$\begin{aligned} \mathcal {L}(\varvec{\theta }) = -\frac{1}{2}{\left( {\mathbf{y}}-{\mathbb {E}}_0\left[ {\mathbf{f}}_D\right] \right) }^T\textit{\textbf{K}}^{-1}_p\left( {\mathbf{y}}-{\mathbb {E}}_0\left[ {\mathbf{f}}_D\right] \right) - \frac{1}{2}\log \left| \textit{\textbf{K}}_p\right| - \frac{N}{2}\log 2\pi , \end{aligned}$$

where \(\textit{\textbf{K}}_p = \textit{\textbf{K}}_D + \textit{\textbf{W}}\). The derivatives of \(\mathcal {L}\) with respect to each \({\theta }_i,\ i = 0, 1, \dots , l\) are written as:

$$\begin{aligned} \frac{\partial {\mathcal {L}}}{\partial {\theta }_i} = \frac{1}{2}{\left( {\mathbf{y}} - {\mathbb {E}}_0\left[ {\mathbf{f}}_D\right] \right) }^T \textit{\textbf{K}}^{-1}_p\frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} \textit{\textbf{K}}^{-1}_p\left( {\mathbf{y}} - {\mathbb {E}}_0\left[ {\mathbf{f}}_D\right] \right) - \frac{1}{2} {\mathrm {tr}\left( \textit{\textbf{K}}^{-1}_p\frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i}\right) }, \end{aligned}$$
(32)

where the derivatives of \(\textit{\textbf{K}}_p\) are as follows:

$$\begin{aligned} \frac{\partial \textit{\textbf{K}}_p}{\partial {\sigma }^2} = \frac{\partial \textit{\textbf{W}}}{\partial {\sigma }^2} = \mathrm {diag}\left( \frac{1}{{\mathrm {w}}_1},\frac{1}{{\mathrm {w}}_2},\dots ,\frac{1}{{\mathrm {w}}_N}\right) , \end{aligned}$$
(33)
$$\begin{aligned} \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} = \frac{\partial \textit{\textbf{K}}_D}{\partial {\theta }_i} ,\quad i = 1, 2,\dots , l. \end{aligned}$$
(34)
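
The expressions above can be sketched numerically as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes a single-lengthscale RBF kernel for \(\textit{\textbf{K}}_D\), a zero prior mean \({\mathbb {E}}_0\left[ {\mathbf{f}}_D\right] \), and \(\textit{\textbf{W}} = {\sigma }^2\,\mathrm {diag}(1/{\mathrm {w}}_1, \dots , 1/{\mathrm {w}}_N)\); the function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    """Squared-exponential kernel K_D with k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def weighted_gp_objective(X, y, w, sigma2, lengthscale):
    """Log marginal likelihood of the batch weighted GP and its gradients.

    K_p = K_D + W with W = sigma2 * diag(1 / w_i), so that
    dK_p/dsigma2 = diag(1 / w_i)   (Eq. 33) and
    dK_p/dtheta_i = dK_D/dtheta_i  (Eq. 34).
    """
    N = X.shape[0]
    K_D = rbf_kernel(X, lengthscale)
    K_p = K_D + sigma2 * np.diag(1.0 / w)

    r = y                                  # y - E_0[f_D], with zero prior mean
    alpha = np.linalg.solve(K_p, r)        # K_p^{-1} (y - E_0[f_D])
    _, logdet = np.linalg.slogdet(K_p)

    # Log marginal likelihood: quadratic, log-determinant, and constant terms.
    L = -0.5 * r @ alpha - 0.5 * logdet - 0.5 * N * np.log(2.0 * np.pi)

    K_p_inv = np.linalg.inv(K_p)

    def grad(dK):  # Eq. (32) for a given dK_p/dtheta_i
        return 0.5 * alpha @ dK @ alpha - 0.5 * np.trace(K_p_inv @ dK)

    dK_dsigma2 = np.diag(1.0 / w)          # Eq. (33)
    # For the RBF kernel, dK_D/dl = K_D * d^2 / l^3 (d^2 = squared distances),
    # illustrating Eq. (34) for a kernel hyperparameter.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    dK_dl = K_D * d2 / lengthscale**3

    return L, grad(dK_dsigma2), grad(dK_dl)
```

The two returned gradients correspond to \({\theta }_0 = {\sigma }^2\) and the kernel lengthscale; in practice they would be handed to a gradient-based optimizer (e.g. L-BFGS) to carry out the MML maximization.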

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ramirez-Padron, R., Mederos, B. & Gonzalez, A.J. Robust weighted Gaussian processes. Comput Stat 36, 347–373 (2021). https://doi.org/10.1007/s00180-020-01011-0
