Abstract
This paper presents robust weighted variants of batch and online standard Gaussian processes (GPs) that effectively reduce the negative impact of outliers on the corresponding GP models. This is achieved by introducing robust data weighers based on robust and quasi-robust weight functions derived from robust M-estimators. Our robust GPs are compared to various GP models on four datasets. We show that our batch and online robust weighted GPs are indeed robust to outliers, significantly outperforming the corresponding standard GPs and the recently proposed heteroscedastic GP method GPz. Our experiments also show that our methods are comparable to, and sometimes better than, a state-of-the-art robust GP that uses a Student-t likelihood.
References
Agostinelli C, Greco L (2013) A weighted strategy to handle likelihood uncertainty in Bayesian inference. Comput Stat 28:319–339
Almosallam IA, Jarvis MJ, Roberts SJ (2016) GPz: non-stationary sparse Gaussian processes for heteroscedastic uncertainty estimation in photometric redshifts. Mon Not R Astron Soc 462(1):726–739
Bentley JL (1980) Multidimensional divide and conquer. Commun ACM 23(4):214–229
Bernholt T, Fischer P (2004) The complexity of computing the MCD-estimator. Theor Comput Sci 326:383–398
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Buta R (1987) The structure and dynamics of ringed galaxies, III: surface photometry and kinematics of the ringed nonbarred spiral NGC7531. Astrophys J Suppl Ser 64:1–37
Csató L (2002) Gaussian processes—iterative sparse approximations. PhD thesis. Aston University, Birmingham, UK. http://publications.aston.ac.uk/id/eprint/1327/
Csató L, Opper M (2002) Sparse on-line Gaussian processes. Neural Comput 14(3):641–668
de Boor C (2001) A practical guide to splines. Applied mathematical sciences, vol 27, revised edn. Springer, New York
Dennis JE Jr, Welsch RE (1978) Techniques for nonlinear least squares and robust regression. Commun Stat Simul Comput 7(4):345–359
Drucker H, Burges CJ, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. In: Advances in neural information processing systems, pp 155–161
Geweke J (1993) Bayesian treatment of the independent Student-t linear model. J Appl Econom 8(S1):S19–S40
Girden ER (1992) ANOVA: repeated measures. Sage University Paper Series on Quantitative Applications in the Social Sciences 07-084, Sage University, Newbury Park, CA
Greco L, Racugno W, Ventura L (2008) Robust likelihood functions in Bayesian analysis. J Stat Plan Inference 138(5):1258–1270
Hampel FR, Ronchetti EM, Rousseeuw PJ, Stahel WA (1986) Robust statistics. The approach based on influence functions. Wiley, New York
Huber PJ (1964) Robust estimation of a location parameter. Ann Math Stat 35(1):73–101
Huber PJ, Ronchetti EM (1981) Robust statistics. Wiley, New York
Jylänki P, Vanhatalo J, Vehtari A (2011) Robust Gaussian process regression with a Student-t likelihood. J Mach Learn Res 12:3227–3257
Kemmler M, Rodner E, Denzler J (2010) One-class classification with Gaussian processes. In: Proceedings of the Asian conference on computer vision. Lecture notes in computer science, vol 6493. Springer, pp 489–500
Kuss M (2006) Gaussian process models for robust regression, classification, and reinforcement learning. Doctoral dissertation, Technische Universität Darmstadt, Germany. http://hdl.handle.net/11858/00-001M-0000-0013-D2CD-C
Kuss M, Pfingsten T, Csató L, Rasmussen CE (2005) Approximate inference for robust Gaussian process regression. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Technical Report 136. http://hdl.handle.net/11858/00-001M-0000-0013-D703-4
Le QV, Smola AJ, Canu S (2005) Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd international conference on machine learning. ACM, pp 489–496
MacKay DJC (1998) Introduction to Gaussian processes. In: Bishop CM (ed) Neural networks and machine learning. NATO ASI series, vol 168. Springer, Berlin, pp 133–165
Mahalanobis PC (1936) On the generalised distance in statistics. Proc Natl Inst Sci India 2(1):49–55
Maronna RA, Martin DR, Yohai VJ (2006) Robust statistics: theory and methods. Wiley, Chichester
Mattos CLC, Santos JDA, Barreto GA (2015) An empirical evaluation of robust Gaussian process models for system identification. In: International conference on intelligent data engineering and automated learning. Springer, Cham, pp 172–180
Minka TP (2001) Expectation propagation for approximate Bayesian Inference. In: Proceedings of the seventeenth conference on uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp 362–369
Murphy L, Martin S, Corke P (2012) Creating and using probabilistic cost maps from vehicle experience. In: Proceedings of IEEE/RSJ international conference on intelligent robots and systems, intelligent robots and systems (IROS). IEEE, pp 4689–4694
Neal RM (1997) Monte Carlo implementation of Gaussian process models for Bayesian regression and classification. Technical Report 9702, Department of Statistics and Department of Computer Science, University of Toronto. arXiv:physics/9701026
Opper M (1998) A Bayesian approach to on-line learning. In: Saad D (ed) On-line learning in neural networks. Cambridge University Press, Cambridge
Ramirez-Padron R (2015) Batch and online implicit weighted Gaussian processes for robust Novelty detection. Doctoral dissertation, University of Central Florida. http://purl.fcla.edu/fcla/etd/CFE0005869
Ramirez-Padron R, Mederos B, Gonzalez AJ (2013) Novelty detection using sparse online Gaussian processes for visual object recognition. In: International FLAIRS conference. St. Pete Beach, FL, USA, pp 124–129
Ranjan R, Huang B, Fatehi A (2016) Robust Gaussian process modeling using EM algorithm. J Process Control 42:125–136
Rasmussen CE, Williams C (2006) Gaussian processes for machine learning. MIT Press, Cambridge
Rey WJJ (1983) Introduction to robust and quasi-robust statistical methods. Springer, Berlin
Rottmann A, Burgard W (2010) Learning non-stationary system dynamics online using Gaussian processes. In: Proceedings of 32nd DAGM symposium, Darmstadt, Germany, pp 192–201
Rousseeuw PJ (1984) Least median of squares regression. J Am Stat Assoc 79(388):871–880
Schmidt G, Mattern R, Schüler F (1981) Biomechanical investigation to determine physical and traumatological differentiation criteria for the maximum load capacity of head and vertebral column with and without protective helmet under effects of impact. EEC Research Program on Biomechanics of Impacts, Final Report Phase III, Project 65, Institut für Rechtsmedizin, Universität Heidelberg, Germany
Seeger M (2004) Gaussian processes for machine learning. Int J Neural Syst 14(2):69–106
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric curve fitting. J R Stat Soc Ser B (Methodol) 47(1):1–52
Sugiyama M, Krauledat M, Müller KR (2007) Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 8:985–1005
Tipping ME (2000) The relevance vector machine. In: Advances in neural information processing systems, pp 652–658
Tipping ME, Lawrence ND (2005) Variational inference for Student-t models: robust Bayesian interpolation and generalised component analysis. Neurocomputing 69(1–3):123–141
Verboven S, Hubert M (2005) LIBRA: a MATLAB Library for robust analysis. Chemometr Intell Lab Syst 75:127–136
Wald I, Havran V (2006) On building fast kd-trees for ray tracing, and on doing that in O(N log N). In: Proceedings of the 2006 IEEE symposium on interactive ray tracing, pp 61–69
Wang B, Mao Z (2019) Outlier detection based on Gaussian process with application to industrial processes. Appl Soft Comput J 76:505–516
West M (1984) Outlier models and prior distributions in Bayesian linear regression. J R Stat Soc (Ser B) 46(3):431–439
Williams CKI, Barber D (1998) Bayesian classification with Gaussian processes. IEEE Trans Pattern Anal Mach Intell 20(12):1342–1351
Yeh C (1998) Modeling of strength of high performance concrete using artificial neural networks. Cem Concr Res 28(12):1797–1808
Appendix: Estimation of weighted GP hyperparameters
Here we derive the expressions for the derivatives of the log marginal likelihood of the batch RWGP with respect to the GP hyperparameters \(\varvec{\theta } = ({\theta }_{0}, \varvec{\theta }_k)^T = ({\sigma }^{2},{\varvec{\theta }}_k)^T\), where \(\varvec{\theta }_k = (\theta _1, \theta _2, \dots , \theta _{l})\), with \(l \ge 0\) (i.e. \(\varvec{\theta }_k\) might be empty), denotes the kernel hyperparameters. Given the training set D, the MML method consists of finding the hyperparameter values that maximize the log marginal likelihood given by (25):

$$\mathcal {L}(\varvec{\theta }) = \log p(\textit{\textbf{y}} \mid D, \varvec{\theta }) = -\frac{1}{2}\, \textit{\textbf{y}}^{T} \textit{\textbf{K}}_p^{-1}\, \textit{\textbf{y}} - \frac{1}{2} \log \left| \textit{\textbf{K}}_p \right| - \frac{n}{2} \log (2\pi ),$$
where \(\textit{\textbf{K}}_p = \textit{\textbf{K}}_D + \textit{\textbf{W}}\). The derivatives of \(\mathcal {L}\) with respect to each \({\theta }_i,\ i = 0, 1, \dots , l\) are written as:

$$\frac{\partial \mathcal {L}}{\partial {\theta }_i} = \frac{1}{2}\, \textit{\textbf{y}}^{T} \textit{\textbf{K}}_p^{-1} \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} \textit{\textbf{K}}_p^{-1}\, \textit{\textbf{y}} - \frac{1}{2}\, \mathrm {tr}\!\left( \textit{\textbf{K}}_p^{-1} \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} \right),$$
where the derivatives of \(\textit{\textbf{K}}_p\) are as follows: because \(\textit{\textbf{W}}\) depends only on \({\sigma }^{2}\) and \(\textit{\textbf{K}}_D\) only on the kernel hyperparameters,

$$\frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_0} = \frac{\partial \textit{\textbf{W}}}{\partial {\sigma }^{2}}, \qquad \frac{\partial \textit{\textbf{K}}_p}{\partial {\theta }_i} = \frac{\partial \textit{\textbf{K}}_D}{\partial {\theta }_i}, \quad i = 1, \dots , l.$$
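As a concrete illustration of the MML computation above, the following is a minimal NumPy sketch of \(\mathcal {L}\) and its gradient for a weighted GP. The RBF kernel with a single lengthscale and the weight matrix \(\textit{\textbf{W}} = \sigma ^2 \,\mathrm {diag}(1/w_1, \dots , 1/w_n)\) are assumptions chosen for illustration, not the paper's specific weighing scheme; the gradient expressions themselves follow the derivative formulas given above.

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    """K_D(i, j) = exp(-||x_i - x_j||^2 / (2 l^2))  (assumed kernel)."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale ** 2)

def log_marginal_likelihood(X, y, w, sigma2, lengthscale):
    """L = -1/2 y^T K_p^{-1} y - 1/2 log|K_p| - n/2 log(2 pi),
    with K_p = K_D + W and (assumed) W = sigma2 * diag(1 / w)."""
    n = len(y)
    K_p = rbf_kernel(X, lengthscale) + sigma2 * np.diag(1.0 / w)
    L_chol = np.linalg.cholesky(K_p)          # stable log-determinant
    alpha = np.linalg.solve(K_p, y)           # K_p^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L_chol)))
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

def lml_gradient(X, y, w, sigma2, lengthscale):
    """dL/dtheta_i = 1/2 y^T K_p^{-1} (dK_p/dtheta_i) K_p^{-1} y
                     - 1/2 tr(K_p^{-1} dK_p/dtheta_i)."""
    K_D = rbf_kernel(X, lengthscale)
    K_p = K_D + sigma2 * np.diag(1.0 / w)
    K_inv = np.linalg.inv(K_p)
    alpha = K_inv @ y
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    dK = {
        "sigma2": np.diag(1.0 / w),                 # dK_p/dsigma2 = dW/dsigma2
        "lengthscale": K_D * sq / lengthscale ** 3, # dK_p/dl = dK_D/dl
    }
    return {name: 0.5 * alpha @ dKi @ alpha - 0.5 * np.trace(K_inv @ dKi)
            for name, dKi in dK.items()}
```

In practice the analytic gradient is handed to a generic optimizer (e.g. conjugate gradients or L-BFGS); checking it against finite differences of `log_marginal_likelihood` is a standard sanity test.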
Ramirez-Padron, R., Mederos, B. & Gonzalez, A.J. Robust weighted Gaussian processes. Comput Stat 36, 347–373 (2021). https://doi.org/10.1007/s00180-020-01011-0