Authors:
Fabian Berns¹ and Christian Beecks²
Affiliations:
¹ Department of Computer Science, University of Münster, Germany;
² Department of Computer Science, University of Münster, Germany; Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin, Germany
Keyword(s):
Bayesian Machine Learning, Gaussian Process, Statistical Data Modeling.
Abstract:
Gaussian Process Models (GPMs) are applicable to a large variety of data analysis tasks, such as time series interpolation, regression, and classification. Frequently, these Bayesian machine learning models instantiate a Gaussian Process with a zero-mean function and the well-known Gaussian kernel. While these default instantiations yield acceptable analytical quality for many use cases, GPM retrieval algorithms make it possible to automatically search for an application-specific model suited to a particular dataset. State-of-the-art GPM retrieval algorithms have only been applied to small datasets, as their cubic runtime complexity impedes the analysis of datasets beyond a few thousand data records. Even though global approximations of Gaussian Processes extend the applicability of these models to medium-sized datasets, datasets of millions of data records remain far beyond their reach. Therefore, we develop a new large-scale GPM structure, which incorporates a divide-and-conquer paradigm and thus enables efficient GPM retrieval for large-scale data. We outline challenges concerning this newly developed GPM structure regarding its algorithmic retrieval, its integration with existing data platforms and technologies, as well as cross-model comparability and interpretability.
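To make the default instantiation and its cost concrete, here is a minimal NumPy sketch that assumes 1-D inputs and fixed hyperparameters. It shows a zero-mean GP with the Gaussian kernel, whose Cholesky factorization scales cubically in the number of records. It also shows a hypothetical divide-and-conquer variant that fits independent local GPs on contiguous blocks of the data. The local variant is only a generic illustration of the paradigm. The names `local_gp_predict` and `n_blocks` are our own. It is not the large-scale GPM structure developed by the authors, and it performs no GPM retrieval (kernel search).

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Gaussian (squared-exponential) kernel between 1-D input arrays a and b."""
    sqdist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)


def gp_predict(x_train, y_train, x_test, noise=1e-2, **kernel_args):
    """Posterior mean of a zero-mean GP with a Gaussian kernel (exact inference)."""
    K = gaussian_kernel(x_train, x_train, **kernel_args) + noise * np.eye(len(x_train))
    # The Cholesky factorization costs O(n^3): this is the step that limits
    # exact GPs (and retrieval algorithms built on them) to a few thousand records.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    K_star = gaussian_kernel(x_test, x_train, **kernel_args)
    return K_star @ alpha  # zero prior mean, so no mean term is added


def local_gp_predict(x_train, y_train, x_test, n_blocks=4, noise=1e-2, **kernel_args):
    """Hypothetical divide-and-conquer sketch: independent GPs on contiguous 1-D blocks.

    With blocks of size m = n / n_blocks, each factorization costs O(m^3),
    so the total is O(n * m^2) instead of O(n^3).
    """
    order = np.argsort(x_train)
    xs, ys = x_train[order], y_train[order]
    blocks = np.array_split(np.arange(len(xs)), n_blocks)
    # Boundaries between neighbouring blocks: midpoints of their adjacent records.
    edges = [0.5 * (xs[b[-1]] + xs[nb[0]]) for b, nb in zip(blocks[:-1], blocks[1:])]
    assignment = np.searchsorted(edges, x_test)  # block index for each test input
    preds = np.empty(len(x_test))
    for k, idx in enumerate(blocks):
        mask = assignment == k
        if mask.any():
            preds[mask] = gp_predict(xs[idx], ys[idx], x_test[mask], noise, **kernel_args)
    return preds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 10, 400))
    y = np.sin(x) + 0.1 * rng.standard_normal(400)
    x_new = np.linspace(0.5, 9.5, 50)
    exact = gp_predict(x, y, x_new, noise=0.01)
    local = local_gp_predict(x, y, x_new, n_blocks=8, noise=0.01)
    print("max |exact - local|:", np.max(np.abs(exact - local)))
```

On smooth data such as the sine example above, the local predictions should stay close to the exact ones, while each factorization involves only n / `n_blocks` records. The price is that each local model ignores data outside its block, which is exactly where questions of model comparability across partitions start to arise.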