Abstract
It is useful to predict future values in time series data, for example when there are many sensors monitoring environments such as urban space. The Gaussian Process (GP) model is considered as a promising technique for this setting. However, the GP model requires too high a training cost to be tractable for large data. Though approximation methods have been proposed to improve GP's scalability, they usually can only capture global trends in the data and fail to preserve small-scale patterns, resulting in unsatisfactory performance.
We propose a new method to apply the GP for sensor time series prediction. Instead of (eagerly) training GPs on entire datasets, we custom-build query-dependent GPs on small fractions of the data for each prediction request.
Implementing this idea in practice at scale requires us to overcome two obstacles. On the one hand, a central challenge with such a semi-lazy learning model is the substantial model-building effort at kNN query time, which could lead to unacceptable latency. We propose a novel two-level inverted-like index to support kNN search using the DTW on the GPU, making such "just-in-time" query-dependent model construction feasible for real-time applications.
On the other hand, several parameters should be tuned for each time series individually since different sensors have different data generating processes in diverse environments. Manually configuring the parameters is usually not feasible due to the large number of sensors. To address this, we devise an adaptive auto-tuning mechanism to automatically determine and dynamically adjust the parameters for each time series with little human assistance.
Our method has the following strong points: (a) it can make prediction in real time without a training phase; (b) it can yield superior prediction accuracy; and (c) it can effectively estimate the analytical predictive uncertainty.
To illustrate our points, we present SMiLer, a semi-lazy time series prediction system for sensors. Extensive experiments on real-world datasets demonstrate its effectiveness and efficiency. In particular, by devising a two-level inverted-like index on the GPU with an enhanced lower bound of the DTW, SMiLer accelerates the efficiency of kNN search by one order of magnitude over its baselines. The prediction accuracy of SMiLer is better than the state-of-the-art competitors (up to 10 competitors) with better estimation of predictive uncertainty.