Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2723372.2749429acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

SMiLer: A Semi-Lazy Time Series Prediction System for Sensors

Published: 27 May 2015 Publication History

Abstract

It is useful to predict future values in time series data, for example when there are many sensors monitoring environments such as urban space. The Gaussian Process (GP) model is considered as a promising technique for this setting. However, the GP model requires too high a training cost to be tractable for large data. Though approximation methods have been proposed to improve GP's scalability, they usually can only capture global trends in the data and fail to preserve small-scale patterns, resulting in unsatisfactory performance.
We propose a new method to apply the GP for sensor time series prediction. Instead of (eagerly) training GPs on entire datasets, we custom-build query-dependent GPs on small fractions of the data for each prediction request.
Implementing this idea in practice at scale requires us to overcome two obstacles. On the one hand, a central challenge with such a semi-lazy learning model is the substantial model-building effort at kNN query time, which could lead to unacceptable latency. We propose a novel two-level inverted-like index to support kNN search using the DTW on the GPU, making such "just-in-time" query-dependent model construction feasible for real-time applications.
On the other hand, several parameters should be tuned for each time series individually since different sensors have different data generating processes in diverse environments. Manually configuring the parameters is usually not feasible due to the large number of sensors. To address this, we devise an adaptive auto-tuning mechanism to automatically determine and dynamically adjust the parameters for each time series with little human assistance.
Our method has the following strong points: (a) it can make prediction in real time without a training phase; (b) it can yield superior prediction accuracy; and (c) it can effectively estimate the analytical predictive uncertainty.
To illustrate our points, we present SMiLer, a semi-lazy time series prediction system for sensors. Extensive experiments on real-world datasets demonstrate its effectiveness and efficiency. In particular, by devising a two-level inverted-like index on the GPU with an enhanced lower bound of the DTW, SMiLer accelerates the efficiency of kNN search by one order of magnitude over its baselines. The prediction accuracy of SMiLer is better than the state-of-the-art competitors (up to 10 competitors) with better estimation of predictive uncertainty.

References

[1]
Genie and lamp. http://www.comp.nus.edu.sg/~atung/gl/, 2014.
[2]
R. Adhikari and R. Agrawal. A novel weighted ensemble technique for time series forecasting. In PAKDD, pages 38--49, 2012.
[3]
T. Alabi, J. D. Blanchard, B. Gordon, and R. Steinbach. Fast k-selection algorithms for graphics processing units. Journal of Experimental Algorithmics, 17:4--2, 2012.
[4]
N. S. Altman. An introduction to kernel and nearest-neighbor nonparametric regression. The American Statistician, 46(3):175--185, 1992.
[5]
I. Assent, R. Krieger, F. Afschari, and T. Seidl. The ts-tree: efficient time series search and retrieval. In EDBT, pages 252--263, 2008.
[6]
V. Athitsos, P. Papapetrou, M. Potamias, G. Kollios, and D. Gunopulos. Approximate embedding-based subsequence matching of time series. In SIGMOD, pages 365--378, 2008.
[7]
K. Bache and M. Lichman. UCI machine learning repository. https://archive.ics.uci.edu/ml, 2013.
[8]
T. Ban, R. Zhang, S. Pang, A. Sarrafzadeh, and D. Inoue. Referential knn regression for financial time series forecasting. In ICONIP, pages 601--608, 2013.
[9]
R. Barillec, B. Ingram, D. Cornford, and L. Csató. Projected sequential gaussian processes: A ctool for interpolation of large datasets with heterogeneous noise. Computers & Geosciences, 37(3):295--309, 2011.
[10]
D. Berndt and J. Clifford. Using dynamic time warping to find patterns in time series. In AAAI94 workshop on knowledge discovery in databases, pages 359--370, 1994.
[11]
G. Biau, K. Bleakley, L. Györfi, and G. Ottucsák. Non-parametric sequential prediction of time series. Journal of Nonparametric Statistics, 22(3):297--317, 2010.
[12]
E. Blanzieri and F. Melgani. Nearest neighbor classification of remote sensing images with the maximal margin principle. IEEE Transactions on Geoscience and Remote Sensing, 46(6):1804--1811, 2008.
[13]
T. Bollerslev. Generalized autoregressive conditional heteroskedasticity. Journal of econometrics, 31(3):307--327, 1986.
[14]
L. Bottou. On-line learning and stochastic approximations. In On-line learning in neural networks, pages 9--42. Cambridge University Press, 1999.
[15]
G. E. Box, G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. Wiley, 4th edition, 2008.
[16]
P. Boyle and M. Frean. Dependent gaussian processes. In NIPS, pages 217--224, 2004.
[17]
S. Brahim-Belhouari and A. Bermak. Gaussian process for nonstationary time series prediction. Computational Statistics & Data Analysis, 47(4):705--712, 2004.
[18]
D. Chakrabarti and C. Faloutsos. F4: large-scale automated forecasting using fractals. In CIKM, pages 2--9, 2002.
[19]
C.-C. Chang and C.-J. Lin. Libsvm: a library for support vector machines. ACM TIST, 2(3):27, 2011.
[20]
N. Chapados and Y. Bengio. Augmented functional time series representation and forecasting with gaussian processes. In NIPS, pages 457--464, 2007.
[21]
L. Chen and R. Ng. On the marriage of lp-norms and edit distance. In VLDB, pages 792--803, 2004.
[22]
L. Chen, M. T. Özsu, and V. Oria. Robust and fast similarity search for moving object trajectories. In SIGMOD, pages 491--502, 2005.
[23]
T. Chen and J. Ren. Bagging for gaussian process regression. Neurocomputing, 72(7):1605--1610, 2009.
[24]
Y. Chen, M. Nascimento, B. Ooi, and A. Tung. Spade: On shape-based pattern detection in streaming time series. In ICDE, pages 786--795, 2007.
[25]
L. Csató and M. Opper. Sparse on-line gaussian processes. Neural computation, 14(3):641--668, 2002.
[26]
C. Cuda. Programming guide. NVIDIA Corporation, 2014.
[27]
M. Cuturi. Fast global alignment kernels. In ICML, pages 929--936, 2011.
[28]
M. Cuturi. Pems-sf data set. https://archive.ics.uci.edu/ml/datasets/PEMS-SF, 2014.
[29]
DataMarket. Internet traffic data. http://data.is/19Cbyed, 2014.
[30]
H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh. Querying and mining of time series data: experimental comparison of representations and distance measures. PVLDB, 1(2):1542--1552, 2008.
[31]
R. F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica, pages 987--1007, 1982.
[32]
C. Faloutsos and R. T. Snodgrass. Fast subsequence matching in time-series databases. In SIGMOD, pages 419--429, 1994.
[33]
R. Furrer, M. G. Genton, and D. Nychka. Covariance tapering for interpolation of large spatial datasets. Journal of Computational and Graphical Statistics, 15(3), 2006.
[34]
A. Girard, C. E. Rasmussen, J. Quinonero-Candela, and R. Murray-Smith. Gaussian process priors with uncertain inputs-application to multiple-step ahead time series forecasting. In NIPS, pages 545--552, 2002.
[35]
T. Hachino and V. Kadirkamanathan. Time series forecasting using multiple gaussian process prior model. In CIDM, pages 604--609, 2007.
[36]
W.-S. Han, J. Lee, Y.-S. Moon, S.-W. Hwang, and H. Yu. A new approach for processing ranked subsequence matching based on ranked union. In SIGMOD, pages 457--468, 2011.
[37]
W.-S. Han, J. Lee, Y.-S. Moon, and H. Jiang. Ranked subsequence matching in time-series databases. In VLDB, pages 423--434, 2007.
[38]
C. C. Holt. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting, 20(1):5--10, 2004.
[39]
R. J. Hyndman and Y. Khandakar. Automatic time series forecasting: The forecast package for r. Journal of Statistical Software, 27(i03), 2008.
[40]
J.-T. Jeng, C.-C. Chuang, and C.-W. Tao. Hybrid svmr-gpr for modeling of chaotic time series systems with noise and outliers. Neurocomputing, 73(10):1686--1693, 2010.
[41]
E. Keogh. Exact indexing of dynamic time warping. In VLDB, pages 406--417, 2002.
[42]
Y. Liu, A. Choudhary, J. Zhou, and A. Khokhar. A scalable distributed stream mining system for highway traffic data. In PKDD, pages 309--321. 2006.
[43]
K. H. Low, J. Yu, J. Chen, and P. Jaillet. Parallel gaussian process regression for big data: low-rank representation meets markov approximation. In AAAI, 2015.
[44]
J. McNames. A nearest trajectory strategy for time series prediction. In Proceedings of the International Workshop on Advanced Black-Box Techniques for Nonlinear Modeling, pages 112--128, 1998.
[45]
Y.-S. Moon, K.-Y. Whang, and W.-K. Loh. Duality-based subsequence matching in time-series databases. In ICDE, pages 263--272, 2001.
[46]
K.-R. Müller, A. J. Smola, G. Rätsch, B. Schölkopf, J. Kohlmorgen, and V. Vapnik. Predicting time series with support vector machines. In ICANN, pages 999--1004, 1997.
[47]
R. M. Neal. Bayesian Learning for Neural Networks. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1996.
[48]
D. Nguyen-Tuong, M. Seeger, and J. Peters. Model learning with local gaussian process regression. Advanced Robotics, 23(15):2015--2034, 2009.
[49]
A. O'Hagan and J. Kingman. Curve fitting and optimal design for prediction. Journal of the Royal Statistical Society. Series B (Methodological), pages 1--42, 1978.
[50]
C. J. Paciorek and M. J. Schervish. Nonstationary covariance functions for gaussian process regression. In NIPS, pages 273--280, 2003.
[51]
C. Park, J. Z. Huang, and Y. Ding. Domain decomposition approach for fast gaussian process regression of large spatial data sets. JMLR, 12:1697--1728, 2011.
[52]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in python. JMLR, 12:2825--2830, 2011.
[53]
PeMS. Freeway performance measurement system. http://pems.dot.ca.gov/, 2014.
[54]
T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Searching and mining trillions of time series subsequences under dynamic time warping. In KDD, pages 262--270, 2012.
[55]
C. E. Rasmussen. Evaluation of Gaussian Process and Other Methods for Non-linear Regression. PhD thesis, University of Toronto, 1996.
[56]
C. E. Rasmussen and C. K. Williams. Gaussian processes for machine learning. MIT Press, 2006.
[57]
C. A. Ratanamahatana and E. Keogh. Three myths about dynamic time warping data mining. In SDM, pages 506--510, 2005.
[58]
G. Ristanoski, W. Liu, and J. Bailey. Time series forecasting using distribution enhanced linear regression. In PAKDD, pages 484--495. 2013.
[59]
P. J. Rousseeuw and A. M. Leroy. Robust regression and outlier detection, volume 589. John Wiley & Sons, 2005.
[60]
D. Sart, A. Mueen, W. Najjar, E. Keogh, and V. Niennattrakul. Accelerating dynamic time warping subsequence search with gpus and fpgas. In ICDM, pages 1001--1006, 2010.
[61]
N. Segata and E. Blanzieri. Fast and scalable local kernel machines. JMLR, 11:1883--1926, 2010.
[62]
SheffieldML. Gpy gaussian process toolkit. http://sheffieldml.github.io/GPy/, 2014.
[63]
L. T. A. Singapore. Carpark lots availability. https://www.mytransport.sg/content/mytransport/home/dataMall.html, 2014.
[64]
S. Sundararajan and S. S. Keerthi. Predictive approaches for choosing hyperparameters in gaussian processes. Neural Computation, 13(5):1103--1118, 2001.
[65]
M. K. Titsias. Variational learning of inducing variables in sparse gaussian processes. In AISTATS, pages 567--574, 2009.
[66]
M. Vlachos, G. Kollios, and D. Gunopulos. Discovering similar multidimensional trajectories. In ICDE, pages 673--684, 2002.
[67]
Y. Wang and B. Chaib-Draa. A knn based kalman filter gaussian process regression. In IJCAI, pages 1771--1777, 2013.
[68]
P. J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339--356, 1988.
[69]
C. Williams and M. Seeger. Using the nyström method to speed up kernel machines. In NIPS, pages 682--688, 2001.
[70]
C. K. Williams and C. E. Rasmussen. Gaussian processes for regression. In NIPS, pages 514--520, 1996.
[71]
P. R. Winters. Forecasting sales by exponentially weighted moving averages. Management Science, 6(3):324--342, 1960.
[72]
W. Yan, H. Qiu, and Y. Xue. Gaussian process for long-term time-series forecasting. In IJCNN, pages 3420--3427, 2009.
[73]
A. Zakai and Y. Ritov. Consistency and localizability. JMLR, 10:827--856, 2009.
[74]
H. Zhang, A. C. Berg, M. Maire, and J. Malik. Svm-knn: Discriminative nearest neighbor classification for visual category recognition. In CVPR, pages 2126--2136, 2006.
[75]
T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In ICML, pages 116--123, 2004.
[76]
J. Zhou, A. K. Tung, W. Wu, and W. S. Ng. R2-d2: a system to support probabilistic path prediction in dynamic environments via 'semi-lazy" learning. PVLDB, 6(12):1366--1369, 2013.
[77]
J. Zhou, A. K. Tung, W. Wu, and W. S. Ng. A "semi-lazy" approach to probabilistic path prediction in dynamic environments. In KDD, pages 748--756, 2013.
[78]
Y. Zhu and D. Shasha. Warping indexes with envelope transforms for query by humming. In SIGMOD, pages 181--192, 2003.

Cited By

View all
  • (2024)Generating Daily Activities with Need DynamicsACM Transactions on Intelligent Systems and Technology10.1145/363749315:2(1-28)Online publication date: 22-Feb-2024
  • (2023)Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial TrainingProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599492(1417-1428)Online publication date: 6-Aug-2023
  • (2023)Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic FrameworkIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.313076235:4(3870-3884)Online publication date: 1-Apr-2023
  • Show More Cited By

Index Terms

  1. SMiLer: A Semi-Lazy Time Series Prediction System for Sensors

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
    May 2015
    2110 pages
    ISBN:9781450327589
    DOI:10.1145/2723372
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2015

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. dtw
    2. gaussian process
    3. gpu
    4. predictive analysis
    5. semi-lazy learning
    6. sensors
    7. time series

    Qualifiers

    • Research-article

    Funding Sources

    • Singapore NRF under its IRC@SG Funding Initiative and administered by the IDMPO
    • National University of Singapore

    Conference

    SIGMOD/PODS'15
    Sponsor:
    SIGMOD/PODS'15: International Conference on Management of Data
    May 31 - June 4, 2015
    Victoria, Melbourne, Australia

    Acceptance Rates

    SIGMOD '15 Paper Acceptance Rate 106 of 415 submissions, 26%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)24
    • Downloads (Last 6 weeks)5
    Reflects downloads up to 12 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)Generating Daily Activities with Need DynamicsACM Transactions on Intelligent Systems and Technology10.1145/363749315:2(1-28)Online publication date: 22-Feb-2024
    • (2023)Robust Spatiotemporal Traffic Forecasting with Reinforced Dynamic Adversarial TrainingProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining10.1145/3580305.3599492(1417-1428)Online publication date: 6-Aug-2023
    • (2023)Exploring the Generalizability of Spatio-Temporal Traffic Prediction: Meta-Modeling and an Analytic FrameworkIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2021.313076235:4(3870-3884)Online publication date: 1-Apr-2023
    • (2023)A Contextual Master-Slave Framework on Urban Region Graph for Urban Village Detection2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00062(736-748)Online publication date: Apr-2023
    • (2023)Traffic Flow Prediction Based on Two-Channel Multi-Modal Fusion of MCB and AttentionIEEE Access10.1109/ACCESS.2023.328006811(58745-58753)Online publication date: 2023
    • (2023)Prediction in Smart Environments and Administration: Systematic Literature ReviewAdvanced Information Networking and Applications10.1007/978-3-031-28694-0_4(36-47)Online publication date: 15-Mar-2023
    • (2022)Time Series Based Data Explorer and Stream Analysis for Anomaly PredictionWireless Communications & Mobile Computing10.1155/2022/58859042022Online publication date: 1-Jan-2022
    • (2022)Semi-Supervised City-Wide Parking Availability Prediction via Hierarchical Recurrent Graph Neural NetworkIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.303414034:8(3984-3996)Online publication date: 1-Aug-2022
    • (2022)G-PICS: A Framework for GPU-Based Spatial Indexing and Query ProcessingIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.299244034:3(1243-1257)Online publication date: 1-Mar-2022
    • (2022)Online Spatio-Temporal Crowd Flow Distribution Prediction for Complex Metro SystemIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2020.298595234:2(865-880)Online publication date: 1-Feb-2022
    • Show More Cited By

    View Options

    Get Access

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media