research-article

Database-support for continuous prediction queries over streaming data

Published: 01 September 2010

Abstract

Prediction is emerging as an essential ingredient for real-time monitoring, planning, and decision-support applications such as intrusion detection, e-commerce pricing, and automated resource management. This paper presents a system that efficiently supports continuous prediction queries (CPQs) over streaming data using seamlessly integrated probabilistic models. Specifically, we describe how to execute and optimize CPQs using discrete (Dynamic) Bayesian Networks as the underlying predictive model. Our primary contribution is a novel cost-based optimization framework that employs materialization, sharing, and model-specific optimization techniques to enable highly efficient point- and range-based CPQ execution. Furthermore, we support efficient execution of top-k and threshold-based high-probability queries. We characterize the behavior of our system and demonstrate significant performance gains using a prototype implementation operating on real-world network intrusion data and deployed as part of a real-time software-performance monitoring system.
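To make the query types named in the abstract concrete, the following is a minimal sketch, not the system described in the paper. It assumes the simplest possible discrete dynamic model, a single state variable with a fixed transition table, which is a degenerate case of a Dynamic Bayesian Network. The state names, the probabilities, and every function name are hypothetical and chosen only for illustration. The range query reuses each horizon's result to compute the next one, which is a toy analogue of sharing intermediate results and should not be read as the paper's optimization framework.

```python
from typing import Dict, List, Tuple

# Hypothetical three-state model of a monitored host (invented numbers).
STATES: List[str] = ["normal", "suspicious", "attack"]

# TRANSITION[i][j] = P(state at t+1 is j | state at t is i); each row sums to 1.
TRANSITION: List[List[float]] = [
    [0.90, 0.08, 0.02],
    [0.30, 0.50, 0.20],
    [0.10, 0.20, 0.70],
]


def step(belief: List[float]) -> List[float]:
    """Advance a probability vector over STATES by one time slice."""
    n = len(belief)
    return [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]


def range_prediction(belief: List[float], first: int, last: int) -> Dict[int, List[float]]:
    """Range-based query: distributions for every horizon h with first <= h <= last.

    Assumes 1 <= first <= last. The distribution for horizon h is computed
    from the one for h - 1, so no horizon is recomputed from scratch.
    """
    results: Dict[int, List[float]] = {}
    current = list(belief)
    for h in range(1, last + 1):
        current = step(current)
        if h >= first:
            results[h] = current
    return results


def point_prediction(belief: List[float], horizon: int) -> List[float]:
    """Point-based query: the distribution exactly `horizon` slices ahead."""
    return range_prediction(belief, horizon, horizon)[horizon]


def top_k(dist: List[float], k: int) -> List[Tuple[str, float]]:
    """Top-k query: the k most probable states, most probable first."""
    ranked = sorted(zip(STATES, dist), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def above_threshold(dist: List[float], tau: float) -> List[Tuple[str, float]]:
    """Threshold query: all states whose probability is at least tau."""
    return [(s, p) for s, p in zip(STATES, dist) if p >= tau]
```

For example, starting from the belief `[0.0, 1.0, 0.0]` (the host is currently "suspicious"), `point_prediction(belief, 2)` gives the distribution two slices ahead, and `top_k` or `above_threshold` can then be applied to that distribution. In a continuous setting, such a query would be re-evaluated each time a new stream tuple updates the belief.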



Published In

Proceedings of the VLDB Endowment, Volume 3, Issue 1-2
September 2010
1658 pages

Publisher

VLDB Endowment
