Nothing Special   »   [go: up one dir, main page]

skip to main content
article

Issues in data stream management

Published: 01 June 2003 Publication History

Abstract

Traditional databases store sets of relatively static records with no pre-defined notion of time, unless timestamp attributes are explicitly added. While this model adequately represents commercial catalogues or repositories of personal information, many current and emerging applications require support for on-line analysis of rapidly changing data streams. Limitations of traditional DBMSs in supporting streaming applications have been recognized, prompting research to augment existing technologies and build new systems to manage streaming data. The purpose of this paper is to review recent work in data stream management systems, with an emphasis on application requirements, data models, continuous query languages, and query evaluation.

References

[1]
N. Alon, Y. Matias, M. Szegedy. The Space Complexity of Approximating the Frequency Moments. In Proc. ACM Symp. on Theory of Computing, 1996, pp. 20--29.]]
[2]
A. Arasu, B. Babcock, S. Babu, J. McAlister, J. Widom. Characterizing Memory Requirements for Queries over Continuous Data Streams. In Proc. ACM Symp. on Principles of Database Systems, 2002, pp. 221--232.]]
[3]
A. Arasu, S. Babu, J. Widom. An Abstract Semantics and Concrete Language for Continuous Queries over Streams and Relations. Technical Report, Nov. 2002. dbpubs.stanford.edu:8090/pub/2002-57.]]
[4]
R. Avnur, J. Hellerstein. Eddies: Continuously Adaptive Query Processing. In Proc. ACM Int. Conf. on Management of Data, 2000, pp. 261--272.]]
[5]
B. Babcock, S. Babu, M. Datar, R. Motwani. Chain: Operator Scheduling for Memory Minimization in Data Stream Systems. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]]
[6]
B. Babcock, S. Babu, M. Datar, R. Motwani, J. Widom. Models and Issues in Data Streams. In Proc. ACM Symp. on Principles of Database Systems, 2002, pp. 1--16.]]
[7]
B. Babcock, M. Datar, R. Motwani. Sampling from a Moving Window over Streaming Data. In Proc. SIAM-ACM Symp. on Discrete Algorithms, 2002, pp. 633--634.]]
[8]
B. Babcock, M. Datar, R. Motwani, L. O'Callaghan. Maintaining Variance and k-Medians over Data Stream Windows. To appear in Proc. ACM Symp. on Principles of Database Systems, June 2003.]]
[9]
S. Babu, J. Widom. Exploiting k-Constraints to Reduce Memory Overhead in Continuous Queries over Data Streams. Technical Report, Nov. 2002. dbpubs.stanford.edu:8090/pub/2002-52.]]
[10]
P. Bonnet, J. Gehrke, P. Seshadri. Towards Sensor Database Systems. In Proc. Int. Conf. on Mobile Data Management, 2001, pages 3--14.]]
[11]
D. Carney, U. Cetinternel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, S. Zdonik. Monitoring streams---A New Class of Data Management Applications. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 215--226.]]
[12]
S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. Madden, V. Raman, F. Reiss, M. Shah. TelegraphCQ: Continuous Data flow Processing for an Uncertain World. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 269--280.]]
[13]
S. Chandrasekaran, M. J. Franklin. Streaming Queries over Streaming Data. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 203--214.]]
[14]
S. Chandrasekaran, S. Krishnamurthy, S. Madden, A. Deshpande, M. J. Franklin, J. M. Hellerstein, M. Shah. Windows Explained, Windows Expressed. 2003. www.cs.berkeley.edu/~sirish/research/streaquel.pdf.]]
[15]
M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. In Proc. Int. Colloquium on Automata, Languages and Programming, 2002, pp. 693--703.]]
[16]
M. Charikar, L. O'Callaghan, R. Panigrahy. Better Streaming Algorithms for Clustering Problems. To appear in Proc. ACM Symp. on Theory of Computing, June 2003.]]
[17]
J. Chen, D. DeWitt, F. Tian, Y. Wang. NiagaraCQ: A Scalable Continuous Query System for Internet Databases. In Proc. ACM Int. Conf. on Management of Data, 2000, pp. 379--390.]]
[18]
Y. Chen, G. Dong, J. Han, B. W. Wah, J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 323--334.]]
[19]
M. Cherniack, H. Balakrishnan, M. Balazinska, D. Carney, U. Cetintemel, Y. Xing, S. Zdonik. Scalable Distributed Stream Processing. In Proc. Conf. on Innovative Data Syst. Res, 2003.]]
[20]
G. Cormode, M. Datar, P. Indyk, S. Muthukrishnan. Comparing Data Streams Using Hamming Norms (How to Zero In). In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 335--345.]]
[21]
C. Cortes, K. Fisher, D. Pregibon, A. Rogers, F. Smith. Hancock: A Language for Extracting Signatures from Data Streams. In Proc. ACM Int. Conf. on Knowledge Discovery and Data Mining, 2000, pp. 9--17.]]
[22]
C. Cranor, Y. Gao, T. Johnson, V. Shkapenyuk, O. Spatscheck. GigaScope: High Performance Network Monitoring with an SQL Interface. In Proc. ACM Int. Conf. on Management of Data, 2002, p. 623.]]
[23]
M. Datar, A. Gionis, P. Indyk, R. Motwani. Maintaining Stream Statistics over Sliding Windows. In Proc. SIAM-ACM Symp. on Discrete Algorithms, 2002, pp. 635--644]]
[24]
D. DeHaan, E. D. Demaine, L. Golab, A. Lopez-Ortiz, J. I. Munro. Towards Identifying Frequent Items in Sliding Windows. Technical Report, March 2003. db.uwaterloo.ca/~lgolab/frequent.pdf.]]
[25]
E. Demaine, A. Lopez-Ortiz, J. I. Munro. Frequency Estimation of Internet Packet Streams with Limited Space. In Proc. European Symp. on Algorithms, 2002, pp. 348--360.]]
[26]
A. Dobra, M. Garofalakis, J. Gehrke, R. Rastogi. Processing Complex Aggregate Queries over Data Streams. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 61--72.]]
[27]
C. Estan, G. Varghese. New Directions in Traffic Measurement and Accounting. In Proc. ACM SIGCOMM Internet Measurement Workshop, 2001, pp. 75--80.]]
[28]
C. Faloutsos. Sensor Data Mining: Similarity Search and Pattern Analysis. Tutorial in Proc. Int. Conf. on Very Large Data Bases, 2002.]]
[29]
J. Feigenbaum, S. Kannan, M. Strauss, M. Viswanathan. An Approximate L1-Difference Algorithm for Massive Data Streams. In Proc. Symp. on Foundations of Computer Science, 1999. pp. 501--511.]]
[30]
P. Flajolet, G. N. Martin. Probabilistic Counting. In Proc. Symp. on Foundations of Computer Science, 1983, pp. 76--82, 1983.]]
[31]
M. Garofalakis, J. Gehrke, R. Rastogi. Querying and Mining Data Streams: You Only Get One Look. Tutorial in ACM Int. Conf. on Management of Data, 2002.]]
[32]
M. Garofalakis, P. Gibbons. Wavelet Synopses with Error Guarantees. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 476--487.]]
[33]
J. Gehrke, F. Korn, D. Srivastava. On Computing Correlated Aggregates Over Continual Data Streams. In Proc. ACM Int. Conf. on Management of Data, 2001, pp. 13--24.]]
[34]
P. Gibbons, S. Tirthapura. Estimating Simple Functions on the Union of Data Streams. In Proc. ACM Symp. on Parallel Algorithms an Architectures, 2001, pp. 281--291.]]
[35]
P. Gibbons, S. Tirthapura. Distributed Streams Algorithms for Sliding Windows. In Proc. ACM Symp. on Parallel Algorithms and Architectures, 2002, pp. 63--72.]]
[36]
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. J. Strauss. QuickSAND: Quick Summary and Analysis of Network Data. Technical Report, Dec. 2001. citeseer.nj.nec.com/gilbert01quicksand.html]]
[37]
A. C. Gilbert, Y. Kotidis, S. Muthukrishnan, M. J. Strauss. Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries. In Proc. Int. Conf. on Very Large Data Bases, 2001, pp. 79--88.]]
[38]
L. Golab, M. T. Özsu. Processing Sliding Window Multi-Joins in Continuous Queries over Data Streams. Technical Report, Feb. 2003. db.uwaterloo.ca/~ddbms/publications/stream/multijoins.pdf.]]
[39]
L. Golab, M. T. Özsu. Data Stream Management Issues --- A Survey. Technical Report, Apr. 2003. db.uwaterloo.ca/~ddbms/publications/stream/streamsurvey.pdf.]]
[40]
J. Greenwald, F. Khanna. Space Efficient On-Line Computation of Quantile Summaries. In Proc. ACM Int. Conf. on Management of Data, 2001, pp. 58--66.]]
[41]
S. Guha, P. Indyk, S. Muthukrishnan, M. Strauss. Histogramming Data Streams with Fast Per-Item Processing. In Proc. Int. Colloquium on Automata, Languages and Programming, 2002, pp. 681--692.]]
[42]
S. Guha, N. Mishra, R. Motwani, L. O'Callaghan. Clustering Data Streams. In Proc. IEEE Symp. on Foundations of Computer Science, pp. 359--366.]]
[43]
M. A. Hammad, M. J. Franklin, W. G. Aref, A. K. Elmagarmid. Scheduling for shared window joins over data streams. Submitted for publication, Feb. 2003.]]
[44]
G. Hulten, L. Spencer, P. Domingos. Mining Time-Changing Data Streams. In Proc. ACM Int. Conf. on Knowledge Discovery and Data Mining, 2001, pp. 97--106.]]
[45]
J. Kang, J. Naughton, S. Viglas. Evaluating Window Joins over Unbounded Streams. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]
[46]
F. Korn, S. Muthukrishnan, D. Srivastava. Reverse Nearest Neighbor Aggregates over Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 814--825.]]
[47]
A. Lerner, D. Shasha. AQuery: Query Language for Ordered Data, Optimization Techniques, and Experiments. Technical Report, March 2003. csdocs.cs.nyu.edu/Dienst/Repository/2.0/Body/ncstrl.nyu_cs%2fTR2003-836/pdf.]]
[48]
L. Liu, C. Pu, W. Tang. Continual Queries for Internet-Scale Event-Driven Information Delivery. In IEEE Trans. Knowledge and Data Eng., 11(4): 610--628, 1999.]]
[49]
S. Madden, M. J. Franklin. Fjording the Stream: An Architecture for Queries Over Streaming Sensor Data. In Proc. Int. Conf. on Data Engineering, 2002, pp. 555--566.]]
[50]
S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong. TAG: a Tiny AGgregation Service for Ad-Hoc Sensor Networks. In Proc. Symp. on Operating Systems Design and Implementation, 2002.]]
[51]
S. Madden, M. J. Franklin, J. M. Hellerstein, W. Hong. The Design of an Acquisitional Query Processor For Sensor Networks. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]]
[52]
S. Madden, M. Shah, J. Hellerstein, V. Raman. Continuously Adaptive Continuous Queries Over Streams. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 49--60.]]
[53]
S. Madden, R. Szewczyk, M. J. Franklin, D. Culler. Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks. In Proc. IEEE Workshop on Mobile Computing Systems and Applications, 2002, pp. 49--58.]]
[54]
G. S. Manku, R. Motwani. Approximate Frequency Counts over Data Streams. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 346--357.]]
[55]
G.S. Manku, S. Rajagopalan, B.G. Lindsay. Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets. In Proc. ACM Int. Conf. on Management of Data, 1999, pp. 251--262.]]
[56]
R. Motwani, J. Widom, A. Arasu, B. Babcock, S. Babu, M. Datar, G. Manku, C. Olston, J. Rosen-stein, R. Varma. Query Processing, Approximation, and Resource Management in a Data Stream Management System. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 245--256.]]
[57]
C. Olston, J. Jiang, J. Widom. Adaptive Filters for Continuous Queries over Distributed Data Streams. To appear in Proc. ACM Int. Conf. on Management of Data, June 2003.]]
[58]
V. Raman, A. Deshpande, J. Hellerstein. Using State Modules for Adaptive Query Processing. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]
[59]
M. A. Shah, J. M. Hellerstein, S. Chandrasekaran, M. J. Franklin. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. To appear in Proc. Int. Conf. on Data Engineering, 2003.]]
[60]
Stream Query Repository, www-db.stanford.edu/stream/sqr.]]
[61]
M. Sullivan, A. Heybey. Tribeca: A System for Managing Large Databases of Network Trafic. In Proc. USENIX Annual Technical Conf., 1998.]]
[62]
N. Tatbul, U. Cetintemel, S. Zdonik, M. Cherniack, M. Stonebraker. Load Shedding in a Data Stream Manager. Technical Report, Feb. 2003. www.cs.brown.edu/~tatbul/papers tatbul_tr.pdf.]]
[63]
Traderbot, www.traderbot.com.]]
[64]
P. Tucker, D. Maier, T. Sheard, L. Fegaras. Enhancing relational operators for querying over punctuated data streams. 2002. www.cse.ogi.edu/dot/niagara/pstream/punctuating.pdf.]]
[65]
P. Tucker, T. Tufte, V. Papadimos, D. Maier. NEXMark---a Benchmark for Querying Data Streams. 2002. www.cse.ogi.edu/dot/niagara/pstream/nexmark.pdf.]]
[66]
T. Urhan, M. J. Franklin. XJoin: A Reactively-Scheduled Pipelined Join Operator. In IEEE Data Engineering Bulletin, 23(2):27--33, June 2000.]]
[67]
S. Viglas and J. Naughton. Rate-Based Query Optimization for Streaming Information Sources. In Proc. ACM Int. Conf. on Management of Data, 2002, pp. 37--48.]]
[68]
H. Wang, C. Zaniolo. ATLaS: A Native Extension of SQL for Data Mining and Stream Computations. citeseer.nj.nec.com/551711.html.]]
[69]
A. Wilschut, P. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. Int. Conf. Parallel and Distributed Information Systems, 1991, pp. 68--77.]]
[70]
Y. Yao and J. Gehrke. Query Processing for Sensor Networks. In Proc. Conf. on Innovative Data Syst. Res, 2003, pp. 233--244.]]
[71]
B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, A. Biliris. On-Line Data Mining for Co-Evolving Time Sequences. In Proc. Int. Conf. on Data Engineering, 2000, pp. 13--22.]]
[72]
Y. Zhu, D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In Proc. Int. Conf. on Very Large Data Bases, 2002, pp. 358--369.]]

Cited By

View all
  • (2025)Non-blocking functional dependency discovery from data streamsInformation Sciences10.1016/j.ins.2024.121531690(121531)Online publication date: Feb-2025
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)An Overview of Continuous Querying in (Modern) Data SystemsCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654679(605-612)Online publication date: 9-Jun-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 32, Issue 2
June 2003
87 pages
ISSN:0163-5808
DOI:10.1145/776985
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2003
Published in SIGMOD Volume 32, Issue 2

Check for updates

Qualifiers

  • Article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)151
  • Downloads (Last 6 weeks)10
Reflects downloads up to 12 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025)Non-blocking functional dependency discovery from data streamsInformation Sciences10.1016/j.ins.2024.121531690(121531)Online publication date: Feb-2025
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)An Overview of Continuous Querying in (Modern) Data SystemsCompanion of the 2024 International Conference on Management of Data10.1145/3626246.3654679(605-612)Online publication date: 9-Jun-2024
  • (2024)Advancing Incremental Few-Shot Video Action Recognition with Cluster Compression and Generative SeparationInternational Journal of Pattern Recognition and Artificial Intelligence10.1142/S0218001424550139Online publication date: 28-Oct-2024
  • (2024)Unsupervised context-sensitive anomaly detection on streaming data relying on multi-view profiling2024 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)10.1109/EAIS58494.2024.10569106(1-10)Online publication date: 23-May-2024
  • (2024)RRA-FFSCIL: Inter-intra classes representation and relationship augmentation federated few-shot incremental learningNeurocomputing10.1016/j.neucom.2024.127956597(127956)Online publication date: Sep-2024
  • (2024)Data Sources and ProcessingArtificial Intelligence-Driven Geographies10.1007/978-981-97-5116-7_3(71-117)Online publication date: 12-Sep-2024
  • (2024)Integrating Data Quality in Industrial Big Data Architectures: An Action Design Research StudySoftware Architecture10.1007/978-3-031-70797-1_1(3-19)Online publication date: 1-Sep-2024
  • (2023)Dynamic Load Balancing in Stream Processing Pipelines Containing Stream-Static JoinsElectronics10.3390/electronics1207161312:7(1613)Online publication date: 29-Mar-2023
  • (2023)Assess: anomaly sensitive state estimation with streaming systemsEnergy Informatics10.1186/s42162-023-00276-16:S1Online publication date: 19-Oct-2023
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media