Abstract
Streaming applications such as environment monitoring and vehicle location tracking must handle high volumes of continuously arriving data, and sudden fluctuations in those volumes, while efficiently supporting multi-dimensional historical queries. Traditional database management systems are ill-suited to this workload because continuously updating massive data streams requires an excessive number of disk I/Os. In this paper, we propose DCF (Data Stream Clustering Framework), a novel framework that supports efficient data stream archiving for streaming applications. DCF substantially reduces disk I/O in the storage system by grouping incoming data into clusters and storing the clusters instead of individual raw data elements. In addition, by controlling the cluster size, it can stably store all incoming raw data even when the amount of incoming data fluctuates temporarily. Our experimental results show that our approach significantly reduces the number of disk accesses for both inserting and retrieving data.
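The idea of archiving clusters rather than raw elements, and of enlarging clusters when the arrival rate spikes, can be illustrated with a minimal sketch. This is not the DCF algorithm itself, which the abstract does not specify. The class names, the `(x, y, timestamp)` point layout, the arrival-order grouping, the backlog threshold, and the in-memory list standing in for disk are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical point layout: (x, y, timestamp).
Point = Tuple[float, float, float]


@dataclass
class Cluster:
    points: List[Point]

    def bounding_box(self) -> Tuple[Point, Point]:
        # Smallest axis-aligned box that encloses every point in the cluster.
        xs, ys, ts = zip(*self.points)
        return (min(xs), min(ys), min(ts)), (max(xs), max(ys), max(ts))


class ClusterArchiver:
    """Toy archiver: one simulated disk write per cluster, not per element."""

    def __init__(self, base_size: int = 4, max_size: int = 64,
                 high_water: int = 100) -> None:
        self.base_size = base_size    # cluster size under normal load
        self.max_size = max_size      # cluster size during a burst
        self.high_water = high_water  # backlog above this counts as a burst
        self.buffer: List[Point] = []
        self.store: List[Cluster] = []  # stands in for the storage system
        self.writes = 0

    def _cluster_size(self) -> int:
        # Assumed policy: use larger clusters when the backlog is large,
        # so fewer writes are needed to absorb the burst.
        if len(self.buffer) > self.high_water:
            return self.max_size
        return self.base_size

    def _write(self, points: List[Point]) -> None:
        self.store.append(Cluster(points))
        self.writes += 1

    def ingest(self, batch: List[Point]) -> None:
        self.buffer.extend(batch)
        size = self._cluster_size()
        # Group points in arrival order; a real system would group by proximity.
        while len(self.buffer) >= size:
            chunk, self.buffer = self.buffer[:size], self.buffer[size:]
            self._write(chunk)

    def flush(self) -> None:
        # Write whatever is left so that no raw element is lost.
        if self.buffer:
            self._write(self.buffer)
            self.buffer = []

    def query(self, lo: Point, hi: Point) -> List[Point]:
        # Historical range query: skip clusters whose box misses the range.
        hits: List[Point] = []
        for cluster in self.store:
            c_lo, c_hi = cluster.bounding_box()
            if all(c_lo[d] <= hi[d] and c_hi[d] >= lo[d] for d in range(3)):
                hits.extend(p for p in cluster.points
                            if all(lo[d] <= p[d] <= hi[d] for d in range(3)))
        return hits
```

In this sketch, a burst of 200 points is absorbed with a handful of cluster writes instead of 200 element writes, and a range query reads only the clusters whose bounding boxes overlap the query range.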
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Cho, K., Jo, S., Jang, H., Kim, S.M., Song, J. (2006). DCF: An Efficient Data Stream Clustering Framework for Streaming Applications. In: Bressan, S., Küng, J., Wagner, R. (eds) Database and Expert Systems Applications. DEXA 2006. Lecture Notes in Computer Science, vol 4080. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11827405_12
DOI: https://doi.org/10.1007/11827405_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37871-6
Online ISBN: 978-3-540-37872-3
eBook Packages: Computer Science, Computer Science (R0)