Abstract
A growing number of emerging applications monitor multiple data streams concurrently, with data flowing continuously from many sources at once. In such large-scale, real-time monitoring applications, continuously identifying representatives among massive streams is an important task: the representatives capture key trends and so support online monitoring and analysis. In this paper, we present a framework for continuously extracting representatives from massive streams. The framework identifies and tracks representatives using the core clustering technique. We adapt the core clustering model to the streaming setting and propose a method for extracting representatives that exploits a key property of core clusters, namely that the core set is tight. To identify representatives continuously and efficiently, we run the online representative adjustment process only when significant clustering evolution occurs. Our experimental studies show that the algorithm is both effective and efficient.
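The abstract does not give the paper's formulas, so the following is only a hypothetical Python sketch of the general idea of lazily maintained representatives; it does not implement the core clustering model. Assumptions: clusters of streams arrive as input from some clustering step; the representative of a cluster is the member with the highest average Pearson correlation to its cluster-mates over a sliding window (a stand-in for the tight core set); and representatives are recomputed only when the fraction of streams whose cluster-mates changed exceeds a threshold. The names `pick_representative`, `membership_change`, `RepresentativeTracker`, and `threshold` are illustrative and do not come from the paper.

```python
import math
from typing import Dict, FrozenSet, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length windows; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def pick_representative(cluster: Set[str], windows: Dict[str, List[float]]) -> str:
    """Hypothetical rule: the member most correlated, on average, with its cluster-mates."""
    members = sorted(cluster)  # sorted so ties resolve deterministically
    if len(members) == 1:
        return members[0]

    def avg_corr(s: str) -> float:
        others = [t for t in members if t != s]
        return sum(pearson(windows[s], windows[t]) for t in others) / len(others)

    return max(members, key=avg_corr)


def membership_change(old: List[Set[str]], new: List[Set[str]]) -> float:
    """Fraction of streams whose set of cluster-mates differs between two clusterings."""
    def mates(clusters: List[Set[str]]) -> Dict[str, FrozenSet[str]]:
        return {s: frozenset(c) for c in clusters for s in c}

    a, b = mates(old), mates(new)
    streams = set(a) | set(b)
    changed = sum(1 for s in streams if a.get(s) != b.get(s))
    return changed / len(streams) if streams else 0.0


class RepresentativeTracker:
    """Recompute representatives only when clustering evolution exceeds a threshold."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold  # hypothetical tuning parameter
        self.clusters: List[Set[str]] = []  # clustering at the last recomputation
        self.representatives: List[str] = []
        self.recomputations = 0

    def update(self, clusters: List[Set[str]], windows: Dict[str, List[float]]) -> List[str]:
        # Compare against the clustering at the last recomputation, so small
        # drifts accumulate until together they cross the threshold.
        if not self.clusters or membership_change(self.clusters, clusters) > self.threshold:
            self.clusters = [set(c) for c in clusters]
            self.representatives = [pick_representative(c, windows) for c in self.clusters]
            self.recomputations += 1
        return self.representatives


if __name__ == "__main__":
    windows = {
        "s1": [1, 2, 3, 4], "s2": [1, 2, 3, 4.5], "s3": [1, 2, 3, 10],
        "s4": [4, 3, 2, 1], "s5": [8, 6, 4, 2],
    }
    tracker = RepresentativeTracker(threshold=0.2)
    clusters = [{"s1", "s2", "s3"}, {"s4", "s5"}]
    print(tracker.update(clusters, windows))  # ['s2', 's4']
    tracker.update(clusters, windows)  # no evolution, so no recomputation
    print(tracker.recomputations)  # 1
```

Comparing against the clustering at the last recomputation, rather than the previous step, keeps slow drift from going unnoticed; a real system would also need to define how clusters themselves are formed, which this sketch leaves out.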
This work is supported by the National Natural Science Foundation of China under Grant No.61103025.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, Q., Ma, X., Tang, S., Xie, S. (2011). Continuously Identifying Representatives Out of Massive Streams. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_18
DOI: https://doi.org/10.1007/978-3-642-25853-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)