Abstract
Current sequential pattern mining algorithms often produce a large number of patterns, which makes it difficult for a user to explore them and to get a global view of the patterns and the underlying data. In this paper, we examine the problem of how to compress a set of sequential patterns using only K SP-Features (Sequential Pattern Features). We propose a novel similarity measure for clustering SP-Features and design an effective SP-Feature combination method. We also present an efficient algorithm, called CSP (Compressing Sequential Patterns), to mine compressed sequential patterns based on the hierarchical clustering framework. A thorough experimental study with both real and synthetic datasets shows that CSP can compress sequential patterns effectively.
This work is supported by the National Natural Science Foundation of China under Grant No. 60473051.
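To make the idea of compressing a pattern set down to K groups more concrete, the following is a minimal sketch of bottom-up (agglomerative) clustering over sequential patterns. It is not the CSP algorithm: the abstract does not define the paper's similarity measure or its SP-Feature combination method, so this sketch assumes a stand-in similarity (normalized longest-common-subsequence length) and average-link merging. The names `lcs_length`, `similarity`, `cluster_similarity`, and `compress` are hypothetical, and patterns are assumed to be non-empty sequences of single items.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Stand-in similarity in [0, 1] (an assumption, not the paper's measure).

    Assumes both patterns are non-empty.
    """
    return lcs_length(a, b) / max(len(a), len(b))


def cluster_similarity(c1, c2):
    """Average-link similarity between two clusters of patterns."""
    total = sum(similarity(a, b) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def compress(patterns, k):
    """Merge the most similar pair of clusters until only k clusters remain."""
    clusters = [[p] for p in patterns]
    while len(clusters) > k:
        # combinations yields index pairs with i < j, so deleting j is safe.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_similarity(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    patterns = [
        ("a", "b", "c"),
        ("a", "b", "c", "d"),
        ("x", "y"),
        ("x", "y", "z"),
    ]
    for group in compress(patterns, 2):
        print(group)
```

With K = 2, the four example patterns fall into one group sharing the prefix a, b, c and another sharing x, y. Each merge rescans all cluster pairs, which keeps the sketch short but makes it unsuitable for large pattern sets.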
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chang, L., Yang, D., Tang, S., Wang, T. (2006). Mining Compressed Sequential Patterns. In: Li, X., Zaïane, O.R., Li, Z. (eds) Advanced Data Mining and Applications. ADMA 2006. Lecture Notes in Computer Science, vol 4093. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811305_83
DOI: https://doi.org/10.1007/11811305_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37025-3
Online ISBN: 978-3-540-37026-0
eBook Packages: Computer Science, Computer Science (R0)