Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach

Published: 01 November 2004 Publication History

Abstract

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.

References

[1]
R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 1994 Int'l Conf. Very Large Data Bases (VLDB '94), pp. 487-499, Sept. 1994.]]
[2]
R. Agrawal and R. Srikant, “Mining Sequential Patterns,” Proc. 1995 Int'l Conf. Data Eng. (ICDE '95), pp. 3-14, Mar. 1995.]]
[3]
R.J. Bayardo, “Efficiently Mining Long Patterns from Databases,” Proc. 1998 ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 85-93, June 1998.]]
[4]
R.J. Bayardo R. Agrawal and D. Gunopulos, “Constraint-Based Rule Mining on Large, Dense Data Sets,” Proc. 1999 Int'l Conf. Data Eng. (ICDE '99), pp. 188-197, Apr. 1999.]]
[5]
C. Bettini X.S. Wang and S. Jajodia, “Mining Temporal Relationships with Multiple Granularities in Time Sequences,” Data Eng. Bull., vol. 21, pp. 32-38, 1998.]]
[6]
S. Guha R. Rastogi and K. Shim, “Rock: A Robust Clustering Algorithm for Categorical Attributes,” Proc. 1999 Int'l Conf. Data Eng. (ICDE '99), pp. 512-521, Mar. 1999.]]
[7]
J. Han G. Dong and Y. Yin, “Efficient Mining of Partial Periodic Patterns in Time Series Database,” Proc. 1999 Int'l Conf. Data Eng. (ICDE '99), pp. 106-115, Apr. 1999.]]
[8]
J. Han J. Pei B. Mortazavi-Asl Q. Chen U. Dayal and M.-C. Hsu, “FreeSpan: Frequent Pattern-Projected Sequential Pattern Mining,” Proc. 2000 ACM SIGKDD Int'l Conf. Knowledge Discovery in Databases (KDD '00), pp. 355-359, Aug. 2000.]]
[9]
J. Han J. Pei and Y. Yin, “Mining Frequent Patterns without Candidate Generation,” Proc. 2000 ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '00), pp. 1-12, May 2000.]]
[10]
R. Kohavi C. Brodley B. Frasca L. Mason and Z. Zheng, “KDD-Cup 2000 Organizers' Report: Peeling the Onion,” Proc. SIGKDD Explorations, vol. 2, pp. 86-98, 2000.]]
[11]
M. Kuramochi and G. Karypis, “Frequent Subgraph Discovery,” Proc. 2001 Int'l Conf. Data Mining (ICDM '01), pp. 313-320, Nov. 2001.]]
[12]
H. Lu J. Han and L. Feng, “Stock Movement and n-Dimensional Inter-Transaction Association Rules,” Proc. 1998 SIGMOD Workshop Research Issues on Data Mining and Knowledge Discovery (DMKD '98), vol. 12, pp. 1-7, June 1998.]]
[13]
H. Mannila H Toivonen and A.I. Verkamo, “Discovery of Frequent Episodes in Event Sequences,” Data Mining and Knowledge Discovery, vol. 1, pp. 259-289, 1997.]]
[14]
F. Masseglia F. Cathala and P. Poncelet, “The PSP Approach for Mining Sequential Patterns,” Proc. 1998 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD '98), pp. 176-184, Sept. 1998.]]
[15]
R. Ng L.V.S. Lakshmanan J. Han and A. Pang, “Exploratory Mining and Pruning Optimizations of Constrained Associations Rules,” Proc. 1998 ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '98), pp. 13-24, June 1998.]]
[16]
B. Özden S. Ramaswamy and A. Silberschatz, “Cyclic Association Rules,” Proc. 1998 Int'l Conf. Data Eng. (ICDE '98), pp. 412-421, Feb. 1998.]]
[17]
N. Pasquier Y. Bastide R. Taouil and L. Lakhal, “Discovering Frequent Closed Itemsets for Association Rules,” Proc. Seventh Int'l Conf. Database Theory (ICDT '99), pp. 398-416, Jan. 1999.]]
[18]
J. Pei J. Han and L.V.S. Lakshmanan, “Mining Frequent Itemsets with Convertible Constraints,” Proc. 2001 Int'l Conf. Data Eng. (ICDE '01), pp. 433-332, Apr. 2001.]]
[19]
J. Pei J. Han B. Mortazavi-Asl H. Pinto Q. Chen U. Dayal and M.-C. Hsu, “PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth,” Proc. 2001 Int'l Conf. Data Eng. (ICDE '01), pp. 215-224, Apr. 2001.]]
[20]
J. Pei J. Han and W. Wang, “Constraint-Based Sequential Pattern Mining in Large Databases,” Proc. 2002 Int'l Conf. Information and Knowledge Management (CIKM '02), pp. 18-25, Nov. 2002.]]
[21]
H. Pinto J. Han J. Pei K. Wang Q. Chen and U. Dayal, “Multi-Dimensional Sequential Pattern Mining,” Proc. 2001 Int'l Conf. Information and Knowledge Management (CIKM '01), pp. 81-88, Nov. 2001.]]
[22]
S. Ramaswamy S. Mahajan and A. Silberschatz, “On the Discovery of Interesting Patterns in Association Rules,” Proc. 1998 Int'l Conf. Very Large Data Bases (VLDB '98), pp. 368-379, Aug. 1998.]]
[23]
R. Srikant and R. Agrawal, “Mining Sequential Patterns: Generalizations and Performance Improvements,” Proc. Fifth Int'l Conf. Extending Database Technology (EDBT '96), pp. 3-17, Mar. 1996.]]
[24]
J. Wang G. Chirn T. Marr B. Shapiro D. Shasha and K. Zhang, “Combinatiorial Pattern Discovery for Scientific Data: Some Preliminary Results,” Proc. 1994 ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '94), pp. 115-125, May 1994.]]
[25]
X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining,” Proc. 2002 Int'l Conf. Data Mining (ICDM '02), pp. 721-724, Dec. 2002.]]
[26]
X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns,” Proc. 2003 ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD '03), Aug. 2003.]]
[27]
X. Yan J. Han and R. Afshar, “CloSpan: Mining Closed Sequential Patterns in Large Datasets,” Proc. 2003 SIAM Int'l Conf. Data Mining (SDM '03), pp. 166-177, May 2003.]]
[28]
J. Yang W. Wang P.S. Yu and J. Han, “Mining Long Sequential Patterns in a Noisy Environment,” Proc. 2002 ACM-SIGMOD Int'l Conf. Management of Data (SIGMOD '02), pp. 406-417, June 2002.]]
[29]
M. Zaki, “SPADE: An Efficient Algorithm for Mining Frequent Sequences,” Machine Learning, vol. 40, pp. 31-60, 2001.]]
[30]
M.J. Zaki, “Efficient Enumeration of Frequent Sequences,” Proc. Seventh Int'l Conf. Information and Knowledge Management (CIKM '98), pp. 68-75, Nov. 1998.]]
[31]
M.J. Zaki, “Efficiently Mining Frequent Trees in a Forest,” Proc. 2002 ACM SIGKDD Int'l Conf. Knowledge Discovery in Databases (KDD '02), pp. 71-80, July 2002.]]

Cited By

View all
  • (2025)High-utility sequential pattern mining in incremental databaseThe Journal of Supercomputing10.1007/s11227-024-06568-x81:1Online publication date: 1-Jan-2025
  • (2024)Cut to the Chase: An Error-Oriented Approach to Detect Error-Handling BugsProceedings of the ACM on Software Engineering10.1145/36607871:FSE(1796-1818)Online publication date: 12-Jul-2024
  • (2024)Mouse Dynamics Behavioral Biometrics: A SurveyACM Computing Surveys10.1145/364031156:6(1-33)Online publication date: 24-Jan-2024
  • Show More Cited By

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image IEEE Transactions on Knowledge and Data Engineering
IEEE Transactions on Knowledge and Data Engineering  Volume 16, Issue 11
November 2004
143 pages

Publisher

IEEE Educational Activities Department

United States

Publication History

Published: 01 November 2004

Author Tags

  1. 65
  2. Index Terms- Data mining algorithm
  3. frequent pattern
  4. performance analysis.
  5. scalability
  6. sequence database
  7. sequential pattern
  8. transaction database

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 19 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2025)High-utility sequential pattern mining in incremental databaseThe Journal of Supercomputing10.1007/s11227-024-06568-x81:1Online publication date: 1-Jan-2025
  • (2024)Cut to the Chase: An Error-Oriented Approach to Detect Error-Handling BugsProceedings of the ACM on Software Engineering10.1145/36607871:FSE(1796-1818)Online publication date: 12-Jul-2024
  • (2024)Mouse Dynamics Behavioral Biometrics: A SurveyACM Computing Surveys10.1145/364031156:6(1-33)Online publication date: 24-Jan-2024
  • (2024)Totally-ordered Sequential Rules for Utility MaximizationACM Transactions on Knowledge Discovery from Data10.1145/362845018:4(1-23)Online publication date: 12-Feb-2024
  • (2024)RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical RulesProceedings of the ACM Web Conference 202410.1145/3589334.3645602(4272-4282)Online publication date: 13-May-2024
  • (2024)RNP-Miner: Repetitive Nonoverlapping Sequential Pattern MiningIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.333430036:9(4874-4889)Online publication date: 1-Sep-2024
  • (2024)Mining fuzzy local periodic activity pattern for Smart home applicationsKnowledge-Based Systems10.1016/j.knosys.2024.111629293:COnline publication date: 7-Jun-2024
  • (2024)Towards utility-driven contiguous sequential patterns in uncertain multi-sequencesKnowledge-Based Systems10.1016/j.knosys.2023.111314284:COnline publication date: 25-Jan-2024
  • (2024)Scholar's Career Switch from Academia to IndustryBig Data Research10.1016/j.bdr.2024.10044136:COnline publication date: 28-May-2024
  • (2024)A three-stage quality evaluation method for experience products: taking animation as an exampleMultimedia Systems10.1007/s00530-024-01401-030:4Online publication date: 8-Jul-2024
  • Show More Cited By

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media