Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/1031453.1031477acmconferencesArticle/Chapter ViewAbstractPublication PagescikmConference Proceedingsconference-collections
Article

FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs

Published: 12 November 2004 Publication History

Abstract

Mining frequent patterns is an important component of many prediction systems. One common usage in web applications is the mining of users' access behavior for the purpose of predicting and hence pre-fetching the web pages that the user is likely to visit.
In this paper we introduce an efficient strategy for discovering frequent patterns in sequence databases that requires only two scans of the database. The first scan obtains support counts for subsequences of length two. The second scan extracts potentially frequent sequences of any length and represents them as a compressed frequent sequences tree structure (FS-tree). Frequent sequence patterns are then mined from the FS-tree. Incremental and interactive mining functionalities are also facilitated by the FS-tree. As part of this work, we developed the FS-Miner, a system that discovers frequent sequences from web log files. The FS-Miner has the ability to adapt to changes in users' behavior over time, in the form of new input sequences, and to respond incrementally without the need to perform full re-computation. Our system also allows the user to change the input parameters (e.g., minimum support and desired pattern size) interactively without requiring full re-computation in most cases.
We have tested our system comparing it against two other algorithms from the literature. Our experimental results show that our system scales up linearly with the size of the input database. Furthermore, it exhibits excellent adaptability to support threshold decreases. We also show that the incremental update capability of the system provides significant performance advantages over full re-computation even for relatively large update sizes.

References

[1]
R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In VLDB, pages 487--499, 1994.
[2]
R. Cooley, J. Srivastava, and B. Mobasher. Web mining: Information and pattern discovery on the world wide web. In CTAI, pages 558--567, 1997.
[3]
M. H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 2003.
[4]
M. EL-Sayed, E. A. Rundensteiner, and C. Ruiz. FS-Miner: An Efficient and Incremental System to Mine Contiguous Frequent Sequences. Technical Report WPI-TR-03-20, Department of Computer Science. WPI, June 2003.
[5]
W. Gaul and L. Schmidt-Thieme. Mining web navigation path fragments. In WEBKDD, 2000.
[6]
M. Gery and H. Haddad. Evaluation of web usage mining approaches for user's next request prediction. In WIDM, pages 74--81, 2003.
[7]
J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD, pages 1--12, May 2000.
[8]
Hettich, S. and Bay, S. D. The UCI KDD Archive. Irvine, CA: University of California, Department of Information and Computer Science. http://kdd.ics.uci.edu, 1999.
[9]
S. Jespersen, T. B. Pedersen, and J. Thorhauge. Evaluating the Markov assumption for web usage mining. In WIDM, pages 82--89, 2003.
[10]
A. Nanopoulos, D. Katsaros, and Y. Manolopoulos. Effective prediction of web-user accesses: A data mining approach. In WEBKDD Workshop, San Francisco, CA, Aug. 2001.
[11]
R. Ng, L. Lakshmanan, J. Han, and Pang. Exploratory mining and pruning optimization of constrained association rules. In SIGMOD Conf., pages 13--24, 1998.
[12]
S. Parthasarathy, M. J. Zaki, M. Ogihara, and S. Dwarkadas. Incremental and interactive sequence mining. In CIKM, pages 251--258, 1999.
[13]
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: alternatives and implications. In SIGMOD Conf., pages 343--354, 1998.
[14]
R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In EDBT, pages 3--17, 1996.
[15]
Z. Su, Q. Yang, Y. Lu, and H. Zhang. Whatnext: A prediction system for web request using n-gram sequence models. In WISE, pages 214--221, 2000.
[16]
Y. Xiao and M. H. Dunham. Efficient Mining of Traversal Patterns. DKE, 2(39):191 -- 214, 2001.
[17]
Q. Yang, H. H. Zhang, and I. T. Y. Li. Mining web logs for prediction models in WWWcaching and prefetching. In KDD, pages 473--478, 2001.

Cited By

View all
  • (2022)VEPRECO: Vertical databases with pre-pruning strategies and common candidate selection policies to fasten sequential pattern miningExpert Systems with Applications10.1016/j.eswa.2022.117517204(117517)Online publication date: Oct-2022
  • (2020)Issues and Research Challenges in Sequential Pattern Mining2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE)10.1109/ICADEE51157.2020.9368943(1-7)Online publication date: 10-Dec-2020
  • (2020)Using Positional Sequence Patterns to Estimate the Selectivity of SQL LIKE QueriesExpert Systems with Applications10.1016/j.eswa.2020.113762(113762)Online publication date: Jul-2020
  • Show More Cited By

Index Terms

  1. FS-Miner: efficient and incremental mining of frequent sequence patterns in web logs

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    WIDM '04: Proceedings of the 6th annual ACM international workshop on Web information and data management
    November 2004
    168 pages
    ISBN:1581139780
    DOI:10.1145/1031453
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 12 November 2004

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. frequent patterns
    2. incremental mining
    3. prediction
    4. prefetching
    5. sequence mining
    6. traversal patterns
    7. web logs
    8. web usage mining

    Qualifiers

    • Article

    Conference

    CIKM04
    Sponsor:
    CIKM04: Conference on Information and Knowledge Management
    November 12 - 13, 2004
    Washington DC, USA

    Upcoming Conference

    CIKM '25

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)10
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 16 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2022)VEPRECO: Vertical databases with pre-pruning strategies and common candidate selection policies to fasten sequential pattern miningExpert Systems with Applications10.1016/j.eswa.2022.117517204(117517)Online publication date: Oct-2022
    • (2020)Issues and Research Challenges in Sequential Pattern Mining2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE)10.1109/ICADEE51157.2020.9368943(1-7)Online publication date: 10-Dec-2020
    • (2020)Using Positional Sequence Patterns to Estimate the Selectivity of SQL LIKE QueriesExpert Systems with Applications10.1016/j.eswa.2020.113762(113762)Online publication date: Jul-2020
    • (2020)Connecting Web Event Forecasting with Anomaly Detection: A Case Study on Enterprise Web Applications Using Self-supervised Neural NetworksSecurity and Privacy in Communication Networks10.1007/978-3-030-63086-7_27(481-502)Online publication date: 12-Dec-2020
    • (2019)Mining Rank DataACM Transactions on Knowledge Discovery from Data10.1145/336357213:6(1-36)Online publication date: 11-Nov-2019
    • (2019)A Survey of Parallel Sequential Pattern MiningACM Transactions on Knowledge Discovery from Data10.1145/331410713:3(1-34)Online publication date: 7-Jun-2019
    • (2015)Methods for web revisitation predictionUser Modeling and User-Adapted Interaction10.1007/s11257-015-9161-725:4(331-369)Online publication date: 1-Oct-2015
    • (2015)Mining the change of customer behavior in dynamic marketsInformation Technology and Management10.1007/s10799-014-0197-x16:2(117-138)Online publication date: 1-Jun-2015
    • (2015)Data PersonalizationData Management in Pervasive Systems10.1007/978-3-319-20062-0_11(213-234)Online publication date: 2015
    • (2014)An alternative approach for clustering web user sessions considering sequential informationIntelligent Data Analysis10.5555/2639292.263929518:2(137-156)Online publication date: 1-Mar-2014
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media