DOI: 10.1145/584931.584945

Indexing web access-logs for pattern queries

Published: 08 November 2002

Abstract

In this paper, we develop a new indexing method for large web access-logs. We are concerned with pattern queries, which search for access sequences that contain given query patterns. Queries of this kind find applications in processing web-log mining results (e.g., finding typical/atypical access sequences). The proposed method focuses on scalability with respect to web-log size. For this reason, we also examine the gains due to signature-trees, which can further improve scalability to very large web-logs. Experimental results illustrate the superiority of the proposed method.
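
The abstract does not spell out how signatures are built or how the index is organized, so the following is only a minimal sketch of the generic signature-file idea (superimposed coding with a filter-and-verify step), not the authors' method. The 64-bit signature width, the CRC32-based hash, and the names `page_bit`, `signature`, `build_index`, and `pattern_query` are all assumptions made for illustration. Each access sequence gets a bit-vector signature; a query pattern can only be contained in sequences whose signature covers the pattern's signature, and the surviving candidates are then checked exactly.

```python
import zlib

SIG_BITS = 64  # assumed signature width; not taken from the paper


def page_bit(page: str) -> int:
    """Map a page identifier to one bit of the signature (deterministic hash)."""
    return 1 << (zlib.crc32(page.encode("utf-8")) % SIG_BITS)


def signature(pages) -> int:
    """Superimpose (bitwise OR) the bits of all pages in a sequence."""
    sig = 0
    for page in pages:
        sig |= page_bit(page)
    return sig


def contains_pattern(sequence, pattern) -> bool:
    """Exact check: pattern occurs in order, not necessarily contiguously."""
    remaining = iter(sequence)
    # `page in remaining` consumes the iterator, which enforces the ordering.
    return all(page in remaining for page in pattern)


def build_index(log):
    """Flat signature file: one (signature, sequence) pair per access sequence."""
    return [(signature(seq), seq) for seq in log]


def pattern_query(index, pattern):
    """Filter by signature containment, then verify candidates exactly."""
    q = signature(pattern)
    # The filter has no false negatives; hash collisions and ordering
    # violations are false positives that the verification step removes.
    candidates = [seq for sig, seq in index if (sig & q) == q]
    return [seq for seq in candidates if contains_pattern(seq, pattern)]


if __name__ == "__main__":
    log = [["A", "B", "C"], ["C", "A"], ["A", "D", "C"], ["B"]]
    print(pattern_query(build_index(log), ["A", "C"]))
    # prints [['A', 'B', 'C'], ['A', 'D', 'C']]
```

In this sketch the sequence `["C", "A"]` passes the signature filter but is rejected during verification, because the pages appear in the wrong order. The index here is scanned linearly; a signature-tree, as mentioned in the abstract, would instead organize the signatures hierarchically so that whole groups of sequences can be skipped.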


Published In

WIDM '02: Proceedings of the 4th International Workshop on Web Information and Data Management
November 2002
116 pages
ISBN: 1581135939
DOI: 10.1145/584931
Program Chairs: Roger Chiang, Ee-Peng Lim
Publisher: Association for Computing Machinery, New York, NY, United States

Conference: CIKM02
