DOI: 10.1145/956750.956783

Screening and interpreting multi-item associations based on log-linear modeling

Published: 24 August 2003

Abstract

Association rules have received a lot of attention in the data mining community since their introduction. The classical approach, which finds rules whose items enjoy high support (that is, appear in many of the transactions in the data set), has several shortcomings, however. It has been shown that support can be a misleading indicator of how interesting a rule is, and alternative measures, such as lift, have been proposed. More recently, a paper by DuMouchel et al. proposed the use of all-two-factor loglinear models to discover sets of items that cannot be explained by pairwise associations between the items involved. This approach has its own limitation: it stops short of considering higher-order interactions (beyond pairwise) among the items. In this paper, we propose a method that examines the parameters of the fitted loglinear models to find all the significant association patterns among the items. Since fitting loglinear models to large data sets can be computationally prohibitive, we apply graph-theoretical results to divide the original set of items into components (sets of items) that are statistically independent of each other. We then apply loglinear modeling to each component and find the interesting associations among its items. The technique is evaluated experimentally on a real data set (insurance data) and a series of synthetic data sets. The results show that the technique is effective in finding interesting associations among the items involved.
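
As a minimal sketch of the quantities the abstract mentions, the Python below computes support and lift for an itemset and then splits the items into groups using a pairwise chi-square test. The splitting step is only a stand-in: the abstract does not say which graph-theoretical results the paper applies, and the pairwise test, the 3.84 threshold (the 5% critical value of chi-square with one degree of freedom), the union-find grouping, and all function names are assumptions made for illustration. Pairwise independence does not by itself guarantee that whole components are mutually independent, so this illustrates the idea of decomposition rather than the paper's procedure.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(transactions, itemset):
    """Observed support divided by the support expected if items were independent."""
    expected = 1.0
    for item in itemset:
        expected *= support(transactions, [item])
    return support(transactions, itemset) / expected


def chi_square_2x2(transactions, a, b):
    """Pearson chi-square statistic for the presence/absence table of items a and b."""
    n = len(transactions)
    n_a = sum(a in t for t in transactions)
    n_b = sum(b in t for t in transactions)
    n_ab = sum(a in t and b in t for t in transactions)
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    expected = [
        n_a * n_b / n,
        n_a * (n - n_b) / n,
        (n - n_a) * n_b / n,
        (n - n_a) * (n - n_b) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def independent_components(transactions, items, threshold=3.84):
    """Group items linked by a pairwise chi-square above `threshold` (assumed rule)."""
    parent = {item: item for item in items}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if chi_square_2x2(transactions, a, b) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), set()).add(item)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy data (hypothetical): a and b always co-occur, as do c and d,
# while the pair {a, b} is independent of the pair {c, d}.
transactions = [set("abcd"), set("ab"), set("cd"), set()] * 5

print(support(transactions, "ab"))  # 0.5
print(lift(transactions, "ab"))     # 2.0
print(lift(transactions, "ac"))     # 1.0
print(independent_components(transactions, "abcd"))
```

On this toy data the items split into the groups {a, b} and {c, d}, so a loglinear model would only need to be fitted within each group of two items instead of over all four at once.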

References

[1] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 207--216, 1993.
[2] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proceedings of the International Conference on Very Large Data Bases, pages 487--499, September 1994.
[3] A. Agresti. Categorical Data Analysis. Wiley, 1990.
[4] J. Badsberg. An environment for graphical models. Ph.D. Thesis, Aalborg University, Denmark, 1995.
[5] D. Barbará, W. DuMouchel, C. Faloutsos, P. Haas, J. M. Hellerstein, Y. Ioannidis, H. Jagadish, T. Johnson, R. Ng, V. Poosala, K. Ross, and K. Sevcik. The New Jersey data reduction report. Bulletin of the Technical Committee on Data Engineering, 20(4):3--45, December 1997.
[6] D. Barbará and X. Wu. Loglinear based quasi cubes. Journal of Information and Intelligent Systems, 16(3):255--276, 2001.
[7] J. Benedetti and M. Brown. Strategies for the selection of loglinear models. Biometrics, 34:680--686, 1978.
[8] Y. M. Bishop, S. E. Fienberg, and P. W. Holland. Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge, Massachusetts, and London, England, 1975.
[9] M. Brown. Screening effects in multidimensional contingency tables. Applied Statistics, 25:37--46, 1976.
[10] COIL challenge 2000. The insurance company (tic) benchmark. http://kdd.uci.edu/databases/tic/tic.html.
[11] A. Deshpande, M. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, pages 199--210. Santa Barbara, California, May 2001.
[12] W. DuMouchel and D. Pregibon. Empirical Bayes screening for multi-item association. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining. San Francisco, CA, August 2001.
[13] L. Goodman. The analysis of multidimensional contingency tables: Stepwise procedures and direct estimation methods for building models for multiple classifications. Technometrics, 13:33--61, 1971.
[14] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1--12. Dallas, TX, May 2000.
[15] V. Harinarayan, A. Rajaraman, and J. Ullman. Implementing data cubes efficiently. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 205--216. Montreal, Quebec, Canada, 1996.
[16] S. Lauritzen. Graphical Models. Oxford University Press, 1996.
[17] D. Pavlov, H. Mannila, and P. Smyth. Probabilistic models for query approximation with large sparse binary data sets. In Proceedings of Uncertainty in Artificial Intelligence, pages 199--210. Stanford, California, June 2000.
[18] S. Sarawagi, R. Agrawal, and N. Megiddo. Discovery-driven explorations of OLAP data cubes. In Proceedings of the International Conference on Extending Data Base Technology, pages 168--182. Valencia, Spain, 1998.
[19] C. Silverstein, S. Brin, and R. Motwani. Beyond market baskets: Generalizing association rules to dependence rules. Data Mining and Knowledge Discovery, 2:39--68, 1998.
[20] R. Tarjan. Decomposition by clique separators. Discrete Mathematics, 55:221--232, 1985.
[21] J. Whittaker. Graphical Models in Applied Mathematical Multivariate Statistics. Wiley, 1990.
[22] X. Wu and D. Barbará. Modeling and imputation of large incomplete multidimensional datasets. In Proceedings of the International Conference on Data Warehousing and Knowledge Discovery. Aix-en-Provence, France, September 2002.

    Published In

    KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2003
    736 pages
    ISBN: 1581137370
    DOI: 10.1145/956750

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. association rule
    2. graphical model
    3. log-linear model

    Acceptance Rates

    KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2016) A Multiple Test Correction for Streams and Cascades of Statistical Hypothesis Tests. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1255-1264. DOI: 10.1145/2939672.2939775. Online publication date: 13-Aug-2016
    • (2016) Using Loglinear Model for Discrimination Discovery and Prevention. 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 110-119. DOI: 10.1109/DSAA.2016.18. Online publication date: Oct-2016
    • (2015) Constrained independence for detecting interesting patterns. 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA), pages 1-10. DOI: 10.1109/DSAA.2015.7344897. Online publication date: Oct-2015
    • (2014) A Statistically Efficient and Scalable Method for Log-Linear Analysis of High-Dimensional Data. Proceedings of the 2014 IEEE International Conference on Data Mining, pages 480-489. DOI: 10.1109/ICDM.2014.23. Online publication date: 14-Dec-2014
    • (2013) Interestingness measures for association rules within groups. Intelligent Data Analysis 17:2, pages 195-215. DOI: 10.5555/2595554.2595557. Online publication date: 1-Mar-2013
    • (2013) Scaling Log-Linear Analysis to High-Dimensional Data. 2013 IEEE 13th International Conference on Data Mining, pages 597-606. DOI: 10.1109/ICDM.2013.17. Online publication date: Dec-2013
    • (2012) Examining Multi-factor Interactions in Microblogging Based on Log-linear Modeling. Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), pages 189-193. DOI: 10.1109/ASONAM.2012.41. Online publication date: 26-Aug-2012
    • (2010) A statistical interestingness measures for XML based association rules. Proceedings of the 11th Pacific Rim international conference on Trends in artificial intelligence, pages 194-205. DOI: 10.5555/1884293.1884315. Online publication date: 30-Aug-2010
    • (2010) Self-sufficient itemsets. ACM Transactions on Knowledge Discovery from Data 4:1, pages 1-20. DOI: 10.1145/1644873.1644876. Online publication date: 18-Jan-2010
    • (2009) A computerized system for detecting signals due to drug–drug interactions in spontaneous reporting systems. British Journal of Clinical Pharmacology 69:1, pages 67-73. DOI: 10.1111/j.1365-2125.2009.03557.x. Online publication date: 23-Dec-2009
