10.5555/2033831.2033841
Article

Parameter-free anomaly detection for categorical data

Published: 30 August 2011

Abstract

Outlier detection is often treated as a preprocessing step that locates, in a data set, the objects that do not conform to well-defined notions of expected behavior. It is a major issue in data mining for discovering novel or rare events, actions, and phenomena. We investigate outlier detection in categorical data sets. The problem is especially challenging because it is difficult to define a meaningful similarity measure for categorical data. In this paper, we propose a formal definition of outliers and formulate outlier detection as an optimization problem. To solve the optimization problem, we design a practical, parameter-free method named ITB. Experimental results show that the ITB method is much more effective and efficient than existing mainstream methods.
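The abstract does not spell out how ITB works, so the following is only an illustrative sketch of the general idea of scoring categorical objects with information theory rather than with a similarity measure. It is not the ITB method. It assumes attributes are independent, sums per-attribute Shannon entropies, and scores each object by how much that sum drops when the object is removed. All function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def dataset_entropy(rows):
    """Sum of per-attribute entropies (assumes attribute independence)."""
    if not rows:
        return 0.0
    # zip(*rows) turns a list of rows into a sequence of columns.
    return sum(attribute_entropy(col) for col in zip(*rows))


def entropy_drop_scores(rows):
    """Score each row by the entropy decrease caused by removing it.

    A large positive score means the data set becomes more regular
    without that row, which is one plausible sign of an outlier.
    """
    base = dataset_entropy(rows)
    return [
        base - dataset_entropy(rows[:i] + rows[i + 1:])
        for i in range(len(rows))
    ]


# Hypothetical toy data: four identical objects and one unusual one.
rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
]
scores = entropy_drop_scores(rows)
most_anomalous = max(range(len(rows)), key=scores.__getitem__)
```

This leave-one-out scoring needs no distance function and no user-set threshold to rank objects, but it recomputes the entropy for every row, so it is meant for illustration rather than for large data sets.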



    Published In

    MLDM'11: Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
    August 2011
    611 pages
    ISBN: 9783642231988
    Editor: Petra Perner

    Publisher

    Springer-Verlag, Berlin, Heidelberg


    Author Tags

    1. categorical data
    2. information theory
    3. outlier detection
