A Rough Clustering Algorithm for Mining Outliers in Categorical Data

N. N. R. Ranga Suri,
Musti Narasimha Murty &
Gopalasamy Athithan

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 8251)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Outlier detection is an important data mining task with applications in various domains. Mining outliers in data must deal with uncertainty about whether such objects belong to one of the normal groups (classes) of objects. In this context, a soft computing approach based on rough sets is a better choice for handling such mining tasks. Motivated by this requirement, a novel rough clustering algorithm is proposed here by modifying the basic k-modes algorithm to incorporate the lower and upper approximation properties of rough sets. The proposed algorithm includes the computational steps required for determining the assignment of objects to clusters and the modified centroid (mode) computation on categorical data. An experimental evaluation of the proposed rough k-modes algorithm on various benchmark categorical data sets is also presented to demonstrate its performance in detecting outliers.
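The abstract names two computational steps: assigning objects to the lower and upper approximations of clusters, and recomputing modes. The Python sketch below illustrates one way these could look. It is not the authors' algorithm, whose details are in the full chapter. It borrows the general rough k-means scheme (an object whose nearest modes are nearly equidistant goes only into upper approximations) and applies it to k-modes with simple matching dissimilarity. The threshold `tau`, the weight `w_lower`, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def assign(objects, modes, tau):
    """Place each object index in lower/upper approximations of clusters.

    Assumed rule: if another mode is within `tau` mismatches of the nearest
    mode, membership is uncertain and the object joins only the upper
    approximations of all such clusters; otherwise it joins both the lower
    and upper approximation of its nearest cluster.
    """
    k = len(modes)
    lower = [set() for _ in range(k)]
    upper = [set() for _ in range(k)]
    for i, obj in enumerate(objects):
        d = [mismatch(obj, m) for m in modes]
        nearest = min(range(k), key=d.__getitem__)
        close = [j for j in range(k)
                 if j != nearest and d[j] - d[nearest] <= tau]
        upper[nearest].add(i)
        if close:
            for j in close:
                upper[j].add(i)
        else:
            lower[nearest].add(i)
    return lower, upper

def update_mode(objects, lower, upper, w_lower, old_mode):
    """Weighted mode: lower-approximation objects count w_lower each,
    boundary objects (upper minus lower) count 1 - w_lower each."""
    if not upper:
        return old_mode  # empty cluster: keep the previous mode
    boundary = upper - lower
    mode = []
    for a in range(len(old_mode)):
        counts = Counter()
        for i in lower:
            counts[objects[i][a]] += w_lower
        for i in boundary:
            counts[objects[i][a]] += 1.0 - w_lower
        # sorted() makes tie-breaking deterministic
        mode.append(max(sorted(counts), key=counts.get))
    return tuple(mode)

def rough_k_modes(objects, init_modes, tau=1, w_lower=0.7, max_iter=20):
    """Alternate assignment and mode update until the modes stop changing.

    Assumes max_iter >= 1. If the loop ends without convergence, the
    returned approximations belong to the modes before the last update.
    """
    modes = [tuple(m) for m in init_modes]
    for _ in range(max_iter):
        lower, upper = assign(objects, modes, tau)
        new_modes = [update_mode(objects, lower[j], upper[j], w_lower, modes[j])
                     for j in range(len(modes))]
        if new_modes == modes:
            break
        modes = new_modes
    return modes, lower, upper

# Toy categorical data (hypothetical): two clear groups plus one ambiguous object.
data = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "x", "p"),
    ("b", "y", "r"), ("b", "y", "s"), ("b", "y", "r"),
    ("a", "y", "t"),
]
modes, lower, upper = rough_k_modes(data, [data[0], data[3]])
boundary = set().union(*upper) - set().union(*lower)
print(modes, lower, upper, boundary)
```

In this toy run the last object is equally far from both modes, so it lands only in the upper approximations. Whether and how such boundary objects are scored as outliers is left to the full chapter; the sketch only exposes them.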

Author information

Authors and Affiliations

Centre for AI and Robotics (CAIR), Bangalore, India
N. N. R. Ranga Suri & Gopalasamy Athithan
Dept of CSA, Indian Institute of Science (IISc), Bangalore, India
Musti Narasimha Murty
Presently working at Scientific Analysis Group (SAG), Delhi, India
Gopalasamy Athithan

Editor information

Editors and Affiliations

Machine Intelligence Unit, Indian Statistical Institute, 203, B. T. Road, 700108, Kolkata, India
Pradipta Maji, Ashish Ghosh, Kuntal Ghosh & Sankar K. Pal
Department of Computer Science and Automation, Indian Institute of Science, 560012, Bangalore, India
M. Narasimha Murty

About this paper

Cite this paper

Suri, N.N.R.R., Murty, M.N., Athithan, G. (2013). A Rough Clustering Algorithm for Mining Outliers in Categorical Data. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2013. Lecture Notes in Computer Science, vol 8251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45062-4_23

DOI: https://doi.org/10.1007/978-3-642-45062-4_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45061-7
Online ISBN: 978-3-642-45062-4
eBook Packages: Computer Science, Computer Science (R0)
