Nothing Special   »   [go: up one dir, main page]

skip to main content
research-article

Towards continuous consistency axiom

Published: 30 June 2022 Publication History

Abstract

It is shown for the first time in this paper, that Kleinberg’s (2002) (self-contradictory) axiomatic system for distance-based clustering fails (that is one of the data transforming axioms, consistency axiom, turns out to be identity transformation) in fixed-dimensional Euclidean space due to the consistency axiom limitations and that its replacement with inner-consistency or outer consistency does not help if continuous data transformations are required. Therefore we formulate a new, sound axiomatic framework for cluster analysis in the fixed dimensional Euclidean space, suitable for k-means like algorithms. The system incorporates centric consistency axiom and motion consistency axiom which induce clustering preserving transformations useful e.g. for deriving new labelled sets for testing clustering procedures. It is suitable for continuous data transformations so that labelled data with small perturbations can be derived. Unlike Kleinberg’s consistency, the new axioms do not lead the data outside of Euclidean space nor cause increase in data dimensionality. Our cluster preserving transformations have linear complexity in data transformation and checking. They are in practice less restrictive, less rigid than Kleinberg’s consistency as they do not enforce inter-cluster distance increase and inner cluster distance decrease when performing clustering preserving transformation.

References

[1]
Ackerman M (2012) Towards theoretical foundations of clustering, university of Waterloo. PhD Thesis
[2]
Ackerman M, Ben-david S, Loker D (2010) Characterization of linkage-based clustering. In: COLT. pp 270–281
[3]
Ackerman M, Ben-David S, Loker D (2010) Towards property-based classification of clustering paradigms. In: Adv neural information proc Sys. 23. pp 10–18, Curran Associates, Inc.
[4]
Ackerman M, Ben-David S, Brânzei S, and Loker D Weighted clustering: towards solving the user’s dilemma Pattern Recogn 2021 120 108152
[5]
Awasthi P, Blum A, and Sheffet O Center-based clustering under perturbation stability Inf Process Lett 2012 112 1-2 49-54
[6]
Balcan M and Liang Y Clustering under perturbation resilience SIAM J Comput 2016 45 1 102-155
[7]
Ben-David S, Ackerman M (2009) Measures of clustering quality: a working set of axioms for clustering. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems 21. Curran Associates, Inc., pp 121–128
[8]
Campagner A, Ciucci D (2020) A formal learning theory for three-way clustering. In: Davis J, Tabia K (eds) Scalable uncertainty management - 14th international conference, SUM 2020, Bozen-Bolzano, Italy, 23-25 September 2020, proceedings. lecture notes in computer science. Springer, vol 12322, pp 128–140, DOI
[9]
Carlsson G, Mémoli F (2008) Persistent clustering and a theorem of j. kleinberg. arXiv:0808.2241
[10]
Carlsson G and Mémoli F Characterization, stability and convergence of hierarchical clustering methods J Mach Learn Res 2010 11 1425-1470
[11]
Chang J C, Amershi S, Kamar E (2017) Revolt: collaborative crowdsourcing for labeling machine learning datasets. In: Mark G, Fussell SR, Lampe C, schraefel MC, Hourcade JP, Appert C, Wigdor D (eds) Proceedings of the 2017 CHI conference on human factors in computing systems, Denver, CO, USA, 06-11 May 2017, pp 2334–2346. ACM, DOI
[12]
Lin C-R and chen M-S Combining partitional and hierarchical algorithms for robust and efficient data clustering with cohesion self-merging IEEE Trans Knowl Data Eng 2005 17 2 145-159
[13]
Cohen-Addad V, Kanade V, Mallmann-Trenn F (2018) Clustering redemption beyond the impossibility of kleinberg’s axioms. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems. Curran Associates, Inc., vol 31
[14]
Cui X, Yao J, and Yao Y Modeling use-oriented attribute importance with the three-way decision theory Rough Sets 2020 12179 122-136
[15]
Duda R, Hart P, Stork G (2000) Pattern Classification. Wiley, New York 2nd edn
[16]
Gower JC (1990) Clustering axioms. Classification society of North America newsletter. pp 2–3
[17]
Hennig C What are the true clusters? Pattern Recogn Lett 2015 64 15 53-62
[18]
Hopcroft J, Kannan R (2012) Computer science theory for the information age. chapter 8.13.2. a satisfiable set of axioms, p 272ff
[19]
Iglesias F, Zseby T, and Ferreira D Mdcgen: multidimensional dataset generator for clustering J Classif 2019 36 599-618
[20]
Ke Z, Wang D, Yan Q, Ren J, Lau RW (2019) Dual student: breaking the limits of the teacher in semi-supervised learning. In: Proc. IEEE CVF international conference on computer vision. pp 6728–6736
[21]
Kleinberg J (2002) An impossibility theorem for clustering. In: Proc. NIPS 2002. pp 446–453. http://books.nips.cc/papers/files/nips15/LT17.pdf
[22]
Kłopotek M (2017) On the existence of kernel function for kernel-trick of k-means. In: Kryszkiewicz M, Appice A, §lȩżak D, Rybiński H, Skowron A, Raś Z (eds) Foundations of intelligent systems. ISMIS, Lecture notes in computer science, Springer, Cham. vol 10352
[23]
Klopotek MA On the consistency of k-means++ algorithm Fundam Inform 2020 172 4 361-377
[24]
Kłopotek MA (2022) A clustering preserving transformation for k-means algorithm output
[25]
Kłopotek R, Kłopotek M, and Wierzchoń S A feasible k-means kernel trick under non-euclidean feature space Int J Appl Math Comput Sci 2020 30 4 703-715
[26]
Kłopotek MA, Kłopotek R (2020) In-the-limit clustering axioms. In: To appear in proc. ICAISC2020
[27]
Kłopotek MA, Wierzchoń ST, Kłopotek R (2020) k-means cluster shape implications. In: To appear in proc. AIAI
[28]
van Laarhoven T and Marchiori E Axioms for graph clustering quality functions J Mach Learn Res 2014 15 193-215
[29]
Larsen KG, Nelson J, Nguyundefinedn HL, and Thorup M Heavy hitters via cluster-preserving clustering Commun ACM 2019 62 8 95-100
[30]
Li W, Hannig J, and Mukherjee S Subspace clustering through sub-clusters J Mach Learn Res 2021 22 1-37
[31]
Liu M, Jiang X, and Kot AC A multi-prototype clustering algorithm Pattern Recognition 2009 42 5 689-698 http://www.sciencedirect.com/science/article/pii/S0031320308003798
[32]
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Proc. Fifth berkeley symp. on math. statist. and prob. vol 1, pp 281–297, Univ. of calif. Press
[33]
Moore J, Ackerman M (2016) Foundations of perturbation robust clustering. In: IEEE ICDM. pp 1089–1094, DOI
[34]
Mount DM (2005) Kmlocal: A testbed for k-means clustering algorithms. https://www.cs.umd.edu/mount/Projects/KMeans/kmlocal-doc.pdf
[35]
Nie F, Wang CL, Li X (2019) K-multiple-means: A multiple-means clustering method with specified k clusters. In: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery and data mining. pp 959–967, KDD ’19, association for computing machinery, New York, NY, USA.
[36]
Pollard D Strong consistency of k–means clustering Ann Statist 1981 9 1 135-140
[37]
Puzicha J, Hofmann T, and Buhmann J A theory of proximity based clustering: structure detection by optimization Pattern Recogn 2000 33 4 617-634
[38]
Qin Y, Ding S, and Wang L Research progress on semi-supervised clustering Cogn Comput 2019 11 599-612
[39]
Shekar B (1988) A knowledge-based approach to pattern clustering. Ph.D. thesis Indian Institute of Science
[40]
Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. In: Proceedings of KDD workshop on text mining, proceedings of the 6th international confer1193 ence on knowledge discovery and data mining, Boston,MA
[41]
Strazzeri F, Sánchez-García RJ (2021) Possibility results for graph clustering: a novel consistency axiom. arXiv:1806.06142
[42]
Thomann P, Steinwart I, Schmid N (2015) Towards an axiomatic approach to hierarchical clustering of measures
[43]
Vidal A, Esteva F, and Godo L Axiomatizing logics of fuzzy preferences using graded modalities Fuzzy Sets and Systems 2020 401 163-188 https://www.sciencedirect.com/science/article/pii/S0165011419303203,. fuzzy Measures, Integrals and Quantification in Artificial Intelligence Problems – An Homage to Prof. Miguel Delgado
[44]
Wang P and Yang X Three-way clustering method based on stability theory IEEE Access 2021 9 33944-33953
[45]
Wang X, Kihara D, Luo J, and Qi GJ Enaet: a self-trained framework for semi-supervised and supervised learning with ensemble transformations IEEE Trans Image Process 2021 30 1639-1647
[46]
Wierzchoń S, Kłopotek M (2018) Modern Clustering Algorithms. Studies in Big Data 34. Springer
[47]
Wright W A formalization of cluster analysis Pattern Rec 1973 5 3 273-282
[48]
Zadeh RB, Ben-David S (2009) A uniqueness theorem for clustering. In: Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence. pp 639–646, UAI ’09, AUAI Press, Arlington, Virginia, United States
[49]
Zeng G, Wang Y, Pu J, Liu X, Sun X, Zhang J (2016) Communities in preference networks: refined axioms and beyond. In: ICDM. pp 599–608
[50]
Zhao Y, Tarus SK, Yang LT, Sun J, Ge Y, and Wang J Privacy-preserving clustering for big data in cyber-physical-social systems: survey and perspectives Information Sciences 2020 515 132-155 https://www.sciencedirect.com/science/article/pii/S0020025519309764

Cited By

View all

Index Terms

  1. Towards continuous consistency axiom
        Index terms have been assigned to the content through auto-classification.

        Recommendations

        Comments

        Please enable JavaScript to view thecomments powered by Disqus.

        Information & Contributors

        Information

        Published In

        cover image Applied Intelligence
        Applied Intelligence  Volume 53, Issue 5
        Mar 2023
        1244 pages

        Publisher

        Kluwer Academic Publishers

        United States

        Publication History

        Published: 30 June 2022
        Accepted: 29 April 2022

        Author Tags

        1. Cluster analysis
        2. Clustering axioms
        3. Consistency
        4. Continuous consistency
        5. Inner- and outer-consistency
        6. Continuous inner- and outer-consistency
        7. Gravitational consistency
        8. Centric consistency
        9. Motion consistency
        10. k-means algorithm

        Qualifiers

        • Research-article

        Contributors

        Other Metrics

        Bibliometrics & Citations

        Bibliometrics

        Article Metrics

        • Downloads (Last 12 months)0
        • Downloads (Last 6 weeks)0
        Reflects downloads up to 14 Nov 2024

        Other Metrics

        Citations

        Cited By

        View all

        View Options

        View options

        Get Access

        Login options

        Media

        Figures

        Other

        Tables

        Share

        Share

        Share this Publication link

        Share on social media