Nothing Special   »   [go: up one dir, main page]

skip to main content
10.1145/2020408.2020585acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
poster

Incorporating SAT solvers into hierarchical clustering algorithms: an efficient and flexible approach

Published: 21 August 2011 Publication History

Abstract

The area of constrained clustering has been actively pursued for the last decade. A more recent extension that will be the focus of this paper is constrained hierarchical clustering which allows building user-constrained dendrograms/trees. Like all forms of constrained clustering, previous work on hierarchical constrained clustering uses simple constraints that are typically implemented in a procedural language. However, there exists mature results and packages in the fields of constraint satisfaction languages and solvers that the constrained clustering field has yet to explore. This work marks the first steps towards introducing constraints satisfaction languages/solvers into hierarchical constrained clustering. We make several significant contributions. We show how many existing and new constraints for hierarchical clustering, can be modeled as a Horn-SAT problem that is easily solvable in polynomial time and which allows their implementation in any number of declarative languages or efficient solvers. We implement our own solver for efficiency reasons. We then show how to formulate constrained hierarchical clustering in a flexible manner so that any number of algorithms, whose output is a dendrogram, can make use of the constraints.

References

[1]
K. Bade and D. Benz. Evaluation strategies for learning algorithms of hierarchies. In Proceedings of the 32nd Annual Conference of the German Classification Society (GfKl'08), 2009.
[2]
K. Bade and A. Nürnberger. Personalized hierarchical clustering. In WI '06: Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence, pages 181--187, Washington, DC, USA, 2006. IEEE Computer Society.
[3]
K. Bade and A. Nürnberger. Creating a cluster hierarchy under constraints of a partially known hierarchy. In Proceedings of the SIAM International Conference on Data Mining, SDM 2008, pages 13--24, 2008.
[4]
S. Baso, I. Davidson, and K. L. Wagstaff, editors. Constrained Clustering: Advances in Algorithms, Theory, and Applications. CRC Press, 2009.
[5]
S. A. Cook. The complexity of theorem-proving procedures. In STOC '71: Proceedings of the third annual ACM symposium on Theory of computing, pages 151--158, New York, NY, USA, 1971. ACM.
[6]
I. Davidson and S. S. Ravi. Agglomerative hierarchical clustering with constraints: Theoretical and empirical results. In Knowledge Discovery in Databases: PKDD 2005, 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, pages 59--70, 2005.
[7]
I. Davidson and S. S. Ravi. Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results. Data Min. Knowl. Discov., 18(2):257--282, 2009.
[8]
S. Davidson, Ravi. A sat-based framework for efficient constrained clustering. In Siam Data Mining (SDM), 2010.
[9]
W. F. Dowling and J. J. Gallier. Linear-time algorithms for testing the satisfiability of propositional horn formulae. J. Logic Programming, 3:267--284, 1984.
[10]
S. Gilpin. Code for algorithms in this paper. http://csiflabs.cs.ucdavis.edu/ sagilpin/.
[11]
R. Karp. Reducibility among combinatorial problems. In R. Miller and J. Thatcher, editors, Complexity of Computer Computations, pages 85--103. Plenum Press, 1972.
[12]
H. A. Kestler, J. M. Kraus, G. Palm, and F. Schwenker. On the effects of constraints in semi-supervised hierarchical clustering. In Artificial Neural Networks in Pattern Recognition, Second IAPR Workshop, ANNPR 2006, pages 57--66, 2006.
[13]
J. R. Kettenring. A perspective on cluster analysis. Stat. Anal. Data Min., 1(1):52--53, 2008.
[14]
A. K. McCallum. Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering. http://www.cs.cmu.edu/ mccallum/bow, 1996.
[15]
J. Rennie. 20 newsgroups data set.
[16]
S. J. Russell and P. Norvig. Artificial Intelligence a Modern Approach. Prentice Hall, Upper Saddle River, N.J., 2nd international edition edition, 2003.
[17]
R. E. Tarjan. Efficiency of a good but not linear set union algorithm. J. ACM, 22(2):215--225, 1975.
[18]
K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML 2000), pages 1103--1110, 2000.

Cited By

View all
  • (2024)A Constrained Cluster Ensemble Using Hierarchical Clustering Methods2024 IEEE 12th International Conference on Intelligent Systems (IS)10.1109/IS61756.2024.10705267(1-6)Online publication date: 29-Aug-2024
  • (2020)SAT-based models for overlapping community detection in networksComputing10.1007/s00607-020-00803-yOnline publication date: 23-Mar-2020
  • (2020)Initial Centroid Selection Method for an Enhanced K-means Clustering AlgorithmUbiquitous Networking10.1007/978-3-030-58008-7_15(182-190)Online publication date: 16-Aug-2020
  • Show More Cited By

Index Terms

  1. Incorporating SAT solvers into hierarchical clustering algorithms: an efficient and flexible approach

    Recommendations

    Comments

    Please enable JavaScript to view thecomments powered by Disqus.

    Information & Contributors

    Information

    Published In

    cover image ACM Conferences
    KDD '11: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2011
    1446 pages
    ISBN:9781450308137
    DOI:10.1145/2020408
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Sponsors

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 August 2011

    Permissions

    Request permissions for this article.

    Check for updates

    Author Tags

    1. clustering
    2. constrained clustering
    3. hierarchical clustering

    Qualifiers

    • Poster

    Conference

    KDD '11
    Sponsor:

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Contributors

    Other Metrics

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (Last 12 months)2
    • Downloads (Last 6 weeks)0
    Reflects downloads up to 20 Nov 2024

    Other Metrics

    Citations

    Cited By

    View all
    • (2024)A Constrained Cluster Ensemble Using Hierarchical Clustering Methods2024 IEEE 12th International Conference on Intelligent Systems (IS)10.1109/IS61756.2024.10705267(1-6)Online publication date: 29-Aug-2024
    • (2020)SAT-based models for overlapping community detection in networksComputing10.1007/s00607-020-00803-yOnline publication date: 23-Mar-2020
    • (2020)Initial Centroid Selection Method for an Enhanced K-means Clustering AlgorithmUbiquitous Networking10.1007/978-3-030-58008-7_15(182-190)Online publication date: 16-Aug-2020
    • (2020)Constrained Clustering: Current and New TrendsA Guided Tour of Artificial Intelligence Research10.1007/978-3-030-06167-8_14(447-484)Online publication date: 8-May-2020
    • (2018)Triangle-Driven Community Detection in Large Graphs Using Propositional Satisfiability2018 IEEE 32nd International Conference on Advanced Information Networking and Applications (AINA)10.1109/AINA.2018.00072(437-444)Online publication date: May-2018
    • (2018)A Data Mining-Based Framework for Multi-item Markdown OptimizationArtificial Intelligence for Fashion Industry in the Big Data Era10.1007/978-981-13-0080-6_4(47-70)Online publication date: 17-May-2018
    • (2017)A framework for minimal clustering modification via constraint programmingProceedings of the Thirty-First AAAI Conference on Artificial Intelligence10.5555/3298239.3298442(1389-1395)Online publication date: 4-Feb-2017
    • (2017)Survey on using constraints in data miningData Mining and Knowledge Discovery10.1007/s10618-016-0480-z31:2(424-464)Online publication date: 1-Mar-2017
    • (2017)A SAT-Based Framework for Overlapping Community Detection in NetworksAdvances in Knowledge Discovery and Data Mining10.1007/978-3-319-57529-2_61(786-798)Online publication date: 23-Apr-2017
    • (2016)Modeling in MiningZincData Mining and Constraint Programming10.1007/978-3-319-50137-6_10(257-281)Online publication date: 3-Dec-2016
    • Show More Cited By

    View Options

    Login options

    View options

    PDF

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader

    Media

    Figures

    Other

    Tables

    Share

    Share

    Share this Publication link

    Share on social media