Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention to recent issues in graphs, social networks, and other domains.
The book focuses on three primary aspects of data clustering:
- Methods, describing key techniques commonly used for clustering, such as feature selection, agglomerative clustering, partitional clustering, density-based clustering, probabilistic clustering, grid-based clustering, spectral clustering, and nonnegative matrix factorization
- Domains, covering methods used for different domains of data, such as categorical data, text data, multimedia data, graph data, biological data, stream data, uncertain data, time series clustering, high-dimensional clustering, and big data
- Variations and Insights, discussing important variations of the clustering process, such as semisupervised clustering, interactive clustering, multiview clustering, cluster ensembles, and cluster validation
In this book, top researchers from around the world explore the characteristics of clustering problems in a variety of application areas. They also explain how to glean detailed insight from the clustering process, including how to verify the quality of the underlying clusters, through supervision, human intervention, or the automated generation of alternative clusters.
Cited By
- Ahmadian S, Bateni M, Esfandiari H, Lattanzi S, Monemizadeh M and Norouzi-Fard A Resilient k-Clustering Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, (29-38)
- Mariz J, Badiozamani M, Peroni R and Silva R (2024). A critical review of bench aggregation and mining cut clustering techniques based on optimization and artificial intelligence to enhance the open-pit mine planning, Engineering Applications of Artificial Intelligence, 133:PD, Online publication date: 1-Jul-2024.
- Wu X, Feng Q, Xu J and Wang J (2024). New algorithms for fair k-center problem with outliers and capacity constraints, Theoretical Computer Science, 997:C, Online publication date: 27-May-2024.
- Prasad R, Sarmah R, Chakraborty S and Sarmah S (2023). NNVDC, Expert Systems with Applications: An International Journal, 227:C, Online publication date: 1-Oct-2023.
- Saadia B and Fotopoulos G (2023). Unsupervised clustering of ambient seismic noise in an urban environment, Computers & Geosciences, 179:C, Online publication date: 1-Oct-2023.
- Han X, Zhu Y, Ting K and Li G (2023). The impact of isolation kernel on agglomerative hierarchical clustering algorithms, Pattern Recognition, 139:C, Online publication date: 1-Jul-2023.
- Fowdur T and Doorgakant B (2023). A review of machine learning techniques for enhanced energy efficient 5G and 6G communications, Engineering Applications of Artificial Intelligence, 122:C, Online publication date: 1-Jun-2023.
- Jiao L, Denœux T, Liu Z and Pan Q (2022). EGMM, Applied Soft Computing, 129:C, Online publication date: 1-Nov-2022.
- Kashani E, Bagheri Shouraki S and Norouzi Y (2022). Evolving data stream clustering based on constant false clustering probability, Information Sciences: an International Journal, 614:C, (1-18), Online publication date: 1-Oct-2022.
- Strazzeri F and Sánchez-García R (2022). Possibility results for graph clustering, Pattern Recognition, 128:C, Online publication date: 1-Aug-2022.
- Guo Y and Li J (2021). Distributed Latent Dirichlet Allocation on Streams, ACM Transactions on Knowledge Discovery from Data, 16:1, (1-20), Online publication date: 28-Feb-2022.
- Zhu Y, Ting K, Jin Y and Angelova M (2022). Hierarchical clustering that takes advantage of both density-peak and density-connectivity, Information Systems, 103:C, Online publication date: 1-Jan-2022.
- Rocha D, Aloise D, Aloise D and Contardo C (2022). Visual attractiveness in vehicle routing via bi-objective optimization, Computers and Operations Research, 137:C, Online publication date: 1-Jan-2022.
- Huang Y, Morvan G, Pichon F and Mercier D SPSC Proceedings of the Winter Simulation Conference, (1-12)
- Zhang X, Liu H, Wu X, Zhang X and Liu X (2022). Spectral embedding network for attributed graph clustering, Neural Networks, 142:C, (388-396), Online publication date: 1-Oct-2021.
- Naseem U, Razzak I, Khan S and Prasad M (2021). A Comprehensive Survey on Word Representation Models: From Classical to State-of-the-Art Word Representation Language Models, ACM Transactions on Asian and Low-Resource Language Information Processing, 20:5, (1-35), Online publication date: 30-Sep-2021.
- Yao S, Hu C, Wang T and Cui X (2021). Autoencoder-like semi-NMF multiple clustering, Information Sciences: an International Journal, 572:C, (331-342), Online publication date: 1-Sep-2021.
- Latifi-Pakdehi A and Daneshpour N (2021). DBHC, Data & Knowledge Engineering, 135:C, Online publication date: 1-Sep-2021.
- Zhang Z, Sun L, Su S, Qu J and Li G (2020). Reconciling Multiple Social Networks Effectively and Efficiently: An Embedding Approach, IEEE Transactions on Knowledge and Data Engineering, 33:1, (224-238), Online publication date: 1-Jan-2021.
- Silva P, Bezerra C, Lima R and Machado I Classifying Feature Models Maintainability based on Machine Learning Algorithms Proceedings of the 14th Brazilian Symposium on Software Components, Architectures, and Reuse, (1-10)
- El Malki N, Cugny R, Teste O and Ravat F DECWA Proceedings of the 29th ACM International Conference on Information & Knowledge Management, (2005-2008)
- Khader M and Al-Naymat G (2020). Density-based Algorithms for Big Data Clustering Using MapReduce Framework, ACM Computing Surveys, 53:5, (1-38), Online publication date: 15-Oct-2020.
- Yamashita N and Adachi K (2019). A Modified k-Means Clustering Procedure for Obtaining a Cardinality-Constrained Centroid Matrix, Journal of Classification, 37:2, (509-525), Online publication date: 1-Jul-2020.
- Bhattacharjee P and Mitra P (2019). BISDBx: towards batch-incremental clustering for dynamic datasets using SNN-DBSCAN, Pattern Analysis & Applications, 23:2, (975-1009), Online publication date: 1-May-2020.
- Esmaelian M, Shahmoradi H and Nemati F (2019). A new preference disaggregation method for clustering problem: DISclustering, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 24:6, (4483-4503), Online publication date: 1-Mar-2020.
- Noferesti M and Jalili R (2020). ACoPE, Computer Networks: The International Journal of Computer and Telecommunications Networking, 166:C, Online publication date: 15-Jan-2020.
- Singh M (2018). Scalability and sparsity issues in recommender datasets: a survey, Knowledge and Information Systems, 62:1, (1-43), Online publication date: 1-Jan-2020.
- Liang X and Znati T (2019). On the performance of intelligent techniques for intensive and stealthy DDoS detection, Computer Networks: The International Journal of Computer and Telecommunications Networking, 164:C, Online publication date: 9-Dec-2019.
- Bera S, Chakrabarty D, Flores N and Negahbani M Fair algorithms for clustering Proceedings of the 33rd International Conference on Neural Information Processing Systems, (4954-4965)
- Guha S, Li Y and Zhang Q (2019). Distributed Partial Clustering, ACM Transactions on Parallel Computing, 6:3, (1-20), Online publication date: 5-Dec-2019.
- Khader M and Al-Naymat G An overview of various enhancements of DENCLUE algorithm Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems, (1-7)
- Beck G, Duong T, Lebbah M, Azzag H and Cérin C (2019). A distributed approximate nearest neighbors algorithm for efficient large scale mean shift clustering, Journal of Parallel and Distributed Computing, 134:C, (128-139), Online publication date: 1-Dec-2019.
- Angelova M, Beliakov G and Zhu Y (2019). Density-based clustering using approximate natural neighbours, Applied Soft Computing, 85:C, Online publication date: 1-Dec-2019.
- Morichetta A and Mellia M (2019). Clustering and evolutionary approach for longitudinal web traffic analysis, Performance Evaluation, 135:C, Online publication date: 1-Nov-2019.
- Hao J, Bouzouane A and Gaboury S (2019). An incremental learning method based on formal concept analysis for pattern recognition in nonstationary sensor-based smart environments, Pervasive and Mobile Computing, 59:C, Online publication date: 1-Oct-2019.
- Galán S (2019). Comparative evaluation of region query strategies for DBSCAN clustering, Information Sciences: an International Journal, 502:C, (76-90), Online publication date: 1-Oct-2019.
- Höppner F and Jahnke M Holistic Assessment of Structure Discovery Capabilities of Clustering Algorithms Machine Learning and Knowledge Discovery in Databases, (223-239)
- Robles-Berumen H, Zafra A, Fardoun H and Ventura S (2019). LEAC, Knowledge-Based Systems, 179:C, (117-119), Online publication date: 1-Sep-2019.
- Mustafi D and Sahoo G (2019). A hybrid approach using genetic algorithm and the differential evolution heuristic for enhanced initialization of the k-means algorithm with applications in text clustering, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 23:15, (6361-6378), Online publication date: 1-Aug-2019.
- Ji Y, Zhu W and Champagne B (2019). Recurrent Neural Network-Based Dictionary Learning for Compressive Speech Sensing, Circuits, Systems, and Signal Processing, 38:8, (3616-3643), Online publication date: 1-Aug-2019.
- Wang J, Liang J, Zheng W, Zhao X and Mu J (2019). Protein complex detection algorithm based on multiple topological characteristics in PPI networks, Information Sciences: an International Journal, 489:C, (78-92), Online publication date: 1-Jul-2019.
- Fan J (2019). OPE-HCA: an optimal probabilistic estimation approach for hierarchical clustering algorithm, Neural Computing and Applications, 31:7, (2095-2105), Online publication date: 1-Jul-2019.
- Mahdavi M, Abedjan Z, Castro Fernandez R, Madden S, Ouzzani M, Stonebraker M and Tang N Raha Proceedings of the 2019 International Conference on Management of Data, (865-882)
- Kampffmeyer M, Løkse S, Bianchi F, Livi L, Salberg A and Jenssen R (2022). Deep divergence-based approach to clustering, Neural Networks, 113:C, (91-101), Online publication date: 1-May-2019.
- Yang M, Chang-Chien S and Nataliani Y (2022). Unsupervised fuzzy model-based Gaussian clustering, Information Sciences: an International Journal, 481:C, (1-23), Online publication date: 1-May-2019.
- Melendez-Melendez G, Cruz-Paz D, Carrasco-Ochoa J and Martínez-Trinidad J (2019). An improved algorithm for partial clustering, Expert Systems with Applications: An International Journal, 121:C, (282-291), Online publication date: 1-May-2019.
- Feng F, He X, Wang X, Luo C, Liu Y and Chua T (2019). Temporal Relational Ranking for Stock Prediction, ACM Transactions on Information Systems, 37:2, (1-30), Online publication date: 30-Apr-2019.
- Oregi I, Pérez A, Del Ser J and Lozano J (2022). On-line Elastic Similarity Measures for time series, Pattern Recognition, 88:C, (506-517), Online publication date: 1-Apr-2019.
- Valenzuela-Valdés J, Luna F, Padilla P, Padilla J, Luque-Baena R and Agudo J (2019). Securing and Greening Wireless Sensor Networks with Beamforming, Mobile Networks and Applications, 24:2, (712-720), Online publication date: 1-Apr-2019.
- Canossa A, Makarovych S, Togelius J and Drachen A Like a DNA string Proceedings of the Fourteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, (152-158)
- Zhu Y, Ting K and Carman M (2018). Grouping points by shared subspaces for effective subspace clustering, Pattern Recognition, 83:C, (230-244), Online publication date: 1-Nov-2018.
- Liu H, Tao Z and Fu Y (2018). Partition Level Constrained Clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, 40:10, (2469-2483), Online publication date: 1-Oct-2018.
- Tuhkala A, Kärkkäinen T and Nieminen P Semi-automatic literature mapping of participatory design studies 2006--2016 Proceedings of the 15th Participatory Design Conference: Short Papers, Situated Actions, Workshops and Tutorial - Volume 2, (1-5)
- Baydoun M, Ghaziri H and Al-Husseini M (2018). CPU and GPU parallelized kernel K-means, The Journal of Supercomputing, 74:8, (3975-3998), Online publication date: 1-Aug-2018.
- Chambon A, Boureau T, Lardeux F and Saubion F (2018). Logical characterization of groups of data, Applied Intelligence, 48:8, (2284-2303), Online publication date: 1-Aug-2018.
- Ye M, Zhang P and Nie L (2018). Clustering sparse binary data with hierarchical Bayesian Bernoulli mixture model, Computational Statistics & Data Analysis, 123:C, (32-49), Online publication date: 1-Jul-2018.
- Schmidt F and Ehrenfeld Y ViMEC Proceedings of the Eurographics/IEEE VGTC Conference on Visualization: Posters, (29-31)
- Liu H, Zhang X and Zhang X (2018). Possible world based consistency learning model for clustering and classifying uncertain data, Neural Networks, 102:C, (48-66), Online publication date: 1-Jun-2018.
- Wu Z and Xu J (2018). A consensus model for large-scale group decision making with hesitant fuzzy information and changeable clusters, Information Fusion, 41:C, (217-231), Online publication date: 1-May-2018.
- Guo Y, Xu Q, Luo X, Wei H, Bu H and Sbert M (2018). A group-based signal filtering approach for trajectory abstraction and restoration, Neural Computing and Applications, 29:9, (371-387), Online publication date: 1-May-2018.
- Feng F, He X, Liu Y, Nie L and Chua T Learning on Partial-Order Hypergraphs Proceedings of the 2018 World Wide Web Conference, (1523-1532)
- Alizadeh M, Peters S, Etalle S and Zannone N Behavior analysis in the medical sector Proceedings of the 33rd Annual ACM Symposium on Applied Computing, (1637-1646)
- Ahmad A and Starkey A (2018). Application of feature selection methods for automated clustering analysis, Neural Computing and Applications, 29:7, (317-328), Online publication date: 1-Apr-2018.
- Miyauchi A, Sonobe T and Sukegawa N Exact clustering via integer programming and maximum satisfiability Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence, (1387-1394)
- Nguyen C and Artemiadis P (2018). EEG feature descriptors and discriminant analysis under Riemannian Manifold perspective, Neurocomputing, 275:C, (1871-1883), Online publication date: 31-Jan-2018.
- Hosseini B and Kiani K (2018). FWCMR, Expert Systems with Applications: An International Journal, 91:C, (198-210), Online publication date: 1-Jan-2018.
- Saltos R, Weber R and Maldonado S (2017). Dynamic Rough-Fuzzy Support Vector Clustering, IEEE Transactions on Fuzzy Systems, 25:6, (1508-1521), Online publication date: 1-Dec-2017.
- Zhao N, Zhang L, Du B, Zhang Q, You J and Tao D (2017). Robust Dual Clustering with Adaptive Manifold Regularization, IEEE Transactions on Knowledge and Data Engineering, 29:11, (2498-2509), Online publication date: 1-Nov-2017.
- Liu W, Ye M, Wei J and Hu X (2017). Compressed constrained spectral clustering framework for large-scale data sets, Knowledge-Based Systems, 135:C, (77-88), Online publication date: 1-Nov-2017.
- Barbon A, Barbon S, Campos G, Seixas J, Peres L, Mastelini S, Andreo N, Ulrici A and Bridi A (2017). Development of a flexible Computer Vision System for marbling classification, Computers and Electronics in Agriculture, 142:PB, (536-544), Online publication date: 1-Nov-2017.
- Su P, Shang C, Chen T and Shen Q (2017). Exploiting Data Reliability and Fuzzy Clustering for Journal Ranking, IEEE Transactions on Fuzzy Systems, 25:5, (1306-1319), Online publication date: 1-Oct-2017.
- Pibiri G and Venturini R (2017). Clustered Elias-Fano Indexes, ACM Transactions on Information Systems, 36:1, (1-33), Online publication date: 30-Aug-2017.
- Bojchevski A, Matkovic Y and Günnemann S Robust Spectral Clustering for Noisy Data Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (737-746)
- Xu J, Wang G, Li T, Deng W and Gou G (2017). Fat node leading tree for data stream clustering with density peaks, Knowledge-Based Systems, 120:C, (99-117), Online publication date: 15-Mar-2017.
- Gupta S, Kumar R, Lu K, Moseley B and Vassilvitskii S (2017). Local search methods for k-means with outliers, Proceedings of the VLDB Endowment, 10:7, (757-768), Online publication date: 1-Mar-2017.
- Luna J, Castro C and Romero C (2017). MDM tool, Computer Applications in Engineering Education, 25:1, (90-102), Online publication date: 1-Jan-2017.
- Sartea R, Preda M, Farinelli A, Giacobazzi R and Mastroeni I Active Android malware analysis Proceedings of the 6th Workshop on Software Security, Protection, and Reverse Engineering, (1-10)
- Esmaelian M, Shahmoradi H and Vali M (2016). A novel classification method, Applied Soft Computing, 49:C, (56-70), Online publication date: 1-Dec-2016.
- Jia B, Yu B, Wu Q, Wei C and Law R (2016). Adaptive affinity propagation method based on improved cuckoo search, Knowledge-Based Systems, 111:C, (27-35), Online publication date: 1-Nov-2016.
- Akin D and Alasalvar S (2016). Estimate Urban Growth and Expansion by Modeling Urban Spatial Structure Using Hierarchical Cluster Analyses of Interzonal Travel Data, International Journal of System Dynamics Applications, 5:4, (16-41), Online publication date: 1-Oct-2016.
- Ceccato M, Nguyen C, Appelt D and Briand L SOFIA: an automated security oracle for black-box testing of SQL-injection vulnerabilities Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering, (167-177)
- Melo D, Toledo S, Mouro F, Sachetto R, Andrade G, Ferreira R, Parthasarathy S and Rocha L (2016). Hierarchical Density-Based Clustering Based on GPU Accelerated Data Indexing Strategy, Procedia Computer Science, 80:C, (951-961), Online publication date: 1-Jun-2016.
- Khan F, Qamar U and Bashir S (2016). SWIMS, Knowledge-Based Systems, 100:C, (97-111), Online publication date: 15-May-2016.
- Liu L, Chen X, Liu M, Jia Y, Zhong J, Gao R and Zhao Y (2016). An influence power-based clustering approach with PageRank-like model, Applied Soft Computing, 40:C, (17-32), Online publication date: 1-Mar-2016.
- Driemel A, Krivošija A and Sohler C Clustering time series under the Fréchet distance Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms, (766-785)
- Mure S, Grenier T, Meier D, Guttmann C and Benoit-Cattin H (2015). Unsupervised spatio-temporal filtering of image sequences. A mean-shift specification, Pattern Recognition Letters, 68:P1, (48-55), Online publication date: 15-Dec-2015.
- Ketu S, Prasad B and Agarwal S Effect of Corpus Size Selection on Performance of Map-Reduce Based Distributed K-Means for Big Textual Data Clustering Proceedings of the Sixth International Conference on Computer and Communication Technology 2015, (256-260)
- Chrysouli C and Tefas A (2015). Spectral clustering and semi-supervised learning using evolving similarity graphs, Applied Soft Computing, 34:C, (625-637), Online publication date: 1-Sep-2015.
- Ravindra P, Gupta R and Anyanwu K Shared execution of clustering tasks Proceedings of the 4th International Conference on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications - Volume 41, (81-96)
- Begum N, Ulanova L, Wang J and Keogh E Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (49-58)
- Loyola-González O, Martínez-Trinidad J, Carrasco-Ochoa J and García-Borroto M Correlation of Resampling Methods for Contrast Pattern Based Classifiers Proceedings of the 7th Mexican Conference on Pattern Recognition - Volume 9116, (93-102)
- Ferrari D and de Castro L (2015). Clustering algorithm selection by meta-learning systems, Information Sciences: an International Journal, 301:C, (181-194), Online publication date: 20-Apr-2015.
- Higgs B and Abbas M (2015). Segmentation and Clustering of Car-Following Behavior: Recognition of Driving Patterns, IEEE Transactions on Intelligent Transportation Systems, 16:1, (81-90), Online publication date: 1-Feb-2015.
- Chaddad A (2015). Automated feature extraction in brain tumor by magnetic resonance imaging using Gaussian mixture models, Journal of Biomedical Imaging, 2015, (8-8), Online publication date: 1-Jan-2015.
- Shao Y, Luo X, Qian C, Zhu P and Zhang L Towards a scalable resource-driven approach for detecting repackaged Android applications Proceedings of the 30th Annual Computer Security Applications Conference, (56-65)
- Wang W, Guyet T, Quiniou R, Cordier M, Masseglia F and Zhang X (2014). Autonomic intrusion detection, Knowledge-Based Systems, 70:C, (103-117), Online publication date: 1-Nov-2014.
- So-In C, Poolsanguan S and Rujirakul K (2014). A hybrid mobile environmental and population density management system for smart poultry farms, Computers and Electronics in Agriculture, 109:C, (287-301), Online publication date: 1-Nov-2014.
- Aggarwal C The setwise stream classification problem Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, (432-441)
- Aggarwal C and Subbian K (2014). Evolutionary Network Analysis, ACM Computing Surveys, 47:1, (1-36), Online publication date: 1-Jul-2014.
- Aggarwal C (2013). Outlier ensembles, ACM SIGKDD Explorations Newsletter, 14:2, (49-58), Online publication date: 30-Apr-2013.
- Aggarwal C (2013). On the equivalence of PLSI and projected clustering, ACM SIGMOD Record, 41:4, (45-50), Online publication date: 17-Jan-2013.
Recommendations
On cluster tree for nested and multi-density data clustering
Clustering is one of the important data mining tasks. Nested clusters or clusters of multi-density are very prevalent in data sets. In this paper, we develop a hierarchical clustering approach, a cluster tree, to determine such cluster structure and ...
Data clustering: 50 years beyond K-means
Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into a system of ranked taxa: domain, kingdom, phylum, class, etc. ...
Improved k-means clustering algorithm for two dimensional data
CCSEIT '12: Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology
Clustering is a procedure of organizing the objects in groups whose members exhibit some kind of similarity. So a cluster is a collection of objects which are alike and are different from the objects belonging to other clusters. K-Means is one of ...