An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence

A. V. Kel’manov^1,2,
S. A. Khamidullin¹,
V. I. Khandeev^1,2 &
…
A. V. Pyatkin^1,2

37 Accesses
Explore all metrics

Abstract

We analyze mathematical aspects of one of the fundamental data analysis problems consisting in the search (selection) for the subset with the largest number of similar elements among a collection of objects. In particular, the problem appears in connection with the analysis of data in the form of time series (discrete signals). One of the problems in modeling this challenge is considered, namely, the problem of finding the cluster of the largest size (cardinality) in a 2-partition of a finite sequence of points in Euclidean space into two clusters (subsequences) under two constraints. The first constraint is on the choice of the indices of elements included in the clusters. This constraint simulates the set of time-admissible configurations of similar elements in the observed discrete signal. The second constraint is imposed on the value of the quadratic clustering function. This constraint simulates the level of intracluster proximity of objects. The clustering function under the second constraint is the sum (over both clusters) of the intracluster sums of squared distances between the cluster elements and its center. The center of one of the clusters is unknown and defined as the centroid (the arithmetic mean over all elements of this cluster). The center of the other cluster is the origin. Under the first constraint, the difference between any two subsequent indices of elements contained in a cluster with an unknown center is bounded above and below by some constants. It is established in the paper that the optimization problem under consideration, which models one of the simplest significant problems of data analysis, is strongly NP-hard. We propose an exact algorithm for the case of a problem with integer coordinates of its input points. If the dimension of the space is bounded by a constant, then the algorithm is pseudopolynomial.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem

An Approximation Algorithm for a Problem of Partitioning a Sequence into Clusters with Constraints on Their Cardinalities

Article 01 December 2017

Exact Algorithm for One Cardinality-Weighted 2-Partitioning Problem of a Sequence

References

J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability (Univ. of California Press, Berkeley, 1967), Vol. 1, pp. 281–297.
Google Scholar
M. Rao, “Cluster analysis and mathematical programming,” J. Amer. Stat. Assoc. 66, 622–626 (1971).
Article MATH Google Scholar
P. Hansen, B. Jaumard, and N. Mladenović, “Minimum sum of squares clustering in a low dimensional space,” J. Classif. 15 (1), 37–55 (1998).
Article MathSciNet MATH Google Scholar
P. Hansen and B. Jaumard, “Cluster analysis and mathematical programming,” Math. Program. 79 (1–3), 191–215 (1997).
MathSciNet MATH Google Scholar
R. A. Fisher, Statistical Methods and Scientific Inference (Hafner Press, New York, 1956).
MATH Google Scholar
D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP–hardness of Euclidean sum–of–squares clustering,” Mach. Learn. 75 (2), 245–248 (2009).
Article MATH Google Scholar
P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, “Clustering large graphs via the singular value decomposition,” Mach. Learn. 56 (1–3), 9–33 (2004).
Article MATH Google Scholar
A. V. Dolgushev and A. V. Kel’manov, “On the algorithmic complexity of a problem in cluster analysis,” J. Appl. Ind. Math. 5 (2), 191–194 (2011).
Article MathSciNet Google Scholar
M. Mahajan, P. Nimbhorkar, and K. Varadarajan, “The planar k–means problem is NP–hard,” Theor. Comput. Sci. 442, 13–21 (2012).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and S. A. Khamidullin. “An approximating polynomial algorithm for a sequence partitioning problem,” J. Appl. Ind. Math. 8 (2), 236–244 (2014).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “Exact pseudopolynomial algorithm for one sequence partitioning problem,” Autom. Remote Control, 78 (1), 66–73 (2017).
Article MATH Google Scholar
A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A fully polynomial–time approximation scheme for a sequence 2–cluster partitioning problem,” J. Appl. Ind. Math. 10 (2), 209–219 (2016).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A randomized algorithm for a sequence 2–clustering problem,” Comput. Math. and Math. Phys. (accepted) (2018).
Google Scholar
A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A randomized algorithm for 2–partition of a sequence,” in Analysis of Images, Social Networks and Texts, AIST 2017, Ed. by W. M. P. van der Aalst, Lecture Notes in Computer Science (Springer, Cham, 2018), Vol. 10716, pp. 297–306.
C. M. Bishop, Pattern Recognition and Machine Learning (Springer Science + Business Media, New York, 2006).
MATH Google Scholar
G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Application in R (Springer Science + Business Media, New York, 2013).
Book MATH Google Scholar
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York, 2009).
Book MATH Google Scholar
C. C. Aggarwal, Data Mining: The Textbook (Springer International Publishing, Cham, 2015).
MATH Google Scholar
I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, in Adaptive Computation and Machine Learning series (MIT Press, Cambridge, MA, 2016).
MATH Google Scholar
A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, “Big data clustering: A review,” in Computational Science and Its Applications—ICCSA 2014, Ed. by B. Murgante, Lecture Notes in Computer Science (Springer, Cham, 2014), Vol. 8583, pp. 707–720.
Google Scholar
A. K. Jain, “Data clustering: 50 years beyond–means,” Pattern Recogn. Lett. 31 (8), 651–666 (2010).
Article Google Scholar
J. Pach and P. K. Agarwal, Combinatorial Geometry (Wiley, New York, 1995).
Book MATH Google Scholar
T.–C. Fu, “A review on time series data mining,” Eng. Appl. Artif. Intell. 24 (1), 164–181 (2011).
Article Google Scholar
C. Künzer, S. Dech, and W. Wagner (Eds.), Remote Sensing Time Series, in Remote Sensing and Digital Image Processing (Springer, Cham, 2015), Vol. 22.
T. W. Liao, “Clustering of time series data–A survey,” Pattern Recogn. 38 (11), 1857–1874 (2005).
Article MATH Google Scholar
A. V. Kel’manov and A. V. Pyatkin, “On the complexity of a search for a subset of “similar” vectors,” Dokl. Math. 78 (1), 574–575 (2008).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and A. V. Pyatkin, “On a version of the problem of choosing a vector subset,” J. Appl. Ind. Math. 3 (4), 447–455 (2009).
Article MathSciNet Google Scholar
A. V. Kel’manov and V. I. Khandeev, “A 2–approximation polynomial algorithm for a clustering problem,” J. Appl. Ind. Math. 7 (4), 515–521 (2013).
Article MathSciNet MATH Google Scholar
E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence,” Sib. Zh. Ind. Mat. 9 (1(25)), 55–74 (2006) [in Russian].
Google Scholar
E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detecting a quasiperiodic fragment in a numerical sequence,” Pattern Recogn. Image Anal. 18 (1), 30–42 (2008).
Article MATH Google Scholar
A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight,” J. Appl. Ind. Math. 2 (1), 32–38 (2008).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and V. I. Khandeev, “An exact pseudopolynomial algorithm for a problem of the two–cluster partitioning of a set of vectors,” J. Appl. Ind. Math. 9 (4), 497–502 (2015).
Article MathSciNet MATH Google Scholar
E. Kh. Gimadi, A. V. Pyatkin, and I. A. Rykov, “On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension,” J. Appl. Ind. Math. 4 (1), 48–53 (2010).
Article MathSciNet Google Scholar
V. V. Shenmaier, “Solving some vector subset problems by Voronoi diagrams,” J. Appl. Ind. Math. 10 (4), 560–566 (2016).
Article MathSciNet MATH Google Scholar
A. V. Dolgushev and A. V. Kel’manov, “An approximation algorithm for solving a problem of cluster analysis,” J. Appl. Ind. Math. 5 (4), 551–558 (2011).
Article MathSciNet MATH Google Scholar
A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, “Polynomial–time approximation scheme for a problem of partitioning a finite set into two clusters,” Proc. Steklov Inst. Math. 295 (Suppl. 1), S47–S56 (2016).
A. V. Kel’manov and V. I. Khandeev, “Fully polynomial–time approximation scheme for a special case of a quadratic Euclidean 2–clustering problem,” Comput. Math. Math. Phys. 56 (2), 334–341 (2016).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov, A. V. Motkova, and V. V. Shenmaier, “An approximation scheme for a weighted two–cluster partition problem,” in Analysis of Images, Social Networks and Texts, AIST 2017, Ed. by W. M. P. van der Aalst, Lecture Notes in Computer Science (Springer, Cham, 2018), Vol. 10716, pp. 323–333.
Google Scholar
A. V. Kel’manov and V. I. Khandeev, “A randomized algorithm for two–cluster partition of a set of vectors,” Comput. Math. Math. Phys. 55 (2), 330–339 (2015).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov, V. I. Khandeev, and A. V. Panasenko, “Exact algorithms for the special cases of two hard to solve problems of searching for the largest subset,” in Proc. 7th Intern. Conf. on Analysis of Images, Social Networks, and Texts (AIST 2018), Moscow, Russia, July 5–7, 2018, Lecture Notes in Computer Science (accepted) (Springer, Cham, 2019).
Google Scholar
A. V. Kel’manov, V. I. Khandeev, and A. V. Panasenko, “Exact algorithms for two hard to solve 2–clustering problems,” Pattern Recogn. Image Anal. (accepted) (2019).
Google Scholar
A. V. Kel’manov and A. V. Pyatkin, “On complexity of some problems of cluster analysis of vector sequences,” J. Appl. Ind. Math. 7 (3), 363–369 (2013).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and A. V. Khamidullin, “An approximation polynomial–time algorithm for a sequence biclustering problem,” Comput. Math. Math. Phys. 55 (6), 1068–1076 (2015).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and A. V. Pyatkin, “NP–completeness of some problems of choosing a vector subset,” J. Appl. Ind. Math. 5 (3), 352–357 (2011).
Article MathSciNet Google Scholar
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP–Completeness (W. H. Freeman and Co., San Francisco, 1979).
MATH Google Scholar
A. V. Kel’manov and S. A. Khamidullin, “An approximating polynomial algorithm for a sequence partitioning problem,” J. Appl. Ind. Math. 8 (2), 236–244 (2014).
Article MathSciNet MATH Google Scholar
A. V. Kel’manov and S. A. Khamidullin, “A posteriori detection of a given number of identical subsequences in a quasiperiodic sequence,” Comput. Math. Math. Phys. 41 (5), 762–774 (2001).
MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Sobolev Institute of Mathematics, Siberian Branch, Russian Academy of Sciences, pr. Akad. Koptyuga 4, Novosibirsk, Russia
A. V. Kel’manov, S. A. Khamidullin, V. I. Khandeev & A. V. Pyatkin
Novosibirsk State University, ul. Pirogova 2, Novosibirsk, Russia
A. V. Kel’manov, V. I. Khandeev & A. V. Pyatkin

Authors

A. V. Kel’manov
View author publications
You can also search for this author in PubMed Google Scholar
S. A. Khamidullin
View author publications
You can also search for this author in PubMed Google Scholar
V. I. Khandeev
View author publications
You can also search for this author in PubMed Google Scholar
A. V. Pyatkin
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to A. V. Kel’manov.

Additional information

Sergei Asgadullovich Khamidullin. Born 1952. Graduated from Novosibirsk State University in 1974 with specialty in Physics and Applied Mathematics. Received Candidate’s Degree in Engineering Cybernetics and Information Theory in 1997. Currently with the Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Senior research scientist in Data Analysis Laboratory. Scientific interests: mathematical methods and computer technologies for data analysis and pattern recognition, discrete optimization. Author of more than 70 publications.

Aleksandr Vasil’evich Kel’manov. Born 1952. Graduated from Izhevsk State Technical University in 1974 with specialty in Applied Mathematics. Received Candidate’s Degree in Engineering Cybernetics and Information Theory in 1980 and Doctor of Sciences degree in Physics and Mathematics in 1994. Currently with the Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Head of Data Analysis Laboratory. Scientific interests: data analysis, data mining, pattern recognition, clusterization, discrete optimization, NP-hard problems, efficient algorithms with performance guarantees. Author of more than 200 publications.

Vladimir Il’ich Khandeev. Born 1991. Graduated from Novosibirsk State University in 2014 with specialty in Applied Mathematics and Computer Science. Received Candidate’s Degree in Physics and Mathematics in 2017. Currently with the Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, Research scientist in Data Analysis Laboratory. Scientific interests: data mining, pattern recognition, clusterization, discrete optimization, NP-hard problems, efficient algorithms with performance guarantees. Author of 12 publications.

Artem Valer’evich Pyatkin. Born 1973. Graduated from Novosibirsk State University in 1996 with specialty in Mathematics. Received Candidate’s Degree in Physics and Mathematics in 1999 and Doctor of Sciences degree in Physics and Mathematics in 2009. Currently with the Sobolev Institute of Mathematics, Siberian Branch of the Russian Academy of Sciences, head of Discrete Optimization in Operations Research Laboratory. Scientific interests: graph Theory, data analysis, pattern recognition, clustering, discrete optimization, NP-hard problems. Author of more than 60 publications.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Kel’manov, A.V., Khamidullin, S.A., Khandeev, V.I. et al. An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence. Pattern Recognit. Image Anal. 28, 703–711 (2018). https://doi.org/10.1134/S105466181804017X

Download citation

Received: 01 June 2018
Published: 06 December 2018
Issue Date: October 2018
DOI: https://doi.org/10.1134/S105466181804017X

An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem

An Approximation Algorithm for a Problem of Partitioning a Sequence into Clusters with Constraints on Their Cardinalities

Exact Algorithm for One Cardinality-Weighted 2-Partitioning Problem of a Sequence

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Keywords

Subscribe and save

Buy Now

Navigation

An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem

An Approximation Algorithm for a Problem of Partitioning a Sequence into Clusters with Constraints on Their Cardinalities

Exact Algorithm for One Cardinality-Weighted 2-Partitioning Problem of a Sequence

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now

Search

Navigation