Computer Science > Data Structures and Algorithms

arXiv:2212.01821 (cs)

[Submitted on 4 Dec 2022]

Title: Clustering Permutations: New Techniques with Streaming Applications

Authors: Diptarka Chakraborty, Debarati Das, Robert Krauthgamer

Abstract: We study the classical metric $k$-median clustering problem over a set of input rankings (i.e., permutations), which has myriad applications, from social-choice theory to web search and databases. A folklore algorithm provides a $2$-approximate solution in polynomial time for all $k=O(1)$, and works irrespective of the underlying distance measure, so long as it is a metric; however, going below the $2$-factor is a notorious challenge. We consider the Ulam distance, a variant of the well-known edit-distance metric, where strings are restricted to be permutations. For this metric, Chakraborty, Das, and Krauthgamer [SODA, 2021] provided a $(2-\delta)$-approximation algorithm for $k=1$, where $\delta\approx 2^{-40}$.
Our primary contribution is a new algorithmic framework for clustering a set of permutations. Our first result is a $1.999$-approximation algorithm for the metric $k$-median problem under the Ulam metric, which runs in time $(k \log (nd))^{O(k)}n d^3$ for an input consisting of $n$ permutations over $[d]$. In fact, our framework is powerful enough to extend this result to the streaming model (where the $n$ input permutations arrive one by one) using only polylogarithmic (in $n$) space. Additionally, we show that similar results can be obtained even in the presence of outliers, which is presumably a more difficult problem.
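
For readers unfamiliar with the objects in the abstract, the following is a minimal sketch, not the paper's algorithm. It assumes the common definition of the Ulam distance between two permutations of the same $d$ symbols as $d$ minus the length of their longest common subsequence, and it illustrates the folklore $2$-approximation only for the case $k=1$, where the baseline is taken to be "return the input permutation with the smallest total distance to all inputs". The function names `ulam_distance` and `best_input_median` are hypothetical and chosen for illustration.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def ulam_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """Return d - LCS(x, y) for two permutations of the same d symbols.

    Assumption: this is the definition of Ulam distance being used.
    Relabeling x by positions in y turns the LCS into a longest
    increasing subsequence, computed here in O(d log d).
    """
    if len(x) != len(y) or len(set(x)) != len(x) or set(x) != set(y):
        raise ValueError("inputs must be permutations of the same symbols")
    position_in_y = {symbol: i for i, symbol in enumerate(y)}
    relabeled = [position_in_y[symbol] for symbol in x]
    tails: List[int] = []  # tails[j] = smallest tail of an increasing run of length j + 1
    for value in relabeled:
        j = bisect_left(tails, value)
        if j == len(tails):
            tails.append(value)
        else:
            tails[j] = value
    return len(x) - len(tails)


def best_input_median(perms: Sequence[Sequence[int]]) -> Tuple[List[int], int]:
    """Baseline for k = 1: the input permutation minimizing the sum of distances.

    By the triangle inequality its cost is at most twice the optimal
    1-median cost; this holds for any metric, not only Ulam.
    """
    if not perms:
        raise ValueError("need at least one permutation")
    best, best_cost = None, None
    for candidate in perms:
        cost = sum(ulam_distance(candidate, p) for p in perms)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return list(best), best_cost
```

The baseline costs $O(n^2)$ distance evaluations; the abstract's contribution is precisely about beating the factor $2$ that this simple approach achieves, which the sketch does not attempt.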

Subjects:	Data Structures and Algorithms (cs.DS)
ACM classes:	F.2.0
Cite as:	arXiv:2212.01821 [cs.DS]
	(or arXiv:2212.01821v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2212.01821

Submission history

From: Diptarka Chakraborty
[v1] Sun, 4 Dec 2022 13:48:02 UTC (36 KB)
