Computer Science > Data Structures and Algorithms

arXiv:2208.10298 (cs)

[Submitted on 22 Aug 2022 (v1), last revised 28 Mar 2023 (this version, v2)]

Title: Approximate sorting and its application in I/O model


Abstract: This paper considers approximate sorting for big data. The goal is to produce an approximately sorted result while using less CPU and I/O cost. For big data, we study approximate sorting in the I/O model. Existing metrics on permutation space are not applicable to external approximate sorting algorithms. We therefore propose a new kind of metric, the external metric, which ignores the errors and dislocation that occur in each I/O. The external Spearman's footrule metric is the external metric corresponding to Spearman's footrule metric. Furthermore, to better evaluate an approximately sorted result, we propose a new metric, named errors, which directly states the number of dislocated elements. Its external counterpart, external errors, is also considered in this paper. Then, using the rate-distortion relationship endowed by these two metrics, we prove lower bounds on both metrics for the external approximate sorting problem with t I/O operations. We propose a k-pass external approximate sorting algorithm, named EASORT, and prove that EASORT is asymptotically optimal. Finally, we consider applications of approximately sorted results. We propose an index for the result of our approximate sorting and analyze single and range queries on the approximately sorted result using this index. We also discuss the sort-merge join on two relations where either one or both relations are approximately sorted.
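To make the metrics named in the abstract concrete, here is a minimal Python sketch. Spearman's footrule is the standard sum of absolute differences between each element's current position and its position in sorted order. The rest is assumed for illustration and is not taken from the paper: `errors` is read as the count of elements that are not at their sorted position, and the "external" variants are modelled by comparing block indices (`position // block_size`) instead of positions, so that dislocation inside one block is ignored. The function names and the `block_size` parameter are hypothetical, and the paper's formal definitions may differ.

```python
from typing import List, Sequence


def ranks(seq: Sequence[int]) -> List[int]:
    """Return, for each position i, the position seq[i] would occupy after sorting.

    Ties are broken by original position, so the result is always a permutation.
    """
    order = sorted(range(len(seq)), key=lambda i: (seq[i], i))
    rank = [0] * len(seq)
    for sorted_pos, original_pos in enumerate(order):
        rank[original_pos] = sorted_pos
    return rank


def footrule(seq: Sequence[int], block_size: int = 1) -> int:
    """Spearman's footrule distance between seq and its sorted order.

    With block_size > 1, positions are replaced by block indices. This is an
    ASSUMED reading of the external variant, not the paper's definition.
    """
    return sum(
        abs(i // block_size - r // block_size)
        for i, r in enumerate(ranks(seq))
    )


def errors(seq: Sequence[int], block_size: int = 1) -> int:
    """Count elements that are not in their sorted position.

    With block_size > 1, an element only counts if it sits in the wrong block
    (ASSUMED reading of the external variant).
    """
    return sum(
        1
        for i, r in enumerate(ranks(seq))
        if i // block_size != r // block_size
    )
```

Under these assumptions, a sequence such as `[2, 1, 3, 4, 6, 5, 8, 7]` has nonzero footrule and errors values at `block_size=1`, but both drop to zero at `block_size=2`, because every misplaced element is still inside the block it belongs to.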

Subjects:	Data Structures and Algorithms (cs.DS); Databases (cs.DB); Information Theory (cs.IT)
Cite as:	arXiv:2208.10298 [cs.DS]
	(or arXiv:2208.10298v2 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.2208.10298

Submission history

From: Tianpeng Gao
[v1] Mon, 22 Aug 2022 13:25:20 UTC (16 KB)
[v2] Tue, 28 Mar 2023 10:54:52 UTC (866 KB)
