Abstract
We consider the problem of approximate sorting of a data stream (in one pass) with limited internal storage where the goal is not to rearrange data but to output a permutation that reflects the ordering of the elements of the data stream as closely as possible. Our main objective is to study the relationship between the quality of the sorting and the amount of available storage. To measure quality, we use permutation distortion metrics, namely the Kendall tau and Chebyshev metrics, as well as mutual information, between the output permutation and the true ordering of data elements. We provide bounds on the performance of algorithms with limited storage and present a simple algorithm that asymptotically requires a constant factor as much storage as an optimal algorithm in terms of mutual information and average Kendall tau distortion.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proc. 21st ACM Symp. Principles of Database Systems (PODS), New York, NY, USA (2002)
Chakrabarti, A., Jayram, T.S., Pǎtraşcu, M.: Tight lower bounds for selection in randomly ordered streams. In: ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 720–729. Society for Industrial and Applied Mathematics, Philadelphia (2008), http://dl.acm.org/citation.cfm?id=1347082.1347161
Chen, C.P., Qi, F.: The best lower and upper bounds of harmonic sequence. Global Journal of Applied Mathematics and Mathematical Sciences 1(1), 41–49 (2008)
Corless, R.M., Gonnet, G.H., Hare, D.E.G., Jeffrey, D.J., Knuth, D.E.: On the Lambert W function. Advances in Computational Mathematics 5(1), 329–359 (1996), http://dx.doi.org/10.1007/BF02124750
Cover, T.M., Thomas, J.A.: Elements of information theory. John Wiley & Sons (2006)
Diaconis, P.: Group Representations in Probability and Statistics, vol. 11. Institute of Mathematical Statistics (1988)
Farnoud, F., Schwartz, M., Bruck, J.: Rate-distortion for ranking with incomplete information. arXiv preprint (2014), http://arxiv.org/abs/1401.3093
Greenwald, M., Khanna, S.: Space-efficient online computation of quantile summaries. In: Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 58–66. ACM, New York (2001), http://doi.acm.org/10.1145/375663.375670
Holst, L.: On the lengths of the pieces of a stick broken at random. Journal of Applied Probability 17(3), 623–634 (1980)
Manku, G.S., Rajagopalan, S., Lindsay, B.G.: Approximate medians and other quantiles in one pass and with limited memory. In: Proc. ACM SIGMOD Int. Conf. Management of Data, pp. 426–435. ACM, New York (1998), http://doi.acm.org/10.1145/276304.276342
McGregor, A., Valiant, P.: The shifting sands algorithm. In: ACM-SIAM Symp. Discrete Algorithms (SODA), pp. 453–458. SIAM (2012), http://dl.acm.org/citation.cfm?id=2095116.2095155
Munro, J., Paterson, M.: Selection and sorting with limited storage. Theoretical Computer Science 12(3), 315–323 (1980), http://www.sciencedirect.com/science/article/pii/0304397580900614
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Farnoud (Hassanzadeh), F., Yaakobi, E., Bruck, J. (2014). Approximate Sorting of Data Streams with Limited Storage. In: Cai, Z., Zelikovsky, A., Bourgeois, A. (eds) Computing and Combinatorics. COCOON 2014. Lecture Notes in Computer Science, vol 8591. Springer, Cham. https://doi.org/10.1007/978-3-319-08783-2_40
Download citation
DOI: https://doi.org/10.1007/978-3-319-08783-2_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08782-5
Online ISBN: 978-3-319-08783-2
eBook Packages: Computer ScienceComputer Science (R0)