BlockQuicksort: Avoiding Branch Mispredictions in Quicksort

Published: 30 January 2019

Abstract

It is well known that Quicksort, which is commonly considered one of the fastest in-place sorting algorithms, suffers in an essential way from branch mispredictions. We present a novel approach to addressing this problem by partially decoupling control from data flow: in order to perform the partitioning, we split the input into blocks of constant size. Then, all elements in one block are compared with the pivot and the outcomes of the comparisons are stored in a buffer. In a second pass, the respective elements are rearranged. By doing so, we avoid conditional branches based on outcomes of comparisons (except for the final Insertionsort). Moreover, we prove that when sorting n elements, the average total number of branch mispredictions is at most ϵn log n + O(n) for some small ϵ depending on the block size.
Our experimental results are promising: when sorting random-integer data, we achieve an increase in speed (number of elements sorted per second) of more than 80% over the GCC implementation of Quicksort (C++ std::sort). Also, for many other types of data and non-random inputs, there is still a significant speedup over std::sort. Only in a few special cases, such as sorted or almost sorted inputs, can std::sort beat our implementation. Moreover, on random-input permutations, our implementation is even slightly faster than an implementation of the highly tuned Super Scalar Sample Sort, which uses a linear amount of additional space.
Finally, we also apply our approach to Quickselect and obtain a speed-up of more than 100% over the GCC implementation (C++ std::nth_element).
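
The two-pass partitioning step described in the abstract can be illustrated with a minimal C++ sketch. This is not the authors' implementation: the function name `block_partition`, the block size `kBlock = 64`, and the restriction to `std::vector<int>` are assumptions made for illustration, and the leftover range of at most two blocks is finished with an ordinary `std::partition` for brevity. The scan loops add the result of each comparison to a counter instead of branching on it; whether the compiler actually emits branch-free code depends on the compiler and target.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block size; the abstract only says "blocks of constant size".
constexpr std::size_t kBlock = 64;
static_assert(kBlock <= 256, "offsets are stored in unsigned char");

// Rearranges a[lo, hi) so that all elements < pivot precede all elements
// >= pivot, and returns the index of the first element >= pivot.
std::size_t block_partition(std::vector<int>& a, std::size_t lo,
                            std::size_t hi, int pivot) {
    assert(lo <= hi && hi <= a.size());
    unsigned char offL[kBlock];  // offsets of left-block elements >= pivot
    unsigned char offR[kBlock];  // offsets of right-block elements < pivot
    std::size_t numL = 0, numR = 0, startL = 0, startR = 0;
    std::size_t l = lo, r = hi;  // the unresolved range is a[l, r)

    while (r - l > 2 * kBlock) {
        if (numL == 0) {  // first pass, left block: record misplaced offsets
            startL = 0;
            for (std::size_t i = 0; i < kBlock; ++i) {
                offL[numL] = static_cast<unsigned char>(i);
                numL += !(a[l + i] < pivot);  // comparison result used as 0/1
            }
        }
        if (numR == 0) {  // first pass, right block (scanned from the end)
            startR = 0;
            for (std::size_t i = 0; i < kBlock; ++i) {
                offR[numR] = static_cast<unsigned char>(i);
                numR += (a[r - 1 - i] < pivot);
            }
        }
        // Second pass: swap as many misplaced pairs as both buffers allow.
        const std::size_t num = std::min(numL, numR);
        for (std::size_t j = 0; j < num; ++j) {
            std::swap(a[l + offL[startL + j]], a[r - 1 - offR[startR + j]]);
        }
        numL -= num; startL += num;
        numR -= num; startR += num;
        if (numL == 0) l += kBlock;  // left block fully resolved
        if (numR == 0) r -= kBlock;  // right block fully resolved
    }

    // Simplification (assumption): finish the short remainder conventionally.
    auto first = a.begin() + static_cast<std::ptrdiff_t>(l);
    auto last = a.begin() + static_cast<std::ptrdiff_t>(r);
    auto mid = std::partition(first, last,
                              [pivot](int x) { return x < pivot; });
    return static_cast<std::size_t>(mid - a.begin());
}
```

Calling `block_partition(v, 0, v.size(), p)` on a vector `v` yields the same kind of split as `std::partition` with the predicate `x < p`. A full sort would additionally need pivot selection, recursion on both sides, and the final Insertionsort mentioned in the abstract.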


Published In

ACM Journal of Experimental Algorithmics  Volume 24, Issue
Special Issue ESA 2016, Regular Papers and Special Issue SEA 2018
2019
622 pages
ISSN:1084-6654
EISSN:1084-6654
DOI:10.1145/3310279
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 January 2019
Accepted: 01 August 2018
Revised: 01 January 2018
Received: 01 May 2017
Published in JEA Volume 24

Author Tags

  1. In-place sorting
  2. Quicksort
  3. branch mispredictions

Qualifiers

  • Research-article
  • Research
  • Refereed

