20837 – libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N))

LLVM Bugzilla is read-only and represents the historical archive of all LLVM issues filled before November 26, 2021. Use github to submit LLVM bugs

Bug 20837 - libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N))

Summary: libc++ std::sort has O(n^2) worst case, standard mandates O(N log(N))

Status:	RESOLVED FIXED

Alias:	None

Product:	libc++
Classification:	Unclassified
Component:	All Bugs (show other bugs)
Version:	unspecified
Hardware:	All All

Importance:	P normal
Assignee:	Marshall Clow (home)

URL:
Keywords:

Duplicates (1):	28551 (view as bug list)
Depends on:
Blocks:

Reported:	2014-09-02 13:40 PDT by Orson Peters
Modified:	2021-11-23 04:42 PST (History)
CC List:	12 users (show)

See Also:
Fixed By Commit(s):

Attachments
Evidence for quadratic worst case. (1.15 KB, text/plain) 2014-09-02 13:40 PDT, Orson Peters	Details
A patch that fixes the quadratic behaviour by implementing introsort. (3.67 KB, patch) 2014-09-02 13:41 PDT, Orson Peters	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Orson Peters 2014-09-02 13:40:12 PDT

Created attachment 12978 [details]
Evidence for quadratic worst case.

The C++ standard mandates that `std::sort` has complexity O(N log(N)), but libc++'s implementation takes O(N^2) in the worst case.

As proof I've attached a program that constructs a worst case input for several sizes. It also instruments the number of comparisons used to sort these worst cases and prints the results. The technique used is described in the 1999 paper "A Killer Adversary for Quicksort" by M. D. McIlroy.

Compiling and running:

$ clang++ -O2 -m64 -march=native -std=c++11 -stdlib=libc++ main.cpp -nodefaultlibs -lc++ -lc++abi -lm -lc -lgcc_s -lgcc && ./a.out
N: comparisons
100: 2479
200: 10229
400: 40729
800: 161729
1600: 448698
3200: 1413898
6400: 5264298

This is clearly quadratic. To illustrate this defect more, these are the comparison counts given by compiling using libstdc++:

$ clang++ -O2 -m64 -march=native -std=c++11 main.cpp && ./a.out
N: comparisons
100: 1742
200: 4260
400: 10035
800: 22784
1600: 51049
3200: 112386
6400: 244850

Inspecting the source of <algorithm> shows the cause of the issue: there is no introsort implemented, only quicksort (albeit with non-trivial optimizations, but nothing preventing the worst case). I've written a patch that augments the current implementation to make it work as an introspective sort, switching to heapsort if the recursion depth exceeds 2*floor(log(N)). This post can only have one attachment, so I'll post it as an attachment to a comment.

Having not contributed to libc++ before I'm not 100% familiar with all coding styles/(un)written rules. My changes are rather minimal though, so I've just followed what style could be found in context. And of course I knowingly submit my code under the libc++ licenses, the MIT license and the UIUC License.

Comment 1 Orson Peters 2014-09-02 13:41:28 PDT

Created attachment 12979 [details]
A patch that fixes the quadratic behaviour by implementing introsort.

This is the patch my bug report refers to.

Comment 2 Hal Finkel 2014-09-02 13:46:05 PDT

Hi Orson,

Please mail this patch to the cfe-commits list for review. Please make sure the patch is sent as an attachment to the e-mail (not inline text), and please make sure the repeat the background information you've included here in the e-mail itself.

For more information on the development policy and workflow, please see: http://llvm.org/docs/DeveloperPolicy.html

Thanks!

Comment 3 Orson Peters 2014-09-02 13:59:57 PDT

Hi Hal,

I've sent the patch to the mailing list, where it is waiting for moderator approval since I'm not a member of the list.


Greetings,

Orson Peters

Comment 4 Marshall Clow (home) 2015-03-03 08:33:19 PST

An update. EricWF is working on a set of benchmarks for std::sort and other algorithms/containers. He expects to finish this quite soon, and I'll be using that to make sure this doesn't introduce a performance regression - and will be adding this to the test suite.

So I haven't forgotten about it.

Comment 5 Hal Finkel 2016-07-01 09:58:31 PDT

(In reply to comment #4)
> An update. EricWF is working on a set of benchmarks for std::sort and other
> algorithms/containers. He expects to finish this quite soon, and I'll be
> using that to make sure this doesn't introduce a performance regression -
> and will be adding this to the test suite.
> 
> So I haven't forgotten about it.

What's the status on this?

Comment 6 Sebastian Pop 2016-07-14 13:11:40 PDT

I think std::sort() is still broken in trunk as of today.  std::sort() should not call the compare function on the same element: it cannot be O(N log N) if it does.

Related bug: https://llvm.org/bugs/show_bug.cgi?id=28551

Reduced testcase:
$ cat a.cc
#include <algorithm>
#include <vector>
#include <cassert>

int main(int argc, char** argv) {
  int size = 100;
  std::vector<int> v(size);

  // Elements in v are unique.
  for (int i = 0; i < size; ++i)
    v[i] = i;

  std::sort(v.begin(), v.end(),
            [&](int x, int y) { assert(x != y); return x < y; });

  return 0;
}
$ clang++ -std=c++11 -stdlib=libc++ a.cc -o a.out
$ ./a.out
a.out: a.cc:14: auto main(int, char **)::(anonymous class)::operator()(int, int) const: Assertion `x != y' failed.
./go.sh: line 5: 19447 Aborted                 (core dumped) ./a.out

$ clang++ -std=c++11 -stdlib=libstdc++ a.cc -o a.out
$ ./a.out
Works fine with libstdc++.

Comment 7 Sebastian Pop 2016-07-14 13:25:53 PDT

*** Bug 28551 has been marked as a duplicate of this bug. ***

Comment 8 Eric Fiselier 2016-07-18 01:58:16 PDT

I would like to point out that our std::sort implementation is between 2x and 5x faster on already sorted inputs than libstdc++ is.

Some benchmark results:
Run on (4 X 4228.3 MHz CPU s)

// libc++ //
Benchmark                                       Time           CPU Iterations
-----------------------------------------------------------------------------
BM_Sort/random_uint32/1024                 170614 ns     174629 ns       4375
BM_Sort/sorted_ascending_uint32/1024        12902 ns      13276 ns      53030
BM_Sort/sorted_descending_uint32/1024       20919 ns      21712 ns      29661
BM_Sort/single_element_uint32/1024          13747 ns      14384 ns      60345
BM_Sort/random_strings/1024               1097028 ns     998514 ns        673
BM_Sort/sorted_ascending_strings/1024      233688 ns     208469 ns       3070
BM_Sort/sorted_descending_strings/1024     341642 ns     361934 ns       1923
BM_Sort/single_element_strings/1024       1081605 ns    1126143 ns        547


// libstdc++ //
Benchmark                                       Time           CPU Iterations
-----------------------------------------------------------------------------
BM_Sort/random_uint32/1024                 156709 ns     161200 ns       4268
BM_Sort/sorted_ascending_uint32/1024        49431 ns      53600 ns      10000
BM_Sort/sorted_descending_uint32/1024       42401 ns      41201 ns      16990
BM_Sort/single_element_uint32/1024          32115 ns      30892 ns      20588
BM_Sort/random_strings/1024                878349 ns     796434 ns        673
BM_Sort/sorted_ascending_strings/1024      576253 ns     532000 ns       1000
BM_Sort/sorted_descending_strings/1024     506059 ns     368390 ns       1683
BM_Sort/single_element_strings/1024       2453953 ns    2542751 ns        269

The benchmarks used can be found here: https://reviews.llvm.org/D22240

Comment 9 Orson Peters 2016-07-18 06:57:41 PDT

Eric Fiselier, can you run that same benchmark with the 'block' branch of my pattern-defeating quicksort algorithm? https://github.com/orlp/pdqsort/tree/block

I've been waiting for a while now on this benchmark suite to start discussion of integrating the algorithm to libc++, I wasn't aware you finished the benchmark.

Comment 10 Sebastian Pop 2016-07-19 10:26:52 PDT

(In reply to comment #8)
> Benchmark                                       Time           CPU Iterations
> -----------------------------------------------------------------------------
> BM_Sort/random_uint32/1024                 170614 ns     174629 ns       4375
> BM_Sort/sorted_ascending_uint32/1024        12902 ns      13276 ns      53030
[...]

Could you please measure the number of instructions executed, data reads, and data writes, instead of the number of nano seconds of execution?  I think those are a more stable metrics than the execution time that may vary with the state of the machine, daemons, noise, etc.

$ valgrind --tool=cachegrind ./a.out
$ cg_annotate cachegrind.out

Comment 11 Alexander Zaitsev 2018-03-03 03:40:03 PST

I double Orson's request about using pdqsort as internal algorithm for std::sort. You can see here, how pdqsort is fast (https://github.com/orlp/pdqsort).

Comment 12 Danila Kutenin 2021-11-23 04:42:33 PST

Fixed with https://reviews.llvm.org/D113413, should land in llvm 14



If you have stability sorting guarantees that you cannot fix and need migration, use -D_LIBCPP_DEBUG_RANDOMIZE_UNSPECIFIED_STABILITY or -D_LIBCPP_DEBUG=1 from  https://reviews.llvm.org/D96946 to randomize the input range before sorting