miniselect
is a C++ header-only library that contains various generic selection
and partial sorting algorithms with the ease of use, testing, advice on usage and
benchmarking.
Sorting is everywhere and there are many outstanding sorting algorithms that compete in speed, comparison count and cache friendliness. However, selection algorithms are always a bit outside of the competition scope, they are pretty important, for example, in databases ORDER BY LIMIT N is used extremely often which can benefit from more optimal selection and partial sorting algorithms. This library tries to solve this problem with Modern C++.
std::nth_element
and std::partial_sort
functions including custom comparators and order guarantees. Just replace the names of the functions in your project and it should work!You can either include this project as a cmake dependency and then use the headers that are provided in the include folder or just pass the include folder to your compiler.
#include <iostream>
#include <vector>
#include "miniselect/median_of_ninthers.h"
int main() {
std::vector<int> v = {1, 8, 4, 3, 2, 9, 0, 7, 6, 5};
miniselect::median_of_ninthers_select(v.begin(), v.begin() + 5, v.end());
for (const int i : v) {
std::cout << i << ' ';
}
return 0;
}
// Compile it `clang++/g++ -I$DIRECTORY/miniselect/include/ example.cpp -std=c++11 -O3 -o example
// Possible output: 0 1 4 3 2 5 8 7 6 9
// ^ on the right place
Examples can be found in examples.
We support all compilers starting from GCC 7 and Clang 6. We are also planning to support Windows, for now it is best effort but no issues are known so far.
More on which algorithms are available, see documentation. For overview of this work you can read the article in the author's blog.
To test and benchmark, we use Google benchmark library. Simply do in the root directory:
# Check out the libraries.
$ git clone https://github.com/google/benchmark.git
$ git clone https://github.com/google/googletest.git
$ mkdir build && cd build
$ cmake -DMINISELECT_TESTING=on -DBENCHMARK_ENABLE_GTEST_TESTS=off -DBENCHMARK_ENABLE_TESTING=off ..
$ make -j
$ ctest -j4 --output-on-failure
It will create two tests and two benchmarks test_sort
, test_select
,
benchmark_sort
, benchmark_select
. Use them to validate or contribute. You
can also use ctest
.
There are several selection algorithms available, further is the number of elements in the array, is the selection element that is needed to be found (all algorithms are deterministic and not stable unless otherwise is specified):
Name | Average | Best Case | Worst Case | Comparisons | Memory |
---|---|---|---|---|---|
pdqselect | At least . Random data | ||||
Floyd-Rivest | Avg: | ||||
Median Of Medians | Between and . Random data | ||||
Median Of Ninthers | Between and . Random data | ||||
Median Of 3 Random | At least . Random data | ||||
HeapSelect | on average, for some data patterns might be better | ||||
libstdc++ (introselect) | At least . Random data | ||||
libc++ (median of 3) | At least . Random data |
For sorting the situation is similar except every line adds comparisons and pdqselect is using memory.
All functions end either in select
, either in partial_sort
and
their behavior is exactly the same as for
std::nth_element
and std::partial_sort
respectively, i.e. they accept 3 arguments as first
, middle
, end
iterators
and an optional comparator. Several notes:
Compare
function. Standard library
also does not specify the behavior in that matter.All functions are in the miniselect
namespace. See the example for that.
pdqselect
pdqsort
which is acknowledged as one of the fastest generic sort algorithms.miniselect/pdqselect.h
.pdqselect
, pdqselect_branchless
, pdqpartial_sort
, pdqpartial_sort_branchless
. Branchless version uses branchless partition algorithm provided by pdqsort
. Use it if your comparison function is branchless, it might give performance for very big ranges.miniselect/floyd_rivest_select.h
.floyd_rivest_select
, floyd_rivest_partial_sort
.We present here two gifs, for median and for order statistic.
miniselect/median_of_medians.h
.median_of_medians_select
, median_of_medians_partial_sort
.miniselect/median_of_ninthers.h
.median_of_ninthers_select
, median_of_ninthers_partial_sort
.PICK
algorithms and has lower constanst than MedianOfMedians.miniselect/median_of_3_random.h
.median_of_3_random_select
, median_of_3_random_partial_sort
.std::nth_element
, however instead of falling back to MedianOfMedians it is using HeapSelect which adds logarithm to its worst complexity.<algorithm>
.std::nth_element
.std::nth_element
.<algorithm>
.std::nth_element
.std::partial_sort
or HeapSelect
<algorithm>
, miniselect/heap_select.h
.std::partial_sort
, heap_select
, heap_partial_sort
.We use 10 datasets and 8 algorithms with 10000000 elements to find median and
other on Intel(R) Core(TM) i5-4200H CPU @ 2.80GHz
for std::vector<int>
,
for median the benchmarks are the following:
For smaller , for example, 1000, the results are the following
Other benchmarks can be found here.
If you are planning to use miniselect in your product, please work from one of our releases and if you wish, you can write the acknowledgment in this section for visibility.
Patches are welcome with new algorithms! You should add the selection algorithm together with the partial sorting algorithm in include, add tests in testing and ideally run benchmarks to see how it performs. If you also have some data cases to test against, we would be more than happy to merge them.
Firstly the author was interested if any research had been done for small in selection algorithms and was struggling to find working implementations to compare different approaches from standard library and quickselect algorithms. After that it turned out that the problem is much more interesting than it looks like and after reading The Art of Computer Programming from Donald Knuth about minimum comparison sorting and selection algorithms the author decided to look through all non-popular algorithms and try them out.
The author have not found any decent library for selection algorithms and little research is published in open source, so that they decided to merge all that implementations and compare them with possible merging of different ideas into a decent one algorithm for most needs. For a big story of adventures see the author's blog post.
The code is made available under the Boost License 1.0.
Library | License |
---|---|
pdqsort | MIT |
MedianOfNinthers | Boost License 1.0 |