robot-piglet 9 months ago
Parent
Commit
7bdac764bd

+ 16 - 7
contrib/libs/croaring/README.md

@@ -16,7 +16,7 @@ Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortu
 
 Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise.
 They are used by several major systems such as [Apache Lucene][lucene] and derivative systems such as [Solr][solr] and
-[Elasticsearch][elasticsearch], [Metamarkets' Druid][druid], [LinkedIn Pinot][pinot], [Netflix Atlas][atlas], [Apache Spark][spark], [OpenSearchServer][opensearchserver], [Cloud Torrent][cloudtorrent], [Whoosh][whoosh], [InfluxDB](https://www.influxdata.com), [Pilosa][pilosa], [Bleve](http://www.blevesearch.com), [Microsoft Visual Studio Team Services (VSTS)][vsts], and eBay's [Apache Kylin][kylin]. The CRoaring library is used in several systems such as [Apache Doris](http://doris.incubator.apache.org), [ClickHouse](https://github.com/ClickHouse/ClickHouse), and [StarRocks](https://github.com/StarRocks/starrocks). The YouTube SQL Engine, [Google Procella](https://research.google/pubs/pub48388/), uses Roaring bitmaps for indexing.
+[Elasticsearch][elasticsearch], [Metamarkets' Druid][druid], [LinkedIn Pinot][pinot], [Netflix Atlas][atlas], [Apache Spark][spark], [OpenSearchServer][opensearchserver], [Cloud Torrent][cloudtorrent], [Whoosh][whoosh], [InfluxDB](https://www.influxdata.com), [Pilosa][pilosa], [Bleve](http://www.blevesearch.com), [Microsoft Visual Studio Team Services (VSTS)][vsts], and eBay's [Apache Kylin][kylin]. The CRoaring library is used in several systems such as [Apache Doris](http://doris.incubator.apache.org), [ClickHouse](https://github.com/ClickHouse/ClickHouse), [Redpanda](https://github.com/redpanda-data/redpanda), and [StarRocks](https://github.com/StarRocks/starrocks). The YouTube SQL Engine, [Google Procella](https://research.google/pubs/pub48388/), uses Roaring bitmaps for indexing.
 
 We published a peer-reviewed article on the design and evaluation of this library:
 
@@ -147,7 +147,7 @@ Linux or macOS users might follow the following instructions if they have a rece
 # Using Roaring as a CPM dependency
 
 
-If you like CMake and CPM, you can just a few lines in you `CMakeLists.txt` file to grab a `CRoaring` release. [See our CPM demonstration for further details](https://github.com/RoaringBitmap/CPMdemo).
+If you like CMake and CPM, you can add just a few lines in your `CMakeLists.txt` file to grab a `CRoaring` release. [See our CPM demonstration for further details](https://github.com/RoaringBitmap/CPMdemo).
 
 
 
@@ -177,7 +177,7 @@ target_link_libraries(hello roaring::roaring)
 
 # Using as a CMake dependency with FetchContent
 
-If you like CMake, you can just a few lines in you `CMakeLists.txt` file to grab a `CRoaring` release. [See our demonstration for further details](https://github.com/RoaringBitmap/croaring_cmake_demo_single_file).
+If you like CMake, you can add just a few lines in your `CMakeLists.txt` file to grab a `CRoaring` release. [See our demonstration for further details](https://github.com/RoaringBitmap/croaring_cmake_demo_single_file).
 
 If you installed the CRoaring library locally, you may use it with CMake's `find_package` function as in this example:
 
@@ -236,7 +236,16 @@ It will generate three files for C users: ``roaring.h``, ``roaring.c`` and ``ama
 
 # API
 
-The C interface is found in the file ``include/roaring/roaring.h``. We have C++ interface at `cpp/roaring.hh`.
+The C interface is found in the files
+
+- [roaring.h](https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/roaring.h),
+- [roaring64.h](https://github.com/RoaringBitmap/CRoaring/blob/master/include/roaring/roaring64.h).
+
+We also have a C++ interface:
+
+- [roaring.hh](https://github.com/RoaringBitmap/CRoaring/blob/master/cpp/roaring.hh),
+- [roaring64map.hh](https://github.com/RoaringBitmap/CRoaring/blob/master/cpp/roaring64map.hh).
+
 
 # Dealing with large volumes
 
@@ -249,7 +258,7 @@ We have microbenchmarks constructed with the Google Benchmarks.
 Under Linux or macOS, you may run them as follows:
 
 ```
-cmake -B build
+cmake -B build -D ENABLE_ROARING_MICROBENCHMARKS=ON
 cmake --build build
 ./build/microbenchmarks/bench
 ```
@@ -266,7 +275,7 @@ have an x64 processor, you could benchmark the code without AVX-512 even if both
 and compiler supports it:
 
 ```
-cmake -B buildnoavx512 -D ROARING_DISABLE_AVX512=ON
+cmake -B buildnoavx512 -D ROARING_DISABLE_AVX512=ON -D ENABLE_ROARING_MICROBENCHMARKS=ON
 cmake --build buildnoavx512
 ./buildnoavx512/microbenchmarks/bench
 ```
@@ -274,7 +283,7 @@ cmake --build buildnoavx512
 You can benchmark without AVX or AVX-512 as well:
 
 ```
-cmake -B buildnoavx -D ROARING_DISABLE_AVX=ON
+cmake -B buildnoavx -D ROARING_DISABLE_AVX=ON -D ENABLE_ROARING_MICROBENCHMARKS=ON
 cmake --build buildnoavx
 ./buildnoavx/microbenchmarks/bench
 ```

+ 7 - 0
contrib/libs/croaring/include/roaring/roaring64.h

@@ -292,6 +292,13 @@ uint64_t roaring64_bitmap_maximum(const roaring64_bitmap_t *r);
  */
 bool roaring64_bitmap_run_optimize(roaring64_bitmap_t *r);
 
+/**
+ *  (For advanced users.)
+ * Collect statistics about the bitmap
+ */
+void roaring64_bitmap_statistics(const roaring64_bitmap_t *r,
+                                 roaring64_statistics_t *stat);
+
 /**
  * Perform internal consistency checks.
  *

+ 36 - 2
contrib/libs/croaring/include/roaring/roaring_types.h

@@ -89,14 +89,48 @@ typedef struct roaring_statistics_s {
         max_value; /* the maximal value, undefined if cardinality is zero */
     uint32_t
         min_value; /* the minimal value, undefined if cardinality is zero */
-    uint64_t sum_value; /* the sum of all values (could be used to compute
-                           average) */
+    uint64_t sum_value; /* deprecated always zero */
 
     uint64_t cardinality; /* total number of values stored in the bitmap */
 
     // and n_values_arrays, n_values_rle, n_values_bitmap
 } roaring_statistics_t;
 
+/**
+ *  (For advanced users.)
+ * The roaring64_statistics_t can be used to collect detailed statistics about
+ * the composition of a roaring64 bitmap.
+ */
+typedef struct roaring64_statistics_s {
+    uint64_t n_containers; /* number of containers */
+
+    uint64_t n_array_containers;  /* number of array containers */
+    uint64_t n_run_containers;    /* number of run containers */
+    uint64_t n_bitset_containers; /* number of bitmap containers */
+
+    uint64_t
+        n_values_array_containers;    /* number of values in array containers */
+    uint64_t n_values_run_containers; /* number of values in run containers */
+    uint64_t
+        n_values_bitset_containers; /* number of values in  bitmap containers */
+
+    uint64_t n_bytes_array_containers;  /* number of allocated bytes in array
+                                           containers */
+    uint64_t n_bytes_run_containers;    /* number of allocated bytes in run
+                                           containers */
+    uint64_t n_bytes_bitset_containers; /* number of allocated bytes in  bitmap
+                                           containers */
+
+    uint64_t
+        max_value; /* the maximal value, undefined if cardinality is zero */
+    uint64_t
+        min_value; /* the minimal value, undefined if cardinality is zero */
+
+    uint64_t cardinality; /* total number of values stored in the bitmap */
+
+    // and n_values_arrays, n_values_rle, n_values_bitmap
+} roaring64_statistics_t;
+
 /**
  * Roaring-internal type used to iterate within a roaring container.
  */

+ 6 - 5
contrib/libs/croaring/include/roaring/roaring_version.h

@@ -1,11 +1,12 @@
-// /include/roaring/roaring_version.h automatically generated by release.py, do
-// not change by hand
+// clang-format off
+// /include/roaring/roaring_version.h automatically generated by release.py, do not change by hand
 #ifndef ROARING_INCLUDE_ROARING_VERSION
 #define ROARING_INCLUDE_ROARING_VERSION
-#define ROARING_VERSION "3.0.0"
+#define ROARING_VERSION "4.0.0"
 enum {
-    ROARING_VERSION_MAJOR = 3,
+    ROARING_VERSION_MAJOR = 4,
     ROARING_VERSION_MINOR = 0,
     ROARING_VERSION_REVISION = 0
 };
-#endif  // ROARING_INCLUDE_ROARING_VERSION
+#endif // ROARING_INCLUDE_ROARING_VERSION
+// clang-format on

+ 7 - 7
contrib/libs/croaring/src/art/art.c

@@ -1041,7 +1041,7 @@ static art_indexed_child_t art_node_next_child(const art_node_t *node,
             return art_node256_next_child((art_node256_t *)node, index);
         default:
             assert(false);
-            return (art_indexed_child_t){0};
+            return (art_indexed_child_t){0, 0, 0};
     }
 }
 
@@ -1065,7 +1065,7 @@ static art_indexed_child_t art_node_prev_child(const art_node_t *node,
             return art_node256_prev_child((art_node256_t *)node, index);
         default:
             assert(false);
-            return (art_indexed_child_t){0};
+            return (art_indexed_child_t){0, 0, 0};
     }
 }
 
@@ -1089,7 +1089,7 @@ static art_indexed_child_t art_node_child_at(const art_node_t *node,
             return art_node256_child_at((art_node256_t *)node, index);
         default:
             assert(false);
-            return (art_indexed_child_t){0};
+            return (art_indexed_child_t){0, 0, 0};
     }
 }
 
@@ -1113,7 +1113,7 @@ static art_indexed_child_t art_node_lower_bound(const art_node_t *node,
             return art_node256_lower_bound((art_node256_t *)node, key_chunk);
         default:
             assert(false);
-            return (art_indexed_child_t){0};
+            return (art_indexed_child_t){0, 0, 0};
     }
 }
 
@@ -1670,7 +1670,7 @@ static bool art_node_iterator_lower_bound(const art_node_t *node,
 }
 
 art_iterator_t art_init_iterator(const art_t *art, bool first) {
-    art_iterator_t iterator = {0};
+    art_iterator_t iterator = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
     if (art->root == NULL) {
         return iterator;
     }
@@ -1727,7 +1727,7 @@ bool art_iterator_lower_bound(art_iterator_t *iterator,
 }
 
 art_iterator_t art_lower_bound(const art_t *art, const art_key_chunk_t *key) {
-    art_iterator_t iterator = {0};
+    art_iterator_t iterator = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
     if (art->root != NULL) {
         art_node_iterator_lower_bound(art->root, &iterator, key);
     }
@@ -1735,7 +1735,7 @@ art_iterator_t art_lower_bound(const art_t *art, const art_key_chunk_t *key) {
 }
 
 art_iterator_t art_upper_bound(const art_t *art, const art_key_chunk_t *key) {
-    art_iterator_t iterator = {0};
+    art_iterator_t iterator = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
     if (art->root != NULL) {
         if (art_node_iterator_lower_bound(art->root, &iterator, key) &&
             art_compare_keys(iterator.key, key) == 0) {
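
Note on the initializer changes in this file: the hunks above replace `{0}` with one explicit zero per member. In C both spellings zero-initialize every member, so the change should not affect behavior. The diff does not state the motivation; a plausible guess is that it silences missing-initializer warnings under some compilers or when the file is built as C++. A minimal sketch of the equivalence, using a hypothetical three-member struct (`sketch_child_t` is not the real `art_indexed_child_t` layout):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a small aggregate with three members. */
typedef struct {
    uint8_t index;
    uint8_t key_chunk;
    void *child;
} sketch_child_t;

/* Short form: the first member is set to 0 and the remaining members are
 * implicitly zero-initialized. */
static inline sketch_child_t sketch_short_init(void) {
    sketch_child_t c = {0};
    return c;
}

/* Explicit form: one initializer per member, as in the updated code. */
static inline sketch_child_t sketch_full_init(void) {
    sketch_child_t c = {0, 0, 0};
    return c;
}

/* Member-wise comparison (padding bytes are deliberately not compared). */
static inline int sketch_child_equal(sketch_child_t a, sketch_child_t b) {
    return a.index == b.index && a.key_chunk == b.key_chunk &&
           a.child == b.child;
}
```

Calling `sketch_child_equal(sketch_short_init(), sketch_full_init())` yields a nonzero result, since both forms produce an all-zero value.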

+ 6 - 26
contrib/libs/croaring/src/roaring.c

@@ -199,7 +199,7 @@ roaring_bitmap_t *roaring_bitmap_of(size_t n_args, ...) {
     // todo: could be greatly optimized but we do not expect this call to ever
     // include long lists
     roaring_bitmap_t *answer = roaring_bitmap_create();
-    roaring_bulk_context_t context = {0};
+    roaring_bulk_context_t context = {0, 0, 0, 0};
     va_list ap;
     va_start(ap, n_args);
     for (size_t i = 0; i < n_args; i++) {
@@ -371,20 +371,6 @@ void roaring_bitmap_printf_describe(const roaring_bitmap_t *r) {
     printf("}");
 }
 
-typedef struct min_max_sum_s {
-    uint32_t min;
-    uint32_t max;
-    uint64_t sum;
-} min_max_sum_t;
-
-static bool min_max_sum_fnc(uint32_t value, void *param) {
-    min_max_sum_t *mms = (min_max_sum_t *)param;
-    if (value > mms->max) mms->max = value;
-    if (value < mms->min) mms->min = value;
-    mms->sum += value;
-    return true;  // we always process all data points
-}
-
 /**
  *  (For advanced users.)
  * Collect statistics about the bitmap
@@ -395,15 +381,8 @@ void roaring_bitmap_statistics(const roaring_bitmap_t *r,
 
     memset(stat, 0, sizeof(*stat));
     stat->n_containers = ra->size;
-    stat->cardinality = roaring_bitmap_get_cardinality(r);
-    min_max_sum_t mms;
-    mms.min = UINT32_C(0xFFFFFFFF);
-    mms.max = UINT32_C(0);
-    mms.sum = 0;
-    roaring_iterate(r, &min_max_sum_fnc, &mms);
-    stat->min_value = mms.min;
-    stat->max_value = mms.max;
-    stat->sum_value = mms.sum;
+    stat->min_value = roaring_bitmap_minimum(r);
+    stat->max_value = roaring_bitmap_maximum(r);
 
     for (int i = 0; i < ra->size; ++i) {
         uint8_t truetype =
@@ -412,6 +391,7 @@ void roaring_bitmap_statistics(const roaring_bitmap_t *r,
             container_get_cardinality(ra->containers[i], ra->typecodes[i]);
         uint32_t sbytes =
             container_size_in_bytes(ra->containers[i], ra->typecodes[i]);
+        stat->cardinality += card;
         switch (truetype) {
             case BITSET_CONTAINER_TYPE:
                 stat->n_bitset_containers++;
@@ -1561,7 +1541,7 @@ roaring_bitmap_t *roaring_bitmap_deserialize(const void *buf) {
         if (bitmap == NULL) {
             return NULL;
         }
-        roaring_bulk_context_t context = {0};
+        roaring_bulk_context_t context = {0, 0, 0, 0};
         for (uint32_t i = 0; i < card; i++) {
             // elems may not be aligned, read with memcpy
             uint32_t elem;
@@ -1604,7 +1584,7 @@ roaring_bitmap_t *roaring_bitmap_deserialize_safe(const void *buf,
         if (bitmap == NULL) {
             return NULL;
         }
-        roaring_bulk_context_t context = {0};
+        roaring_bulk_context_t context = {0, 0, 0, 0};
         for (uint32_t i = 0; i < card; i++) {
             // elems may not be aligned, read with memcpy
             uint32_t elem;
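
Note on the `roaring_bitmap_statistics` change in this file: the diff drops the separate `roaring_iterate` pass that visited every value to compute min, max and sum. Minimum and maximum now come from `roaring_bitmap_minimum` and `roaring_bitmap_maximum`, the cardinality is accumulated inside the existing per-container loop, and `sum_value` is left at the zero written by `memset`. The sketch below shows only that accumulation pattern. All `sketch_*` names are hypothetical and stand in for the library's internal container types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container kinds, mirroring bitset/array/run containers. */
typedef enum { SKETCH_BITSET, SKETCH_ARRAY, SKETCH_RUN } sketch_kind_t;

/* Hypothetical container summary: its kind and how many values it holds. */
typedef struct {
    sketch_kind_t kind;
    uint32_t cardinality;
} sketch_container_t;

/* Hypothetical, reduced statistics record. */
typedef struct {
    uint32_t n_containers;
    uint32_t n_bitset_containers;
    uint32_t n_array_containers;
    uint32_t n_run_containers;
    uint64_t cardinality;
} sketch_stats_t;

/* One loop over the containers fills both the per-kind counters and the
 * total cardinality, so no per-value iteration is needed. */
static inline void sketch_statistics(const sketch_container_t *containers,
                                     size_t n, sketch_stats_t *stat) {
    memset(stat, 0, sizeof(*stat));
    stat->n_containers = (uint32_t)n;
    for (size_t i = 0; i < n; i++) {
        stat->cardinality += containers[i].cardinality;
        switch (containers[i].kind) {
            case SKETCH_BITSET:
                stat->n_bitset_containers++;
                break;
            case SKETCH_ARRAY:
                stat->n_array_containers++;
                break;
            case SKETCH_RUN:
                stat->n_run_containers++;
                break;
        }
    }
}
```

The cost of this loop depends on the number of containers rather than on the number of stored values, which is presumably the point of the change.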

+ 50 - 5
contrib/libs/croaring/src/roaring64.c

@@ -224,7 +224,7 @@ roaring64_bitmap_t *roaring64_bitmap_of_ptr(size_t n_args,
 
 roaring64_bitmap_t *roaring64_bitmap_of(size_t n_args, ...) {
     roaring64_bitmap_t *r = roaring64_bitmap_create();
-    roaring64_bulk_context_t context = {0};
+    roaring64_bulk_context_t context = {0, 0, 0, 0, 0, 0, 0};
     va_list ap;
     va_start(ap, n_args);
     for (size_t i = 0; i < n_args; i++) {
@@ -317,7 +317,7 @@ void roaring64_bitmap_add_many(roaring64_bitmap_t *r, size_t n_args,
         return;
     }
     const uint64_t *end = vals + n_args;
-    roaring64_bulk_context_t context = {0};
+    roaring64_bulk_context_t context = {0, 0, 0, 0, 0, 0, 0};
     for (const uint64_t *current_val = vals; current_val != end;
          current_val++) {
         roaring64_bitmap_add_bulk(r, &context, *current_val);
@@ -456,7 +456,8 @@ bool roaring64_bitmap_contains_bulk(const roaring64_bitmap_t *r,
     uint8_t high48[ART_KEY_BYTES];
     uint16_t low16 = split_key(val, high48);
 
-    if (context->leaf == NULL || context->high_bytes != high48) {
+    if (context->leaf == NULL ||
+        art_compare_keys(context->high_bytes, high48) != 0) {
         // We're not positioned anywhere yet or the high bits of the key
         // differ.
         leaf_t *leaf = (leaf_t *)art_find(&r->art, high48);
@@ -640,7 +641,7 @@ void roaring64_bitmap_remove_many(roaring64_bitmap_t *r, size_t n_args,
         return;
     }
     const uint64_t *end = vals + n_args;
-    roaring64_bulk_context_t context = {0};
+    roaring64_bulk_context_t context = {0, 0, 0, 0, 0, 0, 0};
     for (const uint64_t *current_val = vals; current_val != end;
          current_val++) {
         roaring64_bitmap_remove_bulk(r, &context, *current_val);
@@ -803,6 +804,50 @@ bool roaring64_bitmap_run_optimize(roaring64_bitmap_t *r) {
     return has_run_container;
 }
 
+/**
+ *  (For advanced users.)
+ * Collect statistics about the bitmap
+ */
+void roaring64_bitmap_statistics(const roaring64_bitmap_t *r,
+                                 roaring64_statistics_t *stat) {
+    memset(stat, 0, sizeof(*stat));
+    stat->min_value = roaring64_bitmap_minimum(r);
+    stat->max_value = roaring64_bitmap_maximum(r);
+
+    art_iterator_t it = art_init_iterator(&r->art, true);
+    while (it.value != NULL) {
+        leaf_t *leaf = (leaf_t *)it.value;
+        stat->n_containers++;
+        uint8_t truetype = get_container_type(leaf->container, leaf->typecode);
+        uint32_t card =
+            container_get_cardinality(leaf->container, leaf->typecode);
+        uint32_t sbytes =
+            container_size_in_bytes(leaf->container, leaf->typecode);
+        stat->cardinality += card;
+        switch (truetype) {
+            case BITSET_CONTAINER_TYPE:
+                stat->n_bitset_containers++;
+                stat->n_values_bitset_containers += card;
+                stat->n_bytes_bitset_containers += sbytes;
+                break;
+            case ARRAY_CONTAINER_TYPE:
+                stat->n_array_containers++;
+                stat->n_values_array_containers += card;
+                stat->n_bytes_array_containers += sbytes;
+                break;
+            case RUN_CONTAINER_TYPE:
+                stat->n_run_containers++;
+                stat->n_values_run_containers += card;
+                stat->n_bytes_run_containers += sbytes;
+                break;
+            default:
+                assert(false);
+                roaring_unreachable;
+        }
+        art_iterator_next(&it);
+    }
+}
+
 static bool roaring64_leaf_internal_validate(const art_val_t *val,
                                              const char **reason) {
     leaf_t *leaf = (leaf_t *)val;
@@ -1924,7 +1969,7 @@ bool roaring64_bitmap_iterate(const roaring64_bitmap_t *r,
 
 void roaring64_bitmap_to_uint64_array(const roaring64_bitmap_t *r,
                                       uint64_t *out) {
-    roaring64_iterator_t it = {0};
+    roaring64_iterator_t it;  // gets initialized in the next line
     roaring64_iterator_init_at(r, &it, /*first=*/true);
     roaring64_iterator_read(&it, out, UINT64_MAX);
 }
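
Note on the `roaring64_bitmap_contains_bulk` fix in this file: the old condition `context->high_bytes != high48` compares two addresses. If `high_bytes` is an array member of the context (the diff does not show the struct), that test can never be false, because `high48` is a local buffer, so the cached leaf would be looked up again on every call. The new code compares the key bytes with `art_compare_keys`. The sketch below illustrates the difference with `memcmp`; the 6-byte key width and all `sketch_*` names are assumptions for illustration, not the library's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed key width: 6 bytes, i.e. the high 48 bits of a 64-bit value. */
#define SKETCH_KEY_BYTES 6

/* Hypothetical stand-in for split_key: writes the high 48 bits of val in
 * big-endian order and returns the low 16 bits. */
static inline uint16_t sketch_split_key(uint64_t val,
                                        uint8_t high48[SKETCH_KEY_BYTES]) {
    for (int i = 0; i < SKETCH_KEY_BYTES; i++) {
        high48[i] = (uint8_t)(val >> (56 - 8 * i));
    }
    return (uint16_t)(val & 0xFFFF);
}

/* Compares where the two buffers live, not what they contain. */
static inline bool sketch_same_address(const uint8_t *a, const uint8_t *b) {
    return a == b;
}

/* Compares the key bytes themselves. */
static inline bool sketch_same_content(const uint8_t *a, const uint8_t *b) {
    return memcmp(a, b, SKETCH_KEY_BYTES) == 0;
}

/* True when both values share the same high-48-bit key. */
static inline bool sketch_share_high48(uint64_t x, uint64_t y) {
    uint8_t kx[SKETCH_KEY_BYTES];
    uint8_t ky[SKETCH_KEY_BYTES];
    sketch_split_key(x, kx);
    sketch_split_key(y, ky);
    return sketch_same_content(kx, ky);
}
```

Two distinct buffers holding identical bytes are reported as different by `sketch_same_address` and as equal by `sketch_same_content`; only the latter lets a bulk context reuse its cached position.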

+ 2 - 2
contrib/libs/croaring/ya.make

@@ -10,9 +10,9 @@ LICENSE(
 
 LICENSE_TEXTS(.yandex_meta/licenses.list.txt)
 
-VERSION(3.0.1)
+VERSION(4.0.0)
 
-ORIGINAL_SOURCE(https://github.com/RoaringBitmap/CRoaring/archive/v3.0.1.tar.gz)
+ORIGINAL_SOURCE(https://github.com/RoaringBitmap/CRoaring/archive/v4.0.0.tar.gz)
 
 ADDINCL(
     GLOBAL contrib/libs/croaring/include

+ 1 - 1
contrib/python/hypothesis/py3/.dist-info/METADATA

@@ -1,6 +1,6 @@
 Metadata-Version: 2.1
 Name: hypothesis
-Version: 6.101.0
+Version: 6.102.1
 Summary: A library for property-based testing
 Home-page: https://hypothesis.works
 Author: David R. MacIver and Zac Hatfield-Dodds

+ 16 - 13
contrib/python/hypothesis/py3/hypothesis/internal/conjecture/engine.py

@@ -1072,20 +1072,23 @@ class ConjectureRunner:
                     failed_mutations += 1
                     continue
 
-                assert isinstance(new_data, ConjectureResult)
-                if (
-                    new_data.status >= data.status
-                    and data.buffer != new_data.buffer
-                    and all(
-                        k in new_data.target_observations
-                        and new_data.target_observations[k] >= v
-                        for k, v in data.target_observations.items()
-                    )
-                ):
-                    data = new_data
-                    failed_mutations = 0
+                if new_data is Overrun:
+                    failed_mutations += 1  # pragma: no cover # annoying case
                 else:
-                    failed_mutations += 1
+                    assert isinstance(new_data, ConjectureResult)
+                    if (
+                        new_data.status >= data.status
+                        and data.buffer != new_data.buffer
+                        and all(
+                            k in new_data.target_observations
+                            and new_data.target_observations[k] >= v
+                            for k, v in data.target_observations.items()
+                        )
+                    ):
+                        data = new_data
+                        failed_mutations = 0
+                    else:
+                        failed_mutations += 1
 
     def optimise_targets(self) -> None:
         """If any target observations have been made, attempt to optimise them

Some files are not shown because too many files were changed