
Multi-Tier database backend for long term metrics storage (#13263)

* Tier part 1

* Tier part 2

* Tier part 3

* Tier part 4

* Tier part 5

* Fix some ML compilation errors

* fix more conflicts

* pass proper tier

* move metric_uuid from state to RRDDIM

* move aclk_live_status from state to RRDDIM

* move ml_dimension from state to RRDDIM

* abstracted the data collection interface

* support flushing for mem db too

* abstracted the query api

* abstracted latest/oldest time per metric

* cleanup

* store_metric for tier1

* fix for store_metric

* allow multiple tiers, more than 2

* state to tier

* Change storage type in db. Query param to request min, max, sum or average

* Store tier data correctly

* Fix skipping tier page type

* Add tier grouping in the tier

* Fix to handle archived charts (part 1)

* Temp fix for query granularity when requesting tier1 data

* Fix parameters in the correct order and calculate the anomaly based on the anomaly count

* Proper tiering grouping

* Anomaly calculation based on anomaly count

* force type checking on storage handles

* update cmocka tests

* fully dynamic number of storage tiers

* fix static allocation

* configure grouping for all tiers; disable tiers for unittest; disable statsd configuration for private charts mode

* use default page dt using the tiering info

* automatic selection of tier

* fix for automatic selection of tier

* working prototype of dynamic tier selection

* automatic selection of tier done right (I hope)

* ask for the proper tier value, based on the grouping function

* fixes for unittests and load_metric_next()

* fixes for lgtm findings

* minor renames

* add dbengine to page cache size setting

* add dbengine to page cache with malloc

* query engine optimized to loop as little as required based on the view_update_every

* query engine grouping methods now do not assume a constant number of points per group and they allocate memory with OWA

* report db points per tier in jsonwrap

* query planner that switches database tiers on the fly to satisfy the query for the entire timeframe

* dbengine statistics and documentation (in progress)

* calculate average point duration in db

* handle single point pages the best we can

* handle single point pages even better

* Keep page type in the rrdeng_page_descr

* updated doc

* handle future backwards compatibility - improved statistics

* support &tier=X in queries

* enforce increasing iterations on tiers

* tier 1 is always 1 iteration

* backfilling higher tiers on first data collection

* reversed anomaly bit

* set up to 5 tiers

* natural points should only be offered on tier 0, unless a specific tier is selected

* do not allow more than 65535 points of tier0 to be aggregated on any tier

* Work only on actually activated tiers

* fix query interpolation

* fix query interpolation again

* fix lgtm finding

* Activate one tier for now

* backfilling of higher tiers using raw metrics from lower tiers

* fix for crash on start when storage tiers is increased from the default

* more statistics on exit

* fix bug that prevented higher tiers from getting any values; added backfilling options

* fixed the statistics log line

* removed limit of 255 iterations per tier; moved the code of freezing rd->tiers[x]->db_metric_handle

* fixed division by zero on zero points_wanted

* removed dead code

* Decide on the descr->type for the type of metric

* don't store metrics on unknown page types

* free db_metric_handle on sql based context queries

* Disable STORAGE_POINT value check in the exporting engine unit tests

* fix for db modes other than dbengine

* fix for aclk archived chart queries destroying db_metric_handles of valid rrddims

* fix left-over freez() instead of OWA freez on median queries

Co-authored-by: Costa Tsaousis <costa@netdata.cloud>
Co-authored-by: Vladimir Kobal <vlad@prokk.net>
Stelios Fragkakis 2 years ago
commit 49234f23de

+ 2 - 2
collectors/plugins.d/pluginsd_parser.c

@@ -146,13 +146,13 @@ PARSER_RC pluginsd_dimension_action(void *user, RRDSET *st, char *id, char *name
     if (likely(unhide_dimension)) {
         rrddim_flag_clear(rd, RRDDIM_FLAG_HIDDEN);
         if (rrddim_flag_check(rd, RRDDIM_FLAG_META_HIDDEN)) {
-            (void)sql_set_dimension_option(&rd->state->metric_uuid, NULL);
+            (void)sql_set_dimension_option(&rd->metric_uuid, NULL);
             rrddim_flag_clear(rd, RRDDIM_FLAG_META_HIDDEN);
         }
     } else {
         rrddim_flag_set(rd, RRDDIM_FLAG_HIDDEN);
         if (!rrddim_flag_check(rd, RRDDIM_FLAG_META_HIDDEN)) {
-           (void)sql_set_dimension_option(&rd->state->metric_uuid, "hidden");
+           (void)sql_set_dimension_option(&rd->metric_uuid, "hidden");
             rrddim_flag_set(rd, RRDDIM_FLAG_META_HIDDEN);
         }
     }

+ 5 - 20
collectors/statsd.plugin/statsd.c

@@ -271,9 +271,7 @@ static struct statsd {
     size_t tcp_idle_timeout;
     collected_number decimal_detail;
     size_t private_charts;
-    size_t max_private_charts;
     size_t max_private_charts_hard;
-    RRD_MEMORY_MODE private_charts_memory_mode;
     long private_charts_rrd_history_entries;
     unsigned int private_charts_hidden:1;
 
@@ -290,7 +288,6 @@ static struct statsd {
     LISTEN_SOCKETS sockets;
 } statsd = {
         .enabled = 1,
-        .max_private_charts = 200,
         .max_private_charts_hard = 1000,
         .private_charts_hidden = 0,
         .recvmmsg_size = 10,
@@ -1591,7 +1588,7 @@ static inline void statsd_get_metric_type_and_id(STATSD_METRIC *m, char *type, c
 }
 
 static inline RRDSET *statsd_private_rrdset_create(
-        STATSD_METRIC *m
+        STATSD_METRIC *m __maybe_unused
         , const char *type
         , const char *id
         , const char *name
@@ -1603,16 +1600,6 @@ static inline RRDSET *statsd_private_rrdset_create(
         , int update_every
         , RRDSET_TYPE chart_type
 ) {
-    RRD_MEMORY_MODE memory_mode = statsd.private_charts_memory_mode;
-    long history = statsd.private_charts_rrd_history_entries;
-
-    if(unlikely(statsd.private_charts >= statsd.max_private_charts)) {
-        debug(D_STATSD, "STATSD: metric '%s' will be charted with memory mode = none, because the maximum number of charts has been reached.", m->name);
-        info("STATSD: metric '%s' will be charted with memory mode = none, because the maximum number of charts (%zu) has been reached. Increase the number of charts by editing netdata.conf, [statsd] section.", m->name, statsd.max_private_charts);
-        memory_mode = RRD_MEMORY_MODE_NONE;
-        history = 5;
-    }
-
     statsd.private_charts++;
     RRDSET *st = rrdset_create_custom(
             localhost         // host
@@ -1628,8 +1615,8 @@ static inline RRDSET *statsd_private_rrdset_create(
             , priority        // priority
             , update_every    // update every
             , chart_type      // chart type
-            , memory_mode     // memory mode
-            , history         // history
+            , default_rrd_memory_mode     // memory mode
+            , default_rrd_history_entries // history
     );
     rrdset_flag_set(st, RRDSET_FLAG_STORE_FIRST);
 
@@ -2300,7 +2287,7 @@ static inline void statsd_flush_index_metrics(STATSD_INDEX *index, void (*flush_
         if(unlikely(!(m->options & STATSD_METRIC_OPTION_PRIVATE_CHART_CHECKED))) {
             if(unlikely(statsd.private_charts >= statsd.max_private_charts_hard)) {
                 debug(D_STATSD, "STATSD: metric '%s' will not be charted, because the hard limit of the maximum number of charts has been reached.", m->name);
-                info("STATSD: metric '%s' will not be charted, because the hard limit of the maximum number of charts (%zu) has been reached. Increase the number of charts by editing netdata.conf, [statsd] section.", m->name, statsd.max_private_charts);
+                info("STATSD: metric '%s' will not be charted, because the hard limit of the maximum number of charts (%zu) has been reached. Increase the number of charts by editing netdata.conf, [statsd] section.", m->name, statsd.max_private_charts_hard);
                 m->options &= ~STATSD_METRIC_OPTION_PRIVATE_CHART_ENABLED;
             }
             else {
@@ -2446,9 +2433,7 @@ void *statsd_main(void *ptr) {
 #endif
 
     statsd.charts_for = simple_pattern_create(config_get(CONFIG_SECTION_STATSD, "create private charts for metrics matching", "*"), NULL, SIMPLE_PATTERN_EXACT);
-    statsd.max_private_charts = (size_t)config_get_number(CONFIG_SECTION_STATSD, "max private charts allowed", (long long)statsd.max_private_charts);
-    statsd.max_private_charts_hard = (size_t)config_get_number(CONFIG_SECTION_STATSD, "max private charts hard limit", (long long)statsd.max_private_charts * 5);
-    statsd.private_charts_memory_mode = rrd_memory_mode_id(config_get(CONFIG_SECTION_STATSD, "private charts memory mode", rrd_memory_mode_name(default_rrd_memory_mode)));
+    statsd.max_private_charts_hard = (size_t)config_get_number(CONFIG_SECTION_STATSD, "max private charts hard limit", (long long)statsd.max_private_charts_hard);
     statsd.private_charts_rrd_history_entries = (int)config_get_number(CONFIG_SECTION_STATSD, "private charts history", default_rrd_history_entries);
     statsd.decimal_detail = (collected_number)config_get_number(CONFIG_SECTION_STATSD, "decimal detail", (long long int)statsd.decimal_detail);
     statsd.tcp_idle_timeout = (size_t) config_get_number(CONFIG_SECTION_STATSD, "disconnect idle tcp clients after seconds", (long long int)statsd.tcp_idle_timeout);

+ 15 - 15
daemon/config/README.md

@@ -82,21 +82,21 @@ Please note that your data history will be lost if you have modified `history` p
 
 ### [db] section options
 
-|              setting               |  default   | info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
-|:----------------------------------:|:----------:|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-|                mode                | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. Can be extended with `page cache size MB` and `dbengine disk space MB`. <br />`save`: Netdata will save its round robin database on exit and load it on startup. <br />`map`: Cache files will be updated in real-time. Not ideal for systems with high load or slow disks (check `man mmap`). <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. |
-|             retention              |   `3600`   | Used with `mode = save/map/ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/database/README.md) for more information.                                                                                                                                                                                                                                                                                                                                                  |
-|            update every            |    `1`     | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/guides/configure/performance.md).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-|         page cache size MB         |     32     | Determines the amount of RAM in MiB that is dedicated to caching Netdata metric values.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-|       dbengine disk space MB       |    256     | Determines the amount of disk space in MiB that is dedicated to storing Netdata metric values and all related metadata describing them.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-|  dbengine multihost disk space MB  |    256     | Same functionality as `dbengine disk space MB`, but includes support for storing metrics streamed to a parent node by its children. Can be used in single-node environments as well.                                                                                                                                                                                                                                                                                                                                                                                                                                       |
-|     memory deduplication (ksm)     |   `yes`    | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/database/README.md#ksm)                                                                                                                                                                                                                                                                                                                                                      |
-| cleanup obsolete charts after secs |   `3600`   | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
-|   gap when lost iterations above   |    `1`     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
-|  cleanup orphan hosts after secs   |   `3600`   | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-|    delete obsolete charts files    |   `yes`    | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also affects the deletion of files for obsolete dimensions                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
-|     delete orphan hosts files      |   `yes`    | Set to `no` to disable non-responsive host removal.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
-|        enable zero metrics         |    `no`    | Set to `yes` to show charts when all their metrics are zero.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+|              setting               |  default   | info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|:----------------------------------:|:----------:|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+|                mode                | `dbengine` | `dbengine`: The default for long-term metrics storage with efficient RAM and disk usage. Can be extended with `dbengine page cache size MB` and `dbengine disk space MB`. <br />`save`: Netdata will save its round robin database on exit and load it on startup. <br />`map`: Cache files will be updated in real-time. Not ideal for systems with high load or slow disks (check `man mmap`). <br />`ram`: The round-robin database will be temporary and it will be lost when Netdata exits. <br />`none`: Disables the database at this host, and disables health monitoring entirely, as that requires a database of metrics. |
+|             retention              |   `3600`   | Used with `mode = save/map/ram/alloc`, not the default `mode = dbengine`. This number reflects the number of entries the `netdata` daemon will by default keep in memory for each chart dimension. Check [Memory Requirements](/database/README.md) for more information.                                                                                                                                                                                                                                                                                                                                                           |
+|            update every            |    `1`     | The frequency in seconds, for data collection. For more information see the [performance guide](/docs/guides/configure/performance.md).                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+|    dbengine page cache size MB     |     32     | Determines the amount of RAM in MiB that is dedicated to caching Netdata metric values.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+|       dbengine disk space MB       |    256     | Determines the amount of disk space in MiB that is dedicated to storing Netdata metric values and all related metadata describing them.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+|  dbengine multihost disk space MB  |    256     | Same functionality as `dbengine disk space MB`, but includes support for storing metrics streamed to a parent node by its children. Can be used in single-node environments as well.                                                                                                                                                                                                                                                                                                                                                                                                                                                |
+|     memory deduplication (ksm)     |   `yes`    | When set to `yes`, Netdata will offer its in-memory round robin database and the dbengine page cache to kernel same page merging (KSM) for deduplication. For more information check [Memory Deduplication - Kernel Same Page Merging - KSM](/database/README.md#ksm)                                                                                                                                                                                                                                                                                                                                                               |
+| cleanup obsolete charts after secs |   `3600`   | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also sets the timeout for cleaning up obsolete dimensions                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
+|   gap when lost iterations above   |    `1`     |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |
+|  cleanup orphan hosts after secs   |   `3600`   | How long to wait until automatically removing from the DB a remote Netdata host (child) that is no longer sending data.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+|    delete obsolete charts files    |   `yes`    | See [monitoring ephemeral containers](/collectors/cgroups.plugin/README.md#monitoring-ephemeral-containers), also affects the deletion of files for obsolete dimensions                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+|     delete orphan hosts files      |   `yes`    | Set to `no` to disable non-responsive host removal.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+|        enable zero metrics         |    `no`    | Set to `yes` to show charts when all their metrics are zero.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
 
 ### [directories] section options
 

+ 19 - 12
daemon/global_statistics.c

@@ -451,21 +451,28 @@ static void dbengine_statistics_charts(void) {
         RRDHOST *host;
         unsigned long long stats_array[RRDENG_NR_STATS] = {0};
         unsigned long long local_stats_array[RRDENG_NR_STATS];
-        unsigned dbengine_contexts = 0, counted_multihost_db = 0, i;
+        unsigned dbengine_contexts = 0, counted_multihost_db[RRD_STORAGE_TIERS] = { 0 }, i;
 
         rrdhost_foreach_read(host) {
             if (host->rrd_memory_mode == RRD_MEMORY_MODE_DBENGINE && !rrdhost_flag_check(host, RRDHOST_FLAG_ARCHIVED)) {
-                if (&multidb_ctx == host->rrdeng_ctx) {
-                    if (counted_multihost_db)
-                        continue; /* Only count multi-host DB once */
-                    counted_multihost_db = 1;
-                }
-                ++dbengine_contexts;
-                /* get localhost's DB engine's statistics */
-                rrdeng_get_37_statistics(host->rrdeng_ctx, local_stats_array);
-                for (i = 0; i < RRDENG_NR_STATS; ++i) {
-                    /* aggregate statistics across hosts */
-                    stats_array[i] += local_stats_array[i];
+
+                /* get localhost's DB engine's statistics for each tier */
+                for(int tier = 0; tier < storage_tiers ;tier++) {
+                    if(!host->storage_instance[tier]) continue;
+
+                    if(is_storage_engine_shared(host->storage_instance[tier])) {
+                        if(counted_multihost_db[tier])
+                            continue;
+                        else
+                            counted_multihost_db[tier] = 1;
+                    }
+
+                    ++dbengine_contexts;
+                    rrdeng_get_37_statistics((struct rrdengine_instance *)host->storage_instance[tier], local_stats_array);
+                    for (i = 0; i < RRDENG_NR_STATS; ++i) {
+                        /* aggregate statistics across hosts */
+                        stats_array[i] += local_stats_array[i];
+                    }
                 }
             }
         }
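
The hunk above changes the statistics loop from "count the multi-host DB once" to "count each shared instance once per tier". A minimal sketch of that pattern follows. The `struct instance`, `struct host`, `TIERS`, `NR_STATS` and `aggregate()` names are hypothetical stand-ins for this illustration, not Netdata's real types or functions.

```c
#include <assert.h>
#include <stddef.h>

#define TIERS 3      /* hypothetical fixed tier count for this sketch */
#define NR_STATS 4   /* hypothetical number of counters per instance */

/* Hypothetical stand-in for a storage engine instance. */
struct instance {
    int shared;                          /* 1 if several hosts use this instance */
    unsigned long long stats[NR_STATS];  /* per-instance counters */
};

/* Hypothetical stand-in for a host: one optional instance per tier. */
struct host {
    struct instance *tier[TIERS];
};

/* Add the counters of every instance into out[], visiting a shared instance
 * only once per tier. Returns the number of instances that were counted. */
static unsigned aggregate(const struct host *hosts, size_t n,
                          unsigned long long out[NR_STATS]) {
    unsigned counted_shared[TIERS] = { 0 };
    unsigned contexts = 0;

    assert(hosts != NULL && out != NULL);

    for (size_t h = 0; h < n; h++) {
        for (int t = 0; t < TIERS; t++) {
            const struct instance *inst = hosts[h].tier[t];
            if (!inst)
                continue;                /* this tier is not active on this host */

            if (inst->shared) {
                if (counted_shared[t])
                    continue;            /* shared instance already counted for this tier */
                counted_shared[t] = 1;
            }

            contexts++;
            for (int i = 0; i < NR_STATS; i++)
                out[i] += inst->stats[i];
        }
    }
    return contexts;
}
```

With two hosts sharing one instance per tier and a third host owning its own tier 0 instance, each shared instance contributes its counters once rather than once per host. The sketch assumes, as the one-flag-per-tier array in the hunk does, that at most one shared instance exists per tier.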

+ 16 - 23
daemon/main.c

@@ -55,11 +55,13 @@ void netdata_cleanup_and_exit(int ret) {
         // free the database
         info("EXIT: freeing database memory...");
 #ifdef ENABLE_DBENGINE
-        rrdeng_prepare_exit(&multidb_ctx);
+        for(int tier = 0; tier < storage_tiers ; tier++)
+            rrdeng_prepare_exit(multidb_ctx[tier]);
 #endif
         rrdhost_free_all();
 #ifdef ENABLE_DBENGINE
-        rrdeng_exit(&multidb_ctx);
+        for(int tier = 0; tier < storage_tiers ; tier++)
+            rrdeng_exit(multidb_ctx[tier]);
 #endif
     }
     sql_close_database();
@@ -533,10 +535,16 @@ static void backwards_compatible_config() {
                 CONFIG_SECTION_DB,      "update every");
 
     config_move(CONFIG_SECTION_GLOBAL,  "page cache size",
-                CONFIG_SECTION_DB,      "page cache size MB");
+                CONFIG_SECTION_DB,      "dbengine page cache size MB");
+
+    config_move(CONFIG_SECTION_DB,      "page cache size",
+                CONFIG_SECTION_DB,      "dbengine page cache size MB");
 
     config_move(CONFIG_SECTION_GLOBAL,  "page cache uses malloc",
-                CONFIG_SECTION_DB,      "page cache with malloc");
+                CONFIG_SECTION_DB,      "dbengine page cache with malloc");
+
+    config_move(CONFIG_SECTION_DB,      "page cache with malloc",
+                CONFIG_SECTION_DB,      "dbengine page cache with malloc");
 
     config_move(CONFIG_SECTION_GLOBAL,  "dbengine disk space",
                 CONFIG_SECTION_DB,      "dbengine disk space MB");
@@ -650,12 +658,12 @@ static void get_netdata_configured_variables() {
     // ------------------------------------------------------------------------
     // get default Database Engine page cache size in MiB
 
-    db_engine_use_malloc = config_get_boolean(CONFIG_SECTION_DB, "page cache with malloc", CONFIG_BOOLEAN_NO);
-    default_rrdeng_page_cache_mb = (int) config_get_number(CONFIG_SECTION_DB, "page cache size MB", default_rrdeng_page_cache_mb);
+    db_engine_use_malloc = config_get_boolean(CONFIG_SECTION_DB, "dbengine page cache with malloc", CONFIG_BOOLEAN_NO);
+    default_rrdeng_page_cache_mb = (int) config_get_number(CONFIG_SECTION_DB, "dbengine page cache size MB", default_rrdeng_page_cache_mb);
     if(default_rrdeng_page_cache_mb < RRDENG_MIN_PAGE_CACHE_SIZE_MB) {
         error("Invalid page cache size %d given. Defaulting to %d.", default_rrdeng_page_cache_mb, RRDENG_MIN_PAGE_CACHE_SIZE_MB);
         default_rrdeng_page_cache_mb = RRDENG_MIN_PAGE_CACHE_SIZE_MB;
-        config_set_number(CONFIG_SECTION_DB, "page cache size MB", default_rrdeng_page_cache_mb);
+        config_set_number(CONFIG_SECTION_DB, "dbengine page cache size MB", default_rrdeng_page_cache_mb);
     }
 
     // ------------------------------------------------------------------------
@@ -946,6 +954,7 @@ int main(int argc, char **argv) {
                             default_rrd_update_every = 1;
                             default_rrd_memory_mode = RRD_MEMORY_MODE_RAM;
                             default_health_enabled = 0;
+                            storage_tiers = 1;
                             registry_init();
                             if(rrd_init("unittest", NULL)) {
                                 fprintf(stderr, "rrd_init failed for unittest\n");
@@ -1303,22 +1312,6 @@ int main(int argc, char **argv) {
         // initialize the log files
         open_all_log_files();
 
-#ifdef ENABLE_DBENGINE
-        default_rrdeng_page_fetch_timeout = (int) config_get_number(CONFIG_SECTION_DB, "dbengine page fetch timeout secs", PAGE_CACHE_FETCH_WAIT_TIMEOUT);
-        if (default_rrdeng_page_fetch_timeout < 1) {
-            info("'dbengine page fetch timeout secs' cannot be %d, using 1", default_rrdeng_page_fetch_timeout);
-            default_rrdeng_page_fetch_timeout = 1;
-            config_set_number(CONFIG_SECTION_DB, "dbengine page fetch timeout secs", default_rrdeng_page_fetch_timeout);
-        }
-
-        default_rrdeng_page_fetch_retries = (int) config_get_number(CONFIG_SECTION_DB, "dbengine page fetch retries", MAX_PAGE_CACHE_FETCH_RETRIES);
-        if (default_rrdeng_page_fetch_retries < 1) {
-            info("\"dbengine page fetch retries\" found in netdata.conf cannot be %d, using 1", default_rrdeng_page_fetch_retries);
-            default_rrdeng_page_fetch_retries = 1;
-            config_set_number(CONFIG_SECTION_DB, "dbengine page fetch retries", default_rrdeng_page_fetch_retries);
-        }
-#endif
-
         get_system_timezone();
 
         // --------------------------------------------------------------------

+ 27 - 23
daemon/unit_test.c

@@ -1704,7 +1704,7 @@ static void test_dbengine_create_charts(RRDHOST *host, RRDSET *st[CHARTS], RRDDI
     // Flush pages for subsequent real values
     for (i = 0 ; i < CHARTS ; ++i) {
         for (j = 0; j < DIMS; ++j) {
-            rrdeng_store_metric_flush_current_page(rd[i][j]);
+            rrdeng_store_metric_flush_current_page((rd[i][j])->tiers[0]->db_collection_handle);
         }
     }
 }
@@ -1751,11 +1751,10 @@ static int test_dbengine_check_metrics(RRDSET *st[CHARTS], RRDDIM *rd[CHARTS][DI
 {
     fprintf(stderr, "%s() running...\n", __FUNCTION__ );
     uint8_t same;
-    time_t time_now, time_retrieved;
+    time_t time_now, time_retrieved, end_time;
     int i, j, k, c, errors, update_every;
     collected_number last;
     NETDATA_DOUBLE value, expected;
-    SN_FLAGS nflags;
     struct rrddim_query_handle handle;
     size_t value_errors = 0, time_errors = 0;
 
@@ -1767,14 +1766,16 @@ static int test_dbengine_check_metrics(RRDSET *st[CHARTS], RRDDIM *rd[CHARTS][DI
         time_now = time_start + (c + 1) * update_every;
         for (i = 0 ; i < CHARTS ; ++i) {
             for (j = 0; j < DIMS; ++j) {
-                rd[i][j]->state->query_ops.init(rd[i][j], &handle, time_now, time_now + QUERY_BATCH * update_every);
+                rd[i][j]->tiers[0]->query_ops.init(rd[i][j]->tiers[0]->db_metric_handle, &handle, time_now, time_now + QUERY_BATCH * update_every, TIER_QUERY_FETCH_SUM);
                 for (k = 0; k < QUERY_BATCH; ++k) {
                     last = ((collected_number)i * DIMS) * REGION_POINTS[current_region] +
                            j * REGION_POINTS[current_region] + c + k;
                     expected = unpack_storage_number(pack_storage_number((NETDATA_DOUBLE)last, SN_DEFAULT_FLAGS));
 
-                    time_t end_time;
-                    value = rd[i][j]->state->query_ops.next_metric(&handle, &time_retrieved, &end_time, &nflags);
+                    STORAGE_POINT sp = rd[i][j]->tiers[0]->query_ops.next_metric(&handle);
+                    value = sp.sum;
+                    time_retrieved = sp.start_time;
+                    end_time = sp.end_time;
 
                     same = (roundndd(value) == roundndd(expected)) ? 1 : 0;
                     if(!same) {
@@ -1793,7 +1794,7 @@ static int test_dbengine_check_metrics(RRDSET *st[CHARTS], RRDDIM *rd[CHARTS][DI
                         errors++;
                     }
                 }
-                rd[i][j]->state->query_ops.finalize(&handle);
+                rd[i][j]->tiers[0]->query_ops.finalize(&handle);
             }
         }
     }
@@ -1826,7 +1827,7 @@ static int test_dbengine_check_rrdr(RRDSET *st[CHARTS], RRDDIM *rd[CHARTS][DIMS]
         ONEWAYALLOC *owa = onewayalloc_create(0);
         RRDR *r = rrd2rrdr(owa, st[i], points, time_start, time_end,
                            RRDR_GROUPING_AVERAGE, 0, RRDR_OPTION_NATURAL_POINTS,
-                           NULL, NULL, NULL, 0);
+                           NULL, NULL, NULL, 0, 0);
 
         if (!r) {
             fprintf(stderr, "    DB-engine unittest %s: empty RRDR on region %d ### E R R O R ###\n", st[i]->name, current_region);
@@ -1913,7 +1914,7 @@ int test_dbengine(void)
     for (i = 0 ; i < CHARTS ; ++i) {
         st[i]->update_every = update_every;
         for (j = 0; j < DIMS; ++j) {
-            rrdeng_store_metric_flush_current_page(rd[i][j]);
+            rrdeng_store_metric_flush_current_page((rd[i][j])->tiers[0]->db_collection_handle);
         }
     }
 
@@ -1932,7 +1933,7 @@ int test_dbengine(void)
     for (i = 0 ; i < CHARTS ; ++i) {
         st[i]->update_every = update_every;
         for (j = 0; j < DIMS; ++j) {
-            rrdeng_store_metric_flush_current_page(rd[i][j]);
+            rrdeng_store_metric_flush_current_page((rd[i][j])->tiers[0]->db_collection_handle);
         }
     }
 
@@ -1960,7 +1961,7 @@ int test_dbengine(void)
         ONEWAYALLOC *owa = onewayalloc_create(0);
         RRDR *r = rrd2rrdr(owa, st[i], points, time_start[0] + update_every,
                            time_end[REGIONS - 1], RRDR_GROUPING_AVERAGE, 0,
-                           RRDR_OPTION_NATURAL_POINTS, NULL, NULL, NULL, 0);
+                           RRDR_OPTION_NATURAL_POINTS, NULL, NULL, NULL, 0, 0);
         if (!r) {
             fprintf(stderr, "    DB-engine unittest %s: empty RRDR ### E R R O R ###\n", st[i]->name);
             ++errors;
@@ -2005,9 +2006,9 @@ int test_dbengine(void)
     }
 error_out:
     rrd_wrlock();
-    rrdeng_prepare_exit(host->rrdeng_ctx);
+    rrdeng_prepare_exit((struct rrdengine_instance *)host->storage_instance[0]);
     rrdhost_delete_charts(host);
-    rrdeng_exit(host->rrdeng_ctx);
+    rrdeng_exit((struct rrdengine_instance *)host->storage_instance[0]);
     rrd_unlock();
 
     return errors + value_errors + time_errors;
@@ -2092,7 +2093,7 @@ static void generate_dbengine_chart(void *arg)
         thread_info->time_max = time_current;
     }
     for (j = 0; j < DSET_DIMS; ++j) {
-        rrdeng_store_metric_finalize(rd[j]);
+        rrdeng_store_metric_finalize((rd[j])->tiers[0]->db_collection_handle);
     }
 }
 
@@ -2182,10 +2183,9 @@ static void query_dbengine_chart(void *arg)
     RRDSET *st;
     RRDDIM *rd;
     uint8_t same;
-    time_t time_now, time_retrieved;
+    time_t time_now, time_retrieved, end_time;
     collected_number generatedv;
     NETDATA_DOUBLE value, expected;
-    SN_FLAGS nflags;
     struct rrddim_query_handle handle;
     size_t value_errors = 0, time_errors = 0;
 
@@ -2213,13 +2213,13 @@ static void query_dbengine_chart(void *arg)
             time_before = MIN(time_after + duration, time_max); /* up to 1 hour queries */
         }
 
-        rd->state->query_ops.init(rd, &handle, time_after, time_before);
+        rd->tiers[0]->query_ops.init(rd->tiers[0]->db_metric_handle, &handle, time_after, time_before, TIER_QUERY_FETCH_SUM);
         ++thread_info->queries_nr;
         for (time_now = time_after ; time_now <= time_before ; time_now += update_every) {
             generatedv = generate_dbengine_chart_value(i, j, time_now);
             expected = unpack_storage_number(pack_storage_number((NETDATA_DOUBLE) generatedv, SN_DEFAULT_FLAGS));
 
-            if (unlikely(rd->state->query_ops.is_finished(&handle))) {
+            if (unlikely(rd->tiers[0]->query_ops.is_finished(&handle))) {
                 if (!thread_info->delete_old_data) { /* data validation only when we don't delete */
                     fprintf(stderr, "    DB-engine stresstest %s/%s: at %lu secs, expecting value " NETDATA_DOUBLE_FORMAT
                         ", found data gap, ### E R R O R ###\n",
@@ -2228,8 +2228,12 @@ static void query_dbengine_chart(void *arg)
                 }
                 break;
             }
-            time_t end_time;
-            value = rd->state->query_ops.next_metric(&handle, &time_retrieved, &end_time, &nflags);
+
+            STORAGE_POINT sp = rd->tiers[0]->query_ops.next_metric(&handle);
+            value = sp.sum;
+            time_retrieved = sp.start_time;
+            end_time = sp.end_time;
+
             if (!netdata_double_isnumber(value)) {
                 if (!thread_info->delete_old_data) { /* data validation only when we don't delete */
                     fprintf(stderr, "    DB-engine stresstest %s/%s: at %lu secs, expecting value " NETDATA_DOUBLE_FORMAT
@@ -2263,7 +2267,7 @@ static void query_dbengine_chart(void *arg)
                 }
             }
         }
-        rd->state->query_ops.finalize(&handle);
+        rd->tiers[0]->query_ops.finalize(&handle);
     } while(!thread_info->done);
 
     if(value_errors)
@@ -2411,9 +2415,9 @@ void dbengine_stress_test(unsigned TEST_DURATION_SEC, unsigned DSET_CHARTS, unsi
     }
     freez(query_threads);
     rrd_wrlock();
-    rrdeng_prepare_exit(host->rrdeng_ctx);
+    rrdeng_prepare_exit((struct rrdengine_instance *)host->storage_instance[0]);
     rrdhost_delete_charts(host);
-    rrdeng_exit(host->rrdeng_ctx);
+    rrdeng_exit((struct rrdengine_instance *)host->storage_instance[0]);
     rrd_unlock();
 }
 

+ 6 - 6
database/engine/README.md

@@ -26,18 +26,18 @@ To use the database engine, open `netdata.conf` and set `[db].mode` to `dbengine
     mode = dbengine
 ```
 
-To configure the database engine, look for the `page cache size MB` and `dbengine multihost disk space MB` settings in the
+To configure the database engine, look for the `dbengine page cache size MB` and `dbengine multihost disk space MB` settings in the
 `[db]` section of your `netdata.conf`. The Agent ignores the `[db].retention` setting when using the dbengine.
 
 ```conf
 [db]
-    page cache size MB = 32
+    dbengine page cache size MB = 32
     dbengine multihost disk space MB = 256
 ```
 
 The above values are the default values for Page Cache size and DB engine disk space quota.
 
-The `page cache size MB` option determines the amount of RAM dedicated to caching Netdata metric values. The
+The `dbengine page cache size MB` option determines the amount of RAM dedicated to caching Netdata metric values. The
 actual page cache size will be slightly larger than this figure—see the [memory requirements](#memory-requirements)
 section for details.
 
@@ -59,10 +59,10 @@ Netdata metric values per legacy database engine instance (see [details on the l
 
 ### Streaming metrics to the database engine
 
-When using the multihost database engine, all parent and child nodes share the same `page cache size MB` and `dbengine
+When using the multihost database engine, all parent and child nodes share the same `dbengine page cache size MB` and `dbengine
 multihost disk space MB` in a single dbengine instance. The [**database engine
 calculator**](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
-helps you properly set `page cache size MB` and `dbengine multihost disk space MB` on your parent node to allocate enough
+helps you properly set `dbengine page cache size MB` and `dbengine multihost disk space MB` on your parent node to allocate enough
 resources based on your metrics retention policy and how many child nodes you have.
 
 #### Legacy mode
@@ -98,7 +98,7 @@ available memory.
 There are explicit memory requirements **per** DB engine **instance**:
 
 -   The total page cache memory footprint will be an additional `#dimensions-being-collected x 4096 x 2` bytes over what
-    the user configured with `page cache size MB`.
+    the user configured with `dbengine page cache size MB`.
 
 -   an additional `#pages-on-disk x 4096 x 0.03` bytes of RAM are allocated for metadata.
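
As a rough illustration of the two formulas above, the following C sketch adds the configured page cache size to the per-dimension and per-page overheads. The helper names are hypothetical and do not exist in Netdata. It assumes that "MB" means MiB (as the comment in `daemon/main.c` suggests) and that the `0.03` factor can be approximated with integer math as 3/100.

```c
#include <stdint.h>

#define PAGE_BYTES 4096ULL /* page size used by the README formulas */

/* Extra page cache bytes on top of the configured size:
 * #dimensions-being-collected x 4096 x 2 */
static uint64_t page_cache_overhead_bytes(uint64_t dimensions)
{
    return dimensions * PAGE_BYTES * 2ULL;
}

/* Metadata bytes: #pages-on-disk x 4096 x 0.03 (computed as 3/100) */
static uint64_t metadata_bytes(uint64_t pages_on_disk)
{
    return pages_on_disk * PAGE_BYTES * 3ULL / 100ULL;
}

/* Estimated RAM for one dbengine instance, in bytes.
 * cache_mb is the value of `dbengine page cache size MB` (assumed MiB). */
static uint64_t estimated_ram_bytes(uint64_t cache_mb,
                                    uint64_t dimensions,
                                    uint64_t pages_on_disk)
{
    return cache_mb * 1024ULL * 1024ULL
         + page_cache_overhead_bytes(dimensions)
         + metadata_bytes(pages_on_disk);
}
```

For example, with the default 32 MiB cache and 2000 collected dimensions, the per-dimension overhead alone (about 16 MB) is roughly half of the configured cache size, so it is worth including in any sizing estimate.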
 

+ 27 - 1
database/engine/datafile.c

@@ -444,18 +444,44 @@ void finalize_data_files(struct rrdengine_instance *ctx)
     struct rrdengine_journalfile *journalfile;
     struct extent_info *extent, *next_extent;
 
+    size_t extents_number = 0;
+    size_t extents_bytes = 0;
+    size_t page_compressed_sizes = 0;
+
+    size_t files_number = 0;
+    size_t files_bytes = 0;
+
     for (datafile = ctx->datafiles.first ; datafile != NULL ; datafile = next_datafile) {
         journalfile = datafile->journalfile;
         next_datafile = datafile->next;
 
         for (extent = datafile->extents.first ; extent != NULL ; extent = next_extent) {
+            extents_number++;
+            extents_bytes += sizeof(*extent) + sizeof(struct rrdeng_page_descr *) * extent->number_of_pages;
+            page_compressed_sizes += extent->size;
+
             next_extent = extent->next;
             freez(extent);
         }
         close_journal_file(journalfile, datafile);
         close_data_file(datafile);
+
+        files_number++;
+        files_bytes += sizeof(*journalfile) + sizeof(*datafile);
+
         freez(journalfile);
         freez(datafile);
-
     }
+
+    if(!files_number) files_number = 1;
+    if(!extents_number) extents_number = 1;
+
+    info("DBENGINE STATISTICS ON DATAFILES:"
+         " Files %zu, structures %zu bytes, %0.2f bytes per file."
+         " Extents %zu, structures %zu bytes, %0.2f bytes per extent."
+         " Compressed size of all pages: %zu bytes."
+         , files_number, files_bytes, (double)files_bytes/files_number
+         , extents_number, extents_bytes, (double)extents_bytes/extents_number
+         , page_compressed_sizes
+         );
 }

+ 3 - 2
database/engine/journalfile.c

@@ -302,8 +302,8 @@ static void restore_extent_metadata(struct rrdengine_instance *ctx, struct rrden
         Pvoid_t *PValue;
         struct pg_cache_page_index *page_index = NULL;
 
-        if (PAGE_METRICS != jf_metric_data->descr[i].type) {
-            error("Unknown page type encountered.");
+        if (jf_metric_data->descr[i].type > PAGE_TYPE_MAX) {
+            error("Unknown page type %d encountered.", jf_metric_data->descr[i].type );
             continue;
         }
         temp_id = (uuid_t *)jf_metric_data->descr[i].uuid;
@@ -331,6 +331,7 @@ static void restore_extent_metadata(struct rrdengine_instance *ctx, struct rrden
         descr->end_time = jf_metric_data->descr[i].end_time;
         descr->id = &page_index->id;
         descr->extent = extent;
+        descr->type = jf_metric_data->descr[i].type;
         extent->pages[valid_pages++] = descr;
         pg_cache_insert(ctx, page_index, descr);
     }

+ 133 - 12
database/engine/pagecache.c

@@ -1194,24 +1194,66 @@ void init_page_cache(struct rrdengine_instance *ctx)
     init_committed_page_index(ctx);
 }
 
+
+
+/*
+ * METRIC                                            # number
+ * 1. INDEX: JudyHS                                  # bytes
+ * 2. DATA: page_index                               # bytes
+ *
+ * PAGE (1 page of 1 metric)                         # number
+ * 1. INDEX AT METRIC: page_index->JudyL_array       # bytes
+ * 2. DATA: descr                                    # bytes
+ *
+ * PAGE CACHE (1 page of 1 metric at the cache)      # number
+ * 1. pg_cache_descr (if PG_CACHE_DESCR_ALLOCATED)   # bytes
+ * 2. data (if RRD_PAGE_POPULATED)                   # bytes
+ *
+ */
+
+
 void free_page_cache(struct rrdengine_instance *ctx)
 {
     struct page_cache *pg_cache = &ctx->pg_cache;
-    Word_t ret_Judy, bytes_freed = 0;
     Pvoid_t *PValue;
     struct pg_cache_page_index *page_index, *prev_page_index;
     Word_t Index;
     struct rrdeng_page_descr *descr;
     struct page_cache_descr *pg_cache_descr;
 
+    Word_t metrics_number      = 0,
+           metrics_bytes       = 0,
+           metrics_index_bytes = 0,
+           metrics_duration    = 0;
+
+    Word_t pages_number        = 0,
+           pages_bytes         = 0,
+           pages_index_bytes   = 0;
+
+    Word_t pages_size_per_type[256]  = { 0 },
+           pages_count_per_type[256] = { 0 };
+
+    Word_t cache_pages_number  = 0,
+           cache_pages_bytes   = 0,
+           cache_pages_data_bytes  = 0;
+
+    size_t points_in_db        = 0,
+           uncompressed_points_size = 0,
+           seconds_in_db       = 0,
+           single_point_pages  = 0;
+
+    Word_t pages_dirty_index_bytes = 0;
+
+    usec_t oldest_time_ut = LONG_MAX, latest_time_ut = 0;
+
     /* Free committed page index */
-    ret_Judy = JudyLFreeArray(&pg_cache->committed_page_index.JudyL_array, PJE0);
+    pages_dirty_index_bytes = JudyLFreeArray(&pg_cache->committed_page_index.JudyL_array, PJE0);
     fatal_assert(NULL == pg_cache->committed_page_index.JudyL_array);
-    bytes_freed += ret_Judy;
 
     for (page_index = pg_cache->metrics_index.last_page_index ;
          page_index != NULL ;
          page_index = prev_page_index) {
+
         prev_page_index = page_index->prev;
 
         /* Find first page in range */
@@ -1219,37 +1261,116 @@ void free_page_cache(struct rrdengine_instance *ctx)
         PValue = JudyLFirst(page_index->JudyL_array, &Index, PJE0);
         descr = unlikely(NULL == PValue) ? NULL : *PValue;
 
+        size_t metric_duration = 0;
+        size_t metric_update_every = 0;
+        size_t metric_single_point_pages = 0;
+
         while (descr != NULL) {
             /* Iterate all page descriptors of this metric */
 
             if (descr->pg_cache_descr_state & PG_CACHE_DESCR_ALLOCATED) {
+                cache_pages_number++;
+
                 /* Check rrdenglocking.c */
                 pg_cache_descr = descr->pg_cache_descr;
                 if (pg_cache_descr->flags & RRD_PAGE_POPULATED) {
                     dbengine_page_free(pg_cache_descr->page);
-                    bytes_freed += RRDENG_BLOCK_SIZE;
+                    cache_pages_data_bytes += RRDENG_BLOCK_SIZE;
                 }
                 rrdeng_destroy_pg_cache_descr(ctx, pg_cache_descr);
-                bytes_freed += sizeof(*pg_cache_descr);
+                cache_pages_bytes += sizeof(*pg_cache_descr);
+            }
+
+            if(descr->start_time < oldest_time_ut)
+                oldest_time_ut = descr->start_time;
+
+            if(descr->end_time > latest_time_ut)
+                latest_time_ut = descr->end_time;
+
+            pages_size_per_type[descr->type] += descr->page_length;
+            pages_count_per_type[descr->type]++;
+
+            size_t points_in_page = (descr->page_length / ctx->storage_size);
+            size_t page_duration  = ((descr->end_time - descr->start_time) / USEC_PER_SEC);
+            size_t update_every = (page_duration == 0) ? 1 : page_duration / (points_in_page - 1);
+
+            if (!page_duration && metric_update_every) {
+                page_duration = metric_update_every;
+                update_every = metric_update_every;
+            }
+            else if(page_duration)
+                metric_update_every = update_every;
+
+            uncompressed_points_size += descr->page_length;
+
+            if(page_duration > 0) {
+                page_duration = update_every * points_in_page;
+                metric_duration += page_duration;
+                seconds_in_db += page_duration;
+                points_in_db += descr->page_length / ctx->storage_size;
             }
+            else
+                metric_single_point_pages++;
+
             freez(descr);
-            bytes_freed += sizeof(*descr);
+            pages_bytes += sizeof(*descr);
+            pages_number++;
 
             PValue = JudyLNext(page_index->JudyL_array, &Index, PJE0);
             descr = unlikely(NULL == PValue) ? NULL : *PValue;
         }
 
+        if(metric_single_point_pages && metric_update_every) {
+            points_in_db += metric_single_point_pages;
+            seconds_in_db += metric_update_every * metric_single_point_pages;
+            metric_duration += metric_update_every * metric_single_point_pages;
+        }
+        else
+            single_point_pages += metric_single_point_pages;
+
         /* Free page index */
-        ret_Judy = JudyLFreeArray(&page_index->JudyL_array, PJE0);
+        pages_index_bytes += JudyLFreeArray(&page_index->JudyL_array, PJE0);
         fatal_assert(NULL == page_index->JudyL_array);
-        bytes_freed += ret_Judy;
         freez(page_index);
-        bytes_freed += sizeof(*page_index);
+
+        metrics_number++;
+        metrics_bytes += sizeof(*page_index);
+        metrics_duration += metric_duration;
     }
     /* Free metrics index */
-    ret_Judy = JudyHSFreeArray(&pg_cache->metrics_index.JudyHS_array, PJE0);
+    metrics_index_bytes = JudyHSFreeArray(&pg_cache->metrics_index.JudyHS_array, PJE0);
     fatal_assert(NULL == pg_cache->metrics_index.JudyHS_array);
-    bytes_freed += ret_Judy;
 
-    info("Freed %lu bytes of memory from page cache.", bytes_freed);
+    if(!metrics_number) metrics_number = 1;
+    if(!pages_number) pages_number = 1;
+    if(!cache_pages_number) cache_pages_number = 1;
+    if(!points_in_db) points_in_db = 1;
+    if(latest_time_ut == oldest_time_ut) oldest_time_ut -= USEC_PER_SEC;
+
+    if(single_point_pages) {
+        long double avg_duration = (long double)seconds_in_db / points_in_db;
+        points_in_db += single_point_pages;
+        seconds_in_db += (size_t)(avg_duration * single_point_pages);
+    }
+
+    info("DBENGINE STATISTICS ON METRICS:"
+         " Metrics: %lu (structures %lu bytes - per metric %0.2f, index (HS) %lu bytes - per metric %0.2f bytes - duration %zu secs) |"
+         " Page descriptors: %lu (structures %lu bytes - per page %0.2f bytes, index (L) %lu bytes - per page %0.2f, dirty index %lu bytes). |"
+         " Page cache: %lu pages (structures %lu bytes - per page %0.2f bytes, data %lu bytes). |"
+         " Points in db %zu, uncompressed size of points database %zu bytes. |"
+         " Duration of all points %zu seconds, average point duration %0.2f seconds."
+         " Duration of the database %llu seconds, average metric duration %0.2f seconds, average metric lifetime %0.2f%%."
+         , metrics_number, metrics_bytes, (double)metrics_bytes/metrics_number, metrics_index_bytes, (double)metrics_index_bytes/metrics_number, metrics_duration
+         , pages_number, pages_bytes, (double)pages_bytes/pages_number, pages_index_bytes, (double)pages_index_bytes/pages_number, pages_dirty_index_bytes
+         , cache_pages_number, cache_pages_bytes, (double)cache_pages_bytes/cache_pages_number, cache_pages_data_bytes
+         , points_in_db, uncompressed_points_size
+         , seconds_in_db, (double)seconds_in_db/points_in_db
+         , (latest_time_ut - oldest_time_ut) / USEC_PER_SEC, (double)metrics_duration/metrics_number
+         , (double)metrics_duration/metrics_number * 100.0 / ((latest_time_ut - oldest_time_ut) / USEC_PER_SEC)
+         );
+
+    for(int i = 0; i < 256 ;i++) {
+        if(pages_count_per_type[i])
+            info("DBENGINE STATISTICS ON PAGE TYPES: page type %d total pages %lu, average page size %0.2f bytes", i, pages_count_per_type[i], (double)pages_size_per_type[i]/pages_count_per_type[i]);
+    }
 }
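
The statistics loop in `free_page_cache()` derives a per-page collection interval from the page duration and the number of points in the page. The standalone sketch below isolates that calculation. `estimate_update_every()` is a hypothetical helper, not part of this commit. The guard for pages holding fewer than two points is an added assumption: the expression in the diff would divide by zero if a single-point page ever reported a non-zero duration.

```c
#include <stddef.h>

/* Hypothetical helper: estimate seconds per point for one page.
 * page_duration_secs corresponds to (end_time - start_time) / USEC_PER_SEC,
 * points_in_page to page_length / storage_size in the diff. */
static size_t estimate_update_every(size_t page_duration_secs,
                                    size_t points_in_page)
{
    /* Same fallback as the diff: a zero-duration page counts as 1 second. */
    if (page_duration_secs == 0)
        return 1;

    /* Assumption, not in the diff: with fewer than two points there is no
     * interval between points, so use the page duration itself. */
    if (points_in_page < 2)
        return page_duration_secs;

    /* N points span N - 1 intervals. */
    return page_duration_secs / (points_in_page - 1);
}
```

A page of 1024 points spanning 1023 seconds yields an interval of 1 second, which matches per-second collection on tier 0.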
