|
@@ -6,74 +6,114 @@ custom_edit_url: https://github.com/netdata/netdata/edit/master/database/engine/
|
|
|
|
|
|
# Database engine
|
|
|
|
|
|
-The Database Engine works like a traditional database. It dedicates a certain amount of RAM to data caching and
|
|
|
-indexing, while the rest of the data resides compressed on disk. Unlike other [database modes](/database/README.md), the
|
|
|
-amount of historical metrics stored is based on the amount of disk space you allocate and the effective compression
|
|
|
+The Database Engine works like a traditional time series database. Unlike other [database modes](/database/README.md),
|
|
|
+the amount of historical metrics stored is based on the amount of disk space you allocate and the effective compression
|
|
|
ratio, not a fixed number of metrics collected.
|
|
|
|
|
|
-By using both RAM and disk space, the database engine allows for long-term storage of per-second metrics inside of the
|
|
|
-Agent itself.
|
|
|
+## Tiering
|
|
|
|
|
|
-In addition, the dbengine is the only mode that supports changing the data collection update frequency
|
|
|
-(`update every`) without losing the metrics your Agent already gathered and stored.
|
|
|
+Tiering is a mechanism for providing multiple tiers of data with
|
|
|
+different [granularity on metrics](/docs/store/distributed-data-architecture.md#granularity-of-metrics).
|
|
|
|
|
|
-## Configuration
|
|
|
+For Netdata Agents with version `netdata-1.35.0.138.nightly` and greater, `dbengine` supports Tiering, allowing almost
|
|
|
+unlimited retention of data.
|
|
|
|
|
|
-To use the database engine, open `netdata.conf` and set `[db].mode` to `dbengine`.
|
|
|
|
|
|
-```conf
|
|
|
+### Metric size
|
|
|
+
|
|
|
+Every Tier downsamples the tier directly below it (lower tiers have higher resolution). You can have up to 5
|
|
|
+Tiers **[0..4]** of data (including Tier 0, which has the highest resolution).
|
|
|
+
|
|
|
+Tier 0 is the default that was always available in `dbengine` mode. Tier 1 is the first level of aggregation, Tier 2 is
|
|
|
+the second, and so on.
|
|
|
+
|
|
|
+Metrics on all tiers except _Tier 0_ also store the following five additional values for every point for accurate
|
|
|
+representation:
|
|
|
+
|
|
|
+1. The `sum` of the points aggregated
|
|
|
+2. The `min` of the points aggregated
|
|
|
+3. The `max` of the points aggregated
|
|
|
+4. The `count` of the points aggregated (could be constant, but it may not be due to gaps in data collection)
|
|
|
+5. The `anomaly_count` of the points aggregated (how many of the aggregated points were found anomalous)
|
|
|
+
|
|
|
+Among `min`, `max` and `sum`, the correct value is chosen based on the user query. `average` is calculated on the fly at
|
|
|
+query time.
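The aggregation described above can be sketched as follows. This is a minimal illustration only: the `TierPoint` class and the `aggregate` and `query` helpers are hypothetical names that do not come from the Netdata code base, and the real on-disk representation may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TierPoint:
    """One aggregated point on Tier 1 or above (hypothetical structure)."""
    sum: float
    min: float
    max: float
    count: int
    anomaly_count: int


def aggregate(points: Iterable[Tuple[float, bool]]) -> TierPoint:
    """Aggregate (value, is_anomalous) pairs from the lower tier into one point."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot aggregate an empty set of points")
    values = [value for value, _ in pts]
    return TierPoint(
        sum=sum(values),
        min=min(values),
        max=max(values),
        count=len(values),
        anomaly_count=sum(1 for _, anomalous in pts if anomalous),
    )


def query(point: TierPoint, function: str) -> float:
    """Return the stored value matching the query; average is derived on the fly."""
    if function == "average":
        return point.sum / point.count
    if function in ("sum", "min", "max"):
        return getattr(point, function)
    raise ValueError(f"unsupported function: {function}")


# 58 collected points instead of 60, e.g. because of a gap in data collection
samples = [(float(i % 5), i == 10) for i in range(58)]
point = aggregate(samples)
print(point.count, point.anomaly_count, query(point, "max"), query(point, "average"))
```

Storing `sum` and `count` rather than a precomputed average keeps the average correct when the number of aggregated points varies because of collection gaps.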
|
|
|
+
|
|
|
+### Tiering in a nutshell
|
|
|
+
|
|
|
+The `dbengine` is capable of retaining metrics for years. To further understand the `dbengine` tiering mechanism, let's
|
|
|
+explore the following configuration.
|
|
|
+
|
|
|
+```
|
|
|
[db]
|
|
|
mode = dbengine
|
|
|
+
|
|
|
+ # per second data collection
|
|
|
+ update every = 1
|
|
|
+
|
|
|
+ # enables Tier 1 and Tier 2, Tier 0 is always enabled in dbengine mode
|
|
|
+ storage tiers = 3
|
|
|
+
|
|
|
+ # Tier 0, per second data for a week
|
|
|
+ dbengine multihost disk space MB = 1100
|
|
|
+
|
|
|
+ # Tier 1, per minute data for a month
|
|
|
+ dbengine tier 1 multihost disk space MB = 330
|
|
|
+
|
|
|
+ # Tier 2, per hour data for a year
|
|
|
+ dbengine tier 2 multihost disk space MB = 67
|
|
|
```
|
|
|
|
|
|
-To configure the database engine, look for the `dbengine page cache size MB` and `dbengine multihost disk space MB` settings in the
|
|
|
-`[db]` section of your `netdata.conf`. The Agent ignores the `[db].retention` setting when using the dbengine.
|
|
|
+For 2000 metrics, collected every second and retained for a week, Tier 0 needs: 1 byte x 2000 metrics x 3600 secs per
|
|
|
+hour x 24 hours per day x 7 days per week = roughly 1100MB.
|
|
|
|
|
|
-```conf
|
|
|
-[db]
|
|
|
- dbengine page cache size MB = 32
|
|
|
- dbengine multihost disk space MB = 256
|
|
|
-```
|
|
|
+By setting `dbengine multihost disk space MB` to `1100`, this node will start maintaining about a week of data. But pay
|
|
|
+attention to the number of metrics. If you have more than 2000 metrics on a node, or you need more than a week of high
|
|
|
+resolution metrics, you may need to adjust this setting accordingly.
|
|
|
+
|
|
|
+By default, Tier 1 samples the data every **60 points of Tier 0**. In our case, Tier 0 is per second; if we want to
|
|
|
+transform this information in terms of time then the Tier 1 "resolution" is per minute.
|
|
|
+
|
|
|
+Tier 1 needs four times more storage per point compared to Tier 0. So, for 2000 metrics, with per minute resolution,
|
|
|
+retained for a month, Tier 1 needs: 4 bytes x 2000 metrics x 60 minutes per hour x 24 hours per day x 30 days per month
|
|
|
+= 330MB.
|
|
|
+
|
|
|
+By default, Tier 2 samples data every 3600 points of Tier 0 (60 points of Tier 1, the tier directly below it). Again,
|
|
|
+in terms of "time" (Tier 0 is per second), Tier 2 is per hour.
|
|
|
+
|
|
|
+The storage requirements per point are the same as for Tier 1.
|
|
|
|
|
|
-The above values are the default values for Page Cache size and DB engine disk space quota.
|
|
|
+For 2000 metrics, with per hour resolution, retained for a year, Tier 2 needs: 4 bytes x 2000 metrics x 24 hours per day
|
|
|
+x 365 days per year = 67MB.
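The three estimates above follow the same formula, which can be sketched as below. The function names are hypothetical, the per-point sizes (1 byte for Tier 0, 4 bytes for higher tiers) are taken from the text, and "MB" is assumed to mean MiB. Under those assumptions the Tier 0 result comes out somewhat above the rounded 1100MB figure quoted above, so treat all of these numbers as estimates.

```python
MIB = 1024 * 1024


def tier_disk_space_bytes(metrics: int, bytes_per_point: int,
                          points_per_day: int, days: int) -> int:
    """Estimate following the formula in the text: one stored point per metric per interval."""
    return metrics * bytes_per_point * points_per_day * days


def to_mib(size_bytes: int) -> float:
    """Convert a byte count to MiB."""
    return size_bytes / MIB


# (bytes per point, points per day, retention in days), as in the example configuration
TIERS = {
    "Tier 0": (1, 24 * 60 * 60, 7),   # per second, one week
    "Tier 1": (4, 24 * 60, 30),       # per minute, one month
    "Tier 2": (4, 24, 365),           # per hour, one year
}

for name, (bytes_per_point, points_per_day, days) in TIERS.items():
    size = tier_disk_space_bytes(2000, bytes_per_point, points_per_day, days)
    print(f"{name}: {size} bytes = {to_mib(size):.0f} MiB")
```

Changing the first argument lets you scale the estimate to the number of metrics your node actually collects.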
|
|
|
|
|
|
-The `dbengine page cache size MB` option determines the amount of RAM dedicated to caching Netdata metric values. The
|
|
|
-actual page cache size will be slightly larger than this figure—see the [memory requirements](#memory-requirements)
|
|
|
-section for details.
|
|
|
+## Legacy configuration
|
|
|
|
|
|
-The `dbengine multihost disk space MB` option determines the amount of disk space that is dedicated to storing
|
|
|
-Netdata metric values and all related metadata describing them. You can use the [**database engine
|
|
|
-calculator**](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
|
|
|
-to correctly set `dbengine multihost disk space MB` based on your metrics retention policy. The calculator gives an
|
|
|
-accurate estimate based on how many child nodes you have, how many metrics your Agent collects, and more.
|
|
|
+### v1.35.1 and prior
|
|
|
|
|
|
-### Legacy configuration
|
|
|
+These versions of the Agent do not support [Tiering](#tiering). You could change the metric retention for the parent and
|
|
|
+all of its children only with the `dbengine multihost disk space MB` setting. This setting accounts for the space allocation
|
|
|
+for the parent node and all of its children.
|
|
|
|
|
|
-The deprecated `dbengine disk space MB` option determines the amount of disk space that is dedicated to storing
|
|
|
-Netdata metric values per legacy database engine instance (see [details on the legacy mode](#legacy-mode) below).
|
|
|
+To configure the database engine, look for the `dbengine page cache size MB` and `dbengine multihost disk space MB` settings in
|
|
|
+the `[db]` section of your `netdata.conf`.
|
|
|
|
|
|
```conf
|
|
|
[db]
|
|
|
- dbengine disk space MB = 256
|
|
|
+ dbengine page cache size MB = 32
|
|
|
+ dbengine multihost disk space MB = 256
|
|
|
```
|
|
|
|
|
|
-### Streaming metrics to the database engine
|
|
|
-
|
|
|
-When using the multihost database engine, all parent and child nodes share the same `dbengine page cache size MB` and `dbengine
|
|
|
-multihost disk space MB` in a single dbengine instance. The [**database engine
|
|
|
-calculator**](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
|
|
|
-helps you properly set `dbengine page cache size MB` and `dbengine multihost disk space MB` on your parent node to allocate enough
|
|
|
-resources based on your metrics retention policy and how many child nodes you have.
|
|
|
-
|
|
|
-#### Legacy mode
|
|
|
+### v1.23.2 and prior
|
|
|
|
|
|
_For Netdata Agents earlier than v1.23.2_, the Agent on the parent node uses one dbengine instance for itself, and
|
|
|
another instance for every child node it receives metrics from. If you had four streaming nodes, you would have five
|
|
|
instances in total (`1 parent + 4 child nodes = 5 instances`).
|
|
|
|
|
|
-The Agent allocates resources for each instance separately using the `dbengine disk space MB` (**deprecated**) setting. If
|
|
|
-`dbengine disk space MB`(**deprecated**) is set to the default `256`, each instance is given 256 MiB in disk space, which
|
|
|
-means the total disk space required to store all instances is, roughly, `256 MiB * 1 parent * 4 child nodes = 1280 MiB`.
|
|
|
+The Agent allocates resources for each instance separately using the `dbengine disk space MB` (**deprecated**) setting.
|
|
|
+If `dbengine disk space MB` (**deprecated**) is set to the default `256`, each instance is given 256 MiB in disk space,
|
|
|
+which means the total disk space required to store all instances is,
|
|
|
+roughly, `256 MiB * (1 parent + 4 child nodes) = 1280 MiB`.
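As a quick check of the legacy arithmetic, here is a minimal sketch. The function name is hypothetical; it only assumes one instance for the parent plus one per child node, as described above.

```python
def legacy_total_disk_space_mib(per_instance_mib: int, child_nodes: int) -> int:
    """Total disk space when every node gets its own dbengine instance."""
    instances = 1 + child_nodes  # the parent itself plus one instance per child
    return per_instance_mib * instances


print(legacy_total_disk_space_mib(256, 4))
```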
|
|
|
|
|
|
#### Backward compatibility
|
|
|
|
|
@@ -90,41 +130,44 @@ Agent.
|
|
|
For more information about setting `[db].mode` on your nodes, in addition to other streaming configurations, see
|
|
|
[streaming](/streaming/README.md).
|
|
|
|
|
|
-### Memory requirements
|
|
|
+## Requirements & limitations
|
|
|
+
|
|
|
+### Memory
|
|
|
|
|
|
Using database mode `dbengine` we can overcome most memory restrictions and store a dataset that is much larger than the
|
|
|
available memory.
|
|
|
|
|
|
There are explicit memory requirements **per** DB engine **instance**:
|
|
|
|
|
|
-- The total page cache memory footprint will be an additional `#dimensions-being-collected x 4096 x 2` bytes over what
|
|
|
- the user configured with `dbengine page cache size MB`.
|
|
|
+- The total page cache memory footprint will be an additional `#dimensions-being-collected x 4096 x 2` bytes over what
|
|
|
+ the user configured with `dbengine page cache size MB`.
|
|
|
+
|
|
|
|
|
|
-- an additional `#pages-on-disk x 4096 x 0.03` bytes of RAM are allocated for metadata.
|
|
|
+- An additional `#pages-on-disk x 4096 x 0.03` bytes of RAM is allocated for metadata.
|
|
|
|
|
|
- - roughly speaking this is 3% of the uncompressed disk space taken by the DB files.
|
|
|
+ - roughly speaking this is 3% of the uncompressed disk space taken by the DB files.
|
|
|
|
|
|
- - for very highly compressible data (compression ratio > 90%) this RAM overhead is comparable to the disk space
|
|
|
- footprint.
|
|
|
+ - for very highly compressible data (compression ratio > 90%) this RAM overhead is comparable to the disk space
|
|
|
+ footprint.
|
|
|
|
|
|
An important observation is that RAM usage depends on both the `dbengine page cache size MB` and the `dbengine multihost disk space MB`
|
|
|
options.
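The two formulas above can be combined into a rough per-instance estimate. This is a sketch under stated assumptions: the function names are hypothetical, "MB" in the setting name is treated as MiB, and the number of pages on disk has to be supplied by you.

```python
PAGE_SIZE = 4096  # bytes per page, as stated in the text
MIB = 1024 * 1024


def page_cache_bytes(configured_mb: int, dimensions: int) -> int:
    """Configured page cache plus the extra `dimensions x 4096 x 2` bytes."""
    return configured_mb * MIB + dimensions * PAGE_SIZE * 2


def metadata_bytes(pages_on_disk: int) -> int:
    """Roughly 3% of the uncompressed size of the pages stored on disk."""
    return pages_on_disk * PAGE_SIZE * 3 // 100


def estimated_ram_bytes(configured_mb: int, dimensions: int, pages_on_disk: int) -> int:
    """Sum of both components for one dbengine instance."""
    return page_cache_bytes(configured_mb, dimensions) + metadata_bytes(pages_on_disk)


# Example: default 32 MB page cache, 2000 dimensions, 100000 pages on disk
print(estimated_ram_bytes(32, 2000, 100_000) / MIB)
```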
|
|
|
|
|
|
-You can use our [database engine
|
|
|
-calculator](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
|
|
|
+You can use
|
|
|
+our [database engine calculator](/docs/store/change-metrics-storage.md#calculate-the-system-resources-ram-disk-space-needed-to-store-metrics)
|
|
|
to validate the memory requirements for your particular system(s) and configuration (**out-of-date**).
|
|
|
|
|
|
-### Disk space requirements
|
|
|
+### Disk space
|
|
|
|
|
|
There are explicit disk space requirements **per** DB engine **instance**:
|
|
|
|
|
|
-- The total disk space footprint will be the maximum between `#dimensions-being-collected x 4096 x 2` bytes or what
|
|
|
- the user configured with `dbengine multihost disk space` or `dbengine disk space`.
|
|
|
+- The total disk space footprint will be the larger of `#dimensions-being-collected x 4096 x 2` bytes or what the
|
|
|
+  user configured with `dbengine multihost disk space MB` or `dbengine disk space MB`.
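In other words, the configured quota acts as a target, with a floor that depends on how many dimensions are collected. A minimal sketch, with a hypothetical function name and "MB" treated as MiB:

```python
PAGE_SIZE = 4096  # bytes per page
MIB = 1024 * 1024


def disk_footprint_bytes(configured_mb: int, dimensions: int) -> int:
    """The larger of the configured quota and `dimensions x 4096 x 2` bytes."""
    return max(configured_mb * MIB, dimensions * PAGE_SIZE * 2)


# With the default 256 MB quota, 2000 dimensions stay well below the floor
print(disk_footprint_bytes(256, 2000))
```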
|
|
|
|
|
|
-### File descriptor requirements
|
|
|
+### File descriptors
|
|
|
|
|
|
-The Database Engine may keep a **significant** amount of files open per instance (e.g. per streaming child or
|
|
|
-parent server). When configuring your system you should make sure there are at least 50 file descriptors available per
|
|
|
+The Database Engine may keep a **significant** number of files open per instance (e.g. per streaming child or parent
|
|
|
+server). When configuring your system you should make sure there are at least 50 file descriptors available per
|
|
|
`dbengine` instance.
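To get a feel for these numbers, the sketch below estimates how many `dbengine` instances a given file descriptor limit could serve. It assumes the 50 descriptors per instance mentioned above and the 25% share described in the next paragraph. The function names are hypothetical, and the `resource` module used to read the current limit is only available on Unix-like systems.

```python
FDS_PER_INSTANCE = 50    # minimum descriptors per dbengine instance (from the text)
DBENGINE_SHARE_DIVISOR = 4  # 25% of the available descriptors


def max_dbengine_instances(fd_limit: int) -> int:
    """Instances that fit in the dbengine share of the descriptor limit."""
    return (fd_limit // DBENGINE_SHARE_DIVISOR) // FDS_PER_INSTANCE


def current_fd_soft_limit():
    """Return the soft RLIMIT_NOFILE of this process, or None if unlimited."""
    import resource  # Unix-only standard library module
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return None if soft == resource.RLIM_INFINITY else soft


limit = current_fd_soft_limit()
if limit is None:
    print("no soft limit on open files")
else:
    print(f"soft limit {limit}: room for about {max_dbengine_instances(limit)} instances")
```

Note that the limit reported here is the one of the Python process, not necessarily the one the Netdata service runs with.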
|
|
|
|
|
|
Netdata allocates 25% of the available file descriptors to its Database Engine instances. This means that only 25% of
|
|
@@ -148,7 +191,7 @@ ulimit -n 65536
|
|
|
```
|
|
|
|
|
|
at the beginning of the service file. Alternatively you can change the system-wide limits of the kernel by changing
|
|
|
- `/etc/sysctl.conf`. For linux that would be:
|
|
|
+`/etc/sysctl.conf`. For Linux that would be:
|
|
|
|
|
|
```conf
|
|
|
fs.file-max = 65536
|
|
@@ -165,8 +208,8 @@ You can apply the settings by running `sysctl -p` or by rebooting.
|
|
|
|
|
|
## Files
|
|
|
|
|
|
-With the DB engine mode the metric data are stored in database files. These files are organized in pairs, the
|
|
|
-datafiles and their corresponding journalfiles, e.g.:
|
|
|
+With the DB engine mode the metric data are stored in database files. These files are organized in pairs, the datafiles
|
|
|
+and their corresponding journalfiles, e.g.:
|
|
|
|
|
|
```sh
|
|
|
datafile-1-0000000001.ndf
|
|
@@ -191,15 +234,16 @@ storage at lower granularity.
|
|
|
The DB engine stores chart metric values in 4096-byte pages in memory. Each chart dimension gets its own page to store
|
|
|
consecutive values generated from the data collectors. Those pages comprise the **Page Cache**.
|
|
|
|
|
|
-When those pages fill up they are slowly compressed and flushed to disk. It can take `4096 / 4 = 1024 seconds = 17
|
|
|
-minutes`, for a chart dimension that is being collected every 1 second, to fill a page. Pages can be cut short when we
|
|
|
-stop Netdata or the DB engine instance so as to not lose the data. When we query the DB engine for data we trigger disk
|
|
|
-read I/O requests that fill the Page Cache with the requested pages and potentially evict cold (not recently used)
|
|
|
-pages.
|
|
|
+When those pages fill up, they are slowly compressed and flushed to disk. It can
|
|
|
+take `4096 / 4 = 1024 seconds = 17 minutes`, for a chart dimension that is being collected every 1 second, to fill a
|
|
|
+page. Pages can be cut short when we stop Netdata or the DB engine instance so as to not lose the data. When we query
|
|
|
+the DB engine for data we trigger disk read I/O requests that fill the Page Cache with the requested pages and
|
|
|
+potentially evict cold (not recently used) pages.
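The page fill time above generalizes to other collection intervals. A minimal sketch, assuming the 4096-byte page and 4 bytes per point from the text; the function name is hypothetical.

```python
def page_fill_seconds(update_every: int = 1, page_size: int = 4096,
                      bytes_per_point: int = 4) -> int:
    """Seconds needed for one dimension to fill one page."""
    points_per_page = page_size // bytes_per_point
    return points_per_page * update_every


seconds = page_fill_seconds(update_every=1)
print(f"{seconds} seconds, about {seconds / 60:.0f} minutes")
```

With `update every = 2`, for example, the same page would take twice as long to fill.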
|
|
|
|
|
|
When the disk quota is exceeded the oldest values are removed from the DB engine in real time, by automatically deleting
|
|
|
the oldest datafile and journalfile pair. Any corresponding pages residing in the Page Cache will also be invalidated
|
|
|
-and removed. The DB engine logic will try to maintain between 10 and 20 file pairs at any point in time.
|
|
|
+and removed. The DB engine logic will try to maintain between 10 and 20 file pairs at any point in time.
|
|
|
|
|
|
The Database Engine uses direct I/O to avoid polluting the OS filesystem caches and does not generate excessive I/O
|
|
|
traffic so as to create the minimum possible interference with other applications.
|
|
@@ -214,19 +258,19 @@ Constellation ES.3 2TB magnetic HDD and a SAMSUNG MZQLB960HAJR-00007 960GB NAND
|
|
|
For our workload, we defined 32 charts with 128 metrics each, giving us a total of 4096 metrics. We defined 1 worker
|
|
|
thread per chart (32 threads) that generates new data points with a data generation interval of 1 second. The time axis
|
|
|
of the time-series is emulated and accelerated so that the worker threads can generate as many data points as possible
|
|
|
-without delays.
|
|
|
+without delays.
|
|
|
|
|
|
-We also defined 32 worker threads that perform queries on random metrics with semi-random time ranges. The
|
|
|
-starting time of the query is randomly selected between the beginning of the time-series and the time of the latest data
|
|
|
-point. The ending time is randomly selected between 1 second and 1 hour after the starting time. The pseudo-random
|
|
|
-numbers are generated with a uniform distribution.
|
|
|
+We also defined 32 worker threads that perform queries on random metrics with semi-random time ranges. The starting time
|
|
|
+of the query is randomly selected between the beginning of the time-series and the time of the latest data point. The
|
|
|
+ending time is randomly selected between 1 second and 1 hour after the starting time. The pseudo-random numbers are
|
|
|
+generated with a uniform distribution.
|
|
|
|
|
|
The data are written to the database at the same time as they are read from it. This is a concurrent read/write mixed
|
|
|
-workload with a duration of 60 seconds. The faster `dbengine` runs, the bigger the dataset size becomes since more
|
|
|
-data points will be generated. We set a page cache size of 64MiB for the two disk-bound scenarios. This way, the dataset
|
|
|
-size of the metric data is much bigger than the RAM that is being used for caching so as to trigger I/O requests most
|
|
|
-of the time. In our final scenario, we set the page cache size to 16 GiB. That way, the dataset fits in the page cache
|
|
|
-so as to avoid all disk bottlenecks.
|
|
|
+workload with a duration of 60 seconds. The faster `dbengine` runs, the bigger the dataset size becomes since more data
|
|
|
+points will be generated. We set a page cache size of 64MiB for the two disk-bound scenarios. This way, the dataset size
|
|
|
+of the metric data is much bigger than the RAM that is being used for caching so as to trigger I/O requests most of the
|
|
|
+time. In our final scenario, we set the page cache size to 16 GiB. That way, the dataset fits in the page cache so as to
|
|
|
+avoid all disk bottlenecks.
|
|
|
|
|
|
The reported numbers are the following:
|
|
|
|
|
@@ -237,15 +281,15 @@ The reported numbers are the following:
|
|
|
| N/A | 16 GiB | 6.8 GiB | 118.2M | 30.2M |
|
|
|
|
|
|
where "reads/sec" is the number of metric data points being read from the database via its API per second and
|
|
|
-"writes/sec" is the number of metric data points being written to the database per second.
|
|
|
+"writes/sec" is the number of metric data points being written to the database per second.
|
|
|
|
|
|
Notice that the HDD numbers are pretty high and not much slower than the SSD numbers. This is thanks to the database
|
|
|
engine design being optimized for rotating media. In the database engine disk I/O requests are:
|
|
|
|
|
|
-- asynchronous to mask the high I/O latency of HDDs.
|
|
|
-- mostly large to reduce the amount of HDD seeking time.
|
|
|
-- mostly sequential to reduce the amount of HDD seeking time.
|
|
|
-- compressed to reduce the amount of required throughput.
|
|
|
+- asynchronous to mask the high I/O latency of HDDs.
|
|
|
+- mostly large to reduce the amount of HDD seeking time.
|
|
|
+- mostly sequential to reduce the amount of HDD seeking time.
|
|
|
+- compressed to reduce the amount of required throughput.
|
|
|
|
|
|
As a result, the HDD is not thousands of times slower than the SSD, as it typically would be for other workloads.
|
|
|
|