Netdata's CPU consumption is affected by the following factors:
On production systems, where Netdata is running with default settings, monitoring the system it is installed at and its containers and applications, CPU utilization should usually be about 1% to 5% of a single CPU core.
This includes 3 database tiers, machine learning, per-second data collection, alerts, and streaming to a Netdata Parent.
On Metrics Centralization Points, Netdata Parents running on modern server hardware, we estimate CPU utilization per million of samples collected per second:
Feature | Depends On | Expected Utilization | Key Reasons |
---|---|---|---|
Metrics Ingestion | Number of samples received per second | 2 CPU cores per million of samples per second | Decompress and decode received messages, update database. |
Metrics re-streaming | Number of samples resent per second | 2 CPU cores per million of samples per second | Encode and compress messages towards Netdata Parent. |
Machine Learning | Number of unique time-series concurrently collected | 2 CPU cores per million of unique metrics concurrently collected | Train machine learning models, query existing models to detect anomalies. |
We recommend keeping the total CPU utilization below 60% when a Netdata Parent is steadily ingesting metrics, training machine learning models and running health checks. This will leave enough CPU resources available for queries.
You can control Netdata's CPU utilization with these parameters:
dbengine
, which compresses and commits data to disk. If you have a Netdata Parent where metrics are aggregated and saved to disk and there is a reliable connection between the Netdata you want to optimize and its Parent, switch to database mode ram
or alloc
. This disables saving to disk, so your Netdata will also not use any disk I/O.When a Netdata Parent starts and Netdata children get connected to it, there are several operations that temporarily affect CPU utilization, network bandwidth and disk I/O.
The general flow looks like this:
tier2
and of the last minute of tier1
, ensuring that higher tiers reflect all the information tier0
has. If Netdata was stopped abnormally (e.g. due to a system failure or crash), higher tiers may have to be back-filled for longer durations.The above process is per metric. So, while one metric back-fills, another replicates and a third one streams.
At the same time:
So, when looking for the "steady CPU consumption during ingestion" of a busy Netdata Parent, we recommend to let it stabilize for a few hours before checking.
Keep in mind that Netdata has been designed so that even if during the initialization phase and the connection of hundreds of Netdata Children the system lacks CPU resources, the Netdata Parent will complete all the operations and eventually enter a steady CPU consumption during ingestion, without affecting the quality of the metrics stored. So, it is ok if during initialization of a busy Netdata Parent, CPU consumption spikes to 100%.
Important: the above initialization process is not such intense when new nodes get connected to a Netdata Parent for the first time (e.g. ephemeral nodes), since several of the steps involved are not required.
Especially for the cases where children disconnect and reconnect to the Parent due to network related issues (i.e. both the Netdata Child and the Netdata Parent have not been restarted and less than 1 hour has passed since the last disconnection), the re-negotiation phase is minimal and metrics are instantly entering the normal streaming phase.