Netdata’s tiered storage is designed to efficiently retain metric data and metadata for long periods. However, when extreme cardinality occurs—often unintentionally through misconfigurations or inadvertent practices (e.g., spawning many short-lived Docker containers or using unbounded label values)—the long-term retention of metadata can lead to excessive resource consumption.
To protect against extreme cardinality, Netdata includes an automated protection mechanism. This document explains why this protection is needed, how it works, how to configure it, and how to verify its operation.
Extreme cardinality refers to the explosion in the number of unique time series generated when metrics are combined with a wide range of labels or dimensions. In modern observability platforms like Netdata, metrics aren’t just simple numeric values—they come with metadata (labels, tags, dimensions) that help contextualize the data. When these labels are overly dynamic or unbounded (for example, when using unique identifiers such as session IDs, user IDs, or ephemeral container names), combined with a very long retention, like the one provided by Netdata, the system ends up tracking an enormous number of unique series.
Although Netdata performs better than most other observability solutions, extreme cardinality still has implications for resource consumption.
Metrics ephemerality is the percentage of metrics that are no longer actively collected (old), compared to the total metrics available (the sum of currently collected metrics and old metrics).
- **High ephemerality (close to 100%):** the system frequently generates new unique metrics that live only for a short period, indicating a high turnover in metrics.
- **Low ephemerality (close to 0%):** the system maintains a stable set of metrics over time, with little change in the total number of unique series.
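As a quick sketch of this definition (the function name and the numbers are illustrative, not part of Netdata):

```python
def ephemerality_pct(collected: int, old: int) -> float:
    """Percentage of metrics no longer collected, out of all metrics available."""
    total = collected + old
    if total == 0:
        return 0.0
    return 100.0 * old / total

# A context with 2,000 currently collected metrics and 8,000 old ones
# is highly ephemeral:
print(ephemerality_pct(2000, 8000))  # 80.0
```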
The mechanism kicks in during tier0 (high-resolution) database rotations (i.e., when the oldest tier0 samples are deleted) and proceeds as follows:
1. **Counting instances with zero tier0 retention:** during the rotation, Netdata counts, per context, the instances that no longer have any tier0 samples (i.e., instances that are no longer actively collected).
2. **Threshold verification:** these counts are checked against the configured thresholds: the minimum number of instances to keep and the minimum ephemerality percentage.
3. **Forceful clearing in long-term storage:** when both thresholds are exceeded, Netdata forcefully clears the non-tier0 (long-term) retention of the excess instances.
4. **Retention rules:** at least the configured number of instances per context is always kept, so recently collected data is preserved.
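The steps above can be sketched as follows. This is a hedged illustration only: the function, the data model, and the rule for choosing which instances to clear are assumptions made for the example, not Netdata’s actual implementation (which is written in C).

```python
def protect_context(instances, keep_instances=1000, min_ephemerality=50):
    """Sketch of the protection decision for one context during a tier0 rotation.

    `instances` is a list of (instance_id, has_tier0_retention) pairs — an
    illustrative stand-in for Netdata's internal metadata.
    Returns the instance ids whose long-term retention would be cleared.
    """
    # Step 1: count instances with zero tier0 retention.
    zero_tier0 = [i for i, has_t0 in instances if not has_t0]
    total = len(instances)
    if total == 0:
        return []

    # Step 2: verify both thresholds — ephemerality and the keep limit.
    ephemerality = 100.0 * len(zero_tier0) / total
    excess = len(zero_tier0) - keep_instances
    if ephemerality < min_ephemerality or excess <= 0:
        return []

    # Steps 3–4: clear long-term retention of the excess instances,
    # always keeping at least `keep_instances` of them.
    return zero_tier0[:excess]
```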
You can control the protection mechanism via the following settings in `netdata.conf`, under the `[db]` section:
```
[db]
    extreme cardinality protection = yes
    extreme cardinality keep instances = 1000
    extreme cardinality min ephemerality = 50
```
- `extreme cardinality keep instances`: the minimum number of instances per context that should be kept. The default value is 1000.
- `extreme cardinality min ephemerality`: the minimum percentage of instances in a context that have zero tier0 retention required to trigger the cleanup. The default value is 50%.
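To see how the defaults interact, consider a hypothetical context holding 3,000 instances of which 2,400 no longer have tier0 retention (all numbers here are made up for illustration):

```python
keep_instances = 1000    # extreme cardinality keep instances (default)
min_ephemerality = 50    # extreme cardinality min ephemerality, in % (default)

total_instances = 3000   # hypothetical context size
zero_tier0 = 2400        # instances with no tier0 retention

ephemerality = 100 * zero_tier0 / total_instances      # 80.0 %
triggered = ephemerality >= min_ephemerality and zero_tier0 > keep_instances
cleared = zero_tier0 - keep_instances if triggered else 0

print(triggered, cleared)  # True 1400
```

Ephemerality is 80%, above the 50% threshold, so the protection would clear the long-term retention of 2,400 − 1,000 = 1,400 instances, keeping the configured minimum of 1,000.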
Recommendations:
When the protection mechanism is activated, Netdata logs a detailed message. The log entry includes:
```
EXTREME CARDINALITY PROTECTION: on host '<HOST>', for context '<CONTEXT>': forcefully cleared the retention of <METRICS_COUNT> metrics and <INSTANCES_COUNT> instances, having non-tier0 retention from <START_TIME> to <END_TIME>.
```
This log message is tagged with the following message ID for easy identification:
```
MESSAGE_ID=d1f59606dd4d41e3b217a0cfcae8e632
```
Using System Logs:
You can use `journalctl` (or your system’s log viewer) to search for the message ID:

```
journalctl --namespace=netdata MESSAGE_ID=d1f59606dd4d41e3b217a0cfcae8e632
```
Netdata Logs Dashboard:
Navigate to the Netdata Logs dashboard. On the right side, under `MESSAGE_ID`, select "Netdata extreme cardinality" to filter only those messages.
The extreme cardinality protection mechanism in Netdata is designed to automatically safeguard your system against the potential issues caused by excessive metric metadata retention. It does so by forcefully clearing the long-term retention of instances that are no longer collected, once the configured thresholds are exceeded.
By properly configuring tier0 and adjusting the `extreme cardinality` settings in `netdata.conf`, you can ensure that your system remains both efficient and protected, even when extreme cardinality issues occur.