optimize-the-netdata-agents-performance.md 4.6 KB

Agent Performance Optimization Guide

While Netdata Agents prioritize simplicity and out-of-the-box functionality, their default configuration focuses on comprehensive monitoring rather than performance optimization.

By default, Agents provide:

  • Automatic Application Discovery: Continuously detects and monitors applications running on your node without manual configuration.
  • Real-time Metric Collection: Collects metrics with one-second granularity.
  • Health Monitoring: Actively tracks the health status of your applications and system components with built-in alerting.
  • Machine Learning: Trains models for each metric to detect anomalies and unusual patterns in your system's behavior (Anomaly Detection).

Note

For details about Agent resource requirements, see Resource Utilization.

This document describes various strategies to optimize Netdata's performance for your specific monitoring needs.

Summary of performance optimizations

The following table summarizes the effect of each optimization on the CPU, RAM and Disk IO utilization in production.

Optimization CPU RAM Disk IO
Implement Centralization Points :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
Disable Plugins or Collectors :heavy_check_mark: :heavy_check_mark: :heavy_check_mark:
Adjust data collection frequency :heavy_check_mark: :heavy_check_mark:
Optimize metric retention settings :heavy_check_mark: :heavy_check_mark:
Select appropriate Database Mode :heavy_check_mark: :heavy_check_mark:
Disable ML on Children :heavy_check_mark:

Implement Centralization Points

In production environments, use Parent nodes as centralization points to collect and aggregate data from Child nodes across your infrastructure. This architecture follows our recommended Centralization Points pattern.

Disable Plugins or Collectors

You can improve Agent performance by selectively disabling Plugins or Collectors that you don't need for your monitoring requirements.

Note

Inactive Plugins and Collectors automatically shut down and don't consume system resources. Performance benefits come only from disabling those that are actively collecting metrics.

For detailed instructions on managing Plugins and Collectors, see configuration guide.

Adjust Data Collection Frequency

One of the most effective ways to reduce the Agent's resource consumption is to modify its data collection frequency.

If you don't require per-second precision, or if your Agent is consuming excessive CPU during periods of low dashboard activity, you can reduce the collection frequency. This adjustment can significantly improve CPU utilization while maintaining meaningful monitoring capabilities.

Optimize Metric Retention Settings

You can reduce memory and disk usage by adjusting how long the Agent stores metrics.

Select Appropriate Database Mode

For IoT devices and Child nodes in Centralization Point setups, you can optimize performance by switching to RAM mode. This can significantly reduce resource usage while maintaining essential monitoring capabilities.

Disable Machine Learning on Children

For optimal resource allocation, we recommend running Machine Learning only on Parent nodes, or on systems with sufficient CPU and memory capacity.

To reduce resource usage on Child nodes or less powerful systems, you can disable ML by modifying netdata.conf using edit-config:

[ml]
   enabled = no

This configuration is particularly beneficial for Child nodes since their primary role is to collect and stream metrics to Parent nodes, where ML analysis can be performed centrally.