Netdata Cloud On-Prem Troubleshooting

Netdata Cloud On-Prem is an enterprise-grade monitoring solution that relies on several infrastructure components:

Databases: PostgreSQL, Redis, Elasticsearch
Message Brokers: Pulsar, EMQX
Traffic Controllers: Ingress, Traefik
Kubernetes Cluster

These components should be monitored and managed according to your organization's established practices and requirements.

Common Issues

Slow Chart Loading or Chart Errors

When charts load slowly or fail with errors, the cause is usually slow or incomplete data collection. The charts service gathers data from multiple Agents within a Room, and a query succeeds only when every queried Agent responds.
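To illustrate why a single slow Agent can hold up a whole chart, here is a minimal simulation of a fan-out query. It is a sketch under stated assumptions, not Netdata's implementation: `query_agent`, `query_room`, `timed_query`, the delays, and the timeout values are all hypothetical.

```python
import asyncio
import time


async def query_agent(name: str, delay: float) -> dict:
    """Simulate one Agent answering a chart query after `delay` seconds."""
    await asyncio.sleep(delay)
    return {"agent": name, "points": 60}


async def query_room(agents: dict[str, float], timeout: float) -> list[dict]:
    """Query all Agents concurrently; fail the whole query if any Agent is late."""
    tasks = [query_agent(name, delay) for name, delay in agents.items()]
    # gather() completes only when every Agent has answered, so the
    # slowest Agent determines how long the whole query takes.
    return await asyncio.wait_for(asyncio.gather(*tasks), timeout=timeout)


def timed_query(agents: dict[str, float], timeout: float):
    """Run one simulated Room query and return (ok, results, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        results = asyncio.run(query_room(agents, timeout))
        ok = True
    except asyncio.TimeoutError:
        results = []
        ok = False
    return ok, results, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical Room: two fast Agents and one slow one (delays in seconds).
    room = {"agent-a": 0.01, "agent-b": 0.02, "agent-slow": 0.30}
    print(timed_query(room, timeout=1.0)[0])   # slow, but completes
    print(timed_query(room, timeout=0.1)[0])   # the slow Agent fails the query
```

In this model, total latency follows the slowest Agent rather than the average, which is why a more reliable backend for queries shortens load times.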

| Issue | Symptoms | Cause | Solution |
|---|---|---|---|
| Agent Connectivity | Queries stall or time out; inconsistent chart loading | Slow Agents or unreliable network connections prevent timely data collection | Deploy additional Parent nodes to provide reliable backends. The system automatically prefers these for queries when available |
| Kubernetes Resources | Service throttling; slow data processing; delayed dashboard updates | Resource saturation at the node level or restrictive container limits | Review and adjust container resource limits and node capacity as needed |
| Database Performance | Slow query responses; increased latency across services | PostgreSQL performance bottlenecks | Monitor and optimize database resource utilization: CPU usage, memory allocation, and disk I/O performance |
| Message Broker | Delayed node status updates (online/offline/stale); slow alert transitions; dashboard update delays | Message accumulation in Pulsar due to processing bottlenecks | Review Pulsar configuration; adjust microservice resource allocation; monitor message processing rates |
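When reviewing container resource limits, it can help to compare observed usage against the configured limit. The sketch below parses Kubernetes resource quantities (such as `450m` CPU or `512Mi` memory) and flags values close to their limit. The helper names, the sample figures, and the 0.8 threshold are assumptions chosen for illustration. In practice the usage figures might come from `kubectl top pod` and the limits from the pod spec.

```python
# Binary (Ki, Mi, Gi, Ti) and decimal (k, M, G, T) memory suffixes.
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity ("250m" or "2") to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity ("512Mi", "1G", "1048576") to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def near_limit(usage: float, limit: float, threshold: float = 0.8) -> bool:
    """Return True when usage is at or above `threshold` of the limit.

    The 0.8 default is an illustrative assumption, not a recommended value.
    """
    return usage / limit >= threshold


if __name__ == "__main__":
    # Hypothetical figures for one container.
    cpu_hot = near_limit(parse_cpu("450m"), parse_cpu("500m"))
    mem_hot = near_limit(parse_memory("300Mi"), parse_memory("1Gi"))
    print(f"CPU near limit: {cpu_hot}, memory near limit: {mem_hot}")
```

A container that stays near its CPU limit is a candidate for throttling, so it is a reasonable place to start when adjusting limits.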
