Netdata Cloud On-Prem is an enterprise-grade monitoring solution that relies on several infrastructure components.
Monitor and manage these components according to your organization's established practices and requirements.
When charts take a long time to load or fail with errors, the cause is usually slow or failed data collection. The charts
service must gather data from multiple Agents within a Room and needs a successful response from every queried Agent, so a single slow or unreachable Agent can delay or fail the whole query.
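
To illustrate why one slow Agent affects the whole chart, here is a minimal simulation of a fan-out query that waits for every Agent. It is a sketch and not the charts service's actual implementation: the function names (`query_agent`, `query_room`, `run_query`), the delays, and the timeout value are all hypothetical.

```python
import asyncio


async def query_agent(name: str, delay: float) -> tuple[str, float]:
    """Simulate one Agent answering a chart query after `delay` seconds."""
    await asyncio.sleep(delay)
    return name, delay


async def query_room(agent_delays: dict[str, float], timeout: float) -> dict[str, float]:
    """Fan out to every Agent in the Room and wait for all responses."""
    tasks = [query_agent(name, delay) for name, delay in agent_delays.items()]
    # The query completes only when the slowest Agent has answered;
    # if the deadline passes first, the whole query fails.
    results = await asyncio.wait_for(asyncio.gather(*tasks), timeout=timeout)
    return dict(results)


def run_query(agent_delays: dict[str, float], timeout: float) -> str:
    """Return "ok" or "timeout" for a simulated Room query."""
    try:
        asyncio.run(query_room(agent_delays, timeout))
        return "ok"
    except asyncio.TimeoutError:
        return "timeout"


if __name__ == "__main__":
    # All Agents respond quickly: the chart loads.
    print(run_query({"agent-a": 0.01, "agent-b": 0.02}, timeout=0.5))
    # One slow Agent pushes the whole query past the deadline.
    print(run_query({"agent-a": 0.01, "agent-slow": 0.5}, timeout=0.1))
```

In this model the query latency equals the latency of the slowest Agent, which is why routing queries to a faster, more reliable backend such as a Parent node shortens the worst case.
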

| Issue | Symptoms | Cause | Solution |
|---|---|---|---|
| Agent Connectivity | - Queries stall or time out<br/>- Inconsistent chart loading | Slow Agents or unreliable network connections prevent timely data collection | Deploy additional Parent nodes to provide reliable backends. The system automatically prefers these for queries when available |
| Kubernetes Resources | - Service throttling<br/>- Slow data processing<br/>- Delayed dashboard updates | Resource saturation at the node level or restrictive container limits | Review and adjust container resource limits and node capacity as needed |
| Database Performance | - Slow query responses<br/>- Increased latency across services | PostgreSQL performance bottlenecks | Monitor and optimize database resource utilization:<br/>- CPU usage<br/>- Memory allocation<br/>- Disk I/O performance |
| Message Broker | - Delayed node status updates (online/offline/stale)<br/>- Slow alert transitions<br/>- Dashboard update delays | Message accumulation in Pulsar due to processing bottlenecks | - Review Pulsar configuration<br/>- Adjust microservice resource allocation<br/>- Monitor message processing rates |
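
One way to check whether restrictive container limits are causing the service throttling listed above is to look at the Linux cgroup CPU statistics for the container. The sketch below parses the `nr_periods` and `nr_throttled` counters from a `cpu.stat` file and reports the share of scheduler periods in which the container was throttled. The helper name `throttle_ratio` and the sample values are hypothetical. The file location is also an assumption: it depends on the cgroup version and container runtime (commonly `/sys/fs/cgroup/cpu.stat` inside a container on cgroup v2).

```python
def throttle_ratio(cpu_stat_text: str) -> float:
    """Return the fraction of CFS periods in which the cgroup was throttled."""
    stats = {}
    for line in cpu_stat_text.splitlines():
        parts = line.split()
        # Each line of cpu.stat is "<key> <integer value>".
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        # No CPU limit set, or no periods recorded yet.
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Hypothetical cpu.stat contents, for illustration only.
sample = """usage_usec 8123456
nr_periods 1000
nr_throttled 250
throttled_usec 4200000
"""

print(f"{throttle_ratio(sample):.0%} of periods throttled")  # prints "25% of periods throttled"
```

A ratio that stays high under normal load suggests the container's CPU limit is too low for the workload. What counts as "high" depends on your environment, so treat any threshold as a starting point for review and not as a fixed rule.
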