This alert is related to CockroachDB, a scalable and distributed SQL database. When you receive this alert, it means that there are under-replicated ranges in your database cluster. Under-replicated ranges can impact the availability and fault tolerance of your database, leading to potential data loss or unavailability in case of node failures.
In a CockroachDB cluster, data is split into small chunks called ranges. These ranges are then replicated across multiple nodes to ensure fault tolerance and high availability. The desired replication factor determines the number of replicas for each range.
When a range has fewer replicas than the desired replication factor, it is considered "under-replicated". This can happen when nodes are unavailable or while the cluster is still recovering from a failure.
Access the Admin UI by navigating to http://<any-node-ip>:8080, where <any-node-ip> is the address of any node in your cluster.
In the Admin UI, check the 'Under-replicated Ranges' metric on the main 'Dashboard' or 'Metrics' page.
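The same figure can also be read without the UI. The sketch below parses Prometheus-style metrics text and extracts the under-replicated range count per store. It assumes the metric is named ranges_underreplicated and that the text comes from a node's /_status/vars endpoint on the Admin UI port; the sample values and the function name are made up for illustration, so check the names against your CockroachDB version.

```python
# Hypothetical helper: pull one gauge out of Prometheus text-format output.
# Assumes each sample line has the form: name{labels} value  (no timestamp).

SAMPLE = "\n".join([
    "# HELP ranges_underreplicated Number of under-replicated ranges",
    "# TYPE ranges_underreplicated gauge",
    'ranges_underreplicated{store="1"} 4',
    'ranges_underreplicated{store="2"} 0',
    'ranges{store="1"} 120',
])


def underreplicated_by_store(metrics_text, metric="ranges_underreplicated"):
    """Return {label string: value} for every sample of `metric`."""
    samples = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name, _, labels = name_and_labels.partition("{")
        if name != metric:
            continue  # a different metric, e.g. "ranges"
        samples[labels.rstrip("}")] = float(value)
    return samples


per_store = underreplicated_by_store(SAMPLE)
total = sum(per_store.values())
print(per_store, total)
```

A total that stays above zero after the cluster has had time to recover is the condition this alert reports.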
Look for any error messages or issues that could be causing under-replication. For example, you may see errors related to node failures or network issues.
Make sure that all nodes in the cluster are running and healthy. You can do this by running the command cockroach node status. Consider adding more nodes or increasing capacity if your nodes are overloaded.
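To check node health from a script, the output of cockroach node status can be post-processed. The following is a minimal sketch that assumes the command was run with CSV output (for example with --format=csv) and that the output contains id, is_available and is_live columns; the sample rows and the function name are hypothetical, and column names may differ between versions.

```python
import csv
import io

# Made-up sample of CSV-formatted node status output (columns are assumptions).
SAMPLE_STATUS = (
    "id,address,is_available,is_live\n"
    "1,node1:26257,true,true\n"
    "2,node2:26257,false,false\n"
    "3,node3:26257,true,true\n"
)


def unhealthy_nodes(status_csv):
    """Return the ids of nodes that are not both live and available."""
    reader = csv.DictReader(io.StringIO(status_csv))
    return [
        row["id"]
        for row in reader
        if row.get("is_live", "").lower() != "true"
        or row.get("is_available", "").lower() != "true"
    ]


print(unhealthy_nodes(SAMPLE_STATUS))
```

Any node id returned here is a candidate cause of under-replication and is worth inspecting first.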
Check your cluster's replication factor configuration to ensure it is set to an appropriate value. The default replication factor is 3, which tolerates the failure of one node. You can view and change it using the zone configurations.
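The link between replication factor and tolerated failures follows from majority quorum: a range stays available only while more than half of its replicas are reachable. The short sketch below works through that arithmetic; the function names are illustrative and not part of any CockroachDB API.

```python
def quorum_size(num_replicas):
    """Smallest number of replicas that forms a majority."""
    return num_replicas // 2 + 1


def tolerated_failures(num_replicas):
    """Replicas that can be lost while a majority still remains."""
    return num_replicas - quorum_size(num_replicas)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

With 3 replicas one failure is tolerated, and with 5 replicas two are. An even factor such as 4 tolerates no more failures than 3, which is why odd replication factors are the usual choice. In a zone configuration the replication factor is set through the num_replicas variable.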
If specific nodes are causing under-replication, consider decommissioning them to allow the cluster to automatically rebalance the ranges. Follow the decommissioning guide in the CockroachDB documentation.