This alert triggers when there is a leadership transition in the Consul service mesh. If you receive this alert, it means that server ${label:node_name} in datacenter ${label:datacenter} has become the new leader.
Consul is a service mesh solution that provides service discovery, configuration, and segmentation functionality. Its server agents use the Raft consensus algorithm to keep a consistent replicated state, with one server acting as the leader. A leadership transition occurs when the current leader loses its leadership status and another server is elected to take over.
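Leadership transitions in Consul can have several causes, such as network problems that disrupt communication between servers, high resource usage on the leader that delays its heartbeats to followers, node failures or crashes, and transitions forced by operators for maintenance.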
While no leader is elected, the cluster cannot commit writes, so frequent leadership transitions may lead to service disruptions, increased latency, and reduced availability. Identify and resolve the root cause promptly.
Check the Consul logs for indications of network issues or node failures:
journalctl -u consul.service
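A minimal sketch for narrowing that output down to leadership events follows. The helper name consul_leadership_events is hypothetical, and its patterns are assumptions based on messages that Consul and its Raft library commonly log (such as "entering leader state" or "heartbeat timeout reached"); wording can differ between versions, so adjust the patterns to match your own logs.

```shell
# Hypothetical helper: keep only Raft leadership events from Consul log text
# read on stdin. The patterns are assumptions based on typical Consul/Raft
# log messages and may need adjusting for your Consul version.
consul_leadership_events() {
  grep -Ei 'entering (leader|follower|candidate) state|heartbeat timeout reached|cluster leadership (acquired|lost)'
}

# Usage (commented out so the sketch is safe to source):
# journalctl -u consul.service --since "24 hours ago" --no-pager | consul_leadership_events
# consul_leadership_events < /var/log/consul/consul.log
```

Running this on every server over the same time window, and noting how often "entering leader state" appears, gives a rough picture of how frequently leadership is changing hands.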
Alternatively, if Consul is configured to write to a log file (the log_file option), check that file; a common location is /var/log/consul/consul.log.
Inspect the health and status of the Consul cluster using the consul members command:
consul members
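This command lists all cluster members with their address, status (for example alive, failed, or left), and type (server or client). It does not indicate which server is the leader; to see the current Raft leader, run consul operator raft list-peers, whose State column shows leader or follower for each server.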
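The consul operator raft list-peers command prints one row per Raft server, including a State column. The sketch below is a hypothetical helper that reads that output on stdin and prints the name of the leader, which is handy in scripts. It assumes the default table layout: a header row containing a State column, with the node name in the first field of each row.

```shell
# Hypothetical helper: print the node name of the current Raft leader from
# `consul operator raft list-peers` output read on stdin.
# Assumes the first line is a header with a "State" column and that the
# node name is the first field. Exits non-zero if no leader row is found.
raft_leader_from_peers() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "State") col = i; next }
    col && $col == "leader" { print $1; found = 1 }
    END { exit found ? 0 : 1 }
  '
}

# Usage (commented out so the sketch is safe to source):
# consul operator raft list-peers | raft_leader_from_peers
```

The HTTP API reports the same thing as an address rather than a node name: curl -s http://127.0.0.1:8500/v1/status/leader, assuming the local agent's HTTP API listens on its default address and port.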
Check for high resource usage on the affected server nodes by monitoring CPU, memory, and disk I/O (top covers CPU and memory; use a tool such as iostat for disk I/O):
top
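For a focused view of the Consul agent rather than the whole system, the hypothetical helper below takes a one-shot snapshot of every process whose name exactly matches its argument. It is a sketch that assumes a Linux host with procps (pgrep and ps).

```shell
# Hypothetical helper: one-shot resource snapshot for processes whose name
# exactly matches $1. Prints PID, CPU %, memory %, resident memory (KiB),
# elapsed run time, and command name. Requires procps (pgrep, ps).
proc_snapshot() {
  # pgrep exits 1 when nothing matches; -d, joins PIDs with commas for ps -p.
  pids=$(pgrep -d, -x "$1") || { echo "no process named $1" >&2; return 1; }
  ps -o pid,pcpu,pmem,rss,etime,comm -p "$pids"
}

# Usage (commented out so the sketch is safe to source):
# proc_snapshot consul
```

Running it on the leader and on the followers makes it easier to see whether the leader is noticeably more loaded than its peers. For a non-interactive, whole-system view, top -b -n 1 prints a single batch-mode snapshot.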
Examine network connectivity between nodes using tools like ping, traceroute, or mtr.
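Tools like ping and mtr test the network path, but they do not confirm that Consul's own TCP ports are open between servers. The sketch below checks TCP reachability of Consul's default server RPC port (8300, which also carries Raft traffic) and Serf LAN port (8301) on each host given as an argument. The function name is hypothetical, the ports are Consul defaults that your configuration may override, it relies on bash's /dev/tcp redirection and the coreutils timeout command, and it does not test Serf's UDP traffic.

```shell
# Hypothetical helper: check TCP reachability of Consul's default server
# ports on each host passed as an argument. 8300 = server RPC (Raft),
# 8301 = Serf LAN (TCP side only; UDP is not tested).
# Uses bash's /dev/tcp redirection with a 2-second timeout per port.
check_consul_ports() {
  status=0
  for host in "$@"; do
    for port in 8300 8301; do
      if timeout 2 bash -c '>/dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null; then
        echo "$host:$port reachable"
      else
        echo "$host:$port UNREACHABLE"
        status=1
      fi
    done
  done
  return "$status"
}

# Usage (commented out so the sketch is safe to source; addresses are placeholders):
# check_consul_ports 10.0.0.1 10.0.0.2 10.0.0.3
```

Run it from each server toward the others, since a firewall rule may block traffic in only one direction.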
If the transitions are forced by operators, review the changes made and their impact on the cluster.
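Consider relaxing Consul's Raft timing so that followers tolerate a slow leader for longer before starting an election, especially if high resource usage is causing frequent leadership transitions. Consul scales its Raft heartbeat and election timeouts with the performance.raft_multiplier setting; higher values reduce sensitivity to a slow leader but also delay detection of a leader that has genuinely failed.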
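As a sketch of that tuning, the hypothetical helper below writes an HCL fragment that sets performance.raft_multiplier. Consul accepts values from 1 to 10, and when the setting is unset it uses timing equivalent to a value of 5. The file name raft-tuning.hcl, the configuration directory you pass (such as /etc/consul.d, a common packaging default rather than a given), and any value you choose are illustrative assumptions, not recommendations.

```shell
# Hypothetical helper: write a Consul config fragment setting
# performance.raft_multiplier in directory $1 to value $2.
# Consul scales its Raft timing with this value; allowed range is 1-10.
write_raft_multiplier() {
  dir=$1 value=$2
  case $value in
    ''|*[!0-9]*) echo "raft_multiplier must be an integer" >&2; return 1 ;;
  esac
  if [ "$value" -lt 1 ] || [ "$value" -gt 10 ]; then
    echo "raft_multiplier must be between 1 and 10" >&2; return 1
  fi
  cat > "$dir/raft-tuning.hcl" <<EOF
performance {
  raft_multiplier = $value
}
EOF
}

# Usage (commented out so the sketch is safe to source):
# write_raft_multiplier /etc/consul.d 7 && consul validate /etc/consul.d
```

After validating, restart the server agents one at a time so the cluster keeps quorum throughout the change.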
Review the Consul documentation on consensus (Raft) and leadership, and on operations and maintenance, for best practices and further ways to reduce leadership transitions.