redis_master_link_down.md 2.0 KB

Understand the alert

The redis_master_link_down alert is triggered when there is a disconnection between a Redis master and its slave for more than 10 seconds. This alert indicates a potential problem with the replication process and can impact the data consistency across multiple instances.

Troubleshoot the alert

  1. Check the Redis logs

Examine the Redis logs for any errors or issues regarding the disconnection between the master and slave instances. By default, Redis log files are located at /var/log/redis/redis.log. Look for messages related to replication, network errors or timeouts.

   grep -i "replication" /var/log/redis/redis.log
   grep -i "timeout" /var/log/redis/redis.log
  1. Check the Redis replication status

Connect to the Redis master using the redis-cli tool, and execute the INFO command to get the detailed information about the master instance:

   redis-cli
   INFO REPLICATION

Also, check the replication status on the slave instance. If you have access to the IP address and port of the slave, connect to it and run the same INFO command.

  1. Verify the network connection between the master and slave instances

Test the network connectivity using ping and telnet or nc commands, ensuring that the connection between the master and slave instances is stable and there are no issues with firewalls or network policies.

   ping <slave_ip_address>
   telnet <slave_ip_address> <redis_port>
  1. Restart the Redis instances (if needed)

If Redis instances are experiencing issues or are unable to reconnect, consider restarting them. Be cautious as restarting instances might result in data loss or consistency issues.

   sudo systemctl restart redis
  1. Monitor the situation

After addressing the potential issues, keep an eye on the Redis instances to ensure that the problem doesn't reoccur.

Useful resources

  1. Redis Replication Documentation