This alarm calculates the system load average (CPU and I/O demand) over the period of five minutes. If you receive this alarm, it means that your system is "overloaded."
The alert is raised to warning when the metric reaches 4 times the expected value, and it is cleared once the value drops back to 3.5 times the expected value.
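The gap between the two thresholds keeps the alert from flapping when the load hovers around a single limit. A minimal sketch of that behavior follows. The function name next_alert_state is hypothetical, the ratio argument stands for the metric divided by the expected value, and the exact handling of the boundary values is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: compute the next alert state from the current state
# and the ratio (metric / expected value). The thresholds 4 and 3.5 come from
# the description above; treating exactly 4 as "not yet raised" and exactly
# 3.5 as "cleared" is an assumption.
next_alert_state() {
    # $1 = current state (clear or warning), $2 = ratio
    awk -v state="$1" -v ratio="$2" 'BEGIN {
        if (state == "clear" && ratio > 4) state = "warning"
        else if (state == "warning" && ratio <= 3.5) state = "clear"
        print state
    }'
}
```

For example, next_alert_state warning 3.8 keeps the warning active, because the value has not yet dropped to the clearing threshold.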
For further information on how our alerts are calculated, please have a look at our Documentation.
On a Linux machine, the system load average measures the number of threads that are currently working plus those waiting to work (on CPU, disk, or uninterruptible locks). Simply stated, the system load average measures the number of threads that aren't idle.
Let's look at a single core CPU system and think of its core count as car lanes on a bridge. A car represents a process in this example:
That is how you can picture CPU load, but keep in mind that load average also counts I/O demand, for which an analogous picture applies.
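To relate the bridge picture to a real machine, it can help to divide the load average by the number of cores, so that a result of about 1.00 means every lane is full. The sketch below assumes a Linux system; load_per_core is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helper: divide a load average by a core count.
# A result near 1.00 means roughly one non-idle thread per core.
load_per_core() {
    # $1 = load average, $2 = number of cores
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Usage on a live Linux system (assumes /proc/loadavg and nproc are available;
# the second field of /proc/loadavg is the five-minute average):
# load_per_core "$(cut -d ' ' -f 2 /proc/loadavg)" "$(nproc)"
```

Because load average also includes threads waiting on I/O, a high per-core value does not by itself prove that the CPU is the bottleneck.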
First, check whether you are facing a CPU load problem or an I/O load problem.
Run vmstat (or vmstat 1, to set a delay between updates in seconds).
The procs column shows:
r: The number of runnable processes (running or waiting for run time).
b: The number of processes blocked waiting for I/O to complete.
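The two values can be pulled out of the vmstat output with a short filter such as the sketch below. The helper name vmstat_procs is hypothetical. It reports the last sample only, since the first line vmstat prints is an average since boot.

```shell
#!/bin/sh
# Hypothetical helper: read vmstat output on stdin and print the r and b
# values of the last sample. Header lines are skipped because their first
# two fields are not numeric.
vmstat_procs() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ { r = $1; b = $2 }
         END { printf "r=%d b=%d\n", r, b }'
}

# Usage (assumes vmstat from procps is installed): take 5 samples, 1 second apart.
# vmstat 1 5 | vmstat_procs
```

As a rough rule of thumb, an r value well above the core count points towards CPU demand, while a b value that stays above zero points towards I/O.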
List the processes with the ps command and filter them with grep.
The grep command fetches the processes whose state code starts with either R (running or runnable, on the run queue) or D (uninterruptible sleep, usually I/O).
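One possible form of such a pipeline is sketched below. It assumes the procps version of ps, where the s output field prints a single-character state code. The wrapper name filter_running_or_blocked is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: keep only the lines whose first column (the state
# code) starts with R or D. The ps header line starts with "S", so it is
# dropped as well.
filter_running_or_blocked() {
    grep '^[RD]'
}

# Usage (assumes procps ps): print state, user, and command line for every
# process, then keep the running and uninterruptible ones.
# ps -eo s,user,cmd | filter_running_or_blocked
```

Processes in state R contribute to CPU demand, while processes in state D are usually waiting on I/O.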
To see the processes that are the main CPU consumers, use the task manager program top like this:
top -o +%CPU -i
Use iotop:
iotop is a useful tool, similar to top, used to monitor disk I/O usage. If you don't have it, install it first.
sudo iotop
Minimize the load by closing any unnecessary main consumer processes. We strongly advise you to double-check if the process you want to close is necessary.