This alert is triggered when an NVMe device
experiences critical warnings
. The alert is focusing on your NVMe
(Non-Volatile Memory Express) SSD storage device which is designed for high-performance and low-latency storage.
A critical warning state indicates that the NVMe device has experienced an event, error, or condition which could negatively impact performance, data integrity or device longevity. This could result from a variety of reasons such as high temperature, hardware failures, internal errors, or device reaching end of life.
This alert provides information in the info
field about the affected device. It should look like: "NVMe device ${label:device} has critical warnings", where ${label:device}
will be replaced with the actual device name.
SMART
(Self-Monitoring, Analysis, and Reporting Technology) provides detailed information about the current health and performance of your NVMe device. To check SMART information for the affected NVMe device, use smartctl
command:
sudo smartctl -a /dev/nvme0n1
Replace /dev/nvme0n1
with the actual device name identified in step 1.
Review the output of the smartctl
command to identify the critical warnings or any other concerning attributes. You might see high temperature, high uncorrectable error counts, or high percent of used endurance. These values might help you diagnose the issue with your NVMe device.
Make sure to replace, stop the usage of, or seek support for the problematic NVMe device(s) depending on the analysis.