Plugin: proc.plugin Module: /sys/devices/system/edac/mc
The Error Detection and Correction (EDAC) subsystem detects and reports errors in the system's memory, primarily ECC (Error-Correcting Code) memory errors.
The collector provides data for:
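Per memory controller (MC): correctable and uncorrectable errors. These are of 2 kinds: errors attributed to a DIMM, and errors that cannot be attributed to any DIMM (the `noinfo` dimensions).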
Per memory DIMM: correctable and uncorrectable errors. Depending on the memory controller, these are reported per DIMM or per rank.
This collector is supported on all platforms.
This collector supports collecting metrics from multiple instances of this integration, including remote instances.
This integration doesn't support auto-detection.
The default configuration for this integration does not impose any limits on data collection.
The default configuration for this integration is not expected to impose a significant performance impact on the system.
Metrics are grouped by scope.
The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
These metrics refer to the memory controller.
Labels:
Label | Description |
---|---|
controller | mcX directory name of this memory controller. |
mc_name | Memory controller type. |
size_mb | The amount of memory in megabytes that this memory controller manages. |
max_location | Last available memory slot in this memory controller. |
Metrics:
Metric | Dimensions | Unit |
---|---|---|
mem.edac_mc | correctable, uncorrectable, correctable_noinfo, uncorrectable_noinfo | errors/s |
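As an illustration of where these values can come from, the sketch below reads the four per-controller counters and the label files from the kernel's EDAC sysfs tree. The file names (`ce_count`, `ue_count`, `ce_noinfo_count`, `ue_noinfo_count`, `mc_name`, `size_mb`, `max_location`) are the standard EDAC sysfs attributes; the helper name `read_mc_counters` and its `root` parameter are hypothetical, and this is not the collector's actual implementation. It returns raw cumulative counts, not the errors/s rate listed in the table.

```python
from pathlib import Path

# Dimension name -> EDAC sysfs counter file (standard kernel attribute names).
MC_COUNTERS = {
    "correctable": "ce_count",
    "uncorrectable": "ue_count",
    "correctable_noinfo": "ce_noinfo_count",
    "uncorrectable_noinfo": "ue_noinfo_count",
}
# sysfs files whose content is used as a label value.
MC_LABELS = ("mc_name", "size_mb", "max_location")


def _read(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_mc_counters(root="/sys/devices/system/edac/mc"):
    """Hypothetical helper: map each mcX directory to its labels and counters."""
    result = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        if not mc_dir.is_dir():
            continue
        labels = {"controller": mc_dir.name}
        for name in MC_LABELS:
            value = _read(mc_dir / name)
            if value is not None:
                labels[name] = value
        counters = {}
        for dimension, filename in MC_COUNTERS.items():
            value = _read(mc_dir / filename)
            # Skip counters that are missing or not plain integers.
            if value is not None and value.isdigit():
                counters[dimension] = int(value)
        result[mc_dir.name] = {"labels": labels, "counters": counters}
    return result
```

Calling `read_mc_counters()` on a machine with an EDAC driver loaded should return one entry per `mcX` directory. The `root` parameter exists only so the function can be pointed at a test directory.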
These metrics refer to the memory module (or rank, depending on the memory controller).
Labels:
Label | Description |
---|---|
controller | mcX directory name of this memory controller. |
dimm | dimmX or rankX directory name of this memory module. |
dimm_dev_type | Type of DRAM device used in this memory module. For example, x1, x2, x4, x8. |
dimm_edac_mode | Type of error detection and correction in use. For example, S4ECD4ED would mean a Chipkill with x4 DRAM. |
dimm_label | Label assigned to this memory module. |
dimm_location | Location of the memory module. |
dimm_mem_type | Type of the memory module. |
size | The amount of memory in megabytes that this memory module manages. |
Metrics:
Metric | Dimensions | Unit |
---|---|---|
mem.edac_mc_dimm | correctable, uncorrectable | errors/s |
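The per-module values can be sketched the same way. The example below walks the `dimmX` and `rankX` directories under each `mcX` directory and reads the standard EDAC sysfs attributes `dimm_ce_count` and `dimm_ue_count`, plus the files that correspond to the labels above. The helper name `read_dimm_counters` and its `root` parameter are hypothetical; this is a minimal illustration, not the collector's code, and it returns raw cumulative counts rather than errors/s.

```python
from pathlib import Path

# Dimension name -> per-module EDAC sysfs counter file.
DIMM_COUNTERS = {
    "correctable": "dimm_ce_count",
    "uncorrectable": "dimm_ue_count",
}
# sysfs files whose content is used as a label value.
DIMM_LABELS = (
    "dimm_dev_type",
    "dimm_edac_mode",
    "dimm_label",
    "dimm_location",
    "dimm_mem_type",
    "size",
)


def _read(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def read_dimm_counters(root="/sys/devices/system/edac/mc"):
    """Hypothetical helper: one dict per dimmX/rankX directory found."""
    rows = []
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        # Some memory controllers expose ranks instead of DIMMs.
        for pattern in ("dimm[0-9]*", "rank[0-9]*"):
            for dimm_dir in sorted(mc_dir.glob(pattern)):
                if not dimm_dir.is_dir():
                    continue
                labels = {"controller": mc_dir.name, "dimm": dimm_dir.name}
                for name in DIMM_LABELS:
                    value = _read(dimm_dir / name)
                    if value is not None:
                        labels[name] = value
                counters = {}
                for dimension, filename in DIMM_COUNTERS.items():
                    value = _read(dimm_dir / filename)
                    if value is not None and value.isdigit():
                        counters[dimension] = int(value)
                rows.append({"labels": labels, "counters": counters})
    return rows
```

Each returned row carries the `controller` and `dimm` labels, so rows from different controllers stay distinguishable even when their `dimmX` names repeat.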
The following alerts are available:
Alert name | On metric | Description |
---|---|---|
ecc_memory_mc_noinfo_correctable | mem.edac_mc | memory controller ${label:controller} ECC correctable errors (unknown DIMM slot) in the last 10 minutes |
ecc_memory_mc_noinfo_uncorrectable | mem.edac_mc | memory controller ${label:controller} ECC uncorrectable errors (unknown DIMM slot) in the last 10 minutes |
ecc_memory_dimm_correctable | mem.edac_mc_dimm | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC correctable errors in the last 10 minutes |
ecc_memory_dimm_uncorrectable | mem.edac_mc_dimm | DIMM ${label:dimm} controller ${label:controller} (location ${label:dimm_location}) ECC uncorrectable errors in the last 10 minutes |
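The "in the last 10 minutes" condition in these descriptions can be pictured as a windowed difference over a cumulative error counter. The sketch below assumes that any error inside the window should raise the alert; the actual thresholds and lookup expressions of these alerts are not stated here, and the function names `errors_in_window` and `should_alert` are hypothetical.

```python
from bisect import bisect_right

WINDOW_SECONDS = 600  # "last 10 minutes" from the alert descriptions


def errors_in_window(samples, now, window=WINDOW_SECONDS):
    """Count errors within the window ending at `now`.

    `samples` is a list of (timestamp, cumulative_count) tuples,
    sorted by ascending timestamp.
    """
    if not samples:
        return 0
    timestamps = [t for t, _ in samples]
    # Index of the last sample taken at or before `now`.
    end = bisect_right(timestamps, now) - 1
    if end < 0:
        return 0
    # Baseline: last sample at or before the window start,
    # falling back to the oldest sample available.
    start = max(bisect_right(timestamps, now - window) - 1, 0)
    delta = samples[end][1] - samples[start][1]
    # A counter reset would make the difference negative; report 0 then.
    return max(delta, 0)


def should_alert(samples, now, window=WINDOW_SECONDS):
    """Assumed rule: alert when at least one error falls in the window."""
    return errors_in_window(samples, now, window) > 0
```

For example, with samples `[(0, 5), (300, 5), (900, 7), (1000, 7)]` and `now=1000`, the baseline is the sample at 300 and the result is 2 errors, so `should_alert` returns `True`.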
No action required.
There is no configuration file.
There are no configuration options.
There are no configuration examples.