multi-threaded version of freeipmi.plugin (#15327)

* multi-threaded version of freeipmi.plugin

* fix type check

* debug info

* debug info

* updated should be smaller, not bigger

* ignore sensors without name

* variable data collection frequencies for sensors and sel; also respect the min data collection frequency

* reorg and code cleanup

* collect states even for unknown units and empty names

* render all sensors

* reset unknown state sensors

* ignore sensors without name

* added component fan

* Update ipmi.conf

* added label type

* remove global state counters and chart

* updated copyright notice

* remove unused struct members

* remove unused variable

* added a log line everytime the plugin decides to exit to show what was wrong

* reworked freeipmi for optimal performance

* disabled debugging and fixed bug

* added debug

* added debug

* added debug

* removed debugging info

* cleanup and final touches

* let fan metrics be categorized by the component they are cooling

* added plugin and module to charts

* more component matches

* code cleanup, sel should now be a lot faster

* make sel min collection time 30 secs

* more component matches; refreshed functions copied from freeipmi codebase

* add keepalive to avoid parser read timeout during ipmi_detect_speed_secs

* ipmi.fan_speed => ipmi.sensor_fan_speed

* update metrics csv and readme

* ok newline

---------

Co-authored-by: Ilya Mashchenko <ilya@netdata.cloud>
Costa Tsaousis 1 year ago
Parent
Commit
eb6f1de7c6

+ 111 - 30
collectors/freeipmi.plugin/README.md

@@ -11,7 +11,10 @@ learn_rel_path: "Integrations/Monitor/Devices"
 
 Netdata has a [freeipmi](https://www.gnu.org/software/freeipmi/) plugin.
 
-> FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. The IPMI specification defines a set of interfaces for platform management and is implemented by a number vendors for system management. The features of IPMI that most users will be interested in are sensor monitoring, system event monitoring, power control, and serial-over-LAN (SOL).
+> FreeIPMI provides in-band and out-of-band IPMI software based on the IPMI v1.5/2.0 specification. The IPMI
+> specification defines a set of interfaces for platform management and is implemented by a number vendors for system
+> management. The features of IPMI that most users will be interested in are sensor monitoring, system event monitoring,
+> power control, and serial-over-LAN (SOL).
 
 ## Installing the FreeIPMI plugin
 
@@ -22,7 +25,8 @@ installed automatically due to the large number of dependencies it requires.
 When using a static build of Netdata, the FreeIPMI plugin will be included and installed automatically, though
 you will still need to have FreeIPMI installed on your system to be able to use the plugin.
 
-When using a local build of Netdata, you need to ensure that the FreeIPMI development packages (typically called `libipmimonitoring-dev`, `libipmimonitoring-devel`, or `freeipmi-devel`) are installed when building Netdata.
+When using a local build of Netdata, you need to ensure that the FreeIPMI development packages (typically
+called `libipmimonitoring-dev`, `libipmimonitoring-devel`, or `freeipmi-devel`) are installed when building Netdata.
 
 ### Special Considerations
 
@@ -30,7 +34,9 @@ Accessing IPMI requires root access, so the FreeIPMI plugin is automatically ins
 
 FreeIPMI does not work correctly on IBM POWER systems, thus Netdata’s FreeIPMI plugin is not usable on such systems.
 
-If you have not previously used IPMI on your system, you will probably need to run the `ipmimonitoring` command as root to initiailze IPMI settings so that the Netdata plugin works correctly. It should return information about available seensors on the system.
+If you have not previously used IPMI on your system, you will probably need to run the `ipmimonitoring` command as root
+to initialize IPMI settings so that the Netdata plugin works correctly. It should return information about available
+sensors on the system.
 
 In some distributions `libipmimonitoring.pc` is located in a non-standard directory, which
 can cause building the plugin to fail when building Netdata from source. In that case you
@@ -38,37 +44,68 @@ should find the file and link it to the standard pkg-config directory. Usually,
 /usr/lib/$(uname -m)-linux-gnu/pkgconfig/libipmimonitoring.pc/libipmimonitoring.pc /usr/lib/pkgconfig/libipmimonitoring.pc`
 resolves this issue.
 
-## Netdata use
+## Metrics
 
-The plugin creates (up to) 8 charts, based on the information collected from IPMI:
+The plugin does a speed test when it starts, to find out the duration needed by the IPMI processor to respond. Depending
+on the speed of your IPMI processor, charts may need several seconds to show up on the dashboard.
 
-1.  number of sensors by state
-2.  number of events in SEL
-3.  Temperatures CELSIUS
-4.  Temperatures FAHRENHEIT
-5.  Voltages
-6.  Currents
-7.  Power
-8.  Fans
+Metrics grouped by *scope*.
 
-It also adds 2 alarms:
+The scope defines the instance that the metric belongs to. An instance is uniquely identified by a set of labels.
 
-1.  Sensors in non-nominal state (i.e. warning and critical)
-2.  SEL is non empty
+### global
 
-![image](https://cloud.githubusercontent.com/assets/2662304/23674138/88926a20-037d-11e7-89c0-20e74ee10cd1.png)
+These metrics refer to the monitored host.
 
-The plugin does a speed test when it starts, to find out the duration needed by the IPMI processor to respond. Depending on the speed of your IPMI processor, charts may need several seconds to show up on the dashboard.
+This scope has no labels.
 
-## `freeipmi.plugin` configuration
+Metrics:
+
+| Metric   | Dimensions |  Unit  |
+|----------|:----------:|:------:|
+| ipmi.sel |   events   | events |
+
+### sensor
+
+These metrics refer to the sensor.
+
+Labels:
+
+| Label     | Description                                                                                                     |
+|-----------|-----------------------------------------------------------------------------------------------------------------|
+| sensor    | Sensor name. Same value as the "Name" column in the `ipmi-sensors` output.                                      |
+| type      | Sensor type. Same value as the "Type" column in the `ipmi-sensors` output.                                      |
+| component | General sensor component. Identified by Netdata based on sensor name and type (e.g. System, Processor, Memory). |
+
+Metrics:
+
+| Metric                      |             Dimensions              |    Unit    |
+|-----------------------------|:-----------------------------------:|:----------:|
+| ipmi.sensor_state           | nominal, critical, warning, unknown |   state    |
+| ipmi.sensor_temperature_c   |             temperature             |  Celsius   |
+| ipmi.sensor_temperature_f   |             temperature             | Fahrenheit |
+| ipmi.sensor_voltage         |               voltage               |   Volts    |
+| ipmi.sensor_ampere          |               ampere                |    Amps    |
+| ipmi.sensor_fan_speed       |              rotations              |    RPM     |
+| ipmi.sensor_power           |                power                |   Watts    |
+| ipmi.sensor_reading_percent |             percentage              |     %      |
+
+## Alarms
+
+There are 2 alarms:
+
+- The sensor is in a warning or critical state.
+- System Event Log (SEL) is non-empty.
+
+## Configuration
 
 The plugin supports a few options. To see them, run:
 
 ```text
-# /usr/libexec/netdata/plugins.d/freeipmi.plugin -h
+# ./freeipmi.plugin --help
 
- netdata freeipmi.plugin 1.8.0-546-g72ce5d6b_rolling
- Copyright (C) 2016-2017 Costa Tsaousis <costa@tsaousis.gr>
+ netdata freeipmi.plugin v1.40.0-137-gf162c25bd
+ Copyright (C) 2023 Netdata Inc.
  Released under GNU General Public License v3 or later.
  All rights reserved.
 
@@ -86,17 +123,49 @@ The plugin supports a few options. To see them, run:
   no-sel                  enable/disable SEL collection
                           default: enabled
 
+  reread-sdr-cache        re-read SDR cache on every iteration
+                          default: disabled
+
+  interpret-oem-data      attempt to parse OEM data
+                          default: disabled
+
+  assume-system-event-record
+                          tread illegal SEL events records as normal
+                          default: disabled
+
+  ignore-non-interpretable-sensors
+                          do not read sensors that cannot be interpreted
+                          default: disabled
+
+  bridge-sensors          bridge sensors not owned by the BMC
+                          default: disabled
+
+  shared-sensors          enable shared sensors, if found
+                          default: disabled
+
+  no-discrete-reading     do not read sensors that their event/reading type code is invalid
+                          default: enabled
+
+  ignore-scanning-disabled
+                          Ignore the scanning bit and read sensors no matter what
+                          default: disabled
+
+  assume-bmc-owner        assume the BMC is the sensor owner no matter what
+                          (usually bridging is required too)
+                          default: disabled
+
   hostname HOST
   username USER
   password PASS           connect to remote IPMI host
                           default: local IPMI processor
 
+  no-auth-code-check
   noauthcodecheck         don't check the authentication codes returned
 
-  driver-type IPMIDRIVER
-                          Specify the driver type to use instead of doing an auto selection. 
-                          The currently available outofband drivers are LAN and  LAN_2_0,
-                          which  perform  IPMI  1.5  and  IPMI  2.0 respectively. 
+ driver-type IPMIDRIVER
+                          Specify the driver type to use instead of doing an auto selection.
+                          The currently available outofband drivers are LAN and LAN_2_0,
+                          which  perform  IPMI  1.5  and  IPMI  2.0 respectively.
                           The currently available inband drivers are KCS, SSIF, OPENIPMI and SUNBMC.
 
   sdr-cache-dir PATH      directory for SDR cache files
@@ -105,9 +174,15 @@ The plugin supports a few options. To see them, run:
   sensor-config-file FILE filename to read sensor configuration
                           default: system default
 
+  sel-config-file FILE    filename to read sel configuration
+                          default: system default
+
   ignore N1,N2,N3,...     sensor IDs to ignore
                           default: none
 
+  ignore-status N1,N2,N3,... sensor IDs to ignore status (nominal/warning/critical)
+                          default: none
+
   -v
   -V
   version                 print version and exit
@@ -131,13 +206,17 @@ You can set these options in `/etc/netdata/netdata.conf` at this section:
 	command options = 
 ```
 
-Append to `command options =` the settings you need. The minimum `update every` is 5 (enforced internally by the plugin). IPMI is slow and CPU hungry. So, once every 5 seconds is pretty acceptable.
+Append to `command options =` the settings you need. The minimum `update every` is 5 (enforced internally by the
+plugin). IPMI is slow and CPU hungry. So, once every 5 seconds is pretty acceptable.
 
 ## Ignoring specific sensors
 
-Specific sensor IDs can be excluded from freeipmi tools by editing `/etc/freeipmi/freeipmi.conf` and setting the IDs to be ignored at `ipmi-sensors-exclude-record-ids`. **However this file is not used by `libipmimonitoring`** (the library used by Netdata's `freeipmi.plugin`).
+Specific sensor IDs can be excluded from freeipmi tools by editing `/etc/freeipmi/freeipmi.conf` and setting the IDs to
+be ignored at `ipmi-sensors-exclude-record-ids`. **However this file is not used by `libipmimonitoring`** (the library
+used by Netdata's `freeipmi.plugin`).
 
-So, `freeipmi.plugin` supports the option `ignore` that accepts a comma separated list of sensor IDs to ignore. To configure it, edit `/etc/netdata/netdata.conf` and set:
+So, `freeipmi.plugin` supports the option `ignore` that accepts a comma separated list of sensor IDs to ignore. To
+configure it, edit `/etc/netdata/netdata.conf` and set:
 
 ```
 [plugin:freeipmi]
@@ -196,7 +275,9 @@ You can also permanently set the above setting by creating the file `/etc/modpro
 options ipmi_si kipmid_max_busy_us=10
 ```
 
-This instructs the kernel IPMI module to pause for a tick between checking IPMI. Querying IPMI will be a lot slower now (e.g. several seconds for IPMI to respond), but `kipmi` will not use any noticeable CPU. You can also use a higher number (this is the number of microseconds to poll IPMI for a response, before waiting for a tick).
+This instructs the kernel IPMI module to pause for a tick between checking IPMI. Querying IPMI will be a lot slower
+now (e.g. several seconds for IPMI to respond), but `kipmi` will not use any noticeable CPU. You can also use a higher
+number (this is the number of microseconds to poll IPMI for a response, before waiting for a tick).
 
 If you need to disable IPMI for Netdata, edit `/etc/netdata/netdata.conf` and set:
 
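Review note on the `component` label added in the README hunk above: the README says the component is "identified by Netdata based on sensor name and type", but the matching code lives in `freeipmi_plugin.c`, whose diff is not shown here. The following is only a minimal sketch of how such name-based matching could work. The function `sensor_component`, the keyword table, and the fallback to "System" are all assumptions made for illustration, not the plugin's actual rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table; the plugin's real rules are not visible in this diff. */
struct component_rule {
    const char *keyword;   /* lowercase substring to look for in the sensor name */
    const char *component; /* value that would be stored in the "component" label */
};

static const struct component_rule rules[] = {
    { "cpu",  "Processor" },
    { "proc", "Processor" },
    { "dimm", "Memory" },
    { "mem",  "Memory" },
};

/* Case-insensitive substring search using only standard C. */
static int contains_nocase(const char *haystack, const char *needle) {
    size_t n = strlen(needle);
    for (; *haystack; haystack++) {
        size_t i = 0;
        while (i < n && haystack[i] &&
               tolower((unsigned char)haystack[i]) == tolower((unsigned char)needle[i]))
            i++;
        if (i == n)
            return 1;
    }
    return n == 0;
}

/* Return the component for a sensor name; the first matching rule wins. */
const char *sensor_component(const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (contains_nocase(name, rules[i].keyword))
            return rules[i].component;
    return "System"; /* assumed fallback when nothing matches */
}
```

Under these assumed rules, a fan named `CPU1 Fan` would be labelled `Processor`, which is in the spirit of the commit message line "let fan metrics be categorized by the component they are cooling".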

File diff suppressed because it is too large
+ 484 - 350
collectors/freeipmi.plugin/freeipmi_plugin.c


+ 9 - 9
collectors/freeipmi.plugin/metrics.csv

@@ -1,10 +1,10 @@
 metric,scope,dimensions,unit,description,chart_type,labels,plugin,module
-ipmi.sel,,events,events,"IPMI Events",area,,freeipmi.plugin,
-ipmi.sensors_states,,"nominal, critical, warning",sensors,"IPMI Sensors State",line,,freeipmi.plugin,
-ipmi.temperatures_c,,a dimension per sensor,Celsius,"System Celsius Temperatures read by IPMI",line,,freeipmi.plugin,
-ipmi.temperatures_f,,a dimension per sensor,Fahrenheit,"System Celsius Temperatures read by IPMI",line,,freeipmi.plugin,
-ipmi.voltages,,a dimension per sensor,Volts,"System Voltages read by IPMI",line,,freeipmi.plugin,
-ipmi.amps,,a dimension per sensor,Amps,"System Current read by IPMI",line,,freeipmi.plugin,
-ipmi.rpm,,a dimension per sensor,RPM,"System Fans read by IPMI",line,,freeipmi.plugin,
-ipmi.watts,,a dimension per sensor,Watts,"System Power read by IPMI",line,,freeipmi.plugin,
-ipmi.percent,,a dimension per sensor,%,"System Metrics read by IPMI",line,,freeipmi.plugin,
+ipmi.sel,,events,events,"IPMI Events",area,,freeipmi.plugin,sel
+ipmi.sensor_state,sensor,"nominal, critical, warning, unknown",state,"IPMI Sensors State",line,,freeipmi.plugin,
+ipmi.sensor_temperature_c,sensor,temperature,Celsius,"IPMI Sensor Temperature Celsius",line,,freeipmi.plugin,sensors
+ipmi.sensor_temperature_f,sensor,temperature,Fahrenheit,"IPMI Sensor Temperature Fahrenheit",line,,freeipmi.plugin,sensors
+ipmi.sensor_voltage,sensor,voltage,Volts,"IPMI Sensor Voltage",line,,freeipmi.plugin,sensors
+ipmi.sensor_ampere,sensor,ampere,Amps,"IPMI Sensor Current",line,,freeipmi.plugin,sensors
+ipmi.sensor_fan_speed,sensor,rotations,RPM,"IPMI Sensor Fans Speed",line,,freeipmi.plugin,sensors
+ipmi.sensor_power,sensor,power,Watts,"IPMI Sensor Power",line,,freeipmi.plugin,sensors
+ipmi.sensor_reading_percent,sensor,percentage,%,"IPMI Sensor Reading Percentage",line,,freeipmi.plugin,sensors
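
Review note: the CSV above exposes each temperature sensor on two charts, `ipmi.sensor_temperature_c` and `ipmi.sensor_temperature_f`. Whether the plugin converts one reading into the other or obtains both from the library is not visible in this diff. For reference, the standard conversion is sketched below; the function name `celsius_to_fahrenheit` is hypothetical.

```c
#include <assert.h>

/* Standard Celsius to Fahrenheit conversion: F = C * 9/5 + 32.
 * Hypothetical helper, not taken from freeipmi_plugin.c. */
double celsius_to_fahrenheit(double celsius) {
    /* A reading below absolute zero would indicate a bad input. */
    assert(celsius >= -273.15);
    return celsius * 9.0 / 5.0 + 32.0;
}
```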

+ 5 - 5
health/health.d/ipmi.conf

@@ -1,15 +1,15 @@
-    alarm: ipmi_sensors_states
-       on: ipmi.sensors_states
+ template: ipmi_sensor_state
+       on: ipmi.sensor_state
     class: Errors
      type: System
 component: IPMI
      calc: $warning + $critical
-    units: sensors
+    units: state
     every: 10s
-     warn: $this > 0
+     warn: $warning > 0
      crit: $critical > 0
     delay: up 5m down 15m multiplier 1.5 max 1h
-     info: number of IPMI sensors in non-nominal state
+     info: IPMI sensor ${label:sensor} (${label:component}) state
        to: sysadmin
 
     alarm: ipmi_events
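
Review note: with the switch from `alarm` to `template`, the check now runs once per `ipmi.sensor_state` chart, and the `warn` line tests `$warning` directly instead of `$this`. The sketch below spells out the resulting decision for a single sensor, assuming that a true `crit` expression takes precedence over `warn`. The enum and function names are hypothetical, and the `delay` hysteresis line is deliberately ignored.

```c
#include <assert.h>

/* Hypothetical status values for illustration only. */
enum alarm_status { ALARM_CLEAR, ALARM_WARNING, ALARM_CRITICAL };

/* warning and critical are the values of the two dimensions of one
 * ipmi.sensor_state chart, as referenced by $warning and $critical. */
enum alarm_status ipmi_sensor_state_status(int warning, int critical) {
    assert(warning >= 0 && critical >= 0);
    if (critical > 0)   /* crit: $critical > 0 */
        return ALARM_CRITICAL;
    if (warning > 0)    /* warn: $warning > 0 */
        return ALARM_WARNING;
    return ALARM_CLEAR;
}
```

One consequence worth noting: under the old `warn: $this > 0`, a sensor in critical state also satisfied the warning condition, since `$this` is `$warning + $critical`. With the new lines the two conditions are independent.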

+ 26 - 0
libnetdata/libnetdata.h

@@ -930,6 +930,32 @@ typedef enum {
     TIMING_STEP_END2_PROPAGATE,
     TIMING_STEP_END2_STORE,
 
+    TIMING_STEP_FREEIPMI_CTX_CREATE,
+    TIMING_STEP_FREEIPMI_DSR_CACHE_DIR,
+    TIMING_STEP_FREEIPMI_SENSOR_CONFIG_FILE,
+    TIMING_STEP_FREEIPMI_SENSOR_READINGS_BY_X,
+    TIMING_STEP_FREEIPMI_READ_record_id,
+    TIMING_STEP_FREEIPMI_READ_sensor_number,
+    TIMING_STEP_FREEIPMI_READ_sensor_type,
+    TIMING_STEP_FREEIPMI_READ_sensor_name,
+    TIMING_STEP_FREEIPMI_READ_sensor_state,
+    TIMING_STEP_FREEIPMI_READ_sensor_units,
+    TIMING_STEP_FREEIPMI_READ_sensor_bitmask_type,
+    TIMING_STEP_FREEIPMI_READ_sensor_bitmask,
+    TIMING_STEP_FREEIPMI_READ_sensor_bitmask_strings,
+    TIMING_STEP_FREEIPMI_READ_sensor_reading_type,
+    TIMING_STEP_FREEIPMI_READ_sensor_reading,
+    TIMING_STEP_FREEIPMI_READ_event_reading_type_code,
+    TIMING_STEP_FREEIPMI_READ_record_type,
+    TIMING_STEP_FREEIPMI_READ_record_type_class,
+    TIMING_STEP_FREEIPMI_READ_sel_state,
+    TIMING_STEP_FREEIPMI_READ_event_direction,
+    TIMING_STEP_FREEIPMI_READ_event_type_code,
+    TIMING_STEP_FREEIPMI_READ_event_offset_type,
+    TIMING_STEP_FREEIPMI_READ_event_offset,
+    TIMING_STEP_FREEIPMI_READ_event_offset_string,
+    TIMING_STEP_FREEIPMI_READ_manufacturer_id,
+
     // terminator
     TIMING_STEP_MAX,
 } TIMING_STEP;
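
Review note: the new `TIMING_STEP_FREEIPMI_*` values are inserted before `TIMING_STEP_MAX`, so the terminator keeps working as the element count for any per-step array. The sketch below illustrates that pattern with a two-value enum. The names `step_t`, `timing_start`, `timing_step` and `timing_total_ns` are hypothetical and are not the libnetdata timing API, which this diff does not show.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical two-step subset; the terminator doubles as the array size. */
typedef enum {
    STEP_CTX_CREATE,
    STEP_READ_SENSOR_NAME,
    STEP_MAX, /* terminator */
} step_t;

static uint64_t step_total_ns[STEP_MAX]; /* accumulated time per step */
static uint64_t last_ns;                 /* timestamp of the previous mark */

static uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Mark the beginning of a measured sequence. */
void timing_start(void) {
    last_ns = now_ns();
}

/* Charge the time elapsed since the previous mark to the given step. */
void timing_step(step_t step) {
    assert((int)step >= 0 && step < STEP_MAX);
    uint64_t now = now_ns();
    step_total_ns[step] += now - last_ns;
    last_ns = now;
}

/* Total nanoseconds charged to a step so far. */
uint64_t timing_total_ns(step_t step) {
    assert((int)step >= 0 && step < STEP_MAX);
    return step_total_ns[step];
}
```

A caller would invoke `timing_start()` once and then `timing_step(...)` after each library call it wants to attribute, which is presumably why the commit adds one enum value per FreeIPMI read function.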

Some files were not shown because too many files have changed in this diff