Apps plugin improvements2 (#18673)

* improvements

* simple patterns can now be configured to run without separators; added netdata and spawn-plugins as process managers; updated documentation

* cosmetic changes

* fix issue in rrdlabels sanitizer

* fix text_sanitizers to skip leading spaces

* use quoted_strings_splitter_whitespace() instead of quoted_strings_splitter_pluginsd()

* remove known extensions from executable files

* detect sh -c exec ... and extract the right process name

* workaround for infinite loop in cgroup-network with sanitization enabled
Costa Tsaousis 5 months ago
Parent
Commit
82d91954ae
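
The "remove known extensions from executable files" item above can be illustrated with a minimal, self-contained sketch. The function name `strip_known_extension` and the extension table are hypothetical and chosen for illustration only; the commit's own table and helper live in `apps_pid.c`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table for illustration; not the plugin's authoritative list. */
static const char *known_extensions[] = { ".sh", ".py", ".pl", ".js", NULL };

/* Truncate `name` in place when it ends with one of the known extensions.
 * A name that consists only of the extension (e.g. ".sh") is left untouched,
 * because the name must be strictly longer than the extension. */
static void strip_known_extension(char *name) {
    size_t name_len = strlen(name);

    for (size_t i = 0; known_extensions[i] != NULL; i++) {
        size_t ext_len = strlen(known_extensions[i]);

        if (name_len > ext_len &&
            strcmp(name + name_len - ext_len, known_extensions[i]) == 0) {
            name[name_len - ext_len] = '\0';
            return; /* strip at most one extension */
        }
    }
}
```

Under these assumptions, `backup.sh` becomes `backup`, while `nginx` and `.sh` are returned unchanged.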

+ 10 - 0
CMakeLists.txt

@@ -804,6 +804,15 @@ set(LIBNETDATA_FILES
         src/libnetdata/parsers/entries.h
         src/libnetdata/sanitizers/chart_id_and_name.c
         src/libnetdata/sanitizers/chart_id_and_name.h
+        src/libnetdata/sanitizers/utf8-sanitizer.c
+        src/libnetdata/sanitizers/utf8-sanitizer.h
+        src/libnetdata/sanitizers/sanitizers.h
+        src/libnetdata/sanitizers/sanitizers-labels.c
+        src/libnetdata/sanitizers/sanitizers-labels.h
+        src/libnetdata/sanitizers/sanitizers-functions.c
+        src/libnetdata/sanitizers/sanitizers-functions.h
+        src/libnetdata/sanitizers/sanitizers-pluginsd.c
+        src/libnetdata/sanitizers/sanitizers-pluginsd.h
 )
 
 if(ENABLE_PLUGIN_EBPF)
@@ -1897,6 +1906,7 @@ if(ENABLE_PLUGIN_APPS)
             src/collectors/apps.plugin/apps_os_windows.c
             src/collectors/apps.plugin/apps_incremental_collection.c
             src/collectors/apps.plugin/apps_os_windows_nt.c
+            src/collectors/apps.plugin/apps_pid_match.c
     )
 
     add_executable(apps.plugin ${APPS_PLUGIN_FILES})

+ 91 - 53
src/collectors/apps.plugin/README.md

@@ -4,51 +4,54 @@
 
 ## Process Aggregation and Grouping
 
-`apps.plugin` aggregates processes in three distinct ways to provide a more insightful
-breakdown of resource utilization:
+`apps.plugin` aggregates processes in three distinct ways to provide a more
+insightful breakdown of resource utilization:
 
 - **Tree** or **Category**: Grouped by their position in the process tree.
-   This is customizable and allows aggregation by process managers and individual
-   processes of interest. Allows also renaming the processes for presentation purposes.
+   This is customizable and allows aggregation by process managers and 
+   individual processes of interest. Allows also renaming the processes for
+   presentation purposes.
 
 - **User**: Grouped by the effective user (UID) under which the processes run.
 
-- **Group**: Grouped by the effective group (GID) under which the processes run.
+- **Group**: Grouped by the effective group (GID) under which the processes
+   run.
 
 ## Short-Lived Process Handling
 
-`apps.plugin` accounts for resource utilization of both running and exited processes,
-capturing the impact of processes that spawn short-lived subprocesses, such as shell
-scripts that fork hundreds or thousands of times per second. So, although processes
-may spawn short lived sub-processes, `apps.plugin` will aggregate their resources
-utilization providing a holistic view of how resources are shared among the processes.
+`apps.plugin` accounts for resource utilization of both running and exited
+processes, capturing the impact of processes that spawn short-lived
+subprocesses, such as shell scripts that fork hundreds or thousands of times
+per second. So, although processes may spawn short-lived sub-processes,
+`apps.plugin` will aggregate their resources utilization providing a holistic
+view of how resources are shared among the processes.
 
 ## Charts sections
 
-To provide more valuable insights, apps.plugin aggregates individual processes in several ways.
-Each type of aggregation is presented as a different section on the dashboard.
+To provide more valuable insights, apps.plugin aggregates individual processes
+in several ways. Each type of aggregation is presented as a different section
+on the dashboard.
 
 ### Custom Process Groups (Apps)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped based
-on the groups provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped based on their position in the process tree and the groups
+provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our
+[`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
 
-For this section, `apps.plugin` builds a process tree (much like `ps fax` does in Linux), and groups
-processes together (evaluating both child and parent processes) so that the result is always a list with
-a predefined set of members (of course, only process groups found running are reported).
-
-> If you find that `apps.plugin` categorizes standard applications as `other`, we would be
-> glad to accept pull requests improving the defaults shipped with Netdata in `apps_groups.conf`.
+For this section, `apps.plugin` builds a process tree (much like `ps fax` does
+in Linux), and groups processes together (evaluating both child and parent
+processes).
 
 ### By User (Users)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped by the
-effective user under which each process runs.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped by the effective user under which each process runs.
 
 ### By User Group (Groups)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped by the
-effective user group under which each process runs.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped by the effective user group under which each process runs.
 
 ## Charts
 
@@ -97,14 +100,14 @@ The above are reported:
 
 ## Performance
 
-`apps.plugin` is a complex piece of software and has a lot of work to do
-We are proud that `apps.plugin` is a lot faster compared to any other similar tool,
-while collecting a lot more information for the processes, however the fact is that
-this plugin may require more CPU resources than the `netdata` daemon itself.
+We are proud that `apps.plugin` is a lot faster than any other similar
+tool, while collecting a lot more information about the processes. However,
+this plugin needs to traverse the entire process tree on every iteration,
+so its resource usage may be noticeable.
 
-Under Linux, for each process running, `apps.plugin` reads several `/proc` files
-per process. Doing this work per-second, especially on hosts with several thousands
-of processes, may increase the CPU resources consumed by the plugin.
+Under Linux, for each process running, `apps.plugin` reads several `/proc`
+files per process. Doing this work per-second, especially on hosts with several
+thousands of processes, may increase the CPU resources consumed by the plugin.
 
 In such cases, you may need to lower its data collection frequency.
 
@@ -116,20 +119,23 @@ To do this, edit `/etc/netdata/netdata.conf` and find this section:
  # command options =
 ```
 
-Uncomment the line `update every` and set it to a higher number. If you just set it to `2`,
-its CPU resources will be cut in half, and data collection will be once every 2 seconds.
+Uncomment the line `update every` and set it to a higher number. If you just
+set it to `2`, its CPU resources will be cut in half, and data collection will
+be once every 2 seconds.
 
 ## Configuration
 
-The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
+The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this
+file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
 
 ### Configuring process managers
 
-`apps.plugin` needs to know the common process managers, meaning the names of the processes
-which spawn other processes. Process managers are used so that `apps.plugin` will automatically
-consider all their sub-processes important to monitor.
+`apps.plugin` needs to know the common process managers, that is, the names of
+the processes which spawn other processes. Process managers let `apps.plugin`
+automatically consider all their sub-processes important to monitor.
 
-Process managers are configured in `apps_groups.conf` with the prefix `managers:`, like this:
+Process managers are configured in `apps_groups.conf` with the prefix
+`managers:`, like this:
 
 ```txt
 managers: process1 process2 process3
@@ -137,9 +143,32 @@ managers: process1 process2 process3
 
 Multiple lines may exist, all starting with `managers:`.
 
-The process names given here should be exactly as the operating system sets them. In Linux these
-process names are limited to 15 characters. Usually the command `ps -e` or `cat /proc/{PID}/stat`
-states the names needed here.
+A line `managers: clear` will clear all managers, so that a new list can be
+provided.
+
+### Configuring interpreters
+
+Interpreted languages like `python`, `bash`, `sh`, `node` and more may hide
+the actual name of a process.
+
+For such programs, `apps.plugin` can be instructed to check for the actual
+process name in one of the command line parameters of the program. When a
+process matches an interpreter, `apps.plugin` will go through all the
+parameters of the interpreter and find the first parameter that is an absolute
+filename existing on disk. When found, `apps.plugin` will name the process
+using the name of that file.
+
+Interpreters are configured in `apps_groups.conf` with the prefix
+`interpreters:`, like this:
+
+```txt
+interpreters: process1 process2 process3
+```
+
+Multiple lines may exist, all starting with `interpreters:`.
+
+A line `interpreters: clear` will clear all interpreters, so that a new list
+can be provided.
 
 ### Configuring process groups and renaming processes
 
@@ -151,16 +180,20 @@ group: process1 process2 ...
 
 Each group can be given multiple times, to add more processes to it.
 
-For each process given, all of its sub-processes will be grouped, not just the matched process.
+For each process given, all of its sub-processes will be grouped, not just the
+matched process.
+
+### Matching processes
 
 The process names are the ones returned by:
 
 - **comm**: `ps -e` or `cat /proc/{PID}/stat`
 - **cmdline**: in case of substring mode (see below): `/proc/{PID}/cmdline`
 
-On Linux **comm** is limited to just a few characters. `apps.plugin` attempts to find the entire
-**comm** name by looking for it at the **cmdline**. When this is successful, the entire process name
-is available, otherwise the shortened one is used.
+On Linux **comm** is limited to 15 characters. `apps.plugin` attempts to find
+the entire **comm** name by looking for it in the **cmdline**. When this is
+successful, the entire process name is available; otherwise the shortened one
+is used.
 
 To add process names with spaces, enclose them in quotes (single or double)
 example: `'Plex Media Serv'` or `"my other process"`.
@@ -171,18 +204,23 @@ You can add asterisks (`*`) to provide a pattern:
 - `name*` _prefix_ mode: will match a **comm** beginning with `name`.
 - `*name*` _substring_ mode: will search for `name` in **cmdline**.
 
-Asterisks may appear in the middle of `name` (like `na*me`), without affecting what is being
-matched (**comm** or **cmdline**).
+Asterisks may appear in the middle of `name` (like `na*me`), without affecting
+what is being matched (**comm** or **cmdline**).
+
+To add processes with single quotes, enclose them in double quotes:
+`"process with this ' single quote"`.
 
-To add processes with single quotes, enclose them in double quotes: `"process with this ' single quote"`
+To add processes with double quotes, enclose them in single quotes:
+`'process with this " double quote'`.
 
-To add processes with double quotes, enclose them in single quotes: `'process with this " double quote'`
+The order of the entries in this list is important: the first one that matches
+a process is used, so follow a top-down hierarchy. Processes not matched by any
+row inherit the match of their parents.
 
-The order of the entries in this list is important: the first one that matches a process is used, so follow a top-down hierarchy.
-Processes not matched by any row, will inherit it from their parents.
+There are a few command line options you can pass to `apps.plugin`. The list of
+available options can be acquired with the `--help` flag. The options can be
+set in the `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration/README.md).
 
-There are a few command line options you can pass to `apps.plugin`. The list of available
-options can be acquired with the `--help` flag. The options can be set in the `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration/README.md).
 For example, to disable user and user group charts you would set:
 
 
 ```txt
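
The matching modes described in the README hunk above (exact, `name*` prefix, and `*name*` substring against **cmdline**), plus the `*name` suffix mode visible in the removed code of `apps_aggregations.c`, can be sketched as follows. This is a simplified illustration, not the plugin's `pid_match_check()`: the function name `pattern_matches` is hypothetical, and asterisks in the middle of a pattern are deliberately not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: decide whether `pattern` matches a process.
 * - "name"    exact match on comm
 * - "name*"   comm begins with name
 * - "*name"   comm ends with name
 * - "*name*"  name appears anywhere in cmdline (cmdline may be NULL)
 * Asterisks in the middle of the pattern are NOT supported in this sketch. */
static bool pattern_matches(const char *pattern, const char *comm, const char *cmdline) {
    size_t len = strlen(pattern);
    bool lead = len > 0 && pattern[0] == '*';
    bool trail = len > 1 && pattern[len - 1] == '*';

    const char *core = pattern + (lead ? 1 : 0);
    size_t core_len = len - (lead ? 1 : 0) - (trail ? 1 : 0);
    if (core_len == 0)
        return false; /* "*" or "**" carry nothing to compare */

    if (lead && trail) {
        /* substring mode: search the full command line */
        if (!cmdline) return false;
        size_t cl = strlen(cmdline);
        for (size_t i = 0; i + core_len <= cl; i++)
            if (strncmp(cmdline + i, core, core_len) == 0)
                return true;
        return false;
    }

    size_t comm_len = strlen(comm);
    if (trail) /* prefix mode */
        return comm_len >= core_len && strncmp(comm, core, core_len) == 0;
    if (lead)  /* suffix mode */
        return comm_len >= core_len &&
               strncmp(comm + comm_len - core_len, core, core_len) == 0;

    /* exact mode */
    return comm_len == core_len && strncmp(comm, core, core_len) == 0;
}
```

Because the first matching entry wins, a caller would evaluate patterns in the order they appear in `apps_groups.conf` and stop at the first `true`.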

+ 6 - 49
src/collectors/apps.plugin/apps_aggregations.c

@@ -110,61 +110,18 @@ static inline void cleanup_exited_pids(void) {
     }
 }
 
-static struct target *matched_apps_groups_target(struct pid_stat *p, struct target *w) {
-    if(is_process_manager(p))
-        return NULL;
-
-    p->matched_by_config = true;
-    return w->target ? w->target : w;
-}
-
 static struct target *get_apps_groups_target_for_pid(struct pid_stat *p) {
     targets_assignment_counter++;
 
     for(struct target *w = apps_groups_root_target; w ; w = w->next) {
         if(w->type != TARGET_TYPE_APP_GROUP) continue;
 
-        if(!w->starts_with && !w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(w->ag.compare == p->comm || w->ag.compare == p->comm_orig)
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(w->starts_with && !w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(string_starts_with_string(p->comm, w->ag.compare) ||
-                    (p->comm != p->comm_orig && string_starts_with_string(p->comm, w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(!w->starts_with && w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(string_ends_with_string(p->comm, w->ag.compare) ||
-                    (p->comm != p->comm_orig && string_ends_with_string(p->comm, w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(w->starts_with && w->ends_with && p->cmdline) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->cmdline))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(strstr(string2str(p->cmdline), string2str(w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
+        if(pid_match_check(p, &w->match)) {
+            if(p->is_manager)
+                return NULL;
+
+            p->matched_by_config = true;
+            return w->target ? w->target : w;
         }
     }
 

+ 1 - 1
src/collectors/apps.plugin/apps_functions.c

@@ -86,7 +86,7 @@ void function_processes(const char *transaction, char *function,
             access, HTTP_ACCESS_SIGNED_ID | HTTP_ACCESS_SAME_SPACE | HTTP_ACCESS_SENSITIVE_DATA | HTTP_ACCESS_VIEW_AGENT_CONFIG) || enable_function_cmdline;
 
     char *words[PLUGINSD_MAX_WORDS] = { NULL };
-    size_t num_words = quoted_strings_splitter_pluginsd(function, words, PLUGINSD_MAX_WORDS);
+    size_t num_words = quoted_strings_splitter_whitespace(function, words, PLUGINSD_MAX_WORDS);
 
     struct target *category = NULL, *user = NULL, *group = NULL; (void)category; (void)user; (void)group;
     const char *process_name = NULL;
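
The hunk above switches the function-call parser to `quoted_strings_splitter_whitespace()`. Its implementation is not part of this diff, so the following is only a rough sketch of what a whitespace splitter that honors single and double quotes might look like. The name `split_quoted_words` is hypothetical, and escape sequences are not handled.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical splitter: breaks `s` into words in place, separated by
 * whitespace. A word starting with ' or " runs until the matching quote,
 * so it may contain spaces (and the other kind of quote).
 * Returns the number of words stored in `words` (at most `max_words`). */
static size_t split_quoted_words(char *s, char **words, size_t max_words) {
    size_t n = 0;

    while (*s && n < max_words) {
        /* skip leading separators */
        while (*s && isspace((unsigned char)*s)) s++;
        if (!*s) break;

        /* remember the opening quote, if any, and step over it */
        char quote = 0;
        if (*s == '\'' || *s == '"') quote = *s++;

        words[n++] = s;

        /* advance to the closing quote or to the next whitespace */
        while (*s && (quote ? *s != quote : !isspace((unsigned char)*s))) s++;

        /* terminate the word and continue after the delimiter */
        if (*s) *s++ = '\0';
    }

    return n;
}
```

With this sketch, the line `group: 'Plex Media Serv' "my other process" nginx` yields four words, matching the README's advice to quote process names that contain spaces.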

+ 25 - 28
src/collectors/apps.plugin/apps_groups.conf

@@ -4,49 +4,46 @@
 ## Documentation at:
 ## https://github.com/netdata/netdata/blob/master/src/collectors/apps.plugin/README.md
 ##
-## Subprocesses of process managers are monitored.
-## (uncomment to edit - the default is also hardcoded into the plugin)
+## -----------------------------------------------------------------------------
+## Subprocesses of process managers are monitored individually.
+## (uncomment to add or edit - the default is also hardcoded into the plugin)
+
+## Clear all the managers, to set yours, otherwise append to the internal list.
+#managers: clear
 
 ## Linux process managers
 #managers: init systemd containerd-shim-runc-v2 dumb-init gnome-shell docker-init
-#managers: openrc-run.sh crond plasmashell xfwm4
+#managers: spawn-plugins openrc-run.sh crond plasmashell xfwm4
 
 ## FreeBSD process managers
-#managers: init
+#managers: init spawn-plugins
 
 ## MacOS process managers
-#managers: launchd
+#managers: launchd spawn-plugins
 
 ## Windows process managers
-#managers: wininit services explorer System
+#managers: wininit services explorer System netdata
+
+## -----------------------------------------------------------------------------
+## Interpreters to search for the actual command name in command line.
+## (uncomment to add or edit - the default is also hardcoded into the plugin)
+
+## Clear all the interpreters, to set yours, otherwise append to the internal list.
+#interpreters: clear
+
+#interpreters: python python2 python3
+#interpreters: sh bash zsh
+#interpreters: node perl awk
 
 ## -----------------------------------------------------------------------------
 ## Processes of interest
+## Grouping and/or rename individual processes.
+## (there is no internal default for this section)
 
 ## NETDATA processes accounting
 netdata: netdata
-## netdata known plugins
-## plugins not defined here will be accumulated into netdata, above
-apps.plugin: *apps.plugin*
-go.d.plugin: *go.d.plugin*
-systemd-journal.plugin: *systemd-journal.plugin*
-network-viewer.plugin: *network-viewer.plugin*
-windows-events.plugin: *windows-events.plugin*
-cups.plugin: *cups.plugin*
-perf.plugin: *perf.plugin*
-nfacct.plugin: *nfacct.plugin*
-xenstat.plugin: *xenstat.plugin*
-freeipmi.plugin: *freeipmi.plugin*
-charts.d.plugin: *charts.d.plugin*
-python.d.plugin: *python.d.plugin*
-slabinfo.plugin: *slabinfo.plugin*
-ebpf.plugin: *ebpf.plugin*
-debugfs.plugin: *debugfs.plugin*
-tc-qos-helper: *tc-qos-helper.sh*
-fping: fping
-ioping: ioping
-
-## agent-service-discovery
+
+## NETDATA agent-service-discovery (kubernetes)
 agent_sd: agent_sd
 
 ## -----------------------------------------------------------------------------
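
The `interpreters:` entries above tell `apps.plugin` to look for the real program name among the interpreter's parameters. A minimal POSIX sketch of that idea follows. The function name `script_name_from_args` is hypothetical, and the existence check uses `access()` for brevity, whereas the commit's `is_filename()` uses `statvfs()`.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: scan the parameters of an interpreter (skipping
 * argv[0], the interpreter itself) and return the basename of the first
 * parameter that is an absolute path existing on disk.
 * Returns NULL when no parameter qualifies. */
static const char *script_name_from_args(char *const argv[], size_t argc) {
    for (size_t i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (arg[0] != '/')
            continue; /* only absolute paths are considered */

        if (access(arg, F_OK) != 0)
            continue; /* must exist on disk */

        const char *slash = strrchr(arg, '/');
        if (slash[1] == '\0')
            continue; /* path ends with '/', no basename to use */

        return slash + 1;
    }

    return NULL;
}
```

For a command line such as `python3 -u /usr/local/bin/backup.py`, this would return `backup.py` when the file exists; stripping the extension and sanitizing the name are separate steps.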

+ 2 - 3
src/collectors/apps.plugin/apps_incremental_collection.c

@@ -174,15 +174,14 @@ int read_proc_pid_cmdline(struct pid_stat *p) {
     if(unlikely(!OS_FUNCTION(apps_os_get_pid_cmdline)(p, cmdline, sizeof(cmdline))))
         goto cleanup;
 
-    string_freez(p->cmdline);
-    p->cmdline = string_strdupz(cmdline);
+    update_pid_cmdline(p, cmdline);
 
     return 1;
 
 cleanup:
     // copy the command to the command line
     string_freez(p->cmdline);
-    p->cmdline = string_dup(p->comm);
+    p->cmdline = NULL;
     return 0;
 }
 #endif

+ 25 - 18
src/collectors/apps.plugin/apps_os_windows.c

@@ -521,15 +521,24 @@ static char *wchar_to_utf8(WCHAR *s) {
     return utf8;
 }
 
-// Convert wide string to UTF-8
-static STRING *wchar_to_string(WCHAR *s) {
-    return string_strdupz(wchar_to_utf8(s));
+static char *ansi_to_utf8(LPCSTR str) {
+    static __thread WCHAR unicode[PATH_MAX];
+    static __thread int unicode_size = sizeof(unicode) / sizeof(*unicode);
+
+    // Step 1: Convert ANSI string (LPSTR) to wide string (UTF-16)
+    int wideLength = MultiByteToWideChar(CP_ACP, 0, str, -1, NULL, 0);
+    if (wideLength == 0 || wideLength > unicode_size)
+        return NULL;
+
+    MultiByteToWideChar(CP_ACP, 0, str, -1, unicode, wideLength);
+
+    return wchar_to_utf8(unicode);
 }
 
 // --------------------------------------------------------------------------------------------------------------------
 
 // return a sanitized name for the process
-STRING *GetProcessFriendlyNameSanitized(WCHAR *path) {
+STRING *GetProcessFriendlyNameFromPathSanitized(WCHAR *path) {
     static __thread uint8_t void_buf[1024 * 1024];
     static __thread DWORD void_buf_size = sizeof(void_buf);
     static __thread wchar_t unicode[PATH_MAX];
@@ -548,7 +557,7 @@ STRING *GetProcessFriendlyNameSanitized(WCHAR *path) {
             wcsncpy(unicode, value, unicode_size - 1);
             unicode[unicode_size - 1] = L'\0';
             char *name = wchar_to_utf8(unicode);
-            sanitize_chart_meta(name);
+            sanitize_apps_plugin_chart_meta(name);
             return string_strdupz(name);
         }
     }
@@ -573,7 +582,7 @@ static STRING *GetNameFromCmdlineSanitized(struct pid_stat *p) {
                 char service[strlen(words[i + 1]) + sizeof(SERVICE_PREFIX)]; // sizeof() includes a null
                 strcpy(service, SERVICE_PREFIX);
                 strcpy(&service[sizeof(SERVICE_PREFIX) - 1], words[i + 1]);
-                sanitize_chart_meta(service);
+                sanitize_apps_plugin_chart_meta(service);
                 return string_strdupz(service);
             }
         }
@@ -621,13 +630,12 @@ static void GetServiceNames(void) {
         if(p && !p->got_service) {
             p->got_service = true;
 
-            size_t len = strlen(pServiceStatus[i].lpDisplayName);
-            char buf[len + 1];
-            memcpy(buf, pServiceStatus[i].lpDisplayName, sizeof(buf));
-            sanitize_chart_meta(buf);
-
-            string_freez(p->name);
-            p->name = string_strdupz(buf);
+            char *name = ansi_to_utf8(pServiceStatus[i].lpDisplayName);
+            if(name) {
+                sanitize_apps_plugin_chart_meta(name);
+                string_freez(p->name);
+                p->name = string_strdupz(name);
+            }
         }
     }
 
@@ -695,14 +703,13 @@ void GetAllProcessesInfo(void) {
         {
             WCHAR *cmdline = GetProcessCommandLine(hProcess); // returns malloc'd buffer
             if (cmdline) {
-                string_freez(p->cmdline);
-                p->cmdline = wchar_to_string(cmdline);
+                update_pid_cmdline(p, wchar_to_utf8(cmdline));
 
                 // extract the process full path from the command line
                 WCHAR *path = executable_path_from_cmdline(cmdline);
                 if(path) {
                     string_freez(p->name);
-                    p->name = GetProcessFriendlyNameSanitized(path);
+                    p->name = GetProcessFriendlyNameFromPathSanitized(path);
                 }
 
                 free(cmdline); // free(), not freez()
@@ -713,10 +720,10 @@ void GetAllProcessesInfo(void) {
             if (QueryFullProcessImageNameW(hProcess, 0, unicode, &unicode_size)) {
                 // put the full path name to the command into cmdline
                 if(!p->cmdline)
-                    p->cmdline = wchar_to_string(unicode);
+                    update_pid_cmdline(p, wchar_to_utf8(unicode));
 
                 if(!p->name)
-                    p->name = GetProcessFriendlyNameSanitized(unicode);
+                    p->name = GetProcessFriendlyNameFromPathSanitized(unicode);
             }
         }
 

+ 160 - 31
src/collectors/apps.plugin/apps_pid.c

@@ -317,37 +317,165 @@ static inline void link_all_processes_to_their_parents(void) {
 
 // --------------------------------------------------------------------------------------------------------------------
 
-static inline STRING *comm_from_cmdline_sanitized(char *comm, STRING *cmdline) {
-    if(!cmdline) {
-        sanitize_chart_meta(comm);
-        return string_strdupz(comm);
+static bool is_filename(const char *s) {
+    if(!s || !*s) return false;
+
+#if defined(OS_WINDOWS)
+    if( (isalpha((uint8_t)*s) && s[1] == ':' && s[2] == '\\') ||                    // windows native "x:\"
+        (isalpha((uint8_t)*s) && s[1] == ':' && s[2] == '/') ||                     // windows native "x:/"
+        (*s == '\\' && s[1] == '\\' && isalpha((uint8_t)s[2]) && s[3] == '\\') ||   // windows native "\\x\"
+        (*s == '/' && s[1] == '/' && isalpha((uint8_t)s[2]) && s[3] == '/')) {      // windows native "//x/"
+
+        WCHAR ws[FILENAME_MAX];
+        int wlen = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
+        if (wlen <= 0 || (size_t)wlen > sizeof(ws) / sizeof(*ws)) {
+            return false; // Failed to convert UTF-8 to UTF-16
+        }
+
+        MultiByteToWideChar(CP_UTF8, 0, s, -1, ws, wlen);
+        DWORD attributes = GetFileAttributesW(ws);
+        if (attributes != INVALID_FILE_ATTRIBUTES)
+            return true;
     }
+#endif
 
-    const char *cl = string2str(cmdline);
-    size_t len = string_strlen(cmdline);
+    // for: sh -c "exec /path/to/command parameters"
+    if(strncmp(s, "exec ", 5) == 0 && s[5]) {
+        s += 5;
+        char look_for = ' ';
+        if(*s == '\'') { look_for = '\''; s++; }
+        if(*s == '"') { look_for = '"'; s++; }
+        char *end = strchr(s, look_for);
+        if(end) *end = '\0';
+    }
 
-    char buf_cmd[len + 1];
-    // if it is enclosed in (), remove the parenthesis
-    if(cl[0] == '(' && cl[len - 1] == ')') {
-        memcpy(buf_cmd, &cl[1], len - 2);
-        buf_cmd[len - 2] = '\0';
+    // linux, freebsd, macos, msys, cygwin
+    if(*s == '/') {
+        struct statvfs stat;
+        return statvfs(s, &stat) == 0;
     }
-    else
-        memcpy(buf_cmd, cl, sizeof(buf_cmd));
 
-    size_t comm_len = strlen(comm);
-    char *start = strstr(buf_cmd, comm);
-    if(start) {
+    return false;
+}
+
+static const char *extensions_to_strip[] = {
+        ".sh", // shell scripts
+        ".py", // python scripts
+        ".pl", // perl scripts
+        ".js", // node.js
+#if defined(OS_WINDOWS)
+        ".exe",
+#endif
+        NULL,
+};
+
+// strip extensions we don't want to show
+static void remove_extension(char *name) {
+    size_t name_len = strlen(name);
+    for(size_t i = 0; extensions_to_strip[i] != NULL; i++) {
+        const char *ext = extensions_to_strip[i];
+        size_t ext_len = strlen(ext);
+        if(name_len > ext_len) {
+            char *check = &name[name_len - ext_len];
+            if(strcmp(check, ext) == 0) {
+                *check = '\0';
+                break;
+            }
+        }
+    }
+}
+
+static inline STRING *comm_from_cmdline_param_sanitized(STRING *cmdline) {
+    if(!cmdline) return NULL;
+
+    char buf[string_strlen(cmdline) + 1];
+    memcpy(buf, string2str(cmdline), sizeof(buf));
+
+    char *words[100];
+    size_t num_words = quoted_strings_splitter_whitespace(buf, words, 100);
+    for(size_t word = 1; word < num_words ;word++) {
+        char *s = words[word];
+        if(is_filename(s)) {
+            char *name = strrchr(s, '/');
+
+#if defined(OS_WINDOWS)
+            if(!name)
+                name = strrchr(s, '\\');
+#endif
+
+            if(name && *name) {
+                name++;
+                remove_extension(name);
+                sanitize_apps_plugin_chart_meta(name);
+                return string_strdupz(name);
+            }
+        }
+    }
+
+    return NULL;
+}
+
+static inline STRING *comm_from_cmdline_sanitized(STRING *comm, STRING *cmdline) {
+    if(!cmdline) return NULL;
+
+    char buf[string_strlen(cmdline) + 1];
+    memcpy(buf, string2str(cmdline), sizeof(buf));
+
+    size_t comm_len = string_strlen(comm);
+    char *start = strstr(buf, string2str(comm));
+    while (start) {
         char *end = start + comm_len;
-        while(*end && !isspace((uint8_t)*end) && *end != '/' && *end != '\\' && *end != '"') end++;
+        while (*end &&
+               !isspace((uint8_t) *end) &&
+               *end != '/' &&    // path separator - linux
+               *end != '\\' &&   // path separator - windows
+               *end != '"' &&    // closing double quotes
+               *end != '\'' &&   // closing single quotes
+               *end != ')' &&    // some processes append ) to their name
+               *end != ':')      // some processes append : to their name
+            end++;
+
         *end = '\0';
 
-        sanitize_chart_meta(start);
+        remove_extension(start);
+        sanitize_apps_plugin_chart_meta(start);
         return string_strdupz(start);
     }
 
-    sanitize_chart_meta(comm);
-    return string_strdupz(comm);
+    return NULL;
+}
+
+static void update_pid_comm_from_cmdline(struct pid_stat *p) {
+    bool updated = false;
+
+    STRING *new_comm = comm_from_cmdline_sanitized(p->comm, p->cmdline);
+    if(new_comm) {
+        string_freez(p->comm);
+        p->comm = new_comm;
+        updated = true;
+    }
+
+    if(is_process_an_interpreter(p)) {
+        new_comm = comm_from_cmdline_param_sanitized(p->cmdline);
+        if(new_comm) {
+            string_freez(p->comm);
+            p->comm = new_comm;
+            updated = true;
+        }
+    }
+
+    if(updated) {
+        p->is_manager = is_process_a_manager(p);
+        p->is_aggregator = is_process_an_aggregator(p);
+    }
+}
+
+void update_pid_cmdline(struct pid_stat *p, const char *cmdline) {
+    string_freez(p->cmdline);
+    p->cmdline = cmdline ? string_strdupz(cmdline) : NULL;
+
+    if(p->cmdline)
+        update_pid_comm_from_cmdline(p);
 }
 
 void update_pid_comm(struct pid_stat *p, const char *comm) {
@@ -355,10 +483,8 @@ void update_pid_comm(struct pid_stat *p, const char *comm) {
         // no change
         return;
 
-#if (PROCESSES_HAVE_CMDLINE == 1)
-    if(likely(proc_pid_cmdline_is_needed && !p->cmdline))
-        managed_log(p, PID_LOG_CMDLINE, read_proc_pid_cmdline(p));
-#endif
+    string_freez(p->comm_orig);
+    p->comm_orig = string_strdupz(comm);
 
     // some process names have ( and ), remove the parentheses
     size_t len = strlen(comm);
@@ -370,14 +496,17 @@ void update_pid_comm(struct pid_stat *p, const char *comm) {
     else
         memcpy(buf, comm, sizeof(buf));
 
-    string_freez(p->comm_orig);
-    p->comm_orig = string_strdupz(comm);
+    sanitize_apps_plugin_chart_meta(buf);
+    p->comm = string_strdupz(buf);
+    p->is_manager = is_process_a_manager(p);
+    p->is_aggregator = is_process_an_aggregator(p);
 
-    string_freez(p->comm);
-    p->comm = comm_from_cmdline_sanitized(buf, p->cmdline);
-
-    p->is_manager = is_process_manager(p);
-    p->is_aggregator = is_process_aggregator(p);
+#if (PROCESSES_HAVE_CMDLINE == 1)
+    if(likely(proc_pid_cmdline_is_needed && !p->cmdline))
+        managed_log(p, PID_LOG_CMDLINE, read_proc_pid_cmdline(p));
+#else
+    update_pid_comm_from_cmdline(p);
+#endif
 
     // the process changed comm, we may have to reassign it to
     // an apps_groups.conf target.

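Review note on `remove_extension()` in the file above: it strips at most one known suffix from the end of a process name and refuses to empty the name. A minimal standalone sketch of that idea follows. `strip_known_extension` and `known_extensions` are hypothetical names chosen for illustration, the list mirrors only the non-Windows entries of the diff, and this is not the plugin's code.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical standalone list; mirrors the non-Windows entries in the diff.
static const char *known_extensions[] = { ".sh", ".py", ".pl", ".js", NULL };

// Strip the first matching suffix in place, then stop.
// A name that is nothing but an extension (e.g. ".sh") is left unchanged,
// because the length check requires at least one character before the suffix.
static void strip_known_extension(char *name) {
    size_t name_len = strlen(name);

    for (size_t i = 0; known_extensions[i] != NULL; i++) {
        const char *ext = known_extensions[i];
        size_t ext_len = strlen(ext);

        if (name_len > ext_len && strcmp(name + name_len - ext_len, ext) == 0) {
            name[name_len - ext_len] = '\0';
            return; // only one suffix is removed per call
        }
    }
}
```

Given a writable buffer holding `backup.sh`, the call leaves `backup`. Only the last suffix is removed, so `tool.py.sh` becomes `tool.py`.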
+ 90 - 0
src/collectors/apps.plugin/apps_pid_match.c

@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "apps_plugin.h"
+
+bool pid_match_check(struct pid_stat *p, APPS_MATCH *match) {
+    if(!match->starts_with && !match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(match->compare == p->comm || match->compare == p->comm_orig)
+                return true;
+        }
+    }
+    else if(match->starts_with && !match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(string_starts_with_string(p->comm, match->compare) ||
+               (p->comm != p->comm_orig && string_starts_with_string(p->comm_orig, match->compare)))
+                return true;
+        }
+    }
+    else if(!match->starts_with && match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(string_ends_with_string(p->comm, match->compare) ||
+               (p->comm != p->comm_orig && string_ends_with_string(p->comm_orig, match->compare)))
+                return true;
+        }
+    }
+    else if(match->starts_with && match->ends_with && p->cmdline) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->cmdline))
+                return true;
+        }
+        else {
+            if(strstr(string2str(p->cmdline), string2str(match->compare)))
+                return true;
+        }
+    }
+
+    return false;
+}
+
+APPS_MATCH pid_match_create(const char *comm) {
+    APPS_MATCH m = {
+            .starts_with = false,
+            .ends_with = false,
+            .compare = NULL,
+            .pattern = NULL,
+    };
+
+    // copy comm to make changes to it
+    size_t len = strlen(comm);
+    char buf[len + 1];
+    memcpy(buf, comm, sizeof(buf));
+
+    trim_all(buf);
+    len = strlen(buf);   // trim_all() may have shortened buf
+    if(len && buf[len - 1] == '*') {
+        buf[--len] = '\0';
+        m.starts_with = true;
+    }
+
+    const char *nid = buf;
+    if (nid[0] == '*') {
+        m.ends_with = true;
+        nid++;
+    }
+
+    m.compare = string_strdupz(nid);
+
+    if(strchr(nid, '*'))
+        m.pattern = simple_pattern_create(comm, SIMPLE_PATTERN_NO_SEPARATORS, SIMPLE_PATTERN_EXACT, true);
+
+    return m;
+}
+
+void pid_match_cleanup(APPS_MATCH *m) {
+    string_freez(m->compare);
+    simple_pattern_free(m->pattern);
+}
+

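Review note on `pid_match_create()` in the file above: a trailing `*` sets `starts_with` and a leading `*` sets `ends_with`. The sketch below illustrates that flag parsing in isolation, assuming plain C strings instead of netdata's `STRING` type. `wildcard_match` and `parse_wildcards` are hypothetical names; whitespace trimming and the simple-pattern fallback for an inner `*` are left out, and the length is checked before indexing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical result type for illustration; the diff's APPS_MATCH also
// carries an interned compare string and an optional simple pattern.
typedef struct {
    bool starts_with;   // pattern had a trailing '*': match names starting with text
    bool ends_with;     // pattern had a leading '*': match names ending with text
    char text[256];     // pattern text with the outer wildcards removed
} wildcard_match;

static wildcard_match parse_wildcards(const char *pattern) {
    wildcard_match m = { .starts_with = false, .ends_with = false, .text = "" };

    size_t len = strlen(pattern);

    // guard the empty string before looking at pattern[len - 1]
    if (len > 0 && pattern[len - 1] == '*') {
        m.starts_with = true;
        len--;
    }

    if (len > 0 && pattern[0] == '*') {
        m.ends_with = true;
        pattern++;
        len--;
    }

    if (len >= sizeof(m.text))
        len = sizeof(m.text) - 1;   // truncate overly long patterns

    memcpy(m.text, pattern, len);
    m.text[len] = '\0';
    return m;
}
```

With this sketch, `ssh*` yields `starts_with` with text `ssh`, `*d` yields `ends_with` with text `d`, and `*java*` sets both flags, the combination that `pid_match_check()` uses to search the command line instead of the process name.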
+ 4 - 0
src/collectors/apps.plugin/apps_plugin.c

@@ -118,6 +118,10 @@ static char *stock_config_dir = LIBCONFIG_DIR;
 
 size_t pagesize;
 
+void sanitize_apps_plugin_chart_meta(char *buf) {
+    external_plugins_sanitize(buf, buf, strlen(buf) + 1);
+}
+
 // ----------------------------------------------------------------------------
 // update chart dimensions
 

Some files were not shown because too many files changed in this diff