
Apps plugin improvements2 (#18673)

* improvements

* simple patterns can now be configured to run without separators; added netdata and spawn-plugins as process managers; updated documentation

* cosmetic changes

* fix issue in rrdlabels sanitizer

* fix text_sanitizers to skip leading spaces

* use quoted_strings_splitter_whitespace() instead of quoted_strings_splitter_pluginsd()

* remove known extensions from executable files

* detect sh -c exec ... and extract the right process name

* workaround for infinite loop in cgroup-network with sanitization enabled
Costa Tsaousis 5 months ago
parent
commit
82d91954ae

+ 10 - 0
CMakeLists.txt

@@ -804,6 +804,15 @@ set(LIBNETDATA_FILES
         src/libnetdata/parsers/entries.h
         src/libnetdata/sanitizers/chart_id_and_name.c
         src/libnetdata/sanitizers/chart_id_and_name.h
+        src/libnetdata/sanitizers/utf8-sanitizer.c
+        src/libnetdata/sanitizers/utf8-sanitizer.h
+        src/libnetdata/sanitizers/sanitizers.h
+        src/libnetdata/sanitizers/sanitizers-labels.c
+        src/libnetdata/sanitizers/sanitizers-labels.h
+        src/libnetdata/sanitizers/sanitizers-functions.c
+        src/libnetdata/sanitizers/sanitizers-functions.h
+        src/libnetdata/sanitizers/sanitizers-pluginsd.c
+        src/libnetdata/sanitizers/sanitizers-pluginsd.h
 )
 
 if(ENABLE_PLUGIN_EBPF)
@@ -1897,6 +1906,7 @@ if(ENABLE_PLUGIN_APPS)
             src/collectors/apps.plugin/apps_os_windows.c
             src/collectors/apps.plugin/apps_incremental_collection.c
             src/collectors/apps.plugin/apps_os_windows_nt.c
+            src/collectors/apps.plugin/apps_pid_match.c
     )
 
     add_executable(apps.plugin ${APPS_PLUGIN_FILES})

+ 91 - 53
src/collectors/apps.plugin/README.md

@@ -4,51 +4,54 @@
 
 ## Process Aggregation and Grouping
 
-`apps.plugin` aggregates processes in three distinct ways to provide a more insightful
-breakdown of resource utilization:
+`apps.plugin` aggregates processes in three distinct ways to provide a more
+insightful breakdown of resource utilization:
 
 - **Tree** or **Category**: Grouped by their position in the process tree.
-   This is customizable and allows aggregation by process managers and individual
-   processes of interest. Allows also renaming the processes for presentation purposes.
+   This is customizable and allows aggregation by process managers and 
+   individual processes of interest. Allows also renaming the processes for
+   presentation purposes.
 
 - **User**: Grouped by the effective user (UID) under which the processes run.
 
-- **Group**: Grouped by the effective group (GID) under which the processes run.
+- **Group**: Grouped by the effective group (GID) under which the processes
+   run.
 
 ## Short-Lived Process Handling
 
-`apps.plugin` accounts for resource utilization of both running and exited processes,
-capturing the impact of processes that spawn short-lived subprocesses, such as shell
-scripts that fork hundreds or thousands of times per second. So, although processes
-may spawn short lived sub-processes, `apps.plugin` will aggregate their resources
-utilization providing a holistic view of how resources are shared among the processes.
+`apps.plugin` accounts for resource utilization of both running and exited
+processes, capturing the impact of processes that spawn short-lived
+subprocesses, such as shell scripts that fork hundreds or thousands of times
+per second. So, although processes may spawn short-lived sub-processes,
+`apps.plugin` will aggregate their resource utilization, providing a holistic
+view of how resources are shared among the processes.
 
 ## Charts sections
 
-To provide more valuable insights, apps.plugin aggregates individual processes in several ways.
-Each type of aggregation is presented as a different section on the dashboard.
+To provide more valuable insights, apps.plugin aggregates individual processes
+in several ways. Each type of aggregation is presented as a different section
+on the dashboard.
 
 ### Custom Process Groups (Apps)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped based
-on the groups provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped based on their position in the process tree and the groups
+provided in `/etc/netdata/apps_groups.conf`. You can edit this file using our
+[`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
 
-For this section, `apps.plugin` builds a process tree (much like `ps fax` does in Linux), and groups
-processes together (evaluating both child and parent processes) so that the result is always a list with
-a predefined set of members (of course, only process groups found running are reported).
-
-> If you find that `apps.plugin` categorizes standard applications as `other`, we would be
-> glad to accept pull requests improving the defaults shipped with Netdata in `apps_groups.conf`.
+For this section, `apps.plugin` builds a process tree (much like `ps fax` does
+in Linux), and groups processes together (evaluating both child and parent
+processes).
 
 ### By User (Users)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped by the
-effective user under which each process runs.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped by the effective user under which each process runs.
 
 ### By User Group (Groups)
 
-In this section, apps.plugin summarizes the resources consumed by all processes, grouped by the
-effective user group under which each process runs.
+In this section, apps.plugin summarizes the resources consumed by all
+processes, grouped by the effective user group under which each process runs.
 
 ## Charts
 
@@ -97,14 +100,14 @@ The above are reported:
 
 ## Performance
 
-`apps.plugin` is a complex piece of software and has a lot of work to do
-We are proud that `apps.plugin` is a lot faster compared to any other similar tool,
-while collecting a lot more information for the processes, however the fact is that
-this plugin may require more CPU resources than the `netdata` daemon itself.
+We are proud that `apps.plugin` is a lot faster compared to any other similar
+tool, while collecting a lot more information for the processes. However, this
+plugin needs to traverse the entire process tree on every iteration, so its
+resource usage may be noticeable.
 
-Under Linux, for each process running, `apps.plugin` reads several `/proc` files
-per process. Doing this work per-second, especially on hosts with several thousands
-of processes, may increase the CPU resources consumed by the plugin.
+Under Linux, for each process running, `apps.plugin` reads several `/proc`
+files per process. Doing this work per-second, especially on hosts with several
+thousands of processes, may increase the CPU resources consumed by the plugin.
 
 In such cases, you may need to lower its data collection frequency.
 
@@ -116,20 +119,23 @@ To do this, edit `/etc/netdata/netdata.conf` and find this section:
  # command options =
 ```
 
-Uncomment the line `update every` and set it to a higher number. If you just set it to `2`,
-its CPU resources will be cut in half, and data collection will be once every 2 seconds.
+Uncomment the line `update every` and set it to a higher number. If you just
+set it to `2`, its CPU resources will be cut in half, and data collection will
+be once every 2 seconds.
 
 ## Configuration
 
-The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
+The configuration file is `/etc/netdata/apps_groups.conf`. You can edit this
+file using our [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) script.
 
 ### Configuring process managers
 
-`apps.plugin` needs to know the common process managers, meaning the names of the processes
-which spawn other processes. Process managers are used so that `apps.plugin` will automatically
-consider all their sub-processes important to monitor.
+`apps.plugin` needs to know the common process managers, that is, the names of
+the processes which spawn other processes. Process managers help `apps.plugin`
+automatically consider all their sub-processes important to monitor.
 
-Process managers are configured in `apps_groups.conf` with the prefix `managers:`, like this:
+Process managers are configured in `apps_groups.conf` with the prefix
+`managers:`, like this:
 
 ```txt
 managers: process1 process2 process3
@@ -137,9 +143,32 @@ managers: process1 process2 process3
 
 Multiple lines may exist, all starting with `managers:`.
 
-The process names given here should be exactly as the operating system sets them. In Linux these
-process names are limited to 15 characters. Usually the command `ps -e` or `cat /proc/{PID}/stat`
-states the names needed here.
+A line `managers: clear` will clear all managers, so that a new list can be
+provided.
+
+### Configuring interpreters
+
+Interpreted languages like `python`, `bash`, `sh`, `node` and more, may hide
+the actual name of a process.
+
+For such programs, `apps.plugin` can be instructed to check for the actual
+process name in one of the command line parameters of the program. When a
+process matches an interpreter, apps.plugin will go through all the parameters
+of the interpreter and find the first parameter that is an absolute filename
+existing on disk. When found, `apps.plugin` will name the process using
+the name of that filename.
+
+Interpreters are configured in `apps_groups.conf` with the prefix
+`interpreters:`, like this:
+
+```txt
+interpreters: process1 process2 process3
+```
+
+Multiple lines may exist, all starting with `interpreters:`.
+
+A line `interpreters: clear` will clear all interpreters, so that a new list
+can be provided.
 
 ### Configuring process groups and renaming processes
 
@@ -151,16 +180,20 @@ group: process1 process2 ...
 
 Each group can be given multiple times, to add more processes to it.
 
-For each process given, all of its sub-processes will be grouped, not just the matched process.
+For each process given, all of its sub-processes will be grouped, not just the
+matched process.
+
+### Matching processes
 
 The process names are the ones returned by:
 
 - **comm**: `ps -e` or `cat /proc/{PID}/stat`
 - **cmdline**: in case of substring mode (see below): `/proc/{PID}/cmdline`
 
-On Linux **comm** is limited to just a few characters. `apps.plugin` attempts to find the entire
-**comm** name by looking for it at the **cmdline**. When this is successful, the entire process name
-is available, otherwise the shortened one is used.
+On Linux **comm** is limited to 15 characters. `apps.plugin` attempts to find
+the entire **comm** name by looking for it at the **cmdline**. When this is
+successful, the entire process name is available, otherwise the shortened one
+is used.
 
 To add process names with spaces, enclose them in quotes (single or double)
 example: `'Plex Media Serv'` or `"my other process"`.
@@ -171,18 +204,23 @@ You can add asterisks (`*`) to provide a pattern:
 - `name*` _prefix_ mode: will match a **comm** beginning with `name`.
 - `*name*` _substring_ mode: will search for `name` in **cmdline**.
 
-Asterisks may appear in the middle of `name` (like `na*me`), without affecting what is being
-matched (**comm** or **cmdline**).
+Asterisks may appear in the middle of `name` (like `na*me`), without affecting
+what is being matched (**comm** or **cmdline**).
+
+To add processes with single quotes, enclose them in double quotes:
+`"process with this ' single quote"`.
 
-To add processes with single quotes, enclose them in double quotes: `"process with this ' single quote"`
+To add processes with double quotes, enclose them in single quotes:
+`'process with this " double quote'`.
 
-To add processes with double quotes, enclose them in single quotes: `'process with this " double quote'`
+The order of the entries in this list is important: the first one that matches
+a process is used, so follow a top-down hierarchy. Processes not matched by any
+row inherit the group of their parent.
 
-The order of the entries in this list is important: the first one that matches a process is used, so follow a top-down hierarchy.
-Processes not matched by any row, will inherit it from their parents.
+There are a few command line options you can pass to `apps.plugin`. The list of
+available options can be acquired with the `--help` flag. The options can be
+set in the `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration/README.md).
 
-There are a few command line options you can pass to `apps.plugin`. The list of available
-options can be acquired with the `--help` flag. The options can be set in the `netdata.conf` using the [`edit-config` script](/docs/netdata-agent/configuration/README.md).
 For example, to disable user and user group charts you would set:
 
 ```txt

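Review note on the "Configuring interpreters" text above: the lookup it describes (skip the interpreter itself, take the first parameter that is an absolute path existing on disk, name the process after that file) can be sketched roughly as below. This is a minimal illustration, not the plugin's code. `script_name_from_argv`, `path_exists_fn` and `demo_exists` are hypothetical names, the existence check is injected so the sketch does not depend on the filesystem, and extension stripping and sanitization are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical callback type: returns true when `path` exists on disk.
// A real caller could wrap stat(); a demo can pass a stub.
typedef bool (*path_exists_fn)(const char *path);

// Hypothetical stub used only for illustration: pretends one script exists.
static bool demo_exists(const char *path) {
    return strcmp(path, "/usr/local/bin/backup.py") == 0;
}

// Given the already-split command line of an interpreter process, return a
// pointer to the file name of the first absolute path (after argv[0]) that
// exists, or NULL when no parameter qualifies.
static const char *script_name_from_argv(char *const argv[], size_t argc,
                                         path_exists_fn exists) {
    for (size_t i = 1; i < argc; i++) {   // argv[0] is the interpreter itself
        const char *arg = argv[i];
        if (!arg || arg[0] != '/')        // only absolute paths are considered
            continue;
        if (!exists(arg))                 // the path must exist on disk
            continue;
        const char *slash = strrchr(arg, '/');   // non-NULL: arg starts with '/'
        if (slash[1] == '\0')             // path ends in '/', no file name
            continue;                     // (a choice made for this sketch)
        return slash + 1;
    }
    return NULL;
}
```

With the stub above, `python3 -u /usr/local/bin/backup.py --full` would resolve to `backup.py`, while `python3 -c "print(1)"` has no absolute-path parameter and keeps the interpreter's own name.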
+ 6 - 49
src/collectors/apps.plugin/apps_aggregations.c

@@ -110,61 +110,18 @@ static inline void cleanup_exited_pids(void) {
     }
 }
 
-static struct target *matched_apps_groups_target(struct pid_stat *p, struct target *w) {
-    if(is_process_manager(p))
-        return NULL;
-
-    p->matched_by_config = true;
-    return w->target ? w->target : w;
-}
-
 static struct target *get_apps_groups_target_for_pid(struct pid_stat *p) {
     targets_assignment_counter++;
 
     for(struct target *w = apps_groups_root_target; w ; w = w->next) {
         if(w->type != TARGET_TYPE_APP_GROUP) continue;
 
-        if(!w->starts_with && !w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(w->ag.compare == p->comm || w->ag.compare == p->comm_orig)
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(w->starts_with && !w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(string_starts_with_string(p->comm, w->ag.compare) ||
-                    (p->comm != p->comm_orig && string_starts_with_string(p->comm, w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(!w->starts_with && w->ends_with) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->comm))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(string_ends_with_string(p->comm, w->ag.compare) ||
-                    (p->comm != p->comm_orig && string_ends_with_string(p->comm, w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
-        }
-        else if(w->starts_with && w->ends_with && p->cmdline) {
-            if(w->ag.pattern) {
-                if(simple_pattern_matches_string(w->ag.pattern, p->cmdline))
-                    return matched_apps_groups_target(p, w);
-            }
-            else {
-                if(strstr(string2str(p->cmdline), string2str(w->ag.compare)))
-                    return matched_apps_groups_target(p, w);
-            }
+        if(pid_match_check(p, &w->match)) {
+            if(p->is_manager)
+                return NULL;
+
+            p->matched_by_config = true;
+            return w->target ? w->target : w;
         }
     }
 

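Review note: after this refactoring the assignment loop reduces to "first matching rule wins, and a matched process manager gets no target". A minimal standalone sketch of that control flow follows. `group_rule` and `assign_group` are hypothetical names, and matching is reduced to an exact string comparison instead of `pid_match_check()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Hypothetical, simplified rule: one group name and one exact process name.
typedef struct {
    const char *group;   // name of the group the process is reported under
    const char *comm;    // exact process name to match (patterns omitted)
} group_rule;

// Walk the rules in configuration order. The first rule that matches decides:
// a process manager is never assigned to a group (NULL), any other process
// gets the group of that rule. Later rules are not consulted.
static const char *assign_group(const group_rule *rules, size_t n,
                                const char *comm, bool is_manager) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(rules[i].comm, comm) != 0)
            continue;
        return is_manager ? NULL : rules[i].group;
    }
    return NULL;   // no rule matched
}
```

Because the loop stops at the first match, a broad rule placed early in `apps_groups.conf` shadows any narrower rule that follows it.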
+ 1 - 1
src/collectors/apps.plugin/apps_functions.c

@@ -86,7 +86,7 @@ void function_processes(const char *transaction, char *function,
             access, HTTP_ACCESS_SIGNED_ID | HTTP_ACCESS_SAME_SPACE | HTTP_ACCESS_SENSITIVE_DATA | HTTP_ACCESS_VIEW_AGENT_CONFIG) || enable_function_cmdline;
 
     char *words[PLUGINSD_MAX_WORDS] = { NULL };
-    size_t num_words = quoted_strings_splitter_pluginsd(function, words, PLUGINSD_MAX_WORDS);
+    size_t num_words = quoted_strings_splitter_whitespace(function, words, PLUGINSD_MAX_WORDS);
 
     struct target *category = NULL, *user = NULL, *group = NULL; (void)category; (void)user; (void)group;
     const char *process_name = NULL;

+ 25 - 28
src/collectors/apps.plugin/apps_groups.conf

@@ -4,49 +4,46 @@
 ## Documentation at:
 ## https://github.com/netdata/netdata/blob/master/src/collectors/apps.plugin/README.md
 ##
-## Subprocesses of process managers are monitored.
-## (uncomment to edit - the default is also hardcoded into the plugin)
+## -----------------------------------------------------------------------------
+## Subprocesses of process managers are monitored individually.
+## (uncomment to add or edit - the default is also hardcoded into the plugin)
+
+## Clear all the managers, to set yours, otherwise append to the internal list.
+#managers: clear
 
 ## Linux process managers
 #managers: init systemd containerd-shim-runc-v2 dumb-init gnome-shell docker-init
-#managers: openrc-run.sh crond plasmashell xfwm4
+#managers: spawn-plugins openrc-run.sh crond plasmashell xfwm4
 
 ## FreeBSD process managers
-#managers: init
+#managers: init spawn-plugins
 
 ## MacOS process managers
-#managers: launchd
+#managers: launchd spawn-plugins
 
 ## Windows process managers
-#managers: wininit services explorer System
+#managers: wininit services explorer System netdata
+
+## -----------------------------------------------------------------------------
+## Interpreters to search for the actual command name in command line.
+## (uncomment to add or edit - the default is also hardcoded into the plugin)
+
+## Clear all the interpreters, to set yours, otherwise append to the internal list.
+#interpreters: clear
+
+#interpreters: python python2 python3
+#interpreters: sh bash zsh
+#interpreters: node perl awk
 
 ## -----------------------------------------------------------------------------
 ## Processes of interest
+## Grouping and/or rename individual processes.
+## (there is no internal default for this section)
 
 ## NETDATA processes accounting
 netdata: netdata
-## netdata known plugins
-## plugins not defined here will be accumulated into netdata, above
-apps.plugin: *apps.plugin*
-go.d.plugin: *go.d.plugin*
-systemd-journal.plugin: *systemd-journal.plugin*
-network-viewer.plugin: *network-viewer.plugin*
-windows-events.plugin: *windows-events.plugin*
-cups.plugin: *cups.plugin*
-perf.plugin: *perf.plugin*
-nfacct.plugin: *nfacct.plugin*
-xenstat.plugin: *xenstat.plugin*
-freeipmi.plugin: *freeipmi.plugin*
-charts.d.plugin: *charts.d.plugin*
-python.d.plugin: *python.d.plugin*
-slabinfo.plugin: *slabinfo.plugin*
-ebpf.plugin: *ebpf.plugin*
-debugfs.plugin: *debugfs.plugin*
-tc-qos-helper: *tc-qos-helper.sh*
-fping: fping
-ioping: ioping
-
-## agent-service-discovery
+
+## NETDATA agent-service-discovery (kubernetes)
 agent_sd: agent_sd
 
 ## -----------------------------------------------------------------------------

+ 2 - 3
src/collectors/apps.plugin/apps_incremental_collection.c

@@ -174,15 +174,14 @@ int read_proc_pid_cmdline(struct pid_stat *p) {
     if(unlikely(!OS_FUNCTION(apps_os_get_pid_cmdline)(p, cmdline, sizeof(cmdline))))
         goto cleanup;
 
-    string_freez(p->cmdline);
-    p->cmdline = string_strdupz(cmdline);
+    update_pid_cmdline(p, cmdline);
 
     return 1;
 
 cleanup:
     // copy the command to the command line
     string_freez(p->cmdline);
-    p->cmdline = string_dup(p->comm);
+    p->cmdline = NULL;
     return 0;
 }
 #endif

+ 25 - 18
src/collectors/apps.plugin/apps_os_windows.c

@@ -521,15 +521,24 @@ static char *wchar_to_utf8(WCHAR *s) {
     return utf8;
 }
 
-// Convert wide string to UTF-8
-static STRING *wchar_to_string(WCHAR *s) {
-    return string_strdupz(wchar_to_utf8(s));
+static char *ansi_to_utf8(LPCSTR str) {
+    static __thread WCHAR unicode[PATH_MAX];
+    static __thread int unicode_size = sizeof(unicode) / sizeof(*unicode);
+
+    // Step 1: Convert ANSI string (LPSTR) to wide string (UTF-16)
+    int wideLength = MultiByteToWideChar(CP_ACP, 0, str, -1, NULL, 0);
+    if (wideLength == 0 || wideLength > unicode_size)
+        return NULL;
+
+    MultiByteToWideChar(CP_ACP, 0, str, -1, unicode, wideLength);
+
+    return wchar_to_utf8(unicode);
 }
 
 // --------------------------------------------------------------------------------------------------------------------
 
 // return a sanitized name for the process
-STRING *GetProcessFriendlyNameSanitized(WCHAR *path) {
+STRING *GetProcessFriendlyNameFromPathSanitized(WCHAR *path) {
     static __thread uint8_t void_buf[1024 * 1024];
     static __thread DWORD void_buf_size = sizeof(void_buf);
     static __thread wchar_t unicode[PATH_MAX];
@@ -548,7 +557,7 @@ STRING *GetProcessFriendlyNameSanitized(WCHAR *path) {
             wcsncpy(unicode, value, unicode_size - 1);
             unicode[unicode_size - 1] = L'\0';
             char *name = wchar_to_utf8(unicode);
-            sanitize_chart_meta(name);
+            sanitize_apps_plugin_chart_meta(name);
             return string_strdupz(name);
         }
     }
@@ -573,7 +582,7 @@ static STRING *GetNameFromCmdlineSanitized(struct pid_stat *p) {
                 char service[strlen(words[i + 1]) + sizeof(SERVICE_PREFIX)]; // sizeof() includes a null
                 strcpy(service, SERVICE_PREFIX);
                 strcpy(&service[sizeof(SERVICE_PREFIX) - 1], words[i + 1]);
-                sanitize_chart_meta(service);
+                sanitize_apps_plugin_chart_meta(service);
                 return string_strdupz(service);
             }
         }
@@ -621,13 +630,12 @@ static void GetServiceNames(void) {
         if(p && !p->got_service) {
             p->got_service = true;
 
-            size_t len = strlen(pServiceStatus[i].lpDisplayName);
-            char buf[len + 1];
-            memcpy(buf, pServiceStatus[i].lpDisplayName, sizeof(buf));
-            sanitize_chart_meta(buf);
-
-            string_freez(p->name);
-            p->name = string_strdupz(buf);
+            char *name = ansi_to_utf8(pServiceStatus[i].lpDisplayName);
+            if(name) {
+                sanitize_apps_plugin_chart_meta(name);
+                string_freez(p->name);
+                p->name = string_strdupz(name);
+            }
         }
     }
 
@@ -695,14 +703,13 @@ void GetAllProcessesInfo(void) {
         {
             WCHAR *cmdline = GetProcessCommandLine(hProcess); // returns malloc'd buffer
             if (cmdline) {
-                string_freez(p->cmdline);
-                p->cmdline = wchar_to_string(cmdline);
+                update_pid_cmdline(p, wchar_to_utf8(cmdline));
 
                 // extract the process full path from the command line
                 WCHAR *path = executable_path_from_cmdline(cmdline);
                 if(path) {
                     string_freez(p->name);
-                    p->name = GetProcessFriendlyNameSanitized(path);
+                    p->name = GetProcessFriendlyNameFromPathSanitized(path);
                 }
 
                 free(cmdline); // free(), not freez()
@@ -713,10 +720,10 @@ void GetAllProcessesInfo(void) {
             if (QueryFullProcessImageNameW(hProcess, 0, unicode, &unicode_size)) {
                 // put the full path name to the command into cmdline
                 if(!p->cmdline)
-                    p->cmdline = wchar_to_string(unicode);
+                    update_pid_cmdline(p, wchar_to_utf8(unicode));
 
                 if(!p->name)
-                    p->name = GetProcessFriendlyNameSanitized(unicode);
+                    p->name = GetProcessFriendlyNameFromPathSanitized(unicode);
             }
         }
 

+ 160 - 31
src/collectors/apps.plugin/apps_pid.c

@@ -317,37 +317,165 @@ static inline void link_all_processes_to_their_parents(void) {
 
 // --------------------------------------------------------------------------------------------------------------------
 
-static inline STRING *comm_from_cmdline_sanitized(char *comm, STRING *cmdline) {
-    if(!cmdline) {
-        sanitize_chart_meta(comm);
-        return string_strdupz(comm);
+static bool is_filename(const char *s) {
+    if(!s || !*s) return false;
+
+#if defined(OS_WINDOWS)
+    if( (isalpha((uint8_t)*s) && s[1] == ':' && s[2] == '\\') ||                    // windows native "x:\"
+        (isalpha((uint8_t)*s) && s[1] == ':' && s[2] == '/') ||                     // windows native "x:/"
+        (*s == '\\' && s[1] == '\\' && isalpha((uint8_t)s[2]) && s[3] == '\\') ||   // windows native "\\x\"
+        (*s == '/' && s[1] == '/' && isalpha((uint8_t)s[2]) && s[3] == '/')) {      // windows native "//x/"
+
+        WCHAR ws[FILENAME_MAX];
+        int wlen = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
+        if (wlen <= 0 || (size_t)wlen > sizeof(ws) / sizeof(*ws)) {
+            return false; // Failed to convert UTF-8 to UTF-16
+        }
+
+        MultiByteToWideChar(CP_UTF8, 0, s, -1, ws, wlen);
+        DWORD attributes = GetFileAttributesW(ws);
+        if (attributes != INVALID_FILE_ATTRIBUTES)
+            return true;
     }
+#endif
 
-    const char *cl = string2str(cmdline);
-    size_t len = string_strlen(cmdline);
+    // for: sh -c "exec /path/to/command parameters"
+    if(strncmp(s, "exec ", 5) == 0 && s[5]) {
+        s += 5;
+        char look_for = ' ';
+        if(*s == '\'') { look_for = '\''; s++; }
+        if(*s == '"') { look_for = '"'; s++; }
+        char *end = strchr(s, look_for);
+        if(end) *end = '\0';
+    }
 
-    char buf_cmd[len + 1];
-    // if it is enclosed in (), remove the parenthesis
-    if(cl[0] == '(' && cl[len - 1] == ')') {
-        memcpy(buf_cmd, &cl[1], len - 2);
-        buf_cmd[len - 2] = '\0';
+    // linux, freebsd, macos, msys, cygwin
+    if(*s == '/') {
+        struct statvfs stat;
+        return statvfs(s, &stat) == 0;
     }
-    else
-        memcpy(buf_cmd, cl, sizeof(buf_cmd));
 
-    size_t comm_len = strlen(comm);
-    char *start = strstr(buf_cmd, comm);
-    if(start) {
+    return false;
+}
+
+static const char *extensions_to_strip[] = {
+        ".sh", // shell scripts
+        ".py", // python scripts
+        ".pl", // perl scripts
+        ".js", // node.js
+#if defined(OS_WINDOWS)
+        ".exe",
+#endif
+        NULL,
+};
+
+// strip extensions we don't want to show
+static void remove_extension(char *name) {
+    size_t name_len = strlen(name);
+    for(size_t i = 0; extensions_to_strip[i] != NULL; i++) {
+        const char *ext = extensions_to_strip[i];
+        size_t ext_len = strlen(ext);
+        if(name_len > ext_len) {
+            char *check = &name[name_len - ext_len];
+            if(strcmp(check, ext) == 0) {
+                *check = '\0';
+                break;
+            }
+        }
+    }
+}
+
+static inline STRING *comm_from_cmdline_param_sanitized(STRING *cmdline) {
+    if(!cmdline) return NULL;
+
+    char buf[string_strlen(cmdline) + 1];
+    memcpy(buf, string2str(cmdline), sizeof(buf));
+
+    char *words[100];
+    size_t num_words = quoted_strings_splitter_whitespace(buf, words, 100);
+    for(size_t word = 1; word < num_words ;word++) {
+        char *s = words[word];
+        if(is_filename(s)) {
+            char *name = strrchr(s, '/');
+
+#if defined(OS_WINDOWS)
+            if(!name)
+                name = strrchr(s, '\\');
+#endif
+
+            if(name && *name) {
+                name++;
+                remove_extension(name);
+                sanitize_apps_plugin_chart_meta(name);
+                return string_strdupz(name);
+            }
+        }
+    }
+
+    return NULL;
+}
+
+static inline STRING *comm_from_cmdline_sanitized(STRING *comm, STRING *cmdline) {
+    if(!cmdline) return NULL;
+
+    char buf[string_strlen(cmdline) + 1];
+    memcpy(buf, string2str(cmdline), sizeof(buf));
+
+    size_t comm_len = string_strlen(comm);
+    char *start = strstr(buf, string2str(comm));
+    while (start) {
         char *end = start + comm_len;
-        while(*end && !isspace((uint8_t)*end) && *end != '/' && *end != '\\' && *end != '"') end++;
+        while (*end &&
+               !isspace((uint8_t) *end) &&
+               *end != '/' &&    // path separator - linux
+               *end != '\\' &&   // path separator - windows
+               *end != '"' &&    // closing double quotes
+               *end != '\'' &&   // closing single quotes
+               *end != ')' &&    // some processes append ) to their name
+               *end != ':')      // some processes append : to their name
+            end++;
+
         *end = '\0';
 
-        sanitize_chart_meta(start);
+        remove_extension(start);
+        sanitize_apps_plugin_chart_meta(start);
         return string_strdupz(start);
     }
 
-    sanitize_chart_meta(comm);
-    return string_strdupz(comm);
+    return NULL;
+}
+
+static void update_pid_comm_from_cmdline(struct pid_stat *p) {
+    bool updated = false;
+
+    STRING *new_comm = comm_from_cmdline_sanitized(p->comm, p->cmdline);
+    if(new_comm) {
+        string_freez(p->comm);
+        p->comm = new_comm;
+        updated = true;
+    }
+
+    if(is_process_an_interpreter(p)) {
+        new_comm = comm_from_cmdline_param_sanitized(p->cmdline);
+        if(new_comm) {
+            string_freez(p->comm);
+            p->comm = new_comm;
+            updated = true;
+        }
+    }
+
+    if(updated) {
+        p->is_manager = is_process_a_manager(p);
+        p->is_aggregator = is_process_an_aggregator(p);
+    }
+}
+
+void update_pid_cmdline(struct pid_stat *p, const char *cmdline) {
+    string_freez(p->cmdline);
+    p->cmdline = cmdline ? string_strdupz(cmdline) : NULL;
+
+    if(p->cmdline)
+        update_pid_comm_from_cmdline(p);
 }
 
 void update_pid_comm(struct pid_stat *p, const char *comm) {
@@ -355,10 +483,8 @@ void update_pid_comm(struct pid_stat *p, const char *comm) {
         // no change
         return;
 
-#if (PROCESSES_HAVE_CMDLINE == 1)
-    if(likely(proc_pid_cmdline_is_needed && !p->cmdline))
-        managed_log(p, PID_LOG_CMDLINE, read_proc_pid_cmdline(p));
-#endif
+    string_freez(p->comm_orig);
+    p->comm_orig = string_strdupz(comm);
 
     // some process names have ( and ), remove the parenthesis
     size_t len = strlen(comm);
@@ -370,14 +496,17 @@ void update_pid_comm(struct pid_stat *p, const char *comm) {
     else
         memcpy(buf, comm, sizeof(buf));
 
-    string_freez(p->comm_orig);
-    p->comm_orig = string_strdupz(comm);
+    sanitize_apps_plugin_chart_meta(buf);
+    p->comm = string_strdupz(buf);
+    p->is_manager = is_process_a_manager(p);
+    p->is_aggregator = is_process_an_aggregator(p);
 
-    string_freez(p->comm);
-    p->comm = comm_from_cmdline_sanitized(buf, p->cmdline);
-
-    p->is_manager = is_process_manager(p);
-    p->is_aggregator = is_process_aggregator(p);
+#if (PROCESSES_HAVE_CMDLINE == 1)
+    if(likely(proc_pid_cmdline_is_needed && !p->cmdline))
+        managed_log(p, PID_LOG_CMDLINE, read_proc_pid_cmdline(p));
+#else
+    update_pid_comm_from_cmdline(p);
+#endif
 
     // the process changed comm, we may have to reassign it to
     // an apps_groups.conf target.

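Review note: the new `comm_from_cmdline_sanitized()` recovers a full process name when the kernel-provided `comm` is truncated (15 characters on Linux), then strips known script extensions. The sketch below reproduces that idea in standalone form, under stated assumptions. `full_name_from_cmdline`, `strip_known_extension` and `is_name_terminator` are hypothetical names, plain C strings replace `STRING`, the result is written to a caller buffer, and sanitization is omitted.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Extensions removed from the recovered name (the list from the diff,
// without the Windows-only ".exe").
static const char *const strip_exts[] = { ".sh", ".py", ".pl", ".js", NULL };

// Remove the first matching extension from the end of name, in place.
static void strip_known_extension(char *name) {
    size_t name_len = strlen(name);
    for (size_t i = 0; strip_exts[i]; i++) {
        size_t ext_len = strlen(strip_exts[i]);
        if (name_len > ext_len &&
            strcmp(name + name_len - ext_len, strip_exts[i]) == 0) {
            name[name_len - ext_len] = '\0';
            return;
        }
    }
}

// Characters that end a process name inside a command line.
static bool is_name_terminator(char c) {
    return c == '\0' || isspace((unsigned char)c) || c == '/' || c == '\\' ||
           c == '"' || c == '\'' || c == ')' || c == ':';
}

// Find comm inside cmdline and extend it up to the next terminator.
// Returns false when comm is empty, is not found, or out is too small.
static bool full_name_from_cmdline(const char *comm, const char *cmdline,
                                   char *out, size_t out_size) {
    if (!*comm)
        return false;
    const char *start = strstr(cmdline, comm);   // first occurrence only
    if (!start)
        return false;
    const char *end = start + strlen(comm);
    while (!is_name_terminator(*end))
        end++;
    size_t len = (size_t)(end - start);
    if (len + 1 > out_size)
        return false;
    memcpy(out, start, len);
    out[len] = '\0';
    strip_known_extension(out);
    return true;
}
```

As in the diff, only the first occurrence of `comm` in the command line is used, so a directory with the same name as the process can win over the executable's file name.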
+ 90 - 0
src/collectors/apps.plugin/apps_pid_match.c

@@ -0,0 +1,90 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "apps_plugin.h"
+
+bool pid_match_check(struct pid_stat *p, APPS_MATCH *match) {
+    if(!match->starts_with && !match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(match->compare == p->comm || match->compare == p->comm_orig)
+                return true;
+        }
+    }
+    else if(match->starts_with && !match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(string_starts_with_string(p->comm, match->compare) ||
+               (p->comm != p->comm_orig && string_starts_with_string(p->comm, match->compare)))
+                return true;
+        }
+    }
+    else if(!match->starts_with && match->ends_with) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->comm))
+                return true;
+        }
+        else {
+            if(string_ends_with_string(p->comm, match->compare) ||
+               (p->comm != p->comm_orig && string_ends_with_string(p->comm, match->compare)))
+                return true;
+        }
+    }
+    else if(match->starts_with && match->ends_with && p->cmdline) {
+        if(match->pattern) {
+            if(simple_pattern_matches_string(match->pattern, p->cmdline))
+                return true;
+        }
+        else {
+            if(strstr(string2str(p->cmdline), string2str(match->compare)))
+                return true;
+        }
+    }
+
+    return false;
+}
+
+APPS_MATCH pid_match_create(const char *comm) {
+    APPS_MATCH m = {
+            .starts_with = false,
+            .ends_with = false,
+            .compare = NULL,
+            .pattern = NULL,
+    };
+
+    // copy comm to make changes to it
+    size_t len = strlen(comm);
+    char buf[len + 1];
+    memcpy(buf, comm, sizeof(buf));
+
+    trim_all(buf);
+
+    if(buf[len - 1] == '*') {
+        buf[--len] = '\0';
+        m.starts_with = true;
+    }
+
+    const char *nid = buf;
+    if (nid[0] == '*') {
+        m.ends_with = true;
+        nid++;
+    }
+
+    m.compare = string_strdupz(nid);
+
+    if(strchr(nid, '*'))
+        m.pattern = simple_pattern_create(comm, SIMPLE_PATTERN_NO_SEPARATORS, SIMPLE_PATTERN_EXACT, true);
+
+    return m;
+}
+
+void pid_match_cleanup(APPS_MATCH *m) {
+    string_freez(m->compare);
+    simple_pattern_free(m->pattern);
+}
+

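Review note: `pid_match_create()` turns the position of the asterisks into the four modes documented in the README (exact, prefix, suffix, substring on the command line). The standalone sketch below illustrates that classification and a matching step. `parsed_match`, `parse_match` and `match_process` are hypothetical names, the text buffer is fixed-size, and patterns with an asterisk in the middle are only flagged, not evaluated, since the real code hands those to `simple_pattern_create()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { MATCH_EXACT, MATCH_PREFIX, MATCH_SUFFIX, MATCH_SUBSTRING } match_mode;

typedef struct {
    match_mode mode;
    char text[64];             // pattern without its leading/trailing asterisk
    bool has_inner_wildcard;   // true for patterns like "na*me"
} parsed_match;

// Classify a pattern: "name" exact, "name*" prefix, "*name" suffix,
// "*name*" substring. Returns false for empty or oversized patterns.
static bool parse_match(const char *pattern, parsed_match *out) {
    size_t len = strlen(pattern);
    if (len == 0)
        return false;

    const char *start = pattern;
    bool trailing = pattern[len - 1] == '*';
    if (trailing)
        len--;                              // drop the trailing asterisk
    bool leading = len > 0 && start[0] == '*';
    if (leading) {                          // drop the leading asterisk
        start++;
        len--;
    }

    if (len >= sizeof(out->text))
        return false;
    memcpy(out->text, start, len);
    out->text[len] = '\0';

    out->has_inner_wildcard = strchr(out->text, '*') != NULL;
    out->mode = (leading && trailing) ? MATCH_SUBSTRING
              : trailing              ? MATCH_PREFIX
              : leading               ? MATCH_SUFFIX
                                      : MATCH_EXACT;
    return true;
}

// Apply a parsed pattern: comm for exact/prefix/suffix, cmdline for substring.
static bool match_process(const parsed_match *m, const char *comm, const char *cmdline) {
    size_t tl = strlen(m->text), cl = strlen(comm);
    switch (m->mode) {
        case MATCH_EXACT:     return strcmp(comm, m->text) == 0;
        case MATCH_PREFIX:    return strncmp(comm, m->text, tl) == 0;
        case MATCH_SUFFIX:    return cl >= tl && strcmp(comm + cl - tl, m->text) == 0;
        case MATCH_SUBSTRING: return cmdline != NULL && strstr(cmdline, m->text) != NULL;
    }
    return false;
}
```

Unlike the code in the diff, this sketch computes the length after deciding what to strip and rejects an empty pattern up front, which avoids indexing `buf[len - 1]` with a zero or stale length.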
+ 4 - 0
src/collectors/apps.plugin/apps_plugin.c

@@ -118,6 +118,10 @@ static char *stock_config_dir = LIBCONFIG_DIR;
 
 size_t pagesize;
 
+void sanitize_apps_plugin_chart_meta(char *buf) {
+    external_plugins_sanitize(buf, buf, strlen(buf) + 1);
+}
+
 // ----------------------------------------------------------------------------
 // update chart dimensions
 

Some files were not shown because too many files changed in this diff