123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270 |
- <!--
- title: Monitor any process in real-time with Netdata
- sidebar_label: Monitor any process in real-time with Netdata
- description: "Tap into Netdata's powerful collectors, with per-second utilization metrics for every process, to troubleshoot faster and make data-informed decisions."
- image: /img/seo/guides/monitor/process.png
- custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/process.md
- learn_status: "Published"
- learn_rel_path: "Operations"
- -->
- # Monitor any process in real-time with Netdata
- Netdata is more than a multitude of generic system-level metrics and visualizations. Instead of providing only a bird's
- eye view of your system, leaving you to wonder exactly _what_ is taking up 99% CPU, Netdata also gives you visibility
- into _every layer_ of your node. These additional layers give you context, and meaningful insights, into the true health
- and performance of your infrastructure.
- One of these layers is the _process_. Every time a Linux system runs a program, it creates an independent process that
- executes the program's instructions in parallel with anything else happening on the system. Linux systems track the
- state and resource utilization of processes using the [`/proc` filesystem](https://en.wikipedia.org/wiki/Procfs), and
- Netdata is designed to hook into those metrics to create meaningful visualizations out of the box.
- While there are a lot of existing command-line tools for tracking processes on Linux systems, such as `ps` or `top`,
- only Netdata provides dozens of real-time charts, at both per-second and event frequency, without you having to write
- SQL queries or know a bunch of arbitrary command-line flags.
- With Netdata's process monitoring, you can:
- - Benchmark/optimize performance of standard applications, like web servers or databases
- - Benchmark/optimize performance of custom applications
- - Troubleshoot CPU/memory/disk utilization issues (why is my system's CPU spiking right now?)
- - Perform granular capacity planning based on the specific needs of your infrastructure
- - Search for leaking file descriptors
- - Investigate zombie processes
- ... and much more. Let's get started.
- ## Prerequisites
- - One or more Linux nodes running [Netdata](/packaging/installer/README.md)
- - A general understanding of how
- to [configure the Netdata Agent](/docs/netdata-agent/configuration/README.md)
- using `edit-config`.
- - A Netdata Cloud account. [Sign up](https://app.netdata.cloud) if you don't have one already.
- ## How does Netdata do process monitoring?
- The Netdata Agent already knows to look for hundreds
- of [standard applications that we support via collectors](/src/collectors/COLLECTORS.md),
- and groups them based on their
- purpose. Let's say you want to monitor a MySQL
- database using its process. The Netdata Agent already knows to look for processes with the string `mysqld` in their
- name, along with a few others, and puts them into the `sql` group. This `sql` group then becomes a dimension in all
- process-specific charts.
- The process and groups settings are used by two unique and powerful collectors.
- [**`apps.plugin`**](/src/collectors/apps.plugin/README.md) looks at the Linux
- process tree every second, much like `top` or
- `ps fax`, and collects resource utilization information on every running process. It then automatically adds a layer of
- meaningful visualization on top of these metrics, and creates per-process/application charts.
- [**`ebpf.plugin`**](/src/collectors/ebpf.plugin/README.md): Netdata's extended
- Berkeley Packet Filter (eBPF) collector
- monitors Linux kernel-level metrics for file descriptors, virtual filesystem IO, and process management, and then hands
- process-specific metrics over to `apps.plugin` for visualization. The eBPF collector also collects and visualizes
- metrics on an _event frequency_, which means it captures every kernel interaction, and not just the volume of
- interaction at every second in time. That's even more precise than Netdata's standard per-second granularity.
- ### Per-process metrics and charts in Netdata
- With these collectors working in parallel, Netdata visualizes the following per-second metrics for _any_ process on your
- Linux systems:
- - CPU utilization (`apps.cpu`)
- - Total CPU usage
- - User/system CPU usage (`apps.cpu_user`/`apps.cpu_system`)
- - Disk I/O
- - Physical reads/writes (`apps.preads`/`apps.pwrites`)
- - Logical reads/writes (`apps.lreads`/`apps.lwrites`)
- - Open unique files (if a file is found open multiple times, it is counted just once, `apps.files`)
- - Memory
- - Real Memory Used (non-shared, `apps.mem`)
- - Virtual Memory Allocated (`apps.vmem`)
- - Minor page faults (i.e. memory activity, `apps.minor_faults`)
- - Processes
- - Threads running (`apps.threads`)
- - Processes running (`apps.processes`)
- - Carried over uptime (since the last Netdata Agent restart, `apps.uptime`)
- - Minimum uptime (`apps.uptime_min`)
- - Average uptime (`apps.uptime_average`)
- - Maximum uptime (`apps.uptime_max`)
- - Pipes open (`apps.pipes`)
- - Swap memory
- - Swap memory used (`apps.swap`)
- - Major page faults (i.e. swap activity, `apps.major_faults`)
- - Network
- - Sockets open (`apps.sockets`)
- - eBPF file
- - Number of calls to open files. (`apps.file_open`)
- - Number of files closed. (`apps.file_closed`)
- - Number of calls to open files that returned errors.
- - Number of calls to close files that returned errors.
- - eBPF syscall
- - Number of calls to delete files. (`apps.file_deleted`)
- - Number of calls to `vfs_write`. (`apps.vfs_write_call`)
- - Number of calls to `vfs_read`. (`apps.vfs_read_call`)
- - Number of bytes written with `vfs_write`. (`apps.vfs_write_bytes`)
- - Number of bytes read with `vfs_read`. (`apps.vfs_read_bytes`)
- - Number of calls to write a file that returned errors.
- - Number of calls to read a file that returned errors.
- - eBPF process
- - Number of process created with `do_fork`. (`apps.process_create`)
- - Number of threads created with `do_fork` or `__x86_64_sys_clone`, depending on your system's kernel
- version. (`apps.thread_create`)
- - Number of times that a process called `do_exit`. (`apps.task_close`)
- - eBPF net
- - Number of bytes sent. (`apps.bandwidth_sent`)
- - Number of bytes received. (`apps.bandwidth_recv`)
- As an example, here's the per-process CPU utilization chart, including a `sql` group/dimension.
- ![A per-process CPU utilization chart in Netdata Cloud](https://user-images.githubusercontent.com/1153921/101217226-3a5d5700-363e-11eb-8610-aa1640aefb5d.png)
- ## Configure the Netdata Agent to recognize a specific process
- To monitor any process, you need to make sure the Netdata Agent is aware of it. As mentioned above, the Agent is already
- aware of hundreds of processes, and collects metrics from them automatically.
- But, if you want to change the grouping behavior, add an application that isn't yet supported in the Netdata Agent, or
- monitor a custom application, you need to edit the `apps_groups.conf` configuration file.
- Navigate to your [Netdata config directory](/docs/netdata-agent/configuration/README.md) and
- use `edit-config` to edit the file.
- ```bash
- cd /etc/netdata # Replace this with your Netdata config directory if not at /etc/netdata.
- sudo ./edit-config apps_groups.conf
- ```
- Inside the file are lists of process names, oftentimes using wildcards (`*`), that the Netdata Agent looks for and
- groups together. For example, the Netdata Agent looks for processes starting with `mysqld`, `mariad`, `postgres`, and
- others, and groups them into `sql`. That makes sense, since all these processes are for SQL databases.
- ```text
- sql: mysqld* mariad* postgres* postmaster* oracle_* ora_* sqlservr
- ```
- These groups are then reflected as [dimensions](/src/web/README.md#dimensions)
- within Netdata's charts.
- ![An example per-process CPU utilization chart in Netdata
- Cloud](https://user-images.githubusercontent.com/1153921/101369156-352e2100-3865-11eb-9f0d-b8fac162e034.png)
- See the following two sections for details based on your needs. If you don't need to configure `apps_groups.conf`, jump
- down to [visualizing process metrics](#visualize-process-metrics).
- ### Standard applications (web servers, databases, containers, and more)
- As explained above, the Netdata Agent is already aware of most standard applications you run on Linux nodes, and you
- shouldn't need to configure it to discover them.
- However, if you're using multiple applications that the Netdata Agent groups together you may want to separate them for
- more precise monitoring. If you're not running any other types of SQL databases on that node, you don't need to change
- the grouping, since you know that any MySQL is the only process contributing to the `sql` group.
- Let's say you're using both MySQL and PostgreSQL databases on a single node, and want to monitor their processes
- independently. Open the `apps_groups.conf` file as explained in
- the [section above](#configure-the-netdata-agent-to-recognize-a-specific-process) and scroll down until you find
- the `database servers` section. Create new groups for MySQL and PostgreSQL, and move their process queries into the
- unique groups.
- ```text
- # -----------------------------------------------------------------------------
- # database servers
- mysql: mysqld*
- postgres: postgres*
- sql: mariad* postmaster* oracle_* ora_* sqlservr
- ```
- Restart Netdata with `sudo systemctl restart netdata`, or
- the appropriate method for your system, to start collecting utilization metrics
- from your application. Time to [visualize your process metrics](#visualize-process-metrics).
- ### Custom applications
- Let's assume you have an application that runs on the process `custom-app`. To monitor eBPF metrics for that application
- separate from any others, you need to create a new group in `apps_groups.conf` and associate that process name with it.
- Open the `apps_groups.conf` file as explained in
- the [section above](#configure-the-netdata-agent-to-recognize-a-specific-process). Scroll down
- to `# NETDATA processes accounting`.
- Above that, paste in the following text, which creates a new `custom-app` group with the `custom-app` process. Replace
- `custom-app` with the name of your application's Linux process. `apps_groups.conf` should now look like this:
- ```text
- ...
- # -----------------------------------------------------------------------------
- # Custom applications to monitor with apps.plugin and ebpf.plugin
- custom-app: custom-app
- # -----------------------------------------------------------------------------
- # NETDATA processes accounting
- ...
- ```
- Restart Netdata with `sudo systemctl restart netdata`, or
- the appropriate method for your system, to start collecting utilization metrics
- from your application.
- ## Visualize process metrics
- Now that you're collecting metrics for your process, you'll want to visualize them using Netdata's real-time,
- interactive charts. Find these visualizations in the same section regardless of whether you
- use [Netdata Cloud](https://app.netdata.cloud) for infrastructure monitoring, or single-node monitoring with the local
- Agent's dashboard at `http://localhost:19999`.
- If you need a refresher on all the available per-process charts, see
- the [above list](#per-process-metrics-and-charts-in-netdata).
- ### Using Netdata's application collector (`apps.plugin`)
- `apps.plugin` puts all of its charts under the **Applications** section of any Netdata dashboard.
- ![Screenshot of the Applications section on a Netdata dashboard](https://user-images.githubusercontent.com/1153921/101401172-2ceadb80-388f-11eb-9e9a-88443894c272.png)
- Let's continue with the MySQL example. We can create a [test
- database](https://www.digitalocean.com/community/tutorials/how-to-measure-mysql-query-performance-with-mysqlslap) in
- MySQL to generate load on the `mysql` process.
- `apps.plugin` immediately collects and visualizes this activity `apps.cpu` chart, which shows an increase in CPU
- utilization from the `sql` group. There is a parallel increase in `apps.pwrites`, which visualizes writes to disk.
- ![Per-application CPU utilization metrics](https://user-images.githubusercontent.com/1153921/101409725-8527da80-389b-11eb-96e9-9f401535aafc.png)
- ![Per-application disk writing metrics](https://user-images.githubusercontent.com/1153921/101409728-85c07100-389b-11eb-83fd-d79dd1545b5a.png)
- Next, the `mysqlslap` utility queries the database to provide some benchmarking load on the MySQL database. It won't
- look exactly like a production database executing lots of user queries, but it gives you an idea into the possibility of
- these visualizations.
- ```bash
- sudo mysqlslap --user=sysadmin --password --host=localhost --concurrency=50 --iterations=10 --create-schema=employees --query="SELECT * FROM dept_emp;" --verbose
- ```
- The following per-process disk utilization charts show spikes under the `sql` group at the same time `mysqlslap` was run
- numerous times, with slightly different concurrency and query options.
- ![Per-application disk metrics](https://user-images.githubusercontent.com/1153921/101411810-d08fb800-389e-11eb-85b3-f3fa41f1f887.png)
- > 💡 Click on any dimension below a chart in Netdata Cloud (or to the right of a chart on a local Agent dashboard), to
- > visualize only that dimension. This can be particularly useful in process monitoring to separate one process'
- > utilization from the rest of the system.
- ### Using Netdata's eBPF collector (`ebpf.plugin`)
- Netdata's eBPF collector puts its charts in two places. Of most importance to process monitoring are the **ebpf file**,
- **ebpf syscall**, **ebpf process**, and **ebpf net** sub-sections under **Applications**, shown in the above screenshot.
- For example, running the above workload shows the entire "story" how MySQL interacts with the Linux kernel to open
- processes/threads to handle a large number of SQL queries, then subsequently close the tasks as each query returns the
- relevant data.
- ![Per-process eBPF charts](https://user-images.githubusercontent.com/1153921/101412395-c8844800-389f-11eb-86d2-20c8a0f7b3c0.png)
- `ebpf.plugin` visualizes additional eBPF metrics, which are system-wide and not per-process, under the **eBPF** section.
|