<!--
title: Monitor any process in real-time with Netdata
sidebar_label: Monitor any process in real-time with Netdata
description: "Tap into Netdata's powerful collectors, with per-second utilization metrics for every process, to troubleshoot faster and make data-informed decisions."
image: /img/seo/guides/monitor/process.png
custom_edit_url: https://github.com/netdata/netdata/edit/master/docs/guides/monitor/process.md
learn_status: "Published"
learn_rel_path: "Operations"
-->

# Monitor any process in real-time with Netdata

Netdata is more than a multitude of generic system-level metrics and visualizations. Instead of providing only a
bird's-eye view of your system, leaving you to wonder exactly _what_ is taking up 99% CPU, Netdata also gives you
visibility into _every layer_ of your node. These additional layers give you context, and meaningful insights, into the
true health and performance of your infrastructure.

One of these layers is the _process_. Every time a Linux system runs a program, it creates an independent process that
executes the program's instructions in parallel with anything else happening on the system. Linux systems expose the
state and resource utilization of processes through the [`/proc` filesystem](https://en.wikipedia.org/wiki/Procfs), and
Netdata is designed to hook into those metrics to create meaningful visualizations out of the box.
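
To get a feel for the raw data involved, here is a minimal sketch that pulls a single field out of a
`/proc/<pid>/status` file. The helper name `status_field` is hypothetical, and this is not how Netdata's collectors are
implemented; it only illustrates the kind of per-process information the kernel exposes.

```shell
#!/bin/sh
# status_field FILE KEY
# Print the value of KEY from a /proc/<pid>/status-style file, which holds
# one "Key:<whitespace>value" pair per line. Hypothetical helper for illustration.
status_field() {
    awk -v key="$2" -F':[ \t]*' '$1 == key { print $2; exit }' "$1"
}

# Example (Linux only): name and resident memory of the current shell.
# status_field "/proc/$$/status" Name
# status_field "/proc/$$/status" VmRSS
```

Reading these files by hand for every process, every second, is exactly the repetitive work a collector takes off
your hands.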

While there are a lot of existing command-line tools for tracking processes on Linux systems, such as `ps` or `top`,
only Netdata provides dozens of real-time charts, at both per-second and event frequency, without you having to write
SQL queries or know a bunch of arbitrary command-line flags.

With Netdata's process monitoring, you can:

- Benchmark/optimize performance of standard applications, like web servers or databases
- Benchmark/optimize performance of custom applications
- Troubleshoot CPU/memory/disk utilization issues (why is my system's CPU spiking right now?)
- Perform granular capacity planning based on the specific needs of your infrastructure
- Search for leaking file descriptors
- Investigate zombie processes

... and much more. Let's get started.

## Prerequisites

- One or more Linux nodes running [Netdata](/packaging/installer/README.md)
- A general understanding of how to [configure the Netdata Agent](/docs/netdata-agent/configuration/README.md)
  using `edit-config`.
- A Netdata Cloud account. [Sign up](https://app.netdata.cloud) if you don't have one already.

## How does Netdata do process monitoring?

The Netdata Agent already knows to look for hundreds of
[standard applications that we support via collectors](/src/collectors/COLLECTORS.md), and groups them based on their
purpose. Let's say you want to monitor a MySQL database using its process. The Netdata Agent already knows to look for
processes with the string `mysqld` in their name, along with a few others, and puts them into the `sql` group. This
`sql` group then becomes a dimension in all process-specific charts.

The process and group settings are used by two unique and powerful collectors.

[**`apps.plugin`**](/src/collectors/apps.plugin/README.md) looks at the Linux process tree every second, much like `top`
or `ps fax`, and collects resource utilization information on every running process. It then automatically adds a layer
of meaningful visualization on top of these metrics, and creates per-process/application charts.

[**`ebpf.plugin`**](/src/collectors/ebpf.plugin/README.md): Netdata's extended Berkeley Packet Filter (eBPF) collector
monitors Linux kernel-level metrics for file descriptors, virtual filesystem IO, and process management, and then hands
process-specific metrics over to `apps.plugin` for visualization. The eBPF collector also collects and visualizes
metrics on an _event frequency_, which means it captures every kernel interaction, and not just the volume of
interaction at every second in time. That's even more precise than Netdata's standard per-second granularity.

### Per-process metrics and charts in Netdata

With these collectors working in parallel, Netdata visualizes the following per-second metrics for _any_ process on your
Linux systems:

- CPU utilization (`apps.cpu`)
    - Total CPU usage
    - User/system CPU usage (`apps.cpu_user`/`apps.cpu_system`)
- Disk I/O
    - Physical reads/writes (`apps.preads`/`apps.pwrites`)
    - Logical reads/writes (`apps.lreads`/`apps.lwrites`)
    - Open unique files (if a file is found open multiple times, it is counted just once, `apps.files`)
- Memory
    - Real Memory Used (non-shared, `apps.mem`)
    - Virtual Memory Allocated (`apps.vmem`)
    - Minor page faults (i.e. memory activity, `apps.minor_faults`)
- Processes
    - Threads running (`apps.threads`)
    - Processes running (`apps.processes`)
    - Carried over uptime (since the last Netdata Agent restart, `apps.uptime`)
    - Minimum uptime (`apps.uptime_min`)
    - Average uptime (`apps.uptime_average`)
    - Maximum uptime (`apps.uptime_max`)
    - Pipes open (`apps.pipes`)
- Swap memory
    - Swap memory used (`apps.swap`)
    - Major page faults (i.e. swap activity, `apps.major_faults`)
- Network
    - Sockets open (`apps.sockets`)
- eBPF file
    - Number of calls to open files. (`apps.file_open`)
    - Number of files closed. (`apps.file_closed`)
    - Number of calls to open files that returned errors.
    - Number of calls to close files that returned errors.
- eBPF syscall
    - Number of calls to delete files. (`apps.file_deleted`)
    - Number of calls to `vfs_write`. (`apps.vfs_write_call`)
    - Number of calls to `vfs_read`. (`apps.vfs_read_call`)
    - Number of bytes written with `vfs_write`. (`apps.vfs_write_bytes`)
    - Number of bytes read with `vfs_read`. (`apps.vfs_read_bytes`)
    - Number of calls to write a file that returned errors.
    - Number of calls to read a file that returned errors.
- eBPF process
    - Number of processes created with `do_fork`. (`apps.process_create`)
    - Number of threads created with `do_fork` or `__x86_64_sys_clone`, depending on your system's kernel
      version. (`apps.thread_create`)
    - Number of times that a process called `do_exit`. (`apps.task_close`)
- eBPF net
    - Number of bytes sent. (`apps.bandwidth_sent`)
    - Number of bytes received. (`apps.bandwidth_recv`)

As an example, here's the per-process CPU utilization chart, including a `sql` group/dimension.

![A per-process CPU utilization chart in Netdata Cloud](https://user-images.githubusercontent.com/1153921/101217226-3a5d5700-363e-11eb-8610-aa1640aefb5d.png)
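
The same chart data can also be pulled as text from the Agent's `/api/v1/data` endpoint. The following is a sketch
under a few assumptions: the Agent listens on `http://localhost:19999`, the chart ID `apps.cpu` from the list above
exists on your Agent version, and the helper name `chart_data_url` is made up for this example.

```shell
#!/bin/sh
# chart_data_url CHART [SECONDS] [HOST]
# Build a URL for the Agent's /api/v1/data endpoint that returns the last
# SECONDS seconds of CHART as CSV. Defaults are assumptions for a local Agent.
chart_data_url() {
    chart=$1
    seconds=${2:-60}
    host=${3:-http://localhost:19999}
    printf '%s/api/v1/data?chart=%s&after=-%s&format=csv\n' "$host" "$chart" "$seconds"
}

# Example: fetch the last minute of per-group CPU utilization.
# curl -s "$(chart_data_url apps.cpu)"
```

Each column in the CSV output should correspond to one group, such as `sql`. If the request returns an error, check
which chart IDs your Agent actually exposes, since they can differ between versions.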

## Configure the Netdata Agent to recognize a specific process

To monitor any process, you need to make sure the Netdata Agent is aware of it. As mentioned above, the Agent is already
aware of hundreds of processes, and collects metrics from them automatically.

But, if you want to change the grouping behavior, add an application that isn't yet supported in the Netdata Agent, or
monitor a custom application, you need to edit the `apps_groups.conf` configuration file.

Navigate to your [Netdata config directory](/docs/netdata-agent/configuration/README.md) and use `edit-config` to edit
the file.

```bash
cd /etc/netdata   # Replace this with your Netdata config directory if not at /etc/netdata.
sudo ./edit-config apps_groups.conf
```

Inside the file are lists of process names, oftentimes using wildcards (`*`), that the Netdata Agent looks for and
groups together. For example, the Netdata Agent looks for processes starting with `mysqld`, `mariad`, `postgres`, and
others, and groups them into `sql`. That makes sense, since all these processes are for SQL databases.

```text
sql: mysqld* mariad* postgres* postmaster* oracle_* ora_* sqlservr
```

These groups are then reflected as [dimensions](/src/web/README.md#dimensions) within Netdata's charts.

![An example per-process CPU utilization chart in Netdata Cloud](https://user-images.githubusercontent.com/1153921/101369156-352e2100-3865-11eb-9f0d-b8fac162e034.png)
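
To see how a trailing wildcard behaves, the sketch below tests a process name against the `sql` patterns using the
shell's own glob matching. This is a simplified approximation for illustration, not `apps.plugin`'s actual matching
code, and the function name `matches_group` is hypothetical.

```shell
#!/bin/sh
# matches_group NAME PATTERNS
# Succeed if NAME matches any of the space-separated PATTERNS, treating `*`
# as a shell glob. A simplified approximation, not apps.plugin's own matcher.
matches_group() (
    set -f                      # keep the shell from expanding `*` against files
    for pattern in $2; do
        case $1 in
            $pattern) exit 0 ;;
        esac
    done
    exit 1
)

sql_patterns='mysqld* mariad* postgres* postmaster* oracle_* ora_* sqlservr'
# matches_group mysqld_safe "$sql_patterns"   # succeeds: matches mysqld*
# matches_group nginx "$sql_patterns"         # fails: no pattern matches
```

Note that `sqlservr` has no wildcard, so in this sketch only that exact name matches it.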

See the following two sections for details based on your needs. If you don't need to configure `apps_groups.conf`, jump
down to [visualizing process metrics](#visualize-process-metrics).

### Standard applications (web servers, databases, containers, and more)

As explained above, the Netdata Agent is already aware of most standard applications you run on Linux nodes, and you
shouldn't need to configure it to discover them.

However, if you're using multiple applications that the Netdata Agent groups together, you may want to separate them
for more precise monitoring. If MySQL is the only SQL database running on a node, you don't need to change the grouping,
since you know that MySQL is the only process contributing to the `sql` group.

Let's say you're using both MySQL and PostgreSQL databases on a single node, and want to monitor their processes
independently. Open the `apps_groups.conf` file as explained in
the [section above](#configure-the-netdata-agent-to-recognize-a-specific-process) and scroll down until you find
the `database servers` section. Create new groups for MySQL and PostgreSQL, and move their process patterns into the
new groups.

```text
# -----------------------------------------------------------------------------
# database servers

mysql: mysqld*
postgres: postgres*
sql: mariad* postmaster* oracle_* ora_* sqlservr
```

Restart Netdata with `sudo systemctl restart netdata`, or the appropriate method for your system, to start collecting
utilization metrics from your application. Time to [visualize your process metrics](#visualize-process-metrics).
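
Before restarting, it can help to check which group a given process name would land in. The sketch below reads
`group: pattern ...` lines and prints the first group with a matching pattern. It assumes first-match-wins ordering and
plain glob matching, which simplifies what `apps.plugin` does, and `group_for` is a hypothetical helper name.

```shell
#!/bin/sh
# group_for CONF NAME
# Print the first group in CONF (lines of "group: pattern ...") with a pattern
# matching NAME. First-match-wins and plain glob matching are assumptions here.
group_for() (
    set -f                                  # do not expand `*` against files
    while IFS= read -r line; do
        case $line in
            '#'*|'') continue ;;            # skip comments and blank lines
            *:*) ;;                         # a "group: patterns" line
            *) continue ;;                  # ignore anything else
        esac
        group=${line%%:*}
        for pattern in ${line#*:}; do
            case $2 in
                $pattern) printf '%s\n' "$group"; exit 0 ;;
            esac
        done
    done < "$1"
    exit 1
)

# Example, against a file holding the three group lines above:
# group_for /tmp/groups.conf postgres    # prints "postgres"
# group_for /tmp/groups.conf mariadbd    # prints "sql"
```

Because the first matching line wins in this sketch, a specific group such as `mysql` needs to appear before any
broader group whose patterns would also match.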

### Custom applications

Let's assume you have an application that runs as the process `custom-app`. To monitor metrics for that application
separately from any others, you need to create a new group in `apps_groups.conf` and associate that process name with
it.

Open the `apps_groups.conf` file as explained in
the [section above](#configure-the-netdata-agent-to-recognize-a-specific-process). Scroll down
to `# NETDATA processes accounting`.

Above that, paste in the following text, which creates a new `custom-app` group with the `custom-app` process. Replace
`custom-app` with the name of your application's Linux process. `apps_groups.conf` should now look like this:

```text
...
# -----------------------------------------------------------------------------
# Custom applications to monitor with apps.plugin and ebpf.plugin

custom-app: custom-app

# -----------------------------------------------------------------------------
# NETDATA processes accounting
...
```

Restart Netdata with `sudo systemctl restart netdata`, or the appropriate method for your system, to start collecting
utilization metrics from your application.
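
If the new group stays empty, one thing to check is the exact process name the system reports. The sketch below
filters a list of process names for an exact match; `has_process` is a hypothetical helper, and feeding it from
`ps -e -o comm=` is one assumed way of listing names.

```shell
#!/bin/sh
# has_process NAME
# Read process names from stdin, one per line, and succeed if NAME is among
# them (exact, whole-line match). Hypothetical helper for illustration.
has_process() {
    grep -qxF -- "$1"
}

# Example: check the running system for the `custom-app` process name.
# ps -e -o comm= | has_process custom-app && echo "custom-app is running"
```

On Linux, the kernel truncates this name to 15 characters, so a long binary name may appear shortened in the output.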

## Visualize process metrics

Now that you're collecting metrics for your process, you'll want to visualize them using Netdata's real-time,
interactive charts. Find these visualizations in the same section regardless of whether you
use [Netdata Cloud](https://app.netdata.cloud) for infrastructure monitoring, or single-node monitoring with the local
Agent's dashboard at `http://localhost:19999`.

If you need a refresher on all the available per-process charts, see
the [above list](#per-process-metrics-and-charts-in-netdata).

### Using Netdata's application collector (`apps.plugin`)

`apps.plugin` puts all of its charts under the **Applications** section of any Netdata dashboard.

![Screenshot of the Applications section on a Netdata dashboard](https://user-images.githubusercontent.com/1153921/101401172-2ceadb80-388f-11eb-9e9a-88443894c272.png)

Let's continue with the MySQL example. We can create a [test
database](https://www.digitalocean.com/community/tutorials/how-to-measure-mysql-query-performance-with-mysqlslap) in
MySQL to generate load on the `mysqld` process.

`apps.plugin` immediately collects and visualizes this activity in the `apps.cpu` chart, which shows an increase in CPU
utilization from the `sql` group. There is a parallel increase in `apps.pwrites`, which visualizes writes to disk.

![Per-application CPU utilization metrics](https://user-images.githubusercontent.com/1153921/101409725-8527da80-389b-11eb-96e9-9f401535aafc.png)

![Per-application disk writing metrics](https://user-images.githubusercontent.com/1153921/101409728-85c07100-389b-11eb-83fd-d79dd1545b5a.png)

Next, the `mysqlslap` utility queries the database to put some benchmarking load on it. It won't look exactly like a
production database executing lots of user queries, but it gives you an idea of what these visualizations can show.

```bash
sudo mysqlslap --user=sysadmin --password --host=localhost --concurrency=50 --iterations=10 --create-schema=employees --query="SELECT * FROM dept_emp;" --verbose
```

The following per-process disk utilization charts show spikes under the `sql` group each time `mysqlslap` was run.
It was run numerous times, with slightly different concurrency and query options.

![Per-application disk metrics](https://user-images.githubusercontent.com/1153921/101411810-d08fb800-389e-11eb-85b3-f3fa41f1f887.png)
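
Repeated runs like these could be scripted. As a rough sketch, the helper below prints one `mysqlslap` command per
concurrency level, reusing the flags from the command above. The function name `slap_commands` and the example levels
are illustrative assumptions, and nothing is executed until you run the printed commands yourself.

```shell
#!/bin/sh
# slap_commands LEVEL...
# Print one mysqlslap command per concurrency LEVEL, reusing the flags from
# the command above. Nothing is executed; the levels are illustrative.
slap_commands() {
    for level in "$@"; do
        printf 'sudo mysqlslap --user=sysadmin --password --host=localhost --concurrency=%s --iterations=10 --create-schema=employees --query="SELECT * FROM dept_emp;" --verbose\n' "$level"
    done
}

# Example: review the commands for three concurrency levels before running them.
# slap_commands 10 50 100
```

Printing the commands first makes it easy to review them before putting load on a database.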

> 💡 Click on any dimension below a chart in Netdata Cloud (or to the right of a chart on a local Agent dashboard) to
> visualize only that dimension. This can be particularly useful in process monitoring to separate one process's
> utilization from the rest of the system.

### Using Netdata's eBPF collector (`ebpf.plugin`)

Netdata's eBPF collector puts its charts in two places. Of most importance to process monitoring are the **ebpf file**,
**ebpf syscall**, **ebpf process**, and **ebpf net** sub-sections under **Applications**, shown in the above screenshot.

For example, running the above workload shows the entire "story" of how MySQL interacts with the Linux kernel to open
processes/threads to handle a large number of SQL queries, then subsequently close the tasks as each query returns the
relevant data.

![Per-process eBPF charts](https://user-images.githubusercontent.com/1153921/101412395-c8844800-389f-11eb-86d2-20c8a0f7b3c0.png)

`ebpf.plugin` visualizes additional eBPF metrics, which are system-wide and not per-process, under the **eBPF** section.