Netdata is a distributed, real-time health monitoring and performance troubleshooting toolkit for your systems and applications.
Because the monitoring agent is highly optimized, you can install it on all your physical systems, containers, IoT devices, and edge devices without disrupting their core function.
By default, and without configuration, Netdata delivers real-time insights into everything happening on the system, from CPU utilization to packet loss on every network device. Netdata can also auto-detect metrics from hundreds of your favorite services and applications, like MySQL/MariaDB, Docker, Nginx, Apache, MongoDB, and more.
All metrics update automatically and feed interactive dashboards that let you dive in, discover anomalies, and figure out the root cause of any issue.
Best of all, Netdata is entirely free, open-source software! Solo developers and enterprises with thousands of systems can both use it free of charge. We're hosted on GitHub.
Want to learn about the history of Netdata, what inspired our CEO to build it in the first place, and where we're headed? Read Costa's comprehensive blog post: Redefining monitoring with Netdata (and how it came to be).
In the first step of the Netdata guide, you'll learn about:
Let's get started!
Netdata has only been around for a few years, but it's a complex piece of software. Here are just some of the features we'll cover throughout this guide.
You might use Netdata because you care about the health and performance of your systems and applications, and because of all the awesome features we just mentioned. And it's free!
All these may be valid reasons, but let's step back and talk about Netdata's principles for health monitoring and performance troubleshooting. We have a lot of complementary systems, and we think there's a good reason why Netdata should always be your first choice when troubleshooting an anomaly.
We built Netdata on four principles.
Our first principle is per-second data collection for all metrics.
That matters because you can't monitor a 2-second service-level agreement (SLA) with 10-second metrics. You can't detect quick anomalies if your metrics don't show them.
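To make that concrete, here is a minimal sketch of what happens to a short anomaly when metrics are averaged over 10-second windows. The latency values, the 2,000 ms threshold, and the `downsample` helper are all hypothetical and chosen only for illustration; this is not how Netdata stores or processes data.

```python
# Hypothetical per-second latency samples (in ms) over 20 seconds.
# Seconds 7 and 8 contain a 2-second spike; every other second is healthy.
per_second = [100] * 20
per_second[7] = 2500
per_second[8] = 2500


def downsample(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples), window)
    ]


# The same data as a 10-second collector would report it.
ten_second = downsample(per_second, 10)

# Hypothetical alert threshold in ms.
threshold = 2000

# Indexes of the data points that cross the threshold at each resolution.
breaches_1s = [t for t, value in enumerate(per_second) if value > threshold]
breaches_10s = [t for t, value in enumerate(ten_second) if value > threshold]

print("10-second averages:", ten_second)
print("Breaches seen at 1-second resolution:", breaches_1s)
print("Breaches seen at 10-second resolution:", breaches_10s)
```

With these made-up numbers, the per-second series crosses the threshold twice, while the 10-second averages never come close to it, so the spike would go unnoticed.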
How do we solve this? By decentralizing monitoring. Each node is responsible for collecting metrics, triggering alarms, and building dashboards locally, and we work hard to ensure it does each step (and others) with remarkable efficiency. For example, Netdata can collect 100,000 metrics every second while using only 9% of a single server-grade CPU core!
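As a rough back-of-the-envelope check, the figures above imply a CPU budget of a little under one microsecond per metric. The short calculation below is only arithmetic on the two numbers quoted in this guide; the variable names are ours, and real-world cost will vary by collector and hardware.

```python
# Figures quoted in the guide.
metrics_per_second = 100_000
cpu_core_fraction = 0.09  # 9% of a single core

# CPU time consumed during each wall-clock second, in seconds.
cpu_seconds_per_second = cpu_core_fraction * 1.0

# Average CPU time spent per collected metric.
cpu_seconds_per_metric = cpu_seconds_per_second / metrics_per_second
cpu_microseconds_per_metric = cpu_seconds_per_metric * 1_000_000

print(f"CPU time per metric: {cpu_microseconds_per_metric:.2f} microseconds")
```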
By decentralizing monitoring and emphasizing speed at every turn, Netdata helps you scale your health monitoring and performance troubleshooting to infrastructure of any size. And you get to keep per-second metrics in long-term storage thanks to the database engine.
We believe all metrics are fundamentally important, and all metrics should be available to the user.
If you don't collect all the metrics a system creates, you're only seeing part of the story. It's like saying you've read a book after skipping all but the last ten pages. You only know the ending, not everything that leads to it.
Most monitoring solutions exist to poke you when there's a problem, and then tell you to use a dozen different console tools to find the root cause. Netdata prefers to give you every piece of information you might need to understand why an anomaly happened.
We want every piece of Netdata's dashboard not only to look good and update every second, but also to provide context as to what you're looking at and why it matters.
The principle of meaningful presentation is fundamental to our dashboard's user experience (UX). We could have put charts in a grid or hidden some behind tabs or buttons. We instead chose to stack them vertically, on a single page, so you can see how, for example, a jump in disk usage can also increase system load.
Here's an example of a system undergoing a disk stress test:
For the curious, here's the command:
stress-ng --fallocate 4 --fallocate-bytes 4g --timeout 1m --metrics --verify --times
Finally, Netdata should be usable from the moment you install it.
As we've talked about, and as you'll learn in the following nine steps, Netdata comes installed with:
By standardizing your monitoring infrastructure, Netdata tries to make at least one part of your administrative tasks easy!
We'll cover this quickly, as you're probably eager to get on with using Netdata itself.
We don't want to lock you into using Netdata by itself, forever. By supporting archiving to external databases like Graphite, Prometheus, OpenTSDB, MongoDB, and others, we let you use Netdata in conjunction with software that might seem like our competitors.
We don't want to "wage war" with another monitoring solution, whether it's commercial, open-source, or anything in between. We just want to give you all the metrics every second, and what you do with them next is your business, not ours. Our mission is helping people create more extraordinary infrastructures!
We think it's imperative you understand why we built Netdata the way we did. But now that we have that behind us, let's get right into that dashboard you've heard so much about.