log2journal and systemd-cat-native can be used to convert a structured log file, such as the ones generated by web servers, into systemd-journal entries.
By combining these tools with the usual UNIX shell tools, you can create advanced log processing pipelines that send any kind of structured text logs to systemd-journald. This is a simple but powerful and efficient way to handle log processing.
The process uses ordinary shell pipes to fetch and process the log files in real time.
The overall process looks like this:
tail -F /var/log/nginx/*.log | # outputs log lines
log2journal 'PATTERN' | # outputs Journal Export Format
systemd-cat-native # send to local/remote journald
Let's see the steps:
tail -F /var/log/nginx/*.log tails all *.log files in /var/log/nginx/. We use -F instead of -f to ensure that files will still be tailed after log rotation.
log2journal is a Netdata program. It reads log entries and extracts fields according to the PCRE2 pattern it is given. It can also apply some basic operations on the fields, like injecting new fields, duplicating existing ones, or rewriting their values. The output of log2journal is in Systemd Journal Export Format, and it looks like this:
KEY1=VALUE1 # << start of the first log line
KEY2=VALUE2
# << log lines separator
KEY1=VALUE1 # << start of the second log line
KEY2=VALUE2
systemd-cat-native is a Netdata program. It can send the logs to a local systemd-journald (journal namespaces supported), or to a remote systemd-journal-remote.
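To make the Journal Export Format described in the steps above more concrete, here is a minimal shell sketch that writes two entries by hand. The helper name emit_entry and the field values are made up for illustration; they are not part of log2journal or systemd-cat-native.

```shell
# emit_entry is a hypothetical helper: it prints each KEY=VALUE argument on
# its own line, then an empty line that marks the end of the journal entry.
emit_entry() {
  for field in "$@"; do
    printf '%s\n' "$field"
  done
  printf '\n'
}

# Two entries, separated by the empty line that emit_entry appends.
emit_entry 'MESSAGE=GET /index.html HTTP/1.1' 'PRIORITY=6'
emit_entry 'MESSAGE=GET /missing HTTP/1.1' 'PRIORITY=4'
```

Text in this shape is what systemd-cat-native expects on its standard input, so the output of such a helper could be piped to it in the same way as the output of log2journal.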
We have an nginx server logging in this format:
log_format access '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'$request_length $request_time '
'"$http_referer" "$http_user_agent"';
First, let's find the right pattern for log2journal. We ask ChatGPT:
My nginx log uses this log format:
log_format access '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'$request_length $request_time '
'"$http_referer" "$http_user_agent"';
I want to use `log2journal` to convert this log for systemd-journal.
`log2journal` accepts a PCRE2 regular expression, using the named groups
in the pattern as the journal fields to extract from the logs.
Prefix all PCRE2 group names with `NGINX_` and use capital characters only.
For the $request, use the field `MESSAGE` (without NGINX_ prefix), so that
it will appear in systemd journals as the message of the log.
Please give me the PCRE2 pattern.
ChatGPT replies with this:
^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
Let's test it with a sample line (instead of tail):
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>[^"]+)" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
As you can see, it extracted all the fields.
The MESSAGE, however, has 3 fields by itself: the method, the URL and the protocol version. Let's ask ChatGPT to extract these too:
I see that the MESSAGE has 3 key items in it. The request method (GET, POST,
etc), the URL and HTTP protocol version.
I want to keep the MESSAGE as it is, with all the information in it, but also
extract the 3 items from it as separate fields.
Can this be done?
ChatGPT responded with this:
^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"
Let's test this too:
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"'
MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< MESSAGE
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1 # <<<<<<<<< VERSION
NGINX_METHOD=GET # <<<<<<<<< METHOD
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html # <<<<<<<<< URL
Ideally, we would want the 5xx errors to be red in our journalctl output. To achieve that, we need to add a PRIORITY field to set the log level. Log priorities are numeric and follow the syslog priorities. Checking /usr/include/sys/syslog.h we can see these:
#define LOG_EMERG 0 /* system is unusable */
#define LOG_ALERT 1 /* action must be taken immediately */
#define LOG_CRIT 2 /* critical conditions */
#define LOG_ERR 3 /* error conditions */
#define LOG_WARNING 4 /* warning conditions */
#define LOG_NOTICE 5 /* normal but significant condition */
#define LOG_INFO 6 /* informational */
#define LOG_DEBUG 7 /* debug-level messages */
Avoid setting priority to 0 (LOG_EMERG), because such entries will be printed on your terminal (the journal uses wall to let you know of such events). A good priority for errors is 3 (red in journalctl), or 4 (yellow in journalctl).
To set the PRIORITY field in the output, we can use the NGINX_STATUS field. We need a copy of it, which we will alter later.
We can instruct log2journal to duplicate NGINX_STATUS, like this: log2journal --duplicate=PRIORITY=NGINX_STATUS. Let's try it:
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=PRIORITY=NGINX_STATUS
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
PRIORITY=200 # <<<<<<<<< PRIORITY IS HERE
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
Now that we have the PRIORITY field equal to the NGINX_STATUS, we can instruct log2journal to change it to a valid priority, by appending: --rewrite=PRIORITY=/^5/3 --rewrite=PRIORITY=/.*/6. These rewrite commands say to match everything that starts with 5 and replace it with priority 3 (error), and everything else with priority 6 (info). Only the first matching rewrite rule is applied per key, so the catch-all rule does not override the first one. Let's see it:
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=PRIORITY=NGINX_STATUS --rewrite=PRIORITY=/^5/3 --rewrite=PRIORITY=/.*/6
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
PRIORITY=6 # <<<<<<<<<< PRIORITY changed to 6
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
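The first-match-wins behaviour of the two rewrite rules can be mimicked in plain shell. This is only an illustration of the logic, not how log2journal is implemented, and the helper name status_to_priority is an assumption made for this sketch.

```shell
# status_to_priority is a hypothetical helper that mirrors the two rewrite
# rules: patterns are tried in order and the first one that matches wins.
status_to_priority() {
  case "$1" in
    5*) echo 3 ;;   # like --rewrite=PRIORITY=/^5/3  (error)
    *)  echo 6 ;;   # like --rewrite=PRIORITY=/.*/6  (info, catch-all)
  esac
}

# Show the mapping for a few sample status codes.
for status in 200 404 502; do
  printf '%s -> PRIORITY=%s\n' "$status" "$(status_to_priority "$status")"
done
```

As with the rewrite rules, the order matters: if the catch-all came first, every status would end up with priority 6.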
Similarly, we could duplicate NGINX_URL to NGINX_ENDPOINT and then process it with sed to remove any query string, or replace IDs in the URL path with constant names, thus giving us uniform endpoints independently of the parameters.
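A minimal sketch of such a sed step is shown below, assuming it sits between log2journal and systemd-cat-native in the pipeline and that an NGINX_ENDPOINT field has already been duplicated from NGINX_URL. The function name normalize_endpoint and the {id} placeholder are choices made for this example only.

```shell
# normalize_endpoint is a hypothetical filter: it rewrites only the lines that
# start with NGINX_ENDPOINT= and passes every other line through unchanged.
normalize_endpoint() {
  sed -E '/^NGINX_ENDPOINT=/ {
    # drop the query string, if any
    s/[?].*$//
    # replace purely numeric path segments with {id}, one per loop iteration
    :a
    s#/[0-9]+(/|$)#/{id}\1#
    ta
  }'
}

printf '%s\n' 'NGINX_URL=/api/users/123/orders?page=2' \
              'NGINX_ENDPOINT=/api/users/123/orders?page=2' | normalize_endpoint
```

With this filter, requests that differ only in their numeric IDs or query parameters share the same NGINX_ENDPOINT value, while NGINX_URL keeps the original request path.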
To complete the example, we can also inject a SYSLOG_IDENTIFIER with log2journal, using --inject=SYSLOG_IDENTIFIER=nginx-log, like this:
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=PRIORITY=NGINX_STATUS --inject=SYSLOG_IDENTIFIER=nginx-log --rewrite=PRIORITY=/^5/3 --rewrite=PRIORITY=/.*/6
MESSAGE=GET /index.html HTTP/1.1
NGINX_BODY_BYTES_SENT=4172
NGINX_HTTP_REFERER=-
NGINX_HTTP_USER_AGENT=Go-http-client/1.1
NGINX_HTTP_VERSION=1.1
NGINX_METHOD=GET
NGINX_REMOTE_ADDR=1.2.3.4
NGINX_REMOTE_USER=-
NGINX_REQUEST_LENGTH=104
NGINX_REQUEST_TIME=0.001
NGINX_STATUS=200
PRIORITY=6
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000
NGINX_URL=/index.html
SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< THIS HAS BEEN ADDED
Now the message is ready to be sent to a systemd-journal. For this we use systemd-cat-native. This command can send such messages to a journal running on the localhost, a local journal namespace, or a systemd-journal-remote running on another server. By just appending | systemd-cat-native to the command, the message will be sent to the local journal.
# echo '1.2.3.4 - - [19/Nov/2023:00:24:43 +0000] "GET /index.html HTTP/1.1" 200 4172 104 0.001 "-" "Go-http-client/1.1"' | log2journal '^(?<NGINX_REMOTE_ADDR>[^ ]+) - (?<NGINX_REMOTE_USER>[^ ]+) \[(?<NGINX_TIME_LOCAL>[^\]]+)\] "(?<MESSAGE>(?<NGINX_METHOD>[A-Z]+) (?<NGINX_URL>[^ ]+) HTTP/(?<NGINX_HTTP_VERSION>[^"]+))" (?<NGINX_STATUS>\d+) (?<NGINX_BODY_BYTES_SENT>\d+) (?<NGINX_REQUEST_LENGTH>\d+) (?<NGINX_REQUEST_TIME>[\d.]+) "(?<NGINX_HTTP_REFERER>[^"]*)" "(?<NGINX_HTTP_USER_AGENT>[^"]*)"' --duplicate=PRIORITY=NGINX_STATUS --inject=SYSLOG_IDENTIFIER=nginx-log --rewrite=PRIORITY=/^5/3 --rewrite=PRIORITY=/.*/6 | systemd-cat-native
# no output
# let's find the message
# journalctl -o verbose SYSLOG_IDENTIFIER=nginx-log
Sun 2023-11-19 04:34:06.583912 EET [s=1eb59e7934984104ab3b61f5d9648057;i=115b6d4;b=7282d89d2e6e4299969a6030302ff3e4;m=69b419673;t=60a783417ac72;x=2cec5dde8bf01ee7]
PRIORITY=6
_UID=0
_GID=0
_BOOT_ID=7282d89d2e6e4299969a6030302ff3e4
_MACHINE_ID=6b72c55db4f9411dbbb80b70537bf3a8
_HOSTNAME=costa-xps9500
_RUNTIME_SCOPE=system
_TRANSPORT=journal
_CAP_EFFECTIVE=1ffffffffff
_AUDIT_LOGINUID=1000
_AUDIT_SESSION=1
_SYSTEMD_CGROUP=/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
_SYSTEMD_OWNER_UID=1000
_SYSTEMD_UNIT=user@1000.service
_SYSTEMD_USER_UNIT=vte-spawn-59780d3d-a3ff-4a82-a6fe-8d17d2261106.scope
_SYSTEMD_SLICE=user-1000.slice
_SYSTEMD_USER_SLICE=app-org.gnome.Terminal.slice
_SYSTEMD_INVOCATION_ID=6195d8c4c6654481ac9a30e9a8622ba1
_COMM=systemd-cat-nat
MESSAGE=GET /index.html HTTP/1.1 # <<<<<<<<< CHECK
NGINX_BODY_BYTES_SENT=4172 # <<<<<<<<< CHECK
NGINX_HTTP_REFERER=- # <<<<<<<<< CHECK
NGINX_HTTP_USER_AGENT=Go-http-client/1.1 # <<<<<<<<< CHECK
NGINX_HTTP_VERSION=1.1 # <<<<<<<<< CHECK
NGINX_METHOD=GET # <<<<<<<<< CHECK
NGINX_REMOTE_ADDR=1.2.3.4 # <<<<<<<<< CHECK
NGINX_REMOTE_USER=- # <<<<<<<<< CHECK
NGINX_REQUEST_LENGTH=104 # <<<<<<<<< CHECK
NGINX_REQUEST_TIME=0.001 # <<<<<<<<< CHECK
NGINX_STATUS=200 # <<<<<<<<< CHECK
NGINX_TIME_LOCAL=19/Nov/2023:00:24:43 +0000 # <<<<<<<<< CHECK
NGINX_URL=/index.html # <<<<<<<<< CHECK
SYSLOG_IDENTIFIER=nginx-log # <<<<<<<<< CHECK
_PID=354312
_SOURCE_REALTIME_TIMESTAMP=1700361246583912
So, the log line, with all its fields parsed, ended up in systemd-journal.
The complete example would look like the following script.
Running this script with the parameter test will produce output on the terminal for you to inspect.
Unmatched log entries are added to the journal with PRIORITY=1 (LOG_ALERT), so that you can spot them.
We also used the --filename-key option of log2journal, which parses the filename when tail switches output between files, and adds the field NGINX_LOG_FILE with the filename each log line comes from.
Finally, the script also adds the field NGINX_STATUS_FAMILY, taking values 2xx, 3xx, etc., so that it is easy to find all the logs of a specific status family.
#!/usr/bin/env bash
test=0
last=0
send_or_show='./systemd-cat-native'
[ "${1}" = "test" ] && test=1 && last=100 && send_or_show=cat
pattern='(?x) # Enable PCRE2 extended mode
^
(?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
(?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
\[
(?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
\]
\s+ "
(?<MESSAGE> # MESSAGE
(?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
(?<NGINX_URL>[^ ]+) \s+ # NGINX_URL
HTTP/(?<NGINX_HTTP_VERSION>[^"]+) # NGINX_HTTP_VERSION
)
" \s+
(?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
(?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
"(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
"(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
'
tail -n $last -F /var/log/nginx/*access.log \
| log2journal "${pattern}" \
--filename-key=NGINX_LOG_FILE \
--duplicate=PRIORITY=NGINX_STATUS \
--duplicate=NGINX_STATUS_FAMILY=NGINX_STATUS \
--inject=SYSLOG_IDENTIFIER=nginx-log \
--unmatched-key=MESSAGE \
--inject-unmatched=PRIORITY=1 \
--rewrite='PRIORITY=/^5/3' \
--rewrite='PRIORITY=/.*/6' \
--rewrite='NGINX_STATUS_FAMILY=/^(?<first_digit>[0-9]).*$/${first_digit}xx' \
--rewrite='NGINX_STATUS_FAMILY=/^.*$/UNKNOWN' \
| $send_or_show
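The NGINX_STATUS_FAMILY rewrite in the script keeps the first digit of the status and appends xx, falling back to UNKNOWN when the value does not start with a digit. The same idea can be sketched in plain shell. The helper name status_family is hypothetical and the sketch is not part of log2journal.

```shell
# status_family is a hypothetical helper mirroring the two rewrite rules for
# NGINX_STATUS_FAMILY: first digit + "xx", or UNKNOWN as the catch-all.
status_family() {
  case "$1" in
    [0-9]*) printf '%sxx\n' "$(printf '%s' "$1" | cut -c1)" ;;
    *)      echo UNKNOWN ;;
  esac
}

# Show the family for a few sample values, including a non-numeric one.
for status in 200 301 404 503 -; do
  printf '%s -> %s\n' "$status" "$(status_family "$status")"
done
```

Grouping by family in this way makes it possible to select, for example, all server errors with a single field match instead of listing every 5xx code.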
log2journal options
Netdata log2journal v1.43.0-276-gfff8d1181
Convert structured log input to systemd Journal Export Format.
Using PCRE2 patterns, extract the fields from structured logs on the standard
input, and generate output according to systemd Journal Export Format.
Usage: ./log2journal [OPTIONS] PATTERN
Options:
--file /path/to/file.yaml
Read yaml configuration file for instructions.
--show-config
Show the configuration in yaml format before starting the job.
This is also an easy way to convert command line parameters to yaml.
--filename-key KEY
Add a field with KEY as the key and the current filename as value.
Automatically detects filenames when piped after 'tail -F',
and tail matches multiple filenames.
To inject the filename when tailing a single file, use --inject.
--unmatched-key KEY
Include unmatched log entries in the output with KEY as the field name.
Use this to include unmatched entries to the output stream.
Usually it should be set to --unmatched-key=MESSAGE so that the
unmatched entry will appear as the log message in the journals.
Use --inject-unmatched to inject additional fields to unmatched lines.
--duplicate TARGET=KEY1[,KEY2[,KEY3[,...]]
Create a new key called TARGET, duplicating the values of the keys
given. Useful for further processing. When multiple keys are given,
their values are separated by comma.
Up to 512 duplications can be given on the command line, and up to
20 keys per duplication command are allowed.
--inject LINE
Inject constant fields to the output (both matched and unmatched logs).
--inject entries are added to unmatched lines too, when their key is
not used in --inject-unmatched (--inject-unmatched override --inject).
Up to 512 fields can be injected.
--inject-unmatched LINE
Inject lines into the output for each unmatched log entry.
Usually, --inject-unmatched=PRIORITY=3 is needed to mark the unmatched
lines as errors, so that they can easily be spotted in the journals.
Up to 512 such lines can be injected.
--rewrite KEY=/SearchPattern/ReplacePattern
Apply a rewrite rule to the values of a specific key.
The first character after KEY= is the separator, which should also
be used between the search pattern and the replacement pattern.
The search pattern is a PCRE2 regular expression, and the replacement
pattern supports literals and named capture groups from the search pattern.
Example:
--rewrite DATE=/^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/
${day}/${month}/${year}
This will rewrite dates in the format YYYY-MM-DD to DD/MM/YYYY.
Only one rewrite rule is applied per key; the sequence of rewrites stops
for the key once a rule matches it. This allows providing a sequence of
independent rewriting rules for the same key, matching the different values
the key may get, and also provide a catch-all rewrite rule at the end of the
sequence for setting the key value if no other rule matched it.
The combination of duplicating keys with the values of multiple other keys
combined with multiple rewrite rules, allows creating complex rules for
rewriting key values.
Up to 512 rewriting rules are allowed.
-h, --help
Display this help and exit.
PATTERN
PATTERN should be a valid PCRE2 regular expression.
RE2 regular expressions (like the ones usually used in Go applications),
are usually valid PCRE2 patterns too.
Regular expressions without named groups are ignored.
The program accepts all parameters as both --option=value and --option value.
The maximum line length accepted is 1048576 characters.
The maximum number of fields in the PCRE2 pattern is 1024.
PIPELINE AND SEQUENCE OF PROCESSING
This is a simple diagram of the pipeline taking place:
+---------------------------------------------------+
|                       INPUT                       |
+---------------------------------------------------+
                 v                          v
+---------------------------------+         |
|    EXTRACT FIELDS AND VALUES    |         |
+---------------------------------+         |
        v                  v                |
+---------------+          |                |
|   DUPLICATE   |          |                |
| create fields |          |                |
|  with values  |          |                |
+---------------+          |                |
        v                  v                v
+---------------------------------+  +--------------+
|        REWRITE PIPELINES        |  |    INJECT    |
|       altering the values       |  |  constants   |
+---------------------------------+  +--------------+
                 v                          v
+---------------------------------------------------+
|                      OUTPUT                       |
+---------------------------------------------------+
JOURNAL FIELDS RULES (enforced by systemd-journald)
- field names can be up to 64 characters
- the only allowed field characters are A-Z, 0-9 and underscore
- the first character of fields cannot be a digit
- protected journal fields start with underscore:
* they are accepted by systemd-journal-remote
* they are NOT accepted by a local systemd-journald
For best results, always include these fields:
MESSAGE=TEXT
The MESSAGE is the body of the log entry.
This field is what we usually see in our logs.
PRIORITY=NUMBER
PRIORITY sets the severity of the log entry.
0=emerg, 1=alert, 2=crit, 3=err, 4=warn, 5=notice, 6=info, 7=debug
- Emergency events (0) are usually broadcast to all terminals.
- Emergency, alert, critical, and error (0-3) are usually colored red.
- Warning (4) entries are usually colored yellow.
- Notice (5) entries are usually bold or have a brighter white color.
- Info (6) entries are the default.
- Debug (7) entries are usually grayed or dimmed.
SYSLOG_IDENTIFIER=NAME
SYSLOG_IDENTIFIER sets the name of application.
Use something descriptive, like: SYSLOG_IDENTIFIER=nginx-logs
You can find the most common fields at 'man systemd.journal-fields'.
Example YAML file:
--------------------------------------------------------------------------------
# Netdata log2journal Configuration Template
# The following parses nginx log files using the combined format.
# The PCRE2 pattern to match log entries and give names to the fields.
# The journal will have these names, so follow their rules. You can
# initiate an extended PCRE2 pattern by starting the pattern with (?x)
pattern: |
  (?x) # Enable PCRE2 extended mode
  ^
  (?<NGINX_REMOTE_ADDR>[^ ]+) \s - \s # NGINX_REMOTE_ADDR
  (?<NGINX_REMOTE_USER>[^ ]+) \s # NGINX_REMOTE_USER
  \[
    (?<NGINX_TIME_LOCAL>[^\]]+) # NGINX_TIME_LOCAL
  \]
  \s+ "
  (?<MESSAGE>
    (?<NGINX_METHOD>[A-Z]+) \s+ # NGINX_METHOD
    (?<NGINX_URL>[^ ]+) \s+
    HTTP/(?<NGINX_HTTP_VERSION>[^"]+)
  )
  " \s+
  (?<NGINX_STATUS>\d+) \s+ # NGINX_STATUS
  (?<NGINX_BODY_BYTES_SENT>\d+) \s+ # NGINX_BODY_BYTES_SENT
  "(?<NGINX_HTTP_REFERER>[^"]*)" \s+ # NGINX_HTTP_REFERER
  "(?<NGINX_HTTP_USER_AGENT>[^"]*)" # NGINX_HTTP_USER_AGENT
# When log2journal can detect the filename of each log entry (tail gives it
# only when it tails multiple files), this key will be used to send the
# filename to the journals.
filename:
  key: NGINX_LOG_FILENAME
# Duplicate fields under a different name. You can duplicate multiple fields
# to a new one and then use rewrite rules to change its value.
duplicate:
  # we insert the field PRIORITY as a copy of NGINX_STATUS.
  - key: PRIORITY
    values_of:
      - NGINX_STATUS
  # we inject the field NGINX_STATUS_FAMILY as a copy of NGINX_STATUS.
  - key: NGINX_STATUS_FAMILY
    values_of:
      - NGINX_STATUS
# Inject constant fields into the journal logs.
inject:
  - key: SYSLOG_IDENTIFIER
    value: "nginx-log"
# Rewrite the value of fields (including the duplicated ones).
# The search pattern can have named groups, and the replace pattern can use
# them as ${name}.
rewrite:
  # PRIORITY is a duplicate of NGINX_STATUS
  # Valid PRIORITIES: 0=emerg, 1=alert, 2=crit, 3=error, 4=warn, 5=notice, 6=info, 7=debug
  - key: "PRIORITY"
    search: "^[123]"
    replace: 6
  - key: "PRIORITY"
    search: "^4"
    replace: 5
  - key: "PRIORITY"
    search: "^5"
    replace: 3
  - key: "PRIORITY"
    search: ".*"
    replace: 4
  # NGINX_STATUS_FAMILY is a duplicate of NGINX_STATUS
  - key: "NGINX_STATUS_FAMILY"
    search: "^(?<first_digit>[1-5])"
    replace: "${first_digit}xx"
  - key: "NGINX_STATUS_FAMILY"
    search: ".*"
    replace: "UNKNOWN"
# Control what to do when input logs do not match the main PCRE2 pattern.
unmatched:
  # The journal key to log the PCRE2 error message to.
  # Set this to MESSAGE, so you can see the error in the log.
  key: MESSAGE
  # Inject static fields to the unmatched entries.
  # Set PRIORITY=1 (alert) to help you spot unmatched entries in the logs.
  inject:
    - key: PRIORITY
      value: 1
--------------------------------------------------------------------------------
systemd-cat-native options
Netdata systemd-cat-native v1.40.0-1214-gae733dd49
This program reads from its standard input, lines in the format:
KEY1=VALUE1\n
KEY2=VALUE2\n
KEYN=VALUEN\n
\n
and sends them to systemd-journal.
- Binary journal fields are not accepted at its input
- Binary journal fields can be generated after newline processing
- Messages have to be separated by an empty line
- Keys starting with underscore are not accepted (by journald)
- Other rules imposed by systemd-journald are imposed (by journald)
Usage:
systemd-cat-native
[--newline=STRING]
[--log-as-netdata|-N]
[--namespace=NAMESPACE] [--socket=PATH]
[--url=URL [--key=FILENAME] [--cert=FILENAME] [--trust=FILENAME|all]]
The program has the following modes of logging:
* Log to a local systemd-journald or stderr
This is the default mode. If systemd-journald is available, logs will be
sent to systemd, otherwise logs will be printed on stderr, using logfmt
formatting. Options --socket and --namespace are available to configure
the journal destination:
--socket=PATH
The path of a systemd-journald UNIX socket.
The program will use the default systemd-journald socket when this
option is not used.
--namespace=NAMESPACE
The name of a configured and running systemd-journald namespace.
The program will produce the socket path based on its internal
defaults, to send the messages to the systemd journal namespace.
* Log as Netdata, enabled with --log-as-netdata or -N
In this mode the program uses environment variables set by Netdata for
the log destination. Only log fields defined by Netdata are accepted.
If the environment variables expected by Netdata are not found, it
falls back to stderr logging in logfmt format.
* Log to a systemd-journal-remote TCP socket, enabled with --url=URL
In this mode, the program will directly send logs to a remote systemd
journal (systemd-journal-remote expected at the destination).
This mode is available even when the local system does not support
systemd, or is not even Linux, allowing a remote Linux systemd
journald to become the logs database of the local system.
Unfortunately systemd-journal-remote does not accept compressed
data over the network, so the stream will be uncompressed.
--url=URL
The destination systemd-journal-remote address and port, similarly
to what /etc/systemd/journal-upload.conf accepts.
Usually it is in the form: https://ip.address:19532
Both http and https URLs are accepted. When using https, the
following additional options are accepted:
--key=FILENAME
The filename of the private key of the server.
The default is: /etc/ssl/private/journal-upload.pem
--cert=FILENAME
The filename of the public key of the server.
The default is: /etc/ssl/certs/journal-upload.pem
--trust=FILENAME | all
The filename of the trusted CA public key.
The default is: /etc/ssl/ca/trusted.pem
The keyword 'all' can be used to trust all CAs.
--keep-trying
Keep trying to send the message, if the remote journal is not there.
NEWLINES PROCESSING
systemd-journal logs entries may have newlines in them. However the
Journal Export Format uses binary formatted data to achieve this,
making it hard for text processing.
To overcome this limitation, this program allows single-line text
formatted values at its input, to be binary formatted multi-line Journal
Export Format at its output.
To achieve that, it allows replacing a given string with a newline.
The parameter --newline=STRING allows setting the string to be replaced
with newlines.
For example by setting --newline='{NEWLINE}', the program will replace
all occurrences of {NEWLINE} with the newline character, within each
VALUE of the KEY=VALUE lines. Once this is done, the program will
switch the field to the binary Journal Export Format before sending the
log event to systemd-journal.
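The text substitution described under NEWLINES PROCESSING can be illustrated with plain sed. This sketch only shows the marker being replaced with a newline; it does not produce the binary Journal Export Format framing that systemd-cat-native applies afterwards. The helper name expand_newlines is hypothetical, and {NEWLINE} is the marker used in the help text above.

```shell
# expand_newlines is a hypothetical helper: it replaces every {NEWLINE} marker
# on its input with a real newline character (POSIX sed syntax).
expand_newlines() {
  sed 's/{NEWLINE}/\
/g'
}

# A single-line value carrying one marker becomes a two-line value.
printf '%s\n' 'MESSAGE=first line{NEWLINE}second line' | expand_newlines
```

In a real pipeline this step is not needed: according to the help text, passing --newline='{NEWLINE}' to systemd-cat-native makes the program perform the replacement itself and then send the field in binary form.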