Fluentd multiline log events missing fields - kubernetes

I have log lines in Kubernetes:
2022-06-12T19:55:19.014511382Z stdout F dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101]
2022-06-12T19:55:19.014515391Z stdout F Ending HttpMessageHandler cleanup cycle after 0.0016ms - processed: 0 items - remaining: 1 items
2022-06-12T19:55:29.010438412Z stdout F dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100]
2022-06-12T19:55:29.010512909Z stdout F Starting HttpMessageHandler cleanup cycle with 1 items
2022-06-12T19:55:29.010518914Z stdout F dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101]
2022-06-12T19:55:29.010532801Z stdout F Ending HttpMessageHandler cleanup cycle after 0.002ms - processed: 0 items - remaining: 1 items
I'm trying to parse these lines using the multiline plugin. I have the following config:
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*namespace*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  read_from_head true
  <parse>
    @type multiline
    format_firstline /^(?<logtime>[^\]]*) \w+ \w (?<level>\w+): (?<classpath>(\w+\.?)+\[\d+\])/
    format1 /^[^\]]* \w+ \w \s+ (?<message>.+)/
  </parse>
</source>
<filter kubernetes.var.log.containers.**.log>
  @type kubernetes_metadata
</filter>
<match kubernetes.var.log.containers.**namespace**.log>
  @type loggly
  loggly_url "https://logs-01.loggly.com/inputs/#{ENV['TOKEN']}/"
</match>
I can see the logs coming into Loggly, but they're missing all the fields I'm adding in the multiline parser's config. I have debug logging enabled, and I can see that the loggly plugin is receiving the records without the fields before sending them.
I tested the regular expressions against the lines using fluentular, and they seem to work individually. Is there something missing from my config that could be causing the fields to be dropped?

Related

Fluentd tail reads every time from the start of the file

I am trying to forward syslogs. I start td-agent with the read_from_head parameter set to false, and on checking the logs nothing happens at first. I then added a line to the target file via VS Code, so there was no issue with the inode changing.
The issue I face is that upon any addition of log lines to the target file, the pos_file gets updated and td-agent reads from the top; after adding another line, it collects from the top again.
How does fluentd's in_tail work?
<source>
  @type tail
  format none
  path /home/user/Documents/debugg/demo.log
  pos_file /var/log/td-agent/demo.log.pos
  read_from_head false
  follow_inodes true
  tag syslog
</source>

containerd multiline logs parsing with fluentbit

After shifting from Docker to containerd as the container runtime used by our Kubernetes clusters, we are not able to display multiline logs properly in our visualization app (Grafana), because containerd prepends some details to the container/pod logs (timestamp, stream, and log severity). Specifically, it prepends something like the following, as shown in the sample below: 2022-07-25T06:43:17.20958947Z stdout F. This causes confusion for the developers and the application owners.
I am showing here a dummy sample of the logs generated by the application, and how they get printed on the Kubernetes nodes after containerd prepends the mentioned details.
The following logs were generated by the application (kubectl logs ):
2022-07-25T06:43:17,309ESC[0;39m dummy-[txtThreadPool-2] ESC[39mDEBUGESC[0;39m
  ESC[36mcom.pkg.sample.ComponentESC[0;39m - Process message meta {
  timestamp: 1658731397308720468
  version {
      major: 1
      minor: 0
      patch: 0
  }
}
When I check the logs in the filesystem (/var/log/container/ABCXYZ.log):
2022-07-25T06:43:17.20958947Z stdout F 2022-07-25T06:43:17,309ESC[0;39m dummy-[txtThreadPool-2]
ESC[39mDEBUGESC[0;39m
ESC[36mcom.pkg.sample.ComponentESC[0;39m - Process message meta {
2022-07-25T06:43:17.20958947Z stdout F timestamp: 1658731449723010774
2022-07-25T06:43:17.209593379Z stdout F version {
2022-07-25T06:43:17.209595933Z stdout F major: 14
2022-07-25T06:43:17.209598466Z stdout F minor: 0
2022-07-25T06:43:17.209600712Z stdout F patch: 0
2022-07-25T06:43:17.209602926Z stdout F }
2022-07-25T06:43:17.209605099Z stdout F }
I am able to parse the multiline logs with fluentbit, but the problem is that I am not able to remove the details injected by containerd (>> 2022-07-25T06:43:17.209605099Z stdout F .......). So, is there any way to configure containerd not to prepend these details to the logs, so that they are printed just as they are generated by the application/container?
On the other hand, is there any plugin to remove such details on the fluentbit side? As far as the existing plugins go, none of them can manipulate or change the logs (which is logical, as the log agent should not change the logs).
Thanks in advance.
This is the workaround I followed to show the multiline log lines in Grafana, by applying extra fluentbit filters and a multiline parser.
1- First, I receive the stream with a tail input, which parses it with a multiline parser (multilineKubeParser).
2- Then another filter intercepts the stream to do further processing with a regex parser (kubeParser).
3- After that, another filter removes the details added by containerd, using a Lua script (the remove_dummy function in filters.lua below).
fluent-bit.conf: |-
  [SERVICE]
      HTTP_Server   On
      HTTP_Listen   0.0.0.0
      HTTP_PORT     2020
      Flush         1
      Daemon        Off
      Log_Level     warn
      Parsers_File  parsers.conf

  [INPUT]
      Name              tail
      Tag               kube.*
      Path              /var/log/containers/*.log
      multiline.Parser  multilineKubeParser
      Exclude_Path      /var/log/containers/*_ABC-logging_*.log
      DB                /run/fluent-bit/flb_kube.db
      Mem_Buf_Limit     5MB

  [FILTER]
      Name                 kubernetes
      Match                kube.*
      Kube_URL             https://kubernetes.default.svc:443
      Merge_Log            On
      Merge_Parser         kubeParser
      K8S-Logging.Parser   Off
      K8S-Logging.Exclude  On

  [FILTER]
      Name    lua
      Match   kube.*
      call    remove_dummy
      Script  filters.lua

  [OUTPUT]
      Name                  grafana-loki
      Match                 kube.*
      Url                   http://loki:3100/api/prom/push
      TenantID              ""
      BatchWait             1
      BatchSize             1048576
      Labels                {job="fluent-bit"}
      RemoveKeys            kubernetes
      AutoKubernetesLabels  false
      LabelMapPath          /fluent-bit/etc/labelmap.json
      LineFormat            json
      LogLevel              warn
labelmap.json: |-
  {
    "kubernetes": {
      "container_name": "container",
      "host": "node",
      "labels": {
        "app": "app",
        "release": "release"
      },
      "namespace_name": "namespace",
      "pod_name": "instance"
    },
    "stream": "stream"
  }
parsers.conf: |-
  [PARSER]
      Name         kubeParser
      Format       regex
      Regex        /^([^ ]*).* (?<timeStamp>[^a].*) ([^ ].*)\[(?<requestId>[^\]]*)\] (?<severity>[^ ]*) (?<message>[^ ].*)$/
      Time_Key     time
      Time_Format  %Y-%m-%dT%H:%M:%S.%L%z
      Time_Keep    On
      Time_Offset  +0200

  [MULTILINE_PARSER]
      name           multilineKubeParser
      type           regex
      flush_timeout  1000
      # A new event starts when the CRI prefix is followed by the application's own timestamp
      rule  "start_state"  "/[^ ]* stdout .\s+\W*\w+\d\d\d\d-\d\d-\d\d \d\d\:\d\d\:\d\d,\d\d\d.*$/"  "cont"
      # Continuation: CRI prefix NOT followed by an application timestamp (negative lookahead)
      rule  "cont"         "/[^ ]* stdout .\s+(?!\W+\w+\d\d\d\d-\d\d-\d\d \d\d\:\d\d\:\d\d,\d\d\d).*$/"  "cont"
filters.lua: |-
  function remove_dummy(tag, timestamp, record)
    -- Strip the CRI prefix ("<timestamp>Z stdout F ") that containerd
    -- prepends to every line
    new_log = string.gsub(record["log"], "%d+-%d+-%d+T%d+:%d+:%d+.%d+Z%sstdout%sF%s", "")
    new_record = record
    new_record["log"] = new_log
    -- Return code 2: the record was modified, keep the original timestamp
    return 2, timestamp, new_record
  end
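To illustrate what that gsub pattern strips, here is a tiny standalone Lua check (not part of the config; the sample line is taken from the logs above):
line = '2022-07-25T06:43:17.209605099Z stdout F version {'
print((string.gsub(line, "%d+-%d+-%d+T%d+:%d+:%d+.%d+Z%sstdout%sF%s", "")))
-- prints: version {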
As I mentioned, this is a workaround until I can find another, better solution.
From the configuration options for containerd, it appears there is no way to configure its logging. You can see the config doc here.
Also, I checked the logging code inside containerd, and it appears these details are prepended to the logs as they are redirected from the container's stdout. You can see that the test case here checks for the appropriate fields by "splitting" the received log line: it checks for a tag and a stream entry prepended to the content of the log. I suppose that is the way logs are processed in containerd.
The best thing to do would be to open an issue in the project with your design requirement and perhaps the team can develop configurable stdout redirection for you.
This might help. They use a custom regex to capture the message and the rest of the info in the logs, and then use a lift operation in the nest filter to flatten the JSON (see the sketch after the link).
https://github.com/microsoft/fluentbit-containerd-cri-o-json-log
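For reference, a lift with Fluent Bit's nest filter typically looks something like the following. This is a minimal sketch, not taken from the linked repo; the Nested_under value depends on the key your parser nests the fields under, so log_processed here is an assumption:
[FILTER]
    Name          nest
    Match         kube.*
    Operation     lift
    Nested_under  log_processed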

CentOS EPEL fail2ban not processing systemd journal for tomcat

I've installed fail2ban 0.10.5-2.el7 from EPEL on CentOS 7.8. I'm trying to get it to work with systemd for processing a Tomcat log (also systemd).
In jail.local I added:
[guacamole]
enabled = true
port = http,https
backend = systemd
In filter.d/guacamole.conf:
[Definition]
failregex = Authentication attempt from <HOST> for user "[^"]*" failed\.$
ignoreregex =
journalmatch = _SYSTEMD_UNIT=tomcat.service + _COMM=java
If I run journalctl -u tomcat.service I see all the log lines. The ones I am interested in look like this:
May 18 13:58:26 myhost catalina.sh[42065]: 13:58:26.485 [http-nio-8080-exec-6] WARN o.a.g.r.auth.AuthenticationService - Authentication attempt from 1.2.3.4 for user "test" failed.
If I redirect journalctl -u tomcat.service to a log file, and process it with fail2ban-regex then it works exactly the way I want it to work, finding all the lines it needs.
% fail2ban-regex /tmp/j9 /etc/fail2ban/filter.d/guacamole.conf
Running tests
=============
Use failregex filter file : guacamole, basedir: /etc/fail2ban
Use log file : /tmp/j9
Use encoding : UTF-8
Results
=======
Failregex: 47 total
|- #) [# of hits] regular expression
| 1) [47] Authentication attempt from <HOST> for user "[^"]*" failed\.$
`-
Ignoreregex: 0 total
Date template hits:
|- [# of hits] date format
| [1] ExYear(?P<_sep>[-/.])Month(?P=_sep)Day(?:T| ?)24hour:Minute:Second(?:[.,]Microseconds)?(?:\s*Zone offset)?
| [570] {^LN-BEG}(?:DAY )?MON Day %k:Minute:Second(?:\.Microseconds)?(?: ExYear)?
`-
Lines: 571 lines, 0 ignored, 47 matched, 524 missed
[processed in 0.12 sec]
However, if fail2ban reads the journal directly then it does not work:
fail2ban-regex systemd-journal /etc/fail2ban/filter.d/guacamole.conf
It comes back right away, and processes 0 lines!
Running tests
=============
Use failregex filter file : guacamole, basedir: /etc/fail2ban
Use systemd journal
Use encoding : UTF-8
Use journal match : _SYSTEMD_UNIT=tomcat.service + _COMM=java
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Lines: 0 lines, 0 ignored, 0 matched, 0 missed
[processed in 0.00 sec]
I've tried to remove _COMM=java. It doesn't make a difference.
If I leave out the journal match line altogether, it at least processes all the lines from the journal, but does not find any matches (even though, as I mentioned, it processes a dump of the log file fine):
Running tests
=============
Use failregex filter file : guacamole, basedir: /etc/fail2ban
Use systemd journal
Use encoding : UTF-8
Results
=======
Failregex: 0 total
Ignoreregex: 0 total
Lines: 202271 lines, 0 ignored, 0 matched, 202271 missed
[processed in 34.54 sec]
Missed line(s): too many to print. Use --print-all-missed to print all 202271 lines
Either this is a bug, or I'm missing a small detail.
Thanks for any help you can provide.
To make sure the filter definition is properly initialised, it would be good to include the common definition. Your filter definition (/etc/fail2ban/filter.d/guacamole.conf) would therefore look like:
[INCLUDES]
before = common.conf
[Definition]
journalmatch = _SYSTEMD_UNIT='tomcat.service'
failregex = Authentication attempt from <HOST> for user "[^"]*" failed\.$
ignoreregex =
A small note: given that your issue only occurs with systemd but not with flat files, could you try the same pattern without the $ at the end (see below)? Maybe there is an issue with the end of line as printed to the journal.
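That is, something like:
failregex = Authentication attempt from <HOST> for user "[^"]*" failed\.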
In your jail definition (/etc/fail2ban/jail.d/guacamole.conf), remember to define the ban time/find time/retries if they haven't already been defined in the default configuration:
[guacamole]
enabled = true
port = http,https
maxretry = 3
findtime = 1h
bantime = 1d
# "backend" specifies the backend used to get files modification.
# systemd: uses systemd python library to access the systemd journal.
# Specifying "logpath" is not valid for this backend.
# See "journalmatch" in the jails associated filter config
backend = systemd
Remember to restart the fail2ban service after making such changes.
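For example:
systemctl restart fail2ban
(or fail2ban-client reload to re-read the configuration without a full restart).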

Disable time and tags in fluentd stdout output plugin

Remove time and tag from fluentd's stdout output plugin with JSON output.
Fluentd's stdout output plugin produces output like:
2017-11-28 11:43:13.814351757 +0900 tag: {"field1":"value1","field2":"value2"}
So the timestamp and tag appear before the JSON. How can I remove these fields? I would like to have only the JSON output. My current config:
<match pattern>
  @type stdout
</match>
expected output:
{"field1":"value1","field2":"value2"}
Set the json format type, which by default doesn't include time and tag in the output:
<match pattern>
  @type stdout
  <format>
    @type json
  </format>
</match>
Did you try filters?
<filter pattern>
  @type record_transformer
  <record>
    tag ${tag}
  </record>
</filter>

Fluentd parser plugin

I am trying to implement a parser plugin for fluentd. Below are the configuration file and the plugin file.
Fluentd config file.
<source>
  type syslog
  port 9010
  bind x.x.x.x
  tag flog
  format flog_message
</source>
Plugin file
module Fluent
  class TextParser
    class ElogParser < Parser
      Plugin.register_parser("flog_message", self)

      config_param :delimiter, :string, :default => " " # delimiter is configurable with " " as default
      config_param :time_format, :string, :default => nil # time_format is configurable

      # This method is called after config_params have read configuration parameters
      def configure(conf)
        if @delimiter.length != 1
          raise ConfigError, "delimiter must be a single character. #{@delimiter} is not."
        end
        # TimeParser class is already given. It takes a single argument as the time format
        # to parse the time string with.
        @time_parser = TimeParser.new(@time_format)
      end

      def call(text)
        # decode text
        # ...
        # decode text
        yield result_hash
      end
    end
  end
end
However, the call method is not executed when running fluentd. Any help is greatly appreciated.
Since v0.12, use parse instead of call.
docs.fluentd.org was outdated, so I just updated the article: http://docs.fluentd.org/articles/plugin-development#parser-plugins
Sorry for forgetting to update the document...
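For reference, here is a minimal sketch of the plugin above with call renamed to parse, as the v0.12 API expects. The decoding body is an assumption (the question elides it): it splits on the configured delimiter and yields time plus record, which is what v0.12 parsers are expected to do:
module Fluent
  class TextParser
    class ElogParser < Parser
      Plugin.register_parser("flog_message", self)

      config_param :delimiter, :string, :default => " "
      config_param :time_format, :string, :default => nil

      def configure(conf)
        super # populate the config_param values
        if @delimiter.length != 1
          raise ConfigError, "delimiter must be a single character. #{@delimiter} is not."
        end
        @time_parser = TimeParser.new(@time_format)
      end

      # Since v0.12 the engine calls parse instead of call
      def parse(text)
        # Hypothetical decoding: first field is the time, the rest is the message
        time_str, message = text.split(@delimiter, 2)
        yield @time_parser.parse(time_str), { "message" => message }
      end
    end
  end
end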