td-agent is unable to ship logs from a file when the file contains a single multiline log - elastic-stack

td-agent is unable to ship logs from a file when the log file contains a single multiline log. The logs are not picked up by td-agent until a new line is added.
I installed td-agent on a Windows machine and configured the td-agent.conf file to pick up logs from a file containing a single multiline log. The logs are not shipped until a new line is added to the file.
td-agent.conf
<source>
  @type tail
  path "C:/abc.txt"
  pos_file etc/td-agent/pos/abc-file.pos
  tag abc-file-test
  multiline_flush_interval 5s
  format multiline
  <parse>
    @type multiline
    format_firstline /^2019*/
    format1 /^(?<message>.*)/
  </parse>
  read_from_head true
</source>
<filter abc-file-**>
  @type record_modifier
  <record>
    entity "abc"
    component ${tag}
    hostname "#{Socket.gethostname}"
  </record>
</filter>
<match abc-file-**>
  @type kafka_buffered
  brokers "localhost:9092"
  default_topic abc-topic
  flush_interval 5s
  kafka_agg_max_bytes 1000000
  max_send_limit_bytes 10000000
  discard_kafka_delivery_failed true
  output_data_type json
  compression_codec gzip
  max_send_retries 1
  required_acks 1
  get_kafka_client_log true
</match>
abc.txt log file:
2019-04-12 12:09:45 INFO abc.java exception occured at com.*************
at com.**************************
at com.************************
The logs should flow to Kafka, but they don't.

This is a limitation of the in_tail plugin.
How about using fluent-plugin-concat with the multiline_end_regexp parameter?
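A minimal sketch of such a filter, assuming the raw lines end up in a message key; since the sample log has no end marker, this sketch uses multiline_start_regexp plus flush_interval rather than multiline_end_regexp, and the pattern and label names are illustrative:
<filter abc-file-**>
  @type concat
  key message
  multiline_start_regexp /^\d{4}-\d{2}-\d{2}/
  flush_interval 5s
  timeout_label @NORMAL
</filter>
With flush_interval set, the buffered multiline event is emitted after 5 seconds of inactivity instead of waiting for the next start line; events flushed by the timeout are routed to the @NORMAL label, so the downstream filter and match blocks would need to live under that label.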

Related

Fluentd incorrectly routing logs to its own STDOUT

I have a GKE cluster in which I'm using the Fluentd Kubernetes Daemonset v1.14.3 (Docker image: fluent/fluentd-kubernetes-daemonset:v1.14.3-debian-gcs-1.1) to collect the logs from certain containers and forward them to a GCS bucket.
My configuration takes logs from /var/log/containers/*.log, filters the containers based on some Kubernetes annotations, and then uploads them to GCS using a plugin.
In most cases this works correctly, but I'm currently stuck on a weird issue:
certain containers' logs are sometimes printed to fluentd's own stdout. Let me elaborate:
Assume we have a container called helloworld which runs echo "HELLO WORLD".
Soon after the container starts, I can see in fluentd's own logs:
2022-07-20 13:29:04 +0000 [info]: #0 [in_tail_container_logs] following tail of /var/log/containers/my-pod_my-namespace_helloworld-7e6359514a5601e5ad1823d145fd3b73f7b65648f5cb760f2c1855dabe27d606.log
...
HELLO WORLD
...
This .log file contains the standard output of my Docker container ("HELLO WORLD" in some JSON-structured format).
Normally, fluentd should tail this log and send the messages to the GCS plugin, which should upload them to the destination bucket. But sometimes the logs are printed directly to fluentd's own output instead of being passed to the plugin.
I'd appreciate any help to root-cause this issue.
Things I've looked into
I have increased the verbosity of the logs using fluentd -vv, but found nothing relevant.
This happens kind of randomly, but only for certain containers. It never happens for some containers, but for others it sometimes does, and sometimes it doesn't.
There's nothing special about the containers presenting this issue.
We don't have any configuration for fluentd to stream the logs to stdout. We triple checked this.
The issue happens on a GKE cluster (1.21.12-gke.1700).
Below is the fluentd configuration being used:
<label @FLUENT_LOG>
  <match fluent.**>
    @type null
  </match>
</label>
<source>
  @type tail
  @id in_tail_container_logs
  path /var/log/containers/*.log
  exclude_path ["/var/log/containers/fluentd-*", "/var/log/containers/fluentbit-*", "/var/log/containers/kube-*", "/var/log/containers/pdsci-*", "/var/log/containers/gke-*"]
  pos_file /var/log/fluentd-containers.log.pos
  tag "kubernetes.*"
  refresh_interval 1s
  read_from_head true
  follow_inodes true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    keep_time_key true
  </parse>
</source>
<filter kubernetes.**>
  @type kubernetes_metadata
  @id filter_kube_metadata
  kubernetes_url "#{'https://' + ENV.fetch('KUBERNETES_SERVICE_HOST') + ':' + ENV.fetch('KUBERNETES_SERVICE_PORT') + '/api'}"
  verify_ssl true
  ca_file "#{ENV['KUBERNETES_CA_FILE']}"
  watch false # Don't watch for changes in container metadata
  de_dot false # Don't replace dots in labels and annotations
  skip_labels false
  skip_master_url true
  skip_namespace_metadata true
  annotation_match ["app.*\/log-.+"]
</filter>
<filter kubernetes.**>
  @type grep
  <regexp>
    key $.kubernetes.namespace_name
    pattern /^my-namespace$/
  </regexp>
  <regexp>
    key $['kubernetes']['labels']['example.com/collect']
    pattern /^yes$/
  </regexp>
</filter>
<match kubernetes.**>
  # docs: https://github.com/daichirata/fluent-plugin-gcs
  @type gcs
  @id out_gcs
  project "#{ENV['GCS_BUCKET_PROJECT']}"
  bucket "#{ENV.fetch('GCS_BUCKET_PROJECT')}"
  object_key_format %Y%m%d/%H%M/${$.kubernetes.pod_name}_${$.kubernetes.container_name}_${$.docker.container_id}/%{index}.%{file_extension}
  store_as json
  <buffer time,$.kubernetes.pod_name,$.kubernetes.container_name,$.docker.container_id>
    @type file
    path /var/log/fluentd-buffers/gcs.buffer
    timekey 30
    timekey_wait 5
    timekey_use_utc true # use utc
    chunk_limit_size 1MB
    flush_at_shutdown true
  </buffer>
  <format>
    @type json
  </format>
</match>

kubernetes container_name is null in fluentd configuration

I'm trying to collect logs from my application container by attaching a fluentd log agent as a sidecar container in my project. I want to see which log comes from which application in my Kibana dashboard, so I configured fluentd like this:
<source>
  @id fluentd-containers.log
  @type tail
  path /var/log/containers/mylog*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S.%NZ
  tag kubernetes.myapp.container
  read_from_head true
  <parse>
    @type none
  </parse>
</source>
<filter kubernetes**>
  @type record_transformer
  enable_ruby true
  <record>
    service_name ${tag_parts[1]}
    instance_name ${record["kubernetes"]["container_name"]}
    log_type ${tag_parts[2]}
    host_name ${hostname}
    send_to "ES"
  </record>
</filter>
<match kubernetes.**>
  @type stdout
</match>
But when I deployed it, ${record["kubernetes"]["container_name"]} came back null, with fluentd reporting that it failed to expand the placeholder. Please help me resolve this, thanks.
Got this error message:
0 dump an error event: error_class=RuntimeError error="failed to expand record[\"kubernetes\"][\"container_name\"] : error = undefined method `[]' for nil:NilClass"
location="/fluentd/vendor/bundle/ruby/2.6.0/gems/fluentd-1.11.2/lib/fluent/plugin/filter_record_transformer.rb:310:in `rescue in expand'"
tag="kubernetes.myapp.container" time=2020-09-23 11:29:05.705209241 +0000
record={"message"=>"{\"log\":\"I0923 11:28:59.157177 1 main.go:71] Health check succeeded\\n\",\"stream\":\"stderr\",\"time\":\"2020-09-23T11:28:59.157256887Z\"}"}
The record doesn't contain the required fields that you want to access, i.e. record["kubernetes"]["container_name"].
You need to make sure that it has those fields.
Please go through Container Deployment and the kubernetes_metadata_filter plugin for detailed information on this.
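A minimal sketch of what that can look like, assuming the fluent-plugin-kubernetes-metadata-filter gem is installed and the tag carries the pod/namespace/container information the plugin extracts by default from /var/log/containers/... file paths:
<filter kubernetes.**>
  @type kubernetes_metadata
</filter>
Placed before the record_transformer filter, this enriches each record with a kubernetes hash (pod_name, namespace_name, container_name, ...), which is what the ${record["kubernetes"]["container_name"]} placeholder expects.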

fluentbit writes to /var/log/messages

I'm running fluentbit (td-agent-bit) on a CentOS system in order to send all logs to a centralized system. Every time fluentbit pushes a record to the remote location, it also adds a record to /var/log/messages, leading to a huge log file.
Jul 21 08:48:53 hostname td-agent-bit: [2020/07/21 08:48:53] [ info] [out_azure] customer_id=XXXXXXXXXXXXXXXXXXXXXXXX, HTTP status=200
Any idea how I can stop a service (td-agent-bit) from writing to /var/log/messages? I couldn't find any relevant configuration parameter (e.g. verbose) in the fluentbit documentation. Thanks!
Your log_level is "info", which includes a lot of pipeline messages. You can decrease the log level inside the output section of the plugin to "error" only, e.g.:
[OUTPUT]
    name azure
    match *
    log_level error
Note: you can also decrease the general log_level in the main [SERVICE] section.
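A minimal sketch of that service-wide variant (only the log_level line is the relevant change; the rest of your [SERVICE] section stays as-is):
[SERVICE]
    log_level error
This suppresses the per-flush [info] lines from all plugins at once, rather than just from the azure output.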

How to use fluentd+elasticsearch+grafana to display the first 12 characters of the container ID?

I need to use fluentd to collect Kubernetes logs and store them in Elasticsearch, and use Grafana to display the logs and digests. However, Docker's container ID is 64 characters. How do I configure fluentd, Elasticsearch, or Grafana to display just the first 12 characters of the container ID in Grafana?
My config file is as follows:
https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/fluentd-elasticsearch/fluentd-es-configmap.yaml
Try something like this at the end of containers.input.conf:
<filter kubernetes.**>
  @type record_transformer
  enable_ruby
  <record>
    docker.container_id ${record["docker.container_id"][0,12]}
  </record>
</filter>
If it is okay to only store 12-character IDs, you can add a fluent filter parser (tested with Fluent Bit only):
parsers.conf
[PARSER]
    Name dockerid_parser
    Format regex
    Regex ^(?<container_id>.{12})
fluent-docker.conf
[SERVICE]
    ...
    Parsers_File /full/path/to/parsers.conf
    ...
[FILTER]
    Name parser
    Match *
    Key_Name container_id
    Parser dockerid_parser
    Reserve_Data On
...

Fluentd: create a tag based on a field value

I have a Kubernetes cluster in which I'm trying to aggregate container logs on the nodes and send them to MongoDB. However, I need to be able to send the log records to different MongoDB servers based on values in the log record itself.
I'm using the fluent-plugin-kubernetes_metadata_filter plugin to attach additional information from Kubernetes to the log record. One of those fields is kubernetes_namespace_name. Is it possible to use that field to create a tag which I can match against the MongoDB output plugin?
For example, below I'm using only one output, but the idea is to have multiple outputs and let fluentd send the logs to the right MongoDB database based on the value of the kubernetes_namespace_name field:
<source>
  @type tail
  @label @KUBERNETES
  path /var/log/containers/*.log
  pos_file /var/log/es-containers.log.pos
  time_format %Y-%m-%dT%H:%M:%S
  tag kubernetes.*
  format json
  keep_time_key true
  read_from_head true
</source>
<label @KUBERNETES>
  <filter kubernetes.**>
    @type kubernetes_metadata
    kubernetes_url "#{ENV['K8S_HOST_URL']}"
    bearer_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
    ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    include_namespace_id true
  </filter>
  <filter kubernetes.**>
    @type flatten_hash
    separator _
  </filter>
  # < Tag 'kubernetes.namespace.default' is created here somehow >
  <match kubernetes.namespace.default>
    @type mongo
    host "#{ENV['MONGO_HOST']}"
    port "#{ENV['MONGO_PORT']}"
    database "#{ENV['MONGO_DATABASE']}"
    collection "#{ENV['MONGO_COLLECTION']}"
    capped
    capped_size 1024m
    user "#{ENV['MONGO_USER']}"
    password "#{ENV['MONGO_PASSWORD']}"
    time_key time
    flush_interval 10s
  </match>
</label>
Instead of using the tag, you can use the message content to do the filtering with Fluentd's grep filter. Add the filter after the kubernetes metadata filter and before the data flattener. This allows you to specify the key kubernetes_namespace_name and then route according to the value within. Since you may have additional MongoDB outputs, using labels can help separate the processing workflows; see the sketch after the example below.
Documentation: https://docs.fluentd.org/v0.12/articles/filter_grep
Example:
<filter kubernetes.**>
  @type grep
  <regexp>
    key kubernetes_namespace_name
    pattern cool
  </regexp>
</filter>
<YOUR MONGO CONFIG HERE>
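A minimal sketch of that label-based fan-out (the label names and the namespace pattern are illustrative; copy, relabel, and grep are built-in fluentd plugins):
<match kubernetes.**>
  @type copy
  <store>
    @type relabel
    @label @NS_DEFAULT
  </store>
  <store>
    @type relabel
    @label @NS_OTHER
  </store>
</match>
<label @NS_DEFAULT>
  <filter kubernetes.**>
    @type grep
    <regexp>
      key kubernetes_namespace_name
      pattern /^default$/
    </regexp>
  </filter>
  # mongo <match> for the 'default' namespace goes here
</label>
Each label receives a full copy of the stream, keeps only the records whose kubernetes_namespace_name matches its grep pattern, and sends them to its own MongoDB output; @NS_OTHER would follow the same pattern for the other servers.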