Filebeat is configured to take input from Kafka and write output to a file.
When the multiline setting is turned off, output is published to the file as expected.
But when the Kafka input is configured with multiline, there is no output in the file (the file is not even created).
Here is the relevant Filebeat configuration.
Input configuration:
filebeat.inputs:
  - type: kafka
    hosts:
      - <ip>:9092
    topics:
      - "my-multiline-log"
    group_id: "kafka-consumer-filebeat"
    parsers:
      - multiline:
          # type: pattern
          pattern: '^'
          negate: true
          match: after
Output Configuration:
output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  # codec.format:
  #   string: '%{[message]}'
Relevant Filebeat logs:
2021-12-16T11:02:34.551Z INFO [input.kafka] compat/compat.go:111 Input kafka starting {"id": "19A7FFEEC9EDFC04"}
2021-12-16T11:02:34.551Z INFO [input.kafka.kafka input] kafka/input.go:129 Starting Kafka input {"id": "19A7FFEEC9EDFC04", "hosts": ["<ip>:9092"]}
2021-12-16T11:02:38.158Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:44.767Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:51.481Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:58.225Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:03:04.555Z DEBUG cgroup/util.go:276 PID 1 contains a cgroups V2 path (0::/) but no V2 mountpoint was found.
This may be because metricbeat is running inside a container on a hybrid system.
To monitor cgroups V2 processess in this way, mount the unified (V2) hierarchy inside
the container as /sys/fs/cgroup/unified and start metricbeat with --system.hostfs.
The same four reader_multiline lines keep repeating in the logs.
Edit: Support for the multiline parser on the kafka input was added in version 7.16.
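For reference, on 7.16+ a more typical multiline parser setup for the kafka input might look like the sketch below; the timestamp pattern is only an assumption about the log format, and the other values are copied from the configuration above.
filebeat.inputs:
  - type: kafka
    hosts:
      - <ip>:9092
    topics:
      - "my-multiline-log"
    group_id: "kafka-consumer-filebeat"
    parsers:
      - multiline:
          type: pattern
          # Assumption: every new event starts with a date such as 2021-12-16;
          # any line that does not match is appended to the previous event.
          pattern: '^\d{4}-\d{2}-\d{2}'
          negate: true
          match: after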
Related
The Elastic documentation states that the event key can be extracted from the event using a format string, but unfortunately I couldn't find a proper example of it.
My input file contains a JSON object which also has nested JSON objects:
{"attribute":"XYZ","operation":"ABC","params":{"id":"12345"},"serviceId":"service_1","deployedRegion":"US"},"timestamp":1659420866000,""stats":{"type”:FILEBEAT,”retry":"0"}}
I want the Filebeat Kafka output event to get its key dynamically from the event. From the sample JSON above, I want the Kafka event key to be the value of "id".
I have tried multiple combinations of the output.kafka.key value but have been unsuccessful so far. Here is my Kafka output configuration:
output.kafka:
  hosts: ["localhost:9092"]
  topic: 'TEST'
  required_acks: 1
  version: '0.11.0.2'
  compression: none
  max_retries: 3
  broker_timeout: 15
  codec.format:
    string: '%{[message]}'
  key: '%{[message.id]:default}'
processors:
  - add_host_metadata: ~
  - drop_fields:
      fields: ["#metadata", "ecs", "log", "input", "host", "agent"]
      ignore_missing: false
Any support is appreciated. Thank you!
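One approach that might work, shown here only as an untested sketch: decode the JSON carried in the message field with the decode_json_fields processor, then reference the decoded nested field in the key format string. The target name "parsed" is an arbitrary choice, and the field path follows the sample JSON above.
processors:
  - decode_json_fields:
      # Parse the JSON string in the message field so its keys become event fields under [parsed]
      fields: ["message"]
      target: "parsed"

output.kafka:
  hosts: ["localhost:9092"]
  topic: 'TEST'
  codec.format:
    string: '%{[message]}'
  # With the JSON decoded, the nested id can be referenced directly in the key format string
  key: '%{[parsed][params][id]:default}'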
I think the word 'log' is used in more than one way when it comes to Kafka. I'm talking about log output that ends up in stdout, your-app.log, or Splunk/Datadog/etc.
Every 30 seconds, something happens 3 times, and each time it happens, approximately 65 log events appear. I'm wondering:
What is this something?
Can I cause all of its output to appear on a single line? (My log 'provider' charges per log event, and each line counts as a separate event.)
The logs are like this:
INFO - Kafka version: ...
INFO - Kafka commitId: ...
INFO - Kafka startTimeMs: ...
INFO - App info kafka.admin.client for adminclient-...
INFO - Metrics scheduler closed
INFO - Closing reporter org.apache.kafka.common.metrics.JmxReporter
INFO - Metrics reporters closed
INFO - AdminClientConfig values:
bootstrap.servers = [...
foo = ...
bar = ...
baz = ...
qux = ...
Each line is an SLF4J event. If you want to change its format from your client or the broker, you'll need to modify your logging framework configuration. On the broker, you'll find a log4j.properties file.
You cannot get all of the output onto a single line; each INFO line, for example, is an individual event. They can, however, be reduced by disabling the logs for the Java packages that print them, as sketched below.
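A sketch of a log4j.properties tweak that suppresses that INFO chatter; the logger names below are the usual Kafka client packages, so adjust them to whatever prefixes actually appear in your logs.
# Raise the threshold for the Kafka client packages so their INFO lines are dropped
log4j.logger.org.apache.kafka=WARN
# If only the AdminClient config dump is noisy, target it more narrowly (assumed logger name)
log4j.logger.org.apache.kafka.clients.admin.AdminClientConfig=WARN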
The alternative is to install some other log forwarder on your systems, such as Fluentd, and parse/filter/forward the data using that.
After installing and configuring Suricata 5.0.2 according to the documentation at https://suricata.readthedocs.io/,
I tried to change some configuration in suricata.yaml by adding:
- alert-json-log:
    enabled: yes
    filetype: kafka
    kafka:
      brokers: >
        xxx-kafka-online003:9092,
        xxx-kafka-online004:9092,
        xxx-kafka-online005:9092,
        xxx-kafka-online006:9092,
        xxx-kafka-online007:9092
      topic: nsm_event
      partitions: 5
      http: yes
Next I ran Suricata and received this error:
Invalid entry for alert-json-log.filetype. Expected "regular" (default), "unix_stream", "pcie" or "unix_dgram"
I don't know how to configure Suricata to send logs to Kafka topics.
Please help.
I don't see Kafka listed as an output type, therefore "no, there is not".
Refer to the docs: https://suricata.readthedocs.io/en/suricata-5.0.2/output/index.html
Also, I'm not sure what you expect http: yes to do, since Kafka is not an HTTP service.
What you could do is set filetype: unix_stream (I assume that is syslog) and then add another service like Kafka Connect, Fluentd, or Logstash to route that data to Kafka.
In other words, services don't need to integrate with Kafka directly; plenty of alternatives exist to read files or stdout/stderr/syslog streams.
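A rough sketch of that workaround in suricata.yaml (the filename key and socket path are assumptions here; a separate forwarder would then read the stream and produce the events to the nsm_event topic):
- alert-json-log:
    enabled: yes
    # unix_stream is one of the filetypes the error message says is accepted
    filetype: unix_stream
    # Assumed option and path: a local socket that Fluentd, Logstash, etc. can consume from
    filename: /var/run/suricata/alert-json.sock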
I'm using the Filebeat -> Kafka output connector and I would like to construct the hosts and topic parameters based on information carried in the messages that Filebeat is processing at that moment.
To my surprise, specifying exactly the same expression leads to it being resolved for the topic and not for the hosts field. Any advice on how I can achieve my goal?
My configuration is below:
kafka.yaml: |
  processors:
    - add_kubernetes_metadata:
        namespace: {{ .Release.Namespace }}
    # Drop all log lines that don't contain kubernetes.labels.entry field
    - drop_event:
        when:
          not:
            regexp:
              kubernetes.labels.entry: ".*"
  filebeat.config_dir: /conf/
  output.kafka:
    hosts: '%{[kubernetes][labels][entry]}'
    topic: '%{[kubernetes][labels][entry]}'
    required_acks: 1
    version: 0.11.0.0
    client_id: filebeat
    bulk_max_size: 100
    max_message_bytes: 20480
And here's the error message I'm getting from filebeat:
2018/05/09 01:54:29.805431 log.go:36: INFO Failed to connect to broker [[%{[kubernetes][labels][entry]} dial tcp: address %{[kubernetes][labels][entry]}: missing port in address]]: %!s(MISSING)
I did try adding a port to the above config; the error message then still shows that the field has not been resolved:
2018/05/09 02:13:41.392742 log.go:36: INFO client/metadata fetching metadata for all topics from broker [[%{[kubernetes][labels][entry]}:9092]]
2018/05/09 02:13:41.392854 log.go:36: INFO Failed to connect to broker [[%{[kubernetes][labels][entry]}:9092 dial tcp: address %{[kubernetes][labels][entry]}:9092: unexpected '[' in address]]: %!s(MISSING)
I found the answer on the Elastic forum:
You cannot control hosts or files (in the case of the file output) via variables. Doing so would require Beats to manage state and connections to each different host. You can only use variables to control the destination topic, but not the broker.
So it's not possible to achieve what I'd like to do at this time.
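For what it's worth, a sketch of the supported half: keep the broker list static and let only the topic resolve per event (the broker addresses below are placeholders):
output.kafka:
  # Broker addresses must be static; format strings are not resolved for hosts
  hosts: ["kafka-0.kafka:9092", "kafka-1.kafka:9092"]
  # The topic, by contrast, can be built from event fields
  topic: '%{[kubernetes][labels][entry]}'
  required_acks: 1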
When Filebeat outputs data to Kafka, there are many warning messages in the Filebeat log.
..
*WARN producer/broker/0 maximum request accumulated, waiting for space
*WARN producer/broker/0 maximum request accumulated, waiting for space
..
Nothing special in my Filebeat config:
..
output.kafka:
  hosts: ["localhost:9092"]
  topic: "log-oneday"
..
I have also updated these socket settings in Kafka:
...
socket.send.buffer.bytes=10240000
socket.receive.buffer.bytes=10240000
socket.request.max.bytes=1048576000
queued.max.requests=1000
...
But it did not work.
Is there something I am missing, or do I have to increase those numbers even further?
Besides, no errors or exceptions are found in the Kafka server log.
Does any expert have an idea about this?
Thanks.
Apparently you have only one partition in your topic. Try to increase partitions for the topic. See the links below for more information.
More Partitions Lead to Higher Throughput
https://www.confluent.io/blog/how-to-choose-the-number-of-topicspartitions-in-a-kafka-cluster/
https://kafka.apache.org/documentation/#basic_ops_modify_topic
Try the following command (replacing the connection info and topic name with your particular values):
bin/kafka-topics.sh --zookeeper zk_host:port/chroot --alter --topic my_topic_name --partitions 40
You need to configure 3 things:
Brokers
Filebeat kafka output
Consumer
Here is an example (change paths according to your environment).
Broker configuration:
# open kafka server configuration file
vim /opt/kafka/config/server.properties
# add this line
# The largest record batch size allowed by Kafka.
message.max.bytes=100000000
# restart kafka service
systemctl restart kafka.service
Filebeat kafka output:
output.kafka:
  ...
  max_message_bytes: 100000000
Consumer configuration:
# must be larger than the broker's message.max.bytes
max.partition.fetch.bytes=200000000
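For instance, a sketch of how that consumer setting could be passed to the console consumer for a quick test (broker address and topic name are placeholders):
# Pass the larger fetch size directly as a consumer property
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic log-oneday \
  --consumer-property max.partition.fetch.bytes=200000000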