filebeat does not resolve complicated hosts parameter in kafka output connector

I'm using the Filebeat -> Kafka output connector, and I would like to construct the hosts and topic parameters based on information carried in the messages that Filebeat is processing at that moment.
To my surprise, specifying exactly the same expression leads to it being resolved for the topic field but not for the hosts field. Any advice on how I can achieve my goal?
My configuration is below:
kafka.yaml: |
  processors:
    - add_kubernetes_metadata:
        namespace: {{ .Release.Namespace }}
    # Drop all log lines that don't contain kubernetes.labels.entry field
    - drop_event:
        when:
          not:
            regexp:
              kubernetes.labels.entry: ".*"
  filebeat.config_dir: /conf/
  output.kafka:
    hosts: '%{[kubernetes][labels][entry]}'
    topic: '%{[kubernetes][labels][entry]}'
    required_acks: 1
    version: 0.11.0.0
    client_id: filebeat
    bulk_max_size: 100
    max_message_bytes: 20480
And here's the error message I'm getting from filebeat:
2018/05/09 01:54:29.805431 log.go:36: INFO Failed to connect to broker [[%{[kubernetes][labels][entry]} dial tcp: address %{[kubernetes][labels][entry]}: missing port in address]]: %!s(MISSING)
I did try adding a port to the above config; the error message then still shows that the field has not been resolved:
2018/05/09 02:13:41.392742 log.go:36: INFO client/metadata fetching metadata for all topics from broker [[%{[kubernetes][labels][entry]}:9092]]
2018/05/09 02:13:41.392854 log.go:36: INFO Failed to connect to broker [[%{[kubernetes][labels][entry]}:9092 dial tcp: address %{[kubernetes][labels][entry]}:9092: unexpected '[' in address]]: %!s(MISSING)

I found the answer on the Elastic forum:
You cannot control hosts or files (in the case of the file output) via variables. Doing so would require Beats to manage state and connections to each different host. You can only use variables to control the destination topic, but not the broker.
So it's not possible to achieve what I'd like to do at this time.
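What does work is keeping hosts static and templating only the topic; a minimal sketch of that supported pattern (the broker addresses here are placeholders):

output.kafka:
  hosts: ["kafka-0:9092", "kafka-1:9092"]
  topic: '%{[kubernetes][labels][entry]}'
  required_acks: 1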

Related

kafka is not ingesting logs

I created a Kafka service on a Kubernetes cluster based on the Bitnami chart. The deployment goes well.
Next, I installed Filebeat to send logs to that service. It seems to me that Filebeat communicates with the cluster but does not ingest the logs. Indeed, after starting the Filebeat service, I find a topic called "logs-topic" which was created by Filebeat. However, this topic remains empty. My configuration is given below:
filebeat.inputs:
  - type: filestream
    enabled: true
    paths:
      - /var/log/test.log
    fields:
      level: debug
      review: 1

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

setup.template.settings:
  index.number_of_shards: 1

setup.kibana:

processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

output.kafka:
  hosts: ["ip-172-31-26-181:30092"]
  topic: "logs-topic"
  codec.json:
    pretty: false
Kafka topic is present
I have no name!#kafka-release-client:/$ kafka-topics.sh --list --bootstrap-server kafka-release.default.svc.cluster.local:9092
.......
logs-topic
syslog output
Dec 30 08:28:45 ip-172-31-23-248 filebeat[29968]: 2021-12-30T08:28:45.928Z#011INFO#011[file_watcher]#011filestream/fswatch.go:137#011Start next scan
Dec 30 08:28:53 ip-172-31-23-248 filebeat[29968]: 2021-12-30T08:28:53.186Z#011INFO#011[publisher]#011pipeline/retry.go:219#011retryer: send unwait signal to consumer
Dec 30 08:28:53 ip-172-31-23-248 filebeat[29968]: 2021-12-30T08:28:53.186Z#011INFO#011[publisher]#011pipeline/retry.go:223#011 done
Dec 30 08:28:55 ip-172-31-23-248 filebeat[29968]: 2021-12-30T08:28:55.927Z#011INFO#011[file_watcher]#011filestream/fswatch.go:137#011Start next scan
I've found a workaround for this issue and maybe it will help others.
My problem was the same as described in the post above, and I struggled for a long time because I couldn't find any Filebeat logs that indicated the problem. I also couldn't find a way to increase the logging level beyond the debug option, e.g.:
filebeat -e -c filebeat.yml -v -d "publisher"
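For reference, Filebeat's own verbosity can also be raised via the logging settings in filebeat.yml; a small sketch, assuming the standard logging.* options (the selector names are just examples):

logging.level: debug
logging.selectors: ["kafka", "publisher"]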
The Kafka broker logs pointed to some SSL handshake failures at INFO level:
journalctl -f -u kafka
Jan 22 18:18:22 kafka sh[15893]: [2022-01-22 18:18:22,604] INFO [SocketServer listenerType=ZK_BROKER, nodeId=1] Failed authentication with /172.20.3.10 (SSL handshake failed) (org.apache.kafka.common.network.Selector)
Even with DEBUG level configured, I couldn't see the actual problem.
I decided to produce some data with a Python script and was able to reproduce the problem:
from kafka import KafkaConsumer, KafkaProducer
import logging

logging.basicConfig(level=logging.DEBUG)

try:
    topic = "test"
    sasl_mechanism = "PLAIN"
    username = "test"
    password = "fake"
    security_protocol = "SASL_SSL"

    producer = KafkaProducer(bootstrap_servers='kafka.domain.com:9092',
                             security_protocol=security_protocol,
                             ssl_check_hostname=True,
                             ssl_cafile='ca.crt',
                             sasl_mechanism=sasl_mechanism,
                             sasl_plain_username=username,
                             sasl_plain_password=password)
    producer.send(topic, "test".encode('utf-8'))
    producer.flush()
    print("Succeed")
except Exception as e:
    print("Exception:\n\n")
    print(e)
Output:
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'kafka'. (_ssl.c:1129)
DEBUG:kafka.producer.sender:Node 1 not ready; delaying produce of accumulated batch
WARNING:kafka.conn:SSL connection closed by server during handshake.
INFO:kafka.conn:<BrokerConnection node_id=1 host=kafka:9092 <handshake> [IPv4 ('198.168.1.11', 9092)]>: Closing connection. KafkaConnectionError: SSL connection closed by server during handshake
=> The hostname cannot be successfully verified against the certificate. I changed and added a lot of parameters (listeners / advertised.listeners / host.name) in Kafka's server.properties and was not able to configure another/full domain name to be returned to the client in the metadata. It always returns "kafka:9092", and that is not the domain name indicated in the certificate.
Workaround:
Disable hostname verification on both sides (server and client).
kafka-server: ssl.endpoint.identification.algorithm= (set to an empty value)
python-client: ssl_check_hostname=False
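On the Python side that means passing ssl_check_hostname=False; a sketch based on the reproduction script above:

from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers='kafka.domain.com:9092',
                         security_protocol='SASL_SSL',
                         ssl_check_hostname=False,  # skip hostname verification
                         ssl_cafile='ca.crt',
                         sasl_mechanism='PLAIN',
                         sasl_plain_username='test',
                         sasl_plain_password='fake')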
Filebeat can do this too, but it's not really obvious:
output.kafka:
  ssl.verification_mode: certificate
certificate
Verifies that the provided certificate is signed by a trusted authority (CA), but does not perform any hostname verification.
Source: https://www.elastic.co/guide/en/beats/filebeat/current/configuration-ssl.html
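Putting it together, a sketch of how those settings fit into Filebeat's kafka output (the hostname and CA path are placeholders):

output.kafka:
  hosts: ["kafka.domain.com:9092"]
  topic: "logs-topic"
  ssl.certificate_authorities: ["ca.crt"]
  ssl.verification_mode: certificate   # CA check only, no hostname verification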

Filebeat kafka input using multiline parser gives no output

Filebeat is configured to use Kafka as input and a file as output.
When the multiline setting is turned off, the output is published to a file.
But when the Kafka input is configured with multiline, there is no output in the file (the file is not even created).
Here is the relevant Filebeat configuration:
Input configuration
filebeat.inputs:
  - type: kafka
    hosts:
      - <ip>:9092
    topics:
      - "my-multiline-log"
    group_id: "kafka-consumer-filebeat"
    parsers:
      - multiline:
          # type: pattern
          pattern: '^'
          negate: true
          match: after
Output Configuration:
output.file:
  path: "/tmp/filebeat"
  filename: filebeat
  # codec.format:
  #   string: '%{[message]}'
Filebeat relevant logs
2021-12-16T11:02:34.551Z INFO [input.kafka] compat/compat.go:111 Input kafka starting {"id": "19A7FFEEC9EDFC04"}
2021-12-16T11:02:34.551Z INFO [input.kafka.kafka input] kafka/input.go:129 Starting Kafka input {"id": "19A7FFEEC9EDFC04", "hosts": ["<ip>:9092"]}
2021-12-16T11:02:38.158Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:44.767Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:51.481Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:02:58.225Z DEBUG [reader_multiline] multiline/pattern.go:142 Multiline event flushed because timeout reached.
2021-12-16T11:03:04.555Z DEBUG cgroup/util.go:276 PID 1 contains a cgroups V2 path (0::/) but no V2 mountpoint was found.
This may be because metricbeat is running inside a container on a hybrid system.
To monitor cgroups V2 processess in this way, mount the unified (V2) hierarchy inside
the container as /sys/fs/cgroup/unified and start metricbeat with --system.hostfs.
The same four reader_multiline lines keep repeating in the logs.
Edit: Support for the multiline parser on the kafka input was added in version 7.16.

How to configure the Kafka MQTT connector to subscribe to all MQTT topics?

I'm using a Kafka MQTT connector to duplicate topics from my MQTT broker into my Kafka cluster. This works great so far, but I can't make the connector subscribe to all topics.
I have tried different configurations but none of them work:
With mqtt.topics="#" or mqtt.topics="topic/#", I get this error:
java.lang.IllegalArgumentException: Invalid usage of multi-level wildcard in topic string: "topic/#"
If I remove the quotes (mqtt.topics=#), I get this error:
ERROR WorkerSourceTask{id=mqttConnector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:184)
org.apache.kafka.connect.errors.ConnectException: Unable to connect to server (32103) - java.net.ConnectException: Connection refused (Connection refused)
Is it possible to subscribe to multiple topics using the wildcard #?
The complete properties file if needed:
#
# Copyright [2018 - 2020] Confluent Inc.
#
name=mqttConnector
tasks.max=1
connector.class=io.confluent.connect.mqtt.MqttSourceConnector
mqtt.server.uri=ssl://localhost:8883
mqtt.topics=#
mqtt.username=my-username
mqtt.password=my-password
confluent.topic.bootstrap.servers=localhost:9092
confluent.topic.replication.factor=1
# Auto topic Creation
topic.creation.enable=true
topic.creation.default.replication.factor=-1
topic.creation.default.partitions=-1
# SSL
mqtt.ssl.trust.store.path=myStore
mqtt.ssl.trust.store.password=my-password
mqtt.ssl.key.store.path=myStore
mqtt.ssl.key.store.password=my-password
mqtt.ssl.key.password=

kafka listening on multiple interfaces

I have a requirement as below:
Kafka needs to listen on multiple interfaces, one external and one internal. All other components within the system will connect to Kafka on the internal interface.
At installation time the internal IPs on the other hosts are not reachable; some configuration is needed to make them reachable, and we do not have control over that. So assume that when Kafka is coming up, the internal IPs of the nodes are not reachable from each other.
Scenario:
I have two nodes in cluster:
node1 (External IP: 10.10.10.4, Internal IP: 5.5.5.4)
node2 (External IP: 10.10.10.5, Internal IP: 5.5.5.5)
Now, during installation, 10.10.10.4 can ping 10.10.10.5 and vice versa, but 5.5.5.4 cannot reach 5.5.5.5. That will only happen once the Kafka installation is done and someone applies some configuration afterwards to make them reachable, so before the Kafka installation we cannot make them reachable.
Now the requirement is that the Kafka brokers exchange messages on the 10.10.10.x interface, so that the cluster can form, while clients send messages on the 5.5.5.x interface.
What I tried was as below:
listeners=USERS://0.0.0.0:9092,REPLICATION://0.0.0.0:9093
advertised.listeners=USERS://5.5.5.5:9092,REPLICATION://5.5.5.5:9093
Where 5.5.5.5 is the internal ip address.
But with this, when restarting Kafka, I see the logs below:
{"log":"[2020-06-23 19:05:34,923] INFO Creating /brokers/ids/2 (is it secure? false) (kafka.zk.KafkaZkClient)\n","stream":"stdout","time":"2020-06-23T19:05:34.923403973Z"}
{"log":"[2020-06-23 19:05:34,925] INFO Result of znode creation at /brokers/ids/2 is: OK (kafka.zk.KafkaZkClient)\n","stream":"stdout","time":"2020-06-23T19:05:34.925237419Z"}
{"log":"[2020-06-23 19:05:34,926] INFO Registered broker 2 at path /brokers/ids/2 with addresses: ArrayBuffer(EndPoint(5.5.5.5,9092,ListenerName(USERS),PLAINTEXT), EndPoint(5.5.5.5,9093,ListenerName(REPLICATION),PLAINTEXT)) (kafka.zk.KafkaZkClient)\n","stream":"stdout","time":"2020-06-23T19:05:34.926127438Z"}
.....
{"log":"[2020-06-23 19:05:35,078] INFO Kafka version : 1.1.0 (org.apache.kafka.common.utils.AppInfoParser)\n","stream":"stdout","time":"2020-06-23T19:05:35.078444509Z"}
{"log":"[2020-06-23 19:05:35,078] INFO Kafka commitId : fdcf75ea326b8e07 (org.apache.kafka.common.utils.AppInfoParser)\n","stream":"stdout","time":"2020-06-23T19:05:35.078471358Z"}
{"log":"[2020-06-23 19:05:35,079] INFO [KafkaServer id=2] started (kafka.server.KafkaServer)\n","stream":"stdout","time":"2020-06-23T19:05:35.079436798Z"}
{"log":"[2020-06-23 19:05:35,136] ERROR [KafkaApi-2] Number of alive brokers '0' does not meet the required replication factor '2' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)\n","stream":"stdout","time":"2020-06-23T19:05:35.136792119Z"}
And after that, this message continuously comes up.
{"log":"[2020-06-23 19:05:35,166] ERROR [KafkaApi-2] Number of alive brokers '0' does not meet the required replication factor '2' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)\n","stream":"stdout","time":"2020-06-23T19:05:35.166895344Z"}
Is there any way we can achieve that?
With regards,
-M-
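For reference, the standard way to split client and inter-broker traffic is to advertise each listener on a different address and point inter.broker.listener.name at the one the brokers can actually reach; a sketch for node2, assuming PLAINTEXT on both listeners:

listeners=USERS://0.0.0.0:9092,REPLICATION://0.0.0.0:9093
advertised.listeners=USERS://5.5.5.5:9092,REPLICATION://10.10.10.5:9093
listener.security.protocol.map=USERS:PLAINTEXT,REPLICATION:PLAINTEXT
inter.broker.listener.name=REPLICATION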

How to send Suricata log to Kafka?

After installing and configuring Suricata 5.0.2 according to the documentation at https://suricata.readthedocs.io/,
I try to change some configuration in suricata.yaml by adding:
- alert-json-log:
    enabled: yes
    filetype: kafka
    kafka:
      brokers: >
        xxx-kafka-online003:9092,
        xxx-kafka-online004:9092,
        xxx-kafka-online005:9092,
        xxx-kafka-online006:9092,
        xxx-kafka-online007:9092
      topic: nsm_event
      partitions: 5
      http: yes
Next I run Suricata, and receive the error
Invalid entry for alert-json-log.filetype. Expected "regular" (default), "unix_stream", "pcie" or "unix_dgram"
I don't know how to configure Suricata to enable sending logs to Kafka topics.
Please help.
I don't see Kafka listed as an output type, therefore "no, there is not"
Refer to the docs: https://suricata.readthedocs.io/en/suricata-5.0.2/output/index.html
Plus, I'm not sure I understand what you expect http: yes to do since Kafka is not an HTTP service
What you could do is set filetype: unix_stream, then I assume that is Syslog, and you can add another service like Kafka Connect or Fluentd or Logstash to route that data to Kafka.
In other words, services don't need to integrate with Kafka. Plenty of alternatives exist to read files or stdout/stderr/syslog streams
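For example, with Suricata writing its usual eve.json file, a file shipper like the Filebeat setups above could forward it to Kafka; a rough sketch (the log path is an assumption, and the broker/topic names are borrowed from the question):

filebeat.inputs:
  - type: filestream
    paths:
      - /var/log/suricata/eve.json
output.kafka:
  hosts: ["xxx-kafka-online003:9092"]
  topic: "nsm_event"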