Output to Azure Event Hub service using Filebeat - apache-kafka

Is it possible to use output.azure via Filebeat in the filebeat.yml file, with input from a local file, or do we need the Kafka output to inject logs into Azure Event Hub?
As far as I have explored, I can only see the Kafka output for sending logs to Azure Event Hub.
Also, is a storage account necessary for Event Hub ingestion or not?
Thanks!

We use Filebeat with Event Hubs via its Kafka-compatible endpoint. In your filebeat.yml you use the normal Kafka output pointing to the Event Hubs instance. Your config would be something like this:
output.kafka:
  topic: ${event_hub_connect_topic}
  required_acks: 1
  client_id: filebeat
  version: '1.0.0'
  hosts:
    - ${event_hub_connect_hosts}
  username: "$ConnectionString"
  password: ${event_hub_connect_string}
  ssl.enabled: true
  compression: none
Note that you can use env vars in these settings. Since we use filebeat in k8s, we provide them through a secret.
As for the storage account, I believe you only need it if you want to use the capture feature of event hubs.
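For illustration, the environment variables could be supplied to the Filebeat pods from a Kubernetes Secret roughly like this (a minimal sketch, not from the original answer; the secret name, namespace and values are assumptions):

apiVersion: v1
kind: Secret
metadata:
  name: filebeat-eventhub                 # hypothetical secret name
type: Opaque
stringData:
  event_hub_connect_topic: my-event-hub   # Event Hub name, used as the Kafka topic
  event_hub_connect_hosts: mynamespace.servicebus.windows.net:9093
  event_hub_connect_string: "Endpoint=sb://mynamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=..."

# Fragment of the Filebeat DaemonSet pod spec wiring the secret in as env vars:
#   containers:
#   - name: filebeat
#     envFrom:
#     - secretRef:
#         name: filebeat-eventhub

Filebeat then expands ${event_hub_connect_hosts} and the other placeholders from the container environment.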

Related

How can I install connector config in kafka connect

Is there any other way to deploy connector config other than POSTing the connector config to the Kafka Connect REST API? https://docs.confluent.io/platform/current/connect/references/restapi.html#tasks
I am thinking of some form of persistent approach, like a volume or S3, where Connect would grab those configs during bootstrap. I don't know/can't find whether that's available anywhere.
Regards
The REST API is the only way.
You can use abstractions like Terraform or Kubernetes resources, however, which wrap an HTTP client.
If you use other storage, that'll require you to write extra code to download files and call the REST API.
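For reference, a plain HTTP call to the Connect REST API looks roughly like this (a sketch; the connector name, host and config values are made up for illustration, using the FileStreamSourceConnector bundled with Kafka):

curl -X POST http://connect:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
        "name": "my-file-source",
        "config": {
          "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
          "tasks.max": "1",
          "file": "/tmp/input.txt",
          "topic": "my-topic"
        }
      }'

Terraform providers and Kubernetes operators (Strimzi, CFK) ultimately wrap this same kind of request.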

How to send logs from Google Stackdriver to Kafka

I see many docs and posts about how to send logs to Stackdriver, but almost no information about how to do the opposite: send logs from Stackdriver to Kafka.
In my case, our Ops want to collect the logs from our web servers using Google's Stackdriver agents and push them to Stackdriver. However, for my stream-processing needs I want to get the logs into Kafka to use its unparalleled ability to retain and reprocess data by any number of consumers, something that I cannot do with Pub/Sub.
So, what are the options for doing this? I only saw a couple of possible avenues - neither sounds too good:
based on this post: (https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76) push data into PubSub first, and then read from it using either Kafka connector or write my own Kafka consumer. I hate the thought of adding yet another hop (serialize/deserialize/ack/etc.) between the source of data and Kafka ....
I noticed a brief mention, in passing, of adding a plugin to Google's version of Fluentd (which is what the Stackdriver log collection agent is based on) here: https://powerspace.tech/how-to-stream-data-from-google-pubsub-to-kafka-with-kafka-connect-dbef1c340a76 . Not many details, so it's hard to tell how involved this approach is...
Any other options?
Thank you!
Enter the Kafka console and add some messages there. Once you have added them in the Kafka console, check whether they show up successfully in the Cloud Shell. For this, run the command:
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
It takes some time to sync with the Kafka console, so you may only get results after running this command a couple of times.
You run the commands in the Cloud Shell and see the output in the Kafka VM SSH session.
[screenshot: output shown in the Kafka VM]
Now verify the exact opposite direction, where you run the command in the Kafka VM and see the output in the Cloud Shell. It may take some time for the output to show up, and you may have to run
gcloud pubsub subscriptions pull from-kafka --auto-ack --limit=10
a couple of times to see it. Your output will look like this:
[screenshot: output shown in the Cloud Shell]
The Kafka plugin is deprecated. For more information, refer to https://cloud.google.com/stackdriver/docs/deprecations
Note: This functionality is only available for agents running on Linux. It is not available on Windows.
Kafka is monitored via JMX. The Monitoring agent supports Kafka version 0.8.2 and higher.
On your VM instance, download kafka-082.conf from the GitHub configuration repository and place it in the directory /etc/stackdriver/collectd.d/:
(cd /etc/stackdriver/collectd.d/ && sudo curl -O https://raw.githubusercontent.com/Stackdriver/stackdriver-agent-service-configs/master/etc/collectd.d/kafka-082.conf)
The downloaded plugin configuration file assumes that your Kafka server is configured to accept JMX connections on port 9999. If you have configured Kafka with a different JMX port, as root, edit the file and follow the instructions to change the JMX port settings.
After adding the configuration file, restart the Monitoring agent by running the following command:
sudo service stackdriver-agent restart
What is monitored:
https://cloud.google.com/monitoring/api/metrics_agent#agent-kafka

Dynamic creation of Kafka Connectors

I have deployed a Kafka cluster and a Kafka Connect cluster in Kubernetes, using Strimzi and AKS. I wanted to start reading from RSS resources to feed my Kafka cluster, so I created a connector instance of "org.kaliy.kafka.connect.rss.RssSourceConnector" which reads from a specific RSS feed, given a URL, and writes to a specific topic. But my whole intention with this is to eventually have a Kafka Connect cluster able to handle a lot of external requests for new RSS feeds to read from; and here is where all my doubts come in:
Should I create an instance of the Kaliy RSS connector for each RSS feed? Or would it be better to implement my own connector, so I create only one instance of it and each time I want to read a new RSS feed I create a new Task in the connector?
Who should be responsible for ensuring that the Kafka Connect cluster state is the desired one? I mean, if a Connector (in the case of 1 RSS feed : 1 Connector instance) stopped working, who should try to start it again? An external client via the Kafka Connect REST API? Kubernetes itself?
Right now, I think my best option is to rely on the Kafka Connect REST API, making the external client responsible for managing the state of the set of connectors, but I don't know whether it was designed to receive that many requests. Maybe it could be scaled by provisioning several listeners in the Kafka Connect REST API configuration, but I do not know.
Thanks a lot!
One of the main benefits in using Kafka Connect is to leverage a configuration-driven approach, so you will lose this by implementing your own Connector. In my opinion, the best strategy is to have one Connector instance for each RSS feed. Reducing the number of instances could make sense when having a single data source system, to avoid overloading it.
Using the Strimzi Operator, the Kafka Connect cluster is monitored and the operator will try to restore the desired cluster state when needed. This does not include the individual Connector instances and their tasks, but you can leverage the K8s API to monitor the Connector custom resource (CR) status instead of the REST API.
Example:
$ kubectl get kafkaconnector amq-sink -o yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
# ...
status:
  conditions:
  - lastTransitionTime: "2020-12-07T10:30:28.349Z"
    status: "True"
    type: Ready
  connectorStatus:
    connector:
      state: RUNNING
      worker_id: 10.116.0.66:8083
    name: amq-sink
    tasks:
    - id: 0
      state: RUNNING
      worker_id: 10.116.0.66:8083
    type: sink
  observedGeneration: 1
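For completeness, the connector itself can also be created declaratively through a KafkaConnector resource managed by Strimzi, so the operator (re)applies it for you. A rough sketch for the RSS connector might look like this (the config keys for the kaliy connector, the resource name and the cluster label value are assumptions; check the connector's documentation):

apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: rss-source-news                       # hypothetical name, one instance per feed
  labels:
    strimzi.io/cluster: my-connect-cluster    # must match your KafkaConnect resource name
spec:
  class: org.kaliy.kafka.connect.rss.RssSourceConnector
  tasksMax: 1
  config:
    rss.urls: "https://example.com/feed.xml"  # assumed config key for the feed URL
    topic: rss-news                           # assumed config key for the target topic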
This may be late, but it could help anyone who passes by this question: it is also worth having a look at the Kafka Connect CR (Custom Resource) that comes with Confluent for Kubernetes (CFK), which introduces a clear-cut declarative way to manage and monitor connectors, with health checks and auto-healing.
https://www.confluent.io/blog/declarative-connectors-with-confluent-for-kubernetes/

Is it possible to get a notification if kubernetes job fails

I would like to know if it is possible to send a notification using the YAML config if a Kubernetes job fails.
For example, I have a Kubernetes job which runs once every day. Currently I run a Jenkins job to check and send a notification if the job fails. Are there any options to get a notification directly from Kubernetes jobs if they fail? It should be something we add in the job YAML.
I'm not sure about any built-in notification support. That seems like the kind of feature you can find in external dedicated monitoring/notification tools such as Prometheus or Logstash output.
For example, you can try this tutorial to leverage the Prometheus metrics generated by default in many Kubernetes clusters: https://medium.com/@tristan_96324/prometheus-k8s-cronjob-alerts-94bee7b90511
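As a sketch of that approach (assuming kube-state-metrics and the Prometheus Operator are installed; the rule name, threshold and labels are made up), a PrometheusRule alerting on failed jobs could look like:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: job-failure-alerts           # hypothetical name; may need labels matching your Prometheus ruleSelector
spec:
  groups:
  - name: kubernetes-jobs
    rules:
    - alert: KubernetesJobFailed
      expr: kube_job_status_failed > 0          # metric exposed by kube-state-metrics
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Job {{ $labels.job_name }} in {{ $labels.namespace }} has failed"

Alertmanager then takes care of routing the alert to email, Slack, or whatever receiver you configure.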
Or you can theoretically set up Logstash, monitor incoming logs sent by Filebeat, and conditionally send alerts in the output stage of the pipeline via the "email output plugin".
Other methods exist as well as mentioned in this similar issue: How to send alerts based on Kubernetes / Docker events?
For reference, you may also wish to read this request as discussed in github: https://github.com/kubernetes/kubernetes/issues/22207

Filebeat kafka output use filename as key

I want to use Filebeat 5.4.0 to ship logs to Kafka. My logs are all Docker container logs, in /var/lib/docker/containers/*/${container_name}.log, or soft links in /var/log/containers/${appname}-${container_name}.log.
I want to save all app logs to one topic in Kafka, and my requirements are:
Make sure the logs from the same container go to the same partition,
in order.
The message must contain the appname and the container_name it comes from.
And I'm facing two problems:
How to get logs from a soft link?
How to get the appname and container_name from the filename, and set them as the key of output.kafka?
Beats are supposed to be lightweight; if you want to do more filtering, that is what Logstash is for. You can use Filebeat + Logstash + Kafka and, before sending to Kafka, use Logstash's split filter.
You can also use the 'type' property in Filebeat to map the log paths, like below:
...
  paths:
    - "/var/log/container/${appname}-${container_name}"
  document_type: log
output.kafka:
  ...
  key: '%{[type]}'
  ...
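If you do route the events through Logstash, a pipeline along these lines could extract the appname and container_name from the file path and use them as the Kafka message key (a sketch of the general idea, using grok rather than the split filter mentioned above; it assumes Filebeat 5.x, where the path arrives in the source field, and the port, broker address, topic and grok pattern are made up):

input {
  beats {
    port => 5044                             # Filebeat -> Logstash
  }
}

filter {
  # Parse "<appname>-<container_name>.log" out of the file path shipped by Filebeat
  grok {
    match => { "source" => "/var/log/containers/%{DATA:appname}-%{DATA:container_name}\.log" }
  }
}

output {
  kafka {
    bootstrap_servers => "kafka:9092"
    topic_id => "app-logs"                   # single topic for all apps
    message_key => "%{container_name}"       # same container -> same partition, preserving order
  }
}

Note that the grok pattern is only illustrative; if appnames themselves contain dashes you will need a stricter pattern to split the two parts reliably.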