Dynamic creation of Kafka Connectors - apache-kafka

I have deployed a Kafka cluster and a Kafka Connect cluster in kubernetes, using Strimzi and AKS. And I wanted to start reading from RSS resources to feed my Kafka cluster, so I created a connector instance of "org.kaliy.kafka.connect.rss.RssSourceConnector" which reads from a specific RSS feed, given an url, and writes to a specific topic. But my whole intention with this is to eventually have a Kafka Connect cluster able to manage a lot of external requests of new RSSs to read from; and here is where all my doubts come in:
Shoud I create an instance of Kaliy RSS connector for each RSS feed? Or would it be better to implement my own connector, so I create only one instance of it and each time I want to read a new RSS feed I would create a new Task in the connector?
Who should be resposible of assuring the Kafka Connect Cluster state is the desired one? I mean that if a Connector(in the case of 1 RSS feed : 1 Connector instance) stopped working, who should try to start it again? An external client via the Kafka Connect REST API? Kubernetes itself?
Right now, I think my best option is to rely on Kafka Connect REST API making the external client responsible of managing the state of the set of connectors, but I don't know if these was designed to recieve a lot of requests as it would be the case. Maybe these could be scaled by provisioning several listeners in the Kafka Connect REST API configuration but I do not know.
Thanks a lot!

One of the main benefits in using Kafka Connect is to leverage a configuration-driven approach, so you will lose this by implementing your own Connector. In my opinion, the best strategy is to have one Connector instance for each RSS feed. Reducing the number of instances could make sense when having a single data source system, to avoid overloading it.
Using Strimzi Operator, Kafka Connect cluster would be monitored and it will try to restore the desired cluster state when needed. This does not include the single Connector instances and their tasks, but you may leverage the K8s API for monitoring the Connector custom resource (CR) status, instead of the REST API.
Example:
$ kubetctl get kafkaconnector amq-sink -o yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
# ...
status:
conditions:
- lastTransitionTime: "2020-12-07T10:30:28.349Z"
status: "True"
type: Ready
connectorStatus:
connector:
state: RUNNING
worker_id: 10.116.0.66:8083
name: amq-sink
tasks:
- id: 0
state: RUNNING
worker_id: 10.116.0.66:8083
type: sink
observedGeneration: 1

It could be late, but it could help anyone will pass by the question, It is more relevant to have a look at Kafka-connect CR (Custom Resources) as a part of Confluent For Kubernetes (CFK), it introduces a clear cut declarative way to manage and monitor Connector with health checks and auto healing.
https://www.confluent.io/blog/declarative-connectors-with-confluent-for-kubernetes/

Related

How can I install connector config in kafka connect

Is there any other way to deploy connector config rather than POSTing connector config to kafka connect REST api? https://docs.confluent.io/platform/current/connect/references/restapi.html#tasks
I am thinking of any form of persistent approach like a volume or s3, where connect during bootstrap would grap those configs would be great. Don't know/can't find if thats anywhere available.
regards
The REST API is the only way.
You can use abstractions like Terraform or Kubernetes resources, however, which wrap an HTTP client.
If you use other storage, that'll require you to write extra code to download files and call the REST API.

Create a cronjob in Knative publishing messages to Kafka

I would like to create a cronjob via Knative that sends healthcheck messages to my Kafka topic every 10 minutes.
Then we will have a separate endpoint created, that will receive these messages and pass some response to a receiver (healthcheck).
Tried using KafkaBinding, which seems suitable, but cannot apply this template (TLS): https://github.com/knative/docs/tree/main/code-samples/eventing/kafka/binding
I also find it odd, that regular KafkaBinding template contains apiVersion: bindings.knative.dev/v1beta1, while the one with TLS: sources.knative.dev/v1beta1.
Haven't found much documentation on how to create a Cronjob sending messages on Kafka, which is then grabbed using KafkaSource and passed using broker+trigger to my service on K8s.
Anyone got it implemented right? Or found another way for this?
The idea is test the whole flow including KafkaSource, which seems a little bit unstable.

Consume events from AWS EventBridge in a self hosted kafka cluster outsite aws

We got a SaaS which is publishing it's events on AWS eventbridge (coulple of milion per day). We would love to consume those events and put them on our self hosted Kafka cluster. What would be the best methode to do this? We where thinking about lambda's, but the seem expensive for this use case and we don't to to manage to many other components. Does anyone have some good practices?
i was looking for a similar solution but in my case it is from eventbridge to MSK within AWS account. at this point looks like the only option is to use a lambda to feed into Kafka.
As per today, AWS only supports following Targets - https://docs.amazonaws.cn/en_us/eventbridge/latest/userguide/eb-targets.html#eb-console-targets
I have a similar use case where i need to send a message to AWS RabbitMQ or even to AWS Kafka as i need Priority Logic Implemented.
As AWS supports Lambda's, the message can be forwarded to the lambda from where it can be fed to any system

Ignite: work with messaging and less dependencies

I'm writing a microservice for an existing Ignite cluster. I need to have basic communications with Ignite Messaging system, and don't need other Ignite capabilities. I don't want to include Ignite libraries as it will bloat my microservice - ignite.zip is about 10 times larger than my server and I only need a small subset of functionality.
How can I send messages to existing Ignite cluster and receive messages from it?
EDIT: Ignite documentation lists REST API as one of ways to use Ignite. I'm not sure how it can be used to work with Ignite messaging - suppose I want to receive message as soon as it becomes available in the Ignite messaging? I don't want to poll for messages, as that's not efficient enough for me. If using REST API, the question becomes: how (if it's possible) to receive message using Ignite REST API from the distributed messaging system?
To do this you need only one JAR - ignite-core, which doesn't have any additional dependencies.
To achieve functionality, you can start a client node in your application and use IgniteMessaging API: https://apacheignite.readme.io/docs/messaging

How to use kafka and storm on cloudfoundry?

I want to know if it is possible to run kafka as a cloud-native application, and can I create a kafka cluster as a service on Pivotal Web Services. I don't want only client integration, I want to run the kafka cluster/service itself?
Thanks,
Anil
I can point you at a few starting points, there would be some work involved to go from those starting points to something fully functional.
One option is to deploy the kafka cluster on Cloud Foundry (e.g. Pivotal Web Services) using docker images. Spotify has Dockerized kafka and kafka-proxy (including Zookeeper). One thing to keep in mind is that PWS currently doesn't support apps with persistence (although this work is starting) so if you were to go this route right now, you would lose the data in kafka when the application is rolled. Looking at that Spotify repo, it looks like the docker images are generally run without any mounted volumes, so this persistence-less kafka seems like it may be a valid use case (I don't know enough about kafka to say).
The other option is to deploy kafka directly on some IaaS (e.g. AWS) using BOSH. BOSH can be hard if you're seeing it for the first time, but it is the ideal way to deploy any distributed software that you want running on VMs. You will also be able to have persistent volumes attached to your kafka VMs if necessary. Here is a kafka BOSH release which may work.
Once you have your cluster running, you have two ways to integrate your Cloud Foundry applications with it. The simplest is just to provide it to your applications as a "user-provided service", which lets you flow kafka cluster access info to your apps. The alternative would to put a service broker in front of your cluster, which would be especially useful if you have many different people who will be pushing apps that need to talk to the kafka cluster. Rather than you having to manually tell people the access info each time, they can do something simple like cf bind-service SOME_APP YOUR_KAFKA_SERVICE. Here is a kafka service broker along with more info about service brokers in general.
According to the 12-factor app description (https://12factor.net/processes), Kafka should not run as an application on top of Cloud Foundry:
Twelve-factor processes are stateless and share-nothing. Any data that needs to persist must be stored in a stateful backing service, typically a database.
Kafka is often considered a "distributed commit log" and as such carries a large amount of state. Many companies use it to keep all events flowing through their distributed system of micro services for a long (sometimes unlimited) amount of time.
Therefore I would strongly recommend to go for the second option in the accepted answer: Kafka topics should be bound to your applications in the form of stateful services.