How to use Kafka connect in Strimzi - kubernetes

I am using Kafka with the Strimzi operator. I created a Kafka cluster and also deployed Kafka Connect using a YAML file, but I do not know what to do next. I read that Kafka Connect is used to copy data from a source into a Kafka cluster, or from a Kafka cluster to another destination.
I want to use Kafka Connect to copy data from a file into a topic in the Kafka cluster.
Can anyone please help me with how to do that? Below is the YAML file I used to create my Kafka Connect cluster.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaConnect
metadata:
  name: my-connect-cluster
#  annotations:
#  # use-connector-resources configures this KafkaConnect
#  # to use KafkaConnector resources to avoid
#  # needing to call the Connect REST API directly
#    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.6.0
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9093
  tls:
    trustedCertificates:
      - secretName: my-cluster-cluster-ca-cert
        certificate: ca.crt
  config:
    group.id: connect-cluster
    offset.storage.topic: connect-cluster-offsets
    config.storage.topic: connect-cluster-configs
    status.storage.topic: connect-cluster-status
#kubectl create -f kafka-connect.yml -n strimzi
After that, the Kafka Connect pod is in Running status, but I don't know what to do next. Please help me.

Kafka Connect exposes a REST API, so you need to expose that HTTP endpoint from the Connect pods.
I read that Kafka connect is used to copy data from a source to Kafka cluster or from Kafka cluster to another destination.
That is one application, but it sounds like you want MirrorMaker2 instead for that.
If you don't want to use the REST API, then uncomment the annotations block in your KafkaConnect resource, including this line
# strimzi.io/use-connector-resources: "true"
and use another YAML file to configure the Connect resources, as shown here for Debezium. See kind: "KafkaConnector".
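For the file-to-topic case in the question, a KafkaConnector resource could look roughly like the sketch below. It assumes the use-connector-resources annotation is enabled, that the Connect image still ships the FileStream example connectors (newer Kafka releases no longer load them by default), and that the installed Strimzi CRDs serve this apiVersion for KafkaConnector (older releases use v1alpha1 for this kind). The file path and topic name are placeholders, not values from the question.

```yaml
# Sketch only: a file source connector managed by the Strimzi operator.
apiVersion: kafka.strimzi.io/v1beta2   # assumption: use the version your CRDs actually serve
kind: KafkaConnector
metadata:
  name: my-file-source                 # hypothetical name
  labels:
    # Must match metadata.name of the KafkaConnect resource.
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.kafka.connect.file.FileStreamSourceConnector
  tasksMax: 1
  config:
    file: "/opt/kafka/LICENSE"         # placeholder: any file readable inside the Connect pod
    topic: my-topic                    # placeholder topic name
```

The file has to exist inside the Connect pod, so this is mostly useful as a smoke test before moving on to a real source.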

Look at this simple example from scratch. Not really what you want to do, but pretty close. We are sending messages to a topic using the kafka-console-producer.sh and consuming them using a file sink connector.
The example also shows how to include additional connectors by creating your own custom Connect image, based on the Strimzi one. This step would be needed for more complex examples involving external systems.
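As an alternative to writing a Dockerfile by hand, newer Strimzi releases can build the custom image from a build section in the KafkaConnect resource. This is a different technique from the one in the linked example, and it is only a sketch: the registry, image name and push secret are hypothetical, and the plugin shown is simply the MongoDB connector as an illustration.

```yaml
# Sketch only: let the operator build a Connect image with an extra plugin.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect-cluster
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  build:
    output:
      type: docker
      image: registry.example.com/my-org/my-connect:latest   # hypothetical registry and image
      pushSecret: my-registry-credentials                    # hypothetical Secret with push credentials
    plugins:
      - name: mongodb-connector
        artifacts:
          - type: maven
            group: org.mongodb.kafka
            artifact: mongo-kafka-connect
            version: 1.8.0
```

Whether the build section is available depends on the Strimzi version in use, so check the documentation for your release before relying on it.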

Related

no matches for kind "KafkaConnect" in version "kafka.strimzi.io/v1beta2"

I'm trying to work with Strimzi to create a kafka-connect cluster and am encountering the following error:
unable to recognize "kafka-connect.yaml": no matches for kind "KafkaConnect" in
version "kafka.strimzi.io/v1beta2"
Here's the kafka-connect.yaml I have:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect
  namespace: connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  version: 2.4.0
  replicas: 1
  bootstrapServers: host:port
  tls:
    trustedCertificates:
      - secretName: connectorsecret
        certificate: cert
  config:
    group.id: o
    offset.storage.topic: strimzi-connect-cluster-offsets
    config.storage.topic: strimzi-connect-cluster-configs
    status.storage.topic: strimzi-connect-cluster-status
    sasl.mechanism: scram-sha-256
    security.protocol: SASL_SSL
    secretName: connectorsecret
    sasl.jaas.config: org.apache.kafka.common.security.scram.ScramLoginModule required username=username password=password
Then I tried to apply the config via kubectl apply -f kafka-connect.yaml
Are there any prerequisites for creating resources with Strimzi, or is there something I'm doing wrong?
I think there are two possibilities:
You did not install the CRD resources
You are using a Strimzi version which is too old and does not support the v1beta2 API
Judging by the fact that you are trying to use Kafka 2.4.0, I guess the second option is more likely. If you really want to do that, you should make sure to use the documentation, examples and everything else from the version of Strimzi you use; they should be using one of the older APIs (v1alpha1 or v1beta1).
But in general, I would strongly recommend using the latest version of Strimzi and not a version which is several years old.
One more note: if you want to configure SASL authentication for your Kafka Connect cluster, you should do it in the .spec.authentication section of the custom resource and not in .spec.config.
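To illustrate that last note, here is a minimal sketch of the same resource with SCRAM moved into .spec.authentication. It assumes a Strimzi release that supports the scram-sha-256 authentication type (scram-sha-512 is the longer-standing option) and that the password is stored in a Secret; the key name password inside connectorsecret is a hypothetical choice.

```yaml
# Sketch only: SASL/SCRAM configured through .spec.authentication.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: kafka-connect
  namespace: connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  replicas: 1
  bootstrapServers: host:port
  tls:
    trustedCertificates:
      - secretName: connectorsecret
        certificate: cert
  authentication:
    type: scram-sha-256          # assumption: supported by your Strimzi version
    username: username
    passwordSecret:
      secretName: connectorsecret
      password: password         # hypothetical key inside the Secret holding the password
  config:
    group.id: o
    offset.storage.topic: strimzi-connect-cluster-offsets
    config.storage.topic: strimzi-connect-cluster-configs
    status.storage.topic: strimzi-connect-cluster-status
```

With this layout the operator generates the client security settings itself, so the sasl.* and security.protocol entries are no longer needed under config.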

Atlas MongoDB sink connector for strimzi kafka setup

I am new to the Kafka space. I have set up the Strimzi cluster operator, Kafka bootstrap server, entity operator, and Kafka Connect in Kubernetes following the guidelines below:
https://strimzi.io/docs/operators/latest/deploying.html
How do I set up a Kafka MongoDB sink connector for a Strimzi Kafka Connect cluster?
I have the official MongoDB connector plugin. Can I use this plugin to connect to Atlas MongoDB?
Most of the forums explain this for Confluent Kafka but not for Strimzi Kafka.
Below is my Kafka Connect config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-mongo-connect
  annotations:
    strimzi.io/use-connector-resources: "true"
spec:
  image: STRIMZI KAFKA CONNECT IMAGE WITH MONGODB PLUGIN
  version: 3.2.1
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  logging:
    type: inline
    loggers:
      connect.root.logger.level: "INFO"
  config:
    group.id: my-cluster
    offset.storage.topic: mongo-connect-cluster-offsets
    config.storage.topic: mongo-connect-cluster-configs
    status.storage.topic: mongo-connect-cluster-status
    key.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: true
    value.converter.schemas.enable: true
    config.storage.replication.factor: -1
    offset.storage.replication.factor: -1
    status.storage.replication.factor: -1
Below is my sink connector config:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mongodb-sink-connector
  labels:
    strimzi.io/cluster: my-cluster
spec:
  class: com.mongodb.kafka.connect.MongoSinkConnector
  tasksMax: 2
  config:
    topics: my-topic
    connection.uri: "MONGO ATLAS CONNECTION STRING"
    database: my_database
    collection: my_collection
    post.processor.chain: com.mongodb.kafka.connect.sink.processor.DocumentIdAdder,com.mongodb.kafka.connect.sink.processor.KafkaMetaAdder
    key.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable: false
But the above setup is not working, even though my Kafka server is up and running and the producer-consumer example works.
Is the official MongoDB plugin (Maven Central Repository Search) appropriate for this, or should I use the Debezium MongoDB connector?
If anyone can shed some light on a step-by-step guideline in this regard, that would be of great help.
Thanks in advance.
Since the comment section is getting longer, I'm posting some of the answers to the questions asked in the comments.
Below is my dockerfile:
FROM quay.io/strimzi/kafka:0.31.0-kafka-3.2.1
USER root:root
COPY ./my-plugins/ /opt/kafka/plugins/
USER 1001
my-plugins folder contains two jar files mongo-kafka-connect-1.8.0.jar and mongodb-driver-core-4.7.1.jar. I'm not sure if I need the core-driver plugin for atlas mongodb, but anyway I have it there.
I changed the producer-consumer example version to the following:
kubectl -n kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.31.0-kafka-3.2.1 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list my-cluster-kafka-bootstrap:9092 --topic my-topic
To summarize, my Strimzi operator is 0.31.0 and kafka-connect is set to 3.2.1, which is aligned with the compatibility table here under supported versions.
Regarding adding the tls spec section in kafka-connect: the liveness probe fails saying "failed to connect to the IP-ADDRESS" and the pod keeps restarting.
Below is my tls spec, where the cert is present in my Kafka server:
tls:
  trustedCertificates:
    - secretName: my-cluster-cluster-ca-cert
      certificate: ca.crt
I removed the key converter configs as suggested. But what should the group.id be in my kafka-connect?
I also changed storage.topic config to the following:
offset.storage.topic: mongo-connect-cluster-offsets
config.storage.topic: mongo-connect-cluster-configs
status.storage.topic: mongo-connect-cluster-status
I referred to this blog for the above settings.
The output of kubectl -n kafka exec -it YOUR-KAFKA-CONNECT-POD -- curl http://localhost:8083/connectors is [], so is there a problem here?
The output of kubectl -n kafka exec -it YOUR-KAFKA-CONNECT-POD -- bin/kafka-topics.sh --bootstrap-server my-cluster-kafka-bootstrap:9092 --list is:
__consumer_offsets
__strimzi-topic-operator-kstreams-topic-store-changelog
__strimzi_store_topic
mongo-connect-cluster-configs
mongo-connect-cluster-offsets
mongo-connect-cluster-status
my-topic
Below are the pod logs for kafka-connect; I will find a way to share the whole log files soon.
So, where am I going wrong? Also, how do I verify that the data flow is happening the way it is intended to?
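One detail that stands out when comparing the two resources in the question: the strimzi.io/cluster label on a KafkaConnector has to name the KafkaConnect resource that should run it. The KafkaConnect above is called my-mongo-connect, while the connector's label says my-cluster. That mismatch may be why the connector list comes back empty, although the missing logs leave other causes open. A sketch of the connector with only the label changed:

```yaml
# Sketch only: same sink connector, with the cluster label pointing at the KafkaConnect resource.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: mongodb-sink-connector
  labels:
    # Name of the KafkaConnect resource, not the Kafka cluster.
    strimzi.io/cluster: my-mongo-connect
spec:
  class: com.mongodb.kafka.connect.MongoSinkConnector
  tasksMax: 2
  config:
    topics: my-topic
    connection.uri: "MONGO ATLAS CONNECTION STRING"
    database: my_database
    collection: my_collection
    key.converter: org.apache.kafka.connect.json.JsonConverter
    key.converter.schemas.enable: false
    value.converter: org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable: false
```

The KafkaConnector also needs to live in the same namespace as the KafkaConnect resource. Once the operator picks it up, the connector should appear in the /connectors output, and the status section of the KafkaConnector resource reports the connector and task state.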

how to pass MongoDB tls certificates while creating debezium mongodb kafka connector?

We have a MongoDB cluster with three replicas. I have enabled preferred TLS and set the authentication type to MongoDB-X509.
We have a three-broker Strimzi Kafka cluster and a Connect cluster with all required plugins (i.e. the MongoDB connector provided by Debezium) up and running.
Below is the partial connect.yaml file used for the Kafka Connect deployment:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect
spec:
  config:
    config.providers: directory
    config.providers.directory.class: org.apache.kafka.common.config.provider.DirectoryConfigProvider
  externalConfiguration:
    volumes:
      - name: connector-config
        secret:
          secretName: mysecret
The deployment works fine and I am able to see the ca.pem and mongo-server.pem files in the /opt/kafka/external-configuration/connector-config directory.
After that I tried to create the MongoDB connector with the configuration file given below, but I am not sure about the exact way of passing the certificates, as there is no sample configuration file available for MongoDB connectors. Could you please help by providing a sample configuration?
I tried the configuration file below:
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: io.debezium.connector.mongodb.MongoDbConnector
  tasksMax: 2
  config:
    ssl.truststore.type: PEM
    ssl.truststore.location: "${directory:/opt/kafka/external-configuration/connector-config:ca.pem}"
    ssl.keystore.type: PEM
    ssl.keystore.location: "${directory:/opt/kafka/external-configuration/connector-config:mongo-server.pem}"
    "mongodb.hosts": "rs0/192.168.99.100:27017"
    "mongodb.name": "fullfillment"
    "collection.include.list": "inventory[.]*"
    "mongodb.ssl.enabled": true
    "mongodb.ssl.invalid.hostname.allowed": true
But it threw a syntax error. Could you please help by providing a sample MongoDB connector.yaml?
As for Strimzi, you can use the external configuration to mount Secrets or Config Maps into the Strimzi Kafka Connect deployment: https://strimzi.io/docs/operators/latest/full/using.html#type-ExternalConfiguration-reference. Once it is loaded into the Pods, you can either just refer to it using the file path, or you can use the Kafka configuration providers to load the data into the configuration options.
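The two approaches mentioned above behave differently, which the following sketch tries to show. A plain path hands the connector a file location, while a ${directory:...} placeholder is replaced with the contents of the file. The option ssl.truststore.location is copied from the question, and the source does not confirm that the Debezium MongoDB connector honors it; the Secret key mongo-password is hypothetical and only illustrates the provider syntax.

```yaml
# Sketch only: file path versus config provider in a KafkaConnector.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: my-source-connector
  labels:
    # Must match the KafkaConnect resource name (my-connect in the question).
    strimzi.io/cluster: my-connect
spec:
  class: io.debezium.connector.mongodb.MongoDbConnector
  tasksMax: 2
  config:
    mongodb.hosts: "rs0/192.168.99.100:27017"
    mongodb.name: "fullfillment"
    mongodb.ssl.enabled: true
    # Option 1: options that expect a location get the mounted file path directly.
    # (Key taken from the question; not confirmed to be read by this connector.)
    ssl.truststore.location: /opt/kafka/external-configuration/connector-config/ca.pem
    # Option 2: options that expect a value can load it through the directory provider.
    # "mongo-password" is a hypothetical key in the mounted Secret.
    mongodb.password: "${directory:/opt/kafka/external-configuration/connector-config:mongo-password}"
```

Using the directory provider for a *.location option would substitute the certificate text where a path is expected, so the plain path is the safer choice for such options.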

Is it possible to access Zookeeper in Strimzi Kafka installed with Route listener type on OpenShift?

I have a Strimzi Kafka cluster on OpenShift, configured as described here:
https://strimzi.io/blog/2019/04/30/accessing-kafka-part-3/
Basically like this:
kind: Kafka
metadata:
  name: ...
spec:
  kafka:
    version: 2.7.0
    replicas: 2
    listeners:
      plain: {}
      tls:
        authentication:
          type: tls
      external:
        type: route
        tls: true
        authentication:
          type: tls
    authorization:
      type: simple
According to the article above, I can only access the bootstrap server via port 443. Basically, this setup works and does what I need.
I am wondering if I can get external access to ZooKeeper to manage the cluster via the command line from my machine. If yes, should I download the Kafka binaries and use the CLI from the archive? Or do I need to log in to the ZooKeeper pod (e.g. via the OpenShift UI) and manage the Kafka cluster via the CLI from there?
Thanks in advance.
Strimzi does not provide any access to ZooKeeper. It is locked down using mTLS and network policies. If you really need it, you can use this unofficial project https://github.com/scholzj/zoo-entrance and create a route manually yourself. But it is not secure, so use it at your own risk. Opening a terminal inside the ZooKeeper pod would be an option as well. But in most cases, you should not need ZooKeeper access today, as Kafka is preparing for its removal anyway.

Installing kafka and zookeeper cluster using kubernetes

Can anyone share a YAML file for creating a Kafka cluster with two Kafka brokers and a ZooKeeper cluster with 3 servers? I'm new to Kubernetes.
Take a look at https://github.com/Yolean/kubernetes-kafka. Make sure the broker memory limit is 2 GB or above.
Maintaining a reliable Kafka cluster in Kubernetes is still a challenge, good luck.
I recommend you try the Strimzi Kafka Operator. With it you can define a Kafka cluster just like any other Kubernetes object, by writing a YAML file. Moreover, users, topics and Kafka Connect clusters are also just k8s objects. Some (but not all!) features of the Strimzi Kafka Operator:
Secure communication between brokers and between brokers and zookeeper with TLS
Ability to expose the cluster outside k8s cluster
Deployable as a helm chart (it simplifies things a lot)
Rolling updates when changing cluster configuration
Smooth scaling out
Ready to monitor the cluster using Prometheus and Grafana.
It's also worth mentioning the great documentation.
Creating a Kafka cluster is as simple as applying a Kubernetes manifest like this:
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.2.0
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.2"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi
          deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
I think you could take a look at the Strimzi project here: https://strimzi.io/.
It's based on the Kubernetes operator pattern and provides a simple way to deploy and manage a Kafka cluster on Kubernetes using custom resources.
The Kafka cluster is described through a new "Kafka" resource YAML file where you set everything you need.
The operator takes care of that and deploys the ZooKeeper ensemble and the Kafka cluster for you.
It also deploys two more operators for handling topics and users (but they are optional).
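As a small illustration of what the topic operator handles, a topic can be declared as a custom resource too. This is a sketch assuming a Kafka resource named my-cluster with the topic operator enabled and CRDs that serve v1beta2; the topic name and sizing are placeholders.

```yaml
# Sketch only: a topic managed by the Strimzi topic operator.
apiVersion: kafka.strimzi.io/v1beta2   # assumption: older Strimzi releases serve v1beta1 or v1alpha1
kind: KafkaTopic
metadata:
  name: my-topic                       # placeholder topic name
  labels:
    # Name of the Kafka resource this topic belongs to.
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 2                          # must not exceed the number of brokers
```

Applying it with kubectl creates the topic in the cluster, and later edits to the resource are reconciled by the operator.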
Another simple configuration of Kafka/ZooKeeper on Kubernetes in DigitalOcean with external access:
https://github.com/StanislavKo/k8s_digitalocean_kafka
You can connect to Kafka from outside of AWS/DO/GCE using the regular binary protocol. The connection is PLAINTEXT or SASL_PLAINTEXT (username/password).
The Kafka cluster is a StatefulSet, so you can scale the cluster easily.