Confluent Schema Registry Cluster Mode - apache-kafka

I am using Kafka Connect from Confluent to consume a Kafka stream and write it to HDFS in Parquet format. I am running the Schema Registry service on a single node and it is working fine. Now I want to run Schema Registry in cluster mode to handle failover. Any link or snippet on how to achieve that would be very useful.

It is hard to find, but we covered this architecture in our documentation:
http://docs.confluent.io/3.0.0/schema-registry/docs/deployment.html#multi-dc-setup
To quote from the docs a bit (although you should read the docs, lots of good architecture advice and a recovery runbook are included):
Assuming you have Schema Registry running, here are the recommended steps to add Schema Registry instances in a new “slave” datacenter (call it DC B):
In DC B, make sure Kafka has unclean.leader.election.enable set to false.
In Kafka in DC B, create the _schemas topic. It should have 1 partition, kafkastore.topic.replication.factor of 3, and min.insync.replicas at least 2.
In DC B, run MirrorMaker with Kafka in the “master” datacenter as the source and Kafka in DC B as the target.
In the Schema Registry config files in DC B, set kafkastore.connection.url and schema.registry.zk.namespace to match the instances already running, and set master.eligibility to false.
Start your new Schema Registry instances with these configs.
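As a rough illustration of the last two quoted steps, the shell sketch below writes a properties file for a DC B instance. The function name, the output file, the listener line, and the sample arguments are assumptions made for this example; only kafkastore.connection.url, schema.registry.zk.namespace, and master.eligibility come from the quoted docs.

```shell
#!/usr/bin/env bash
# Sketch: generate a Schema Registry config for a non-master (DC B) instance.
# write_dc_b_config and the listener port are hypothetical choices.
write_dc_b_config() {
  local zk_url="$1"      # must match kafkastore.connection.url of the running instances
  local namespace="$2"   # must match schema.registry.zk.namespace of the running instances
  local out_file="$3"    # where to write the properties file
  cat > "$out_file" <<EOF
listeners=http://0.0.0.0:8081
kafkastore.connection.url=${zk_url}
schema.registry.zk.namespace=${namespace}
master.eligibility=false
EOF
}
```

For example, write_dc_b_config ip1:2181,ip2:2181,ip3:2181 schema_registry schema-registry-dc-b.properties would produce a file that can be passed to the Schema Registry start script; the argument values here are placeholders.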

I used the Confluent schema-registry Docker image to form the cluster.
docker run --restart always -d -p 8081:8081 --name=schema-registry-1 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-1 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-2 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-2 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-3 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-3 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
Once the containers were up and running, I verified that the Schema Registry cluster had formed and that leader election had succeeded by checking the ZooKeeper contents.
$ docker exec -it zookeeper bash
# /usr/bin/zookeeper-shell localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[schema_registry, cluster, controller, brokers, zookeeper, admin, isr_change_notification, log_dir_event_notification, controller_epoch, kafka-manager, CruiseControlBrokerList, consumers, latest_producer_id_block, config]
[zk: localhost:2181(CONNECTED) 1] ls /schema_registry
[schema_registry_master, schema_id_counter]
[zk: localhost:2181(CONNECTED) 4] get /schema_registry/schema_registry_master
{"host":"schema-registry-1","port":8081,"master_eligibility":true,"scheme":"http","version":1}
#
Hope this helps.
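If you only want the name of the current master, the JSON stored in the znode can be filtered with sed. This is a minimal sketch; master_host is a hypothetical helper name, and it assumes the payload has the same shape as the one shown above.

```shell
#!/usr/bin/env bash
# Sketch: print the "host" field from the JSON stored in
# /schema_registry/schema_registry_master. Reads the JSON on stdin.
master_host() {
  sed -n 's/.*"host":"\([^"]*\)".*/\1/p'
}

# Example using the payload shown in the session above:
echo '{"host":"schema-registry-1","port":8081,"master_eligibility":true,"scheme":"http","version":1}' | master_host
```

Against a live cluster, the output of a one-shot zookeeper-shell get on that znode could be piped into master_host in the same way; lines that do not contain a host field are ignored.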

You just need to put this in connect-avro-distributed.properties to use multiple Schema Registry instances:
key.converter.schema.registry.url=http://node1:8081,http://node2:8081
value.converter.schema.registry.url=http://node1:8081,http://node2:8081
Hope this is useful for you.
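With more than a couple of registry nodes, the URL list can be generated instead of typed by hand. A small sketch, assuming every node listens on the same port; registry_urls and the PORT variable are names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: build the comma-separated registry URL list from host names.
# registry_urls is a hypothetical helper; PORT defaults to 8081.
registry_urls() {
  local port="${PORT:-8081}" urls="" host
  for host in "$@"; do
    # Add a comma only when the list already has an entry.
    urls="${urls:+${urls},}http://${host}:${port}"
  done
  printf '%s\n' "$urls"
}

urls="$(registry_urls node1 node2)"
printf 'key.converter.schema.registry.url=%s\n' "$urls"
printf 'value.converter.schema.registry.url=%s\n' "$urls"
```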

Don't forget to set master.eligibility=true on all the nodes.

Related

Kafdrop (localhost/127.0.0.1:9092) could not be established. Broker may not be available

I set up Kafka and ZooKeeper on my local machine and I would like to use Kafdrop as the UI. I tried running it with the docker command below:
docker run -d --rm -p 9000:9000 \
-e KAFKA_BROKERCONNECT=<localhost:9092> \
-e JVM_OPTS="-Xms32M -Xmx64M" \
-e SERVER_SERVLET_CONTEXTPATH="/" \
obsidiandynamics/kafdrop
and I get -bash: https://locahost:9092: No such file or directory
When I remove the KAFKA_BROKERCONNECT parameter, the application runs, but I get the error below:
[AdminClient clientId=kafdrop-admin] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available.
2020-07-22 09:39:29.108 WARN 1 [| kafdrop-admin] o.a.k.c.NetworkClient
I also set the Kafka server's listeners setting to the following, but it did not help.
listeners=PLAINTEXT://localhost:9092
I found this similar issue on GitHub but couldn't understand most of the answers.
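Kafka is not HTTP-based. You do not need a URL scheme (such as https://) to connect to Kafka, and the angle brackets must not be used: in bash, < is input redirection, which is why you get the "No such file or directory" error.
You also cannot use localhost, because inside the container that refers to the Kafdrop container itself, not to Kafka.
I suggest you use Docker Compose with Kafdrop and Kafka.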
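For reference, the original command with those points applied might look like the sketch below. It rests on assumptions: host.docker.internal only resolves to the host on Docker Desktop (Mac/Windows), and Kafka's advertised.listeners must also advertise an address the container can reach, which is not shown here. The DOCKER variable is just a hook for a dry run.

```shell
#!/usr/bin/env bash
# Sketch: start Kafdrop with a plain host:port broker address
# (no URL scheme, no angle brackets). BROKER is an assumed address.
BROKER="${BROKER:-host.docker.internal:9092}"
DOCKER="${DOCKER:-docker}"   # set DOCKER=echo to print the command instead of running it

run_kafdrop() {
  "$DOCKER" run -d --rm -p 9000:9000 \
    -e KAFKA_BROKERCONNECT="$BROKER" \
    -e JVM_OPTS="-Xms32M -Xmx64M" \
    -e SERVER_SERVLET_CONTEXTPATH="/" \
    obsidiandynamics/kafdrop
}
```

Calling run_kafdrop starts the container; DOCKER=echo run_kafdrop only prints the arguments, which is handy for checking the broker address before anything is launched.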
I followed what @OneCricketeer said in his answer and everything worked perfectly.
Here is what I did:
I downloaded the compose file from the GitHub link above.
I ran the compose file by going to the directory that contains it and running docker-compose up.
Stop all your local Kafka servers and ZooKeeper beforehand, because everything is downloaded and started by the docker-compose command.
After that, go to http://localhost:9000/ and voilà.

How to purge all kafka data for fresh start in a dev environment

Sometimes it's necessary to fresh-start a Kafka cluster with no data. When running Kafka inside Docker containers, this behavior comes for free.
How can I do it with a plain Kafka process? Can I delete /var/log/kafka* and restart it? Is it OK to do so?
BTW, I am using something like this:
# bash shell
# tl is a file listing all topics, one per line
for t in $(cat tl); do
  ./kafka-topics.sh --zookeeper "$ZOO" --delete --topic "$t"
done
There are two problems with the above:
If disk usage is at 100%, I get an error when running kafka-topics.sh.
It is very inefficient if I have many topics.
I'm looking for a fast and clean way to do this in dev environments.
This seems to do the job:
$ ###### stop and clear all brokers
$ sudo systemctl stop kafka.service zookeeper.service
$ sudo rm -rf /var/log/kafka-logs/*
$ ###### continue ONLY after finish the above on all brokers
$ sudo systemctl start zookeeper.service
$ sleep 10s # make sure zookeeper is ready
$ sudo systemctl start kafka.service
I would suggest temporarily setting log.retention.ms to a low value, say 1000. This tells Kafka to keep messages for only one second before deleting them for good. Wait a little while; Kafka will delete all of your messages (except those in active segments), and once that is done you can revert the log.retention.ms setting.
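One way to try this without editing the broker config is the per-topic retention.ms override, set through kafka-configs.sh. Note that this swaps in the topic-level setting rather than the broker-wide log.retention.ms named above. It is a sketch under assumptions: the script path and ZooKeeper address are placeholders, and the --zookeeper form matches the older tooling used in the question (newer Kafka releases take --bootstrap-server instead).

```shell
#!/usr/bin/env bash
# Sketch: temporarily lower retention on a topic, then remove the override.
# Uses the per-topic retention.ms override, not the broker-wide log.retention.ms.
KAFKA_CONFIGS="${KAFKA_CONFIGS:-./kafka-configs.sh}"  # path is an assumption
ZOO="${ZOO:-localhost:2181}"                            # ZooKeeper address, assumed

set_retention() {    # usage: set_retention <topic> <millis>
  "$KAFKA_CONFIGS" --zookeeper "$ZOO" --alter \
    --entity-type topics --entity-name "$1" --add-config "retention.ms=$2"
}

clear_retention() {  # usage: clear_retention <topic>
  "$KAFKA_CONFIGS" --zookeeper "$ZOO" --alter \
    --entity-type topics --entity-name "$1" --delete-config retention.ms
}
```

A typical run would call set_retention for each topic with 1000, wait for the log cleaner to remove the old segments, and then call clear_retention so the topic falls back to the broker default.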

How can I get into the Zookeeper that is integrated in Kafka? ( 2.2.0 )

I have installed a Kafka distribution that comes with an integrated ZooKeeper.
I have seen that, with a standalone ZooKeeper installation, you can run the following command to enter the ZooKeeper console:
bin/zkCli.sh
ls /zookeeper/quota
But in Kafka's scripts I only have:
zookeeper-security-migration.sh
zookeeper-server-start.sh
zookeeper-server-stop.sh
zookeeper-shell.sh
I have tried to do the following:
./zookeeper-shell.sh -server 127.0.0.1:2181 ls /zookeeper/quota
But it doesn't work; it doesn't do anything.
How can I get into the Zookeeper that is integrated in Kafka?
After starting Zookeeper, you can connect to it using the zookeeper-shell.sh tool.
To get into the shell:
./zookeeper-shell.sh IP:2181
Then you can execute commands, like:
ls /
Use ls <path> to list the children of a node and get <path> to print a node's content (the shell has no cd command).
You can also use this script to run a single command and return (without entering the shell):
./zookeeper-shell.sh localhost:2181 get /controller
Note that /zookeeper/quota is not a path used by Kafka; Kafka's quotas are stored under /config.
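If you run one-shot commands often, a tiny wrapper saves retyping the address. This is only a convenience sketch; zk, ZK_SHELL, and ZK_ADDR are names made up for the example, and the defaults assume you are in Kafka's bin directory with ZooKeeper on localhost.

```shell
#!/usr/bin/env bash
# Sketch: wrapper around zookeeper-shell.sh for one-shot commands.
ZK_SHELL="${ZK_SHELL:-./zookeeper-shell.sh}"  # script location, assumed
ZK_ADDR="${ZK_ADDR:-localhost:2181}"          # ZooKeeper address, assumed

zk() {
  # Passes the command and its arguments straight through, e.g. "ls /config".
  "$ZK_SHELL" "$ZK_ADDR" "$@"
}
```

With it, zk ls /config lists the config znodes and zk get /controller prints the controller node.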

Is there an equivalent Debezium command for starting Kafka Connect without a Docker container

The Debezium Kafka Connect command is:
docker run -it --rm --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses --link zookeeper:zookeeper --link kafka:kafka --link mysql:mysql debezium/connect:0.9
Is there an equivalent way to run it outside a Docker container, with flags to specify the ZooKeeper instance and the Kafka bootstrap servers/brokers? I have Kafka and ZooKeeper running locally on my Mac, but not inside a Docker container.
Thanks
There are no "flags", just properties files. The Docker image simply uses variable substitution inside those files.
You can refer to the Debezium installation documentation. Debezium is just a plugin for Kafka Connect, which is included with your Kafka installation.
Find connect-standalone.properties in your Kafka install to get started. One important property you will want to edit is plugin.path, which must be the full parent path of the directory where you put the Debezium JAR files. The Kafka connection (bootstrap.servers) is configured in that file as well.
Then you would run this to start a single node:
connect-standalone.sh connect-standalone.properties your-debezium-config.properties
(The Docker image runs connect-distributed.sh, but you don't need to run a cluster just on your Mac.)
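To make the properties-file approach concrete, here is a sketch that writes a minimal connect-standalone.properties. The helper name, the JSON converters, the offsets file location, and the sample arguments are assumptions for illustration; adjust them to your install.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal connect-standalone.properties for a local broker.
# write_connect_props and the values it hard-codes are hypothetical choices.
write_connect_props() {
  local bootstrap="$1"   # Kafka broker address, e.g. localhost:9092
  local plugin_dir="$2"  # parent directory that contains the Debezium plugin folder
  local out_file="$3"    # properties file to write
  cat > "$out_file" <<EOF
bootstrap.servers=${bootstrap}
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=${plugin_dir}
EOF
}
```

For instance, write_connect_props localhost:9092 /opt/connectors connect-standalone.properties creates the worker file, which is then passed to connect-standalone.sh together with the connector's own properties file.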

How to resolve "Leader not available" Kafka error when trying to consume

I'm playing around with Kafka using my own local single instance of ZooKeeper + Kafka, and I'm running into this error that I don't understand how to resolve.
I started a simple server per the Apache Kafka Quickstart Guide:
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties
Then, using kafkacat (installed via Homebrew), I started a producer that just echoes the messages I type into the console:
$ kafkacat -P -b localhost:9092 -t TestTopic -T
test1
test1
But when I try to consume those messages I get an error:
$ kafkacat -C -b localhost:9092 -t TestTopic
% ERROR: Topic TestTopic error: Broker: Leader not available
And similarly when I try to list its metadata:
$ kafkacat -L -b localhost:9092 -t TestTopic
Metadata for TestTopic (from broker -1: localhost:9092/bootstrap):
0 brokers:
1 topics:
topic "TestTopic" with 0 partitions: Broker: Leader not available (try again)
My questions:
1. Is this an issue with my running instance of ZooKeeper and/or kafkacat? I ask because I've been constantly shutting them down and restarting them after deleting the /tmp/zookeeper and /tmp/kafka-logs directories.
2. Is there some simple setting that I need to try? I tried adding auto.leader.rebalance.enable=true to Kafka's server.properties settings file, but that didn't fix this particular issue.
3. How do I do a fresh restart of ZooKeeper/Kafka? Is shutting them down, deleting the /tmp/zookeeper and /tmp/kafka-logs directories, and then restarting ZooKeeper and then Kafka the way to go? (Well, maybe the way to go is to build a Docker container that I can stand up and tear down. I was going to use the spotify/docker-kafka container, but that is not on Kafka 0.9.0.0 and I haven't taken the time to build my own.)
1. It might be, but probably is not. My guess is that the topic hasn't been created, so kafkacat echoes the message on screen but doesn't really send it to Kafka. All the topics are probably deleted when you delete /tmp/kafka-logs.
2. No. I don't think this is the way to look for a solution.
3. Having a Docker container is definitely the way to go. You'll soon end up running Kafka on multiple brokers, examining the replication behavior, high availability scenarios, etc. Having it dockerised helps a lot.
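If the guess in the first answer is right, creating the topic explicitly after each wipe should make the error go away. A minimal sketch, assuming the quickstart layout (scripts under bin/, ZooKeeper on localhost:2181) and the --zookeeper form of kafka-topics.sh used by Kafka releases of that era; create_topic is a name invented here.

```shell
#!/usr/bin/env bash
# Sketch: create a single-partition topic before producing to it.
KAFKA_TOPICS="${KAFKA_TOPICS:-bin/kafka-topics.sh}"  # script location, assumed
ZOO="${ZOO:-localhost:2181}"                           # ZooKeeper address, assumed

create_topic() {   # usage: create_topic <topic>
  "$KAFKA_TOPICS" --create --zookeeper "$ZOO" \
    --replication-factor 1 --partitions 1 --topic "$1"
}
```

After create_topic TestTopic, rerunning kafkacat -L -b localhost:9092 -t TestTopic is a quick way to check whether the topic now has a partition with a leader.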