Read Avro messages from Kafka in terminal - kafka-avro-console-consumer alternative

I'm trying to find the easiest way to read Avro messages from Kafka topics in a readable format. There is the option to use Confluent's kafka-avro-console-consumer in the following way:
./kafka-avro-console-consumer \
--topic topic \
--from-beginning \
--bootstrap-server bootstrap_server_url \
--max-messages 10 \
--property schema.registry.url=schema_registry_url
but for this I need to download the whole Confluent Platform (1.7 GB), which seems like overkill in my scenario.
Is there an easier alternative for reading Avro messages from Kafka topics in the terminal?

I was able to get the latest Avro message in readable form with kcat:
kcat -C -b bootstrap_server \
-t topic \
-r schema_registry \
-p 0 -o -1 -s value=avro -e
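A variant closer to the original kafka-avro-console-consumer invocation, reading all partitions from the beginning and stopping after 10 messages (a sketch based on kcat's documented flags, not part of the original answer):
kcat -C -b bootstrap_server \
-t topic \
-r schema_registry \
-s value=avro -o beginning -c 10 -e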

You will need to download additional Kafka tools that support the Schema Registry and the Avro format, such as ksqlDB, Conduktor, AKHQ, or similar GUI tools.
kcat supports Avro via -s value=avro together with -r for the Schema Registry URL, as shown in the answer above.
You could write your own consumer script; the Python library from Confluent doesn't require much code to consume Avro records.
You could also clone the Schema Registry project from GitHub and build it on its own, then use the CLI scripts there (a rough sketch below).
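For that last option, a rough sketch (the clone-and-build steps are assumptions; check the project's README for the exact build instructions):
git clone https://github.com/confluentinc/schema-registry.git
cd schema-registry
mvn package -DskipTests    # builds the CLI wrapper scripts under bin/
./bin/kafka-avro-console-consumer \
--topic topic \
--from-beginning \
--bootstrap-server bootstrap_server_url \
--property schema.registry.url=schema_registry_url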

Another way to read Avro messages is to use Kafdrop. Try it by adding a section like this to your docker-compose.yml, alongside the broker, schema-registry, and other containers:
kafdrop:
  image: obsidiandynamics/kafdrop
  restart: "no"
  ports:
    - "9001:9000"
  environment:
    KAFKA_BROKERCONNECT: "broker:9092"
    JVM_OPTS: "-Xms16M -Xmx48M -Xss180K -XX:-TieredCompilation -XX:+UseStringDeduplication -noverify"
    CMD_ARGS: "--message.format=AVRO --schemaregistry.connect=http://schema-registry:8081"
  depends_on:
    - "broker"
After that, open Kafdrop at http://localhost:9001, click the topic that the Avro messages are written to, and choose the AVRO message format in the drop-down.

Related

How to manipulate offsets of the source database for Debezium

So I've been experimenting with Kafka and I am trying to manipulate/change the offsets of the source database using the approach from https://debezium.io/documentation/faq/. I was able to do it successfully, but I was wondering how I would do this with native Kafka commands instead of kafkacat.
These are the kafkacat commands that I'm using:
kafkacat -b kafka:9092 -C -t connect_offsets -f 'Partition(%p) %k %s\n'
and
echo '["In-house",{"server":"optimus-prime"}]|{"ts_sec":1657643280,"file":"mysql-bin.000200","pos":2136,"row":1,"server_id":223344,"event":2}' | \
kafkacat -P -b kafka:9092 -t connect_offsets -K \| -p 2
It basically reverts the offset of the source system back to a previous binlog position, so I can read the database from a previous point in time. This works well, but I was wondering what I would need to compose with native Kafka tools, since we don't have kafkacat on our dev/prod servers (although I do see its value and maybe it will be installed in the future). This is what I have so far for the translation, but it's not quite doing what I'm expecting.
./kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property print.offset=true --property print.partition=true \
  --property print.headers=true --property print.timestamp=true \
  --property print.key=true --from-beginning
After I run this I get these results.
This works well for the Kafka consumer command, but when I try to translate the producer command I run into issues.
./kafka-console-producer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property parse.partition=true --property parse.key=true \
  --property key.separator=":"
I get a prompt after the producer command and I enter this
["In-house",{"server":"optimus-prime"}]:{"ts_sec":1657643280,"file":"mysql-bin.000200","pos":2136,"row":1,"server_id":223344,"event":2}:2
But it seems like it's not taking the command, because the binlog position doesn't update after I run the consumer command again. Any ideas? Let me know.
EDIT: After applying OneCricketeer's changes I'm getting this stack trace.
key.separator=":" looks like it will be an issue considering it will split your data at ["In-house",{"server":
So, basically you produced a bad event into the topic, and maybe broke the connector...
If you want to literally use the same command, keep your key separator as |, or any other character that will not be part of the key.
Also, parse.partition isn't a property that is used, so you should remove :2 at the end... I'm not even sure kafka-console-producer can target a specific partition.
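As a sketch of what that might look like with the same record as above, keeping | as the separator and dropping the trailing :2 (this command follows the advice here and is not verified output from the original answer):
echo '["In-house",{"server":"optimus-prime"}]|{"ts_sec":1657643280,"file":"mysql-bin.000200","pos":2136,"row":1,"server_id":223344,"event":2}' | \
  ./kafka-console-producer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property parse.key=true --property "key.separator=|"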

kafka delete a topic using bootstrap server vs zookeeper

I want to know the difference between these commands.
-- With bootstrap server
kafka-topics \
--bootstrap-server b-1.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:9098,b-2.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:9098 \
--delete \
--topic debezium-my-topic \
--command-config /etc/kafka/client.properties
-- With zookeeper
kafka-topics \
--zookeeper z-3.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:2182,z-1.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:2181,z-2.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:2181 \
--delete \
--topic debezium-my-topic
The reason behind this is that the Kafka ACL for deleting topics is restricted. If I run the first command it gives an error like Topic authorization failed, which is correct (due to the ACL), but the second command doesn't check the ACL at all and deletes the topic directly.
The authorizer.class.name configured on the brokers is only enforced for requests that go through the brokers. The --zookeeper form of the command writes the topic deletion marker directly in ZooKeeper, so it never reaches the broker-side ACL check (unless ZooKeeper itself is secured).
The CLI --zookeeper option is considered deprecated and is being removed entirely as part of KIP-500 (ZooKeeper removal).
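If you want to see which ACLs actually apply to the topic, a sketch using kafka-acls against the brokers (assuming the same bootstrap server and client.properties as in the question):
kafka-acls \
  --bootstrap-server b-1.bhuvi-cluster-secure.jj9mhr.c3.kafka.ap-south-1.amazonaws.com:9098 \
  --command-config /etc/kafka/client.properties \
  --list --topic debezium-my-topic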

Confluent Schema Registry Cluster Mode

I am using Kafka Connect from Confluent to consume a Kafka stream and write it to HDFS in Parquet format. I am running the Schema Registry service on a single node and it is working fine. Now I want to run the Schema Registry in cluster mode to handle failover. Any link or snippet on how to achieve that would be very useful.
It is hard to find, but we covered this architecture in our documentation:
http://docs.confluent.io/3.0.0/schema-registry/docs/deployment.html#multi-dc-setup
To quote from the docs a bit (although you should read the docs, lots of good architecture advice and a recovery runbook are included):
Assuming you have Schema Registry running, here are the recommended steps to add Schema Registry instances in a new “slave” datacenter (call it DC B):
In DC B, make sure Kafka has unclean.leader.election.enable set to false.
In Kafka in DC B, create the _schemas topic. It should have 1 partition, kafkastore.topic.replication.factor of 3, and min.insync.replicas at least 2.
In DC B, run MirrorMaker with Kafka in the “master” datacenter as the source and Kafka in DC B as the target.
In the Schema Registry config files in DC B, set kafkastore.connection.url and schema.registry.zk.namespace to match the instances already running, and set master.eligibility to false.
Start your new Schema Registry instances with these configs.
I used the Confluent schema-registry Docker image to form the cluster.
docker run --restart always -d -p 8081:8081 --name=schema-registry-1 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-1 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-2 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-2 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
docker run --restart always -d -p 8081:8081 --name=schema-registry-3 -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-3 -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 -e SCHEMA_REGISTRY_DEBUG=true confluentinc/cp-schema-registry:5.2.1-1
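Before checking ZooKeeper, you can also confirm that each instance answers over HTTP; /subjects and /config are standard Schema Registry REST endpoints (run this on each host, assuming the port mapping above):
curl http://localhost:8081/subjects
curl http://localhost:8081/config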
Once this was up and running, I verified that the schema-registry cluster had formed and that its leader election was successful by checking the ZooKeeper contents.
$ docker exec -it zookeeper bash
# /usr/bin/zookeeper-shell localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[schema_registry, cluster, controller, brokers, zookeeper, admin, isr_change_notification, log_dir_event_notification, controller_epoch, kafka-manager, CruiseControlBrokerList, consumers, latest_producer_id_block, config]
[zk: localhost:2181(CONNECTED) 1] ls /schema_registry
[schema_registry_master, schema_id_counter]
[zk: localhost:2181(CONNECTED) 4] get /schema_registry/schema_registry_master
{"host":"schema-registry-1","port":8081,"master_eligibility":true,"scheme":"http","version":1}
#
Hope this helps.
You just need to put this in connect-avro-distributed.properties to use multiple Schema Registry instances:
key.converter.schema.registry.url=http://node1:8081,http://node2:8081
value.converter.schema.registry.url=http://node1:8081,http://node2:8081
Hope this is useful for you.
Don't forget to set master.eligibility=true on all the nodes.
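With the Docker image used above, configuration keys map to SCHEMA_REGISTRY_* environment variables, so master.eligibility should map to SCHEMA_REGISTRY_MASTER_ELIGIBILITY. A sketch for one node, reusing the ZooKeeper addresses from the earlier commands (this mapping is my assumption; verify against the cp-schema-registry image documentation):
docker run --restart always -d -p 8081:8081 --name=schema-registry-1 \
  -e SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=ip1:2181,ip2:2181,ip3:2181 \
  -e SCHEMA_REGISTRY_HOST_NAME=schema-registry-1 \
  -e SCHEMA_REGISTRY_LISTENERS=http://0.0.0.0:8081 \
  -e SCHEMA_REGISTRY_MASTER_ELIGIBILITY=true \
  confluentinc/cp-schema-registry:5.2.1-1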

How to configure zookeeper, kafka and storm in ubuntu 14.04?

I want clear steps for how to install ZooKeeper, Kafka, and Storm on Ubuntu.
This will guide you through the sequence of steps:
The Kafka binary distribution already bundles ZooKeeper, so you don't need to download it separately. Download Kafka from the link below.
Download Kafka version 0.8.2.0 from http://kafka.apache.org/downloads.html
Un-tar the downloaded archive using the command below:
tar -xzf kafka_2.9.1-0.8.2.0.tgz
Go into the extracted folder
cd kafka_2.9.1-0.8.2.0
Start the ZooKeeper server (which listens on port 2181 for Kafka server requests):
bin/zookeeper-server-start.sh config/zookeeper.properties
Now start the Kafka Server in a new terminal window
bin/kafka-server-start.sh config/server.properties
Now let us test whether the ZooKeeper-Kafka configuration is working.
Open a new terminal and create a topic named test:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
bin/kafka-topics.sh --list --zookeeper localhost:2181
Use a producer to send messages to Kafka's test topic:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
This is a message (you have to enter these messages yourself)
This is another message
Use Kafka's consumer to see the messages produced:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
The above command should list all the messages you typed earlier. That's it, you have successfully configured a single-broker ZooKeeper-Kafka setup.
To configure a multi-broker setup, refer to the official site kafka.apache.org; a brief sketch is shown below.
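A minimal sketch of adding a second broker, following the pattern from the Kafka quickstart (the broker id, port, and log directory values here are illustrative, not from the original answer):
cp config/server.properties config/server-1.properties
# in config/server-1.properties set:
#   broker.id=1
#   port=9093
#   log.dirs=/tmp/kafka-logs-1
bin/kafka-server-start.sh config/server-1.properties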
Now let's install Apache Storm:
Download the tar.gz file from a mirror, e.g. http://mirrors.ibiblio.org/apache/storm/apache-storm-0.9.2-incubating/
Extract it: tar xzvf apache-storm-0.9.2-incubating.tar.gz
Create a data directory
sudo mkdir /var/stormtmp
sudo chmod -R 777 /var/stormtmp
sudo gedit apache-storm-0.9.2-incubating/conf/storm.yaml
Edit the opened file so that the following properties are set (for java.library.path use your JAVA_HOME path; JDK 7 or higher works, and Java must be installed on your system):
storm.zookeeper.servers:
  - "localhost"
storm.zookeeper.port: 2181
nimbus.host: "localhost"
storm.local.dir: "/var/stormtmp"
java.library.path: "/usr/lib/jvm/java-7-openjdk-amd64"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
worker.childopts: "-Xmx768m"
nimbus.childopts: "-Xmx512m"
supervisor.childopts: "-Xmx256m"
If everything goes fine, you now have ZooKeeper, Kafka, and Storm ready. You can restart the system; that's it.
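To verify the Storm installation, a minimal sketch of starting the daemons from inside the apache-storm-0.9.2-incubating directory (these commands are not part of the original answer):
bin/storm nimbus &       # start the Nimbus daemon
bin/storm supervisor &   # start a Supervisor
bin/storm ui &           # start the Storm UI, served on port 8080 by default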

How to resolve "Leader not available" Kafka error when trying to consume

I'm playing around with Kafka, using my own local single instance of ZooKeeper + Kafka, and running into an error that I don't understand how to resolve.
I started a simple server per the Apache Kafka Quickstart Guide
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties
Then utilizing kafkacat (installed via Homebrew) I started a Producer that will just echo messages that I type into the console
$ kafkacat -P -b localhost:9092 -t TestTopic -T
test1
test1
But when I try to consume those messages I get an error:
$ kafkacat -C -b localhost:9092 -t TestTopic
% ERROR: Topic TestTopic error: Broker: Leader not available
And similarly when I try to list its metadata:
$ kafkacat -L -b localhost:9092 -t TestTopic
Metadata for TestTopic (from broker -1: localhost:9092/bootstrap):
0 brokers:
1 topics:
topic "TestTopic" with 0 partitions: Broker: Leader not available (try again)
My questions:
Is this an issue with my running instance of ZooKeeper and/or kafkacat? I ask because I've been constantly shutting them down and restarting them, after deleting the /tmp/zookeeper and /tmp/kafka-logs directories.
Is there some simple setting that I need to try? I tried adding auto.leader.rebalance.enable=true in Kafka's server.properties settings file, but that didn't fix this particular issue.
How do I do a fresh restart of ZooKeeper/Kafka? Is shutting them down, deleting the /tmp/zookeeper and /tmp/kafka-logs directories, and then restarting ZooKeeper and then Kafka the way to go? (Well, maybe the way to go is to build a Docker container that I can stand up and tear down. I was going to use the spotify/docker-kafka container, but that is not on Kafka 0.9.0.0 and I haven't taken the time to build my own.)
It might be, but probably is not. My guess is that the topic isn't created, so kafkacat echoes the message on screen but doesn't really send it to Kafka. All the topics are probably deleted after you delete /tmp/kafka-logs; see the sketch at the end of this answer for creating the topic explicitly.
No. I don't think this is the way to look for a solution.
Having a Docker container is definitely the way to go; you'll soon end up running Kafka on multiple brokers, examining replication behavior, high-availability scenarios, etc. Having it Dockerised helps a lot.
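For point 1, a sketch of explicitly creating the topic before producing, assuming the same local single-broker quickstart setup (these commands are not from the original answer):
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TestTopic
kafkacat -L -b localhost:9092 -t TestTopic   # metadata should now show 1 partition with a leader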