Check message has been written to kafka topic using command line - apache-kafka

Firstly, please note that using the Java consumer API is not an option. Why it is not an option I am unable to disclose, but I must be able to do the following using a shell command.
I have a topic that I have written a message to, and I can confirm this by running ./kafka-console-consumer.sh with the --from-beginning option, but since this starts a consumer, the command hangs and requires manual intervention with a SIGINT. I have come close using --timeout-ms, but this is not ideal: unless I select a high value, there is a possibility that the dump of the data is unreliable.
I would like to dump the output of console-consumer in such a manner that it can be grepped, or a suitable alternative method.

When you write to Kafka, you can set acks on the producer, which is the level of guarantee you want from the broker that the message has been received and written by the local broker and/or all replicas.
If you use this, you have no need to consume from the topic to determine whether the record was written; trying to do that sounds like a really bad idea.
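For example, a minimal sketch with the console producer, assuming a broker at localhost:9092 and a hypothetical topic mytopic (older Kafka versions use --broker-list instead of --bootstrap-server); acks=all makes the producer wait until all in-sync replicas have the record, and any delivery error is printed to the console:
echo "my test message" | ./kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic mytopic \
  --producer-property acks=all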
If you absolutely must use a command-line tool to do this (which is not a good idea), then use kafkacat, which can consume from any offset for any number of messages, e.g.:
Consume (-C) five messages (-c 5) from the beginning (-o beginning), or exit (-e) when end of partition is reached
kafkacat -b localhost:9092 -t mytopic -o beginning -e -C -c 5
Consume (-C) ten messages (-c 10) from the end (-o -10), or exit (-e) when end of partition is reached
kafkacat -b localhost:9092 -t mytopic -o -10 -e -C -c 10
Consume (-C) one message (-c 1) at offset 42 (-o 42), or exit (-e) when end of partition is reached
kafkacat -b localhost:9092 -t mytopic -o 42 -e -C -c 1
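To make the output greppable, as asked, the same idea can be piped straight into grep; a sketch, assuming the broker and topic above and a hypothetical search string needle (the -e flag makes kafkacat exit at the end of the partition, so the pipeline terminates on its own):
kafkacat -b localhost:9092 -t mytopic -C -o beginning -e | grep "needle"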

Related

How to manipulate offsets of the source database for Debezium

So I've been experimenting with Kafka and I am trying to manipulate/change the offsets of the source database using this link: https://debezium.io/documentation/faq/. I was able to do it successfully, but I was wondering how I would do this with native Kafka commands instead of kafkacat.
So these are the kafkacat commands that I'm using:
kafkacat -b kafka:9092 -C -t connect_offsets -f 'Partition(%p) %k %s\n'
and
echo '["In-house",{"server":"optimus-prime"}]|{"ts_sec":1657643280,"file":"mysql-bin.000200"","pos":2136,"row":1,"server_id":223344,"event":2}' | \
kafkacat -P -b kafka:9092 -t connect_offsets -K \| -p 2
It basically reverts the offset of the source system back to a previous binlog position, so I can read the DB from a previous point in time. This works well, but I was wondering what I would need to compose with the native Kafka tools, since we don't have kafkacat on our dev/prod servers (although I do see its value and maybe it will be installed in the future). This is what I have so far for the translation, but it's not quite doing what I'm expecting.
./kafka-console-consumer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property print.offset=true --property print.partition=true \
  --property print.headers=true --property print.timestamp=true \
  --property print.key=true --from-beginning
After I run this I get these results.
This works well for the kafka consumer command but when I try to translate the producer command I run into issues.
./kafka-console-producer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property parse.partition=true --property parse.key=true \
  --property key.separator=":"
I get a prompt after the producer command and I enter this
["In-house",{"server":"optimus-prime"}]:{"ts_sec":1657643280,"file":"mysql-bin.000200","pos":2136,"row":1,"server_id":223344,"event":2}:2
But it seems like it's not taking the command because the bin log position doesn't update after I run the consumer command again. Any ideas? Let me know.
EDIT: After applying OneCricketeer's changes I'm getting this stack trace.
key.separator=":" looks like it will be an issue considering it will split your data at ["In-house",{"server":
So, basically you produced a bad event into the topic, and maybe broke the connector...
If you want to literally use the same command, keep your key separator as |, or any other character that will not be part of the key.
Also, parse.partition isn't a property that is used, so you should remove :2 at the end... I'm not even sure kafka-console-producer can target a specific partition.
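Putting those two points together, a hedged sketch of the producer side with | as the separator and the :2 suffix dropped (note that without a way to target partition 2, the record may not land on the partition the connector originally wrote to, which is part of why kafkacat is the safer tool here):
echo '["In-house",{"server":"optimus-prime"}]|{"ts_sec":1657643280,"file":"mysql-bin.000200","pos":2136,"row":1,"server_id":223344,"event":2}' | \
./kafka-console-producer.sh --bootstrap-server kafka:9092 --topic connect_offsets \
  --property parse.key=true --property key.separator='|'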

Read Avro messages from Kafka in terminal - kafka-avro-console-consumer alternative

I'm trying to find the easiest way to read Avro messages from Kafka topics in a readable format. There is an option to use the Confluent kafka-avro-console-consumer in the following way:
./kafka-avro-console-consumer \
--topic topic \
--from-beginning \
--bootstrap-server bootstrap_server_url \
--max-messages 10 \
--property schema.registry.url=schema_registry_url
but for this I need to download the whole Confluent Platform (1.7 GB), which seems like overkill in my scenario.
Is there any alternative way to easily read Avro messages from Kafka topics in the terminal?
I was able to get the last Avro messages in readable form with kcat:
kcat -C -b bootstrap_server \
-t topic \
-r schema_registry \
-p 0 -o -1 -s value=avro -e
You will need to download additional Kafka tools that support the Schema Registry and Avro format, such as ksqlDB, Conduktor, AKHQ, or similar GUI tools.
kcat might support Avro now, I cannot recall.
You could write your own consumer script. The Python library from Confluent doesn't require much code to consume Avro records.
You could also clone the Schema Registry project from GitHub and build it on its own, then use the CLI scripts there.
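For that last option, a rough sketch only, assuming the repo still ships the console-consumer wrapper under bin/ and that the Maven build succeeds against its dependencies (you may need to check out a release tag rather than master):
git clone https://github.com/confluentinc/schema-registry.git
cd schema-registry
mvn package -DskipTests
./bin/kafka-avro-console-consumer --topic topic --from-beginning \
  --bootstrap-server bootstrap_server_url \
  --property schema.registry.url=schema_registry_url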
Another way to read messages in Avro is to use Kafdrop. Try it by adding a section like this to your docker-compose.yml, along with the broker, schema-registry, and other containers:
kafdrop:
  image: obsidiandynamics/kafdrop
  restart: "no"
  ports:
    - "9001:9000"
  environment:
    KAFKA_BROKERCONNECT: "broker:9092"
    JVM_OPTS: "-Xms16M -Xmx48M -Xss180K -XX:-TieredCompilation -XX:+UseStringDeduplication -noverify"
    CMD_ARGS: "--message.format=AVRO --schemaregistry.connect=http://schema-registry:8081"
  depends_on:
    - "broker"
After that, open Kafdrop at http://localhost:9001, click on the topic where the Avro messages are written, and choose the AVRO message format in the drop-down.
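A hedged usage sketch, assuming the service above and that the broker and schema-registry containers are defined in the same compose file:
docker-compose up -d kafdrop
# then browse to http://localhost:9001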

How to purge all kafka data for fresh start in a dev environment

Sometimes it's necessary to fresh-start a Kafka cluster with no data. When running Kafka inside Docker containers this behavior is achieved for free.
How do I do it with a Kafka process? Can I delete /var/log/kafka* and restart it? Is it OK to do so?
BTW, I am using something like this:
# bash shell
# tl is a list of all topics
for t in $(cat tl); do
  ./kafka-topics.sh --zookeeper $ZOO --delete --topic $t
done
There are two problems with the above:
if HDD usage is 100%, then I get an error when running kafka-topics.sh
it's very inefficient if I have many topics
I'm looking for a fast and clean way to do this in dev environments.
Seems like this does the job:
$ ###### stop and clear all brokers
$ sudo systemctl stop kafka.service zookeeper.service
$ sudo rm -rf /var/log/kafka-logs/*
$ ###### continue ONLY after finish the above on all brokers
$ sudo systemctl start zookeeper.service
$ sleep 10s # make sure zookeeper is ready
$ sudo systemctl start kafka.service
I would suggest you temporarily set log.retention.ms to a low value, let's say 1000. This way you tell Kafka to keep messages only one second before deleting them forever. Wait a little bit, Kafka will delete all of your messages (all except messages on active segments), and when that is done, you can revert the log.retention.ms setting.
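If you prefer doing this per topic rather than broker-wide, a sketch with kafka-configs.sh, assuming a broker at localhost:9092 and a hypothetical topic mytopic (the topic-level property is retention.ms; older versions take --zookeeper instead of --bootstrap-server):
# lower retention so old segments become eligible for deletion
./kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name mytopic --add-config retention.ms=1000
# ...wait for the cleanup to run, then remove the override again
./kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name mytopic --delete-config retention.ms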

kafkacat's -o (offset to start consuming from) option

Will executing the following kafkacat command with the -o (offset to start consuming from) option but without the -G (group id) option affect other consumer groups?
kafkacat -C -b 10.52.1.1:9092,10.52.1.2:9092,10.52.1.3:9092 -t MyTopic -o beginning
No, kafkacat in standalone consumer mode (-C) will not join or affect any consumer group; it is safe to use without interfering with existing consumer groups.
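For contrast, a sketch of the mode that does join a group (the balanced consumer, enabled by -G), assuming a hypothetical group name my-group; this form commits offsets for that group, so only use it if you intend to participate in it:
kafkacat -b 10.52.1.1:9092,10.52.1.2:9092,10.52.1.3:9092 -G my-group MyTopic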

How to resolve "Leader not available" Kafka error when trying to consume

I'm playing around with Kafka, using my own local single instance of zookeeper + kafka, and I'm running into this error that I don't understand how to resolve.
I started a simple server per the Apache Kafka Quickstart Guide
$ bin/zookeeper-server-start.sh config/zookeeper.properties
$ bin/kafka-server-start.sh config/server.properties
Then utilizing kafkacat (installed via Homebrew) I started a Producer that will just echo messages that I type into the console
$ kafkacat -P -b localhost:9092 -t TestTopic -T
test1
test1
But when I try to consume those messages I get an error:
$ kafkacat -C -b localhost:9092 -t TestTopic
% ERROR: Topic TestTopic error: Broker: Leader not available
And similarly when I try to list its metadata:
$ kafkacat -L -b localhost:9092 -t TestTopic
Metadata for TestTopic (from broker -1: localhost:9092/bootstrap):
0 brokers:
1 topics:
topic "TestTopic" with 0 partitions: Broker: Leader not available (try again)
My questions:
Is this an issue with my running instance of zookeeper and/or kafkacat? I ask because I've been constantly shutting them down and restarting them after deleting the /tmp/zookeeper and /tmp/kafka-logs directories.
Is there some simple setting that I need to try? I tried adding auto.leader.rebalance.enable=true in Kafka's server.properties settings file, but that didn't fix this particular issue.
How do I do a fresh restart of zookeeper/kafka? Is shutting them down, deleting the /tmp/zookeeper and /tmp/kafka-logs directories, and then restarting zookeeper and then kafka the way to go? (Well, maybe the way to go is to build a Docker container that I can stand up and tear down; I was going to use the spotify/docker-kafka container, but that is not on Kafka 0.9.0.0 and I haven't taken the time to build my own.)
It might be, but probably is not. My guess is the topic isn't created, so kafkacat echoes the message on screen but doesn't really send it to Kafka. All the topics are probably deleted when you delete /tmp/kafka-logs (see the topic-creation sketch at the end of this answer).
No. I don't think this is the way to look for a solution.
Having a Docker container is definitely the way to go: you'll soon end up running Kafka on multiple brokers, examining replication behavior, high-availability scenarios, etc. Having it dockerised helps a lot.
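If the topic really was never created, a hedged sketch for pre-creating it before producing, assuming the quickstart's default Zookeeper at localhost:2181 (on a Kafka 0.9-era setup kafka-topics.sh still takes --zookeeper); once the topic exists and metadata has propagated, the "Leader not available" error should go away:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic TestTopic \
  --partitions 1 --replication-factor 1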