I'm using kcat to check the content of Kafka topics when working locally, but when messages are serialized with protobuf, all I get is an unreadable stream of encoded characters. I'm aware that other Kafka consumer tools exist (Kafdrop, AKHQ, Kowl, Kadeck...), but I'm looking for the simplest option that fits my needs.
Does kcat support protobuf key/value deserialization from protofile?
Is there any simple terminal-based tool which allows this?
I've had luck with this command:
% kcat -C -t <topic> -b <kafkahost>:9092 -o -1 -e -q -D "" | protoc --decode=<full message class> path/to/my.proto --proto_path <proto_parent_folder>
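For example, filled in with hypothetical names (topic orders, message type com.example.Order, schema in protos/order.proto), the same pipeline would look like this:
% kcat -C -t orders -b localhost:9092 -o -1 -e -q -D "" | protoc --decode=com.example.Order protos/order.proto --proto_path protos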
"any simple terminal-based tool which allows this"
Only ones that integrate with the Confluent Schema Registry (which is what the tools you listed use as well), e.g. kafka-protobuf-console-consumer, which is already part of Confluent Platform.
Regarding kcat itself, refer to https://github.com/edenhill/kcat/issues/72 and the linked issues.
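If you do run a Schema Registry locally, a minimal sketch with that tool (assuming a registry at http://localhost:8081 and a hypothetical topic orders) would be:
kafka-protobuf-console-consumer --bootstrap-server localhost:9092 --topic orders --from-beginning --property schema.registry.url=http://localhost:8081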
I have single node Kafka instance running locally via docker-compose.
(system: Mac/Arm64, image: wurstmeister/kafka:2.13-2.6.0)
I want to use kafkacat (kcat installed via Homebrew) to instantly produce and consume messages to and from Kafka.
Here is a minimal script:
#!/usr/bin/env bash
NUM_MESSAGES=${1:-3} # use arg1 or use default=3
KCAT_ARGS="-q -u -c $NUM_MESSAGES -b localhost:9092 -t unbuffered"
log() { echo "$*" 1>&2; }
producer() {
log "starting producer"
for i in $(seq 1 "$NUM_MESSAGES"); do
echo "msg $i"
log "produced: msg $i"
sleep 1
done | kcat $KCAT_ARGS -P
}
consumer() {
log "starting consumer"
kcat $KCAT_ARGS -C -o end | while read -r line; do
log "consumed: $line"
done
}
producer &
consumer &
wait
I would expect (roughly) the following output:
starting producer
starting consumer
produced: msg 1
consumed: msg 1
produced: msg 2
consumed: msg 2
produced: msg 3
consumed: msg 3
However, I only get output with produced and consumed messages fully batched into two groups, even though both the consumer and producer are running in parallel:
starting producer
starting consumer
produced: msg 1
produced: msg 2
produced: msg 3
consumed: msg 1
consumed: msg 2
consumed: msg 3
Here are some kcat options and Kafka producer properties whose values I already tried changing in order to influence the producer behavior.
# kcat options having no effect on the test case
-u # unbuffered output
-T # act like `tee` and echo input
# kafka properties having no effect on the test case
-X queue.buffering.max.messages=1
-X queue.buffering.max.kbytes=1
-X batch.num.messages=1
-X queue.buffering.max.ms=100
-X socket.timeout.ms=100
-X max.in.flight.requests.per.connection=1
-X auto.commit.interval.ms=100
-X request.timeout.ms=100
-X message.timeout.ms=100
-X offset.store.sync.interval.ms=1
-X message.copy.max.bytes=100
-X socket.send.buffer.bytes=100
-X linger.ms=1
-X delivery.timeout.ms=100
None of the options above had any effect on the pipeline.
What am I missing?
Edit: It seems to be a flushing issue with either kcat or librdkafka. Maybe the -X properties are not used correctly.
Here are the current observations (will edit them as I learn more):
When sending a larger payload of 10000 messages with a smaller delay in the script, kcat will produce several batches of messages. It seems to be size-based, but not configurable by any of the -X options.
The batches are then also correctly picked up by the consumer. So it must be a producer issue.
I also tried the script in Docker with the current kafkacat from the Alpine repos. This one seems to flush a bit earlier, i.e. with less data needed to fill the "hidden" buffer. The -X options also had no effect.
Also, the -X properties do seem to be validated: if I set out-of-range values, kcat (or maybe librdkafka) complains. However, setting low values for any of the timeout and buffer-size properties has no effect.
When calling kcat once per message (which is a bit of overkill), the messages are produced instantly; see the sketch below.
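For reference, the per-message workaround is roughly the following (same broker and topic as in the script above); closing stdin after each message forces kcat to flush:
for i in $(seq 1 3); do
  # one kcat process per message; the buffer is flushed when stdin closes
  echo "msg $i" | kcat -q -b localhost:9092 -t unbuffered -P
done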
The question remains:
How do I tell a Kafka-pipeline to instantly produce my first message?
If you have an example in Go, this would also help, since I am having similar observations with a small Go program using kafka-go. I may post a separate question if I can strip that down to a postable format.
UPDATE: I tried using a bitnami image on a pure Linux host. Producing and consuming via kafkacat works as expected on this system. I will post an answer once I know more.
Here is how I solved the problem.
The issue was not in the Kafka docker images.
They all work as expected, although I was able to crash the Java-based Kafka brokers just by firing kcat at them. I later added Redpanda (a non-Java, Kafka-compatible broker, used via its rpk CLI), which was much more stable in my single-node setup.
Findings
Using kcat, I did not find any way of producing messages instantly without buffering. It notoriously ignores all -X args (edenhill/kcat version 1.7.0, macOS, Arm64).
Sending single messages works. When closing the input pipe, kcat will flush the output buffer.
Consuming messages instantly via kcat is possible and works by default.
Other Kafka clients do not have this issue. I created a small kafka-go example that just works as expected; no extensive buffering by default.
Conclusion
Do not use kcat to produce messages via long-running pipes.
Use kafka-go or a similar client even for small health checks and other "scripts".
I am working on Kafka connectors, and over time the number of connectors is increasing. My log file is getting really messy, so I was wondering if it is possible to have a separate log file for each connector.
You can use the grep command to view only the relevant log lines.
Command: tail -f /var/log/kafka/connect.log | grep -n 'phrase to search'
Your log file path may be different.
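If you want something closer to a per-connector log file, one rough workaround (a sketch, assuming a hypothetical connector name my-sink-connector) is to split the worker log by connector name:
# route lines mentioning one connector into its own file
tail -f /var/log/kafka/connect.log | grep --line-buffered 'my-sink-connector' >> /var/log/kafka/my-sink-connector.log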
What you are trying to achieve is not possible with the current Kafka Connect implementation. But there is a KIP under discussion which may help once it is accepted and implemented: https://cwiki.apache.org/confluence/display/KAFKA/KIP-449%3A+Add+connector+contexts+to+Connect+worker+logs
When configuring authentication for Kafka, the documentation mentions that JVM parameters need to be added when starting the Kafka server, like:
-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf
We are using bin/kafka-server-start.sh to start the server, but the documentation doesn't mention where to specify the JVM parameters.
Modifying kafka-server-start.sh or kafka-run-class.sh is not a good idea, so what is the right way to add the parameter at startup?
I'd recommend using the KAFKA_OPTS environment variable for this.
This environment variable is recognized by Kafka, and defaults to the empty string (= no settings). See the following code snippet from bin/kafka-run-class.sh in the Kafka source code:
# Generic jvm settings you want to add
if [ -z "$KAFKA_OPTS" ]; then
KAFKA_OPTS=""
fi
So, for example, you can do:
$ export KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf"
$ bin/kafka-server-start.sh config/server.properties
or
$ KAFKA_OPTS="-Djava.security.auth.login.config=/etc/kafka/kafka_server_jaas.conf" bin/kafka-server-start.sh config/server.properties
I am a newbie to Kafka. I want to consume remote Kafka messages in a shell script. Basically, I have a Linux machine where I cannot run any web server (for some strange reasons); the only thing I can do is use crontab/shell scripts to listen for messages from a remotely hosted Kafka. Is it possible to write a simple shell script that will consume a Kafka message, parse it, and take a corresponding action?
Kafka clients are available in multiple languages. You can use any client; you don't need a web server or browser for it.
You may use a shell script for consuming and parsing messages, but that script will have to use one of the Kafka clients provided here, because currently there is no client written in pure shell script.
Kafka also ships with console producer and consumer clients, which you can use as well:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Follow the documentation for the details.
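As a rough sketch of the shell-script approach (assuming a newer Kafka where the console consumer takes --bootstrap-server instead of --zookeeper, and a hypothetical topic named test):
#!/usr/bin/env bash
# consume messages and act on each one as it arrives
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test |
while read -r message; do
  # parse the message and take the corresponding action here
  echo "received: $message"
done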
You could also use the tool kafkacat, which is documented, for example, here.
This is a very powerful and fast tool to read data out of Kafka from the console, and it is open source: https://github.com/edenhill/kafkacat.
Many examples are provided on GitHub, and one of them is shown below:
kafkacat -C -b mybroker -t mytopic
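kafkacat also lets you control the output format with -f, which helps when parsing messages in a script; for example (broker and topic are placeholders as above):
kafkacat -C -b mybroker -t mytopic -f 'Partition %p, Offset %o, Key %k, Value %s\n'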
One way to do that, as the Kafka documentation shows, is through kafka.tools.MirrorMaker, which can do the trick. However, I need to copy a topic (say test with 1 partition), including its content and metadata, from a production environment to a development environment where there is no connectivity. I could do a simple file transfer between the environments, though.
My question: if I move the *.log and *.index files from the folder test-0 to the destination Kafka cluster, is that good enough? Or is there more I need to do, like moving metadata and ZooKeeper-related data as well?
Just copying the logs and indexes will not suffice; Kafka stores offsets and topic metadata in ZooKeeper. MirrorMaker is actually quite a simple tool: it spawns consumers on the source topic as well as producers on the target topic and runs until the consumers have consumed the source topic. You won't find a simpler process to migrate a topic.
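For completeness, a legacy MirrorMaker run is just a sketch along these lines (assuming consumer.properties points at the source cluster and producer.properties at the target cluster):
bin/kafka-mirror-maker.sh --consumer.config consumer.properties --producer.config producer.properties --whitelist test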
Use kafkacat
Unless your data is binary, you can use a stock kafkacat.
Write topic to file:
kafkacat -b broker:9092 -e -K, -t my-topic > my-topic.txt
Write file back to topic:
kafkacat -b broker:9092 -K, -t my-topic -l my-topic.txt
If your data is binary, you unfortunately have to build your own kafkacat from this branch, which is an as-yet-unmerged PR.
Write topic with binary values to file:
kafkacat -b broker:9092 -e -Svalue=base64 -K, -t my-topic > my-topic.txt
Write file back to topic:
kafkacat -b broker:9092 -Svalue=base64 -K, -t my-topic -l my-topic.txt
What worked for me in your scenario was the following sequence of actions:
Create the topic in Kafka where you will later insert your files (with 1 partition and 1 replica and an appropriate retention.ms config so that Kafka doesn't delete your presumably outdated segments).
Stop your Kafka and Zookeeper.
Find the location of the files for partition 0 of the topic you created in step 1 (it will be something like kafka-logs-<hash>/<your-topic>-0).
In this folder, remove the existing files and copy your files to it.
Start Kafka and Zookeeper.
This also works if your Kafka is run from docker-compose (but you'll have to set up an appropriate volume, of course).
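As a rough sketch of steps 1 and 4 (assuming a hypothetical topic test, a broker on localhost:9092, and a log directory kafka-logs-abc123):
# step 1: create the target topic with 1 partition, 1 replica and no time-based deletion
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic test --partitions 1 --replication-factor 1 --config retention.ms=-1
# step 4 (with Kafka and ZooKeeper stopped): replace the segment files
rm kafka-logs-abc123/test-0/*
cp /path/from/production/test-0/* kafka-logs-abc123/test-0/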