Is there a way to add headers in kafka-console-producer.sh - apache-kafka

I'd like to use the kafka-console-producer.sh to fire a few JSON messages with Kafka headers.
Is this possible?
docker exec -it kafka_1 /opt/kafka_2.12-2.3.0/bin/kafka-console-producer.sh --broker-list localhost:9093 --topic my-topic --producer.config /opt/kafka_2.12-2.3.0/config/my-custom.properties

No, but you can with kafkacat's -H argument:
Produce:
echo '{"col_foo":1}'|kafkacat -b localhost:9092 -t test -P -H foo=bar
Consume:
kafkacat -b localhost:9092 -t test -C -f '-----\nTopic %t[%p]\nOffset: %o\nHeaders: %h\nKey: %k\nPayload (%S bytes): %s\n'
-----
Topic test[0]
Offset: 0
Headers: foo=bar
Key:
Payload (9 bytes): col_foo:1
% Reached end of topic test [0] at offset 1

Starting with Kafka 3.1.0, the console producer can parse headers if you set parse.headers=true; you then place the headers before the record value. From the docs:
| parse.headers=true:
| "h1:v1,h2:v2...\tvalue"
So your command will look like
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_name --property parse.headers=true
and then you pass the headers and the value separated by a tab (shown here as \t):
header_name:header_value\trecord_value
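The docs quoted above separate the headers from the value with a tab, which is awkward to type interactively. A minimal sketch, assuming a hypothetical helper named header_record and the broker and topic names from the command above, builds such a line with printf so that the separator is a real tab character:

```shell
# header_record is a hypothetical helper, not part of Kafka.
# $1: headers in the form "h1:v1,h2:v2"   $2: record value
header_record() {
  # printf expands \t in the format string into a literal tab.
  printf '%s\t%s\n' "$1" "$2"
}

# Usage (needs a running broker, so it is left commented out):
# header_record 'foo:bar' '{"col_foo":1}' |
#   kafka-console-producer.sh --bootstrap-server localhost:9092 \
#     --topic topic_name --property parse.headers=true
```

Piping the line in avoids relying on how the terminal handles a typed tab.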

Related

How to force log compaction of a Kafka topic?

Using Kafka 2.7.0 (in K8s), I create a test topic with cleanup.policy=compact:
./kafka-topics.sh --create --bootstrap-server kafka.core-kafka.svc.cluster.local:9092 --topic _test_quick_compaction_2021_12_02 --partitions 1 --replication-factor 3 --config cleanup.policy=compact
Write some messages to it:
kafkacat -b kafka.core-kafka.svc.cluster.local:9092 -P -t _test_quick_compaction_2021_12_02 -K:
1:a
2:b
3:c
1:d
2:e
Change the topic settings in a way such that compaction should kick in after 10 seconds:
./kafka-topics.sh --alter --zookeeper zookeeper.core-kafka.svc.cluster.local --topic _test_quick_compaction_2021_12_02 --config max.compaction.lag.ms=10000 --config min.cleanable.dirty.ratio=0.0 --config segment.ms=10000 --config delete.retention.ms=10000
Wait a minute, just to be sure:
sleep 60
Check the topic content:
kafkacat -C -e -o beginning -b kafka.core-kafka.svc.cluster.local:9092 -t _test_quick_compaction_2021_12_02 -K:
And to my surprise, the content is still
1:a
2:b
3:c
1:d
2:e
instead of the
3:c
1:d
2:e
which I expected.
Why is the topic not compacted, and what can I do to force it?
Since active segments are not eligible for compaction, the trick was to again write something to the topic to force the creation of a new segment.
# Create a test topic.
./kafka-topics.sh --create --bootstrap-server kafka.core-kafka.svc.cluster.local:9092 --topic _test_quick_compaction_2021_12_02 --partitions 1 --replication-factor 3 --config cleanup.policy=compact
# Write some messages to it.
printf '1:a\n2:b\n3:c\n' | kafkacat -b kafka.core-kafka.svc.cluster.local:9092 -P -t _test_quick_compaction_2021_12_02 -K:
# Check the topic content.
kafkacat -C -e -o beginning -b kafka.core-kafka.svc.cluster.local:9092 -t _test_quick_compaction_2021_12_02 -K:
# Change the topic settings in a way such that compaction should kick in after 10 seconds.
./kafka-topics.sh --alter --zookeeper zookeeper.core-kafka.svc.cluster.local --topic _test_quick_compaction_2021_12_02 --config max.compaction.lag.ms=10000 --config min.cleanable.dirty.ratio=0.0 --config segment.ms=10000 --config delete.retention.ms=10000
# Wait for the last segment to outdate
sleep 11
# Write new messages.
printf '1:d\n2:e\n' | kafkacat -b kafka.core-kafka.svc.cluster.local:9092 -P -t _test_quick_compaction_2021_12_02 -K:
# Check the topic content.
kafkacat -C -e -o beginning -b kafka.core-kafka.svc.cluster.local:9092 -t _test_quick_compaction_2021_12_02 -K:
# Wait for this segment to outdate.
sleep 11
# Write new messages again.
printf '1:d\n2:e\n' | kafkacat -b kafka.core-kafka.svc.cluster.local:9092 -P -t _test_quick_compaction_2021_12_02 -K:
# Check the topic content.
kafkacat -C -e -o beginning -b kafka.core-kafka.svc.cluster.local:9092 -t _test_quick_compaction_2021_12_02 -K:
# Wait for compaction to happen.
sleep 11
# Check the topic content to validate that it has been compacted.
kafkacat -C -e -o beginning -b kafka.core-kafka.svc.cluster.local:9092 -t _test_quick_compaction_2021_12_02 -K:
# Revert the setting changes.
./kafka-topics.sh --alter --zookeeper zookeeper.core-kafka.svc.cluster.local --topic _test_quick_compaction_2021_12_02 --delete-config max.compaction.lag.ms --delete-config min.cleanable.dirty.ratio --delete-config segment.ms --delete-config delete.retention.ms
# Delete the topic
# /home/th/kafka_2.13-2.7.0/bin/kafka-topics.sh --delete --bootstrap-server kafka.core-kafka.svc.cluster.local:9092 --topic _test_quick_compaction_2021_12_02
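To judge the final kafkacat output, it helps to know what a fully compacted log would contain: the last record for each key, in offset order. A rough sketch that computes this from key:value lines with awk follows. expected_after_compaction is a hypothetical helper, and real compaction can leave more records behind, since the active segment is never cleaned:

```shell
# expected_after_compaction: read key:value lines on stdin and print only
# the last line seen for each key, keeping the original order.
expected_after_compaction() {
  awk -F: '
    { key[NR] = $1; line[NR] = $0; last[$1] = NR }   # remember last position per key
    END {
      for (i = 1; i <= NR; i++)
        if (last[key[i]] == i) print line[i]         # keep a line only if it is the newest for its key
    }'
}

# Example with the five records from the question:
# printf '1:a\n2:b\n3:c\n1:d\n2:e\n' | expected_after_compaction
```

For the records in the question this yields 3:c, 1:d, 2:e, matching the content the asker expected.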

How can I produce a Kafka Record with null value using the kafka tool set

I'm using the following command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test.topic --property parse.key=true --property key.separator=#
This allows me to start typing key#value entries.
However, no matter what I try, I'm not able to create a null entry.
If I try sending [myKey#] and press Enter, on the feed I will see an Empty message for the Key, but not null.
I need to create a Null value.
kafkacat allows you to produce tombstones through the -Z option.
Produce a tombstone (a "delete" for compacted topics) for key "abc" by providing an empty message value, which -Z interprets as NULL:
$ echo "abc:" | kafkacat -b mybroker -t mytopic -Z -K:
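To delete several keys at once, the same -Z approach can be fed one empty-valued line per key. A small sketch, where tombstone_lines is a hypothetical helper and mybroker and mytopic are the placeholders from the command above:

```shell
# tombstone_lines KEY...: print one "key:" line per argument, i.e. a key
# followed by the ":" delimiter and an empty value.
tombstone_lines() {
  for key in "$@"; do
    printf '%s:\n' "$key"
  done
}

# Usage (needs a running broker, so it is left commented out):
# tombstone_lines abc def | kafkacat -b mybroker -t mytopic -P -Z -K:
```

This assumes the keys themselves do not contain the ":" delimiter.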
The console producer cannot produce a null record; it parses the input as UTF-8 strings.
Personally, I would write a simple Python or Ruby script to do so.

how to view kafka headers

We are sending messages with headers to Kafka using
org.apache.kafka.clients.producer.ProducerRecord
public ProducerRecord(String topic, Integer partition, K key, V value, Iterable<Header> headers) {
this(topic, partition, (Long)null, key, value, headers);
}
How can I actually see these headers from the command line? kafka-console-consumer.sh only shows me the payload and no headers.
You can use the excellent kafkacat tool.
Sample command:
kafkacat -b kafka-broker:9092 -t my_topic_name -C \
-f '\nKey (%K bytes): %k
Value (%S bytes): %s
Timestamp: %T
Partition: %p
Offset: %o
Headers: %h\n'
Sample output:
Key (-1 bytes):
Value (13 bytes): {foo:"bar 5"}
Timestamp: 1548350164096
Partition: 0
Offset: 34
Headers: __connect.errors.topic=test_topic_json,__connect.errors.partition=0,__connect.errors.offset=94,__connect.errors.connector.name=file_sink_03,__connect.errors.task.id=0,__connect.errors.stage=VALUE_CONVERTER,__connect.errors.class.name=org.apache.kafka.connect.json.JsonConverter,__connect.errors.exception.class.name=org.apache.kafka.connect.errors.DataException,__connect.errors.exception.message=Converting byte[] to Kafka Connect data failed due to serialization error: ,__connect.errors.exception.stacktrace=org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error:
The kafkacat header option is only available in recent builds of kafkacat; you may want to build from master branch yourself if your current version doesn't include it.
You can also run kafkacat from Docker:
docker run --rm edenhill/kafkacat:1.5.0 \
-b kafka-broker:9092 \
-t my_topic_name -C \
-f '\nKey (%K bytes): %k
Value (%S bytes): %s
Timestamp: %T
Partition: %p
Offset: %o
Headers: %h\n'
If you use Docker, bear in mind the network implications of how to reach the Kafka broker.
Starting with Kafka 2.7.0, you can print headers in the console consumer by setting the property print.headers=true:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic quickstart-events --property print.key=true --property print.headers=true --property print.timestamp=true
You can also use kafkactl for this. E.g. with output as yaml:
kafkactl consume my-topic --print-headers -o yaml
Sample output:
partition: 1
offset: 22
headers:
key1: value1
key2: value2
value: my-value
Disclaimer: I am a contributor to this project.
From kafka-console-consumer.sh script:
exec $(dirname $0)/kafka-run-class.sh kafka.tools.ConsoleConsumer "$@"
src: https://github.com/apache/kafka/blob/2.1.1/bin/kafka-console-consumer.sh
In kafka.tools.ConsoleConsumer the header is provided to the Formatter, but none of the existing Formatters makes use of it:
formatter.writeTo(new ConsumerRecord(msg.topic, msg.partition, msg.offset, msg.timestamp,
msg.timestampType, 0, 0, 0, msg.key, msg.value, msg.headers),
output)
src: https://github.com/apache/kafka/blob/2.1.1/core/src/main/scala/kafka/tools/ConsoleConsumer.scala
At the bottom of the above link you can see existing Formatters.
If you want to print headers, you need to implement your own kafka.common.MessageFormatter, and in particular its writeTo method:
def writeTo(consumerRecord: ConsumerRecord[Array[Byte], Array[Byte]], output: PrintStream): Unit
and then run your console consumer with --formatter providing your own formatter (it should also be present on the classpath).
Another, simpler and faster way, would be to implement your own mini-program using KafkaConsumer and check headers in debug.
kcat -C -b $brokers -t $topic -f 'key: %k Headers: %h: Message value: %s\n'

Kafka producer to read data files

I am trying to load a data file in a loop (to check stats) instead of reading from standard input in Kafka. After downloading Kafka, I performed the following steps:
Started zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Started Server:
bin/kafka-server-start.sh config/server.properties
Created a topic named "test":
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Ran the Producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Test1
Test2
Listened by the Consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Test1
Test2
Instead of standard input, I want to pass a data file to the producer so that the consumer sees its contents directly. Or is there a Kafka producer other than the console producer that can read data files? Any help would really be appreciated. Thanks!
You can read the data file with cat and pipe it to kafka-console-producer.sh.
cat ${datafile} | ${kafka_home}/bin/kafka-console-producer.sh --broker-list ${brokerlist} --topic test
If there is always a single file, you can just use the tail command and pipe its output to the Kafka console producer.
But if a new file is created when some condition is met, you may need to use org.apache.commons.io.monitor to watch for the new file, then repeat the above.
Kafka has a built-in File Stream Connector for piping the content of a file to a producer (file source) or directing file content to another destination (file sink).
bin/connect-standalone.sh can read from a file; it is configured in config/connect-file-source.properties and config/connect-standalone.properties.
So the command will be:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
The easiest way if you are using Linux or Mac is:
kafka-console-producer --broker-list localhost:9092 --topic test < messages.txt
Reference:
https://github.com/Landoop/kafka-cheat-sheet
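Since the question asks about loading the file in a loop, the redirect above can be wrapped in a small function that emits the file several times. A sketch, with repeat_file as a hypothetical helper name and the repeat count chosen arbitrarily:

```shell
# repeat_file FILE COUNT: write FILE to stdout COUNT times.
repeat_file() {
  i=0
  while [ "$i" -lt "$2" ]; do
    cat "$1"
    i=$((i + 1))
  done
}

# Usage (needs a running broker, so it is left commented out):
# repeat_file messages.txt 100 |
#   kafka-console-producer --broker-list localhost:9092 --topic test
```

Because the producer reads one message per line, the file should end with a newline so that the last line of one pass does not merge with the first line of the next.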
You can probably try the kafkacat utility as well.
The readme on Github provides examples
It would be great if you could share which tool worked the best for you :)
Details from KafkaCat Readme:
Read messages from stdin, produce to 'syslog' topic with snappy compression
$ tail -f /var/log/syslog | kafkacat -b mybroker -t syslog -z snappy
kafka-console-producer.sh \
--broker-list localhost:9092 \
--topic my_topic \
--new-producer < my_file.txt
Follow this link: http://grokbase.com/t/kafka/users/157b71babg/kafka-producer-input-file
The command below is of course the easiest way to do that.
kafka-console-producer --broker-list localhost:9092 --topic test < message.txt
But sometimes it is not able to find the file.
example :
C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < C:\data\message.txt
You gave the absolute path, but the command does not find C: from the current location, so it fails with a "file not found" error. You might expect an absolute path to be resolved from the root, but here it is looked up relative to the current directory.
The solution is to put ..\ in the path to move up to the parent folder.
For example, run the command like this:
C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < ..\..\..\data\message.txt
Here I am in the windows folder. The first ..\ moves up to the bin folder, the second to the kafka_2.11-2.4.0 folder, and the third to C:\. From there the path continues with data and then message.txt.

What happens after modifying Kafka topic replication factor?

According to the docs, you just modify a topic to increase the replication factor.
> bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name
--partitions 20 --replication-factor 3 --config x=y
Unfortunately, it doesn't specify what happens then after you modify the topic. Do existing log segments get replicated to the new replicas, or only new messages?
For Kafka 0.10 you need to invoke the kafka-reassign-partitions.sh script to change the replication factor of the topic.
Here is a demo of a script that will display the topic before and after the change:
updateTopicReplication() {
  TOPIC_NAME=$1
  REPLICAS=$2   # comma-separated broker ids, e.g. 1,2,3
  echo "****************************************"
  echo "describe $TOPIC_NAME"
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic "$TOPIC_NAME"
  JSON_FILE=./configChanges/${TOPIC_NAME}-update.json
  echo "$JSON_FILE"
  # Make sure the output directory exists, then overwrite any previous plan.
  mkdir -p ./configChanges
  # Note: this plan only covers partition 0 of the topic.
  printf '{"version":1, "partitions":[{"topic":"%s","partition":0,"replicas":[%s]}]}\n' "$TOPIC_NAME" "$REPLICAS" > "$JSON_FILE"
  echo "****************************************"
  echo "updating $TOPIC_NAME"
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file "$JSON_FILE" --execute
  echo "****************************************"
  echo "describe $TOPIC_NAME"
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic "$TOPIC_NAME"
}
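The function above only reassigns partition 0. For a topic with more partitions, the plan needs one entry per partition. A sketch under stated assumptions: reassignment_json is a hypothetical helper, and it gives every partition the same replica list, which is rarely what you want in production because the first broker in each list becomes the preferred leader for that partition:

```shell
# reassignment_json TOPIC PARTITIONS REPLICAS
#   TOPIC:      topic name
#   PARTITIONS: number of partitions (entries 0..PARTITIONS-1 are generated)
#   REPLICAS:   comma-separated broker ids, e.g. 1,2,3
reassignment_json() {
  topic=$1
  partitions=$2
  replicas=$3
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$partitions" ]; do
    # Separate entries with a comma, but not before the first one.
    if [ "$p" -gt 0 ]; then
      printf ','
    fi
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}' "$topic" "$p" "$replicas"
    p=$((p + 1))
  done
  printf ']}\n'
}

# Usage: write the plan to a file and pass it to kafka-reassign-partitions.sh
# reassignment_json my_topic_name 20 1,2,3 > my_topic_name-update.json
```

Rotating the replica list per partition would spread leadership more evenly, but that is left out to keep the sketch short.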
It seems that kafka actually doesn't support increasing (or decreasing) the replication factor for a topic, according to the same docs I mentioned in the question.