Kafka producer to read data files - apache-kafka

I am trying to load a data file in a loop (to check stats) instead of using standard input in Kafka. After downloading Kafka, I performed the following steps:
Started zookeeper:
bin/zookeeper-server-start.sh config/zookeeper.properties
Started Server:
bin/kafka-server-start.sh config/server.properties
Created a topic named "test":
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Ran the Producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Test1
Test2
Listened by the Consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Test1
Test2
Instead of standard input, I want to pass a data file to the producer so that its contents can be seen directly by the consumer. Alternatively, is there a Kafka producer, other than the console producer, that can read data files? Any help would be appreciated. Thanks!

You can read the data file with cat and pipe it to kafka-console-producer.sh:
cat ${datafile} | ${kafka_home}/bin/kafka-console-producer.sh --broker-list ${brokerlist} --topic test
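Since the question mentions loading the file in a loop to check stats, here is a minimal sketch, assuming bash, that replays the same file a given number of times through a single producer process. The helper name replay_file is hypothetical; the producer command is passed as trailing arguments so it can be swapped out.

```shell
#!/usr/bin/env bash
# Sketch: replay the same data file N times into any producer command,
# e.g. to generate repeatable load for checking throughput stats.
# replay_file is a hypothetical helper, not part of Kafka.
# Assumes the file ends with a newline (otherwise passes would be glued together).
replay_file() {
  local file=$1 times=$2 i
  shift 2                      # remaining args are the producer command
  for ((i = 1; i <= times; i++)); do
    cat -- "$file"
  done | "$@"                  # one pipe, so the producer is started only once
}
```

For example: replay_file data.txt 100 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test. Piping the whole loop into one producer avoids starting a new JVM for every pass.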

If there is only ever a single file, you can just use the tail command and pipe its output to the Kafka console producer.
But if new files are created whenever certain conditions are met, you may need to use the Apache Commons IO file monitor (org.apache.commons.io.monitor) to detect newly created files, and then repeat the above for each one.
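If you would rather stay in the shell than write Java against the Commons IO monitor, a rough polling alternative (a swapped-in technique, not what the answer above describes) is to send every file in a directory that has not been sent yet and record it in a state file. send_new_files and the state-file format are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch (shell polling instead of a Java file monitor): send every file in a
# directory that has not been sent before, then remember it in a state file.
# send_new_files and the one-path-per-line state file are hypothetical.
send_new_files() {
  local dir=$1 state=$2 f
  shift 2                                   # remaining args: producer command
  touch -- "$state"
  for f in "$dir"/*; do
    [[ -f $f ]] || continue                 # skip subdirectories and an empty glob
    grep -qxF -- "$f" "$state" && continue  # already sent on an earlier pass
    "$@" < "$f" && printf '%s\n' "$f" >> "$state"
  done
}
```

Run it periodically, e.g. while sleep 10; do send_new_files /data/incoming sent.list bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test; done. One caveat: a file that is still being written when the loop picks it up is sent only partially, so it is safer to have writers create files under a temporary name elsewhere and move them into the directory when done.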

Kafka has a built-in File Stream connector for piping the content of a file into a topic (file source), or for writing the content of a topic to another destination such as a file (file sink).
We have bin/connect-standalone.sh to read from a file; it is configured in config/connect-file-source.properties and config/connect-standalone.properties.
So the command will be:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties
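For reference, a minimal sketch that writes a connector config reading one file into one topic. The keys mirror the sample config/connect-file-source.properties shipped with Kafka; the function name, output path, input file and topic are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal FileStreamSource connector config.
# Keys follow the sample config/connect-file-source.properties;
# write_file_source_config is a hypothetical helper, and the file path
# and topic passed in are placeholders to adjust.
write_file_source_config() {
  local out=$1 input_file=$2 topic=$3
  cat > "$out" <<EOF
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=${input_file}
topic=${topic}
EOF
}
```

For example, write_file_source_config config/my-file-source.properties /path/to/data.txt test, then pass config/my-file-source.properties to connect-standalone.sh instead of the sample file. As far as I know, the source connector keeps tailing the file and stores its read position (in the standalone worker's offset.storage.file.filename), so lines already sent are not re-sent after a restart; for looping over the same file repeatedly, the console-producer approaches are simpler. On recent Kafka versions the quickstart also has you add the connect-file JAR to plugin.path in config/connect-standalone.properties.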

The easiest way if you are using Linux or Mac is:
kafka-console-producer --broker-list localhost:9092 --topic test < messages.txt
Reference:
https://github.com/Landoop/kafka-cheat-sheet

You could also try the kafkacat utility.
The README on GitHub provides examples.
It would be great if you could share which tool worked best for you :)
Details from the kafkacat README:
Read messages from stdin, produce to 'syslog' topic with snappy compression
$ tail -f /var/log/syslog | kafkacat -b mybroker -t syslog -z snappy

kafka-console-producer.sh \
--broker-list localhost:9092 \
--topic my_topic \
--new-producer < my_file.txt
The --new-producer flag is only accepted by older Kafka releases; drop it on current versions.
See also: http://grokbase.com/t/kafka/users/157b71babg/kafka-producer-input-file

The command below is, of course, the easiest way to do that:
kafka-console-producer --broker-list localhost:9092 --topic test < message.txt
But sometimes it cannot find the file, for example:
C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < C:\data\message.txt
Here you have given the full path, but the file is still not found and you get a "file not found" error. We would expect the path to be resolved from the root of the drive, but it looks for C at the current location instead.
The solution is to use ..\ in the path to move up to the parent folder.
For example, run the command like this:
C:\kafka_2.11-2.4.0\bin\windows>kafka-console-producer.bat --broker-list localhost:9092 --topic jason-input < ..\..\..\data\message.txt
Here the current directory is the windows folder: the first ..\ moves to the bin folder, the second to the kafka_2.11-2.4.0 folder, and the third to C:\, so the path then continues with data and then message.txt.

Related

How can I produce a Kafka Record with null value using the kafka tool set

I'm using the following command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test.topic --property parse.key=true --property key.separator=#
This allows me to start typing key#value entries.
However, no matter what I try, I'm not able to create a null entry.
If I try sending [myKey#] and press Enter, on the feed I will see an Empty message for the Key, but not null.
I need to create a Null value.
kafkacat allows you to produce tombstones through the -Z option.
Produce a tombstone (a "delete" for compacted topics) for key "abc" by providing an empty message value, which -Z interprets as NULL:
$ echo "abc:" | kafkacat -b mybroker -t mytopic -Z -K:
The console producer cannot produce a null record; it parses the input as UTF-8 strings.
Personally, I would write a simple python or ruby script to do so

Is there a way to add headers in kafka-console-producer.sh

I'd like to use the kafka-console-producer.sh to fire a few JSON messages with Kafka headers.
Is this possible?
docker exec -it kafka_1 /opt/kafka_2.12-2.3.0/bin/kafka-console-producer.sh --broker-list localhost:9093 --topic my-topic --producer.config /opt/kafka_2.12-2.3.0/config/my-custom.properties
No, but you can with kafkacat's -H argument:
Produce:
echo '{"col_foo":1}'|kafkacat -b localhost:9092 -t test -P -H foo=bar
Consume:
kafkacat -b localhost:9092 -t test -C -f '-----\nTopic %t[%p]\nOffset: %o\nHeaders: %h\nKey: %k\nPayload (%S bytes): %s\n'
-----
Topic test[0]
Offset: 0
Headers: foo=bar
Key:
Payload (9 bytes): col_foo:1
% Reached end of topic test [0] at offset 1
Starting from Kafka 3.1.0, there is an option to turn on header parsing, parse.headers=true; you then place the headers before your record value. From the docs:
| parse.headers=true:
| "h1:v1,h2:v2...\tvalue"
So your command will look like
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_name --property parse.headers=true
and then you pass each record with a tab between the headers and the value:
header_name:header_value<TAB>record_value
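To combine this with reading a data file, here is a minimal bash sketch that prefixes every line of the file with the same headers, assuming the default separators ("," between headers, ":" between a header's key and value, a tab before the value). add_headers is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: turn a plain data file into the "headers<TAB>value" layout that
# kafka-console-producer.sh expects with parse.headers=true (Kafka 3.1+).
# Assumes the default separators: "," between headers, ":" inside a header,
# and a tab between the headers and the value.
add_headers() {
  local headers=$1 file=$2 line
  # "|| [[ -n $line ]]" also keeps a last line that has no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    printf '%s\t%s\n' "$headers" "$line"
  done < "$file"
}
```

For example: add_headers 'source:file,env:dev' data.json | kafka-console-producer.sh --bootstrap-server localhost:9092 --topic topic_name --property parse.headers=true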

kafka-configs.sh can't find kafka.admin.ConfigCommand

I've downloaded the Kafka scripts from https://www.apache.org/dyn/closer.cgi?path=/kafka/2.2.0/kafka_2.12-2.2.0.tgz
When I run:
./kafka_2.12-2.2.0/bin/kafka-configs.sh --zookeeper localhost:12181 --entity-type topics --entity-name challenges --alter --add-config cleanup.policy=compact,segment.bytes=1000000,segment.ms=60000
or just ./kafka_2.12-2.2.0/bin/kafka-configs.sh,
I always get the error "Can't find or load kafka.admin.ConfigCommand".
I have already tried re-downloading, and testing with kafka_2.12-2.3.0.tgz.
EDIT: I'm on Windows 10 and use Git bash to run script

bin/kafka-console-producer.sh Exception

When I type this command:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
it returns nothing, which seems unusual. Can somebody help? Thanks in advance.

What happens after modifying Kafka topic replication factor?

According to the docs, you just modify a topic to increase the replication factor.
> bin/kafka-topics.sh --zookeeper zk_host:port/chroot --create --topic my_topic_name --partitions 20 --replication-factor 3 --config x=y
Unfortunately, it doesn't specify what happens after you modify the topic. Do existing log segments get replicated to the new replicas, or only new messages?
For Kafka 0.10, you need to invoke the kafka-reassign-partitions.sh script to change the replication factor of a topic.
Here is a demo of a script that will display the topic before and after the change:
updateTopicReplication() {
  # Usage: updateTopicReplication <topic> <comma-separated broker ids, e.g. 1,2,3>
  # Only partition 0 is reassigned, so this suits single-partition topics.
  local TOPIC_NAME=$1
  local REPLICAS=$2
  local JSON_FILE=./configChanges/${TOPIC_NAME}-update.json

  echo "****************************************"
  echo "describe $TOPIC_NAME (before)"
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic "$TOPIC_NAME"

  # Write the reassignment plan; ">" overwrites any file from a previous run
  mkdir -p ./configChanges
  echo "{\"version\":1,\"partitions\":[{\"topic\":\"${TOPIC_NAME}\",\"partition\":0,\"replicas\":[${REPLICAS}]}]}" > "$JSON_FILE"
  echo "$JSON_FILE"

  echo "****************************************"
  echo "updating $TOPIC_NAME"
  bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file "$JSON_FILE" --execute

  # Reassignment runs in the background, so new replicas may still be catching up here
  echo "****************************************"
  echo "describe $TOPIC_NAME (after)"
  bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic "$TOPIC_NAME"
}
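The script above only reassigns partition 0. For a topic with several partitions, a sketch like the following (hypothetical helper build_reassignment_json, assuming bash) writes one entry per partition and rotates the broker list, so that the first replica, which Kafka treats as the preferred leader, differs across partitions.

```shell
#!/usr/bin/env bash
# Sketch: build a reassignment JSON covering every partition of a topic.
# build_reassignment_json is hypothetical; args: topic, partition count,
# comma-separated broker ids. The replica list is rotated per partition
# so preferred leaders (first replica) are spread across brokers.
build_reassignment_json() {
  local topic=$1 partitions=$2 p i n list sep=""
  local -a ids rotated
  IFS=, read -ra ids <<< "$3"
  n=${#ids[@]}
  printf '{"version":1,"partitions":['
  for ((p = 0; p < partitions; p++)); do
    rotated=()
    for ((i = 0; i < n; i++)); do
      rotated+=("${ids[(p + i) % n]}")
    done
    list=$(IFS=,; echo "${rotated[*]}")   # join the ids with commas
    printf '%s{"topic":"%s","partition":%d,"replicas":[%s]}' \
      "$sep" "$topic" "$p" "$list"
    sep=","
  done
  printf ']}\n'
}
```

For example, build_reassignment_json my_topic_name 20 1,2,3 > reassign.json, then pass reassign.json to kafka-reassign-partitions.sh with --execute, and later with --verify to check that the reassignment has finished.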
It seems that Kafka actually doesn't support increasing (or decreasing) the replication factor of a topic by simply altering the topic; according to the same docs I mentioned in the question, it has to go through partition reassignment.