Kafka topic with no duplication of messages - apache-kafka

How can I achieve such an outcome with messages in Kafka topics?
I.e. changelog-like functionality: multiple messages come into the topic, but I only care about the last one that came in.
Also, what happens when the topic is partitioned?
Is this possible in Kafka?

To achieve this, you should set cleanup.policy for this topic to compact, as shown below:
CREATE TOPIC:
bin/kafka-topics.sh --zookeeper localhost:2181 --create --topic my-topic --partitions 1 --replication-factor 1 --config cleanup.policy=compact
UPDATE TOPIC:
bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name my-topic --alter --add-config cleanup.policy=compact
With the compact cleanup policy set, you have to assign a key to every message; the Kafka producer will partition messages based on that key, and log compaction retains only the latest message for each key.
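For example, here is a minimal sketch of producing keyed messages from the console into the my-topic created above (this assumes a broker on localhost:9092; the separator character and the payloads are arbitrary illustrative choices):
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic --property parse.key=true --property key.separator=:
user-1:first value
user-1:second value
With parse.key=true each input line is split at the separator into a key and a value; once compaction runs, only the latest value for the key user-1 is retained.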

Related

kafka-configs.sh: what is the configuration after retention.ms deletion on a topic

Running the following kafka-topics.sh command
kafka-topics.sh --zookeeper zookeeper:2181 --describe --topics-with-overrides
we get:
Topic: __consumer_offsets PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=producer,cleanup.policy=compact,min.insync.replicas=3,segment.bytes=104857600,retention.ms=7200000
We can see retention.ms=7200000, which means the retention is 2 hours (the value is in milliseconds).
Now we remove the retention.ms configuration with the following Kafka CLI command:
kafka-configs.sh --zookeeper zookeeper:2181 --alter --entity-type topics --entity-name __consumer_offsets --delete-config retention.ms
and after that we get:
kafka-topics.sh --zookeeper zookeeper:2181 --describe --topics-with-overrides
Topic: __consumer_offsets PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=producer,cleanup.policy=compact,min.insync.replicas=3,segment.bytes=104857600
As we can see above, retention.ms was removed.
Now my question is:
After we removed retention.ms, what is the new retention? Does that mean Kafka now falls back to the broker configuration file, server.properties?
Second, what are the risks of running kafka-configs.sh --zookeeper zookeeper:2181 --alter --entity-type topics --entity-name __consumer_offsets --delete-config retention.ms?
All topics have a retention, so there is no risk in removing the topic-level override.
If the override is removed, then yes, it will take the default, global retention from the broker's server.properties.
Note that you may see different values if you use --describe --bootstrap-server kafka:9092
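For instance, a minimal sketch for checking the effective retention through the brokers rather than ZooKeeper (this assumes a fairly recent Kafka release, roughly 2.5 or newer, where kafka-configs.sh accepts --bootstrap-server for topics and supports the --all flag):
kafka-configs.sh --bootstrap-server kafka:9092 --entity-type topics --entity-name __consumer_offsets --describe --all
With --all the output also includes configurations inherited from the broker defaults, not just the topic-level overrides, so you can see which retention is actually in effect.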

How to find the retention bytes per topic

With the following kafka-configs command we can set the retention bytes to 1000000:
TOPIC_NAME=test
kafka-configs --alter --zookeeper localhost:2181 --entity-type topics --entity-name $TOPIC_NAME --add-config retention.bytes=1000000
But how do we do the opposite and find the retention bytes for a topic?
You can use the --describe option:
./kafka-configs --zookeeper localhost:2181 --describe --entity-type topics --entity-name test
The altered (overridden) properties are returned; anything not listed falls back to the broker defaults.
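As a small sketch, you could filter that output for just this setting (the grep pattern assumes the value is printed as retention.bytes=<number>, which matches the usual describe output):
./kafka-configs --zookeeper localhost:2181 --describe --entity-type topics --entity-name test | grep -o 'retention.bytes=[0-9]*'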

How to purge or delete a topic in Kafka 2.1.0

I would like to share different ways to purge or delete a Kafka topic in version 2.1.0. I found a similar question here, Purge Kafka Topic; however, the accepted answer has been deprecated and only works on Kafka version 0.8 and below, hence this question with an answer.
This is not a duplicate question.
Kafka by default keeps messages for 168 hours, which is 7 days. If you want to force Kafka to purge a topic, you can do it in several ways. Let's look at each in detail.
1. Using the kafka-configs.sh command
Temporarily change the retention policy to 1 second.
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --add-config retention.ms=1000 --entity-name text_topic
You can check the current value of the retention policy by running the command below:
kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --describe --entity-name text_topic
Configs for topic 'text_topic' are retention.ms=1000
Wait for the deletion to take effect (the broker's retention check runs periodically, so it may take a little longer than 1 second), then remove the retention override, which sets it back to the default.
kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --delete-config retention.ms --entity-name text_topic
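You can confirm that the old messages are actually gone with GetOffsetShell (a sketch assuming a broker on localhost:9092):
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic text_topic --time -2
kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic text_topic --time -1
When the earliest (--time -2) and latest (--time -1) offsets match for every partition, the purge is complete.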
2. Delete the topic and re-create it
Before we delete an existing topic, first get its partition count and replication factor, as you will need these to re-create it. You can get this information by describing the topic:
kafka-topics.sh --zookeeper localhost:2181 --describe --topic text_topic
Topic:text_topic PartitionCount:3 ReplicationFactor:3 Configs:
Topic: text_topic Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Delete the topic (note that delete.topic.enable must be true on the brokers for the delete to take effect).
kafka-topics.sh --zookeeper localhost:2181 --delete --topic text_topic
Re-create the topic with the same replication factor and partition count.
kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 3 --topic text_topic
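Topic deletion is asynchronous, so it is worth confirming the topic is really gone before re-creating it; a quick sketch (the grep is just an illustrative filter):
kafka-topics.sh --zookeeper localhost:2181 --list | grep text_topic
If the topic still shows up (possibly marked for deletion), wait a moment and re-check before running the create command.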
3. Manually delete the data from the Kafka log directories.
Stop ZooKeeper and Kafka on all nodes.
Clean the Kafka logs on all nodes. Kafka stores its log files at
/tmp/kafka-logs/MyTopic-0, where /tmp/kafka-logs is the directory specified by the
log.dir attribute (see the sketch after these steps).
Restart ZooKeeper and Kafka.
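A minimal sketch of the clean-up step, assuming log.dir=/tmp/kafka-logs and the topic text_topic (adjust both to your setup, and only run this while the brokers are stopped):
rm -rf /tmp/kafka-logs/text_topic-*
This removes the partition directories (text_topic-0, text_topic-1, ...) for that topic on the current node; repeat it on every broker.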
Hope this helps !!

Kafka consumer not showing the messages?

I created a new topic 'rahul' with the following command:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic rahul
Created topic "rahul".
I also re-checked the topics with
bin/kafka-topics.sh --list --zookeeper localhost:2181
__consumer_offsets
rahhy
rahul
Now starting the producer:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic rahul
hey
hi
hello
But when it is time for the consumer to show the messages, there is nothing.
As of Kafka 0.9, you don't use ZooKeeper for consumption or production.
Try kafka-console-consumer --topic rahul --bootstrap-server localhost:9092
If the messages were produced before the consumer was started, also add --from-beginning; otherwise the console consumer only reads messages that arrive after it starts.
There are other ways you can check that messages were sent to Kafka, for example by checking that the offsets of the topic partitions have changed using GetOffsetShell.
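A minimal sketch of that offset check, assuming a broker on localhost:9092 (older releases of the tool take --broker-list, newer ones --bootstrap-server):
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic rahul --time -1
It prints one topic:partition:offset line per partition; a latest offset greater than 0 means messages were written to that partition.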

Ambiguity between the number of partitions in server.properties and the --partitions parameter in topic creation in Apache Kafka

In Kafka I have created a topic with the ./kafka-topics.sh command. The command looks like:
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 8 --topic test
This creates 8 partitions for the topic test. There is also the server.properties configuration on the Kafka broker, which has a num.partitions parameter whose default is 1.
Now my specific question: does this not create ambiguity in the partition count for the topic test? Will it use the partition count given at topic creation time, or num.partitions from server.properties?
Kafka can be configured to create topics on demand. That means that if you try to send a message to a topic that does not exist, the topic will be created automatically with the number of partitions specified by the num.partitions property in server.properties. If you are going to create the topic yourself using
./kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 8 --topic test
the topic will be created with the number of partitions specified by --partitions (8 in your case), and the num.partitions property will be ignored.
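For reference, the broker-side settings involved look like this in server.properties (illustrative values; auto.create.topics.enable defaults to true and num.partitions defaults to 1):
auto.create.topics.enable=true
num.partitions=1
These settings only affect auto-created topics; a topic created explicitly with kafka-topics.sh always uses the --partitions and --replication-factor values you pass on the command line.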