Kafka is not sending messages to other partitions - apache-kafka

Apache Kafka installed on Mac (Intel).
Single local producer and single local consumer.
1 topic with 3 partitions and 1 replication factor is created:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic animal --partitions 3 --replication-factor 1
Producer code:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic animal
Producer Messages:
>alligator
>crocodile
>tiger
When producing messages (manually via producer-console), all go into the same partition. Shouldn't they get distributed across partitions?
I've tried with 3 records (as above), but they get sent to 1 partition only. Checked within tmp/kafka-logs/topic-0/00**00.log
Other logs in topic- are empty.
I've tried with tens of records, but no luck.
I even increased the default partition configuration (num.partitions=3) within 'config/server.properties', but no luck.
I've also tried with different topics, but no luck.

Starting with kafka 2.4, the default partitioner was changed from round-robin to sticky, which will stick to the same partition (pun intended) for an entire batch.
With my kafka version, the kafka-console-producer uses a default batch size of 16384, so once you produce enough messages to fill that buffer, the partition will change.

If a producer, produces messages with the same key then it’s guaranteed to be produced on the same partition. so in your case if you want it to be consumed by different partitions than make sure to publish it with different keys.
You will need to set below property.
--property parse.key=true
See below command to produce record with key.
kafka-console-producer --broker-list 127.0.0.1:9092 --topic first_topic --property parse.key=true --property key.separator=,
> key1,value1
> key2,value2

Related

how Compaction works in Apache Kafka

My input is 1:45$ and we are processing the message and next I am updating the 1:null.
I do see still the 1:45$ in the topic along with the 1:null (I can see both the messages)
I want output to be 1:null in the same topic.
I have used this code:
kafka-topics --create --zookeeper zookeeper:2181 --topic latest- product-price --replication-factor 1 --partitions 1 --config "cleanup.policy=compact" --config "delete.retention.ms=100" --config "segment.ms=100" --config "min.cleanable.dirty.ratio=0.01"
kafka-console-producer --broker-list localhost:9092 --topic latest- product-price --property parse.key=true --property key.separator=::
1::45$
1::null
kafka-console-consumer --bootstrap-server localhost:9092 --topic latest-product-price --property print.key=true --property key.separator=:: --from-beginning
But I do not find any compaction in my case and need some inputs to make the value as 1::null
Compaction in Kafka is not immediate. If you send two messages with the same key to a compacted topic, and you have a live consumer on that topic, that consumer will see both messages come through.
Periodically, there's a background cleaner thread that goes looking for duplicate keys in compacted topics, and removes the overwritten records, so that a consumer that pulls down the data after that log cleaner has run, will only see the last change/update for a particular key. So, topic compaction seems to be better suited for consumers that run periodically, not ones that are active 100% of the time.
One thing that you can tune how often this background log cleaner thread runs, to maybe run those consumers more often. Look for the log.cleaner configuration parameters in the Kafka documentation: https://kafka.apache.org/documentation/#brokerconfigs
There's a good explanation on how Kafka log compaction works at this link:
https://medium.com/swlh/introduction-to-topic-log-compaction-in-apache-kafka-3e4d4afd2262

Kafka Consumer does not receive data when one of the brokers is down

Kafka Quickstart
Using Kafka v2.1.0 on RHEL v6.9
Consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shutdown Kafka-Server0, the consumer stops getting data from Producer.
When I bring back up Kafka-Server0, the consumer starts to get messages from where it left off.
These are the commands used
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no message received on the consumer side) when I run the consumer with two servers specified in the --bootstrap-server option.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when server0 is down even though the replication factor for the topic test1 was set to 2?
There is a similar question already but it was not answered completely
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down
If the offsets topic is unavailable, you cannot consume.
Look at the server.properties file for these, and see the comment above, and increase accordingly (only applies if topic doesn't already exist)
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, looks like it only has one replica
See how you can increase replication factor for an existing topic
In initial versions of Kafka, offset was being managed at zookeeper, but Kafka has continuously evolved over the time introducing lot of new features. Now Kafka manages the offset in a topic __consumer_offsets.
You can think of a scenario where you created a topic with a replication factor of 1. In case the broker goes down the data is only on that Kafka node which is down. So you can't get this data. Same analogy applies to __consumer_offsets topic.
You need to revisit the server.properties in order to get features you are expecting. But in case you still wanna consume the messages from the replica partition, you may need to re-start the console consumer with --from-beginning true

how to increase number of partitions for a topic and change the leader of each partition and rebalance them

If we have a topic that is in kafka and it has 5 partitions can we increase the number of partitions to 30. Also after increasing the number of partitions we change the leader of each partition in order of the broker ids and rebalance the cluster for that particular topic. How can we do that ?
I found out how it works.
1) First find the information of the existing topic
2) Find out all the brokers in your cluster with thier IDs
3) Scale number of partitions
4) Run the cluster reassignment script to rebalance
Give the actual zookeeper urls or the ips where i am using localhost.
1) In the bin directory we have kafka-topics.sh
./kafka-topics.sh --zookeeper localhost:2181 --topic dummytopic --describe
shows you all about the topic where the leaders of partitions are there and replicas are there
2) ./zookeeper-shell.sh localhost:2181
do ls /brokers/ids
gives you list of all the brokers ids
3)./kafka-topics.sh --alter --zookeeper localhost:2181 --topic dummytopic --partitions 30
Increases the number of partitions
4)
Before running this command you need a json file which will let you change the partition leader for a particular topic
For that i developed a simple tool do generate the json for very large partition count
https://github.com/chandradeepak/kafka-reassignment-gen
go build && topic=dummytopic num_partitions=30 brokerid_start=1022 replica_count=3 ./kafka-reassignment-gen
this will generate a json which we can use for the expand-cluster-reassignment.json . it looks some thing like this
{"version":1,"partitions":[{"topic":"dummytopic","partition":0,"replicas":[1001,1002,1003]},{"topic":"dummytopic","partition":1,"replicas":[1002,1003,1004]},{"topic":"dummytopic","partition":2,"replicas":[1003,1004,1005]},{"topic":"dummytopic","partition":3,"replicas":[1004,1005,1006]},{"topic":"dummytopic","partition":4,"replicas":[1005,1006,1007]},{"topic":"dummytopic","partition":5,"replicas":[1006,1007,1008]},{"topic":"dummytopic","partition":6,"replicas":[1007,1008,1009]},{"topic":"dummytopic","partition":7,"replicas":[1008,1009,1010]},{"topic":"dummytopic","partition":8,"replicas":[1009,1010,1011]},{"topic":"dummytopic","partition":9,"replicas":[1010,1011,1012]},{"topic":"dummytopic","partition":10,"replicas":[1011,1012,1013]},{"topic":"dummytopic","partition":11,"replicas":[1012,1013,1014]},{"topic":"dummytopic","partition":12,"replicas":[1013,1014,1015]},{"topic":"dummytopic","partition":13,"replicas":[1014,1015,1016]},{"topic":"dummytopic","partition":14,"replicas":[1015,1016,1017]},{"topic":"dummytopic","partition":15,"replicas":[1016,1017,1018]},{"topic":"dummytopic","partition":16,"replicas":[1017,1018,1019]},{"topic":"dummytopic","partition":17,"replicas":[1018,1019,1020]},{"topic":"dummytopic","partition":18,"replicas":[1019,1020,1021]},{"topic":"dummytopic","partition":19,"replicas":[1020,1021,1022]},{"topic":"dummytopic","partition":20,"replicas":[1021,1022,1023]},{"topic":"dummytopic","partition":21,"replicas":[1022,1023,1024]},{"topic":"dummytopic","partition":22,"replicas":[1023,1024,1025]},{"topic":"dummytopic","partition":23,"replicas":[1024,1025,1026]},{"topic":"dummytopic","partition":24,"replicas":[1025,1026,1027]},{"topic":"dummytopic","partition":25,"replicas":[1026,1027,1028]},{"topic":"dummytopic","partition":26,"replicas":[1027,1028,1029]},{"topic":"dummytopic","partition":27,"replicas":[1028,1029,1030]},{"topic":"dummytopic","partition":28,"replicas":[1029,1030,1001]},{"topic":"dummytopic","partition":29,"replicas":[1030,1001,1002]}]}
./kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file expand-cluster-reassignment.json --execute
This will execute the cluster reassignment and changes the partition leaders to what you expect.

Kafka 10.2 new consumer vs old consumer

I've spent some hours to figure out what was going on but didn't manage to find the solution.
Here is my set up on a single machine:
1 zookeeper running
3 broker running (on port 9092/9093/9094)
1 topic with 3 partitions and 3 replications (each partition are properly assigned between brokers)
I'm using kafka console producer to insert messages. If i check the replication offset (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I dont see anything consumed. I tried to delete my logs folder (/tmp/kafka-logs-[1,2,3]), create new topics, still nothing.
However when I use the old kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work ?
Thanks in advance.
Check to see what setting the consumer is using for auto.offset.reset property
This will affect what a consumer group without a previously committed offset will do in terms of setting where to start reading messages from a partition.
Check the Kafka docs for more on this.
Try providing all your brokers to --bootstrap-server argument to see if you notice any differnce:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long. I assume you've already made sure you provide the correct topic name.

Kafka Connect Offsets. Get/Set?

How do I get, set, or reset the offset of a Kafka Connect connector/task/sink?
I can use the /usr/bin/kafka-consumer-groups tool which runs kafka.admin.ConsumerGroupCommand to see the offsets for all my regular Kafka consumer groups. However, Kafka Connect tasks and groups do not show up with this tool.
Similarly, I can use the zookeeper-shell to connect to Zookeeper and I can see zookeeper entries for regular Kafka consumer groups, but not for Kafka Connect sinks.
As of 0.10.0.0, Connect doesn't provide an API for managing offsets. It's something we want to improve in the future, but not there yet. The ConsumerGroupCommand would be the right tool to manage offsets for Sink connectors. Note that source connector offsets are stored in a special offsets topic for Connect (they aren't like normal Kafka offsets since they are defined by the source system, see offset.storage.topic in the worker configuration docs) and since sink Connectors uses the new consumer, they won't store their offsets in Zookeeper -- all modern clients use native Kafka-based offset storage. The ConsumerGroupCommand can work with these offsets, you just need to pass the --new-consumer option).
You can't set offsets, but you can use kafka-consumer-groups.sh tool to "scroll" the feed forward.
The consumer group of your connector has a name of connect-*CONNECTOR NAME*, but you can double check:
unset JMX_PORT; ./bin/kafka-consumer-groups.sh --bootstrap-server *KAFKA HOSTS* --list
To view current offset:
unset JMX_PORT; ./bin/kafka-consumer-groups.sh --bootstrap-server *KAFKA HOSTS* --group connect-*CONNECTOR NAME* --describe
To move the offset forward:
unset JMX_PORT; ./bin/kafka-console-consumer.sh --bootstrap-server *KAFKA HOSTS* --topic *TOPIC* --max-messages 10000 --consumer-property group.id=connect-*CONNECTOR NAME* > /dev/null
I suppose you can move the offset backward as well by deleting the consumer group first, using --delete flag.
Don't forget to pause and resume your connector via Kafka Connect REST API.
In my case(testing reading files into producer and consume in console, all in local only), I just saw this in producer output:
offset.storage.file.filename=/tmp/connect.offsets
So I wanted to open it but it is binary, with some hardly recognizable characters.
I deleted it(rename it also works), and then I can write into the same file and get the file content from consumer again. You have to restart the console producer to take effect because it attempts to read the offset file, if not there, create a new one, so that the offset is reset.
If you want to reset it without deletion, you can use:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group-name> --reset-offsets --to-earliest --topic <topic_name>
You can check all group names by:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
and check details of each group:
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group <group_name> --describe
In production environment, this offset is managed by zookeeper, so more steps (and caution) is needed. You can refer to this page:
https://metabroadcast.com/blog/resetting-kafka-offsets
https://community.hortonworks.com/articles/81357/manually-resetting-offset-for-a-kafka-topic.html
Steps:
kafka-topics --list --zookeeper localhost:2181
kafka-run-class kafka.tools.GetOffsetShell --broker-list localhost:9092 -topic vital_signs --time -1 // -1 for largest, -2 for smallest
set /consumers/{yourConsumerGroup}/offsets/{yourFancyTopic}/{partitionId} {newOffset}