Kafka consumer does not receive data when one of the brokers is down

Following the Kafka Quickstart, using Kafka v2.1.0 on RHEL v6.9.
The consumer fails to receive data when one of the Kafka brokers is down.
Steps performed:
1. Start zookeeper
2. Start Kafka-Server0 (localhost:9092, kafkalogs1)
3. Start Kafka-Server1 (localhost:9094, kafkalog2)
4. Create topic "test1", num of partitions = 1, replication factor = 2
5. Run producer for topic "test1"
6. Run consumer
7. Send messages from the producer
8. Receive messages on the consumer side.
All the above steps worked without any issues.
When I shut down Kafka-Server0, the consumer stops receiving data from the producer.
When I bring Kafka-Server0 back up, the consumer resumes receiving messages from where it left off.
These are the commands used:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test1
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic test1
The behavior is the same (no messages received on the consumer side) when I run the consumer with both servers specified in the --bootstrap-server option:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9094 --topic test1
Any idea why the consumer stops getting messages when Server0 is down, even though the replication factor for topic test1 was set to 2?
There is a similar question already, but it was not answered completely:
Kafka 0.10 quickstart: consumer fails when "primary" broker is brought down

If the offsets topic is unavailable, you cannot consume.
Look in the server.properties file for these settings, note the comment above them, and increase the values accordingly (this only applies if the topic doesn't already exist):
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability, such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
According to your previous question, it looks like it only has one replica.
See how you can increase the replication factor for an existing topic.
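A minimal sketch of both steps on Kafka 2.1 (assuming two brokers with IDs 0 and 1, ZooKeeper on localhost:2181, and an illustrative file name increase-replication.json; __consumer_offsets has 50 partitions by default, so the real JSON needs one entry per partition):
# Confirm the current replica assignment of the internal offsets topic
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets
# increase-replication.json, abbreviated here to a single partition:
# {"version":1,"partitions":[{"topic":"__consumer_offsets","partition":0,"replicas":[0,1]}]}
# Apply the new assignment
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file increase-replication.json --execute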

In early versions of Kafka, offsets were managed in ZooKeeper, but Kafka has continuously evolved over time, introducing lots of new features. Kafka now manages offsets in an internal topic, __consumer_offsets.
Consider a scenario where you created a topic with a replication factor of 1. If the broker goes down, the data lives only on that Kafka node, which is down, so you can't get at it. The same analogy applies to the __consumer_offsets topic.
You need to revisit server.properties in order to get the behavior you are expecting. But if you still want to consume messages from a replica partition, you may need to restart the console consumer with --from-beginning.
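For example, with Server0 (localhost:9092) down, a sketch of pointing the console consumer at the surviving broker from the question and re-reading the log:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9094 --topic test1 --from-beginning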

Related

Kafka is not sending messages to other partitions

Apache Kafka installed on Mac (Intel).
Single local producer and single local consumer.
1 topic with 3 partitions and a replication factor of 1 is created:
bin/kafka-topics.sh --bootstrap-server localhost:9092 --create --topic animal --partitions 3 --replication-factor 1
Producer command:
bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic animal
Producer Messages:
>alligator
>crocodile
>tiger
When producing messages (manually via the producer console), they all go into the same partition. Shouldn't they get distributed across partitions?
I've tried with 3 records (as above), but they get sent to 1 partition only. I checked within tmp/kafka-logs/topic-0/00**00.log
The logs in the other partition directories (topic-1, topic-2) are empty.
I've tried with tens of records, but no luck.
I even increased the default partition configuration (num.partitions=3) in config/server.properties, but no luck.
I've also tried with different topics, but no luck.
Starting with Kafka 2.4, the default partitioner was changed from round-robin to sticky, which sticks to the same partition (pun intended) for an entire batch.
With my Kafka version, kafka-console-producer uses a default batch size of 16384 bytes, so once you produce enough messages to fill that buffer, the partition will change.
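One way to verify this is to compare end offsets per partition after producing; a sketch using the GetOffsetShell class that ships with Kafka (--time -1 asks for the latest offset):
# Prints topic:partition:end-offset for each of the 3 partitions of "animal"
bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic animal --time -1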
If a producer produces messages with the same key, they are guaranteed to land on the same partition. So in your case, if you want the messages spread across different partitions, make sure to publish them with different keys.
You will need to set the property below:
--property parse.key=true
See the command below for producing records with keys:
kafka-console-producer --broker-list 127.0.0.1:9092 --topic first_topic --property parse.key=true --property key.separator=,
> key1,value1
> key2,value2
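To confirm the keys actually spread records across partitions, you can read them back with the keys printed (print.key and key.separator are standard console-consumer formatter properties):
kafka-console-consumer --bootstrap-server 127.0.0.1:9092 --topic first_topic --from-beginning --property print.key=true --property key.separator=,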

Getting Kafka usage details

I am trying to find ways to get current usage statistics for my kafka cluster. I am looking to collect following information:
Number of topics in kafka cluster
Number of partitions per kafka broker
Number of active consumers and producers
Number of client connections per kafka broker
Number of messages on each partition, size of disk etc.
Lagging replicas, consumer lag etc.
Active consumer groups
Any other statistics that can and should be collected; currently I am looking at collecting the above stats.
I can get 1 and 2 using ZooKeeper utilities, but I am lost on the rest. I have looked at MBeans in JConsole but didn't find anything about the above. I also tried JmxTool to fetch these MBeans using regex-based expressions, but that didn't work either.
I am using Kafka v2.1 with the new consumer API, so ZooKeeper doesn't hold any information about consumers.
Any pointers would be great help!
You might as well use https://github.com/yahoo/kafka-manager or https://github.com/linkedin/cruise-control to get this information.
There are scripts under $KAFKA_HOME/bin which can help you.
Number of topics in kafka cluster
./kafka-topics.sh --zookeeper localhost:2181 --list
Number of partitions per kafka broker
./kafka-topics.sh --zookeeper localhost:2181 --describe
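The describe output lists the leader broker for every partition, so a rough per-broker tally is one pipeline away (a sketch, not an official tool):
# Count how many partition leaders each broker ID holds
./kafka-topics.sh --zookeeper localhost:2181 --describe | grep -o 'Leader: [0-9]*' | sort | uniq -c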
Number of messages on each partition, size of disk etc.
./kafka-log-dirs.sh --describe --bootstrap-server localhost:9092
Lagging replicas, consumer lag etc.
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group $GROUP_NAME --describe
Active consumer groups
./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
Number of active consumers and producers
You can't get active producers. See: Know existing producers for a kafka topic
Number of client connections per kafka broker
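One option for connection counts is JMX: brokers expose a connection-count attribute per network processor under socket-server-metrics. A sketch using the bundled JmxTool (assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999, and a listener named PLAINTEXT; it polls every few seconds until stopped):
bin/kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi --object-name 'kafka.server:type=socket-server-metrics,listener=PLAINTEXT,networkProcessor=*' --attributes connection-count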

Kafka: Consumer can't read records from topic

We're using Kafka Streams to write data to a sink topic. I'm running the Avro console consumer from the command line to check whether there's data in the sink topic:
bin/kafka-avro-console-consumer --topic sink.output.topic --from-beginning --new-consumer --bootstrap-server
I see data when I run the consumer while the Kafka Streams application is running, but if I stop the consumer and run it again after a few minutes, I don't see any data. A few possibilities:
1) Is this because Kafka Streams is wiping out the records from the output topic every time it pushes records to the sink?
2) Or is this just a consumer related issue?
I believe this is because --from-beginning applies only when the consumer hasn't established an offset yet. Have you tried using --offset earliest instead?
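A sketch of both workarounds (broker address and group name are illustrative; note that --offset requires naming a partition):
# Explicit offset on one partition
bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic sink.output.topic --partition 0 --offset earliest
# Fresh consumer group with no committed offsets, so --from-beginning applies again
bin/kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic sink.output.topic --from-beginning --consumer-property group.id=debug-group-1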
From your description, the issue seems to be the retention time.
The data might have been removed by the time you ran the consumer the second time.
You can configure the retention time, for example:
log.retention.hours=168
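log.retention.hours is the broker-wide default; retention can also be raised for just the sink topic, e.g. (kafka-configs against ZooKeeper; the value shown is 7 days in milliseconds):
bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name sink.output.topic --add-config retention.ms=604800000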

Kafka 0.10.2 new consumer vs old consumer

I've spent some hours trying to figure out what was going on, but didn't manage to find a solution.
Here is my set up on a single machine:
1 ZooKeeper instance running
3 brokers running (on ports 9092/9093/9094)
1 topic with 3 partitions and a replication factor of 3 (each partition is properly assigned across the brokers)
I'm using the Kafka console producer to insert messages. If I check the replication offsets (cat replication-offset-checkpoint), I see that my messages are properly ingested by Kafka.
Now I use the kafka console consumer (new):
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic testTopicPartitionned2
I don't see anything consumed. I tried deleting my log folders (/tmp/kafka-logs-[1,2,3]) and creating new topics, but still nothing.
However, when I use the old Kafka consumer:
sudo bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testTopicPartitionned2
I can see my messages.
Am I missing something big here to make this new consumer work?
Thanks in advance.
Check what setting the consumer is using for the auto.offset.reset property.
This affects where a consumer group without a previously committed offset starts reading messages from a partition.
Check the Kafka docs for more on this.
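With the console consumer you can pass the property straight through; a sketch using the topic from the question:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testTopicPartitionned2 --consumer-property auto.offset.reset=earliest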
Try providing all your brokers in the --bootstrap-server argument to see if you notice any difference:
sudo bin/kafka-console-consumer.sh --bootstrap-server localhost:9092,localhost:9093,localhost:9094 --from-beginning --topic testTopicPartitionned2
Also, your topic name is rather long; I assume you've already made sure you are providing the correct topic name.

How do I add a backup producer in Apache Kafka?

I am a Kafka newbie. I have an order/market data multicast (over TIBCO Rendezvous); my Kafka producer listens to it and publishes it on a topic, all in 1 partition, to a broker (I have a list of brokers and a ZooKeeper ensemble of 3 nodes), tolerating ZooKeeper and broker failures.
However, persistence on the Kafka brokers, though necessary, won't be sufficient if my producer goes down, as I would have lost the multicast messages anyway. My consumer commits offsets after every message, as it cannot process a single message twice.
I was thinking of having a backup producer publish on a different topic, but how would the consumer know where to pick up, even if Kafka allows leeway for restarting the consumer?
Additionally, I might not have a unique identifier on the incoming messages.
Whatever data you have produced from the producer is stored on the brokers, i.e. in the Kafka logs. If you need a backup, a simple solution is to create the topic with replicas:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 2 --partitions 1 --topic topicname
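You can then verify the replica assignment, e.g.:
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic topicname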