How can I control the requests/messages sent by a Kafka cluster?

Suppose I have 3 Kafka brokers, one ZooKeeper node, 50 producers, 50 consumers, and 1 topic (testTopic1).
All the consumers are subscribed to testTopic1. Now I send 50 messages at the same time with the 50 producers to the same topic (testTopic1). I want the Kafka cluster not to send more than 40 messages at the same time to consumers. The remaining 10 should stay in the queue or be dropped.
Maybe this is load balancing in Kafka.
I do not understand how to do this. I'm new to Kafka, please help.

Kafka brokers are dumb. They can't limit or remove messages published to Kafka.
If all Kafka consumers are part of the same consumer group and there are 50 consumers, they may or may not all receive those 50 messages at the same time; that depends on the keys (and on how many partitions the topic has). If multiple messages have the same key, all same-key messages will be consumed by a single consumer, one by one. If all 50 messages have distinct keys, they may be consumed by the same or by different consumers, depending on the hash of each key.
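A rough way to see the key argument: the stdlib-only Python sketch below models key-based partitioning. Assumption: Kafka's Java default partitioner hashes the serialized key with murmur2; here zlib.crc32 stands in for it, so the concrete partition numbers will differ from a real cluster. The property that matters still holds: same key, same partition, and therefore the same consumer within a group. The function names are hypothetical.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (simplified model).

    Kafka's Java default partitioner uses murmur2; zlib.crc32 is a
    stand-in here, so real partition numbers will NOT match.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def group_by_partition(keys, num_partitions):
    """Return {partition: [keys...]}, preserving send order per partition."""
    layout = {p: [] for p in range(num_partitions)}
    for key in keys:
        layout[choose_partition(key, num_partitions)].append(key)
    return layout


if __name__ == "__main__":
    # Five messages with one key all land in one partition, so a single
    # consumer in the group reads them one after another.
    same_key = group_by_partition(["order-42"] * 5, num_partitions=3)
    print({p: len(v) for p, v in same_key.items()})

    # Distinct keys are spread by the hash; whether two keys share a
    # partition (and hence a consumer) depends on their hash values.
    distinct = group_by_partition([f"user-{i}" for i in range(50)], num_partitions=3)
    print({p: len(v) for p, v in distinct.items()})
```

Note that with 3 partitions at most 3 of the 50 consumers in the group would receive anything, whatever the keys are.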
Can you explain your use case in more detail?

A Kafka broker cannot drop messages randomly, but you can implement logic within the consumer to drop messages while processing.
If you have a single topic with a single partition, only one of the consumers in a given consumer group will process all your messages, because a partition is consumed by only one consumer per group; that is what guarantees ordering on the consumer side.
If you have 10 consumer groups with 5 consumers each and the topic has a single partition, exactly one consumer in each group (10 consumers in total) will process your messages from the topic. If the active consumer in consumer-group-1 fails, another consumer from the same consumer group takes over the partition and continues processing.
If you need to randomly drop 1 out of 10 messages while processing, you can do it in the consumer logic. But from the broker's point of view, as recorded by the consumer group's offsets, all the data has been processed, provided the system is configured to manage offsets on the broker side.
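A toy model of that last point, assuming a hypothetical ToyConsumer class rather than the real Kafka client: the consumer skips roughly 1 in 10 messages in its own code, yet its committed offset still moves past every message, so the broker sees everything as consumed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyConsumer:
    """Hypothetical consumer: drops about 1 in `drop_every` messages in
    application code but still advances its committed offset."""
    drop_every: int = 10
    processed: list = field(default_factory=list)
    dropped: list = field(default_factory=list)
    committed_offset: int = 0  # next offset to read, as the broker would record it

    def handle(self, offset: int, value: str, rng: random.Random) -> None:
        if rng.randrange(self.drop_every) == 0:
            self.dropped.append(value)    # skipped on purpose, no side effects
        else:
            self.processed.append(value)  # real work would happen here
        # Commit either way: from the broker's side the message is consumed.
        self.committed_offset = offset + 1


def run(messages, seed=0):
    rng = random.Random(seed)
    consumer = ToyConsumer()
    for offset, value in enumerate(messages):
        consumer.handle(offset, value, rng)
    return consumer


if __name__ == "__main__":
    c = run([f"msg-{i}" for i in range(50)])
    print(len(c.processed), len(c.dropped), c.committed_offset)
```

The drop is probabilistic (1 in 10 on average), matching the "randomly" wording above; a fixed rule such as "skip every 10th offset" would work the same way.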

Related

Can I have multiple consumers process messages from a single queue in Apache Kafka

What I want to achieve is this:
Subscribe multiple consumers to a single topic
Each message should be processed by only one consumer
No consumer should be idle as long as the topic has unprocessed messages
As far as I understand I can get the first two points by defining multiple partitions for that topic, at least one partition per consumer. But that doesn't satisfy my 3rd requirement.
Assume I create a topic with 3 partitions and subscribe 3 consumers (same group id). Then a producer pushes a bulk of 300 messages, which are equally distributed across all three partitions. So each partition contains 100 messages, and the consumers start processing. For whatever reason one consumer takes longer, and at some point, when 2 consumers have already processed all the messages in their partitions, the 3rd consumer still has several messages left to process.
In that scenario the 2 fast consumers would fall idle while the 3rd one is still processing messages.
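The arithmetic of this scenario can be checked with a toy timing model. The 3x per-message slowdown of the third consumer is an assumed figure, not something stated in the question, and the functions are hypothetical.

```python
def finish_times(partition_sizes, seconds_per_message):
    """Toy model: consumer i owns partition i and works through it serially.

    With static partition ownership, a consumer that finishes early cannot
    help a slower consumer in the same group.
    """
    return [n * t for n, t in zip(partition_sizes, seconds_per_message)]


def idle_time(partition_sizes, seconds_per_message):
    """Per-consumer idle time until the slowest consumer finishes."""
    done = finish_times(partition_sizes, seconds_per_message)
    end = max(done)
    return [end - d for d in done]


if __name__ == "__main__":
    sizes = [100, 100, 100]   # 300 messages split evenly over 3 partitions
    speeds = [1.0, 1.0, 3.0]  # assumed: the third consumer is 3x slower
    print(finish_times(sizes, speeds))  # [100.0, 100.0, 300.0]
    print(idle_time(sizes, speeds))     # [200.0, 200.0, 0.0]
```

Under these assumptions the two fast consumers each sit idle for two thirds of the total run.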
What I have in mind is something like a topic with only one partition, where all subscribed consumers share the same offset index. Then, whenever a consumer is idle, it would fetch the next message from the topic that hasn't been processed by any of the consumers yet. I know that Kafka cannot have multiple consumers of the same group on one partition; this is just to explain my intentions.
Is there a way to configure my topology to meet my requirements?

Does consumer consume from replica partitions if multiple consumers running under same consumer group?

I am writing a kafka consumer application. I have a topic with 4 partitions - 1 is leader and 3 are followers. Producer uses key to identify a partition to push a message.
If I write a consumer and run it on different nodes, or start 4 instances of the same consumer, how will message consumption happen? Will all 4 instances get the same messages?
What happens when multiple consumers (same group) consume a single topic?
Do they get the same data?
How is the offset managed? Is it separate for each consumer?
I would suggest that you read at least the first few chapters of Confluent's definitive guide to Kafka to get a preliminary understanding of how Kafka works.
I've kept my answers brief. Please refer to the book for detailed explanation.
How is the offset managed? Is it separate for each consumer?
It depends on the group id. Only one offset (per partition) is managed for a group.
What happens when multiple consumers (same group) consume a single topic?
There can be multiple consumers, identified by the same group or by different groups.
If 2 consumers belong to the same group, they will not both get all the messages.
Do they get the same data?
No. Once a message has been read and its offset committed, the group's committed offset moves past it. So a different consumer in the same group will not receive that message.
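To make "one offset per group" concrete, here is a hypothetical offset store in plain Python. It keys committed offsets by (group, topic, partition), not by consumer instance. As an assumption for brevity, a group with no committed offset starts at 0, similar to auto.offset.reset=earliest.

```python
from collections import defaultdict


class ToyOffsetStore:
    """Hypothetical model of broker-side committed offsets, keyed by
    (group, topic, partition) rather than by consumer instance."""

    def __init__(self):
        # Missing entries start at offset 0 (assumed "earliest" behaviour).
        self._committed = defaultdict(int)

    def committed(self, group, topic, partition):
        return self._committed[(group, topic, partition)]

    def commit(self, group, topic, partition, next_offset):
        self._committed[(group, topic, partition)] = next_offset


def poll(log, store, group, topic, partition, max_records):
    """Read up to max_records after the group's committed offset, then commit."""
    start = store.committed(group, topic, partition)
    batch = log[start:start + max_records]
    store.commit(group, topic, partition, start + len(batch))
    return batch


if __name__ == "__main__":
    log = [f"m{i}" for i in range(6)]
    store = ToyOffsetStore()
    print(poll(log, store, "g1", "t", 0, 4))  # e.g. consumer A in g1: ['m0', 'm1', 'm2', 'm3']
    print(poll(log, store, "g1", "t", 0, 4))  # another g1 consumer resumes: ['m4', 'm5']
    print(poll(log, store, "g2", "t", 0, 4))  # g2 has its own offset: ['m0', 'm1', 'm2', 'm3']
```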
Hope that helps :)
What happens when multiple consumers (same group) consume a single topic?
Answer: Producers send records to a particular partition based on the record's key. The default partitioner for Java uses a hash of the record's key to choose the partition. When there are multiple consumers in the same consumer group, each consumer gets a different partition. So in this case, with a single partition (the 3 followers are replicas of it, not additional partitions), only a single consumer receives all the messages. When the consumer receiving the messages goes down, the group coordinator (one of the brokers in the cluster) triggers a rebalance, and the partition is assigned to one of the available consumers.
Do they get the same data?
Answer: If a consumer commits the offsets of the messages it consumed and then goes down, a rebalance occurs as described above, and the consumer that gets the partition will not receive those messages again. But if the consumer goes down before committing, the consumer that gets the partition will receive those messages again.
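A toy model of that handover, using a hypothetical function and a plain list standing in for the partition log: what consumer B sees after the rebalance depends only on whether consumer A committed before it went down.

```python
def consume_with_crash(log, committed, crash_after, commit_before_crash):
    """Hypothetical model of a partition handed over after a consumer crash.

    Consumer A reads `crash_after` records starting at `committed`. If it
    commits before crashing, the group's offset moves forward; otherwise it
    stays put. Consumer B, assigned the partition in the rebalance, resumes
    from whatever offset is committed.
    """
    a_read = log[committed:committed + crash_after]
    if commit_before_crash:
        committed += len(a_read)
    b_read = log[committed:]
    return a_read, b_read


if __name__ == "__main__":
    log = ["m0", "m1", "m2", "m3"]
    print(consume_with_crash(log, 0, 2, commit_before_crash=True))
    # (['m0', 'm1'], ['m2', 'm3'])
    print(consume_with_crash(log, 0, 2, commit_before_crash=False))
    # (['m0', 'm1'], ['m0', 'm1', 'm2', 'm3'])
```

In the second case m0 and m1 are delivered twice, which is why consumers that commit after processing should tolerate duplicates.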
How is the offset managed? Is it separate for each consumer?
Answer: No, the offset is not kept separately for each consumer; it is tracked per consumer group and partition. A partition is never assigned to multiple consumers in the same consumer group at the same time. The consumer that gets a partition assigned also picks up that partition's offset by default.

Can I have all the consumers of a group consume message from all the partitions of a kafka topic?

Let's say in Kafka I have 4 partitions of a topic 'A' and 20 consumers in consumer group 'AC'. I don't need any ordering, but I want to process the messages faster by scaling my consumer instances. Note that all messages are independent and can be processed independently.
I looked at the consumer configuration partition.assignment.strategy, but I'm not sure whether I can achieve dynamic assignment of consumers to partitions depending on message availability.
Each partition is assigned to exactly one consumer in the group. In your case only 4 of your 20 consumers are currently working. You have to increase the number of partitions if you want more consumers to be assigned.
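To illustrate why only 4 of the 20 consumers do any work, here is a simplified model of range-style assignment for a single topic. It approximates the idea behind Kafka's RangeAssignor rather than reproducing its code, and the consumer ids are made up.

```python
def range_assign(partitions, consumers):
    """Simplified range-style assignment for ONE topic.

    Consumers are sorted by id; each gets len(partitions) // len(consumers)
    partitions, and the first len(partitions) % len(consumers) consumers get
    one extra. Consumers beyond the partition count receive nothing.
    """
    members = sorted(consumers)
    if not members:
        return {}
    per, extra = divmod(len(partitions), len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per + (1 if i < extra else 0)
        assignment[member] = partitions[start:start + count]
        start += count
    return assignment


if __name__ == "__main__":
    consumers = [f"c{i:02d}" for i in range(20)]
    for n in (4, 20):
        assignment = range_assign(list(range(n)), consumers)
        busy = sum(1 for parts in assignment.values() if parts)
        print(n, "partitions ->", busy, "busy consumers")
    # 4 partitions -> 4 busy consumers
    # 20 partitions -> 20 busy consumers
```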

Scaling up kafka consumer applications

Let's say I have one consumer group subscribed to 4 topics, with the following number of partitions per topic:
EDITED:
First topic => 5 partitions
Second topic => 3 partitions
Third topic => 2 partitions
Fourth topic => 1 partition
The total number of partitions is 11. So how many application instances can I run in total:
5 (the maximum number of partitions in any input topic) or 11?
In kafka, scaling consumers depends on partition number.
Let's assume you have one topic with 3 partitions, and 2 different consumer apps (different consumer groups) that do different work.
You can scale up to 3 consumers per consumer group.
A single consumer (consumer group A) can consume messages from all 3 partitions.
Two consumers in the same consumer group cannot consume the same partition.
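For the "5 or 11?" question, a rough model suggests the answer depends on partition.assignment.strategy. As I understand Kafka's RangeAssignor and RoundRobinAssignor, range splits each topic separately (always starting from the first sorted consumer), while round-robin pools all subscribed partitions and deals them out across the group. The functions below are simplified, hypothetical models of those two ideas, not the real assignor code.

```python
from itertools import cycle


def range_per_topic(topics, consumers):
    """Range-style model: each topic is split independently, always
    starting from the first (sorted) consumer."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    for topic, n in sorted(topics.items()):
        per, extra = divmod(n, len(members))
        p = 0
        for i, m in enumerate(members):
            count = per + (1 if i < extra else 0)
            assignment[m] += [(topic, k) for k in range(p, p + count)]
            p += count
    return assignment


def round_robin(topics, consumers):
    """Round-robin model: all topic-partitions are pooled and dealt out
    one at a time across the sorted consumers."""
    members = sorted(consumers)
    assignment = {m: [] for m in members}
    pool = [(t, k) for t, n in sorted(topics.items()) for k in range(n)]
    for tp, m in zip(pool, cycle(members)):
        assignment[m].append(tp)
    return assignment


def busy(assignment):
    """Number of consumers that received at least one partition."""
    return sum(1 for parts in assignment.values() if parts)


if __name__ == "__main__":
    topics = {"first": 5, "second": 3, "third": 2, "fourth": 1}  # 11 partitions
    consumers = [f"c{i:02d}" for i in range(11)]
    print(busy(range_per_topic(topics, consumers)))  # 5
    print(busy(round_robin(topics, consumers)))      # 11
```

Under this model, 11 instances can all be kept busy only when partitions are spread across topics; with per-topic range splitting, instances beyond 5 would sit idle.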
Take a look at this image: https://hadoopabcd.files.wordpress.com/2015/04/consumer-group.png
Read more in this blog series about consumer groups: https://dzone.com/articles/understanding-kafka-consumer-groups-and-consumer-l
Ideally, the number of consumers in a consumer group should equal the number of partitions. If that is not the case, you can have more than one consumer group: Kafka allows 2 consumers from different consumer groups to read from the same partition. How many consumers you run depends entirely on the resources you have available.
Suppose you have an application that needs to read messages from a Kafka topic, run some validations against them, and write the results to another data store. In this case your application will create a consumer object, subscribe to the appropriate topic, and start receiving messages, validating them and writing the results. This may work well for a while, but what if the rate at which producers write messages to the topic exceeds the rate at which your application can validate them? If you are limited to a single consumer reading and processing the data, your application may fall farther and farther behind, unable to keep up with the rate of incoming messages. Obviously there is a need to scale consumption from topics. Just like multiple producers can write to the same topic, we need to allow multiple consumers to read from the same topic, splitting the data between them.
Kafka consumers are typically part of a consumer group. When multiple consumers are subscribed to a topic and belong to the same consumer group, each consumer in the group will receive messages from a different subset of the partitions in the topic.
Please refer to this https://www.safaribooksonline.com/library/view/kafka-the-definitive/9781491936153/ch04.html

Kafka: always only one consumer in a group consumes the topic's messages

I have two consumer servers with the same group id, subscribed to the same topic.
A Kafka server is running with only one partition.
As far as I know, messages should be consumed randomly by those two consumer servers.
But now it seems that consumer server A always consumes the messages, and the other one does not consume any. If I stop consumer server A, the other one works fine.
What I expect is that they consume messages randomly.
To use two consumer instances in parallel, you need at least two partitions in the topic. A consumer binds to one or more partitions of a topic, and other consumers with the same groupId will not claim partitions that already have consumers bound to them. If a consumer fails or crashes, its partition is released and then picked up by another consumer instance.