Oddly, I couldn't easily find this information online. I have one Kafka topic, and I want 2 consumer groups to consume messages from it independently of each other. That is, both consumer groups should be able to see and consume all messages.
Just to be even clearer: I don't want some messages to be consumed by one consumer group and the rest by the other group. I want the 2 groups to consume messages as if the other group doesn't even exist.
Yes, separate consumer groups are completely independent, so they all see all messages.
Partitioning of resources (topics/partitions) only happens within a group.
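As a minimal sketch of what "separate groups" means in practice: the only setting that has to differ between the two sets of consumers is `group.id`. The broker address and the two group names below are assumptions for illustration; the property keys are the standard consumer configuration keys, and the resulting `Properties` would be passed to a `KafkaConsumer`.

```java
import java.util.Properties;

final class TwoGroupsConfig {
    // Builds consumer settings; only group.id differs between the two groups.
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", groupId);
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Start from the oldest retained record when the group has no committed offset yet.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        Properties billing = consumerProps("billing-service");     // hypothetical group name
        Properties analytics = consumerProps("analytics-service"); // hypothetical group name
        System.out.println(billing.getProperty("group.id"));
        System.out.println(analytics.getProperty("group.id"));
    }
}
```

Consumers created from these two configurations and subscribed to the same topic would each track their own offsets, so each group reads the full topic.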
I have been trying to get my head around consumer groups in Kafka. I understand how a single consumer group works: it has multiple consumers, each consuming from a partition.
My question: if I have multiple consumer groups, does each group consume the same messages, i.e. like a fan-out? Or is the topic further split into partitions? If it's the latter, is that the same as having one consumer group with multiple consumers? What's the difference or reasoning? I understand each group has its own offset, which would mean that the messages are not duplicated?
In general, a consumer group consumes all messages from all partitions of a given topic, and two consumers in the same group never consume the same partition.
This means that if you have 10 partitions and 2 consumers in the same consumer group, each consumer will typically consume all messages from 5 separate partitions, and they will never consume the same messages. They divide the work between them.
On the other hand, if you have 10 partitions and 2 consumers in different consumer groups, both consumers will consume all 10 partitions of the topic, so overall the messages are read twice.
Consumer groups are a mechanism to make load splitting as easy as possible: you create multiple consumers, assign them the same group, and Kafka makes sure they don't consume the same messages, so the load is split. This is also why some consumers sit idle when you have more consumers than partitions. If you want to pass messages to multiple places or applications and make sure they all receive all messages, you can simply use multiple group ids.
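To make the numbers above concrete, here is a small stand-alone simulation of how partitions could be divided inside one group. It is only a sketch: it deals partitions out in turn, whereas Kafka's real assignors are configurable and more involved, and the class and consumer names are made up for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupAssignmentSketch {
    // Deals partitions 0..partitionCount-1 out to the consumers in turn.
    static Map<String, List<Integer>> assign(int partitionCount, List<String> consumers) {
        Map<String, List<Integer>> owned = new LinkedHashMap<>();
        for (String consumer : consumers) {
            owned.put(consumer, new ArrayList<>());
        }
        if (consumers.isEmpty()) {
            return owned;
        }
        for (int p = 0; p < partitionCount; p++) {
            String owner = consumers.get(p % consumers.size());
            owned.get(owner).add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        List<String> two = new ArrayList<>();
        two.add("consumer-a");
        two.add("consumer-b");
        // 10 partitions, 2 consumers in one group: 5 partitions each, no overlap.
        System.out.println(assign(10, two));

        List<String> three = new ArrayList<>(two);
        three.add("consumer-c");
        // 2 partitions, 3 consumers: the third consumer owns nothing and stays idle.
        System.out.println(assign(2, three));
    }
}
```

Running the same function once per group models the multi-group case: each group gets its own complete assignment over all partitions.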
As Kafka has a topic-based pub-sub architecture, how can I handle the one-to-one and group messaging parts of a web application using Kafka?
I am using a Spring Boot + Angular stack and a Dockerized Kafka server.
I'll write another answer here.
This is based on my experience with a chat service. You only need one topic for all the messages, with a well-designed message body:
public class Message {
    private String from; // sender's user id
    private String to;   // recipient's user id or group id
}
Then you can create, say, 100 partitions for this topic and two consumers to consume them (50 partitions per consumer in the beginning).
Then, if your system hits a bottleneck, you can easily scale out with X more consumers to handle the load.
How to distribute the messages in the consumer: I was sending messages to a mobile app. Every app instance holds a long-lived connection to the server, and the server sends messages to the app over that channel. For group chat, I keep a Redis cache of all the active users in each group, so I can easily look up the users who belong to a group and send them the messages.
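The recipient lookup described above could look roughly like this. It is a sketch under assumptions: an in-memory map stands in for the Redis cache, the class and method names are hypothetical, and a `to` value that is not a known group id is simply treated as a user id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ChatRouterSketch {
    // Stand-in for the Redis cache of active group members (hypothetical data).
    private final Map<String, Set<String>> activeMembers = new HashMap<>();

    void addActiveMember(String groupId, String userId) {
        activeMembers.computeIfAbsent(groupId, g -> new TreeSet<>()).add(userId);
    }

    // Returns the user ids whose open connections should receive the message.
    List<String> recipients(String from, String to) {
        List<String> result = new ArrayList<>();
        Set<String> members = activeMembers.get(to);
        if (members == null) {
            result.add(to); // not a known group id: treat as a one-to-one message
            return result;
        }
        for (String member : members) {
            if (!member.equals(from)) { // do not echo the message back to the sender
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ChatRouterSketch router = new ChatRouterSketch();
        router.addActiveMember("team-1", "alice");
        router.addActiveMember("team-1", "bob");
        router.addActiveMember("team-1", "carol");
        System.out.println(router.recipients("alice", "team-1")); // group message
        System.out.println(router.recipients("alice", "dave"));   // one-to-one message
    }
}
```

The consumer would call something like `recipients(...)` for each record it reads and push the message over each recipient's open connection.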
One more thing: Kafka should stay decoupled from your business logic and act only as a messaging system that transfers messages. If you tie your business logic to Kafka, for example by creating "One-to-One" topics and deleting them once they are finished, Kafka will get very messy.
By One-to-One, I suppose you mean one producer and one consumer, i.e. using it as a queue.
This is certainly possible with Kafka. You can have one consumer subscribe to a topic and restrict others by not giving them authorization. See Authorization in Kafka.
Note that once a message is consumed, it is not deleted. Instead, its offset is committed so that the same consumer group will not consume it again.
By Group Messaging, I suppose you mean one producer > multiple consumers or multiple producers > multiple consumers.
This is also possible: a producer can produce messages to a topic and multiple consumers can consume them.
If all the consumers have the same group id, then each consumer in the group gets only a subset of messages.
If they have different group ids then each consumer will get all messages.
Multiple producers also can produce to the same topic.
A consumer can also subscribe to multiple topics.
OK, it's a very complicated question, so I'll try to give some simple, basic information.
Kafka topics are divided into a number of partitions. Partitions allow you to parallelize a topic by splitting the data in a particular topic across multiple brokers — each partition can be placed on a separate machine to allow for multiple consumers to read from a topic in parallel.
So if you are using partitions, it means you can have multiple consumers consuming in parallel.
Consumers are organized into consumer groups for a given topic: each consumer within the group reads from a unique partition, and the group as a whole consumes all messages from the entire topic.
Basically, you can have only one group; then each message is handed to a single consumer in that group, so it is not processed twice under normal operation. Note that this is not the same as an exactly-once guarantee, since a message can be redelivered after a failure or rebalance.
If you need two consumer groups, think about why you need two. Do the consumers in the two groups handle different logic?
There is more to it; please check the official documentation, or ask a smaller question.
The Apache Kafka documentation mentions the following:
If all the consumer instances have the same consumer group, then the
records will effectively be load balanced over the consumer instances.
If all the consumer instances have different consumer groups, then
each record will be broadcast to all the consumer processes.
This makes things a bit unclear for me when thinking about partitions. Does the second statement mean that if I have multiple consumer groups, each consumer in each group will read all the records in all partitions?
Still, the diagram they use in the documentation does not agree with the above, as far as I understand.
In fact, I was reading through a great article, Kafka in a Nutshell, and the statements quoted below conform much better to the diagram in the documentation.
Consumers can also be organized into consumer groups for a given topic
— each consumer within the group reads from a unique partition and the
group as a whole consumes all messages from the entire topic. If you
have more consumers than partitions then some consumers will be idle
because they have no partitions to read from. If you have more
partitions than consumers then consumers will receive messages from
multiple partitions. If you have equal numbers of consumers and
partitions, each consumer reads messages in order from exactly one
partition.
I was hoping someone could shed some light on the above and explain clearly a scenario based on Apache's official documentation.
does that mean that each consumer in each group will read all the records in all partitions ?!!
No. The statement assumes that each group has exactly one consumer (as indicated by "If all the consumer instances have different consumer groups").
So your overall understanding is correct. If you have multiple consumer groups a message will be sent to each group once.
If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?
In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?
What if you have multiple consumers on a given topic#partition? I guess the consumer has to somehow keep track of what messages it has already processed in case of duplicates?
In fact, each consumer belongs to a consumer group. When a Kafka cluster sends data to a consumer group, all records of a partition are sent to a single consumer in the group.
If there are more partitions than consumers in a group, some consumers will consume data from more than one partition. If there are more consumers in a group than partitions, some consumers will get no data. If you add new consumer instances to the group, they will take over some partitions from old members. If you remove a consumer from the group (or the consumer dies), its partitions will be reassigned to other members.
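The membership changes described above can be illustrated with a toy model of one group. This is only a sketch with invented names: ownership is recomputed from the sorted list of live members, while in real Kafka the group coordinator runs the rebalance and the configured assignor decides who gets what.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class RebalanceSketch {
    private final int partitionCount;
    private final TreeSet<String> members = new TreeSet<>(); // live consumers, sorted by id

    RebalanceSketch(int partitionCount) {
        this.partitionCount = partitionCount;
    }

    void join(String consumerId) { members.add(consumerId); }

    void leave(String consumerId) { members.remove(consumerId); }

    // Partitions currently owned by the consumer, recomputed from the live member list.
    List<Integer> partitionsOf(String consumerId) {
        List<String> ordered = new ArrayList<>(members);
        int index = ordered.indexOf(consumerId);
        List<Integer> owned = new ArrayList<>();
        if (index < 0) {
            return owned; // not a member: owns nothing
        }
        for (int p = index; p < partitionCount; p += ordered.size()) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        RebalanceSketch group = new RebalanceSketch(3);
        group.join("c1");
        group.join("c2");
        group.join("c3");
        System.out.println("c1 owns " + group.partitionsOf("c1"));
        group.leave("c2"); // c2 dies: its partition moves to a surviving member
        System.out.println("c1 owns " + group.partitionsOf("c1"));
        System.out.println("c3 owns " + group.partitionsOf("c3"));
    }
}
```

At every point each partition has exactly one owner inside the group, which is the property the answer relies on.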
Now let's take a look at your questions:
If you have fewer consumers than partitions, does that simply mean you will not consume all the messages on a given topic?
NO. Some consumers in the same consumer group will consume data from more than one partition.
In a cloud environment, how are you supposed to keep track of how many consumers are running and how many are pointing to a given topic#partition?
Kafka will take care of it. If new consumers join the group, or old consumers die, Kafka will rebalance.
What if you have multiple consumers on a given topic#partition?
You CANNOT have multiple consumers (in one consumer group) consuming data from a single partition. However, if there is more than one consumer group, the same partition can be consumed by one (and only one) consumer in each consumer group.
1) No, that means you will have one consumer handling more than one partition.
2) Kafka never assigns the same partition to more than one consumer in a group, because that would violate the ordering guarantee within a partition.
3) You could implement ConsumerRebalanceListener in your client code; it gets called whenever partitions are assigned to or revoked from the consumer.
You might want to take a look at this article, specifically the "Assigning partitions to consumers" part. In it I have a sample where you create a topic with 3 partitions and then a consumer with a ConsumerRebalanceListener that tells you which consumer is handling which partition. You can play around with it by starting 1 or more consumers and seeing what happens. The sample code is on GitHub.
http://www.javaworld.com/article/3066873/big-data/big-data-messaging-with-kafka-part-2.html
I'm reading the Kafka FAQ, which states the following.
•Each partition is not consumed by more than one consumer thread/process in each consumer group. This allows to have each process consume in a single threaded fashion to guarantee ordering to the consumer within the partition (if we split up a partition of ordered messages and handed them out to multiple consumers even though the messages were stored in order they would be processed out of order at times).
Is it possible for one partition to be consumed by multiple consumers in different consumer groups?
If yes, how does it handle duplicate message reads?
Update:
Actually, what I wanted to ask is: I have one partition with 10 messages and 2 different consumer groups [group1, group2]. Is it possible for group1 to read the first 5 messages and group2 to read the other 5?
It's not possible (at least, it's not designed for that). The goal of having different consumer groups is precisely to be able to process the same messages for different purposes.
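A toy model may help show why the two groups cannot "share" the 10 messages. The sketch below assumes a single partition held as an in-memory list and one committed offset per group id; the class and its methods are invented for illustration and are not the Kafka client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();         // retained log of one partition
    private final Map<String, Integer> committed = new HashMap<>(); // next offset to read, per group

    void append(String value) {
        records.add(value);
    }

    // Returns up to maxRecords unread records for the group and advances only that group's offset.
    List<String> poll(String groupId, int maxRecords) {
        int start = committed.getOrDefault(groupId, 0);
        int end = Math.min(records.size(), start + maxRecords);
        List<String> batch = new ArrayList<>(records.subList(start, end));
        committed.put(groupId, end); // reading never removes records from the log
        return batch;
    }

    public static void main(String[] args) {
        PartitionLogSketch log = new PartitionLogSketch();
        for (int i = 0; i < 10; i++) {
            log.append("message-" + i);
        }
        System.out.println(log.poll("group1", 5).size());  // group1 reads the first 5
        System.out.println(log.poll("group2", 10).size()); // group2 still sees all 10
        System.out.println(log.poll("group1", 10).size()); // group1 continues with the remaining 5
    }
}
```

Because each group only moves its own offset, what group1 has read has no effect on what group2 will read.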