Kafka multiple producer writing to same topic? - apache-kafka

Say I have a topic T1 with 3 partitions: P1, P2 and P3, where P1 is the leader and the rest are followers.
Now there are 2 producers that want to push to the same topic T1. I believe P1 will be the leader for both of them? Also, will a single offset be maintained
for both of them, or is the offset maintained per partition per producer?
Now I have a single consumer which is polling from T1. Will it get messages from both producers by default, or does it have to explicitly mention the producer name if it
wants messages from a specific producer?

The leader does not depend on the producers or consumers, so both producers see the same leader for a given partition. Producers do not track offsets: the broker assigns each record the next offset in its partition as it is appended, whichever producer sent it. What is tracked per consumer group is the committed offset, which records which messages the group has read and committed.
The consumer will always read all the messages; it does not matter which producer published them.
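A toy model may make this concrete. The sketch below is not the Kafka client API: `Topic`, `append` and `read` are hypothetical names for an in-memory stand-in, and the producer ids are made up. It only illustrates that the offset comes from the partition's log, not from the producer, and that a reader sees records from every producer.

```python
class Topic:
    """Toy in-memory topic: one append-only list per partition (not real Kafka)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, producer_id, value):
        log = self.partitions[partition]
        offset = len(log)  # next offset in this partition, whoever the producer is
        log.append((producer_id, value))
        return offset

    def read(self, partition, from_offset=0):
        # A reader gets every record in the partition, with no filter on producer.
        return self.partitions[partition][from_offset:]


t1 = Topic("T1", 3)
offsets = [
    t1.append(0, "producer-A", "a1"),
    t1.append(0, "producer-B", "b1"),
    t1.append(0, "producer-A", "a2"),
    t1.append(1, "producer-B", "b2"),
]
print(offsets)  # [0, 1, 2, 0]

consumed = [record for p in range(3) for record in t1.read(p)]
print(sorted({producer for producer, _ in consumed}))  # ['producer-A', 'producer-B']
```

Both producers share the offset sequence of partition 0, while partition 1 starts again at 0.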

You may be mixing up replicas and partitions. When you say you have a topic with 3 partitions, it means your records will be distributed among them according to the record key (or the partitioner's algorithm when there is no key).
There is no 'leader partition'. Instead, each partition has a leader broker that handles it. In your case you will have 3 leaders, one for each of your 3 partitions (a single broker may lead more than one of them).
An interesting post regarding Kafka partitions:
Understanding Kafka Topics and Partitions
Yannick
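As a rough illustration of key-based dispatching, here is a minimal sketch. It assumes a hash-then-modulo scheme; Kafka's default partitioner uses murmur2, so `zlib.crc32` is only a stand-in, and the `leaders` map with its broker names is hypothetical.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: the real default partitioner uses murmur2, not CRC32.
    return zlib.crc32(key) % num_partitions


# Hypothetical cluster state: each partition has its own leader broker.
leaders = {0: "broker-1", 1: "broker-2", 2: "broker-3"}

for key in [b"user-1", b"user-2", b"user-1", b"user-3"]:
    p = partition_for(key, num_partitions=3)
    print(key.decode(), "-> partition", p, "led by", leaders[p])
```

The point to notice is that the same key always maps to the same partition, and the producer then talks to whichever broker leads that partition.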

Related

Clarification on producer while posting message to the topic

I am a beginner learning Kafka and was going through topics and producers. As per my understanding,
the topic is just a logical name for a group of partitions, and the partitions are spread across the nodes.
Is my understanding correct that for a given topic with, let's say, 5 partitions, all 5 partitions will be on 5 different brokers? And if there is another topic with 5 partitions, then those 5 partitions will also be on the 5 brokers. Effectively, for this configuration, each of the 5 brokers would have two partitions, one from each topic. Am I right?
Another point, about the producer posting a message and the consumer consuming it: the producer has a list of brokers configured and posts the message to a topic via that list of brokers. The message is always written to the leader partition, i.e. one of the partitions on a broker, and is then replicated to all the other partitions on other brokers. In this case, if the producer is configured with only one broker, is the message still posted to the leader partition, even if the configured broker is not the one holding the leader partition for that topic? Ex: topic name events, with 5 partitions on 5 brokers; broker-2 contains the leader partition but the producer is configured with broker-1 alone.
I also read that the producer can specify the partition when posting the message. If so, does that not contradict the statement that the producer always posts to the leader partition? And if the producer posts the message to a custom partition and the broker containing that partition is down, the message will not be posted. Also, in distributed systems it is not a best practice to pin a specific partition. Am I missing something here?
Does the consumer also read from the leader partition, or does the consumer group assign different consumers to different partitions?

How multiple consumers from different consumer groups read from same partition?

I have a use case where I have 2 consumers in different consumer groups (cg1 and cg2) subscribing to the same topic (Topic A) with 4 partitions.
What happens if both consumers are reading from the same partition and one of them fails while the other one commits the offset?
In Kafka, offset management is done per consumer group and per partition.
If you have two consumer groups reading the same topic, and even the same partition, a commit from one consumer group will not have any impact on the other consumer group. The consumer groups are completely decoupled.
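To see the decoupling, here is a small in-memory sketch (not the Kafka client). `OffsetStore` and `poll` are hypothetical names, and the sketch assumes a group with no committed offset starts from offset 0, which in real Kafka depends on the `auto.offset.reset` setting.

```python
class OffsetStore:
    """Committed offsets keyed by (group, topic, partition)."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, offset):
        self._committed[(group, topic, partition)] = offset

    def committed(self, group, topic, partition):
        # Assumption: no commit yet means "start from the beginning".
        return self._committed.get((group, topic, partition), 0)


log = [f"msg-{i}" for i in range(5)]  # partition 0 of "Topic A"
store = OffsetStore()


def poll(group, max_records):
    start = store.committed(group, "Topic A", 0)
    return start, log[start:start + max_records]


# cg1 reads three records and commits the next offset to read.
start, records = poll("cg1", 3)
store.commit("cg1", "Topic A", 0, start + len(records))

# cg2 reads two records but fails before committing anything.
poll("cg2", 2)

cg1_next = poll("cg1", 10)
cg2_next = poll("cg2", 10)
print(cg1_next)  # (3, ['msg-3', 'msg-4'])
print(cg2_next)  # (0, ['msg-0', 'msg-1', 'msg-2', 'msg-3', 'msg-4'])
```

The commit from cg1 moves only cg1's position; cg2 restarts from its own last committed offset.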
Within a consumer group, each topic partition is read by at most one consumer. A single consumer can, however, read from multiple partitions of a topic.
Example: if Consumer 1 of Consumer Group 1 is assigned a partition, no other consumer in Consumer Group 1 reads that partition.
Offset management used to be done by ZooKeeper; it is now handled through an internal Kafka topic.
__consumer_offsets: Every consumer group maintains its offset per topic partitions. Since v0.9 the information of committed offsets for every consumer group is stored in this internal topic (prior to v0.9 this information was stored on Zookeeper).
When the offset manager receives an OffsetCommitRequest, it appends the request to a special compacted Kafka topic named __consumer_offsets. Finally, the offset manager will send a successful offset commit response to the consumer, only when all the replicas of the offsets topic receive the offsets.
Two consumers from two different consumer groups (cg1 and cg2) can read the data from the same topic simultaneously.
In old Kafka versions (before 0.9), offset management was handled by ZooKeeper.
In later versions, the offsets of each consumer group are stored in the __consumer_offsets topic.
Offsets are used to keep track of consumers (how many records each has consumed). Let's say consumer-1 has consumed and committed 10 records and consumer-2 has consumed 20 records, and suddenly consumer-1 dies. Whenever consumer-1 comes back up, it will start reading from the 11th record onward.

How multiple consumer group consumers work across partition on the same topic in Kafka?

I was reading this SO answer and many such blogs.
What I know:
Multiple consumers can read a single partition when they run with different consumer group ids, and only one consumer from a given consumer group can consume from a partition at a given time.
My question is related to multiple consumers from multiple consumer groups consuming from the same topic:
What happens in the case of multiple consumers (in different groups) consuming a single topic (and eventually the same partition)?
Do they get the same data?
How is the offset managed? Is it separate for each consumer?
(Might be opinion based) What is the generally recommended way to handle overlapping data across two consumers of separate groups operating on a single partition?
Edit:
"overlapping data": means two consumers of separate consumer groups operating on the same partition getting the same data.
Yes, they get the same data. Kafka only stores one copy of the data in the topic partitions' commit log. If consumers are not in the same group then they can each get the same data using fetch requests from the clients' consumer library. The assignment of which partitions each group member will get is managed by the lead consumer of each group. The entire process is documented in detailed steps here: https://community.hortonworks.com/articles/72378/understanding-kafka-consumer-partition-assignment.html
Offsets are "managed" by the consumers, but "stored" in a special __consumer_offsets topic on the Kafka brokers.
Offsets are stored for each (consumer group, topic, partition) tuple. This combination is also used as the key when publishing offsets to the __consumer_offsets topic, so that log compaction can delete old, unneeded offset commit messages and so that all offsets for the same (consumer group, topic, partition) tuple are stored in the same partition of the __consumer_offsets topic (which defaults to 50 partitions).
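For the curious, this is a sketch of how a group could be mapped to one of those 50 partitions. It assumes, to the best of my understanding of the broker code, that the partition is derived from the Java `String.hashCode()` of the group id modulo the partition count; treat that as an assumption and not as a documented contract. `java_string_hash` and `offsets_partition_for` are hypothetical helper names.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit int.

    Assumes the string contains only characters in the Basic Multilingual Plane.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    # Assumption: partition = abs(hash(group id)) % partition count,
    # with the minimum 32-bit int mapped to 0 to avoid a negative result.
    h = java_string_hash(group_id)
    non_negative = 0 if h == -(1 << 31) else abs(h)
    return non_negative % num_partitions


for group in ["cg1", "cg2"]:
    print(group, "->", offsets_partition_for(group))
```

Under that assumption, every commit of a given group lands in the same partition of __consumer_offsets, so one broker can serve all of that group's offsets.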
Each consumer group gets every message from a subscribed topic.
Yes
Offsets are stored per partition (for each consumer group). For example, let's say you have a topic with 2 partitions and a consumer group named cg made up of 2 consumers. In that case Kafka assigns each of the consumers one of the partitions. Then each consumer fetches from Kafka the offset for the partition it was assigned (e.g. a consumer 'asks' Kafka: "What is the offset for this topic for consumer group cg, partition 1?", or partition 2 for the other consumer). After getting the correct offset, the consumer polls the Kafka broker for the next messages in that partition.
I'm not entirely sure what you mean by overlapping data, can you clarify a bit or give an example?

Kafka partitions and consumer groups for at-least-once message delivery

I am trying to come up with a design using Kafka for a number of processing agents to process messages from a Kafka topic in parallel.
I would like to ensure close to exactly-once per message processing across the whole consumer group, although can tolerate at-least-once.
I find the documentation unclear in many regards, and there are a few specific questions I have to know if this is a viable approach:
if a message is published to a topic, does it exist once only across all partitions in the topic or is it replicated on possibly more than one partition? I have read statements that could support both possibilities.
is the "offset" per partition or per consumer/consumergroup/partition?
when I start a new consumer, does it look at the offset for the consumer group as a whole or for the partition it is assigned?
if I want to scale up new consumers and there are no free partitions (I believe there can be not more than one consumer per partition), will kafka rebalance existing messages from the existing partitions, and how does that affect the offsets and consumers of existing partitions?
Or are there any other points I am missing that may help my understanding of this?
if a message is published to a topic, does it exist once only across all partitions in the topic or is it replicated on possibly more than one partition? I have read statements that could support both possibilities.
[A]: A message is written to exactly one partition of the topic; it is not copied to other partitions. What is replicated is the partition itself, across nodes, depending on the replication factor. If you have partition P1 in a cluster with 2 nodes and a replication factor of 2, then node1 will be the leader for P1 and node2 will also have P1's contents/messages, but as a replica (replication happens asynchronously, with the replica fetching from the leader).
is the "offset" per partition or per consumer/consumergroup/partition?
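[A]: Per partition from a broker standpoint: every record has an offset within its partition. The consumed position is also tracked per consumer group and partition, since it is explicitly tracked/managed on the consumer end. The consumer code can delegate this work to Kafka (auto-commit) or manage the offsets manually.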
when I start a new consumer, does it look at the offset for the consumer group as a whole or for the partition it is assigned?
[A]: Kafka triggers a rebalance when a new consumer enters the group and assigns certain partitions to it. From then on, the consumer only cares about the offsets of the partitions it is responsible for.
if I want to scale up new consumers and there are no free partitions (I believe there can be not more than one consumer per partition), will kafka rebalance existing messages from the existing partitions, and how does that affect the offsets and consumers of existing partitions?
[A]: For parallelism, the ideal scenario is a 1-1 mapping between consumers and partitions, e.g. if you have 10 partitions, you can have at most 10 active consumers in the group. If you bring in an 11th one, Kafka won't assign partitions to it unless an existing consumer leaves the group. Existing messages are never moved between partitions.
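Since the question is about at-least-once processing, here is a minimal sketch of the usual pattern when offsets are managed manually: process the record first, commit afterwards. It is a pure in-memory simulation, not client code; `consume` and `crash_before_commit_at` are hypothetical names used only to simulate a consumer dying between processing and committing.

```python
def consume(log, committed, processed, crash_before_commit_at=None):
    """Process records starting at the committed offset; return the new committed offset."""
    position = committed
    while position < len(log):
        processed.append(log[position])       # 1. process the record
        if position == crash_before_commit_at:
            return committed                  # simulated crash: the commit never happens
        position += 1
        committed = position                  # 2. commit only after processing
    return committed


log = ["m0", "m1", "m2", "m3"]
processed = []

# First run dies after processing m2 but before committing it.
committed = consume(log, 0, processed, crash_before_commit_at=2)
print(committed)  # 2

# The restarted consumer resumes from the last committed offset.
committed = consume(log, committed, processed)
print(processed)  # ['m0', 'm1', 'm2', 'm2', 'm3']
```

No record is lost, but m2 is processed twice, which is why the processing step should be idempotent if you need something close to exactly-once.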

Kafka consumer & Partition query

I am new to Kafka and have read a few tutorials. I couldn't understand the relationship between consumers and partitions.
Please address my queries below.
As per the documentation, only one consumer in a group can consume a message. Why do we need to create more consumers in that same group? What is the benefit?
Are consumers assigned to individual partitions by ZK? If yes, and the producer sends a message to a different partition, how will the other partition's consumer consume the message?
I have one topic that has 3 partitions. I post a message and it goes to P0. I have 5 consumers (in different consumer groups). Will all consumers read the message from P0? If I add many more consumers, will they all read messages from the same P0?
If all consumers read from the same P0, then how will performance be high?
How does rebalancing work? Does it happen when you add a consumer group, or when you add a consumer to the same group?
Please clarify my questions and give some examples.
Yes, only one consumer in a consumer group can consume messages from a given partition; the rest of the consumers in the same group will be assigned to the remaining partitions to process in parallel. The advantage is parallel processing.
Yes, partitions are assigned to consumers: by ZK for the old (pre-0.9) consumer, and by the group coordinator on the brokers for current clients. Allocation is based on the partition count and the consumer count. Ex: Topic (Test) has 3 partitions (P1, P2, and P3). We have one consumer (C1). C1 will read messages from all partitions. If you add one more consumer (C2) to the same group, P1 and P2 will be assigned to C1 and P3 goes to C2. If you add one more consumer (C3), then P1=C1, P2=C2 and P3=C3. The number of consumers in a group should not be greater than the number of partitions for that topic; any extra consumers stay idle.
Above point will answer this one.
Rebalancing happens when you add a consumer to (or remove one from) the same consumer group.
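The C1/C2/C3 example can be reproduced with a small sketch of a contiguous-range assignment. This is an illustration in the spirit of Kafka's range assignor, not the actual implementation, and `assign` is a hypothetical helper name. A rebalance is modelled simply as calling it again with the new member list.

```python
def assign(partitions, consumers):
    """Split partitions into contiguous ranges; earlier consumers get the extras."""
    consumers = sorted(consumers)
    base, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


partitions = ["P1", "P2", "P3"]
print(assign(partitions, ["C1"]))
# {'C1': ['P1', 'P2', 'P3']}
print(assign(partitions, ["C1", "C2"]))
# {'C1': ['P1', 'P2'], 'C2': ['P3']}
print(assign(partitions, ["C1", "C2", "C3"]))
# {'C1': ['P1'], 'C2': ['P2'], 'C3': ['P3']}
print(assign(partitions, ["C1", "C2", "C3", "C4"]))
# {'C1': ['P1'], 'C2': ['P2'], 'C3': ['P3'], 'C4': []}
```

The last call shows why a fourth consumer in the same group brings no benefit with 3 partitions: it gets an empty assignment until another member leaves.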