Messages from multiple partitions are not interleaved in the consumer - apache-kafka

I am trying to run the simple example shown in https://projectreactor.io/docs/kafka/release/reference/#_sample_consumer. I see the output described in the link, but I am not sure whether it is the expected output. Specifically, the link says:
The 20 messages published by the Producer sample should appear on the
console. As shown in the output above, messages are consumed in order
for each partition, but messages from different partitions may be
interleaved.
The output in the link is what I seem to be getting too. However, everything in partition 1 is consumed first, followed by everything in partition 0. What I actually expected was one message from partition 0, a couple from partition 1, then a couple or so from partition 0, and so on (within each partition the messages are ordered, as expected).
When I run locally I get the same output too. Am I missing something?

What you're seeing is expected behavior for a very small number of messages. The consumer will interleave when consuming from multiple partitions, but only with a large quantity of messages.
What happens is that Kafka consumers work in "batches". They poll every so often, and if the 10 messages or so in one partition are small enough to fit in one poll request or "batch", then the consumer will simply consume them all at the same time, before even getting to the next partition. That's why you're not seeing this interleaving effect with 20 messages.
If you retry your test with 20K messages, you should see the interleaving behavior much more clearly.
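To make the batching effect concrete, here is a minimal sketch in plain Python. It is a simplified model, not the real Kafka client: the function names and the `records_per_fetch` parameter are made up for illustration, and the model assumes the consumer visits partitions in turn and takes at most one fetch-sized chunk from each before moving on.

```python
from itertools import groupby


def consume_order(partition_sizes, records_per_fetch):
    """Return the partition id of each consumed record, in consumption order.

    Simplified model (an assumption, not the real client): the consumer visits
    partitions round-robin and takes up to `records_per_fetch` records from
    each partition before moving on to the next one.
    """
    remaining = dict(partition_sizes)
    order = []
    while any(remaining.values()):
        for partition, left in remaining.items():
            take = min(left, records_per_fetch)
            order.extend([partition] * take)
            remaining[partition] = left - take
    return order


def partition_switches(order):
    """Count how often consecutive records come from different partitions."""
    runs = [partition for partition, _ in groupby(order)]
    return len(runs) - 1


# 20 messages in total: each partition fits into a single chunk.
small = consume_order({1: 10, 0: 10}, records_per_fetch=500)
# 20K messages in total: each partition needs many chunks.
large = consume_order({1: 10_000, 0: 10_000}, records_per_fetch=500)

print(partition_switches(small))  # 1
print(partition_switches(large))  # 39
```

In this model the small run switches partitions exactly once (all of partition 1, then all of partition 0), while the large run alternates between partitions in chunks.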

+1 to @mjuarez's answer. I just wanted to add that you may also be able to reproduce interleaved messages if you reduce max.poll.records for your consumer to 1 (the default is 500), thus forcing it to process one message at a time.
From Kafka Reference:
NAME: max.poll.records
DESCRIPTION: The maximum number of records returned in a single call to poll().
TYPE: int
DEFAULT: 500
VALID VALUES: [1,...]
IMPORTANCE: medium

Related

How does one Kafka consumer read from more than one partition?

I would like to know how one consumer consumes from more than one partition, specifically, in what order are messages read from the different partitions?
I had a peek at the source code (Consumer, Fetcher) but I can't really follow all of it.
This is what I thought would happen:
Partitions are read sequentially. That is: all the messages in one partition will be read before continuing to the next. If we reach max.poll.records without consuming the whole partition, the next fetch will continue reading the current partition until it is exhausted, before going on to the next.
I tried setting max.poll.records to a relatively low number and seeing what happens.
If I send messages to a topic and then start a consumer, all the messages are read from one partition before continuing to the next, even if the number of messages in that partition is higher than max.poll.records.
Then I tried to see if I could "lock" the consumer in one partition, by sending messages to that partition continuously (using JMeter). But I couldn't do it: messages from other partitions were also being read.
The consumer is polling for messages from its assigned partitions in a greedy round-robin way.
For example, suppose max.poll.records is set to 100 and two partitions, A and B, are assigned. The consumer will try to poll 100 records from A. If partition A has fewer than 100 messages available, it will poll whatever is needed from partition B to make up the 100.
Although this is not ideal, this way no partition will be starved.
This also explains why ordering is not guaranteed between partitions.
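The A/B example above can be sketched as a small function. This is only an illustration of the arithmetic in plain Python; `poll_once` and its arguments are hypothetical names, not part of any Kafka client.

```python
def poll_once(available, max_poll_records):
    """Fill one poll from the partitions in order, as in the A/B example.

    `available` maps partition name -> number of records ready to be read.
    Returns partition name -> number of records taken in this poll.
    """
    taken = {}
    budget = max_poll_records
    for partition, count in available.items():
        if budget == 0:
            break
        take = min(count, budget)
        if take:
            taken[partition] = take
            budget -= take
    return taken


# A has fewer than 100 records, so the poll is topped up from B.
print(poll_once({"A": 60, "B": 80}, max_poll_records=100))   # {'A': 60, 'B': 40}
# A alone can fill the poll, so B is not touched this time.
print(poll_once({"A": 250, "B": 80}, max_poll_records=100))  # {'A': 100}
```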
I have read the KIP mentioned in the answer to the question linked in the comments and I think I finally understood how the consumer works.
There are two main configuration options that affect how data is consumed:
max.partition.fetch.bytes: the maximum amount of data that the server will return for a given partition
max.poll.records: the maximum amount of records that are returned each time the consumer polls
The process of fetching from each partition is greedy and proceeds in a round-robin way. Greedy means that as many records as possible will be retrieved from each partition; if all records in a partition occupy less than max.partition.fetch.bytes, then all of them will be fetched; otherwise, only max.partition.fetch.bytes will be fetched.
Now, not all the fetched records will be returned in a poll call. Only max.poll.records will be returned.
The remaining records will be retained for the next call to poll.
Moreover, if the number of retained records is less than max.poll.records, the poll method will start a new round of fetching (pre-fetching) before returning. This means that, usually, the consumer is processing records while new records are being fetched.
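Here is a toy model of the fetch/poll split described above, written in plain Python. It is a sketch under assumptions, not the client's actual implementation: `records_per_fetch` stands in for max.partition.fetch.bytes (counted in records rather than bytes), fetching is synchronous instead of running in the background, and the class and method names are invented for this example.

```python
from collections import deque


class BufferedConsumer:
    """Toy model of fetch-then-poll buffering (not the real Kafka client)."""

    def __init__(self, partitions, records_per_fetch, max_poll_records):
        # name -> records still waiting on the broker side
        self.partitions = {name: deque(recs) for name, recs in partitions.items()}
        self.records_per_fetch = records_per_fetch  # stand-in for max.partition.fetch.bytes
        self.max_poll_records = max_poll_records
        self.buffer = deque()  # fetched records not yet returned by poll()
        self.fetch_rounds = 0

    def _fetch_round(self):
        """Greedily fetch up to records_per_fetch records from every partition."""
        self.fetch_rounds += 1
        for name, log in self.partitions.items():
            for _ in range(min(self.records_per_fetch, len(log))):
                self.buffer.append((name, log.popleft()))

    def poll(self):
        """Return at most max_poll_records records; retain the rest."""
        if not self.buffer:
            self._fetch_round()
        count = min(self.max_poll_records, len(self.buffer))
        batch = [self.buffer.popleft() for _ in range(count)]
        # Fewer retained records than one poll needs: fetch ahead before returning.
        if len(self.buffer) < self.max_poll_records:
            self._fetch_round()
        return batch


consumer = BufferedConsumer(
    {"A": range(5), "B": range(3)}, records_per_fetch=4, max_poll_records=3
)
first, second, third = consumer.poll(), consumer.poll(), consumer.poll()
print(first)   # [('A', 0), ('A', 1), ('A', 2)]
print(second)  # [('A', 3), ('B', 0), ('B', 1)]
print(third)   # [('B', 2), ('A', 4)]
```

The second poll returns records that were retained from the first fetch, and the last record of A only shows up after the pre-fetch triggered at the end of the second poll.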
If some partitions receive considerably more messages than others, this could lead to the less active partitions not being processed for long periods of time.
The only downside to this approach is that it could lead to some partitions going unconsumed for an extended amount of time when there is a large imbalance between the partition's respective message rates. For example, suppose that a consumer with max messages set to 1 fetches data from partitions A and B. If the returned fetch includes 1000 records from A and no records from B, the consumer will have to process all 1000 available records from A before fetching on partition B again.
In order to prevent this, we could reduce max.partition.fetch.bytes.
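A rough back-of-the-envelope sketch of why a smaller max.partition.fetch.bytes helps. It assumes uniform record sizes and ignores batch and header overhead; the 64 KiB value and the 1 KiB record size are hypothetical numbers chosen for illustration.

```python
def max_records_per_partition_fetch(max_partition_fetch_bytes, record_size_bytes):
    """Upper bound on the records one fetch can return for a single partition.

    Assumes every record has the same size and ignores batch/header overhead.
    """
    return max_partition_fetch_bytes // record_size_bytes


default_limit = 1024 * 1024  # 1 MiB, the default for max.partition.fetch.bytes
reduced_limit = 64 * 1024    # hypothetical lower setting
record_size = 1024           # assumed 1 KiB per record

# Worst case: this many records from a busy partition are buffered (and must
# be processed) before a quiet partition gets fetched again.
print(max_records_per_partition_fetch(default_limit, record_size))  # 1024
print(max_records_per_partition_fetch(reduced_limit, record_size))  # 64
```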

How to evenly distribute messages over partitions in Kafka?

Setting the stage..
Here's a diagram to help explain my problem better:
Now, keep in mind the following points:
I have a producer sending messages to 8 partitions of My topic.
On the other side, I have 8 consumers, one for each partition.
The legacy system has limited resources, and can process at most 8 simultaneous requests.
To make sure I don't overwhelm the legacy system, a consumer will only send one request at a time. Any new message will wait for the current message to finish processing.
Explaining the problem..
Since messages are blocked until the previous message is processed, I want to minimize the time a message will wait before it's processed. To do that I need messages to be distributed equally over the partitions. A message must not be consumed by a busy consumer when another is free.
For example, if 8 messages are produced simultaneously, each message should be sent to one partition. Therefore, each message will be consumed by one consumer, ensuring the messages are processed concurrently without any lag.
What I tried so far
Since the partitions are assigned correctly to the consumers, I had to assume the producer wasn't evenly delivering messages to the partitions. Which turned out to be the case. Here's what I tried so far to resolve the issue...
Using null keys
The most intuitive solution was to produce records without keys, which should basically make the DefaultPartitioner behave like the RoundRobinPartitioner. Unfortunately, this solution did not work.
Using null keys and batch.size=0
Since using null keys didn't work, it seemed likely that messages were being sent in batches, breaking the even distribution. Setting the batch size to 0 should have caused the producer to send messages one by one. That didn't work either.
Using RoundRobinPartitioner
This one was weird. The RoundRobinPartitioner distributed messages evenly, but it only used 4 out of the 8 partitions.
Using RoundRobinPartitioner and batch.size=0
This made no difference.
Finally, my question:
I need the producer to send messages in Round Robin fashion one by one without batching. How can I do that?
TL;DR
I need the producer to send messages in Round Robin fashion without batching. How can I do that?
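For reference, the distribution being asked for looks like the sketch below. It is plain Python and only models the target behaviour; it says nothing about how a particular producer, partitioner, or batching setting actually behaves. `round_robin_assign` is a hypothetical helper. In the Java client a ProducerRecord can carry an explicit partition number, so an index computed this way could in principle be supplied there, but whether that fits this setup is an assumption.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(messages, num_partitions):
    """Pair each message with a partition, cycling 0..num_partitions-1."""
    partitions = cycle(range(num_partitions))
    return [(next(partitions), message) for message in messages]


# 8 messages produced at once over 8 partitions: one message per partition.
assigned = round_robin_assign([f"msg-{i}" for i in range(8)], num_partitions=8)
per_partition = Counter(partition for partition, _ in assigned)
print(sorted(per_partition.items()))
```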

Kafka fetch max bytes doesn't work as expected

I have a topic holding 1 GB of messages. A Kafka consumer starts consuming these messages. What can I do to prevent the consumer from consuming all messages at once? I tried setting fetch.max.bytes on the broker to 30 MB to allow only 30 MB of messages in each poll. The broker doesn't seem to honor that and tries to give all messages to the consumer at once, causing a consumer out-of-memory error. How can I resolve this issue?
Kafka configurations can be quite overwhelming. Typically in Kafka, multiple configurations can work together to achieve a result. This brings flexibility, but flexibility comes with a price.
From the documentation of fetch.max.bytes:
Records are fetched in batches by the consumer, and if the first record batch in the first non-empty partition of the fetch is larger than this value, the record batch will still be returned to ensure that the consumer can make progress.
On the consumer side alone, there are more configurations to consider for bounding the consumer memory usage, including:
max.poll.records: limits the number of records retrieved in a single call to poll. Default is 500.
max.partition.fetch.bytes: limits the number of bytes fetched per partition. This should not be a problem as the default is 1MB.
As per the information in KIP-81, the memory usage in practice should be something like min(num brokers * max.fetch.bytes, max.partition.fetch.bytes * num_partitions).
Also, in the same KIP:
The consumer (Fetcher) delays decompression until the records are returned to the user, but because of max.poll.records, it may end up holding onto the decompressed data from a single partition for a few iterations.
I'd suggest you also tune these parameters; hopefully this will get you to the desired state.
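As a worked example of the KIP-81 estimate quoted above, here is a small Python sketch. It reads the KIP's max.fetch.bytes as the consumer's fetch.max.bytes setting, which is an assumption on my part, and the broker and partition counts are hypothetical; only the 30 MB value comes from the question.

```python
MIB = 1024 * 1024


def estimated_fetch_memory(num_brokers, fetch_max_bytes,
                           max_partition_fetch_bytes, num_partitions):
    """Rough upper estimate of fetched-data memory, following the formula above."""
    return min(num_brokers * fetch_max_bytes,
               max_partition_fetch_bytes * num_partitions)


# Hypothetical cluster: 3 brokers, 30 MiB fetch limit, 1 MiB per-partition limit.
many_partitions = estimated_fetch_memory(3, 30 * MIB, 1 * MIB, num_partitions=200)
few_partitions = estimated_fetch_memory(3, 30 * MIB, 1 * MIB, num_partitions=12)

print(many_partitions // MIB)  # 90  (bounded by brokers * fetch limit)
print(few_partitions // MIB)   # 12  (bounded by partitions * per-partition limit)
```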

Kafka - timestamp order

Assume I'm using log.message.timestamp.type=LogAppendTime.
Also assume number of messages per topic/partition during first read:
topic0:partition0: 5
topic0:partition1: 0
topic0:partition2: 3
topic1:partition0: 2
topic1:partition1: 0
topic1:partition2: 4
and during second read:
topic0:partition0: 5
topic0:partition1: 2
topic0:partition2: 3
topic1:partition0: 2
topic1:partition1: 4
topic1:partition2: 4
If I read first message from each partition, does Kafka guarantee that reading again from each partition won't return a message that's older than those I read during first read?
Focus on topic0:partition1 and topic1:partition1 which didn't have any messages during first read, but have during second read.
Kafka guarantees message ordering at the partition level, so your use case fits Kafka's architecture perfectly.
There are a few concepts to explain here. First of all, there is the starting consumer position (when you first launch a new consumer group), defined by the auto.offset.reset parameter.
This kicks in only if there is no saved offset for that group, or if a saved offset is no longer valid (e.g. if it was already deleted by retention policies). You normally only need to worry about this when you launch a new consumer group (and want to decide whether it starts from the oldest messages or from the newest one).
Regarding your example, under normal conditions (no consumer shutdowns, etc.) you have nothing to worry about. Consumers within the same consumer group will read each message only once, regardless of the number of partitions or consumers. These consumers remember their last read offset and periodically save it in the __consumer_offsets topic.
There are 2 properties that define this periodical recording:
enable.auto.commit
Setting it to true (which is the default value) enables automatic commits to the __consumer_offsets topic.
auto.commit.interval.ms
Defines how often the offsets are committed. For example, with a value of 10000, your consumer offsets will be stored every 10 seconds.
You can also set enable.auto.commit to false and store your offsets in your own way (e.g. in a database), but this is a more specialized use case.
Automatic offset committing lets you stop your consumers and start them again later without losing messages or reprocessing already processed ones (it's like a bookmark in a book). If you don't stop your consumers (and there are no errors from the broker, ZooKeeper, or the consumers), you have even less to worry about.
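The bookmark idea can be sketched in a few lines of Python. This is a toy model with invented names, not the consumer API: it commits after every read, keeps offsets in a plain dict, and ignores the case where a saved offset has become invalid.

```python
def start_position(committed, partition, log_length, auto_offset_reset="latest"):
    """Offset at which a consumer starts reading a partition."""
    if partition in committed:
        return committed[partition]  # resume from the saved bookmark
    # No saved offset: auto.offset.reset decides where to begin.
    return 0 if auto_offset_reset == "earliest" else log_length


def consume(committed, partition, log, auto_offset_reset="latest"):
    """Read everything from the start position, then commit the new position."""
    start = start_position(committed, partition, len(log), auto_offset_reset)
    records = log[start:]
    committed[partition] = len(log)
    return records


committed = {}             # stands in for the group's saved offsets
log = ["m0", "m1", "m2"]   # one partition's messages

first_read = consume(committed, "topic0-1", log, auto_offset_reset="earliest")
log.extend(["m3", "m4"])   # new messages arrive between reads
second_read = consume(committed, "topic0-1", log)

print(first_read)   # ['m0', 'm1', 'm2']
print(second_read)  # ['m3', 'm4']
```

The second read starts at the committed offset, so it only returns messages appended after the first read.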
For more info, you can take a look here: https://docs.confluent.io/current/clients/consumer.html#concepts
Hope it helps!

How can I scale Kafka consumers?

I'm reading the Kafka documentation and noticed the following line:
Note however that there cannot be more consumer instances in a consumer group than partitions.
Hmm. How can I auto-scale this?
For example let's say I have a messaging system with hi/lo priorities, so I create a topic for messages and partitions for hi and lo priority messages.
If this was RabbitMQ, I'd have an auto-scalable group of consumers assigned to each partition, like this:
If I understand the Kafka model I can't have >1 consumer per partition in a consumer group, so that picture doesn't work for Kafka, right?
Ok, so what about >1 consumer groups like this:
That gets around Kafka's limitation but... If I understand how this works, both consumer groups would be pulling from a partition, for example msg.hi, with their own offsets, so neither would know about the other, meaning messages would likely be delivered twice!
How can I achieve the capability I had in the Rabbit design w/Kafka and still maintain the "queue-ness" of the behavior (I don't want to send a message twice)? What am I missing?
TL;DR
A topic is made up of partitions. The number of partitions determines the maximum number of active consumers you can have in a group.
Scenario 1:
When we have only one consumer, it can read all the messages from all the partitions.
Scenario 2:
In the above set up, when you increase the number of consumers in the group, partition reassignment happens and instead of consumer 1 reading all the messages from all the partitions, consumer 2 could share some of the load with consumer 1 as shown below.
Scenario 3:
What happens if I have more consumers than partitions? Each consumer would be assigned 1 partition. Any additional consumers in the group will sit idle unless you increase the number of partitions for the topic.
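The three scenarios can be sketched with a simple round-robin assignment in Python. Real Kafka assignors (range, round-robin, sticky, and so on) differ in the details, so treat `assign_partitions` as a hypothetical illustration of the general rule rather than the actual algorithm.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; extra consumers get none."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        assignment[consumers[index % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2]

# Scenario 1: a single consumer reads every partition.
print(assign_partitions(partitions, ["c1"]))
# {'c1': [0, 1, 2]}

# Scenario 2: a second consumer takes over part of the load.
print(assign_partitions(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1]}

# Scenario 3: more consumers than partitions, so c4 stays idle.
print(assign_partitions(partitions, ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```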
Summary:
We need to choose the number of partitions accordingly, since it determines the maximum number of consumers in the group. Changing the number of partitions for an existing topic is really NOT recommended, as it could cause issues.
For example, let's assume a producer writes names into a topic with 3 partitions. All the names starting with A-I go to partition 1, J-R to partition 2, and S-Z to partition 3. Let's also assume that we have already produced 1 million messages. Now, if you suddenly increase the number of partitions from 3 to 5, the A-Z ranges change: A-F go to partition 1, G-K to partition 2, L-Q to partition 3, R-U to partition 4, and V-Z to partition 5. Names that used to land in one partition now land in another, which affects the ordering of the messages we had before. So you need to be aware of this. If this could be a problem, choose the number of partitions up front.
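The same effect shows up with hash-based partitioning, where a key maps to hash(key) modulo the number of partitions. The Python sketch below uses zlib.crc32 as a stand-in hash (the Java client's default partitioner uses murmur2, so actual partition numbers would differ), and the key names are made up.

```python
import zlib


def partition_for(key, num_partitions):
    """Map a key to a partition: hash(key) modulo the partition count.

    crc32 is only a stand-in here; the Java client's default partitioner
    hashes the key bytes with murmur2.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"name-{i}" for i in range(1000)]

# Count keys whose partition changes when the topic grows from 3 to 5 partitions.
moved = sum(partition_for(key, 3) != partition_for(key, 5) for key in keys)
print(f"{moved} of {len(keys)} keys map to a different partition")
```

New messages for a moved key go to a different partition than its older messages, so per-key ordering across the change is no longer guaranteed.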
More info is here - http://www.vinsguru.com/kafka-scaling-consumers-out-for-a-consumer-group/
Your assumption about messages being consumed twice is correct (since each group consumes 100% of messages from a topic).
I agree with David. Moreover, I suggest that you create more partitions than you really need, which would leave you some headroom to increase the number of threads in the group when such a need arises.
You can always increase the number of partitions later (and/or add additional brokers), but it's nice to have that already done, so that you only need to increase the number of threads and be done with it (those situations usually require a quick response, so you should do all the prep that you can in advance).
Just create a bunch of partitions for hi and lo. 12 is a good number. So is 60. Just pick a number of partitions that matches how much maximum parallelization you want.
Honestly, although I personally would make msg.hi and msg.lo different topics entirely, that's not a requirement -- you can do custom partitioning to divide messages between partitions.
You can also use an AI-based auto scaler like this: https://www.confluent.io/events/kafka-summit-americas-2021/intelligent-auto-scaling-of-kafka-consumers-with-workload-prediction/
This scaler calculates the right number of consumer pods based on workload prediction and target KPI metrics.