Consume only a specific number of messages from the queue - apache-kafka

Is it possible to consume only a specific number of messages like 10 or 50 or 100 messages from the 1000 that are in the queue? I was looking at 'fetch.max.bytes' config, but it seems like it is for a message size rather than number of messages.
I don't know how to set "max.partition.fetch.bytes" as my byte size is not the same in every message.
Is there a way to dynamically set this to read 10 or 50 or 100 messages per minute?
Or is there any way I can do this?
note : please "max.poll.records" note that I cannot use the method

Per minute? No, not really, because you have little control as a consumer client over producer speeds or even network speeds.
If you just want a static number, seek the consumer to a specific partition offset and simply count the number of records consumed until you're satisfied with the number, then commit the offsets back (or don't).

Related

How does one Kafka consumer read from more than one partition?

I would like to know how one consumer consumes from more than one partition, specifically, in what order are messages read from the different partitions?
I had a peek at the source code (Consumer, Fetcher) but I can't really follow all of it.
This is what I thought would happen:
Partitions are read sequentially. That is: all the messages in one partition will be read before continuing to the next. If we reach max.poll.records without consuming the whole partition, the next fetch will continue reading the current partition until it is exhausted, before going on to the next.
I tried setting max.poll.records to a relatively low number and seeing what happens.
If I send messages to a topic and then start a consumer, all the messages are read from one partition before continuing to the next, even if the number of messages in that partition is higher than max.poll.records.
Then I tried to see if I could "lock" the consumer in one partition, by sending messages to that partition continuously (using JMeter). But I couldn't do it: messages from other partitions were also being read.
The consumer is polling for messages from its assigned partitions in a greedy round-robin way.
e.g. if max.poll.records is set to 100, and there are 2 partitions assigned A,B. The consumer will try to poll 100 from A. If partition A hasn't had 100 available messages, it will poll whats left to complete to 100 messages from partition B.
Although this is not ideal, this way no partition will be starved.
This is also explain why ordering is not guaranteed between partitions.
I have read the KIP mentioned in the answer to the question linked in the comments and I think I finally understood how the consumer works.
There are two main configuration options that affect how data is consumed:
max.partition.fetch.bytes: the maximum amount of data that the server will return for a given partition
max.poll.records: the maximum amount of records that are returned each time the consumer polls
The process of fetching from each partition is greedy and proceeds in a round-robin way. Greedy means that as many records as possible will be retrieved from each partition; if all records in a partition occupy less than max.partition.fetch.bytes, then all of them will be fetched; otherwise, only max.partition.fetch.bytes will be fetched.
Now, not all the fetched records will be returned in a poll call. Only max.poll.records will be returned.
The remaining records will be retained for the next call to poll.
Moreover, if the number of retained records is less than max.poll.records, the poll method will start a new round of fetching (pre-fetching) before returning. This means that, usually, the consumer is processing records while new records are being fetched.
If some partitions receive considerably more messages than others, this could lead to the less active partitions not being processed for long periods of time.
The only downside to this approach is that it could lead to some partitions going unconsumed for an extended amount of time when there is a large imbalance between the partition's respective message rates. For example, suppose that a consumer with max messages set to 1 fetches data from partitions A and B. If the returned fetch includes 1000 records from A and no records from B, the consumer will have to process all 1000 available records from A before fetching on partition B again.
In order to prevent this, we could reduce max.partition.fetch.bytes.

How to read specific number of messages per minute from apache kafka message queue

How to read specific number of messages per minute from apache kafka message queue? For example, imagine that there are 100 messages in the queue, how can I get 5 messages to be read per minute. I don't know how to set "max.partition.fetch.bytes" as my byte size is not the same in every message.
Is there a way to dynamically set this to read 5 messages per minute?

Delaying all messages by 30 minutes in Kafka Streams

We have a use case where we need to write all messages of topic a into topic b, but with a delay of 30 minutes for each message. Why, you ask? Because time is of critical importance for this stream of data, so paying customers get the real-time feed, for freeloaders, we offer the delayed stream.
I guess it would be relatively easy to do in a KafkaConsumer poll() loop, by comparing system time and message time (using an ordered message time like producer time or ingestion time) and then pause()ing the partitions in question and resume()ing them after the appropriate time interval of up to 30 minutes(, all the while continuing to poll() to avoid getting failed over).
As the data, though delayed, still needs to be delivered in a streaming fashion, the delay of the ingestion times of all messages in topic a and b should be as close to 30 minutes as possible.
But is this also easily possible in Kafka Streams, so that we can use its built-in exactly-once guarantees? I wonder if "it's ok to call Thread.sleep() in Kafka Streams also applies to longer sleeps of up to 30 minutes? (Of course we don't want a partition rebalance to occur because Kafka thinks something's wrong with our process)
Assuming we get this to work, is there a way to get proper lag monitoring for this? If we just delay messages, I would think the consumer group lag would always amount to at least 30 minutes worth of messages. So is it possible to have the lag monitor count only unprocessed messages older than 30 minutes?
(2. is of less importance for us than getting 1. to work)
Edit: https://stackoverflow.com/a/59261274/709537 proposes a solution to a somewhat related problem, but that involves state stores and thus looks more complicated than would seem necessary for our simple (?) "delay all messages by x minutes" task.
Regarding 2., I assume we will have to roll our own lag monitoring for this.
A simple way to do something related - measuring latency instead of the number of lagging messages - would be to periodically and for every partition
get the first unread input message
currentLatency = max(0, ingestionTime(firstUnreadMessage) - 30min)
If we wanted to monitor the number of lagging messages, something a little more involved would need to be done:
read input messages backwards, until there is one with ingestionTime + 30min <= systemTime
the number of those messages would be the lag
However reading messages backwards is not exactly one of Kafka's core competencies... A clever binary search style could be devised to get the exact value. However, no-one really wants to know whether the message lag is 43123 or 40513, what they want to know is the order of magnitude. That will keep the number of seeks down to a handful (per partition), and no binary search style back and forth would be necessary. The output could e.g. be
lag < 10
lag < 100
lag < 1000
lag < 10000
...

Kafka consumer, which takes effect, max.poll.records or max.partition.fetch.bytes?

I have trouble understand these two consumer settings, let's say I have a topic with 20 partitions, and all messages are 1KB size, and, to simplify this discussion, also suppose that I have only one consumer.
If I set max.partition.fetch.bytes=1024, then, since each partition will give me one message, I will get 20 messages in one go if I understanding it correctly.
But what if I set max.poll.records=10? Could anyone help to provide some explanation? Many thanks.
There is no direct relation between two configs. max.partition.fetch.bytes specifies max amount data per partition the server will return. On the other hand max.poll.records is specifies maximum number of records returned in a single poll().
So in your case if you have a topic with 20 partitions that each of them has one record with 1KB size and there is only one consumer which subscribe this topic,
then you can get each of the messages beacuse message size doesn't exceed max.partition.fetch.bytes but you can also get at most 10 messages in one poll.
As a result;
Your consumer will get 10 messages in the first call to poll().
In your second poll() it will get other 10 records.

How does kafka compression relate to configurations that refer to bytes?

It's unclear to me (and I haven't managed to find any documentation that makes it perfectly clear) how compression affects kafka configurations that deal with bytes.
Take a hypothetical message that is exactly 100 bytes, a producer with a batch size of 1000 bytes, and a consumer with a fetch size of 1000 bytes.
With no compression it seems pretty clear that my producer would batch 10 messages at a time and my consumer would poll 10 messages at a time.
Now assume a compression (specified at the producer -- not on the broker) that (for simplicity) compresses to exactly 10% of the uncompressed size.
With that same config, would my producer still batch 10 messages at a time, or would it start batching 100 messages at a time? I.e. is the batch size pre- or post-compression? The docs do say this:
Compression is of full batches of data
...which I take to mean that it would compress 1000 bytes (the batch size) down to 100 bytes and send that. Is that correct?
Same question for the consumer fetch. Given a 1K fetch size, would it poll just 10 messages at a time (because the uncompressed size is 1K) or would it poll 100 messages (because the compressed size is 1K)? I believe that the fetch size will cover the compressed batch, in which case the consumer would be fetching 10 batches as-produced-by-the-producer at a time. Is this correct?
It seems confusing to me that, if I understand correctly, the producer is dealing with pre-compression sizes and the consumer is dealing with post-compression sizes.
It's both simpler and more complicated ;-)
It's simpler in that both the producer and the consumer compresses and uncompresses the same Kafka Protocol Produce Requests and Fetch Requests and the broker just stores them with zero copy in their native wire format. Kafka does not compress individual messages before they are sent. It waits until a batch of messages (all going to the same partition) are ready for send and then compresses the entire batch and sends it as one Produce Request.
It's more complicated because you also have to factor in the linger time which will trigger a send of a batch of messages earlier than when the producer buffer size is full. You also have to consider that messages may have different keys, or for other reasons be going to different topic partitions on different brokers so it's not true to say that qty(10) records compressed to 100 bytes each go all as one batch to one broker as a single produce request of 1000 bytes (unless all the messages are being sent to a topic with a single partition).
From https://kafka.apache.org/0110/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html
The producer maintains buffers of unsent records for each partition.
These buffers are of a size specified by the batch.size config. Making
this larger can result in more batching, but requires more memory
(since we will generally have one of these buffers for each active
partition).
By default a buffer is available to send immediately even if there is
additional unused space in the buffer. However if you want to reduce
the number of requests you can set linger.ms to something greater than
0. This will instruct the producer to wait up to that number of milliseconds before sending a request in hope that more records will
arrive to fill up the same batch. This is analogous to Nagle's
algorithm in TCP. For example, in the code snippet above, likely all
100 records would be sent in a single request since we set our linger
time to 1 millisecond. However this setting would add 1 millisecond of
latency to our request waiting for more records to arrive if we didn't
fill up the buffer. Note that records that arrive close together in
time will generally batch together even with linger.ms=0 so under
heavy load batching will occur regardless of the linger configuration;
however setting this to something larger than 0 can lead to fewer,
more efficient requests when not under maximal load at the cost of a
small amount of latency.