Pub/Sub Lite Delayed Consumer - apache-kafka

I am implementing Kafka delayed topic consumption with consumer.pause(<partitions>).
Pub/Sub Kafka shim turns pause into a NoOp:
https://github.com/googleapis/java-pubsublite-kafka/blob/v0.6.7/src/main/java/com/google/cloud/pubsublite/kafka/PubsubLiteConsumer.java#L590-L600
Is there any documentation on how to delay consumption of a pub sub lite topic by a set duration?
i.e. I want to consume all messages from a Pub/Sub Lite topic but with a synthetic 4 minute lag.
Here is my algorithm with Kafka native:
call consumer.poll()
resume all assigned partitions consumer.resume(consumer.assignment())
combine previously delayed records with recently polled records
separate records into
records that are old enough to process
records still too young to process
pause partitions for any records that are too young consumer.pause(<partitions of too young>)
keep a buffer of too young records to reconsider on the next pass, called delayed
processes records that are old enough
rinse, repeat
We only commit offsets of records that are old enough, if the process dies any records in the “too young” buffer will remain uncommitted and they will be revisited by whichever consumer receives the partition in the ensuing rebalance.
Is there a more generalized form of this algorithm that will work with native Kafka and Pub/Sub Lite?
Edit: CloudTasks is a bad idea here as it disconnects the offset commit chain. I need to ensure I only commit offsets for records that have gotten an ack from the downstream system.

Something similar to the above would likely work fine if you remove the pause and resume stages. I'd note that with both systems, you are not guaranteed to receive all messages that exist on the server until now in any given poll() call, so you may add extra delay if you are not given any records for a given partition in a poll call.
If you do the following with autocommit enabled, you should effectively delay processing by strictly more than 4 minutes.
call consumer.poll()
sleep until every record 4 minutes old
process records
go to 1.
If you use manual commits, you can make the sleeps closer to 4 minutes on a per-message basis, but with the downside of needing to manage offsets manually:
call consumer.poll()
put records into ordered per-partition buffers
sleep until the oldest record for any partition is 4 minutes in the past
process records which are more than 4 minutes in the past
commit offsets for processed records
go to 1

Related

How does Kafka provides next batch of records to poll when commitAsync gets failed in committing offset

I have a use-case regarding consuming records by Kafka consumer.
For instance,
I have 1 topic which has 1 partition. Currently, it has 10 records and while consuming the first 10 records, another 10 records are written to the partition.
myConsumer polls the first time and returns the first 10 records say 0 - 9 records.
It processed all the records successfully.
It invoked commitAsync() to Kafka to commit the last offset.
Commit response is in processing. It can be a success or a failure.
But, since it is an asynchronous mode, it continues to poll for the next batch.
Now, how does either Kafka or consumer poll know that it has to read from the 10th position? Because the commitAsync request has not yet completed.
Please help me in understanding this concept.
Commit Offset tells the broker that the consumer has processed the corresponding message successfully. The consumer itself would be aware of its progress (except for start of consumer where it gets its last committed offset from broker).
At step-5 in your description, the commit offset is in progress. So:
Broker does not know that 0-9 records have been processed
Consumer itself has the read the messages and so it knows that is has read 0-9 messages. So it will know to read 10th onwards next.
Possible Scenarios
Lets say the commit fails for (0-9). Your next batch, say (10-15) is processed and committed succesfully then there is no harm done. Since we mark to the broker that processing till 15 is complete.
Lets say the commit fails for (0-9). Your next batch, (10-15) is processed and before committing, the consumer goes down. When your consumer is brought back up, it takes its state from broker (which does not have commit for either of the batch). So it will start reading from 0th message.
You can come up with several other scenarios as well. I guess the bottom line is, the importance of commit will come into picture when your consumer is restarted for whatever reason and it has get its last processed offset from kafka broker.

Does kafka partition assignment happen across processes?

I have a topic with 20 partitions and 3 processes with consumers(with the same group_id) consuming messages from the topic.
But I am seeing a discrepancy where unless one of the process commits , the other consumer(in a different process) is not reading any message.
The consumers in other process do cconsume messages when I set auto-commit to true. (which is why I suspect the consumers are being assigned to the first partition in each process)
Can someone please help me out with this issue? And also how to consume messages parallely across processes ?
If it is of any use , I am doing this on a pod(kubernetes) , where the 3 processes are 3 different mules.
Commit shouldn't make any difference because the committed offset is only used when there is a change in group membership. With three processes there would be some rebalancing while they start up but then when all 3 are running they will each have a fair share of the partitions.
Each time they poll, they keep track in memory of which offset they have consumed on each partition and each poll causes them to fetch from that point on. Whether they commit or not doesn't affect that behaviour.
Autocommit also makes little difference - it just means a commit is done synchronously during a subsequent poll rather than your application code doing it. The only real reason to manually commit is if you spawn other threads to process messages and so need to avoid committing messages that have not actually been processed - doing this is generally not advisable - better to add consumers to increase throughput rather than trying to share out processing within a consumer.
One possible explanation is just infrequent polling. You mention that other consumers are picking up partitions, and committing affects behaviour so I think it is safe to say that rebalances must be happening. Rebalances are caused by either a change in partitions at the broker (presumably not the case) or a change in group membership caused by either heartbeat thread dying (a pod being stopped) or a consumer failing to poll for a long time (default 5 minutes, set by max.poll.interval.ms)
After a rebalance, each partition is assigned to a consumer, and if a previous consumer has ever committed an offset for that partition, then the new one will poll from that offset. If not then the new one will poll from either the start of the partition or the high watermark - set by auto.offset.reset - default is latest (high watermark)
So, if you have a consumer, it polls but doesn't commit, and doesn't poll again for 5 minutes then a rebalance happens, a new consumer picks up the partition, starts from the end (so skipping any messages up to that point). Its first poll will return nothing as it is starting from the end. If it doesn't poll for 5 minutes another rebalance happens and the sequence repeats.
That could be the cause - there should be more information about what is going on in your logs - Kafka consumer code puts in plenty of helpful INFO level logging about rebalances.

Processing kafka messages taking long time

I have a Python process (or rather, set of processes running in parallel within a consumer group) that processes data according to inputs coming in as Kafka messages from certain topic. Usually each message is processed quickly, but sometimes, depending on the content of the message, it may take a long time (several minutes). In this case, Kafka broker disconnects the client from the group and initiates the rebalance. I could set session_timeout_ms to a really large value but it would be like 10 minutes of more, which means if a client dies, the cluster would not be properly rebalanced for 10 minutes. This seems to be a bad idea. Also, most messages (about 98% of them) are fast, so paying such penalty for just 1-2% of messages seems wasteful. OTOH, large messages are frequent enough to cause a lot of rebalances and cost a lot of performance (since while the group is rebalancing, nothing is getting done, and then the "dead" client re-joins again and causes another rebalance).
So, I wonder, are there any other ways for handling messages that take a long time to process? Is there any way to initiate heartbeats manually to tell the broker "it's ok, I am alive, I'm just working on the message"? I thought the Python client (I use kafka-python 1.4.7) was supposed to do that for me but it doesn't seem to happen. Also, the API doesn't seem to even have separate "heartbeat" function at all. And as I understand, calling poll() would actually get me the next messages - while I am not even done with the current one, and would also mess up iterator API for Kafka consumer, which is quite convenient to use in Python.
In case it's important, the Kafka cluster is Confluent, version 2.3 if I remember correctly.
In Kafka, 0.10.1+ Kafka polling and session heartbeat are decoupled to each other.
You can get an explanationhere
max.poll.interval.ms how much time permit to complete processing by consumer instance before time out means if processing time takes more than max.poll.interval.ms time Consumer Group will presume its die remove from Consumer Group and invoke rebalance.
To increase this will increase the interval between expected polls which give consumers more time to handle a batch of records returned from poll(long).
But at the same time, it will also delay group rebalances since the consumer will only join the rebalance inside the call to poll.
session.timeout.ms is the timeout used to identify if the consumer is still alive and sending a heartbeat on a defined interval (heartbeat.interval.ms). In general, the thumb-rule is heartbeat.interval.ms should be 1/3 of session timeout so in case of network failure consumers can miss at most 3-time heartbeat before session timeout.
session.timeout.ms: low value would be good to detect failure more quickly.
max.poll.interval.ms: large value will reduce the risk of failure due to increased processing time however increases the rebalancing time.
Note: A large number of partition and topics consumed by Consumer Group also effect on overall rebalance time
The other approach if you would really want to get rid of rebalancing you can assign partitions on each consumer instance manually, using partition assign. In that case, each consumer instance will be running independently with their own assigned partitions. But in that case, you would not able to leverage the rebalance features to assign partitions automatically.

Is it possible consumer Kafka messages after arrival?

I would like to consume events from a kafka topic after the time they arrive. The time on which I want the event to be consumed is in the payload of the message. Is it possible to achieve something like that in Kafka? What are the drawbacks of it?
Practical example: a message M is produced at 12:10, arrives to my kafka topic at 12:11 and I want the consumer to poll it at 12:41 (30 minutes after arrival)
Kafka has a default retention period of all topic for 7 days. You can therefore consume up to a week's data at any moment, the drawback being network saturation if you are constantly doing this.
If you want to consume data that is not at the latest offset, then for any new consumer group, you would set auto.offset.reset=earliest. Otherwise for existing groups, you would need to use kafka-consumer-groups --reset command in order to re-consume an already consumed record.
Sometimes you may want to start from beginning of a topic, for example, if you have a compacted topic, in order to rebuild the "deltas" of the data within a topic - lookup the "Stream / Table Duality"
The time on which I want the event to be consumed is in the payload of the message
Since KIP-32 every message has a timestamp outside the payload, by the way
I want the consumer to poll it ... (30 minutes after arrival)
Sure, you can start a consumer whenever, as long as the data is within the retention window, you will get that event.
There isn't a way to finely control when that happens that other than acually making your consumer at that time, for example 30 minutes later. You could play with max.poll.records and max.poll.interval.ms, but I find anything larger than a few seconds really isn't a use-case for Kafka.
For example, you could rather have a TimerTask around a consumer thread, or Spark or MapReduce scheduled with an Oozie/Airflow task that reads a max amount of records.

Submitting offsets to kafka after storm batch

What would be the correct way to submit only the highest offset of every partion when batch bolt finishes proccessing a batch? My main concern is machines dying while proccessing batches as the whole shebang is going to run in AWS spot instances.
I am new to storm development I can't seem to find an answer to IMO is pretty straight forward usage of kafka and storm.
Scenario:
Based on the Guaranteeing Message Processing guide, lets assume that I have a steam (kafka topic) of ("word",count) tuples, Batch bolt that proccess X tupples, does some aggregation and creates CSV file, uploads the file to hdfs/db and acks.
In non-strom "naive" implementation, I would read X msgs (or read for Y seconds), aggregate,write to hdfs and once the upload is completed, commit the latest (highest) offset for every partition to kafka. If machine or proccess dies before the db commit - the next iteration will start from the previous place.
In storm I can create batch bolt that will anchor all of the batch tuples and ack them at once, however I can't find a way to commit the highest offset of every partition to kafka, as the spouts are unaware of the batch, so once the batch bolt acks the tupples, every spout instance will ack his tupples one by one, so I the way I see it I can:
Submit the offset of the acked message on every ack on the spout. This will cause many submits (every batch can be few Ks of tupples), probably out of order, and if the spout works dies while submitting the offsets, I will end up partially replaing some of the events.
Same as 1. but I can add some local offset management in the highest offset commited (fixing out of order offset commits) and submit the highets offset seen every few seconds (reducing the high number of submits) but I still can end up with partially commited offsets if the spout dies
Move the offset submition logic to the bolts - I can add the partition and offset of every message into data sent to the batch bolt and submit the highest proccessed offset of every partition as part of the batch (emit to "offset submitter" bolt at the end of the batch). This will solve the offset tracking, multiple submits and patial replayes issues but this adds kafka specific logic to the bolts thus copling the bolt code with kafka and in generally speaking seems to me as reinventing the wheel.
Go even further with wheel reinvention and manually managed the highest processed patition-offset combination in ZK and read this value when I init the spout.
There is quite a lot in your question, so not sure if this addresses it fully, but if you are concerned about the number of confirmations sent to kafka (e.g. after every message) you should be able to set a batch size for consumption, for example 1000 to reduce this a lot.