What would be the correct way to commit only the highest offset of every partition when the batch bolt finishes processing a batch? My main concern is machines dying while processing batches, as the whole shebang is going to run on AWS spot instances.
I am new to Storm development and can't seem to find an answer to what is, IMO, a pretty straightforward usage of Kafka and Storm.
Scenario:
Based on the Guaranteeing Message Processing guide, let's assume that I have a stream (Kafka topic) of ("word", count) tuples and a batch bolt that processes X tuples, does some aggregation, creates a CSV file, uploads the file to HDFS/DB, and acks.
In a non-Storm "naive" implementation, I would read X messages (or read for Y seconds), aggregate, write to HDFS, and once the upload is completed, commit the latest (highest) offset for every partition to Kafka. If the machine or process dies before the offset commit, the next iteration will start from the previous place.
In Storm I can create a batch bolt that will anchor all of the batch's tuples and ack them at once. However, I can't find a way to commit the highest offset of every partition to Kafka, as the spouts are unaware of the batch: once the batch bolt acks the tuples, every spout instance will ack its tuples one by one. The way I see it, I can:
1. Commit the offset of the acked message on every ack in the spout. This will cause many commits (every batch can be a few thousand tuples), probably out of order, and if the spout worker dies while committing the offsets, I will end up partially replaying some of the events.
2. Same as 1., but add some local offset management that tracks the highest offset committed (fixing the out-of-order offset commits) and commit the highest offset seen every few seconds (reducing the high number of commits). I can still end up with partially committed offsets if the spout dies.
3. Move the offset-commit logic into the bolts: I can add the partition and offset of every message to the data sent to the batch bolt, and commit the highest processed offset of every partition as part of the batch (emitting to an "offset committer" bolt at the end of the batch). This solves the offset-tracking, multiple-commit, and partial-replay issues, but it adds Kafka-specific logic to the bolts, coupling the bolt code with Kafka, and generally speaking seems to me like reinventing the wheel.
4. Go even further with the wheel reinvention and manually manage the highest processed partition-offset combination in ZooKeeper, reading this value when I initialize the spout.
There is quite a lot in your question, so I'm not sure if this addresses it fully, but if you are concerned about the number of offset commits sent to Kafka (e.g. after every message), you should be able to set a batch size for consumption, for example 1000, to reduce this a lot.
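On the plain Kafka consumer API, that knob is max.poll.records. A minimal sketch, assuming a Java consumer (the broker address and group id are placeholders; spout implementations expose their own equivalents):

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BatchedConsumerFactory {
    // Consume in batches of up to 1000 records and commit once per batch,
    // instead of acknowledging every message individually.
    public static KafkaConsumer<String, String> create() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "word-count-batcher");      // hypothetical group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1000");    // batch size per poll()
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually, once per batch
        return new KafkaConsumer<>(props);
    }
}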
I am implementing Kafka delayed topic consumption with consumer.pause(<partitions>).
The Pub/Sub Lite Kafka shim turns pause() into a no-op:
https://github.com/googleapis/java-pubsublite-kafka/blob/v0.6.7/src/main/java/com/google/cloud/pubsublite/kafka/PubsubLiteConsumer.java#L590-L600
Is there any documentation on how to delay consumption of a Pub/Sub Lite topic by a set duration?
i.e. I want to consume all messages from a Pub/Sub Lite topic, but with a synthetic 4-minute lag.
Here is my algorithm with native Kafka:
1. call consumer.poll()
2. resume all assigned partitions: consumer.resume(consumer.assignment())
3. combine previously delayed records with recently polled records
4. separate records into:
   - records that are old enough to process
   - records still too young to process
5. pause partitions for any records that are too young: consumer.pause(<partitions of too young>)
6. keep a buffer of too-young records to reconsider on the next pass, called delayed
7. process records that are old enough
8. rinse, repeat
We only commit offsets of records that are old enough. If the process dies, any records in the "too young" buffer will remain uncommitted, and they will be revisited by whichever consumer receives the partition in the ensuing rebalance.
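For concreteness, here is a minimal sketch of that loop against the native Kafka consumer API; the process() handler, the String record types, and the 1-second poll timeout are placeholders of mine:

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class DelayedConsumeLoop {
    static final Duration LAG = Duration.ofMinutes(4); // synthetic lag from the question

    // Placeholder for the real downstream handler.
    static void process(ConsumerRecord<String, String> record) { }

    public static void run(KafkaConsumer<String, String> consumer) {
        Map<TopicPartition, Queue<ConsumerRecord<String, String>>> delayed = new HashMap<>();
        while (true) {
            consumer.resume(consumer.assignment());           // step 2: un-pause everything
            ConsumerRecords<String, String> polled = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : polled) { // step 3: merge into the buffer
                delayed.computeIfAbsent(new TopicPartition(r.topic(), r.partition()),
                        tp -> new ArrayDeque<>()).add(r);
            }
            long cutoff = Instant.now().minus(LAG).toEpochMilli();
            Map<TopicPartition, OffsetAndMetadata> toCommit = new HashMap<>();
            Set<TopicPartition> tooYoung = new HashSet<>();
            for (Map.Entry<TopicPartition, Queue<ConsumerRecord<String, String>>> e : delayed.entrySet()) {
                Queue<ConsumerRecord<String, String>> q = e.getValue();
                while (!q.isEmpty() && q.peek().timestamp() <= cutoff) {
                    ConsumerRecord<String, String> r = q.poll(); // step 7: old enough to process
                    process(r);
                    toCommit.put(e.getKey(), new OffsetAndMetadata(r.offset() + 1));
                }
                if (!q.isEmpty()) tooYoung.add(e.getKey());   // partition still has young records
            }
            if (!toCommit.isEmpty()) consumer.commitSync(toCommit); // commit only processed records
            consumer.pause(tooYoung);                         // step 5: stop fetching backed-up partitions
        }
    }
}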
Is there a more generalized form of this algorithm that will work with native Kafka and Pub/Sub Lite?
Edit: Cloud Tasks is a bad idea here, as it disconnects the offset-commit chain. I need to ensure I only commit offsets for records that have gotten an ack from the downstream system.
Something similar to the above would likely work fine if you remove the pause and resume stages. I'd note that with both systems, you are not guaranteed to receive all messages that exist on the server up to now in any given poll() call, so you may add extra delay if you are not given any records for a given partition in a poll call.
If you do the following with autocommit enabled, you should effectively delay processing by strictly more than 4 minutes.
1. call consumer.poll()
2. sleep until every record is 4 minutes old
3. process records
4. go to 1.
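A sketch of that simpler loop, assuming record timestamps drive the delay and enable.auto.commit is left on (note that a long sleep can exceed max.poll.interval.ms and trigger a rebalance, so this really is just a sketch):

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SleepDelayLoop {
    static final long LAG_MS = Duration.ofMinutes(4).toMillis();

    static void process(ConsumerRecord<String, String> r) { /* placeholder handler */ }

    public static void run(KafkaConsumer<String, String> consumer) throws InterruptedException {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            // Wait until even the newest record in the batch is 4 minutes old,
            // so everything is delayed by strictly more than 4 minutes.
            long newest = Long.MIN_VALUE;
            for (ConsumerRecord<String, String> r : records) {
                newest = Math.max(newest, r.timestamp());
            }
            long sleepMs = newest + LAG_MS - System.currentTimeMillis();
            if (sleepMs > 0) Thread.sleep(sleepMs);
            records.forEach(SleepDelayLoop::process); // offsets auto-commit on a later poll
        }
    }
}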
If you use manual commits, you can make the sleeps closer to 4 minutes on a per-message basis, but with the downside of needing to manage offsets manually:
1. call consumer.poll()
2. put records into ordered per-partition buffers
3. sleep until the oldest record for any partition is 4 minutes in the past
4. process records which are more than 4 minutes in the past
5. commit offsets for processed records
6. go to 1.
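This manual-commit variant is essentially the loop sketched under the question above with the pause() and resume() calls removed; the per-partition buffers and the commit of only-processed offsets carry over unchanged, which is what makes it portable to Pub/Sub Lite.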
I have a use case regarding consuming records with a Kafka consumer.
For instance,
I have 1 topic which has 1 partition. Currently it has 10 records, and while the first 10 records are being consumed, another 10 records are written to the partition.
1. myConsumer polls the first time and returns the first 10 records, say records 0-9.
2. It processes all the records successfully.
3. It invokes commitAsync() to commit the last offset to Kafka.
4. The commit response is still in flight. It can be a success or a failure.
5. But since this is asynchronous mode, the consumer continues to poll for the next batch.
Now, how does either Kafka or the consumer's poll know that it has to read from the 10th position, given that the commitAsync request has not yet completed?
Please help me understand this concept.
Committing an offset tells the broker that the consumer has processed the corresponding message successfully. The consumer itself is aware of its own progress (except at startup, when it gets its last committed offset from the broker).
At step 5 in your description, the offset commit is in progress. So:
The broker does not know that records 0-9 have been processed.
The consumer itself has read the messages, so it knows that it has read messages 0-9 and will know to read from 10 onwards next.
Possible Scenarios
Let's say the commit fails for (0-9). If your next batch, say (10-15), is processed and committed successfully, then there is no harm done, since we indicate to the broker that processing up to 15 is complete.
Let's say the commit fails for (0-9). Your next batch, (10-15), is processed, and before committing, the consumer goes down. When your consumer is brought back up, it takes its state from the broker (which has a commit for neither batch), so it will start reading from message 0.
You can come up with several other scenarios as well. I guess the bottom line is that the importance of the commit comes into the picture when your consumer is restarted for whatever reason and has to get its last processed offset from the Kafka broker.
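To make the distinction concrete, here is a minimal sketch of the commitAsync() pattern; subscription setup and record handling are elided, and the logging callback is just illustrative:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class CommitAsyncLoop {
    public static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            // poll() reads from the consumer's *local* position, which advances
            // in memory as records are fetched; it never waits for commits.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> { /* process record */ });

            // commitAsync() only updates the *broker-side* saved position, which
            // matters when this consumer restarts or the partition is reassigned.
            consumer.commitAsync((offsets, exception) -> {
                if (exception != null) {
                    // A failed commit does not rewind the live consumer; at worst,
                    // records since the last successful commit are re-read after
                    // a restart or rebalance.
                    System.err.println("Commit failed for " + offsets + ": " + exception);
                }
            });
        }
    }
}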
I am designing an Apache Storm topology (using streamparse) built with one spout (the Apache Kafka spout) and one bolt with parallelism > 1 that reads messages in batches from the Kafka spout and persists them to a MySQL table.
The bolt reads messages in batches. If a batch completes successfully, I manually commit the Apache Kafka offset.
When the bolt's insert into MySQL fails, I don't commit the offset in Kafka, but some messages are already in the queue of messages that the spout has sent to the bolt.
The messages that are already in the queue should be removed, because I cannot advance the Kafka offset without losing the previously failed messages.
Is there a way in streamparse to clean out or fail all the messages that are already in the queue at bolt startup?
I don't know about streamparse, but the impression I get is that you want to bundle tuples up and write them as a batch. Let's say you've written up to offset 10. Now your bolt receives offsets 11-15, and the batch fails to write. Offsets 16-20 are queued, and you don't want to process them right now, because that would process the batches out of order.
Is this understanding right?
First, I would drop manually committing offsets. You should let the spout handle that. Assuming you are using storm-kafka-client, you can configure it to only commit offsets once the corresponding tuple and all preceding tuples have been acked.
What you should probably do is keep track in the bolt (or even better, in your database) of what the highest offset was in the failed batch. Then, when your bolt fails to write offsets 11-15, you can make the bolt fail every tuple with offset > 15. At some point you will receive offsets 11-15 again and can retry writing the batch. Since you failed all messages with offset > 15, they will also be retried and will arrive after the messages in the failed batch.
This solution assumes that you don't do reordering of the message stream between the spout and your writer bolt, so the messages arrive at the bolt in the order they are emitted.
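A rough sketch of that bolt logic in Java, simplified to a single partition; it assumes the spout emits an "offset" field (the storm-kafka-client default translator does), and writeBatch() stands in for your MySQL writer:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class BatchWriterBolt extends BaseRichBolt {
    private static final int BATCH_SIZE = 100;
    private OutputCollector collector;
    private final List<Tuple> batch = new ArrayList<>();
    private long failWatermark = -1; // highest offset of the last failed batch

    @Override
    public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector out) {
        this.collector = out;
    }

    @Override
    public void execute(Tuple tuple) {
        long offset = tuple.getLongByField("offset");
        if (failWatermark >= 0 && offset > failWatermark) {
            collector.fail(tuple); // force replay of everything past the failed batch
            return;
        }
        batch.add(tuple);
        if (batch.size() < BATCH_SIZE) {
            return;
        }
        try {
            writeBatch(batch);              // hypothetical MySQL writer
            batch.forEach(collector::ack);  // the spout commits offsets for acked tuples
            failWatermark = -1;
        } catch (Exception e) {
            failWatermark = batch.stream()
                    .mapToLong(t -> t.getLongByField("offset")).max().orElse(-1);
            batch.forEach(collector::fail); // the whole batch will be replayed in order
        }
        batch.clear();
    }

    private void writeBatch(List<Tuple> tuples) { /* insert into MySQL */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) { /* terminal bolt */ }
}

A real topology would track one watermark per Kafka partition rather than a single long.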
I have a batch job which publishes data to a Kafka topic. Every message has data and a job identifier.
On the consumer side, I want to read only messages which belong to this job. After the job has finished and all the messages have been consumed, the consumer side has to do some post-processing.
1) If it is guaranteed that no other messages will be produced during the job, how can I tell that the job has finished and that all the messages produced by the job were consumed (taking into consideration multiple partitions and asynchrony)?
2) If it is NOT guaranteed that no other messages will be produced during the job, the noise can be skipped, I believe.
Thanks
I'm assuming the job_id is constant. In that case, you can put a check in your consumer to shut down if n subsequent polls return empty records from Kafka. n will depend on your ingestion rate and consumer poll interval.
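A tiny sketch of that check, where n and the 1-second poll timeout are arbitrary choices of mine:

import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdleShutdownLoop {
    public static void run(KafkaConsumer<String, String> consumer, int n) {
        int emptyPolls = 0;
        while (emptyPolls < n) { // stop after n consecutive empty polls
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            emptyPolls = records.isEmpty() ? emptyPolls + 1 : 0;
            records.forEach(r -> { /* process record, filtering on job_id */ });
        }
        consumer.close(); // assume the job's messages are exhausted
    }
}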
I am only talking about the first case here. Mind you, this is just an idea, and I have never tried it myself.
You can use endOffsets() to get the last offsets of all the partitions, and then loop over all of them after every message to check whether the current offsets match the ending offsets. If all match, you have reached the end.
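A sketch of that end-of-partition check, assuming the consumer already has its assignment and that nothing else is producing to the topic (the first case above):

import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class EndOfJobCheck {
    // Returns true once the consumer's position has reached the end of every
    // assigned partition, i.e. all of the job's messages have been consumed.
    public static boolean reachedEnd(KafkaConsumer<String, String> consumer) {
        Set<TopicPartition> assignment = consumer.assignment();
        Map<TopicPartition, Long> endOffsets = consumer.endOffsets(assignment);
        for (TopicPartition tp : assignment) {
            if (consumer.position(tp) < endOffsets.get(tp)) {
                return false; // still lagging on this partition
            }
        }
        return true;
    }
}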
I have a requirement to read messages from a topic, batch them, and push the batch to an external system. If the batch fails for any reason, I need to consume the same set of messages again and repeat the process. So for every batch, the from and to offsets for each partition are stored in a database. In order to achieve this, I am creating one Kafka consumer per partition: the partition is assigned to the reader, and based on the previously stored offsets, the consumer seeks to that position and starts reading. I have turned off auto-commit, and I don't commit offsets from the consumer. For every batch, I create a new consumer per partition, read messages from the last offset stored, and publish to the external system. Do you see any problems with consuming messages without committing offsets and using the same consumer group across batches, given that at any point there won't be more than one consumer per partition?
Your design seems reasonable to me.
Committing offsets to Kafka is just a convenient built-in mechanism within Kafka to keep track of offsets. However, there is no requirement whatsoever to use it -- you can use any other mechanism to track offsets, too (like using a DB as in your case).
Furthermore, if you assign partitions manually, there will be no group management anyway, so the parameter group.id has no effect. See http://docs.confluent.io/current/clients/consumer.html for more details.
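A sketch of such a per-partition batch reader; the broker address, the poll timeout, and loadStoredOffset() (your database lookup) are placeholders:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PartitionBatchReader {
    // Hypothetical: fetch the last stored "to" offset for this partition from the DB.
    static long loadStoredOffset(TopicPartition tp) { return 0L; }

    public static ConsumerRecords<String, String> readBatch(String topic, int partition) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets live in the DB

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition(topic, partition);
            consumer.assign(Collections.singletonList(tp)); // manual assign: no group management
            consumer.seek(tp, loadStoredOffset(tp));        // resume from the DB-stored position
            return consumer.poll(Duration.ofSeconds(5));
        }
    }
}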
In Kafka version 2, I achieved this behaviour without the need for a database to store the offsets.
The following is a configuration for Spring Boot Kafka, but it should also work with any Kafka consumer API:
spring:
  kafka:
    bootstrap-servers: ...
    consumer:
      value-deserializer: ...
      max-poll-records: 1000
      enable-auto-commit: false
      fetch-min-size: 262144 # 1/4 MB
      group-id: ...
      fetch-max-wait: 10000 # we will consume every 10s or when 1/4 MB or 1000 records are accumulated
      auto-offset-reset: earliest
    listener:
      type: batch
      concurrency: 7
      ack-mode: manual
This gives me the messages in batches of at most 1000 records (depending on load). I then write these records asynchronously to a database and count how many success callbacks I get. If the number of successful writes equals the received batch size, I acknowledge the batch, i.e. I commit the offset. This design was very reliable even in a high-load production environment.
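For completeness, a sketch of the matching Spring Kafka listener; the topic name is a placeholder, and writeAllAsyncAndAwait() stands in for the asynchronous database writer with its success-callback counting:

import java.util.List;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.support.Acknowledgment;
import org.springframework.stereotype.Component;

@Component
public class BatchPersistListener {

    // With listener type "batch" and ack-mode "manual", Spring hands over the
    // whole poll result plus an Acknowledgment for committing the batch's offsets.
    @KafkaListener(topics = "my-topic") // placeholder topic
    public void onBatch(List<ConsumerRecord<String, String>> records, Acknowledgment ack) {
        int successes = writeAllAsyncAndAwait(records);
        if (successes == records.size()) {
            ack.acknowledge(); // commit offsets only if every record was persisted
        }
        // Otherwise the offsets stay uncommitted, and the batch is re-read
        // after a restart or rebalance.
    }

    private int writeAllAsyncAndAwait(List<ConsumerRecord<String, String>> records) {
        // Kick off async inserts, count success callbacks, block until all settle.
        return records.size(); // placeholder
    }
}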