Maintain Ordering Guarantees With Kafka Consumer Retries - apache-kafka

I'm in the process of coming up with an architecture for consumer retries in a Kafka-based data processing pipeline. We're using Kafka producers and consumers and are thinking of retry topics to which messages will be sent if they error out on consumption. Consumers will run on these retry topics at a certain cadence.
I read many reference architectures, but none talk about how to maintain ordering guarantees during message consumption failures. Let me give an example:
Our Kafka messages contain a payload that has an object and an operation type (which can be CREATE, UPDATE, or DELETE). We partition messages on object_id to make sure that the operations on a given object are ordered. However, if a message fails on consumption, should you automatically flag subsequent messages with the same object_id as failed, without even attempting to process them? And how do you maintain that state?
Are there any reference architectures that address this?

Yes, you need a mechanism in place whereby, if one message with a given object_id fails and goes to retry, all subsequent messages with the same object_id also go to retry directly.
I suggest using a cache to coordinate this: whenever a message goes to retry, increment the key for that object_id; whenever a message is successfully consumed from the retry topic, decrement it.
Now you only need to check, before attempting to consume a message, whether a key with a value > 0 exists for its object_id, and if so, send the message directly to retry.
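One possible sketch of that coordination, using Redis via the Jedis client as the shared cache; the key prefix and the wiring are illustrative, not part of the design above:

    import redis.clients.jedis.Jedis;

    // Sketch only: one Jedis instance per consumer thread (Jedis is not thread-safe).
    public class RetryCoordinator {

        private static final String KEY_PREFIX = "retrying:"; // hypothetical key prefix
        private final Jedis redis = new Jedis("localhost", 6379);

        /** Call before processing a record from the main topic. */
        public boolean shouldBypassToRetry(String objectId) {
            String count = redis.get(KEY_PREFIX + objectId);
            return count != null && Long.parseLong(count) > 0;
        }

        /** Call whenever a record for this object is sent to the retry topic. */
        public void markSentToRetry(String objectId) {
            redis.incr(KEY_PREFIX + objectId);
        }

        /** Call whenever a record for this object is successfully consumed from the retry topic. */
        public void markRetrySucceeded(String objectId) {
            redis.decr(KEY_PREFIX + objectId);
        }
    }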

If there are multiple levels of retry topics, maintain a distributed key-value cache where the key is the object_id and the value is the retry-topic level.
On consumption of a message, check against this cache, and if the object_id is present, send the event directly to that level's retry topic.
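A rough sketch of that level-based routing; the in-memory map stands in for a real distributed cache, and the "orders-retry-<level>" topic naming scheme is an assumption:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class RetryLevelRouter {

        private final Map<String, Integer> retryLevelByObjectId = new ConcurrentHashMap<>();
        private final KafkaProducer<String, String> producer;

        public RetryLevelRouter(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        /** Returns true if the event was routed straight to the retry topic its object is parked at. */
        public boolean routeIfRetrying(String objectId, String payload) {
            Integer level = retryLevelByObjectId.get(objectId);
            if (level == null) {
                return false; // not retrying: the caller processes the event normally
            }
            producer.send(new ProducerRecord<>("orders-retry-" + level, objectId, payload));
            return true;
        }

        /** Record the retry level an object is currently at, or clear it (null) once it succeeds. */
        public void setLevel(String objectId, Integer level) {
            if (level == null) {
                retryLevelByObjectId.remove(objectId);
            } else {
                retryLevelByObjectId.put(objectId, level);
            }
        }
    }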

The easiest approach would be a blocking retry policy here: i.e., not using retry topics, but instead blocking on the consumer (sleeping the thread for some time) and then retrying the same message. In this case you can always guarantee the order.
If you opt for retry topics instead, you are going to have a lot of headaches making sure the order is guaranteed.
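A minimal sketch of the blocking approach; the retry count, sleep time and the process() step are placeholders, and max.poll.interval.ms must be large enough to cover the worst-case total back-off or the broker will consider the consumer dead and rebalance:

    import java.time.Duration;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class BlockingRetryConsumer {

        private final KafkaConsumer<String, String> consumer;

        public BlockingRetryConsumer(KafkaConsumer<String, String> consumer) {
            this.consumer = consumer;
        }

        public void pollLoop() throws InterruptedException {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    int attempts = 0;
                    while (true) {
                        try {
                            process(record);          // application-specific processing
                            break;                    // success: move on to the next record
                        } catch (Exception e) {
                            if (++attempts >= 5) {
                                throw new IllegalStateException("Giving up on " + record.key(), e);
                            }
                            Thread.sleep(2_000L);     // block the consumer thread, then retry the same record
                        }
                    }
                }
                consumer.commitSync();                // commit only after the whole batch succeeded
            }
        }

        private void process(ConsumerRecord<String, String> record) {
            // real processing goes here
        }
    }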

Related

Reconsume Kafka Message that failed during processing due to DB error

I am new to Kafka and would like to seek advice on the best practice for handling such a scenario.
Scenario:
I have a Spring Boot application with a consumer method that listens for messages via the @KafkaListener annotation. When a message arrives, the consumer method processes it, which simply means performing database updates to different tables via JdbcTemplate.
If the updates to the tables are successful, I will manually commit the message by calling the acknowledge() method. If the database update fails, instead of calling the acknowledge() method, I will call the nack() method with a given duration (E.g. 10 seconds) such that the message will reappear again to be consumed.
Things to note
I am not concerned with the ordering of the messages. Whatever event comes I just have to consume and process it, that's all.
I am only given a topic (no retryable topic and no dead letter topic)
Here is the problem
If I do the above, my consumer becomes inconsistent. Say I call the nack() method with a duration of 1 minute, meaning the same message will reappear after 1 minute.
Within this 1 minute there could be "x" incoming messages waiting to be consumed and processed. What I observed was that none of these messages get consumed and processed.
What I want to know
I hope someone can advise me on what I am doing wrong and on the best practice / way to handle such scenarios.
Thanks!
Records are always received in order; when consuming from a single topic there is no way to defer the current record until later while continuing to process the records after it.
Kafka topics are a linear log and not a queue.
You would need to send it to another topic; the @RetryableTopic (non-blocking retries) feature is specifically designed for this use case.
https://docs.spring.io/spring-kafka/docs/current/reference/html/#retry-topic
You could also increase the container concurrency so at least you could continue to process records from other partitions.
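A minimal sketch of that feature; the topic name, attempt count and back-off values are illustrative:

    import org.springframework.kafka.annotation.DltHandler;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.annotation.RetryableTopic;
    import org.springframework.retry.annotation.Backoff;
    import org.springframework.stereotype.Component;

    @Component
    public class OrderListener {

        @RetryableTopic(attempts = "4", backoff = @Backoff(delay = 1000, multiplier = 2.0))
        @KafkaListener(topics = "orders")
        public void listen(String message) {
            updateDatabase(message); // throwing here sends the record to the next retry topic
        }

        @DltHandler
        public void handleDlt(String message) {
            // record exhausted all retries; log or persist it for manual intervention
        }

        private void updateDatabase(String message) {
            // JdbcTemplate updates would go here
        }
    }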

Will this be enough for exactly once processing?

From my understanding, if I enable the idempotent producer I won't have duplicate messages on the producer side. On the consumer side I can use an incremented message ID and store it somewhere (Redis, Kafka state, a DB) along with the partition after processing.
Every time a consumer restarts or joins the group, I can discard duplicate messages with an ID less than the stored one.
The only issue I see is a failure when storing the ID after processing the message; I can retry the operation, but I can't help it if there is a server crash, though that seems an acceptable risk. Also, I should not have concurrency issues when storing the ID, since there is one consumer per partition, but it is probably still safe to lock the process.
Am I correct in my assumptions? Am I missing something? Thanks.
I tried reading the Kafka docs but I feel I'm missing something.
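For reference, this is roughly what I have in mind on the consumer side; the in-memory map stands in for Redis/a DB, and keeping it in sync with the processing side effect is exactly the failure case mentioned above:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    public class PerPartitionDeduplicator {

        private final Map<Integer, Long> highestIdByPartition = new ConcurrentHashMap<>();

        public boolean isDuplicate(ConsumerRecord<String, String> record, long messageId) {
            Long highest = highestIdByPartition.get(record.partition());
            return highest != null && messageId <= highest;
        }

        public void markProcessed(ConsumerRecord<String, String> record, long messageId) {
            // Ideally this write happens in the same transaction as the processing side effect;
            // otherwise a crash between the two reopens the duplicate window.
            highestIdByPartition.merge(record.partition(), messageId, Math::max);
        }
    }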

How to manage Kafka transactional producer objects in request oriented applications

What is the best practice for managing Kafka producer objects in request oriented (e.g. http or RPC servers) applications, when configured as transactional producers? Specifically, how to share producer objects among serving threads, and how to define the transactional.id configuration value for those objects?
In non-transactional usage, producer objects are thread safe and it is common to share one object among all request-serving threads. It is also straightforward to set up transactional producer objects for use by Kafka consumer threads; just instantiating one object for each consumer thread works well.
Combining transactional producers with request oriented applications appears to be more complicated, as the life-cycle of serving threads is usually dynamically controlled by a thread pool. I can think of a few options, all with downsides:
Share a single object, protected against concurrency by some kind of mutex. Contention under load would probably be a serious problem.
Instantiate a producer object for each request coming in. KafkaProducer objects are slow to initialize, as they maintain network connections, threads, and other heavyweight objects; paying this cost for each request seems impractical.
Maintain a pool of producer objects, and lease one for each request. The main downside I can see is the amount of machinery required. It is also unclear how to configure transactional.id for these objects, as their lifecycle does not map cleanly to a shard identifier in a partitioned, stateful application, as the documentation says.
Are there other options? Is there an optimal approach?
TL;DR
The transactional id is for preventing duplicates caused by zombie processes in the read-process-write pattern, where you read from and produce to Kafka topics. For request oriented applications, e.g. messages produced in response to an incoming HTTP request, the transactional id doesn't bring any benefit (of course you still need to assign one if you want to use transactions, and it shouldn't be repeated between producers in the same process or across different processes in your cluster).
Long answer
As the docs say, transactional producers are not thread safe
As is hinted at in the example, there can be only one open transaction per producer. All messages sent between the beginTransaction() and commitTransaction() calls will be part of a single transaction
So, as you correctly explained, there can't be concurrent access to the producer, and we must pick one of the three options you described.
For this answer I'm going to assume that "request oriented applications" means HTTP requests that trigger messages being produced within a transaction (actually more than one message per request; otherwise idempotent producers would be enough and transactions wouldn't be needed).
In terms of correctness all of them are OK: option 1 would work, but depending on your application's throughput it could suffer high contention; option 2 will also work, but you will pay the price of higher latency and it won't be very efficient.
IMHO option 3 could be the best, since it is a compromise between the two previous options, although it of course requires a more careful implementation than just opening a new producer each time.
Transactional id
The question that remains is how to assign a transactional id to the producer, especially in the last case (although options 1 and 3 both share this concern, since in both cases we are reusing a producer with the same transactional id to handle different requests).
To answer this, we first need to understand that the goal of transactional.id is to protect us from duplicate messages produced by zombie processes (a process that hangs for a while, e.g. because of a long GC pause, is considered dead, but comes back after a while and continues); preventing this is called zombie fencing.
An important detail in understanding the need for zombie fencing is the use case in which it can happen, and that is the read-process-write pattern: you read from a topic, process the element, and write to an output topic and to the offsets topic, which gives us atomicity and exactly-once semantics (if you are not doing any side effects in the process step).
Idempotent producers prevent duplicates caused by producer retries (where the message was persisted by the broker but the ack wasn't received by the producer), and two-phase commit within Kafka (where we not only write to the output but also mark the message as consumed by producing to the offsets topic) prevents duplicates caused by consuming the message more than once (if the process crashes after producing to the output topic but before committing the offset).
There is still a subtle case where a duplicate can be introduced: a zombie producer. It is fenced by an epoch that is monotonically increased each time a producer calls initTransactions, and that epoch is sent with every message the producer sends.
So, for a producer to be fenced, another producer must have been started with the same transactional id; the key point is explained by Jason Gustafson in this talk:
"what we are looking for is a guarantee that for each input partition there is only a single write that is responsible for reading that data and writing the output"
This means the transactional.id is assigned in terms of the partition being consumed in the "read-process-write" pattern.
So if a process that has been assigned partition 0 of topic A is considered dead, a rebalance will kick off, and the new process that takes over should create a producer with the same transactional.id. That's why it should be something like <prefix><group>.<topic>.<partition>, as described in this answer, where the partition is part of the transactional.id. This also means one producer per assigned partition, which can represent an overhead depending on how many topics and partitions your consumers are assigned.
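To make the pattern concrete, here is a hedged sketch of a per-partition transactional producer in the read-process-write loop (the pre-KIP-447 model); the broker address, topic names, the "txn" prefix and the processing step are illustrative:

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.TopicPartition;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PartitionTransactionalWriter {

        static KafkaProducer<String, String> producerFor(String group, String topic, int partition) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            // transactional.id derived from the input partition: after a rebalance the new owner of the
            // partition reuses the same id, so initTransactions() bumps the epoch and fences the zombie.
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "txn." + group + "." + topic + "." + partition);
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.initTransactions();
            return producer;
        }

        static void processBatch(KafkaProducer<String, String> producer,
                                 ConsumerRecords<String, String> records, String group) {
            producer.beginTransaction();
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            for (ConsumerRecord<String, String> record : records) {
                // the "process" step of read-process-write would transform record.value() here
                producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            }
            // consumed offsets are committed inside the same transaction as the output records
            producer.sendOffsetsToTransaction(offsets, group);
            producer.commitTransaction();
        }
    }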
These slides from the talk illustrate the situation:
[Slide: transactional id before the process crash]
[Slide: transactional id reassigned to another process after the crash]
Transactional id in http requests
Going back to your original question, HTTP requests won't follow the read-process-write pattern where zombies can introduce duplicates, because each HTTP request is unique; even if you introduce a unique identifier, it will be a different message from the point of view of the transactional producer.
In this case I would argue that you may still get value from the transactional producer if you want the atomicity of writing to two different topics, but you can choose a random transactional id for option 2, or reuse it for options 1 and 3.
UPDATE
My answer is outdated, since it is based on an old version of Kafka.
The overhead of having one producer per partition described before was a concern that was tackled in KIP-447
This architecture does not scale well as the number of input partitions increases. Every producer come with separate memory buffers, a separate thread, separate network connections. This limits the performance of the producer since we cannot effectively use the output of multiple tasks to improve batching. It also causes unneeded load on brokers since there are more concurrent transactions and more redundant metadata management.
This is the main difference as explained in this post
When the partition assignment is finalized after a consumer group rebalance, the first step for the consumer is to always get the next offset to begin fetching data. With this observation, the OffsetFetch protocol protection is enhanced, such that when a consumer group has pending transactional offsets associated with one partition, the OffsetFetch call can be blocked until the associated transaction completes. Previously, the “outdated” offset data would be returned and the application allowed to continue immediately.
With this new feature, the role of transactional.id is no longer clear to me.
Although it is still unclear to me why fencing requires blocking the poll while there are pending transactions, when it seems that sending the consumer group metadata should be enough (I assume a zombie producer will be fenced by committing with an old generation.id for that group.id, the generation.id being bumped with each rebalance), it seems the transactional.id doesn't play a major role anymore. E.g. the Spring docs say:
With mode V1, the producer is "fenced" if another instance with the same transactional.id is started. Spring manages this by using a Producer for each group.id/topic/partition; when a rebalance occurs a new instance will use the same transactional.id and the old producer is fenced.
With mode V2, it is not necessary to have a producer for each group.id/topic/partition because consumer metadata is sent along with the offsets to the transaction and the broker can determine if the producer is fenced using that information instead.
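For comparison, a sketch of the same batch-processing step in the post-KIP-447 world (Kafka clients 2.5+): one producer per process is enough, because the consumer group metadata sent along with the offsets lets the broker fence zombies. This is a drop-in variant of the processBatch sketch earlier in this answer, assuming the same imports plus org.apache.kafka.clients.consumer.KafkaConsumer:

    static void processBatchV2(KafkaProducer<String, String> producer,
                               KafkaConsumer<String, String> consumer,
                               ConsumerRecords<String, String> records) {
        producer.beginTransaction();
        Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
        for (ConsumerRecord<String, String> record : records) {
            producer.send(new ProducerRecord<>("output-topic", record.key(), record.value()));
            offsets.put(new TopicPartition(record.topic(), record.partition()),
                        new OffsetAndMetadata(record.offset() + 1));
        }
        // groupMetadata() carries the group generation, so a commit attempted by a consumer that was
        // kicked out by a rebalance (a zombie) is rejected by the broker.
        producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
        producer.commitTransaction();
    }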

How to process events which are out of order using Kafka Streams

I have an application where events are sent on a Kafka topic based on user actions such as User Login, the user's intermediate actions (optional), and User Logout. Each event carries some information in an event object along with the userId; for example, a Login event has loginTime, and an Add Note event has notes (an intermediate action). Similarly, a Logout event has logoutTime. The requirement is to aggregate the information from all these events into one object after receiving the Logout event for each user and send it downstream.
For various reasons (network delay, multiple event producers) events may not arrive in order (a User Logout event may come before an intermediate event), so the question is how to handle such scenarios. I cannot simply wait for intermediate events after receiving the User Logout event, since intermediate events are optional and depend on the user's actions.
The only option I can think of is to wait for some time after receiving the User Logout event, process intermediate events if they arrive within that window, and then send the processed event, but I am not sure how to achieve this.
Kafka does not guarantee order across a topic; it guarantees order within a partition. One topic can have more than one partition, and the consumers of your topic split the partitions among themselves. That is how Kafka achieves scalability. So what you are experiencing is normal behaviour (it isn't a bug or related to network delay or anything like that).
What you can do is make sure that all messages you want processed in order are sent to the same partition. You can do that by setting the number of partitions to 1; that is the dumbest way. When you send a message with the producer, by default Kafka looks at the key, takes its hash, and from that hash decides which partition the message goes to. You can make sure that the key is the same for all messages; that way all key hashes will be the same and all messages will go to the same partition. You can also implement a custom partitioner and override the default way Kafka chooses a partition. Either way, all related messages will arrive in order. If you cannot do any of these things, then you will receive events out of order and you will have to think about how to consume them out of order, but that is not a question related to Kafka.
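For example, keying every event by the user id is enough for the default partitioner to route them all to the same partition; the topic name and value type here are placeholders:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Using the userId as the record key means the default partitioner hashes every event for that
    // user to the same partition, so events for one user stay ordered.
    void publish(KafkaProducer<String, String> producer, String userId, String eventJson) {
        producer.send(new ProducerRecord<>("user-events", userId, eventJson));
    }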
If you are not able to preserve the order of events (i.e. guarantee that Logout is the last event), you can achieve your requirements using the Processor API from Kafka Streams. The Kafka Streams DSL can be combined with the Processor API (more details here).
You can have several partitions, but all events for a particular user have to be sent to the same partition.
You have to implement a custom Processor/Transformer.
Your processor will put each event/activity in a state store (aggregating all events from a particular user under the same key).
The Processor API gives you the ability to create a kind of scheduler (a Punctuator).
You can schedule a check over each user's events every X seconds. If the Logout was long enough ago, you take all the events/activities, perform the aggregation, and send the result downstream.
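A hedged sketch of that Processor plus Punctuator approach, using the Kafka Streams 3.x Processor API; UserEvent and UserSession are hypothetical application classes, and the store name, punctuation interval and five-minute grace period are illustrative:

    import java.time.Duration;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.KeyValueStore;

    public class UserSessionProcessor implements Processor<String, UserEvent, String, UserSession> {

        private ProcessorContext<String, UserSession> context;
        private KeyValueStore<String, UserSession> store;

        @Override
        public void init(ProcessorContext<String, UserSession> context) {
            this.context = context;
            this.store = context.getStateStore("user-sessions"); // store registered when building the topology
            // Wall-clock punctuation: every 30 seconds, look for sessions whose Logout is old enough to flush.
            context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, this::flushFinishedSessions);
        }

        @Override
        public void process(Record<String, UserEvent> record) {
            UserSession session = store.get(record.key());
            if (session == null) {
                session = new UserSession(record.key());
            }
            session.add(record.value()); // login, intermediate actions and logout all accumulate here
            store.put(record.key(), session);
        }

        private void flushFinishedSessions(long now) {
            try (KeyValueIterator<String, UserSession> it = store.all()) {
                while (it.hasNext()) {
                    KeyValue<String, UserSession> entry = it.next();
                    // Forward only sessions whose Logout arrived long enough ago that late
                    // intermediate events are no longer expected.
                    if (entry.value.logoutReceivedBefore(now - Duration.ofMinutes(5).toMillis())) {
                        context.forward(new Record<>(entry.key, entry.value, now));
                        store.delete(entry.key);
                    }
                }
            }
        }
    }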
As said in the other answers, in Kafka order is maintained on a per-partition basis.
Since you are talking about user events, why don't you make the userId the Kafka message key? Then all events related to a specific user will always be ordered (provided they are produced by a single producer).
You should ensure (by design) that only one Kafka producer pushes all the user change events to the given topic. In this way, you can avoid out-of-order messages due to multiple producers.
From Streams, you might also want to look at windows in Kafka Streams. Tumbling windows, for example, are non-overlapping and of fixed size: you aggregate records over a period of time.
You may then want to sort the aggregated records by their timestamps (you said you have logout time, login time, etc.) and act accordingly.
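For illustration, a brief Kafka Streams 3.x DSL sketch of a tumbling-window aggregation; UserEvent and UserSession are placeholder classes, the 30-minute window and topic names are illustrative, and serdes are omitted (configure default serdes or pass Consumed/Produced):

    import java.time.Duration;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.TimeWindows;

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, UserEvent>stream("user-events")            // keyed by userId, so grouping is per user
               .groupByKey()
               .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(30)))
               .aggregate(UserSession::new,
                          (userId, event, session) -> session.add(event)) // add() must return the session
               .toStream((windowedUserId, session) -> windowedUserId.key()) // unwrap the windowed key
               .to("user-sessions");
        return builder.build();
    }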
Simple and effective solution
Use synchronous send and set delivery.timeout.ms and retries to a maximum value.
To ensure fault tolerance set acks=all with min.insync.replicas=2 (topic configuration) and use a single producer to push to that topic.
You should also set max.block.ms to some max value so that your send() does not return immediately if there is an error in fetching the metadata (for example, when Kafka is down).
Benchmark the synchronous send with your rate and check to see if it meets your requirements or benchmark number.
This ensures that a message that came first is sent first to Kafka and then the next message is not sent until the previous message is successfully acknowledged.
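A sketch of that configuration; the broker address, topic and the exact timeout values are illustrative, and min.insync.replicas=2 is set on the topic, not on the producer:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OrderedSyncSender {

        private final KafkaProducer<String, String> producer;

        public OrderedSyncSender() {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.ACKS_CONFIG, "all");                  // with min.insync.replicas=2 on the topic
            props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE);
            props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, Integer.MAX_VALUE);
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, Long.MAX_VALUE); // don't fail fast on metadata fetch
            this.producer = new KafkaProducer<>(props);
        }

        // Blocking on get() means the next event is not sent until the previous one is acknowledged.
        public void send(String userId, String eventJson) throws Exception {
            producer.send(new ProducerRecord<>("user-events", userId, eventJson)).get();
        }
    }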
If your benchmark figure is not met, try adding a back-pressure mechanism such as an in-memory or persistent queue (a sketch follows the steps below):
Add event to a queue in Thread-1
Peek (not dequeue) event from the queue in Thread-2
Call producer.send(...).get() in Thread-2
Dequeue the event in Thread-2
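A sketch of those steps; the types and topic are illustrative. Because the element is removed only after a successful acknowledgement, a crash can re-send the head element, so the downstream consumer must tolerate an occasional duplicate:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class QueueDrainingSender implements Runnable {

        private final Queue<ProducerRecord<String, String>> queue = new ConcurrentLinkedQueue<>();
        private final KafkaProducer<String, String> producer;
        private volatile boolean running = true;

        public QueueDrainingSender(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        // Thread-1: enqueue events as they arrive.
        public void enqueue(ProducerRecord<String, String> record) {
            queue.add(record);
        }

        // Thread-2: peek, send synchronously, then dequeue.
        @Override
        public void run() {
            while (running) {
                ProducerRecord<String, String> head = queue.peek();   // peek, do not remove yet
                if (head == null) {
                    continue;                                         // or sleep briefly to avoid spinning
                }
                try {
                    producer.send(head).get();                        // synchronous send; throws if not acknowledged
                    queue.poll();                                     // dequeue only after a successful ack
                } catch (Exception e) {
                    // leave the element at the head of the queue and retry on the next iteration
                }
            }
        }
    }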
The key is to make your frontend tracker send ordered events to the backend service, which then produces the events to Kafka.
You can achieve that by batching the events, and sending the batched events to the backend only after the previous batched events are successfully delivered.

Concurrent writes for event sourcing on top of Kafka

I've been considering using Apache Kafka as the event store in an event sourcing configuration. The published events will be associated with specific resources, delivered to a topic associated with the resource type, and sharded into partitions by resource id. So, for instance, the creation of a resource of type Folder with id 1 would produce a FolderCreate event delivered to the "folders" topic, in a partition determined by sharding the id 1 across the total number of partitions in the topic. However, I don't know how to handle concurrent events that make the log inconsistent.
The simplest scenario would be two concurrent actions that can invalidate each other, such as one to update a folder and one to destroy that same folder. In that case the partition for that topic could end up containing the invalid sequence [FolderDestroy, FolderUpdate]. That situation is often fixed by versioning the events, as explained here, but Kafka does not support such a feature.
What can be done to ensure the consistency of the Kafka log itself in those cases?
I think it's probably possible to use Kafka for event sourcing of aggregates (in the DDD sense), or 'resources'. Some notes:
1. Serialise writes per partition, using a single process per partition (or partitions) to manage this. Ensure you send messages serially down the same Kafka connection, and use acks=all before reporting success to the command sender, if you can't afford rollbacks. Ensure the producer process keeps track of the current successful event offset/version for each resource, so it can do the optimistic check itself before sending the message.
2. Since a write failure might be returned even if the write actually succeeded, you need to retry writes and deal with deduplication, by including an ID in each event, say, or by reinitializing the producer by re-reading (recent messages in) the stream to see whether the write actually worked or not.
3. Writing multiple events atomically - just publish a composite event containing a list of events.
4. Lookup by resource id. This can be achieved by reading all events from a partition at startup (or all events from a particular cross-resource snapshot), and storing the current state either in RAM or cached in a DB.
https://issues.apache.org/jira/browse/KAFKA-2260 would solve 1 in a simpler way, but seems to be stalled.
Kafka Streams appears to provide a lot of this for you. For example, point 4 is a KTable, which you can have your event producer use to work out whether an event is valid for the current resource state before sending it.
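As an illustration of point 1, here is a rough sketch of a single writer that keeps the current version of each resource and performs the optimistic check before appending; the class and topic names and the expected-version convention are assumptions, not part of the notes above:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class FolderEventWriter {

        private final Map<String, Long> currentVersionByResourceId = new ConcurrentHashMap<>();
        private final KafkaProducer<String, String> producer;

        public FolderEventWriter(KafkaProducer<String, String> producer) {
            this.producer = producer;
        }

        /**
         * Append an event only if the caller's expected version matches the writer's view;
         * otherwise reject, like an optimistic-concurrency check in a conventional event store.
         */
        public synchronized boolean append(String resourceId, long expectedVersion, String eventJson)
                throws Exception {
            long current = currentVersionByResourceId.getOrDefault(resourceId, -1L);
            if (current != expectedVersion) {
                return false; // a concurrent command already advanced this resource; the sender must retry
            }
            // acks=all on the producer, and a synchronous send, before reporting success to the command sender.
            producer.send(new ProducerRecord<>("folders", resourceId, eventJson)).get();
            currentVersionByResourceId.put(resourceId, current + 1);
            return true;
        }
    }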