Understanding use case for max.in.flight.request property in Kafka - apache-kafka

I'm building a Spring Boot consumer-producer project with Kafka as the middleman between two microservices. The theme of the project is a basketball game. Here is a small state machine diagram in which the events are displayed. There will be many more different events; this is just a snippet.
Start event:
{
"id" : 5,
"actualStartTime" : "someStartTime"
}
Point event:
{
"game": 5,
"type": "POINT",
"payload": {
"playerId": 44,
"value": 3
}
}
Assist event:
{
"game": 4,
"type": "ASSIST",
"payload": {
"playerId": 278,
"value": 1
}
}
Jump event:
{
"game": 2,
"type": "JUMP",
"payload": {
"playerId": 55,
"value": 1
}
}
End event:
{
"id" : 5,
"endTime" : "someStartTime"
}
The main thing to note here is that an Assist event must be followed by a Point event.
Since I'm new to Kafka, I'll keep things simple and have one broker with one topic and one partition. For my use case I need to maintain the ordering of these events as they actually happen live on the court (I have a JSON file with 7000 lines containing a bunch of these and other events).
So, let's say that someone is sending these events from the Admin UI (for instance via WebSockets) to the producer app. The producer app will do some simple validation or whatever else it needs to do. Now, we can also imagine that we have two instances of the producer app, one at ip:8080 (prd1) and the other at ip:8081 (prd2).
In reality, this sequence of three events happened: Assist -> Point -> Jump. The operator on the court sent those three events in that order.
The Assist event was sent to prd1 and the Point event was sent to prd2. Let's now imagine that there was a network glitch in the communication between prd1 and the Kafka cluster. Since we are using the latest Kafka at the time of this writing, enable.idempotence=true is already set and the Assist event will not be written twice.
While prd1 was retrying the Assist event (towards Kafka), the Point event on prd2 passed successfully. Then the Assist event passed, and after it the Jump event (from either producer) also ended up in Kafka.
Now in the queue we have: Point -> Assist -> Jump. This is not allowed.
My question is whether these types of problems should be handled by the application's business logic (for example Spring State Machine), or whether this ordering can be handled by Kafka?
In the case of the latter, is the property max.in.flight.requests.per.connection=1 responsible for ordering? Are there any other properties which might preserve ordering?
On a side note, is it a good tactic to use a single partition for a single match and multiple consumers for any of the partitions? Most probably I would be streaming different types of matches (basketball, soccer, golf, across different leagues and nations) and most of them will require some sort of ordering.
Maybe this can be done with KStreams, but I'm still on Kafka's steep learning curve.
Update 1 (after Jessica Vasey's comments):
Hi, thanks for the very thorough comments. Unfortunately I didn't quite get all the pieces of the puzzle. What confuses me the most is some of the terminology you use and the order of things happening. I'm not saying it's incorrect, just that I didn't understand it.
I'll have two microservices, so two producers. I have to be able to understand Kafka in the microservices world, since I'm a Java Spring developer and it's all about microservices and multiple instances.
So let's say that on prd1 a few DTO events came along [Start -> Point -> Assist] and they are sent as a ProducerRequest (https://kafka.apache.org/documentation/#recordbatch), placed in the RECORDS field. On prd2 we got [Point -> Jump], also as a ProducerRequest. They are, in my understanding, two independent in-flight requests (out of 5 possible?)? Is their ordering based on a timestamp?
So when joining the cluster, Kafka assigns an id to each producer, let's say '0' for prd1 and '1' for prd2 (I guess it also depends on the topic-partition they have been assigned). I don't understand whether each RecordBatch has its own monotonically increasing sequence number, or each Kafka message within a RecordBatch has its own monotonically increasing sequence number, or both? Also the part about 'time to recover' is bugging me. Like, if I get an OutOfOrderSequenceException, does it mean that the [Point -> Jump] batch (with possibly other in-flight requests and other batches in the producer's buffer) will sit on Kafka until either delivery.timeout.ms expires or [Start -> Point -> Assist] is finally sent successfully?

Sorry for confusing you further, it's some complex logic you have! Hopefully, I can clarify some points for you. I assumed you had one producer, but after re-reading your post I see you have two producers.
You cannot guarantee the order of messages across both producers. You can only guarantee the order for each individual producer. This post explains this quite nicely Kafka ordering with multiple producers on same topic and parititon
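Because Kafka will not order events across two producers, one option is to restore the order on the consumer side. The following is only a minimal sketch, assuming the Admin UI stamps every event of a game with a gapless sequence number before it reaches prd1 or prd2; the ReorderBuffer class and that sequence number are hypothetical and are not part of Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical consumer-side buffer: releases events strictly in the order of a
// per-game sequence number assigned at the source (an assumption, not a Kafka feature).
class ReorderBuffer {
    private final TreeMap<Long, String> pending = new TreeMap<>();
    private long nextExpected;

    ReorderBuffer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    // Accepts one event and returns every event that is now deliverable, in order.
    List<String> accept(long sequence, String event) {
        List<String> ready = new ArrayList<>();
        if (sequence < nextExpected) {
            return ready; // already delivered: treat as a duplicate and drop it
        }
        pending.put(sequence, event);
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            ready.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return ready;
    }

    public static void main(String[] args) {
        ReorderBuffer buffer = new ReorderBuffer(1);
        System.out.println(buffer.accept(2, "POINT"));  // held back: ASSIST not seen yet
        System.out.println(buffer.accept(1, "ASSIST")); // releases ASSIST, then POINT
        System.out.println(buffer.accept(3, "JUMP"));   // releases JUMP
    }
}
```

A real consumer would also need a policy for an event that never arrives (for example a timeout), which this sketch leaves out.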
On this question:
They are, in my understanding, two independent in-flight requests (out
of 5 possible?)? Their ordering is based on a timestamp?
Yes, by default each producer has max.in.flight.requests.per.connection set to 5.
You could provide a timestamp in your producer, which could help with your situation. However, I won't go into too much detail on that right now and will first answer your questions.
I don't understand whether
each RecordBatch has its monotonically increasing sequence number id
or each Kafka message within RecordBatch has its own monotonically
increasing sequence number or both? Also the part 'time to recover' is
bugging me. Like, if I got OutOfOrderSequenceException, does it mean
that [Point -> Jump] batch (with possibly other in-flight requests and
other batches in producer's buffer) will sit on Kafka until either
delivery.timeout.ms expires or when it finally successfully [Start ->
Point -> Assist] is sent?
Each message is assigned a monotonically increasing sequence number. This LinkedIn post explains it better than I ever could!
Yes, other batches will sit on the producer until either the previous batch is acknowledged (which could take less than 2 minutes) or delivery.timeout.ms expires (2 minutes by default).

Even if max.in.flight.requests.per.connection > 1 (it can be at most 5 with idempotence enabled), setting enable.idempotence=true should preserve the message order of each individual producer, because the messages are assigned sequence numbers. When a batch fails, all subsequent batches to the same partition fail with OutOfOrderSequenceException.
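To make the sequence-number idea concrete, here is a toy model of the check a partition performs for an idempotent producer. It is a sketch for illustration only: the SequenceCheck class and its method names are made up, and the real broker logic is more involved. Note that the sequence is tracked per producer id, which is why it cannot order prd1 relative to prd2.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the per-producer sequence check on one partition (illustrative only).
class SequenceCheck {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Last sequence number appended for each producer id.
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    Result append(long producerId, int firstSequence, int recordCount) {
        int last = lastSequence.getOrDefault(producerId, -1);
        int lastInBatch = firstSequence + recordCount - 1;
        if (lastInBatch <= last) {
            return Result.DUPLICATE;     // a retry of a batch that was already written
        }
        if (firstSequence != last + 1) {
            return Result.OUT_OF_ORDER;  // a gap: an earlier batch is still missing
        }
        lastSequence.put(producerId, lastInBatch);
        return Result.APPENDED;
    }

    public static void main(String[] args) {
        SequenceCheck partition = new SequenceCheck();
        System.out.println(partition.append(0L, 0, 3)); // APPENDED
        System.out.println(partition.append(0L, 5, 2)); // OUT_OF_ORDER (3 and 4 are missing)
        System.out.println(partition.append(0L, 0, 3)); // DUPLICATE
        System.out.println(partition.append(1L, 0, 2)); // APPENDED (a different producer)
    }
}
```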
Number of partitions should be determined by your target throughput. If you wanted to send basketball matches to one partition and golf to another, you can use keys to determine which message should be sent where.
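As a rough illustration of how a key pins all events of one match to one partition, the sketch below hashes a key and takes it modulo the partition count. The hash here is only a stand-in (Kafka's default partitioner uses a different hash over the serialized key), and the "game-5" key format is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative key-to-partition mapping; not Kafka's actual partitioner.
class KeyPartitioning {
    static int partitionFor(String key, int numPartitions) {
        // Stand-in hash over the key bytes; floorMod keeps the result non-negative.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6;
        // Every event keyed with the same match id lands on the same partition,
        // so the events of that match keep their relative order.
        System.out.println(partitionFor("game-5", partitions));
        System.out.println(partitionFor("game-5", partitions));
        System.out.println(partitionFor("game-4", partitions));
    }
}
```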

Related

CQRS + ES Implementation Advice

I'm working on a generic CQRS + ES framework (with nodejs) in the company. Remark: Only RDBMS + Redis (without AOF/RDB persistence) is allowed due to some reasons.
I really need some advice on how to implement the CQRS + ES framework....
Ignoring the ES part, I'm struggling with the implementation of the message propagation.
Here are the tables I have in the RDBMS.
EventStore: [aggregateId (varchar), aggregateType (varchar), aggregateVersion (bigint), messageId (varchar), messageData (varchar), messageMetadata (varchar), sequenceNumber (bigint)]
EventDelivery: [messageId (varchar, foreign key to EventStore), sequenceId (equal to aggregateId, varchar), sequenceNumber (equal to the one in EventStore, bigint)]
ConsumerGroup: [consumerGroup (varchar), lastSequenceNumberSeen (bigint)]
And I have multiple EventSubscribers:
// In Application 1
@EventSubscriber("consumerGroup1", AccountOpenedEvent)
...
// In Application 2
@EventSubscriber("consumerGroup2", AccountOpenedEvent)
...
Here is the flow when an AccountOpenedEvent is written to the EventStore table:
1. Each application (i.e. application 1 and application 2) scans its codebase to obtain all the @EventSubscriber declarations, creates a consumer group in the ConsumerGroup table with lastSequenceNumberSeen = 0, and then runs a scheduler (with a 100ms polling interval) that polls all the events of interest (grouped by consumer group) in EventStore with the condition sequenceNumber >= lastSequenceNumberSeen.
2. For each event (EventStore) from step 1, calculate the sequenceId (here the sequenceId is equal to the aggregateId); this sequenceId (together with the sequenceNumber) is used to guarantee the message delivery ordering. Persist it into the EventDelivery table, and update lastSequenceNumberSeen = sequenceNumber (this prevents a duplicate event from being scanned in the next interval).
3. Each application (i.e. application 1 and application 2) has another scheduler (also with a 100ms polling interval) that polls the EventDelivery table (grouped by sequenceId and ordered by sequenceNumber ASC).
4. For each event (EventDelivery) from step 3, call the corresponding message handler; after the message is handled, acknowledge it by deleting the record in EventDelivery.
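The first polling job in the flow above can be sketched in memory as follows. This is only an illustration under assumptions: the ConsumerGroupPoller class stands in for the EventStore and ConsumerGroup tables, and it filters with a strict "greater than" so that the event whose sequenceNumber equals lastSequenceNumberSeen is not read a second time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for the EventStore and ConsumerGroup tables (illustrative only).
class ConsumerGroupPoller {
    static final class StoredEvent {
        final long sequenceNumber;
        final String aggregateId;
        final String type;

        StoredEvent(long sequenceNumber, String aggregateId, String type) {
            this.sequenceNumber = sequenceNumber;
            this.aggregateId = aggregateId;
            this.type = type;
        }
    }

    private final List<StoredEvent> eventStore = new ArrayList<>();
    private final Map<String, Long> lastSequenceNumberSeen = new HashMap<>();

    void append(String aggregateId, String type) {
        eventStore.add(new StoredEvent(eventStore.size() + 1, aggregateId, type));
    }

    // One polling tick for one consumer group: returns unseen events and advances the cursor.
    List<StoredEvent> poll(String consumerGroup) {
        long lastSeen = lastSequenceNumberSeen.getOrDefault(consumerGroup, 0L);
        List<StoredEvent> batch = new ArrayList<>();
        for (StoredEvent event : eventStore) {
            if (event.sequenceNumber > lastSeen) { // strictly greater: no re-read of the last event
                batch.add(event);
            }
        }
        if (!batch.isEmpty()) {
            lastSequenceNumberSeen.put(consumerGroup, batch.get(batch.size() - 1).sequenceNumber);
        }
        return batch;
    }

    public static void main(String[] args) {
        ConsumerGroupPoller store = new ConsumerGroupPoller();
        store.append("account-1", "AccountOpenedEvent");
        System.out.println(store.poll("consumerGroup1").size()); // first tick sees the event
        System.out.println(store.poll("consumerGroup1").size()); // second tick sees nothing new
        System.out.println(store.poll("consumerGroup2").size()); // other group has its own cursor
    }
}
```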
Since I have 2 applications, I have to separate the handling of the AccountOpenedEvent in EventStore into 2 transactions; supposing the 2 applications don't know each other, I can only do it passively. That's why I need the EventDelivery table and the polling scheduler.
I assume I can use redlock + cron to make sure that only 1 instance does the polling jobs, in case application 1 has more than 1 replica.
Application 1 will poll the AccountOpenedEvent and create a record in EventDelivery, and store the lastSequenceNumberSeen in its consumer group.
Application 2 will also poll the AccountOpenedEvent and create a record in EventDelivery and store the lastSequenceNumberSeen in its consumer group.
Since application 1 and application 2 are in different consumer groups, they treat the event store stream separately.
Here is the 1st problem: we have 2 schedulers, and we would have more if there were more consumer groups; these will put a heavy load on the database. How can I solve this? One solution I have in mind is to convert these 2 schedulers into jobs and put the jobs into a queue; the queue will handle the jobs per interval (let's say 100ms), but it seems this would introduce a large latency if a job is unfortunately placed at the end of the queue.
Here is the 2nd problem: in the above flow, I introduced the 2nd polling job to guarantee the message delivery ordering. But unlike the first one, it has no lastSequenceNumberSeen; the 2nd polling job removes the record in EventDelivery once the message is handled. But it is common for a message to take more than 100ms to handle. In that case, the same event in EventDelivery will be scanned again.
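One possible way to avoid rescanning a record that is still being handled is to mark it as claimed before handing it to the handler, and to skip claimed records on later ticks. The sketch below shows the idea in memory; the DeliveryClaims class is hypothetical, and in a database the claim would presumably be a status column or a row lock rather than a Java set.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the EventDelivery table plus an "in flight" marker (illustrative only).
class DeliveryClaims {
    private final Map<String, String> eventDelivery = new LinkedHashMap<>(); // messageId -> payload
    private final Set<String> inFlight = new HashSet<>();

    void enqueue(String messageId, String payload) {
        eventDelivery.put(messageId, payload);
    }

    // One polling tick: returns only records that no earlier tick has claimed.
    List<String> claimBatch() {
        List<String> claimed = new ArrayList<>();
        for (String messageId : eventDelivery.keySet()) {
            if (inFlight.add(messageId)) { // add() returns false if it was already claimed
                claimed.add(messageId);
            }
        }
        return claimed;
    }

    // Called by the handler when it finishes: deletes the record and releases the claim.
    void acknowledge(String messageId) {
        eventDelivery.remove(messageId);
        inFlight.remove(messageId);
    }

    public static void main(String[] args) {
        DeliveryClaims table = new DeliveryClaims();
        table.enqueue("m1", "AccountOpenedEvent");
        System.out.println(table.claimBatch()); // m1 is claimed by this tick
        System.out.println(table.claimBatch()); // the slow handler is still busy: nothing to claim
        table.acknowledge("m1");
        System.out.println(table.claimBatch()); // the record is gone after the acknowledgement
    }
}
```

A claim made by an instance that crashes would stay set forever in this sketch, so a real implementation would likely need a claim timeout as well.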
I'm not sure what the common practice is. I'm really struggling with how to implement this. I did lots of research on the internet. I see that some people implement the message propagation using Debezium + Kafka (I cannot use these 2 tools, and I still cannot understand how this works).
I know Debezium uses the CDC approach to tail the transaction logs of the RDBMS and forward the messages to Kafka. And I have seen some recommendations that we should not have multiple subscriptions on the same transaction log. Let's say Debezium guarantees that the event is propagated to Kafka; it means I need application 1 and application 2 to subscribe to the Kafka topic, and each should belong to a different consumer group (also using aggregateId as the partition key). Since Kafka guarantees the message ordering, everything should work fine. But I don't think Kafka would store all the messages from the very beginning. Let's say it is configured to store 1000000 messages: when the message handler keeps failing for some unexpected reason, the 1000000 messages after this failed message cannot be handled, and the 1000001st event will get lost... Although this is a rare case, I'm not sure whether I understand it right. The database table is the most reliable source to trust, as it stores all the events from the very beginning; if the system suffers from this case, does that mean I need to manually republish all the events to Kafka to recover the projection model?
And another case: if I have a new event subscriber which needs the historical events to build its projection model, then with Debezium + Kafka do we need to assign a new consumerGroup and configure it to read the Kafka stream from the very beginning? It has the same problem, as the consumerGroup can only get the last 1000000 events... But this is not the case if we poll the database table directly instead.
I don't understand why most implementations don't poll the database table but make use of a message broker instead.
And, I really need advice on how to implement a CQRS + ES framework.... especially the message propagation part (keep in mind I can only use RDBMS + Redis(without persistence))....

Minimizing failure without impacting recovery when building processes on top of Kafka

I am working with a microservice that consumes messages from Kafka. It does some processing on the message and then inserts the result in a database. Only then am I acknowledging the message with Kafka.
It is required that I keep data loss to an absolute minimum while keeping recovery quick (avoiding reprocessing messages, because it is expensive).
I realized that if there was to be some kind of failure, like my microservice would crash, my messages would be reprocessed. So I thought to add some kind of 'checkpoint' to my process by writing the state of the transformed message to the file and reading from it after a failure. I thought this would mean that I could move my Kafka commit to an earlier stage, only after writing to the file is successful.
But then, upon further thinking, I realized that if there was to be a failure on the file system, I might not find my files, e.g. a cloud file service might still have a chance of failure even if the marketed rate is that of >99% availability. I might end up in an inconsistent state where I have data in my Kafka topic (which is inaccessible because the Kafka offset has been committed) but I have lost my file on the file system. This made me realize that I should send the Kafka commit at a later stage.
So now, considering the above two design decisions, it feels like there is a tradeoff between not missing data and minimizing time to recover from failure. Am I being unrealistic in my concerns? Is there some design pattern that I can follow to minimize the tradeoffs? How do I reason about this situation? Here I thought that maybe the Saga pattern is appropriate, but am I overcomplicating things?
If you are that concerned about reprocessing data, you could always follow the paradigm of storing the offsets outside of Kafka.
For example, in your consumer-worker reading loop:
(pseudocode)
while (...)
{
    msg = getMsg();                     // the message together with its offset
    // do your things
    saveOffsetInQueueToDB(msg.offset);  // only reached after successful processing
}
saveOffsetInQueueToDB is responsible for adding the offset to a Queue/List, or whatever. This operation is only done once the message has been correctly processed.
Periodically, when a certain number of offsets are stored, or when shutdown is captured, you could implement another function that stores the offsets for each topic/partition in:
An external database.
An external SLA backed storing system, such as S3 or Azure Blobs.
Internal (disk) and remote loggers.
If you are concerned about failures, you could use a combination of two of those three options (or even use all three).
Storing these in a "memory buffer" allows the operation to be async, so there's no need for a new transfer/connection to the database/datalake/log for each processed message.
If there's a crash, you could read all messages from the beginning (easiest way is just changing the group.id and setting from beginning) but discarding those whose offset is included in the database, avoiding the reprocess. For example by adding a condition in your loop (yep pseudocode again):
while (...)
{
    msg = getMsg();
    if (msg.offset.notIncluded(offsetListFromDB))  // skip offsets that were already processed
    {
        // do your things
        saveOffsetInQueueToDB(msg.offset);
    }
}
You could implement a better-performing algorithm than the "not included" check: just store the last processed offset for each partition in a HashMap, and then check whether the offset of each incoming message is greater than the one stored for its partition. For example, if partition 0's last offset was 558 and partition 1's was 600:
// offsetMap = {[0,558],[1,600]}
while (...)
{
    msg = getMsg();
    // get the partition of the message => 0
    if (msg.offset > offsetMap.get(msg.partition))
    {
        // do your things
        saveOffsetInQueueToDB(msg.offset);
    }
}
This way, you guarantee that only the non-processed messages from each partition will be processed.
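The per-partition check can also be written as plain Java. The sketch below is self-contained and does not use the Kafka client; the OffsetFilter class and its method names are hypothetical, and the map passed to the constructor stands for whatever was restored from the external database after a crash.

```java
import java.util.HashMap;
import java.util.Map;

// Tracks the highest processed offset per partition (illustrative only).
class OffsetFilter {
    private final Map<Integer, Long> lastProcessed = new HashMap<>();

    OffsetFilter(Map<Integer, Long> restoredFromStorage) {
        lastProcessed.putAll(restoredFromStorage);
    }

    // True if the message has not been processed before the crash.
    boolean shouldProcess(int partition, long offset) {
        return offset > lastProcessed.getOrDefault(partition, -1L);
    }

    // Call this only after the message has been processed successfully.
    void markProcessed(int partition, long offset) {
        lastProcessed.merge(partition, offset, Long::max);
    }

    public static void main(String[] args) {
        OffsetFilter filter = new OffsetFilter(Map.of(0, 558L, 1, 600L));
        System.out.println(filter.shouldProcess(0, 558)); // false: already processed
        System.out.println(filter.shouldProcess(0, 559)); // true: first unprocessed offset
        filter.markProcessed(0, 559);
        System.out.println(filter.shouldProcess(0, 559)); // false: processed just now
    }
}
```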
Regarding file system failures, that's why Kafka comes as a cluster: Fault tolerance in Kafka is done by copying the partition data to other brokers which are known as replicas.
So if you have 5 brokers and a replication factor of 5, for example, all 5 brokers would have to fail at the same time (I assume the brokers are on separate hosts) before you lose any data. Even 4 brokers could fail at the same time without losing any data.
In that setup all brokers store the same data and the same partitions. If a filesystem error occurs on one of the brokers, the others will still hold all the information.

Kafka streams threads and count of records being processed

Say we have a Topic with 2 partitions and there are 'n' producers which are producing data to this Topic. Now, millions of MessageRecords are being spread over the 2 partitions.
Say we have 2 threads (i.e. 2 separate instances) powering the Streams Processor. In this setup, say Thread-1 (i.e. Streaming Task-1) got Partition P-1 and Thread-2 (i.e. Streaming Task-2) got Partition P-2 for processing.
The ask is: say we want to know how many MessageRecords have been processed by Streaming-Task-1 so far, or say for 28th September, 2KK. How do I do that?
And the even bigger question is: "Streaming-Task-1" would never know about the TOTAL count of MessageRecords being processed; it would only know about the count processed by itself.
Can it ever know about the count processed by Task-2?
There are several ways to accomplish what you are asking. If you are using the DSL I suggest you take a look at the word count example (https://docs.confluent.io/current/streams/quickstart.html). With a map operation you can make all the counts you want relatively simply.
If you are not using the dsl you can still do the same with a couple processors and state stores.

Implementing sagas with Kafka

I am using Kafka for Event Sourcing and I am interested in implementing sagas using Kafka.
Any best practices on how to do this? The Commander pattern mentioned here seems close to the architecture I am trying to build but sagas are not mentioned anywhere in the presentation.
This talk from this year's DDD eXchange is the best resource I came across wrt Process Manager/Saga pattern in event-driven/CQRS systems:
https://skillsmatter.com/skillscasts/9853-long-running-processes-in-ddd
(requires registering for a free account to view)
The demo shown there lives on github: https://github.com/flowing/flowing-retail
I've given it a spin and I quite like it. I do recommend watching the video first to set the stage.
Although the approach shown is message-bus agnostic, the demo uses Kafka for the Process Manager to send commands to and listen to events from other bounded contexts. It does not use Kafka Streams but I don't see why it couldn't be plugged into a Kafka Streams topology and become part of the broader architecture like the one depicted in the Commander presentation you referenced.
I hope to investigate this further for our own needs, so please feel free to start a thread on the Kafka users mailing list, that's a good place to collaborate on such patterns.
Hope that helps :-)
I would like to add something here about sagas and Kafka.
In general
In general Kafka is a tad different than a normal queue. It's especially good in scaling. And this actually can cause some complications.
As one of the means to accomplish scaling, Kafka partitions the data stream. Data is placed in partitions, each of which can be consumed at its own rate, independent of the other partitions of the same topic. Here is some info on it: how-choose-number-topics-partitions-kafka-cluster. I'll come back to why this is important.
The most common ways to ensure the order within Kafka are:
Use 1 partition for the topic
Use a message key to "assign" the message to a partition
In both scenarios your chronologically dependent messages need to stream through the same topic.
Also, as @pranjal thakur points out, make sure the delivery method is set to "exactly once", which has a performance impact but ensures you will not receive the messages multiple times.
The caveat
Now, here's the caveat: when you change the number of partitions, the distribution of messages over the partitions (when using a key) changes as well.
In normal conditions this can be handled easily. But if you have a high traffic situation, the migration toward a different number of partitions can result in a moment in time in which a saga-"flow" is handled over multiple partitions and the order is not guaranteed at that point.
It's up to you whether this will be an issue in your scenario.
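A small worked example of this caveat, assuming the partition is chosen as "key hash modulo number of partitions". Integer saga ids are used here as a stand-in for the hash, which is an assumption made only to keep the arithmetic visible; Kafka's real partitioner hashes the serialized key.

```java
// Shows how the same key can move to another partition when the partition count changes.
class RepartitionEffect {
    // Stand-in for hash(key) % numPartitions, with the saga id acting as the hash.
    static int partitionFor(int sagaId, int numPartitions) {
        return Math.floorMod(sagaId, numPartitions);
    }

    public static void main(String[] args) {
        int sagaId = 5;
        System.out.println(partitionFor(sagaId, 3)); // 2: partition used before the change
        System.out.println(partitionFor(sagaId, 4)); // 1: partition used after adding one
        // Messages of saga 5 written before and after the change sit on different
        // partitions, so their relative order is no longer guaranteed.
    }
}
```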
Here are some questions you can ask to determine if this applies to your system:
What will happen if you somehow need to migrate/copy data to a new system, using Kafka?(high traffic scenario)
Can you send your data to 1 topic?
What will happen after a temporary outage of your saga service? (low availability scenario/high traffic scenario)
What will happen when you need to replay a bunch of messages?(high traffic scenario)
What will happen if we need to increase the partitions?(high traffic scenario/outage & recovery scenario)
The alternative
If you're thinking of setting up a saga, based on steps, like a state machine, I would challenge you to rethink your design a bit.
I'll give an example:
Lets consider a booking-a-hotel-room process:
Simplified, it might consist of the following steps:
Handle room reserved (incoming event)
Handle room paid (incoming event)
Send acknowledgement of the booking (after payment and some processing)
Now, if your saga is not able to handle the payment if the reservation hasn't come in yet, then you are relying on the order of events.
In this case you should ask yourself: when will this break?
If you conclude you want to avoid the chronological dependency; consider a system without a saga, or a saga which does not depend on the order of events - i.e.: accepting all messages, even when it's not their turn yet in the process.
Some examples:
aggregators
Modeled as business process: parallel gateways (parallel process flows)
Do note in such a setup it is even more crucial that every action has got an implemented compensating action (rollback action).
I know this is often hard to accomplish; but, if you start small, you might start to like it :-)
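For the hotel booking example, a saga that does not depend on the order of events could look roughly like the sketch below. It simply remembers which events it has seen and acts once all of them are present. The BookingSaga class and the event names are hypothetical, and compensating actions are left out.

```java
import java.util.EnumSet;
import java.util.Set;

// Order-independent saga step: acts once all required events have arrived (illustrative only).
class BookingSaga {
    enum Event { ROOM_RESERVED, ROOM_PAID }

    private final Set<Event> received = EnumSet.noneOf(Event.class);
    private boolean acknowledged;

    // Returns true exactly once: when the last missing event arrives,
    // which is the moment to send the booking acknowledgement.
    boolean handle(Event event) {
        received.add(event);
        if (!acknowledged && received.containsAll(EnumSet.allOf(Event.class))) {
            acknowledged = true;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BookingSaga saga = new BookingSaga();
        System.out.println(saga.handle(Event.ROOM_PAID));     // false: payment arrived first, still waiting
        System.out.println(saga.handle(Event.ROOM_RESERVED)); // true: both events seen, acknowledge
        System.out.println(saga.handle(Event.ROOM_RESERVED)); // false: a redelivery does not acknowledge again
    }
}
```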

RabbitMQ - Message order of delivery

I need to choose a new Queue broker for my new project.
This time I need a scalable queue that supports pub/sub, and keeping message ordering is a must.
I read Alexis' comment, in which he writes:
"Indeed, we think RabbitMQ provides stronger ordering than Kafka"
I read the message ordering section in the RabbitMQ docs:
"Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or
due to a channel closing while holding unacknowledged messages...With
release 2.7.0 and later it is still possible for individual consumers
to observe messages out of order if the queue has multiple subscribers.
This is due to the actions of other subscribers who may requeue
messages. From the perspective of the queue the messages are always
held in the publication order."
If I need to handle messages in order, can I only use RabbitMQ with an exclusive queue for each consumer?
Is RabbitMQ still considered a good solution for ordered message queuing?
Well, let's take a closer look at the scenario you are describing above. I think it's important to paste the documentation immediately prior to the snippet in your question to provide context:
Section 4.7 of the AMQP 0-9-1 core specification explains the
conditions under which ordering is guaranteed: messages published in
one channel, passing through one exchange and one queue and one
outgoing channel will be received in the same order that they were
sent. RabbitMQ offers stronger guarantees since release 2.7.0.
Messages can be returned to the queue using AMQP methods that feature
a requeue parameter (basic.recover, basic.reject and basic.nack), or
due to a channel closing while holding unacknowledged messages. Any of
these scenarios caused messages to be requeued at the back of the
queue for RabbitMQ releases earlier than 2.7.0. From RabbitMQ release
2.7.0, messages are always held in the queue in publication order, even in the presence of requeueing or channel closure. (emphasis added)
So, it is clear that RabbitMQ, from 2.7.0 onward, is making a rather drastic improvement over the original AMQP specification with regard to message ordering.
With multiple (parallel) consumers, order of processing cannot be guaranteed.
The third paragraph (pasted in the question) goes on to give a disclaimer, which I will paraphrase: "if you have multiple processors in the queue, there is no longer a guarantee that messages will be processed in order." All they are saying here is that RabbitMQ cannot defy the laws of mathematics.
Consider a line of customers at a bank. This particular bank prides itself on helping customers in the order they came into the bank. Customers line up in a queue, and are served by the next of 3 available tellers.
This morning, it so happened that all three tellers became available at the same time, and the next 3 customers approached. Suddenly, the first of the three tellers became violently ill, and could not finish serving the first customer in the line. By the time this happened, teller 2 had finished with customer 2 and teller 3 had already begun to serve customer 3.
Now, one of two things can happen. (1) The first customer in line can go back to the head of the line or (2) the first customer can pre-empt the third customer, causing that teller to stop working on the third customer and start working on the first. This type of pre-emption logic is not supported by RabbitMQ, nor any other message broker that I'm aware of. In either case, the first customer actually does not end up getting helped first - the second customer does, being lucky enough to get a good, fast teller off the bat. The only way to guarantee customers are helped in order is to have one teller helping customers one at a time, which will cause major customer service issues for the bank.
It is not possible to ensure that messages get handled in order in every possible case, given that you have multiple consumers. It doesn't matter if you have multiple queues, multiple exclusive consumers, different brokers, etc. - there is no way to guarantee a priori that messages are answered in order with multiple consumers. But RabbitMQ will make a best-effort.
Message ordering is preserved in Kafka, but only within partitions rather than globally. If your data need both global ordering and partitions, this does make things difficult. However, if you just need to make sure that all of the same events for the same user, etc... end up in the same partition so that they are properly ordered, you may do so. The producer is in charge of the partition that they write to, so if you are able to logically partition your data this may be preferable.
I think there are two different things in this question: consumption order and processing order.
Message queues can, to a degree, give you a guarantee that messages will get consumed in order. They can't, however, give you any guarantees on the order of their processing.
The main difference here is that there are some aspects of message processing which cannot be determined at consumption time, for example:
As mentioned a consumer can fail while processing, here the message's consumption order was correct, however, the consumer failed to process it correctly, which will make it go back to the queue. At this point the consumption order is intact, but the processing order is not.
If by "processing" we mean that the message is now discarded and finished processing completely, then consider the case when your processing time is not linear, in other words processing one message takes longer than the other. For example, if message 3 takes longer to process than usual, then messages 4 and 5 might get consumed and finish processing before message 3 does.
So even if you managed to get the message back to the front of the queue (which by the way violates the consumption order) you still cannot guarantee they will also be processed in order.
If you want to process the messages in order:
Have only 1 consumer instance at all times, or a main consumer and several stand-by consumers.
Or don't use a messaging queue and do the processing in a synchronous blocking method, which might sound bad but in many cases and business requirements it is completely valid and sometimes even mission critical.
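The "only 1 consumer" option can be illustrated without a broker: if every message is handed to a single worker thread, a slow message delays the ones behind it instead of being overtaken. The sketch below uses only the JDK; the OrderedProcessing class is hypothetical, and the artificial delay on message 3 stands for uneven processing time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Processes messages one at a time on a single worker thread (illustrative only).
class OrderedProcessing {
    static List<Integer> processInOrder(List<Integer> messages) {
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        ExecutorService worker = Executors.newSingleThreadExecutor();
        for (Integer message : messages) {
            worker.execute(() -> {
                try {
                    if (message == 3) {
                        Thread.sleep(50); // message 3 takes longer than the others
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                processed.add(message);
            });
        }
        worker.shutdown();
        try {
            if (!worker.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(processed);
    }

    public static void main(String[] args) {
        // With one worker, messages 4 and 5 wait for the slow message 3.
        System.out.println(processInOrder(List.of(1, 2, 3, 4, 5)));
    }
}
```

Replacing the single-thread executor with a pool of several threads would let messages 4 and 5 finish before message 3, which is the processing-order problem described above.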
There are proper ways to guarantee the order of messages within RabbitMQ subscriptions.
If you use multiple consumers, they will process the messages using a shared ExecutorService. See also ConnectionFactory.setSharedExecutor(...). You could set an Executors.newSingleThreadExecutor().
If you use one Consumer with a single queue, you can bind this queue using multiple bindingKeys (they may have wildcards). The messages will be placed into the queue in the same order that they were received by the message broker.
For example you have a single publisher that publishes messages where the order is important:
try (Connection connection2 = factory.newConnection();
     Channel channel2 = connection2.createChannel()) {
    // publish messages, alternating between two different routing keys
    for (int i = 0; i < messageCount; i++) {
        final String routingKey = i % 2 == 0 ? routingEven : routingOdd;
        channel2.basicPublish(exchange, routingKey, null, ("Hello" + i).getBytes(UTF_8));
    }
}
You now might want to receive messages from both topics in a queue in the same order that they were published:
// declare a queue for the consumer
final String queueName = channel.queueDeclare().getQueue();
// we bind to queue with the two different routingKeys
final String routingEven = "even";
final String routingOdd = "odd";
channel.queueBind(queueName, exchange, routingEven);
channel.queueBind(queueName, exchange, routingOdd);
channel.basicConsume(queueName, true, new DefaultConsumer(channel) { ... });
The Consumer will now receive the messages in the order that they were published, regardless of the fact that you used different topics.
There are some good 5-Minute Tutorials in the RabbitMQ documentation that might be helpful:
https://www.rabbitmq.com/tutorials/tutorial-five-java.html