Spring Cloud Stream Kafka - How to implement idempotency to support distributed transaction management (eventual consistency)

I have the following typical scenario:
An order service used to purchase products. Acts as the commander of the distributed transaction.
A product service with the list of products and its stock.
A payment service.
  Orders DB           Products DB
      |                    |
---------------     ----------------     ----------------
| OrderService |    | ProductService |   | PaymentService |
---------------     ----------------     ----------------
      |                    |                     |
      |           --------------------           |
      ------------| Kafka orders topic |----------
                  ---------------------
The normal flow would be:
The user orders a product.
Order service creates an order in DB and publishes a message in Kafka topic "orders" to reserve a product (PRODUCT_RESERVE_REQUEST).
Product service decreases the product stock by one unit in its DB and publishes a message in "orders" saying PRODUCT_RESERVED.
Order service gets the PRODUCT_RESERVED message and orders the payment publishing a message PAYMENT_REQUESTED
Payment service orders the payment and answers with a message PAYED
Order service reads the PAYED message and marks the order as COMPLETED, finishing the transaction.
I am having trouble dealing with error cases. For example, assume the following:
Payment service fails to charge for the product, so it publishes a message PAYMENT_FAILED
Order service reacts publishing a message UNDO_PRODUCT_RESERVATION
Product service increases the stock in the DB to cancel the reservation and publishes PRODUCT_UNRESERVATION_COMPLETED
Order service finishes the transaction saving the final state of the order as CANCELLED_PAYMENT_FAILED.
In this scenario, imagine that for whatever reason order service publishes an UNDO_PRODUCT_RESERVATION message but doesn't receive the PRODUCT_UNRESERVATION_COMPLETED message, so it retries by publishing another UNDO_PRODUCT_RESERVATION message.
Now, imagine that those two UNDO_PRODUCT_RESERVATION messages for the same order both end up arriving at ProductService. If I process both of them, I could end up setting an invalid stock for the product.
In this scenario how can I implement idempotency?
UPDATE:
Following Artem's instructions, I can now detect duplicated messages (by checking the message header) and ignore them, but there may still be situations like the following where I shouldn't ignore the duplicated message:
Order Service sends UNDO_PRODUCT_RESERVATION
Product service gets the message and starts processing it but crashes before updating the stock.
Order Service doesn't get a response so it retries sending UNDO_PRODUCT_RESERVATION
Product service knows this is a duplicated message, but in this case it should process it again.
Can you help me come up with a way to support this scenario as well? How could I distinguish when I should discard the message or reprocess it?

We used spring-integration-kafka to produce and consume messages with Kafka in our microservices. In our case, we send org.springframework.messaging.Message objects to topics and get the same type back after deserialization from a byte array. Besides the payload, which is the actual object you want to transfer from one microservice to another, a Message carries header values such as message-id and sent-time. We use the unique message-id value to implement idempotency.
On the producer side, you must implement some logic to ensure that the message-id of the Message is the same when it is produced multiple times. This depends on your produce logic. In our case, we use Publishing Events Using Local Transactions, which is described very well in the blog post https://www.nginx.com/blog/event-driven-data-management-microservices/ by Chris Richardson. With this approach we can recreate the Message object with the same message-id on the producer side.
On the consumer side, we persist all consumed message ids to a database and check them before processing a received message. If we see a message whose id is already in our persistent store, we simply ignore it.
In your case, to implement idempotency:
You should send a unique identifier with each message.
On the producer side, you must generate the same identifier when the message is produced multiple times.
On the consumer side, you must check the received id to detect whether the message has been consumed before.
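The three points above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `IdempotentStockHandler` and its methods are hypothetical names, and in-memory collections stand in for the database tables. In a real service the stock update and the insert of the message id would belong in one database transaction, so that a crash before the commit leaves the id unrecorded and a retried message is processed rather than discarded.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical handler: applies an UNDO_PRODUCT_RESERVATION at most once per message id.
final class IdempotentStockHandler {
    // Stand-in for a database table of consumed message ids.
    private final Set<String> processedIds = new HashSet<>();
    // Stand-in for the products table: productId -> units in stock.
    private final Map<String, Integer> stock = new HashMap<>();

    IdempotentStockHandler(Map<String, Integer> initialStock) {
        stock.putAll(initialStock);
    }

    // Returns true if the message was applied, false if it was a duplicate.
    synchronized boolean handleUndoReservation(String messageId, String productId) {
        if (processedIds.contains(messageId)) {
            return false; // already applied: ignore the retry
        }
        // In a real service these two writes belong in one DB transaction.
        stock.merge(productId, 1, Integer::sum);
        processedIds.add(messageId);
        return true;
    }

    synchronized int stockOf(String productId) {
        return stock.getOrDefault(productId, 0);
    }

    public static void main(String[] args) {
        IdempotentStockHandler handler = new IdempotentStockHandler(Map.of("p1", 4));
        handler.handleUndoReservation("order-7-undo", "p1"); // applied
        handler.handleUndoReservation("order-7-undo", "p1"); // duplicate, ignored
        System.out.println(handler.stockOf("p1")); // prints 5
    }
}
```

The producer side of the contract is that a retry of the same logical message reuses the same id (here the made-up value "order-7-undo"), otherwise the consumer cannot recognize it as a duplicate.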
Regarding the second scenario, described in the UPDATE:
I think you should change your approach a little. If you want to implement a publish-subscribe mechanism, which is more suitable in a microservices architecture, you shouldn't wait for a response on the producer side. In your scenario, you wait for another message to know whether the consumer consumed the message, and if it did not, you send the message again.
How about the following implementation?
On the producer side, you send messages to Kafka within a transaction in the producer. You should provide a mechanism to send messages to Kafka only when the transaction on the producer side is committed. This is an atomicity issue, and the link above shows how to solve it.
On the consumer side, you poll messages from the Kafka topic one by one, in order, and you get the next message only when the current message has been consumed successfully. If it has not, you shouldn't get the next message, because the next message might be related to the current one, and consuming it could corrupt the consistency of your data. It is not the producer's concern when a message is not consumed. On the consumer side, you should provide retry and replay mechanisms to consume messages.
I think you shouldn't wait for a response on the producer side. Kafka is a very capable tool, and with its offset commit capability, as a consumer you don't have to treat a message as consumed just because you polled it from the topic. If you have a problem while processing a message, you simply don't commit its offset, so you do not move on to the next message.
With the implementation described above, you don't have a problem like "How could I distinguish when I should discard the message or reprocess it?"
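A minimal sketch of the "commit the offset only after successful processing" idea, using an in-memory list as a stand-in for one partition. The class `AtLeastOnceConsumer` is hypothetical and is not a Kafka API. With a real KafkaConsumer this roughly corresponds to setting `enable.auto.commit` to false and calling `commitSync()` after processing; note that a running consumer would also have to seek back (or restart) to re-read a message whose offset it did not commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for one partition plus a consumer's committed offset.
final class AtLeastOnceConsumer {
    private final List<String> log;   // messages in partition order
    private int committedOffset = 0;  // offset of the next message to process

    AtLeastOnceConsumer(List<String> log) {
        this.log = new ArrayList<>(log);
    }

    // Processes the message at the committed offset and commits only if the
    // handler returns normally. Returns true if a message was processed and committed.
    boolean pollOnce(Consumer<String> handler) {
        if (committedOffset >= log.size()) {
            return false; // nothing left to consume
        }
        try {
            handler.accept(log.get(committedOffset));
        } catch (RuntimeException e) {
            return false; // no commit: the same message is delivered on the next poll
        }
        committedOffset++;
        return true;
    }

    int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        AtLeastOnceConsumer consumer =
                new AtLeastOnceConsumer(List.of("UNDO_PRODUCT_RESERVATION"));
        consumer.pollOnce(m -> { throw new IllegalStateException("crash before stock update"); });
        System.out.println(consumer.committedOffset()); // prints 0: the message will be seen again
        consumer.pollOnce(m -> System.out.println("processed " + m));
        System.out.println(consumer.committedOffset()); // prints 1
    }
}
```

Because a message can be delivered again after a failure, the handler still has to be idempotent; the offset handling only guarantees that a message is not skipped.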
Regards...

Actually, because of the complications you mentioned about organizing transactions over multiple microservices with Apache Kafka, I developed another concept and wrote a blog post about it.
If you reach a level of complexity where the Kafka solution is no longer feasible, you might find it an interesting read. It is too long to explain here, but basically it uses a J2EE container together with microservice principles, with full transaction support between the microservices with the help of Spring Boot + Netflix.
Micro Services Fanout and Transaction Problems and Solutions with Spring Boot and Netflix

Related

Kafka with multiple instances of microservices and end-users

This is more of a design/architecture question.
We have a microservice A (MSA) with multiple instances (say 2) running of it behind LB.
The purpose of this microservice is to get messages from a Kafka topic and send them to end users/clients. Both instances use the same consumer group id for a particular client/user so that messages are not duplicated. The Kafka topic has 2 partitions (or as many as there are instances).
End users/clients connect to the LB to fetch messages from MSA. Long polling is used here.
A request from a client can land on any instance. If it lands on MSA1, it pulls data from Kafka partition1, and if it lands on MSA2, it pulls data from partition2.
Now, a producer is producing messages; we don't have a high message count. So, let's say the producer produces msg1 and it goes to partition1. The end user/client will not get this message unless its request lands on MSA1, which might not always happen because other requests are coming to the LB.
We want to solve this issue. We want that client gets the message near realtime.
One solution could be a distributed persistent queue (e.g. ActiveMQ) into which both MSA1 and MSA2 keep putting messages after reading them from Kafka, so that the client just fetches messages from the queue. But this would require a separate queue for every end user/client/group id.
Is this a good solution? Can we go ahead with it, or is there anything we should change? We are deploying our system on AWS, so could an AWS managed service help here, e.g. an SNS+SQS combination?
Some statistics:
~1000 users, one group id per user
2-4 instances of microservice
long polling every few seconds (~20s)
average message size ~10KB
Broadly you have three possible approaches:
You can dispense with using Kafka's consumer group functionality and allow each instance to consume from all partitions.
You can make the instances of each service aware of each other. For example, an instance that gets a request which can be fulfilled by another instance forwards the request there. This is most effective if the messages can be partitioned by client on the producer end (so that a request from a given client only needs to be routed to one instance). Even then, the consumer group functionality introduces some extra difficulty (rebalances mean that the consumer currently responsible for a given partition might not have seen all the messages in the partition). You may want to implement your own variant of the consumer group coordination protocol, except that on rebalance the instance starts from some suitably early point regardless of where the previous consumer got to.
If you can't reliably partition by client in the producer (e.g. the client is requesting a stream of all messages matching arbitrary criteria) then Kafka is really not going to be a fit and you probably want a database (with all the expense and complexity that implies).
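The second approach (forwarding a request to the instance that owns the client's partition) could look roughly like this. It is a sketch under assumptions: `ClientRouter` is a hypothetical class, the partition function is a plain hash chosen for illustration (it is not Kafka's default partitioner, so the producer would have to assign partitions with the same function), and the partition-to-instance map is assumed to be maintained elsewhere, for example from rebalance notifications.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical router: decides which instance should serve a client's long poll.
final class ClientRouter {
    private final int numPartitions;
    private final Map<Integer, String> partitionOwner; // partition -> instance name

    ClientRouter(int numPartitions, Map<Integer, String> partitionOwner) {
        this.numPartitions = numPartitions;
        this.partitionOwner = new HashMap<>(partitionOwner);
    }

    // Illustrative partition function; the producer must use the same one.
    int partitionFor(String clientId) {
        return Math.floorMod(clientId.hashCode(), numPartitions);
    }

    // Instance currently consuming the partition that holds this client's messages.
    String ownerOf(String clientId) {
        return partitionOwner.get(partitionFor(clientId));
    }

    // True if the instance that received the request should forward it elsewhere.
    boolean mustForward(String clientId, String localInstance) {
        return !localInstance.equals(ownerOf(clientId));
    }

    public static void main(String[] args) {
        ClientRouter router = new ClientRouter(2, Map.of(0, "MSA1", 1, "MSA2"));
        // "b".hashCode() is 98, and 98 mod 2 is 0, so partition 0 is used.
        System.out.println(router.ownerOf("b")); // prints MSA1
        System.out.println(router.mustForward("b", "MSA2")); // prints true
    }
}
```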

If a service produces an event to one topic, only that service should consume the processed event from another topic (Kafka)

I have to implement event-driven services with Kafka (Java tech stack).
I drew an example:
Imagine that I have 3 external producers (Ms1, Ms2, Ms3) that send events to one topic, which my service reads. After receiving an event, my service runs some business logic and then pushes an event to another topic. Ms1, Ms2, and Ms3 subscribe to this topic and listen for what comes in. My goal is: if Ms1 sent an event to topic-1, only Ms1 must receive the response event from topic-2 (other consumers are listening to this topic too, but they must not receive events that belong to Ms1). If Ms2 sent an event to topic-1, then only Ms2 must receive the response event from topic-2.
I also don't know how many consumers/producers there will be; the number varies. Today it can be 3 external producers/consumers, tomorrow maybe 30, and so on. They can subscribe and unsubscribe.
Kafka records shouldn't "belong" to particular services, IMO, this is mostly metadata about data lineage; maybe that information will be valuable for some other consumer use case that you've not considered yet.
If you have multiple consumers on one topic, there's no logic, outside of filtering and explicit partition assignments, that would get "all Ms1 producer events to all Ms1 consumers".
If you want to lock down access to topics to particular clients, use ACLs and certificates. Otherwise, there's nothing stopping new consumer groups from subscribing to whatever topics they want.
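The filtering option mentioned above might be sketched as follows, assuming the processing service copies an identifier of the originating producer into a header of each response event. The header name `origin-service` and the `OriginFilter`/`Event` types are made up for illustration. Note that this is cooperative filtering only: every consumer still receives every event, so it is not an access-control mechanism.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical consumer-side filter: keeps only events addressed to this service.
final class OriginFilter {
    static final String ORIGIN_HEADER = "origin-service"; // assumed header name

    // Simplified stand-in for a consumed record: headers plus payload.
    static final class Event {
        final Map<String, String> headers;
        final String payload;

        Event(Map<String, String> headers, String payload) {
            this.headers = headers;
            this.payload = payload;
        }
    }

    // Returns the events whose origin header matches serviceId, in polled order.
    static List<Event> forService(List<Event> polled, String serviceId) {
        return polled.stream()
                .filter(e -> serviceId.equals(e.headers.get(ORIGIN_HEADER)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Event> polled = List.of(
                new Event(Map.of(ORIGIN_HEADER, "Ms1"), "response for Ms1"),
                new Event(Map.of(ORIGIN_HEADER, "Ms2"), "response for Ms2"));
        System.out.println(forService(polled, "Ms1").size()); // prints 1
    }
}
```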

Kafka Streams handling timeout on a cluster

In a Kafka based distributed JVM application running in several instances, I need to act on the event of "not receiving" a certain message in a specific Kafka topic for a certain configurable amount of time (this timeout value is driven by the business logic, is subject to change).
How can I accomplish this in a cluster-safe way?
Is the goal to trace latency of the E2E flow or is there some trigger which causes a second message to be expected in some configurable time?
If tracking latency, some options include:
Add a timestamp to the message. When the message is received, the latency can be calculated and used.
Add UUID, timestamp, and current component to the message and delegate message tracking to a separate service partitioned on UUID.
If some trigger causes a second message to be expected, some options include:
Partition the relevant topic in a way that guarantees the expected message will either arrive or not arrive at only 1 JVM (similar to 2 above). This will allow a list of expected messages to be kept in memory. Remove the expected messages when received and every N seconds handle 'not received' messages.
Keep track of the expected messages in a data store (DB/distributed cache). When received, remove the records. Periodically, handle 'not received' messages.
Edit:
With the details in the comment, one way to approach this is with a callback style. Messages can be routed to a specific server by setting a partition key. By adding an intermediate topic/service partitioned on UUID, it should be possible to achieve this as follows:
Send Message A to ttl_routing_service. Message A should include a UUID, TTL, where to send the message (functional topic), and what to do on expiry.
Routing Service picks up the message and tracks some metadata (ex: TTL/what to do on timeout) in a local cache or starts a delayed coroutine then routes message A to the functional topic including the UUID.
On completion of message A processing, a message can be sent to ttl_routing_service with the UUID preventing the coroutine or removing the cached record.
If not removed, 'what to do on expiry' is performed.
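The bookkeeping inside such a routing service might be sketched like this. It assumes the service keeps pending UUIDs in a local map and a scheduler calls `expire` periodically; `ExpectedMessageTracker` is a hypothetical name, and time is passed in explicitly so the logic stays deterministic. The caller performs the "what to do on expiry" action for every UUID returned by `expire`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical tracker of messages whose completion is expected before a deadline.
final class ExpectedMessageTracker {
    private final Map<String, Long> deadlines = new HashMap<>(); // uuid -> deadline in ms

    // Registers message A when it is routed to the functional topic.
    synchronized void expect(String uuid, long nowMillis, long ttlMillis) {
        deadlines.put(uuid, nowMillis + ttlMillis);
    }

    // Called when the completion message arrives; true if it was still pending.
    synchronized boolean complete(String uuid) {
        return deadlines.remove(uuid) != null;
    }

    // Removes and returns every uuid whose deadline has passed.
    synchronized List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = deadlines.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        ExpectedMessageTracker tracker = new ExpectedMessageTracker();
        tracker.expect("a", 0, 100);
        tracker.expect("b", 0, 100);
        tracker.complete("a");                    // completion arrived in time
        System.out.println(tracker.expire(100));  // prints [b]
    }
}
```

Because the map is local, this only works if all messages for a given UUID reach the same instance, which is what partitioning the intermediate topic on UUID is meant to provide.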

How to configure Kafka RPC caller topic and group

I'm trying to implement an RPC architecture using Kafka as a message broker. The decision of using Kafka instead of another message broker solution is dictated by the current context.
The actual implementation consists of two different types of service:
The receiver: this service consumes messages from a Kafka topic, processes them, and then publishes the response message to a response topic;
The caller: this service receives HTTP requests, publishes messages to the receiver topic, consumes the receiver's response topic for the response message, and then returns it as an HTTP response.
The request/response messages published in the topics are related by the message key.
The receiver implementation was fairly simple: at startup, it creates the "request" and "response" topics, then starts consuming the request topic with the service group id (many instances of the receiver share the same group id in order to balance requests properly). When a request arrives, the service processes it and then publishes the response in the response topic.
My problem is with the caller implementation, in particular while consuming the response from the response queue.
With the following assumptions:
The HTTP requests must be managed concurrently;
There could be more than one instance of this caller service.
Every single thread/service must receive all the messages in the response topic, in order to find the message with the corresponding request key.
As an example, imagine that two caller services produce two messages with keys 1 and 2 respectively. These messages will be published in the receiver topic and processed. The responses will then be published in the topic receiver-responses. If the two caller services share the same group id, response 1 could arrive at the service that published message 2 and vice versa, resulting in an HTTP timeout.
To avoid this problem, I've come up with these possible solutions:
Creating a new group for every request (EDIT: but a group cannot be deleted via code, hence another service would be needed to clean these groups out of ZooKeeper);
Creating a new topic for every request, then delete it afterwards.
I hope I have made myself sufficiently clear (I must admit I am a beginner with Kafka). My question is:
Which solution is more costly than the other? Or is there another topic/group configuration that could satisfy assumption 3?
Thanks.
I think I've found a possible solution. A group will be automatically deleted by ZooKeeper when its offset isn't updated for a period of time, determined by the configuration offsets.retention.minutes.
How often this expiry check runs can be set with the configuration offsets.retention.check.interval.ms.
This way, when a consumer connects to the response topic searching for the reply message, the created group can be abandoned, and it will be deleted by ZooKeeper later.
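Whichever group strategy is used, the caller still has to match responses to waiting HTTP requests by message key. A minimal sketch of that correlation step follows, assuming each caller instance sees every response and simply drops the ones it is not waiting for. `PendingReplies` is a hypothetical helper, and payloads are plain strings for brevity.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry of in-flight requests, keyed by the Kafka message key.
final class PendingReplies {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called by the HTTP thread before it publishes the request message.
    CompletableFuture<String> register(String key) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(key, future);
        return future;
    }

    // Called by the thread consuming the response topic.
    // Returns false when no local request is waiting for this key.
    boolean onResponse(String key, String payload) {
        CompletableFuture<String> future = pending.remove(key);
        if (future == null) {
            return false; // response belongs to another caller instance
        }
        future.complete(payload);
        return true;
    }

    public static void main(String[] args) {
        PendingReplies replies = new PendingReplies();
        CompletableFuture<String> reply = replies.register("1");
        replies.onResponse("2", "not ours"); // ignored
        replies.onResponse("1", "ok");
        System.out.println(reply.join()); // prints ok
    }
}
```

An HTTP handler would typically wait on the returned future with a timeout and remove the entry if the timeout fires; that part is omitted here.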

Pub-Sub messaging system for two way communication

Hi, I have the system captured in the image. I'm planning to adopt a reliable messaging system, but I'm a bit confused about which one to use. The detailed data flow and my requirements are explained below.
Step 1: Data from System is given to Publisher.
Step 2: Publisher simply pushes the data to the Topic based Messaging system.
Step 3: There will be more than one subscriber for each topic, and subscribers should get notified as soon as there are entries in the messaging system.
Step 4: Subscribers process the data and update the status back to the messaging system.
Step 5: Publisher should get notified about the processed messages and acknowledge the System which gave the data.
So, my question is: can I use RabbitMQ or Kafka for the "Topic Based Messaging System"? My main requirement is that subscribers can update the status back and that the publisher gets notified of the status update. (I'm not much concerned about throughput, performance, or scalability AT THIS POINT in TIME.) My other concern is data recovery/HA.
How about an N+1 topic system: one topic for publishing messages, which would be consumed by N subscribers, and N topics for acknowledgements, one per subscriber?
Your "System" could subscribe to all N acknowledgement topics and verify whether all the subscribers processed the original message published by the producer.
Each message in Kafka, for example, has a message key, and the same key can be used to correlate the original message with its subscriber-specific acknowledgement.
Does this achieve what you want in your system ?
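The verification step on the "System" side could be sketched as below, assuming the set of subscribers is known up front and every acknowledgement carries the key of the original message. `AckTracker` is a hypothetical class; reading from the N acknowledgement topics is left out.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical tracker: collects per-subscriber acknowledgements by message key.
final class AckTracker {
    private final Set<String> subscribers;
    private final Map<String, Set<String>> acksByKey = new HashMap<>();

    AckTracker(Set<String> subscribers) {
        this.subscribers = new HashSet<>(subscribers);
    }

    // Records an acknowledgement read from a subscriber's ack topic.
    // Returns true once every known subscriber has acknowledged this key.
    synchronized boolean onAck(String messageKey, String subscriberId) {
        if (subscribers.contains(subscriberId)) {
            acksByKey.computeIfAbsent(messageKey, k -> new HashSet<>()).add(subscriberId);
        }
        return isFullyAcked(messageKey);
    }

    synchronized boolean isFullyAcked(String messageKey) {
        return acksByKey.getOrDefault(messageKey, Set.of()).containsAll(subscribers);
    }

    public static void main(String[] args) {
        AckTracker tracker = new AckTracker(Set.of("s1", "s2"));
        System.out.println(tracker.onAck("k1", "s1")); // prints false
        System.out.println(tracker.onAck("k1", "s2")); // prints true
    }
}
```

Duplicate acknowledgements from the same subscriber are harmless here because they are stored in a set.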