MassTransit: change the message in the middleware (event handling)

I want to check some constraints on messages and "canonicalize" them accordingly on the way to the final consumer.
I need to listen to events of any type, load the rules (based on the message type) from the IoC container, apply the changes to the messages, and let all of them pass through.
Am I on the right track?
Am I allowed to change the messages in the middleware, or should I instead do the following steps?
Listen to the events
Create commands
Send them
Handle the commands (apply the rules in the consumer)
Create new events from the changed messages
And finally publish new events from them

Related

Fully Transactional Spring Kafka Consumer/Listener

Currently, I have a Kafka Listener configured with a ConcurrentKafkaListenerContainerFactory and a SeekToCurrentErrorHandler (with a DeadLetterPublishingRecoverer configured with 1 retry).
My Listener method is annotated with @Transactional (and so are all the methods in my Services that interact with the DB).
My Listener method does the following:
Receive message from Kafka
Interact with several services that save different parts of the received data to the DB
Ack message in Kafka (i.e., commit offset)
If it fails somewhere in the middle, it should roll back and retry until the maximum number of retries is reached.
Then the message is sent to the DLT.
I'm trying to make this method fully transactional, i.e., if something fails, all previous changes are rolled back.
However, the @Transactional annotation on the Listener method is not enough.
How can I achieve this?
What configurations should I employ to make the Listener method fully transactional?
If you are not also publishing to Kafka from the listener, there is no need for (or benefit from) Kafka transactions; they are just overhead. The SeekToCurrentErrorHandler + DeadLetterPublishingRecoverer combination is enough.
If you are also publishing to Kafka (and want those publishes to be rolled back too), then see the documentation: configure a KafkaTransactionManager in the listener container.
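For reference, a minimal sketch of that configuration (assuming Spring Kafka 2.x, a transaction-capable ProducerFactory, and the bean names shown here; adapt to your own setup) might look like this:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.config.ConcurrentKafkaListenerContainerFactory;
import org.springframework.kafka.core.ConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.SeekToCurrentErrorHandler;
import org.springframework.kafka.transaction.KafkaTransactionManager;
import org.springframework.util.backoff.FixedBackOff;

@Configuration
public class KafkaListenerConfig {

    @Bean
    public KafkaTransactionManager<String, String> kafkaTransactionManager(
            ProducerFactory<String, String> producerFactory) {
        // requires a transactional producer factory (transaction-id-prefix configured)
        return new KafkaTransactionManager<>(producerFactory);
    }

    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, String> kafkaListenerContainerFactory(
            ConsumerFactory<String, String> consumerFactory,
            KafkaTransactionManager<String, String> kafkaTransactionManager,
            KafkaTemplate<String, String> kafkaTemplate) {

        ConcurrentKafkaListenerContainerFactory<String, String> factory =
                new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(consumerFactory);

        // offsets are sent to the Kafka transaction, so listener-side publishes
        // and the consumed-offset commit roll back together
        factory.getContainerProperties().setTransactionManager(kafkaTransactionManager);

        // 1 retry, then publish the failed record to the dead-letter topic
        factory.setErrorHandler(new SeekToCurrentErrorHandler(
                new DeadLetterPublishingRecoverer(kafkaTemplate), new FixedBackOff(0L, 1L)));

        return factory;
    }
}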

What is the correct way to publish a message to a saga using Kafka?

I'm designing a library using the CQRS pattern. I treat a saga as an aggregate, using the method described in https://blog.jonathanoliver.com/cqrs-sagas-with-event-sourcing-part-i-of-ii/.
My question is how to implement the delivery of the messages that a saga subscribes to.
Let's say I have two aggregates, Account and MoneyTransferSaga, in the application.
The TransferMoney command triggers a MoneySent event, which reduces the source account's balance.
The MoneySent event triggers the creation of a MoneyTransferSaga, which dispatches a ReceiveMoney command to another Account, increasing that account's balance.
So the MoneyTransferSaga subscribes to the MoneySent event. The question I would like to ask is: what is the correct way to deliver this MoneySent event to the saga?
I'm using Kafka as the message broker, and I have the topics account-commands and account-events for the account aggregate.
Approach A: make the MoneyTransferSaga another consumer group that subscribes to the topic "account-events". When an incoming event matches the type "MoneySent", trigger the saga event handler; other events are ignored by the saga event handler (a rough sketch of this approach is shown after the question).
Approach B: create a topic called "money-transfer-saga-events". When an event is sent to account-events, the account event handler duplicates it to the topic "money-transfer-saga-events", and the MoneyTransferSaga subscribes to "money-transfer-saga-events" instead of "account-events".
I'm not sure which approach is better.
Using approach A, the saga instance receives all the events from the account aggregate.
Using approach B, the account service has to know about the existence of the MoneyTransferSaga.
Thanks.
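For concreteness, a minimal sketch of Approach A using the plain Kafka consumer API (the "type" header, the group id, and string serialization are assumptions about how the events are encoded):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.header.Header;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// The saga runs as its own consumer group on "account-events" and reacts
// only to MoneySent events; everything else is ignored.
public class MoneyTransferSagaListener {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "money-transfer-saga");     // separate consumer group for the saga
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("account-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // assumption: the event type travels in a "type" header
                    Header typeHeader = record.headers().lastHeader("type");
                    String type = typeHeader == null ? "" : new String(typeHeader.value());
                    if ("MoneySent".equals(type)) {
                        // hand the event to the saga, which dispatches ReceiveMoney
                        // to "account-commands" for the target account
                        handleMoneySent(record.value());
                    }
                }
            }
        }
    }

    private static void handleMoneySent(String payload) {
        // saga logic / command dispatch goes here
    }
}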

How to process events which are out of order using Kafka Streams

I have an application where events are sent to a Kafka topic based on user actions such as User Login, the user's intermediate actions (optional), and User Logout. Each event carries some information in an event object along with a userId; for example, a Login event has a loginTime, an Add Note event has notes (an intermediate action), and similarly a Logout event has a logoutTime. The requirement is to aggregate the information from all these events into one object after receiving the Logout event for each user, and send it downstream.
For various reasons (network delay, multiple event producers), events may not arrive in order (a User Logout event may arrive before an intermediate event), so the question is how to handle such scenarios. I cannot simply wait for intermediate events after receiving the User Logout event, since intermediate events are optional and depend on the user's actions.
The only option I can think of is to wait for some time after receiving the User Logout event, process any intermediate events received within that wait time, and then send the processed event, but I'm not sure how to achieve this.
Kafka does not guarantee order across a topic; it guarantees order within a partition. A topic can have more than one partition, and within a consumer group each partition is consumed by a single consumer. That is how Kafka achieves scalability, so what you are experiencing is normal behavior (it isn't a bug, or related to network delay, or anything like that).
What you can do is make sure that all messages you want to process in order are sent to the same partition. You can do that by setting the number of partitions to 1, which is the simplest (but least scalable) way. When you send a message with the producer, by default Kafka looks at the key, hashes it, and uses that hash to decide which partition the message goes to. You can make sure the key is the same for all related messages; that way all the key hashes are the same and all those messages go to the same partition. Alternatively, you can implement a custom partitioner and override the default way Kafka chooses the partition. Either way, the related messages will arrive in order. If you cannot do any of this, then you will receive events out of order and will have to think about how to consume them out of order, but that is not really a Kafka question.
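A minimal sketch of that keying approach with the plain producer API (the broker address, topic name, and payloads are placeholders):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class UserEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String userId = "user-42"; // hypothetical user id, used as the record key
            // Because all three records share the same key, the default partitioner
            // hashes them to the same partition, so their relative order is preserved.
            producer.send(new ProducerRecord<>("user-events", userId, "Login"));
            producer.send(new ProducerRecord<>("user-events", userId, "AddNote"));
            producer.send(new ProducerRecord<>("user-events", userId, "Logout"));
            producer.flush();
        }
    }
}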
If you are not able to preserve the order of events (i.e., to guarantee that Logout is the last event),
you can achieve your requirements using the Processor API from Kafka Streams. The Kafka Streams DSL can be combined with the Processor API (more details here).
You can have several partitions, but all events for a particular user have to be sent to the same partition.
You have to implement a custom Processor/Transformer.
Your processor puts each event/activity in a state store (aggregating all events from a particular user under the same key).
The Processor API also lets you create a kind of scheduler (a Punctuator).
You can schedule a check every X seconds over the events for a particular user: if the Logout happened long enough ago, you collect all of the events/activities, perform the aggregation, and send the result downstream.
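A rough sketch of that idea using the older Processor API (the store name, punctuation interval, and string-valued events are assumptions; the Logout-age check and the actual aggregation are left as placeholders):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.processor.Processor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Buffers each user's events in a state store; a wall-clock punctuator
// periodically flushes users whose Logout event is old enough.
public class UserSessionProcessor implements Processor<String, String> {

    private ProcessorContext context;
    private KeyValueStore<String, List<String>> store;

    @Override
    @SuppressWarnings("unchecked")
    public void init(ProcessorContext context) {
        this.context = context;
        // "user-events-store" must be registered on the topology and attached to this processor
        this.store = (KeyValueStore<String, List<String>>) context.getStateStore("user-events-store");

        // every 10 seconds (assumption), look for users that can be flushed downstream
        context.schedule(Duration.ofSeconds(10), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            try (KeyValueIterator<String, List<String>> it = store.all()) {
                while (it.hasNext()) {
                    KeyValue<String, List<String>> entry = it.next();
                    if (logoutOldEnough(entry.value, timestamp)) {
                        context.forward(entry.key, aggregate(entry.value)); // send result downstream
                        store.delete(entry.key);
                    }
                }
            }
        });
    }

    @Override
    public void process(String userId, String event) {
        // arrival order does not matter here: just collect everything per user
        List<String> events = store.get(userId);
        if (events == null) {
            events = new ArrayList<>();
        }
        events.add(event);
        store.put(userId, events);
    }

    @Override
    public void close() { }

    // placeholder: true once a Logout event is present and older than the chosen grace period
    private boolean logoutOldEnough(List<String> events, long now) {
        return false;
    }

    // placeholder: build the aggregated session object from the collected events
    private String aggregate(List<String> events) {
        return String.join("|", events);
    }
}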
As said in other answers, in Kafka order is maintained on a per-partition basis.
Since you are talking about user events, why don't you make the userId the Kafka message key? That way, all events related to a specific user will always be ordered (provided they are produced by a single producer).
You should ensure (by design) that only one Kafka producer pushes all the user change events to the given topic. In this way, you can avoid out-of-order messages due to multiple producers.
On the Streams side, you might also want to look at windows in Kafka Streams. Tumbling windows, for example, are non-overlapping and of fixed size; you aggregate records over a period of time.
You can then sort the aggregated records by their timestamps (you mentioned you have logout time, login time, etc.) and act accordingly.
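A small sketch of a tumbling-window aggregation in the Streams DSL (topic names, the 5-minute window, and the string-concatenation "aggregation" are placeholders, not a recommendation):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.kstream.TimeWindows;

import java.time.Duration;

public class UserSessionWindows {

    public static StreamsBuilder buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // events keyed by userId, values as raw strings (assumption)
        KStream<String, String> events =
                builder.stream("user-events", Consumed.with(Serdes.String(), Serdes.String()));

        events.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
              // non-overlapping, fixed-size 5-minute windows per user
              .windowedBy(TimeWindows.of(Duration.ofMinutes(5)))
              // naive "aggregation": concatenate the events seen in the window
              .reduce((agg, next) -> agg + "\n" + next)
              .toStream()
              // drop the window part of the key before writing downstream
              .map((windowedKey, value) -> KeyValue.pair(windowedKey.key(), value))
              .to("user-sessions", Produced.with(Serdes.String(), Serdes.String()));

        return builder;
    }
}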
Simple and effective solution
Use synchronous send and set delivery.timeout.ms and retries to a maximum value.
To ensure fault tolerance, set acks=all with min.insync.replicas=2 (a topic configuration) and use a single producer to push to that topic.
You should also set max.block.ms to some maximum value so that your send() does not return immediately if there is an error fetching the metadata (for example, when Kafka is down).
Benchmark the synchronous send at your rate and check whether it meets your requirements or benchmark numbers.
This ensures that the message that came first is sent to Kafka first, and the next message is not sent until the previous one is successfully acknowledged.
If your benchmark figure is not met, try adding a back-pressure mechanism such as an in-memory/persistent queue (a sketch follows the steps below):
Add event to a queue in Thread-1
Peek (not dequeue) event from the queue in Thread-2
Call producer.send(...).get() in Thread-2
Dequeue the event in Thread-2
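A rough sketch of the synchronous send plus in-memory queue idea (the broker address, property values, single fixed key, and threading details are assumptions; min.insync.replicas is set on the topic, not in this code):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedSyncProducer {

    // Thread-1 adds events here; Thread-2 drains them in order.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final KafkaProducer<String, String> producer;

    public OrderedSyncProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                             // wait for all in-sync replicas
        props.put("retries", Integer.MAX_VALUE);              // retries "to a maximum value"
        props.put("delivery.timeout.ms", Integer.MAX_VALUE);  // keep trying to deliver
        props.put("max.block.ms", Long.MAX_VALUE);            // block on metadata instead of failing fast
        // min.insync.replicas=2 is a topic/broker setting, not a producer property.
        this.producer = new KafkaProducer<>(props);
    }

    // Thread-1: enqueue incoming events
    public void enqueue(String event) {
        queue.add(event);
    }

    // Thread-2: peek, send synchronously, dequeue only after a successful ack
    public void drainLoop(String topic, String key) throws Exception {
        while (true) {
            String event = queue.peek();           // peek, do not remove yet
            if (event == null) {
                Thread.sleep(10);
                continue;
            }
            producer.send(new ProducerRecord<>(topic, key, event)).get(); // blocks until acked
            queue.poll();                           // now it is safe to dequeue
        }
    }
}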
The key is to make your frontend tracker send ordered events to the backend service, which then produces the events to Kafka.
You can achieve that by batching the events and sending a batch to the backend only after the previous batch has been successfully delivered.

Using a single SQS queue for multiple subscribers based on a message identifier

We have an application where data for multiple subscribers is written to a publisher Kafka topic. This data is then propagated to a subscriber-specific topic, and each subscriber consumes the data from the specific topic assigned to it.
We want to use SQS for the same purpose, but the issue is that we would again need one SQS queue per subscriber.
Handling these multiple SQS queues will create an issue, and whenever no data is published for a subscriber, the queue assigned to it sits idle.
Is there any way I can use a single SQS queue from which all subscribers can consume messages based on a message identifier?
The challenges this design needs to cover are:
Each subscriber can get its messages based on an identifier.
There must be no added latency when one publisher publishes very few messages and another publishes millions.
We can have one SQS queue per publisher, but a single SQS queue for all subscribers of that publisher.
Can anyone suggest an architecture for a similar implementation?
Thanks
I think you can achieve this by setting up a single SQS queue. You would set up a Lambda trigger on that queue which serves as a Service Manager (SM). The SM has a static JSON file that defines the mapping between a message identifier and its subscriber/worker. The SM receives an SQS message event, reads the message attribute used as the identifier, and then looks it up in the JSON to find the corresponding subscriber. If a subscriber is found, the SM invokes it (a rough sketch is shown below).
Consider using an SQS batch trigger.
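A rough sketch of such a Service Manager Lambda (the attribute name "identifier", the subscriber function names, and the in-code map standing in for the static JSON file are assumptions):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.SQSEvent;
import software.amazon.awssdk.core.SdkBytes;
import software.amazon.awssdk.services.lambda.LambdaClient;
import software.amazon.awssdk.services.lambda.model.InvokeRequest;

import java.util.Map;

// Service Manager: routes each SQS message to a subscriber Lambda based on a
// message attribute carrying the identifier.
public class ServiceManagerHandler implements RequestHandler<SQSEvent, Void> {

    // stands in for the static JSON mapping of identifier -> subscriber function
    private static final Map<String, String> SUBSCRIBERS = Map.of(
            "orders",   "orders-subscriber-fn",
            "payments", "payments-subscriber-fn");

    private final LambdaClient lambda = LambdaClient.create();

    @Override
    public Void handleRequest(SQSEvent event, Context context) {
        for (SQSEvent.SQSMessage msg : event.getRecords()) {
            SQSEvent.MessageAttribute attr = msg.getMessageAttributes().get("identifier");
            if (attr == null) continue;                      // no identifier: skip
            String target = SUBSCRIBERS.get(attr.getStringValue());
            if (target == null) continue;                    // unknown identifier: skip
            lambda.invoke(InvokeRequest.builder()            // hand the message body to the subscriber
                    .functionName(target)
                    .payload(SdkBytes.fromUtf8String(msg.getBody()))
                    .build());
        }
        return null;
    }
}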

custom Flume interceptor: intercept() method called multiple times for the same Event

TL;DR
When a Flume source fails to push a transaction to the next channel in the pipeline, does it always keep event instances for the next try?
In general, is it safe to have a stateful Flume interceptor, where processing of events depends on previously processed events?
Full problem description:
I am considering the possibility of leveraging guarantees offered by Apache Kafka regarding the way topic partitions are distributed among consumers in a consumer group to perform streaming deduplication in an existing Flume-based log consolidation architecture.
Using the Kafka Source for Flume and custom routing to Kafka topic partitions, I can ensure that every event that should go to the same logical "deduplication queue" will be processed by a single Flume agent in the cluster (for as long as there are no agent stops/starts within the cluster). I have the following setup using a custom-made Flume interceptor:
[KafkaSource with deduplication interceptor] --> (MemoryChannel) --> [HDFSSink]
It seems that when the Flume Kafka source runner is unable to push a batch of events to the memory channel, the event instances that are part of the batch are passed again to my interceptor's intercept() method. In this case, it was easy to add a tag (in the form of a Flume event header) to processed events to distinguish actual duplicates from events in a failed batch that got re-processed.
However, I would like to know if there is any explicit guarantee that Event instances in failed transactions are kept for the next try or if there is the possibility that events are read again from the actual source (in this case, Kafka) and re-built from zero. In that case, my interceptor will consider those events to be duplicates and discard them, even though they were never delivered to the channel.
EDIT
This is how my interceptor distinguishes an Event instance that was already processed from a non-processed event:
public Event intercept(Event event) {
    Map<String, String> headers = event.getHeaders();
    // tagHeaderName is the name of the header used to tag events, never null
    if (!tagHeaderName.isEmpty()) {
        // Don't look further if the event was already processed...
        if (headers.get(tagHeaderName) != null) {
            return event;
        } else {
            // Mark it as processed otherwise...
            headers.put(tagHeaderName, "");
        }
    }
    // Continue processing of the event...
    return event;
}
I encountered a similar issue:
When a sink write fails, the Kafka Source still holds the data that has already been processed by the interceptors. On the next attempt, that data is sent to the interceptors and processed again and again. After reading the KafkaSource code, I believe this is a bug.
My interceptor strips some information from the original message and modifies it. Due to this bug, the retry mechanism never works as expected.
So far, there is no easy solution.