How can I instantly get the result of my produced event in Kafka and Kafka Streams? - apache-kafka

I am simplifying my problem with the following scenario:
3 friends share a loyalty card. The card has two restrictions:
it can be used at most 10 times (it does not matter who uses the card, i.e.
friend_1 alone could use it all 10 times).
the maximum amount of money on the card is 200, so a single "event" of value = 200 "completes" the card.
I am using a Kafka producer that sends events like these to the Kafka cluster:
{ "name": "friend_1", "value": 10 }
{ "name": "friend_3", "value": 20 }
The events are posted to a topic consumed by a Kafka Streams application that groups by key and aggregates to sum the money spent. That seems to work; however, I am facing a "concurrency issue".
Let's imagine the card has been used 9 times, so only 1 use remains, and the total money spent is 190, which means there are 10 units left to spend.
So friend_2 wants to buy something that costs 11 units (which should not be allowed) and friend_3 wants to buy something that costs 9 units (which should be allowed). friend_3 will modify the state, using the card for the 10th time. All other future attempts should not modify anything.
So it seems reasonable for the card user to know whether the event he sent modified the usage count and the total amount. How can I do that in Kafka? Using the Streams aggregation I can always increase the values, but how do I know whether my action "modified the state" of the card?
UPDATE: the card user should immediately get negative feedback if the transaction violates a rule.

From how I understand your question, there are a few options.
One option is to derive a new stream after the aggregation that you filter() for data that would modify "the state" of the card, e.g. filtering for all events where more than 200 units have been spent or the card has been used more than 10 times. This stream can then be used to inform the card user(s) that the card has been spent, e.g. by sending an email. This approach can be implemented solely with the DSL; see the sketch below.
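For illustration, a minimal DSL sketch of this option; the topic names, the Long value type, and the serde setup are assumptions, not taken from the question:

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    StreamsBuilder builder = new StreamsBuilder();

    // purchases keyed by card id, value = units spent (serdes assumed configured via defaults)
    KStream<String, Long> purchases = builder.stream("card-purchases");

    // running total of units spent per card
    KTable<String, Long> totalSpent = purchases
            .groupByKey()
            .reduce(Long::sum);

    // derive a stream of cards that crossed the 200-unit limit
    totalSpent.toStream()
            .filter((cardId, spent) -> spent != null && spent > 200L)
            .to("card-exhausted"); // a downstream consumer can e-mail the card users

Note that this only detects that the limit was crossed after the fact; it does not reject the offending event, which is where the Processor API option below comes in.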
When more flexibility or tighter control is needed, another option is to use the Processor API (which you can integrate with the DSL, so most of your code could keep using the DSL) and implement the aggregation step yourself, working with state stores that you attach to a Transformer or Processor. During the aggregation you can implement the logic that checks whether the incoming event is valid (in your example: friend_3 with 9 units is valid, friend_2 with 11 units is not). If it is valid, the aggregation increases the card's counters (units and uses), and that's it. If it is invalid, the event is discarded and does not modify the counters; instead, the Transformer/Processor can emit a new event to another stream that tells the card user(s) that something didn't work. You can similarly implement the functionality to inform users that a card has been fully used, that a card is no longer usable, or any other "state change" of that card.
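A rough sketch of such a Transformer, assuming a pre-registered state store named "card-store" and the 200-unit/10-use limits from the question; the store's value layout and the ACCEPTED/REJECTED markers are illustrative:

    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.kstream.Transformer;
    import org.apache.kafka.streams.processor.ProcessorContext;
    import org.apache.kafka.streams.state.KeyValueStore;

    // Validates each purchase against the card's current totals before applying it.
    public class CardTransformer implements Transformer<String, Long, KeyValue<String, String>> {

        private KeyValueStore<String, long[]> store; // value = {unitsSpent, uses}

        @Override
        @SuppressWarnings("unchecked")
        public void init(ProcessorContext context) {
            store = (KeyValueStore<String, long[]>) context.getStateStore("card-store");
        }

        @Override
        public KeyValue<String, String> transform(String cardId, Long amount) {
            long[] state = store.get(cardId);
            if (state == null) {
                state = new long[] {0L, 0L};
            }
            // reject events that would violate either rule; the counters stay untouched
            if (state[0] + amount > 200L || state[1] + 1 > 10L) {
                return KeyValue.pair(cardId, "REJECTED");
            }
            state[0] += amount;
            state[1] += 1;
            store.put(cardId, state);
            return KeyValue.pair(cardId, "ACCEPTED");
        }

        @Override
        public void close() {
        }
    }

Because all events for one card land on the same partition and are processed one at a time, the check and the counter update are effectively atomic, which resolves the friend_2 vs. friend_3 race. Wire it in with something like purchases.transform(CardTransformer::new, "card-store"), and let the producing application consume the resulting ACCEPTED/REJECTED stream for immediate feedback.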
Also, depending on what you want to do, take a look at the interactive queries feature of Kafka Streams. Sometimes other applications may want to do quick point lookups (queries) of the latest state of something, like the card's state, which can be done with interactive queries via e.g. a REST API.
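A minimal lookup sketch, assuming the aggregate lives in a queryable store named "card-totals" (Kafka 2.5+ API):

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    // 'streams' is the running KafkaStreams instance
    ReadOnlyKeyValueStore<String, Long> view = streams.store(
            StoreQueryParameters.fromNameAndType("card-totals", QueryableStoreTypes.keyValueStore()));
    Long unitsSpent = view.get("card-42"); // expose this lookup behind a REST endpoint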
Hope this helps!

Related

What is the best way to handle obsolete information in a database with a Spring Boot application

I am working on an application tracking objects detected by multiple sensors. I receive my inputs by consuming Kafka messages, and I save the information in a PostgreSQL database.
An object is located in a specific location if its last scan was detected by a sensor in that exact location. For example:
Object last scanned by sensor in room 1 -> means last location known for the object is room 1
Scans happen continuously, and we can set a frequency of anywhere from a few seconds to a few minutes. So an object that wasn't scanned within the last hour, for example, needs to be considered out of range.
So my question now is: how can I design a system that generates some sort of notification when a device is out of range?
For example, if the timestamp of the last detection is more than 5 minutes old, it triggers a notification.
The only solution I can think of is to create a batch job that repeatedly checks for all objects whose last detection time is more than 5 minutes ago. But I am wondering if that is the right approach, and I would like to ask if there is a better way.
I use Kotlin and Spring Boot for my application.
Thank you for your help.
You would need some type of heartbeat mechanism, yes.
Query all detection events whose "last seen timestamp" is older than your threshold, and fire an alert when the returned result set is larger than some tolerable size (e.g. if you are willing to accept intermittently lost devices and expect them to be found in the next scan).
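A minimal sketch of that check as a scheduled Spring job (in Java for consistency with the other sketches here; the same shape works in Kotlin). DetectionRepository and its query method are assumptions standing in for your actual schema:

    import java.time.Instant;
    import java.time.temporal.ChronoUnit;
    import java.util.List;
    import org.springframework.scheduling.annotation.Scheduled;
    import org.springframework.stereotype.Component;

    @Component
    public class OutOfRangeChecker {

        private final DetectionRepository repository; // hypothetical Spring Data repository

        public OutOfRangeChecker(DetectionRepository repository) {
            this.repository = repository;
        }

        @Scheduled(fixedDelay = 60_000) // run once a minute
        public void checkForLostDevices() {
            Instant cutoff = Instant.now().minus(5, ChronoUnit.MINUTES);
            List<String> lost = repository.findObjectIdsLastSeenBefore(cutoff); // hypothetical query
            if (!lost.isEmpty()) {
                // fire the alert here, e.g. post to a Slack webhook or an alerts topic
                System.out.println("Out of range: " + lost);
            }
        }
    }

Remember to enable scheduling with @EnableScheduling on a configuration class.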
As far as "where/how" to alert - up to you. Slack webhooks are a popular example. Grafana can do alerting and query your database.

Orion CB Batch updates and notifications throttle

I'm running several batch updates (with about 200 entities each) and I have a Quantum Leap subscribed to the CB to capture the historical data. It is supposed that each batch update should generate 200 different notifications, but if I have set a throttling of 1, does this mean I will only receive the first notification and lose the other 199? Is this right? (Just watching the QL, it seems to me that I'm losing many notifications.)
Entities are processed one by one during a batch update request. Thus, if you have 200 entities and the update to each one triggers a subscription, then 200 notifications will be sent.
The throttling effect depends on the case. For instance:
If each entity triggers a different subscription, then throttling has no effect (as the throttling is evaluated per subscription).
If every entity triggers the same subscription and assuming that all 200 notifications are sent very fast (let's say, within less than 1 second), then only the first one will be sent and the remaining 199 will be lost.
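For reference, throttling is set per subscription in the NGSI v2 subscription payload; a minimal sketch (the entity type and notification URL are placeholders):

    {
      "description": "Notify QuantumLeap on entity changes",
      "subject": {
        "entities": [ { "idPattern": ".*", "type": "Device" } ]
      },
      "notification": {
        "http": { "url": "http://quantumleap:8668/v2/notify" }
      },
      "throttling": 1
    }

With "throttling": 1, at most one notification per second is sent for this subscription, regardless of how many entities the batch update touches.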
In general, we don't recommend using throttling due to this (and some other :) problems. It is usually better to implement traffic flow control in the receiver.

Can event sourcing be used to resolve late arriving events

We are developing an application that will receive events from various systems via a message queue (Azure), but it is quite possible that some events (messages) will not arrive in the order they were sent. These events will be received and processed by a central CQRS/ES based system, but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item").
Are typical ES systems meant to resolve this issue, or are we meant to ensure that such messages are put in the right order before being pushed into the event store? If you have links to articles that back up either view, it would help.
Edit: I think my description was clearly far too vague, so the responses, while helpful in understanding CQRS/ES, do not quite answer my problem. I'll add a little more detail and hopefully someone will recognise the problem.
Firstly, the players:
the front end web site (not actually relevant to this problem) delivers orders to the management system.
our management system which takes orders from the web site and passes them to the warehouse and is hosted on site.
the warehouse, which accepts orders, fulfils them if possible, and notifies us when an order is fulfilled or can be fulfilled only partially or not at all.
Linking the warehouse to the management system is a fairly thin Azure cloud based coupling. Messages from the warehouse are sent to a WCF/SOAP layer in the cloud, parsed, and sent over the message bus. Messages to the warehouse are sent over the message bus and then, again in the cloud, converted into SOAP calls to a server at the warehouse.
The warehouse is very careful to ensure that messages it sends have identifiers that increment without a gap so we can know when a message is missed. However when we take those messages and forward them to the management system they are transported over the message bus and could, in theory, arrive in the wrong order.
Now, given that we have a sequence number in the messages, we could ensure the messages are put back in the right order before they are sent to the CQRS/ES system, but my question is: is that necessary, or can the ES actually be used to reorder the events into their intended logical order?
Each message that arrives in Service Bus is tagged with a SequenceNumber. The SequenceNumber is a monotonically increasing, gapless 64-bit integer sequence, scoped to the Queue (or Topic), that provides an absolute order criterion by arrival in the Queue. That order may differ from the delivery order due to errors/aborts, and it exists so you can reconstitute the order of arrival.
Two features in Service Bus specific to management of order inside a Queue are:
Sessions. A sessionful queue puts locks on all messages with the same SessionId property, meaning that FIFO is guaranteed for that sequence, since no messages later in the sequence are delivered until the "current" message is either processed or abandoned.
Deferral. The Defer method puts a message aside if the message cannot be processed at this time. The message can later be retrieved by its SequenceNumber, which pulls from the hidden deferral queue. If you need a place to keep track of which messages have been deferred for a session, you can put a data structure holding that information right into the message session, if you use a sessionful queue. You can then pick up that state again elsewhere on an accepted session if you, for instance, fail over processing onto a different machine.
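A rough sketch of deferral with the current Azure Service Bus Java SDK (the question predates this SDK, but the original API has equivalent Defer/Receive calls; the queue name and the isInOrder/process helpers are hypothetical):

    import com.azure.messaging.servicebus.ServiceBusClientBuilder;
    import com.azure.messaging.servicebus.ServiceBusReceivedMessage;
    import com.azure.messaging.servicebus.ServiceBusReceiverClient;

    ServiceBusReceiverClient receiver = new ServiceBusClientBuilder()
            .connectionString(System.getenv("SERVICEBUS_CONNECTION"))
            .receiver()
            .queueName("warehouse-events")
            .buildClient();

    long deferredSeq = -1;
    for (ServiceBusReceivedMessage msg : receiver.receiveMessages(10)) {
        if (isInOrder(msg)) {
            process(msg);
            receiver.complete(msg);
        } else {
            receiver.defer(msg);                   // park the out-of-order message
            deferredSeq = msg.getSequenceNumber(); // remember how to fetch it later
        }
    }
    if (deferredSeq >= 0) {
        // pull the parked message back from the deferral queue by sequence number
        ServiceBusReceivedMessage parked = receiver.receiveDeferredMessage(deferredSeq);
        process(parked);
        receiver.complete(parked);
    }

In a real consumer you would track all deferred sequence numbers (per session, if you use a sessionful queue) rather than a single one.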
These features have been built specifically for document workflows in Office 365 where order obviously matters quite a bit.
I would have commented on KarlM's answer but stackoverflow won't allow it, so here goes...
It sounds like you want the transport mechanism to provide transactional locking on your aggregate. To me this sounds inherently wrong.
It sounds as though the design being proposed is flawed. Having had this exact problem in the past, I would look at your constraints. Either you want to provide transactional guarantees to the website, or you want to provide them to the warehouse. You can't do both, one always wins.
To be fully distributed: If you want to provide them to the website, then the warehouse must ask if it can begin to fulfil the order. If you want to provide them to the warehouse, then the website must ask if it can cancel the order.
Hope that is useful.
For events generated from a single command handler/aggregate in an "optimistic locking" scenario, I would assume you would include the aggregate version in the event, and thus those events are implicitly ordered (see the sketch after the links below).
Events from multiple aggregates should not care about order, because of the transactional guarantees of an aggregate.
Check out http://cqrs.nu/Faq/aggregates , http://cqrs.nu/Faq/command-handlers and related FAQs
For an intro to ES and optimistic locking, look at http://www.jayway.com/2013/03/08/aggregates-event-sourcing-distilled/
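To make the optimistic-locking idea concrete, a small sketch of an event carrying its aggregate version (field names are illustrative, not tied to any framework):

    // An event stamped with the aggregate's version at the time it was produced.
    public final class OrderItemAdded {

        private final String aggregateId;    // the order this event belongs to
        private final long aggregateVersion; // position in that order's event stream
        private final String itemId;

        public OrderItemAdded(String aggregateId, long aggregateVersion, String itemId) {
            this.aggregateId = aggregateId;
            this.aggregateVersion = aggregateVersion;
            this.itemId = itemId;
        }

        public String getAggregateId() { return aggregateId; }

        public long getAggregateVersion() { return aggregateVersion; }

        public String getItemId() { return itemId; }
    }

On append, the event store accepts an event only if its version is exactly one greater than the last stored version for that aggregate, so out-of-order (or concurrent) writes are detected instead of silently producing garbage.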
You say:
"These events will be received and processed by a central CQRS/ES based system but my worry is that if the events are placed in the event store in the wrong order we will get garbage out (for example "order create" after "add order item")."
There seems to be a misunderstanding about what the CQRS pattern with Event Sourcing is.
Simply put, Event Sourcing means that you change Aggregates (as per DDD terminology) via internally generated events; the Aggregate's persistence is represented by events, and the Aggregate can be restored by replaying events. This means that the scope is quite small: the Aggregate itself.
Now, CQRS with Event Sourcing means that these events from the Aggregates are published and used to create Read projections, or other domain models that have different purposes.
So I don't really get your question given the explanations above.
Related to Ordering:
there is already an answer mentioning optimistic locking, so events generated inside a single Aggregate must be ordered and optimistic locking is a solution
Read projections processing events in order: a solution I used in the past was to publish events on RabbitMQ and process them with Storm.
RabbitMQ has some guarantees about ordering, and Storm has some processing affinity features. Storm (as far as I remember) allows you to specify that for a given ID (for example an Aggregate ID) the same handler will be used, hence the events are processed in the same order as received from RabbitMQ.
The article on MSDN https://msdn.microsoft.com/en-us/library/jj591559.aspx states "Stored events should be immutable and are always read in the order in which they were saved" under "Performance, Scalability, and consistency". This clearly means that appending events out of order is not tolerated. The same article also states multiple times that while events cannot be altered, corrective events can be made. This again implies that events are processed in the order they are received to determine the current truth (the state of the aggregate). My conclusion is that we should fix the message ordering problem before posting events to the event store.

How to resequence after filtering for aggregation /Spring Integration/

I'm doing a project in Spring Integration and I have a big problem.
There are some filtering components in the flow and later in the flow I have an aggregation element.
The problem is that the filtering component does not support the "apply-sequence" property. It filters out some records without modifying the original sequence numbers, so the number of messages is reduced.
Later in the flow I need an aggregation, which fails to release groups since some messages have been filtered out.
I don't want to use any special routing elements which have apply-sequence property.
Can you suggest me any common solution for this type of filtering problem?
Thanks,
I'd say you misunderstand the behaviour of the filter and aggregator.
I guess you have some apply-sequence-aware component upstream, so all messages in that group carry several headers: correlationId, to group messages in the default aggregator; sequenceNumber, the index of the message; and sequenceSize, the number of messages in the group.
A filter just checks messages against some condition and sends them to the output-channel or applies the discard logic. It doesn't modify messages, and even if it could, doing so wouldn't sound like a good idea anyway.
Assume we have only two messages in the group. The first one passes the filter, and we just send it on to the aggregator. But the second is discarded, and, yes, it won't be sent to the aggregator. So the group is never released, because the sequenceSize is never reached.
To overcome your requirement you need some custom ReleaseStrategy on the aggregator (by default it is SequenceSizeReleaseStrategy). It could, for example, check some state in your system confirming that all messages in the group have been handled, independently of the true or false result of the filter. Or it could rely on a fake marker message sent for the same reason and check its availability in the group.
In this case you will just need to take care of the correlationId to group messages in the aggregator.
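A sketch of such a custom ReleaseStrategy, where DiscardCounter is a hypothetical component you would populate from the filter's discard channel:

    import org.springframework.integration.aggregator.ReleaseStrategy;
    import org.springframework.integration.store.MessageGroup;

    // Releases the group once the surviving messages plus the externally tracked
    // discards add up to the original sequence size.
    public class FilterAwareReleaseStrategy implements ReleaseStrategy {

        private final DiscardCounter discards; // hypothetical shared counter

        public FilterAwareReleaseStrategy(DiscardCounter discards) {
            this.discards = discards;
        }

        @Override
        public boolean canRelease(MessageGroup group) {
            int expected = group.getSequenceSize();
            return expected > 0 && group.size() + discards.countFor(group.getGroupId()) >= expected;
        }
    }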
UPDATE
What is the suggested release strategy for such a scenario? Would it be a good strategy to use a timeout as the release strategy?
What I can say is that sometimes it is really difficult to find a good solution for some integration scenarios. Messaging is stateless by nature, so correlating and grouping an undetermined number of messages may be a problem.
You have to look at the requirements and the environment.
For example, when all your messages are processed in a single thread, you can safely send some fake marker message at the end directly to the aggregator and check for it from the ReleaseStrategy. That will work even when all the messages in the group have been discarded.
If you process those messages in parallel, or they are received from different threads, you really won't be able to determine the order of messages or how long each one takes to process.
In this case the TimeoutCountSequenceSizeReleaseStrategy really can help. Of course, you will need to find a good timeframe compromise according to the requirements of your system.
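For illustration, that strategy can be plugged into the aggregator like any other ReleaseStrategy bean; the threshold and timeout values below are examples only:

    import org.springframework.context.annotation.Bean;
    import org.springframework.integration.aggregator.ReleaseStrategy;
    import org.springframework.integration.aggregator.TimeoutCountSequenceSizeReleaseStrategy;

    @Bean
    public ReleaseStrategy timeoutReleaseStrategy() {
        // release when the sequence completes, or when the group has been
        // waiting for 30 seconds, whichever comes first (count threshold disabled)
        return new TimeoutCountSequenceSizeReleaseStrategy(Integer.MAX_VALUE, 30_000L);
    }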

Bloomberg Java API - bond yield in real time subscription

Goal:
I use the Bloomberg Java API's subscription service to monitor bond prices in real time (subscribing to the ASK/BID real-time fields). However, in the RESPONSE messages Bloomberg does not provide the associated yield for the given price. I need a way to calculate the yields.
Attempt:
Here's what I've tried:
Within the code that processes events coming back from the real-time subscription, when I get a BID or ASK response I extract the price from the message element and then initiate a new synchronous reference data request, using overrides to get YAS_BOND_YLD by providing YAS_BOND_PX and setting the override flag.
Problem:
This seems very slow and cumbersome. Is there a better way other than having to calculate yields myself?
In my code, I seem to be able to process real-time prices only if they are sent to me slowly. If a few bonds' prices are updated at the same time (say, in MSG1 pricing), I seem to capture only one of these updates; it feels like I'm missing the other events. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
Thanks.
Bloomberg does not provide the associated yield for the given price
Have you tried retrieving the ASK_YIELD and BID_YIELD fields? They may be what you are looking for.
Problem: This seems very slow and cumbersome.
Synchronous one-off requests are slower than a real-time subscription. Unless you need real-time data on the yield, you could queue the requests and send them all at once every x seconds, for example. The time to get 100 yields versus 1 is probably not that different, and certainly not 100 times slower.
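A rough sketch of such a one-off request with the Bloomberg Java API (the security and price are placeholders; 'session' is an already-started Session with the //blp/refdata service opened):

    import com.bloomberglp.blpapi.CorrelationID;
    import com.bloomberglp.blpapi.Element;
    import com.bloomberglp.blpapi.Request;
    import com.bloomberglp.blpapi.Service;

    Service refData = session.getService("//blp/refdata");
    Request request = refData.createRequest("ReferenceDataRequest");
    request.append("securities", "XS0123456789 Corp"); // placeholder bond
    request.append("fields", "YAS_BOND_YLD");

    // override the price that the yield is computed from
    Element overrides = request.getElement("overrides");
    Element override = overrides.appendElement();
    override.setElement("fieldId", "YAS_BOND_PX");
    override.setElement("value", 101.25);

    session.sendRequest(request, new CorrelationID(1));

Note that overrides in a ReferenceDataRequest apply to every security in the request, so bonds with different prices need separate requests, which you can still send back-to-back and match up via their CorrelationIDs.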
In my code, I seem to be able to process real-time prices only if they are sent to me slowly. If a few bonds' prices are updated at the same time (say, in MSG1 pricing), I seem to capture only one of these updates; it feels like I'm missing the other events. Is this because I cannot use a synchronous reference data request while the subscription is still alive?
You should not miss items just because you are sending a synchronous request. You may get a "Slow consumer warning" but that's about it. It's difficult to say more without seeing your code. However, if you want to make sure your real time data is not delayed by your synchronous requests, you should use two separate Sessions.