Different Pseudo Clock for groups of Facts - drools

I am new to Drools / Fusion (7.x) and am not sure how to solve this requirement. Assume I have event objects such as Event{long: timestamp, id: string}, where id identifies a physical asset (like a tractor) and timestamp represents the time the event fired relative to the asset. In my scenario these events do not arrive in my system in 'real time': they can be seconds, minutes or even days late. And my rules system needs to monitor multiple assets. Given this, when rules are evaluated the clock needs to be relative to the asset being monitored; it can't be a clock that spans assets.
I'm aware of the pseudo clock; is there a way to assign a pseudo clock per asset?
My assumption is that a clock must always progress forward or temporal functions will not work properly. Take, for example, the following scenario:
Fact A for Asset 1 arrives at 1:00; it is inserted into memory and rules are fired. Then Fact B arrives for the same Asset 1 at 2:00. It too is inserted and rules are fired. Now Fact Z arrives for Asset 2 at 1:30 (30 minutes behind the clock). I'm assuming I shouldn't simply wind the clock backwards and evaluate; furthermore, I'd want to set the clock back to 2:00 afterwards, since that was the "latest" data I received. Now assume I am monitoring thousands of assets, all sending data at different times...
The best way I can think of to address this is to keep a clock per asset and then save the engine state when each asset's data is evaluated. Can individual KieSessions have different clocks, or is the clock at a container level?
Sample rule: When Fact 1 arrives after Fact 2 for the same Asset.

You're approaching the problem incorrectly. Regardless of whether you're using a realtime or pseudo clock, you're using a clock. You can't say "Fact #1 uses clock A, and Fact #2 uses clock B."
Instead you should be leveraging the metadata tags for events, specifically the @timestamp tag. This tag tells Drools that a specific field inside the event is the timestamp for the event, rather than the actual time the fact enters working memory.
For example:
import com.example.SampleEvent

declare SampleEvent
    @role( event )
    // this field is actually in the object; it's not the time the fact was inserted
    @timestamp( createdDateTime )
end
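With the timestamp taken from the event itself, temporal operators compare those event timestamps rather than insertion times. As a sketch of the question's sample rule, assuming SampleEvent also carries an assetId field (a name not in the original):

rule "Fact 1 after Fact 2 for the same Asset"
when
    $earlier : SampleEvent( $id : assetId )
    $later   : SampleEvent( assetId == $id, this after $earlier )
then
    // react to the ordering, e.g. insert a derived fact or log it
end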
Not knowing anything about what your rules are actually doing, the major issue I can foresee here is that if your rules rely on the temporal operators or define an expiry (@expires), they're not going to work and you'll need to redesign them. Especially for expirations: once an event expires, it is removed from working memory; when your out-of-band events come in, any previously expired events are already gone and can't be worked against.
Of course that concern would be true regardless of whether you use @timestamp or your original "different pseudo clock" plan. Either way you're going to have to manage the fact that events cannot live forever in working memory -- you will eventually run out of resources and your system will crash. Events must be evicted at some point, so you'll need to design around that in both your models and your rules.
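As for the narrower question about sessions: the clock type is configured per KieSession through its KieSessionConfiguration, not at the container level, so separate sessions can have separate clocks, but every fact in one session shares that session's single clock. A minimal sketch:

import java.util.concurrent.TimeUnit;
import org.kie.api.KieServices;
import org.kie.api.runtime.KieSession;
import org.kie.api.runtime.KieSessionConfiguration;
import org.kie.api.runtime.conf.ClockTypeOption;
import org.kie.api.time.SessionPseudoClock;

public class PseudoClockExample {
    public static void main(String[] args) {
        KieServices ks = KieServices.Factory.get();

        // The clock type is a session-level option, not a container-level one.
        KieSessionConfiguration conf = ks.newKieSessionConfiguration();
        conf.setOption(ClockTypeOption.get("pseudo"));

        KieSession session = ks.newKieClasspathContainer().newKieSession(conf);

        // All facts in this session share this one clock, and it only moves forward.
        SessionPseudoClock clock = session.getSessionClock();
        clock.advanceTime(30, TimeUnit.MINUTES);

        session.dispose();
    }
}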


Swift: How can I change backgroundFetchIntervalMinimum with BGAppRefreshTask

I am changing my deprecated setMinimumBackgroundFetchInterval() to the new BGAppRefreshTask, but every example uses an actual time interval, like this:
request.earliestBeginDate = Date(timeIntervalSinceNow: 15 * 60) // Fetch no earlier than 15 minutes from now
but my current source used backgroundFetchIntervalMinimum like...
UIApplication.shared.setMinimumBackgroundFetchInterval(UIApplication.backgroundFetchIntervalMinimum)
so how can I apply backgroundFetchIntervalMinimum with BGAppRefreshTask?
Setting the request's earliestBeginDate is the equivalent of the old minimum fetch interval. Just use earliestBeginDate as you have seen in those other examples. And when your app runs a background fetch, you schedule the next background fetch, with its own earliestBeginDate, at that point. That is how BGAppRefreshTask works.
If you are asking what is equivalent to UIApplication.backgroundFetchIntervalMinimum, you can just set earliestBeginDate to nil. The documentation tells us that:
Specify nil for no start delay.
Setting the property indicates that the background task shouldn’t start any earlier than this date. However, the system doesn’t guarantee launching the task at the specified date, but only that it won’t begin sooner.
The latter caveat applies to backgroundFetchIntervalMinimum, too. Background fetch, regardless of which mechanism you use, is at the discretion of the OS. It attempts to balance the app’s desire to have current data ready for the user the next time they launch the app with other considerations (such as preserving the device battery, avoiding excessive use of the cellular data plan, etc.).
The earliestBeginDate (like the older backgroundFetchIntervalMinimum) is merely a mechanism whereby you can give the OS additional hints to avoid redundant fetches. E.g., if your backend only updates a resource every twelve hours, there is no point in performing backend refreshes much more frequently than that.
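Putting those two points together, a minimal sketch of the reschedule-on-each-run pattern; the identifier "com.example.refresh" is a placeholder, and it must also be listed under BGTaskSchedulerPermittedIdentifiers in Info.plist:

import BackgroundTasks

// Hypothetical identifier; must match an entry in Info.plist.
let refreshTaskID = "com.example.refresh"

func scheduleAppRefresh() {
    let request = BGAppRefreshTaskRequest(identifier: refreshTaskID)
    request.earliestBeginDate = nil   // no start delay, like backgroundFetchIntervalMinimum
    do {
        try BGTaskScheduler.shared.submit(request)
    } catch {
        print("Could not schedule app refresh: \(error)")
    }
}

// In application(_:didFinishLaunchingWithOptions:):
_ = BGTaskScheduler.shared.register(forTaskWithIdentifier: refreshTaskID, using: nil) { task in
    scheduleAppRefresh()              // immediately schedule the next refresh
    // ... perform the actual fetch, then:
    task.setTaskCompleted(success: true)
}

If you later want to hint at a longer gap (say your backend only updates every twelve hours), set earliestBeginDate to a future date instead of nil.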

Event sourcing, Eventually Consistent, CQRS

I am currently stuck on a problem and hope you can help me with some ideas or best practices here...
Let's say I have an application using Event Sourcing and CQRS. I have
A green pot containing a number
A red pot containing a number
A setting where I define which pot's number should be displayed in the UI.
A calculated result that contains the number to be displayed
The current state of my application is
Red pot: 10
Green pot: 20
Setting: Red
Result: 10 (value from red pot)
I have a Calculation service that subscribes to Red Pot service, Green Pot service and Settings service. I have a View Updater service that additionally subscribes to the Calculation service and updates the read model on any changes.
Now the following events are dropping in:
Green Pot: 25
Setting: Green Pot
The View Updater service is a bit busy today and has some delay in updating the view model.
The Calculation service handles the Green Pot event. It fetches the setting from the read model (which is still set to Red) and does nothing.
After that, the Calculation service handles the Settings event. It fetches the Green value (which is still 20) from the read model and sends a new event (Result: 20).
After that the View Updater handles both events and updates the read model.
In this case, my application is not consistent - not even eventually.
Do you have any thoughts on how to handle something like this? Thanks for any ideas :-)
First thought is that it isn't clear that you are sharing the common understanding of eventual consistency. Martin Kleppmann's talk emphasized three ideas:
Eventual Delivery
Convergence
No data loss
Second thought is that you seem to have introduced a race condition into your Calculation service design. If RedPot, GreenPot, and Setting are separate aggregates/streams, then there really isn't any time binding between them. The arrival of events from those sources is inherently racy.
Udi Dahan wrote Race Conditions Don't Exist
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
This is where convergence comes into play: you need to design your solutions so that they reach the same results even though the timing of messages is different. This often means that you need to include in your model some concept of a clock, or time, and some way of defining the interval in which some result is true.
As you've defined the problem, the Results produced by the calculation service are more about caching a snapshot than maintaining a history. So a different way to think about the problem you are having is to consider that the calculation service should not accept any arbitrary data from the read model, but rather one that aligns with the events that it is consuming.
Calculation service: "Hi, I can haz green pot as of event #40?"
Read model: "Sorry, no can haz, retry-after: 10 minutes"
Calculation service: "OK, I'll try again later."
The Calculation service should subscribe to and receive all of the events (GreenPotUpdated, RedPotUpdated and SettingsChanged).
It is not OK for the Calculation service to depend on an eventually consistent read model. Instead it should maintain its own private state, ensuring that the events are received in the correct order.
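A minimal sketch of that private-state approach, with hypothetical stand-ins for the event types (uses Java 16+ pattern matching):

import java.util.Optional;

// Keeps its own state instead of querying the eventually consistent read model.
public class CalculationService {
    record GreenPotUpdated(int value) {}
    record RedPotUpdated(int value) {}
    record SettingsChanged(String pot) {}   // "Red" or "Green"

    private int redPot;
    private int greenPot;
    private String setting = "Red";

    // Applies one event and returns the new Result to publish, if any.
    public Optional<Integer> handle(Object event) {
        if (event instanceof GreenPotUpdated e) {
            greenPot = e.value();
        } else if (event instanceof RedPotUpdated e) {
            redPot = e.value();
        } else if (event instanceof SettingsChanged e) {
            setting = e.pot();
        } else {
            return Optional.empty();
        }
        return Optional.of("Red".equals(setting) ? redPot : greenPot);
    }
}

Replaying the scenario above against this service: the GreenPotUpdated(25) event changes nothing visible (the setting is still Red), and the subsequent SettingsChanged("Green") event now produces Result: 25, no matter how far behind the View Updater is.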

Can I use Time as globally unique event version?

I find time to be the best value for an event version.
I can merge perfectly independent events of different event sources on different servers whenever needed, without worrying about read-side event order synchronization. I know which event (from server 1) happened before the other (from server 2) without needing a global sequential event id generator, which would make all read sides depend on it.
As long as the time is a globally ever-increasing event version, different teams in companies can act as distributed event sources or event readers, and everyone can always rely on the contract.
The world's simplest notification from a write side to subscribed read sides, followed by a query pulling the recent changes from the underlying write side, can simplify everything.
Are there any side effects I'm not aware of ?
Time is indeed increasing and you get a deterministic number; however, event versioning does not only serve the purpose of preventing conflicts. When we commit a new event to the event store, we send the new event version there as well, and it must match the expected version on the event store side, which must be the previous version plus exactly one. Whether there are a thousand or three million ticks between two events, I do not really care; that does not give me the information I need. But it is critical to know if I have missed an event along the way. So I would not use anything other than an incremental counter, with events versioned per aggregate/stream.
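A minimal sketch of that expected-version check, using a hypothetical in-memory store for a single stream:

import java.util.ArrayList;
import java.util.List;

// An append succeeds only if the caller's new version is the stream's current
// version plus exactly one, which both detects concurrent writers and makes a
// missed event obvious.
public class StreamEventStore {
    private final List<Object> events = new ArrayList<>();  // one aggregate/stream

    public synchronized void append(Object event, long newVersion) {
        long expected = events.size() + 1;   // versions are 1-based and gap-free
        if (newVersion != expected) {
            throw new IllegalStateException(
                "Version conflict: expected " + expected + " but got " + newVersion);
        }
        events.add(event);
    }
}

Appending with a stale or skipped version fails fast, which is exactly the signal a timestamp cannot give you.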

Reactive systems - Reacting to time passing

Let's say we have a reactive sales forecasting system.
Every time we make a sale we re-calculate our Forecast for future sales.
This works beautifully if there are lots of sales triggering our re-forecasting.
What happens, however, if sales go from 100 events per second to 0, and stay at 0 for a long time?
The forecast we published back when sales were good remains the most up-to-date forecast.
How would you model, in this situation, an event that represents 'no sales happening', without falling back to some batch hourly/minutely/arbitrary time-segment event that says 'X time has passed'?
This is a specific case of a generic question: how do you model time passing with nothing happening in an event-based system, without using a ticking-clock-style event which would wake everyone up to reconsider their current values [an implementation which would not scale]?
The only option I have considered that makes sense:
Every time we take a sale, we also schedule a deferred event 2 hours in the future that asks us to reconsider our assessment of that sale.
In handling that deferred event we may then choose to schedule further deferred events for re-considering.
Considering this is a very generic scenario, you've made a rather large assumption: that it's not possible to come up with a design for re-evaluating past sales in a scalable way unless it's done one sale at a time.
There are many different scale related numbers in the scenario, and you're only looking at the one whereby a single scheduled forecast updater may attempt to process a very large number of past sales at the same time.
Other scalability issues I can think of:
Re-evaluating the forecast for every single new sale doesn't sound great if you're expecting hundreds of sales per second. If you're talking about a financial forecasting model for accounting, it's unlikely it needs to be updated every single time the organisation makes a sale.
If you're talking about a short-term predictive engine to be used for financial markets (i.e. predicting how much cash you'll need in the next 10 seconds, or energy, or other resources), then it sounds like you have constant volatility and you're not really likely to have a situation where nothing happens for hours. And if you do need forecasts updated very frequently, waiting a couple of hours before triggering a re-update is not likely to get you the kind of information you need in the way you need it.
With your approach, you will end up with one future scheduled event per product (which could be large), and every time you make a sale, you'll be dropping the old scheduled event and scheduling a new one. So for frequently selling products, you'll be doing repetitive work to constantly kick the can down the road a bit further, when you're not likely to ever get there.
What constitutes a good design is going to be based on the real scenario. The generic case is interesting to think about, but good designs need to be shaped to their circumstances.
Here are a few ideas I have that might be appropriate:
If you want an updated forecast per product when that product has a sale, but some products can sell very frequently, then a good approach may be to throttle or buffer the sales on a per-product basis. If a product is selling 50 times a second, you can probably afford to wait 1 second, 10 seconds, 2 hours, whatever, and evaluate all those sales at once rather than re-forecasting 50 times a second (see the sketch after these ideas). Especially if your forecasting process is heavy, doing it for every sale is likely to cause high load for low value, as the information will be outdated almost straight away by the next sale.
You could also have a generic timer that updates forecasts for all products that haven't sold in the last window, but handle the products in buffers. For example, every hour you could pick the 10 products with the oldest forecasts and update them. This prevents the single timer from taking on re-forecasting the entire product set in one hit.
You could use only the single timer approach above and forget the forecast updates on every sale if you want something dead simple.
If you're worried about load from batch forecasting, this kind of work should be done on different hardware from the ones handling sales.
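As a sketch of the first idea above (per-product buffering), here is one way it might look; the reforecast hook and the window length are assumptions, not part of the original scenario:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Buffers sales per product and re-forecasts at most once per window,
// instead of once per sale.
public class ThrottledForecaster {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private final Map<String, LongAdder> pending = new ConcurrentHashMap<>();
    private final long windowSeconds;

    public ThrottledForecaster(long windowSeconds) {
        this.windowSeconds = windowSeconds;
    }

    public void onSale(String productId) {
        // The first sale in a window schedules exactly one re-forecast;
        // subsequent sales in the same window just accumulate.
        pending.computeIfAbsent(productId, id -> {
            scheduler.schedule(() -> flush(id), windowSeconds, TimeUnit.SECONDS);
            return new LongAdder();
        }).increment();
    }

    private void flush(String productId) {
        LongAdder count = pending.remove(productId);
        if (count != null) {
            reforecast(productId, count.sum());
        }
    }

    void reforecast(String productId, long salesInWindow) {
        // hypothetical hook: run the forecasting model once over the buffered sales
    }
}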

'In-Crate' updates instead of client programs?

This is a simplified example scenario.
I collect interval temperature data from a room's heating, let's say every minute.
Additionally, there is a light switch, which sends its status (on/off) when someone switches the light on or off. All events are stored in Crate with their timestamps.
I'd like to query all temperature readings while the light switch is in the "on"-state.
My design is to denormalize the data: each temperature event gets a new field which contains the light-switch status, and my query then breaks down to a simple filter. However, I only have events when someone presses the switch, not continuous readings. So I need to read out all light-switch data, reconstruct the switch's state over time, and update all temperature data accordingly.
Using Crate, is there a way of doing all of this within Crate using Crate's SQL only, i.e. 'in-Crate' data updates? I do not want to set up and maintain external client programs for such operations.
In more complex scenarios, I may also face the problem of first reading a huge amount of data via a single client program in order to then update other data stored in the same data store. This 'bottleneck' design approach worries me. Any ideas?
Thanks,
nodot