Error with pseudo clock in Drools when I have two rules matching different events

I want to test Drools 6.3 with a scenario, but I have a problem in a special situation.
Here is my scenario in a simple form:
I have two systems, A and B, in a simulated network that generate events. I want to write two rules to find patterns in these events. The two rules for testing this scenario are:
declare A
    @role( event )   // declared as events so window:time applies
    @timestamp( timestampA )
end

declare B
    @role( event )
    @timestamp( timestampB )
end

rule "1"
when
    accumulate( A() over window:time( 10s ); $s : count(1); $s > 1 )
then
    System.out.println( "Rule 1 matched" );
end

rule "2"
when
    B()
then
    System.out.println( "Rule 2 matched" );
end
The timestamp of each event is the timestamp from the log generated in each system, not the time it is received by Drools and inserted into working memory.
I'm using STREAM mode with the pseudo clock, because events from system B arrive with a 25-minute delay due to network congestion, so I have to adjust the session clock manually. The session clock is set to the timestamp of each event as it is inserted into the session, and all rules are fired after every insertion.
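Roughly, the kind of setup and insert loop described above looks like this (a minimal sketch, assuming the rules are loaded from the classpath and that each fact carries its log timestamp in milliseconds; the class and method names are illustrative):

import java.util.concurrent.TimeUnit;
import org.kie.api.KieBaseConfiguration;
import org.kie.api.KieServices;
import org.kie.api.conf.EventProcessingOption;
import org.kie.api.runtime.KieSession;
import org.kie.api.runtime.KieSessionConfiguration;
import org.kie.api.runtime.conf.ClockTypeOption;
import org.kie.api.time.SessionPseudoClock;

public class PseudoClockFeeder {

    private final KieSession session;
    private final SessionPseudoClock clock;

    public PseudoClockFeeder() {
        KieServices ks = KieServices.Factory.get();

        // STREAM mode on the knowledge base, pseudo clock on the session
        KieBaseConfiguration baseConf = ks.newKieBaseConfiguration();
        baseConf.setOption(EventProcessingOption.STREAM);
        KieSessionConfiguration sessionConf = ks.newKieSessionConfiguration();
        sessionConf.setOption(ClockTypeOption.get("pseudo"));

        session = ks.getKieClasspathContainer()
                .newKieBase(baseConf)
                .newKieSession(sessionConf, null);
        clock = session.getSessionClock();
    }

    // called for every incoming A or B fact, with the timestamp taken from its log
    public void insertAndFire(Object fact, long logTimestampMillis) {
        // set the pseudo clock to the event's log timestamp
        // (for the late B events this means a negative delta)
        clock.advanceTime(logTimestampMillis - clock.getCurrentTime(), TimeUnit.MILLISECONDS);
        session.insert(fact);
        session.fireAllRules();
    }
}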
When the order of receiving and inserting events is as below, the rules match correctly:
Event A received at 10:31:21 – Session clock : 10:31:21 – insert A and fire
Event A received at 10:31:23 - Session clock : 10:31:23 – insert A and fire
Rule 1 matched
Event B received at 10:06:41 - Session clock : 10:06:41 – insert B and fire
Rule 2 matched
But when the order of receiving and inserting events is as below, they match incorrectly:
Event A received at 10:31:21 – Session clock : 10:31:21 – insert A and fire
Event B received at 10:06:41 - Session clock : 10:06:41 – insert B and fire
Rule 2 matched
Event A received at 10:31:23 - Session clock : 10:31:23 – insert A and fire
When the second A event is inserted, two A events within the last 10 seconds are in working memory, but rule 1 does not match. Why?

What you are doing is somewhat in conflict with the assumptions underlying the CEP (Complex Event Processing) features of Drools. STREAM mode implies that events should be inserted in the order of their timestamps, irrespective of their origin. Setting the pseudo clock back and forth in big jumps is another good way to confuse the engine.
Don't use STREAM mode or window:time, and forget about session clocks.
You have facts containing timestamps, and you can easily write your rules by referring to these timestamps, either using plain old arithmetic or by applying the temporal operators (which are nothing but syntactic sugar for testing the relation of long timestamp values).
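In plain Java terms (a trivial sketch; the getter names simply follow the timestampA/timestampB properties from the declarations in the question), the comparisons such constraints boil down to are just arithmetic on long values:

long tA = a.getTimestampA();   // log timestamp of an A event, in milliseconds
long tB = b.getTimestampB();   // log timestamp of a B event, in milliseconds

boolean aBeforeB      = tA < tB;                      // what an "A before B" test amounts to
boolean withinTenSecs = Math.abs(tB - tA) <= 10_000L; // both events within a 10-second span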

Related

AnyLogic triggering an event

Accordingly, I have a question about the modeling process. I want my agent to have an event that is triggered by both time and a condition, for example: goToSchool if it is past 6 am and there is a school bus. I am confused about whether to use the timeout trigger (but then I cannot use the condition) or the condition trigger (but then I cannot use the timeout), or whether there is any possible alternative.
In your example, "if it is more than 6 am" is a condition and not a timeout. A timeout trigger is used when you want an event to happen at an exact time. In your case, while "more than 6 am" is time-related, it is still a condition. So I would use a condition-triggered event with two conditions:
getHourOfDay() > 6 && <bus condition>
The getHourOfDay() function returns the hour of the day in 24-hour format.
You need to keep in mind something important about condition-triggered events: they are only evaluated "on change". I recommend you read this carefully:
https://help.anylogic.com/index.jsp?topic=%2Fcom.anylogic.help%2Fhtml%2Fstatecharts%2Fcondition-event.html
My recommendation would be to use the onChange() function in the block controlling your bus arrival so that the condition is evaluated each time a bus arrives.
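As a rough illustration of how the pieces could fit together (the busAtStop flag is a hypothetical variable your model would maintain; getHourOfDay() and onChange() are the AnyLogic functions mentioned above):

// Condition field of the condition-triggered event "goToSchool":
getHourOfDay() > 6 && busAtStop

// In the action of the block where a bus arrives (e.g. its "On enter" action),
// update the flag and force condition events to be re-evaluated:
busAtStop = true;
onChange();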

How to replay in a deterministic way in CQRS / event-sourcing?

In CQRS / ES based systems, you store events in an event-store. These events refer to an aggregate, and they have an order with respect to the aggregate they belong to. Furthermore, aggregates are consistency / transactional boundaries, which means that any transactional guarantees are only given on a per-aggregate level.
Now, suppose I have a read model which consumes events from multiple aggregates (which is perfectly fine, AFAIK). To be able to replay the read model in a deterministic way, the events need some kind of global ordering across aggregates – otherwise you wouldn't know whether to replay events for aggregate A before or after the ones for B, or how to intermix them.
The simplest solution to achieve this is using a timestamp on the events, but typically timestamps are not fine-grained enough (or, to put it another way, not all databases are created equal). Another option is to use a global sequence, but this is bad performance-wise and hinders scaling.
How do you solve this issue? Or is my basic assumption, that replays of read models should be deterministic, wrong?
I see these options:
Global sequence
If your database allows it, you can use timestamp + aggregateId + aggregateVersion as an index. This usually doesn't work well in the distributed-database case.
In a distributed database you can use a vector clock to get a global sequence without a lock.
Event sequence inside each read model. You can literally store all events in the read model and sort them as you want before applying a projection function (a sketch of this follows below).
Allow non-determinism and deal with it. For instance, in your example, if there is no group when the add_user event arrives, just create an empty group record in the read model and add the user. When the create_group event arrives, update that group record. After all, you have checked in the UI and/or command handler that there is a group with this aggregateId, right?
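A rough sketch of the "event sequence inside each read model" option, assuming hypothetical Event and Projection types (the field names are illustrative, not a fixed schema):

import java.util.Comparator;
import java.util.List;

void replay(List<Event> events, Projection projection) {      // Event/Projection are hypothetical types
    // Order the stored events deterministically, then project them in that order.
    events.sort(Comparator
            .comparing(Event::getTimestamp)                   // coarse global order
            .thenComparing(Event::getAggregateId)             // tie-break across aggregates
            .thenComparing(Event::getAggregateVersion));      // per-aggregate order
    for (Event e : events) {
        projection.apply(e);
    }
}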
How do you solve this issue?
It's a known issue, and of course neither simple timestamps, nor a global sequence, nor other naïve methods will help.
Use a vector clock with a weak timestamp to enumerate your events, and a vector cursor to read them. That guarantees a stable, deterministic order for intermixing events between aggregates. This works even if each thread has a clock synchronization gap, which is the regular case for database clusters, because perfect timestamp synchronization is impossible.
This also automatically makes it possible to seamlessly mix reading events from the event store and from an event bus later, and it avoids any database locks across events of different aggregates.
Algorithm draft:
1) Determine the real number of simultaneous transactions in your database, e.g. the maximum number of workers in the cluster.
Since every event is written in exactly one transaction on one thread, you can determine its unique id as the tuple (thread number, thread counter), where the thread counter is the number of transactions processed on the current thread.
Calculate the event's weak timestamp as MAX(thread timestamp, aggregate timestamp), where the aggregate timestamp is the timestamp of the last event for the current aggregate (a sketch of this follows below).
2) Prepare a vector cursor for reading events across the thread-number boundary. Read events from each thread sequentially until the timestamp gap exceeds the allowed value. The allowed weak-timestamp gap is a trade-off between event-reading performance and preserving the native event order.
The minimal value is the cluster threads' clock-synchronization delta, so events arrive in their native inter-aggregate order. The maximum value is infinity, so events will be split by aggregate. When using an RDBMS like Postgres, this value can be determined automatically via a smart SQL query.
There is a reference implementation for a PostgreSQL database for saving and loading events. Saving performance is about 10,000 events per second on a 4 GB RAM RDS Postgres cluster.
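A minimal Java sketch of the id and weak-timestamp assignment from step 1 (the names are illustrative, not taken from the reference implementation):

// Each writer thread keeps its own counter and clock; an event's id is
// (thread number, thread counter) and its weak timestamp is the maximum of
// the thread clock and the last event timestamp of the same aggregate.
final class EventId {
    final int threadNumber;
    final long threadCounter;
    final long weakTimestamp;

    EventId(int threadNumber, long threadCounter, long weakTimestamp) {
        this.threadNumber = threadNumber;
        this.threadCounter = threadCounter;
        this.weakTimestamp = weakTimestamp;
    }

    static EventId next(int threadNumber, long previousThreadCounter,
                        long threadClockMillis, long lastAggregateTimestampMillis) {
        long weakTimestamp = Math.max(threadClockMillis, lastAggregateTimestampMillis);
        return new EventId(threadNumber, previousThreadCounter + 1, weakTimestamp);
    }
}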

Flink session window with onEventTime trigger?

I want to create an EventTime-based session window in Flink, such that it triggers when the event time of a new message is more than 180 seconds greater than the event time of the message that created the window.
For example:
t1(0 seconds) : msg1 <-- This is the first message which causes the session-windows to be created
t2(13 seconds) : msg2
t3(39 seconds) : msg3
.
.
.
.
t7(190 seconds) : msg7 <-- The event time (t7) is more than 180 seconds after t1 (t7 - t1 = 190), so the window should be triggered and processed now.
t8(193 seconds) : msg8 <-- This message, and all subsequent messages, have to be ignored as this window was processed at t7
I want to create a trigger such that the above behavior is achieved through appropriate watermark or onEventTime trigger. Can anyone please provide some examples to achieve this?
The best way to approach this might be with a ProcessFunction, rather than with custom windowing. If, as shown in your example, the events are processed in timestamp order, then this will be pretty straightforward. If, on the other hand, you have to handle out-of-order events (which is common when working with event-time data), it will be somewhat more complex. (Imagine that msg6, for time 187, arrives after t8. If that's possible, and if it would affect the results you want to produce, then this has to be handled.)
If the events are in order, then the logic would look roughly like this:
Use an AscendingTimestampExtractor as the basis for watermarking.
Use Flink state (perhaps ListState) to store the window contents. When an event arrives, add it to the window and check whether it has been more than 180 seconds since the first event. If so, process the window contents and clear the list (a sketch of this follows after these steps).
If your events can be out of order, then use a BoundedOutOfOrdernessTimestampExtractor, and don't process the window's contents until the current watermark indicates that event time is more than 180 seconds past the window's start time (you can use an event-time timer for this). Don't completely clear the list when triggering a window, but just remove the elements that belong to the window that is closing.
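For the in-order case, a rough sketch of such a ProcessFunction could look like this (Msg is a hypothetical event type with a getTimestamp() in milliseconds, and the stream is assumed to be keyed by a session key):

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SessionFunction extends KeyedProcessFunction<String, Msg, List<Msg>> {

    private transient ValueState<Long> windowStart;   // event time of the first message
    private transient ValueState<Boolean> fired;      // has this window already been processed?
    private transient ListState<Msg> contents;        // messages collected so far

    @Override
    public void open(Configuration parameters) {
        windowStart = getRuntimeContext().getState(
                new ValueStateDescriptor<>("windowStart", Long.class));
        fired = getRuntimeContext().getState(
                new ValueStateDescriptor<>("fired", Boolean.class));
        contents = getRuntimeContext().getListState(
                new ListStateDescriptor<>("contents", Msg.class));
    }

    @Override
    public void processElement(Msg msg, Context ctx, Collector<List<Msg>> out) throws Exception {
        if (Boolean.TRUE.equals(fired.value())) {
            return;                                   // window was processed; ignore later messages
        }
        Long start = windowStart.value();
        if (start == null) {
            windowStart.update(msg.getTimestamp());   // first message creates the window
            contents.add(msg);
        } else if (msg.getTimestamp() - start > 180_000L) {
            // more than 180 s after the first message: emit and close the window
            // (msg itself is not added here; adjust if it should be included)
            List<Msg> window = new ArrayList<>();
            for (Msg m : contents.get()) {
                window.add(m);
            }
            out.collect(window);
            contents.clear();
            fired.update(true);
        } else {
            contents.add(msg);
        }
    }
}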

RDBMS Event-Store: Ensure ordering (single threaded writer)

Short description about the setup:
I'm trying to implement a "basic" event store / event-sourcing application using an RDBMS (in my case Postgres). The events are general-purpose events with only some basic fields like eventtime, location, action, formatted as XML. Due to this general structure, there is no way of partitioning them in a useful way. The events are captured by a Java application that validates the events and then stores them in an events table. Each event gets a UUID and a recordtime when it is captured.
In addition, there can be subscriptions from external applications, which should get all events matching a custom criterion. When a new matching event is captured, the event should be PUSHED to the subscriber. To ensure that the subscriber does not miss any event, I'm currently forcing the capture process to be single-threaded. When a new event comes in, a lock is taken, the event gets a recordtime set to the current time, and the event is finally inserted into the DB table (explicitly waiting for the commit). Then the lock is released. For a subscription which runs on a schedule, for example every 5 seconds, I track the recordtime of the last sent event and execute a query for new events like where recordtime > subscription_recordtime. When the matching events have been successfully pushed to the subscriber, the subscription_recordtime is set to the events' max recordtime.
Everything is actually working, but as you can imagine, a single-threaded capture process does not scale very well. Thus the main question is: how can I optimise this and allow, for example, multiple capture processes to run in parallel?
I already thought about setting the recordtime in the DB itself on insert, but since the order of commits cannot be guaranteed (JVM pauses), I think I might lose events when two capture transactions run at nearly the same time. If I understand the DB-generated timestamp correctly, it is set before the actual commit. Thus a transaction with a recordtime t2 can already be visible to the subscription query while another transaction with a recordtime t1 (t1 < t2) is still ongoing and has not been committed. The recordtime for the subscription will be set to t2, and so the event from transaction 1 will be lost...
Is there a way to guarantee the order on a DB level, so that events are visible in the order they are captured/committed? Every newly visible event must have a later timestamp than the event before (strictly monotonically increasing). I know about a full table lock, but I think I would then have the same performance penalties as before.
Is it possible to set the DB to use a single-threaded writer? Then each capture process would still wait for another write TX to finish, but on the DB level, which would be much better than a single-instance/single-threaded capture application. Or can I use a different field/id for tracking the current state? Normal sequence ids suffer for the same reasons.
Is there a way to guarantee the order on a DB level, so that events are visible in the order they are captured/committed?
You should not be concerned with global ordering of events. Your events should contain a Version property. When writing events, you should always be inserting monotonically increasing Version numbers for a given Aggregate/Stream ID. That really is the only ordering that should matter when you are inserting. For Customer ABC, with events 1, 2, 3, and 4, you should only write event 5.
A database transaction can ensure the correct order within a stream using the rules above.
For a subscription which runs on a schedule, for example every 5 seconds, I track the recordtime of the last sent event and execute a query for new events like where recordtime > subscription_recordtime.
Reading events is a slightly different story. Firstly, you will likely have a serial column to uniquely identify events. That will give you ordering and allow you to determine whether you have read all events. When you read events from the store, you may detect a gap in the sequence. This will happen if an insert was in flight when you read the latest events. In this case, simply re-read the data and see if the gap is gone. This requires your subscription to maintain its position in the index. Alternatively or additionally, you can read only events that are at least N milliseconds old, where N is a threshold high enough to compensate for delays in transactions (e.g. 500 or 1000).
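A rough sketch of such a read, assuming a table events(id bigserial, recordtime timestamptz, payload xml) and a plain JDBC connection (the table, column, and helper names are illustrative, not the question's actual schema):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Read events after the subscription's last seen id, but only those that are at least
// 500 ms old, so that in-flight inserts with smaller ids have time to become visible.
long pollNewEvents(Connection connection, long lastSeenId) throws SQLException {
    String sql = "SELECT id, payload FROM events "
               + "WHERE id > ? AND recordtime < now() - interval '500 milliseconds' "
               + "ORDER BY id";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setLong(1, lastSeenId);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                pushToSubscriber(rs.getLong("id"), rs.getString("payload")); // hypothetical delivery call
                lastSeenId = rs.getLong("id");
            }
        }
    }
    return lastSeenId;   // store this as the subscription's new position
}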
Also, bear in mind that there are open source RDBMS event stores that you can either use or leverage in your process.
Marten: http://jasperfx.github.io/marten/documentation/events/
SqlStreamStore: https://github.com/SQLStreamStore/SQLStreamStore

Drools Fusion Out of Order events

I am using Drools Fusion for processing real-time events. Each event has a timestamp field.
The issue is that events can sometimes be out of order. Can Drools Fusion handle this situation and, if so, how?
Thanks
If A is stamped 0:00:00 and B is stamped 0:01:00, and B arrives first, and you have the rule
rule "A before B"
when
    $b : B()
    not A( this before $b )
then
    ...
end
it will fire.
You can use fact insertion time as a timestamp.
Keep the original timestamp as a property. Maybe you'll want to look at it if a "situation" occurs. It depends.
Addition: If you have a maximum delay dt, you can put all arriving events into "quarantine" for this delay. Before you insert A, check all other streams (sources) for an event B that precedes A and react accordingly. Everything will react with a delay of at least dt.
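A minimal sketch of such a quarantine buffer in front of the session (Event is a hypothetical class with a getTimestamp() in milliseconds; this is plain buffering, not a Drools API):

import java.util.Comparator;
import java.util.PriorityQueue;
import org.kie.api.runtime.KieSession;

public class QuarantineBuffer {

    private final PriorityQueue<Event> quarantine =
            new PriorityQueue<>(Comparator.comparingLong(Event::getTimestamp));
    private final long dtMillis;                 // maximum expected delay between sources

    public QuarantineBuffer(long dtMillis) {
        this.dtMillis = dtMillis;
    }

    // Buffer every arriving event instead of inserting it right away.
    public void onArrival(Event e) {
        quarantine.add(e);
    }

    // Called periodically: release, in timestamp order, everything that is old enough
    // that no earlier-stamped event should still be on its way.
    public void release(KieSession session, long nowMillis) {
        while (!quarantine.isEmpty()
                && quarantine.peek().getTimestamp() <= nowMillis - dtMillis) {
            session.insert(quarantine.poll());
        }
        session.fireAllRules();
    }
}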