Count and Time window in Esper EPL

I have the following use case, which I'm trying to write in EPL without success. I'm generating analytics events of different types at different intervals (1min, 5min, 10min, ...). For one special kind of analytics, I need to collect 4 specific
analytics events of different types (from which I will compute another analytics event), emitted every interval (1min, 5min, 10min, ...). The condition is that at every whole interval, e.g., every whole minute 00:01:00, 00:02:00, I want either all 4 events returned, or nothing if they don't all arrive within some slack period afterwards (e.g., 2s).
case 1: events A, B, C, D arrive at 00:01:00.500, 00:01:00.600, 00:01:00.700, 00:01:00.800 - right after the fourth event arrives in Esper, the aggregated event with all 4 events is returned
case 2: the slack period is 2 seconds and events A, B, C, D arrive at 00:01:00.500, 00:01:00.600, 00:01:00.700, 00:01:02.200 - nothing is returned, as the last event arrives outside the slack period

You could create a trigger event every minute like this:
insert into TriggerEvent select * from pattern [every timer:schedule(date: '1970-01-01T00:00:00.0Z', period: 1 minute, repetitions: -1)]
The trigger that arrives every minute can kick off a pattern or context. A pattern would seem to be good enough. Here is something like that:
select * from pattern [every TriggerEvent -> (a=A -> b=B -> c=C -> d=D) where timer:within(2 seconds)]
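If the four collected events should themselves be published as the new analytics event, a minimal sketch (AggregatedEvent is an assumed output stream name, as are the A, B, C, D event types) could extend that pattern:

// hypothetical output stream; a, b, c, d carry the four matched events
insert into AggregatedEvent
select a, b, c, d from pattern [every TriggerEvent -> (a=A -> b=B -> c=C -> d=D) where timer:within(2 seconds)]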

Related

Grafana 2 queries in same panel but set query interval difference

I have 2 queries in a Grafana panel.
I want to run Query A every 5 min.
I want to run Query B every 10 min, so I can check the difference between the two values using a transform.
How can I set the query interval? I know I can change the scrape interval, but my goal here is to check pending messages and trigger an alert if the count doesn't change in 10 min. I am trying to get a count at the 1st minute, get the count again at the 10th minute, check the difference using a transform, and trigger an alert if there is no change (messages are not getting processed).
Using Grafana 7.
Thanks!

Esper - handle out of order events

I want to monitor whether an event does NOT arrive within 10 minutes of the arrival of an event with the same id.
This is the EPL I am currently using:
SELECT * FROM pattern[ every s=Order_Status(status="placed") -> (timer:interval(600 sec) and not e=Order_Status(status="delivered", id=s.id))]
Usually the placed event arrives before delivered, but sometimes, because of some lag in our systems, the delivered event happens to arrive before placed for some id.
Cases
time: 8:00 event: Order_Status{id=167, status="placed"}
time: 8:07 event: Order_Status{id=167, status="delivered"}
< No alert > (delivered within 10 minutes)
time: 8:00 event: Order_Status{id=189, status="placed"}
time: 8:17 event: Order_Status{id=189, status="delivered"}
< Alert> (delivered after 10 minutes)
time: 8:00 event: Order_Status{id=2637, status="delivered"}
time: 8:08 event: Order_Status{id=2637, status="placed"}
< Alert > (but it shouldn't alert; the problem is that the delivered event for this id arrived before placed)
As stated, I would get a false alert because the EPL pattern starts the window after the placed event and waits for a delivered event that has already arrived.
How do I handle this scenario of out-of-order events?
Note:
(Basically, I want to check for every id whether the time difference between placed and delivered is above a certain threshold.
I also have timestamp fields inside each event.)
You have a "not" in your pattern, which is used to detect the absence of events. Your requirement doesn't search for absence, so the "not" isn't right.
There is a requirements question as well: you don't state what happens when there are many A events and just one B event. Are there many matches, or just a match for the last A event, or the first A event, or something else?
Sample pattern:
pattern [A -> (timer:interval(10 minutes) and B)]
Or this is a join that would seem to match what you want:
select * from B unidirectional, A#time(10 minutes)#lastevent
The "A#time(10 minutes)#lastevent" keeps the last A event for up to 10 minutes.
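Mapped onto the question's Order_Status events, a sketch of that join (an illustration of the mechanics only, not the complete alerting logic) shows how it tolerates the delivered-before-placed ordering:

// each delivered event is kept for up to 10 minutes; when the matching
// placed arrives, the join still finds a delivered that came first
select p.id from Order_Status(status="placed") as p unidirectional,
  Order_Status(status="delivered")#time(10 minutes) as d
where d.id = p.id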

Use External Window Time Stamp to Debug Siddhi Stream Query

I am planning to use historical event traces (stored in JSON, with my own event timestamp recorded for each event) to debug the Siddhi stream queries that I have just created. My stream starts with:
from MyInputEventStream#window.externalTime(my_own_timestamp, 10 min)
select some_fields
insert into MyOutpuStream;
and I will input my events from the traces, one by one.
Suppose event 1 arrives at the specified my_own_timestamp = 1528905600000, which is 9 am PST, June 13, and event 2 arrives 11 minutes later, at my_own_timestamp = 1528906260000. I believe that I will get the output at MyOutpuStream at 9:10 am, as time_stamp(e2) - time_stamp(e1) > 10 min, and e2 will trigger the system after the window passes.
Now suppose event 1 arrives at my_own_timestamp = 1528905600000, that is, 9:00 am, but no events arrive in the next 2 hours. Do I still get the output at 9:10 am? In reality the window should expire at 9:10 am, independent of when the next event arrives. But it seems that in this case the internal timing of Siddhi would have to incorporate my event input's timestamp and then set the expiration time of the events based on the clock of the process on which Siddhi is running. Is this correct? Could you help clarify?
You won't get an output at 9:10 am, because with externalTime the event expiration logic is based entirely on the timestamp field that you defined. The window waits for an incoming event whose timestamp is greater than or equal to a previous event's timestamp plus the window length before it expires that previous event; in your second scenario, event 1 is expired only when the next event eventually arrives, not at 9:10 am wall-clock time.
What happens internally is roughly:

previousEvents = []
foreach currentEvent in currentEvents:        # events that are coming in
    currentTime = currentEvent.timestamp
    foreach previousEvent in previousEvents:
        previousTime = previousEvent.timestamp
        timeDiff = previousTime - currentTime + windowLength
        if timeDiff <= 0:
            remove previousEvent from previousEvents
            set expired timestamp of previousEvent to currentTime
            expire previousEvent
    previousEvents.add(currentEvent)
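
To feed a trace in one event at a time, a minimal sketch using the Siddhi 5 Java API might look like this (the package names are org.wso2.siddhi.* in Siddhi 4; the stream definition's field types are an assumption, while the query text follows the question):

import io.siddhi.core.SiddhiAppRuntime;
import io.siddhi.core.SiddhiManager;
import io.siddhi.core.stream.input.InputHandler;

public class ReplayTrace {
    public static void main(String[] args) throws Exception {
        String app =
            "define stream MyInputEventStream (my_own_timestamp long, some_fields string); " +
            "from MyInputEventStream#window.externalTime(my_own_timestamp, 10 min) " +
            "select some_fields insert into MyOutpuStream;";
        SiddhiManager manager = new SiddhiManager();
        SiddhiAppRuntime runtime = manager.createSiddhiAppRuntime(app);
        runtime.start();
        InputHandler in = runtime.getInputHandler("MyInputEventStream");
        in.send(new Object[]{1528905600000L, "e1"}); // 09:00:00
        // only this second send() expires e1; wall-clock time never does
        in.send(new Object[]{1528906260000L, "e2"}); // 09:11:00
        runtime.shutdown();
    }
}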

Is it possible to join 2 Kafka KStreams where the JoinWindows duration is stored in the object of 1 of the streams?

Let's say I have 2 streams:
TimeWindow (with begin time, end time)
Numbers (with timestamp)
Is it possible to use either the DSL API or the Processor API to join the streams such that the output contains a TimeWindow object holding the sum of the numbers that fall within the time range specified in the TimeWindow?
To be specific, how do you set XXX, where it is the duration stored in win.getDuration() and win is the one referenced in the ValueJoiner?
timeWindow.join(
    numbers,
    (ValueJoiner<TimeWindow, Number, TimeWindow>) (win, num) -> win.addToTotal(num),
    new JoinWindows(XXX, 0)
).to("output_Topic");
The "after" side of the JoinWindows is 0 because TimeWindow's timestamp is its end time. The XXX duration should be calculated as the TimeWindow's end time minus its begin time, in milliseconds.
Many thanks for any help!
Thanks to Matthias' insight, I ended up falling back to the Processor API, with custom TimestampExtractors and a local state store (Kafka Streams defaults to RocksDB), to implement this function.
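As an illustration of that direction, here is a minimal sketch of such a processor; the TimeWindow type, its getBeginTime()/getEndTime()/addToTotal() accessors (epoch millis), and the "numbers-store" window store that a sibling processor would fill from the numbers stream are all assumptions, not the poster's actual implementation:

import java.time.Instant;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class TimeWindowSumProcessor implements Processor<String, TimeWindow, String, TimeWindow> {
    private ProcessorContext<String, TimeWindow> context;
    private WindowStore<String, Long> numbersStore; // filled from the numbers stream by a sibling processor

    @Override
    public void init(ProcessorContext<String, TimeWindow> context) {
        this.context = context;
        this.numbersStore = context.getStateStore("numbers-store"); // hypothetical store name
    }

    @Override
    public void process(Record<String, TimeWindow> record) {
        TimeWindow win = record.value();
        long sum = 0;
        // scan the buffered numbers whose timestamps fall inside this window's range
        try (WindowStoreIterator<Long> it = numbersStore.fetch(
                record.key(),
                Instant.ofEpochMilli(win.getBeginTime()),
                Instant.ofEpochMilli(win.getEndTime()))) {
            while (it.hasNext()) {
                sum += it.next().value;
            }
        }
        win.addToTotal(sum);
        context.forward(record.withValue(win));
    }
}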

How can I schedule bi-weekly jobs?

My app requires users to schedule recurring events that can recur daily, weekly, monthly, or bi-weekly.
By bi-weekly, I mean every fortnight (14 days) starting from an arbitrary date value provided at the time of creation.
My jobs table has two columns to support this: job_frequency_id and job_frequency_value. I'm able to schedule all types except for bi-weekly.
The first col is an FK to the job_frequencies table; it contains daily, weekly, monthly, and bi-weekly values. The job_frequency_value contains the value corresponding to the frequency.
For example: if a job has job_frequency_id == 3 and job_frequency_value == 10, it will run on every 10th day of the month.
How do I add bi-weekly support without tampering with my db structure? I will use the job_frequency_value col to store the start date of the 14-day period, but I'm unsure of the calculation going forward.
Say your starting date is stored as a variable named 'createdDate'.
nextFortnight = DateAdd("ww", job_frequency_value*2, createdDate);
Can you wrap your scheduled task in a <cfif> and set it to run every week?
Something like:
<cfif DateDiff("ww", CreateDate(2011, 1, 1), Now()) MOD 2 EQ 1>
That way the scheduled task fires every week, but on even weeks the <cfif> is false and all your code inside it is skipped, so the job effectively runs bi-weekly.
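Following the question's own plan of storing the start date of the 14-day period in job_frequency_value, a day-based variant of the same trick (a sketch; it assumes a daily-scheduled task and that job_frequency_value holds a date) avoids the odd/even week bookkeeping:

<!--- run the body only when a whole number of fortnights has elapsed
      since the stored start date --->
<cfif DateDiff("d", job_frequency_value, Now()) MOD 14 EQ 0>
    <!--- execute the bi-weekly job here --->
</cfif>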