Siddhi CEP: Aggregate the String values of grouped events in a batch time window

I'm using Siddhi to reduce the number of events existing in a system. To do so, I declared a batch time window that groups all the events based on their target_ip.
from Events#window.timeBatch(30 sec)
select id as meta_ID, Target_IP4 as target_ip
group by Target_IP4
insert into temp;
The result I would like to have is a single event for each target_ip, with the meta_ID parameter value being the concatenation of the distinct meta_ID values of the events that form it.
The problem is that the previous query generates as many events as there are distinct meta_ID values. For example, I'm getting
"id_10", "target_1"
"id_11", "target_1"
And I would like to have
"id_10,id_11", "target_1"
I'm aware that some aggregation method is missing from my query. I saw a lot of aggregation functions in Siddhi, including the siddhi-execution-string extension with its str:concat method, but I don't know how to use it to aggregate the meta_ID values. Any idea?

You could write an execution plan as shown below, to achieve your requirement:
define stream inputStream (id string, target string);
-- Query 1
from inputStream#window.timeBatch(30 sec)
select *
insert into temp;
-- Query 2
from temp#custom:aggregator(id, target)
select *
insert into reducedStream;
Here, the custom:aggregator is the custom stream processor extension that you will have to implement. You can follow [1] when implementing it.
Let me explain a bit about how things work:
Query 1 creates a batch of events every 30 seconds.
At the end of every 30-second interval, that batch of events will be fed into the custom:aggregator stream processor. When input is received by the stream processor, its process() method will be invoked.
@Override
protected void process(ComplexEventChunk<StreamEvent> streamEventChunk, Processor nextProcessor,
                       StreamEventCloner streamEventCloner, ComplexEventPopulater complexEventPopulater) {
    //implement the aggregation & grouping logic here
}
The batch of events is available in the streamEventChunk. In the process() method, you can iterate over the streamEventChunk and create one output event per target, concatenating the ids as you go; a rough sketch is given below.
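For illustration, here is a minimal sketch of such an extension against the Siddhi 3.x API shipped with CEP 4.2.0 (not the actual implementation: the class name, the appended concatenated_ids attribute, and the assumption that the processor is registered through a custom.siddhiext file and invoked as custom:aggregator(id, target) are all illustrative):

import org.wso2.siddhi.core.config.ExecutionPlanContext;
import org.wso2.siddhi.core.event.ComplexEventChunk;
import org.wso2.siddhi.core.event.stream.StreamEvent;
import org.wso2.siddhi.core.event.stream.StreamEventCloner;
import org.wso2.siddhi.core.event.stream.populater.ComplexEventPopulater;
import org.wso2.siddhi.core.executor.ExpressionExecutor;
import org.wso2.siddhi.core.query.processor.Processor;
import org.wso2.siddhi.core.query.processor.stream.StreamProcessor;
import org.wso2.siddhi.query.api.definition.AbstractDefinition;
import org.wso2.siddhi.query.api.definition.Attribute;

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EventAggregatorStreamProcessor extends StreamProcessor {

    @Override
    protected List<Attribute> init(AbstractDefinition inputDefinition,
                                   ExpressionExecutor[] attributeExpressionExecutors,
                                   ExecutionPlanContext executionPlanContext) {
        // append one extra attribute that will carry the concatenated ids
        return Collections.singletonList(new Attribute("concatenated_ids", Attribute.Type.STRING));
    }

    @Override
    protected void process(ComplexEventChunk<StreamEvent> streamEventChunk, Processor nextProcessor,
                           StreamEventCloner streamEventCloner, ComplexEventPopulater complexEventPopulater) {
        // one representative event and one id list per target
        Map<String, StreamEvent> eventPerTarget = new LinkedHashMap<String, StreamEvent>();
        Map<String, StringBuilder> idsPerTarget = new LinkedHashMap<String, StringBuilder>();

        streamEventChunk.reset();
        while (streamEventChunk.hasNext()) {
            StreamEvent event = streamEventChunk.next();
            // executors 0 and 1 correspond to the id and target parameters of custom:aggregator(id, target)
            String id = (String) attributeExpressionExecutors[0].execute(event);
            String target = (String) attributeExpressionExecutors[1].execute(event);
            StringBuilder ids = idsPerTarget.get(target);
            if (ids == null) {
                idsPerTarget.put(target, new StringBuilder(id));
                eventPerTarget.put(target, streamEventCloner.copyStreamEvent(event));
            } else {
                ids.append(',').append(id);
            }
        }

        // emit a single event per target, populating the concatenated ids into the appended attribute
        ComplexEventChunk<StreamEvent> outputChunk = new ComplexEventChunk<StreamEvent>(false);
        for (Map.Entry<String, StreamEvent> entry : eventPerTarget.entrySet()) {
            StreamEvent outputEvent = entry.getValue();
            complexEventPopulater.populateComplexEvent(outputEvent,
                    new Object[]{idsPerTarget.get(entry.getKey()).toString()});
            outputChunk.add(outputEvent);
        }
        nextProcessor.process(outputChunk);
    }

    @Override
    public void start() {
        // no resources to acquire
    }

    @Override
    public void stop() {
        // no resources to release
    }

    @Override
    public Object[] currentState() {
        return new Object[0]; // stateless between batches
    }

    @Override
    public void restoreState(Object[] state) {
        // nothing to restore
    }
}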
[1] https://docs.wso2.com/display/CEP420/Writing+a+Custom+Stream+Processor+Extension

Related

Push query to ksqlDB not returning final result in first result row

I'm trying to get the count of events in a ksqlDB table within an arbitrary time window.
The table my_table was created with a WINDOW SESSION.
It is important to note the query is being run after all data was processed, and the ksqlDB server is basically doing nothing.
My query looks something like this:
SELECT count(*) AS count
FROM my_table
WHERE WINDOWSTART < (1602010972370 + 5000) AND WINDOWEND > 1602010972370
GROUP BY 1 EMIT CHANGES;
Running this kind of query will very often return one result row, and immediately afterwards a second result row with the actual "final" result.
It doesn't look like it's a result of values in the table not being "settled" yet, because if I repeat the same query (as many times as I want) I get exactly the same behavior.
I'm assuming there is some configuration value which will make ksqlDB wait just a little longer (on the order of one second) before it returns the result, so that I get the final result in the first row?
BTW, using emit final will not work on the query itself, since it only applies to "windowed queries".

Cumulocity CEP Event Query depending on other event

I have an event that is triggered when a device starts a process with a unique process id.
When the process stops, it sends another event with its timestamp and the same process id.
Now I want to calculate the total process time, i.e. subtract the start event's timestamp from the end event's timestamp.
I tried multiple ways to accomplish this but they all failed.
Is it possible to save an item from a query to a variable?
e.g.
select
#var = d.ProcessID
from table d
Or is it possible to use subqueries?
e.g.
select
d.TimeStamp
from table d
where d.ProcessID = (select
e.ProcessID
from table e)
Or if anyone has a different suggestion it would be great to have some input :)
Thanks in advance
Greets
You can use patterns to achieve this. Something like this might work:
select * from pattern [every a=StartEvent -> b=StopEvent(sourceId = a.sourceId, processId = a.processId)]
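To then get the total process time, you could compute the difference directly in the select clause. A hedged sketch, assuming TimeStamp is a numeric epoch-milliseconds property as named in the question (if it is a Date, convert it to milliseconds first):
select a.processId as processId, b.TimeStamp - a.TimeStamp as processTimeMs
from pattern [every a=StartEvent -> b=StopEvent(sourceId = a.sourceId, processId = a.processId)]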
For more info have a look at the Esper doc.

Is it possible to Group By a field extracted from JSON input in a Siddhi Query?

I currently have a stream with one string attribute that contains a JSON event.
This stream receives different events, to which I want to apply JSON path expressions so I can use the extracted attributes in filters and functions.
JsonPath extractors work like a charm in filters and selectors; unfortunately, I am not able to use them in the 'Group By' part.
I am actually doing this in an embedded Siddhi app with the siddhi-execution-json extension added manually, but for the sake of discussion, so everybody can easily check and test it, I will paste an example app that works on WSO2 Stream Processor.
The objective looks like the following App:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='query1')
from JsonStream#window.time(10 sec)
select json:getString(json, '$.myField') as myField, count() as count
group by myField having count > 1
insert into LogStream;
and it can accept the following events:
{"myField": "my_value"}
However, this query will raise the error:
Cannot find attribute type as 'myField' does not exist in 'JsonStream'; define stream JsonStream(json string)
I have also tried to use the JSON extractor directly in the 'Group by':
group by json:getString(json, '$.myField') as myField having count > 1
However the error now is:
mismatched input ':' expecting {',', ORDER, LIMIT, OFFSET, HAVING, INSERT, DELETE, UPDATE, RETURN, OUTPUT}
which suggests that extension functions are not expected there.
I am just wondering if it is possible to group by attributes that are not directly defined in the input stream. In this case it is a field extracted from a JSON object, but it could be any other function that generates another attribute.
I am using the following versions from the Maven Central repository:
Siddhi: io.siddhi:siddhi-core:5.0.1
siddhi-execution-json: io.siddhi.extension.execution.json:siddhi-execution-json:2.0.1
(Edit) Clarification
The objective is to use attributes not directly defined in the stream in the Group By.
The reason is that I currently have an embedded app which defines the whole set of input streams coming from external sources formatted as JSON, as well as a set of output streams to inform external components when a query matches.
This app allows users to create custom queries on this set of predefined streams, but they are not able to create streams of their own.
Many thanks!
It seems the group by fields are expected to come from the query's input stream, in this case JsonStream. Use one query to do the extraction, and do the aggregation and filtering in the following query:
@App:name("Group_by_json_attribute")
define stream JsonStream(json string);
@sink(type='log')
define stream LogStream(myField string, count long);
@info(name='extract_stream')
from JsonStream
select json:getString(json, '$.myField') as myField
insert into ExtractedStream;
@info(name='query1')
from ExtractedStream#window.time(10 sec)
select myField, count() as count
group by myField
having count > 1
insert into LogStream;
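In case it helps with the embedded setup mentioned in the question, here is a minimal sketch of running this two-query app with the Siddhi 5 embedded API. It assumes siddhi-core and siddhi-execution-json are on the classpath, in which case json:getString should be resolved automatically (otherwise it can be registered via siddhiManager.setExtension(...)):

import io.siddhi.core.SiddhiAppRuntime;
import io.siddhi.core.SiddhiManager;
import io.siddhi.core.stream.input.InputHandler;

public class GroupByJsonExample {
    public static void main(String[] args) throws InterruptedException {
        // the two-query app from above, condensed into one string
        String app =
                "define stream JsonStream(json string); " +
                "@sink(type='log') define stream LogStream(myField string, count long); " +
                "from JsonStream select json:getString(json, '$.myField') as myField " +
                "insert into ExtractedStream; " +
                "from ExtractedStream#window.time(10 sec) " +
                "select myField, count() as count group by myField having count > 1 " +
                "insert into LogStream;";

        SiddhiManager siddhiManager = new SiddhiManager();
        SiddhiAppRuntime runtime = siddhiManager.createSiddhiAppRuntime(app);
        runtime.start();

        InputHandler input = runtime.getInputHandler("JsonStream");
        // two events with the same myField within the 10-second window -> LogStream logs count=2
        input.send(new Object[]{"{\"myField\": \"my_value\"}"});
        input.send(new Object[]{"{\"myField\": \"my_value\"}"});

        Thread.sleep(1000); // give the log sink a moment to fire before shutting down
        siddhiManager.shutdown();
    }
}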

How to do pattern matching using match_recognize in Esper for unknown properties of events in event stream?

I am new to Esper, and I am trying to filter the event properties from event streams that have multiple events arriving at high velocity.
I am using Kafka to send the rows of a CSV file from a producer to a consumer, and at the consumer I am converting those rows to HashMaps and creating events at run time in Esper.
For example, I have the events listed below, which arrive every second:
WeatherEvent Stream:
E1 = {hum=51.0, precipi=1, precipm=1, tempi=57.9, icon=clear, pickup_datetime=2016-09-26 02:51:00, tempm=14.4, thunder=0, windchilli=, wgusti=, pressurei=30.18, windchillm=}
E2 = {hum=51.5, precipi=1, precipm=1, tempi=58.9, icon=clear, pickup_datetime=2016-09-26 02:55:00, tempm=14.5, thunder=0, windchilli=, wgusti=, pressurei=31.18, windchillm=}
E3 = {hum=52, precipi=1, precipm=1, tempi=59.9, icon=clear, pickup_datetime=2016-09-26 02:59:00, tempm=14.6, thunder=0, windchilli=, wgusti=, pressurei=32.18, windchillm=}
Where E1, E2...EN are multiple events in WeatherEvent
In the above events, I just want to filter out properties like hum, tempi, tempm and pressurei, because they change as time proceeds (during 4 secs), and I don't want to care about the properties which are not changing at all or are changing really slowly.
Using the EPL query below, I am able to filter out properties like temp and hum:
@Name('Out') select * from weatherEvent.win:time(10 sec)
match_recognize (
    partition by pickup_datetime?
    measures A.tempm? as a_temp, B.tempm? as b_temp
    pattern (A B)
    define
        B as Math.abs(B.tempm? - A.tempm?) > 0
)
The problem is that I can only do this when I specify tempm or hum in the query for pattern matching.
But as the data comes from a CSV file and is high-dimensional, I don't know the properties of the events beforehand.
I want Esper to automatically detect the features/properties that are changing (at run time) and filter them out, without me specifying the event properties.
Any ideas how to do it? Is that even possible with Esper? If not, can I do it with another CEP engine like Siddhi or Oracle CEP?
You may add a "?" to the event property name to get the value of properties that are not known at the time the event type is defined. This is called a dynamic property; see the documentation. The type returned is Object, so you need to downcast.
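For example, with the stream from the question (casting to double is an assumption about the underlying CSV values):
select cast(hum?, double) as hum, cast(tempm?, double) as tempm
from weatherEvent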

How to express the event de-duplication logics in Siddhi stream processing

Hi: I need the following de-duplication logic implemented in Siddhi stream processing. Assume I have an InputStream, and I want to produce an OutputStream as follows:
(1) When an event is the first one in the InputStream (since the event processing engine started), insert it into the OutputStream.
(2) If an event with the same signature (for example, the same event name) arrives within a 2-minute window, we consider the event identical and should NOT insert it into the OutputStream. Otherwise, we should insert it into the OutputStream.
I tried to use an event pattern to do the filtering. However, I cannot find a way to express the "negation logic" in Siddhi, that is, if (not (e1 --> e2 with the same signature in a 2-minute window)). Is there a clever way to perform such event de-duplication logic? Note that event de-duplication is a very commonly needed expression in event processing.
If I were to implement it in Java, it would be relatively straightforward: I would create a hash table. When the first event arrives, I register it in the hash table and set the acceptable time of the registered event to 2 minutes later. When the next event arrives, I look it up in the hash table and compare the retrieved event's acceptable time with the current event's time; if the current event time is smaller than the acceptable time, I do not consider it an output event. Instead of this Java implementation, I would prefer a declarative solution expressed as a Siddhi stream processing query, if that is possible.
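For reference, a minimal Java sketch of that hash-table approach (names are illustrative and independent of any Siddhi API):

import java.util.HashMap;
import java.util.Map;

public class Deduplicator {

    private static final long WINDOW_MS = 2 * 60 * 1000; // the 2-minute de-duplication window
    private final Map<String, Long> acceptableTime = new HashMap<String, Long>();

    // Returns true if the event should be inserted into the output stream.
    public boolean accept(String signature, long eventTimeMs) {
        Long acceptAfter = acceptableTime.get(signature);
        if (acceptAfter == null || eventTimeMs >= acceptAfter) {
            // first event with this signature, or the window has already passed
            acceptableTime.put(signature, eventTimeMs + WINDOW_MS);
            return true;
        }
        return false; // duplicate within the 2-minute window
    }
}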
You can achieve that with an in-memory table; please find the sample below. It is pretty much the same as your Java approach.
define stream InputStream (event_id string, data string);
define stream OutputStream (event_id string, data string);
define table ProcessedEvents (event_id string);

-- forward only events whose id is not already in the table
from InputStream[not(ProcessedEvents.event_id == event_id in ProcessedEvents)]
insert into OutputStream;

-- remember every event that was forwarded
from OutputStream
select event_id
insert into ProcessedEvents;

-- after 2 minutes an event expires from the window and lands in PurgeStream
from OutputStream#window.time(2 min)
select event_id
insert expired events into PurgeStream;

-- drop the expired id from the table so the next occurrence is accepted again
from PurgeStream
delete ProcessedEvents
on ProcessedEvents.event_id == event_id;