Siddhi: Ignore duplicate events while pattern active - complex-event-processing

I have a non-occurring event pattern to detect if a certain condition happens, then alert me if that condition doesn't change within a time limit. The below query could be described as "If value 1 appears for a user, alert me if there is not a new value for that user within 5 seconds":
define stream inStream(name string, value int);
partition with (name of inStream)
begin
from every in=inStream[ value == 1 ]
-> not inStream[ not(value == 1) ] for 5 sec
select in.name, in.value
insert into outStream;
end;
This query works exactly as expected: if I don't receive a value different than 1 within 5 seconds then the query is triggered. The issue arises when there are duplicate events with the value of 1.
If I send the event {name: "bob", value: 1} every second for 10 seconds, I would like to see the query triggered twice, once at 5 seconds, and once at 10 seconds. Right now however I see the query being triggered every second starting at 5 seconds in. Essentially the query (working as it should) is starting the 5 second timer for every event with value 1 it sees. However I would like it to not start that timer (or not output at least) if there is already a timer running.
I attempted to solve this with the following query (simply adding an 'output' line):
define stream inStream(name string, value int);
partition with (name of inStream)
begin
from every in=inStream[ value == 1 ]
-> not inStream[ not(value == 1) ] for 5 sec
select in.name, in.value
output first every 5 sec
insert into outStream;
end;
I also tried output last and output all.
The queries above did not work as expected: in the case of all and last no events at all were output, in the case of first only a single event was output, not the subsequent ones after the first 5 second block passed.
Is there any way to achieve what I would like? I have a hunch that using time windows or output is the way to solve it, but so far have not been able to get it to work.

The second query in the original question ends up working as expected. I was previously on Siddhi 4.1.4. After upgrading to Siddhi 5.0.0 the query works as I wanted it to.

Related

T-SQL set a rotating Flag (True/False) in records using Stored Proc

I can do this using multiple commands in C# for the app I'm creating, but prefer a stored proc to eliminate issues with latency/locks, etc. (hopefully):
I have a table of 10 extensions (important fields):
SortOrder, Extension, IsUsed
First record will be set to IsUsed = true
When calling the stored proc, I need the IsUsed of the NEXT record in sort order to be set to true,the current record that is true set to false. When I hit the last record, rotate back to the first record.
Use Case: I need to rotate through a bank of usable numbers. Multiple people use the app, so cannot reuse. a number within the last 4 minutes (Bank of 10 will suffice, but we can extend if necessary). When the user requests a number, they get the next avail. I can build the table however needed, so any and all options to achieve use case are welcome.
I need to set the flag to true on the 1st record when stored proc is called. All other records should be false.
I have seen this, which is of interest, but doesn't quite answer:
Get "next" row from SQL Server database and flag it in single transaction
If all that you're using this for is to return a number to identify a session, I'd suggest scrapping the whole table idea and letting SQL Server do the work for you.
You can create a SEQUENCE object that will cycle and return the next value for you, without needing to write any code or maintain any tables.
CREATE SEQUENCE dbo.Extension
AS integer
START WITH 5
INCREMENT BY 5
MINVALUE 5
MAXVALUE 50
CYCLE;
This will return the number 5 the first time it's called, up to the number 50 on call number 10, and then start back over. You can adjust the numbers in the code to more or less do whatever you would like, though.
Get the next value like this:
SELECT NEXT VALUE FOR dbo.Extension;
And when/if you need to extend the range:
ALTER SEQUENCE dbo.Extension
MAXVALUE 100;
Play around with the idea on the Rextester demo.
Edit: In light of the comments above and below, I'd still stick with a SEQUENCE, I think.
Every time your code calls the table for an extension, use a query along the lines of this:
SELECT
Extension
FROM
ExtTable
WHERE
SortOrder = NEXT VALUE FOR dbo.Extension;
Functionally, this should do what you're after, again with no code to write or maintain.

Siddhi query for event prior to another within time limit

I am trying to write a Siddhi query to detect if an event didn't happen prior to another within a time limit. The query I have to detect if 'X' didn't ever happen prior to 'Y' in the entire life of the siddhi application is:
from stream[value == 'Y']
and not stream[value == 'X']
I assumed adding a time constraint would work:
from stream[value == 'Y']
and not stream[value == 'X'] for 5 min
However, the 'for' statement never has any effect that I can see. This query is still triggered whether 'X' was 4 minutes ago or 6 minutes ago. I understand that a similar effect can be achieved by checking if 'Y' comes after 'X' within a time limit, but for my purposes I need to know the other way around.
Is this possible with Siddhi? If so can someone please provide a sample query that could achieve this?
Based on your comment I am composing below answer.
Sequence construct in Siddhi will guarantee that no one can enter the state flow at a random point. As an example let's take below sample sequence query.
from every e1=InputStream[state='X'], e2=InputStream[state'Y']
select e1.state as initialState, e2.state as finalState
insert into NextStream;
So in here we are mandating X followed by a consecutive Y. If Y occurs before X sequence construct will handle that and discard Y. So whatever goes into NextStream is guaranteed to satisfy x -> Y state transition.
More relaxed construct of sequence is pattern. It is same as sequence while relaxing the consecutive arrival requirement. Hope this helps!!

KSQL Hopping Window : accessing only oldest subwindow

I am tracking the rolling sum of a particular field by using a query which looks something like this :
SELECT id, SUM(quantity) AS quantity from stream \
WINDOW HOPPING (SIZE 1 MINUTE, ADVANCE BY 10 SECONDS) \
GROUP BY id;
Now, for every input tick, it seems to return me 6 different aggregated values I guess which are for the following time periods :
[start, start+60] seconds
[start+10, start+60] seconds
[start+20, start+60] seconds
[start+30, start+60] seconds
[start+40, start+60] seconds
[start+50, start+60] seconds
What if I am interested is only getting the [start, start+60] seconds result for every tick that comes in. Is there anyway to get ONLY that?
Because you specify a hopping window, each record falls into multiple windows and all windows need to be updated when processing a record. Updating only one window would be incorrect and the result would be wrong.
Compare the Kafka Streams docs about hopping windows (Kafka Streams is KSQL's internal runtime engine): https://docs.confluent.io/current/streams/developer-guide/dsl-api.html#hopping-time-windows
Update
Kafka Streams is adding proper sliding window support via KIP-450 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-450%3A+Sliding+Window+Aggregations+in+the+DSL). This should allow to add sliding window to ksqlDB later, too.
I was in a similar situation and creating a user defined function to access only the window with collect_list(column).size() = window duration appears to be a promising track.
In the udf use List type to get one of your aggregate base column list of values. Then assess is the formed list size is equal to the hopping window number of period, return null otherwise.
From this create a table selecting data and transforming it with the udf.
Create a table from this latest table and filter out null values on the transformed column.

tJavaFlex behaviour when changing loop position

Having some problems in a job, and I suspect it is due to a lack of understanding of tJavaFlex. I am generating 10 rows in this test job, and am generating loop inside a tJavaFlex:
So there are 10 rows coming in, and a loop in the Start and End section. I was expecting that for each row coming in, it would generate 10 identical rows coming out. And that I would see iterations 0,1,2,3....9 for each row.
What I got was this. This looks to me like the entire job is running 10 times, and so I have 100 random values coming through the flow from the tRowGenerator.
If I move the for loop into the Main Code section, I get close to the behaviour I was expecting. I am expecting each row when it comes in to be repeated 10 times, and for 1 row coming in to produce 10 output rows. What I get is this.
But even then my tLogRow is only generating one row for each 10 iterations it seems (look at the tLogRow output after iteration 9 above why not 10 items?). I had thought I would be getting 10 rows for each single row coming in and I would see this in the tLogRow.
What I need to do is take a value from a field coming in, do some reg exp parsing and split into an array, and then for each item in the array create lines in the output flow. i.e. 1 row coming in can be turned into x number of rows coming out using a string.split() method.
Can someone explain the behaviour above, and also advise on the best approach to get one value coming in, do some java manipulation and then generate multiple rows coming out?
Any advice appreciated.
Yes you don't use it correctly.
The initial part is for initiate variable. (executed one time before the first tow)
In the principal you put your loop (executed one time at each row)
In the final you store in global variable for example.(executed one time after the last row)
The principal code will be executed at each row in a tjavaflex. So don't put a for loop inside you can do like the example in the screen.
You tjavaflex comportement is normal. you have ten row so each row the for loop wil be executed 10 time (i<10)
You can use it like :
You dont need to create your own loop.
By putting the for loop in the Start code, your main code will be triggered by the loop and by incoming rows, and it will be executed n*r times.
The behaviour of subjob that contains a tJavaFlex, reveils that component before tJavaFlex is included into its starting code, and the after component is included in the ending code, but that may depend to many conditions like data propagation and trigger type.
start code :
System.out.print("tJavaFlex is starting...");
int i = 0;
Main code :
i++;
System.out.print("tJavaFlex inside Main Code...iteration:"+i);
row8.ITEM_NAME = row7.ITEM_NAME;
row8.ITEM_COUNT = row7.ITEM_COUNT;
End code :
System.out.print("tJavaFlex is ending...");
System.out.print(row7.ITEM_NAME);
Instead of main flow in row5, try using iterate flow to connect tJavaFlex

Postgresql sequences

When I delete all records from a Postgresql table and then try to reset the sequence to start a new record with number 1 when it is inserted, i get different results :
SELECT setval('tblname_id_seq', (SELECT COALESCE(MAX(id),1) FROM tblname));
This sets the current value of the sequence to 1, but the NEXT record (actually the first because there are no records yet) gets number 2!
And I can't set it to 0, because the minimum value in the sequence is 1!
When I use :
ALTER SEQUENCE tblname_id_seq RESTART WITH 1;
the first record that is inserted actually gets number 1 ! But the above code doesn't accept a SELECT as a value instead of 1.
I wish to reset the sequence to number 1 when there are no records, and the first record then should start with 1. But when there ARE already records in the table, I want to reset the sequence so that the next record that is inserted will get {highest}+1
Does anyone have a clear solution for this?
Use the three-argument form of setval to set the is_called flag to false, so that it returns the current value of the sequence for the next nextval call rather than immediately generating a new one.
http://www.postgresql.org/docs/current/interactive/functions-sequence.html
Also note you need to use COALESCE(MAX(id),0)+1, otherwise the first value from the sequence will be MAX(id), which you know to already exist. (thx Stephen Denne)
See http://www.postgresql.org/docs/current/static/functions-sequence.html, near the bottom of the page.
Specifically, at least in Postgresql 9.0, you can find and set the value of a sequence. You can use the three-argument form of setval() to set either the current value of the sequence or the next value of the sequence (which would allow you to set the sequence to 1 on the next value retrieval).