Triggering when an event does not happen within some seconds using CEP - complex-event-processing

I know how to trigger when an event occurs some time during a time window, like this:
select count(*) from MyEvent.win:time(10 sec) having count(*) >= 3
But how do I trigger when no event occurs during a 10-second window?

Here is one way:
select * from pattern [every (timer:interval(10 sec) and not MyEvent)]
The solution patterns page in the Esper documentation has more information on detecting absences.
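For example, to detect that no MyEvent arrives within 10 seconds after a given MyEvent (a sketch along the lines of the absence patterns described there):
select * from pattern [every MyEvent -> (timer:interval(10 sec) and not MyEvent)]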

Related

KSQL SELECT code works, but CREATE TABLE `..` AS SELECT code returns error - io.confluent.ksql.util.KsqlStatementException: Column cannot be resolved

I've got a problem with creating a table or stream in KSQL.
I've done everything as shown in the official examples, and I don't get why my code doesn't work.
Example from https://docs.confluent.io/current/ksql/docs/tutorials/examples.html#joining :
CREATE TABLE pageviews_per_region_per_session AS
SELECT regionid,
windowStart(),
windowEnd(),
count(*)
FROM pageviews_enriched
WINDOW SESSION (60 SECONDS)
GROUP BY regionid;
NOW MY CODE:
I've tried to run the SELECT in the command prompt and it WORKS WELL:
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
But when I try to create the table based on this select (from ksql command-line tool):
CREATE TABLE `incorrect_logins` AS
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
I GET AN ERROR - io.confluent.ksql.util.KsqlStatementException: Column COMPUTER cannot be resolved. But this column exists, and the SELECT without the CREATE TABLE statement works perfectly.
I'm using the latest stable KSQL image (confluentinc/cp-ksql-server:5.3.1)
First of all, I apologize for my bad English; if anything I say is not clear enough, don't hesitate to reply and I'll try to explain myself better.
I don't know a lot about KSQL, but I'll try to help you, based on my experience creating STREAMs like your TABLE.
1) As you probably know, KSQL processes everything as uppercase unless you specify otherwise.
2) KSQL doesn't support double quotes in a SELECT inside a CREATE query; in fact, KSQL ignores these characters and handles your field as an uppercase column. That is why the error mentions COMPUTER and not "computer".
A workaround for this issue is:
First, create an empty table with the lowercase fields:
CREATE TABLE "incorrect_logins" ("attempts_count" INTEGER, "computer" VARCHAR, "row_time" INTEGER) WITH (KAFKA_TOPIC='topic_that_you_want', VALUE_FORMAT='avro')
(If the topic doesn't exist, you'll have to create it first.)
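For example, with the Kafka CLI (the topic name matches the one above; partitions, replication factor, and broker address are assumptions to adjust for your setup):
kafka-topics --create --topic topic_that_you_want --partitions 1 --replication-factor 1 --bootstrap-server localhost:9092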
Once the table has been created, you can insert data into it using your SELECT query:
INSERT INTO "incorrect_logins" SELECT count(*) as "attempts_count", "computer", (WINDOWSTART() / 1000) as "row_time"
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
Hope it helps you!

PostgreSQL select count(*) takes too long

I have a table in my PostgreSQL database. The table has about 9,100,000 rows. When I execute the query select count(*) from table, the execution time is about 1.5 minutes. Is this normal? And what can I do to decrease this time?
If you only need an estimate of the row count, you can use count_estimate. It is much faster.
https://wiki.postgresql.org/wiki/Count_estimate
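For reference, the simplest variant from that wiki page reads the planner's row estimate from the system catalog (replace mytable with your table name):
SELECT reltuples::bigint AS estimate FROM pg_class WHERE relname = 'mytable';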
Another workaround is to maintain a statistics counter yourself, incrementing it every time a new row is added; a sketch follows.
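A minimal sketch of that approach, assuming PostgreSQL 11+ and hypothetical names (row_counts, mytable, bump_row_count):
-- one row per counted table, initialized from the real count
CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
INSERT INTO row_counts VALUES ('mytable', (SELECT count(*) FROM mytable));
CREATE OR REPLACE FUNCTION bump_row_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'mytable';
    ELSE  -- DELETE
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'mytable';
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER mytable_count
    AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE FUNCTION bump_row_count();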
Also please read https://www.citusdata.com/blog/2016/10/12/count-performance/

Postgres trigger when any of "top" rows change

I have a table with about 10,000 users. I want to watch the top 10, as sorted by score, and give realtime updates whenever the composition of the list changes or any of the scores of the top 10 change.
What's the best way to do this with a trigger, and will this be a highly expensive trigger to keep running?
As long as you have an index on score, it would be cheap to have an AFTER UPDATE trigger FOR EACH ROW that contains:
IF (SELECT count(*)
    FROM (SELECT 1
          FROM users
          WHERE score > NEW.score
          LIMIT 10
         ) top_users
   ) < 10
   OR
   (SELECT count(*)
    FROM (SELECT 1
          FROM users
          WHERE score > OLD.score
          LIMIT 10
         ) top_users
   ) < 10
THEN
    /* your action here */
END IF;
You'd also need a DELETE trigger that contains only the second part of the query and an INSERT trigger with only the first part.
The problem I see here is /* your action here */.
This should be a short operation like adding something to a queue, otherwise you might end up with long transactions and long locks.
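For example (a sketch; the function, trigger, and channel names are made up, and EXECUTE FUNCTION assumes PostgreSQL 11+), the action could be a NOTIFY, which is cheap and keeps the transaction short:
CREATE OR REPLACE FUNCTION notify_top10_change() RETURNS trigger AS $$
BEGIN
    IF (SELECT count(*)
        FROM (SELECT 1 FROM users WHERE score > NEW.score LIMIT 10) top_users
       ) < 10
       OR
       (SELECT count(*)
        FROM (SELECT 1 FROM users WHERE score > OLD.score LIMIT 10) top_users
       ) < 10
    THEN
        -- short action: push a notification, let a listener do the heavy work
        PERFORM pg_notify('top10_changed', '');
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER users_top10_update
    AFTER UPDATE OF score ON users
    FOR EACH ROW EXECUTE FUNCTION notify_top10_change();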

Run Time of the Trigger in Siddhi

I want to read a Postgres table from Siddhi, and I am using a trigger:
#From(eventtable='rdbms', jdbc.url='jdbc:postgresql://localhost:5432/pruebabg', username='postgres', password='Easysoft16', driver.name='org.postgresql.Driver', table.name='Trazablack')
define table Trazablack (sensorValue double);
define trigger FiveMinTriggerStream at every 10 min;
from FiveMinTriggerStream join Trazablack as t
select t.sensorValue as sensorValue
insert into StreamBlack;
But I have a problem: the query runs every 10 minutes, and I need it to run when a new event arrives.
Is this possible?
from sensorStream#window.length(2)
JOIN StreamBlack#window.length(1)
on sensorStream.sensorValue==StreamBlack.sensorValue
select sensorStream.meta_timestamp, sensorStream.meta_sensorName, sensorStream.correlation_longitude,
sensorStream.correlation_latitude, sensorStream.sensorValue as valor1, StreamBlack.sensorValue as valor2
insert INTO StreamPaso;
from sensorStream#window.length(2)
LEFT OUTER JOIN StreamBlack#window.length(1)
on sensorStream.sensorValue==StreamBlack.sensorValue
select sensorStream.meta_timestamp, sensorStream.meta_sensorName, sensorStream.correlation_longitude,
sensorStream.correlation_latitude, sensorStream.sensorValue as valor1, StreamBlack.sensorValue as valor2
insert INTO StreamPaso;
In this case you can simply do the join with the input stream instead of the trigger stream, using a window. We can use a 0-length window since we don't need to store anything in the window, and we need the query to trigger whenever an event arrives through the stream. We can also use the 'current events' (i.e. incoming events) clause to make sure that the 'expired events' (i.e. events kept and emitted by the window) are not considered (refer to the Siddhi QL documentation on windows for more information).
e.g. (assuming sensorStream is the input stream):
from sensorStream#window.length(0) join Trazablack as t
select t.sensorValue as sensorValue
insert current events into StreamBlack;
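As a design note, #window.length(0) keeps no events, so each arriving sensorStream event joins against the current contents of the Trazablack table exactly once and is then discarded.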

Syntax for Avoiding Multiple Subqueries In Postgres

I'm new to postgres, so please take it easy on me.
I'm trying to write a query so that for any user, I can pull ALL of the log files (both their activity and the activity of others) for one minute prior and one minute after their name appears in the logs within the same batchstamp.
chat.batchstamp is varchar
chat.datetime is timestamp
chat.msg is text
chat.action is text (this is the field with the username)
Here are the separate commands I want to use; I just don't know how to put them together, or whether this is really the right path for this.
SELECT batchstamp, datetime, msg FROM chat WHERE action LIKE 'username';
Anticipated output:
batchstamp   datetime              msg
abc          2010-12-13 23:18:00   System logon
abc          2010-12-13 10:12:13   System logon
def          2010-12-14 11:12:18   System logon
SELECT * FROM chat WHERE datetime BETWEEN datetimefrompreviousquery - interval '1 minute' AND datetimefrompreviousquery + interval '1 minute';
Can you please explain what I should do to feed data from the first query into the second query? I've looked at subqueries, but do I need to run two subqueries? Should I build a temporary table?
After this is all done, how do I make sure that the times the query matches are within the same batchstamp?
If you're able to point me in the right direction, that's great. If you're able to provide the query, that's even better. If my explanation doesn't make sense, maybe I've been looking at this too long.
Thanks for your time.
Based on nate c's code below, I used this:
SELECT * FROM chat,
( SELECT batchstamp, datetime FROM chat WHERE action = 'fakeuser' ) AS log
WHERE chat.datetime BETWEEN log.datetime - interval '1 minute' AND log.datetime + interval '1 minute';
It doesn't seem to return every hit of 'fakeuser' and when it does, it pulls the logs from every 'batchstamp' instead of just the one where 'fakeuser' was found. Am I in for another nested query? What's this type of procedure called so I can further research it?
Thanks again.
Your first query can go in the FROM clause with parentheses around it and an 'AS alias' name. After that you can reference it as you would a normal table in the rest of the query.
SELECT
*
FROM chat,
(
SELECT
batchstamp,
datetime,
msg
FROM chat
WHERE action LIKE 'username'
) AS log
WHERE chat.datetime BETWEEN
log.datetime - interval '1 minute'
AND log.datetime + interval '1 minute';
That should get you started.
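If you also need to restrict matches to the same batchstamp (the follow-up issue above), a sketch of the same query with that join condition added:
SELECT chat.*
FROM chat,
( SELECT batchstamp, datetime FROM chat WHERE action LIKE 'username' ) AS log
WHERE chat.batchstamp = log.batchstamp
AND chat.datetime BETWEEN log.datetime - interval '1 minute'
AND log.datetime + interval '1 minute';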
A colleague at work came up with the following solution, which seems to provide the results I'm looking for. Thanks for everyone's help.
SELECT batchstamp, datetime, msg INTO temptable FROM chat WHERE action = 'fakeusername';
select a.batchstamp, a.action, a.datetime, a.msg
FROM chat a, temptable b
WHERE a.batchstamp = b.batchstamp
and (
a.datetime BETWEEN b.datetime - interval '1 minute'
AND b.datetime + interval '1 minute'
) and a.batchstamp = '2011-3-1 21:21:37'
group by a.batchstamp, a.action, a.datetime, a.msg
order by a.datetime;