Esper: create window for events that occur after another event

I want mean speed (sensor=1) at different power settings 1-10 (sensor=2). I don't want speeds from sensor one to be included in the mean calculation if they occurred before the change in setting (sensor=2).
E.g. power_setting=1, speed=50, speed=60, power_setting=2, speed=100, speed=120
If I took a time window around power_setting = 2, then it may include a speed for power_setting = 1.
How do I explicitly tell Esper to calculate only with values that occurred after the power-setting change?
Here is my current script:
CREATE window SpeedWindow.win:time_batch(30 sec) as (speed double, power_setting int);
INSERT into SpeedWindow
SELECT
speedEvent.value as speed,
powerSetting.value as power_setting
FROM
Sensors(id = 1).std:lastevent() as speedEvent,
Sensors(id = 2).std:lastevent() as powerSetting;
INSERT into Output
SELECT
case
when SpeedWindowEvent.power_setting = 1 then 4154
when SpeedWindowEvent.power_setting = 2 then 4155
... etc ...
end as id,
avg(SpeedWindowEvent.speed) as value
FROM
SpeedWindow as SpeedWindowEvent
GROUP BY
SpeedWindowEvent.power_setting;

Based on the comments clarifying the use case: create a context that is initiated and terminated by a speed event.
// makes it easy to read; takes no CPU at all
insert into Speed select * from Sensors(id=1);
insert into Power select * from Sensors(id=2);
create context SpeedSession initiated by Speed terminated by Speed;
context SpeedSession select avg(...) from Power;
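Fleshed out slightly, the reporting statement could emit the session average when the session terminates; the value field and the output clause below are assumptions based on the question's event shape, not part of the original answer:
context SpeedSession
select avg(value) as value
from Power
output snapshot when terminated;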

Aggregation over aggregation

I have time-series data (stock exchange trades) and I need to aggregate them by time interval: one minute, 5 minutes, 15 mins etc.
A higher (senior) time frame can be calculated from a lower (minor) one, e.g. five 1-minute candles -> one 5-minute candle.
I made a MATERIALIZED VIEW with the AggregatingMergeTree engine, which successfully calculates m1, e.g.
maxState(price) as price_high, countState(item_id) as trades_count
But I don't know how to build the next timeframes. If I use maxMerge in the next view I get an incorrect result, which makes sense, as the docs say I must use -State in an AggregatingMergeTree; but when I use -State in m5 too, it raises an error.
I'd like to build a series of materialized views, where each minor view feeds the senior one in a pipeline of updates from trades.
UPDATE (SQL):
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m1_state
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(toDateTime(timestamp_close_m1/1000))
ORDER BY (platform_id, symbol, timestamp_close_m1)
POPULATE AS
select
platform_id as platform_id,
symbol as symbol,
'1m' as `candle_interval`,
1000*toUnixTimestamp(toStartOfMinute(toDateTime(timestamp/1000))) as timestamp_m1,
1000*toUnixTimestamp(addMinutes(toStartOfMinute(toDateTime(timestamp/1000)), 1)) as timestamp_close_m1,
...
minState(price) as price_low,
countState(item_id) as trades_count
from trade
group by platform_id, symbol, timestamp_m1, timestamp_close_m1, `candle_interval`
order by timestamp_close_m1;
/*The one below definitely wrong due to -State suffix*/
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m5_test
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(toDateTime(timestamp_close_m5 / 1000))
ORDER BY (platform_id, symbol, timestamp_close_m5) SETTINGS index_granularity = 8192
POPULATE AS
SELECT platform_id, symbol, '5m' AS candle_interval,
1000 * toUnixTimestamp(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000))) AS timestamp_m5,
1000 * toUnixTimestamp(addMinutes(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000)), 5)) AS timestamp_close_m5,
...
minState(price_low) AS price_low,
countState(trades_count) AS trades_count
FROM candle_m1_state
GROUP BY platform_id, symbol, timestamp_m5, timestamp_close_m5
ORDER BY platform_id ASC, symbol ASC, timestamp_close_m5 ASC;
I wouldn't try to chain views; I'd do one view per aggregation.
Also keep in mind that a MATERIALIZED VIEW is more of an insert trigger than a view.
I'd recommend:
CREATE MATERIALIZED VIEW
stream__source__target_5m TO target_5m
AS
SELECT ...
CREATE MATERIALIZED VIEW
stream__source__target_1m TO target_1m
AS
SELECT ...
Etc.
where target_xm are your target tables.
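A concrete shape of that pattern might look like this; the column list and types are illustrative guesses based on the question's schema:
CREATE TABLE target_1m
(
    platform_id UInt32,
    symbol String,
    timestamp_close_m1 UInt64,
    price_low AggregateFunction(min, Float64),
    trades_count AggregateFunction(count, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(toDateTime(timestamp_close_m1 / 1000))
ORDER BY (platform_id, symbol, timestamp_close_m1);

CREATE MATERIALIZED VIEW stream__trade__target_1m TO target_1m AS
SELECT
    platform_id,
    symbol,
    1000 * toUnixTimestamp(addMinutes(toStartOfMinute(toDateTime(timestamp / 1000)), 1)) AS timestamp_close_m1,
    minState(price) AS price_low,
    countState(item_id) AS trades_count
FROM trade
GROUP BY platform_id, symbol, timestamp_close_m1;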
Select-query time is obviously better with the chain of materialized views, so I'd like to stick to that solution rather than building a separate view for each time frame (TF) aggregation from the original data.
So the solution is:
Original raw data -> TF1 materialized view (AggregatingMergeTree, -State suffix) -> TF2 (from TF1) (AggregatingMergeTree, -MergeState suffix)
Then query any of TF1, TF2, ... with the -Merge suffix.
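Applied to the question's SQL, only the aggregate columns of the m5 view change; a sketch:
CREATE MATERIALIZED VIEW IF NOT EXISTS candle_m5_state
ENGINE = AggregatingMergeTree() PARTITION BY toYYYYMM(toDateTime(timestamp_close_m5 / 1000))
ORDER BY (platform_id, symbol, timestamp_close_m5)
POPULATE AS
SELECT platform_id, symbol, '5m' AS candle_interval,
    1000 * toUnixTimestamp(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000))) AS timestamp_m5,
    1000 * toUnixTimestamp(addMinutes(toStartOfFiveMinute(toDateTime(timestamp_m1 / 1000)), 5)) AS timestamp_close_m5,
    minMergeState(price_low) AS price_low,       -- -MergeState merges m1 states into a new m5 state
    countMergeState(trades_count) AS trades_count
FROM candle_m1_state
GROUP BY platform_id, symbol, timestamp_m5, timestamp_close_m5;
At query time, finish the states with -Merge:
SELECT platform_id, symbol, timestamp_close_m5,
    minMerge(price_low) AS price_low,
    countMerge(trades_count) AS trades_count
FROM candle_m5_state
GROUP BY platform_id, symbol, timestamp_close_m5;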

OrientDB query to receive last vertex before a given date

Let's say I have the following list of vertices (connected by edges) in an OrientDB database:
[t=1] --> [t=2] --> [t=3] --> [t=4] --> [t=5] --> [t=6] --> [t=7]
Each vertex has a timestamp t. I now want to receive the last vertex before a given date. Example: give me the last vertex before t=5, which is t=4.
Currently I'm using the following query to do this:
SELECT FROM ANYVERTEX WHERE t < 5 ORDER BY t DESC LIMIT 1
This works fine with up to, say, 1000 elements, but the query's performance drops as the number of elements grows. I already tried using an index, which improved overall performance, but the problem that performance degrades with the number of elements persists.
When building queries, always try to use the information you have about the relationships to improve performance. In this case you don't need the sort (which is an expensive operation), because you know that the vertex you need has an edge into the vertex you can look up directly; you can simply use that information in your query.
For example, let's say I have the following setup:
CREATE CLASS T EXTENDS V
CREATE VERTEX T SET t = 1
CREATE VERTEX T SET t = 2
CREATE VERTEX T SET t = 3
CREATE VERTEX T SET t = 4
CREATE VERTEX T SET t = 5
CREATE CLASS link EXTENDS E
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 1) TO (SELECT * FROM T WHERE t = 2)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 2) TO (SELECT * FROM T WHERE t = 3)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 3) TO (SELECT * FROM T WHERE t = 4)
CREATE EDGE link FROM (SELECT * FROM T WHERE t = 4) TO (SELECT * FROM T WHERE t = 5)
Then I can select the vertex before any T as such:
SELECT expand(in('link')) FROM T WHERE t = 2
This query does the following:
Select the vertex from T where t=2
From that vertex, traverse the incoming edge(s) of type link
expand() the vertex that edge comes from to get all of its information
The result is exactly what you want: the vertex with t = 1.
This should give better performance (especially if you add an index on the attribute t of the vertices) because you are using all the information you know in advance about the relationship: the node you need has an edge to the node you select.
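Applied to the original question, getting the last vertex before t = 5 becomes a direct lookup plus one edge traversal, with no sort at all:
SELECT expand(in('link')) FROM T WHERE t = 5
which returns the vertex with t = 4.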
Hope that helps you out.

How to query the first row efficiently?

I have a table with a large number of records:
date instrument price
2019.03.07 X 1.1
2019.03.07 X 1.0
2019.03.07 X 1.2
...
When I query for the day opening price, I use:
1 sublist select from prices where date = 2019.03.07, instrument = `X
It takes a long time to execute because it selects all the prices on that day and then takes the first one.
I also tried:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
In Oracle an equivalent would be:
select * from prices where date = to_date(...) and instrument = "X" and rownum = 1
and Oracle will stop immediately when it finds the first record.
How can I do this in kdb+ (i.e. stop immediately after finding the first record)?
In kdb+, the subclauses of a where clause in a select statement are executed sequentially, i.e. only those records which pass the first "test" get passed to the second test. With that in mind, looking at your two attempts:
select from prices where date = 2019.03.07, instrument = `X, i = 0 //It does not return any record (why?)
This doesn't (necessarily) return anything, because by the time it gets to the i=0 check, you've already filtered out some records (possibly including the first record in the original table, which would have i=0)
select from prices where date = 2019.03.07, instrument = `X, i = first i //Seem to work. Does it?
This one should work. First you filter by date. Then within the records for that date, you select the records for instrument `X. Then within those records, you take the record where i equals first i (i has already been filtered down, so first i is simply the index of the first surviving record; note that it is still the index from the original table, not from the filtered version).
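A tiny illustration of that sequential evaluation (table and values made up):
q)t:([]sym:`X`Y`X;price:1.0 2.0 3.0)
q)select from t where sym=`X, i=0       / row 0 survives the sym filter, so it is returned
q)select from t where sym=`Y, i=0       / empty: the only surviving record has original index 1
q)select from t where sym=`Y, i=first i / returns the `Y row (first i is 1 here)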
The q-SQL equivalent of rownum is select[n], which also performs better than the other approaches in most cases. A positive n gives the first n records and a negative n gives the last n records.
q) select[1] from prices where date = 2019.03.07, instrument = `X
There is no built-in functionality to stop after the first match. You could write a custom function for that, but it would probably execute more slowly than the supported version above.

How to write an RQLQuery?

I am new to ATG, and I have this question: how can I write an RQLQuery that gives me the same data as this SQL query?
select avg(rating) from rating WHERE album_id = ?;
I'm trying it this way:
RqlStatement statement;
Object rqlparam[] = new Object[1];
rqlparam[0] = album_Id;
statement= RqlStatement.parseRqlStatement("album_id= ? 0");
MutableRepository repository = (MutableRepository) getrMember();
RepositoryView albumView = repository.getView(ALBUM);
This query returns an item for a specific album_id; how can I improve my RQL query so that it returns the average rating value, as the SQL query above does?
There is no RQL syntax that will allow for the calculation of an average value for items in the query. As such you have two options. You can either execute your current statement:
album_id= ? 0
And then loop through the resulting RepositoryItem[] and calculate the average yourself (this could be time consuming on large datasets and means you'll have to load all the results into memory, so it is perhaps not the best solution), or you can implement a SqlPassthroughQuery that you execute:
Object params[] = new Object[1];
params[0] = albumId;
Builder builder = (Builder) view.getQueryBuilder();
String str = "select avg(rating) from rating WHERE album_id = ? group by album_id";
RepositoryItem[] items = view.executeQuery(builder.createSqlPassthroughQuery(str, params));
This will execute the average calculation on the database (something it is quite good at doing) and save you CPU cycles and memory in the application.
That said, don't make a habit of using SqlPassthroughQuery, as it means you don't get to use the repository cache as much, which could be detrimental to your application.
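For reference, the in-memory first option might look roughly like this; the rating property name and the surrounding setup are assumptions carried over from the question:
// execute the parsed RQL statement against the view from the question
RepositoryItem[] items = statement.executeQuery(albumView, rqlparam);
double total = 0.0;
int count = (items == null) ? 0 : items.length;
for (int i = 0; i < count; i++) {
    // "rating" is assumed to be a numeric property on the repository item
    total += ((Number) items[i].getPropertyValue("rating")).doubleValue();
}
double average = (count == 0) ? 0.0 : total / count;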

Postgres: query for a specific date on a timestamptz returns no data while I can see the data in my postgres client

I'm very new to Postgres and I'd been googling this for about an hour before posting here.
Hopefully, you can help me with this probably trivial issue.
I have a table LTC that was created with this schema:
CREATE TABLE LTC (
id SERIAL PRIMARY KEY,
time timestamptz,
side CHAR(4),
price REAL,
v REAL,
n INT
);
I want to query to get all the data for time = '2017-08-12T03:58:26.563Z' (an ISO string from JavaScript). I know there are over a hundred rows of data in the table for that time; I can see them in my Postgres client.
Here is the query I'm doing:
select * from LTC where time = '2017-08-12T03:58:26.563Z'::timestamptz
Why am I getting no results?
Edit:
Still unsure why it wasn't working, but I wrote a work-around that does the job:
In JavaScript:
var date = new Date('2017-08-12T03:58:26.563Z').toISOString(); // actual time passed as a parameter in my function, hard-coded for the example
var reg = new RegExp("([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2}):([0-9]{2})\\.[0-9]*Z", "gmi"); // escape the dot so it matches literally
var start = date.replace(reg, function(match, year, month, day, hour, min, sec, ms) {
    return year + '-' + month + '-' + day + 'T' + hour + ':' + min + ':00.000Z';
});
var end = date.replace(reg, function(match, year, month, day, hour, min, sec, ms) {
    min = parseInt(min, 10) + 1; // note: does not handle the minute-59 rollover
    if (min <= 9) {
        min = '0' + min;
    }
    return year + '-' + month + '-' + day + 'T' + hour + ':' + min + ':00.000Z';
});
var query = "select * from LTC where time >= '" + start + "'::timestamptz and time < '" + end + "'::timestamptz"; // This works
Try using date_trunc('second', timestamp) on both sides of the comparison to ensure that numerical precision is not what is throwing you off. There may be additional decimal places that are not being shown by your client and are throwing off the equality.
The other possible solution is giving a range (between x and y) to avoid the numerical equality issue.
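In SQL, those two suggestions look roughly like this (timestamps taken from the question):
-- compare at second precision, discarding fractional seconds on both sides
select * from LTC
where date_trunc('second', time) = date_trunc('second', '2017-08-12T03:58:26.563Z'::timestamptz);
-- or bound the value with a half-open range one millisecond wide
select * from LTC
where time >= '2017-08-12T03:58:26.563Z'::timestamptz
  and time < '2017-08-12T03:58:26.564Z'::timestamptz;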
It would be easier for us to help if you can get an exact copy of the data being represented (using pg_dump perhaps, if you are familiar) so that we can test with the data that you are using.
The final thing you may want to check is explicitly stating the time zones that you are referencing. I generally use timestamp without time zone to avoid this issue, but auto-setting time zones to different values may be throwing you off as well. A good way to test is by selecting the two values, something like
select *, l.time = p.ts as test
from LTC l, (select '2017-08-12T03:58:26.563Z'::timestamptz as ts) p
;
EDIT:
I have built a test to try to reproduce your behavior:
CREATE TABLE LTC (
id serial
, time timestamptz
);
INSERT INTO LTC (time)
values ('2017-08-12T03:58:26.56312345'::timestamptz)
returning *;
select *
from LTC
where time = '2017-08-12T03:58:26.563Z'::timestamptz
;
select *, l.time = p.ts as test
from LTC l, (select '2017-08-12T03:58:26.563Z'::timestamptz as ts) p
;
What I get here is actually:
1;"2017-08-12 03:58:26.563123-04";"2017-08-11 23:58:26.563-04";f
Hopefully you can see what is happening: the '2017-08-12T03:58:26.563Z'::timestamptz is interpreted as a UTC time and then converted to my time zone (UTC-04), so what is being compared is actually a different date altogether; the stored value also carries extra fractional digits (.563123 vs .563), so the equality fails on precision as well. In the future, showing this type of equality side-by-side is a great way to test that you are executing what you think you are (especially with dates/times, where auto-conversion happens often).