#unique not summing up correctly based on order of events - complex-event-processing

As answered in Esper sum of snapshots (as opposed to sum of deltas), #unique(field) ensures that the sum only takes the latest event per value of the specified field into account when summing up the values. That worked great, except in the case listed below.
Given the following epl:
create schema StockTick(security string, price double);
create schema OrderEvent (orderId int, trader string, symbol string, strategy string,
quantity double);
select orderId, trader, symbol, strategy, sum(quantity)
from OrderEvent#unique(orderId) as o
full outer join StockTick#lastevent as t
on o.symbol = t.security
group by trader, symbol, strategy
The following events:
StockTick={security='IBM', price=99}
OrderEvent={orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10}
StockTick={security='IBM', price=99}
correctly provides the sum as sum(quantity)=10.0.
But if the StockTick did not happen first:
OrderEvent={orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10}
StockTick={security='IBM', price=99}
I still expected sum(quantity) to be 10, but in this case it is sum(quantity)=20.0!
Based on what #unique(field) does, I figure this shouldn't be 20. Why is #unique(field) not being adhered to, and what is needed in the query for the output to be unique on orderId in both cases?
The following set of events is also available to run on Esper Notebook
https://notebook.esperonline.net/#/notebook/2HE9662E6

In a streaming join the aggregation result aggregates all rows produced by the join. Since the join produces two rows as output with quantity 10 each, the result is 20.
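Roughly, for the second sequence of events the join emits (a sketch of the intermediate rows, not output Esper prints in this form):
OrderEvent arrives, no StockTick retained yet -> {orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10, security=null, price=null}
StockTick arrives and matches the retained order -> {orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10, security='IBM', price=99}
sum(quantity) aggregates both emitted rows, hence 10 + 10 = 20.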
To take an aggregation of the result of the join that is unique by order id, you can use insert into, like so:
insert into JoinedStream select orderId, trader, symbol, strategy, quantity
from OrderEvent#unique(orderId) as o
full outer join StockTick#lastevent as t
on o.symbol = t.security;
select orderId, trader, symbol, strategy, sum(quantity)
from JoinedStream#unique(orderId)
group by trader, symbol, strategy;

Selecting max value grouped by specific column

Focused DB tables:
Task:
For given location ID and culture ID, get max(crop_yield.value) * culture_price.price (let's call this multiplication monetaryGain) grouped by year, so something like:
[
  {
    "year": 2014,
    "monetaryGain": ...
  },
  {
    "year": 2015,
    "monetaryGain": ...
  },
  {
    "year": 2016,
    "monetaryGain": ...
  },
  ...
]
Attempt:
SELECT cp.price * max(cy.value) AS monetaryGain, EXTRACT(YEAR FROM cy.date) AS year
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId AND cy.id_location = :locationId AND cp.year = year
GROUP BY year
ORDER BY year
The problem:
"columns "cp.price", "cy.value" and "cy.date" must appear in the GROUP BY clause or be used in an aggregate function"
If I put these three columns in GROUP BY, I won't get the expected result - it won't be grouped just by year, obviously.
Does anyone have an idea on how to fix or rewrite this query in order to get the task result?
Thanks in advance!
The fix
Rewrite monetaryGain to be:
max(cp.price * cy.value) AS monetaryGain
That way you will not be required to group by cp.price, because it is not output as a group member but used inside an aggregate.
Why?
When you write a GROUP BY query, you can output only columns that are in the GROUP BY list and aggregate function values. This is expected: you get a single row per group, but there may be several distinct values for a field that is not in the grouping column list.
For the same reason you cannot use non-grouping columns in arithmetic or any other non-aggregate expression, because that could produce several different results for a single row, and there would be no way to display them all.
This is a VERY loose explanation, but I hope it helps to grasp the concept.
Aliases in GROUP BY
Also, don't rely on the alias in GROUP BY here. Use:
GROUP BY EXTRACT(YEAR FROM cy.date)
PostgreSQL does accept an output-column alias in GROUP BY, but your alias year collides with the real column cp.year, so an unqualified year resolves to the input column rather than the alias; and in the WHERE clause (cp.year = year) an alias can never be referenced at all. This link might explain why: https://www.postgresql.org/message-id/7608.1259177709%40sss.pgh.pa.us
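Putting both fixes together, the whole query might look like this (a sketch; the alias is also replaced by the explicit expression in the WHERE clause, since WHERE cannot reference output aliases):
SELECT EXTRACT(YEAR FROM cy.date) AS year,
       max(cp.price * cy.value) AS monetaryGain
FROM culture_price AS cp
JOIN culture AS c ON cp.id_culture = c.id
JOIN crop_yield AS cy ON cy.id_culture = c.id
WHERE c.id = :cultureId
  AND cy.id_location = :locationId
  AND cp.year = EXTRACT(YEAR FROM cy.date)
GROUP BY EXTRACT(YEAR FROM cy.date)
ORDER BY EXTRACT(YEAR FROM cy.date)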

What does filter mean?

select driverid, count(*)
from f1db.results
where position is null
group by driverid order by driverid
My thought: first find all the records where position is null, then apply the aggregate function.
select driverid, count(*) filter (where position is null) as outs
from f1db.results
group by driverid order by driverid
This is the first time I have come across the FILTER clause, and I am not sure what it means.
The two code blocks return different results.
I already googled, but there don't seem to be many tutorials about FILTER. Kindly share some links.
A more helpful term to search for is aggregate filters, since FILTER (WHERE) is only used with aggregates.
A filter clause is an additional filter that is applied only to that aggregate and nowhere else. That means it doesn't affect the rest of the columns!
You can use it to get a subset of the data, like for example, to get the percentage of cats in a pet shop:
SELECT shop_id,
CAST(COUNT(*) FILTER (WHERE species = 'cat')
AS DOUBLE PRECISION) / COUNT(*) as "percentage"
FROM animals
GROUP BY shop_id
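If you ever need the same thing on a database without FILTER support, an equivalent sketch with CASE (same table and columns as above; COUNT skips the NULLs the CASE produces for non-cats):
SELECT shop_id,
       CAST(COUNT(CASE WHEN species = 'cat' THEN 1 END)
            AS DOUBLE PRECISION) / COUNT(*) as "percentage"
FROM animals
GROUP BY shop_id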
For more information, see the docs
FILTER, ehm... filters the records which should be aggregated - in your case, your aggregate counts all rows where position is NULL.
E.g. you have 10 records:
SELECT COUNT(*)
would return 10.
If 2 of the records have a NULL position,
then
SELECT COUNT(*) FILTER (WHERE position IS NULL)
would return 2.
If instead you want to count all non-NULL records:
SELECT COUNT(*) FILTER (WHERE position IS NOT NULL)
In the example, it returns 8.
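Applied to your table, both counts can even be computed in one pass (a sketch; the finishes alias is just an illustrative name):
select driverid,
       count(*) filter (where position is null) as outs,
       count(*) filter (where position is not null) as finishes
from f1db.results
group by driverid order by driverid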

Postgresql: how to query hstore dynamically

I have the following tables
ORDER (idOrder int, idCustomer int) [PK: idOrder]
ORDERLINE (idOrder int, idProduct int) [PK: idOrder, idProduct]
PRODUCT (idProduct int, rating hstore) [PK: idProduct]
In the PRODUCT table, 'rating' is a key/value column where the key is an idCustomer, and the value is an integer rating.
The query to count the orders containing a product on which the customer has given a good rating looks like this:
select count(distinct o.idOrder)
from order o, orderline l, product p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and (p.rating->(o.idcust::varchar)::int) > 4;
The query plan seems correct, but this query takes forever. So I tried a different query, where I explode all the records in the hstore:
select count(distinct o.idOrder)
from order o, orderline l,
(select idproduct, skeys(p.rating) idcustomer, svals(p.rating) intrating from product) as p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and o.idcustomer = p.idcustomer and p.intrating > 4;
This query takes only a few seconds. How is this possible? I assumed that exploding all values of an hstore would be quite inefficient, but it seems to be the opposite. Is it possible that I am not writing the first query correctly?
I suspect it is because in the first query you are evaluating
p.rating->(o.idcust::varchar)::int
a row at a time as the query iterates over the rest of the operations, whereas in the second query the hstore values are all expanded up front in a single subquery. If you want more insight, use EXPLAIN ANALYZE:
https://www.postgresql.org/docs/12/sql-explain.html
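For example, something along these lines for the first query (a sketch; "order" is quoted because ORDER is a reserved word, the customer column is spelled idcustomer as in your second query, and the cast is parenthesized so ::int clearly applies to the looked-up rating):
EXPLAIN (ANALYZE, BUFFERS)
select count(distinct o.idorder)
from "order" o
join orderline l on o.idorder = l.idorder
join product p on l.idproduct = p.idproduct
where (p.rating -> (o.idcustomer::varchar))::int > 4;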

How to get ids of grouped by rows in postgresql and use result?

I have a table containing transactions with an amount. I want to create a batch of transactions so that the sum of amount of each 'group by' is negative.
My problem is to get all the ids of the rows concerned by a 'group by' where each group is validated by a sum condition.
I found many solutions, but they don't work for me.
The best solution I found is to query the db a first time with the 'group by' and the sum, then return the ids, and finally query the db another time with all of them.
Here is an example of what I would like (it doesn't work!):
SELECT * FROM transaction_table transaction
WHERE transaction.id IN (
select string_agg(grouped::character varying, ',' ) from (
SELECT array_agg(transaction2.id) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
)
);
the result of the array_agg is like:
{39758,39759}
{39757,39756,39755,39743,39727,39713}
and the string_agg is:
{39758,39759},{39757,39756,39755,39743,39727,39713}
Now I just need to use them but I don't know how to...
Unfortunately, it doesn't work because of type casting:
ERROR: operator does not exist: integer = integer[]
Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
Maybe you are looking for
SELECT id, motto, accountbnf, payment, accountclt, amount
FROM (SELECT id, motto, accountbnf, payment, accountclt, amount,
             sum(amount)
               OVER (PARTITION BY motto, accountbnf, payment, accountclt)
               AS group_total
      FROM transaction_table) AS q
WHERE group_total < 0;
The inner SELECT adds an additional column using a window function that calculates the sum for each group, and the outer query removes all results where that sum is not negative.
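If you only need the ids (for example to feed them into another statement), the same window function works (a sketch, keeping the c_scte filter from your question; '(same conditions)' stands for whatever other predicates you have):
SELECT id
FROM (SELECT id,
             sum(amount)
               OVER (PARTITION BY motto, accountbnf, payment, accountclt)
               AS group_total
      FROM transaction_table
      WHERE c_scte = 'c'
        -- AND (same conditions)
     ) AS q
WHERE group_total < 0;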
Finally, I found this option using the 'unnest' method. It works perfectly.
array_agg brings all the ids together in separate arrays,
and unnest flattens all of them.
This comes from here.
SELECT * FROM transaction_table transaction
WHERE transaction.id = ANY(
SELECT unnest(array_agg(transaction2.id)) as grouped FROM transaction_table transaction2
WHERE transaction2.c_scte='c'
AND (same conditions)
GROUP BY
transaction2.motto ,
transaction2.accountBnf ,
transaction2.payment ,
transaction2.accountClt
HAVING sum(transaction2.amount)<0
);
The problem with this solution is that Hibernate doesn't recognize the array_agg function.

oracle: grouping on merged columns

I have 2 tables, FIRST
id,rl_no,adm_date,fees
1,123456,14-11-10,100
2,987654,10-11-12,30
3,4343,14-11-17,20
and SECOND
id,rollno,fare,type
1,123456,20,bs
5,634452,1000,bs
3,123456,900,bs
4,123456,700,bs
My requirement is twofold:
1. I first need to get all columns from both tables with a common rl_no. So I used:
SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno
The output is like this:
id,rl_no,adm_date,fees,rollno,fare,type
1,123456,14-11-10,100,123456,20,bs
1,123456,10-11-12,100,123456,900,bs
1,123456,14-11-17,100,123456,700,bs
2. Next I wanted to get the sum(fare) of those rollno that are common between the 2 tables and whose fare >= fees from the FIRST table, grouped by rollno and id.
My query is:
SELECT x.ID,x.rl_no,x.adm_date,x.fees,x.rollno,x.type,sum(x.fare) as "fare" from (SELECT a.ID,a.rl_no,a.adm_date,a.fees,b.rollno,b.fare,b.type FROM FIRST a
INNER JOIN
SECOND b ON a.rl_no = b.rollno) x, FIRST y
WHERE x.rollno = y.rl_no AND x.fare >= y.fees AND x.type IS NOT NULL GROUP BY x.rollno,x.ID ;
But this is throwing an exception:
ORA-00979: not a GROUP BY expression
00979. 00000 - "not a GROUP BY expression"
The expected output will be like this:
id,rollno,adm_date,fare,type
1,123456,14-11-10,1620,bs
So could someone show an Oracle newbie what I'm doing wrong here?
It looks like the main problem is the GROUP BY itself.
When aggregating with GROUP BY, all selected columns need to be either listed in the GROUP BY clause or aggregated - that is exactly what ORA-00979 is complaining about. You're grouping by rollno and ID, but also selecting rl_no, adm_date, fees, and type, which are neither grouped nor aggregated. So what do you want to have happen to the extra values for those columns? Are they always going to be the same for each distinct rollno and ID pair?
If so, simply add them all to the GROUP BY clause, i.e.,
GROUP BY rl_no, adm_date, fees, type, rollno, ID
If not, you'll need to work out exactly how you want to pick which value should be output. If you've got output like your example (adding in an ID column here):
ID,adm_date,fees,rollno,fare,type
1,14-11-10,100,123456,20,bs
1,10-11-12,100,123456,900,bs
1,14-11-17,100,123456,700,bs
Call that result set 'a'. If I run:
SELECT a.ID, a.rollno, SUM(a.fare) as total_fare
FROM a
GROUP BY a.ID, a.rollno
Then the result will be a single row:
ID,rollno,total_fare
1,123456,1620
So, if you also select the adm_date, fees, and type columns, Oracle has no idea what you mean to do with them. You're not using them for grouping, and you're not telling Oracle how you want to pick which one to use.
You could do something like (using MIN, for example, to pick a single representative value for each non-grouped column, since Oracle has no bare FIRST() aggregate):
SELECT a.ID,
       MIN(a.adm_date) as min_adm_date,
       MIN(a.fees) as min_fees,
       a.rollno,
       SUM(a.fare) as total_fare,
       MIN(a.type) as min_type
FROM a
GROUP BY a.ID, a.rollno
Which would give the result:
ID,min_adm_date,min_fees,rollno,total_fare,min_type
1,14-11-10,100,123456,1620,bs
I'm not sure if that's what you mean to do though.
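For reference, a sketch of the same idea written directly against the original tables (note it leaves out the fare >= fees filter, since your expected total of 1620 includes the 20 fare row that such a filter would remove):
SELECT a.id,
       b.rollno,
       MIN(a.adm_date) AS adm_date,
       SUM(b.fare) AS fare,
       MIN(b.type) AS type
FROM first a
INNER JOIN second b ON a.rl_no = b.rollno
GROUP BY a.id, b.rollno
With the sample data this gives:
id,rollno,adm_date,fare,type
1,123456,14-11-10,1620,bs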