I have the following tables:
ORDER (idOrder int, idCustomer int) [PK: idOrder]
ORDERLINE (idOrder int, idProduct int) [PK: idOrder, idProduct]
PRODUCT (idProduct int, rating hstore) [PK: idProduct]
In the PRODUCT table, 'rating' is a key/value column where the key is an idCustomer, and the value is an integer rating.
The query to count the orders containing a product on which the customer has given a good rating looks like this:
select count(distinct o.idOrder)
from "order" o, orderline l, product p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and (p.rating -> o.idcustomer::varchar)::int > 4;
The query plan seems correct, but this query takes forever. So I tried a different query, where I explode all the records in the hstore:
select count(distinct o.idOrder)
from "order" o, orderline l,
(select idproduct, skeys(rating) as idcustomer, svals(rating) as intrating from product) as p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and o.idcustomer = p.idcustomer::int and p.intrating::int > 4;
This query takes only a few seconds. How is this possible? I assumed that exploding all values of an hstore would be quite inefficient, but it seems to be the opposite. Is it possible that I am not writing the first query correctly?
I suspect it is because in the first query you are evaluating:
(p.rating -> o.idcustomer::varchar)::int
one row at a time as the query iterates over the join, whereas in the second query the hstore values are expanded once in a single set-returning scan. If you want more insight, use EXPLAIN ANALYZE:
https://www.postgresql.org/docs/12/sql-explain.html
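For example, run it against the first query (note that EXPLAIN ANALYZE actually executes the statement, so expect it to take as long as the query itself):
EXPLAIN ANALYZE
select count(distinct o.idorder)
from "order" o, orderline l, product p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and (p.rating -> o.idcustomer::varchar)::int > 4;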
Goal: Create a query to pull the closest cycle count event (Table C) for a product ID, based on the inventory adjustment results sourced from another table (Table A).
All records from Table A will be used, but they are not guaranteed to have a match in Table C.
The ID column is present in both tables but is not unique in either, so the pair of ID and timestamp together is needed for each table.
Current simplified SQL
SELECT
    A.WHENOCCURRED,
    A.LPID,
    A.ITEM,
    A.ADJQTY,
    C.WHENOCCURRED,
    C.LPID,
    C.LOCATION,
    C.ITEM,
    C.QUANTITY,
    C.ENTQUANTITY
FROM A
LEFT JOIN C
    ON A.LPID = C.LPID
WHERE A.facility = 'FACID'
    AND A.WHENOCCURRED > '23-DEC-22'
    AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
This is currently pulling the first hit on C.WHENOCCURRED for each LPID match. I want to see if there is a simpler JOIN solution before going in a direction that creates two temp tables split on WHENOCCURRED.
I have a functioning INDEX(MATCH(MIN())) solution in Excel, but that requires exporting a couple of system reports first and is extremely slow with X,XXX-row tables.
If you are using Oracle 12 or later, you can use a LATERAL join and FETCH FIRST ROW ONLY:
SELECT A.WHENOCCURRED,
       A.LPID,
       A.ITEM,
       A.ADJQTY,
       C.WHENOCCURRED,
       C.LPID,
       C.LOCATION,
       C.ITEM,
       C.QUANTITY,
       C.ENTQUANTITY
FROM A
LEFT OUTER JOIN LATERAL (
    SELECT *
    FROM C
    WHERE A.LPID = C.LPID
      AND A.whenoccurred <= c.whenoccurred
    ORDER BY c.whenoccurred
    FETCH FIRST ROW ONLY
) C
ON (1 = 1) -- the join condition is inside the lateral subquery
WHERE A.facility = 'FACID'
  AND A.WHENOCCURRED > DATE '2022-12-23'
  AND A.ADJREASONABBREV = 'CYCLE COUNTS'
ORDER BY A.WHENOCCURRED DESC;
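Note that the lateral subquery is evaluated once per row of A, so this should stay fast as long as C has an index covering (LPID, WHENOCCURRED); the ORDER BY ... FETCH FIRST ROW ONLY then only needs to read the first matching index entry.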
As answered in Esper sum of snapshots (as opposed to sum of deltas), #unique(field) ensures that the sum uses the specified field when summing up the values. That worked great, except in the case listed below.
Given the following EPL:
create schema StockTick(security string, price double);
create schema OrderEvent (orderId int, trader string, symbol string, strategy string,
quantity double);
select orderId, trader, symbol, strategy, sum(quantity)
from OrderEvent#unique(orderId) as o
full outer join StockTick#lastevent as t
on o.symbol = t.security
group by trader, symbol, strategy
The following events:
StockTick={security='IBM', price=99}
OrderEvent={orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10}
StockTick={security='IBM', price=99}
correctly provides the sum as sum(quantity)=10.0.
But if the StockTick did not happen first:
OrderEvent={orderId=1, trader="A", symbol='IBM', strategy="VWAP", quantity=10}
StockTick={security='IBM', price=99}
I still expected sum(quantity) to be 10.0, but in this case it is sum(quantity)=20.0!
Based on what #unique(field) does, I figured this shouldn't be 20. Why is #unique(field) not being adhered to, and what is needed in the query for the output to be unique on orderId in both cases?
The following set of events is also available to run on Esper Notebook
https://notebook.esperonline.net/#/notebook/2HE9662E6
In a streaming join, the aggregation aggregates all rows produced by the join. Since the join produces two rows as output, with quantity 10 each, the result is 20.
To aggregate the result of the join so that it is unique by order id, you can use insert into, like so:
insert into JoinedStream select orderId, trader, symbol, strategy, quantity
from OrderEvent#unique(orderId) as o
full outer join StockTick#lastevent as t
on o.symbol = t.security;
select orderId, trader, symbol, strategy, sum(quantity)
from JoinedStream#unique(orderId)
group by trader, symbol, strategy;
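With this split, JoinedStream#unique(orderId) retains only the latest joined row per orderId, so both event orderings above should yield sum(quantity)=10.0.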
I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values (A, B, C,
        (select C - cnt
         from cusers
         where uid = A and lid = B
         order by ts desc
         limit 1),
        now())
on conflict do nothing
Quite often (with a probability of about 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive SELECT subquery does not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the SELECT subquery to produce the dyn column, and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such a situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 5 - 2). Each time I insert a new row it should:
avoid duplicate rows (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (I take the last inserted row for the user according to the timestamp and subtract the values)
Another solution I've found is to use returning uid, lid, ts on the inserts to get the user ids that were really inserted - this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
    select max(cnt) - min(cnt)
    from (
        select cnt
        from cusers
        where uid = A and lid = B
        order by ts desc
        limit 2) last_two
)
where uid = A and lid = B and ts = TS
But it is not a fast approach either, as it has to scan the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.
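For reference, the index hinted at above might look like this (a sketch; the name is arbitrary, and it matches the "where uid = ... and lid = ... order by ts desc limit ..." lookups in both queries):
create index cusers_uid_lid_ts_idx on cusers (uid, lid, ts desc);
-- lets PostgreSQL read the newest one or two rows per (uid, lid) straight off
-- the index instead of sorting all of that user's rows by ts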
I am trying to map a certain value of a column based on its count in another table. If the count of [Location] (a column of the IMPORT.DATA_SCRAP table; for now the static values 'Utah' and 'Kathmandu' are supplied for test purposes only) is equal to 1, then I need the select statement to return a single value expression, but here n rows of the table are returned.
For example, in the query below, as many rows as IMPORT.DATA_SCRAP has are returned; I only need the single first-row value in my case.
I came to know that a cursor or a CTE might achieve my result, but I am unable to figure it out.
Here,
select
    case
        when (SELECT COUNT(stateName) FROM Location.tblState where stateName = 'Utah') = 1
        then (select stateName + ', ' + CountryName from Location.tblState where stateName = 'Utah')
    end as nameof
from IMPORT.DATA_SCRAP
The relation between country, state, and city is as below:
select
    case
        when (SELECT COUNT(cityName) FROM Location.tblCity where cityName = 'Kathmandu') = 1
        then (select ct.countryName
              from Location.tblCity c
              inner join Location.tblState s on c.stateID = s.StateID
              inner join Location.tblCountry ct on ct.countryId = s.CountryId
              where c.cityName = 'Kathmandu')
    end as nameof
from IMPORT.DATA_SCRAP
How can I return only a single value expression despite the multiple rows of IMPORT.DATA_SCRAP in the result?
If I comment out the from IMPORT.DATA_SCRAP in the above query, I get the desired single result expression, but I do not know how to achieve it in other ways. Please suggest an appropriate way to handle these types of situations.
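For reference, a minimal sketch of that shape (assuming SQL Server, where a SELECT without a FROM clause is allowed; the THEN subquery can return only one column, so the two values are concatenated):
select
    case
        when (SELECT COUNT(stateName) FROM Location.tblState where stateName = 'Utah') = 1
        then (select stateName + ', ' + CountryName from Location.tblState where stateName = 'Utah')
    end as nameof;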
I have the following query, which does not work:
UPDATE item
SET popularity = (CASE
    WHEN (select SUM(io.quantity) from item i NATURAL JOIN itemorder io GROUP BY io.item_id) > 3 THEN TRUE
    ELSE FALSE
END);
Here I want to compare each row's SUM value from the inner SELECT with 3 and update popularity accordingly. But SQL gives an error:
ERROR: more than one row returned by a subquery used as an expression
I understand that the inner SELECT returns many values, but can somebody help me compare each row? In other words, make a loop.
When using a subquery as an expression you need to get a single row back, so you're effectively doing a query for each record in the item table:
UPDATE item i
SET popularity = (SELECT SUM(io.quantity)
                  FROM itemorder io
                  WHERE io.item_id = i.item_id) > 3;
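-- note: items with no rows in itemorder get popularity = NULL here
-- (SUM over zero rows is NULL), not FALSE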
An alternative (UPDATE ... FROM, which is a PostgreSQL extension) is to use a derived table in the FROM clause.
UPDATE item i2
SET popularity = x.orders > 3
FROM (select i.item_id, SUM(io.quantity) as orders
      from item i NATURAL JOIN itemorder io
      GROUP BY io.item_id) as x(item_id, orders)
WHERE i2.item_id = x.item_id;
Here you're doing a single grouped query, as you had, and joining the table to be updated with the results of the group. Note that items with no matching orders are not updated at all by this form, since they produce no row in the derived table.
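If items without any orders should end up with popularity = FALSE, rather than NULL (first form) or not updated at all (second form), a small variant of the first query (a sketch) wraps the sum in COALESCE:
UPDATE item i
SET popularity = COALESCE((SELECT SUM(io.quantity)
                           FROM itemorder io
                           WHERE io.item_id = i.item_id), 0) > 3;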