PostgreSQL many scalar functions on same unnest

I have this query:
select distinct * ,
(select count(A) filter(where A < 360) from unnest(cust_journey_time_series) as A) as count_journey_time_series
, (select avg(A) filter(where A < 360) from unnest(cust_journey_time_series) as A) as avg_journey_time_series
, (select sum(A) filter(where A < 360) from unnest(cust_journey_time_series) as A) as order_journey_time
, (select sum(A) ..
, (select count(distinct A) ..
from (
select a,b,c, max(s) s, max(ev) ev, max(ord) ord,
array_agg(cust_journey_time_seconds order by ev asc) as cust_journey_time_series, min(mi) mi,
array_agg(col order by ev asc) as cus,
min(ra) ra, max(de) de, max(to) to,
max(orde) orde, max(cam) cam,
max(pag) pag,
count(collec) FILTER (WHERE collec <> las ) AS pag_count,
max (fi) as ad,
array_agg(fi) as dd2c,
array_agg(ut) as ut_places,
array_agg(ct) as ct_places,
array_agg(ut) as utaces
from temp_j
group by a,b,c
order by ev
)aaa
My intuitive code would be:
select distinct *,
(select count(A), sum(A), avg(A) filter(where A < 360)
from unnest(cust_journey_time_series) as A
) as count_journey_time_series, sum_journey_time_series, avg_journey_time_series
but I get this error
subquery must return only one column
Is there a way to optimize the query, or does PostgreSQL do it under the hood?

I'll try to translate the fragment you provided into something better:
select distinct tab.*,
count(cjts.a) filter (where cjts.a < 360) as count_journey_time_series,
avg(cjts.a) filter (where cjts.a < 360) as avg_journey_time_series,
sum(cjts.a) filter (where cjts.a < 360) as order_journey_time,
...
from (select ...) as tab
cross join lateral unnest(tab.cust_journey_time_series) as cjts(a)
group by tab.pkey;
That way, unnest() has to be evaluated only once.
The error message of your modified query comes from:
SELECT
(SELECT count(A), sum(A), avg(A) FROM ...)
...
That is not allowed. You can only use a single result column in a subquery in the SELECT list. Writing the query my way avoids the problem.
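For reference, here is a self-contained sketch of that pattern you can paste into psql (the table and values are invented):
-- hypothetical data standing in for the aggregated subquery
WITH tab(pkey, cust_journey_time_series) AS (
    VALUES (1, ARRAY[10, 200, 500]),
           (2, ARRAY[30, 40, 900])
)
SELECT tab.pkey,
       count(cjts.a) FILTER (WHERE cjts.a < 360) AS count_journey_time_series,
       avg(cjts.a)   FILTER (WHERE cjts.a < 360) AS avg_journey_time_series,
       sum(cjts.a)   FILTER (WHERE cjts.a < 360) AS order_journey_time
FROM tab
CROSS JOIN LATERAL unnest(tab.cust_journey_time_series) AS cjts(a)
GROUP BY tab.pkey;
The array is expanded once per row, and all the aggregates are computed over that single expansion.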

how to order output of arrays when using union

I have a query like this:
SELECT array_agg(candles) as candles FROM ( SELECT * FROM ... ) AS candles
UNION ALL
SELECT array_agg(trades) as trades FROM ( SELECT * FROM ... ) AS trades
UNION ALL
SELECT ...
This returns rows containing arrays, but the order of the rows doesn't necessarily match the order of the subqueries in the union.
For example, it is possible that the output will have the trades row before the candles row.
How can I get the rows in a predictable order?
Edit:
I updated the query based on the answer, but I'm getting an error:
SELECT a FROM
(
SELECT 1 as o, array_agg(candles) as a
FROM (
SELECT ts, open, high, low, close, midpoint, volume
FROM exchange.binance.candles
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS candles
UNION ALL
SELECT 2 as o, array_agg(trades)
FROM (
SELECT ts, price, quantity, direction
FROM exchange.binance.trades
WHERE instrument = 'BTCUSDT' AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS trades
UNION ALL
SELECT 3 as o, array_agg(kvwap)
FROM (
SELECT ts, price, "interval"
FROM exchange.binance.kvwap
WHERE instrument = 'BTCUSDT' AND "interval" IN ('M5', 'H1', 'H4') AND ts >= '2022-04-01 00:00:00' AND ts < '2022-04-01 01:00:00'
ORDER BY ts) AS kvwap
)
ORDER BY o;
the error is:
[42601] ERROR: subquery in FROM must have an alias Hint: For example, FROM (SELECT ...) [AS] foo. Position: 15
Add a column for ordering to each subquery, but don't include it in the output:
SELECT a FROM (
SELECT 1 as o, array_agg(candles) as a FROM ( SELECT * FROM ... ) c group by 1
UNION ALL
SELECT 2, array_agg(trades) FROM ( SELECT * FROM ... ) t group by 1
UNION ALL
SELECT ...
) x
ORDER BY o
Note that with UNION, the entire result uses the column names from the first subquery, so don't bother providing aliases in the other branches. Also note the x alias on the outer derived table: your edited query fails with "subquery in FROM must have an alias" exactly because that alias is missing.
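A minimal, self-contained illustration of the technique (the data is invented):
SELECT a
FROM (
    SELECT 1 AS o, array_agg(x ORDER BY x) AS a FROM (VALUES (10), (20)) t1(x)
    UNION ALL
    SELECT 2, array_agg(x ORDER BY x) FROM (VALUES (30), (40)) t2(x)
) u
ORDER BY o;
-- always returns {10,20} first and {30,40} second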

Compare varchar string to produce missing items list

I have a table with a varchar column that stores locations in the format -2,7, -25,30, etc. I am trying to produce a list of missing locations, i.e. locations where we don't have any customers.
The locations range from -30,-30 to 30,30. I can't find a way to set up a loop to run through all the options. Is there a way to do this?
Microsoft SQL Server 2017
;WITH cte as (
select -30 as n --anchor member
UNION ALL
select n + 1 --recursive member
from cte
where n < 30 --stop at 30 so the values run from -30 to 30
)
select z.*
from (
select CONCAT(y.n,',',x.n) as locations
from cte as x CROSS JOIN cte y
) as z
LEFT OUTER JOIN dbo.Client as cli ON cli.client_location = z.locations
where cli.client_location IS NULL
order by z.locations asc
Generate all combinations, then match the generated set against the existing locations.
WITH DIGITS AS
(
SELECT n FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS val(n)
),
NUMS AS
(
SELECT (tens.n * 10 + ones.n)-50 AS n
FROM DIGITS ones
CROSS JOIN DIGITS tens
),
LOCATIONS AS
(
SELECT CONCAT(n1.n,',',n2.n) AS location, n1.n as n1, n2.n as n2
FROM NUMS n1
JOIN NUMS n2 ON n2.n BETWEEN -30 AND 30
WHERE n1.n BETWEEN -30 AND 30
)
SELECT loc.location
FROM LOCATIONS loc
LEFT JOIN
(
SELECT Client_Location, COUNT(*) Cnt
FROM dbo.Client
GROUP BY Client_Location
) cl ON cl.Client_Location = loc.location
WHERE cl.Client_Location IS NULL
ORDER BY loc.n1, loc.n2
I would go with a recursive CTE. This is a slight variation of SNR's approach:
with cte as (
select -30 as n --anchor member
union all
select n + 1 --recursive member
from cte
where n < 30
)
select cte_x.n as x, cte_y.n as y,
concat(cte_x.n, ',', cte_y.n) as missing_location
from cte cte_x cross join
cte cte_y left join
dbo.client c
on c.client_location = concat(cte_x.n, ',', cte_y.n)
where c.client_location is null;
Or to avoid the concat() twice:
select cte_x.n as x, cte_y.n as y, v.location as missing_location
from cte cte_x cross join
cte cte_y cross apply
(values (concat(cte_x.n, ',', cte_y.n))
) v(location) left join
dbo.client c
on c.client_location = v.location
where c.client_location is null;
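To sanity-check either variant, a throwaway repro along these lines (the table and sample rows are invented) should return every grid point that has no client:
CREATE TABLE dbo.client (client_location varchar(10));
INSERT INTO dbo.client (client_location) VALUES ('-30,-30'), ('0,0'), ('30,30');

with cte as (
    select -30 as n
    union all
    select n + 1 from cte where n < 30
)
select concat(cte_x.n, ',', cte_y.n) as missing_location
from cte cte_x cross join cte cte_y
left join dbo.client c on c.client_location = concat(cte_x.n, ',', cte_y.n)
where c.client_location is null;
-- 61 * 61 = 3721 grid points minus the 3 seeded rows: 3718 missing locations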

postgres: Combining filter() with a percentile_cont()

Trying to use a filter() clause within a call to percentile_cont(), and I am not sure I can do this. Is there a way? Here's the example query:
select
count(*) as n1,
count(*) filter(where ha >= 0) as n2,
percentile_cont(.9) within group (order by es asc) as p1,
percentile_cont(.9) filter (where ha >= 0) within group (order by es asc) as p2
from mytable where mypid = 123;
The query works fine without the p2 call of course, but you can see what I want to do.
The filter needs to go after the within group part:
select
count(*) as n1,
count(*) filter(where ha >= 0) as n2,
percentile_cont(.9) within group (order by es asc) as p1,
percentile_cont(.9) within group (order by es asc) filter (where ha >= 0) as p2
from mytable
where mypid = 123;
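A quick way to convince yourself of the placement, with invented data from generate_series:
SELECT count(*) AS n1,
       count(*) FILTER (WHERE ha >= 0) AS n2,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY es) AS p1,
       percentile_cont(0.9) WITHIN GROUP (ORDER BY es) FILTER (WHERE ha >= 0) AS p2
FROM (
    SELECT g AS es,
           CASE WHEN g % 2 = 0 THEN 1 ELSE -1 END AS ha  -- half the rows pass the filter
    FROM generate_series(1, 100) AS g
) s;
-- n1 = 100, n2 = 50; p2 is the 90th percentile over the even es values only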

Top N values in window frame

I have a table t with 3 fields of interest:
d (date), pid (int), and score (numeric)
I am trying to calculate a 4th field that is an average of each player's top N (3 or 5) scores for the days before the current row.
I tried the following join on a subquery but it is not producing the results I'm looking for:
SELECT t.d, t.pid, t.score, sq.highscores
FROM t, (SELECT *, avg(score) as highscores FROM
(SELECT *, row_number() OVER w AS rnum
FROM t AS t2
WINDOW w AS (PARTITION BY pid ORDER BY score DESC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)) isq
WHERE rnum <= 3) sq
WHERE t.d = sq.d AND t.pid = sq.pid
Any suggestions would be greatly appreciated! I'm a hobbyist programmer and this is more complex of a query than I'm used to.
You can't select * and avg(score) in the same (inner) query: which non-aggregated values should be selected for each average? PostgreSQL won't decide this for you.
Because you PARTITION BY pid in the innermost query, you should GROUP BY pid in the aggregating subquery. That way, you can SELECT pid, avg(score) as highscores:
SELECT pid, avg(score) as highscores
FROM (SELECT *, row_number() OVER w AS rnum
FROM t AS t2
WINDOW w AS (PARTITION BY pid ORDER BY score DESC)) isq
WHERE rnum <= 3
GROUP BY pid
Note: ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING makes no difference for row_number().
But if the top N part is fixed (and N will be small in your real-world use case too), you can solve this with fewer subqueries, using the nth_value() window function:
SELECT d, pid, score,
(coalesce(nth_value(score, 1) OVER w, 0) +
coalesce(nth_value(score, 2) OVER w, 0) +
coalesce(nth_value(score, 3) OVER w, 0)) /
((nth_value(score, 1) OVER w IS NOT NULL)::int +
(nth_value(score, 2) OVER w IS NOT NULL)::int +
(nth_value(score, 3) OVER w IS NOT NULL)::int) highscores
FROM t
WINDOW w AS (PARTITION BY pid ORDER BY score DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
http://rextester.com/GUUPO5148
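Note that both queries above take each player's overall top 3 scores. If you need the literal requirement - the average of the top N scores from days strictly before the current row - one way to express it is a LATERAL subquery (just a sketch, assuming the columns d, pid, score described above):
SELECT t.d, t.pid, t.score, hs.highscores
FROM t
LEFT JOIN LATERAL (
    SELECT avg(top_n.score) AS highscores
    FROM (
        SELECT t2.score
        FROM t AS t2
        WHERE t2.pid = t.pid
          AND t2.d < t.d      -- only days before the current row
        ORDER BY t2.score DESC
        LIMIT 3               -- top N = 3
    ) AS top_n
) AS hs ON true
ORDER BY t.pid, t.d;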

Limit by percent instead of number of rows without subqueries

I would like to select the top 1% of rows; however, I cannot use subqueries to do it. I.e., this won't work:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT (COUNT(*) * 0.01)::integer FROM mytbl)
How would I accomplish the same output without using a subquery with limit?
You can utilize PERCENT_RANK:
WITH cte(ID, var, pc) AS
(
SELECT ID, var, PERCENT_RANK() OVER (ORDER BY random()) AS pc
FROM mytbl
WHERE var = 'value'
)
SELECT *
FROM cte
WHERE pc <= 0.01
ORDER BY id;
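A quick sanity check of the idea with invented data (percent_rank() ranges from 0 to 1, so pc <= 0.01 keeps the first ~1% of the random ordering):
WITH cte(id, pc) AS (
    SELECT g.id, PERCENT_RANK() OVER (ORDER BY random()) AS pc
    FROM generate_series(1, 1000) AS g(id)
)
SELECT count(*) FROM cte WHERE pc <= 0.01;  -- 10 of the 1000 rows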
I solved it with Python using the psycopg2 package:
cur.execute("SELECT ROUND(COUNT(id)*0.01,0)
FROM mytbl")
nrows = str([int(d[0]) for d in cur.fetchall()][0])
cur.execute("SELECT *
FROM mytbl
WHERE var='value'
ORDER BY id, random() LIMIT (%s)",nrows)
Perhaps there is a more elegant solution using just SQL, or a more efficient one, but this does exactly what I'm looking for.
If I got it right, you need:
- a random 1% sample of all rows, and
- if some id is within the sample, all rows with the same id must be there too.
The following SQL should do the trick:
with ids as (
select id,
total,
sum(cnt) over (order by max(rnd)) running_total
from (
select id,
count(*) over (partition by id) cnt,
count(*) over () total,
row_number() over(order by random()) rnd
from mytbl
) q
group by id,
cnt,
total
)
select mytbl.*
from mytbl,
ids
where mytbl.id = ids.id
and ids.running_total <= ids.total * 0.01
order by mytbl.id;
I don't have your data, of course, but I have no trouble using a subquery in the LIMIT clause.
However, the subquery contains only the count(*) part, and I then multiply the result by 0.01:
SELECT * FROM mytbl
WHERE var='value'
ORDER BY id,random()
LIMIT(SELECT count(*) FROM mytbl)*0.01;