invalid reference to FROM-clause entry with CTE - postgresql

I'm trying to write a query to generate time periods but am getting the error:
ERROR: invalid reference to FROM-clause entry for table "prop"
Hint: There is an entry for table "prop", but it cannot be referenced from this part of the query.
Position: 346
The query is:
WITH prop AS (SELECT p.stagger, p.tariff FROM core_property p WHERE p.id = 1)
SELECT day + (hour.a * INTERVAL '1 hour') AS time,
day + (hour.b * INTERVAL '1 hour') AS timeEnd
FROM prop,
GENERATE_SERIES(date '2022-05-18' + prop.stagger, date '2022-05-20', '1 day'::INTERVAL) day,
(SELECT UNNEST(CASE
WHEN prop.tariff = 'Economy 7' THEN ARRAY [0,12,18]
WHEN prop.tariff = 'Economy 10' THEN ARRAY [0] END) a,
UNNEST(CASE
WHEN prop.tariff = 'Economy 7' THEN ARRAY [5,13,20]
WHEN prop.tariff = 'Economy 10' THEN ARRAY [7] END) b) hour;
I have a version of this query that almost works but generates duplicates and is ugly:
SELECT day + p.stagger + (CASE
WHEN p.tariff = 'Economy 7' THEN e7.a
WHEN p.tariff = 'Economy 10' THEN e10.a END) * INTERVAL '1 hour' AS time,
day + p.stagger + (CASE
WHEN p.tariff = 'Economy 7' THEN e7.b
WHEN p.tariff = 'Economy 10' THEN e10.b END) * INTERVAL '1 hour' AS timeEnd
FROM GENERATE_SERIES(date '2022-05-18', date '2022-05-20', '1 day'::INTERVAL) day,
(SELECT UNNEST(ARRAY [0,13,20]) a, UNNEST(ARRAY [5,16,22]) b) e10,
(SELECT UNNEST(ARRAY [0]) a, UNNEST(ARRAY [7]) b) e7,
(SELECT p.stagger, p.tariff FROM core_property p WHERE p.id = 1) p;
The result is this when p.tariff = 'Economy 7', but there should only be one entry for each day:
+---------------------------------+---------------------------------+
|time |timeend |
+---------------------------------+---------------------------------+
|2022-05-18 00:15:00.000000 +00:00|2022-05-18 07:15:00.000000 +00:00|
|2022-05-18 00:15:00.000000 +00:00|2022-05-18 07:15:00.000000 +00:00|
|2022-05-18 00:15:00.000000 +00:00|2022-05-18 07:15:00.000000 +00:00|
|2022-05-19 00:15:00.000000 +00:00|2022-05-19 07:15:00.000000 +00:00|
|2022-05-19 00:15:00.000000 +00:00|2022-05-19 07:15:00.000000 +00:00|
|2022-05-19 00:15:00.000000 +00:00|2022-05-19 07:15:00.000000 +00:00|
|2022-05-20 00:15:00.000000 +00:00|2022-05-20 07:15:00.000000 +00:00|
|2022-05-20 00:15:00.000000 +00:00|2022-05-20 07:15:00.000000 +00:00|
|2022-05-20 00:15:00.000000 +00:00|2022-05-20 07:15:00.000000 +00:00|
+---------------------------------+---------------------------------+
When p.tariff = 'Economy 10' then the result is right:
+---------------------------------+---------------------------------+
|time |timeend |
+---------------------------------+---------------------------------+
|2022-05-18 00:15:00.000000 +00:00|2022-05-18 05:15:00.000000 +00:00|
|2022-05-18 13:15:00.000000 +00:00|2022-05-18 16:15:00.000000 +00:00|
|2022-05-18 20:15:00.000000 +00:00|2022-05-18 22:15:00.000000 +00:00|
|2022-05-19 00:15:00.000000 +00:00|2022-05-19 05:15:00.000000 +00:00|
|2022-05-19 13:15:00.000000 +00:00|2022-05-19 16:15:00.000000 +00:00|
|2022-05-19 20:15:00.000000 +00:00|2022-05-19 22:15:00.000000 +00:00|
|2022-05-20 00:15:00.000000 +00:00|2022-05-20 05:15:00.000000 +00:00|
|2022-05-20 13:15:00.000000 +00:00|2022-05-20 16:15:00.000000 +00:00|
|2022-05-20 20:15:00.000000 +00:00|2022-05-20 22:15:00.000000 +00:00|
+---------------------------------+---------------------------------+
I'm not sure if I really need a CTE but it seemed like the only way to use prop.tariff and prop.stagger in the FROM clause.

A subquery in a FROM/JOIN list cannot reference other items of the same FROM/JOIN list unless it is declared LATERAL. So declare it as LATERAL:
WITH prop AS (SELECT p.stagger, p.tariff FROM core_property p WHERE p.id = 1)
SELECT day + (hour.a * INTERVAL '1 hour') AS time,
day + (hour.b * INTERVAL '1 hour') AS timeEnd
FROM prop,
GENERATE_SERIES(date '2022-05-18' + prop.stagger, date '2022-05-20', '1 day'::INTERVAL) day,
LATERAL (SELECT UNNEST(CASE
WHEN prop.tariff = 'Economy 7' THEN ARRAY [0,12,18]
WHEN prop.tariff = 'Economy 10' THEN ARRAY [0] END) a,
UNNEST(CASE
WHEN prop.tariff = 'Economy 7' THEN ARRAY [5,13,20]
WHEN prop.tariff = 'Economy 10' THEN ARRAY [7] END) b) hour;
The exception is a set-returning function used directly in the FROM/JOIN list, which is implicitly treated as lateral. That is why the generate_series(...) call doesn't raise the same error.
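One way to picture this: a LATERAL subquery behaves like an inner loop that can read the current row of the FROM items to its left. Below is a rough Python analogy, not PostgreSQL code. The names `periods` and `WINDOWS` and the dict-shaped prop rows are made up for illustration, and `days` stands in for the rows that generate_series would produce (with the stagger already added).

```python
from datetime import datetime, timedelta

# Hour windows per tariff, copied from the CASE expressions in the query above.
WINDOWS = {
    "Economy 7": ([0, 12, 18], [5, 13, 20]),
    "Economy 10": ([0], [7]),
}


def periods(prop_rows, days):
    """Mimic: FROM prop, <days>, LATERAL (SELECT unnest(...) a, unnest(...) b)."""
    out = []
    for prop in prop_rows:                          # outer FROM item
        for day in days:                            # rows of the day series
            starts, ends = WINDOWS[prop["tariff"]]  # lateral part: may read prop
            for a, b in zip(starts, ends):          # equal-length unnests pair up
                out.append((day + timedelta(hours=a), day + timedelta(hours=b)))
    return out


if __name__ == "__main__":
    demo_days = [datetime(2022, 5, 18, 0, 15), datetime(2022, 5, 19, 0, 15)]
    for start, end in periods([{"tariff": "Economy 7"}], demo_days):
        print(start, end)
```

Without LATERAL, the innermost part would have to be computed without access to `prop`, which is what the error message is complaining about.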

Related

postgres, group by date, and bucketize per hour

I would like to create a result object that can be used with Grafana for a heatmap. In order to display the data correctly I need the output to look like this:
| date | 00:00 | 01:00 | 02:00 | 03:00 | ...etc |
| 2023-01-01 | 1 | 2 | 0 | 1 | ... |
| 2023-01-02 | 0 | 0 | 1 | 1 | ... |
| 2023-01-03 | 4 | 0 | 2 | 0 | ... |
my data table structure:
trades
-----
id
closed_at
asset
So far, I know that I need to use generate_series and an interval to produce the hours, but I need the query to show these hours as columns, and I haven't been able to do that; it's getting a bit too advanced for me.
So far I have the following query:
SELECT
closed_at::DATE,
COUNT(id)
FROM trades
GROUP BY closed_at
ORDER BY closed_at
It now shows the number of rows grouped by day. I want to further aggregate the data so that it outputs the count per hour, as shown above.
Thanks for your help!
Use one COUNT(...) FILTER (...) column per hour. The example only covers 0:00 to 5:00; add the remaining hours the same way.
filter usage: https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES
date_trunc usage: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
BEGIN;
CREATE temp TABLE trades (
id bigint GENERATED BY DEFAULT AS IDENTITY,
closed_at timestamp,
asset text
) ON COMMIT DROP;
-- Sample data: 10 random timestamps on each of two days.
INSERT INTO trades (closed_at)
SELECT
date '2023-01-01' + interval '10 min' * (random() * i * 10)::int
FROM
generate_series(1, 10) g (i);
INSERT INTO trades (closed_at)
SELECT
date '2023-01-02' + interval '10 min' * (random() * i * 10)::int
FROM
generate_series(1, 10) g (i);
-- One row per day, one filtered count per hour bucket.
SELECT
closed_at::date
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date) AS "0:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '1 hour') AS "1:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '2 hour') AS "2:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '3 hour') AS "3:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '4 hour') AS "4:00"
,COUNT(id) FILTER (WHERE date_trunc('hour', closed_at) = closed_at::date + interval '5 hour') AS "5:00"
FROM
trades
GROUP BY
1;
END;
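If you want to check the pivot outside the database, the same bucketing can be sketched in a few lines of Python. This is only an illustration of the counting logic; `hourly_heatmap` is a made-up helper name and the input is assumed to be a plain list of closed_at timestamps.

```python
from collections import Counter
from datetime import datetime


def hourly_heatmap(closed_at_values, hours=range(24)):
    """Return {date: [count for each hour in `hours`]}, one entry per day."""
    # Count rows per (day, hour) bucket, like date_trunc('hour', ...).
    counts = Counter((ts.date(), ts.hour) for ts in closed_at_values)
    days = sorted({ts.date() for ts in closed_at_values})
    # Counter returns 0 for buckets with no rows, like COUNT(...) FILTER.
    return {day: [counts[(day, hour)] for hour in hours] for day in days}


if __name__ == "__main__":
    sample = [
        datetime(2023, 1, 1, 0, 5),
        datetime(2023, 1, 1, 1, 10),
        datetime(2023, 1, 1, 1, 50),
        datetime(2023, 1, 2, 3, 0),
    ]
    for day, row in hourly_heatmap(sample, hours=range(6)).items():
        print(day, row)
```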

Postgresql, Get the top 5 products that have increased in value from yesterday to today, returning the delta

I have a pricing table that contains the pricing data for products. There are around 600 unique product_id, each currently having 4 days worth of pricing data, which will eventually go up to 30 days. The table below is a small subset of the data to represent that table structure:
date | product_id | price_trend
2022-08-21 | 1 | 0.08
2022-08-22 | 1 | 0.18
2022-08-23 | 1 | 0.30
2022-08-21 | 2 | 0.15
2022-08-22 | 2 | 0.20
2022-08-23 | 2 | 0.22
So in my script, for each product_id I am trying to get yesterday's price_trend and today's price_trend and then calculate the price_delta between the two. I then order by price_delta and limit the results to 5.
I am having some issues because in some cases yesterday's price_trend is 0 and today's is, for example, 0.50. This does not mean that the price trend has increased, but most likely that price_trend was not gathered yesterday for whatever reason.
Now I would like to remove any records where price_trend for today or yesterday equals 0. However, when I add AND pricing.trend_price > 0, the value returned is just null instead.
Script:
SELECT
magic_sets_cards.name,
(SELECT pricing.trend_price
FROM pricing
WHERE pricing.product_id = magic_sets_cards_identifiers.mcm_id
AND pricing.date = (SELECT MAX(date) - INTERVAL '2 DAY' FROM pricing)
AND pricing.trend_price > 0) AS price_yesterday,
(SELECT pricing.trend_price
FROM pricing
WHERE pricing.product_id = magic_sets_cards_identifiers.mcm_id
AND pricing.date = (SELECT MAX(date) FROM pricing)
AND pricing.trend_price > 0) AS price_today,
((SELECT pricing.trend_price
FROM pricing
WHERE pricing.product_id = magic_sets_cards_identifiers.mcm_id
AND pricing.date = (SELECT MAX(date) FROM pricing)) -
(SELECT pricing.trend_price
FROM pricing
WHERE pricing.product_id = magic_sets_cards_identifiers.mcm_id
AND pricing.date = (SELECT MAX(date) - INTERVAL '2 DAY' FROM pricing))) AS price_delta
FROM magic_sets
JOIN magic_sets_cards ON magic_sets_cards.set_id = magic_sets.id
JOIN magic_sets_cards_identifiers ON magic_sets_cards_identifiers.card_id = magic_sets_cards.id
JOIN pricing ON pricing.product_id = magic_sets_cards_identifiers.mcm_id
WHERE magic_sets.code = '2X2'
AND pricing.date = (SELECT MAX(date) FROM pricing)
ORDER BY price_delta DESC
LIMIT 5
Results:
name | price_yesterday | price_today | price_delta
"Fiery Justice" | null | 0.50 | 0.50
"Hostage Taker" | 3.50 | 4.00 | 0.50
"Damnation" | 17.02 | 17.33 | 0.31
"Bring to Light" | 0.42 | 0.72 | 0.30
"City of Brass" | 17.41 | 17.68 | 0.27
I would like to get it so that the "Fiery Justice" in this example is just ignored.
You can get this output with rank(): rank each product's rows by date, newest first, then compare rank 1 with rank 2.
Query without null rows:
with cte as (Select
product_id,
SUM(CASE WHEN rank = 1 THEN price_trend ELSE null END) today,
SUM(CASE WHEN rank = 2 THEN price_trend ELSE null END) yesterday,
SUM(CASE WHEN rank = 1 THEN price_trend ELSE 0 END) -
SUM(CASE WHEN rank = 2 THEN price_trend ELSE 0 END) as diff
FROM (
SELECT
product_id,
price_trend,
date,
rank() OVER (PARTITION BY product_id ORDER BY date DESC) as rank
FROM tableName where price_trend>0 and date between current_date-5 and current_date-4) p
WHERE rank in (1,2)
GROUP BY product_id
) select * from cte where today is not null and yesterday is not null
Query with null values :
Select
product_id,
SUM(CASE WHEN rank = 1 THEN price_trend ELSE 0 END) today,
SUM(CASE WHEN rank = 2 THEN price_trend ELSE 0 END) yesterday,
SUM(CASE WHEN rank = 1 THEN price_trend ELSE 0 END) -
SUM(CASE WHEN rank = 2 THEN price_trend ELSE 0 END) as diff
FROM (
SELECT
product_id,
price_trend,
date,
rank() OVER (PARTITION BY product_id ORDER BY date DESC) as rank
FROM tableName where date between current_date-5 and current_date-4) p
WHERE rank in (1,2)
GROUP BY product_id
Adjust the date range so that it covers the two days you want to compare, for example:
where date between current_date-3 and current_date-2
OUTPUT :
product_id | today | yesterday | diff
1 | 0.06 | 0.02 | 0.04
2 | 0.64 | 0.62 | 0.02
CREATE TABLE tableName
(
date date,
product_id int,
price_trend numeric(9,2)
);
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-21 ', '1 ', '0.02');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-22 ', '1 ', '0.06');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-23 ', '1 ', '0.10');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-24 ', '1 ', '0.13');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-25 ', '1 ', '0.18');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-26 ', '1 ', '0.30');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-21 ', '2 ', '0.62');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-22 ', '2 ', '0.64');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-23 ', '2 ', '0.69');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-24 ', '2 ', '0.78');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-25 ', '2 ', '0.88');
INSERT INTO tableName (date ,product_id ,price_trend) VALUES ('2022-08-26 ', '2 ', '0.90');
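To sanity-check the intended result independently of the SQL, here is a small Python sketch of the same idea: take the two most recent dates, skip any product whose price is missing or 0 on either day, and sort by the difference. It is a simplification of the rank() queries rather than a translation of them; `top_increases` is a hypothetical helper and the rows are assumed to be (date, product_id, price_trend) tuples.

```python
from datetime import date
from decimal import Decimal


def top_increases(rows, limit=5):
    """Return [(product_id, delta)] for the largest day-over-day increases."""
    latest_two = sorted({d for d, _, _ in rows}, reverse=True)[:2]
    if len(latest_two) < 2:
        return []
    today, yesterday = latest_two
    prices = {(d, pid): price for d, pid, price in rows}
    deltas = []
    for pid in sorted({pid for _, pid, _ in rows}):
        p_today = prices.get((today, pid))
        p_yesterday = prices.get((yesterday, pid))
        # A missing row or a price of 0 means "not gathered": skip the product.
        if not p_today or not p_yesterday:
            continue
        deltas.append((pid, p_today - p_yesterday))
    deltas.sort(key=lambda item: item[1], reverse=True)
    return deltas[:limit]


if __name__ == "__main__":
    sample = [
        (date(2022, 8, 22), 1, Decimal("0.18")),
        (date(2022, 8, 23), 1, Decimal("0.30")),
        (date(2022, 8, 22), 2, Decimal("0.20")),
        (date(2022, 8, 23), 2, Decimal("0.22")),
    ]
    print(top_increases(sample))
```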

Postgres Find missing dates between a dataset of two columns

I'm trying to create a query that returns the missing dates between two columns across multiple rows.
Example:
leases
move_in move_out hotel_id
2021-04-01 2021-04-14 1
2021-04-17 2021-04-30 1
2021-04-01 2021-04-14 2
2021-04-17 2021-04-30 2
Result should be
date hotel_id
2021-04-15 1
2021-04-16 1
2021-04-15 2
2021-04-16 2
You're finding the difference between two sets. One is the leased hotel days. The other is all days in the month of April. And you're doing this for all hotels.
We can make a set of all days of April for all hotels. First we need to build the set of all days in the month of April: generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day').
Then we need to cross join this with all hotel IDs.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
Now for each day we can left join this with the leases for that day.
select *
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
Any day without a matching lease has NULL in all of the leases columns, so filter on one of them.
select day, hotels.id
from generate_series('2021-04-01'::date, '2021-04-30'::date, '1 day') as dates(day)
cross join (
select distinct hotel_id as id
from leases
) as hotels(id)
left join leases on day between move_in and leases.move_out and hotel_id = hotels.id
where leases.move_in is null
order by hotels.id, day
Demonstration.
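The set-difference idea can also be checked outside the database. The following Python sketch follows the same steps (every hotel crossed with every day, then keep the pairs with no covering lease); `missing_days` is a made-up name and the leases are assumed to be (move_in, move_out, hotel_id) tuples with inclusive dates.

```python
from datetime import date, timedelta


def missing_days(leases, start, end):
    """Return [(day, hotel_id)] for days in [start, end] not covered by a lease."""
    all_days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    hotels = sorted({hotel_id for _, _, hotel_id in leases})
    gaps = []
    for hotel_id in hotels:            # cross join: every hotel ...
        for day in all_days:           # ... with every calendar day
            covered = any(
                h == hotel_id and move_in <= day <= move_out
                for move_in, move_out, h in leases
            )
            if not covered:            # the left join found no lease row
                gaps.append((day, hotel_id))
    return gaps


if __name__ == "__main__":
    sample = [
        (date(2021, 4, 1), date(2021, 4, 14), 1),
        (date(2021, 4, 17), date(2021, 4, 30), 1),
    ]
    print(missing_days(sample, date(2021, 4, 1), date(2021, 4, 30)))
```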
If you're using PostgreSQL 14+ you can use multiranges to do this:
CREATE TEMP TABLE t (
"move_in" DATE,
"move_out" DATE,
"hotel_id" INTEGER
);
INSERT INTO t
("move_in", "move_out", "hotel_id")
VALUES ('2021-04-01', '2021-04-14', '1')
, ('2021-04-17', '2021-04-30', '1')
, ('2021-05-03', '2021-05-30', '1') -- added this as a test case
, ('2021-04-01', '2021-04-14', '2')
, ('2021-04-17', '2021-04-30', '2');
SELECT hotel_id, datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]')) AS r
FROM t
GROUP BY hotel_id
returns
+--------+-------------------------------------------------+
|hotel_id|r |
+--------+-------------------------------------------------+
|2 |{[2021-04-15,2021-04-17)} |
|1 |{[2021-04-15,2021-04-17),[2021-05-01,2021-05-03)}|
+--------+-------------------------------------------------+
If you want to have 1 row per day you can use unnest and generate_series to expand the multiranges:
WITH available_ranges AS(
SELECT hotel_id, unnest(datemultirange(DATERANGE(MIN(move_in), MAX(move_out), '[]')) - range_agg(DATERANGE(move_in, move_out, '[]'))) AS r
FROM t
GROUP BY hotel_id
)
SELECT hotel_id, generate_series(lower(r), upper(r) - 1, '1 day'::interval)
FROM available_ranges
ORDER BY 1, 2
;
returns
+--------+---------------------------------+
|hotel_id|generate_series |
+--------+---------------------------------+
|1 |2021-04-15 00:00:00.000000 +00:00|
|1 |2021-04-16 00:00:00.000000 +00:00|
|1 |2021-05-01 00:00:00.000000 +00:00|
|1 |2021-05-02 00:00:00.000000 +00:00|
|2 |2021-04-15 00:00:00.000000 +00:00|
|2 |2021-04-16 00:00:00.000000 +00:00|
+--------+---------------------------------+

How to use date_part function to split value per month to each day and country

DB-Fiddle
CREATE TABLE sales (
id SERIAL PRIMARY KEY,
country VARCHAR(255),
sales_date DATE,
sales_volume DECIMAL,
fix_costs DECIMAL
);
INSERT INTO sales
(country, sales_date, sales_volume, fix_costs
)
VALUES
('DE', '2020-01-03', '500', '2000'),
('FR', '2020-01-03', '350', '2000'),
('None', '2020-01-31', '0', '2000'),
('DE', '2020-02-15', '0', '5000'),
('FR', '2020-02-15', '0', '5000'),
('None', '2020-02-29', '0', '5000'),
('DE', '2020-03-27', '180', '4000'),
('FR', '2020-03-27', '970', '4000'),
('None', '2020-03-31', '0', '4000');
Expected Result:
sales_date | country | sales_volume | fix_costs
-------------|--------------|------------------|------------------------------------------
2020-01-03 | DE | 500 | 37.95 (= 2000/31 = 64.5 x 0.59)
2020-01-03 | FR | 350 | 26.57 (= 2000/31 = 64.5 x 0.41)
-------------|--------------|------------------|------------------------------------------
2020-02-15 | DE | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
2020-02-15 | FR | 0 | 86.21 (= 5000/29 = 172.4 x 0.50)
-------------|--------------|------------------|------------------------------------------
2020-03-27 | DE | 180 | 20.20 (= 4000/31 = 129.0 x 0.16)
2020-03-27 | FR | 970 | 108.84 (= 4000/31 = 129.0 x 0.84)
-------------|--------------|------------------|-------------------------------------------
The column fix_costs in the expected result is calculated as follows:
Step 1) Get the daily rate of the fix_costs per month (2000/31 = 64.5; 5000/29 = 172.4; 4000/31 = 129.0).
Step 2) Split the daily value to the countries DE and FR based on their share in the sales_volume. (500/850 = 0.59; 350/850 = 0.41; 180/1150 = 0.16; 970/1150 = 0.84)
Step 3) In case the sales_volume is 0 the daily rate gets split 50/50 to DE and FR as you can see for 2020-02-15.
In MariaDB I was able to do this with the query below:
SELECT
s.sales_date,
s.country,
s.sales_volume,
(CASE WHEN SUM(sales_volume) OVER (PARTITION BY sales_date) > 0
THEN ((s.fix_costs/ DAY(LAST_DAY(sales_date))) *
sales_volume / NULLIF(SUM(sales_volume) OVER (PARTITION BY sales_date), 0)
)
ELSE (s.fix_costs / DAY(LAST_DAY(sales_date))) * 1 / SUM(country <> 'None') OVER (PARTITION by sales_date)
END) AS imputed_fix_costs
FROM sales s
WHERE country <> 'None'
GROUP BY 1,2,3
ORDER BY 1;
However, in PostgreSQL I get an error on DAY(LAST_DAY(sales_date)).
I tried to replace this part with (date_part('DAY', ((date_trunc('MONTH', s.sales_date) + INTERVAL '1 MONTH - 1 DAY')::date)))
However, this is causing another error.
How do I need to modify the query to get the expected result?
The PostgreSQL equivalent of DAY(LAST_DAY(sales_date)) would be:
extract(day from (date_trunc('month', sales_date + interval '1 month') - interval '1 day'))
The expression SUM(country <> 'None') also needs to be fixed as
SUM(case when country <> 'None' then 1 else 0 end)
It might be a good idea to define this compatibility function:
create function last_day(d date) returns date as
$$
-- date_trunc(...) - interval yields a timestamp, so cast back to date
select (date_trunc('month', d + interval '1 month') - interval '1 day')::date;
$$ language sql immutable;
Then the first expression becomes simply
extract(day from last_day(sales_date))
I would create a function to return the last day (number) for a given date - which is actually the "length" of the month.
create function month_length(p_input date)
returns integer
as
$$
-- extract() does not return an integer, so cast explicitly
select extract(day from (date_trunc('month', p_input) + interval '1 month - 1 day'))::integer;
$$
language sql
immutable;
Then the query can be written as:
select sales_date, country,
sum(sales_volume),
sum(fix_costs_per_day * cost_factor)
from (
select id, country, sales_date, sales_volume, fix_costs,
fix_costs / month_length(sales_date) as fix_costs_per_day,
case
when sum(sales_volume) over (partition by sales_date) > 0
then sales_volume::numeric / sum(sales_volume) over (partition by sales_date)
else 1.0 / count(*) over (partition by sales_date)
end as cost_factor
from sales
where country <> 'None'
) t
group by sales_date, country
order by sales_date, country
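As a cross-check of the expected numbers in the question, the three steps (daily rate, share of sales volume, even split when the volume is 0) can be reproduced in plain Python. This is only a sketch for verifying the arithmetic; `imputed_fix_costs` is a hypothetical name, and the rows are assumed to have the 'None' country already filtered out.

```python
import calendar
from collections import defaultdict
from datetime import date


def imputed_fix_costs(rows):
    """rows: (country, sales_date, sales_volume, fix_costs) tuples."""
    by_date = defaultdict(list)
    for country, sales_date, volume, fix_costs in rows:
        by_date[sales_date].append((country, volume, fix_costs))
    result = {}
    for sales_date, group in by_date.items():
        # Step 1: number of days in the month gives the daily rate.
        days_in_month = calendar.monthrange(sales_date.year, sales_date.month)[1]
        total = sum(volume for _, volume, _ in group)
        for country, volume, fix_costs in group:
            # Steps 2 and 3: share by volume, or an even split if nothing sold.
            share = volume / total if total > 0 else 1 / len(group)
            result[(sales_date, country)] = round(fix_costs / days_in_month * share, 2)
    return result


if __name__ == "__main__":
    sample = [
        ("DE", date(2020, 2, 15), 0, 5000),
        ("FR", date(2020, 2, 15), 0, 5000),
    ]
    print(imputed_fix_costs(sample))
```

Note that February 2020 has 29 days, which is why the daily rate in Step 1 is 5000/29.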

PostgreSQL Query: Column Sum for the latest available date of each month

Given a PostgreSQL table which looks like this:
date | data
2015-01-23 | 15
2015-01-23 | 11
2015-02-25 | 15
2015-02-25 | 11
2015-01-25 | 24
2015-01-25 | 2
2015-01-25 | 13
2015-01-29 | 5
2015-02-28 | 12
2015-02-28 | 1
2015-05-15 | 12
2015-05-16 | 1
How can I get the sum of data for the last available date of each month?
Example result:
date | data
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
This is what I've tried so far:
SELECT year,month,max(day),sum(data) FROM
(
SELECT
date,
date_part('year', date) AS year,
date_part('month', date) AS month,
date_part('day', date) AS day,
sum(data) AS tdata
FROM table a
GROUP BY date, date_part('year', date), date_part('month', date), date_part('day', date)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month
The sum I get from this appears to be wrong.
You should calculate the sums in the inner query, grouping by a single day. Then select the latest day of each month in the outer query:
select distinct on (year, month)
make_date(year::int, month::int, day::int) as date,
data
from (
select
date_part('year', date) as year,
date_part('month', date) as month,
date_part('day', date) as day,
sum(data) as data
from my_table
group by date
) s
order by year, month, day desc
date | data
------------+------
2015-01-29 | 5
2015-02-28 | 13
2015-05-16 | 1
(3 rows)
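The same two-step logic (sum per day first, then keep the latest day of each month) can be verified with a short Python sketch. This is only meant to illustrate the order of operations; `month_end_sums` is a made-up name and the input is assumed to be (date, data) pairs.

```python
from collections import defaultdict
from datetime import date


def month_end_sums(rows):
    """Return [(last_available_date_in_month, sum_of_data)] sorted by date."""
    # Inner query: one sum per calendar day.
    per_day = defaultdict(int)
    for day, value in rows:
        per_day[day] += value
    # Outer query: keep only the latest day seen in each (year, month).
    last_day = {}
    for day in per_day:
        key = (day.year, day.month)
        if key not in last_day or day > last_day[key]:
            last_day[key] = day
    return [(day, per_day[day]) for day in sorted(last_day.values())]


if __name__ == "__main__":
    sample = [(date(2015, 2, 25), 15), (date(2015, 2, 28), 12), (date(2015, 2, 28), 1)]
    print(month_end_sums(sample))
```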
Another option is to filter out the days you don't want to sum, for example with NOT EXISTS:
SELECT year,month,max(day),sum(tdata) tdata FROM
(
SELECT
d,
date_part('year', d) AS year,
date_part('month', d) AS month,
date_part('day', d) AS day,
sum(data) AS tdata
FROM tab a
WHERE NOT EXISTS
(
SELECT *
FROM tab a2
WHERE date_part('year', a.d) = date_part('year', a2.d) AND
date_part('month', a.d) = date_part('month', a2.d) AND
date_part('day', a.d) < date_part('day', a2.d)
)
GROUP BY d, date_part('year', d), date_part('month', d), date_part('day', d)
ORDER BY year ASC, month ASC, day ASC
) dataq
GROUP BY year,month
SQLFiddle