How to set new column equal to subquery PostGreSQL - postgresql

Full disclosure: I've seen 1 variation of this question for mySQL, and the PostgreSQL answer didn't satisfy me.
I have 2 tables: Reviews & businesses. In the Reviews table, the only 3 relevant columns for the purpose of this question are 'business_id', 'date' (yyyy-mm-dd), and stars (1-5), and the primary key is (review_id). In the businesses table, the relevant columns are 'business_id', 'year', and 'month'.' The 'year' and 'month' columns are there because there is another column in the business table called 'review_count', which represents the number of reviews a business received on each month of each year. Because of this, the composite primary key of this table is (business_id, year, month).
Essentially, I am trying to create a column in the business table with the average rating (represented by stars) a business received on each month of each year it was open.
The following query gives me the exact result I want:
SELECT round(CAST(AVG(stars) AS NUMERIC), 2)
FROM reviews_for_trending_businesses
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2)
GROUP BY business_id, EXTRACT("year" FROM reviews_for_trending_businesses.date), EXTRACT('month' FROM reviews_for_trending_businesses.date);
This code returns the column and all the correct values that I want to insert into my business table.
However, when I try to actually update the table, I get an error saying more than one row was returned by the subquery used as an expression. This is the code I'm trying to update with:
UPDATE trending_businesses_v2
SET avg_monthly_rating = (SELECT round(CAST(AVG(stars) AS NUMERIC), 2)
FROM reviews_for_trending_businesses
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2)
GROUP BY business_id, EXTRACT("year" FROM reviews_for_trending_businesses.date), EXTRACT('month' FROM reviews_for_trending_businesses.date);
I've tried a number of other solutions as well, including using joins, but keep getting a similar error.
UPDATE: Still No Answer but getting Closer:
Still can't quite figure out where I'm going wrong here. I also don't understand why I have to groupby 'rtb.date' here if I'm only extracting values from it (returned error if I didn't).
UPDATE trending_businesses_v2 tb
SET avg_monthly_rating = t.val
FROM (SELECT business_id, EXTRACT("year" FROM rtb.date) AS year, EXTRACT('month' FROM rtb.date) AS month, round(CAST(AVG(stars) AS NUMERIC), 2) as val
FROM reviews_for_trending_businesses rtb
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2
)
GROUP BY business_id, year, month, rtb.date
) t
WHERE t.business_id = tb.business_id AND
t.year = tb.year AND t.month = tb.month;

You need to match the rows, presumably using a business id and date. Something like this:
UPDATE trending_businesses_v2 tb
SET avg_monthly_rating = t.val
FROM (SELECT business_id, date_trunc('month', rtb.date) as yyyymm, round(CAST(AVG(stars) AS NUMERIC), 2) as val
FROM reviews_for_trending_businesses rtb
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2
)
GROUP BY business_id, date_trunc('month', rtb.date)
) t
WHERE t.business_id = tb.business_id AND
t.yyyymm = tb.?;

Related

select opposite operation group by month SQL Redshift

Iam trying to extract a list of operation that was inactive in each month :
table 1 "all_opreration" is containing the whole list of operation id
table all_operation
the second table "active_operation" is containing the operations that was active on the specified month
table active operation
So I want to get "inactive operation" by month (for each month the operation that was not in active_operation table
==> Wished table :
wished table inactive operation
I have tried several ways but without success
Thank you in advance
You just need to left join and check the the right side is NULL. You will need to expand your all_operations to have all dates but this can be done with a cross join. Like this:
Set up:
create table all_ops (op_id varchar(32));
insert into all_ops values ('A'), ('B'), ('C');
create table active_ops (month date, op_id varchar(32));
insert into active_ops values ('2021-10-01', 'A'),
('2021-10-01', 'B'),
('2021-07-01', 'C');
Find the missing data / id pairs:
select l.month, l.op_id
from (
select month, op_id
from all_ops
cross join (select distinct month from active_ops) m
) l
left join active_ops r
on l.op_id = r.op_id and l.month = r.month
where r.op_id is null;

How to join on closest date in Postgresql

Suppose, I have following tables
product_prices
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-08-13|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
Or "what is the price for product at this day". Where price is calculated based on date from products table.
On real DB I tried to use something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep either getting only left part of table (with (null) instead of product dates and prices) or many rows with all the combinations of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pu.date <# pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_purchases table uses a foreign key to some product table anyway (which is an integer).
The new table definitions could look something like this:
create table purchase_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table purchase_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always improving query speed comes with a price and in this case it's the higher maintenance costs for the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on purchase_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
It will return 2015-10-12 for every row in the join and ultimately, nothing was purchased on that date, hence, null.
Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p.user,
p.product,
p.date pp.date,
pp.price,
row_number() over (partition by pp.product, order by pp.date desc) as distance -- calculate distance between purchase date and price date
FROM purchases AS p
INNER JOIN product_prices AS pp on pp.product = p.product
WHERE pp.date < p.date
)
SELECT user as name, product, pu.date as purchase_date, pp.date as price_date, price
FROM close_prices_by_purchase AS cpbp
WHERE distance = 1; -- shortest distance
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <# price_ranges.valid_price_range
order by purchases.date;

PostgreSQL crosstab() alternative with CASE and aggregates

I want to create a pivot table view showing month on month sum of bookings for every travel_mode.
Table bookings:
timestamp
, bookings
, provider_id
Table providers:
provider_id
, travel_mode
Pivot table function and crosstab functions are not to be used to do this. So I am trying to use JOIN and CASE. Following is the query:
SELECT b.month,
(CASE WHEN p.travel_mode=train then b.amount end)train,
(CASE WHEN p.travel_mode=bus then b.amount end)bus,
(CASE WHEN p.travel_mode=air then b.amount end)air
FROM
(SELECT to_char(date_,month) as month, travel_mode, sum(bookings) as amount
from bookings as b
join providers as p
on b.provider_id=p.provider_id
group by b.month, p.travel_mode)
group by b.month;
However I am getting an error which says:
subquery in FROM must have an alias LINE 6:
And when I add an alias it throws an error saying:
column p.travel_mode must appear in the GROUP BY clause or be used in an aggregate function
LINE 2:
The final result should be something like this
Month Air Bus Train
01 Amount(air) Amount(Bus) Amount(train)
I have a feeling it is a minor error somewhere but I am unable to figure it out at all.
P.S. I had to remove all quotations in the question as it was not allowing me to post this. But those are being taken care of in the actual query.
Multiple problems. The missing table alias is just one of them. This query should work:
SELECT month
, sum(CASE WHEN travel_mode = 'train' THEN amount END) AS train
, sum(CASE WHEN travel_mode = 'bus' THEN amount END) AS bus
, sum(CASE WHEN travel_mode = 'air' THEN amount END) AS air
FROM (
SELECT to_char(timestamp, 'MM') AS month, travel_mode, sum(bookings) AS amount
FROM bookings b
JOIN providers p USING (provider_id)
GROUP BY month, p.travel_mode
) sub
GROUP BY month;
Missing single quotes for string literals. (You seem to have removed those being under the wrong impression you couldn't post quotations.)
Missing table alias for the subquery - just like the 1st error message says.
In the outer query, table names (or aliases) of underlying tables in the subquery are not visible. Only the table alias of the subquery is. Since there is only one subquery, you don't need table-qualification at all there.
month is an output column name (not in the underlying table), so the table qualification b.month was wrong, too.
You seem to want 2-digit numbers for months. Use the template pattern 'MM' instead of 'month' with to_char().
The aggregation in the outer query does not work like you had it - just like your 2nd error message says. You have to wrap the outer CASE expression in a aggregate function. You might as well use min() or max() in this case, because there are never more than one rows after the subquery.
Still unclear where date_ is coming from? You mean timestamp? (which is not a good identifier).
But you don't need the subquery to begin with and can simplify to:
SELECT to_char(timestamp, 'MM') AS month
, sum(CASE WHEN p.travel_mode = 'train' THEN b.bookings END) AS train
, sum(CASE WHEN p.travel_mode = 'bus' THEN b.bookings END) AS bus
, sum(CASE WHEN p.travel_mode = 'air' THEN b.bookings END) AS air
FROM bookings b
JOIN providers p USING (provider_id)
GROUP BY 1;
For best performance you should still use crosstab(), though:
PostgreSQL Crosstab Query
You have to name the subquery as the error message says:
SELECT b.month,
(CASE WHEN p.travel_mode=train then b.amount end)train,
(CASE WHEN p.travel_mode=bus then b.amount end)bus,
(CASE WHEN p.travel_mode=air then b.amount end)air
FROM
(SELECT to_char(date_,month) as month, travel_mode, sum(bookings) as amount
from bookings as b
join providers as p
on b.provider_id=p.provider_id
group by b.month, p.travel_mode)
**as foo** group by b.month;
Remove the stars to make it work.

Finding Detail Based on date in SSRS

I have a Table that I am using to pull order details in SSRS that has when the price of a product number was changed. It has Data Changed and Updated Cost.
I am pairing up two different tables to create a report that is the cost of the package at the time of the order. Here is how I am pulling my data:
SELECT
WAREHOUSE.ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,WAREHOUSE.ActPkgCostHist.ActPkgCostDate
,WAREHOUSE.ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM
WAREHOUSE.ActPkgCostHist
INNER JOIN ORDER.OrderHist
ON WAREHOUSE.ActPkgCostHist.ItemNo = ORDER.OrderHist.ItemNo
Catalog=ShippedOrders
ActPkgCostHist Table has What the cost of an Item was and what date the cost was changed.
OrderHist Table has the complete details of the order except the ActPkgCost at the time of the purchase.
I am attempting to create a table that Has order number, the date of the order and the package cost at the time of the order.
The ROW_NUMBER function is very useful for cases like this.
SELECT WAREHOUSE.ActPkgCostHist.ItemNo AS [ActPkgCostHist ItemNo]
,WAREHOUSE.ActPkgCostHist.ActPkgCostDate
,WAREHOUSE.ActPkgCostHist.ActPkgCost
,ORDER.OrderHist.OrderNo
,ORDER.OrderHist.ItemNo AS [OrderHist ItemNo]
,ORDER.OrderHist.DispenseDt
FROM ORDER.OrderHist
INNER JOIN (
SELECT ItemNo, ActPkgCostDate, ActPkgCost
, ROW_NUMBER() OVER (PARTITION BY ItemNo ORDER BY ActPkgCostDate DESC) as RN
FROM WAREHOUSE.ActPkgCostHist
--if there are future dated changes, limit ActPkgCostDate to be <= the current date
) ActPkgCostHist on ActPkgCostHist.ItemNo = OrderHist.ItemNo
WHERE RN = 1
What this subquery does is group the cost history by ItemNo. Then for each one, it ranks the changes by recency with the most recent change being 1. Then in the main query you filter it to just rows with a 1.
For each item in each order you have to find the latest cost date and use it when joining with the cost table
SELECT C.ItemNo AS [ActPkgCostHist ItemNo],
C.ActPkgCostDate,
C.ActPkgCost,
O.OrderNo,
O.ItemNo AS [OrderHist ItemNo],
O.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS C
-- JOIN order detail with cost table in order to define the cost date per item/order
INNER JOIN (SELECT Max(CH.ActPkgCostDate) AS ItemCostDate,
OH.OrderNo,
OH.ItemNo,
OH.DispenseDt
FROM WAREHOUSE.ActPkgCostHist AS CH
INNER JOIN ORDER.OrderHist AS OH
ON CH.ItemNo = OH.ItemNo
-- Get the latest cost date only from dates before order date
WHERE CH.ActPkgCostDate <= OH.DispenseDt
GROUP BY OH.OrderNo,
OH.ItemNo,
OH.DispenseDt) AS O
ON C.ItemNo = O.ItemNo
AND C.ActPkgCostDate = O.ItemCostDate

multiple extract() with WHERE clause possible?

So far I have come up with the below:
WHERE (extract(month FROM orders)) =
(SELECT min(extract(month from orderdate))
FROM orders)
However, that will consequently return zero to many rows, and in my case, many, because many orders exist within that same earliest (minimum) month, i.e. 4th February, 9th February, 15th Feb, ...
I know that a WHERE clause can contain multiple columns, so why wouldn't the below work?
WHERE (extract(day FROM orderdate)), (extract(month FROM orderdate)) =
(SELECT min(extract(day from orderdate)), min(extract(month FROM orderdate))
FROM orders)
I simply get: SQL Error: ORA-00920: invalid relational operator
Any help would be great, thank you!
Sample data:
02-Feb-2012
14-Feb-2012
22-Dec-2012
09-Feb-2013
18-Jul-2013
01-Jan-2014
Output:
02-Feb-2012
14-Feb-2012
Desired output:
02-Feb-2012
I recreated your table and found out you just messed up the brackets a bit. The following works for me:
where
(extract(day from OrderDate),extract(month from OrderDate))
=
(select
min(extract(day from OrderDate)),
min(extract(month from OrderDate))
from orders
)
Use something like this:
with cte1 as (
select
extract(month from OrderDate) date_month,
extract(day from OrderDate) date_day,
OrderNo
from tablename
), cte2 as (
select min(date_month) min_date_month, min(date_day) min_date_day
from cte1
)
select cte1.*
from cte1
where (date_month, date_day) = (select min_date_month, min_date_day from cte2)
A common table expression enables you to restructure your data and then use this data to do your select. The first cte-block (cte1) selects the month and the day for each of your table rows. Cte2 then selects min(month) and min(date). The last select then combines both ctes to select all rows from cte1 that have the desired month and day.
There is probably a shorter solution to that, however I like common table expressions as they are almost all the time better to understand than the "optimal, shortest" query.
If that is really what you want, as bizarre as it seems, then as a different approach you could forget the extracts and the subquery against the table to get the minimums, and use an analytic approach instead:
select orderdate
from (
select o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
from orders o
)
where rn = 1;
ORDERDATE
---------
01-JAN-14
The row_number() effectively adds a pseudo-column to every row in your original table, based on the month and day in the order date. The rn values are unique, so there will be one row marked as 1, which will be from the earliest day in the earliest month. If you have multiple orders with the same day/month, say 01-Jan-2013 and 01-Jan-2014, then you'll still only get exactly one with rn = 1, but which is picked is indeterminate. You'd need to add further order by conditions to make it deterministic, but I have no idea what you might want.
That is done in the inner query; the outer query then filters so that only the records marked with rn = 1 is returned; so you get exactly one row back from the overall query.
This also avoids the situation where the earliest day number is not in the earliest month number - say if you only had 01-Jan-2014 and 02-Feb-2014; comparing the day and month separately would look for 01-Feb-2014, which doesn't exist.
SQL Fiddle (with Thomas Tschernich's anwer thrown in too, giving the same result for this data).
To join the result against your invoice table, you don't need to join to the orders table again - especially not with a cross join, which is skewing your results. You can do the join (at least) two ways:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
) o, invoices i
WHERE i.invno = o.invno
AND rn = 1;
Or:
SELECT
o.orderno,
to_char(o.orderdate, 'DD-MM-YYYY'),
i.invno
FROM
(
SELECT orderno, orderdate, invno
FROM
(
SELECT o.*,
row_number() over (order by to_char(orderdate, 'MMDD')) as rn
FROM orders o
)
WHERE rn = 1
) o, invoices i
WHERE i.invno = o.invno;
The first looks like it does more work but the execution plans are the same.
SQL Fiddle with your pastebin-supplied query that gets two rows back, and these two that get one.