This question already has answers here:
Select first row in each GROUP BY group?
(20 answers)
Closed 3 years ago.
I have the following table in PostgreSQL and I want to display the maximum sale figure of each fname along with the relevant saledate. Please help me with the solution code.
CREATE TABLE CookieSale (
ID VARCHAR(4),
fname VARCHAR(15),
sale FLOAT,
saleDate DATE
);
INSERT INTO CookieSale
VALUES
('E001', 'Linda', 1000.00, '2016-01-30'),
('E002', 'Sally', 750.00, '2016-01-30'),
('E003', 'Zindy', 500.00, '2016-01-30'),
('E001', 'Linda', 150.00, '2016-02-01'),
('E001', 'Linda', 5000.00, '2016-02-01'),
('E002', 'Sally', 250.00, '2016-02-01'),
('E001', 'Linda', 250.00, '2016-02-02'),
('E002', 'Sally', 150.00, '2016-02-02'),
('E003', 'Zindy', 50.00, '2016-02-02');
I tried with
SELECT fname, MAX(sale), saleDate
FROM CookieSale;
I need the results to be like
"Lynda | 5000.00 | 2016-02-01"
Your description and expected results are inconsistent. Description says "maximum sale figure of each fname" but expected results indicates only the maximum overall. And neither has anything to do with the Title. Please try being a little more consistent.
I'll take description as what is actually wanted. For that use the Window function RANK within a sub select and the outer select just take rank 1.
select fname, sale, saledate
from ( select cs.*, rank() over (partition by fname order by fname, sale desc) rk
from cookiesales cs
) csr
where rk = 1;
Related
I have a table that contains employee names, sales from current year, and sales from last year. Lets call the two sales columns 2022 and 2021. Im looking to sort my table by the highest difference between this year and last years sales. For example, the highest difference would be at the top.
Currently I have it as
SELECT
DISTINCT customerid,
full_name,
"2012 Sales",
"2013 Sales"
FROM
customer_loyalty
ORDER BY "2013 Sales" DESC
limit 10;
Can i just insert a where conditon like 2013-2012 ASC?
This should work:
SELECT
DISTINCT customerid,
full_name,
`2012 Sales`,
`2013 Sales`
FROM
customer_loyalty
ORDER BY
`2013 Sales` - `2012 Sales` DESC
LIMIT 10;
Student Records are updated for subject and update date. Student can be enrolled in one or multiple subjects. I would like to get each student record with most subject update date and status.
CREATE TABLE Student
(
StudentID int,
FirstName varchar(100),
LastName varchar(100),
FullAddress varchar(100),
CityState varchar(100),
MathStatus varchar(100),
MUpdateDate datetime2,
ScienceStatus varchar(100),
SUpdateDate datetime2,
EnglishStatus varchar(100),
EUpdateDate datetime2
);
Desired query output, I am using CTE method but trying to find alternative and better way.
SELECT StudentID, FirstName, LastName, FullAddress, CityState, [SubjectStatus], UpdateDate
FROM Student
;WITH orginal AS
(SELECT * FROM Student)
,Math as
(
SELECT DISTINCT StudentID, FirstName, LastName, FullAddress, CityState,
ROW_NUMBER OVER (PARTITION BY StudentID, MathStatus ORDER BY MUpdateDate DESC) as rn
, _o.MathStatus as SubjectStatus, _o.MupdateDate as UpdateDate
FROM original as o
left join orignal as _o on o.StudentID = _o.StudentID
where _o.MathStatus is not null and _o.MUpdateDate is not null
)
,Science AS
(
...--Same as Math
)
,English AS
(
...--Same As Math
)
SELECT * FROM Math WHERE rn = 1
UNION
SELECT * FROM Science WHERE rn = 1
UNION
SELECT * FROM English WHERE rn = 1
First: storing data in a denormalized form is not recommended. Some data model redesign might be in order. There are multiple resources about data normalization available on the web, like this one.
Now then, I made some guesses about how your source table is populated based on the query you wrote. I generated some sample data that could show how the source data is created. Besides that I also reduced the number of columns to reduce my typing efforts. The general approach should still be valid.
Sample data
create table Student
(
StudentId int,
StudentName varchar(15),
MathStat varchar(5),
MathDate date,
ScienceStat varchar(5),
ScienceDate date
);
insert into Student (StudentID, StudentName, MathStat, MathDate, ScienceStat, ScienceDate) values
(1, 'John Smith', 'A', '2020-01-01', 'B', '2020-05-01'),
(1, 'John Smith', 'A', '2020-01-01', 'B+', '2020-06-01'), -- B for Science was updated to B+ month later
(2, 'Peter Parker', 'F', '2020-01-01', 'A', '2020-05-01'),
(2, 'Peter Parker', 'A+', '2020-03-01', 'A', '2020-05-01'), -- Spider-Man would never fail Math, fixed...
(3, 'Tom Holland', null, null, 'A', '2020-05-01'),
(3, 'Tom Holland', 'A-', '2020-07-01', 'A', '2020-05-01'); -- Tom was sick for Math, but got a second chance
Solution
Your question title already contains the word unpivot. That word actually exists in T-SQL as a keyword. You can learn about the unpivot keyword in the documentation. Your own solution already contains common table expression, these constructions should look familiar.
Steps:
cte_unpivot = unpivot all rows, create a Subject column and place the corresponding values (SubjectStat, Date) next to it with a case expression.
cte_recent = number the rows to find the most recent row per student and subject.
Select only those most recent rows.
This gives:
with cte_unpivot as
(
select up.StudentId,
up.StudentName,
case up.[Subject]
when 'MathStat' then 'Math'
when 'ScienceStat' then 'Science'
end as [Subject],
up.SubjectStat,
case up.[Subject]
when 'MathStat' then up.MathDate
when 'ScienceStat' then up.ScienceDate
end as [Date]
from Student s
unpivot ([SubjectStat] for [Subject] in ([MathStat], [ScienceStat])) up
),
cte_recent as
(
select cu.StudentId, cu.StudentName, cu.[Subject], cu.SubjectStat, cu.[Date],
row_number() over (partition by cu.StudentId, cu.[Subject] order by cu.[Date] desc) as [RowNum]
from cte_unpivot cu
)
select cr.StudentId, cr.StudentName, cr.[Subject], cr.SubjectStat, cr.[Date]
from cte_recent cr
where cr.RowNum = 1;
Result
StudentId StudentName Subject SubjectStat Date
----------- --------------- ------- ----------- ----------
1 John Smith Math A 2020-01-01
1 John Smith Science B+ 2020-06-01
2 Peter Parker Math A+ 2020-03-01
2 Peter Parker Science A 2020-05-01
3 Tom Holland Math A- 2020-07-01
3 Tom Holland Science A 2020-05-01
Suppose, I have following tables
product_prices
product|price|date
-------+-----+----------
apple |10 |2014-03-01
-------+-----+----------
apple |20 |2014-05-02
-------+-----+----------
egg |2 |2014-03-03
-------+-----+----------
egg |4 |2015-10-12
purchases:
user|product|date
----+-------+----------
John|apple |2014-03-02
----+-------+----------
John|apple |2014-06-03
----+-------+----------
John|egg |2014-08-13
----+-------+----------
John|egg |2016-08-13
What I need is table similar to this:
name|product|purchase date |price date|price
----+-------+--------------+----------+-----
John|apple |2014-03-02 |2014-03-01|10
----+-------+--------------+----------+-----
John|apple |2014-06-03 |2014-05-02|20
----+-------+--------------+----------+-----
John|egg |2014-08-13 |2014-08-13|2
----+-------+--------------+----------+-----
John|egg |2016-08-13 |2015-10-12|4
Or "what is the price for product at this day". Where price is calculated based on date from products table.
On real DB I tried to use something similar to:
SELECT name, product, pu.date, pp.date, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp
ON pu.date = (
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1);
But I keep either getting only left part of table (with (null) instead of product dates and prices) or many rows with all the combinations of prices and dates.
I would suggest changing product_prices table to use a daterange column instead (or at least a start_date and an end_date).
You can use an exclusion constraint to make sure you never have overlapping ranges for one product and an insert trigger that "closes" the "current" prices and creates a new unbounded range for the newly inserted price.
A daterange can efficiently be indexed and with that in place the query gets as easy as:
SELECT name, product, pu.date, pp.valid_during, pp.price
FROM purchases AS pu
LEFT JOIN product_prices AS pp ON pu.date <# pp.valid_during
(assuming the range column is named valid_during)
The exclusion constraint would only work however if the product was an integer (not a varchar) - but I guess your real product_purchases table uses a foreign key to some product table anyway (which is an integer).
The new table definitions could look something like this:
create table purchase_prices
(
product_id integer not null references products,
price numeric(16,4) not null,
valid_during daterange not null
);
And the constraint that prevents overlapping ranges:
alter table purchase_prices
add constraint check_price_range
exclude using gist (product_id with =, valid_during with &&);
The constraint needs the btree_gist extension.
As always improving query speed comes with a price and in this case it's the higher maintenance costs for the GiST index. You would need to run some tests to see if the easier (and most probably much faster) query outweighs the slower insert performance on purchase_prices.
Look at your scalar sub-query very closely. It is not correlated back to the outer query. In other words, it will return the same result every time: the latest date in the product_prices table. Period. Think about the query out of context:
SELECT date
FROM product_prices
ORDER BY date DESC LIMIT 1
There are two problems with it:
It will return 2015-10-12 for every row in the join and ultimately, nothing was purchased on that date, hence, null.
Your approximation of closest is that the dates are equal. Unless you have a product_prices row for every product for every single date, you'll always have misses. "Closest" implies distance and ranking.
WITH close_prices_by_purchase AS (
SELECT
p.user,
p.product,
p.date pp.date,
pp.price,
row_number() over (partition by pp.product, order by pp.date desc) as distance -- calculate distance between purchase date and price date
FROM purchases AS p
INNER JOIN product_prices AS pp on pp.product = p.product
WHERE pp.date < p.date
)
SELECT user as name, product, pu.date as purchase_date, pp.date as price_date, price
FROM close_prices_by_purchase AS cpbp
WHERE distance = 1; -- shortest distance
You can try something like this, although I am sure there's a better way:
with diffs as (
select
a.*,
b."date" as bdate,
b.price,
b."date" - a."date" as diffdays,
row_number() over (
partition by "user", a."product", a."date"
order by "user", a."product", a."date", b."date" - a."date" desc
) as sr
from purchases a
inner join product_prices b on a.product = b.product
where b."date" - a."date" < 1
)
select
"user" as "name",
product,
"date" as "purchase date",
bdate as "price date",
price
from diffs
where sr = 1
Example: https://www.db-fiddle.com/f/dwQ9EXmp1SdpNpxyV1wc6M/0
Explanation
I attempted to join both tables and find the difference between dates of purchase and price, and ranked them by closest date prior to the purchase. Rank of 1 will go to the closest date. Then, data with rank of 1 was extracted.
This is a great place to use date ranges! We know the start date of the price range and we can use a window function to get the next date. At that point, it's really easy to figure out the price on any day.
with price_ranges as
(select product,
price,
date as price_date,
daterange(date, lead(date, 1)
OVER (partition by product order by date), '[)'
) as valid_price_range from product_prices
)
select "user" as name,
purchases.product,
purchases.date,
price_date,
price
from purchases
join price_ranges on purchases.product = price_ranges.product
and purchases.date <# price_ranges.valid_price_range
order by purchases.date;
Full disclosure: I've seen 1 variation of this question for mySQL, and the PostgreSQL answer didn't satisfy me.
I have 2 tables: Reviews & businesses. In the Reviews table, the only 3 relevant columns for the purpose of this question are 'business_id', 'date' (yyyy-mm-dd), and stars (1-5), and the primary key is (review_id). In the businesses table, the relevant columns are 'business_id', 'year', and 'month'.' The 'year' and 'month' columns are there because there is another column in the business table called 'review_count', which represents the number of reviews a business received on each month of each year. Because of this, the composite primary key of this table is (business_id, year, month).
Essentially, I am trying to create a column in the business table with the average rating (represented by stars) a business received on each month of each year it was open.
The following query gives me the exact result I want:
SELECT round(CAST(AVG(stars) AS NUMERIC), 2)
FROM reviews_for_trending_businesses
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2)
GROUP BY business_id, EXTRACT("year" FROM reviews_for_trending_businesses.date), EXTRACT('month' FROM reviews_for_trending_businesses.date);
This code returns the column and all the correct values that I want to insert into my business table.
However, when I try to actually update the table, I get an error saying more than one row was returned by the subquery used as an expression. This is the code I'm trying to update with:
UPDATE trending_businesses_v2
SET avg_monthly_rating = (SELECT round(CAST(AVG(stars) AS NUMERIC), 2)
FROM reviews_for_trending_businesses
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2)
GROUP BY business_id, EXTRACT("year" FROM reviews_for_trending_businesses.date), EXTRACT('month' FROM reviews_for_trending_businesses.date);
I've tried a number of other solutions as well, including using joins, but keep getting a similar error.
UPDATE: Still No Answer but getting Closer:
Still can't quite figure out where I'm going wrong here. I also don't understand why I have to groupby 'rtb.date' here if I'm only extracting values from it (returned error if I didn't).
UPDATE trending_businesses_v2 tb
SET avg_monthly_rating = t.val
FROM (SELECT business_id, EXTRACT("year" FROM rtb.date) AS year, EXTRACT('month' FROM rtb.date) AS month, round(CAST(AVG(stars) AS NUMERIC), 2) as val
FROM reviews_for_trending_businesses rtb
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2
)
GROUP BY business_id, year, month, rtb.date
) t
WHERE t.business_id = tb.business_id AND
t.year = tb.year AND t.month = tb.month;
You need to match the rows, presumably using a business id and date. Something like this:
UPDATE trending_businesses_v2 tb
SET avg_monthly_rating = t.val
FROM (SELECT business_id, date_trunc('month', rtb.date) as yyyymm, round(CAST(AVG(stars) AS NUMERIC), 2) as val
FROM reviews_for_trending_businesses rtb
WHERE business_id IN (SELECT DISTINCT(business_id)
FROM trending_businesses_v2
)
GROUP BY business_id, date_trunc('month', rtb.date)
) t
WHERE t.business_id = tb.business_id AND
t.yyyymm = tb.?;
I have following data in the table as below and I am looking for a way to group the continuous time intervals for each id to return:
CREATE TABLE DUMMY
(
ID VARCHAR2(10 BYTE),
TIME_STAMP VARCHAR2(8 BYTE),
NAME VARCHAR2(255 BYTE)
);
SELECT ID, min(TIME_STAMP) "startDate", max(TIME_STAMP) "endDate", NAME
GROUP BY ID , NAME
something like
100 20011128 20011203 David
100 20011204 20011207 Unknown
100 20011208 20011215 David
100 20011216 20011220 Sara
and so on ...
ps. I have a sample script, but i don't know how to attach my file.
Hi every one here is more input:
There is only one record with time_stamp for a specific ID.
Users can be different, for example for day 1 David, day 2 unknown, day 3 David and so on.
So there is one row for every day of year for each ID but with different users.
Now, i want to see the break point, differences base on time_stamp intervals from day one
until last day for a specific ID in day order from begin day until last day.
Query Result should be :
ID NAME MIN_DATE MAX_DATE
100 David 20011128 20050407
100 Sara 20050408 20050417
100 David 20050418 20080416
100 Unknown 20080417 20080507
100 David 20080508 20080508
100 Unknown 20080509 20080607
100 David 20080608 20080608
100 Unknown 20080609 20080921
100 David 20080922 20080922
100 Unknown 20080923 20081231
100 David 20090101 20090405
thanks
Hi again, many thanks to everyone, i have solved the problem, here is the solution:
select id, min(time_stamp), max(time_stamp), name
from ( select id, time_stamp, name,
max(rn) over (order by time_stamp) grp
from ( select id, time_stamp, name,
case
when lag(name) over (order by time_stamp) <> name or
row_number() over (order by time_stamp) = 1
then row_number() over (order by time_stamp)
end rn
from dummy
)
)
group by id, grp, name
order by 1
Select
ID,
Name,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id,
Name
That should work.
IF you want the date range for each Id, but all the names you can do:
Select
d.Id,
d.Name,
dr.min_date,
dr.max_date
from
Dummy d
JOIN
(Select
Id,
min(time_stamp) min_date,
max(time_stamp) max_date
from
Dummy
group by
Id
) dr
on ( dr.Id = d.Id)