postgres lag when data is missing - postgresql

I have data on baseball players' annual salaries, with some years missing. What I would like to do is calculate the min, max, and average change in salary from the prior year across all players in a year.
For example, the data in the table 'salaries' looks like this:
| playerid | yearid | salary |
| a | 2016 | 10000 |
| b | 2016 | 5000 |
| a | 2015 | 9000 |
| b | 2015 | 3000 |
| a | 2014 | 3000 |
| b | 2014 | 15000 |
| a | 2010 | 1000 |
As you can see, player a has yearly changes of 1k and 6k, and player b has yearly changes of 2k and -12k. So I would like a select statement that returns:
| yearid | min change | max change | avg change |
| 2016 | 1k | 2k | 1.5k |
| 2015 | -12k | 6k | -3k |
Is there a way to do this?
Unfortunately, my lag function has captured the difference between 2014 and 2010 for playerid a, which is obviously wrong. I couldn't figure out how to use the lag function only if the previous row's yearid is 1 less than the current row's yearid.
Any suggestions would be greatly appreciated.

Just use the previous year for the filtering:
select yearid, min(salary - prev_salary), max(salary - prev_salary),
avg(salary - prev_salary)
from (select s.*,
lag(s.salary) over (partition by s.playerid order by yearid) as prev_salary,
lag(s.yearid) over (partition by s.playerid order by yearid) as prev_yearid
from salaries s
) s
where prev_yearid = yearid - 1
group by yearid;
Or, you can just use a join:
select s.yearid, . . .
from salaries s join
salaries sp
on sp.playerid = s.playerid and sp.yearid = s.yearid - 1
group by s.yearid;
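To check the join variant against the sample data from the question, here is a minimal sketch that runs it through Python's built-in sqlite3 module as a stand-in for PostgreSQL. The select list and the column aliases (min_change, max_change, avg_change) are assumptions for illustration, borrowed from the expressions in the first query.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("create table salaries (playerid text, yearid integer, salary integer)")
conn.executemany(
    "insert into salaries values (?, ?, ?)",
    [
        ("a", 2016, 10000), ("b", 2016, 5000),
        ("a", 2015, 9000), ("b", 2015, 3000),
        ("a", 2014, 3000), ("b", 2014, 15000),
        ("a", 2010, 1000),
    ],
)

# The self-join only matches a row with the same player's row from exactly
# one year earlier, so the 2010 -> 2014 gap for player a is never compared.
query = """
select s.yearid,
       min(s.salary - sp.salary) as min_change,
       max(s.salary - sp.salary) as max_change,
       avg(s.salary - sp.salary) as avg_change
from salaries s
join salaries sp
  on sp.playerid = s.playerid and sp.yearid = s.yearid - 1
group by s.yearid
order by s.yearid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (2015, -12000, 6000, -3000.0)
# (2016, 1000, 2000, 1500.0)
```

2014 does not appear in the result because neither player has a 2013 row to compare against.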

Related

Postgres Hierarchy output

I'm struggling to get the correct output using a hierarchical query.
I have one table that is loaded daily with every product and its price. Over time a product can be cancelled and activated again.
I believe in Oracle we could use CONNECT BY.
WITH RECURSIVE cte AS (
select min(event_date) event_date, item_code,sum(price::numeric)/1024/1024 price, 1 AS level
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('Renault')
group by item_code
UNION ALL
SELECT e.event_date, e.item_code, e.price, cte.level + 1
from (select event_date, item_code,sum(price::numeric)/1024/1024 price
from rdpidevdat.raid_r_cbs_offer_accttype_map where product_type='cars' and item_code in ('9859')
group by event_date,item_code) e join cte ON e.event_date = cte.event_date and e.item_code = cte.item_code
)
SELECT *
FROM cte where item_code in ('Renault') ;
How do I produce output that shows the date range for each product's price over time?
If we have the data:
EVENT_DATE | ITEM_COD| PRICE
20210910 | Renaut | 2500
20210915 | Renaut | 2500
20210920 | Renaut | 2600
20211020 | Renaut | 2900
20220101 | Renaut | 2500
the expected output should be:
-------------------------------------------------
FROM_EVENT_DATE | TO_EVENT_DATE | ITEM_COD| PRICE
20210910 | 20210915 | Renaut | 2500
20210915 | 20210920 | Renaut | 2600
20210920 | 20211020 | Renaut | 2900
20211020 | 20220101 | Renaut | 2500
Thanks in Advance and Regards!
I already found the solution, using the lag and last_value functions. There is no need for a hierarchical query.
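The poster did not share the final query, so the following is only one possible reading: a sketch that reproduces the expected output with lag alone, run through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The table name prices is hypothetical; the column names and rows come from the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "prices" is a hypothetical table name holding the sample rows.
conn.execute("create table prices (event_date text, item_code text, price integer)")
conn.executemany(
    "insert into prices values (?, ?, ?)",
    [
        ("20210910", "Renaut", 2500),
        ("20210915", "Renaut", 2500),
        ("20210920", "Renaut", 2600),
        ("20211020", "Renaut", 2900),
        ("20220101", "Renaut", 2500),
    ],
)

# lag() fetches the previous event_date for the same item; the first row of
# each item has no predecessor and is dropped by the outer filter.
query = """
select from_event_date, event_date as to_event_date, item_code, price
from (
    select p.*,
           lag(event_date) over (partition by item_code order by event_date)
               as from_event_date
    from prices p
) t
where from_event_date is not null
order by item_code, to_event_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('20210910', '20210915', 'Renaut', 2500)
# ('20210915', '20210920', 'Renaut', 2600)
# ('20210920', '20211020', 'Renaut', 2900)
# ('20211020', '20220101', 'Renaut', 2500)
```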

Create a pivot table for Month over Month variation

I have these records returned from a query
+---------+--------------+-----------+----------+
| Country | other fields | sales | date |
+---------+--------------+-----------+----------+
| US | 1 | $100.00 | 01/01/21 |
| CA | 1 | $100.00 | 01/01/21 |
| UK | 1 | $100.00 | 01/01/21 |
| FR | 1 | $100.00 | 01/01/21 |
| US | 1 | $200.00 | 01/02/21 |
| CA | 1 | $200.00 | 01/02/21 |
| UK | 1 | $200.00 | 01/02/21 |
| FR | 1 | $200.00 | 01/02/21 |
And I want to show the sales variation from one month to the previous one, like this:
| Country | 01/02/21 | 01/01/21 | Var% |
| US | $200.00 | $100.00 | 100% |
| CA | $200.00 | $100.00 | 100% |
| FR | $200.00 | $100.00 | 100% |
+---------+--------------+-----------+----------+
How could this be done with a Postgres query?
If you are always comparing only two months:
select country
, sum(sales) filter (where date ='01/01/21') month1
, sum(sales) filter (where date ='01/02/21') month2
, ((sum(sales) filter (where date ='01/02/21') /sum(sales) filter (where date ='01/01/21')) - 1) * 100 var
from tablename
where date in ('01/01/21' , '01/02/21')
group by country
You can also look at crosstab from the tablefunc extension, which basically does the same as the query above.
CREATE EXTENSION IF NOT EXISTS tablefunc;
select * , (("01/02/21" / "01/01/21") - 1) * 100 var
from(
select * from crosstab ('select Country, date, sales from tablename order by 1, 2')
as ct(country varchar(2),"01/01/21" money , "01/02/21" money)
) t
For more info about crosstab, see tablefunc.
But if you want to show the dates in rows instead of columns, you can easily generalize it to all dates:
select *
, ((sales / LAG(sales,1,1) over (partition by country order by date)) -1)* 100 var
from
tablename
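As a rough check of the row-wise LAG approach, the sketch below runs a similar query through Python's sqlite3 module (SQLite 3.25 or later). It rests on several assumptions: sales is stored as a plain number rather than money, dates are ISO strings, and the default argument of LAG is omitted so that the first month of each country yields NULL instead of being compared against 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed simplified schema: numeric sales and ISO-formatted text dates.
conn.execute("create table tablename (country text, sales real, date text)")
conn.executemany(
    "insert into tablename values (?, ?, ?)",
    [
        ("US", 100.0, "2021-01-01"), ("CA", 100.0, "2021-01-01"),
        ("US", 200.0, "2021-02-01"), ("CA", 200.0, "2021-02-01"),
    ],
)

# lag(sales) returns the previous month's sales within the same country;
# it is NULL for each country's first row, so var is NULL there as well.
query = """
select country, date, sales,
       (sales / lag(sales) over (partition by country order by date) - 1) * 100 as var
from tablename
order by country, date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('CA', '2021-01-01', 100.0, None)
# ('CA', '2021-02-01', 200.0, 100.0)
# ('US', '2021-01-01', 100.0, None)
# ('US', '2021-02-01', 200.0, 100.0)
```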

How to get list day of month data per month in postgresql

I use psql v10.5,
and I have a table structured like this:
| date | total |
-------------------------
| 01-01-2018 | 50 |
| 05-01-2018 | 90 |
| 30-01-2018 | 20 |
How can I get a monthly recap in which every day of the month is listed, including days with no data? I want the data shown like this:
| date | total |
-------------------------
| 01-01-2018 | 50 |
| 02-01-2018 | 0 |
| 03-01-2018 | 0 |
| 04-01-2018 | 0 |
| 05-01-2018 | 90 |
.....
| 29-01-2018 | 0 |
| 30-01-2018 | 20 |
I've tried this query:
SELECT * FROM date
WHERE EXTRACT(month FROM "date") = 1 -- dynamically
AND EXTRACT(year FROM "date") = 2018 -- dynamically
But the result is not what I expected. Also, the month and year parameters are generated dynamically.
Any help will be appreciated.
Use the function generate_series(start, stop, step interval), e.g.:
select d::date, coalesce(total, 0) as total
from generate_series('2018-01-01', '2018-01-31', '1 day'::interval) d
left join my_table t on d::date = t.date
Working example in rextester.
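The same calendar-then-left-join idea can be tried locally with Python's sqlite3 module. Default SQLite builds usually lack generate_series, so this sketch swaps in a recursive CTE to build the list of days; in PostgreSQL the generate_series call above does that directly. The table name my_table follows the answer, and the ISO date strings are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table my_table (date text, total integer)")
conn.executemany(
    "insert into my_table values (?, ?)",
    [("2018-01-01", 50), ("2018-01-05", 90), ("2018-01-30", 20)],
)

# The recursive CTE yields one row per day of January 2018 (a substitute for
# generate_series); the left join keeps days without data, and coalesce
# turns their missing totals into 0.
query = """
with recursive days(d) as (
    select date('2018-01-01')
    union all
    select date(d, '+1 day') from days where d < date('2018-01-31')
)
select days.d, coalesce(t.total, 0) as total
from days
left join my_table t on t.date = days.d
order by days.d
"""
rows = conn.execute(query).fetchall()
for row in rows[:5]:
    print(row)
# ('2018-01-01', 50)
# ('2018-01-02', 0)
# ('2018-01-03', 0)
# ('2018-01-04', 0)
# ('2018-01-05', 90)
```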

Postgresql: Looping through a date_trunc generated group

I've got some records on my database that have a 'createdAt' timestamp.
What I'm trying to get out of postgresql is those records grouped by 'createdAt'
So far I've got this query:
SELECT date_trunc('day', "updatedAt") FROM goal GROUP BY 1
Which gives me:
+---+------------+-------------+
| date_trunc |
+---+------------+-------------+
| Sep 20 00:00:00 |
+---+------------+-------------+
Which are the days where the records got created.
My question is: Is there any way to generate something like:
| Sep 20 00:00:00 |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| |
| Sep 24 00:00:00 |
| |
| id | name | gender | state | age |
|----|-------------|--------|-------|-----|
| 1 | John Kenedy | male | NY | 32 |
| 2 | John De | male | NY | 32 |
That means group by date_trunc and select all the columns of those rows?
Thanks a lot!
Please try SELECT date_trunc('day', "updatedAt"), name, gender, state, age FROM goal GROUP BY 1, 2, 3, 4, 5. It will not produce the nested structure you expect, but it will "group by date_trunc and select all the columns".

Grouping in t-sql with latest dates

I have a table like this
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
1 | 1 | 2009-01-01 | 100 |
2 | 1 | 2009-01-02 | 20 |
3 | 1 | 2009-01-03 | 50 |
4 | 2 | 2009-01-01 | 80 |
5 | 2 | 2009-01-04 | 30 |
For each contract I need to fetch the latest event and amount associated with the event and get something like this
Event ID | Contract ID | Event date | Amount |
----------------------------------------------
3 | 1 | 2009-01-03 | 50 |
5 | 2 | 2009-01-04 | 30 |
I can't figure out how to group the data correctly. Any ideas?
Thanks in advance.
SQL Server 2005/2008:
with cte_ranked as (
select *
, row_number() over (
partition by ContractId order by EventDate desc) as [rank]
from [table])
select *
from cte_ranked
where [rank] = 1;
SQL Server 2000:
select t.*
from [table] as t
join (
select max(EventDate) as MaxDate
, ContractId
from [table]
group by ContractId) as mt
on t.ContractId = mt.ContractId
and t.EventDate = mt.MaxDate
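For a quick local check of the row_number approach, here is a sketch using Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Server. The table name events and the alias rn are hypothetical; the rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "events" is a hypothetical table name for the sample data.
conn.execute(
    "create table events (EventId integer, ContractId integer, EventDate text, Amount integer)"
)
conn.executemany(
    "insert into events values (?, ?, ?, ?)",
    [
        (1, 1, "2009-01-01", 100),
        (2, 1, "2009-01-02", 20),
        (3, 1, "2009-01-03", 50),
        (4, 2, "2009-01-01", 80),
        (5, 2, "2009-01-04", 30),
    ],
)

# Number each contract's events from newest to oldest, then keep the newest.
query = """
with cte_ranked as (
    select e.*,
           row_number() over (partition by ContractId order by EventDate desc) as rn
    from events e
)
select EventId, ContractId, EventDate, Amount
from cte_ranked
where rn = 1
order by ContractId
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (3, 1, '2009-01-03', 50)
# (5, 2, '2009-01-04', 30)
```

If two events of one contract share the latest date, row_number keeps just one of them arbitrarily, whereas the max(EventDate) join returns both.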