Pandas `pct_change()` equivalent in PostgreSQL

Let's assume I have a table like this:
id  date        value
1   2021-04-05  100
1   2021-04-04  50
1   2021-04-03  25
1   2021-04-02  5
2   2021-04-05  80
2   2021-04-04  20
2   2021-04-03  15
2   2021-04-02  10
I need to add another column that, grouping by id, calculates the day-over-day percent change from the previous date's value. For this example it would look like this:
id  date        value  pct_change
1   2021-04-05  100    100
1   2021-04-04  50     100
1   2021-04-03  25     400
1   2021-04-02  5      NaN
2   2021-04-05  80     300
2   2021-04-04  20     33.33
2   2021-04-03  15     50
2   2021-04-02  10     NaN
In Python this would be easy; I could do something like this:
df['pct_change'] = df.groupby('id').value.pct_change() * 100
But if I wanted to do this in the Postgres database call, I'd suddenly implode with stupidity... does anybody know how to do this?

Maybe something like this?
SELECT
    id,
    date,
    value,
    (value - prev_value) / prev_value AS pct_change
FROM
(
    SELECT
        id,
        date,
        value,
        -- LAG ignores any frame clause, so ROWS BETWEEN is unnecessary here
        LAG(value) OVER (PARTITION BY id ORDER BY date) AS prev_value
    FROM
        your_table
) t -- Postgres requires an alias for a subquery in FROM
ORDER BY date, id
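That attempt is close. A sketch that matches the pandas output more closely (a percentage with two decimals, and NULL where pandas shows NaN; NULLIF also protects against a zero previous value) could look like this, keeping the your_table placeholder from above:
SELECT
    id,
    date,
    value,
    round(
        100.0 * (value - prev_value) / NULLIF(prev_value, 0),  -- NULLIF guards against division by zero
        2
    ) AS pct_change
FROM (
    SELECT
        id,
        date,
        value,
        LAG(value) OVER (PARTITION BY id ORDER BY date) AS prev_value
    FROM your_table
) t
ORDER BY id, date DESC;  -- matches the row order shown in the question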

Related

Getting ranking based on a number from CTE

I have a complex situation in PostgreSQL 11 where I need to generate a numbering based on a single figure which I get from a CTE.
Below is the CTE:
WITH pending_orders_to_be_processed_details AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY so.create_date) AS queue_no,
           name,
           so.create_date::TIMESTAMP
    FROM picking sp
    LEFT JOIN "order" so ON so.name = sp.origin   -- "order" must be quoted: it is a reserved word
    WHERE sp.state IN ('assigned', 'confirmed')
),
orders_which_can_be_processed_today AS
(
    -- This CTE will give me a count of orders and its hourly average.
    -- Let's say the count is 400 and the hourly average is 3.
)
Now I need to number the rows according to the hourly average: the first 3 orders should be ranked 1, the next 3 ranked 2, and so on, so that I can identify which orders can be processed based on this ranking.
The input will be:
name queue_no create_date
so1 1 2021-03-11 12:00:00
so2 2 2021-03-11 13:00:00
so3 3 2021-03-11 14:00:00
so4 4 2021-03-11 15:00:00
so5 5 2021-03-11 16:00:00
so6 6 2021-03-11 17:00:00
so7 7 2021-03-11 18:00:00
so8 8 2021-03-11 19:00:00
so9 9 2021-03-11 20:00:00
The expected output will be
name rank
so1 1
so2 1
so3 1
so4 2
so5 2
so6 2
so7 3
so8 3
so9 3
Any help/suggestions?
Edit: I recently learned about a function which fits well here. You can use the ntile() window function for that:
SELECT
    *,
    ntile(3) OVER (ORDER BY create_date)
FROM mytable
(Note that ntile(3) creates three equally sized groups, which matches groups of three here only because there are nine rows.)
Since you already created a cumulative row count, you can use this to create your expected rank:
SELECT
*,
floor((queue_no - 1) / 3) + 1 as rank
FROM my_cte
queue_no - 1 shifts 1 to 3 down to 0 to 2.
Dividing by 3 turns 0 to 2 into 0.x and 3 to 5 into 1.x, ...
floor() truncates these to 0, 1, 2, ...
If you want the ranks to start with 1 instead of 0, add 1.
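Putting both CTEs together, a sketch of the full query could look like this; the column name hourly_avg and the stub value 3 are assumptions, since the second CTE's body isn't shown in the question:
WITH pending_orders_to_be_processed_details AS (
    SELECT ROW_NUMBER() OVER (ORDER BY so.create_date) AS queue_no,
           name,
           so.create_date::timestamp
    FROM picking sp
    LEFT JOIN "order" so ON so.name = sp.origin
    WHERE sp.state IN ('assigned', 'confirmed')
),
orders_which_can_be_processed_today AS (
    SELECT 3 AS hourly_avg   -- stub: replace with the real hourly-average calculation
)
SELECT d.name,
       floor((d.queue_no - 1) / t.hourly_avg) + 1 AS rank   -- floor() also handles a fractional average
FROM pending_orders_to_be_processed_details d
CROSS JOIN orders_which_can_be_processed_today t
ORDER BY d.queue_no;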

T_SQL counting particular values in one row with multiple columns

I have a little problem counting cells with a particular value in one row in MSSMS.
Table looks like
ID    Month  1     2     3  4  5     6  7  8     9     10  11  12  13  14  15    16    17  18  19  20  21  22    ...  31
5000  1      null  null  1  1  null  1  1  null  null  2   2   2   2   2   null  null  3   3   3   3   3   null  ...  1
I need to count how many cells in one row have a particular value, for example 1. In this case it would be 5.
The data represents worker shifts in a month. Be aware that there is a column named Month (an FK with values 1-12); I don't want to count that in the result.
The ID column is ALWAYS a 4-digit number.
One possibility is to use COUNT(CASE WHEN ...), but the examples I've found cover only two or three columns, not 31, so the statement would be very long. Is there any other option to count it?
Thanks for any advice.
I'm going to strongly suggest that you abandon your current table design, and instead store one day per month, per record, not column. That is, use this design:
ID | Date | Value
5000 | 2021-01-01 | NULL
5000 | 2021-01-02 | NULL
5000 | 2021-01-03 | 1
5000 | 2021-01-04 | 1
5000 | 2021-01-05 | NULL
...
5000 | 2021-01-31 | 5
Then use this query:
SELECT
ID,
CONVERT(varchar(7), Date, 120),
COUNT(CASE WHEN Value = 1 THEN 1 END) AS one_cnt
FROM yourTable
GROUP BY
ID,
CONVERT(varchar(7), Date, 120);
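If redesigning the table is not possible, one workaround is a T-SQL sketch that unpivots the 31 day columns with CROSS APPLY (VALUES ...) and counts the matches in one pass; it assumes the day columns are literally named [1] through [31], per the table above:
SELECT
    t.ID,
    t.Month,
    COUNT(CASE WHEN v.val = 1 THEN 1 END) AS one_cnt   -- count only cells equal to 1
FROM yourTable t
CROSS APPLY (VALUES
    (t.[1]),  (t.[2]),  (t.[3]),  (t.[4]),  (t.[5]),  (t.[6]),  (t.[7]),  (t.[8]),
    (t.[9]),  (t.[10]), (t.[11]), (t.[12]), (t.[13]), (t.[14]), (t.[15]), (t.[16]),
    (t.[17]), (t.[18]), (t.[19]), (t.[20]), (t.[21]), (t.[22]), (t.[23]), (t.[24]),
    (t.[25]), (t.[26]), (t.[27]), (t.[28]), (t.[29]), (t.[30]), (t.[31])
) AS v(val)
GROUP BY t.ID, t.Month;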

Getting data from alternate dates of same ID column

I have a table with data as below. Within each code, I need to fetch the records where (Value2 - Value1) * 2 of one row is >= (Value2 - Value1) of the row on the consecutive date. (All dates are uniform within all codes.)
---------------------------------------
code Date Value1 Value2
---------------------------------------
1 1-1-2018 13 14
1 2-1-2018 14 16
1 4-1-2018 15 18
2 1-1-2019 1 3
2 2-1-2018 2 3
2 4-1-2018 3 7
For example, the output needs to be:
1 1-1-2018 13 14
As I am a beginner at SQL coding I tried my best, but I cannot work out how to compare only on consecutive dates.
Use a self join.
You can specify all the conditions you've listed in the ON clause:
SELECT T0.code, T0.Date, T0.Value1, T0.Value2
FROM Table AS T0
JOIN Table AS T1
    ON  T0.code = T1.code
    AND T1.Date = DateAdd(Day, 1, T0.Date)   -- T1 is the row on the next (consecutive) date
    AND (T0.Value2 - T0.Value1) * 2 >= T1.Value2 - T1.Value1
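Note that DateAdd is SQL Server syntax; in Postgres the equivalent condition is T1.Date = T0.Date + INTERVAL '1 day'. A window-function alternative (a sketch; mytable is a placeholder) compares each row with the next row's values only when that row is exactly one day later:
SELECT code, Date, Value1, Value2
FROM (
    SELECT code, Date, Value1, Value2,
           LEAD(Date)            OVER (PARTITION BY code ORDER BY Date) AS next_date,
           LEAD(Value2 - Value1) OVER (PARTITION BY code ORDER BY Date) AS next_diff
    FROM mytable
) t
WHERE next_date = Date + INTERVAL '1 day'   -- only truly consecutive dates qualify
  AND (Value2 - Value1) * 2 >= next_diff;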

PostgreSQL: first date cumulative score

I have this sample table:
id date score
11 1/1/2017 14:32 25.34
4 1/2/2017 12:14 34.34
25 1/2/2017 18:08 37.15
4 3/2/2017 23:42 47.24
4 4/2/2017 23:42 54.12
25 7/3/2017 22:07 65.21
11 9/3/2017 21:02 74.6
25 10/3/2017 5:15 11.3
4 10/3/2017 7:11 22.45
My aim is to calculate the first(!) date (YYYY-MM-DD) on which an id's cumulative score has reached 100 (>=). For that, I've written the following code:
SELECT date(date),id, score,
sum(score) over (partition by id order by date(date) rows unbounded preceding) as cumulative_score
FROM test_q1
GROUP BY id, date, score
Order by id, date
It returns:
date id score cumulative_score
1/1/2017 11 25.34 25.34
9/3/2017 11 74.6 99.94
1/2/2017 4 34.34 34.34
3/2/2017 4 47.24 81.58
4/2/2017 4 54.12 135.7
10/3/2017 4 22.45 158.15
1/2/2017 25 37.15 37.15
7/3/2017 25 65.21 102.36
10/3/2017 25 11.3 113.66
I tried to add either WHERE cumulative_score >= 100 or HAVING cumulative_score >= 100, but it returns:
ERROR: column "cumulative_score" does not exist
LINE 4: WHERE cumulative_score >= 100
^
SQL state: 42703
Character: 206
Does anyone know how to solve this?
Thanks
What I expect is:
date id score cumulative_score
4/2/2017 4 54.12 135.7
7/3/2017 25 65.21 102.36
And the output should contain just id and date.
Try this:
with cumulative_sum AS (
    SELECT id, date,
           sum(score) over (partition by id order by date) as cum_score
    FROM test_q1
),
above_100_score_rank AS (
    SELECT *, rank() over (partition by id order by date) AS rnk
    FROM cumulative_sum
    WHERE cum_score >= 100   -- ">=" to match "reached 100 (>=)"
)
SELECT id, date FROM above_100_score_rank WHERE rnk = 1;
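Since only the id and the first qualifying date are wanted, a simpler sketch (same table test_q1) can skip the ranking entirely and take the minimum qualifying date per id:
SELECT id, min(date::date) AS first_date   -- date::date truncates the timestamp to YYYY-MM-DD
FROM (
    SELECT id, date,
           sum(score) OVER (PARTITION BY id ORDER BY date) AS cum_score
    FROM test_q1
) t
WHERE cum_score >= 100
GROUP BY id;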

Postgresql Query for display of records every 45 days

I have a table that holds user_id and the timestamp each user joined.
If I need to display the data month-wise, I can just use:
select
count(user_id),
date_trunc('month',(to_timestamp(users.timestamp))::timestamp)::date
from
users
group by 2
date_trunc accepts 'second', 'day', 'week', etc., so I can get data grouped by such periods.
But how do I get data grouped by an "n-day" period, say 45 days?
Basically, I need to display the number of users per 45-day period.
Any suggestion or guidance appreciated!
Currently I get:
Date Users
2015-03-01 47
2015-04-01 72
2015-05-01 123
2015-06-01 132
2015-07-01 136
2015-08-01 166
2015-09-01 129
2015-10-01 189
I would like the data to come in 45 days interval. Something like :-
Date Users
2015-03-01 85
2015-04-15 157
2015-05-30 192
2015-07-14 229
2015-08-28 210
2015-10-12 294
UPDATE:
I used the following to get the output, but one problem remains. I'm getting values that are offset.
with new_window as (
    select
        generate_series as cohort,
        lag(generate_series, 1) over () as cohort_lag
    from (
        select *
        from generate_series('2015-03-01'::date, '2016-01-01', '45 day')
    ) t
)
select
    --cohort
    cohort_lag, -- This worked. !!!
    count(*)
from new_window
join users on
    user_timestamp <= cohort
    and user_timestamp > cohort_lag
group by 1
order by 1
But the output I am getting is:
Date Users
2015-04-15 85
2015-05-30 157
2015-07-14 193
2015-08-28 225
2015-10-12 210
Basically, the users displayed at 2015-03-01 should be the users between 2015-03-01 and 2015-04-15, and so on.
But I seem to be getting counts of users up to a date, i.e. up to 2015-04-15: 85 users, which is not the result I want.
Any help here?
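The offset comes from labelling each window by its end (cohort) rather than its start. A sketch that builds half-open [start, end) windows with lead() and labels each count by the window start (user_timestamp is assumed to be a date/timestamp column, as in the update above):
with new_window as (
    select
        day as cohort_start,
        lead(day) over (order by day) as cohort_end
    from generate_series('2015-03-01'::date, '2016-01-01', '45 day') as g(day)
)
select
    w.cohort_start,
    count(*) as users
from new_window w
join users u
    on u.user_timestamp >= w.cohort_start
   and u.user_timestamp <  w.cohort_end   -- the final series point has no end; generate the series past your last data point
group by w.cohort_start
order by w.cohort_start;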
Try this query:
SELECT to_char(i::date,'YYYY-MM-DD') as date, 0 as users
FROM generate_series('2015-03-01', '2015-11-30','45 day'::interval) as i;
OUTPUT :
date users
2015-03-01 0
2015-04-15 0
2015-05-30 0
2015-07-14 0
2015-08-28 0
2015-10-12 0
2015-11-26 0
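An alternative that needs no generated series or join at all is plain date arithmetic: the number of days since the anchor date, integer-divided by 45, identifies the bucket. The anchor 2015-03-01 and the to_timestamp(users.timestamp) expression are taken from the question:
SELECT
    -- date - date yields integer days; integer division by 45 picks the bucket
    DATE '2015-03-01'
        + 45 * ((to_timestamp(users.timestamp)::date - DATE '2015-03-01') / 45) AS period_start,
    count(user_id) AS users
FROM users
GROUP BY 1
ORDER BY 1;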
This looks like a hot mess, and it might be better wrapped in a function where you could use some variables, but would something like this work?
with number_of_intervals as (
select
min (timestamp)::date as first_date,
ceiling (extract (days from max (timestamp) - min (timestamp)) / 45)::int as num
from users
),
intervals as (
select
generate_series(0, num - 1, 1) int_start,
generate_series(1, num, 1) int_end
from number_of_intervals
),
date_spans as (
select
n.first_date + 45 * i.int_start as interval_start,
n.first_date + 45 * i.int_end as interval_end
from
number_of_intervals n
cross join intervals i
)
select
d.interval_start, count (*) as user_count
from
users u
join date_spans d on
u.timestamp >= d.interval_start and
u.timestamp < d.interval_end
group by
d.interval_start
order by
d.interval_start
With this sample data:
User Id timestamp derived range count
1 3/1/2015 3/1-4/15
2 3/26/2015 "
3 4/4/2015 "
4 4/6/2015 " (4)
5 5/6/2015 4/16-5/30
6 5/19/2015 " (2)
7 6/16/2015 5/31-7/14
8 6/27/2015 "
9 7/9/2015 " (3)
10 7/15/2015 7/15-8/28
11 8/8/2015 "
12 8/9/2015 "
13 8/22/2015 "
14 8/27/2015 " (5)
Here is the output:
2015-03-01 4
2015-04-15 2
2015-05-30 3
2015-07-14 5