indicator for increase over time - postgresql

I'm trying to create an indicator for value increase over time within a group. In particular, I'm trying to flag certain grp if value ever increases by 50% over time.
I have a raw data that looks like:
id grp value_dt value
--------------------------------
1 1 11/20/20 1.4
1 1 11/21/20 0.8
1 1 11/24/20 2.8
1 1 11/25/20 2.5
1 2 11/29/20 1.5
1 2 12/1/20 1.6
2 1 11/21/20 0.8
2 2 11/26/20 0.9
2 3 12/1/20 0.9
2 3 12/3/20 2.8
You can see that for id = 1 and grp = 1, the value fluctuates as it increases and decreases over time, but because it had increase over time between 11/21/20 and 11/24/20 from 0.8 to 2.8 (greater than 50% increase), I want to flag the whole grp 1. I want my output to look like:
id grp val_ind
-----------------------
1 1 1
1 2 0
2 1 0
2 2 0
2 3 1
I can only think of using min and max (something like below), which doesn't include the 'over the time' factor in...
select id,
grp,
min(value) as min_grp,
max(value) as max_grp,
(max_grp - min_grp) as val_diff,
case when val_diff >= min_grp * 1.5 then 1 else 0 end as val_ind
If anyone can offer their advice, I will greatly appreciate it!

I think you want to know if at any point at time there is an increase of 50% , you flag that group. if yes , here is how you can do it,
you need to use cte and window functions :
; WITH cte AS (
SELECT *
, CASE WHEN COALESCE(LEAD(value) OVER (PARTITION BY id, grp ORDER BY value_dt),0) >= value* 2 THEN 1 ELSE 0 END val_ind
FROM ttt
)
SELECT
id , grp , MAX(val_ind) val_ind
FROM cte
GROUP BY
id , grp
id | grp | val_ind
-: | --: | ------:
1 | 1 | 1
1 | 2 | 0
2 | 1 | 0
2 | 2 | 0
2 | 3 | 1
db<>fiddle here

Related

query customer retention over range

I am trying to find the best way to accomplish the following.
Get the beginning customer count, which carries from the previous day
Get New Customer count
Get the number of Customers who have not come in since the prior month
Get the number of Customers who have come back after lapsing
Get the number of total customers
The following example
Customer ID
Store ID
Date
Amount
1
1
1/2/22
1.00
2
2
1/2/22
2.00
1
1
2/2/22
1.00
3
2
3/2/22
1.00
2
2
3/2/22
1.00
1
1
3/2/22
1.00
1
1
4/2/22
1.00
4
1
4/2/22
1.00
2
2
4/2/22
1.00
The result would be
Date
Store
Beginning
New
Dropped
Returned
Total
1/2/22
1
0
1
0
0
1
1/2/22
2
0
1
0
0
1
2/2/22
1
1
0
0
0
1
2/2/22
2
1
0
1
0
0
3/2/22
1
1
0
0
0
1
3/2/22
2
0
1
0
1
2
4/2/22
1
1
1
0
0
2
4/2/22
2
2
0
1
0
1
I kind of have a query, but it's not getting the right results
WITH customerset AS (
SELECT
location_id,
date,
array_agg(DISTINCT customer_id ORDER BY customer_id ASC) AS customer_ids
FROM customer_orders
GROUP BY
location_id,
date
)
SELECT
cset.location_id,
cset.date,
array_length(cset2.customer_ids, 1) AS beginning,
array_length((past2.customer_ids - cset.customer_ids), 1) AS dropped,
array_length((cset.customer_ids - past2.customer_ids), 1) AS returned
FROM
(
SELECT
ords.location_id,
ords.date,
array_agg(DISTINCT ords.customer_id ORDER BY ords.customer_id ASC) AS customers_id
FROM customer_orders ords
GROUP BY
ords.location_id,
ords.date
) cset
JOIN
customerset cset2 ON cset.date - '1 month'::interval = cset2.date
AND cset2.location_id = cset.location_id
GROUP BY
cset.location_id,
cset.date,
cset2.customer_ids,
cset.customer_ids
ORDER BY
cset.date ASC

Need to have subquery within subquery

I have a stock table which holds for example
Partnumber | Depot | flag_redundant
------------+-------+----------------
1 | 1 | 5
1 | 2 | 0
1 | 3 | 0
1 | 4 | 5
2 | 1 | 0
2 | 2 | 0
2 | 3 | 0
2 | 4 | 0
I need to be able to see the depots in which the parts have not been flagged as redundant, but the flag_redundant has been at least been flagged once for that part, and I need to ignore any parts where there has not been a flag flagged.
Any help appreciated!
I'm thinking of something along the lines of ....
SELECT stock.part, stock.depot,
OrderCount = (SELECT CASE WHEN Stock.flag_redundant = 5 THEN 1 end as Countcolumn FROM stock C)
FROM stock
Partnumber | MissingDepots
------------+---------------
1 | Yes
You can group by partnumber and set the conditions in the HAVING clause:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having
sum(flag_redundant) > 0 and
sum(case when flag_redundant = 0 then 1 end) > 0
Or:
select
partnumber, 'Yes' MissingDepots
from stock
group by partnumber
having sum(case when flag_redundant = 0 then 1 end) between 1 and count(*) - 1
See the demo.
Results:
> partnumber | missingdepots
> ---------: | :------------
> 1 | Yes
Assuming you want to get these partnumbers that contain data sets with flag_redundant = 5 AND 0:
demo:db<>fiddle
SELECT
partnumber,
'Yes' AS missing
FROM (
SELECT
partnumber,
COUNT(flag_redundant) FILTER (WHERE flag_redundant = 5) AS cnt_redundant, -- 2
COUNT(*) AS cnt -- 3
FROM
stock
GROUP BY partnumber -- 1
) s
WHERE cnt_redundant > 0 -- 4
AND cnt_redundant < cnt -- 5
Group by partnumber
Count all records with flag_redundant = 5
Count all records
Find all partnumbers that contain any element with 5 ...
... and which have more records than 5-element records

Create Pivot Table using PostgreSQL

I have a table like this:
type code desc store Sales/Day Stock
-----------------------------------------------
1 AA1 abc 101 3 6
1 AA2 abd 101 4 0
1 AA3 abf 101 4 3
2 BA1 bba 101 5 1
2 BA2 bbc 101 2 1
1 AA1 abc 102 1 4
1 AA2 abd 102 2 0
2 BA1 bba 102 4 2
2 BA2 bbc 102 5 5
etc.
How I can show the result table like this:
type code desc Store 101 Store 102
Sales/Day | Stock Sales/Day | Stock
--------------------------------------------------------------
1 AA1 abc 3 6 1 4
1 AA2 abd 4 0 2 0
1 AA3 abf 4 3 0 0
2 BA1 bba 5 1 4 2
2 BA2 bbc 2 1 5 5
etc.
Note:
Colspan is only display.
demo:db<>fiddle
First way: FILTER
SELECT
type,
code,
"desc",
COALESCE(SUM(sales_day) FILTER (WHERE store = 101)) as sales_day_101,
COALESCE(SUM(stock) FILTER (WHERE store = 101), 0) as stock_101,
COALESCE(SUM(sales_day) FILTER (WHERE store = 102), 0) as sales_day_102,
COALESCE(SUM(stock) FILTER (WHERE store = 102), 0) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
Aggregating your values. I took SUM but in your case with distinct rows many other aggregate functions would do it. FILTER allows you to aggregate only one store.
The COALESCE is to avoid NULL values if no values are present for one aggregation (like AA3 in store 102).
Second way, CASE WHEN
SELECT
type,
code,
"desc",
SUM(CASE WHEN store = 101 THEN sales_day ELSE 0 END) as sales_day_101,
SUM(CASE WHEN store = 101 THEN stock ELSE 0 END) as stock_101,
SUM(CASE WHEN store = 102 THEN sales_day ELSE 0 END) as sales_day_102,
SUM(CASE WHEN store = 102 THEN stock ELSE 0 END) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
The idea is the same, but the newer FILTER function is replace by the more common CASE clause.
Notice that "desc" is a reserved word in Postgres. So I strictly recommend to rename your column.

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to create one more column that would create a cumulative sum of the days_difference, partitioned by user_id, but would reset whenever the value reaches 30 and starts counting from 0. I have been trying to do it, but I couldn't figure it out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
This should do what you want:
with cte as (
select t.a, t.b, t.c, t.c as sumc
from t
where b = 1
union all
select t.a, t.b, t.c,
(case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
from t join
cte
on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;
Here is a rextester.

Redshift - Get a value from one column A for each ID in the grouping ID column B based on max value in another column C

I have a sql problem (on Redshift) where I need to get the value from column index for each id in column id based on max value in column final_score and put this value in a new column fav_index. score2 equals to the value of score1 where index n = index n + 1, for example, for id = abc1, index = 0 and score1 = 10 the value of score2 will be the value of score1 where index = 1 and the value of final_score is the difference between score1 and score2.
It's easier if you look at below table score. This table score is a result of a sql query which is shown later below.
id index score1 score2 final_score
abc1 0 10 20 10
abc1 1 20 45 25
abc1 2 45 (null) (null)
abc2 0 5 10 5
abc2 1 10 (null) (null)
abc3 0 50 30 -20
abc3 1 30 (null) (null)
So, the resulting table containing column fav_index should look like this:
id index score1 score2 final_score fav_index
abc1 0 10 20 10 0
abc1 1 20 45 25 1
abc1 2 45 (null) (null) 0
abc2 0 5 10 5 0
abc2 1 10 (null) (null) 0
abc3 0 50 30 -20 0
abc3 1 30 (null) (null) 0
Below is the script to generate table score from table story:
select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2
Table story is as below:
id story_number score1
abc1 1 10
abc1 2 10
abc1 3 20
abc1 4 20
abc1 5 45
abc1 6 45
The only solution I can think of is to do something like,
select id, max(final_score) from score group by id
and then join it back to the long script above (which was used to generate table score). I really want to avoid writing such a long script to get just 1 extra column of information that I need.
Is there a better way to do this?
Thank you!
Update: answer in mysql is also accepted. thanks!
After spending more hours on this and asking people around, I finally figured out a solution by referring to this window function documentation - PostgreSQL https://www.postgresql.org/docs/9.1/static/tutorial-window.html
I basically added 2 x select statements at the top and 1 x where statement at the very bottom. The where statement is to take care of the rows where final_score = null because otherwise the rank() function will rank them as 1.
My code then becomes:
select
id, index, final_score, rank, case when rank = 1 then index else null end as fav_index
from
(select
id, index, final_score, rank() over (partition by id order by final_score desc)
from
(select
m.id,
m.index,
max(m.max) as score1,
fmt.score2,
round(fmt.score2 - max(m.max), 1) as final_score
from
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(sv.score1)
from
story as sv
group by
sv.id,
index,
sv.score1
order by
sv.id,
index
) as m
left join
(select
sv.id,
case when sv.story_number % 2 = 0 then cast(sv.story_number / 2 - 1 as int) else cast(floor(sv.story_number/2) as int) end as index,
max(score1) as score2
from
story as sv
group by
id,
index
) as fmt
on
m.id = fmt.id
and
m.index = fmt.index - 1
group by
m.id,
m.index,
fmt.score2)
where
final_score is not null)
And the result is as follows:
id index final_score rank fav_index
abc1 0 10 2 (null)
abc1 1 25 1 1
abc2 0 5 1 0
abc3 0 -20 1 0
Result is slightly different than what I stated in the question, however, the fav_index for each id is identified and this is what I needed really. Hope this might help someone. Cheers