postgres detect repeating patterns of zeros


Is there a way to detect subseries of zeros of length at least 3 within a time series in Postgres?
year value
--------------
1 0
2 0
3 0
4 33
5 72
6 0
7 0
8 0
9 0
10 25
11 0
12 56
13 37
So in this example I'd like to return years 1-3 and 6-9, but not year 11.

This one will do it:
WITH d(y, v) AS (VALUES
    (1,0),(2,0),(3,0),(4,33),(5,72),
    (6,0),(7,0),(8,0),(9,0),(10,25),
    (11,0),(12,56),(13,37)
)
SELECT grp, numrange(min(y), max(y), '[]') AS ys, count(*) AS len
FROM (
    /* group identifiers via running total */
    SELECT y, v, g, sum(g) OVER (ORDER BY y) grp
    FROM (
        /* group boundaries */
        SELECT y, v, CASE WHEN
            v IS DISTINCT FROM lag(v) OVER (ORDER BY y) THEN 1
        END g
        FROM d) s
    WHERE v = 0) s
GROUP BY grp
HAVING count(*) >= 3;
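To sanity-check the result outside the database, the same run detection can be sketched in Python. This is only an illustration and not part of the query above: the function name zero_runs and the min_len parameter are made up here, and the rows are assumed to be sorted by year already.

```python
from itertools import groupby

def zero_runs(rows, min_len=3):
    """Return (first_year, last_year, length) for each run of zeros
    that is at least min_len rows long.

    rows: iterable of (year, value) pairs, assumed sorted by year.
    """
    runs = []
    # groupby splits the sequence into consecutive runs of
    # "value is zero" / "value is not zero"
    for is_zero, group in groupby(rows, key=lambda r: r[1] == 0):
        group = list(group)
        if is_zero and len(group) >= min_len:
            runs.append((group[0][0], group[-1][0], len(group)))
    return runs

data = [(1, 0), (2, 0), (3, 0), (4, 33), (5, 72),
        (6, 0), (7, 0), (8, 0), (9, 0), (10, 25),
        (11, 0), (12, 56), (13, 37)]

print(zero_runs(data))  # [(1, 3, 3), (6, 9, 4)]
```

Year 11 is a run of length 1, so it is dropped just as the HAVING clause drops it.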

Related

Get count of values in different subgroups

I need to delete some rows from the dataset: those where the speed equals zero and the zero speed lasts for more than N consecutive rows (let's assume N is 2).
The structure of the table demo looks like:
id car speed time
1 foo 0 1
2 foo 0 2
3 foo 0 3
4 foo 1 4
5 foo 1 5
6 foo 0 6
7 bar 0 1
8 bar 0 2
9 bar 5 3
10 bar 5 4
11 bar 5 5
12 bar 5 6
Then I hope to generate a table like the one below by using a window function:
id car speed time lasting
1 foo 0 1 3
2 foo 0 2 3
3 foo 0 3 3
4 foo 1 4 2
5 foo 1 5 2
6 foo 0 6 1
7 bar 0 1 2
8 bar 0 2 2
9 bar 5 3 4
10 bar 5 4 4
11 bar 5 5 4
12 bar 5 6 4
Then I can easily exclude those rows by using WHERE NOT (speed = 0 AND lasting > 2).
Here is the code I tried, but it didn't return the values I expected, and I suspect that nesting FROM (SELECT ... FROM (SELECT ... is not the best way to solve the problem:
SELECT g3.*, count(id) OVER (PARTITION BY car, cumsum ORDER BY id) as num
FROM (SELECT g2.*, sum(grp2) OVER (PARTITION BY car ORDER BY id) AS cumsum
FROM (SELECT g1.*, (CASE ne0 WHEN 0 THEN 0 ELSE 1 END) AS grp2
FROM (SELECT g.*, speed - lag(speed, 1, 0) OVER (PARTITION BY car) AS ne0
FROM (SELECT *, row_number() OVER (PARTITION BY car) AS grp FROM demo) g ) g1 ) g2 ) g3
ORDER BY id;

You can use the LAG() window function to check the previous speed value for each row, and the SUM() window function to create the groups of consecutive equal values.
Then, with the COUNT() window function, you can count the rows in each group, so that you can filter out the rows with speed 0 in groups that have more than 2 rows:
SELECT id, car, speed, time
FROM (
    SELECT *, COUNT(*) OVER (PARTITION BY car, grp) counter
    FROM (
        SELECT *, SUM(flag::int) OVER (PARTITION BY car ORDER BY time) grp
        FROM (
            SELECT *, speed <> LAG(speed, 1, speed - 1) OVER (PARTITION BY car ORDER BY time) flag
            FROM demo
        ) t
    ) t
) t
WHERE speed <> 0 OR counter <= 2
ORDER BY id;
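For anyone who wants to check the grouping logic step by step, here is a rough Python sketch of the same idea, assuming the rows fit in memory. The function name drop_long_stops and the parameter n are invented for this illustration; they do not come from the query above.

```python
from itertools import groupby

def drop_long_stops(rows, n=2):
    """Drop rows with speed 0 that belong to a run of more than n
    consecutive zero-speed rows of the same car.

    rows: list of (id, car, speed, time) tuples.
    """
    kept = []
    ordered = sorted(rows, key=lambda r: (r[1], r[3]))  # by car, then time
    # consecutive rows with the same (car, speed) form one group,
    # which plays the role of the (car, grp) partition in the query
    for (car, speed), group in groupby(ordered, key=lambda r: (r[1], r[2])):
        group = list(group)
        if speed != 0 or len(group) <= n:
            kept.extend(group)
    return sorted(kept)  # back to id order

demo = [
    (1, "foo", 0, 1), (2, "foo", 0, 2), (3, "foo", 0, 3),
    (4, "foo", 1, 4), (5, "foo", 1, 5), (6, "foo", 0, 6),
    (7, "bar", 0, 1), (8, "bar", 0, 2), (9, "bar", 5, 3),
    (10, "bar", 5, 4), (11, "bar", 5, 5), (12, "bar", 5, 6),
]

print([r[0] for r in drop_long_stops(demo)])  # [4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Only ids 1 to 3 are removed: they are the single zero-speed run longer than 2 rows.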

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets whenever the value reaches 30 and starts counting from 0 again. I have been trying to do this, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?

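This should do what you want, assuming a, b and c in table t stand for user_id, touchpoint_number and days_difference (PostgreSQL requires the RECURSIVE keyword here):
with recursive cte as (
    select t.a, t.b, t.c, t.c as sumc
    from t
    where b = 1
    union all
    select t.a, t.b, t.c,
           (case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
    from t join
         cte
         on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;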
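The recursion is easier to follow as a plain loop. Below is a small Python sketch of the same rule for a single user, assuming the values are already ordered by touchpoint number. The function name cum_sum_with_reset and the limit parameter are hypothetical; the strict comparison mirrors the CASE expression in the query above.

```python
def cum_sum_with_reset(days, limit=30):
    """Running sum of days that drops back to 0 whenever adding the
    next value would push it above limit.

    days: days_difference values of one user, in touchpoint order.
    """
    out = []
    total = 0
    for i, d in enumerate(days):
        if i == 0:
            total = d  # the first row is taken as is, like the anchor part of the CTE
        elif total + d > limit:
            total = 0  # reset, like the CASE branch that returns 0
        else:
            total += d
        out.append(total)
    return out

print(cum_sum_with_reset([5, 20, 25, 10]))  # [5, 25, 0, 10]
print(cum_sum_with_reset([2, 30, 4]))       # [2, 0, 4]
```

Each step depends on the previous result, which is why a plain window SUM() cannot express it and a recursive CTE is used instead.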

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order, which is 0 where order_rank = 1 and otherwise the number of days since the previous order (the one with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 0
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar), rather than joining the entire dataset against itself (e.g. on A.order_rank = B.order_rank - 1) and doing the calculation?
Thanks!

Use the lag window function:
SELECT
    customer_id
    , order_date
    , order_rank
    , COALESCE(
        DATE(order_date)
        - DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
      , 0) AS days_since_last_order
FROM <table_name>;
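To verify the numbers by hand, the same "previous order per customer" lookup can be sketched in Python. The function name days_since_last_order and the tuple layout are assumptions made for this example only.

```python
from datetime import date

def days_since_last_order(orders):
    """Return (customer_id, order_date, days) rows, where days is 0 for a
    customer's first order and otherwise the gap to the previous order.

    orders: list of (customer_id, order_date) pairs, in any order.
    """
    result = []
    previous = {}  # last order date seen so far, per customer
    for customer, day in sorted(orders):
        prev = previous.get(customer)
        result.append((customer, day, 0 if prev is None else (day - prev).days))
        previous[customer] = day
    return result

orders = [
    ("A", date(2017, 2, 19)), ("A", date(2017, 2, 24)), ("A", date(2017, 3, 31)),
    ("A", date(2017, 7, 3)), ("A", date(2017, 8, 10)),
    ("B", date(2016, 4, 24)), ("B", date(2016, 4, 30)),
    ("C", date(2016, 7, 18)), ("C", date(2016, 9, 1)), ("C", date(2016, 9, 13)),
]

print([days for _, _, days in days_since_last_order(orders)])
# [0, 5, 35, 94, 38, 0, 6, 0, 45, 12]
```

The dictionary keyed by customer does the job of PARTITION BY customer_id: a customer's first order never sees another customer's date.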

TSQL - First and last number in range

I have a table with:
1
2
3
4
5
6
9
10
11
12
and I need to receive:
1-6
9-12
How can I do that?
I need to see that the table contains two or more ranges of numbers, here from 1 to 6 and from 9 to 12.

SELECT
    CONCAT(MIN(A.b), '-', MAX(A.b))
FROM
(
    SELECT
        *,
        ROW_NUMBER() OVER (ORDER BY b) RowId
    FROM
        -- sample data from the question: 7 and 8 are missing
        (VALUES (1),(2),(3),(4),(5),(6),(9),(10),(11),(12)) a(b)
) A
GROUP BY
    A.b - A.RowId
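The grouping key A.b - A.RowId works because, within a run of consecutive integers, the value and its row number grow in step, so their difference stays constant. A short Python sketch of that trick follows, assuming distinct integer values; the function name ranges is made up for this example.

```python
from itertools import groupby

def ranges(values):
    """Collapse distinct integers into 'first-last' strings, one per
    run of consecutive values."""
    ordered = sorted(values)
    out = []
    # value minus 1-based position is constant inside a consecutive run
    for _, group in groupby(enumerate(ordered, start=1), key=lambda p: p[1] - p[0]):
        run = [value for _, value in group]
        out.append(f"{run[0]}-{run[-1]}")
    return out

print(ranges([1, 2, 3, 4, 5, 6, 9, 10, 11, 12]))  # ['1-6', '9-12']
```

Duplicate values would break the constant difference, both here and in the ROW_NUMBER() version, so they would need to be removed first.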

Refer to current row in window function

Is it possible to refer to the current row in a window partition? I want to do something like the following:
SELECT min(ABS(variable - CURRENT.variable)) over (order by criterion RANGE UNBOUNDED PRECEDING)
That is, I want to find, among the preceding rows of the partition, the variable that is closest to the current value. Is it possible to do something like that?
As an example, from:
criterion | variable
1 2
2 4
3 2
4 7
5 6
We would obtain:
null
2
0
3
1
Thanks

As far as I know, this cannot be done with window functions.
But it can be done with a self join:
SELECT a.criterion,
       a.variable,
       min(abs(a.variable - b.variable))
FROM mydata a
   LEFT JOIN mydata b
      ON (b.criterion < a.criterion)
GROUP BY a.criterion, a.variable
ORDER BY a.criterion;
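What the self join computes can be spelled out in a few lines of Python, which also makes the quadratic cost visible. This is a sketch under the assumption that criterion values are unique; the function name closest_previous is hypothetical.

```python
def closest_previous(rows):
    """For each (criterion, variable) row, return the smallest absolute
    difference to the variable of any row with a lower criterion,
    or None if there is no such row.
    """
    out = []
    for crit, var in rows:
        # all earlier rows, like the join condition b.criterion < a.criterion
        diffs = [abs(var - v) for c, v in rows if c < crit]
        out.append((crit, min(diffs) if diffs else None))
    return out

mydata = [(1, 2), (2, 4), (3, 2), (4, 7), (5, 6)]

print(closest_previous(mydata))
# [(1, None), (2, 2), (3, 0), (4, 3), (5, 1)]
```

Every row is compared with all earlier rows, so the work grows roughly with the square of the row count, the same as for the join.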

If I understand correctly:
with t (v) as (values (-5),(-2),(0),(1),(3),(10))
select v,
least(
v - lag(v) over (order by v),
lead(v) over (order by v) - v
) as closest
from t
;
v | closest
----+---------
-5 | 3
-2 | 2
0 | 1
1 | 1
3 | 2
10 | 7

Hope this helps (watch out for performance problems).
I tried this in MSSQL (the PostgreSQL version is at the bottom):
CREATE TABLE TX (CRITERION INT, VARIABILE INT);
INSERT INTO TX VALUES (1,2), (2,4),(3,2),(4,7), (5,6);
SELECT CRITERION, MIN_DELTA FROM
(
SELECT TX.CRITERION
, MIN(ABS(B.TX2_VAR - TX.VARIABILE)) OVER (PARTITION BY TX.CRITERION) AS MIN_DELTA
, RANK() OVER (PARTITION BY TX.CRITERION ORDER BY ABS(B.TX2_VAR - TX.VARIABILE) ) AS MIN_RANK
FROM TX
CROSS APPLY (SELECT TX2.CRITERION AS TX2_CRIT, TX2.VARIABILE AS TX2_VAR FROM TX TX2 WHERE TX2.CRITERION < TX.CRITERION) B
) C
WHERE MIN_RANK=1
ORDER BY CRITERION
;
Output:
CRITERION MIN_DELTA
----------- -----------
2 2
3 0
4 3
5 1
POSTGRESQL Version (tested on Rextester http://rextester.com/VMGJ87600):
CREATE TABLE TX (CRITERION INT, VARIABILE INT);
INSERT INTO TX VALUES (1,2), (2,4),(3,2),(4,7), (5,6);
SELECT * FROM TX;
SELECT CRITERION, MIN_DELTA FROM
(
SELECT TX.CRITERION
, MIN(ABS(B.TX2_VAR - TX.VARIABILE)) OVER (PARTITION BY TX.CRITERION) AS MIN_DELTA
, RANK() OVER (PARTITION BY TX.CRITERION ORDER BY ABS(B.TX2_VAR - TX.VARIABILE) ) AS MIN_RANK
FROM TX
LEFT JOIN LATERAL (SELECT TX2.CRITERION AS TX2_CRIT, TX2.VARIABILE AS TX2_VAR FROM TX TX2 WHERE TX2.CRITERION < TX.CRITERION) B ON TRUE
) C
WHERE MIN_RANK=1
ORDER BY CRITERION
;
DROP TABLE TX;
Output:
criterion variabile
1 2
2 4
3 2
4 7
5 6
criterion min_delta
1 NULL
2 2
3 0
4 3
5 1