How to collect rows in a batch - scala

I have a table that looks like this:
id
values
1
a
2
b
3
c
4
d
5
e
6
f
and I need to generate group_id column to be able to collect rows in a batch using
select collect_list(values) from table group by group_id
For example, for batchSize = 2
id
values
group_id
1
a
1
2
b
1
3
c
2
4
d
2
5
e
3
6
f
3
to get it out:
group_id
collect_list(values)
1
[a, b]
2
[c, d]
3
[e, f]
or, for batchSize = 3
id
values
group_id
1
a
1
2
b
1
3
c
1
4
d
2
5
e
2
6
f
2
out
group_id
collect_list(values)
1
[a, b, c]
2
[d, e, f]
How do I generate this column group_id so I can collect the values and group by group_id?

You could use row_number and DIV to generate group_id
To expand my answer, we use the Integer division properties to get group id
Row_number will give consecutive Numbers from 1 to N
But we need the numbers to start with 0, so we subtract 1 of the row number
rownumber Div (3)
0 0
1 0
2 0
3 1
4 1
5 1
6 2
This can be proven to be true for all integers to infinity
As group_id must start with 1(not necessary actually) we need to add anither 1 to the result
After with the resulting Group-id you can collect_list(values) tpo get your arrays
SELECT
id, `values`,
((ROW_NUMBER() OVER (ORDEr By id) -1) DIV 3) + 1 group_id
FROM tab1
id
values
group_id
1
a
1
2
b
1
3
c
1
4
d
2
5
e
2
6
f
2
7
g
3
8
h
3
9
i
3
SELECT
id, `values`,
((ROW_NUMBER() OVER (ORDEr By id) -1) DIV 2) + 1 group_id
FROM tab1
id
values
group_id
1
a
1
2
b
1
3
c
2
4
d
2
5
e
3
6
f
3
7
g
4
8
h
4
9
i
5

From my understanding what you're trying to do is chunking your selection.
Then this should do the trick: https://stackoverflow.com/a/29975781/21188126

Related

Get count of values in different subgroups

I need to delete some rows in the dataset, of which the speed equals zero and lasting over N times (let's assume N is 2).
The structure of the table demo looks like:
id
car
speed
time
1
foo
0
1
2
foo
0
2
3
foo
0
3
4
foo
1
4
5
foo
1
5
6
foo
0
6
7
bar
0
1
8
bar
0
2
9
bar
5
3
10
bar
5
4
11
bar
5
5
12
bar
5
6
Then I hope to generate a table like the one below by using window_function:
id
car
speed
time
lasting
1
foo
0
1
3
2
foo
0
2
3
3
foo
0
3
3
4
foo
1
4
2
5
foo
1
5
2
6
foo
0
6
1
7
bar
0
1
2
8
bar
0
2
2
9
bar
5
3
4
10
bar
5
4
4
11
bar
5
5
4
12
bar
5
6
4
Then I can easily exclude those rows by using WHERE NOT (speed = 0 AND lasting > 2)
Put the code I tried here, but it didn't return the value I expected and I guess those FROM (SELECT ... FROM (SELECT ... might not be the best practice to solve the problem:
SELECT g3.*, count(id) OVER (PARTITION BY car, cumsum ORDER BY id) as num
FROM (SELECT g2.*, sum(grp2) OVER (PARTITION BY car ORDER BY id) AS cumsum
FROM (SELECT g1.*, (CASE ne0 WHEN 0 THEN 0 ELSE 1 END) AS grp2
FROM (SELECT g.*, speed - lag(speed, 1, 0) OVER (PARTITION BY car) AS ne0
FROM (SELECT *, row_number() OVER (PARTITION BY car) AS grp FROM demo) g ) g1 ) g2 ) g3
ORDER BY id;
You can use window function LAG() to check for the previous speed value for each row and SUM() window function to create the groups for the continuous values.
Then with COUNT() window function you can count the number of rows in each group so that you can filter out the rows with 0 speed in the groups that have more than 2 rows:
SELECT id, car, speed, time
FROM (
SELECT *, COUNT(*) OVER (PARTITION BY car, grp) counter
FROM (
SELECT *, SUM(flag::int) OVER (PARTITION BY car ORDER BY time) grp
FROM (
SELECT *, speed <> LAG(speed, 1, speed - 1) OVER (PARTITION BY car ORDER BY time) flag
FROM demo
) t
) t
) t
WHERE speed <> 0 OR counter <= 2
ORDER BY id;
See the demo.

Map the column value against same sequences in PostgreSQL

I want to write a query as per below input and output
Input :-
Num Sr_no Exp_no
NULL 1 1
NULL 2 1
ABC_1 3 1
NULL 4 1
NULL 1 2
NULL 2 2
ABC_2 3 2
NULL 4 4
Expected Output:-
Num Sr_no Exp_no
ABC_1 1 1
ABC_1 2 1
ABC_1 3 1
ABC_1 4 1
ABC_2 1 2
ABC_2 2 2
ABC_2 3 2
ABC_2 4 4
As there is no details in question, this answer is on below assumptions
you want to fill num field based on exp_no grouping.
Assuming there is only one value in a exp_no group.
Try this:
with cte as
(
select distinct on (num,exp_no) num, exp_no
from test
where num is not null
order by 1)
select
coalesce(t1.num, cte.num),
t1.sr_no,
t1.exp_no
from test t1 left join cte on t1.exp_no=cte.exp_no
DEMO

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to create one more column that would create a cumulative sum of the days_difference, partitioned by user_id, but would reset whenever the value reaches 30 and starts counting from 0. I have been trying to do it, but I couldn't figure it out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
This should do what you want:
with cte as (
select t.a, t.b, t.c, t.c as sumc
from t
where b = 1
union all
select t.a, t.b, t.c,
(case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
from t join
cte
on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;
Here is a rextester.

PostgreSQL window function & difference between dates

Suppose I have data formatted in the following way (FYI, total row count is over 30K):
customer_id order_date order_rank
A 2017-02-19 1
A 2017-02-24 2
A 2017-03-31 3
A 2017-07-03 4
A 2017-08-10 5
B 2016-04-24 1
B 2016-04-30 2
C 2016-07-18 1
C 2016-09-01 2
C 2016-09-13 3
I need a 4th column, let's call it days_since_last_order which, in the case where order_rank = 1 then 0 else calculate the number of days since the previous order (with rank n-1).
So, the above would return:
customer_id order_date order_rank days_since_last_order
A 2017-02-19 1 0
A 2017-02-24 2 5
A 2017-03-31 3 35
A 2017-07-03 4 94
A 2017-08-10 5 38
B 2016-04-24 1 0
B 2016-04-30 2 6
C 2016-07-18 1 79
C 2016-09-01 2 45
C 2016-09-13 3 12
Is there an easier way to calculate the above with a window function (or similar) rather than join the entire dataset against itself (eg. on A.order_rank = B.order_rank - 1) and doing the calc?
Thanks!
use the lag window function
SELECT
customer_id
, order_date
, order_rank
, COALESCE(
DATE(order_date)
- DATE(LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date))
, 0)
FROM <table_name>

Different set of unnest not showing correctly?

Example Data
Sql fiddle http://sqlfiddle.com/#!15/c8a17/4
My Query
select
unnest(array[g,g1,g2]) as disp,
unnest(array[g,g||'-'||g1,g||'-'||g1||'-'||g2]) as grp,
unnest(array[1,2,3]) as ord,
unnest(array['assesvalue','lst2','salesvalue','itemprofit','profitper','itemstockvalue'])as analysis,
unnest(array[value1,tt,sv,tp,per,tsv])as val
from (
select
g,
g1,
g2,
sum(value1) as value1,
sum(tt) as tt,
sum(sv) as sv,
sum(tp) as tp,
sum(per) as per,
sum(tsv) as tsv
from table1
group by g,g1,g2
) as ta
It Show the output like this
disp grp ord analysis val
A A 1 assesvalue 100
B A-B 2 lst2 30
C A-B-C 3 salesvalue 20
A A 1 itemprofit 5
B A-B 2 profitper 1
C A-B-C 3 itemstockvalue 10
Expected Result :
disp grp ord analysis val
A A 1 assesvalue 100
A A 1 lst2 30
A A 1 salesvalue 20
A A 1 itemprofit 5
A A 1 profitper 1
A A 1 itemstockvalue 10
B A-B 2 assesvalue 100
B A-B 2 lst2 30
B A-B 2 salesvalue 20
B A-B 2 itemprofit 5
B A-B 2 profitper 1
B A-B 2 itemstockvalue 10
C A-B-C 3 assesvalue 100
C A-B-C 3 lst2 30
C A-B-C 3 salesvalue 20
C A-B-C 3 itemprofit 5
C A-B-C 3 profitper 1
C A-B-C 3 itemstockvalue 10
In Query i am using multiple unnest.
first 3 unnest inside have 3 columns other have 6 columns that its show wrong output but if last 2 unnest have less then 6 its show my expected result.
what am doing wrong in query??
i am using postgresql 9.3