Grouped LIMIT 10 in Postgresql - postgresql

I have a query:
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
order by partner_id, count desc;
which brings back counts for certain terms per partner_id. I want to be able to show the top 10 for each of the ~40 partner_id in order by the count desc
the query results look like
db=# SELECT * FROM xxx;
pid | term_desc | count
----+------------+------
4 | termdesc1 | 3434
4 | termdesc2 | 235
4 | termdesc3 | 367
4 | termdesc4 | 4533
5 | termdesc1 | 235
5 | termdesc2 | 567
5 | termdesc3 | 344
5 | termdesc4 | 56
(10k+ rows)

You could add a rank column and then filter the result by the rank :
select
a.kli,
b.term_desc,
count(distinct(a.adic)) as count,
a.partner_id,
RANK() OVER (PARTITION BY a.partner_id order by a.partner_id DESC) AS r
from
ad_delivery.sgmt_kli_adic a
join wand.wandterms b on a.kli = b.term_code
join wand.wandterms c on b.term_desc=c.term_desc
join dwh.sgmt_clients e on a.partner_id::varchar = e.partner_id
join dwh.schema_names f on e.partner_id::integer = f.partner_id::integer
where
a.partner_id::integer in (f.partner_id)
and c.class_code = 969
group by a.partner_id, b.term_desc, a.kli
HAVING r < 11
order by partner_id, count desc;
I have not tested the code, however the trick is ranking the each row of the GROUP BY and filter the resultset with the HAVING clause, keeping only item with a lower rank than 11 (you will get 10 item per group).

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise recursive.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo:db<>fiddle
SELECT
*,
COALESCE( -- 3
first_value(override_as_number) OVER w -- 2
, 1
)
+ row_number() OVER w - 1 -- 4, 5
FROM (
SELECT
*,
SUM( -- 1
CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
as grouped
FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
first_value() gives the first value of such subpartition.
The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
row_number() - 1 creates a row count within a subpartition, starting with 0.
Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.
Below query gives exact result, but you need to verify with all combinations
select c.*,COALESCE(c.override_as_number,c.act) as final FROM
(
select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
(
select a.*,COALESCE(override_as_number,row_num) as actual FROM
(
select grouped_by , secondary_order_by ,
dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num
,override_as_number,desired_dynamic_number from fiddle
) a
) b
) c ;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, who's solution I'm posting below. The solution is Snowflake SQL which should be possible to adapt to Postgres. It does fall down on higher override_as_number values though unless the from table(generator(rowcount => 1000)) 1000 value is not increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

Difference of top two values while GROUP BY

Suppose I have the following SQL Table:
id | score
------------
1 | 4433
1 | 678
1 | 1230
1 | 414
5 | 8899
5 | 123
6 | 2345
6 | 567
6 | 2323
Now I wanted to do a GROUP BY id operation wherein the score column would be modified as follows: take the absolute difference between the top two highest scores for each id.
For example, the response for the above query should be:
id | score
------------
1 | 3203
5 | 8776
6 | 22
How can I perform this query in PostgreSQL?
Using ROW_NUMBER along with pivoting logic we can try:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY score DESC) rn
FROM yourTable
)
SELECT id,
ABS(MAX(score) FILTER (WHERE rn = 1) -
MAX(score) FILTER (WHERE rn = 2)) AS score
FROM cte
GROUP BY id;
Demo

Get different LIMIT on each group on postgresql rank

To get 2 rows from each group I can use ROW_NUMBER() with condition <= 2 at last but my question is what If I want to get different limits on each group e.g 3 rows for section_id 1, 1 rows for 2 and 1 rows for 3?
Given the following table:
db=# SELECT * FROM xxx;
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | D
5 | 2 | E
6 | 2 | F
7 | 3 | G
8 | 2 | H
(8 rows)
I get the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
5 | 2 | E
6 | 2 | F
7 | 3 | G
(5 rows)
Current Query:
SELECT
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
t.*
FROM
xxx t) x
WHERE
x.r <= 2;
Create a table to contain the section limits, then join. The big advantage being that as new sections are required or limits change maintenance is reduced to a single table update and comes at very little cost. See example.
select s.section_id, s.name
from (select section_id, name
, row_number() over (partition by section_id order by name) rn
from sections
) s
left join section_limits sl on (sl.section_id = s.section_id)
where
s.rn <= coalesce(sl.limit_to,2);
Just fix up your where clause:
with numbered as (
select row_number() over (partition by section_id
order by name) as r,
t.*
from xxx t
)
select *
from numbered
where (section_id = 1 and r <= 3)
or (section_id = 2 and r <= 1)
or (section_id = 3 and r <= 1);

Combining three very similar queries? (Postgres)

So I have three queries. I'm trying to combine them all into one query. Here they are with their outputs:
Query 1:
SELECT distinct on (name) name, count(distinct board_id)
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY name
ORDER BY name ASC
Output:
A | 15
B | 26
C | 24
D | 11
E | 31
F | 32
G | 16
Query 2:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1, board_id
ORDER BY name, total DESC
Output:
A | 435
B | 246
C | 611
D | 121
E | 436
F | 723
G | 293
Finally, the last query:
SELECT distinct on (name) name, count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY name, total DESC
Output:
A | 14667
B | 65123
C | 87426
D | 55198
E | 80612
F | 31485
G | 43392
Is it possible to format it to be like this:
A | 15 | 435 | 14667
B | 26 | 246 | 65123
C | 24 | 611 | 87426
D | 11 | 121 | 55198
E | 31 | 436 | 80612
F | 32 | 723 | 31485
G | 16 | 293 | 43392
EDIT:
With #Clodoaldo Neto 's help, I combined the first and the third queries with this:
SELECT name, count(distinct board_id), count(board_id) as total
FROM tablea
INNER JOIN table_b on tablea.id = table_b.id
GROUP BY 1
ORDER BY description ASC
The only thing preventing me from combining the second query with this new one is the GROUP BY clause needing board_id to be in it. Any thoughts from here?
This is hard to get right without test data. But here is my try:
with s as (
select name, grouping(name, board_id) as grp,
count(distinct board_id) as dist_total,
count(*) as name_total,
count(*) as name_board_total
from
tablea
inner join
table_b on tablea.id = table_b.id
group by grouping sets ((name), (name, board_id))
)
select name, dist_total, name_total, name_board_total
from
(
select name, dist_total, name_total
from s
where grp = 1
) r
inner join
(
select name, max(name_board_total) as name_board_total
from s
where grp = 0
group by name
) q using (name)
order by name
https://www.postgresql.org/docs/current/static/queries-table-expressions.html#QUERIES-GROUPING-SETS

How does one print depth-level in a Postgres query that uses RECURSIVE to select descendants?

I have a table persons that contains a column for parent_id, which refers to another row in the same table. Assume this is the logical hierarchy:
P1
P2 P3 P4
P5 P6 P7 P8 P9 P10
I have written a query that prints all parents of a given node, along with the height above the node, and it seems to work fine:
WITH
RECURSIVE ancestors AS (
SELECT id, parent_id
FROM persons
WHERE id = 8
UNION
SELECT p.id, p.parent_id
FROM persons p
INNER JOIN ancestors
ON
p.id = ancestors.parent_id
)
SELECT persons.id, persons.name,
ROW_NUMBER() over () as height
FROM ancestors
INNER JOIN persons
ON
ancestors.id = persons.id
WHERE
persons.id <> 8
Result:
id | name | height
-------+-------------+---------
3 | P3 | 1
1 | P1 | 2
(2 rows)
I now want to write a query that similarly prints all descendants, along with depth. Here's the query so far (same as above with id and parent_id swapped in the UNION join):
WITH
RECURSIVE descendants AS (
SELECT id, parent_id
FROM persons
WHERE id = 1
UNION
SELECT p.id, p.parent_id
FROM persons p
INNER JOIN descendants
ON
p.parent_id = descendants.id
)
SELECT persons.id, persons.name,
ROW_NUMBER() over () as depth
FROM descendants
INNER JOIN persons
ON
descendants.id = persons.id
WHERE
persons.id <> 1
This gives the following result:
id | name | depth
-------+-------------+---------
2 | P2 | 1
3 | P3 | 2
4 | P4 | 3
5 | P5 | 4
6 | P6 | 5
7 | P7 | 6
8 | P8 | 7
9 | P9 | 8
10 | P10 | 9
(9 rows)
Clearly, the depth is all wrong. ROW_NUMBER() isn't doing what I want. How do I go about this?
I've thought about using a counter within the recursive part of the query itself, which increments every time it is run, but I'm not sure if there's a way to achieve that.
Use an additional integer column with values incremented at each recursive step.
WITH RECURSIVE descendants AS (
SELECT id, parent_id, 0 AS depth
FROM persons
WHERE id = 1
UNION
SELECT p.id, p.parent_id, d.depth+ 1
FROM persons p
INNER JOIN descendants d
ON p.parent_id = d.id
)
SELECT p.id, p.name, depth
FROM descendants d
INNER JOIN persons p
ON d.id = p.id
WHERE p.id <> 1;
id | name | depth
----+------+-------
2 | P2 | 1
3 | P3 | 1
4 | P4 | 1
5 | P5 | 2
6 | P6 | 2
7 | P7 | 2
8 | P8 | 2
9 | P9 | 2
10 | P10 | 2
(9 rows)
Db<>fiddle.