Get a different LIMIT on each group with PostgreSQL rank

To get 2 rows from each group I can use ROW_NUMBER() with a final condition of r <= 2, but what if I want a different limit for each group, e.g. 3 rows for section_id 1, 1 row for section_id 2 and 1 row for section_id 3?
Given the following table:
db=# SELECT * FROM xxx;
 id | section_id | name
----+------------+------
  1 |          1 | A
  2 |          1 | B
  3 |          1 | C
  4 |          1 | D
  5 |          2 | E
  6 |          2 | F
  7 |          3 | G
  8 |          2 | H
(8 rows)
I get the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:
 id | section_id | name
----+------------+------
  1 |          1 | A
  2 |          1 | B
  5 |          2 | E
  6 |          2 | F
  7 |          3 | G
(5 rows)
Current Query:
SELECT *
FROM (
    SELECT
        ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
        t.*
    FROM xxx t) x
WHERE x.r <= 2;

Create a table to contain the section limits, then join against it. The big advantage is that as new sections are added or limits change, maintenance is reduced to a single table update, at very little cost. See the example below.
select s.section_id, s.name
from (select section_id, name,
             row_number() over (partition by section_id order by name) rn
      from xxx
     ) s
left join section_limits sl on (sl.section_id = s.section_id)
where s.rn <= coalesce(sl.limit_to, 2);
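For reference, a minimal sketch of what the section_limits lookup table could look like (the table and column names are taken from the query above; the seed values are an assumption matching the limits in the question):
CREATE TABLE section_limits (
    section_id integer PRIMARY KEY,
    limit_to   integer NOT NULL
);
-- 3 rows for section 1, 1 row each for sections 2 and 3 (values assumed from the question)
INSERT INTO section_limits (section_id, limit_to) VALUES
    (1, 3),
    (2, 1),
    (3, 1);
Any section_id without a row in section_limits falls back to the default of 2 via coalesce(sl.limit_to, 2).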

Just fix up your where clause:
with numbered as (
    select row_number() over (partition by section_id
                              order by name) as r,
           t.*
    from xxx t
)
select *
from numbered
where (section_id = 1 and r <= 3)
   or (section_id = 2 and r <= 1)
   or (section_id = 3 and r <= 1);
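If you would rather not hard-code the limits in the WHERE clause, a variation of the same idea joins the numbered rows against an inline VALUES list (a sketch, assuming the same xxx table and the limits from the question):
with numbered as (
    select row_number() over (partition by section_id
                              order by name) as r,
           t.*
    from xxx t
)
select n.*
from numbered n
join (values (1, 3), (2, 1), (3, 1)) as lim(section_id, max_rows)
  on n.section_id = lim.section_id
where n.r <= lim.max_rows;
Note that, unlike the coalesce version above, sections missing from the VALUES list are dropped entirely rather than falling back to a default.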

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL: I need to generate sequential numbers for partitioned rows, but override specific numbers with values from the data, without breaking the sequence (unless the override uses a number greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows whose generated sequence value needs to be overridden separately from the rows that don't, unioning them together, and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there is some way I can use a recursive CTE.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo: db<>fiddle
SELECT
    *,
    COALESCE(                                        -- 3
        first_value(override_as_number) OVER w,      -- 2
        1
    )
    + row_number() OVER w - 1                        -- 4, 5
FROM (
    SELECT
        *,
        SUM(                                         -- 1
            CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
        ) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by) as grouped
    FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
1. Create a new subpartition within your partitions: the cumulative sum creates a unique group id for every group of records that starts with a non-NULL override_as_number followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) rows belong to the same subpartition/group.
2. first_value() gives the first value of such a subpartition.
3. The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
4. row_number() - 1 creates a row count within a subpartition, starting with 0.
5. Adding the first_value() of a subpartition to that row count creates the result: the non-NULL record at the start of a subpartition keeps its value (row count 0), the first following NULL record gets that value + 1, and so forth.
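As an illustration, here is the AAA partition traced through those steps (values worked out by hand from the query above and the sample data in the fiddle, so worth double-checking against the fiddle output):
 secondary_order_by | override_as_number | grouped | first_value | row_number - 1 | result
--------------------+--------------------+---------+-------------+----------------+--------
 a                  | 1                  | 1       | 1           | 0              | 1
 b                  |                    | 1       | 1           | 1              | 2
 c                  | 3                  | 2       | 3           | 0              | 3
 d                  | 3                  | 3       | 3           | 0              | 3
 e                  |                    | 3       | 3           | 1              | 4
 f                  |                    | 3       | 3           | 2              | 5
 g                  | 999                | 4       | 999         | 0              | 999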
The query below gives the exact result, but you need to verify it with all combinations:
select c.*, COALESCE(c.override_as_number, c.act) as final
from (
    select b.*, dense_rank() over (partition by grouped_by order by grouped_by, actual) as act
    from (
        select a.*, COALESCE(override_as_number, row_num) as actual
        from (
            select grouped_by, secondary_order_by,
                   dense_rank() over (partition by grouped_by order by grouped_by, secondary_order_by) as row_num,
                   override_as_number, desired_dynamic_number
            from fiddle
        ) a
    ) b
) c;
column "final" is the result
 grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
 AAA        | a                  |       1 |                  1 |                      1 |      1 |   1 |     1
 AAA        | b                  |       2 |                    |                      2 |      2 |   2 |     2
 AAA        | c                  |       3 |                  3 |                      3 |      3 |   3 |     3
 AAA        | d                  |       4 |                  3 |                      3 |      3 |   3 |     3
 AAA        | e                  |       5 |                    |                      4 |      5 |   4 |     4
 AAA        | f                  |       6 |                    |                      5 |      6 |   5 |     5
 AAA        | g                  |       7 |                999 |                    999 |    999 |   6 |   999
 XYZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
 ZZZ        | a                  |       1 |                    |                      1 |      1 |   1 |     1
 ZZZ        | b                  |       2 |                    |                      2 |      2 |   2 |     2
(10 rows)
Hope this helps!
The real-world problem I was trying to solve did not have a nicely ordered secondary_order_by column; instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, whose solution I'm posting below. The solution is Snowflake SQL, which should be possible to adapt to Postgres. It does fall down on higher override_as_number values, though, unless the 1000 in from table(generator(rowcount => 1000)) is increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

How to drop rows if a variable is less than x, in SQL

I have the following query code
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be simple. Any help will be appreciated!
We can delete in a CTE and use RETURNING to get the ids of the rows being deleted, but they still exist until the transaction is committed.
CREATE TABLE t (
    id serial,
    variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
5 rows affected
with del as (
    delete from t
    where variale < 3
    returning id
)
select
    t.id,
    t.variale,
    del.id as ids_being_deleted
from t
left join del on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here
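If the goal is only to filter the aggregated result rather than physically delete rows, the usual approach would be a HAVING clause on the aggregate (a sketch adapted from the query in the question, using the 0.02 threshold mentioned there):
SELECT address,
       sum(value) / 1000000000000000000 AS balance
FROM double_entry_book
GROUP BY address
HAVING sum(value) / 1000000000000000000 >= 0.02
ORDER BY balance DESC
LIMIT 15000000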

Finding the length of a series in postgres

A tricky query for postgres. Imagine I have a set of rows with a boolean column called (for example) success, like this:
 id | success
----+---------
  9 | false
  8 | false
  7 | true
  6 | true
  5 | true
  4 | false
  3 | false
  2 | true
  1 | false
And I need to calculate the length of the latest successful (or unsuccessful) series. E.g. in this case it would be "3" for successful and "2" for unsuccessful. Or, using window functions, something like:
 id | success | length
----+---------+--------
  9 | false   |      2
  8 | false   |      2
  7 | true    |      3
  6 | true    |      3
  5 | true    |      3
  4 | false   |      2
  3 | false   |      2
  2 | true    |      1
  1 | false   |      1
(note that I generally need a length of only the latest series, not all of those)
The closest answer I've found so far was this article:
https://jaxenter.com/10-sql-tricks-that-you-didnt-think-were-possible-125934.html
(See #5)
However, postgres doesn't support the "IGNORE NULLS" option, so the query doesn't work. Without "IGNORE NULLS" it simply returns nulls in the length column.
Here is the closest I was able to get:
WITH
trx1(id, success, rn) AS (
SELECT id, success, row_number() OVER (ORDER BY id desc)
FROM results
),
trx2(id, success, rn, lo, hi) AS (
SELECT trx1.*,
CASE WHEN coalesce(lag(success) OVER (ORDER BY id DESC), FALSE) != success THEN rn END,
CASE WHEN coalesce(lead(success) OVER (ORDER BY id DESC), FALSE) != success THEN rn END
FROM trx1
)
SELECT trx2.*, 1
- last_value (lo) IGNORE nulls OVER (ORDER BY id DESC ROWS BETWEEN
UNBOUNDED PRECEDING AND CURRENT ROW)
+ first_value(hi) OVER (ORDER BY id DESC ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING)
AS length FROM trx2;
Do you have any ideas for such a query?
You can use the window function row_number() to designate series:
select max(id) as max_id, success, count(*) as length
from (
select *, row_number() over wa - row_number() over wp as grp
from my_table
window
wp as (partition by success order by id desc),
wa as (order by id desc)
) s
group by success, grp
order by 1 desc
 max_id | success | length
--------+---------+--------
      9 | f       |      2
      7 | t       |      3
      4 | f       |      2
      2 | t       |      1
      1 | f       |      1
(5 rows)
DbFiddle.
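Since only the latest series of each value is needed, the grouped result can be narrowed down further, for example with DISTINCT ON (a sketch built on the query above):
select distinct on (success)
       max(id) as max_id, success, count(*) as length
from (
  select *, row_number() over wa - row_number() over wp as grp
  from my_table
  window
    wp as (partition by success order by id desc),
    wa as (order by id desc)
) s
group by success, grp
order by success, max(id) desc;
For the sample data this should keep only max_id 9 (false, length 2) and max_id 7 (true, length 3).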
Even though the answer by Klin is totally correct, I'd like to post another solution my friend suggested:
with last_success as (
select max(id) id from my_table where success
)
select count(mt.id) last_fails_count
from my_table mt, last_success lt
where mt.id > lt.id;
--------------------
| last_fails_count |
--------------------
| 2 |
--------------------
DbFiddle
It is twice as fast if I only need to get the last failing or successful series.

Increment Row_Number Only Where Distinct

I have the following table, which I've made very simple because I do not know how to format it as a table on here (side note: if anyone could link me to an easy tutorial on that, I would be forever grateful).
id
1
1
1
2
2
2
I'd like to add another column which increments in number only on distinct IDs, so the outcome should be:
Id | rowNum
 1 | 1
 1 | 1
 1 | 1
 2 | 2
 2 | 2
 2 | 2
Currently all I can manage to get is:
id | rowNum
 1 | 1
 1 | 2
 1 | 3
 2 | 4
 2 | 5
 2 | 6
I'm missing something very simple here; I'm confident I should be able to solve this using either row_number or rank with a window function, but I cannot figure it out.
Use DENSE_RANK() instead of ROW_NUMBER():
SELECT
id,
DENSE_RANK() OVER (ORDER BY id) dr
FROM yourTable
Demo
You can do this with a subquery self join, as well.
mysql> select id,
> (select count(distinct id)
> from
> testtest b
> where b.id < a.id)
> from testtest a;
+------+---------------------------------------------------------------+
| id | (select count(distinct id) from testtest b where b.id < a.id) |
+------+---------------------------------------------------------------+
| 1 | 0 |
| 1 | 0 |
| 1 | 0 |
| 2 | 1 |
| 2 | 1 |
| 2 | 1 |
+------+---------------------------------------------------------------+
6 rows in set (0.01 sec)
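Note that this numbering starts at 0 because it counts the distinct ids strictly below the current row; if the 1-based numbering from the question is needed, counting with <= lines it up (a sketch against the same testtest table):
select id,
       (select count(distinct b.id)
        from testtest b
        where b.id <= a.id) as rowNum
from testtest a;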
And one more way:
select a.id, b.idRank
from testtest a,
(
select id,
rank() over
(order by id) as idRank
from (
select distinct id
from testtest
) testtest2
) b
where a.id = b.id

pl sql query recursive looping

I have only one table, "tbl_test", which has the fields given below.
tbl_test table:
 trx_id | proj_num | parent_num
--------+----------+------------
      1 |       14 |          0
      2 |       14 |          1
      3 |       14 |          2
      4 |       14 |          0
      5 |       14 |          3
      6 |       15 |          0
The result I want: when trx_id value 5 is fetched, follow the parent-child relationship, so:
trx_id -> parent_num
5 -> 3
3 -> 2
2 -> 1
That means the output values are:
3
2
1
i.e. the whole parent chain.
The query I used:
SELECT * FROM (
WITH RECURSIVE tree_data(project_num, task_num, parent_task_num) AS(
SELECT project_num, task_num, parent_task_num
FROM tb_task
WHERE project_num = 14 and task_num = 5
UNION ALL
SELECT child.project_num, child.task_num, child.parent_task_num
FROM tree_data parent Join tb_task child
ON parent.task_num = child.task_num AND parent.task_num = child.parent_task_num
)
SELECT project_num, task_num, parent_task_num
FROM tree_data
) AS tree_list ;
Can anybody help me?
There's no need to do this with pl/pgsql. You can do it straight in SQL. Consider:
WITH RECURSIVE my_tree AS (
    SELECT trx_id as id, parent_num as parent, trx_id::text as path, 1 as level
    FROM tbl_test
    WHERE trx_id = 5 -- start value
    UNION ALL
    SELECT t.trx_id, t.parent_num, p.path || ',' || t.trx_id::text, p.level + 1
    FROM my_tree p
    JOIN tbl_test t ON t.trx_id = p.parent
)
select * from my_tree;
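Run against the sample data in the question, the recursion should walk 5 -> 3 -> 2 -> 1 and then stop, since no row has trx_id 0. If only the parent trx_ids 3, 2, 1 are wanted, as in the question, the final select can be filtered (a sketch, assuming parent_num = 0 marks the top of the chain):
select parent from my_tree where parent <> 0;  -- yields 3, 2, 1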
If you are using PostgreSQL, try using a WITH clause:
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;