Renumbering a column in postgresql based on sorted values in that column - postgresql

Edit: I am using postgresql v8.3
I have a table that contains a column we can call column A.
Column A is populated, for our purposes, with arbitrary positive integers.
I want to renumber column A from 1 to N based on ordering the records of the table by column A ascending. (SELECT * FROM table ORDER BY A ASC;)
Is there a simple way to accomplish this without the need of building a postgresql function?
Example:
(Before:
A: 3,10,20,100,487,1,6)
(After:
A: 2,4,5,6,7,1,3)

Use the rank() (or dense_rank() ) WINDOW-functions (available since PG-8.4):
create table aaa
( id serial not null primary key
, num integer not null
, rnk integer not null default 0
);
insert into aaa(num) values( 3) , (10) , (20) , (100) , (487) , (1) , (6)
;
UPDATE aaa
SET rnk = w.rnk
FROM (
SELECT id
, rank() OVER (order by num ASC) AS rnk
FROM aaa
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Results:
CREATE TABLE
INSERT 0 7
UPDATE 7
id | num | rnk
----+-----+-----
1 | 3 | 2
2 | 10 | 4
3 | 20 | 5
4 | 100 | 6
5 | 487 | 7
6 | 1 | 1
7 | 6 | 3
(7 rows)
IF window functions are not available, you could still count the number of rows before any row:
UPDATE aaa
SET rnk = w.rnk
FROM ( SELECT a0.id AS id
, COUNT(*) AS rnk
FROM aaa a0
JOIN aaa a1 ON a1.num <= a0.num
GROUP BY a0.id
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Or the same with a scalar subquery:
UPDATE aaa a0
SET rnk =
( SELECT COUNT(*)
FROM aaa a1
WHERE a1.num <= a0.num
)
;

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise recursive.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo:db<>fiddle
SELECT
*,
COALESCE( -- 3
first_value(override_as_number) OVER w -- 2
, 1
)
+ row_number() OVER w - 1 -- 4, 5
FROM (
SELECT
*,
SUM( -- 1
CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
as grouped
FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
first_value() gives the first value of such subpartition.
The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
row_number() - 1 creates a row count within a subpartition, starting with 0.
Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.
Below query gives exact result, but you need to verify with all combinations
select c.*,COALESCE(c.override_as_number,c.act) as final FROM
(
select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
(
select a.*,COALESCE(override_as_number,row_num) as actual FROM
(
select grouped_by , secondary_order_by ,
dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num
,override_as_number,desired_dynamic_number from fiddle
) a
) b
) c ;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, who's solution I'm posting below. The solution is Snowflake SQL which should be possible to adapt to Postgres. It does fall down on higher override_as_number values though unless the from table(generator(rowcount => 1000)) 1000 value is not increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

how to drop rows if a variale is less than x, in sql

I have the following query code
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be a simple code. Any help will be appreciated!
We can delete on a CTE and use returning to get the id's of the rows being deleted, but they still exist until the transaction is comitted.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
✓
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here

Get different LIMIT on each group on postgresql rank

To get 2 rows from each group I can use ROW_NUMBER() with condition <= 2 at last but my question is what If I want to get different limits on each group e.g 3 rows for section_id 1, 1 rows for 2 and 1 rows for 3?
Given the following table:
db=# SELECT * FROM xxx;
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | D
5 | 2 | E
6 | 2 | F
7 | 3 | G
8 | 2 | H
(8 rows)
I get the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
5 | 2 | E
6 | 2 | F
7 | 3 | G
(5 rows)
Current Query:
SELECT
*
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
t.*
FROM
xxx t) x
WHERE
x.r <= 2;
Create a table to contain the section limits, then join. The big advantage being that as new sections are required or limits change maintenance is reduced to a single table update and comes at very little cost. See example.
select s.section_id, s.name
from (select section_id, name
, row_number() over (partition by section_id order by name) rn
from sections
) s
left join section_limits sl on (sl.section_id = s.section_id)
where
s.rn <= coalesce(sl.limit_to,2);
Just fix up your where clause:
with numbered as (
select row_number() over (partition by section_id
order by name) as r,
t.*
from xxx t
)
select *
from numbered
where (section_id = 1 and r <= 3)
or (section_id = 2 and r <= 1)
or (section_id = 3 and r <= 1);

Can window function LAG reference the column which value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>
This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I wan't to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow non duplicate and are within some time frame after it (that is their time_stamp is not greater then time_stamp of the previous non duplicate plus some constant TIMEFRAME) are duplicates
the next event which time_stamp is greater than previous non duplicate by more than TIMEFRAME is not duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than windowing function. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time to try a LATERAL join. So there may be a way to simplify that moe. Really what I wanted to do was a recursive CTE with the recursive part using MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. But it seems the lateral join can be used in the CTE.

pl sql query recuresive looping

i have only one table "tbl_test"
Which have table filed given below
tbl_test table
trx_id | proj_num | parent_num|
1 | 14 | 0 |
2 | 14 | 1 |
3 | 14 | 2 |
4 | 14 | 0 |
5 | 14 | 3 |
6 | 15 | 0 |
Result i want is : when trx_id value 5 is fetched
it's a parent child relationship. so,
trx_id -> parent_num
5 -> 3
3 -> 2
2 -> 1
That means output value:
3
2
1
Getting all parent chain
Query i used :
SELECT * FROM (
WITH RECURSIVE tree_data(project_num, task_num, parent_task_num) AS(
SELECT project_num, task_num, parent_task_num
FROM tb_task
WHERE project_num = 14 and task_num = 5
UNION ALL
SELECT child.project_num, child.task_num, child.parent_task_num
FROM tree_data parent Join tb_task child
ON parent.task_num = child.task_num AND parent.task_num = child.parent_task_num
)
SELECT project_num, task_num, parent_task_num
FROM tree_data
) AS tree_list ;
Can anybody help me ?
There's no need to do this with pl/pgsql. You can do it straight in SQL. Consider:
WITH RECURSIVE my_tree AS (
SELECT trx_id as id, parent_id as parent, trx_id::text as path, 1 as level
FROM tbl_test
WHERE trx_id = 5 -- start value
UNION ALL
SELECT t.trx_id, t.parent_id, p.path || ',' || t.trx_id::text, p.level + 1
FROM my_tree p
JOIN tbl_text t ON t.trx_id = p.parent
)
select * from my_tree;
If you are using PostgresSQL, try using a WITH clause:
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
product,
SUM(quantity) AS product_units,
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;