Postgres query : using previous dynamically created colum value in next - postgresql

I'm trying to implement what I have in code as a postgres query.
The following example isn't exactly what we're trying to do but I hope it shows how I'm trying to use the value from a previously calculated row in the next.
A sample table to help me demonstrate what I'm trying to do :
test=# select * from test ;
id | field1 | field2 | field3 | score
----+--------+--------+--------+-------
1 | 1 | 3 | 2 | 1.25
2 | 1 | -1 | 1 |
3 | 2 | 1 | 5 |
4 | 3 | -2 | 4 |
Here's the query in progress:
select id,
coalesce (
score,
case when lag_field3 = 2 then 0.25*(3*field1+field2) end
) as new_score
from (
select id, field1, field2, field3, score,
lag (field3) over (order by id) as lag_field3
from test
) inner1 ;
Which returns what I want so far ...
id | new_score
----+-----------
1 | 1.25
2 | 0.5
3 |
4 |
The next iteration of the query:
select id,
coalesce (
score,
case when lag_field3 = 2 then 0.25*(3*field1+field2) end,
case when field1 = 2 then 0.75 * lag (new_score) end
) as new_score
from (
select id, field1, field2, field3, score,
lag (field3) over (order by id) as lag_field3
from test
) inner1 ;
The difference is this :
case when field1 = 2 then 0.75 * lag (new_score) end
I know and understand why this won't work.
I've aliased the calculated field as new_score and when field1 = 2, I want 0.75 * the previous rows new_score value.
I understand that new_score is an alias and can't be used.
Is there some way I can accomplish this? I could try to copy that expression, wrap a lag around it, alias that as something else and try to work with that but that would get very messy.
Any ideas?
Many thanks.

Postgres lets you use windows in CASE statements. Probably you were missing the OVER (ORDER BY id) part. You can also define different windows but you can't use windows in conjunction with GROUP BY. Also, it won't let you use annidate windows, so you have to write down some subqueries or CTEs.
Here's the query:
SELECT id, COALESCE(tmp_score,
CASE
WHEN field1 = 2
THEN 0.75 * LAG(tmp_score) OVER (ORDER BY id)
-- missing ELSE statement here
END
) AS new_score
FROM (
SELECT id, field1,
COALESCE (
score,
CASE
WHEN LAG(field3) OVER (ORDER BY id) = 2
THEN 0.25*(3*field1+field2)
END
) AS tmp_score
FROM test
) inner1
The code to create and populate the table:
CREATE TABLE test(
id int,
field1 int,
field2 int,
field3 int,
score numeric
);
INSERT INTO test VALUES
(1, 1, 3, 2, 1.25),
(2, 1, -1, 1, NULL),
(3, 2, 1, 5, NULL),
(4, 3, -2, 4, NULL);
The query returns this output:
id | new_score
----+-----------
1 | 1.25
2 | 0.50
3 | 0.3750
4 |

Related

PostgreSQL how to generate a partition row_number() with certain numbers overridden

I have an unusual problem I'm trying to solve with SQL where I need to generate sequential numbers for partitioned rows but override specific numbers with values from the data, while not breaking the sequence (unless the override causes a number to be used greater than the number of rows present).
I feel I might be able to achieve this by selecting the rows where I need to override the generated sequence value and the rows I don't need to override the value, then unioning them together and somehow using coalesce to get the desired dynamically generated sequence value, or maybe there's some way I can utilise recursive.
I've not been able to solve this problem yet, but I've put together a SQL Fiddle which provides a simplified version:
http://sqlfiddle.com/#!17/236b5/5
The desired_dynamic_number is what I'm trying to generate and the generated_dynamic_number is my current work-in-progress attempt.
Any pointers around the best way to achieve the desired_dynamic_number values dynamically?
Update:
I'm almost there using lag:
http://sqlfiddle.com/#!17/236b5/24
step-by-step demo:db<>fiddle
SELECT
*,
COALESCE( -- 3
first_value(override_as_number) OVER w -- 2
, 1
)
+ row_number() OVER w - 1 -- 4, 5
FROM (
SELECT
*,
SUM( -- 1
CASE WHEN override_as_number IS NOT NULL THEN 1 ELSE 0 END
) OVER (PARTITION BY grouped_by ORDER BY secondary_order_by)
as grouped
FROM sample
) s
WINDOW w AS (PARTITION BY grouped_by, grouped ORDER BY secondary_order_by)
Create a new subpartition within your partitions: This cumulative sum creates a unique group id for every group of records which starts with a override_as_number <> NULL followed by NULL records. So, for instance, your (AAA, d) to (AAA, f) belongs to the same subpartition/group.
first_value() gives the first value of such subpartition.
The COALESCE ensures a non-NULL result from the first_value() function if your partition starts with a NULL record.
row_number() - 1 creates a row count within a subpartition, starting with 0.
Adding the first_value() of a subpartition with the row count creates your result: Beginning with the one non-NULL record of a subpartition (adding the 0 row count), the first following NULL records results in the value +1 and so forth.
Below query gives exact result, but you need to verify with all combinations
select c.*,COALESCE(c.override_as_number,c.act) as final FROM
(
select b.*, dense_rank() over(partition by grouped_by order by grouped_by, actual) as act from
(
select a.*,COALESCE(override_as_number,row_num) as actual FROM
(
select grouped_by , secondary_order_by ,
dense_rank() over ( partition by grouped_by order by grouped_by, secondary_order_by ) as row_num
,override_as_number,desired_dynamic_number from fiddle
) a
) b
) c ;
column "final" is the result
grouped_by | secondary_order_by | row_num | override_as_number | desired_dynamic_number | actual | act | final
------------+--------------------+---------+--------------------+------------------------+--------+-----+-------
AAA | a | 1 | 1 | 1 | 1 | 1 | 1
AAA | b | 2 | | 2 | 2 | 2 | 2
AAA | c | 3 | 3 | 3 | 3 | 3 | 3
AAA | d | 4 | 3 | 3 | 3 | 3 | 3
AAA | e | 5 | | 4 | 5 | 4 | 4
AAA | f | 6 | | 5 | 6 | 5 | 5
AAA | g | 7 | 999 | 999 | 999 | 6 | 999
XYZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | a | 1 | | 1 | 1 | 1 | 1
ZZZ | b | 2 | | 2 | 2 | 2 | 2
(10 rows)
Hope this helps!
The real world problem I was trying to solve did not have a nicely ordered secondary_order_by column, instead it would be something a bit more randomised (a created timestamp).
For the benefit of people who stumble across this question with a similar problem to solve, a colleague solved this problem using a cartesian join, who's solution I'm posting below. The solution is Snowflake SQL which should be possible to adapt to Postgres. It does fall down on higher override_as_number values though unless the from table(generator(rowcount => 1000)) 1000 value is not increased to something suitably high.
The SQL:
with tally_table as (
select row_number() over (order by seq4()) as gen_list
from table(generator(rowcount => 1000))
),
base as (
select *,
IFF(override_as_number IS NULL, row_number() OVER(PARTITION BY grouped_by, override_as_number order by random),override_as_number) as rownum
from "SANDPIT"."TEST"."SAMPLEDATA" order by grouped_by,override_as_number,random
) --select * from base order by grouped_by,random;
,
cart_product as (
select *
from tally_table cross join (Select distinct grouped_by from base ) as distinct_grouped_by
) --select * from cart_product;
,
filter_product as (
select *,
row_number() OVER(partition by cart_product.grouped_by order by cart_product.grouped_by,gen_list) as seq_order
from cart_product
where CONCAT(grouped_by,'~',gen_list) NOT IN (select concat(grouped_by,'~',override_as_number) from base where override_as_number is not null)
) --select * from try2 order by 2,3 ;
select base.grouped_by,
base.random,
base.override_as_number,
base.answer, -- This is hard coded as test data
IFF(override_as_number is null, gen_list, seq_order) as computed_answer
from base inner join filter_product on base.rownum = filter_product.seq_order and base.grouped_by = filter_product.grouped_by
order by base.grouped_by,
random;
In the end I went for a simpler solution using a temporary table and cursor to inject override_as_number values and shuffle other numbers.

how to drop rows if a variale is less than x, in sql

I have the following query code
query = """
with double_entry_book as (
SELECT to_address as address, value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE to_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
-- credits
SELECT from_address as address, -value as value
FROM `bigquery-public-data.crypto_ethereum.traces`
WHERE from_address is not null
AND block_timestamp < '2022-01-01 00:00:00'
AND status = 1
AND (call_type not in ('delegatecall', 'callcode', 'staticcall') or call_type is null)
union all
)
SELECT address,
sum(value) / 1000000000000000000 as balance
from double_entry_book
group by address
order by balance desc
LIMIT 15000000
"""
In the last part, I want to drop rows where "balance" is less than, let's say, 0.02 and then group, order, etc. I imagine this should be a simple code. Any help will be appreciated!
We can delete on a CTE and use returning to get the id's of the rows being deleted, but they still exist until the transaction is comitted.
CREATE TABLE t (
id serial,
variale int);
insert into t (variale) values
(1),(2),(3),(4),(5);
✓
5 rows affected
with del as
(delete from t
where variale < 3
returning id)
select
t.id,
t.variale,
del.id ids_being_deleted
from t
left join del
on t.id = del.id;
id | variale | ids_being_deleted
-: | ------: | ----------------:
1 | 1 | 1
2 | 2 | 2
3 | 3 | null
4 | 4 | null
5 | 5 | null
select * from t;
id | variale
-: | ------:
3 | 3
4 | 4
5 | 5
db<>fiddle here

How to write a select query for displaying data on a table in another way using Postgresql?

I want to write a select query to pick data from a table which is shown in this image below,PICTURE_1
1.Table Containing Data
and display it like this image in this link below, PICTURE_2
2.Result of the query
About the data: The first picture shows data logged into a table for 2 seconds from 3 IDs(1,2&3) having 2 sub IDs (aa&bb). Values and timestamp are also displayed in the picture. The table conatins only 3 column as shown in PICTURE_1. Could you guys help me write a query to display data in the table to get displayed as shown in the second image using Postgresql?. You can extract ID name using substring function. The language that Im using is plpgsql. Any ideas/logic also will be good.Thank you for your time.
Please try this. Here row value has been shown in column wise and also use CTE.
-- PostgreSQL(v11)
WITH cte_t AS (
SELECT LEFT(name, 1) id
, RIGHT(name, POSITION('.' IN REVERSE(name)) - 1) t_name
, value
, time_stamp
FROM test
)
SELECT id
, time_stamp :: DATE "date"
, time_stamp :: TIME "time"
, MAX(CASE WHEN t_name = 'aa' THEN value END) "aa"
, MAX(CASE WHEN t_name = 'bb' THEN value END) "bb"
FROM cte_t
GROUP BY id, time_stamp
ORDER BY date, time, id;
Please check from url https://dbfiddle.uk/?rdbms=postgres_11&fiddle=6d35047560b3f83e6c906584b23034e9
Check this query dbfiddle
with cte (name, value, timeStamp) as (values
('1.aa', 1, '2021-08-20 10:10:01'),
('2.aa', 2, '2021-08-20 10:10:01'),
('3.aa', 3, '2021-08-20 10:10:01'),
('1.bb', 4, '2021-08-20 10:10:01'),
('2.bb', 5, '2021-08-20 10:10:01'),
('3.bb', 6, '2021-08-20 10:10:01'),
('1.aa', 7, '2021-08-20 10:10:02'),
('2.aa', 8, '2021-08-20 10:10:02'),
('3.aa', 9, '2021-08-20 10:10:02'),
('1.bb', 0, '2021-08-20 10:10:02'),
('2.bb', 1, '2021-08-20 10:10:02'),
('3.bb', 2, '2021-08-20 10:10:02')
), sub_cte as (
select split_name[1] as id, split_name[2] as name, value, tt::date as date, tt::time as time from (
select
regexp_split_to_array(name, '\.') split_name,
value,
to_timestamp(timestamp, 'YYYY-MM-DD HH:MI:SS') as tt
from cte
) foo
)
select id, date, time, a.value as aa, b.value as bb from sub_cte a
left join (
select * from sub_cte where name = 'bb'
) as b using (id, date, time)
where a.name = 'aa'
Result
id | date | time | aa | bb
----+------------+----------+----+----
1 | 2021-08-20 | 10:10:01 | 1 | 4
2 | 2021-08-20 | 10:10:01 | 2 | 5
3 | 2021-08-20 | 10:10:01 | 3 | 6
1 | 2021-08-20 | 10:10:02 | 7 | 0
2 | 2021-08-20 | 10:10:02 | 8 | 1
3 | 2021-08-20 | 10:10:02 | 9 | 2
(6 rows)

Can window function LAG reference the column which value is being calculated?

I need to calculate value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement query in the form
SELECT <some fields>,
<some expression using LAG(X) OVER(PARTITION BY ... ORDER BY ...) AS X
FROM <table>
This is not possible because only existing columns can be used in window function so I'm looking way how to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I wan't to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow non duplicate and are within some time frame after it (that is their time_stamp is not greater then time_stamp of the previous non duplicate plus some constant TIMEFRAME) are duplicates
the next event which time_stamp is greater than previous non duplicate by more than TIMEFRAME is not duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than windowing function. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time to try a LATERAL join. So there may be a way to simplify that moe. Really what I wanted to do was a recursive CTE with the recursive part using MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. But it seems the lateral join can be used in the CTE.

Renumbering a column in postgresql based on sorted values in that column

Edit: I am using postgresql v8.3
I have a table that contains a column we can call column A.
Column A is populated, for our purposes, with arbitrary positive integers.
I want to renumber column A from 1 to N based on ordering the records of the table by column A ascending. (SELECT * FROM table ORDER BY A ASC;)
Is there a simple way to accomplish this without the need of building a postgresql function?
Example:
(Before:
A: 3,10,20,100,487,1,6)
(After:
A: 2,4,5,6,7,1,3)
Use the rank() (or dense_rank() ) WINDOW-functions (available since PG-8.4):
create table aaa
( id serial not null primary key
, num integer not null
, rnk integer not null default 0
);
insert into aaa(num) values( 3) , (10) , (20) , (100) , (487) , (1) , (6)
;
UPDATE aaa
SET rnk = w.rnk
FROM (
SELECT id
, rank() OVER (order by num ASC) AS rnk
FROM aaa
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Results:
CREATE TABLE
INSERT 0 7
UPDATE 7
id | num | rnk
----+-----+-----
1 | 3 | 2
2 | 10 | 4
3 | 20 | 5
4 | 100 | 6
5 | 487 | 7
6 | 1 | 1
7 | 6 | 3
(7 rows)
IF window functions are not available, you could still count the number of rows before any row:
UPDATE aaa
SET rnk = w.rnk
FROM ( SELECT a0.id AS id
, COUNT(*) AS rnk
FROM aaa a0
JOIN aaa a1 ON a1.num <= a0.num
GROUP BY a0.id
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Or the same with a scalar subquery:
UPDATE aaa a0
SET rnk =
( SELECT COUNT(*)
FROM aaa a1
WHERE a1.num <= a0.num
)
;