PostgreSQL: Select the difference between consecutive values in an integer column

My question is simple. Say I have the following column:
order_in_group
integer
------
1
2
3
5
6
9
I would like the query result to list each pair of current and next values whose difference is greater than 1:
value1  | value2  | difference
integer | integer | integer
--------+---------+-----------
3       | 5       | 2
6       | 9       | 3
Any help will be great.

Try this:
with q(i) as (
select unnest(array[1,2,3,5,6,9])
)
select prev, curr, curr- prev diff
from (
select i curr, lag(i) over (order by i) prev
from q
) s
where curr > prev+ 1;
prev | curr | diff
------+------+------
3 | 5 | 2
6 | 9 | 3
(2 rows)

You should be able to just use LAG to get the previous row to compare with:
WITH cte AS (
SELECT order_in_group value2,
LAG(order_in_group) OVER (ORDER BY order_in_group) value1
FROM mytable
)
SELECT value1, value2, value2-value1 difference
FROM cte
WHERE value2-value1 > 1;
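As a rough illustration of what both LAG-based answers compute, here is a small Python model of the same logic. It is only a sketch: the function name find_gaps is made up for this example, and it works on an in-memory list rather than a table.

```python
def find_gaps(values):
    """Return (previous, current, difference) for each consecutive pair,
    in sorted order, whose difference is greater than 1."""
    ordered = sorted(values)
    # zip pairs every value with its successor, much like LAG pairs
    # every row with its predecessor in the window ordering.
    return [
        (prev, curr, curr - prev)
        for prev, curr in zip(ordered, ordered[1:])
        if curr - prev > 1
    ]


print(find_gaps([1, 2, 3, 5, 6, 9]))  # [(3, 5, 2), (6, 9, 3)]
```

The first value has no predecessor, so it never appears as "current". That matches the SQL, where LAG returns NULL for the first row and the WHERE clause filters it out.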

Related

How to filter a column that contains numeric and string values with a WHERE condition in Postgres

I have a table like the one below.
When I select item_no > '1623G' from the table above,
I want to get the following result:
1623H | 1623I | 1666 | 1674 | 1912 | 1952 | 1953
I am trying the following command:
select * from t where substring(item_no,'([0-9]+)') :: int > 1623G
But it does not return any result.
Please help.
I would go the regexp way:
WITH cte AS (
SELECT
item_no,
regexp_replace(item_no, '\D', '', 'g')::int AS digit,
regexp_replace(item_no, '\d', '', 'g') AS nondigit,
regexp_replace('200a', '\D', '', 'g')::int AS compare_digit,
regexp_replace('200a', '\d', '', 'g') AS compare_nondigit
FROM t
)
SELECT
item_no
FROM
cte
WHERE
(digit > compare_digit) OR (digit = compare_digit AND nondigit > compare_nondigit)
This splits both values (the row value and the comparison value) into their two parts (digits and non-digits) and compares each part separately.
I am curious whether there is a better solution.
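The split-and-compare idea can be modelled outside the database as well. The following Python sketch is an illustration only: split_key and greater_than are hypothetical helper names, an item with no digits is treated as 0 (the SQL cast would raise an error instead), and Python compares the suffix by code point whereas PostgreSQL uses the column's collation.

```python
import re


def split_key(item_no):
    """Split an item number into (numeric part, non-digit part)."""
    digits = re.sub(r"\D", "", item_no)   # drop everything except digits
    suffix = re.sub(r"\d", "", item_no)   # drop the digits, keep the rest
    # Assumption for this sketch: no digits at all counts as 0.
    return (int(digits) if digits else 0, suffix)


def greater_than(items, threshold):
    """Keep the items that sort after threshold."""
    key = split_key(threshold)
    # Tuple comparison checks the numeric part first and only falls
    # back to the suffix when the numbers are equal.
    return [item for item in items if split_key(item) > key]


items = ["2", "20", "200", "200a", "200b", "200c", "2000"]
print(greater_than(items, "200a"))  # ['200b', '200c', '2000']
```

The tuple comparison corresponds to the WHERE clause above: digit > compare_digit, or equal digits and nondigit > compare_nondigit.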
You can use CONVERT_TO as follows:
testdb1=# CREATE TABLE t (item_no varchar(20));
CREATE TABLE
testdb1=# INSERT INTO t VALUES('2'),('20'),('200'),('200a'),('200b'),('200c'),('2000');
INSERT 0 7
testdb1=# SELECT * FROM t;
item_no
---------
2
20
200
200a
200b
200c
2000
(7 rows)
testdb1=# select * from t where substring(convert_to(item_no,'SQL_ASCII')::text,3)::int > substring(convert_to('2a','SQL_ASCII')::text,3)::int;
item_no
---------
200
200a
200b
200c
2000
(5 rows)
testdb1=# select * from t where substring(convert_to(item_no,'SQL_ASCII')::text,3)::int > substring(convert_to('150','SQL_ASCII')::text,3)::int;
item_no
---------
200
200a
200b
200c
2000
(5 rows)

PostgreSQL - How to get the previous(lag) calculated value

I would like to get the previous (lag) calculated value.
id | value
-------|-------
1 | 1
2 | 3
3 | 5
4 | 7
5 | 9
What I am trying to achieve is this:
id | value | new value
-------|-------|-----------
1 | 1 | 10 <-- 1 * lag(new_value)
2 | 3 | 30 <-- 3 * lag(new_value)
3 | 5 | 150 <-- 5 * lag(new_value)
4 | 7 | 1050 <-- 7 * lag(new_value)
5 | 9 | 9450 <-- 9 * lag(new_value)
What I have tried:
SELECT value,
COALESCE(lag(new_value) OVER () * value, 10) AS new_value
FROM table
Error:
ERROR: column "new_value" does not exist
Similar to Juan's answer but I thought I'd post it anyway. It at least avoids the need for the ID column and doesn't have the empty row at the end:
with recursive all_data as (
select value, value * 10 as new_value
from data
where value = 1
union all
select c.value,
c.value * p.new_value
from data c
join all_data p on p.value < c.value
where c.value = (select min(d.value)
from data d
where d.value > p.value)
)
select *
from all_data
order by value;
The idea is to join exactly one row in the recursive part to exactly one "parent" row. While the "exactly one parent" part can be done with a derived table and a lateral join (which surprisingly does allow the limit), the "exactly one row" from the "child" in the recursive part can unfortunately only be done using the sub-select with a min().
The where c.value= (...) wouldn't be necessary if it was possible to use an order by and limit in the recursive part as well, but unfortunately that is not supported in the current Postgres version.
Online example: http://rextester.com/WFBVM53545
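To make the recurrence explicit: each new_value is the current value multiplied by the previously computed new_value, starting from a seed of 10. A minimal Python sketch of that recurrence follows. The function name running_product and the seed parameter are assumptions made for this illustration, and it reads from a list rather than a table.

```python
def running_product(values, seed=10):
    """Return (value, new_value) pairs where
    new_value = value * previous new_value, starting from seed."""
    rows = []
    previous = seed
    for value in sorted(values):
        # The previously *computed* result feeds the next step, which
        # is what a plain lag() over existing columns cannot express.
        previous = value * previous
        rows.append((value, previous))
    return rows


print(running_product([1, 3, 5, 7, 9]))
# [(1, 10), (3, 30), (5, 150), (7, 1050), (9, 9450)]
```

The recursive CTE plays the role of the loop: the non-recursive branch supplies the first row, and each recursive step picks the next larger value and multiplies it by the parent's new_value.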
My bad, this isn't as easy as I thought. I got a very close result, but it still needs some tuning.
WITH RECURSIVE t(n, v) AS (
SELECT MIN(value), 10
FROM Table1
UNION ALL
SELECT (SELECT min(value) from Table1 WHERE value > n),
(SELECT min(value) from Table1 WHERE value > n) * v
FROM t
JOIN Table1 on t.n = Table1.value
)
SELECT n, v
FROM t;

Can the window function LAG reference the column whose value is being calculated?

I need to calculate the value of some column X based on other columns of the current record and the value of X for the previous record (using some partition and order). Basically, I need to implement a query of the form
SELECT <some fields>,
<some expression using LAG(X) OVER (PARTITION BY ... ORDER BY ...)> AS X
FROM <table>
This is not possible because only existing columns can be used in a window function, so I'm looking for a way to work around this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I want to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events of a given type by time_stamp ascending. Then:
the first event is not a duplicate
all events that follow a non-duplicate and are within some time frame after it (that is, their time_stamp is not greater than the time_stamp of the previous non-duplicate plus some constant TIMEFRAME) are duplicates
the next event whose time_stamp is greater than that of the previous non-duplicate by more than TIMEFRAME is not a duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10 result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of duplicate field based on current time_stamp and time_stamp of the previous non-duplicate event like this:
WITH evt AS (
SELECT
time_stamp,
CASE WHEN
time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
THEN
time_stamp
ELSE
LAG(current_non_dupl_time_stamp) OVER w
END AS current_non_dupl_time_stamp
FROM event
WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN d1.time_stamp
ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 20 THEN False
ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
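Results (this query uses a time frame of 20 rather than the 10 from the question, so the rows with time_stamp 15 and 21 for type 1 and 13 for type 2 are still flagged as duplicates):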
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 1 | t
1 | 21 | 6 | 1 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 2 | t
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read more in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
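The state machine behind the custom aggregate may be easier to follow outside SQL. The Python sketch below mirrors the transition and final functions shown above. The names transition, final and flag_duplicates are chosen for this illustration, and the per-type loop stands in for the window's PARTITION BY type ORDER BY time_stamp.

```python
from collections import defaultdict


def transition(state, time_stamp, timeframe):
    """State is [leader, current]; leader is the last non-duplicate."""
    if state is None or state[0] + timeframe <= time_stamp:
        leader = time_stamp          # start a new group
    else:
        leader = state[0]            # still inside the leader's time frame
    return [leader, time_stamp]


def final(state):
    """A row is a duplicate when it is not its own group leader."""
    return state[0] != state[1]


def flag_duplicates(events, timeframe):
    """events: iterable of (type, time_stamp) pairs.
    Returns (type, time_stamp, is_duplicate) ordered by type, time_stamp."""
    by_type = defaultdict(list)
    for type_, time_stamp in events:
        by_type[type_].append(time_stamp)
    result = []
    for type_ in sorted(by_type):
        state = None                 # fresh state for every partition
        for time_stamp in sorted(by_type[type_]):
            state = transition(state, time_stamp, timeframe)
            result.append((type_, time_stamp, final(state)))
    return result


events = [(1, 1), (1, 2), (2, 2), (1, 3), (1, 10), (2, 10),
          (1, 15), (1, 21), (2, 13), (1, 40)]
for row in flag_duplicates(events, 10):
    print(row)
```

With a time frame of 10 this reproduces the is_duplicate_agg column shown above. The key point is that the state carries the last non-duplicate forward from row to row, which is the piece that LAG alone cannot reference.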
This feels more like a recursive problem than a window-function problem. The following query produced the desired results:
WITH RECURSIVE base(type, time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time trying a LATERAL join, so there may be a way to simplify it more. What I really wanted was a recursive CTE whose recursive part uses MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in CTEs in that manner. It seems, however, that the lateral join can be used in the CTE.

Renumbering a column in postgresql based on sorted values in that column

Edit: I am using postgresql v8.3
I have a table that contains a column we can call column A.
Column A is populated, for our purposes, with arbitrary positive integers.
I want to renumber column A from 1 to N based on ordering the records of the table by column A ascending. (SELECT * FROM table ORDER BY A ASC;)
Is there a simple way to accomplish this without the need of building a postgresql function?
Example:
Before: A: 3, 10, 20, 100, 487, 1, 6
After: A: 2, 4, 5, 6, 7, 1, 3
Use the rank() (or dense_rank()) window functions (available since PostgreSQL 8.4):
create table aaa
( id serial not null primary key
, num integer not null
, rnk integer not null default 0
);
insert into aaa(num) values( 3) , (10) , (20) , (100) , (487) , (1) , (6)
;
UPDATE aaa
SET rnk = w.rnk
FROM (
SELECT id
, rank() OVER (order by num ASC) AS rnk
FROM aaa
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Results:
CREATE TABLE
INSERT 0 7
UPDATE 7
id | num | rnk
----+-----+-----
1 | 3 | 2
2 | 10 | 4
3 | 20 | 5
4 | 100 | 6
5 | 487 | 7
6 | 1 | 1
7 | 6 | 3
(7 rows)
If window functions are not available, you can still count the number of rows up to and including each row:
UPDATE aaa
SET rnk = w.rnk
FROM ( SELECT a0.id AS id
, COUNT(*) AS rnk
FROM aaa a0
JOIN aaa a1 ON a1.num <= a0.num
GROUP BY a0.id
) w
WHERE w.id = aaa.id;
SELECT * FROM aaa
ORDER BY id
;
Or the same with a scalar subquery:
UPDATE aaa a0
SET rnk =
( SELECT COUNT(*)
FROM aaa a1
WHERE a1.num <= a0.num
)
;
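The counting approach can be checked with a quick Python model. This is a sketch only: rank_by_count is a name invented for the example, and it works on a list instead of the aaa table.

```python
def rank_by_count(values):
    """Rank of v = number of values less than or equal to v,
    the same count the self-join and the scalar subquery compute."""
    return [sum(1 for other in values if other <= v) for v in values]


print(rank_by_count([3, 10, 20, 100, 487, 1, 6]))  # [2, 4, 5, 6, 7, 1, 3]
```

For distinct values this matches rank(). With ties the two differ: counting rows with num <= the current value gives tied rows the highest position in their group (for example 2, 2 for two equal smallest values), whereas rank() gives them the lowest (1, 1).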

How to compute the sum of multiple columns in PostgreSQL

I would like to know if there's a way to compute the sum of multiple columns in PostgreSQL.
I have a table with more than 80 columns and I have to write a query that adds each value from each column.
I tried with SUM(col1, col2, col3 etc) but it didn't work.
SELECT COALESCE(col1,0) + COALESCE(col2,0)
FROM yourtable
It depends on how you'd like to sum the values. If I read your question correctly, you are looking for the second SELECT from this example:
template1=# SELECT * FROM yourtable ;
a | b
---+---
1 | 2
4 | 5
(2 rows)
template1=# SELECT a + b FROM yourtable ;
?column?
----------
3
9
(2 rows)
template1=# SELECT SUM( a ), SUM( b ) FROM yourtable ;
sum | sum
-----+-----
5 | 7
(1 row)
template1=# SELECT SUM( a + b ) FROM yourtable ;
sum
-----
12
(1 row)
template1=#
Combined the current answers and used this to get total SUM:
SELECT SUM(COALESCE(col1,0) + COALESCE(col2,0)) FROM yourtable;
SELECT(
SELECT SUM(t.f)
FROM (VALUES (yourtable.col1), (yourtable.col2), (yourtable.col3)) t(f)
)
FROM yourtable;
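The answers above differ in which direction they add: across the columns of each row, down each column, or over everything. A small Python sketch of the three readings, using the two-row table from the psql session, may help when choosing among them. The helper names are hypothetical, and coalesce imitates COALESCE(col, 0) by treating None as 0.

```python
def coalesce(value, default=0):
    """Imitate COALESCE(value, default) for a single value."""
    return default if value is None else value


def row_sums(rows):
    """Like SELECT a + b: one sum per row."""
    return [sum(coalesce(v) for v in row) for row in rows]


def column_sums(rows):
    """Like SELECT SUM(a), SUM(b): one sum per column."""
    return [sum(coalesce(v) for v in column) for column in zip(*rows)]


def total_sum(rows):
    """Like SELECT SUM(a + b): a single grand total."""
    return sum(row_sums(rows))


rows = [(1, 2), (4, 5)]
print(row_sums(rows))     # [3, 9]
print(column_sums(rows))  # [5, 7]
print(total_sum(rows))    # 12
```

The COALESCE step matters for the row-wise form: in SQL, a + b is NULL as soon as either operand is NULL, while SUM simply skips NULL inputs.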