Unnest a column - PostgreSQL

I have some SQL with hard-coded values that I am trying to replace with values from a database column:
with pre as (
with a(k, v) as (select id, my_column from mytable),
col(s, n) as (select * from unnest(array['Title', 'First', 'Middle', 'Last']) with ordinality c (s, n))
I'm trying to replace the unnest SQL with something like this:
select unnest(string_to_array(my_column, ':')) as elements from mytable
my_column contents vary in length, but an example could be title=aaa:first=bbb:middle=ccc:last=ddd
Thanks

I base my answer on your previous post. Here is an example of how you can join unnest(string_to_array(my_column, ':')) with named columns on ordinality:
t=# with a as (select id,k,v from my_table, unnest(string_to_array(my_column, ':')) with ordinality as t(k,v))
, col(s, n) as (select * from unnest(array['Title', 'First', 'Middle', 'Last']) with ordinality c (s, n))
select * from a join col on n=v;
id | k | v | s | n
----+------------+---+--------+---
1 | title=aaa | 1 | Title | 1
1 | first=bbb | 2 | First | 2
1 | middle=ccc | 3 | Middle | 3
1 | last=ddd | 4 | Last | 4
(4 rows)
Of course you will have to join differently (based on your previous post). But if the unclear part was how to select from a table instead of from values, the example above should help.
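If what you ultimately need is the key/value pairs themselves rather than a join on position, a minimal sketch using split_part (assuming every element has the key=value shape shown above, and following the my_table/my_column names from the query):
select id
     , split_part(kv, '=', 1) as key
     , split_part(kv, '=', 2) as value
from my_table
   , unnest(string_to_array(my_column, ':')) as kv;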
Update:
Put the values into a table:
t=# create table keys(t text);
CREATE TABLE
Time: 91.908 ms
t=# insert into keys select unnest(array['Title', 'First', 'Middle', 'Last']);
INSERT 0 4
Time: 11.552 ms
t=# select * from keys ;
t
--------
Title
First
Middle
Last
(4 rows)
and now join against keys table:
t=# with a as (select id,k,v from my_table, unnest(string_to_array(my_column, ':')) with ordinality as t(k,v))
select * from a join keys on split_part(k,'=',1) = lower(t);
id | k | v | t
----+------------+---+--------
1 | first=bbb | 2 | First
1 | last=ddd | 4 | Last
1 | middle=ccc | 3 | Middle
1 | title=aaa | 1 | Title
(4 rows)

Related

Get a different LIMIT for each group in PostgreSQL using rank

To get 2 rows from each group I can use ROW_NUMBER() with the condition <= 2 in the outer query, but my question is: what if I want different limits for each group, e.g. 3 rows for section_id 1, 1 row for section_id 2 and 1 row for section_id 3?
Given the following table:
db=# SELECT * FROM xxx;
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
3 | 1 | C
4 | 1 | D
5 | 2 | E
6 | 2 | F
7 | 3 | G
8 | 2 | H
(8 rows)
Currently I get the first 2 rows (ordered by name) for each section_id, i.e. a result similar to:
id | section_id | name
----+------------+------
1 | 1 | A
2 | 1 | B
5 | 2 | E
6 | 2 | F
7 | 3 | G
(5 rows)
Current Query:
SELECT *
FROM (
  SELECT ROW_NUMBER() OVER (PARTITION BY section_id ORDER BY name) AS r,
         t.*
  FROM xxx t
) x
WHERE x.r <= 2;
Create a table to contain the section limits, then join to it. The big advantage is that when new sections are required or limits change, maintenance is reduced to a single table update, at very little cost. See the example below.
select s.section_id, s.name
from (select section_id, name
           , row_number() over (partition by section_id order by name) rn
      from xxx
     ) s
left join section_limits sl on (sl.section_id = s.section_id)
where s.rn <= coalesce(sl.limit_to, 2);
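The section_limits table referenced above is not shown; one possible sketch (limit_to holds the per-section row count, with the default of 2 applied by the coalesce):
create table section_limits
( section_id integer primary key
, limit_to   integer not null
);
insert into section_limits(section_id, limit_to) values (1, 3), (2, 1), (3, 1);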
Just fix up your where clause:
with numbered as (
  select row_number() over (partition by section_id
                            order by name) as r,
         t.*
  from xxx t
)
select *
from numbered
where (section_id = 1 and r <= 3)
   or (section_id = 2 and r <= 1)
   or (section_id = 3 and r <= 1);

How to merge rows in a table and update the junction table in Postgres

Consider 2 tables (table A and table B) with a many-to-many relationship, each containing a primary key and other attributes. To map this relation there's a third junction table (table C) containing the foreign keys for each side of the relation (fk_tableA | fk_tableB).
Table B contains rows that are duplicates (except for the pk), so I want to merge these into a single record with whatever unique primary key, like so:
table B                    table B (after merging duplicates)
1 | Henry | 100.0          1 | Henry | 100.0
2 | Jessi | 97.0           2 | Jessi | 97.0
3 | Henry | 100.0          4 | Erica | 11.2
4 | Erica | 11.2
By merging these records, there may be foreign keys of table C (joint table) pointing to primary keys of table B that no longer exist. My goal is to edit them to point to the merged record:
Before merging:
tableA         table B                 table C
id | att1      id | att1  | att2       fk_A | fk_b
-----------    -------------------     ------------
1  | ab123     1  | Henry | 100.0      1    | 1
2  | adawd     2  | Jessi | 97.0       2    | 3
3  | da3wf     3  | Henry | 100.0
               4  | Erica | 11.2
On table C, 2 records from table B are referenced (1 and 3) which happen to be duplicated rows. My goal is to merge those into a single record (in table B) and update the foreign key in table C:
After merging:
tableA         table B                 table C
id | att1      id | att1  | att2       fk_A | fk_b
-----------    -------------------     ------------
1  | ab123     1  | Henry | 100.0      1    | 1
2  | adawd     2  | Jessi | 97.0       2    | 1
3  | da3wf     4  | Erica | 11.2
- Note that id=3 was merged/deleted from table B and the corresponding
  foreign key in table C was updated to point to the merged record's id.
So my question is basically: how do I update a junction table upon merging records of a table? I am currently using Postgres and working with millions of rows.
-- \i tmp.sql
CREATE TABLE persons
( id integer primary key
, name text
, weight decimal(4,1)
);
INSERT INTO persons(id,name,weight)VALUES
(1 ,'Henry', 100.0)
,(2 ,'Jessi', 97.0)
,(3 ,'Henry', 100.0)
,(4 ,'Erica', 11.2)
;
CREATE TABLE junctiontab
( fk_A integer NOT NULL
, p_id integer REFERENCES persons(id)
, PRIMARY KEY (fk_A,p_id)
);
INSERT INTO junctiontab(fk_A, p_id)VALUES (1 , 1 ),(2 , 3 );
-- find the ids of affected persons.
-- [for simplicity: put them in a temp table]
CREATE TEMP TABLE xlat AS
SELECT * FROM (
    SELECT id AS wrong_id
         , min(id) OVER (PARTITION BY name ORDER BY id) AS good_id
    FROM persons p
    ) x
WHERE good_id <> wrong_id
;
--show it
SELECT * FROM xlat;
UPDATE junctiontab j
SET p_id = x.good_id
FROM xlat x
WHERE j.p_id = x.wrong_id
-- The good junction-entry *could* already exist...
AND NOT EXISTS (
SELECT * FROM junctiontab nx
WHERE nx.fk_A= j.fk_A
AND nx.p_id= x.good_id
)
;
DELETE FROM junctiontab d
-- if the good junction-entry already existed, we can delete the wrong one now.
WHERE EXISTS (
SELECT * FROM junctiontab g
JOIN xlat x ON g.p_id= x.good_id
AND d.p_id = x.wrong_id
WHERE g.fk_A= d.fk_A
)
;
--show it
SELECT * FROM junctiontab
;
-- Delete the wrong person-records
DELETE FROM persons p
WHERE EXISTS (
SELECT * FROM xlat x
WHERE p.id = x.wrong_id
);
--show it
SELECT * FROM persons p;
Result:
CREATE TABLE
INSERT 0 4
CREATE TABLE
INSERT 0 2
SELECT 1
wrong_id | good_id
----------+---------
3 | 1
(1 row)
UPDATE 1
DELETE 0
fk_a | p_id
------+------
1 | 1
2 | 1
(2 rows)
DELETE 1
id | name | weight
----+-------+--------
1 | Henry | 100.0
2 | Jessi | 97.0
4 | Erica | 11.2
(3 rows)
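One caveat: xlat partitions by name alone, so two distinct people who happen to share a name would also be merged. If a duplicate should mean that all non-key attributes match, a hedged variant:
CREATE TEMP TABLE xlat2 AS
SELECT * FROM (
    SELECT id AS wrong_id
         , min(id) OVER (PARTITION BY name, weight) AS good_id
    FROM persons p
    ) x
WHERE good_id <> wrong_id
;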

How to calculate the nearest neighbor distance for 10000 points in a table

I am using PostgreSQL with the PostGIS extension.
I am able to compare one point with this query:
SELECT st_distance(geom, 'SRID=4326;POINT(12.601828337172 50.5173393068512)'::geometry) as d
FROM pointst1
ORDER BY d
but I want to compare not to one fixed point but to a column of points. And I want to do this with some sort of indexing so that it is computationally cheap and not 10000x10000 like a cross join within that table.
Create table:
create table pointst1
(
id integer not null
constraint pointst1_id_pk
primary key,
geom geometry(Point, 4326)
);
create unique index pointst1_id_uindex
on pointst1 (id);
create index geomidx
on pointst1 (geom);
Edit:
Refined query (it compares each of the 10000 points with its nearest neighbor, but returns the point itself, at distance 0, rather than the next nearest point):
select points.*,
       p1.id as p1_id,
       ST_Distance(geography(p1.geom), geography(points.geom)) as distance
from (select distinct on (p2.geom) *
      from pointst1 p2
      where p2.id is not null) as points
cross join lateral
     (select id, geom
      from pointst1
      order by points.geom <-> geom
      limit 1) as p1;
Your query is already calculating the distance from the given geometry to all records in the table pointst1.
Considering these values ...
INSERT INTO pointst1 VALUES (1,'SRID=4326;POINT(16.19 48.21)'),
(2,'SRID=4326;POINT(18.96 47.50)'),
(3,'SRID=4326;POINT(13.47 52.52)'),
(4,'SRID=4326;POINT(-3.70 40.39)');
... if you run your query, it will already calculate the distance from all points in the table:
SELECT ST_Distance(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry) as d
FROM pointst1
ORDER BY d
d
------------------
2.1827914536208
4.26600662563949
7.03781262396208
19.1914274750473
(4 rows)
Change your index to GIST, which is the most suitable for geometry data:
create index geomidx on pointst1 using GIST (geom);
Just note that an index won't speed up this query of yours, since you're doing a full scan. But as soon as you start playing more in the where clause, you might see some improvement.
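For example (a sketch with an arbitrary 1-degree search radius), a predicate like ST_DWithin can use the GiST index to prune candidates:
SELECT id
FROM pointst1
WHERE ST_DWithin(geom, 'SRID=4326;POINT(12.6018 50.5173)'::geometry, 1.0);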
EDIT:
WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
SELECT id, j.id2, ST_Distance(geom, j.geom2) AS d
FROM pointst1, j
WHERE id <> j.id2
ORDER BY id, id2;
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
1 | 3 | 5.0965184194703
1 | 4 | 21.3720495039666
2 | 1 | 2.85954541841881
2 | 3 | 7.43911957156222
2 | 4 | 23.7492673571207
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
3 | 4 | 21.0225069865609
4 | 1 | 21.3720495039666
4 | 2 | 23.7492673571207
4 | 3 | 21.0225069865609
(12 rows)
Removing duplicate distances:
SELECT DISTINCT ON (d) * FROM (
  WITH j AS (SELECT id AS id2, geom AS geom2 FROM pointst1)
  SELECT id, j.id2, ST_Distance(geom, j.geom2) AS d
  FROM pointst1, j
  WHERE id <> j.id2
  ORDER BY id, id2) AS j;
id | id2 | d
----+-----+------------------
1 | 2 | 2.85954541841881
3 | 1 | 5.0965184194703
3 | 2 | 7.43911957156222
4 | 3 | 21.0225069865609
4 | 1 | 21.3720495039666
2 | 4 | 23.7492673571207
(6 rows)
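As for the refined query in the question: the zero-distance self-match can be avoided by excluding the current row's id inside the lateral subquery. A sketch (the <-> ordering lets the GiST index drive the nearest-neighbor search):
select p.id, n.id as nearest_id,
       ST_Distance(geography(p.geom), geography(n.geom)) as distance
from pointst1 p
cross join lateral
     (select id, geom
      from pointst1
      where id <> p.id
      order by geom <-> p.geom
      limit 1) as n;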

Hierarchy trees in database and web app

I want to create a web app which will use tree data structures. Users will be able to create, update and delete trees. I have the following table, called nodes, in a PostgreSQL database:
id INTEGER PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE,
parent_id INTEGER NULL REFERENCES nodes(id)
Getting data
I want to get data in the following form:
id | name | children
---|------|--------------
1 | a | [2,3]
2 | b | []
3 | c | [4]
4 | d | []
I created a query which returns data in the form
id | name | parent_id
---|------|--------------
1 | a |
2 | b | 1
3 | c | 1
4 | d | 3
And here is code:
WITH RECURSIVE nodes_cte(id, name, parent_id, level) AS (
SELECT nodes.id, nodes.name, nodes.parent_id, 0 AS level
FROM nodes
WHERE name = 'a'
UNION ALL
SELECT nodes.id, nodes.name, nodes.parent_id, level+1
FROM nodes
JOIN nodes_cte
ON nodes_cte.id = nodes.parent_id
)
SELECT * FROM nodes_cte;
Can I change the SQL to get what I want, or should I do that in the app?
Inserting data
I want to know what the ways are to insert data into the table. I think the following approach will work for me:
create a sequence in the database
increment the sequence by the number of elements in the tree
manually compute the ids in the app and insert the elements into the table
Are there better ways?
CREATE TABLE nodes
( id INTEGER PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes(id)
);
-- I created query which returns data in form
INSERT INTO nodes(id,name,parent_id)VALUES
( 1 , 'a' , NULL)
,( 2 , 'b' , 1)
,( 3 , 'c' , 1)
,( 4 , 'd' , 3)
;
SELECT p.id, p.name
, array_agg(c.id) AS children
FROM nodes p
LEFT JOIN nodes c ON c.parent_id = p.id
GROUP BY p.id, p.name
;
Result:
id | name | children
----+------+----------
1 | a | {2,3}
2 | b | {NULL}
3 | c | {4}
4 | d | {NULL}
(4 rows)
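If the question's [] is wanted instead of {NULL} for leaf nodes, a hedged variant using the aggregate FILTER clause (available since PostgreSQL 9.4):
SELECT p.id, p.name
     , COALESCE(array_agg(c.id) FILTER (WHERE c.id IS NOT NULL), '{}') AS children
FROM nodes p
LEFT JOIN nodes c ON c.parent_id = p.id
GROUP BY p.id, p.name
;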
Extra: using generate_series() to insert a bunch of records, each having id/3 as its parent (except when that is zero).
INSERT INTO nodes(id,name,parent_id)
SELECT gs, 'zzz_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 5,25) gs
;
INSERTING/UPDATING DATA
Normally, your front-end should not mess with sequences, but leave that to the DBMS. You already have a UNIQUE constraint on name, because it is a natural key. So your front-end should use that key to address rows in the nodes table, like in:
CREATE TABLE nodes2
( id SERIAL NOT NULL PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes2(id)
);
INSERT INTO nodes2(name,parent_id)
SELECT 'Omg_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 1,15) gs
;
PREPARE upd (text, text) AS
-- child, parent
UPDATE nodes2 c
SET parent_id = p.id
FROM nodes2 p
WHERE p.name = $2 -- parent
AND c.name = $1 -- child
;
EXECUTE upd( 'Omg_12', 'Omg_11');
EXECUTE upd( 'Omg_15', 'Omg_11');
Result:
CREATE TABLE
INSERT 0 15
PREPARE
UPDATE 1
UPDATE 1
id | name | children
----+--------+-----------
1 | Omg_1 | {3,4,5}
2 | Omg_2 | {6,7,8}
3 | Omg_3 | {9,10,11}
4 | Omg_4 | {13,14}
5 | Omg_5 | {NULL}
6 | Omg_6 | {NULL}
7 | Omg_7 | {NULL}
8 | Omg_8 | {NULL}
9 | Omg_9 | {NULL}
10 | Omg_10 | {NULL}
11 | Omg_11 | {15,12}
12 | Omg_12 | {NULL}
13 | Omg_13 | {NULL}
14 | Omg_14 | {NULL}
15 | Omg_15 | {NULL}
(15 rows)
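The children listing above was presumably produced by re-running the aggregation query against nodes2, along these lines:
SELECT p.id, p.name
     , array_agg(c.id) AS children
FROM nodes2 p
LEFT JOIN nodes2 c ON c.parent_id = p.id
GROUP BY p.id, p.name
ORDER BY p.id
;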

Can window function LAG reference the column whose value is being calculated?

I need to calculate the value of some column X based on some other columns of the current record and the value of X for the previous record (using some partition and order). Basically I need to implement a query of the form
SELECT <some fields>,
<some expression using LAG(X) OVER (PARTITION BY ... ORDER BY ...)> AS X
FROM <table>
This is not possible because only existing columns can be used in a window function, so I'm looking for a way to overcome this.
Here is an example. I have a table with events. Each event has type and time_stamp.
create table event (id serial, type integer, time_stamp integer);
I want to find "duplicate" events (to skip them). By duplicate I mean the following. Let's order all events for a given type by time_stamp ascending. Then
the first event is not a duplicate
all events that follow a non-duplicate and are within some time frame after it (that is, their time_stamp is not greater than the time_stamp of the previous non-duplicate plus some constant TIMEFRAME) are duplicates
the next event whose time_stamp is greater than that of the previous non-duplicate by more than TIMEFRAME is not a duplicate
and so on
For this data
insert into event (type, time_stamp)
values
(1, 1), (1, 2), (2, 2), (1,3), (1, 10), (2,10),
(1,15), (1, 21), (2,13),
(1, 40);
and TIMEFRAME=10, the result should be
time_stamp | type | duplicate
-----------------------------
1 | 1 | false
2 | 1 | true
3 | 1 | true
10 | 1 | true
15 | 1 | false
21 | 1 | true
40 | 1 | false
2 | 2 | false
10 | 2 | true
13 | 2 | false
I could calculate the value of the duplicate field based on the current time_stamp and the time_stamp of the previous non-duplicate event like this:
WITH evt AS (
  SELECT
    time_stamp,
    CASE WHEN
      time_stamp - LAG(current_non_dupl_time_stamp) OVER w >= TIMEFRAME
    THEN
      time_stamp
    ELSE
      LAG(current_non_dupl_time_stamp) OVER w
    END AS current_non_dupl_time_stamp
  FROM event
  WINDOW w AS (PARTITION BY type ORDER BY time_stamp ASC)
)
SELECT time_stamp, time_stamp != current_non_dupl_time_stamp AS duplicate
FROM evt;
But this does not work because the field which is calculated cannot be referenced in LAG:
ERROR: column "current_non_dupl_time_stamp" does not exist.
So the question: can I rewrite this query to achieve the effect I need?
Naive recursive chain knitter:
-- temp view to avoid nested CTE
CREATE TEMP VIEW drag AS
SELECT e.type,e.time_stamp
, ROW_NUMBER() OVER www as rn -- number the records
, FIRST_VALUE(e.time_stamp) OVER www as fst -- the "group leader"
, EXISTS (SELECT * FROM event x
WHERE x.type = e.type
AND x.time_stamp < e.time_stamp) AS is_dup
FROM event e
WINDOW www AS (PARTITION BY type ORDER BY time_stamp)
;
WITH RECURSIVE ttt AS (
SELECT d0.*
FROM drag d0 WHERE d0.is_dup = False -- only the "group leaders"
UNION ALL
SELECT d1.type, d1.time_stamp, d1.rn
, CASE WHEN d1.time_stamp - ttt.fst > 10 THEN d1.time_stamp
       ELSE ttt.fst END AS fst -- new "group leader"
, CASE WHEN d1.time_stamp - ttt.fst > 10 THEN False
       ELSE True END AS is_dup
FROM drag d1
JOIN ttt ON d1.type = ttt.type AND d1.rn = ttt.rn+1
)
SELECT * FROM ttt
ORDER BY type, time_stamp
;
Results:
CREATE TABLE
INSERT 0 10
CREATE VIEW
type | time_stamp | rn | fst | is_dup
------+------------+----+-----+--------
1 | 1 | 1 | 1 | f
1 | 2 | 2 | 1 | t
1 | 3 | 3 | 1 | t
1 | 10 | 4 | 1 | t
1 | 15 | 5 | 15 | f
1 | 21 | 6 | 15 | t
1 | 40 | 7 | 40 | f
2 | 2 | 1 | 2 | f
2 | 10 | 2 | 2 | t
2 | 13 | 3 | 13 | f
(10 rows)
An alternative to a recursive approach is a custom aggregate. Once you master the technique of writing your own aggregates, creating transition and final functions is easy and logical.
State transition function:
create or replace function is_duplicate(st int[], time_stamp int, timeframe int)
returns int[] language plpgsql as $$
begin
if st is null or st[1] + timeframe <= time_stamp
then
st[1] := time_stamp;
end if;
st[2] := time_stamp;
return st;
end $$;
Final function:
create or replace function is_duplicate_final(st int[])
returns boolean language sql as $$
select st[1] <> st[2];
$$;
Aggregate:
create aggregate is_duplicate_agg(time_stamp int, timeframe int)
(
sfunc = is_duplicate,
stype = int[],
finalfunc = is_duplicate_final
);
Query:
select *, is_duplicate_agg(time_stamp, 10) over w
from event
window w as (partition by type order by time_stamp asc)
order by type, time_stamp;
id | type | time_stamp | is_duplicate_agg
----+------+------------+------------------
1 | 1 | 1 | f
2 | 1 | 2 | t
4 | 1 | 3 | t
5 | 1 | 10 | t
7 | 1 | 15 | f
8 | 1 | 21 | t
10 | 1 | 40 | f
3 | 2 | 2 | f
6 | 2 | 10 | t
9 | 2 | 13 | f
(10 rows)
Read in the documentation: 37.10. User-defined Aggregates and CREATE AGGREGATE.
This feels more like a recursive problem than a window-function one. The following query obtained the desired results:
WITH RECURSIVE base(type, time_stamp, next_time_stamp) AS (
-- 3. base of recursive query
SELECT x.type, x.time_stamp, y.next_time_stamp
FROM
-- 1. start with the initial records of each type
( SELECT type, min(time_stamp) AS time_stamp
FROM event
GROUP BY type
) x
LEFT JOIN LATERAL
-- 2. for each of the initial records, find the next TIMEFRAME (10) in the future
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = x.type
AND time_stamp > (x.time_stamp + 10)
) y ON true
UNION ALL
-- 4. recursive join, same logic as base
SELECT e.type, e.time_stamp, z.next_time_stamp
FROM event e
JOIN base b ON (e.type = b.type AND e.time_stamp = b.next_time_stamp)
LEFT JOIN LATERAL
( SELECT MIN(time_stamp) next_time_stamp
FROM event
WHERE type = e.type
AND time_stamp > (e.time_stamp + 10)
) z ON true
)
-- The actual query:
-- 5a. All records from base are not duplicates
SELECT time_stamp, type, false
FROM base
UNION
-- 5b. All records from event that are not in base are duplicates
SELECT time_stamp, type, true
FROM event
WHERE (type, time_stamp) NOT IN (SELECT type, time_stamp FROM base)
ORDER BY type, time_stamp
There are a lot of caveats with this. It assumes no duplicate time_stamp for a given type. Really the joins should be based on a unique id rather than type and time_stamp. I didn't test this much, but it may at least suggest an approach.
This is my first time trying a LATERAL join, so there may be a way to simplify it more. Really what I wanted to do was a recursive CTE whose recursive part uses MIN(time_stamp) based on time_stamp > (x.time_stamp + 10), but aggregate functions are not allowed in the recursive part of a CTE in that manner. It seems, though, that a lateral join can be used in the CTE.