Efficiently copying a tree modelled with adjacency list in postgres - postgresql

I have the following table:
CREATE TABLE tree_node (
id serial primary key,
name character varying(255),
parent_id integer references tree_node (id)
);
The table contains many trees with up to about 1000 nodes.
(I'm able to query a tree and its descendants efficiently with a recursive query).
However, I need to be able to copy a single tree in one operation. Say I have a tree with 3 nodes, ids 1,2,3 (this is potentially a large tree). I would like to make a copy of it i.e. creating new nodes with new ids. (Here the copied tree is ids 4,5,6):
id | name | parent_id
----+-----------------+-----------
1 | NodeA |
2 | NodeA.1 | 1
3 | NodeA.1.1 | 2
4 | NodeA(copy) |
5 | NodeA.1(copy) | 4
6 | NodeA.1.1(copy) | 5
Is there a way to copy a tree and its descendants more efficiently than inserting each tree node separately (because the new parent_id is needed)?

There you go:
\i tmp.sql
CREATE TABLE tree_node (
id serial primary key
, name varchar
, parent_id integer references tree_node (id)
);
INSERT INTO tree_node(name, parent_id) VALUES
( 'Node-A', NULL)
, ( 'Node-A.1', 1)
, ( 'Node-A.1.1', 2)
;
SELECT * FROM tree_node;
-- Find the top value of the sequence
-- and use it as an increment on all the copies
WITH top(val) AS
(select currval('tree_node_id_seq')
)
INSERT INTO tree_node(id, name, parent_id)
SELECT id+top.val
, name|| '(copy)'
, parent_id + top.val
FROM tree_node
CROSS JOIN top
;
SELECT * FROM tree_node;
-- bump the sequence
WITH nxt AS (
select max(id) mx from tree_node
)
SELECT setval('tree_node_id_seq', (select mx FROM nxt) )
;
Output:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 3
id | name | parent_id
----+------------+-----------
1 | Node-A |
2 | Node-A.1 | 1
3 | Node-A.1.1 | 2
(3 rows)
INSERT 0 3
id | name | parent_id
----+------------------+-----------
1 | Node-A |
2 | Node-A.1 | 1
3 | Node-A.1.1 | 2
4 | Node-A(copy) |
5 | Node-A.1(copy) | 4
6 | Node-A.1.1(copy) | 5
(6 rows)
setval
--------
6
(1 row)
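Note that the INSERT above shifts every row in the table, so it copies all trees at once. A minimal sketch of the same offset idea restricted to one tree follows. It uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL; the helper name copy_tree, the second tree (NodeB) and the use of max(id) as the offset are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

def copy_tree(conn, root_id):
    """Copy the tree rooted at root_id and return the id of the new root."""
    # Offset = current highest id, so the shifted ids cannot collide with existing rows.
    (offset,) = conn.execute("SELECT max(id) FROM tree_node").fetchone()
    conn.execute(
        """
        WITH RECURSIVE subtree(id, name, parent_id) AS (
            SELECT id, name, parent_id FROM tree_node WHERE id = :root
            UNION ALL
            SELECT c.id, c.name, c.parent_id
            FROM tree_node c
            JOIN subtree s ON c.parent_id = s.id
        )
        INSERT INTO tree_node(id, name, parent_id)
        SELECT id + :off,
               name || '(copy)',
               -- the copied root keeps its original parent (NULL for a top-level tree)
               CASE WHEN id = :root THEN parent_id ELSE parent_id + :off END
        FROM subtree
        """,
        {"root": root_id, "off": offset},
    )
    return root_id + offset

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree_node (id INTEGER PRIMARY KEY, name TEXT, "
    "parent_id INTEGER REFERENCES tree_node(id))"
)
conn.executemany(
    "INSERT INTO tree_node VALUES (?, ?, ?)",
    [
        (1, "NodeA", None), (2, "NodeA.1", 1), (3, "NodeA.1.1", 2),
        (4, "NodeB", None), (5, "NodeB.1", 4),  # hypothetical second tree
    ],
)
new_root = copy_tree(conn, 1)
for row in conn.execute("SELECT * FROM tree_node WHERE id >= ? ORDER BY id", (new_root,)):
    print(row)
```

In PostgreSQL the sequence would still have to be bumped with setval afterwards, as in the answer above. Taking the offset from max(id) is only safe if no other session inserts into the table at the same time.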

Related

How to merge rows on a table and update junction table on postgres

Consider 2 tables (table A and table B) with a many-to-many relationship, each containing a primary key and other attributes. To map this relation there is a third table, a junction table (table C), containing a foreign key to each table of the relation (fk_tableA | fk_tableB).
Table B contains duplicate rows (except for the pk), so I want to merge these together into a single record with whatever unique primary key, just like so:
table B                  table B (after merging duplicates)
1 | Henry | 100.0        1 | Henry | 100.0
2 | Jessi | 97.0         2 | Jessi | 97.0
3 | Henry | 100.0        4 | Erica | 11.2
4 | Erica | 11.2
After merging these records, table C (the junction table) may contain foreign keys pointing to primary keys of table B that no longer exist. My goal is to edit them to point to the merged record:
Before merging:
tableA          table B                 table C
id | att1       id | att1  | att2       fk_A | fk_b
-----------     -------------------     ------------
1  | ab123      1  | Henry | 100.0      1    | 1
2  | adawd      2  | Jessi | 97.0       2    | 3
3  | da3wf      3  | Henry | 100.0
                4  | Erica | 11.2
On table C, 2 records from table B are referenced (1 and 3) which happen to be duplicated rows. My goal is to merge those into a single record (in table B) and update the foreign key in table C:
After merging:
tableA          table B                 table C
id | att1       id | att1  | att2       fk_A | fk_b
-----------     -------------------     ------------
1  | ab123      1  | Henry | 100.0      1    | 1
2  | adawd      2  | Jessi | 97.0       2    | 1
3  | da3wf      4  | Erica | 11.2
- Note that id=3 was merged/deleted from table B and the same id
was updated on table C to point to the merged record's id.
So my question is basically: how do I update a junction table when merging records of a table? I am currently using Postgres and working with millions of rows.
-- \i tmp.sql
CREATE TABLE persons
( id integer primary key
, name text
, weight decimal(4,1)
);
INSERT INTO persons(id,name,weight)VALUES
(1 ,'Henry', 100.0)
,(2 ,'Jessi', 97.0)
,(3 ,'Henry', 100.0)
,(4 ,'Erica', 11.)
;
CREATE TABLE junctiontab
( fk_A integer NOT NULL
, p_id integer REFERENCES persons(id)
, PRIMARY KEY (fk_A,p_id)
);
INSERT INTO junctiontab(fk_A, p_id)VALUES (1 , 1 ),(2 , 3 );
-- find the ids of affected persons.
-- [for simplicity: put them in a temp table]
CREATE TEMP table xlat AS
SELECT * FROM(
SELECT id AS wrong_id
,min(id) OVER (PARTITION BY name ORDER BY id) AS good_id
FROM persons p
) x
WHERE good_id <> wrong_id
;
--show it
SELECT *FROM xlat;
UPDATE junctiontab j
SET p_id = x.good_id
FROM xlat x
WHERE j.p_id = x.wrong_id
-- The good junction-entry *could* already exist...
AND NOT EXISTS (
SELECT *FROM junctiontab nx
WHERE nx.fk_A= j.fk_A
AND nx.p_id= x.good_id
)
;
DELETE FROM junctiontab d
-- if the good junction-entry already existed, we can delete the wrong one now.
WHERE EXISTS (
SELECT *FROM junctiontab g
JOIN xlat x ON g.p_id= x.good_id
AND d.p_id = x.wrong_id
WHERE g.fk_A= d.fk_A
)
;
--show it
SELECT *FROM junctiontab
;
-- Delete the wrong person records
DELETE FROM persons p
WHERE EXISTS (
SELECT *FROM xlat x
WHERE p.id = x.wrong_id
);
--show it
SELECT * FROM persons p;
Result:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 4
CREATE TABLE
INSERT 0 2
SELECT 1
wrong_id | good_id
----------+---------
3 | 1
(1 row)
UPDATE 1
DELETE 0
fk_a | p_id
------+------
1 | 1
2 | 1
(2 rows)
DELETE 1
id | name | weight
----+-------+--------
1 | Henry | 100.0
2 | Jessi | 97.0
4 | Erica | 11.0
(3 rows)
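The same merge can be sketched without a window function. The version below is a variation on the answer, not the answer's exact method: it groups on (name, weight) to find the lowest id, inserts the re-pointed junction rows while skipping pairs that already exist, and then deletes everything that still refers to a duplicate. It runs on Python's standard-library sqlite3 as a stand-in for PostgreSQL; the helper name merge_duplicate_persons and the extra junction row (1, 3) are assumptions added to exercise the "already exists" case.

```python
import sqlite3

def merge_duplicate_persons(conn):
    """Collapse persons with identical (name, weight) onto the lowest id."""
    conn.executescript(
        """
        -- wrong_id -> good_id for every duplicate (good_id = lowest id of its group)
        CREATE TEMP TABLE xlat AS
        SELECT p.id AS wrong_id, g.good_id
        FROM persons p
        JOIN (SELECT name, weight, min(id) AS good_id
              FROM persons
              GROUP BY name, weight) g
          ON g.name = p.name AND g.weight = p.weight
        WHERE p.id <> g.good_id;

        -- Add the re-pointed junction rows; pairs that already exist are skipped.
        -- (OR IGNORE is SQLite syntax; PostgreSQL spells this ON CONFLICT DO NOTHING.)
        INSERT OR IGNORE INTO junctiontab(fk_a, p_id)
        SELECT j.fk_a, x.good_id
        FROM junctiontab j
        JOIN xlat x ON x.wrong_id = j.p_id;

        -- Remove junction rows that still point at a duplicate, then the duplicates.
        DELETE FROM junctiontab WHERE p_id IN (SELECT wrong_id FROM xlat);
        DELETE FROM persons WHERE id IN (SELECT wrong_id FROM xlat);
        DROP TABLE xlat;
        """
    )

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, weight REAL);
    CREATE TABLE junctiontab (
        fk_a INTEGER NOT NULL,
        p_id INTEGER REFERENCES persons(id),
        PRIMARY KEY (fk_a, p_id)
    );
    INSERT INTO persons VALUES (1,'Henry',100.0),(2,'Jessi',97.0),(3,'Henry',100.0),(4,'Erica',11.2);
    -- (1, 3) is a hypothetical extra row: after merging it would duplicate (1, 1)
    INSERT INTO junctiontab VALUES (1, 1), (2, 3), (1, 3);
    """
)
merge_duplicate_persons(conn)
print(conn.execute("SELECT * FROM junctiontab ORDER BY fk_a, p_id").fetchall())
print(conn.execute("SELECT id FROM persons ORDER BY id").fetchall())
```

Inserting first and deleting afterwards avoids the primary-key collision that a plain UPDATE of p_id would hit when the target pair is already present.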

Hierarchy trees in database and web app

I want to create a web app that uses tree data structures. Users will be able to create, update and delete trees. I have the following table, called nodes, in a PostgreSQL database:
id INTEGER PRIMARY KEY,
name VARCHAR(50) NOT NULL UNIQUE,
parent_id INTEGER NULL REFERENCES nodes(id)
Getting data
I want to get data in the following form:
id | name | children
---|------|--------------
1 | a | [2,3]
2 | b | []
3 | c | [4]
4 | d | []
I created a query which returns data in this form:
id | name | parent_id
---|------|--------------
1 | a |
2 | b | 1
3 | c | 1
4 | d | 3
And here is code:
WITH RECURSIVE nodes_cte(id, name, parent_id, level) AS (
SELECT nodes.id, nodes.name, nodes.parent_id, 0 AS level
FROM nodes
WHERE name = 'a'
UNION ALL
SELECT nodes.id, nodes.name, nodes.parent_id, level+1
FROM nodes
JOIN nodes_cte
ON nodes_cte.id = nodes.parent_id
)
SELECT * FROM nodes_cte;
Can I change the SQL to get what I want, or should I do that in the app?
Inserting data
I want to know what are the ways to insert data into the table. I think that following approach will work for me:
create sequence in database
increase sequence for number of elements in tree
manually compute ids in app and insert elements in the table
Are there better ways?
CREATE TABLE nodes
( id INTEGER PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes(id)
);
-- sample data from the question
INSERT INTO nodes(id,name,parent_id)VALUES
( 1 , 'a' , NULL)
,( 2 , 'b' , 1)
,( 3 , 'c' , 1)
,( 4 , 'd' , 3)
;
SELECT p.id, p.name
, array_agg(c.id) AS children
FROM nodes p
LEFT JOIN nodes c ON c.parent_id = p.id
GROUP BY p.id, p.name
;
Result:
id | name | children
----+------+----------
1 | a | {2,3}
2 | b | {NULL}
3 | c | {4}
4 | d | {NULL}
(4 rows)
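The result above shows {NULL} for leaf nodes, while the question asked for an empty list. Since the question also asks whether this could be done in the app, here is a minimal Python sketch that builds the child lists from plain (id, name, parent_id) rows. The function name with_children is hypothetical.

```python
def with_children(rows):
    """rows: iterable of (id, name, parent_id); returns [(id, name, [child ids])]."""
    rows = list(rows)
    # Every node starts with an empty child list, so leaves end up as [].
    children = {node_id: [] for node_id, _name, _parent in rows}
    for node_id, _name, parent_id in rows:
        # Skip roots (parent_id is None) and parents that are not part of the row set.
        if parent_id in children:
            children[parent_id].append(node_id)
    return [(node_id, name, children[node_id]) for node_id, name, _parent in rows]

rows = [(1, "a", None), (2, "b", 1), (3, "c", 1), (4, "d", 3)]
for row in with_children(rows):
    print(row)
```

On the SQL side, wrapping the aggregate as array_remove(array_agg(c.id), NULL) is one way to get an empty array instead of {NULL} in PostgreSQL versions that provide array_remove.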
Extra: using generate_series() to insert a bunch of records. Each record having id/3 as parent, (except when zero).
INSERT INTO nodes(id,name,parent_id)
SELECT gs, 'zzz_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 5,25) gs
;
INSERTING/UPDATING DATA
Normally, your front-end should not mess with sequences, but leave that to the DBMS. You already have a UNIQUE constraint on name, because it is a natural key. So your front-end should use that key to address rows in the nodes table, as in:
CREATE TABLE nodes2
( id SERIAL NOT NULL PRIMARY KEY
, name VARCHAR(50) NOT NULL UNIQUE
, parent_id INTEGER NULL REFERENCES nodes2(id)
);
INSERT INTO nodes2(name,parent_id)
SELECT 'Omg_'|| gs::text, NULLIF(gs/3 , 0)
FROM generate_series ( 1,15) gs
;
PREPARE upd (text, text) AS
-- child, parent
UPDATE nodes2 c
SET parent_id = p.id
FROM nodes2 p
WHERE p.name = $2 -- parent
AND c.name = $1 -- child
;
EXECUTE upd( 'Omg_12', 'Omg_11');
EXECUTE upd( 'Omg_15', 'Omg_11');
Result:
CREATE TABLE
INSERT 0 15
PREPARE
UPDATE 1
UPDATE 1
id | name | children
----+--------+-----------
1 | Omg_1 | {3,4,5}
2 | Omg_2 | {6,7,8}
3 | Omg_3 | {9,10,11}
4 | Omg_4 | {13,14}
5 | Omg_5 | {NULL}
6 | Omg_6 | {NULL}
7 | Omg_7 | {NULL}
8 | Omg_8 | {NULL}
9 | Omg_9 | {NULL}
10 | Omg_10 | {NULL}
11 | Omg_11 | {15,12}
12 | Omg_12 | {NULL}
13 | Omg_13 | {NULL}
14 | Omg_14 | {NULL}
15 | Omg_15 | {NULL}
(15 rows)

Remove duplicate with lower Date from SQL result

I have following table:
CREATE TABLE Kundendaten (
beschreiben_knr INTEGER REFERENCES Kunde(knr) DEFERRABLE INITIALLY DEFERRED,
erstelldatum DATE,
anschrift VARCHAR(40),
sonderrabat INTEGER,
PRIMARY KEY (erstelldatum, beschreiben_knr)
);
If I run this query:
select * from Kundendaten ORDER BY erstelldatum DESC;
I get:
beschreiben_knr | erstelldatum | anschrift | sonderrabat
-----------------+--------------+---------------+-------------
1 | 2015-11-01 | Winkelgasse 5 | 0
2 | 2015-11-01 | Badeteich 7 | 10
3 | 2015-11-01 | Senfgasse 7 | 15
1 | 2015-10-30 | Sonnenweg 3 | 5
But I need to get only the entry with the highest date if there is more than one. In this case the last row should not appear.
How can I achieve this in PostgreSQL?
You want something like WHERE erstelldatum = MAX(erstelldatum), but that doesn't work because aggregate functions are not allowed in a WHERE clause. You can use a sub-query to get the newest date.
SELECT *
FROM Kundendaten
WHERE erstelldatum = (
SELECT MAX(erstelldatum) FROM Kundendaten
);
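Postgres evaluates that uncorrelated subquery only once, but you'll want to make sure erstelldatum is indexed. Here the primary key (erstelldatum, beschreiben_knr) already provides an index that leads with that column.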
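The query above keeps only rows that carry the newest date in the whole table. If what is needed is the newest row per customer (an assumption about the asker's intent), a correlated subquery on beschreiben_knr is one option. The sketch below runs on Python's standard-library sqlite3 as a stand-in for PostgreSQL; customer 4 is a hypothetical extra row added to show where the two queries differ.

```python
import sqlite3

# Newest row for each customer: compare against that customer's own maximum date.
LATEST_PER_CUSTOMER = """
SELECT k.*
FROM Kundendaten k
WHERE k.erstelldatum = (
    SELECT max(k2.erstelldatum)
    FROM Kundendaten k2
    WHERE k2.beschreiben_knr = k.beschreiben_knr
)
ORDER BY k.beschreiben_knr
"""

# The query from the answer: rows that carry the single newest date in the table.
LATEST_OVERALL = """
SELECT *
FROM Kundendaten
WHERE erstelldatum = (SELECT max(erstelldatum) FROM Kundendaten)
ORDER BY beschreiben_knr
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Kundendaten (beschreiben_knr INTEGER, erstelldatum DATE, "
    "anschrift VARCHAR(40), sonderrabat INTEGER, "
    "PRIMARY KEY (erstelldatum, beschreiben_knr))"
)
conn.executemany(
    "INSERT INTO Kundendaten VALUES (?, ?, ?, ?)",
    [
        (1, "2015-11-01", "Winkelgasse 5", 0),
        (2, "2015-11-01", "Badeteich 7", 10),
        (3, "2015-11-01", "Senfgasse 7", 15),
        (1, "2015-10-30", "Sonnenweg 3", 5),
        (4, "2015-10-15", "Beispielweg 1", 0),  # hypothetical: only an older entry exists
    ],
)
per_customer = conn.execute(LATEST_PER_CUSTOMER).fetchall()
overall = conn.execute(LATEST_OVERALL).fetchall()
print([r[0] for r in per_customer])
print([r[0] for r in overall])
```

With the hypothetical customer 4, the per-customer query still returns that customer's only row, whereas the global-maximum query drops it. In PostgreSQL, SELECT DISTINCT ON (beschreiben_knr) ... ORDER BY beschreiben_knr, erstelldatum DESC is another common way to express the per-customer variant.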

way to calculate sum of two corresponded columns in postgresql

I have two tables called "incoming" and "orders", and I want to create a view called "stock" that is derived from the data in incoming and orders.
CREATE TABLE incoming
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
CONSTRAINT pk PRIMARY KEY (id),
CONSTRAINT "incoming_model_size_color_key" UNIQUE (model, size, color)
);
CREATE TABLE orders
(
id serial NOT NULL,
model integer,
size integer,
color integer,
price real,
quanity integer,
Comenttext text,
CONSTRAINT pk_orders PRIMARY KEY (id)
);
For now i have this dirty solution:
CREATE OR REPLACE VIEW stock AS
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quanity) AS sum
FROM orders
GROUP BY orders.color, orders.size, orders.model
)
SELECT incoming.model,
incoming.size,
incoming.color,
incoming.quanity - (( SELECT
CASE count(*)
WHEN 1 THEN ( SELECT total_orders_1.sum
FROM total_orders total_orders_1
WHERE incoming.model = total_orders_1.model AND incoming.size = total_orders_1.size)
ELSE 0::bigint
END AS "case"
FROM total_orders
WHERE incoming.model = total_orders.model AND incoming.size=total_orders.size)) AS quanity
FROM incoming;
How can I write this more clearly and simply?
Examples:
select * from incoming
id | model | size | color | price | quanity
----+-------+------+-------+-------+--------
1 | 1 | 6 | 5 | 550 | 15
2 | 1 | 5 | 5 | 800 | 20
select * from orders
id | model | size | color | price | quanity |
----+-------+------+-------+-------+---------+
1 | 1 | 6 | 5 | 1000 | 1 |
2 | 1 | 6 | 5 | 1000 | 2 | -- sum is 3
select * from stock
model | size | color | quanity
-------+------+-------+----------
1 | 6 | 5 | 12 --= 15 - 3 !! excellent
1 | 5 | 5 | 20 -- has no orders yet
You just need to left join on the aggregated orders:
-- note: the column is spelled "quanity" in the question's tables
select i.model, i.size, i.color, i.quanity,
o.qty as ordered,
i.quanity - coalesce(o.qty, 0) as quantity_on_stock
from incoming i
left join (
select model, size, color, sum(quanity) as qty
from orders
group by model, size, color
) o on (o.model, o.size, o.color) = (i.model, i.size, i.color);
SQLFiddle: http://sqlfiddle.com/#!15/7fbec/2
When using your CTE as base, then you wind up with this:
WITH total_orders AS (
SELECT orders.model,
orders.size,
orders.color,
sum(orders.quanity) AS sum
FROM orders
GROUP BY color, size, model
)
SELECT i.model,
i.size,
i.color,
i.quanity - coalesce(tot.sum, 0) AS quanity
FROM incoming i
LEFT JOIN total_orders tot on (tot.model, tot.size, tot.color) = (i.model, i.size, i.color);
Whether the CTE or the derived table (the first solution) is faster is something you need to test.
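To check the left-join approach against the sample data from the question, here is a small runnable sketch. It uses Python's standard-library sqlite3 as a stand-in for PostgreSQL, keeps the column names exactly as spelled in the question's tables (including quanity), and writes the join condition column by column instead of as a row comparison; the constant name STOCK_QUERY is made up for the example.

```python
import sqlite3

# Stock = incoming quantity minus the summed orders for the same (model, size, color).
STOCK_QUERY = """
SELECT i.model, i.size, i.color,
       i.quanity - coalesce(o.qty, 0) AS quanity
FROM incoming i
LEFT JOIN (
    SELECT model, size, color, sum(quanity) AS qty
    FROM orders
    GROUP BY model, size, color
) o ON o.model = i.model AND o.size = i.size AND o.color = i.color
ORDER BY i.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE incoming (id INTEGER PRIMARY KEY, model INTEGER, size INTEGER,
                           color INTEGER, price REAL, quanity INTEGER,
                           UNIQUE (model, size, color));
    CREATE TABLE orders (id INTEGER PRIMARY KEY, model INTEGER, size INTEGER,
                         color INTEGER, price REAL, quanity INTEGER);
    INSERT INTO incoming VALUES (1, 1, 6, 5, 550, 15), (2, 1, 5, 5, 800, 20);
    INSERT INTO orders VALUES (1, 1, 6, 5, 1000, 1), (2, 1, 6, 5, 1000, 2);
    """
)
stock = conn.execute(STOCK_QUERY).fetchall()
for row in stock:
    print(row)
```

The coalesce is what keeps items without any orders at their full incoming quantity, because the left join yields NULL for them.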

Restart primary key numbers of existing rows after deleting most of a big table

I am working with a PostgreSQL 8.4.13 database.
Recently I had around 86.5 million records in a table. I deleted almost all of them; only 5000 records are left now. I ran:
vacuum full
after deleting the rows, and that returned disk space to the OS (thanks to a suggestion from a fellow SO member).
But I see that my id numbers are still stuck in the millions. For example:
id | date_time | event_id | secs_since_1970 | value
---------+-------------------------+----------+-----------------+-----------
61216287 | 2013/03/18 16:42:42:041 | 6 | 1363646562.04 | 46.4082
61216289 | 2013/03/18 16:42:43:041 | 6 | 1363646563.04 | 55.4496
61216290 | 2013/03/18 16:42:44:041 | 6 | 1363646564.04 | 40.0553
61216291 | 2013/03/18 16:42:45:041 | 6 | 1363646565.04 | 38.5694
In an attempt to start the id value at 1 again for the remaining rows, I tried:
cluster mytable_pkey on mytable;
where mytable is the name of my table. But that did not help.
So, my question(s) is/are:
Is there a way to get the index (id value) to start at 1 again?
If I add or update the table with a new record, will it start from 1 or pick up the next highest integer value (say 61216292 in above example)?
My table description is as follows: There is no FK constraint and no sequence in it.
jbossql=> \d mytable;
Table "public.mytable"
Column | Type | Modifiers
-----------------+------------------------+-----------
id | bigint | not null
date_time | character varying(255) |
event_id | bigint |
secs_since_1970 | double precision |
value | real |
Indexes:
"mydata_pkey" PRIMARY KEY, btree (id) CLUSTER
Drop the primary key first and create a temporary sequence.
alter table mytable drop constraint mydata_pkey;
create temporary sequence temp_seq;
Use the sequence to update:
update mytable
set id = nextval('temp_seq');
Recreate the primary key and drop the sequence
alter table mytable add primary key (id);
drop sequence temp_seq;
If there is a foreign key dependency on this table then you will have to deal with it first and the update will be a more complex procedure.
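The UPDATE above hands out new ids in whatever order the rows happen to be visited, so the old ordering is not necessarily preserved. A minimal sketch that renumbers in ascending order of the old id follows. It uses Python's standard-library sqlite3 as a stand-in for PostgreSQL, the helper name renumber_ids is hypothetical, and it assumes the existing ids are distinct positive integers and that the table name comes from trusted code.

```python
import sqlite3

def renumber_ids(conn, table="mytable"):
    """Rewrite ids as 1, 2, 3, ... keeping the existing order; returns the row count."""
    # New id = rank of the old id in ascending order.
    mapping = conn.execute(
        f"SELECT id, (SELECT count(*) FROM {table} t2 WHERE t2.id <= t.id) "
        f"FROM {table} t ORDER BY id"
    ).fetchall()
    # Apply in ascending order. With positive ids, each new id is <= its old id and
    # all smaller old ids have already moved, so no step collides with an existing key.
    conn.executemany(
        f"UPDATE {table} SET id = ? WHERE id = ?",
        [(new, old) for old, new in mapping if new != old],
    )
    return len(mapping)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id BIGINT PRIMARY KEY, value REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(61216287, 46.4082), (61216289, 55.4496), (61216290, 40.0553), (61216291, 38.5694)],
)
renumber_ids(conn)
print(conn.execute("SELECT id, value FROM mytable ORDER BY id").fetchall())
```

Because the keys are moved one at a time in ascending order, the primary key can stay in place during the renumbering in this sketch.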
Is your primary key defined using a serial? If so that creates an implicit sequence. You can use ALTER SEQUENCE (see: http://www.postgresql.org/docs/8.2/static/sql-altersequence.html for syntax) to change the starting number back to 1.
Based on the fact that you have some records left (just noticed the 5000 left), you DO NOT want to reset that number to a number before the last ID of the remaining records because then that sequence will generate non-unique numbers. The point of using a sequence is it gives you a transactional way to increment a number and guarantee successive operations get unique incremented numbers.
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
--
-- Note: the key is declared DEFERRABLE so that uniqueness is checked after the UPDATE below, not row by row (the default is NOT DEFERRABLE)
--
CREATE TABLE funky
( id SERIAL NOT NULL PRIMARY KEY DEFERRABLE INITIALLY DEFERRED
, tekst varchar
);
-- create some data with gaps in it
INSERT INTO funky(id, tekst)
SELECT gs, 'Number:' || gs::text
FROM generate_series(1,100,10) gs
;
-- set the sequence to the max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
-- compress the keyspace, making the ids consecutive
UPDATE funky xy
SET id = self.newid
FROM (
SELECT id AS id
, row_number() OVER (ORDER BY id) AS newid
FROM funky
) self
WHERE self.id = xy.id
;
-- set the sequence to the new max occurring id
SELECT setval('funky_id_seq' , mx.mx)
FROM (SELECT max(id) AS mx FROM funky) mx
;
SELECT * FROM funky ;
Result:
CREATE TABLE
INSERT 0 10
setval
--------
91
(1 row)
id | tekst
----+-----------
1 | Number:1
11 | Number:11
21 | Number:21
31 | Number:31
41 | Number:41
51 | Number:51
61 | Number:61
71 | Number:71
81 | Number:81
91 | Number:91
(10 rows)
UPDATE 10
setval
--------
10
(1 row)
id | tekst
----+-----------
1 | Number:1
2 | Number:11
3 | Number:21
4 | Number:31
5 | Number:41
6 | Number:51
7 | Number:61
8 | Number:71
9 | Number:81
10 | Number:91
(10 rows)
WARNING WARNING WARNING WARNING WARNING WARNING ACHTUNG:
Changing key values is generally a terrible idea. Avoid it at all costs.