PostgreSQL - how can I clone a set of records but maintain a mapping between the original ids and the new ids

I come from a SQL Server background and our team is migrating to Postgres (version 9.5).
We have a number of scripts that perform MERGE statements that essentially 'clone' rows in a table and insert them back into the same table with a new Id while maintaining a map between the cloned records and the records they were cloned from.
I'm having a hard time trying to replicate this behavior. I've tried a number of variations, but I still can't seem to find the right combination of temp tables and CTEs to get it right.
Here's an approximation of the latest version that doesn't work:
CREATE SCHEMA stackoverflow;
CREATE TABLE stackoverflow.clone_problem
(
id bigserial PRIMARY KEY NOT NULL,
some_id bigint NULL,
some_other_id bigint NULL,
modified_time timestamp NOT NULL DEFAULT now(),
modified_by varchar(128) NOT NULL DEFAULT current_user
);
INSERT INTO stackoverflow.clone_problem
(
id,
some_id,
some_other_id
)
VALUES (1,1,1)
,(2,2,2)
,(3,3,3);
;WITH sources
AS
(
SELECT
id as old_id,
some_id,
some_other_id
FROM stackoverflow.clone_problem
WHERE id = ANY('{1,3}')
),
inserts
AS
(
INSERT INTO stackoverflow.clone_problem
(
some_id,
some_other_id
)
SELECT
s.some_id,
s.some_other_id
FROM sources s
RETURNING id as new_id, s.id as old_id -- this doesn't work
)
SELECT * from inserts;
The final SELECT statement is the output I'm trying to capture, either from a RETURNING clause or by other means, so we know which records were cloned and what their new ids are. But the code above throws this error: error: missing FROM-clause entry for table "s".
I don't understand, because 's' is in the FROM clause, so the error seems counterintuitive to me. I'm sure I'm missing something dumb, but I just can't seem to figure out how to get that final piece of information.
Any help would be greatly appreciated.

The error occurs because the RETURNING clause of an INSERT can only reference columns of the rows that were just inserted, not the tables of the statement's SELECT, so s.id is out of scope there. I think your only chance is to generate the ID before you do the insert, so that you have the mapping between old and new ID right away. This can be done by calling nextval() when retrieving the source rows, then providing that already generated ID during the INSERT:
with sources as (
  SELECT id as old_id,
         nextval(pg_get_serial_sequence('stackoverflow.clone_problem', 'id')) as new_id,
         some_id,
         some_other_id
  FROM stackoverflow.clone_problem
  WHERE id IN (1,3)
), inserts as (
  INSERT INTO stackoverflow.clone_problem (id, some_id, some_other_id)
  SELECT s.new_id,
         s.some_id,
         s.some_other_id
  FROM sources s
)
select old_id, new_id
from sources;
By using pg_get_serial_sequence you don't need to know the name of the sequence directly.
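If the old-to-new mapping needs to outlive the statement, the same pattern can also persist it. A minimal sketch, assuming a hypothetical stackoverflow.clone_map table created for this purpose (a data-modifying CTE runs exactly once even when the final statement does not select from it):
CREATE TABLE stackoverflow.clone_map (old_id bigint NOT NULL, new_id bigint NOT NULL);

with sources as (
  SELECT id as old_id,
         nextval(pg_get_serial_sequence('stackoverflow.clone_problem', 'id')) as new_id,
         some_id,
         some_other_id
  FROM stackoverflow.clone_problem
  WHERE id IN (1,3)
), inserts as (
  INSERT INTO stackoverflow.clone_problem (id, some_id, some_other_id)
  SELECT new_id, some_id, some_other_id
  FROM sources
)
-- record the mapping alongside the clones
INSERT INTO stackoverflow.clone_map (old_id, new_id)
SELECT old_id, new_id
FROM sources;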

Related

How to correctly GROUP BY on jdbc sources

I have a Kafka stream with user_id and want to produce another stream with user_id and the number of records in a JDBC table.
The following is how I tried to achieve this (I'm new to Flink, so please correct me if that's not how things are supposed to be done). The issue is that Flink ignores all updates to the JDBC table after the job has started.
As far as I understand, the answer to this is to use lookup joins, but Flink complains that lookup joins are not supported on temporal views. I also tried to use versioned views, without much success.
What would be the correct approach to achieve what I want?
CREATE TABLE kafka_stream (
user_id STRING,
event_time TIMESTAMP(3) METADATA FROM 'timestamp',
WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND
) WITH (
'connector' = 'kafka',
-- ...
)
-- NEXT SQL --
CREATE TABLE jdbc_table (
user_id STRING,
checked_at TIMESTAMP,
PRIMARY KEY(user_id) NOT ENFORCED
) WITH (
'connector' = 'jdbc',
-- ...
)
-- NEXT SQL --
CREATE TEMPORARY VIEW checks_counts AS
SELECT user_id, count(*) as num_checks
FROM jdbc_table
GROUP BY user_id
-- NEXT SQL --
INSERT INTO output_kafka_stream
SELECT
kafka_stream.user_id,
checks_counts.num_checks
FROM kafka_stream
LEFT JOIN checks_counts ON kafka_stream.user_id = checks_counts.user_id
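For reference, the usual shape of a lookup join in Flink SQL is sketched below; this illustrates the syntax the error message alludes to, not a verified fix for the job above. It assumes kafka_stream gains a processing-time attribute (proc_time AS PROCTIME()) and that the join targets the JDBC base table directly, since lookup joins cannot target an aggregated view; the per-user count would then have to be maintained on the database side:
CREATE TABLE kafka_stream (
user_id STRING,
proc_time AS PROCTIME()  -- processing-time attribute required for lookup joins
) WITH (
'connector' = 'kafka',
-- ...
)
-- NEXT SQL --
SELECT k.user_id, j.checked_at
FROM kafka_stream AS k
JOIN jdbc_table FOR SYSTEM_TIME AS OF k.proc_time AS j
ON k.user_id = j.user_id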

Serial not auto-incrementing when insert table into another table

When I insert rows from one table (postes) into another table (station), the auto-incrementing serial "id_hta" is lost.
INSERT INTO station (code_hta, geom_hta)
SELECT code, geom FROM postes;
So I tried:
INSERT INTO station (id_hta, code_hta, geom_hta)
VALUES(DEFAULT, (SELECT code_gto, geom FROM postes));
But I receive an error: The query must return a single column.
Any help is welcome.
You can't use the VALUES clause when the source is a SELECT statement, and you can't use the DEFAULT keyword inside a SELECT. So the solution is to simply not specify the column that is auto-generated:
INSERT INTO station (code_hta, geom_hta)
SELECT code_gto, geom
FROM postes
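If you do want to list the generated column explicitly, one alternative is to call nextval() yourself; a sketch, using pg_get_serial_sequence to look up the sequence attached to station.id_hta:
INSERT INTO station (id_hta, code_hta, geom_hta)
SELECT nextval(pg_get_serial_sequence('station', 'id_hta')), code_gto, geom
FROM postes;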

using 1 WITH statement with multiple (n) INSERTs

I wanted to know if it is possible to use one WITH statement to add multiple rows to another table with a SELECT, or do any equivalent thing.
I have 2 tables.
One has the item data:
create table bg_item(
item_id text primary key default 'a'||nextval('bg_item_seq'),
sellerid bigint not null references bg_user(userid),
item_type char(1) not null default 'N', --NORMAL (public)
item_upload_date date not null default current_date,
item_name varchar(30) not null,
item_desc text not null
-- ... definition truncated in the question; item_specs and item_category also exist (see the INSERTs below)
);
The other has the image links:
create table item_images(
img_id bigint primary key default nextval('bg_item_image_seq'),
item_id text not null references bg_item (item_id),
image_link text not null
);
The user can add an item to sell and upload images of it; there can be 3 or more images. When the item's images, description, and everything else are completed in the app, my request goes to the backend, and I want a query that adds the user's item, gives me back the item's id (which comes from a sequence in PostgreSQL), and uses that id to reference the images I am inserting.
Currently I am doing this (for 1 image):
WITH ins1 AS (
INSERT INTO bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
VALUES (1005, 'k',default,'asdf','asdf','asd','asd')
RETURNING item_id
)
INSERT INTO item_images (item_id, image_link)
select item_id,'asdfg.asd.asdf.com' from ins1
(for 3 images)
WITH ins1 AS (
INSERT INTO bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
VALUES (1005, 'k',default,'asdf','asdf','asd','asd')
RETURNING item_id
)
INSERT INTO item_images (item_id, image_link)
select item_id,'asdfg.asd.asdf.com' from ins1
union all
select item_id,'asdfg.asdaws3f.com' from ins1
union all
select item_id,'asdfg.gooolefnsd.sfsjf.com' from ins1;
This would work for 3 images.
So my question is: how do I do it with n images (as the user can upload from 1 to n images)?
Can I write a for loop?
A procedure or function?
References:
With and Insert
Sql multiple insert select
I didn't understand Edit 3 in the second reference (if it is related to my answer).
One solution I can think of is to write a procedure to return the item_id and one more procedure to run the multiple inserts, but I want a more efficient solution.
If you are going to work with SQL then there is a concept you need to expel from your thoughts: LOOP. As soon as you think it, it is time to rethink. It does not exist in SQL and is not typically needed; SQL works in sets of qualifying things, not individual things.
Now to your issue: it can be done in 1 statement. You pass your image list as an array of text in the WITH clause, then unnest that array and join it to your existing CTE during the INSERT/SELECT:
with images (ilist) as
(
select array['image1','image2','image3','image4','image5']
)
, item (item_id) as
(
insert into bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
values (1005, 'k',default,'asdf','asdf','asd','asd')
returning item_id
)
insert into item_images (item_id, image_link)
select item_id,unnest (ilist)
from images
join item on true;
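In practice the image list would usually arrive from the application as a bind parameter rather than a hard-coded array. A sketch of the same statement, assuming the driver binds $1 as text[]:
with images (ilist) as
(
select $1::text[]  -- e.g. bound to {image1,image2,image3}
)
, item (item_id) as
(
insert into bg_item(sellerid,item_type,item_date,item_name,item_desc,item_specs,item_category)
values (1005, 'k',default,'asdf','asdf','asd','asd')
returning item_id
)
insert into item_images (item_id, image_link)
select item_id, unnest (ilist)
from images
join item on true;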

Postgres: insert rows into a table with multiple records from other joined tables

I am trying to insert multiple records obtained from a join into another table, user_to_property. In the user_to_property table, user_to_property_id is the primary key (not null) and is not auto-incrementing, so I am trying to add user_to_property_id manually with an increment of 1.
WITH selectedData AS
( -- selection of the data that needs to be inserted
SELECT t2.user_id as userId
FROM property_lines t1
INNER JOIN user t2 ON t1.account_id = t2.account_id
)
INSERT INTO user_to_property (user_to_property_id, user_id, property_id, created_date)
VALUES ((SELECT MAX( user_to_property_id )+1 FROM user_to_property),(SELECT
selectedData.userId
FROM selectedData),3,now());
The above query gives me the below error:
ERROR: more than one row returned by a subquery used as an expression
How can I insert multiple records into a table from a join of other tables, given that the user_to_property table should contain only one record for the same user_id and property_id?
Typically for INSERT you use either VALUES or SELECT. The structure VALUES( SELECT ... ) often (generally?) just causes more trouble than it's worth, and it is never necessary: you can always select a constant or an expression. In this case, convert to just SELECT. For generating your ID, get the max value from your table and then add the row_number of each row you are inserting:
insert into user_to_property(user_to_property_id
, user_id
, property_id
, created_date
)
with start_with(current_max_id) as
( select max(user_to_property_id) from user_to_property )
select current_max_id + id_incr, user_id, 3, now()
from (
select t2.user_id, row_number() over() id_incr
from property_lines t1
join users t2 on t1.account_id = t2.account_id
) js
join start_with on true;
A couple of notes:
DO NOT use user as a table name, or as any other object name. It is a documented reserved word, both in Postgres and in the SQL standard (and has been since Postgres v7.1 and the SQL-92 standard, at least).
You really should create another column, or change the type of the column user_to_property_id to auto-generated. Using max()+1, or anything based on that idea, is a virtual guarantee that you will generate duplicate keys, much to the amusement of users and developers alike. Consider what happens under MVCC when 2 users run the query concurrently.
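On the second note, a minimal sketch of one way to make the column auto-generated after the fact (the sequence name is illustrative, and this assumes no competing default already exists):
CREATE SEQUENCE user_to_property_id_seq
OWNED BY user_to_property.user_to_property_id;
-- start the sequence after the current maximum
SELECT setval('user_to_property_id_seq',
coalesce(max(user_to_property_id), 1))
FROM user_to_property;
ALTER TABLE user_to_property
ALTER COLUMN user_to_property_id
SET DEFAULT nextval('user_to_property_id_seq');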

Compact or renumber IDs for all tables, and reset sequences to max(id)?

After running for a long time, I get more and more holes in the id field. Some tables' id columns are int32, and their id sequences are reaching the maximum value. Some of the Java sources are read-only, so I cannot simply change the id column type from int32 to long, which would break the API.
I'd like to renumber them all. That may not be good practice, but good or bad is not the concern of this question. I want to renumber, especially, those very long IDs like "61789238" and "548273826529524324". I don't know why they are so long, but shorter IDs are also easier to handle manually.
But it's not easy to compact IDs by hand because of references and constraints.
Does PostgreSQL itself support of ID renumbering? Or is there any plugin or maintaining utility for this job?
Maybe I can write some stored procedures? That would be very nice so I can schedule it once a year.
The question is old, but we got a new question from a desperate user on dba.SE after trying to apply what is suggested here. Find an answer with more details and explanation over there:
Compacting a sequence in PostgreSQL
The currently accepted answer will fail for most cases.
Typically, you have a PRIMARY KEY or UNIQUE constraint on an id column, which is NOT DEFERRABLE by default. (The OP mentions references and constraints.) Such constraints are checked after each row, so you would most likely get unique violation errors when trying this. Details:
Constraint defined DEFERRABLE INITIALLY IMMEDIATE is still DEFERRED?
Typically, one wants to retain the original order of rows while closing gaps. But the order in which rows are updated is arbitrary, leading to arbitrary numbers. The demonstrated example seems to retain the original sequence because physical storage still coincides with the desired order (the rows were inserted in that order just a moment earlier), which is almost never the case in real-world applications and is completely unreliable.
The matter is more complicated than it might seem at first. One solution (among others) if you can afford to remove the PK / UNIQUE constraint (and related FK constraints) temporarily:
BEGIN;
LOCK tbl;
-- remove all FK constraints to the column
ALTER TABLE tbl DROP CONSTRAINT tbl_pkey; -- remove PK
-- for the simple case without FK references - or see below:
UPDATE tbl t -- intermediate unique violations are ignored now
SET id = t1.new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
WHERE t.id = t1.id;
-- Update referencing value in FK columns at the same time (if any)
SELECT setval('tbl_id_seq', max(id)) FROM tbl; -- reset sequence
ALTER TABLE tbl ADD CONSTRAINT tbl_pkey PRIMARY KEY(id); -- add PK back
-- add all FK constraints to the column back
COMMIT;
This is also much faster for big tables, because checking PK (and FK) constraint(s) for every row costs a lot more than removing the constraint(s) and adding it (them) back.
If there are FK columns in other tables referencing tbl.id, use data-modifying CTEs to update all of them.
Example for a table fk_tbl and a FK column fk_id:
WITH u1 AS (
   UPDATE tbl t
   SET    id = t1.new_id
   FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
   WHERE  t.id = t1.id
   RETURNING t1.id, t1.new_id  -- return old and new ID (t.id would yield the post-update value)
)
UPDATE fk_tbl f
SET    fk_id = u1.new_id  -- set to new ID
FROM   u1
WHERE  f.fk_id = u1.id;   -- match on old ID
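With more than one referencing table, further data-modifying CTEs can be chained onto the same statement; the inner UPDATE still runs only once. A sketch with a second hypothetical table fk_tbl2 carrying the same fk_id column:
WITH u1 AS (
   UPDATE tbl t
   SET    id = t1.new_id
   FROM  (SELECT id, row_number() OVER (ORDER BY id) AS new_id FROM tbl) t1
   WHERE  t.id = t1.id
   RETURNING t1.id, t1.new_id
), u2 AS (
   UPDATE fk_tbl f
   SET    fk_id = u1.new_id
   FROM   u1
   WHERE  f.fk_id = u1.id
)
UPDATE fk_tbl2 f
SET    fk_id = u1.new_id
FROM   u1
WHERE  f.fk_id = u1.id;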
More in the referenced answer on dba.SE.
Assuming your ids are generated from a bigint sequence, just RESTART the sequence and update the table with idcolumn = DEFAULT.
CAVEAT: If this id column is used as a foreign key by other tables, make sure those foreign keys are declared with ON UPDATE CASCADE.
For example:
Create the table, put some data in, and remove a middle value:
db=# create sequence xseq;
CREATE SEQUENCE
db=# create table foo ( id bigint default nextval('xseq') not null, data text );
CREATE TABLE
db=# insert into foo (data) values ('hello'), ('world'), ('how'), ('are'), ('you');
INSERT 0 5
db=# delete from foo where data = 'how';
DELETE 1
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
4 | are
5 | you
(4 rows)
Reset your sequence:
db=# ALTER SEQUENCE xseq RESTART;
ALTER SEQUENCE
Update your data:
db=# update foo set id = DEFAULT;
UPDATE 4
db=# select * from foo;
id | data
----+-------
1 | hello
2 | world
3 | are
4 | you
(4 rows)
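For the caveat above, the cascade has to be in place on the referencing side before the renumbering; a sketch with a hypothetical bar table referencing foo:
db=# create table bar ( foo_id bigint references foo (id) on update cascade, note text );
CREATE TABLE
With that in place, the update foo set id = DEFAULT above propagates the new ids into bar.foo_id automatically.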
An alternative approach: create a new id column and new Foreign Key(s) while the old ones are still in use. With some (quick) renaming, applications do not have to be aware. (But applications should be inactive during the final renaming step.)
\i tmp.sql
-- the test tables
CREATE TABLE one (
id serial NOT NULL PRIMARY KEY
, payload text
);
CREATE TABLE two (
id serial NOT NULL PRIMARY KEY
, the_fk INTEGER REFERENCES one(id)
ON UPDATE CASCADE ON DELETE CASCADE
);
-- And the supporting index for the FK ...
CREATE INDEX ON two(the_fk);
-- populate
INSERT INTO one(payload)
SELECT x::text FROM generate_series(1,1000) x;
INSERT INTO two(the_fk)
SELECT id FROM one WHERE random() < 0.3;
-- make some gaps
DELETE FROM one WHERE id % 13 > 0;
-- SELECT * FROM two;
-- Add new keycolumns to one and two
ALTER TABLE one
ADD COLUMN new_id SERIAL NOT NULL UNIQUE
;
-- UPDATE:
-- This could need DEFERRABLE
-- Note: since the update is only a permutation of the
-- existing values, we don't need to reset the sequence.
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
ALTER TABLE two
ADD COLUMN new_fk INTEGER REFERENCES one(new_id)
;
-- update the new FK
UPDATE two t
SET new_fk = o.new_id
FROM one o
WHERE t.the_fk = o.id
;
SELECT * FROM two;
-- The crucial part: the final renaming
-- (at this point it would be better not to allow other sessions
-- to mess with the {one,two} tables ...)
-- --------------------------------------------------------------
ALTER TABLE one DROP COLUMN id CASCADE;
ALTER TABLE one rename COLUMN new_id TO id;
ALTER TABLE one ADD PRIMARY KEY(id);
ALTER TABLE two DROP COLUMN the_fk CASCADE;
ALTER TABLE two rename COLUMN new_fk TO the_fk;
CREATE INDEX ON two(the_fk);
-- Some checks.
-- (the automatically generated names for the indexes
-- and the sequence still contain the "new" names.)
SELECT * FROM two;
\d one
\d two
UPDATE: added the permutation of new_id (after creating it as a serial)
Funny thing is: it doesn't seem to need 'DEFERRABLE'.
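If the permutation UPDATE ever did raise a transient unique violation, a deferrable constraint would be the escape hatch. A sketch, assuming new_id had been added without the inline UNIQUE so the constraint can be declared deferrable (the constraint name is illustrative):
ALTER TABLE one
ADD CONSTRAINT one_new_id_key UNIQUE (new_id)
DEFERRABLE INITIALLY IMMEDIATE;
BEGIN;
SET CONSTRAINTS one_new_id_key DEFERRED;
UPDATE one SET new_id = self.new_id
FROM ( SELECT id, row_number() OVER(ORDER BY id) AS new_id FROM one ) self
WHERE one.id = self.id;
COMMIT; -- the UNIQUE constraint is checked here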
This script will work for PostgreSQL. It is a generic solution that works for all cases.
This query finds the description of the fields of all tables of any database:
WITH description_bd AS (
  select colum.schemaname, coalesce(table_name, relname) as table_name, column_name,
         ordinal_position, column_default, data_type, is_nullable,
         character_maximum_length, is_updatable, description
  from (
    SELECT columns.table_schema as schemaname, columns.table_name, columns.column_name,
           columns.ordinal_position, columns.column_default, columns.data_type,
           columns.is_nullable, columns.character_maximum_length,
           columns.character_octet_length, columns.is_updatable, columns.udt_name
    FROM information_schema.columns
  ) colum
  full join (
    SELECT schemaname, relid, relname, objoid, objsubid, description
    FROM pg_statio_all_tables, pg_description
    WHERE pg_statio_all_tables.relid = pg_description.objoid
  ) descre
    on descre.relname = colum.table_name
   and descre.objsubid = colum.ordinal_position
   and descre.schemaname = colum.schemaname
)
This query proposes a fix for the sequences of all database tables (it generates, in the req field, a statement that fixes the sequence of each table). It takes the maximum value of the id column and increments it by one:
SELECT table_name, column_name, ordinal_position, column_default,
       data_type, is_nullable, character_maximum_length, is_updatable, description,
       'SELECT setval('''||schemaname||'.'|| replace(replace(column_default,'''::regclass)',''),'nextval(''','')||''', (select max( '||column_name ||')+1 from '|| table_name ||' ), true);' as req
FROM description_bd
WHERE column_default like '%nextva%'
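For a table foo with a serial column id attached to public.foo_id_seq, the generated req value would look like this (illustrative):
SELECT setval('public.foo_id_seq', (select max( id )+1 from foo ), true);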
Since I didn't like the answers, I wrote a function in PL/pgSQL to do the job.
It is called like this:
=> SELECT resequence('port','id','port_id_seq');
resequence
--------------
5090 -> 3919
Takes 3 parameters
name of table
name of column that is SERIAL
name of sequence that the SERIAL uses
The function returns a short report of what it has done, with the previous value of the sequence and the new value.
The function LOOPs over the table ORDERed by the named column and makes an UPDATE for each row. Then it sets the new value for the sequence. That's it.
The order of the values is preserved.
No ADDing and DROPing of temporary columns or tables involved.
No DROPing and ADDing of constraints and foreign keys needed.
Of course, you had better have ON UPDATE CASCADE on those foreign keys.
The code :
CREATE OR REPLACE FUNCTION resequence(_tbl TEXT, _clm TEXT, _seq TEXT) RETURNS TEXT AS $FUNC$
DECLARE
    _old BIGINT;
    _new BIGINT := 0;
BEGIN
    -- walk the rows in order of the old values and renumber from 1
    FOR _old IN EXECUTE 'SELECT '||_clm||' FROM '||_tbl||' ORDER BY '||_clm LOOP
        _new = _new + 1;
        EXECUTE 'UPDATE '||_tbl||' SET '||_clm||'='||_new||' WHERE '||_clm||'='||_old;
    END LOOP;
    -- report "<old sequence value> -> <new sequence value>" and reset the sequence
    RETURN (nextval(_seq::regclass)-1)||' -> '||setval(_seq::regclass,_new);
END $FUNC$ LANGUAGE plpgsql;
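A typical invocation, wrapped in a transaction so the report can be inspected before making it permanent (names as in the example above):
BEGIN;
SELECT resequence('port','id','port_id_seq');
-- verify, then:
COMMIT; -- or ROLLBACK;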