How to reflect updates in a loop? - PostgreSQL

I have a migration to run. At each iteration I can update several rows, but how can I skip rows that were already updated? Changes made inside the loop are not visible to the query driving the loop (yes, I know: the result set is stable), but how can I change that? What I want: if a person already has a graph_id, skip it.
DROP TABLE IF EXISTS persons_2_persons, persons, graph;
CREATE TABLE graph (id SERIAL PRIMARY KEY);
CREATE TABLE persons (id SERIAL PRIMARY KEY, graph_id INTEGER REFERENCES graph(id));
CREATE TABLE persons_2_persons("from" INTEGER NOT NULL REFERENCES persons(id), "to" INTEGER NOT NULL REFERENCES persons(id));
INSERT INTO persons (graph_id) VALUES (NULL), (NULL), (NULL);
INSERT INTO persons_2_persons VALUES
(1,2),(2,1),
(2,3),(3,2);
-- The real table has about 18 million records. 30-40 persons are connected --> goal: give each graph/cluster a graph_id
-- I also tried with CURSORs
DO $x$
DECLARE
    var_record RECORD;
    var_graph_id INTEGER;
BEGIN
    FOR var_record IN SELECT * FROM persons -- want this reevaluated after each iteration
    LOOP
        IF var_record.graph_id IS NULL THEN -- should be false on the 2nd and 3rd iterations because we updated the data in the 1st
            INSERT INTO graph DEFAULT VALUES RETURNING id INTO var_graph_id;
            RAISE NOTICE '%', var_graph_id;
            UPDATE persons SET graph_id = var_graph_id WHERE id IN (1,2,3); -- (1,2,3) is found by a RECURSIVE CTE; normally these are 30-40 persons
        END IF;
    END LOOP;
END;
$x$;
SELECT * FROM persons;

Make the first command inside the loop re-fetch that individual record, so the loop variable reflects updates made in earlier iterations:
SELECT * INTO var_record FROM persons WHERE id = var_record.id;
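
Applied to the block above, a minimal sketch of the adjusted loop (same tables as in the question):

DO $x$
DECLARE
    var_record RECORD;
    var_graph_id INTEGER;
BEGIN
    FOR var_record IN SELECT * FROM persons
    LOOP
        -- Re-read the row: the FOR result set was snapshotted once,
        -- but this fresh SELECT sees updates from earlier iterations.
        SELECT * INTO var_record FROM persons WHERE id = var_record.id;
        IF var_record.graph_id IS NULL THEN
            INSERT INTO graph DEFAULT VALUES RETURNING id INTO var_graph_id;
            UPDATE persons SET graph_id = var_graph_id
            WHERE id IN (1,2,3); -- in reality: the ids found by the recursive CTE
        END IF;
    END LOOP;
END;
$x$;

With the sample data, only one graph row is created: persons 2 and 3 are skipped because the refreshed record already carries a graph_id.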

Related

Get row number of row to be inserted in Postgres trigger that gives no collisions when inserting multiple rows

Given the following (simplified) schema:
CREATE TABLE period (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT
);
CREATE TABLE course (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    name TEXT
);
CREATE TABLE registration (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    period_id UUID NOT NULL REFERENCES period(id),
    course_id UUID NOT NULL REFERENCES course(id),
    inserted_at timestamptz NOT NULL DEFAULT now()
);
I now want to add a new column client_ref, which identifies a registration uniquely within a period but consists of only a 4-character string. I want to use pg_hashids, which requires a unique integer input, to base the column value on.
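The DDL for the new column itself isn't shown; presumably something along these lines (the exact type is an assumption):

ALTER TABLE registration ADD COLUMN client_ref text;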
I was thinking of setting up a trigger on the registration table that runs on inserting a new row. I came up with the following:
CREATE OR REPLACE FUNCTION set_client_ref()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    next_row_number integer;
BEGIN
    WITH rank AS (
        SELECT
            period.id AS period_id,
            row_number() OVER (PARTITION BY period.id ORDER BY registration.inserted_at)
        FROM registration
        JOIN period ON registration.period_id = period.id
        ORDER BY period.id, row_number
    )
    SELECT COALESCE(rank.row_number, 0) + 1
    INTO next_row_number
    FROM period
    LEFT JOIN rank ON rank.period_id = period.id
    WHERE period.id = NEW.period_id
    ORDER BY rank.row_number DESC
    LIMIT 1;

    NEW.client_ref := id_encode(next_row_number);
    RETURN NEW;
END
$function$;
The trigger is set up like this: CREATE TRIGGER set_client_ref BEFORE INSERT ON registration FOR EACH ROW EXECUTE FUNCTION set_client_ref();
This works as expected when inserting a single row into registration, but if I insert multiple rows within one statement, they all end up with the same client_ref. I can reason about why this happens (the rows don't know about each other's existence, so they each assume they're next in line when retrieving their row_number), but I am not sure how to prevent it. I tried setting up the trigger as an AFTER trigger, but it resulted in the same (duplicated) behaviour.
What would be a better way to get the lowest possible, unique integer for the rows to be inserted (to base the hash function on) that also works when inserting multiple rows?
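
One possible direction, sketched here under assumptions rather than as a definitive answer: serialize the numbering through a per-period counter row with INSERT ... ON CONFLICT, the same counter-table technique the next answer below uses for product versions. The period_counter table is hypothetical:

CREATE TABLE period_counter (
    period_id UUID PRIMARY KEY REFERENCES period(id),
    last_nr   integer NOT NULL
);

CREATE OR REPLACE FUNCTION set_client_ref()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
    next_row_number integer;
BEGIN
    -- Each row's upsert sees the counter value left by the previous row,
    -- even within a single multi-row INSERT statement.
    INSERT INTO period_counter (period_id, last_nr)
    VALUES (NEW.period_id, 1)
    ON CONFLICT (period_id)
    DO UPDATE SET last_nr = period_counter.last_nr + 1
    RETURNING last_nr INTO next_row_number;

    NEW.client_ref := id_encode(next_row_number);
    RETURN NEW;
END
$function$;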

Generating incremental numbers based on a different column

I have got a composite primary key in a table in PostgreSQL (I am using pgAdmin4)
Let's call the two key columns productno and version.
version represents the version of productno.
So when I create a new record, it needs to be checked whether a record with this productno already exists:
If productno doesn't exist yet, version should be 1
If productno exists once, version should be 2
If productno exists twice, version should be 3
... and so on
So that we get something like:
productno | version
----------+--------
        1 |       1
        1 |       2
        1 |       3
        2 |       1
        2 |       2
I found a quite similar problem: auto increment on composite primary key
But I can't use that solution because PostgreSQL syntax is obviously a bit different; I experimented a lot with functions and triggers but couldn't figure out the right way to do it.
You can keep the version numbers in a separate table (one row for each "base PK" value). That is far more efficient than doing a max() + 1 on every insert and has the additional benefit of being safe for concurrent transactions.
So first we need a table that keeps track of the version numbers:
create table version_counter
(
    product_no integer primary key,
    version_nr integer not null
);
Then we create a function that increments the version for a given product_no and returns that new version number:
create function next_version(p_product_no int)
    returns integer
as
$$
    insert into version_counter (product_no, version_nr)
    values (p_product_no, 1)
    on conflict (product_no)
    do update
        set version_nr = version_counter.version_nr + 1
    returning version_nr;
$$
language sql
volatile;
The trick here is the INSERT ... ON CONFLICT, which increments an existing value or inserts a new row if the passed product_no does not yet exist.
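Called directly, the function behaves like a per-product sequence (assuming the version_counter table above starts empty):

select next_version(42);  -- returns 1
select next_version(42);  -- returns 2
select next_version(7);   -- returns 1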
For the product table:
create table product
(
    product_no integer not null,
    version_nr integer not null,
    created_at timestamp default clock_timestamp(),
    primary key (product_no, version_nr)
);
then create a trigger:
create function increment_version()
    returns trigger
as
$$
begin
    new.version_nr := next_version(new.product_no);
    return new;
end;
$$
language plpgsql;

create trigger base_table_insert_trigger
    before insert on product
    for each row
    execute procedure increment_version();
This is safe for concurrent transactions because the row in version_counter will be locked for that product_no until the transaction inserting the row into the product table is committed - which will commit the change to the version_counter table as well (and free the lock on that row).
If two concurrent transactions insert the same value for product_no, one of them will wait until the other finishes.
If two concurrent transactions insert different values for product_no, they can work without having to wait for the other.
If we then insert these rows:
insert into product (product_no) values (1);
insert into product (product_no) values (2);
insert into product (product_no) values (3);
insert into product (product_no) values (1);
insert into product (product_no) values (3);
insert into product (product_no) values (2);
The product table looks like this:
select *
from product
order by product_no, version_nr;
product_no | version_nr | created_at
-----------+------------+-------------------------
         1 |          1 | 2019-08-23 10:50:57.880
         1 |          2 | 2019-08-23 10:50:57.947
         2 |          1 | 2019-08-23 10:50:57.899
         2 |          2 | 2019-08-23 10:50:57.989
         3 |          1 | 2019-08-23 10:50:57.926
         3 |          2 | 2019-08-23 10:50:57.966
Online example: https://rextester.com/CULK95702
You can do it like this (inside a PL/pgSQL function; mytable stands in for the real table name, since table is a reserved word):
-- Check whether the pk already exists
SELECT pk INTO temp_pk FROM mytable a WHERE a.pk = v_pk1;
-- If it exists, insert the new row
IF temp_pk IS NOT NULL THEN
    INSERT INTO mytable (pk, versionpk) VALUES (v_pk1, temp_pk);
END IF;
So, I got it to work now.
If you want a column to update depending on another column in PostgreSQL, have a look at this.
This is the function I use:
CREATE FUNCTION public.testfunction()
    RETURNS trigger
    LANGUAGE 'plpgsql'
    COST 100
    VOLATILE NOT LEAKPROOF
AS $BODY$
DECLARE
    v_productno INTEGER := NEW.productno;
BEGIN
    IF NOT EXISTS (SELECT *
                   FROM testtable
                   WHERE productno = v_productno)
    THEN
        NEW.version := 1;
    ELSE
        NEW.version := (SELECT MAX(testtable.version) + 1
                        FROM testtable
                        WHERE testtable.productno = v_productno);
    END IF;
    RETURN NEW;
END;
$BODY$;
And this is the trigger that runs the function:
CREATE TRIGGER testtrigger
    BEFORE INSERT
    ON public.testtable
    FOR EACH ROW
    EXECUTE PROCEDURE public.testfunction();
Thank you @ChechoCZ, you definitely helped me get in the right direction.

Postgres - fill in missing data in new table

Given two tables, A and B:
A          B
-----      -----
id         id
high       high
low        low
bId
I want to find rows in table A where bId is null, create an entry in B based off the data in A, and update the row in A to reference the newly created row. I can create the rows but I'm having trouble updating table A with the reference to the new row:
begin transaction;
with rows as (
insert into B (high, low)
select high, low
from A a
where a.bId is null
returning id as bId, a.id as aId
)
update A
set bId=(select bId from rows where id=rows.aId)
where id=rows.aId;
--commit;
rollback;
However, this fails with a cryptic error: ERROR: missing FROM-clause entry for table a.
Using a Postgres query, how can I achieve this?
Either

update "A"
set "bId" = (select rows."bId" from rows where "A".id = rows."aId")

without the WHERE clause, or

update "A"
set "bId" = rows."bId"
from rows
where "A".id = rows."aId";

I don't know if your tables really have those names; as mentioned in the comments, try to avoid uppercase table and column names, and try to avoid reserved keywords.
I found a way to get it to work but I feel like it's not the most efficient.
begin transaction;
do $body$
declare
    newId int4;
    tempB record;
begin
    create temp table TempAB (
        High float8,
        Low  float8,
        AId  int4
    );
    insert into TempAB (High, Low, AId)
    select high, low, id
    from A
    where bId is null;
    for tempB in (select * from TempAB)
    loop
        insert into B (high, low)
        values (tempB.high, tempB.low)
        returning id into newId;
        update A
        set bId = newId
        where id = tempB.AId;
    end loop;
end $body$;
rollback;
--commit;
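
For what it's worth, a loop-free sketch is possible if B.id is a serial column, by pre-allocating ids from its sequence (assumed here to be named b_id_seq; adjust to the real name) in a single statement. If the foreign key rejects the intermediate state, declare it DEFERRABLE INITIALLY DEFERRED:

begin transaction;
with alloc as (
    -- Hand each A row a fresh B id straight from B's sequence ...
    update A
    set bId = nextval('b_id_seq')
    where bId is null
    returning bId, high, low
)
-- ... and create the matching B rows in the same statement.
insert into B (id, high, low)
select bId, high, low
from alloc;
commit;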

Function taking forever to run for large number of records

I have created the following function in Postgres 9.3.5:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
RETURNS text AS
$BODY$
DECLARE
    result text;
BEGIN
    select min(id) into result from table
    where id_used is null and id_type = val2;

    update table set
        id_used = 'Y',
        col1 = val1,
        id_used_date = now()
    where id_type = val2
    and id = result;

    RETURN result;
END;
$BODY$
LANGUAGE plpgsql VOLATILE COST 100;
When I run this function in a loop over 1000 or more records, it just freezes and says "query is running". When I check my table, nothing is being updated. When I run it for one or two records, it runs fine.
Example of the function when being run:
select get_result('123','idtype');
table columns:
id character varying(200),
col1 character varying(200),
id_used character varying(1),
id_used_date timestamp without time zone,
id_type character(200)
id is the table index.
Can someone help?
Most probably you are running into race conditions. When you run your function 1000 times in quick succession in separate transactions, something like this happens:

T1                  T2                  T3 ...
SELECT min(id)  -- id 1
                    SELECT min(id)  -- id 1
                                        SELECT min(id)  -- id 1
UPDATE id 1
                    Row id 1 locked, wait ...
                                        Row id 1 locked, wait ...
COMMIT
                    Wake up, UPDATE id 1 again!
                    COMMIT
                                        Wake up, UPDATE id 1 again!
                                        COMMIT
...
Largely rewritten and simplified as an SQL function:
CREATE OR REPLACE FUNCTION get_result(val1 text, val2 text)
  RETURNS text AS
$func$
UPDATE table t
SET    id_used = 'Y'
     , col1 = val1
     , id_used_date = now()
FROM  (
   SELECT id
   FROM   table
   WHERE  id_used IS NULL
   AND    id_type = val2
   ORDER  BY id
   LIMIT  1
   FOR    UPDATE   -- lock to avoid race condition! see below ...
   ) t1
WHERE  t.id_type = val2
-- AND t.id_used IS NULL  -- repeat condition (not if row is locked)
AND    t.id = t1.id
RETURNING id;
$func$ LANGUAGE sql;
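With this in place, each concurrent call atomically claims one matching row; the call itself is unchanged from the question:

select get_result('123', 'idtype');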
Related question with a lot more explanation:
Atomic UPDATE .. SELECT in Postgres
Explanation
Don't run two separate SQL statements. That is more expensive and widens the time frame for race conditions. One UPDATE with a subquery is much better.
You don't need PL/pgSQL for this simple task. You can still use PL/pgSQL if you want; the UPDATE stays the same.
You need to lock the selected row to defend against race conditions. But you cannot do this with the aggregate function you had because, per the documentation:
The locking clauses cannot be used in contexts where returned rows
cannot be clearly identified with individual table rows; for example
they cannot be used with aggregation.
Emphasis mine. Luckily, you can easily replace min(id) with the equivalent ORDER BY id LIMIT 1 shown above, which can use an index just as well.
If the table is big, you need an index on id at least. Assuming that id is indexed already as PRIMARY KEY, that would help. But this additional partial multicolumn index would probably help a lot more:
CREATE INDEX foo_idx ON table (id_type, id)
WHERE id_used IS NULL;
Alternative solutions
Advisory locks may be the superior approach here:
Postgres UPDATE ... LIMIT 1
Or you may want to lock many rows at once:
How to mark certain nr of rows in table on concurrent access

PostgreSQL: Auto-increment based on multi-column unique constraint

One of my tables has the following definition:
CREATE TABLE incidents
(
    id serial NOT NULL,
    report integer NOT NULL,
    year integer NOT NULL,
    month integer NOT NULL,
    number integer NOT NULL, -- report serial number for this period
    ...
    PRIMARY KEY (id),
    UNIQUE (report, year, month, number)
);
How would you go about incrementing the number column for every report, year, and month independently? I'd like to avoid creating a sequence or table for each (report, year, month) set.
It would be nice if PostgreSQL supported incrementing "on a secondary column in a multiple-column index" like MySQL's MyISAM tables, but I couldn't find a mention of such a feature in the manual.
An obvious solution is to select the current value in the table + 1, but this obviously is not safe for concurrent sessions. Maybe a pre-insert trigger would work, but are they guaranteed to be non-concurrent?
Also note that I'm inserting incidents individually, so I can't use generate_series as suggested elsewhere.
It would be nice if PostgreSQL supported incrementing "on a secondary column in a multiple-column index" like MySQL's MyISAM tables
Yeah, but note that in doing so, MyISAM locks your entire table, which then makes it safe to find the biggest + 1 without worrying about concurrent transactions.
In Postgres, you can do this too, and without locking the whole table. An advisory lock and a trigger will be good enough:
CREATE TYPE animal_grp AS ENUM ('fish','mammal','bird');

CREATE TABLE animals (
    grp animal_grp NOT NULL,
    id INT NOT NULL DEFAULT 0,
    name varchar NOT NULL,
    PRIMARY KEY (grp, id)
);

CREATE OR REPLACE FUNCTION animals_id_auto()
RETURNS trigger AS $$
DECLARE
    _rel_id constant int := 'animals'::regclass::int;
    _grp_id int;
BEGIN
    _grp_id := array_length(enum_range(NULL, NEW.grp), 1);
    -- Obtain an advisory lock on this table/group.
    PERFORM pg_advisory_lock(_rel_id, _grp_id);
    SELECT COALESCE(MAX(id) + 1, 1)
    INTO   NEW.id
    FROM   animals
    WHERE  grp = NEW.grp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;

CREATE TRIGGER animals_id_auto
BEFORE INSERT ON animals
FOR EACH ROW WHEN (NEW.id = 0)
EXECUTE PROCEDURE animals_id_auto();
CREATE OR REPLACE FUNCTION animals_id_auto_unlock()
RETURNS trigger AS $$
DECLARE
    _rel_id constant int := 'animals'::regclass::int;
    _grp_id int;
BEGIN
    _grp_id := array_length(enum_range(NULL, NEW.grp), 1);
    -- Release the lock.
    PERFORM pg_advisory_unlock(_rel_id, _grp_id);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;

CREATE TRIGGER animals_id_auto_unlock
AFTER INSERT ON animals
FOR EACH ROW
EXECUTE PROCEDURE animals_id_auto_unlock();
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
This yields:
  grp   | id |  name
--------+----+---------
 fish   |  1 | lax
 mammal |  1 | dog
 mammal |  2 | cat
 mammal |  3 | whale
 bird   |  1 | penguin
 bird   |  2 | ostrich
(6 rows)
There is one caveat. Advisory locks are held until released or until the session expires. If an error occurs during the transaction, the lock is kept around and you need to release it manually:
SELECT pg_advisory_unlock('animals'::regclass::int, i)
FROM generate_series(1, array_length(enum_range(NULL::animal_grp),1)) i;
In Postgres 9.1, you can discard the unlock trigger and replace the pg_advisory_lock() call with pg_advisory_xact_lock(). That one is held automatically until the end of the transaction and released with it.
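A sketch of that simplification: inside animals_id_auto(), replace

PERFORM pg_advisory_lock(_rel_id, _grp_id);

with

PERFORM pg_advisory_xact_lock(_rel_id, _grp_id);

and drop animals_id_auto_unlock() and its AFTER trigger entirely; a transaction-level lock cannot leak after an error the way a session-level lock can.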
On a separate note, I'd stick to using a good old sequence. That will make things faster -- even if it's not as pretty-looking when you look at the data.
Lastly, a unique sequence per (year, month) combo could also be obtained by adding an extra table, whose primary key is a serial, and whose (year, month) value has a unique constraint on it.
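One reading of that last suggestion, with hypothetical names (each (year, month) pair gets exactly one serial value, which can then feed the numbering):

create table period_seq (
    seq   serial primary key,
    year  integer not null,
    month integer not null,
    unique (year, month)
);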
I think I found a better solution. It doesn't depend on the type of grp (it can be an enum, an integer or a string) and can be used in a lot of cases.
myFunc() - the trigger function. You can name it whatever you want.
number - the auto-increment column, which grows for each existing value of grp.
grp - the column you want to group the numbering by.
myTrigger - the trigger for your table.
myTable - the table where you want to create the trigger.
unique_grp_number_key - the unique constraint key. We need to make it for the unique pair of values grp and number.
ALTER TABLE "myTable"
    ADD CONSTRAINT "unique_grp_number_key" UNIQUE (grp, number);

CREATE OR REPLACE FUNCTION myFunc() RETURNS trigger AS $body_start$
BEGIN
    SELECT COALESCE(MAX(number) + 1, 1)
    INTO   NEW.number
    FROM   "myTable"
    WHERE  grp = NEW.grp;
    RETURN NEW;
END;
$body_start$ LANGUAGE plpgsql;

CREATE TRIGGER myTrigger BEFORE INSERT ON "myTable"
    FOR EACH ROW
    WHEN (NEW.number IS NULL)
    EXECUTE PROCEDURE myFunc();
How does it work? When you insert something into myTable, the trigger fires and checks whether the number field is empty. If it is, myFunc() selects the MAX value of number where grp equals the new grp value being inserted. It returns that max value + 1, like auto_increment, and puts the resulting value into the NULL number field.
This solution is more generic than Denis de Bernardy's because it doesn't depend on the grp type; thanks to him, though, since his code helped me write my solution.
Maybe it's too late to write an answer, but I couldn't find a generic solution to this problem on Stack Overflow, so it may help someone. Enjoy and thanks for the help!
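
A quick usage sketch, assuming a minimal table (this DDL is hypothetical; the answer above does not include it):

CREATE TABLE "myTable" (
    grp    text NOT NULL,
    number integer
);

-- with the constraint, function and trigger from above in place:
INSERT INTO "myTable" (grp) VALUES ('a');  -- number becomes 1
INSERT INTO "myTable" (grp) VALUES ('a');  -- number becomes 2
INSERT INTO "myTable" (grp) VALUES ('b');  -- number becomes 1

Note that plain MAX(number) + 1 without a lock is not safe under concurrent inserts; the unique constraint will make one of two colliding transactions fail.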
I think this will help:
http://www.varlena.com/GeneralBits/130.php
Note that in MySQL it is for MyISAM tables only.
P.S. I have tested advisory locks and found them useless when more than one transaction runs at the same time. I am using two windows of pgAdmin. The first is as simple as possible:
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','dog');
COMMIT;
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','cat');
COMMIT;
ERROR: duplicate key violates unique constraint "animals_pkey"
Second:
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','dog');
INSERT INTO animals (grp,name) VALUES ('mammal','cat');
COMMIT;
ERROR: deadlock detected
SQL state: 40P01
Detail: Process 3764 waits for ExclusiveLock on advisory lock [46462,46496,2,2]; blocked by process 2712.
Process 2712 waits for ShareLock on transaction 136759; blocked by process 3764.
Context: SQL statement "SELECT pg_advisory_lock( $1 , $2 )"
PL/pgSQL function "animals_id_auto" line 15 at perform
And the database is locked and cannot be unlocked; it is unknown what to unlock.