Get row number of row to be inserted in Postgres trigger that gives no collisions when inserting multiple rows - postgresql

Given the following (simplified) schema:
CREATE TABLE period (
id UUID NOT NULL DEFAULT uuid_generate_v4(),
name TEXT
);
CREATE TABLE course (
id UUID NOT NULL DEFAULT uuid_generate_v4(),
name TEXT
);
CREATE TABLE registration (
id UUID NOT NULL DEFAULT uuid_generate_v4(),
period_id UUID NOT NULL REFERENCES period(id),
course_id UUID NOT NULL REFERENCES course(id),
inserted_at timestamptz NOT NULL DEFAULT now()
);
I now want to add a new column client_ref, which identifies a registration unique within a period, but consists of only a 4-character string. I want to use pg_hashids - which requires a unique integer input - to base the column value on.
I was thinking of setting up a trigger on the registration table that runs on inserting a new row. I came up with the following:
CREATE OR REPLACE FUNCTION set_client_ref()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
DECLARE
next_row_number integer;
BEGIN
WITH rank AS (
SELECT
period.id AS period_id,
row_number() OVER (PARTITION BY period.id ORDER BY registration.inserted_at)
FROM
registration
JOIN period ON registration.period_id = period.id ORDER BY
period.id,
row_number
)
SELECT
COALESCE(rank.row_number, 0) + 1 INTO next_row_number
FROM
period
LEFT JOIN rank ON (rank.period_id = period.id)
WHERE
period.id = NEW.period_id
ORDER BY
rank.row_number DESC
LIMIT 1;
NEW.client_ref = id_encode (next_row_number);
RETURN NEW;
END
$function$
;
The trigger is set-up like: CREATE TRIGGER set_client_ref BEFORE INSERT ON registration FOR EACH ROW EXECUTE FUNCTION set_client_ref();
This works as expected when inserting a single row into registration, but if I insert multiple rows within one statement, they all end up with the same client_ref. I can reason about why this happens (the rows don't know about each other's existence, so each assumes it is next in line when retrieving its row_number), but I am not sure how to prevent it. I tried setting up the trigger as an AFTER trigger, but that resulted in the same (duplicated) behaviour.
What would be a better way to get the lowest possible, unique integer for the rows to be inserted (to base the hash function on) that also works when inserting multiple rows?
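One possible direction, sketched under assumptions rather than taken from the question: keep a per-period counter in a separate table and bump it from the row-level trigger. The table period_counter and its column last_value are hypothetical names; id_encode is the pg_hashids function already used above. The upsert increments the counter once per inserted row, so later rows of a multi-row INSERT should see the earlier increments, and the row lock it takes serializes concurrent transactions that insert into the same period. It needs PostgreSQL 9.5 or later for ON CONFLICT.

```sql
-- Hypothetical helper table: one counter row per period.
CREATE TABLE period_counter (
    period_id  uuid PRIMARY KEY,
    last_value integer NOT NULL
);

CREATE OR REPLACE FUNCTION set_client_ref()
  RETURNS trigger
  LANGUAGE plpgsql
AS $function$
DECLARE
    next_row_number integer;
BEGIN
    -- Create the counter at 1, or atomically increment the existing one.
    INSERT INTO period_counter AS pc (period_id, last_value)
    VALUES (NEW.period_id, 1)
    ON CONFLICT (period_id)
    DO UPDATE SET last_value = pc.last_value + 1
    RETURNING pc.last_value INTO next_row_number;

    NEW.client_ref = id_encode(next_row_number);
    RETURN NEW;
END
$function$;
```

With this sketch the existing BEFORE INSERT ... FOR EACH ROW trigger definition can stay as it is. Numbers are not reused when registrations are deleted, so the value is unique within a period but not necessarily the lowest free integer.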

Related

Avoid putting PostgreSQL function result into one field

The end result of what I am after is a query that calls a function and that function returns a set of records that are in their own separate fields. I can do this but the results of the function are all in one field.
ie: http://i.stack.imgur.com/ETLCL.png and the results I am after are: http://i.stack.imgur.com/wqRQ9.png
Here's the code to create the table
CREATE TABLE tbl_1_hm
(
tbl_1_hm_id bigserial NOT NULL,
tbl_1_hm_f1 VARCHAR (250),
tbl_1_hm_f2 INTEGER,
CONSTRAINT tbl_1_hm PRIMARY KEY (tbl_1_hm_id)
);
-- do that for a few times to get some data
INSERT INTO tbl_1_hm (tbl_1_hm_f1, tbl_1_hm_f2)
VALUES ('hello', 1);
CREATE OR REPLACE FUNCTION proc_1_hm(id BIGINT)
RETURNS TABLE(tbl_1_hm_f1 VARCHAR (250), tbl_1_hm_f2 int) AS $$
SELECT tbl_1_hm_f1, tbl_1_hm_f2
FROM tbl_1_hm
WHERE tbl_1_hm_id = id
$$ LANGUAGE SQL;
--And here is the current query I am running for my results:
SELECT t1.tbl_1_hm_id, proc_1_hm(t1.tbl_1_hm_id) AS t3
FROM tbl_1_hm AS t1
Thanks for having a read. If you are inclined to haggle over the semantics of hitting the same table twice, or over my naming convention: this is a simplified test.
When a function returns a set of records, you should treat it as a table source:
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1, proc_1_hm(t1.tbl_1_hm_id) AS t3;
Note that functions are implicitly using a LATERAL join (scroll down to sub-sections 4 and 5) so you can use fields from tables listed previously without having to specify an explicit JOIN condition.
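For comparison, the same query can be spelled with an explicit LATERAL join. This is only an illustration of the implicit behaviour described above, reusing the table and function from the question; the LATERAL keyword requires PostgreSQL 9.3 or later.

```sql
-- Explicit spelling of the implicit lateral join.
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
CROSS JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3;

-- LEFT JOIN variant: also keeps rows of t1 for which the function returns no rows.
SELECT t1.tbl_1_hm_id, t3.*
FROM tbl_1_hm AS t1
LEFT JOIN LATERAL proc_1_hm(t1.tbl_1_hm_id) AS t3 ON true;
```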

Get row to swap tables on a certain condition

I currently have a parent table:
CREATE TABLE members (
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
first_name varchar(20)
last_name varchar(20)
address address (composite type)
contact_numbers varchar(11)[3]
date_joined date
type varchar(5)
);
and two related tables:
CREATE TABLE basic_member (
activities varchar[3])
INHERITS (members)
);
CREATE TABLE full_member (
activities varchar[])
INHERITS (members)
);
If the type is full the details are entered to the full_member table or if type is basic into the basic_member table. What I want is that if I run an update and change the type to basic or full the tuple goes into the corresponding table.
I was wondering if I could do this with a rule like:
CREATE RULE tuple_swap_full
AS ON UPDATE TO full_member
WHERE new.type = 'basic'
INSERT INTO basic_member VALUES (old.member_id, old.first_name, old.last_name,
old.address, old.contact_numbers, old.date_joined, new.type, old.activities);
... then delete the record from the full_member
Just wondering if my rule is anywhere near or if there is a better way.
You don't need
member_id SERIAL NOT NULL, UNIQUE, PRIMARY KEY
A PRIMARY KEY implies UNIQUE NOT NULL automatically:
member_id SERIAL PRIMARY KEY
I wouldn't use hard coded max length of varchar(20). Just use text and add a check constraint if you really must enforce a maximum length. Easier to change around.
Syntax for INHERITS is mangled. The key word goes outside the parens around columns.
CREATE TABLE full_member (
activities text[]
) INHERITS (members);
Table names are inconsistent (members <-> member). I use the singular form everywhere in my test case.
Finally, I would not use a RULE for the task. A trigger AFTER UPDATE seems preferable.
Consider the following
Test case:
Tables:
CREATE SCHEMA x; -- I put everything in a test schema named "x".
-- DROP TABLE x.member CASCADE;
CREATE TABLE x.member (
member_id SERIAL PRIMARY KEY
,first_name text
-- more columns ...
,type text);
CREATE TABLE x.basic_member (
activities text[3]
) INHERITS (x.member);
CREATE TABLE x.full_member (
activities text[]
) INHERITS (x.member);
Trigger function:
Data-modifying CTEs (WITH x AS (DELETE ...)) are the best tool for the purpose. Requires PostgreSQL 9.1 or later.
For older versions, first INSERT then DELETE.
CREATE OR REPLACE FUNCTION x.trg_move_member()
RETURNS trigger AS
$BODY$
BEGIN
CASE NEW.type
WHEN 'basic' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.basic_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
WHEN 'full' THEN
WITH x AS (
DELETE FROM x.member
WHERE member_id = NEW.member_id
RETURNING *
)
INSERT INTO x.full_member (member_id, first_name, type) -- more columns
SELECT member_id, first_name, type -- more columns
FROM x;
END CASE;
RETURN NULL;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
Trigger:
Note that it is an AFTER trigger and has a WHEN condition.
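The WHEN condition requires PostgreSQL 9.0 or later. For earlier versions you can simply omit it, but then add an ELSE branch to the CASE statement in the trigger function, because a PL/pgSQL CASE with no matching branch and no ELSE raises a CASE_NOT_FOUND error.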
CREATE TRIGGER up_aft
AFTER UPDATE
ON x.member
FOR EACH ROW
WHEN (NEW.type IN ('basic','full')) -- OLD.type cannot be IN ('basic','full')
EXECUTE PROCEDURE x.trg_move_member();
Test:
INSERT INTO x.member (first_name, type) VALUES ('peter', NULL);
UPDATE x.member SET type = 'full' WHERE first_name = 'peter';
SELECT * FROM ONLY x.member;
SELECT * FROM x.basic_member;
SELECT * FROM x.full_member;

SELECT or INSERT a row in one command

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.
Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
 id | other_key
----+-----------
  1 | SOMETHING
(1 row)
No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip from the client to the server, but it will still result in two queries (three actually, if you count the sproc itself).
Using 9.5, I successfully tried this, based on Denis de Bernardy's answer:
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (I think...)
The query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 1')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- some update is needed so that RETURNING yields the existing row
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
You can use a stored procedure:
IF NOT EXISTS (SELECT 1 FROM mytable WHERE other_key = 'SOMETHING') THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING');
END IF;
I have an alternative to Denis's answer that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In the table_sel CTE I look for the right row. If I don't find it, I ensure that table_sel still returns at least one row by means of a union with a SELECT NULL.
In the table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) says: if id is defined, use it; if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause ensures that if id is already in mytable I don't insert anything.
At the end I put everything together with a UNION ALL between table_ins and table_sel, so that I'm sure to get my sweet id value and to execute both CTEs.
This query searches for the value of other_key only once, and it is a "find this value" search rather than a "check that this value does not exist" search, which is much heavier; Denis's alternative uses other_key in both kinds of search. In my query the non-existence check happens only on id, which is an integer primary key and therefore fast by construction.
Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end lets the planner stop reading further rows once it knows there's an answer available. Note that PostgreSQL still executes a data-modifying CTE to completion whether or not the outer query reads its output, so the LIMIT does not skip the INSERT itself.
NOTE: There is a race condition present here and in the original answer. If the entry does not already exist and two sessions run the query at the same time, both can pass the NOT EXISTS check, and one of the INSERTs will then fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.
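A common way around the empty-set case is to retry in a loop inside a PL/pgSQL function. The following is a minimal sketch, not part of the answers above: the function name get_or_create_id is hypothetical, it assumes the mytable definition shown earlier, PostgreSQL 9.5 or later for ON CONFLICT, and the default READ COMMITTED isolation level.

```sql
-- Hypothetical helper: returns the id for _key, inserting the row if needed.
CREATE OR REPLACE FUNCTION get_or_create_id(_key text)
  RETURNS integer
  LANGUAGE plpgsql
AS $$
DECLARE
    _id integer;
BEGIN
    LOOP
        -- Common case: the row already exists.
        SELECT id INTO _id FROM mytable WHERE other_key = _key;
        EXIT WHEN FOUND;

        -- Otherwise try to insert; a concurrent insert of the same key
        -- turns this into a no-op instead of raising an error.
        INSERT INTO mytable (other_key)
        VALUES (_key)
        ON CONFLICT (other_key) DO NOTHING
        RETURNING id INTO _id;
        EXIT WHEN FOUND;
        -- Neither found nor inserted: another transaction won the race, so retry.
    END LOOP;
    RETURN _id;
END
$$;

SELECT get_or_create_id('SOMETHING');
```

Each statement inside the loop takes a fresh snapshot under READ COMMITTED, so the retry can see a row that a concurrent transaction committed in the meantime.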

PostgreSQL: Auto-increment based on multi-column unique constraint

One of my tables has the following definition:
CREATE TABLE incidents
(
id serial NOT NULL,
report integer NOT NULL,
year integer NOT NULL,
month integer NOT NULL,
number integer NOT NULL, -- Report serial number for this period
...
CONSTRAINT PRIMARY KEY (id),
CONSTRAINT UNIQUE (report, year, month, number)
);
How would you go about incrementing the number column for every report, year, and month independently? I'd like to avoid creating a sequence or table for each (report, year, month) set.
It would be nice if PostgreSQL supported incrementing "on a secondary column in a multiple-column index" like MySQL's MyISAM tables, but I couldn't find a mention of such a feature in the manual.
An obvious solution is to select the current value in the table + 1, but this obviously is not safe for concurrent sessions. Maybe a pre-insert trigger would work, but are they guaranteed to be non-concurrent?
Also note that I'm inserting incidents individually, so I can't use generate_series as suggested elsewhere.
It would be nice if PostgreSQL supported incrementing "on a secondary column in a multiple-column index" like MySQL's MyISAM tables
Yeah, but note that in doing so, MyISAM locks your entire table. Which then makes it safe to find the biggest +1 without worrying about concurrent transactions.
In Postgres, you can do this too, and without locking the whole table. An advisory lock and a trigger will be good enough:
CREATE TYPE animal_grp AS ENUM ('fish','mammal','bird');
CREATE TABLE animals (
grp animal_grp NOT NULL,
id INT NOT NULL DEFAULT 0,
name varchar NOT NULL,
PRIMARY KEY (grp,id)
);
CREATE OR REPLACE FUNCTION animals_id_auto()
RETURNS trigger AS $$
DECLARE
_rel_id constant int := 'animals'::regclass::int;
_grp_id int;
BEGIN
_grp_id = array_length(enum_range(NULL, NEW.grp), 1);
-- Obtain an advisory lock on this table/group.
PERFORM pg_advisory_lock(_rel_id, _grp_id);
SELECT COALESCE(MAX(id) + 1, 1)
INTO NEW.id
FROM animals
WHERE grp = NEW.grp;
RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;
CREATE TRIGGER animals_id_auto
BEFORE INSERT ON animals
FOR EACH ROW WHEN (NEW.id = 0)
EXECUTE PROCEDURE animals_id_auto();
CREATE OR REPLACE FUNCTION animals_id_auto_unlock()
RETURNS trigger AS $$
DECLARE
_rel_id constant int := 'animals'::regclass::int;
_grp_id int;
BEGIN
_grp_id = array_length(enum_range(NULL, NEW.grp), 1);
-- Release the lock.
PERFORM pg_advisory_unlock(_rel_id, _grp_id);
RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;
CREATE TRIGGER animals_id_auto_unlock
AFTER INSERT ON animals
FOR EACH ROW
EXECUTE PROCEDURE animals_id_auto_unlock();
INSERT INTO animals (grp,name) VALUES
('mammal','dog'),('mammal','cat'),
('bird','penguin'),('fish','lax'),('mammal','whale'),
('bird','ostrich');
SELECT * FROM animals ORDER BY grp,id;
This yields:
  grp   | id |  name
--------+----+---------
 fish   |  1 | lax
 mammal |  1 | dog
 mammal |  2 | cat
 mammal |  3 | whale
 bird   |  1 | penguin
 bird   |  2 | ostrich
(6 rows)
There is one caveat. Advisory locks are held until released or until the session expires. If an error occurs during the transaction, the lock is kept around and you need to release it manually.
SELECT pg_advisory_unlock('animals'::regclass::int, i)
FROM generate_series(1, array_length(enum_range(NULL::animal_grp),1)) i;
In Postgres 9.1, you can discard the unlock trigger, and replace the pg_advisory_lock() call with pg_advisory_xact_lock(). That one is automatically held until and released at the end of the transaction.
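As a sketch of that variant, reusing the names from the example above, the trigger function could look as follows. Only the lock call changes, and the unlock trigger is dropped.

```sql
-- Variant of animals_id_auto() for PostgreSQL 9.1 or later.
CREATE OR REPLACE FUNCTION animals_id_auto()
RETURNS trigger AS $$
DECLARE
    _rel_id constant int := 'animals'::regclass::int;
    _grp_id int;
BEGIN
    _grp_id = array_length(enum_range(NULL, NEW.grp), 1);
    -- Transaction-level advisory lock: released automatically at COMMIT or ROLLBACK.
    PERFORM pg_advisory_xact_lock(_rel_id, _grp_id);
    SELECT COALESCE(MAX(id) + 1, 1)
    INTO NEW.id
    FROM animals
    WHERE grp = NEW.grp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql STRICT;

-- The unlock trigger and its function are no longer needed.
DROP TRIGGER IF EXISTS animals_id_auto_unlock ON animals;
DROP FUNCTION IF EXISTS animals_id_auto_unlock();
```

Because the lock is held until the transaction ends, a second session inserting into the same group waits for the first to commit or roll back before it computes its own MAX(id) + 1.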
On a separate note, I'd stick to using a good old sequence. That will make things faster -- even if it's not as pretty-looking when you look at the data.
Lastly, a unique sequence per (year, month) combo could also be obtained by adding an extra table, whose primary key is a serial, and whose (year, month) value has a unique constraint on it.
I think I found a better solution. It doesn't depend on the type of grp (it can be an enum, an integer or a string) and can be used in many cases.
myFunc() - the trigger function. You can name it as you like.
number - the auto-increment column, which counts up separately for each existing value of grp.
grp - the column whose values you want to number.
myTrigger - the trigger for your table.
myTable - the table the trigger is created on.
unique_grp_number_key - the unique constraint. It is needed so that each pair of grp and number values is unique.
ALTER TABLE "myTable"
ADD CONSTRAINT "unique_grp_number_key" UNIQUE(grp, number);
CREATE OR REPLACE FUNCTION myFunc() RETURNS trigger AS $body_start$
BEGIN
SELECT COALESCE(MAX(number) + 1, 1)
INTO NEW.number
FROM "myTable"
WHERE grp = NEW.grp;
RETURN NEW;
END;
$body_start$ LANGUAGE plpgsql;
CREATE TRIGGER myTrigger BEFORE INSERT ON "myTable"
FOR EACH ROW
WHEN (NEW.number IS NULL)
EXECUTE PROCEDURE myFunc();
How does it work? When you insert something into myTable, the trigger fires and checks whether the number field is empty. If it is, myFunc() selects the MAX value of number among the rows whose grp equals the grp value being inserted, and replaces the null number field with that maximum + 1, like an auto_increment. Under concurrent inserts two sessions can compute the same value; the unique constraint then rejects one of them.
This solution is more general than Denis de Bernardy's because it doesn't depend on the type of grp, but thanks to him: his code helped me write my solution.
Maybe it's too late to write an answer, but I couldn't find a general solution for this problem on Stack Overflow, so it may help someone. Enjoy and thanks for the help!
I think this will help:
http://www.varlena.com/GeneralBits/130.php
Note that in MySQL it is for MyISAM tables only.
PP I have tested advisory locks and found them useless when more than one transaction runs at the same time. I am using two pgAdmin windows. The first is as simple as possible:
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','dog');
COMMIT;
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','cat');
COMMIT;
ERROR: duplicate key violates unique constraint "animals_pkey"
Second:
BEGIN;
INSERT INTO animals (grp,name) VALUES ('mammal','dog');
INSERT INTO animals (grp,name) VALUES ('mammal','cat');
COMMIT;
ERROR: deadlock detected
SQL state: 40P01
Detail: Process 3764 waits for ExclusiveLock on advisory lock [46462,46496,2,2]; blocked by process 2712.
Process 2712 waits for ShareLock on transaction 136759; blocked by process 3764.
Context: SQL statement "SELECT pg_advisory_lock( $1 , $2 )"
PL/pgSQL function "animals_id_auto" line 15 at perform
And the database is locked and cannot be unlocked, because it is unknown what to unlock.

Using Rule to Insert Into Secondary Table Auto-Increments Sequence

To automatically add a row to a second table that ties it to the first table via a unique index, I have a rule such as the following:
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (NEW.userid);
This works fine if user.userid is an integer. However, if it is a sequence (e.g., type serial or bigserial), what is inserted into table lastlogin is the next sequence id. So this command:
INSERT INTO user (username) VALUES ('john');
would insert row [1, 'john', ...] into user but row [2, ...] into lastlogin. The following 2 workarounds do work, except that the second one consumes twice as many serials since the sequence is still auto-incrementing:
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (lastval());
CREATE OR REPLACE RULE auto_insert AS ON INSERT TO user DO ALSO
INSERT INTO lastlogin (id) VALUES (NEW.userid-1);
Unfortunately, the workarounds do not work if I'm inserting multiple rows:
INSERT INTO user (username) VALUES ('john'), ('mary');
The first workaround would use the same id for every row, and the second workaround is all kinds of screwed up.
Is it possible to do this via postgresql rules or should I simply do the 2nd insertion into lastlogin myself or use a row trigger? Actually, I think the row trigger would also auto-increment the sequence when I access NEW.userid.
Forget rules altogether. They're bad.
Triggers are way better for you, as they are in 99% of the cases where someone thinks they need a rule. Try this:
create table users (
userid serial primary key,
username text
);
create table lastlogin (
userid int primary key references users(userid),
lastlogin_time timestamp with time zone
);
create or replace function lastlogin_create_id() returns trigger as $$
begin
insert into lastlogin (userid) values (NEW.userid);
return NEW;
end;
$$
language plpgsql volatile;
create trigger lastlogin_create_id
after insert on users for each row execute procedure lastlogin_create_id();
Then:
insert into users (username) values ('foo'),('bar');
select * from users;
 userid | username
--------+----------
      1 | foo
      2 | bar
(2 rows)
select * from lastlogin;
 userid | lastlogin_time
--------+----------------
      1 |
      2 |
(2 rows)