Are increments with Postgres triggers atomic and safe under high concurrency?

Given the following setup:
CREATE EXTENSION IF NOT EXISTS "pgcrypto";

CREATE TABLE IF NOT EXISTS foo (
    id TEXT DEFAULT gen_random_uuid() NOT NULL,
    text TEXT NOT NULL,
    is_latest BOOLEAN DEFAULT TRUE,
    version INTEGER NOT NULL DEFAULT 0,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE UNIQUE INDEX foo_id_idx ON foo (id, is_latest);
CREATE INDEX foo_updated_at_idx ON foo (updated_at);
CREATE INDEX foo_created_at_idx ON foo (created_at);

CREATE OR REPLACE FUNCTION foo_copy_row()
RETURNS TRIGGER AS $BODY$
BEGIN
    -- the row being updated becomes the new latest version
    NEW.version = OLD.version + 1;
    NEW.is_latest = TRUE;
    NEW.updated_at = NOW();
    NEW.created_at = OLD.created_at;
    -- archive the previous version; is_latest = NULL avoids the unique
    -- (id, is_latest) index because NULLs never compare equal
    INSERT INTO foo (id, text, is_latest, version, updated_at, created_at)
    VALUES (OLD.id, OLD.text, NULL, OLD.version, OLD.updated_at, OLD.created_at);
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER COPY
    BEFORE UPDATE ON foo
    FOR EACH ROW EXECUTE PROCEDURE foo_copy_row();
This successfully versions my data and atomically increments the version column on every update.
My problem: under highly concurrent updates to the same row, I expect ORDER BY id, version DESC and ORDER BY id, updated_at DESC to produce identical orderings, but they do not.
This is how I update my rows:
INSERT INTO foo (text) VALUES ('hello')
RETURNING *;
UPDATE foo SET text = 'welcome'
WHERE id = 'some-uuid' AND is_latest = TRUE
RETURNING *;
And this is an example of result:
SELECT id, is_latest, version, updated_at, created_at FROM foo ORDER BY id, updated_at DESC;
id | is_latest | version | updated_at | created_at
-------------------------------------+-----------+---------+-------------------------------+-------------------------------
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | t | 4 | 2018-07-22 16:12:55.702035+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 2 | 2018-07-22 16:12:55.698144+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 1 | 2018-07-22 16:12:55.697429+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 3 | 2018-07-22 16:12:55.697157+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 0 | 2018-07-22 16:12:55.694725+00 | 2018-07-22 16:12:55.694725+00
What's the missing piece?
Is the trigger part of the transaction and is the UPDATE lock retained until both BEFORE and AFTER are executed?
Is it possible to end up with two rows with same id and version number?

That is not surprising: now() returns the time when the current transaction started, and there is no guarantee that the transaction that started first is the first one to execute the trigger.
Use the version column to determine the order of the updates.
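The transaction-start semantics of now() are easy to observe; clock_timestamp(), by contrast, returns the actual wall-clock time. A quick illustrative sketch (run in psql):

```sql
BEGIN;
SELECT now(), clock_timestamp();   -- both close together
SELECT pg_sleep(2);
SELECT now(), clock_timestamp();   -- now() is unchanged (frozen at transaction start),
                                   -- clock_timestamp() has advanced by about 2 seconds
COMMIT;
```

Note that even switching the trigger to clock_timestamp() would not make updated_at match commit order under concurrency, which is why ordering by version is the reliable choice.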

Related

Duplicate key found, but nothing matches in the table

I have a function which receives some parameters and does a SELECT to see whether a matching table row exists. If it does, the function returns FALSE; if not, it does an INSERT, which currently always fails with a 'duplicate key' error. Here's a pseudo-code version...
CREATE OR REPLACE FUNCTION bob_function (
    p_work_id INTEGER,
    p_location_id VARCHAR,
    p_area_id VARCHAR,
    p_scheduled_work_no INTEGER,
    p_start_date_time TIMESTAMPTZ,
    p_work_date_time TIMESTAMPTZ,
    p_user_id INTEGER,
    p_comments TEXT,
    p_work_stat_code CHAR(1)
)
RETURNS BOOLEAN AS $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM work_table
        WHERE location_id = p_location_id
          AND area_id = p_area_id
          AND work_id = p_work_id
          AND scheduled_work_no = p_scheduled_work_no
          AND start_date_time = p_start_date_time
          AND user_work_id = p_user_id
          AND work_date_time = p_work_date_time
    )
    THEN
        RAISE NOTICE 'Work already exists - SKIPPING';
        RETURN FALSE;
    END IF;

    INSERT INTO work_table (
        location_id,
        area_id,
        work_id,
        scheduled_work_no,
        start_date_time,
        user_work_id,
        work_date_time,
        stat_code,
        comment
    )
    VALUES (
        p_location_id,
        p_area_id,
        p_work_id,
        p_scheduled_work_no,
        p_start_date_time,
        p_user_id,
        p_work_date_time,
        p_work_stat_code,  -- was v_work_stat_code, an undeclared variable
        p_comments
    );
    RETURN TRUE;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
The primary key is defined thus...
myDb=# \d task_work_pk
Index "schema1.task_work_pk"
Column | Type | Key? | Definition
-------------------+-----------------------------+------+-------------------
location_id | character varying(8) | yes | location_id
area_id | character varying(3) | yes | area_id
work_id | integer | yes | work_id
scheduled_work_no | integer | yes | scheduled_work_no
start_date_time | timestamp(0) with time zone | yes | start_date_time
user_work_id | integer | yes | user_work_id
work_date_time | timestamp(0) with time zone | yes | work_date_time
primary key, btree, for table "schema1.work_table"
Currently I get the following error every time I run this function...
ERROR: 23505: duplicate key value violates unique constraint "task_work_pk"
DETAIL: Key (location_id, area_id, work_id, scheduled_work_no, start_date_time, user_work_id, work_date_time)=(SITE_1, BOB, 218, 5, 2021-07-09 00:28:00+10, 1, 2021-07-09 21:00:15+10) already exists.
There are no rows whatsoever with work_id = 218, and this is the only place in the entire database where this table is written to. The function is called no more than once a minute, and I'm 99% sure I haven't got a race condition.
EDIT: updated to remove errors
I'm ignoring your PL/pgSQL code because it is pseudo-code rather than the real thing.
Given that no row with work_id = 218 pre-exists, the only way to cause that error is to insert the same record twice within a single transaction.
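As an aside, a check-then-insert of this shape is inherently racy between concurrent transactions. Since PostgreSQL 9.5 the existence check and the insert can be collapsed into a single statement; a sketch of the function body, assuming the constraint name task_work_pk from the \d output above:

```sql
-- let the primary-key constraint arbitrate instead of a prior SELECT
INSERT INTO work_table (location_id, area_id, work_id, scheduled_work_no,
                        start_date_time, user_work_id, work_date_time,
                        stat_code, comment)
VALUES (p_location_id, p_area_id, p_work_id, p_scheduled_work_no,
        p_start_date_time, p_user_id, p_work_date_time,
        p_work_stat_code, p_comments)
ON CONFLICT ON CONSTRAINT task_work_pk DO NOTHING;
-- FOUND is TRUE if the row was inserted, FALSE if it already existed
RETURN FOUND;
```

This closes the window between the check and the insert without any explicit locking.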

PostgreSQL add SERIAL column to existing table with values based on ORDER BY

I have a large table (6+ million rows) that I'd like to add an auto-incrementing integer column sid, where sid is set on existing rows based on an ORDER BY inserted_at ASC. In other words, the oldest record based on inserted_at would be set to 1 and the latest record would be the total record count. Any tips on how I might approach this?
Add a sid column and UPDATE SET ... FROM ... WHERE:
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) AS rownum
      FROM test) t
WHERE test.id = t.id;
Note that this relies on there being a primary key, id.
(If your table did not already have a primary key, you would have to make one first.)
For example,
-- create test table
DROP TABLE IF EXISTS test;
CREATE TABLE test (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
, foo text
, inserted_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO test (foo, inserted_at) VALUES
('XYZ', '2019-02-14 00:00:00-00')
, ('DEF', '2010-02-14 00:00:00-00')
, ('ABC', '2000-02-14 00:00:00-00');
-- +----+-----+------------------------+
-- | id | foo | inserted_at |
-- +----+-----+------------------------+
-- | 1 | XYZ | 2019-02-13 19:00:00-05 |
-- | 2 | DEF | 2010-02-13 19:00:00-05 |
-- | 3 | ABC | 2000-02-13 19:00:00-05 |
-- +----+-----+------------------------+
ALTER TABLE test ADD COLUMN sid INT;
UPDATE test
SET sid = t.rownum
FROM (SELECT id, row_number() OVER (ORDER BY inserted_at ASC) AS rownum
      FROM test) t
WHERE test.id = t.id;
yields
+----+-----+------------------------+-----+
| id | foo | inserted_at | sid |
+----+-----+------------------------+-----+
| 3 | ABC | 2000-02-13 19:00:00-05 | 1 |
| 2 | DEF | 2010-02-13 19:00:00-05 | 2 |
| 1 | XYZ | 2019-02-13 19:00:00-05 | 3 |
+----+-----+------------------------+-----+
Finally, make sid an IDENTITY column. (SERIAL is not a real type, so ALTER COLUMN sid TYPE SERIAL is not possible; IDENTITY also avoids the sequence-ownership issues that can arise with SERIAL.)
ALTER TABLE test ALTER COLUMN sid SET NOT NULL;
ALTER TABLE test ALTER COLUMN sid ADD GENERATED BY DEFAULT AS IDENTITY;
-- advance the new sequence past the values assigned above:
SELECT setval(pg_get_serial_sequence('test', 'sid'), (SELECT max(sid) FROM test));

Data Type for select in plpgsql function and access its fields

I have the following tables in a Postgres 9.5 database:
product
Column | Type | Modifiers
----------------+-----------------------------+-----------------------------------------------------
id | integer | not null default nextval('product_id_seq'::regclass)
name | character varying(100) |
number_of_items | integer |
created_at | timestamp without time zone | default now()
updated_at | timestamp without time zone | default now()
total_number | integer |
provider_id | integer |
Indexes:
"pk_product" PRIMARY KEY, btree (id)
Foreign-key constraints:
"fk_product_provider" FOREIGN KEY (provider_id) REFERENCES provider(id)
And we also have
provider
Column | Type | Modifiers
-------------+------------------------+------------------------------
id | integer | not null default nextval('property_id_seq'::regclass)
name | text |
description | text |
created_at | timestamp without time zone | default now()
updated_at | timestamp without time zone | default now()
Indexes:
"pk_provider" PRIMARY KEY, btree (id)
I am implementing a plpgsql function which is supposed to find some specific products of a provider and loop through them:
products = select u_id, number_of_items from product
where provider_id = p_id and total_number > limit;
loop
//here I need to loop through the products
end loop;
Question
What data type should I declare for the products variable in order to store the queried rows, and how can I later access its columns, like id or number_of_items?
In PostgreSQL, creating a table also defines a composite data type with the same name as the table.
You can either use a variable of that type:
DECLARE
    p product;
BEGIN
    FOR p IN SELECT * FROM product WHERE ...
    LOOP
        [do something with "p.id" and "p.number_of_items"]
    END LOOP;
END;
Or you can use several variables for the individual fields you need (probably better):
DECLARE
    v_id integer;
    v_items integer;
BEGIN
    FOR v_id, v_items IN SELECT id, number_of_items FROM product WHERE ...
    LOOP
        [do something with "v_id" and "v_items"]
    END LOOP;
END;
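Putting the pieces together, a minimal sketch of the whole function (the function name and the parameter names p_id and p_limit are assumptions; limit itself cannot be used as a name because it is a reserved word):

```sql
CREATE OR REPLACE FUNCTION process_products(p_id integer, p_limit integer)
RETURNS void AS $$
DECLARE
    p product%ROWTYPE;   -- row variable typed after the product table
BEGIN
    FOR p IN
        SELECT * FROM product
        WHERE provider_id = p_id
          AND total_number > p_limit
    LOOP
        RAISE NOTICE 'product % has % items', p.id, p.number_of_items;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

product%ROWTYPE is equivalent to declaring p of type product; it just makes the dependency on the table's row shape explicit.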

postgres default values are applied on update, not just at create

In the process of writing this question, I found the answer and will post it below. If there are already duplicates of this question, please notify me and I will remove it, I was unable to find any.
I've got two columns tracking changes made to a table in postgres:
created_at timestamp default now()
updated_at timestamp
the updated_at column is being updated by a trigger:
united_states_congress=> \d congressional_bill_summaries;
Table "public.congressional_bill_summaries"
Column | Type | Modifiers
-------------+-----------------------------+---------------------------------------------------------------------------
id | bigint | not null default nextval('congressional_bill_summaries_id_seq'::regclass)
text | text |
created_at | timestamp without time zone | default now()
updated_at | timestamp without time zone |
bill_kid | integer | not null
date | date | not null
description | character varying(255) | not null
text_hash | uuid |
Indexes:
"congressional_bill_summaries_pkey" PRIMARY KEY, btree (id)
"congressional_bill_summaries_bill_kid_date_description_key" UNIQUE CONSTRAINT, btree (bill_kid, date, description)
Triggers:
hash_all_the_things BEFORE INSERT ON congressional_bill_summaries FOR EACH ROW EXECUTE PROCEDURE hash_this_foo()
update_iz_yoo BEFORE UPDATE ON congressional_bill_summaries FOR EACH ROW EXECUTE PROCEDURE update_iz_now()
as is one other column of the table, text_hash
My expected behavior is that when a row is first inserted, the created_at column is set to its default value (which I understand to be the time at which the current transaction began, not the time of the specific query).
My expected behavior is that when a row is updated, the updated_at column is set by this function:
CREATE OR REPLACE FUNCTION public.update_iz_now()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$function$
but that created_at will remain unchanged: a value is already in the column, so the default should not override it.
created_at is initially functioning correctly:
united_states_congress=> select created_at, updated_at from congressional_bill_actions limit 5;
created_at | updated_at
----------------------------+------------
2017-01-28 00:08:11.238773 |
2017-01-28 00:08:11.255533 |
2017-01-28 00:08:15.036168 |
2017-01-28 00:08:15.047991 |
2017-01-28 00:08:15.071715 |
(5 rows)
But then when a row is updated, created_at is being changed to match updated_at, leaving me with:
united_states_congress=> select created_at, updated_at from congressional_bill_actions where updated_at is not null limit 5;
created_at | updated_at
----------------------------+----------------------------
2017-01-28 07:55:34.078783 | 2017-01-28 07:55:34.078783
2017-02-01 18:47:50.673996 | 2017-02-01 18:47:50.673996
2017-02-02 14:50:33.066341 | 2017-02-02 14:50:33.066341
2017-02-02 14:50:33.083343 | 2017-02-02 14:50:33.083343
2017-02-03 13:58:34.950716 | 2017-02-03 13:58:34.950716
(5 rows)
I have been all over the internet trying to figure this one out, but the internet keeps helpfully routing me to questions about "how to create default values" and "how to make triggers."
This obviously must be a usage problem somewhere on my end, but I'm having trouble identifying it. Just in case, here is the other trigger being run on the table (on insert):
CREATE OR REPLACE FUNCTION public.hash_this_foo()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    NEW.text_hash = md5(NEW.text)::uuid;
    RETURN NEW;
END;
$function$
The problem was in my UPSERT handling, which pulled in the table's schema and dynamically generated queries containing lines like this:
ON CONFLICT ON CONSTRAINT congressional_bill_actions_bill_kid_date_action_actor_key DO UPDATE SET created_at = EXCLUDED.created_at,
Because the conflict clause explicitly set created_at to EXCLUDED.created_at (the freshly generated default from the failed INSERT), the existing value was overwritten: the default was applied on update precisely because I was instructing it to be.
So when writing UPSERT handlers, this is something to be aware of.
(Note: the way to avoid this is simply not to pull in any columns where column_default is not null.)
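Concretely, the fix is to list only the mutable columns in the DO UPDATE SET clause. A hypothetical sketch against the congressional_bill_summaries table above (the values are made up):

```sql
INSERT INTO congressional_bill_summaries (bill_kid, date, description, text)
VALUES (42, '2017-02-03', 'introduced', 'some summary text')
ON CONFLICT (bill_kid, date, description)
DO UPDATE SET text = EXCLUDED.text;
-- created_at is deliberately absent from the SET list, so the stored value survives;
-- updated_at is still maintained by the BEFORE UPDATE trigger
```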

postgresql trigger for filling new column on insert

I have a table Table_A:
\d "Table_A";
Table "public.Table_A"
Column | Type | Modifiers
----------+---------+-------------------------------------------------------------
id | integer | not null default nextval('"Table_A_id_seq"'::regclass)
field1 | bigint |
field2 | bigint |
and now I want to add a new column. So I run:
ALTER TABLE "Table_A" ADD COLUMN "newId" BIGINT DEFAULT NULL;
now I have:
\d "Table_A";
Table "public.Table_A"
Column | Type | Modifiers
----------+---------+-------------------------------------------------------------
id | integer | not null default nextval('"Table_A_id_seq"'::regclass)
field1 | bigint |
field2 | bigint |
newId | bigint |
And I want newId to be filled with the same value as id for new/updated rows.
I created the following function and trigger:
CREATE OR REPLACE FUNCTION autoFillNewId() RETURNS TRIGGER AS $$
BEGIN
    NEW."newId" := NEW."id";
    RETURN NEW;
END $$ LANGUAGE plpgsql;
CREATE TRIGGER "newIdAutoFill" AFTER INSERT OR UPDATE ON "Table_A" EXECUTE PROCEDURE autoFillNewId();
Now if I insert something with:
INSERT INTO "Table_A" values (97, 1, 97);
newId is not filled:
select * from "Table_A" where id = 97;
id | field1 | field2 | newId
----+----------+----------+-------
97 | 1 | 97 |
Note: I also tried with FOR EACH ROW, following another answer here on SO.
What am I missing?
You need a BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger to make this work:
CREATE TRIGGER "newIdAutoFill"
BEFORE INSERT OR UPDATE ON "Table_A"
FOR EACH ROW EXECUTE PROCEDURE autoFillNewId();
A BEFORE trigger runs before the new row is inserted or updated, so you can still make changes to the field values. An AFTER trigger is useful for side effects, like auditing changes or cascading them to other tables.
By default, triggers are FOR EACH STATEMENT, in which case the NEW variable is not defined (the trigger does not operate on a single row), so you have to specify FOR EACH ROW.
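For completeness, the corrected sequence against the question's table might look like this (dropping the old statement-level trigger first is an assumption about the current state):

```sql
DROP TRIGGER IF EXISTS "newIdAutoFill" ON "Table_A";

CREATE TRIGGER "newIdAutoFill"
    BEFORE INSERT OR UPDATE ON "Table_A"
    FOR EACH ROW EXECUTE PROCEDURE autoFillNewId();

INSERT INTO "Table_A" VALUES (98, 1, 98);
SELECT id, "newId" FROM "Table_A" WHERE id = 98;   -- "newId" is now 98
```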