In the process of writing this question, I found the answer and will post it below. If there are already duplicates of this question, please notify me and I will remove it, I was unable to find any.
I've got two columns tracking changes made to a table in postgres:
created_at timestamp default now()
updated_at timestamp
the updated_at column is being updated by a trigger:
united_states_congress=> \d congressional_bill_summaries;
Table "public.congressional_bill_summaries"
   Column    |            Type             |                                 Modifiers
-------------+-----------------------------+---------------------------------------------------------------------------
 id          | bigint                      | not null default nextval('congressional_bill_summaries_id_seq'::regclass)
 text        | text                        |
 created_at  | timestamp without time zone | default now()
 updated_at  | timestamp without time zone |
 bill_kid    | integer                     | not null
 date        | date                        | not null
 description | character varying(255)     | not null
 text_hash   | uuid                        |
Indexes:
"congressional_bill_summaries_pkey" PRIMARY KEY, btree (id)
"congressional_bill_summaries_bill_kid_date_description_key" UNIQUE CONSTRAINT, btree (bill_kid, date, description)
Triggers:
hash_all_the_things BEFORE INSERT ON congressional_bill_summaries FOR EACH ROW EXECUTE PROCEDURE hash_this_foo()
update_iz_yoo BEFORE UPDATE ON congressional_bill_summaries FOR EACH ROW EXECUTE PROCEDURE update_iz_now()
as is one other column of the table, text_hash
My expected behavior is that when a row is first inserted, the created_at column is set to its default value (which I understand to be the time at which the current transaction began, not the time of the specific query).
My expected behavior is that when a row is updated, the updated_at column will be set by this function:
CREATE OR REPLACE FUNCTION public.update_iz_now()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$function$
but that created_at will remain unchanged, because the column already holds a value, so the default should not overwrite it.
created_at is initially functioning correctly:
united_states_congress=> select created_at, updated_at from congressional_bill_actions limit 5;
created_at | updated_at
----------------------------+------------
2017-01-28 00:08:11.238773 |
2017-01-28 00:08:11.255533 |
2017-01-28 00:08:15.036168 |
2017-01-28 00:08:15.047991 |
2017-01-28 00:08:15.071715 |
(5 rows)
But when a row is updated, created_at is being changed to match the new updated_at value, leaving me with:
united_states_congress=> select created_at, updated_at from congressional_bill_actions where updated_at is not null limit 5;
created_at | updated_at
----------------------------+----------------------------
2017-01-28 07:55:34.078783 | 2017-01-28 07:55:34.078783
2017-02-01 18:47:50.673996 | 2017-02-01 18:47:50.673996
2017-02-02 14:50:33.066341 | 2017-02-02 14:50:33.066341
2017-02-02 14:50:33.083343 | 2017-02-02 14:50:33.083343
2017-02-03 13:58:34.950716 | 2017-02-03 13:58:34.950716
(5 rows)
I have been all over the internet trying to figure this one out, but the internet keeps helpfully routing me to questions about "how to create default values" and "how to make triggers."
This obviously must be a usage problem somewhere on my end, but I'm having trouble identifying it. Just in case, here is the other trigger being run on the table (on insert):
CREATE OR REPLACE FUNCTION public.hash_this_foo()
RETURNS trigger
LANGUAGE plpgsql
AS $function$
BEGIN
    NEW.text_hash = md5(NEW.text)::uuid;
    RETURN NEW;
END;
$function$
The problem here was my UPSERT handling, during which the schema of the table was being pulled in, resulting in the dynamic creation of queries that included lines like this:
ON CONFLICT ON CONSTRAINT congressional_bill_actions_bill_kid_date_action_actor_key DO UPDATE SET created_at = EXCLUDED.created_at,
Because created_at was being set to EXCLUDED.created_at, the default value from the proposed insert was overwriting the existing one, precisely because I was instructing it to do so.
So when writing UPSERT handlers, this is something to be aware of, it would seem.
(Note: the way to avoid this is simply not to pull in any columns where column_default is not null.)
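A minimal sketch of the fix against the congressional_bill_summaries table shown above (the VALUES are invented placeholders):

```sql
-- The DO UPDATE SET list deliberately omits created_at, so the existing
-- created_at value survives the conflict path; the update_iz_yoo trigger
-- still stamps updated_at on the UPDATE.
INSERT INTO congressional_bill_summaries (bill_kid, date, description, text)
VALUES (1, '2017-02-03', 'Introduced in House', 'Summary text goes here')
ON CONFLICT ON CONSTRAINT congressional_bill_summaries_bill_kid_date_description_key
DO UPDATE SET text = EXCLUDED.text;

-- For the dynamic query builder, the note above translates to selecting only
-- columns without a default when building the SET list:
SELECT column_name
FROM information_schema.columns
WHERE table_name = 'congressional_bill_summaries'
  AND column_default IS NULL;
```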
Related
In PostgreSQL I have a table already created with millions of rows. Currently I have the following table:
id | name | t1 | t2 | t3 |
--------------------------
I want to have
id | name | t1 | t2 | t3 | creation_time |
------------------------------------------
And creation_time should be a timestamp with time zone, added automatically whenever a new row is inserted.
How can I do this?
Also, since there are so many existing rows, I would like them to have NULL, and for new rows I want the current creation time.
You can either use a default value:
-- will default to NULL
ALTER TABLE mytab ADD creation_time timestamp with time zone;
ALTER TABLE mytab ALTER creation_time SET DEFAULT current_timestamp;
You don't set the default when you create the column, otherwise the default value will be applied to all existing rows.
The alternative is a trigger (which will not perform as well):
-- will default to NULL
ALTER TABLE mytab ADD creation_time timestamp with time zone;
CREATE FUNCTION set_creation_time() RETURNS trigger
LANGUAGE plpgsql AS
$$BEGIN
    NEW.creation_time = current_timestamp;
    RETURN NEW;
END;$$;
CREATE TRIGGER set_creation_time BEFORE INSERT ON mytab FOR EACH ROW
    EXECUTE PROCEDURE set_creation_time();
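A quick sanity check of the two-step ALTER approach, using a hypothetical minimal version of mytab (the id and name columns are assumptions):

```sql
-- Hypothetical table to illustrate; only creation_time comes from the answer.
CREATE TABLE mytab (id integer, name text);
INSERT INTO mytab VALUES (1, 'old row');  -- inserted before the change

ALTER TABLE mytab ADD creation_time timestamp with time zone;         -- old rows keep NULL
ALTER TABLE mytab ALTER creation_time SET DEFAULT current_timestamp;  -- new rows get stamped

INSERT INTO mytab (id, name) VALUES (2, 'new row');
SELECT id, name, creation_time FROM mytab;
-- row 1 has a NULL creation_time, row 2 has the time of its insert
```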
I have a function which receives some parameters, does a SELECT to see if a table row exists and returns FALSE if so. If not, it does an INSERT but currently always fails with a 'duplicate key'. Here's a pseudo-code version...
CREATE OR REPLACE FUNCTION bob_function (
    p_work_id INTEGER,
    p_location_id VARCHAR,
    p_area_id VARCHAR,
    p_scheduled_work_no INTEGER,
    p_start_date_time TIMESTAMPTZ,
    p_work_date_time TIMESTAMPTZ,
    p_user_id INTEGER,
    p_comments TEXT,
    p_work_stat_code CHAR(1)
)
RETURNS BOOLEAN AS $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM work_table
        WHERE location_id = p_location_id
          AND area_id = p_area_id
          AND work_id = p_work_id
          AND scheduled_work_no = p_scheduled_work_no
          AND start_date_time = p_start_date_time
          AND user_work_id = p_user_id
          AND work_date_time = p_work_date_time
    )
    THEN
        RAISE NOTICE 'Work already exists - SKIPPING';
        RETURN FALSE;
    END IF;
    INSERT INTO work_table (
        location_id,
        area_id,
        work_id,
        scheduled_work_no,
        start_date_time,
        user_work_id,
        work_date_time,
        stat_code,
        comment
    )
    VALUES (
        p_location_id,
        p_area_id,
        p_work_id,
        p_scheduled_work_no,
        p_start_date_time,
        p_user_id,
        p_work_date_time,
        v_work_stat_code,
        p_comments
    );
    RETURN TRUE;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;
The primary key is defined thus...
myDb=# \d task_work_pk
Index "schema1.task_work_pk"
      Column       |            Type             | Key? | Definition
-------------------+-----------------------------+------+-------------------
 location_id       | character varying(8)        | yes  | location_id
 area_id           | character varying(3)        | yes  | area_id
 work_id           | integer                     | yes  | work_id
 scheduled_work_no | integer                     | yes  | scheduled_work_no
 start_date_time   | timestamp(0) with time zone | yes  | start_date_time
 user_work_id      | integer                     | yes  | user_work_id
 work_date_time    | timestamp(0) with time zone | yes  | work_date_time
primary key, btree, for table "schema1.work_table"
Currently I get the following error every time I run this function...
ERROR: 23505: duplicate key value violates unique constraint "task_work_pk"
DETAIL: Key (location_id, area_id, work_id, scheduled_work_no, start_date_time, user_work_id, work_date_time)=(SITE_1, BOB, 218, 5, 2021-07-09 00:28:00+10, 1, 2021-07-09 21:00:15+10) already exists.
There are no rows whatsoever with work_id = 218, and this is the only place in the entire database where this table is written to. The function is called no more than once a minute, and I'm 99% sure I've not got any race condition.
EDIT: updated to remove errors
I'm ignoring your PLPGSQL code because it is not real code and has obvious flaws.
Given that no row with work_id = 218 pre-exists, the only way to cause that error is to insert the same record twice in a single transaction.
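A race-free alternative, sketched under the assumption that the parameter names from the question are correct: let the primary key arbitrate instead of a separate EXISTS check (the check-then-insert pattern can also fail under concurrency). The function body could be reduced to:

```sql
-- ON CONFLICT DO NOTHING silently skips the duplicate; in PL/pgSQL the
-- special variable FOUND reports whether the INSERT actually inserted a row.
INSERT INTO work_table (location_id, area_id, work_id, scheduled_work_no,
                        start_date_time, user_work_id, work_date_time,
                        stat_code, comment)
VALUES (p_location_id, p_area_id, p_work_id, p_scheduled_work_no,
        p_start_date_time, p_user_id, p_work_date_time,
        p_work_stat_code, p_comments)
ON CONFLICT DO NOTHING;
RETURN FOUND;  -- TRUE if inserted, FALSE if the key already existed
```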
Given the following setup:
CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE TABLE IF NOT EXISTS foo (
id TEXT DEFAULT gen_random_uuid () NOT NULL,
text TEXT NOT NULL,
is_latest BOOLEAN DEFAULT TRUE,
version INTEGER NOT NULL DEFAULT 0,
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE UNIQUE INDEX foo_id_idx ON foo (id, is_latest);
CREATE INDEX foo_updated_at_idx ON foo (updated_at);
CREATE INDEX foo_created_at_idx ON foo (created_at);
CREATE OR REPLACE FUNCTION foo_copy_row ()
RETURNS TRIGGER
AS $BODY$
BEGIN
    NEW.version = OLD.version + 1;
    NEW.is_latest = TRUE;
    NEW.updated_at = NOW();
    NEW.created_at = OLD.created_at;
    INSERT INTO foo (id, text, is_latest, version, updated_at, created_at)
    VALUES (OLD.id, OLD.text, NULL, OLD.version, OLD.updated_at, OLD.created_at);
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;
CREATE TRIGGER copy
BEFORE UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE foo_copy_row();
I am able to successfully version my data, and on every update atomically increment the version column.
My problem is that when I have high concurrency updates on the same row I am expecting ORDER BY id, version DESC and ORDER BY id, updated_at DESC to be identical but they are not.
This is how I update my rows:
INSERT INTO foo (text) VALUES ('hello')
RETURNING *;
UPDATE foo SET text = 'welcome'
WHERE id = 'some-uuid' AND is_latest = TRUE
RETURNING *;
And this is an example of result:
SELECT id, is_latest, version, updated_at, created_at FROM foo ORDER BY id, updated_at DESC;
id | is_latest | version | updated_at | created_at
-------------------------------------+-----------+---------+-------------------------------+-------------------------------
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | t | 4 | 2018-07-22 16:12:55.702035+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 2 | 2018-07-22 16:12:55.698144+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 1 | 2018-07-22 16:12:55.697429+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 3 | 2018-07-22 16:12:55.697157+00 | 2018-07-22 16:12:55.694725+00
4d2339ba-eb1f-4925-a4bc-753f2994bd5f | | 0 | 2018-07-22 16:12:55.694725+00 | 2018-07-22 16:12:55.694725+00
What's the missing piece?
Is the trigger part of the transaction and is the UPDATE lock retained until both BEFORE and AFTER are executed?
Is it possible to end up with two rows with same id and version number?
That is not surprising. now() returns the time when the transaction started. There is no guarantee that the transaction that starts first will be the first one to perform the trigger.
Use the version to determine the order of the updates.
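The distinction the answer relies on is easy to observe in psql: now() is frozen at transaction start, while clock_timestamp() tracks the wall clock:

```sql
-- Inside one transaction, now() never changes, so two concurrent
-- transactions can stamp updated_at in a different order than they
-- actually execute the trigger.
BEGIN;
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
SELECT pg_sleep(1);
SELECT now() AS txn_start, clock_timestamp() AS wall_clock;
-- txn_start is identical in both selects; wall_clock has advanced.
COMMIT;
```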
I have the following tables in a Postgres 9.5 database:
product
     Column      |            Type             |                      Modifiers
-----------------+-----------------------------+------------------------------------------------------
 id              | integer                     | not null default nextval('product_id_seq'::regclass)
 name            | character varying(100)      |
 number_of_items | integer                     |
 created_at      | timestamp without time zone | default now()
 updated_at      | timestamp without time zone | default now()
 total_number    | integer                     |
 provider_id     | integer                     |
Indexes:
"pk_product" PRIMARY KEY, btree (id)
Foreign-key constraints:
"fk_product_provider" FOREIGN KEY (provider_id) REFERENCES provider(id)
And we also have
provider
   Column    |            Type             |                      Modifiers
-------------+-----------------------------+-------------------------------------------------------
 id          | integer                     | not null default nextval('property_id_seq'::regclass)
 name        | text                        |
 description | text                        |
 created_at  | timestamp without time zone | default now()
 updated_at  | timestamp without time zone | default now()
Indexes:
"pk_provider" PRIMARY KEY, btree (id)
I am implementing a PL/pgSQL function which is supposed to find specific products of a provider and loop through them:
products = select u_id, number_of_items from product
where provider_id = p_id and total_number > limit;
loop
//here I need to loop through the products
end loop;
Question
What kind of data type should I declare for the products variable in order to store the queried products in it? And how can I later access its columns, like id or number_of_items?
In PostgreSQL, creating a table also defines a composite data type with the same name as the table.
You can either use a variable of that type:
DECLARE
    p product;
BEGIN
    FOR p IN SELECT * FROM product WHERE ...
    LOOP
        [do something with "p.id" and "p.number_of_items"]
    END LOOP;
END;
Or you can use several variables for the individual fields you need (probably better):
DECLARE
    v_id integer;
    v_number_of_items integer;
BEGIN
    FOR v_id, v_number_of_items IN SELECT id, number_of_items FROM product WHERE ...
    LOOP
        [do something with "v_id" and "v_number_of_items"]
    END LOOP;
END;
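Putting it together for the question's query, as a sketch: the function and parameter names are assumptions, and the limit value is renamed p_limit because LIMIT is a reserved word in SQL:

```sql
CREATE OR REPLACE FUNCTION process_products(p_id integer, p_limit integer)
RETURNS void
LANGUAGE plpgsql AS $$
DECLARE
    v_id integer;
    v_number_of_items integer;
BEGIN
    FOR v_id, v_number_of_items IN
        SELECT id, number_of_items
        FROM product
        WHERE provider_id = p_id AND total_number > p_limit
    LOOP
        -- do something with each product
        RAISE NOTICE 'product % has % items', v_id, v_number_of_items;
    END LOOP;
END;
$$;
```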
I have a table with a definition similar to the following (condensed for clarity):
CREATE TABLE fns(
id serial,
start_date timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_date timestamptz,
name text NOT NULL,
parent_id integer,
PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES fns(id),
UNIQUE(name)
);
When an UPDATE takes place I would like the row being 'updated' to have its end_date set to CURRENT_TIMESTAMP and to have a new row created (based on the old one) with its start_date set to CURRENT_TIMESTAMP. For example:
Before UPDATE
| id | start_date | end_date | name | parent_id |
|----|-------------------------|----------|-------|-----------|
| 1 | April, 01 2015 00:00:00 | (null) | fns_a | (null) |
Desired state after UPDATE
| id | start_date | end_date | name | parent_id |
|----|-------------------------|-------------------------|-------------|-----------|
| 1 | April, 01 2015 00:00:00 | April, 02 2015 00:00:00 | fns_a [old] | (null) |
| 2 | April, 02 2015 00:00:00 | (null) | fns_a | 1 |
I'm running into issues with the unique constraint for the name column. Here is the current state of my trigger:
CREATE OR REPLACE FUNCTION enforce_fns_immutability() RETURNS trigger AS $func$
BEGIN
    -- 'Turn off' old record.
    OLD.end_date = CURRENT_TIMESTAMP;
    OLD.name = OLD.name || ' [old]';
    -- Create the new record.
    INSERT INTO fns(start_date, name, parent_id)
    VALUES(CURRENT_TIMESTAMP, NEW.name, OLD.id); -- <-- unique violation
    RETURN OLD;
END
$func$ LANGUAGE plpgsql;
CREATE TRIGGER tg_fns_bi
BEFORE UPDATE ON fns
FOR EACH ROW
EXECUTE PROCEDURE enforce_fns_immutability();
As far as I understand it this is failing because the update to OLD.name has not yet happened as the containing transaction has not committed. I'm struggling to think of a way around it but it feels like there must be an elegant solution for this! Some solutions I've considered:
Temporary table (feels like this is too heavyweight for this use case).
Use of an AFTER UPDATE trigger (same issue as the transaction has obviously not yet been committed).
I'm using Postgres 9.4.1.
You can create the unique constraint as deferrable; in that case it will be checked when you commit your transaction, not when the INSERT is executed:
CREATE TABLE fns
(
id serial,
start_date timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
end_date timestamptz,
name text NOT NULL,
parent_id integer,
PRIMARY KEY (id),
FOREIGN KEY (parent_id) REFERENCES fns(id),
UNIQUE(name) deferrable initially deferred --<< here
);
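For the table as it already exists, the constraint can be swapped in place rather than recreating the table. fns_name_key is the name PostgreSQL generates by default for UNIQUE(name), but it is worth confirming with \d fns:

```sql
-- Assumed auto-generated constraint name; verify before running.
ALTER TABLE fns DROP CONSTRAINT fns_name_key;
ALTER TABLE fns ADD CONSTRAINT fns_name_key
    UNIQUE (name) DEFERRABLE INITIALLY DEFERRED;
```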