Atomic INSERT WHERE NOT EXISTS (Postgres 9.6.3) - postgresql

The following statement suffers from a race condition, which I can reliably demonstrate by executing it concurrently in very quick succession. Is there a way to remove this race condition from the statement below, or do I need to take a different approach altogether?
INSERT INTO scheduled_event_log ("key")
SELECT 'test'
WHERE NOT EXISTS(
SELECT AGE(now() at time zone 'utc', timestamp_utc)
FROM scheduled_event_log
WHERE "key" = 'test'
AND age(now() at time zone 'utc', timestamp_utc) < '6s' FOR UPDATE);
Table as follows:
CREATE TABLE scheduled_event_log (
"key" varchar(64) NOT NULL,
"timestamp_utc" timestamp without time zone default (now() at time zone 'utc')
);
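One possible direction, offered as a sketch rather than a verified fix: FOR UPDATE can only lock rows that already exist, so it cannot stop two sessions that both find no matching row. Serializing the check with a transaction-level advisory lock may close that gap. The lock key 815 is an arbitrary value picked for illustration, and the sketch assumes the default READ COMMITTED isolation level, so the INSERT takes a fresh snapshot after the lock has been granted.

```sql
BEGIN;

-- Assumption: 815 is an arbitrary, application-chosen key reserved for this check.
-- The call blocks while another transaction holds the same key.
SELECT pg_advisory_xact_lock(815);

-- Same check as in the question, without FOR UPDATE.
INSERT INTO scheduled_event_log ("key")
SELECT 'test'
WHERE NOT EXISTS (
   SELECT 1
   FROM scheduled_event_log
   WHERE "key" = 'test'
   AND age(now() at time zone 'utc', timestamp_utc) < '6s'
);

COMMIT;  -- the advisory lock is released automatically at transaction end
```

Every writer has to take the same lock for this to work. Note that now() is fixed at transaction start, so a session that waits on the lock for a long time evaluates the 6-second window against a slightly stale timestamp.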

Related

Weird now() time difference with Postgres triggers

In a Postgres 10.10 database, I have a table table1, and an AFTER INSERT trigger on table1 for table2:
CREATE TABLE table1 (
id SERIAL PRIMARY KEY,
-- other cols
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL
);
CREATE UNIQUE INDEX table1_pkey ON table1(id int4_ops);
CREATE TABLE table2 (
id SERIAL PRIMARY KEY,
table1_id integer NOT NULL REFERENCES table1(id) ON UPDATE CASCADE,
-- other cols (not used in query)
created_at timestamp with time zone NOT NULL,
updated_at timestamp with time zone NOT NULL
);
CREATE UNIQUE INDEX table2_pkey ON table2(id int4_ops);
This query is executed on application start:
CREATE OR REPLACE FUNCTION after_insert_table1()
RETURNS trigger AS
$$
BEGIN
INSERT INTO table2 (table1_id, ..., created_at, updated_at)
VALUES (NEW.id, ..., 'now', 'now');
RETURN NEW;
END;
$$
LANGUAGE 'plpgsql';
DROP TRIGGER IF EXISTS after_insert_table1 ON "table1";
CREATE TRIGGER after_insert_table1
AFTER INSERT ON "table1"
FOR EACH ROW
EXECUTE PROCEDURE after_insert_table1();
I noticed some created_at and updated_at values on table2 are different to table1. In fact, table2 has mostly older values.
Here are 10 sequential entries, which show the difference jumping around a huge amount within a few minutes:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|2000 |2019-11-07 22:29:47.245+00|2019-11-07 19:51:09.727021+00|-02:38:37.517979|
|2001 |2019-11-07 22:30:02.256+00|2019-11-07 13:18:29.45962+00 |-09:11:32.79638 |
|2002 |2019-11-07 22:30:43.021+00|2019-11-07 13:44:12.099577+00|-08:46:30.921423|
|2003 |2019-11-07 22:31:00.794+00|2019-11-07 19:51:09.727021+00|-02:39:51.066979|
|2004 |2019-11-07 22:31:11.315+00|2019-11-07 13:18:29.45962+00 |-09:12:41.85538 |
|2005 |2019-11-07 22:31:27.234+00|2019-11-07 13:44:12.099577+00|-08:47:15.134423|
|2006 |2019-11-07 22:31:47.436+00|2019-11-07 13:18:29.45962+00 |-09:13:17.97638 |
|2007 |2019-11-07 22:33:19.484+00|2019-11-07 17:22:48.129063+00|-05:10:31.354937|
|2008 |2019-11-07 22:33:51.607+00|2019-11-07 19:51:09.727021+00|-02:42:41.879979|
|2009 |2019-11-07 22:34:28.786+00|2019-11-07 13:18:29.45962+00 |-09:15:59.32638 |
|2010 |2019-11-07 22:36:50.242+00|2019-11-07 13:18:29.45962+00 |-09:18:20.78238 |
Sequential entries have similar differences (mostly negative or mostly positive) and similar orders of magnitude (mostly minutes or mostly hours) within the sequence, though there are exceptions.
Here are the top 5 largest positive differences:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|1630 |2019-10-25 21:12:14.971+00|2019-10-26 00:52:09.376+00 |03:39:54.405 |
|950 |2019-09-16 12:36:07.185+00|2019-09-16 14:07:35.504+00 |01:31:28.319 |
|1677 |2019-10-26 22:19:12.087+00|2019-10-26 23:38:34.102+00 |01:19:22.015 |
|58 |2018-12-08 20:11:20.306+00|2018-12-08 21:06:42.246+00 |00:55:21.94 |
|171 |2018-12-17 22:24:57.691+00|2018-12-17 23:16:05.992+00 |00:51:08.301 |
Here are the top 5 largest negative differences:
|table1_id|table1_created |table2_created |diff |
|---------|--------------------------|-----------------------------|----------------|
|1427 |2019-10-15 16:03:43.641+00|2019-10-14 17:59:41.57749+00 |-22:04:02.06351 |
|1426 |2019-10-15 13:26:07.314+00|2019-10-14 18:00:50.930513+00|-19:25:16.383487|
|1424 |2019-10-15 13:13:44.092+00|2019-10-14 18:00:50.930513+00|-19:12:53.161487|
|4416 |2020-01-11 00:15:03.751+00|2020-01-10 08:43:19.668399+00|-15:31:44.082601|
|4420 |2020-01-11 01:58:32.541+00|2020-01-10 11:04:19.288023+00|-14:54:13.252977|
Negative differences outnumber positive differences 10x. The database timezone is UTC.
table2.table1_id is a foreign key, so it should be impossible to insert before insert on table1 completes.
table1.created_at is set by Sequelize, using option timestamps: true on the model.
When a row is inserted into table1, it's done inside a transaction. From the documentation I can find, triggers are executed inside the same transaction, so I can't think of a reason for this.
I can fix the issue by changing my trigger to use NEW.created_at instead of 'now', but I'm curious if anyone has any idea what the cause of this bug is?
Here is the query used to produce the above difference tables:
SELECT
table1.id AS table1_id,
table1.created_at AS table1_created,
table2.created_at AS table2_created,
(table2.created_at - table1.created_at) AS diff
FROM table1
INNER JOIN table2 ON
table2.table1_id = table1.id AND (
(table2.created_at - table1.created_at) > '2 min' OR
(table1.created_at - table2.created_at) > '2 min')
ORDER BY diff;
While 'now' is not a plain string, it is also not a function in this context, but a special date/time input. The manual:
... simply notational shorthands that will be converted to ordinary date/time values when read. (In particular, now and related strings are converted to a specific time value as soon as they are read.)
The body of a PL/pgSQL function is stored as a string. Each nested SQL command is parsed and prepared the first time control reaches it in a session. The manual:
The PL/pgSQL interpreter parses the function's source text and
produces an internal binary instruction tree the first time the
function is called (within each session). The instruction tree fully
translates the PL/pgSQL statement structure, but individual SQL
expressions and SQL commands used in the function are not translated
immediately.
As each expression and SQL command is first executed in the function,
the PL/pgSQL interpreter parses and analyzes the command to create a
prepared statement, using the SPI manager's SPI_prepare function.
Subsequent visits to that expression or command reuse the prepared statement.
There is more. Read on. But that's enough for our case:
The first time the trigger is executed per session, 'now' is translated to the current timestamp (the transaction timestamp). While doing more inserts in that same transaction, there won't be any difference to transaction_timestamp() because that is stable within a transaction by design.
But every subsequent transaction in the same session will insert the same, constant timestamp in table2, while values for table1 may be anything (not sure what Sequelize does there). If new values in table1 are the then current timestamp, that results in a "negative" diff in your test. (Timestamps in table2 will be older.)
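The same effect can be reproduced outside of a trigger with plain prepared statements. This is only a sketch for a psql session in autocommit mode. The statement names frozen_ts and live_ts are made up for the demonstration.

```sql
-- 'now' is converted to a constant when the statement is parsed.
PREPARE frozen_ts AS SELECT 'now'::timestamptz AS ts;

-- now() is a function call, evaluated at every execution.
PREPARE live_ts AS SELECT now() AS ts;

EXECUTE frozen_ts;
EXECUTE live_ts;

SELECT pg_sleep(2);

EXECUTE frozen_ts;  -- expected to repeat the timestamp from PREPARE time
EXECUTE live_ts;    -- expected to be about 2 seconds later than before

DEALLOCATE frozen_ts;
DEALLOCATE live_ts;
```

The SQL statement inside the trigger function behaves like frozen_ts: it is prepared once per session and keeps its constant until the session ends or the plan is invalidated.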
Solution
Situations where you actually want 'now' are few and far between. Typically, you want the function now() (without single quotes!) - which is equivalent to CURRENT_TIMESTAMP (standard SQL) and transaction_timestamp(). Related (recommended reading!):
Difference between now() and current_timestamp
In your particular case I suggest column defaults instead of doing additional work in triggers. If you set the same default now() in table1 and table2, you also eliminate any nonsense the INSERT to table1 might add. And you never have to even mention these columns in inserts any more:
CREATE TABLE table1 (
id SERIAL PRIMARY KEY,
-- other cols
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now() -- or leave this one NULL?
);
CREATE TABLE table2 (
id SERIAL PRIMARY KEY,
table1_id integer NOT NULL REFERENCES table1(id) ON UPDATE CASCADE,
-- other cols (not used in query)
created_at timestamptz NOT NULL DEFAULT now(), -- not 'now'!
updated_at timestamptz NOT NULL DEFAULT now() -- or leave this one NULL?
);
CREATE OR REPLACE FUNCTION after_insert_table1()
RETURNS trigger LANGUAGE plpgsql AS
$$
BEGIN
INSERT INTO table2 (table1_id) -- more columns? but not: created_at, updated_at
VALUES (NEW.id); -- more columns?
RETURN NULL; -- can be NULL for AFTER trigger
END
$$;

Postgres remove duplicates (multiple columns) in order to add unique constraint

I have a table:
CREATE TABLE public.assignment (
id integer NOT NULL,
dining_table_id integer NOT NULL,
guest_group_id integer NOT NULL,
start_timestamp timestamp without time zone DEFAULT '1999-01-01 00:00:00'::timestamp without time zone NOT NULL,
end_timestamp timestamp without time zone DEFAULT '1999-01-02 00:00:00'::timestamp without time zone NOT NULL,
assignment_related_id text
);
When I add a unique constraint:
ALTER TABLE assignment ADD CONSTRAINT unique_assignment UNIQUE (dining_table_id, guest_group_id, start_timestamp, end_timestamp);
I get:
ERROR: could not create unique index "unique_assignment"
DETAIL: Key (dining_table_id, guest_group_id, start_timestamp, end_timestamp)=(1433, 101476, 2019-07-16 18:30:00, 2019-07-16 20:30:00) is duplicated.
So how can I delete all duplicates, that is, rows which have the same values in the columns concerned?
DELETE FROM assignment
WHERE id IN (SELECT id
FROM (SELECT id,
ROW_NUMBER() OVER (partition BY dining_table_id, guest_group_id, start_timestamp, end_timestamp ORDER BY id) AS rnum
FROM assignment) t
WHERE t.rnum > 1);
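A sketch of how the cleanup and the constraint could be combined in one transaction, so that no new duplicates can be committed in between. It assumes id is unique (the table definition above only shows NOT NULL) and that keeping the row with the smallest id per combination is acceptable.

```sql
BEGIN;

-- Optional check: list key combinations that occur more than once.
SELECT dining_table_id, guest_group_id, start_timestamp, end_timestamp, count(*) AS copies
FROM assignment
GROUP BY dining_table_id, guest_group_id, start_timestamp, end_timestamp
HAVING count(*) > 1;

-- Keep the row with the smallest id per combination, delete the rest.
DELETE FROM assignment
WHERE id IN (SELECT id
             FROM (SELECT id,
                          ROW_NUMBER() OVER (PARTITION BY dining_table_id, guest_group_id, start_timestamp, end_timestamp ORDER BY id) AS rnum
                   FROM assignment) t
             WHERE t.rnum > 1);

-- Fails, and rolls back the DELETE with it, if duplicates remain.
ALTER TABLE assignment ADD CONSTRAINT unique_assignment UNIQUE (dining_table_id, guest_group_id, start_timestamp, end_timestamp);

COMMIT;
```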

Postgresql 8.4 update query syntax error in plpgsql function

I am using PostgreSQL 8.4 and creating a plpgsql function. In the body of this function I have a query to update records.
...
UPDATE device_syncfiles SET
state_code = 1, updated_at = NOW() at time zone 'UTC'
WHERE
((state_code = 2 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > 3600) OR
(state_code = 3 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > 3600));
...
When I load this function into the database, I get a syntax error:
ERROR: syntax error at or near "$1"
LINE 1: UPDATE device_syncfiles SET $1 = 1, $2 = NOW() at time z...
^
QUERY: UPDATE device_syncfiles SET $1 = 1, $2 = NOW() at time zone 'UTC' WHERE (( $1 = 2 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - $2 ::timestamp without time zone)) > $3 ) OR ( $1 = 3 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - $2 ::timestamp without time zone)) > $4 ))
CONTEXT: SQL statement in PL/PgSQL function "syncfile_get" near line 19
I cannot find any problem with this query. What's wrong here?
UPDATE: (missing information)
Table: device_syncfiles
id PK integer auto inc
user_id integer FK
file_name character varying(255) NOT NULL,
state_code integer NOT NULL FK,
md5 character varying(255) NOT NULL,
msg character varying(255),
created_at timestamp without time zone,
updated_at timestamp without time zone
Function: syncfile_get()
CREATE OR REPLACE FUNCTION syncfile_get()
RETURNS TABLE(id integer, user_id integer, file_name character varying, state_code integer, md5 character varying, created_at timestamp without time zone, updated_at timestamp without time zone) AS
$BODY$
DECLARE
_device_syncfile_id integer;
_download_timeout integer;
_processing_timeout integer;
BEGIN
-- GET all timeout info
SELECT state_timeout INTO _download_timeout FROM device_syncfile_states
WHERE state_name = 'downloading';
SELECT state_timeout INTO _processing_timeout FROM device_syncfile_states
WHERE state_name = 'processing';
-- GET syncfile id
_device_syncfile_id = NULL;
-- Reset timed out files to idle state
UPDATE device_syncfiles SET
state_code = 1, updated_at = NOW() at time zone 'UTC'
WHERE
((state_code = 2 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > _download_timeout) OR
(state_code = 3 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > _processing_timeout));
-- GET the id of one idle/timed out file => result could be an integer or NULL
SELECT device_syncfiles.id INTO _device_syncfile_id FROM device_syncfiles
WHERE
device_syncfiles.state_code = 1 OR
(device_syncfiles.state_code = 2 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - device_syncfiles.updated_at::timestamp without time zone)) > _download_timeout) OR
(device_syncfiles.state_code = 3 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - device_syncfiles.updated_at::timestamp without time zone)) > _processing_timeout)
LIMIT 1;
-- WHEN NULL skip state update and return empty set of record
-- Otherwise return the set of record with the id found in last step
IF _device_syncfile_id IS NOT NULL THEN
PERFORM syncfile_update(_device_syncfile_id, 2, NULL);
END IF;
RETURN QUERY SELECT
device_syncfiles.id,
device_syncfiles.user_id ,
device_syncfiles.file_name ,
device_syncfiles.state_code ,
device_syncfiles.md5 ,
device_syncfiles.created_at ,
device_syncfiles.updated_at
FROM device_syncfiles WHERE device_syncfiles.id = _device_syncfile_id;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
Many problems.
0.
I am using Postgresql 8.4
Postgres 8.4 reached EOL in July 2014. Consider upgrading to a current version. Urgently.
1.
Your question does not disclose the complete function (at least header and footer) nor any table definition and some sample data to help us help you.
I have to make assumptions, and my educated guess is that you have a function parameter named state_code, which conflicts with the identical column name. In three places. Basics:
Postgresql - INSERT RETURNING INTO ambiguous column reference
How to return result of a SELECT inside a function in PostgreSQL?
You must be aware that all fields declared in a RETURNS TABLE clause are effectively OUT parameters as well. (As stated in the first sentence of the first link.) So your question update confirmed my assumptions.
2.
Your error message reports the first of those instances here:
UPDATE device_syncfiles SET
state_code = 1 ...
That's a consequence of 0. You are tripping over your long dead and forgotten version of Postgres, where the superficial syntax check at function creation time used to detect a naming conflict between target columns of UPDATE statements and function parameters. That check was silly and was later removed: those target columns cannot conflict with function parameters on principle.
Your error reproduced in Postgres 8.4: dbfiddle here
The same does not happen in Postgres 9.4: dbfiddle here
To fix, best rename the function parameter to avoid conflicts. Related:
Postgres function returning a row as JSON value
3.
There are two more instances:
WHERE
((state_code = 2 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > 3600) OR
(state_code = 3 AND EXTRACT(EPOCH FROM (NOW() at time zone 'UTC' - updated_at::timestamp without time zone)) > 3600));
Would need a fix in any version. Postgres cannot tell whether to resolve to the function parameter or the table column. Best table-qualify all columns to avoid any possible conflicts with parameter names a priori. (Except for UPDATE target columns, which do not need nor allow table qualification.)
4.
That's still lipstick on a pig. Improve the query like this:
UPDATE device_syncfiles d
SET state_code = 1
, updated_at = NOW() AT TIME ZONE 'UTC'
WHERE d.state_code IN (2, 3)
AND d.updated_at < (now() - interval '1 hour') AT TIME ZONE 'UTC';
Shorter, faster, can use an index on updated_at.
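If the two timeouts really differ, the same idea might be sketched with the per-state values from the question's function. This assumes state_timeout is stored in seconds (the original compares it against EXTRACT(EPOCH ...)). The function name reset_timed_out_syncfiles is hypothetical and only there to make the fragment self-contained. Since it returns void, it has no OUT parameters that could collide with column names.

```sql
CREATE OR REPLACE FUNCTION reset_timed_out_syncfiles()  -- hypothetical helper
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
   _download_timeout   integer;  -- assumed to be seconds
   _processing_timeout integer;  -- assumed to be seconds
BEGIN
   SELECT s.state_timeout INTO _download_timeout
   FROM   device_syncfile_states s
   WHERE  s.state_name = 'downloading';

   SELECT s.state_timeout INTO _processing_timeout
   FROM   device_syncfile_states s
   WHERE  s.state_name = 'processing';

   -- Columns in WHERE are table-qualified; UPDATE target columns are not.
   UPDATE device_syncfiles d
   SET    state_code = 1
        , updated_at = now() AT TIME ZONE 'UTC'
   WHERE  (d.state_code = 2
           AND d.updated_at < (now() AT TIME ZONE 'UTC') - _download_timeout * interval '1 second')
      OR  (d.state_code = 3
           AND d.updated_at < (now() AT TIME ZONE 'UTC') - _processing_timeout * interval '1 second');
END
$func$;
```

If a timeout row is missing, the variable stays NULL, the comparison evaluates to NULL and no rows are reset for that state.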
5.
Finally consider using timestamp with time zone instead of timestamp without time zone to begin with:
Ignoring timezones altogether in Rails and PostgreSQL

postgresql - insert result of query SELECT EXTRACT into another table

I have the following table in postgresql (table1):
Var1,
var2,
var3,
timestamp1 timestamp without time zone NOT NULL,
timestamp2 timestamp without time zone NOT NULL,
diff double precision,
The column diff is empty.
I calculate the variable diff by the following code:
SELECT EXTRACT(EPOCH FROM ((timestamp1 - timestamp2)/1800))
I want to insert the result of this operation into the column diff of table1.
I wrote the following code, but it does not work:
CREATE TEMPORARY TABLE temptablename AS
SELECT EXTRACT(EPOCH FROM ((timestamp1 - timestamp2)/1800)) AS diff2 from table1;
INSERT INTO table1 (diff) SELECT diff2 FROM temptablename;
ERROR: null value in column "" violates not-null constraint
DETAIL: Failing row contains (null, null, null, null, null,83).
Assuming your arithmetic is right, it sounds like you just need an update statement.
update table1
set diff = extract(epoch from ((timestamp1 - timestamp2)/1800))
where diff is null;
The WHERE clause isn't strictly necessary, since you already know that column is empty. But it guards against overwriting values the second time you run that statement.

Extracting the number of days from a calculated interval

I am trying to get a query like the following one to work:
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
In the referenced table, to_date and from_date are of type timestamp without time zone. A regular query like
SELECT to_date - from_date FROM histories;
Gives me interval results such as '65 days 04:58:09.99'. But using this expression inside the first query gives me an error: invalid input syntax for type interval. I've tried various quotations and even nesting the query without luck. Can this be done?
SELECT EXTRACT(DAY FROM INTERVAL to_date - from_date) FROM histories;
This makes no sense. INTERVAL xxx is syntax for interval literals. So INTERVAL from_date is a syntax error, since from_date isn't a literal. If your code really looks more like INTERVAL '2012-02-01' then that's going to fail, because 2012-02-01 is not valid syntax for an INTERVAL.
The INTERVAL keyword here is just noise. I suspect you misunderstood an example from the documentation. Remove it and the expression will be fine.
I'm guessing you're trying to get the number of days between two dates represented as timestamp or timestamptz.
If so, either cast both to date:
SELECT to_date::date - from_date::date FROM histories;
or get the interval, then extract the day component:
SELECT extract(day from to_date - from_date) FROM histories;
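The two variants do not always agree, because the first counts calendar dates while the second only counts full 24-hour periods in the interval. A small sketch with made-up literal timestamps instead of the histories table:

```sql
-- Hypothetical sample values: from 2012-01-01 12:00 to 2012-03-05 10:00.
SELECT '2012-03-05 10:00'::timestamp::date
     - '2012-01-01 12:00'::timestamp::date AS days_by_date      -- 64
     , extract(day FROM '2012-03-05 10:00'::timestamp
                      - '2012-01-01 12:00'::timestamp) AS days_by_interval;  -- 63 (interval is 63 days 22:00:00)
```

Which one is right depends on whether a partial day should count.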
This example demonstrates creating a table with a trigger that stores the difference between stop_time and start_time in DDD HH24:MI:SS format, where DDD stands for the number of days ...
DROP TABLE IF EXISTS benchmarks ;
SELECT 'create the "benchmarks" table'
;
CREATE TABLE benchmarks (
guid UUID NOT NULL DEFAULT gen_random_uuid()
, id bigint UNIQUE NOT NULL DEFAULT cast (to_char(current_timestamp, 'YYMMDDHH12MISS') as bigint)
, git_hash char (8) NULL DEFAULT 'hash...'
, start_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, stop_time timestamp NOT NULL DEFAULT DATE_TRUNC('second', NOW())
, diff_time varchar (20) NOT NULL DEFAULT 'HH:MI:SS'
, update_time timestamp DEFAULT DATE_TRUNC('second', NOW())
, CONSTRAINT pk_benchmarks_guid PRIMARY KEY (guid)
) WITH (
OIDS=FALSE
);
create unique index idx_uniq_benchmarks_id on benchmarks (id);
-- START trigger trg_benchmarks_upsrt_diff_time
-- hrt = human readable time
CREATE OR REPLACE FUNCTION fnc_benchmarks_upsrt_diff_time()
RETURNS TRIGGER
AS $$
BEGIN
-- NEW.diff_time = age(NEW.stop_time::timestamp-NEW.start_time::timestamp);
NEW.diff_time = to_char(NEW.stop_time-NEW.start_time, 'DDD HH24:MI:SS');
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_benchmarks_upsrt_diff_time
BEFORE INSERT OR UPDATE ON benchmarks
FOR EACH ROW EXECUTE PROCEDURE fnc_benchmarks_upsrt_diff_time();
--
-- STOP trigger trg_benchmarks_upsrt_diff_time
Just remove the keyword INTERVAL:
SELECT EXTRACT(DAY FROM to_date - from_date) FROM histories;