How to generate unique timestamps in PostgreSQL? - postgresql

My idea is to implement a basic «vector clock», where a timestamps are clock-based, always go forward and are guaranteed to be unique.
For example, in a simple table:
CREATE TABLE IF NOT EXISTS timestamps (
last_modified TIMESTAMP UNIQUE
);
I use a trigger to set the timestamp value before insertion. It basically just goes into the future when two inserts arrive at the same time:
CREATE OR REPLACE FUNCTION bump_timestamp()
RETURNS trigger AS $$
DECLARE
previous TIMESTAMP;
current TIMESTAMP;
BEGIN
previous := NULL;
SELECT last_modified INTO previous
FROM timestamps
ORDER BY last_modified DESC LIMIT 1;
current := clock_timestamp();
IF previous IS NOT NULL AND previous >= current THEN
current := previous + INTERVAL '1 milliseconds';
END IF;
NEW.last_modified := current;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tgr_timestamps_last_modified ON timestamps;
CREATE TRIGGER tgr_timestamps_last_modified
BEFORE INSERT OR UPDATE ON timestamps
FOR EACH ROW EXECUTE PROCEDURE bump_timestamp();
I then run a massive amount of insertions in two separate clients:
DO
$$
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO timestamps DEFAULT VALUES;
END LOOP;
END;
$$;
As expected, I get collisions:
ERROR: duplicate key value violates unique constraint "timestamps_last_modified_key"
État SQL :23505
Détail :Key (last_modified)=(2016-01-15 18:35:22.550367) already exists.
Contexte : SQL statement "INSERT INTO timestamps DEFAULT VALUES"
PL/pgSQL function inline_code_block line 4 at SQL statement
#rach suggested to mix current_clock() with a SEQUENCE object, but it would probably imply getting rid of the TIMESTAMP type. Even though I can't really figure out how it'd solve the isolation problem...
Is there a common pattern to avoid this?
Thank you for your insights :)

If you have only one Postgres server as you said, I think that using timestamp + sequence can solve the problem because sequence are non transactional and respect the insert order.
If you have db shard then it will be much more complex but maybe the distributed sequence of 2ndquadrant in BDR could help but I don't think that ordinality will be respected. I added some code below if you have setup to test it.
CREATE SEQUENCE "timestamps_seq";
-- Let's test first, how to generate id.
SELECT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0') as unique_id ;
unique_id
--------------------------------
145288519200000000000000000010
(1 row)
CREATE TABLE IF NOT EXISTS timestamps (
unique_id TEXT UNIQUE NOT NULL DEFAULT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0')
);
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
select * from timestamps;
unique_id
--------------------------------
145288556900000000000000000001
145288557000000000000000000002
145288557100000000000000000003
(3 rows)
Let me know if that works. I'm not a DBA so maybe it will be good to ask on dba.stackexchange.com too about the potential side effect.

My two cents (Inspired from http://tapoueh.org/blog/2013/03/15-batch-update).
try adding the following before massive amount of insertions:
LOCK TABLE timestamps IN SHARE MODE;
Official documentation is here: http://www.postgresql.org/docs/current/static/sql-lock.html

Related

Error when creating a generated column in Postgresql

CREATE TABLE Person (
id serial primary key,
accNum text UNIQUE GENERATED ALWAYS AS (
concat(right(cast(extract year from current_date) as text), 2), cast(id as text)) STORED
);
Error: generation expression is not immutable
The goal is to populate the accNum field with YYid where YY is the last two letters of the year when the person was added.
I also tried the '||' operator but it was unsuccessful.
As you don't expect the column to be updated, when the row is changed, you can define your own function that generates the number:
create function generate_acc_num(id int)
returns text
as
$$
select to_char(current_date, 'YY')||id::text;
$$
language sql
immutable; --<< this is lying to Postgres!
Note that you should never use this function for any other purpose. Especially not as an index expression.
Then you can use that in a generated column:
CREATE TABLE Person
(
id integer generated always as identity primary key,
acc_num text UNIQUE GENERATED ALWAYS AS (generate_acc_num(id)) STORED
);
As #ScottNeville correctly mentioned:
CURRENT_DATE is not immutable. So it cannot be used int a GENERATED ALWAYS AS expression.
However, you can achieve this using a trigger nevertheless:
demo:db<>fiddle
CREATE FUNCTION accnum_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.accNum := right(extract(year from current_date)::text, 2) || NEW.id::text;
RETURN NEW;
END
$$;
CREATE TRIGGER tr_accnum
BEFORE INSERT
ON person
FOR EACH ROW
EXECUTE PROCEDURE accnum_trigger_function();
As #a_horse_with_no_name mentioned correctly in the comments: You can simplify the expression to:
NEW.accNum := to_char(current_date, 'YY') || NEW.id;
I am not exactly sure how to solve this problem (maybe a trigger), but current_date is a stable function not an immutable one. For the generated IDs I believe all function calls must be immutable. You can read more here https://www.postgresql.org/docs/current/xfunc-volatility.html
I dont think any function that gets the date can be immutable as Postgres defines this as "An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever." This will not be true for anything that returns the current date.
I think your best bet would be to do this with a trigger so on insert it sets the value.

Create Trigger to get hourly difference between timestamps

I have a table where I would like to calculate the difference in time (in hours) between two columns after inserting a row. I would like to set up a trigger to do this whenever an insert or update is performed on the table.
My columns are delay_start, delay_stop, and delay_duration. I would like to do the following:
delay_duration = delay_stop - delay_start
The result should be of numeric (4,2) value and go into the delay_duration category. Below is what I have so far, but it will not populate the column for some reason.
BEGIN
INSERT INTO public.deckdelays(delay_duration)
VALUES(DATEDIFF(hh, delay_stop, delay_start));
RETURN NEW;
END;
I am quite new to all of this so if anyone could help I would greatly appreciate it!
If you have Postgres 12 or later you can define delay_duration as a generated column. This allows you to eliminate triggers.
create table deckdelays(id integer generated always as identity
, delay_start timestamp
, delay_stop timestamp
, delay_duration numeric(4,2)
generated always as
( extract(epoch from (delay_stop - delay_start))/3600 )
stored
--, other attributes
);
See demo here.
But if you insist on a trigger:
create or replace
function delayduration_func()
returns trigger
language plpgsql
as $$
begin
new.delay_duration = (extract(epoch from (deckdelays.delay_stop - deckdelays.delay_start))/3600)::numeric;
return new;
end;
$$;
create trigger delaydurationset1
before insert
or update of delay_stop, delay_start
on deckdelays
execute procedure delayduration_func();
Changes:
Before trigger instead of after. A before trigger can modify the
values in a column without additional DML statements, an after
trigger cannot. Issuing a DML statement on a table within a trigger
on that same table can lead to all types of problems. It is bast
avoided if possible.
Trigger name and function name not the same. Might just be me but I
do not like different things having the same name. Although it works
often leads to confusion. Always avoid confusion if possible.
Trigger fires on update of delay_start. An update of either delay_start or delay_end also updates delay_duration.

Efficiently looping through a VARIADIC in Postgres

I have the following table in Postgres
Which would typically be populated like below
id day visits passes
1 Monday {11,13,19} {13,17}
2 Tuesday {7,9} {11,13,19}
3 Wednesday {2,5,21} {21,27}
4 Thursday {3,11,39} {21,19}`
In order to get the visit or passes ids over a range of days I have written the following function
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT,VARIADIC ids int[]) RETURNS bigint[] AS
$$
DECLARE result bigint[];
DECLARE hold bigint[];
BEGIN
FOR i IN 1 .. array_upper(ids,1) LOOP
execute format('SELECT %I FROM days WHERE id = $1',p_column) USING ids[i] INTO hold;
result := unnest(result) UNION unnest(hold);
END LOOP;
RETURN result;
END;
$$
LANGUAGE 'plpgsql';
which works with a subsequent call to day_entries('visits',1,2,3) returning
{11,9,19,21,5,13,2,7}
While it does the job I am concerned that based on my one day old knowledge of writing Postgres functions I have worked in one or more inefficiences into the process. Can the function be made easier in some way?
The other issue is more a curiosity than a problem - the order of elements in the result appears to have no bearing to the order of visits entries in the three rows that are touched. Although this is not an issue as far as I am concerned I am curious to know why it happens.
You can do the unnesting and aggregating in a single statement, no need for a loop. And you can use the ANY operator with the array to select all matching rows.
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT, variadic p_ids int[])
RETURNS bigint[] AS
$$
DECLARE
result bigint[];
BEGIN
execute
format('SELECT array(select unnest(%I) from days WHERE id = any($1))', p_column)
USING p_ids -- pass the whole array as a parameter
INTO result;
RETURN result;
END;
$$
LANGUAGE plpgsql;
Not related to your questions, but I think you are going down the wrong road with that design. While arrays might look intriguing to beginners at the beginning, they should only be used rarely.
And if you find yourself unnesting and aggregating things back and forth, this is a strong indication that something could be improved.
I would split your table up in two tables, one that stores the "day" information and one that stores visits and passes in the same table with a column distinguishing the two. Then finding visits is as simple as adding a where ... = 'visit' rather than having to cope with (slow and error prone) dynamic SQL.
Without knowing more details, I would probably create the tables like this:
create table days
(
id integer not null primary key,
day character varying(9) not null
);
create table event
(
day_id integer not null references days,
event_id integer not null,
event_type varchar(10) not null check (event_type in ('visit', 'pass'))
);
event_id might even be a foreign to key to another table you haven't shown us - again something you can't really do with de-normalized tables.
Getting all visits for specific days, is then as simple as:_
select event_id
from event
where day_id in (1,2)
and event_type = 'visit';
Or if you do need that as an array:
select array_agg(event_id)
from event
where day_id in (1,2)
and event_type = 'visit';
Online example

How can I generate a unique string per record in a table in Postgres?

Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.
Ideally there would be a way for Postgres to handle both of these concerns:
generate the string
ensure its uniqueness
And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.
I don't claim the following is efficient, but it is how we have done this sort of thing in the past.
CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
new_uid text;
done bool;
BEGIN
done := false;
WHILE NOT done LOOP
new_uid := md5(''||now()::text||random()::text);
done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
END LOOP;
RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;
make_uid() can be used as the default for a column in my_table. Something like:
ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();
md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(...,'base64') except some of the characters used in base-64 are not URL friendly.
All existing answers are WRONG because they are based on SELECT while generating unique index per table record. Let us assume that we need unique code per record while inserting: Imagine two concurrent INSERTs are happening same time by miracle (which happens very often than you think) for both inserts same code was generated because at the moment of SELECT that code did not exist in table. One instance will INSERT and other will fail.
First let us create table with code field and add unique index
CREATE TABLE my_table
(
code TEXT NOT NULL
);
CREATE UNIQUE INDEX ON my_table (lower(code));
Then we should have function or procedure (you can use code inside for trigger also) where we 1. generate new code, 2. try to insert new record with new code and 3. if insert fails try again from step 1
CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
new_code TEXT;
BEGIN
LOOP
new_code := LOWER(SUBSTRING(MD5(''||NOW()::TEXT||RANDOM()::TEXT) FOR 8));
BEGIN
INSERT INTO my_table (code) VALUES (new_code);
EXIT;
EXCEPTION WHEN unique_violation THEN
END;
END LOOP;
END;
$$ LANGUAGE PLPGSQL;
This is guaranteed error free solution not like other solutions on this thread
Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.
For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.
For a 63 bits version based on bigint (9223372036854775808 distinct possible values), see this other answer.
You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).
The easiest way probably to use the sequence to guarantee uniqueness
(so after the seq add a fix x digit random number):
CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;
However this will waste a huge amount of records. (Note: the max bigInt is 9223372036854775807 if you use 8 digit random number at the end, you can only have 922337203 records. Thou 8 digit is probably not necessary. Also check the max number for your programming environment!)
Alternatively you can use varchar for the id and even convert the above number with to_hex() or change to base36 like below (but for base36, try to not expose it to customer, in order to avoid some funny string showing up!):
PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?
Check out a blog by Bruce. This gets you part way there. You will have to make sure it doesn't already exist. Maybe concat the primary key to it?
Generating Random Data Via Sql
"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0)
) AS y(x)
)
FROM generate_series(1,5) as a(b);
Use primary key in your data. If you really need alphanumeric unique string, you can use base-36 encoding. In PostgreSQL you can use this function.
Example:
select base36_encode(generate_series(1000000000,1000000010));
GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2

PostgreSQL, triggers, and concurrency to enforce a temporal key

I want to define a trigger in PostgreSQL to check that the inserted row, on a generic table, has the the property: "no other row exists with the same key in the same valid time" (the keys are sequenced keys). In fact, I has already implemented it. But since the trigger has to scan the entire table, now i'm wondering: is there a need for a table-level lock? Or this is managed someway by the PostgreSQL itself?
Here is an example.
In the upcoming PostgreSQL 9.0 I would have defined the table in this way:
CREATE TABLE medicinal_products
(
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
market_time PERIOD,
EXCLUDE USING gist
(aic_code CHECK WITH =,
market_time CHECK WITH &&)
);
but in fact I have been defined it like this:
CREATE TABLE medicinal_products
(
PRIMARY KEY (aic_code, vs),
aic_code CHAR(9), -- sequenced key
full_name VARCHAR(255),
vs DATE NOT NULL,
ve DATE,
CONSTRAINT valid_time_range
CHECK (ve > vs OR ve IS NULL)
);
Then, I have written a trigger that check the costraint: "two distinct medicinal products can have the same code in two different periods, but not in same time".
So the code:
INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
return an error.
One solution is to have a second table to use for detecting clashes, and populate that with a trigger. Using the schema you added into the question:
CREATE TABLE medicinal_product_date_map(
aic_code char(9) NOT NULL,
applicable_date date NOT NULL,
UNIQUE(aic_code, applicable_date));
(note: this is the second attempt due to misreading your requirement the first time round. hope it's right this time).
Some functions to maintain this table:
CREATE FUNCTION add_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
INSERT INTO medicinal_product_date_map
SELECT $1, $2 + offset
FROM generate_series(0, $3 - $2)
$$;
CREATE FUNCTION clr_medicinal_product_date_range(aic_code_in char(9), start_date date, end_date date)
RETURNS void STRICT VOLATILE LANGUAGE sql AS $$
DELETE FROM medicinal_product_date_map
WHERE aic_code = $1 AND applicable_date BETWEEN $2 AND $3
$$;
And populate the table first time with:
SELECT count(add_medicinal_product_date_range(aic_code, vs, ve))
FROM medicinal_products;
Now create triggers to populate the date map after changes to medicinal_products: after insert calls add_, after update calls clr_ (old values) and add_ (new values), after delete calls clr_.
CREATE FUNCTION sync_medicinal_product_date_map()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
IF TG_OP = 'UPDATE' OR TG_OP = 'DELETE' THEN
PERFORM clr_medicinal_product_date_range(OLD.aic_code, OLD.vs, OLD.ve);
END IF;
IF TG_OP = 'UPDATE' OR TG_OP = 'INSERT' THEN
PERFORM add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve);
END IF;
RETURN NULL;
END;
$$;
CREATE TRIGGER sync_date_map
AFTER INSERT OR UPDATE OR DELETE ON medicinal_products
FOR EACH ROW EXECUTE PROCEDURE sync_medicinal_product_date_map();
The uniqueness constraint on medicinal_product_date_map will trap any products being added with the same code on the same day:
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-01-01','2010-04-01');
INSERT 0 1
steve#steve#[local] =# INSERT INTO medicinal_products VALUES ('1','A','2010-03-01','2010-06-01');
ERROR: duplicate key value violates unique constraint "medicinal_product_date_map_aic_code_applicable_date_key"
DETAIL: Key (aic_code, applicable_date)=(1 , 2010-03-01) already exists.
CONTEXT: SQL function "add_medicinal_product_date_range" statement 1
SQL statement "SELECT add_medicinal_product_date_range(NEW.aic_code, NEW.vs, NEW.ve)"
PL/pgSQL function "sync_medicinal_product_date_map" line 6 at PERFORM
This depends on the values being checked for having a discrete space- which is why I asked about dates vs timestamps. Although timestamps are, technically, discrete since Postgresql only stores microsecond-resolution, adding an entry to the map table for every microsecond the product is applicable for is not practical.
Having said that, you could probably also get away with something better than a full-table scan to check for overlapping timestamp intervals, with some trickery on looking for only the first interval not after or not before... however, for easy discrete spaces I prefer this approach which IME can also be handy for other things too (e.g. reports that need to quickly find which products are applicable on a certain day).
I also like this approach because it feels right to leverage the database's uniqueness-constraint mechanism this way. Also, I feel it will be more reliable in the context of concurrent updates to the master table: without locking the table against concurrent updates, it would be possible for a validation trigger to see no conflict and allow inserts in two concurrent sessions, that are then seen to conflict when both transaction's effects are visible.
Just a thought, in case the valid time blocks could be coded with a number or something, creating a UNIQUE index on Id+TimeBlock would be blazingly fast and resolve all table lock problems.
It is managed by PostgreSQL itself. On a select it acquires an ACCESS_SHARE lock which means that you can query the table but do not perform updates.
A radical solution which might help you is to use a cache like ehcache or memcached to store the id/timeblock info and not use the postgresql at all. Many can be persisted so they would survive a server restart and they do not exhibit this locking behavior.
Why can't you use a UNIQUE constraint? Will be much faster (it's an index) and easier.