I have the following table in Postgres, which would typically be populated as below:
id day visits passes
1 Monday {11,13,19} {13,17}
2 Tuesday {7,9} {11,13,19}
3 Wednesday {2,5,21} {21,27}
4 Thursday {3,11,39} {21,19}
To get the visit or pass ids over a range of days, I have written the following function:
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT, VARIADIC ids int[]) RETURNS bigint[] AS
$$
DECLARE
    result bigint[];
    hold bigint[];
BEGIN
    FOR i IN 1 .. array_upper(ids, 1) LOOP
        EXECUTE format('SELECT %I FROM days WHERE id = $1', p_column) USING ids[i] INTO hold;
        result := array(SELECT unnest(result) UNION SELECT unnest(hold));
    END LOOP;
    RETURN result;
END;
$$
LANGUAGE plpgsql;
which works with a subsequent call to day_entries('visits',1,2,3) returning
{11,9,19,21,5,13,2,7}
While it does the job I am concerned that based on my one day old knowledge of writing Postgres functions I have worked in one or more inefficiences into the process. Can the function be made easier in some way?
The other issue is more a curiosity than a problem - the order of elements in the result appears to have no bearing to the order of visits entries in the three rows that are touched. Although this is not an issue as far as I am concerned I am curious to know why it happens.
You can do the unnesting and aggregating in a single statement, no need for a loop. And you can use the ANY operator with the array to select all matching rows.
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT, VARIADIC p_ids int[])
RETURNS bigint[] AS
$$
DECLARE
    result bigint[];
BEGIN
    EXECUTE
        format('SELECT array(SELECT unnest(%I) FROM days WHERE id = ANY($1))', p_column)
        USING p_ids -- pass the whole array as a parameter
        INTO result;
    RETURN result;
END;
$$
LANGUAGE plpgsql;
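The signature is unchanged, so the original call works as before:

SELECT day_entries('visits', 1, 2, 3);

Note that, unlike the UNION in your loop, this version does not remove duplicate ids.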
Not related to your questions, but I think you are going down the wrong road with that design. While arrays might look appealing at first, they should only be used rarely.
And if you find yourself unnesting and aggregating things back and forth, this is a strong indication that something could be improved.
I would split your table into two tables: one that stores the "day" information, and one that stores visits and passes together, with a column distinguishing the two. Then finding visits is as simple as adding where event_type = 'visit', rather than having to cope with (slow and error-prone) dynamic SQL.
Without knowing more details, I would probably create the tables like this:
create table days
(
id integer not null primary key,
day character varying(9) not null
);
create table event
(
day_id integer not null references days,
event_id integer not null,
event_type varchar(10) not null check (event_type in ('visit', 'pass'))
);
event_id might even be a foreign key to another table you haven't shown us - again, something you can't really do with de-normalized tables.
Getting all visits for specific days is then as simple as:
select event_id
from event
where day_id in (1,2)
and event_type = 'visit';
Or if you do need that as an array:
select array_agg(event_id)
from event
where day_id in (1,2)
and event_type = 'visit';
CREATE TABLE Person (
    id serial primary key,
    accNum text UNIQUE GENERATED ALWAYS AS (
        concat(right(extract(year from current_date)::text, 2), id::text)
    ) STORED
);
Error: generation expression is not immutable
The goal is to populate the accNum field with YYid, where YY is the last two digits of the year when the person was added.
I also tried the '||' operator but it was unsuccessful.
As you don't expect the column to be updated when the row is changed, you can define your own function that generates the number:
create function generate_acc_num(id int)
returns text
as
$$
select to_char(current_date, 'YY')||id::text;
$$
language sql
immutable; --<< this is lying to Postgres!
Note that you should never use this function for any other purpose. Especially not as an index expression.
Then you can use that in a generated column:
CREATE TABLE Person
(
id integer generated always as identity primary key,
acc_num text UNIQUE GENERATED ALWAYS AS (generate_acc_num(id)) STORED
);
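For illustration, a hypothetical insert in 2024 would then produce something like (the values here are assumptions):

INSERT INTO Person DEFAULT VALUES;
SELECT acc_num FROM Person;  -- e.g. '241' (year 24, id 1)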
As @ScottNeville correctly mentioned:
CURRENT_DATE is not immutable, so it cannot be used in a GENERATED ALWAYS AS expression.
However, you can achieve this using a trigger:
demo:db<>fiddle
CREATE FUNCTION accnum_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.accNum := right(extract(year from current_date)::text, 2) || NEW.id::text;
RETURN NEW;
END
$$;
CREATE TRIGGER tr_accnum
BEFORE INSERT
ON person
FOR EACH ROW
EXECUTE PROCEDURE accnum_trigger_function();
As @a_horse_with_no_name correctly mentioned in the comments, you can simplify the expression to:
NEW.accNum := to_char(current_date, 'YY') || NEW.id;
I am not exactly sure how to solve this problem (maybe a trigger), but current_date is a stable function, not an immutable one. For generated columns, all function calls must be immutable. You can read more here: https://www.postgresql.org/docs/current/xfunc-volatility.html
I don't think any function that gets the date can be immutable, as Postgres defines it thus: "An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever." This will not be true for anything that returns the current date.
I think your best bet would be to do this with a trigger so on insert it sets the value.
I'm moving from MySql to Postgres, and I noticed that when you delete rows from MySql, the unique ids for those rows are re-used when you make new ones. With Postgres, if you create rows, and delete them, the unique ids are not used again.
Is there a reason for this behaviour in Postgres? Can I make it act more like MySql in this case?
Sequences have gaps to permit concurrent inserts. Attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.
PostgreSQL SEQUENCEs are used to allocate IDs. These only ever increase, and they're exempt from the usual transaction rollback rules to permit multiple transactions to grab new IDs at the same time. This means that if a transaction rolls back, those IDs are "thrown away"; there's no list of "free" IDs kept, just the current ID counter. Sequences are also usually incremented if the database shuts down uncleanly.
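A quick illustration of that rollback behaviour (the sequence name is made up):

CREATE SEQUENCE demo_seq;

BEGIN;
SELECT nextval('demo_seq');  -- returns 1
ROLLBACK;
SELECT nextval('demo_seq');  -- returns 2; the rolled-back 1 is gone for good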
Synthetic keys (IDs) are meaningless anyway. Their order is not significant, their only property of significance is uniqueness. You can't meaningfully measure how "far apart" two IDs are, nor can you meaningfully say if one is greater or less than another. All you can do is say "equal" or "not equal". Anything else is unsafe. You shouldn't care about gaps.
If you need a gapless sequence that re-uses deleted IDs, you can have one, you just have to give up a huge amount of performance for it - in particular, you cannot have any concurrency on INSERTs at all, because you have to scan the table for the lowest free ID, locking the table for write so no other transaction can claim the same ID. Try searching for "postgresql gapless sequence".
The simplest approach is to use a counter table and a function that gets the next ID. Here's a generalized version that uses a counter table to generate consecutive gapless IDs; it doesn't re-use IDs, though.
CREATE TABLE thetable_id_counter ( last_id integer not null );
INSERT INTO thetable_id_counter VALUES (0);
CREATE OR REPLACE FUNCTION get_next_id(countertable regclass, countercolumn text) RETURNS integer AS $$
DECLARE
next_value integer;
BEGIN
EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
RETURN next_value;
END;
$$ LANGUAGE plpgsql;
COMMENT ON FUNCTION get_next_id(regclass, text) IS 'Increment and return value from integer column $2 in table $1';
Usage:
INSERT INTO dummy(id, blah)
VALUES ( get_next_id('thetable_id_counter','last_id'), 42 );
Note that when one open transaction has obtained an ID, all other transactions that try to call get_next_id will block until the first transaction commits or rolls back. This is unavoidable; for gapless IDs it is by design.
If you want to store multiple counters for different purposes in a table, just add a parameter to the above function, add a column to the counter table, and add a WHERE clause to the UPDATE that matches the parameter to the added column. That way you can have multiple independently-locked counter rows. Do not just add extra columns for new counters.
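A minimal sketch of that multi-counter variant, using static SQL for brevity (the table, column, and function names here are assumptions):

CREATE TABLE id_counters
(
    counter_name text PRIMARY KEY,
    last_id      integer NOT NULL
);

CREATE OR REPLACE FUNCTION get_next_id_for(p_counter text) RETURNS integer AS $$
DECLARE
    next_value integer;
BEGIN
    -- The UPDATE locks only the matching counter row,
    -- so different counters do not block each other.
    UPDATE id_counters
       SET last_id = last_id + 1
     WHERE counter_name = p_counter
    RETURNING last_id INTO next_value;

    IF next_value IS NULL THEN
        RAISE EXCEPTION 'unknown counter: %', p_counter;
    END IF;
    RETURN next_value;
END;
$$ LANGUAGE plpgsql;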
Neither variant re-uses deleted IDs; they just avoid introducing gaps.
To re-use IDs I advise ... not re-using IDs.
If you really must, you can do so by adding an ON INSERT OR UPDATE OR DELETE trigger on the table of interest that adds deleted IDs to a free-list side table and removes them from that table when they're INSERTed. Treat an UPDATE as a DELETE followed by an INSERT. Now modify the ID generation function above so that it does a SELECT free_id INTO next_value FROM free_ids LIMIT 1 FOR UPDATE and, if a row is found, DELETEs it; IF NOT FOUND, it gets a new ID from the generator table as normal. Here's an untested extension of the prior function to support re-use:
CREATE OR REPLACE FUNCTION get_next_id_reuse(countertable regclass, countercolumn text, freelisttable regclass, freelistcolumn text) RETURNS integer AS $$
DECLARE
next_value integer;
BEGIN
EXECUTE format('SELECT %I FROM %s LIMIT 1 FOR UPDATE', freelistcolumn, freelisttable) INTO next_value;
IF next_value IS NOT NULL THEN
EXECUTE format('DELETE FROM %s WHERE %I = %L', freelisttable, freelistcolumn, next_value);
ELSE
EXECUTE format('UPDATE %s SET %I = %I + 1 RETURNING %I', countertable, countercolumn, countercolumn, countercolumn) INTO next_value;
END IF;
RETURN next_value;
END;
$$ LANGUAGE plpgsql;
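The free-list maintenance trigger described above might look like this (an untested sketch; "thetable" and its id column are placeholders, and ON CONFLICT needs PostgreSQL 9.5+):

CREATE TABLE thetable_free_ids ( free_id integer PRIMARY KEY );

CREATE OR REPLACE FUNCTION maintain_free_ids() RETURNS trigger AS $$
BEGIN
    -- A deleted (or overwritten) ID becomes available again.
    IF TG_OP IN ('DELETE', 'UPDATE') THEN
        INSERT INTO thetable_free_ids (free_id)
        VALUES (OLD.id)
        ON CONFLICT DO NOTHING;
    END IF;
    -- A newly used ID must leave the free list.
    IF TG_OP IN ('INSERT', 'UPDATE') THEN
        DELETE FROM thetable_free_ids WHERE free_id = NEW.id;
    END IF;
    RETURN NULL;  -- AFTER trigger: return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tr_thetable_free_ids
AFTER INSERT OR UPDATE OR DELETE ON thetable
FOR EACH ROW EXECUTE PROCEDURE maintain_free_ids();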
My idea is to implement a basic "vector clock", where timestamps are clock-based, always go forward, and are guaranteed to be unique.
For example, in a simple table:
CREATE TABLE IF NOT EXISTS timestamps (
last_modified TIMESTAMP UNIQUE
);
I use a trigger to set the timestamp value before insertion. It basically just goes into the future when two inserts arrive at the same time:
CREATE OR REPLACE FUNCTION bump_timestamp()
RETURNS trigger AS $$
DECLARE
previous TIMESTAMP;
current TIMESTAMP;
BEGIN
previous := NULL;
SELECT last_modified INTO previous
FROM timestamps
ORDER BY last_modified DESC LIMIT 1;
current := clock_timestamp();
IF previous IS NOT NULL AND previous >= current THEN
current := previous + INTERVAL '1 milliseconds';
END IF;
NEW.last_modified := current;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tgr_timestamps_last_modified ON timestamps;
CREATE TRIGGER tgr_timestamps_last_modified
BEFORE INSERT OR UPDATE ON timestamps
FOR EACH ROW EXECUTE PROCEDURE bump_timestamp();
I then run a massive amount of insertions in two separate clients:
DO
$$
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO timestamps DEFAULT VALUES;
END LOOP;
END;
$$;
As expected, I get collisions:
ERROR: duplicate key value violates unique constraint "timestamps_last_modified_key"
SQL state: 23505
Detail: Key (last_modified)=(2016-01-15 18:35:22.550367) already exists.
Context: SQL statement "INSERT INTO timestamps DEFAULT VALUES"
PL/pgSQL function inline_code_block line 4 at SQL statement
@rach suggested mixing clock_timestamp() with a SEQUENCE object, but it would probably imply getting rid of the TIMESTAMP type. Even then, I can't really figure out how it would solve the isolation problem...
Is there a common pattern to avoid this?
Thank you for your insights :)
If you have only one Postgres server, as you said, then I think using timestamp + sequence can solve the problem, because sequences are non-transactional and respect the insert order.
If you have db shards it will be much more complex, but maybe the distributed sequences of 2ndQuadrant's BDR could help - although I don't think ordinality would be respected there. I added some code below in case you have a setup to test it.
CREATE SEQUENCE "timestamps_seq";
-- Let's test first, how to generate id.
SELECT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0') as unique_id ;
unique_id
--------------------------------
145288519200000000000000000010
(1 row)
CREATE TABLE IF NOT EXISTS timestamps (
unique_id TEXT UNIQUE NOT NULL DEFAULT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0')
);
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
select * from timestamps;
unique_id
--------------------------------
145288556900000000000000000001
145288557000000000000000000002
145288557100000000000000000003
(3 rows)
Let me know if that works. I'm not a DBA, so it might also be good to ask on dba.stackexchange.com about potential side effects.
My two cents (inspired by http://tapoueh.org/blog/2013/03/15-batch-update).
Try adding the following before the massive amount of insertions:
LOCK TABLE timestamps IN SHARE MODE;
Official documentation is here: http://www.postgresql.org/docs/current/static/sql-lock.html
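Note that the lock only lasts until the end of the transaction that takes it, so the lock and the inserts have to share one transaction; a sketch:

BEGIN;
LOCK TABLE timestamps IN SHARE MODE;
INSERT INTO timestamps DEFAULT VALUES;
-- ... more inserts ...
COMMIT;  -- the lock is released here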
In PostgreSQL I have this table... (there is a primary key on the far left, "timestamp02", which is not shown in this image; please don't mind it, it's not important for the purpose of this question)
In the table above, all columns are entered via queries except "time_index", which I would like to be filled automatically via a trigger each time a row is inserted.
This is the code to create the same table (without any values), so anyone can create it using the PostgreSQL query panel:
CREATE TABLE table_ebscb_spa_log02
(
pcnum smallint,
timestamp02 timestamp with time zone NOT NULL DEFAULT now(),
fn_name character varying,
"time" time without time zone,
time_elapse character varying,
time_type character varying,
time_index real,
CONSTRAINT table_ebscb_spa_log02_pkey PRIMARY KEY (timestamp02)
)
WITH (
OIDS=FALSE
);
ALTER TABLE table_ebscb_spa_log02
OWNER TO postgres;
What I would like the trigger to do is:
INSERT a number in the "time_index" column based on the INSERTed values of the "fn_name" and "time_type" columns in each row.
If both ("fn_name" and "time_type") do a combination (eg. Check Mails - Start) that doesn't exist in any row before (above), then INSERT 1 in the "time_index" column,
Elif both ("fn_name" and "time_type") do a combination that does exist in some row before (above), then INSERT the number following the one before(above) in the "time_index" column.
(pls look at the example table image, this trigger will produce every red highlighted square on it)
I have watched many PostgreSQL tutorial videos and read many manuals, including these:
http://www.postgresql.org/docs/9.4/static/sql-createtrigger.html
http://www.postgresql.org/docs/9.4/static/plpgsql-trigger.html
without any result.
So far I have tried this to create the function:
CREATE OR REPLACE FUNCTION on_ai_myTable() RETURNS TRIGGER AS $$
DECLARE
    t_ix real;
    n int;
BEGIN
    IF NEW.time_type = 'Start' THEN
        SELECT t.time_index FROM table_ebscb_spa_log02 t
        WHERE t.fn_name = NEW.fn_name AND t.time_type = 'Start'
        ORDER BY t.timestamp02 DESC LIMIT 1
        INTO t_ix;
        GET DIAGNOSTICS n = ROW_COUNT;
        IF (n = 0) THEN
            t_ix = 1;
        ELSE
            t_ix = t_ix + 1;
        END IF;
    END IF;
    NEW.time_index = t_ix;
    RETURN NEW;
END
$$
LANGUAGE plpgsql;
And this to create the trigger:
CREATE TRIGGER on_ai_myTable
AFTER INSERT ON table_ebscb_spa_log02
FOR EACH ROW
EXECUTE PROCEDURE on_ai_myTable();
Then when I manually insert values into the table, nothing changes (no error message); the time_index column just remains empty. What am I doing wrong?
Could some good PostgreSQL fellow programmer give me a hand? I have really hit a dead end with this task and have no more ideas.
Thanks in advance
In an AFTER INSERT trigger, any changes you make to NEW.time_index will be ignored. The record is already inserted at this point; it's too late to modify it.
Create the trigger as BEFORE INSERT instead.
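A minimal sketch of the corrected definition, reusing the function from the question:

DROP TRIGGER IF EXISTS on_ai_myTable ON table_ebscb_spa_log02;

CREATE TRIGGER on_ai_myTable
BEFORE INSERT ON table_ebscb_spa_log02
FOR EACH ROW
EXECUTE PROCEDURE on_ai_myTable();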
Say I have a table like posts, which has typical columns like id, body, created_at. I'd like to generate a unique string with the creation of each post, for use in something like a url shortener. So maybe a 10-character alphanumeric string. It needs to be unique within the table, just like a primary key.
Ideally there would be a way for Postgres to handle both of these concerns:
generate the string
ensure its uniqueness
And they must go hand-in-hand, because my goal is to not have to worry about any uniqueness-enforcing code in my application.
I don't claim the following is efficient, but it is how we have done this sort of thing in the past.
CREATE FUNCTION make_uid() RETURNS text AS $$
DECLARE
new_uid text;
done bool;
BEGIN
done := false;
WHILE NOT done LOOP
new_uid := md5(''||now()::text||random()::text);
done := NOT exists(SELECT 1 FROM my_table WHERE uid=new_uid);
END LOOP;
RETURN new_uid;
END;
$$ LANGUAGE PLPGSQL VOLATILE;
make_uid() can be used as the default for a column in my_table. Something like:
ALTER TABLE my_table ADD COLUMN uid text NOT NULL DEFAULT make_uid();
md5(''||now()::text||random()::text) can be adjusted to taste. You could consider encode(..., 'base64'), except that some of the characters used in base-64 are not URL-friendly.
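If you do want base-64, one possible workaround (a sketch, assuming the pgcrypto extension for gen_random_bytes) is to swap out or drop the URL-unsafe characters:

CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- translate() maps '+' to '-' and '/' to '_', and drops the '=' padding
SELECT translate(encode(gen_random_bytes(8), 'base64'), '+/=', '-_');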
All existing answers are WRONG because they are based on a SELECT before generating a unique code per table record. Let us assume that we need a unique code per record while inserting: imagine two concurrent INSERTs happening at the same time by miracle (which happens more often than you think); both generate the same code, because at the moment of the SELECT that code did not exist in the table. One instance will INSERT and the other will fail.
First let us create a table with a code field and add a unique index:
CREATE TABLE my_table
(
code TEXT NOT NULL
);
CREATE UNIQUE INDEX ON my_table (lower(code));
Then we should have a function or procedure (you can use the code inside a trigger as well) where we 1. generate a new code, 2. try to insert a new record with the new code, and 3. if the insert fails, try again from step 1:
CREATE OR REPLACE PROCEDURE my_table_insert()
AS $$
DECLARE
    new_code TEXT;
BEGIN
    LOOP
        new_code := LOWER(SUBSTRING(MD5(''||NOW()::TEXT||RANDOM()::TEXT) FOR 8));
        BEGIN
            INSERT INTO my_table (code) VALUES (new_code);
            EXIT; -- success, leave the loop
        EXCEPTION WHEN unique_violation THEN
            NULL; -- collision, loop around and try a new code
        END;
    END LOOP;
END;
$$ LANGUAGE PLPGSQL;
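Since this is a procedure (PostgreSQL 11+), it is invoked with CALL:

CALL my_table_insert();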
This is a guaranteed error-free solution, unlike the other solutions in this thread.
Use a Feistel network. This technique works efficiently to generate unique random-looking strings in constant time without any collision.
For a version with about 2 billion possible strings (2^31) of 6 letters, see this answer.
For a 63 bits version based on bigint (9223372036854775808 distinct possible values), see this other answer.
You may change the round function as explained in the first answer to introduce a secret element to have your own series of strings (not guessable).
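To give a flavour of the technique, here is the classic pseudo_encrypt function from the PostgreSQL wiki (reproduced here as a sketch): a 3-round Feistel network that permutes the 32-bit integer space, so feeding it successive sequence values yields unique, random-looking numbers you can then encode as text:

CREATE OR REPLACE FUNCTION pseudo_encrypt(value int) RETURNS int AS $$
DECLARE
    l1 int; l2 int; r1 int; r2 int; i int := 0;
BEGIN
    l1 := (value >> 16) & 65535;
    r1 := value & 65535;
    WHILE i < 3 LOOP
        l2 := r1;
        -- the round function; any deterministic mixing of r1 works here
        r2 := l1 # ((((1366 * r1 + 150889) % 714025) / 714025.0) * 32767)::int;
        l1 := l2;
        r1 := r2;
        i := i + 1;
    END LOOP;
    RETURN ((r1 << 16) + l1);
END;
$$ LANGUAGE plpgsql STRICT IMMUTABLE;

-- e.g. SELECT to_hex(pseudo_encrypt(nextval('my_seq')::int));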
The easiest way is probably to use a sequence to guarantee uniqueness, and then append a fixed-length random number after the sequence value:
CREATE SEQUENCE test_seq;
CREATE TABLE test_table (
id bigint NOT NULL DEFAULT (nextval('test_seq')::text || (LPAD(floor(random()*100000000)::text, 8, '0')))::bigint,
txt TEXT
);
insert into test_table (txt) values ('1');
insert into test_table (txt) values ('2');
select id, txt from test_table;
However, this wastes a huge part of the range. (Note: the max bigint is 9223372036854775807; if you use an 8-digit random number at the end, you can only have 922337203 records. An 8-digit suffix is probably not necessary anyway. Also check the max number for your programming environment!)
Alternatively you can use varchar for the id, and even convert the above number with to_hex() or to base 36 like below (but for base 36, try not to expose it to customers, in order to avoid some funny strings showing up!):
PostgreSQL: Is there a function that will convert a base-10 int into a base-36 string?
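A minimal base-36 encoder sketch (the function name and digit alphabet here are assumptions; non-negative input only):

CREATE OR REPLACE FUNCTION to_base36(n bigint) RETURNS text AS $$
DECLARE
    alphabet text := '0123456789abcdefghijklmnopqrstuvwxyz';
    result   text := '';
BEGIN
    IF n = 0 THEN
        RETURN '0';
    END IF;
    WHILE n > 0 LOOP
        -- prepend the digit for the least significant base-36 position
        result := substr(alphabet, (n % 36)::int + 1, 1) || result;
        n := n / 36;
    END LOOP;
    RETURN result;
END;
$$ LANGUAGE plpgsql IMMUTABLE;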
Check out a blog post by Bruce Momjian; it gets you part of the way there. You will have to make sure the string doesn't already exist - maybe concatenate the primary key to it?
Generating Random Data Via Sql
"Ever need to generate random data? You can easily do it in client applications and server-side functions, but it is possible to generate random data in sql. The following query generates five lines of 40-character-length lowercase alphabetic strings:"
SELECT
(
SELECT string_agg(x, '')
FROM (
SELECT chr(ascii('a') + floor(random() * 26)::integer)
FROM generate_series(1, 40 + b * 0)
) AS y(x)
)
FROM generate_series(1,5) as a(b);
Use the primary key in your data. If you really need an alphanumeric unique string, you can use base-36 encoding. In PostgreSQL you can use this function.
Example:
select base36_encode(generate_series(1000000000,1000000010));
GJDGXS
GJDGXT
GJDGXU
GJDGXV
GJDGXW
GJDGXX
GJDGXY
GJDGXZ
GJDGY0
GJDGY1
GJDGY2