Besides unique actions, we need recurrent actions in our database. We want the user to be able to define a periodicity (every 1, 2, 3, ... years) and a period (e.g. from 2018 to 2020) in a form. This data should be used to insert the appropriate rows for a defined action.
If the user chooses an annual periodicity starting from 2018, three rows (2018, 2019 and 2020) should be inserted into the actions table.
If the user chooses a biennial periodicity starting from 2018, only two rows (2018 and 2020) should be inserted into the actions table.
The simplified table actions looks like this:
id serial not null
id_action integer
action_year integer
periodicity integer
from_ integer
to_ integer
I need a starting point for the SQL statement.
You should use generate_series(start, stop, step)
Annual:
=> select generate_series(2018,2020,1);
generate_series
-----------------
2018
2019
2020
(3 rows)
Biennial:
=> select generate_series(2018,2020,2);
generate_series
-----------------
2018
2020
(2 rows)
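Building on that, a starting point for the INSERT itself could be a sketch like this, using the actions table from the question (the literal values 50, 2018, 2028 and step 2 merely stand in for the form input):
-- One row per generated year for a given action:
INSERT INTO actions (id_action, action_year, periodicity, from_, to_)
SELECT 50, y, 2, 2018, 2028
FROM generate_series(2018, 2028, 2) AS y;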
I didn't know the function generate_series() until now. Thanks for pointing me in that direction.
To get things running as I intended, I need to use generate_series() inside a trigger function that fires AFTER INSERT. After first running into trouble with recursive trigger inserts, I now have the problem that my trigger produces too many duplicate inserts (the number increasing with the chosen periodicity).
My table actions looks like this:
id serial not null
id_action integer
action_year integer
periodicity integer
from_ integer
to_ integer
My trigger on the table:
CREATE TRIGGER tr_actions_recurrent
AFTER INSERT
ON actions
FOR EACH ROW
WHEN ((pg_trigger_depth() = 0))
EXECUTE PROCEDURE actions_recurrent();
Here is my trigger function:
CREATE OR REPLACE FUNCTION actions_recurrent()
RETURNS trigger AS
$BODY$
BEGIN
IF NEW.periodicity >0 AND NEW.action_year <= NEW.to_-NEW.periodicity THEN
INSERT into actions(id_action, action_year,periodicity, from_, to_)
SELECT NEW.id_action, y, NEW.periodicity, NEW.from_, NEW.to_
FROM actions, generate_series(NEW.from_+NEW.periodicity,NEW.to_,NEW.periodicity) AS y;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
When I do an insert
INSERT INTO actions (id_action, action_year,periodicity,from_, to_)
VALUES (50,2018,4,2018,2028);
I get one row for action_year 2018, but 13 rows each for 2022 and 2026??
In my understanding, the IF clause in the trigger function should prevent such repeated execution.
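Most likely the duplicates come from the stray actions in the FROM clause: the INSERT ... SELECT cross-joins every row already in actions with the generated series, so each generated year is inserted once per existing row. A sketch of the function with that join removed (otherwise unchanged):
CREATE OR REPLACE FUNCTION actions_recurrent()
RETURNS trigger AS
$BODY$
BEGIN
IF NEW.periodicity > 0 AND NEW.action_year <= NEW.to_ - NEW.periodicity THEN
INSERT INTO actions(id_action, action_year, periodicity, from_, to_)
SELECT NEW.id_action, y, NEW.periodicity, NEW.from_, NEW.to_
-- generate_series alone in FROM: one row per year, no cross join with actions
FROM generate_series(NEW.from_ + NEW.periodicity, NEW.to_, NEW.periodicity) AS y;
END IF;
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;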
I've got a Postgres 12.3 question: Can I rely on CLOCK_TIMESTAMP() in a trigger to stamp an updated_dts timestamp in exactly the same order as changes are committed to the permanent data?
On the face of it, this might sound like kind of a silly question, but I just spent two days tracking down a super-rare race condition in a non-Postgres system that hinged on exactly this behavior. (Lagging commits made their 'last value seen' tracking data unreliable.) Now I'm trying to figure out if it's possible for CLOCK_TIMESTAMP() not to match the order of changes recorded in the WAL perfectly.
It's simple to see how this could occur with NOW/TRANSACTION_TIMESTAMP/CURRENT_TIMESTAMP, as they return the transaction start time, not the completion time. It's pretty easy, in that case, to record a timestamp sequence where the stamps and log order don't agree. But I can't figure out if there's any chance for commits to be saved in a different order than the BEFORE trigger CLOCK_TIMESTAMP() values.
For background, we need a 100% reliable timeline for an external search to use. As I understand it, I can create one using logical replication and a replication-target-side trigger to stamp changes as they're replayed from the log. What I'm unclear on is whether it's possible to get the same fidelity from CLOCK_TIMESTAMP() on a single server.
I haven't got the chops to dig into the Postgres internals and see how requests are interleaved or how granular execution is, and I'm hoping that someone here knows definitively. If this is more of a question for one of the PG mailing lists, please let me know.
-- Thanks
Below is a bit of sample code for how I'm looking at building the timestamps. It works fine, but doesn't prove anything about behavior with lots of concurrent processes.
---------------------------------------------
-- Create the trigger function
---------------------------------------------
DROP FUNCTION IF EXISTS api.set_updated CASCADE;
CREATE OR REPLACE FUNCTION api.set_updated()
RETURNS TRIGGER
AS $BODY$
BEGIN
NEW.updated_dts = CLOCK_TIMESTAMP();
RETURN NEW;
END;
$BODY$
language plpgsql;
COMMENT ON FUNCTION api.set_updated() IS 'Sets updated_dts field to CLOCK_TIMESTAMP(), if the record has changed.';
---------------------------------------------
-- Create the table
---------------------------------------------
DROP TABLE IF EXISTS api.numbers;
CREATE TABLE api.numbers (
id uuid NOT NULL DEFAULT extensions.gen_random_uuid(),
number integer NOT NULL DEFAULT NULL,
updated_dts timestamptz NOT NULL DEFAULT 'epoch'::timestamptz
);
---------------------------------------------
-- Define the triggers (binding)
---------------------------------------------
-- NOTE: I'm guessing that in production I can use DEFAULT CLOCK_TIMESTAMP() instead of a BEFORE INSERT trigger.
-- I'm using a distinct DEFAULT value, as I want it to pop out if I'm not getting the trigger to fire.
CREATE TRIGGER trigger_api_number_before_insert
BEFORE INSERT ON api.numbers
FOR EACH ROW
EXECUTE PROCEDURE api.set_updated();
CREATE TRIGGER trigger_api_number_before_update
BEFORE UPDATE ON api.numbers
FOR EACH ROW
WHEN (OLD.* IS DISTINCT FROM NEW.*)
EXECUTE PROCEDURE api.set_updated();
---------------------------------------------
-- INSERT some data
---------------------------------------------
INSERT INTO api.numbers (number) values (1),(2),(3);
---------------------------------------------
-- Take a look
---------------------------------------------
SELECT * from api.numbers ORDER BY updated_dts ASC; -- The values should be listed as 1, 2, 3 as oldest to newest.
---------------------------------------------
-- UPDATE a row
---------------------------------------------
UPDATE api.numbers SET number = 11 where number = 1;
---------------------------------------------
-- Take a look
---------------------------------------------
SELECT * from api.numbers ORDER BY updated_dts ASC; -- The values should be listed as 2, 3, 11 as oldest to newest.
No, you cannot depend on clock_timestamp() order during trigger execution (or while evaluating a DEFAULT clause) being the same as commit order.
Commit will always happen later than the function call, and you cannot control how long it takes between them.
But I am surprised that that is a problem for you. Typically, the commit time is not visible or relevant. Why don't you simply accept the clock_timestamp() as the measure of things?
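To make that concrete, a minimal two-session sketch (reusing the api.numbers table from the question; t1 and t2 are just labels for the stamped values):
-- Session A:
BEGIN;
INSERT INTO api.numbers (number) VALUES (1); -- trigger stamps t1 = CLOCK_TIMESTAMP()
-- Session B, concurrently:
BEGIN;
INSERT INTO api.numbers (number) VALUES (2); -- trigger stamps t2, where t2 > t1
COMMIT; -- B commits first
-- Session A:
COMMIT; -- A commits last, but its row carries the older stamp t1
The stamps say A happened before B, while the commit (and WAL) order says the opposite.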
I have the following table in Postgres, which would typically be populated like below:
id day visits passes
1 Monday {11,13,19} {13,17}
2 Tuesday {7,9} {11,13,19}
3 Wednesday {2,5,21} {21,27}
4 Thursday {3,11,39} {21,19}
In order to get the visits or passes ids over a range of days, I have written the following function:
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT,VARIADIC ids int[]) RETURNS bigint[] AS
$$
DECLARE result bigint[];
DECLARE hold bigint[];
BEGIN
FOR i IN 1 .. array_upper(ids,1) LOOP
execute format('SELECT %I FROM days WHERE id = $1',p_column) USING ids[i] INTO hold;
result := unnest(result) UNION unnest(hold);
END LOOP;
RETURN result;
END;
$$
LANGUAGE 'plpgsql';
which works with a subsequent call to day_entries('visits',1,2,3) returning
{11,9,19,21,5,13,2,7}
While it does the job, I am concerned that, based on my one-day-old knowledge of writing Postgres functions, I have worked one or more inefficiencies into the process. Can the function be made simpler in some way?
The other issue is more a curiosity than a problem: the order of elements in the result appears to bear no relation to the order of the visits entries in the three rows that are touched. Although this is not an issue as far as I am concerned, I am curious to know why it happens.
You can do the unnesting and aggregating in a single statement; there is no need for a loop. And you can use the ANY operator with the array to select all matching rows.
CREATE OR REPLACE FUNCTION day_entries(p_column TEXT, variadic p_ids int[])
RETURNS bigint[] AS
$$
DECLARE
result bigint[];
BEGIN
execute
format('SELECT array(select unnest(%I) from days WHERE id = any($1))', p_column)
USING p_ids -- pass the whole array as a parameter
INTO result;
RETURN result;
END;
$$
LANGUAGE plpgsql;
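The call stays the same as before, for example:
SELECT day_entries('visits', 1, 2, 3);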
Not related to your questions, but I think you are going down the wrong road with that design. While arrays might look intriguing to beginners, they should be used only rarely.
And if you find yourself unnesting and aggregating things back and forth, this is a strong indication that something could be improved.
I would split your table into two tables: one that stores the "day" information, and one that stores visits and passes together with a column distinguishing the two. Then finding visits is as simple as adding where ... = 'visit' rather than having to cope with (slow and error-prone) dynamic SQL.
Without knowing more details, I would probably create the tables like this:
create table days
(
id integer not null primary key,
day character varying(9) not null
);
create table event
(
day_id integer not null references days,
event_id integer not null,
event_type varchar(10) not null check (event_type in ('visit', 'pass'))
);
event_id might even be a foreign key to another table you haven't shown us - again, something you can't really do with de-normalized tables.
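For illustration only, a hypothetical event_catalog table that event_id could reference (name and columns invented here):
create table event_catalog
(
event_id integer not null primary key,
description text not null
);
-- in table event, the column would then become:
-- event_id integer not null references event_catalog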
Getting all visits for specific days is then as simple as:
select event_id
from event
where day_id in (1,2)
and event_type = 'visit';
Or if you do need that as an array:
select array_agg(event_id)
from event
where day_id in (1,2)
and event_type = 'visit';
Online example
I have an ERP application that uses the system date when posting transactions. The database is PostgreSQL. I'm able to use https://www.nirsoft.net/utils/run_as_date.html to backdate the application, but I notice that the transactions still post as of "today", and I think that may be because PostgreSQL uses the system date.
Is there any way I can set the date back for PostgreSQL? Or any other way to do this? The process in the ERP application does not have an option to backdate.
The easiest would be to add a trigger to the database that would change the date for inserted rows:
create table testpast(
id serial primary key,
time timestamp with time zone not null default now()
);
insert into testpast (time) values (default);
select * from testpast;
id | time
----+-------------------------------
1 | 2018-03-16 00:09:20.219419+01
(1 row)
create function time_20_years_back() returns trigger as $$
begin
NEW.time = now()-'20 years'::interval;
return NEW;
end;
$$ language plpgsql;
create trigger testpast_time_20_years_back
before insert on testpast
for each row
execute procedure time_20_years_back();
insert into testpast (time) values (default);
select * from testpast;
id | time
----+-------------------------------
1 | 2018-03-16 00:09:20.219419+01
2 | 1998-03-16 00:09:55.741345+01
(2 rows)
Though I have no idea what would be the purpose of such a hack.
My idea is to implement a basic «vector clock», where timestamps are clock-based, always go forward and are guaranteed to be unique.
For example, in a simple table:
CREATE TABLE IF NOT EXISTS timestamps (
last_modified TIMESTAMP UNIQUE
);
I use a trigger to set the timestamp value before insertion. It basically just goes into the future when two inserts arrive at the same time:
CREATE OR REPLACE FUNCTION bump_timestamp()
RETURNS trigger AS $$
DECLARE
previous TIMESTAMP;
current TIMESTAMP;
BEGIN
previous := NULL;
SELECT last_modified INTO previous
FROM timestamps
ORDER BY last_modified DESC LIMIT 1;
current := clock_timestamp();
IF previous IS NOT NULL AND previous >= current THEN
current := previous + INTERVAL '1 milliseconds';
END IF;
NEW.last_modified := current;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
DROP TRIGGER IF EXISTS tgr_timestamps_last_modified ON timestamps;
CREATE TRIGGER tgr_timestamps_last_modified
BEFORE INSERT OR UPDATE ON timestamps
FOR EACH ROW EXECUTE PROCEDURE bump_timestamp();
I then run a massive amount of insertions in two separate clients:
DO
$$
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO timestamps DEFAULT VALUES;
END LOOP;
END;
$$;
As expected, I get collisions:
ERROR: duplicate key value violates unique constraint "timestamps_last_modified_key"
SQL state: 23505
Detail: Key (last_modified)=(2016-01-15 18:35:22.550367) already exists.
Context: SQL statement "INSERT INTO timestamps DEFAULT VALUES"
PL/pgSQL function inline_code_block line 4 at SQL statement
#rach suggested mixing clock_timestamp() with a SEQUENCE object, but it would probably imply getting rid of the TIMESTAMP type. Even though I can't really figure out how it'd solve the isolation problem...
Is there a common pattern to avoid this?
Thank you for your insights :)
If you have only one Postgres server, as you said, I think that using a timestamp + sequence can solve the problem, because sequences are non-transactional and respect the insert order.
If you have database shards it will be much more complex, but maybe the distributed sequences of 2ndQuadrant's BDR could help - though I don't think the ordering will be respected. I added some code below in case you have a setup to test it.
CREATE SEQUENCE "timestamps_seq";
-- Let's first test how to generate the id.
SELECT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0') as unique_id ;
unique_id
--------------------------------
145288519200000000000000000010
(1 row)
CREATE TABLE IF NOT EXISTS timestamps (
unique_id TEXT UNIQUE NOT NULL DEFAULT extract(epoch from now())::bigint::text || LPAD(nextval('timestamps_seq')::text, 20, '0')
);
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
INSERT INTO timestamps DEFAULT VALUES;
select * from timestamps;
unique_id
--------------------------------
145288556900000000000000000001
145288557000000000000000000002
145288557100000000000000000003
(3 rows)
Let me know if that works. I'm not a DBA, so maybe it would be good to ask on dba.stackexchange.com too about potential side effects.
My two cents (inspired by http://tapoueh.org/blog/2013/03/15-batch-update).
Try adding the following before the massive amount of insertions:
LOCK TABLE timestamps IN SHARE MODE;
Official documentation is here: http://www.postgresql.org/docs/current/static/sql-lock.html
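Note that LOCK TABLE can only be used inside a transaction block, and the lock is held until that transaction ends, so the lock and the inserts have to share one transaction. A minimal sketch:
BEGIN;
LOCK TABLE timestamps IN SHARE MODE;
INSERT INTO timestamps DEFAULT VALUES;
-- ... more inserts ...
COMMIT;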
Is it possible to declare a serial field in Postgres (9.0) which will increment based on a pattern?
For example:
Pattern: YYYY-XXXXX
where YYYY is a year, and XXXXX increments from 00000 - 99999.
Or should I just use a trigger?
EDIT: I prefer the year to be auto-determined, maybe based on the server date. The XXXXX part does start at 00000 for each year: it "resets" to 00000 when the year part changes and then increments again up to 99999.
I would create a separate SEQUENCE for each year, so that each sequence keeps track of one year - even after that year is over, should you need more unique IDs for that year later.
This function does it all:
Improved with input from #Igor and #Clodoaldo in the comments.
CREATE OR REPLACE FUNCTION f_year_id(y text = to_char(now(), 'YYYY'))
RETURNS text AS
$func$
BEGIN
LOOP
BEGIN
RETURN y ||'-'|| to_char(nextval('year_'|| y ||'_seq'), 'FM00000');
EXCEPTION WHEN undefined_table THEN -- error code 42P01
EXECUTE 'CREATE SEQUENCE year_' || y || '_seq MINVALUE 0 START 0';
END;
END LOOP;
END
$func$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT f_year_id();
Returns:
2013-00000
Basically this returns a text of your requested pattern. Automatically tailored for the current year. If a sequence of the name year_<year>_seq does not exist yet, it is created automatically and nextval() is retried.
Note that you cannot at the same time have an overloaded function without a parameter (like in my previous example), or Postgres will not know which one to pick and will throw an exception in despair.
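To illustrate that caveat (the zero-parameter overload here is hypothetical, not part of the solution):
-- CREATE FUNCTION f_year_id() RETURNS text ... -- a second, parameterless overload
-- With both functions defined, a call without arguments is ambiguous:
SELECT f_year_id();
-- ERROR: function f_year_id() is not unique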
Use this function as DEFAULT value in your table definition:
CREATE TABLE tbl (id text DEFAULT f_year_id(), ...)
Or you can get the next value for a year of your choice:
SELECT f_year_id('2012');
Tested in Postgres 9.1. Should work in v9.0 or v9.2 just as well.
To understand what's going on here, read these chapters in the manual:
CREATE FUNCTION
CREATE SEQUENCE
39.6.3. Simple Loops
39.5.4. Executing Dynamic Commands
39.6.6. Trapping Errors
Appendix A. PostgreSQL Error Codes
Table 9-22. Template Pattern Modifiers for Date/Time Formatting
You can create a function that will form this value (YYYY-XXXXX) and set this function as a default for a column.
Details here.