How to force a PostgreSQL function to run sequentially

I have a PostgreSQL function A.
Many clients will call A:
- client X1 sends query 1 "SELECT A();", then
- client X2 sends query 2 "SELECT A();", then
- client X3 sends query 3 "SELECT A();", then
...
How can I force function A to run sequentially?
That is: query 1 runs --> finishes or times out --> query 2 runs --> finishes or times out --> query 3 runs --> finishes or times out ... (query 1 and query 2 must not run simultaneously).

Use advisory locks.
The first command in the function body should be (1234 is an arbitrary integer constant):
perform pg_advisory_xact_lock(1234);
When two concurrent sessions call the function, one of them waits until the function call in the other completes. This is a transaction-level advisory lock, automatically released when the transaction terminates.
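Put together, a minimal sketch (the function name and body are placeholders):
create or replace function a()
returns void language plpgsql as $$
begin
    perform pg_advisory_xact_lock(1234); -- queue up behind any concurrent caller
    --
    -- function's commands
    --
end $$; -- the lock is released when the surrounding transaction ends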
Alternatively, you can use a session-level advisory lock, which can (should) be manually released:
create function example()
returns void language plpgsql as $$
begin
    perform pg_advisory_lock(1234);   -- blocks until the lock is free
    --
    -- function's commands
    --
    perform pg_advisory_unlock(1234); -- release explicitly
end $$;
Any advisory lock obtained in a session is automatically released at the end of the session (if it hasn't been released earlier).

I think the full answer is to use pg_advisory_xact_lock plus idle_in_transaction_session_timeout in function A.
Query 2 will wait until query 1 completes (the lock is released automatically) or times out (the session is killed automatically, which also releases the lock).
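For reference, a sketch of the timeout half of that combination (the role name is an assumption):
-- Kill sessions of this role that sit idle inside an open transaction for more
-- than a minute; the advisory lock is released together with the session.
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '1 min';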

Related

kill the long running queries automatically

I want to kill queries that have been running for more than 2 hours, in an automatic way.
I tried creating a trigger like the one below:
create or replace function stop_query()
RETURNS trigger
language plpgsql
as $$
begin
    with pid_tbl as
    (
        SELECT pid
        FROM pg_stat_activity
        WHERE (now() - pg_stat_activity.query_start) > interval '120 minutes'
    )
    select * from pid_tbl;
    SELECT pg_cancel_backend(var_pid);
end; $$;
CREATE TRIGGER stop_query
FOR EACH ROW EXECUTE FUNCTION stop_query();
Please advise me how I can achieve this. Is there any way I can achieve it without writing a trigger function?
You don't need this trigger at all. As I mentioned in the comment, it should be enough for you to run one of these queries:
SET LOCAL statement_timeout='2 h'; -- applies only until the end of the current transaction within the current session
SET SESSION statement_timeout='2 h'; -- only in the current session/connection
ALTER ROLE your_user_name SET statement_timeout='2 h'; -- all new sessions of this user
ALTER DATABASE your_db_name SET statement_timeout='2 h'; -- all new sessions on this db
ALTER SYSTEM SET statement_timeout='2 h'; -- all new sessions on all dbs on this system
They all set the statement_timeout setting, which defaults to 0 (meaning "no limit"), to '2 h' (which simply stands for "2 hours"). It's best to apply this only in the specific context where it's required, e.g. to a specific user that tends to run queries you don't want hanging for too long.
Documentation:
statement_timeout (integer)
Abort any statement that takes more than the specified amount of time. If log_min_error_statement is set to ERROR or lower, the statement that timed out will also be logged. If this value is specified without units, it is taken as milliseconds. A value of zero (the default) disables the timeout.
The timeout is measured from the time a command arrives at the server until it is completed by the server. If multiple SQL statements appear in a single simple-Query message, the timeout is applied to each statement separately. (PostgreSQL versions before 13 usually treated the timeout as applying to the whole query string.) In extended query protocol, the timeout starts running when any query-related message (Parse, Bind, Execute, Describe) arrives, and it is canceled by completion of an Execute or Sync message.
Setting statement_timeout in postgresql.conf is not recommended because it would affect all sessions.
If you try to use unsupported units, you'll get a hint with your error:
ERROR: invalid value for parameter "statement_timeout": "2 hours"
HINT: Valid units for this parameter are "us", "ms", "s", "min", "h", and "d".
Which are microseconds, milliseconds, seconds, minutes, hours and days respectively.
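To check what is in effect, or to undo an override, something like this works:
SHOW statement_timeout; -- effective value in the current session
ALTER ROLE your_user_name RESET statement_timeout; -- undo a role-level override
ALTER DATABASE your_db_name RESET statement_timeout; -- undo a database-level override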

How to run Postgres pg_cron Job AFTER another Job?

I am running some automated tasks on my Postgres database at night using the pg_cron extension, moving certain old records to archive tables. I run 5 stored procedures concurrently on 5 different background workers, so they all start at the same time and run on different workers (I assume this is similar to running different tasks on different threads in Java). These 5 stored procedures are independent (each moves records to archive tables), so they can run at the same time. I schedule each of them with a command like
cron.schedule (myJob1,
    '* * * * *',
    'call my_stored_proc_1()'
);
cron.schedule (myJob2,
    '* * * * *',
    'call my_stored_proc_2()'
);
...
cron.schedule (myJob5,
    '* * * * *',
    'call my_stored_proc_5()'
);
NOW, I have some MORE dependent stored procedures that I want to run. But they need to run AFTER these 5 jobs finish/complete, because they perform some DELETE ... SQL operations.
How can I have this second stored procedure (the one doing the DELETE queries) run AFTER my first 5 stored procedure jobs are DONE? I don't want to set a cron expression for the second stored procedure doing the DELETEs, because I don't know at what time the first 5 stored procedures will finish...
Below I included a little schematic of how the Jobs are currently triggered and how I want it to work (if possible):
Preface: how I understand the problem
I hope I understand the problem described by the OP; if not, everything below is invalid.
I suppose it's about periodic night tasks heavy on CPU and/or IO.
E.g.:
there are tasks A-C for archiving data
maybe tasks D-E for rebuilding aggregates / refreshing materialized views
and finally task F that runs reindex/analyze on the whole DB
So it makes sense to run task F only after tasks A-E have finished.
Every task needs to run just once per period:
once a day, hour, or week, or only on weekend nights
preferably not while the server is under load
Whether this fits the OP's requirement, I don't know.
For the sake of simplicity, let's presume that each task runs only once per night; it's easy to extend for other periods/requirements.
Data-driven approach
1. Add a log table
E.g.:
CREATE TABLE job_log (
    log_id bigint,
    job_name text,
    log_date timestamptz
);
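If job_log grows, a plain index can serve the per-night checks below, provided the date filter is written as a range rather than a cast (a sketch; the index name is arbitrary):
CREATE INDEX job_log_name_date_idx ON job_log (job_name, log_date);

-- range form of the "already ran tonight" check, able to use the index:
SELECT 1 FROM job_log
WHERE job_name = 'TaskA'
  AND log_date >= date_trunc('day', now())
  AND log_date <  date_trunc('day', now()) + interval '1 day';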
Tasks A-E
On start
At the start of each job function, do this check:
IF EXISTS(
    SELECT 1 FROM job_log
    WHERE
        job_name = 'TaskA'                 -- TaskB-TaskE in the other functions
        AND log_date::DATE = NOW()::DATE   -- already executed tonight?
) OR EXISTS(
    SELECT 1 FROM pg_stat_activity
    WHERE
        query like 'SELECT * FROM jobA_function();'  -- executing right now?
) THEN
    RETURN;
END IF;
Other conditions could be added: the number of connections, the existence of locks, and so on.
This way it is guaranteed that the function will not run more frequently than needed.
On finish
INSERT INTO job_log
SELECT
    (SELECT MAX(log_id) FROM job_log) + 1  -- or use a sequence/other autoincrement
    , 'TaskA'
    , NOW();
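Pulling the start check and the finish log together, a sketch of one job function (all names hypothetical):
CREATE OR REPLACE FUNCTION jobA_function() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    IF EXISTS(SELECT 1 FROM job_log
              WHERE job_name = 'TaskA'
                AND log_date::DATE = NOW()::DATE) THEN
        RETURN; -- already ran tonight
    END IF;
    -- (optionally also check pg_stat_activity, as shown above)

    -- ... the actual archiving work for task A ...

    INSERT INTO job_log
    SELECT (SELECT COALESCE(MAX(log_id), 0) + 1 FROM job_log), 'TaskA', NOW();
END $$;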
Cronjob schedule
Its meaning becomes different: now it means "try to initiate execution of the task".
It's safe to schedule it for every hour within a chosen period, or even more frequently.
The cronjob cannot know whether the server is under load, whether there are locks on a table, or whether somebody started the task manually; the job function can be smarter about that.
Task F
Same as above, but the start check also looks for the completion of the other tasks.
E.g.:
IF NOT EXISTS(
    SELECT 1 FROM job_log
    WHERE
        job_name = 'TaskA'
        AND log_date::DATE = NOW()::DATE
) OR NOT EXISTS(
    SELECT 1 FROM job_log
    WHERE
        job_name = 'TaskB'
        AND log_date::DATE = NOW()::DATE
)
....                                                 -- checks for completion of the remaining tasks
OR EXISTS(
    SELECT 1 FROM job_log
    WHERE
        job_name = 'TaskF'
        AND log_date::DATE = NOW()::DATE             -- task F already executed tonight?
) OR EXISTS(
    SELECT 1 FROM pg_stat_activity
    WHERE
        query like 'SELECT * FROM jobF_function();'  -- task F executing right now?
) THEN RETURN;
END IF;
On completion
Write to job_log, the same as the other functions.
UPDATE. Cronjob schedule
Create multiple schedules in the cronjob.
E.g.:
Let's say tasks A-E run for approximately 10-15 minutes,
and it's possible that one or two of them could take 30-45-60 minutes.
Create a schedule for task F that attempts to start it every 5 minutes (see the pg_cron sketch after the walkthrough below).
How that will work:
attempt 1: task A finished, others still working -> exit
attempt 2: tasks A-C finished -> exit
attempt 3: tasks A-E finished -> start task F
attempt 4: tasks A-E finished, but pg_stat_activity shows task F still executing -> exit
attempt 5: tasks A-E finished, pg_stat_activity is empty, but the log shows task F already executed -> nothing to do -> exit
... all further attempts behave the same until the next night
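In pg_cron terms, the retry schedule could look like this (job and function names hypothetical):
SELECT cron.schedule('try-task-f', '*/5 * * * *', 'SELECT jobF_function()');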
Summary
It's easy to extend this approach to any requirements:
another periodicity
or no periodicity at all, e.g. make a table with a trigger and start execution on change
dependencies of any depth and/or "fuzzy" dependencies
... literally everything
The concept remains the same:
the cronjob schedule means "try to run"
the decision to run or not is data-driven
I would be glad to hear criticism of any kind - who knows, maybe I'm overlooking something.
You could use the pg_stat_activity view to ensure that there are no active queries like your jobs 1-5.
Note:
Superusers and members of the built-in role pg_read_all_stats (see also Section 21.5) can see all the information about all sessions
...
...
while (
    select count(*) > 0
    from pg_stat_activity
    where query in ('call my_stored_proc_1()', 'call my_stored_proc_2()', ...))
loop
    perform pg_sleep(1);
    perform pg_stat_clear_snapshot(); -- needed to retrieve fresh data
end loop;
...
Just insert this code at the beginning of your stored proc 6 and schedule proc 6 to start a few seconds after jobs 1-5.
Note 1:
The condition can be simplified and generalized using a regexp:
where query ~ 'my_stored_proc_1|my_stored_proc_2|...'
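A sketch of the whole wait loop with the regexp form (the procedure name is hypothetical; note that a broad pattern can also match queries of unrelated sessions, so keep it as tight as practical):
create or replace procedure my_stored_proc_6()
language plpgsql as $$
begin
    while (select count(*) > 0
           from pg_stat_activity
           where query ~ 'my_stored_proc_[1-5]')
    loop
        perform pg_sleep(1);
        perform pg_stat_clear_snapshot(); -- refresh pg_stat_activity
    end loop;

    -- ... the DELETE work ...
end $$;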
Note 2:
You could implement a timeout using the clock_timestamp() function:
...
is_timedout := false;
timeout := '10 min'::interval; -- stop waiting after 10 minutes
start_time := clock_timestamp();
while (...)
loop
    perform pg_sleep(1);
    perform pg_stat_clear_snapshot(); -- needed to retrieve fresh data
    if clock_timestamp() - start_time > timeout then
        is_timedout := true;
        exit; -- plpgsql's loop-break statement
    end if;
end loop;
if is_timedout then
    ...
else
    ...
end if;
...
Note 3:
Look at the other columns of pg_stat_activity; you may need to use them as well.
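For example, these columns often matter for such checks:
select pid, state, query, query_start, wait_event_type, wait_event
from pg_stat_activity
where state = 'active';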

Is this function free from race-conditions?

I wrote a function which returns a record ID that is in the PENDING state (the state column). It should limit active downloads (records in the STARTED or COMPLETED states), which helps me avoid resource limit problems. The function is:
CREATE FUNCTION start_download(max_run INTEGER) RETURNS INTEGER
LANGUAGE PLPGSQL AS
$$
DECLARE
    started_id INTEGER;
BEGIN
    UPDATE downloads SET state='STARTED' WHERE id IN (
        SELECT id FROM downloads WHERE state='PENDING' AND (
            SELECT max_run >= COUNT(id) FROM downloads WHERE state::TEXT IN ('STARTED','COMPLETED'))
        LIMIT 1 FOR UPDATE)
    RETURNING id INTO started_id;
    RETURN started_id;
    COMMIT;
END;
$$;
Is it safe with regard to race conditions? I mean that the resource limit will not be exceeded because of a race condition (2 or more threads getting the ID of some PENDING record, or even of the same record, after the limit of active, i.e. STARTED/COMPLETED, downloads has been reached).
In short, this function should work as a test-and-set procedure and return an available ID (switched from PENDING to STARTED, but that's an irrelevant detail). Does it have this property of being free from race conditions? Or maybe I have to use some locks...
PS. 1) The downloads table has columns id, state (an enum with values like STARTED, PENDING, ERROR, COMPLETED, PROCESSED) and others that don't matter in this context. 2) max_run is the limit of active downloads.
Yes, there is a race condition in the innermost subquery.
If two concurrent sessions run your function at the same time, both could find that max_run is still at least the count of started or completed jobs, and both would start a job, thereby pushing the number of started or completed jobs over the limit.
That is not easy to avoid, unless you lock all the started or completed jobs (very bad for concurrency) or use a higher transaction isolation level (SERIALIZABLE).
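One way to close the race, sketched under the assumption that every claim goes through this function: serialize the count-and-claim step with a transaction-level advisory lock, as in the first question above (the lock key 4711 is arbitrary):
CREATE OR REPLACE FUNCTION start_download(max_run INTEGER) RETURNS INTEGER
LANGUAGE plpgsql AS $$
DECLARE
    started_id INTEGER;
BEGIN
    PERFORM pg_advisory_xact_lock(4711); -- one claim at a time
    UPDATE downloads SET state = 'STARTED'
    WHERE id = (SELECT id FROM downloads
                WHERE state = 'PENDING'
                  AND (SELECT COUNT(*) < max_run FROM downloads
                       WHERE state::TEXT IN ('STARTED','COMPLETED'))
                LIMIT 1)
    RETURNING id INTO started_id;
    RETURN started_id; -- NULL when nothing is pending or the limit is reached
END $$;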

PostgreSQL concurrent update selects

I am attempting to have some sort of update-select for a job queue. I need it to support concurrent processes affecting the same table or database. This server will be used only for the queue, so a database per queue is acceptable. Originally I was thinking of something like the following:
UPDATE queue SET state=1, ts=NOW() WHERE id IN (SELECT id FROM queue WHERE state=0 LIMIT x) RETURNING *;
I've been reading that this will cause a race condition. I read that there is an option for the SELECT subquery to use FOR UPDATE, but then that will lock the row, and concurrent calls will block, whereas I would not mind if they skipped over to the next unlocked row.
So what I am asking for is the best way to have a FIFO system in Postgres that requires the least amount of locking of the entire database.
The typical way to do this is to wrap it in a PLPGSQL function, select FOR UPDATE NOWAIT, and then use exception handling to skip the locked rows.
This does place some additional overhead on the function because exception handling requires additional processor cycles to manage even if there are no exceptions.
As a very simple example:
CREATE OR REPLACE FUNCTION get_all_unlocked_customers() RETURNS SETOF customer
LANGUAGE PLPGSQL AS
$$
BEGIN
    RETURN QUERY SELECT * FROM customer FOR UPDATE NOWAIT;
EXCEPTION
    WHEN LOCK_NOT_AVAILABLE THEN
        NULL; -- no need to do anything
END;
$$;
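For completeness: PostgreSQL 9.5 and later offer FOR UPDATE SKIP LOCKED, which skips locked rows declaratively, without exception handling; a sketch against the queue table from the question above:
UPDATE queue
SET    state = 1, ts = now()
WHERE  id IN (SELECT id FROM queue
              WHERE  state = 0
              ORDER  BY id
              LIMIT  10
              FOR UPDATE SKIP LOCKED)
RETURNING *;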

Are PostgreSQL functions transactional?

Is a PostgreSQL function such as the following automatically transactional?
CREATE OR REPLACE FUNCTION refresh_materialized_view(name)
RETURNS integer AS
$BODY$
DECLARE
    _table_name ALIAS FOR $1;
    _entry materialized_views%ROWTYPE;
    _result INT;
BEGIN
    EXECUTE 'TRUNCATE TABLE ' || _table_name;
    UPDATE materialized_views
    SET last_refresh = CURRENT_TIMESTAMP
    WHERE table_name = _table_name;
    RETURN 1;
END
$BODY$
LANGUAGE plpgsql VOLATILE SECURITY DEFINER;
In other words, if an error occurs during the execution of the function, will any changes be rolled back? If this isn't the default behavior, how can I make the function transactional?
PostgreSQL 12 update: there is limited support for top-level PROCEDUREs that can do transaction control. You still cannot manage transactions in regular SQL-callable functions, so the below remains true except when using the new top-level procedures.
Functions are part of the transaction they're called from. Their effects are rolled back if the transaction rolls back; their work commits if the transaction commits. Any BEGIN ... EXCEPTION blocks within the function operate like (and under the hood use) savepoints, in the manner of the SAVEPOINT and ROLLBACK TO SAVEPOINT SQL statements.
The function either succeeds in its entirety or fails in its entirety, barring BEGIN ... EXCEPTION error handling. If an error is raised within the function and not handled, the transaction calling the function is aborted. Aborted transactions cannot commit, and if they try to commit, the COMMIT is treated as a ROLLBACK, the same as for any other transaction in error. Observe:
regress=# BEGIN;
BEGIN
regress=# SELECT 1/0;
ERROR: division by zero
regress=# COMMIT;
ROLLBACK
See how the transaction, which is in the error state due to the division by zero, rolls back on COMMIT?
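A small sketch of the savepoint-like behavior of an EXCEPTION block (the table and function names are placeholders):
CREATE TABLE t (v int);

CREATE OR REPLACE FUNCTION demo() RETURNS void
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO t VALUES (1);
    BEGIN
        INSERT INTO t VALUES (2);
        PERFORM 1/0; -- raises division_by_zero
    EXCEPTION
        WHEN division_by_zero THEN
            NULL; -- only the inner block's work is rolled back
    END;
END $$;

SELECT demo();
SELECT * FROM t; -- returns only 1; the inner INSERT was undone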
If you call a function without an explicit surrounding transaction, the rules are exactly the same as for any other Pg statement:
BEGIN;
SELECT refresh_materialized_view(name);
COMMIT;
(where COMMIT will fail if the SELECT raised an error).
PostgreSQL does not (yet) support autonomous transactions in functions, where the procedure/function could commit/rollback independently of the calling transaction. This can be simulated using a new session via dblink.
BUT, things that aren't transactional or are only imperfectly transactional exist in PostgreSQL. If something has non-transactional behaviour in a normal BEGIN; do stuff; COMMIT; block, it has non-transactional behaviour in a function too. For example, nextval and setval, TRUNCATE, etc.
As my knowledge of PostgreSQL is less deep than Craig Ringer's, I will try to give a shorter answer: yes.
If you execute a function that has an error in it, none of its steps will impact the database.
Also, the same happens if you execute a query in pgAdmin.
For example, if you execute in a query:
update your_table yt set column1 = 10 where yt.id=20;
select anything_that_do_not_exists;
The update of the row with id = 20 in your_table will not be saved to the database.
UPDATE Sep - 2018
To clarify the concept, I have made a little example with the non-transactional function nextval.
First, let's create a sequence:
create sequence test_sequence start 100;
Then, let's execute:
update your_table yt set column1 = 10 where yt.id=20;
select nextval('test_sequence');
select anything_that_do_not_exists;
Now, if we open another query and execute
select nextval('test_sequence');
we will get 101, because the first value (100) was consumed by the previous query (sequences are not transactional), even though the update was not committed.
https://www.postgresql.org/docs/current/static/plpgsql-structure.html
It is important not to confuse the use of BEGIN/END for grouping statements in PL/pgSQL with the similarly-named SQL commands for transaction control. PL/pgSQL's BEGIN/END are only for grouping; they do not start or end a transaction. Functions and trigger procedures are always executed within a transaction established by an outer query — they cannot start or commit that transaction, since there would be no context for them to execute in. However, a block containing an EXCEPTION clause effectively forms a subtransaction that can be rolled back without affecting the outer transaction. For more about that see Section 39.6.6.
At the function level, it is not transactional. In other words, each statement in the function belongs to a single transaction, under the default db auto-commit setting (auto-commit is true by default). But in any case, you have to call the function using
select schemaName.functionName()
The statement 'select schemaName.functionName()' is a single transaction; let's name the transaction T1. All the statements in the function then belong to transaction T1. In this way, the function runs in a single transaction.
Postgres 14 update: All statements between the BEGIN and END of a procedure/function body are executed in a single transaction; thus, any error arising during the execution of this block will automatically roll the transaction back.
Additionally, this atomic transaction behavior covers triggers as well.