Why is catching errors inside a LOOP causing performance issues? - postgresql

I had a function with a performance issue:
totalCharge := 0;
FOR myRecord IN ... LOOP
......
IF severalConditionsAreMet THEN
BEGIN
SELECT t1.charge INTO STRICT recordCharge
FROM t1
WHERE t1.id = myRecord.id AND otherComplexConditionsHere;
totalCharge := totalCharge + recordCharge;
...........
EXCEPTION
WHEN OTHERS THEN
NULL;
END;
END IF;
END LOOP;
The function was being called 232 times (not counting the number of times the code inside the FOR was executed).
The IF inside the FOR LOOP ended up being entered 4466 times and took 561 seconds to complete all 4466 iterations.
For the particular data set that I had, the IF branch was always entered, the SELECT above never returned any data, and the code reached the EXCEPTION branch every single time.
I have changed the code to:
totalCharge := 0;
FOR myRecord IN ... LOOP
......
IF severalConditionsAreMet THEN
SELECT t1.charge INTO recordCharge
FROM t1
WHERE t1.id = myRecord.id AND otherComplexConditionsHere;
IF (recordCharge IS NULL) THEN
CONTINUE;
END IF;
totalCharge := totalCharge + recordCharge;
...........
END IF;
END LOOP;
Please note that for the table t1, the t1.charge column has a NOT NULL condition defined on it.
This time, the code from the IF takes 1-2 seconds to complete all 4466 iterations.
Basically, all I did was replace the
BEGIN
…
EXCEPTION
….
END;
With
IF conditionIsNotMet THEN
CONTINUE;
END IF;
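Equivalently, since PL/pgSQL sets the special FOUND variable after every SELECT ... INTO, the guard can be written without referencing the column value at all (a sketch using the names from the question):

```sql
-- After a plain SELECT ... INTO (no STRICT), FOUND is false when
-- no row was returned, so no exception handling and no NULL test
-- on the column are needed.
SELECT t1.charge INTO recordCharge
FROM t1
WHERE t1.id = myRecord.id AND otherComplexConditionsHere;
IF NOT FOUND THEN
    CONTINUE;
END IF;
```

This tests row existence directly, so it also works for columns that are allowed to be NULL.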
Can someone please explain to me why this worked?
What happened behind the scenes?
I suspect that when you catch exceptions inside of a LOOP and the code ends up generating an exception, Postgres can’t use cached plans to optimize that code so it ends up planning the code at each iteration and this causes performance issues.
Is my assumption correct?
Later Edit:
I altered the example provided by Vao Tsun to reflect the case that I want to illustrate.
CREATE OR REPLACE FUNCTION initialVersion()
RETURNS VOID AS $$
declare
testDate DATE;
begin
for i in 1..999999 loop
begin
select now() into strict testDate where 1=0;
exception when others
then null;
end;
end loop;
end;
$$ Language plpgsql;
CREATE OR REPLACE FUNCTION secondVersion()
RETURNS VOID AS $$
declare
testDate DATE;
begin
for i in 1..999999 loop
select now() into testDate where 1=0;
if testDate is null then
continue;
end if;
end loop;
end;
$$ Language plpgsql;
select initialVersion(); -- 19.7 seconds
select secondVersion(); -- 5.2 seconds
As you can see there is a difference of almost 15 seconds.
In the example that I provided initially, the difference is bigger because the SELECT FROM t1 runs against complex data and takes more time to execute than the simple SELECT in this second example.

I asked the same question here, in the PostgreSQL - general mailing group and got some responses that elucidated this "mystery" for me:
David G. Johnston:
"​Tip: A block containing an EXCEPTION clause is significantly
more expensive to enter and exit than a block without one. Therefore,
don't use EXCEPTION without need."
I'm somewhat doubting "plan caching" has anything to do with this; I
suspect it's basically that there is high memory and runtime overhead
to deal with the possibility of needing to convert an exception into
a branch instead of allowing it to be fatal.
Tom Lane:
Yeah, it's about the overhead of setting up and ending a
subtransaction. That's a fairly expensive mechanism, but we don't have
anything cheaper that is able to recover from arbitrary errors.
and an addition from David G. Johnston:
[...] setting up the pl/pgsql execution layer to trap "arbitrary SQL-layer
exceptions"​ is fairly expensive. Even if the user specifies specific
errors the error handling mechanism in pl/pgsql is code for generic
(arbitrary) errors being given to it.
These answers helped me understand a bit how things work.
I am posting this answer here because I hope that it will help someone else.

With the given details I can't reproduce it:
t=# do
$$
declare
begin
for i in 1..999999 loop
perform now();
/* exception when others then null; */
if null then null; end if;
end loop;
end;
$$
;
DO
Time: 1920.568 ms
t=# do
$$
declare
begin
for i in 1..999999 loop
begin
perform now();
exception when others then null;
end;
end loop;
end;
$$
;
DO
Time: 2417.425 ms
As you can see, with a million iterations the difference is clear but insignificant. Please try the same test on your machine - if you get the same results, you need to provide more details...

Related

How to execute PostgreSQL function at specific time interval?

Is there anything similar to setTimeout / setInterval in PostgreSQL which allows executing a piece of code (a FUNCTION) at a specified time interval?
As far as I know, the only thing that can execute a FUNCTION in response to a certain event is a trigger, but triggers are not time-based; they are operation-driven (INSERT / UPDATE / DELETE / TRUNCATE).
While I could do this in application code, I would prefer to delegate it to the database. Is there any way I could achieve this in PostgreSQL? Maybe an extension?
Yes, there is a way to do this. It's called pg_sleep:
CREATE OR REPLACE FUNCTION my_function() RETURNS VOID AS $$
BEGIN
LOOP
PERFORM pg_sleep(1);
RAISE NOTICE 'This is a notice!';
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT my_function();
This will raise the notice every second, and you can make it do other things instead of raising a notice. Note that the function never returns, so it ties up the session that calls it.
OR
You can use a scheduler extension such as pg_cron, which runs as a PostgreSQL background worker (it has to be added to shared_preload_libraries).
The following is a simple example that prints a message every 5 seconds (second-granularity schedules require pg_cron 1.5 or later):
CREATE OR REPLACE FUNCTION print_message() RETURNS VOID AS $$
BEGIN
RAISE NOTICE 'Hello, world!';
END;
$$ LANGUAGE plpgsql;
SELECT cron.schedule('print-message', '5 seconds', 'SELECT print_message()');

Continue a PL/pgSQL block even if it finds an error

I've gotten the essence of the function below from: How to re-check an SQL function created with check_function_bodies=false?
The context is: I'm migrating some functions from Oracle to PostgreSQL. While migrating them, I used the option that disables verification of function bodies (check_function_bodies = false), so that all functions could be created without being compiled/verified, which would speed up the process. Therefore, now - using the function below as a means - I am trying to analyze each function created in this X schema.
My problem is that the function doesn't continue when it finds an error. My central thought is to run recompile_functions() once and get all the messages fired when an error is found for each function. I have tried to enclose the statement that validates the function within a sub-block (BEGIN, EXCEPTION, END). It didn't work, though.
What am I missing here?
CREATE OR REPLACE FUNCTION public.recompile_functions()
RETURNS void
LANGUAGE plpgsql
AS
$function$
DECLARE
l_func regprocedure;
BEGIN
--test plpgsql functions
FOR l_func IN (
SELECT oid
FROM pg_proc
WHERE pronamespace='<<schema>>'::regnamespace
AND prolang=(SELECT oid FROM pg_language WHERE lanname='plpgsql')
AND pg_proc.oid NOT IN (select tg.tgfoid FROM pg_trigger tg)
AND pg_proc.prokind = 'f'
)
LOOP
BEGIN
PERFORM plpgsql_validator(l_func);
EXCEPTION
WHEN OTHERS THEN
RAISE EXCEPTION 'Function % failed validation checks: %', l_func::text, SQLERRM;
END;
END LOOP;
END;
$function$
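The missing piece is that RAISE EXCEPTION inside the handler throws a brand-new error, which propagates out of the block and aborts the whole loop. To log each failure and keep iterating, lower the severity of the RAISE, for example:

```sql
-- Sketch: RAISE WARNING (or NOTICE) emits the message without
-- aborting, so the FOR loop moves on to the next function.
BEGIN
    PERFORM plpgsql_validator(l_func);
EXCEPTION
    WHEN OTHERS THEN
        RAISE WARNING 'Function % failed validation checks: %',
            l_func::text, SQLERRM;
END;
```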

calling a function in sql within nested loop

My query doesn't even work, but hopefully the logic comes through. Basically, I'm using datavault.dimdates_csv to produce a row for each date/day. Then, for each day, I'm trying to get all account ids and, for each one, call a function with the date and the account.
Is there a better approach to getting my data? I know nested loops aren't that great in SQL.
do
$$
declare
date_record record;
account record;
begin
for date_record in select d."date" from datavault.dimdates_csv d
for account in select sad.id from datavault.sat_account_details sad
select datavault.account_active_for_date(date_record , account)
loop
loop
end loop;
end;
$$
It's hard to follow your business logic but syntax-wise your block needs correction. Please note that d."date" and sad.id are scalars (I assume a date and an integer) and not records.
do
$$
declare
running_date date;
running_id integer;
begin
for running_date in select d."date" from datavault.dimdates_csv d loop
for running_id in select sad.id from datavault.sat_account_details sad loop
perform datavault.account_active_for_date(running_date, running_id);
end loop;
end loop;
end;
$$;
As far as I can see you are calling function datavault.account_active_for_date for every pair of d."date" and sad.id. If this is true then you can simply
select datavault.account_active_for_date(d."date", sad.id)
from datavault.dimdates_csv d, datavault.sat_account_details sad;
and ignore the resultset.

PL/pgSQL "for loop" + select basic example ("hello world")

I've been using Postgres for a while, but I'm totally new to PL/pgSQL.
I'm struggling to get a basic for loop to work.
This works fine:
-- Without SELECT
DO $$
BEGIN
FOR counter IN 1..6 BY 2 LOOP
RAISE NOTICE 'Counter: %', counter;
END LOOP;
END; $$;
But what I really want is to iterate through the result of a SELECT query.
I keep running into this error:
Error in query: ERROR: loop variable of loop over rows must be a record or row variable or list of scalar variables
Sounds pretty obscure to me and googling did not help.
There's a table from my own data I want to use (I was hoping to use a SELECT * FROM mytable WHERE ‹whatever›), but I realize I can't even get the for loop to work with simpler data.
Take this:
-- with a SELECT
DO $$
BEGIN
RAISE NOTICE 'Get ready to be amazed…';
FOR target IN SELECT * FROM generate_series(1,2) LOOP
RAISE NOTICE 'hello'
END LOOP;
END; $$
This generates the error above too. I'd like to get a simple thing printed to get the hang of the loop syntax, something like:
hello 1
hello 2
What am I doing wrong?
The iterator must be declared
DO $$
DECLARE
target record;
BEGIN
RAISE NOTICE 'Get ready to be amazed…';
FOR target IN SELECT * FROM generate_series(1,2) LOOP
RAISE NOTICE 'hello';
END LOOP;
END; $$;
NOTICE: Get ready to be amazed…
NOTICE: hello
NOTICE: hello
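To get the hoped-for hello 1 / hello 2 output, interpolate the loop variable into the message; with SELECT * FROM generate_series(1,2) the record's single column is named generate_series:

```sql
DO $$
DECLARE
    target record;
BEGIN
    FOR target IN SELECT * FROM generate_series(1,2) LOOP
        -- % is the placeholder syntax of RAISE
        RAISE NOTICE 'hello %', target.generate_series;
    END LOOP;
END; $$;
-- NOTICE:  hello 1
-- NOTICE:  hello 2
```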

Is this generic MERGE/UPSERT function for PostgreSQL safe?

I have created a "merge" function which is supposed to execute either an UPDATE or an INSERT query, depending on existing data. Instead of writing an upsert-wrapper for each table (as in most of the available examples), this function takes entire SQL strings. Both of the SQL strings are automatically generated by our application.
The plan is to call the function like this:
-- hypothetical "settings" table, with a primary key of (user_id, setting):
SELECT merge(
$$UPDATE settings SET value = 'x' WHERE user_id = 42 AND setting = 'foo'$$,
$$INSERT INTO settings (user_id, setting, value) VALUES (42, 'foo', 'x')$$
);
Here's the full code of the merge() function:
CREATE OR REPLACE FUNCTION merge (update_sql TEXT, insert_sql TEXT) RETURNS TEXT AS
$func$
DECLARE
max_iterations INTEGER := 10;
i INTEGER := 0;
num_updated INTEGER;
BEGIN
-- usually returns before re-entering the loop
LOOP
-- first try the update
EXECUTE update_sql;
GET DIAGNOSTICS num_updated = ROW_COUNT;
IF num_updated > 0 THEN
RETURN 'UPDATE';
END IF;
-- nothing was updated: try the insert, watching out for concurrent inserts
BEGIN
EXECUTE insert_sql;
RETURN 'INSERT';
EXCEPTION WHEN unique_violation THEN
-- nop; just loop and try again from the top
END;
-- emergency brake
i := i + 1;
IF i >= max_iterations THEN
RAISE EXCEPTION 'merge(): tried looping % times, giving up now.', i;
EXIT;
END IF;
END LOOP;
END;
$func$
LANGUAGE plpgsql;
It appears to work well enough in my tests, but I'm not certain if I haven't missed anything crucial, especially regarding concurrent UPDATE/INSERT/DELETE queries, which may be issued without using this function. Did I overlook anything important?
Among the resources I consulted for this function are:
UPDATE/INSERT example 40.2 in the PostgreSQL manual
Why is UPSERT so complicated?
SO: Insert, on duplicate update (postgresql)
(Edit: one of the goals was to avoid locking the target table.)
The answer to your question depends on the context of how your application(s) will access the database. There are many ways to solve this, as nicely discussed in the depesz post you cited yourself. In addition, you might want to consider using writeable CTEs (see here). The question Insert, on duplicate update in PostgreSQL? also has some interesting discussion for your decision-making process.
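For completeness: on PostgreSQL 9.5 and later, the whole retry loop can usually be replaced by a single INSERT ... ON CONFLICT statement, which resolves the insert/update race internally. A sketch for the hypothetical settings table from the question:

```sql
-- Insert the row, or update the existing one when the
-- (user_id, setting) primary key already exists.
INSERT INTO settings (user_id, setting, value)
VALUES (42, 'foo', 'x')
ON CONFLICT (user_id, setting)
DO UPDATE SET value = EXCLUDED.value;
```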