PostgreSQL Exception Logging

I need help with the following requirement.
We need to handle exceptions that may occur in a PL/pgSQL block and log certain values from the select statement into a custom table, audit_log.
For example:
audit_log table structure:
col1, stored_procedure_name, error_code
CREATE OR REPLACE PROCEDURE SP_TEMP()
LANGUAGE plpgsql
AS $procedure$
declare
begin
/* loop through the data in table_a */
for sq in (select a.column1,a.column2..a.columnN from table_a a )
loop
/*Do some operations (not shown here) and select data from table_b */
(
select col1, col2, col3
from table_b b where
b.col1=sq.column1 )
/*insert into table_c*/
insert into table_c
values(sq.column1,sq.column2,b.col2,b.col3);
end loop;
EXCEPTION:
WHEN OTHERS THEN
/* Log the failure information to audit_log table */
insert into audit_log
values(column1, 'SP_TEMP',SQLERRM)
end
$procedure$
;
Is it possible to do this? How can we pass the column1 value to the exception handler?
We were not able to reference the column1 value inside the exception block.

Create a nested (inner) block inside the cursor loop, then put your exception handling inside that block. The loop variable is still in scope there, so the handler can read sq.column1.
create or replace procedure sp_temp()
language plpgsql
as $$
declare
    sq record; -- current row of table_a
    b  record; -- matching row of table_b
begin
    /* loop through the data in table_a */
    for sq in
        select a.column1, a.column2 -- ..a.columnn in the original
        from table_a a
    loop
        begin -- inner block to allow processing the exception
            /* do some operations (not shown here) and select data from table_b */
            select col1, col2, col3
            into b
            from table_b
            where col1 = sq.column1;
            /* insert into table_c */
            insert into table_c
            values (sq.column1, sq.column2, b.col2, b.col3);
        exception
            when others then
                /* log the failure information to audit_log table */
                insert into audit_log
                values (sq.column1, 'sp_temp', sqlerrm);
        end; -- inner block
    end loop;
end;
$$;
NOTE: Be careful with WHEN OTHERS as the only condition in the exception block. There are likely some conditions you want to handle and continue from, and others that should abort processing. Use WHEN OTHERS only as a last resort.
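To illustrate the note above, here is a minimal sketch that handles one expected condition by name and re-raises everything else. The tables demo_target and demo_audit_log and the label 'demo' are hypothetical stand-ins, not part of the original question.

```sql
-- Hypothetical demo tables (assumptions for illustration only).
create temp table demo_target (id int primary key);
create temp table demo_audit_log (
    col1 int,
    stored_procedure_name text,
    error_code text
);

do $$
declare
    v int;
begin
    foreach v in array array[1, 2, 2, 3]
    loop
        begin
            insert into demo_target values (v);
        exception
            when unique_violation then
                -- expected condition: log it and continue with the next value
                insert into demo_audit_log values (v, 'demo', sqlstate);
            when others then
                -- anything unexpected: re-raise so processing aborts
                raise;
        end;
    end loop;
end;
$$;

-- Inspect what was logged.
select * from demo_audit_log;
```

Here the duplicate value is logged and the loop keeps going, while any other error propagates to the caller. Which conditions to continue on is a judgment call for the actual workload.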

Related

Is it worth Parallel/Concurrent INSERT INTO... (SELECT...) to the same Table in Postgres?

I was attempting an INSERT INTO.... ( SELECT... ) (inserting a batch of rows from SELECT... subquery), onto the same table in my database. For the most part it was working, however, I did see a "Deadlock" exception logged every now and then. Does it make sense to do this or is there a way to avoid a deadlock scenario? On a high-level, my queries both resemble this structure:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
BEGIN
LOOP
EXIT WHEN row_count = 0;
WITH cte AS (SELECT *
FROM TableA tbla
WHERE EXISTS (SELECT 1 FROM TableB tblb WHERE tblb.id = tbla.id))
INSERT INTO concurrent_table (SELECT id FROM cte);
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED'
WHERE job_name = 'tblA_and_B_job';
END LOOP;
END
$procedure$;
And the other script that runs in parallel and INSERTS... also to the same table is also basically:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
BEGIN
LOOP
EXIT WHEN row_count = 0;
WITH cte AS (SELECT *
FROM TableC c
WHERE EXISTS (SELECT 1 FROM TableD d WHERE d.id = c.id))
INSERT INTO concurrent_table (SELECT id FROM cte);
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED'
WHERE job_name = 'tbl_C_and_D_job';
END LOOP;
END
$procedure$;
So you can see I'm querying two different tables in each script, but inserting into the same concurrent_table. I also have the UPDATE... statement that writes to a log table, so I suppose that could also cause issues. Is there any way to use BEGIN... END and COMMIT here to avoid deadlock/concurrency issues, or should I just create a second table to hold the "tbl_C_and_D_job" data?

Syntax error when calling one function from another

I am trying to create a function which calls 2 other functions.
Below is the calling function's code, from which I am trying to call two other functions, schema1.func1() and schema1.func2().
But it throws an error at the line SELECT schema1.func1(temp_val); saying:
syntax error at or near "SELECT".
I tried to figure out the correct syntax but couldn't resolve it.
I am using Postgres version 1.14.3
DECLARE
temp_val int;
cursor1 CURSOR
FOR
SELECT col1 from schema1.table1;
BEGIN
OPEN cursor1;
LOOP
FETCH cursor1 INTO temp_val;
EXIT WHEN NOT FOUND;
SELECT CASE
WHEN NOT EXISTS (SELECT col2 FROM schema1.table2 WHERE col2 = temp_val)
THEN
BEGIN
SELECT schema1.func1(temp_val);
SELECT schema1.func2(temp_val);
END;
END CASE;
END LOOP;
CLOSE cursor1;
END;
There is a ; missing at:
SELECT CASE WHEN NOT EXISTS (SELECT col2 FROM schema1.table2 WHERE col2 = temp_val)
so..
SELECT CASE WHEN NOT EXISTS (SELECT col2 FROM schema1.table2 WHERE col2 = temp_val);
You can't mix PL/pgSQL BEGIN ... END blocks with an SQL statement, even if that SQL statement is part of a PL/pgSQL function. So the BEGIN inside the THEN part of the CASE expression is invalid. An SQL CASE expression ends with just END. A PL/pgSQL CASE statement ends with END CASE. As you are trying to use a CASE expression, it would require just END, not END CASE.
You also need to use PERFORM to call a function whose result you want to discard.
It's not clear to me what exactly you want to achieve. If you only want to call those two functions if a row does not exist in table2 then you can do this with a single loop:
DECLARE
t1_rec record;
BEGIN
FOR t1_rec IN select t1.col1
from schema1.table1 t1
where not exists (select *
from schema1.table2 t2
where t2.col2 = t1.col1)
LOOP
perform schema1.func1(t1_rec.col1);
perform schema1.func2(t1_rec.col1);
END LOOP;
END;
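If you prefer to keep the row-by-row check inside the loop, a PL/pgSQL IF statement combined with PERFORM is one way to write it. The sketch below is self-contained and uses hypothetical stand-ins: temp tables t1 and t2 in place of schema1.table1 and schema1.table2, and a temporary function pg_temp.func1 in place of schema1.func1.

```sql
-- Hypothetical stand-ins for the tables and function in the question.
create temp table t1 (col1 int);
create temp table t2 (col2 int);
insert into t1 values (1), (2), (3);
insert into t2 values (2);

create function pg_temp.func1(v int) returns void
language plpgsql as $$
begin
    raise notice 'func1 called with %', v;
end;
$$;

do $$
declare
    temp_val int;
begin
    for temp_val in select col1 from t1 order by col1
    loop
        -- IF is a PL/pgSQL statement, so it may contain other statements
        if not exists (select 1 from t2 where col2 = temp_val) then
            -- PERFORM runs the function and discards its result
            perform pg_temp.func1(temp_val);
        end if;
    end loop;
end;
$$;
```

With this sample data, the function is called only for the values that have no match in t2.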

plpgsql function not inserting data as intended

I have the below function compiled successfully. When I do select schema.function_name();, the function gets executed but there are no rows inserted in the table schema.table_insert:
CREATE OR REPLACE FUNCTION schema.function_name()
RETURNS void AS
$BODY$
DECLARE cur_1 CURSOR FOR
Select col1 from schema.table1
union
select col1 from schema.table2
union
select col1 from schema.table3
union
select col1 from schema.table4;
BEGIN
FOR rec_i in cur_1 LOOP
insert into schema.table_insert (col1,col2,col3)
select col1,col2,col3
from schema.view
where col1=rec_i.col1
commit;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql STABLE
The select in cursor cur_1 returns more than 900 000 records. When I use the insert statement separately for single record, the record gets inserted in the table.
"I have the below function compiled successfully."
No, you haven't.
For starters, plpgsql functions are not "compiled". On creation, only superficial syntax checks are done, then the function body is stored as is. No compilation. Late binding. Nested SQL statements are treated as prepared statements.
That aside, the function you display cannot be created at all. It is syntactical nonsense: a semicolon is missing after the INSERT, and COMMIT does not make sense here and is not allowed in a plpgsql function (transaction control is only available in procedures and DO blocks, since PostgreSQL 11). You do not need a cursor for this, nor a loop. Use a simple SQL statement:
INSERT INTO schema.table_insert (col1, col2, col3)
SELECT v.col1, v.col2, v.col3
FROM schema.view v
JOIN (
SELECT col1 FROM schema.table1
UNION
SELECT col1 FROM schema.table2
UNION
SELECT col1 FROM schema.table3
UNION
SELECT col1 FROM schema.table4
) sub USING (col1);
Equivalent, may be faster:
INSERT INTO schema.table_insert (col1, col2, col3)
SELECT v.col1, v.col2, v.col3
FROM schema.view v
WHERE EXISTS (SELECT 1 FROM schema.table1 WHERE col1 = v.col1)
OR EXISTS (SELECT 1 FROM schema.table2 WHERE col1 = v.col1)
OR EXISTS (SELECT 1 FROM schema.table3 WHERE col1 = v.col1)
OR EXISTS (SELECT 1 FROM schema.table4 WHERE col1 = v.col1);
This can be wrapped in a function, but plpgsql is overkill. Also, STABLE would be wrong for a function containing an INSERT. I suggest a plain SQL function; VOLATILE is the default and is correct for this.
CREATE OR REPLACE FUNCTION schema.function_name()
RETURNS void AS
$func$
INSERT ...
$func$ LANGUAGE sql;

How to optimize postgresql procedure

I have 61 million non-unique emails with statuses.
These emails need to be deduplicated, with logic based on status.
I wrote a stored procedure, but it runs too long.
How can I optimize the execution time of this procedure?
CREATE OR REPLACE FUNCTION public.load_oxy_emails() RETURNS boolean AS $$
DECLARE
row record;
rec record;
new_id int;
BEGIN
FOR row IN SELECT * FROM oxy_email ORDER BY id LOOP
SELECT * INTO rec FROM oxy_emails_clean WHERE email = row.email;
IF rec IS NOT NULL THEN
IF row.status = 3 THEN
UPDATE oxy_emails_clean SET status = 3 WHERE id = rec.id;
END IF;
ELSE
INSERT INTO oxy_emails_clean(id, email, status) VALUES(nextval('oxy_emails_clean_id_seq'), row.email, row.status);
SELECT currval('oxy_emails_clean_id_seq') INTO new_id;
INSERT INTO oxy_emails_clean_websites_relation(oxy_emails_clean_id, website_id) VALUES(new_id, row.website_id);
END IF;
END LOOP;
RETURN true;
END;
$$
LANGUAGE 'plpgsql';
"How can I optimize the execution time of this procedure?"
Don't do it with a loop.
Row-by-row processing (also known as "slow-by-slow") is almost always a lot slower than doing bulk changes where a single statement processes a lot of rows "in one go".
The change of the status can easily be done using a single statement:
update oxy_emails_clean oec
SET status = 3
from oxy_email oe
where oe.email = oec.email --<< match on email, as the loop does
and oe.status = 3;
The copying of the rows can be done using a chain of CTEs:
with to_copy as (
select *
from oxy_email
where status <> 3 --<< all those that have a different status
), clean_inserted as (
insert into oxy_emails_clean (id, email, status)
select nextval('oxy_emails_clean_id_seq'), email, status
from to_copy
returning id, email
)
insert into oxy_emails_clean_websites_relation (oxy_emails_clean_id, website_id)
select ci.id, tc.website_id
from clean_inserted ci
join to_copy tc on tc.email = ci.email; --<< ci.id is newly generated, so match on email
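Because the source emails are not unique, a set-based version also has to pick one row per email. The following is only a sketch, under assumptions that go beyond the answer: oxy_emails_clean starts empty, the first row per email (by id) supplies the status and website_id, and status 3 wins if any duplicate carries it, which mirrors what the original loop does. The temp tables and sample rows are hypothetical stand-ins for the real tables.

```sql
-- Temp stand-ins for the real tables (column lists are assumptions).
create temp table oxy_email (id int, email text, status int, website_id int);
create temp table oxy_emails_clean (id serial primary key, email text unique, status int);
create temp table oxy_emails_clean_websites_relation (oxy_emails_clean_id int, website_id int);

insert into oxy_email values
    (1, 'a@example.org', 1, 10),
    (2, 'a@example.org', 3, 11),
    (3, 'b@example.org', 2, 12);

with first_seen as (
    -- one row per email: the one with the lowest id
    select distinct on (email) email, status, website_id
    from oxy_email
    order by email, id
), clean_inserted as (
    insert into oxy_emails_clean (email, status)
    select f.email,
           -- status 3 wins if any duplicate of this email has it
           case when exists (select 1 from oxy_email o
                             where o.email = f.email and o.status = 3)
                then 3 else f.status end
    from first_seen f
    returning id, email
)
insert into oxy_emails_clean_websites_relation (oxy_emails_clean_id, website_id)
select ci.id, f.website_id
from clean_inserted ci
join first_seen f using (email);
```

On 61 million rows, an index on oxy_email (email) would probably matter for the EXISTS lookup, but that should be checked against the real data with EXPLAIN.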

How to reference a result set in plpgsql

In a plpgsql procedure I am looking how to reference and use a result set that I get from the first query. Following code tries to demonstrate what I want to achieve:
do
$body$
DECLARE
ref_result_set ???;
BEGIN
ref_result_set := select 'asdf';
perform xxx from ref_result_set;
perform yyy from ref_result_set;
END;
$body$
language plpgsql;
I was looking at cursors but there is just an option to fetch row by row and not an entire set. Is there any option how to achieve this without first writing to a table?
Question asked
There are no "table variables" in plpgsql (or SQL). You can use:
- cursors
- temporary, unlogged or regular tables
- the original query as subquery, or a function or view doing the same
- CTEs (for the scope of a single SQL statement)
Related questions:
Select from a table variable
Function to return a table of all children of a node
Actual problem
For your actual problem I suggest data-modifying CTEs:
WITH sel AS (
SELECT col1, col2, ..
FROM tbl1
WHERE <expensive condition>
)
, ins1 AS (
INSERT INTO test1 (col1, col2, ..)
SELECT col1, col2, ..
FROM sel
WHERE <some condition>
)
INSERT INTO test2 (col1, col2, ..)
SELECT col1, col2, ..
FROM sel
WHERE <some condition>;
You can use that inside plpgsql code or as standalone SQL command.
Inside plpgsql code you can reference variables in the query ...
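As a concrete, runnable variant of the data-modifying CTE pattern above, the sketch below fills in the placeholders with made-up columns and conditions. The temp tables, sample rows, and filter conditions are assumptions for illustration only.

```sql
-- Hypothetical tables and data standing in for tbl1, test1 and test2.
create temp table tbl1 (col1 int, col2 text);
create temp table test1 (col1 int, col2 text);
create temp table test2 (col1 int, col2 text);
insert into tbl1 values (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');

with sel as (
    select col1, col2
    from tbl1
    where col1 > 1            -- stand-in for the expensive condition
)
, ins1 as (
    -- data-modifying CTE: runs even though the outer query never reads it
    insert into test1 (col1, col2)
    select col1, col2
    from sel
    where col1 % 2 = 0
)
insert into test2 (col1, col2)
select col1, col2
from sel
where col1 % 2 = 1;
```

The result set sel is computed once and consumed by both inserts within a single statement, so no intermediate table is needed.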