Optimizing an insert/update loop in a stored procedure - postgresql

I have two tables wholesaler_catalog and wholesaler_catalog_prices. The latter has a foreign key reference to the former.
wholesaler_catalog_prices has a column called cost_type which can be either RETAIL or DISCOUNT.
Consider row Foo in wholesaler_catalog. Foo has two entries in wholesaler_catalog_prices - one for RETAIL and one for DISCOUNT. I want to split up Foo into Foo1 and Foo2, such that Foo1 points to RETAIL and Foo2 points to DISCOUNT. (The reasons for doing this are complex and I won't go into them - it's part of a larger migration.)
I have made a stored procedure that looks like this:
do
$$
declare
f record;
new_id int;
begin
for f in select catalog_id from
(select catalog_id, cost_type, row_number() over (partition by catalog_id) from wholesaler_catalog_prices
group by catalog_id, cost_type
order by catalog_id) as x
where row_number > 1
loop
insert into wholesaler_catalog
(item_number, name, catalog_log_id)
select item_number, name, catalog_log_id from wholesaler_catalog
where id = f.catalog_id
returning id into new_id;
-- RAISE NOTICE '% copied to %', f.catalog_id, new_id;
update wholesaler_catalog_prices set catalog_id = new_id where catalog_id = f.catalog_id and cost_type = 'RETAIL';
end loop;
end;
$$
The problem is that there are about 100k such records and it takes a very long time to run (I cancelled the run after 30 minutes). Is there any way I can optimize the procedure to run faster?
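One possible set-based alternative to the loop is sketched below. It is not a tested drop-in replacement: it assumes wholesaler_catalog.id is fed by a serial-style sequence that pg_get_serial_sequence can find, and that explicit values may be inserted into id (an identity column declared GENERATED ALWAYS would need OVERRIDING SYSTEM VALUE). The CTE names split and copied are made up for illustration. The idea is to allocate the new ids up front so that the old-to-new mapping is available to both the INSERT and the UPDATE.

```sql
WITH split AS (
    -- every catalog row that has more than one cost_type,
    -- paired with a freshly allocated id for its copy
    SELECT catalog_id AS old_id,
           nextval(pg_get_serial_sequence('wholesaler_catalog', 'id')) AS new_id
    FROM   wholesaler_catalog_prices
    GROUP  BY catalog_id
    HAVING count(DISTINCT cost_type) > 1
), copied AS (
    -- copy each affected catalog row under its new id
    INSERT INTO wholesaler_catalog (id, item_number, name, catalog_log_id)
    SELECT s.new_id, c.item_number, c.name, c.catalog_log_id
    FROM   split s
    JOIN   wholesaler_catalog c ON c.id = s.old_id
)
-- repoint the RETAIL prices at the copies
UPDATE wholesaler_catalog_prices p
SET    catalog_id = s.new_id
FROM   split s
WHERE  p.catalog_id = s.old_id
AND    p.cost_type  = 'RETAIL';
```

If the loop has to stay, it is also worth checking whether wholesaler_catalog_prices (catalog_id) is indexed. PostgreSQL does not create an index on the referencing column of a foreign key automatically, and without one each UPDATE inside the loop has to scan the whole prices table.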

Related

Is it worth Parallel/Concurrent INSERT INTO... (SELECT...) to the same Table in Postgres?

I was attempting an INSERT INTO.... ( SELECT... ) (inserting a batch of rows from SELECT... subquery), onto the same table in my database. For the most part it was working, however, I did see a "Deadlock" exception logged every now and then. Does it make sense to do this or is there a way to avoid a deadlock scenario? On a high-level, my queries both resemble this structure:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
row_count bigint; -- NULL until the first INSERT has run
BEGIN
LOOP
EXIT WHEN row_count = 0;
WITH cte AS (SELECT *
FROM TableA tbla
WHERE EXISTS (SELECT 1 FROM TableB tblb WHERE tblb.id = tbla.id))
INSERT INTO concurrent_table (SELECT id FROM cte);
GET DIAGNOSTICS row_count = ROW_COUNT; -- rows inserted by the statement above
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED'
WHERE job_name = 'tblA_and_B_job';
END LOOP;
END
$procedure$;
The other script, which runs in parallel and also INSERTs into the same table, is basically:
CREATE OR REPLACE PROCEDURE myConcurrentProc() LANGUAGE plpgsql
AS $procedure$
DECLARE
row_count bigint; -- NULL until the first INSERT has run
BEGIN
LOOP
EXIT WHEN row_count = 0;
WITH cte AS (SELECT *
FROM TableC c
WHERE EXISTS (SELECT 1 FROM TableD d WHERE d.id = c.id))
INSERT INTO concurrent_table (SELECT id FROM cte);
GET DIAGNOSTICS row_count = ROW_COUNT; -- rows inserted by the statement above
COMMIT;
UPDATE log_tbl
SET status = 'FINISHED'
WHERE job_name = 'tbl_C_and_D_job';
END LOOP;
END
$procedure$;
So you can see I'm querying two different tables in each script, but inserting into the same concurrent_table. I also have the UPDATE... statement that writes to a log table, so I suppose that could also cause issues. Is there any way to use BEGIN... END here and COMMIT to avoid any deadlock/concurrency issues, or should I just create a 2nd table to hold the "tbl_C_and_D_job" data?
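One common way for two bulk INSERTs to deadlock is when both sessions try to insert overlapping unique keys, but in a different order, so that each ends up waiting on a row the other has not committed yet. Whether that is what happens here cannot be told from the question. The sketch below assumes (this is not stated in the question) that concurrent_table has a unique or primary key constraint on a column named id. It inserts in a fixed key order, so two sessions that collide wait for each other in the same direction, and it skips keys that already exist.

```sql
-- Assumption: concurrent_table has a unique / primary key constraint on id.
INSERT INTO concurrent_table (id)
SELECT tbla.id
FROM   TableA tbla
WHERE  EXISTS (SELECT 1 FROM TableB tblb WHERE tblb.id = tbla.id)
ORDER  BY tbla.id                 -- same insertion order in every session
ON CONFLICT (id) DO NOTHING;      -- a key inserted by the other job is skipped
```

The second job would use the same ORDER BY and ON CONFLICT clauses on its own SELECT. The two UPDATEs on log_tbl touch rows with different job_name values, so on their own they should not block each other.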

Recursive query with cursor in PostgreSQL, no data found

How do I use a recursive query and then a cursor to update multiple rows in PostgreSQL? I try to return data, but no data is found. If there is an alternative to the recursive query and cursor, or simply better code, please help me.
drop function proses_stock_invoice(varchar, varchar, character varying);
create or replace function proses_stock_invoice
(p_medical_cd varchar,p_post_cd varchar, p_pstruserid character varying)
returns void
language plpgsql
as $function$
declare
cursor_data refcursor;
cursor_proses refcursor;
v_medicalCd varchar(20);
v_itemCd varchar(20);
v_quantity numeric(10);
begin
open cursor_data for
with recursive hasil(idnya, level, pasien_cd, id_root) as (
select medical_cd, 1, pasien_cd, medical_root_cd
from trx_medical
where medical_cd = p_pstruserid
union all
select A.medical_cd, level + 1, A.pasien_cd, A.medical_root_cd
from trx_medical A, hasil B
where A.medical_root_cd = B.idnya
)
select idnya from hasil where level >=1;
fetch next from cursor_data into v_medicalCd;
return v_medicalCd;
while (found)
loop
open cursor_proses for
select B.item_cd, B.quantity from trx_medical_resep A
join trx_resep_data B on A.medical_resep_seqno = B.medical_resep_seqno
where A.medical_cd = v_medicalCd and B.resep_tp = 'RESEP_TP_1';
fetch next from cursor_proses into v_itemCd, v_quantity;
while (found)
loop
update inv_pos_item
set quantity = quantity - v_quantity, modi_id = p_pstruserid, modi_id = now()
where item_cd = v_itemCd and pos_cd = p_post_cd;
end loop;
close cursor_proses;
end loop;
close cursor_data;
end
$function$;
But no data is found?
You have a function with return void so it will never return any data to you. Still you have the statement return v_medicalCd after fetching the first record from the first cursor, so the function will return from that point and never reach the lines below.
When analyzing your function you have (1) a cursor that yields a number of idnya values from table trx_medical, which is input for (2) a cursor that yields a number of v_itemCd, v_quantity from tables trx_medical_resep, trx_resep_data for each idnya, which is then used to (3) update some rows in table inv_pos_item. You do not need cursors to do that and it is, in fact, extremely inefficient. Instead, turn the whole thing into a single update statement.
I am assuming here that you want to update an inventory of medicines by subtracting the medicines prescribed to patients from the stock in the inventory. This means that you will have to sum up prescribed amounts by type of medicine. That should look like this (note the comments):
CREATE FUNCTION proses_stock_invoice
-- VVV parameter not used
(p_medical_cd varchar, p_post_cd varchar, p_pstruserid varchar)
RETURNS void AS $function$
UPDATE inv_pos_item
-- quantity exists in both inv_pos_item and prescribed, so it must be qualified
SET quantity = inv_pos_item.quantity - prescribed.quantity,
modi_id = p_pstruserid
-- , modi_id = now() -- the question assigns modi_id twice; put the intended timestamp column here
FROM (
WITH RECURSIVE hasil(idnya, level, pasien_cd, id_root) AS (
SELECT medical_cd, 1, pasien_cd, medical_root_cd
FROM trx_medical
WHERE medical_cd = p_pstruserid -- as in the question; p_medical_cd was probably intended
UNION ALL
SELECT A.medical_cd, level + 1, A.pasien_cd, A.medical_root_cd
FROM trx_medical A, hasil B
WHERE A.medical_root_cd = B.idnya
)
SELECT B.item_cd, sum(B.quantity) AS quantity
FROM trx_medical_resep A
JOIN trx_resep_data B USING (medical_resep_seqno)
JOIN hasil ON A.medical_cd = hasil.idnya
WHERE B.resep_tp = 'RESEP_TP_1'
--AND hasil.level >= 1 Useless because level is always >= 1
GROUP BY 1
) prescribed
WHERE inv_pos_item.item_cd = prescribed.item_cd -- item_cd must be qualified for the same reason
AND pos_cd = p_post_cd;
$function$ LANGUAGE sql STRICT;
Important
As with all UPDATE statements, test this code before you run the function. You can do that by running the prescribed sub-query separately as a stand-alone query to ensure that it does the right thing.
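A minimal sketch of such a stand-alone check follows. It is the prescribed sub-query lifted out of the function, with the function parameter replaced by a literal. The value 'MED-0001' is a made-up placeholder, and the aliases m, h, r and d are chosen here for readability. Nothing is modified by running it.

```sql
WITH RECURSIVE hasil(idnya, level, pasien_cd, id_root) AS (
    SELECT medical_cd, 1, pasien_cd, medical_root_cd
    FROM   trx_medical
    WHERE  medical_cd = 'MED-0001'        -- hypothetical starting value
    UNION ALL
    SELECT m.medical_cd, h.level + 1, m.pasien_cd, m.medical_root_cd
    FROM   trx_medical m
    JOIN   hasil h ON m.medical_root_cd = h.idnya
)
SELECT d.item_cd, sum(d.quantity) AS quantity
FROM   trx_medical_resep r
JOIN   trx_resep_data d USING (medical_resep_seqno)
JOIN   hasil ON r.medical_cd = hasil.idnya
WHERE  d.resep_tp = 'RESEP_TP_1'
GROUP  BY d.item_cd
ORDER  BY d.item_cd;
```

Each row of the result is one item and the total amount that the UPDATE would subtract from inv_pos_item for it.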

How to optimize postgresql procedure

I have 61 million non-unique emails with statuses.
These emails need to be deduplicated, with logic based on the status.
I wrote a stored procedure, but it runs too long.
How can I optimize the execution time of this procedure?
CREATE OR REPLACE FUNCTION public.load_oxy_emails() RETURNS boolean AS $$
DECLARE
row record;
rec record;
new_id int;
BEGIN
FOR row IN SELECT * FROM oxy_email ORDER BY id LOOP
SELECT * INTO rec FROM oxy_emails_clean WHERE email = row.email;
IF rec IS NOT NULL THEN
IF row.status = 3 THEN
UPDATE oxy_emails_clean SET status = 3 WHERE id = rec.id;
END IF;
ELSE
INSERT INTO oxy_emails_clean(id, email, status) VALUES(nextval('oxy_emails_clean_id_seq'), row.email, row.status);
SELECT currval('oxy_emails_clean_id_seq') INTO new_id;
INSERT INTO oxy_emails_clean_websites_relation(oxy_emails_clean_id, website_id) VALUES(new_id, row.website_id);
END IF;
END LOOP;
RETURN true;
END;
$$
LANGUAGE 'plpgsql';
How can I optimize the execution time of this procedure?
Don't do it with a loop.
Row-by-row processing (also known as "slow-by-slow") is almost always a lot slower than bulk changes, where a single statement processes a lot of rows "in one go".
The change of the status can easily be done using a single statement:
update oxy_emails_clean oec
SET status = 3
from oxy_email oe
where oe.email = oec.email -- the loop matches rows by email; the two id columns are unrelated
and oe.status = 3;
The copying of the rows can be done using a chain of CTEs:
with to_copy as (
select *
from oxy_email
where status <> 3 --<< all those that have a different status
), clean_inserted as (
INSERT INTO oxy_emails_clean (id, email, status)
select nextval('oxy_emails_clean_id_seq'), email, status
from to_copy
returning id, email
)
insert into oxy_emails_clean_websites_relation (oxy_emails_clean_id, website_id)
select ci.id, tc.website_id
from clean_inserted ci
join to_copy tc on tc.email = ci.email; -- match on email: the new ids come from the sequence, not from oxy_email
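The statements above do not collapse duplicate emails by themselves. A minimal sketch of one way to reproduce the loop's logic in a single statement follows. It assumes that oxy_emails_clean does not yet contain any of these emails and that email is never NULL; the CTE name dedup is made up for illustration. For every email it keeps the row with the lowest id (the one the loop would have inserted first), and sets the status to 3 if any duplicate of that email has status 3.

```sql
WITH dedup AS (
    -- one row per email: the first one by id, with status 3 winning if present
    SELECT DISTINCT ON (email)
           email,
           CASE WHEN bool_or(status = 3) OVER (PARTITION BY email)
                THEN 3
                ELSE status
           END AS status,
           website_id
    FROM   oxy_email
    ORDER  BY email, id
), clean_inserted AS (
    INSERT INTO oxy_emails_clean (id, email, status)
    SELECT nextval('oxy_emails_clean_id_seq'), email, status
    FROM   dedup
    RETURNING id, email
)
INSERT INTO oxy_emails_clean_websites_relation (oxy_emails_clean_id, website_id)
SELECT ci.id, d.website_id
FROM   clean_inserted ci
JOIN   dedup d USING (email);     -- email is unique within dedup
```

With 61 million input rows, the sort behind DISTINCT ON is the expensive part, so a generous work_mem or an index on oxy_email (email, id) may be worth trying.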

Declare a variable of temporary table in stored procedure in PL/pgSQL

I receive this error to begin with:
ERROR: syntax error at or near "conference"
LINE 19: FOR conference IN conferenceset
Here's the function:
CREATE OR REPLACE FUNCTION due_payments_to_suppliers_previous_month()
RETURNS TABLE(supplier varchar,due_amount numeric)
AS $$
DECLARE
BEGIN
CREATE TABLE conferenceset AS -- temporary table, so I can store the result set
SELECT
conference.conference_supplier_id,
conference.id AS conferenceid,
conference.price_per_person,
0 AS participants_count,
400 AS deduction_per_participant,
0 AS total_amount
FROM Conference WHERE --- date_start has to be from the month before
date_start >= date_trunc('month', current_date - interval '1' month)
AND
date_start < date_trunc('month', current_date);
FOR conference IN conferenceset
LOOP
---fill up the count_participants column for the conference
conference.participants_count :=
SELECT COUNT(*)
FROM participant_conference JOIN conferenceset
ON participant_conference.conference_id = conferenceset.conferenceid;
---calculate the total amount for that conference
conference.total_amount := somerec.participants_count*(conference.price_per_person-conference.deduction_per_participant);
END LOOP;
----we still don't have the name of the suppliers of these conferences
CREATE TABLE finalresultset AS -- temporary table again
SELECT conference_supplier.name, conferenceset.total_amount
FROM conferenceset JOIN conference_supplier
ON conferenceset.conference_supplier_id = conference_supplier.id
----we have conference records with their amounts and suppliers' names scattered all over this set
----return the result with the suppliers' names extracted and their total amounts calculated
FOR finalrecord IN (SELECT name,SUM(total_amount) AS amount FROM finalresultset GROUP BY name)
LOOP
supplier:=finalrecord.name;
due_amount:=finalrecord.amount;
RETURN NEXT;
END LOOP;
END; $$
LANGUAGE 'plpgsql';
I don't know how and where to declare the variables that I need for the two FOR loops that I have: conference as type conferenceset and finalrecord whose type I'm not even sure of.
I guess nested blocks will be needed as well. It's my first stored procedure and I need help.
Thank you.
CREATE OR REPLACE FUNCTION due_payments_to_suppliers_previous_month()
RETURNS TABLE(supplier varchar,due_amount numeric)
AS $$
DECLARE
-- loop variables must be declared; a "record" takes the row shape of the query that feeds the loop
-- named conf rather than conference so it cannot clash with the table of that name
conf record;
finalrecord record;
BEGIN
CREATE TEMP TABLE conferenceset ON COMMIT DROP AS -- temporary table, so I can store the result set
SELECT
conference.conference_supplier_id,
conference.id AS conferenceid,
conference.price_per_person,
0 AS participants_count,
400 AS deduction_per_participant,
0::numeric AS total_amount
FROM Conference WHERE --- date_start has to be from the month before
date_start >= date_trunc('month', current_date - interval '1' month)
AND
date_start < date_trunc('month', current_date);
FOR conf IN (select * from conferenceset)
LOOP
---fill up the participants_count column for the current conference only
conf.participants_count = (
SELECT COUNT(*)
FROM participant_conference
WHERE participant_conference.conference_id = conf.conferenceid
);
---calculate the total amount for that conference
conf.total_amount = conf.participants_count*(conf.price_per_person-conf.deduction_per_participant);
---assigning to the record does not change the table, so write the values back
UPDATE conferenceset
SET participants_count = conf.participants_count, total_amount = conf.total_amount
WHERE conferenceid = conf.conferenceid;
END LOOP;
----we still don't have the name of the suppliers of these conferences
CREATE TEMP TABLE finalresultset ON COMMIT DROP AS -- temporary table again
SELECT conference_supplier.name, conferenceset.total_amount
FROM conferenceset JOIN conference_supplier
ON conferenceset.conference_supplier_id = conference_supplier.id;
----we have conference records with their amounts and suppliers' names scattered all over this set
----return the result with the suppliers' names extracted and their total amounts calculated
FOR finalrecord IN (SELECT name,SUM(total_amount) AS amount FROM finalresultset GROUP BY name)
LOOP
supplier = finalrecord.name;
due_amount = finalrecord.amount;
RETURN NEXT;
END LOOP;
END; $$
LANGUAGE 'plpgsql';
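The temporary tables and both loops can probably be avoided altogether. A minimal sketch of the same report as one query, wrapped in a plain SQL function, is shown below. It assumes the table and column names used in the question and hard-codes the deduction of 400 per participant. The aliases c, cs and pc are made up for illustration. Conferences without participants count as zero.

```sql
CREATE OR REPLACE FUNCTION due_payments_to_suppliers_previous_month()
RETURNS TABLE(supplier varchar, due_amount numeric)
LANGUAGE sql STABLE
AS $$
    SELECT cs.name::varchar,
           sum(coalesce(pc.participants, 0)
               * (c.price_per_person - 400))::numeric   -- 400 = deduction per participant
    FROM   conference c
    JOIN   conference_supplier cs ON cs.id = c.conference_supplier_id
    LEFT   JOIN (
               -- number of participants per conference
               SELECT conference_id, count(*) AS participants
               FROM   participant_conference
               GROUP  BY conference_id
           ) pc ON pc.conference_id = c.id
    WHERE  c.date_start >= date_trunc('month', current_date - interval '1 month')
    AND    c.date_start <  date_trunc('month', current_date)
    GROUP  BY cs.name;
$$;
```

It would be called like any set-returning function: SELECT * FROM due_payments_to_suppliers_previous_month();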

Postgres - Trigger with matched key

I have several tables. A table, cexp, is a table that has attributes cid and total. Cid is grouped and total is the sum of quantity * price for that cid (matched on cid)
The cexp table was populated with the results of the following code:
SELECT c.cid, sum(ol.quantity*b.price) as total
FROM customers c join orders o on c.cid=o.cid
join orderlist ol on o.ordernum=ol.ordernum
join books b on b.isbn=ol.isbn
GROUP BY C.CID
My task is to create a trigger that, when inserting rows for orders and orderlist, finds the matching cid in cexp and increments the existing total by the product of the new quantity (from orderlist) and the price (from books). If there is no match, it inserts a row into cexp.
Tables are as follows:
Customers-cid,name pk-cid
Books - isbn,title,price pk-isbn
Orders - ordernum,cid pk-ordernum
Orderlist - ordernum,isbn, quantity - pk-(ordernum,isbn)
cexp - cid,total - pk-cid
I am getting syntax errors. Can anyone correct this code?
CREATE OR REPLACE FUNCTION cexpupd()
RETURNS trigger as
$cexpupd$
BEGIN
UPDATE cexp
SET new.total=total+Select (b.price*new.quantity) FROM customers c
join orders o on c.cid=o.cid
join orderlist ol on o.ordernum=ol.ordernum
join books b on b.isbn=ol.isbn
where b.isbn=new.isbn;
--INSERT CODE WHEN ABOVE LINE DOES NOT OCCUR -INSERTS NEW ROW INTO CEXP
END;
$cexpupd$
LANGUAGE plpgsql
I would say that your UPDATE statement is an unsalvageable birds-nest. Fortunately, there's an easier way to achieve the same result.
Keep in mind that a trigger function is procedural code, so there's no need to concurrently load the gun, pull the trigger, scream, and shoot yourself in the foot in a single statement. You have access to all the procedural goodies like local variables and flow control statements.
The following should give you a good headstart on what you want:
CREATE OR REPLACE FUNCTION cexpupd()
RETURNS trigger as
$cexpupd$
DECLARE
bookprice books.price%TYPE;
ext_total cexp.total%TYPE;
custid orders.cid%TYPE;
BEGIN
SELECT cid
INTO custid
FROM orders
WHERE orders.ordernum = NEW.ordernum;
SELECT price
INTO bookprice
FROM books
WHERE books.isbn = NEW.isbn;
ext_total = bookprice * NEW.quantity;
UPDATE cexp
SET total = total + ext_total
WHERE cid = custid;
IF NOT FOUND THEN
--INSERT new CID record here
INSERT INTO cexp
(cid, total)
VALUES
(custid, ext_total);
END IF;
RETURN NEW;
END;
$cexpupd$
LANGUAGE plpgsql;
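Note the statement near the end, RETURN NEW; In a BEFORE row-level trigger this line is crucial, as it is how a PostgreSQL trigger function tells the database to go ahead and finish executing the statement that fired the trigger (returning NULL there would skip the row). In an AFTER trigger the return value is ignored, but the function must still end with a RETURN.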
If you need any clarification, please don't hesitate to ask. Please note I have not tried executing this function, but I did create the necessary tables to compile it successfully.
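The function does nothing until it is attached to a table. A minimal sketch of the missing piece follows, assuming the trigger belongs on orderlist (the function reads NEW.ordernum, NEW.isbn and NEW.quantity, which are orderlist columns). The trigger name orderlist_cexp_upd is made up for illustration.

```sql
-- Hypothetical trigger name; fires once for every row inserted into orderlist.
CREATE TRIGGER orderlist_cexp_upd
AFTER INSERT ON orderlist
FOR EACH ROW
EXECUTE FUNCTION cexpupd();   -- on PostgreSQL 10 and older: EXECUTE PROCEDURE cexpupd()
```

The matching orders row has to exist before the orderlist row is inserted; otherwise the lookup of cid finds nothing and custid stays NULL.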