PostgreSQL and SELECT FOR UPDATE LIMIT

Can you please explain this strange behaviour to me?
I have this stored procedure which tells me if a row is locked:
CREATE OR REPLACE FUNCTION tg_availablega_is_unlocked(availablega_id integer)
RETURNS boolean AS
$BODY$
DECLARE
    is_locked boolean = FALSE;
BEGIN
    BEGIN
        PERFORM id FROM tg_availablega WHERE id = availablega_id
        FOR UPDATE NOWAIT;
    EXCEPTION
        WHEN lock_not_available THEN
            is_locked := TRUE;
    END;
    RETURN not is_locked;
END;$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
If I start a transaction and execute this:
SELECT "tg_availablega"."id",
"tg_availablega"."isactive",
"tg_availablega"."schedule",
"tg_availablega"."zone_tg_id"
FROM "tg_availablega"
WHERE (tg_availablega_is_unlocked("tg_availablega"."id")
AND "tg_availablega"."zone_tg_id" = 1
AND "tg_availablega"."isactive" = TRUE
AND "tg_availablega"."schedule" = 20)
LIMIT 100
FOR
UPDATE;
It locks and returns 100 rows. If I execute the same query simultaneously in another transaction, it locks and returns a different 100 rows. If there are 101 rows in total, the first execution returns 100 rows and the second execution returns just the 1 remaining row.
BUT if I add an ORDER BY clause:
SELECT "tg_availablega"."id",
"tg_availablega"."isactive",
"tg_availablega"."schedule",
"tg_availablega"."zone_tg_id"
FROM "tg_availablega"
WHERE (tg_availablega_is_unlocked("tg_availablega"."id")
AND "tg_availablega"."zone_tg_id" = 1
AND "tg_availablega"."isactive" = TRUE
AND "tg_availablega"."schedule" = 20)
***ORDER BY "tg_availablega"."id"***
LIMIT 100
FOR
UPDATE;
then the first transaction returns 100 locked rows, and the second transaction returns NO ROWS.
Why is that?

The problem is that the function tg_availablega_is_unlocked itself locks the rows it examines (via its FOR UPDATE NOWAIT). Without ORDER BY, Postgres doesn't visit all rows; it stops as soon as the LIMIT is satisfied, so the function doesn't get called (and doesn't take locks) on all of them. With ORDER BY, every matching row must be visited before the sort, so the first transaction ends up locking them all and the second transaction finds every row locked. I think you meant:
select * from (
    SELECT "tg_availablega"."id",
           "tg_availablega"."isactive",
           "tg_availablega"."schedule",
           "tg_availablega"."zone_tg_id"
    FROM "tg_availablega"
    WHERE "tg_availablega"."zone_tg_id" = 1
      AND "tg_availablega"."isactive" = TRUE
      AND "tg_availablega"."schedule" = 20
    ORDER BY "tg_availablega"."id"
) a
where tg_availablega_is_unlocked(id)
limit 100
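Worth noting: PostgreSQL 9.5 introduced FOR UPDATE SKIP LOCKED, which builds this "grab the first N unlocked rows" pattern into the engine itself, so the locking function is no longer needed. A minimal sketch against the same table, assuming 9.5+:
-- SKIP LOCKED silently skips rows already locked by other transactions,
-- so two concurrent runs get disjoint sets of rows.
SELECT "tg_availablega"."id",
       "tg_availablega"."isactive",
       "tg_availablega"."schedule",
       "tg_availablega"."zone_tg_id"
FROM "tg_availablega"
WHERE "tg_availablega"."zone_tg_id" = 1
  AND "tg_availablega"."isactive" = TRUE
  AND "tg_availablega"."schedule" = 20
ORDER BY "tg_availablega"."id"
LIMIT 100
FOR UPDATE SKIP LOCKED;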

Related

Update table every 1000 rows

I am trying to do an update on a specific record every 1000 rows using Postgres, and I am looking for a better way to do it. My function is described below:
CREATE OR REPLACE FUNCTION update_row()
RETURNS void AS
$BODY$
declare
    myUID integer;
    nRow integer;
    maxUid integer;
BEGIN
    nRow := 1000;
    select max(uid_atm_inp) from tab into maxUid where field1 = '1240200';
    loop
        if (nRow > 1000 and nRow < maxUid) then
            select uid from tab into myUID where field1 = '1240200' and uid >= nRow limit 1;
            update tab
            set field = 'xxx'
            where field1 = '1240200' and uid = myUID;
            nRow := nRow + 1000;
        end if;
    end loop;
END; $BODY$
LANGUAGE plpgsql VOLATILE
How can I improve this procedure? I think there is something wrong: the loop does not end, and it takes too much time.
To perform this task in SQL, you could use the row_number window function and update only those rows where the number is divisible by 1000.
Your loop doesn't finish because there is no EXIT or RETURN in it.
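A minimal sketch of the missing termination, reusing the question's own variables:
loop
    exit when nRow >= maxUid;  -- without some EXIT condition the loop spins forever
    -- ... per-step work from the question ...
    nRow := nRow + 1000;
end loop;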
I doubt you could ever rival the performance of a standard SQL update with a procedural loop. Instead of doing it a row at a time, just do it all as a single statement:
with t2 as (
    select
        uid, row_number() over (order by 1) as rn
    from tab
    where field1 = '1240200'
)
update tab t1
set field = 'xxx'
from t2
where
    t1.uid = t2.uid and
    mod(t2.rn, 1000) = 0
Per my comment, I am presupposing what you mean by "every 1000th row," since without some ordering criterion there is no way to determine which tuple is which row number. That is easily adjusted by changing the "order by" criteria.
Adding a second where clause on the update (t1.field1 = '1240200') can't hurt, but it might not be necessary if the plan is a nested loop.
This might be notionally similar to what Laurenz has in mind.
I solved it this way:
declare
    myUID integer;
    nRow integer;
    rowNum integer;
    checkrow integer;
    myString varchar(272);
    cur_check_row cursor for
        select uid, row_number() over (order by 1) as rn, substr(fieldxx, 1, 244)
        from table where field1 = '1240200' and uid >= 1000 ORDER BY uid;
BEGIN
    open cur_check_row;
    loop
        fetch cur_check_row into myUID, rowNum, myString;
        EXIT WHEN NOT FOUND;
        select mod(rowNum, 1000) into checkrow;
        if checkrow = 0 then
            update table
            set fieldxx = myString || 'O'
            where uid in (myUID);
        end if;
    end loop;
    close cur_check_row;
END;

if inside for Loop in postgresql

I have two tables as below.
TABLE 1          TABLE 2
--------         --------
id               id
date             table1_id
total            subtotal
balance

table 1                     table 2
--------                    ---------
id  total  balance          id  table1_id  subtotal  paid
1   20     10               1   1          5         5
2   30     30               2   1          15        5
                            3   2          10        0
                            4   2          10        0
                            5   2          10        0
I have to add a paid column to table2, so can anyone help me add values to the newly added column for the existing data? I tried to write the procedure below, but it seems Postgres will not allow an IF inside a FOR loop, so I am unable to do it.
CREATE OR REPLACE FUNCTION public.add_amountreceived_inbillitem() RETURNS void AS
$BODY$
DECLARE
    rec RECORD;
    inner_rec RECORD;
    distributebalance numeric;
    tempvar numeric;
BEGIN
    FOR rec IN select * from table1
    LOOP
        distributebalance = rec.balance;
        FOR inner_rec IN (select * from table2 where table1_id = rec.id order by id limit 1)
            tempvar = distributebalance - inner_rec.subtotal;
            if (distributebalance > 0 and tempvar >= 0) THEN
                update table2 set paid = inner_rec.subtotal where id = inner_rec.id;
                distributebalance = distributebalance - inner_rec.subtotal;
            else if (distributebalance > 0 and tempvar < 0) THEN
                update table2 set paid = distributebalance where id = inner_rec.id;
            END IF;
        END LOOP;
    END LOOP;
END; $BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
Thanks In advance :)
Postgres does allow IF statements inside loops.
The issue is that by writing ELSE IF you've started a new IF statement, so you now have 2 opening IFs but only one closing END IF. A "proper" elseif in plpgsql is ELSIF or ELSEIF. So just delete the space between those words and it should work.
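For reference, a sketch of the corrected inner block; note that the inner FOR in the posted code is also missing its LOOP keyword:
FOR inner_rec IN (select * from table2 where table1_id = rec.id order by id limit 1) LOOP
    tempvar := distributebalance - inner_rec.subtotal;
    IF distributebalance > 0 AND tempvar >= 0 THEN
        UPDATE table2 SET paid = inner_rec.subtotal WHERE id = inner_rec.id;
        distributebalance := distributebalance - inner_rec.subtotal;
    ELSIF distributebalance > 0 AND tempvar < 0 THEN
        UPDATE table2 SET paid = distributebalance WHERE id = inner_rec.id;
    END IF;
END LOOP;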

How to optimize postgresql procedure

I have 61 million non-unique emails with statuses.
These emails need to be deduplicated, with logic based on status.
I wrote a stored procedure, but it runs too long.
How can I optimize the execution time of this procedure?
CREATE OR REPLACE FUNCTION public.load_oxy_emails() RETURNS boolean AS $$
DECLARE
    row record;
    rec record;
    new_id int;
BEGIN
    FOR row IN SELECT * FROM oxy_email ORDER BY id LOOP
        SELECT * INTO rec FROM oxy_emails_clean WHERE email = row.email;
        IF rec IS NOT NULL THEN
            IF row.status = 3 THEN
                UPDATE oxy_emails_clean SET status = 3 WHERE id = rec.id;
            END IF;
        ELSE
            INSERT INTO oxy_emails_clean(id, email, status) VALUES(nextval('oxy_emails_clean_id_seq'), row.email, row.status);
            SELECT currval('oxy_emails_clean_id_seq') INTO new_id;
            INSERT INTO oxy_emails_clean_websites_relation(oxy_emails_clean_id, website_id) VALUES(new_id, row.website_id);
        END IF;
    END LOOP;
    RETURN true;
END;
$$
LANGUAGE 'plpgsql';
How can I optimize the execution time of this procedure?
Don't do it with a loop.
Doing row-by-row processing (also known as "slow-by-slow") is almost always a lot slower than doing bulk changes, where a single statement processes many rows in one go.
The change of the status can easily be done using a single statement:
update oxy_emails_clean oec
set status = 3
from oxy_email oe
where oe.email = oec.email
  and oe.status = 3;
The copying of the rows can be done using a chain of CTEs:
with to_copy as (
    select *
    from oxy_email
    where status <> 3 --<< all those that have a different status
), clean_inserted as (
    INSERT INTO oxy_emails_clean (id, email, status)
    select nextval('oxy_emails_clean_id_seq'), email, status
    from to_copy
    returning id, email
)
insert into oxy_emails_clean_websites_relation (oxy_emails_clean_id, website_id)
select ci.id, tc.website_id
from clean_inserted ci
join to_copy tc on tc.email = ci.email;

postgres count from table efficient way

In my application we are using PostgreSQL, and the summary table now has one million records.
When I run the following query it takes 80,927 ms:
SELECT COUNT(*) AS count
FROM summary_views
GROUP BY question_id, category_type_id
Is there any efficient way to do this?
COUNT(*) in PostgreSQL tends to be slow; it's a consequence of MVCC, since each row's visibility has to be checked. One workaround is a row-counting trigger with a helper table:
create table table_count(
    table_count_id text primary key,
    rows int default 0
);

CREATE OR REPLACE FUNCTION table_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        update table_count set rows = rows + 1
        where table_count_id = TG_TABLE_NAME;
    elsif tg_op = 'DELETE' then
        update table_count set rows = rows - 1
        where table_count_id = TG_TABLE_NAME;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;
The next step is to add the proper trigger declaration for each table you'd like to use it with. For example, for table tab_name:
begin;
insert into table_count values
('tab_name',(select count(*) from tab_name));
create trigger tab_name_table_count after insert or delete
on tab_name for each row execute procedure table_count_update();
commit;
It is important to run this in a transaction block to keep the actual count and the helper table in sync in case of a delete or insert between the initial count and the trigger creation; the transaction guarantees this. From then on, to get the current count instantly, just run:
select rows from table_count where table_count_id = 'tab_name';
Edit: To handle your GROUP BY clause, you'll need a more sophisticated trigger function and count table.
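A sketch of what that could look like for the question's grouping columns; the summary_views_count table is made up for this example, and the ON CONFLICT clause assumes PostgreSQL 9.5+:
create table summary_views_count(
    question_id int,
    category_type_id int,
    rows int default 0,
    primary key (question_id, category_type_id)
);

CREATE OR REPLACE FUNCTION summary_views_count_update()
RETURNS trigger AS
$BODY$
begin
    if tg_op = 'INSERT' then
        -- create the group's counter on first sight, bump it afterwards
        insert into summary_views_count
        values (NEW.question_id, NEW.category_type_id, 1)
        on conflict (question_id, category_type_id)
        do update set rows = summary_views_count.rows + 1;
    elsif tg_op = 'DELETE' then
        update summary_views_count set rows = rows - 1
        where question_id = OLD.question_id
          and category_type_id = OLD.category_type_id;
    end if;
    return null;
end;
$BODY$
LANGUAGE plpgsql VOLATILE;

create trigger summary_views_count_tgr after insert or delete
    on summary_views for each row execute procedure summary_views_count_update();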

Is there a similar function in postgresql for mysql's SQL_CALC_FOUND_ROWS?

Everybody using MySQL knows:
SELECT SQL_CALC_FOUND_ROWS ..... FROM table WHERE ... LIMIT 5, 10;
and right after, run this:
SELECT FOUND_ROWS();
How do I do this in PostgreSQL? So far, I have found only ways where I have to send the query twice...
No, there is not (at least not as of July 2007). I'm afraid you'll have to resort to:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT id, username, title, date FROM posts ORDER BY date DESC LIMIT 20;
SELECT count(*) AS total FROM posts;
END;
The isolation level needs to be SERIALIZABLE to ensure that the query does not see concurrent updates between the SELECT statements.
Another option you have, though, is to use a trigger to count rows as they're INSERTed or DELETEd. Suppose you have the following table:
CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    poster TEXT,
    title TEXT,
    time TIMESTAMPTZ DEFAULT now()
);
INSERT INTO posts (poster, title) VALUES ('Alice', 'Post 1');
INSERT INTO posts (poster, title) VALUES ('Bob', 'Post 2');
INSERT INTO posts (poster, title) VALUES ('Charlie', 'Post 3');
Then, perform the following to create a table called post_count that contains a running count of the number of rows in posts:
-- Don't let any new posts be added while we're setting up the counter.
BEGIN;
LOCK TABLE posts;

-- Create and initialize our post_count table.
SELECT count(*) INTO TABLE post_count FROM posts;

-- Create the trigger function.
CREATE FUNCTION post_added_or_removed() RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        UPDATE post_count SET count = count - 1;
    ELSIF TG_OP = 'INSERT' THEN
        UPDATE post_count SET count = count + 1;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Call the trigger function any time a row is inserted or deleted.
CREATE TRIGGER post_added_or_removed_tgr
    AFTER INSERT OR DELETE
    ON posts
    FOR EACH ROW
    EXECUTE PROCEDURE post_added_or_removed();
COMMIT;
Note that this maintains a running count of all of the rows in posts. To keep a running count of certain rows, you'll have to tweak it:
SELECT count(*) INTO TABLE post_count FROM posts WHERE poster <> 'Bob';

CREATE FUNCTION post_added_or_removed() RETURNS TRIGGER AS $$
BEGIN
    -- The IF statements are nested because OR does not short circuit.
    IF TG_OP = 'DELETE' THEN
        IF OLD.poster <> 'Bob' THEN
            UPDATE post_count SET count = count - 1;
        END IF;
    ELSIF TG_OP = 'INSERT' THEN
        IF NEW.poster <> 'Bob' THEN
            UPDATE post_count SET count = count + 1;
        END IF;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;
There is a simple way, but keep in mind that the following COUNT(*) window aggregate will be applied to all rows returned after the WHERE and before the LIMIT/OFFSET (which may be costly):
SELECT
    id,
    count(*) OVER () AS cnt
FROM
    objects
WHERE
    id > 2
OFFSET 50
LIMIT 5
No, PostgreSQL doesn't try to count all relevant results when you only need 10 results. You need a separate COUNT to count all results.
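A minimal sketch of that two-query pattern, reusing the objects table from the answer above (table and column names assumed):
-- fetch one page of results
SELECT id FROM objects WHERE id > 2 ORDER BY id LIMIT 10;

-- count all matching rows in a separate query
SELECT count(*) FROM objects WHERE id > 2;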