How to delete multiple blob large objects in Postgres

I have millions of rows in the pg_largeobject_metadata table that I want to delete. What I have tried so far:
A simple SELECT lo_unlink(oid) works fine.
A PERFORM lo_unlink(oid) in a loop of 10000 rows also works fine.
But when I try to delete many batches in a row, I get the error below. I cannot increase max_locks_per_transaction because the instance is managed by AWS.
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
CONTEXT: SQL statement "SELECT lo_unlink(c_row.oid)" PL/pgSQL function inline_code_block line 21 at PERFORM
SQL state: 53200
Here is the program I wrote, but I still get the out of shared memory error.
DO $proc$
DECLARE
    v_fetch  bigint;
    v_offset bigint;
    nbRows   bigint;
    c_row    record;
    c_rows CURSOR(p_offset bigint, p_fetch bigint) FOR
        SELECT oid FROM pg_largeobject_metadata
        WHERE oid BETWEEN 1910001 AND 2900000
        OFFSET p_offset ROWS FETCH NEXT p_fetch ROWS ONLY;
BEGIN
    v_offset := 0;
    v_fetch  := 100;
    SELECT count(*) INTO nbRows
    FROM pg_largeobject_metadata
    WHERE oid BETWEEN 1910001 AND 2900000;
    RAISE NOTICE 'End loop nbrows = %', nbRows;
    LOOP -- loop over the successive cursor windows
        RAISE NOTICE 'offset = %', v_offset;
        OPEN c_rows(v_offset, v_fetch);
        LOOP -- loop through the cursor results
            FETCH c_rows INTO c_row;
            EXIT WHEN NOT FOUND;
            PERFORM lo_unlink(c_row.oid);
        END LOOP;
        CLOSE c_rows;
        EXIT WHEN v_offset > nbRows;
        v_offset := v_offset + v_fetch; -- advance to the next v_fetch rows
    END LOOP;
END;
$proc$;
I am using PostgreSQL 9.5.
Has anyone faced this issue and could help, please?

Each lo_unlink() takes a lock on the object it deletes. These locks are freed only at the end of the transaction, and the lock table is capped at max_locks_per_transaction * (max_connections + max_prepared_transactions) slots (see Lock Management). By default max_locks_per_transaction is 64, and cranking it up by several orders of magnitude is not a good solution.
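You can check the relevant settings on your instance; with the stock defaults this works out to 64 * (100 + 0) = 6400 lock slots shared by all sessions:

```sql
SHOW max_locks_per_transaction;   -- default 64
SHOW max_connections;             -- default 100
SHOW max_prepared_transactions;   -- default 0
```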
The typical solution is to move the outer LOOP from your DO block into your client-side code and commit the transaction at each iteration, so that each transaction removes 10000 large objects, commits, and releases its locks.
Starting with PostgreSQL 11, a COMMIT inside the DO block is possible, just as transaction control in procedures is possible.
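On PostgreSQL 11 or later that could look like the following sketch (the batch size of 10000 is arbitrary, and the DO block must be run outside an explicit transaction; the FOR loop's cursor is automatically converted to a holdable cursor at the first COMMIT):

```sql
DO $$
DECLARE
    c_row record;
    n     bigint := 0;
BEGIN
    FOR c_row IN
        SELECT oid FROM pg_largeobject_metadata
        WHERE oid BETWEEN 1910001 AND 2900000
    LOOP
        PERFORM lo_unlink(c_row.oid);
        n := n + 1;
        IF n % 10000 = 0 THEN
            COMMIT;  -- releases the locks accumulated so far
        END IF;
    END LOOP;
END;
$$;
```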

Related

How to get number of rows of a PostgreSQL cursor?

I have a cursor created using the WITH HOLD option, which allows the cursor to be used in subsequent transactions.
I would like to retrieve the number of rows that can be fetched from the cursor. Since the rows represented by a held cursor are copied into a temporary file or memory area, I am wondering whether that number can be retrieved in a straightforward way, or whether the only solution is to fetch all the records and count them.
In that case, a MOVE FORWARD ALL FROM <cursor> statement returns MOVE x, where x is the number of rows moved over. The result is a command tag written to stdout, and I do not know how to retrieve that value in a PL/pgSQL function. GET DIAGNOSTICS <var> := ROW_COUNT only works for FETCH, not for MOVE.
Here is a proposed solution; how do you think I can improve it? (And is it possible to use MOVE instead of FETCH to retrieve the x value?)
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor_name TEXT)
RETURNS TEXT AS
$func$
DECLARE
    _n_rows int;
BEGIN
    -- Position the cursor after the last row
    EXECUTE format('FETCH FORWARD ALL FROM %I', _cursor_name);
    -- Retrieve the number of rows fetched
    GET DIAGNOSTICS _n_rows := ROW_COUNT;
    -- Move the cursor back to the beginning
    EXECUTE format('MOVE ABSOLUTE 0 FROM %I', _cursor_name);
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Thank you very much for your help
I don't believe there is a way to do that in PL/pgSQL.
As a workaround, you could define the cursor so that it includes a row number:
DECLARE c CURSOR WITH HOLD FOR
    SELECT *, row_number() OVER ()
    FROM (SELECT /* your original query */) AS s;
That is only marginally more expensive than the original query, and it allows you to position the cursor on the last row with MOVE and retrieve the row_number, which is the total number of rows.
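For example, with a cursor c declared as above, retrieving the count could look like this sketch (it assumes the cursor allows backward scanning, i.e. it is effectively scrollable, which held cursors on simple queries usually are):

```sql
FETCH LAST FROM c;        -- the row_number column of this row is the total row count
MOVE ABSOLUTE 0 FROM c;   -- rewind before normal processing
```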
Yes, it is possible to use MOVE instead of FETCH to count the records in a cursor, with slightly better performance. As it turns out, we can indeed retrieve the ROW_COUNT diagnostic from a MOVE cursor statement.
GOTCHA: Using EXECUTE does not update GET DIAGNOSTICS for MOVE, while it does for FETCH, and neither statement updates the FOUND variable.
This is not clearly stipulated in the PostgreSQL documentation per se, but considering that MOVE does not produce any actual results, it makes some sense.
NOTE: The following examples do not reset the cursor position back to 0, unlike the original example. This allows the function to be used with all cursor types, in particular NO SCROLL cursors, which reject backward movement by raising an error.
Using MOVE instead of FETCH
The holy grail of PostgreSQL cursor record-count methods we've all been waiting for: modify the function to take a refcursor as argument and execute the MOVE statement directly, which makes ROW_COUNT available.
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
    _n_rows int;
BEGIN
    -- Position the cursor after the last row
    MOVE FORWARD ALL FROM _cursor;
    -- Retrieve the number of rows moved over
    GET DIAGNOSTICS _n_rows := ROW_COUNT;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
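A quick usage sketch (this session is my own illustration, not from the original post): declare a cursor inside a transaction and pass its name, which PostgreSQL casts to refcursor:

```sql
BEGIN;
DECLARE c NO SCROLL CURSOR FOR SELECT generate_series(1, 42);
SELECT get_cursor_size('c');  -- the string literal is cast to refcursor
COMMIT;
```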
Alternative approaches using MOVE
Provided here for completeness.
Another approach is to LOOP through the cursor until FOUND returns false; however, this is notably slower than even the FETCH ALL method from the original example in the question.
-- Increment the cursor position and count the rows
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
    _n_rows int := 0;
BEGIN
    LOOP
        -- Move the cursor position forward by one
        MOVE FORWARD 1 IN _cursor;
        -- Stop when the cursor no longer moves
        EXIT WHEN NOT FOUND;
        -- Increment the counter
        _n_rows := _n_rows + 1;
    END LOOP;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Increasing the step size improves performance, but the result will be rounded up to a multiple of the step, because FOUND reports success if the cursor moved at all. To rectify this we can look up ROW_COUNT and increment by the actual number of rows moved instead.
-- Count the actual number of rows moved
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
    _n_rows int := 0;
    _move_count int;
BEGIN
    LOOP
        -- Move the cursor position forward by up to 100 rows
        MOVE FORWARD 100 IN _cursor;
        -- Stop when the cursor no longer moves
        EXIT WHEN NOT FOUND;
        -- Add the actual number of rows moved over
        GET DIAGNOSTICS _move_count := ROW_COUNT;
        _n_rows := _n_rows + _move_count;
    END LOOP;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
With 50,000 records I had to increase the step size to 1,000 before I noticed any improvement over FETCH ALL, so unless there is something else worth doing in the same loop, the incremental approach is suboptimal.
Passing cursor functions
We will probably want to use this with cursor-producing functions, like:
-- Get a reference cursor
CREATE FUNCTION get_refcursor()
RETURNS refcursor AS
$func$
DECLARE
    return_cursor refcursor;
BEGIN
    OPEN return_cursor FOR SELECT 'One Record';
    RETURN return_cursor;
END
$func$ LANGUAGE plpgsql;
We can simplify this by using an OUT argument instead, allowing us to omit both the DECLARE block and the RETURN statement; see the CREATE FUNCTION documentation.
-- Get a reference cursor
CREATE FUNCTION get_refcursor(OUT return_cursor refcursor)
RETURNS refcursor AS
$func$
BEGIN
    OPEN return_cursor FOR SELECT 'One Record';
END
$func$ LANGUAGE plpgsql;
Using one of our get-cursor functions, we can now easily count the returned cursor records (only 1 here) by passing the function call as the argument.
SELECT get_cursor_size(get_refcursor());
nJoy!

No implicit variable as an alternative to @@ROWCOUNT/SQL%ROWCOUNT [duplicate]

I want to return the number of rows affected by the last statement.
Using Microsoft SQL Server 2008 R2, I do it this way:
SELECT * FROM Test_table;
SELECT @@ROWCOUNT AS [Number Of Rows Affected];
Which gives:
Number Of Rows Affected
-----------------------
10
How about in PostgreSQL 9.3?
DO $$
DECLARE
    total_rows integer;
BEGIN
    UPDATE emp_salary
    SET salary = salary + 1;
    IF NOT FOUND THEN
        RAISE NOTICE 'No rows found';
    ELSIF FOUND THEN
        -- retrieve the row count of the last statement
        GET DIAGNOSTICS total_rows := ROW_COUNT;
        RAISE NOTICE 'Rows Found : total_rows: %', total_rows;
    END IF;
END $$;
AFAIK there is no such construct in PostgreSQL; however, the number of rows is part of the result you get from PostgreSQL.
CORRECTION: as a_horse_with_no_name states in his comment, there is something similar that can be used within PL/pgSQL. Also see the example in the answer posted by Achilles Ram Nakirekanti.
From within programs, however, my original suggestion is in most cases easier than resorting to PL/pgSQL.
When using libpq:
On the result of a select you can use PQntuples to determine the number of rows returned. For update, insert and delete you can use PQcmdTuples with the result to get the number of rows affected.
Other client libraries often have similar functionality.
For reference, from the linked article: GET DIAGNOSTICS integer_var = ROW_COUNT;

PL/pgSQL trigger table entry limit

I'd like to get an opinion on a trigger I've written for a PostgreSQL database in PL/pgSQL. I haven't written one before and would like suggestions from more experienced users.
The task is simple enough: reduce the number of entries in a table to a set amount.
What should happen:
An INSERT into the table device_position occurs.
If the number of entries with a specific column (deviceid) value exceeds 50, delete the oldest.
Repeat.
Please let me know if you see any obvious flaws:
CREATE OR REPLACE FUNCTION trim_device_positions() RETURNS trigger AS $trim_device_positions$
DECLARE
    devicePositionCount int;
    maxDevicePos CONSTANT int = 50;
    aDeviceId device_position.id%TYPE;
BEGIN
    SELECT count(*) INTO devicePositionCount
    FROM device_position
    WHERE device_position.deviceid = NEW.deviceid;
    IF devicePositionCount > maxDevicePos THEN
        FOR aDeviceId IN
            SELECT id FROM device_position
            WHERE device_position.deviceid = NEW.deviceid
            ORDER BY device_position.id ASC
            LIMIT devicePositionCount - maxDevicePos
        LOOP
            DELETE FROM device_position WHERE device_position.id = aDeviceId;
        END LOOP;
    END IF;
    RETURN NULL;
END;
$trim_device_positions$ LANGUAGE plpgsql;
DROP TRIGGER trim_device_positions_trigger ON device_position;
CREATE TRIGGER trim_device_positions_trigger AFTER INSERT ON device_position FOR EACH ROW EXECUTE PROCEDURE trim_device_positions();
Thanks for any wisdom coming my way :)
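One observation: the per-row loop in the trigger body could be collapsed into a single set-based DELETE. A sketch, assuming id is a monotonically increasing key (so the lowest ids are the oldest rows), to be placed inside the trigger function where NEW is available:

```sql
-- Keep only the 50 newest rows for this deviceid; delete the rest in one statement
DELETE FROM device_position
WHERE deviceid = NEW.deviceid
  AND id IN (
      SELECT id FROM device_position
      WHERE deviceid = NEW.deviceid
      ORDER BY id DESC
      OFFSET 50
  );
```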

Execution time issue - Postgresql

Below is the function which I am running on two different tables that contain the same column names.
-- Function: test(character varying)
-- DROP FUNCTION test(character varying);
CREATE OR REPLACE FUNCTION test(table_name character varying)
RETURNS SETOF void AS
$BODY$
DECLARE
    recordcount integer;
    j integer;
    hstoredata hstore;
BEGIN
    recordcount := getTableName(table_name);
    FOR j IN 1..recordcount LOOP
        RAISE NOTICE 'RECORD NUMBER IS: %', j;
        EXECUTE format('SELECT hstore(t) FROM datas.%I t WHERE id = $1', table_name)
            USING j INTO hstoredata;
        RAISE NOTICE 'hstoredata: %', hstoredata;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
When the above function is run on a table containing 1000 rows time taken is around 536 ms.
When the above function is run on a table containing 10000 rows time taken is around 27994 ms.
Logically time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is very high.
In order to reduce execution time, please suggest what changes to be made.
Logically time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is very high.

That assumes that reading any particular row takes the same time as reading any other row, which is not true.
For instance, if there's a text column in the table and it sometimes contains large contents, it's going to be fetched from TOAST storage (out of page) and dynamically uncompressed.
In order to reduce execution time, please suggest what changes to be made.

To read all the table rows without necessarily fetching them all into memory at once, you may use a cursor. That would avoid issuing a new query at every loop iteration. Cursors accept dynamic queries through EXECUTE.
See Cursors in plpgsql documentation.
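A sketch of that approach (assuming the same datas schema and hstore setup as the question; the table is scanned through one cursor instead of one query per id):

```sql
CREATE OR REPLACE FUNCTION test(table_name varchar)
RETURNS void AS
$BODY$
DECLARE
    cur refcursor;
    hstoredata hstore;
BEGIN
    -- One dynamic query, read incrementally through the cursor
    OPEN cur FOR EXECUTE format('SELECT hstore(t) FROM datas.%I t', table_name);
    LOOP
        FETCH cur INTO hstoredata;
        EXIT WHEN NOT FOUND;
        RAISE NOTICE 'hstoredata: %', hstoredata;
    END LOOP;
    CLOSE cur;
END;
$BODY$ LANGUAGE plpgsql;
```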
As far as I can tell you are overcomplicating things. Since "recordcount" is only used to generate the ID values, I think you can do everything with a single statement instead of querying for each and every ID separately.
CREATE OR REPLACE FUNCTION test(table_name varchar)
RETURNS void AS
$BODY$
DECLARE
    rec record;
BEGIN
    FOR rec IN EXECUTE format('SELECT id, hstore(t) AS hs FROM datas.%I t', table_name) LOOP
        RAISE NOTICE 'RECORD NUMBER IS: %', rec.id;
        RAISE NOTICE 'hstoredata: %', rec.hs;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
The only difference from your solution is that if an ID smaller than the row count of the table does not exist, you won't see a RECORD NUMBER message for it; on the other hand, you will see IDs that are bigger than the row count of the table.
Any time you execute the same statement again and again in a loop, very loud alarm bells should ring in your head. SQL is optimized to deal with sets of data, not to do row-by-row processing (which is what your loop does).
You didn't tell us what real problem you are trying to solve (and I fear you have over-simplified your example), but given the code from the question, the above should be a much better solution, and definitely much faster.