Execution time issue - PostgreSQL

Below is the function which I am running on two different tables that contain the same column names.
-- Function: test(character varying)
-- DROP FUNCTION test(character varying);
CREATE OR REPLACE FUNCTION test(table_name character varying)
  RETURNS SETOF void AS
$BODY$
DECLARE
    recordcount integer;
    j integer;
    hstoredata hstore;
BEGIN
    recordcount := getTableName(table_name);
    FOR j IN 1..recordcount LOOP
        RAISE NOTICE 'RECORD NUMBER IS: %', j;
        EXECUTE format('SELECT hstore(t) FROM datas.%I t WHERE id = $1', table_name) USING j INTO hstoredata;
        RAISE NOTICE 'hstoredata: %', hstoredata;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
When the above function is run on a table containing 1000 rows, the time taken is around 536 ms.
When the above function is run on a table containing 10000 rows, the time taken is around 27994 ms.
Logically the time taken for 10000 rows should be around 5360 ms, extrapolating from the 1000-row figure, but the actual execution time is much higher.
In order to reduce the execution time, please suggest what changes should be made.

Logically the time taken for 10000 rows should be around 5360 ms, extrapolating from the 1000-row figure, but the actual execution time is much higher.
That reasoning assumes that reading any particular row takes the same time as reading any other row, but this is not true.
For instance, if there's a text column in the table and it sometimes contains large values, those values are fetched from TOAST storage (out of page) and dynamically decompressed.
In order to reduce the execution time, please suggest what changes should be made.
To read all the table rows without necessarily fetching them all into memory at once, you may use a cursor. That would avoid issuing a new query at every loop iteration. Cursors accept dynamic queries through EXECUTE.
See Cursors in the PL/pgSQL documentation.
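For illustration, here is a minimal sketch of that approach (the function name test_cursor is made up; it assumes the same datas schema, id column and hstore extension as the question):
CREATE OR REPLACE FUNCTION test_cursor(table_name varchar)
  RETURNS void AS
$BODY$
DECLARE
    cur refcursor;
    hstoredata hstore;
BEGIN
    -- One dynamic query for the whole table instead of one query per id
    OPEN cur FOR EXECUTE format('SELECT hstore(t) FROM datas.%I t ORDER BY id', table_name);
    LOOP
        FETCH cur INTO hstoredata;
        EXIT WHEN NOT FOUND;
        RAISE NOTICE 'hstoredata: %', hstoredata;
    END LOOP;
    CLOSE cur;
END;
$BODY$
LANGUAGE plpgsql;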

As far as I can tell, you are overcomplicating things. As recordcount is only used to loop over the ID values, I think you can do everything with a single statement instead of querying for each and every ID separately.
CREATE OR REPLACE FUNCTION test(table_name varchar)
  RETURNS void AS
$BODY$
DECLARE
    rec record;
BEGIN
    FOR rec IN EXECUTE format('select id, hstore(t) as hs from datas.%I', table_name) LOOP
        RAISE NOTICE 'RECORD NUMBER IS: %', rec.id;
        RAISE NOTICE 'hstoredata: %', rec.hs;
    END LOOP;
END;
$BODY$
LANGUAGE plpgsql;
The only difference from your solution is that if an ID smaller than the row count of the table does not exist, you won't see a RECORD NUMBER message for it, and you will see IDs that are bigger than the row count of the table.
Any time you execute the same statement again and again in a loop very, very loud alarm bells should ring in your head. SQL is optimized to deal with sets of data, not to do row-by-row processing (which is what your loop is doing).
You didn't tell us what the real problem is that you are trying to solve (and I fear that you have over-simplified your example), but given the code from the question, the above should be a much better solution (and definitely much faster).

Related

How to include a grand total count along with pagination and sorting?

I have a PostgreSQL function findbooks() that accepts parameters for WHERE, OFFSET and LIMIT and dynamically builds a SELECT returning rows, for example:
create or replace function sa.findbooks(...) returns SETOF sa.books
language plpgsql as $$
declare
    dynamicsql text;
begin
    if ... then
        dynamicsql := 'select ...';
    end if;
    return query execute dynamicsql;
end;
$$;
Suppose the frontend wants to show how many total books met the criteria BEFORE pagination. One way I can think of is to append a new column for this piece of information; obviously all cells of this column will have the same repeated value. Is there any better way?
PostgreSQL v12.
To get the total count, there is no better way than to execute an extra query that counts the data you want. This will be inefficient, but then paging with LIMIT and OFFSET is inefficient too, so it might not matter too much.
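For example, findbooks() (or the caller) could issue a separate counting query that applies the same dynamically built WHERE clause, alongside the paginated query. A rough sketch against the sa.books table from the question, with the filter omitted:
-- Extra query: how many rows match the same criteria, before pagination
SELECT count(*) FROM sa.books;
-- Paginated query: the page actually returned to the frontend
SELECT *
FROM sa.books
ORDER BY 1          -- whatever stable ordering findbooks() uses
OFFSET 20 LIMIT 10;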

How to get number of rows of a PostgreSQL cursor?

I have a cursor created using the WITH HOLD option that allows the cursor to be used for subsequent transactions.
I would like to retrieve the number of rows that can be fetched by the cursor. Since the rows represented by a held cursor are copied into a temporary file or memory area, I am wondering if it is possible to retrieve that number in a straightforward way or if the only solution is to fetch all the records to count them.
In that case, a MOVE FORWARD ALL FROM <cursor> statement returns MOVE x, where x is the number of rows moved. The result is a command tag written to stdout, and I do not know how to retrieve that value in a PL/pgSQL function. GET DIAGNOSTICS <var> := ROW_COUNT only works for FETCH but not MOVE.
Here is a solution proposal; how do you think I can improve it? (And is it possible to use MOVE instead of FETCH to retrieve the x value?)
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor_name TEXT)
  RETURNS TEXT AS
$func$
DECLARE
    _n_rows int;
BEGIN
    -- Set cursor at the end of records
    EXECUTE format('FETCH FORWARD ALL FROM %I', _cursor_name);
    -- Retrieve number of rows
    GET DIAGNOSTICS _n_rows := ROW_COUNT;
    -- Set cursor at the beginning
    EXECUTE format('MOVE ABSOLUTE 0 FROM %I', _cursor_name);
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Thank you very much for your help
I don't believe there is a way to do that in PL/pgSQL.
As a workaround, you could define the cursor so that it includes a row number:
DECLARE c CURSOR WITH HOLD FOR
SELECT *, row_number() OVER ()
FROM (SELECT /* your original query */) AS s;
That is only marginally more expensive than the original query, and it allows you to position the cursor on the last row with MOVE and retrieve the row_number, which is the total number of rows.
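A minimal sketch of that idea, assuming the cursor is declared SCROLL so it can be repositioned freely (generate_series stands in for the original query):
BEGIN;
DECLARE c SCROLL CURSOR WITH HOLD FOR
    SELECT g.x, row_number() OVER () AS rn
    FROM generate_series(1, 42) AS g(x);
COMMIT;
-- Jump to the last row; its rn value (42 here) is the total row count
FETCH LAST FROM c;
-- Rewind so the cursor can still be read from the start
MOVE ABSOLUTE 0 FROM c;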
Yes, it is possible to use MOVE instead of FETCH to count the records in a cursor, with slightly improved performance. As it turns out, we can indeed retrieve ROW_COUNT diagnostics from a MOVE cursor statement.
GOTCHA: Using EXECUTE does not update GET DIAGNOSTICS for MOVE while it does for FETCH, and neither statement will update the FOUND variable.
This is not clearly stipulated in the PostgreSQL documentation per se, but considering that MOVE does not produce any actual results, it might make sense enough to be excused.
NOTE: The following examples do not reset the cursor position back to 0 as in the original example, allowing the function to be used with all cursor types, especially NO SCROLL cursors, which reject backward movement by raising an error.
Using MOVE instead of FETCH
The holy grail of PostgreSQL cursor record count methods we've all been waiting for. Modify the function to take a refcursor as argument and execute the MOVE cursor statement directly, which then makes ROW_COUNT available.
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor refcursor)
  RETURNS TEXT AS
$func$
DECLARE
    _n_rows int;
BEGIN
    -- Set cursor at the end of records
    MOVE FORWARD ALL FROM _cursor;
    -- Retrieve number of rows
    GET DIAGNOSTICS _n_rows := ROW_COUNT;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Alternative approaches using MOVE
Provided here for completeness.
Another approach is to LOOP through the cursor until FOUND returns false; however, this approach is notably slower than even the FETCH ALL method from the original example in the question.
-- Increment the cursor position and count the rows
CREATE FUNCTION get_cursor_size(_cursor refcursor)
  RETURNS TEXT AS
$func$
DECLARE
    _n_rows int := 0;
BEGIN
    LOOP
        -- Move cursor position
        MOVE FORWARD 1 IN _cursor;
        -- While not found
        EXIT WHEN NOT FOUND;
        -- Increment counter
        _n_rows := _n_rows + 1;
    END LOOP;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Increasing the step size does improve performance but the result will be rounded up because FOUND will report success if the cursor has moved at all. To rectify this we can look up ROW_COUNT and increment by the actual amount moved instead.
-- Count the actual number of rows incremented
CREATE FUNCTION get_cursor_size(_cursor refcursor)
  RETURNS TEXT AS
$func$
DECLARE
    _n_rows int := 0;
    _move_count int;
BEGIN
    LOOP
        -- Move cursor position
        MOVE FORWARD 100 IN _cursor;
        -- Until not found
        EXIT WHEN NOT FOUND;
        -- Increment counter
        GET DIAGNOSTICS _move_count := ROW_COUNT;
        _n_rows := _n_rows + _move_count;
    END LOOP;
    RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
With 50,000 records I had to increase the step size to 1000 before I noticed any improvement over FETCH ALL, so unless there is something else worth doing at the same time, the incremental approach is less optimal.
Passing cursor functions
We will probably want to use this with cursor-producing functions, like:
-- Get a reference cursor
CREATE FUNCTION get_refcursor()
  RETURNS refcursor AS
$func$
DECLARE
    return_cursor refcursor;
BEGIN
    OPEN return_cursor FOR SELECT 'One Record';
    RETURN return_cursor;
END
$func$ LANGUAGE plpgsql;
We can simplify this by using an OUT argument instead, allowing us to omit both the declaration block and the return statement; see the CREATE FUNCTION documentation.
-- Get a reference cursor
CREATE FUNCTION get_refcursor(OUT return_cursor refcursor)
  RETURNS refcursor AS
$func$
BEGIN
    OPEN return_cursor FOR SELECT 'One Record';
END
$func$ LANGUAGE plpgsql;
Using one of our get cursor functions we can now easily count the number of returned cursor records (only 1) by passing the function as argument.
SELECT get_cursor_size(get_refcursor());
nJoy!

How to delete multiple blob large objects in Postgres

I have millions of rows in the pg_largeobject_metadata table that I want to delete. What I have tried so far:
A simple select lo_unlink(oid) works fine
A perform lo_unlink(oid) in a loop of 10000 rows will also work fine
But when I delete multiple rows recursively I get the error below. I cannot increase max_locks_per_transaction because it is managed by AWS.
ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.
CONTEXT: SQL statement "SELECT lo_unlink(c_row.oid)" PL/pgSQL function inline_code_block line 21 at PERFORM
SQL state: 53200
Here is the program I tried to write, but I still get the out of shared memory error.
DO $proc$
DECLARE
    v_fetch bigint;
    v_offset bigint;
    nbRows bigint;
    c_row record;
    c_rows CURSOR(p_offset bigint, p_fetch bigint) FOR
        SELECT oid FROM pg_largeobject_metadata
        WHERE oid BETWEEN 1910001 AND 2900000
        OFFSET p_offset ROWS FETCH NEXT p_fetch ROWS ONLY;
BEGIN
    v_offset := 0;
    v_fetch := 100;
    SELECT count(*) INTO nbRows FROM pg_largeobject_metadata WHERE oid BETWEEN 1910001 AND 2900000;
    RAISE NOTICE 'End loop nbrows = %', nbRows;
    LOOP -- Loop the different cursors
        RAISE NOTICE 'offseter = %', v_offset;
        OPEN c_rows(v_offset, v_fetch);
        LOOP -- Loop through the cursor results
            FETCH c_rows INTO c_row;
            EXIT WHEN NOT FOUND;
            PERFORM lo_unlink(c_row.oid);
        END LOOP;
        CLOSE c_rows;
        EXIT WHEN v_offset > nbRows;
        v_offset := v_offset + v_fetch; -- The next batch of rows
    END LOOP;
END;
$proc$;
I am using PostgreSQL 9.5.
Has anyone faced this issue and could help, please?
Each lo_unlink() grabs a lock on the object it deletes. These locks are freed only at the end of the transaction, and the total number of locks is capped by max_locks_per_transaction * (max_connections + max_prepared_transactions) (see Lock Management). By default max_locks_per_transaction is 64, and cranking it up by several orders of magnitude is not a good solution.
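With the default settings (max_locks_per_transaction = 64, max_connections = 100, max_prepared_transactions = 0) that works out to 64 * (100 + 0) = 6400 lock slots shared by all concurrent transactions, far fewer than the millions of lo_unlink() calls in a single transaction would require.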
The typical solution is to move the outer LOOP from your DO block into your client-side code, and commit the transaction at each iteration (so each transaction removes 10000 large objects and commits).
Starting with PostgreSQL version 11, a COMMIT inside the DO block would be possible, just like transaction control in procedures is possible.
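A minimal sketch of that PostgreSQL 11+ variant (so it does not apply to the 9.5 instance from the question; the batch size of 10000 is arbitrary):
DO $proc$
DECLARE
    c_row record;
    n bigint := 0;
BEGIN
    FOR c_row IN SELECT oid FROM pg_largeobject_metadata
                 WHERE oid BETWEEN 1910001 AND 2900000
    LOOP
        PERFORM lo_unlink(c_row.oid);
        n := n + 1;
        IF n % 10000 = 0 THEN
            COMMIT;  -- releases the locks accumulated so far
        END IF;
    END LOOP;
END;
$proc$;
Note that transaction control inside a DO block only works when the block is not itself wrapped in an explicit transaction.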

postgres trigger function only inserts few records in another table

I have 2 tables, chourly and cdaily, and my aim is to calculate the average of the values from the hourly table and save it to the daily table. I have written a trigger function like this -
CREATE OR REPLACE FUNCTION public.calculate_daily_avg()
  RETURNS trigger AS
$BODY$
DECLARE
    chrly CURSOR FOR
        SELECT device, date(datum) datum, avg(cpu_util) cpu_util
        FROM chourly
        WHERE date(datum) = current_date
        GROUP BY device, date(datum);
BEGIN
    FOR chrly_rec IN chrly
    LOOP
        INSERT INTO cdaily (device, datum, cpu_util)
        VALUES (chrly_rec.device, chrly_rec.datum, chrly_rec.cpu_util)
        ON CONFLICT (device, datum) DO UPDATE SET
            device = chrly_rec.device, datum = chrly_rec.datum, cpu_util = chrly_rec.cpu_util;
        RETURN NEW;
    END LOOP;
EXCEPTION
    WHEN NO_DATA_FOUND THEN
        RAISE NOTICE 'NO DATA IN chourly FOR %', current_date;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION public.calculate_daily_avg()
  OWNER TO postgres;
and a trigger on the chourly table like this -
CREATE TRIGGER calculate_daily_avg_trg
BEFORE INSERT OR UPDATE
ON public.chourly
FOR EACH ROW
EXECUTE PROCEDURE public.calculate_daily_avg();
But when I try to insert or update about 3000 records in the hourly table, only 3 or 4 devices are inserted, not 3000. (In the trigger I have also tried AFTER INSERT OR UPDATE, but even that gives the same result.) What am I doing wrong here? Please suggest a better way to write the trigger if you feel I have written it wrongly. Thanks!
I don't suggest using a TRIGGER for this calculation on INSERT. Try a different approach: a function executed by cron hourly or daily.
WHY?
Because every time you INSERT one row, PostgreSQL will run the AVG() aggregate and loop over the results to insert them (based on your flow).
That means another INSERT statement will have to wait until the previous INSERT has committed, which will hurt your database performance under a high INSERT load. If you somehow manage to break that rule (maybe through configuration) you will get inconsistent data, like what is happening to you right now.
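For example, the whole trigger body could be reduced to a single statement run from a scheduled job (a sketch reusing the table and column names from the question; it assumes cdaily has a unique constraint on (device, datum), as the ON CONFLICT clause in the question implies):
INSERT INTO cdaily (device, datum, cpu_util)
SELECT device, date(datum), avg(cpu_util)
FROM chourly
WHERE date(datum) = current_date
GROUP BY device, date(datum)
ON CONFLICT (device, datum) DO UPDATE
    SET cpu_util = EXCLUDED.cpu_util;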

PL/pgSQL trigger table entry limit

I'd like to get an opinion on a trigger I've written for a PostgreSQL database in PL/pgSQL. I haven't done it previously and would like to get suggestions from more experienced users.
Task is simple enough:
Reduce the number of entries in a table to a set amount.
What should happen:
An INSERT into the table device_position occurs,
If the number of entries with a specific column (deviceid) value exceeds 50, delete the oldest.
Repeat
Please let me know if you see any obvious flaws:
CREATE OR REPLACE FUNCTION trim_device_positions() RETURNS trigger AS $trim_device_positions$
DECLARE
    devicePositionCount int;
    maxDevicePos CONSTANT int = 50;
    aDeviceId device_position.id%TYPE;
BEGIN
    SELECT count(*) INTO devicePositionCount FROM device_position WHERE device_position.deviceid = NEW.deviceid;
    IF devicePositionCount > maxDevicePos THEN
        FOR aDeviceId IN SELECT id FROM device_position WHERE device_position.deviceid = NEW.deviceid ORDER BY device_position.id ASC LIMIT devicePositionCount - maxDevicePos LOOP
            DELETE FROM device_position WHERE device_position.id = aDeviceId;
        END LOOP;
    END IF;
    RETURN NULL;
END;
$trim_device_positions$ LANGUAGE plpgsql;
DROP TRIGGER trim_device_positions_trigger ON device_position;
CREATE TRIGGER trim_device_positions_trigger AFTER INSERT ON device_position FOR EACH ROW EXECUTE PROCEDURE trim_device_positions();
Thanks for any wisdom coming my way :)