How to get number of rows of a PostgreSQL cursor? - postgresql

I have a cursor created using the WITH HOLD option that allows the cursor to be used for subsequent transactions.
I would like to retrieve the number of rows that can be fetched by the cursor. Since the rows represented by a held cursor are copied into a temporary file or memory area, I am wondering if it is possible to retrieve that number in a straightforward way or if the only solution is to fetch all the records to count them.
In that case, a MOVE FORWARD ALL FROM <cursor> statement returns MOVE x, where x is the number of rows moved over. The result is a command tag written to stdout, and I do not know how to retrieve that value in a PL/pgSQL function. GET DIAGNOSTICS <var> := ROW_COUNT only works for FETCH, not for MOVE.
Here is a proposed solution; how do you think I can improve it? (And is it possible to use MOVE instead of FETCH to retrieve the x value?)
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor_name TEXT)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int;
BEGIN
-- Set cursor at the end of records
EXECUTE format('FETCH FORWARD ALL FROM %I', _cursor_name);
-- Retrieve number of rows
GET DIAGNOSTICS _n_rows := ROW_COUNT;
-- Set cursor at the beginning
EXECUTE format('MOVE ABSOLUTE 0 FROM %I', _cursor_name);
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Thank you very much for your help

I don't believe there is a way to do that in PL/pgSQL.
As a workaround, you could define the cursor so that it includes a row number:
DECLARE c CURSOR WITH HOLD FOR
SELECT *, row_number() OVER ()
FROM (SELECT /* your original query */) AS s;
That is only marginally more expensive than the original query, and it allows you to position the cursor on the last row with MOVE and retrieve the row_number, which is the total number of rows.
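As a hedged illustration of reading that count back (it assumes a scrollable cursor so we can jump to the last row and then rewind; the inner query is just a stand-in for the original one):
DECLARE c SCROLL CURSOR WITH HOLD FOR
SELECT *, row_number() OVER () AS rn
FROM (SELECT relname FROM pg_class) AS s;  -- stand-in for your original query
FETCH LAST FROM c;     -- the rn column of this row is the total number of rows
MOVE ABSOLUTE 0 IN c;  -- reposition before the first row for normal fetching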

Yes, it is possible to use MOVE instead of FETCH to count the records in a cursor, with slightly improved performance. As it turns out, we can indeed retrieve ROW_COUNT diagnostics from a MOVE cursor statement.
GOTCHA: Using EXECUTE does not update GET DIAGNOSTICS for MOVE, while it does for FETCH, and neither statement updates the FOUND variable.
This is not clearly stipulated in the PostgreSQL documentation per se, but considering that MOVE does not produce any actual result rows, it makes some sense.
NOTE: The following examples do not reset the cursor position back to 0 as the original example does, which allows the function to be used with all cursor types, in particular NO SCROLL cursors, which reject backward movement by raising an error.
Using MOVE instead of FETCH
The holy grail of PostgreSQL cursor record count methods we've all been waiting for. Modify the function to take a refcursor as argument and execute the MOVE cursor statement directly, which then makes ROW_COUNT available.
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int;
BEGIN
-- Set cursor at the end of records
MOVE FORWARD ALL FROM _cursor;
-- Retrieve number of rows
GET DIAGNOSTICS _n_rows := ROW_COUNT;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
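For example, assuming a held cursor declared at the SQL level (a refcursor value is simply the cursor's name, so the text literal is cast to refcursor):
BEGIN;
DECLARE my_cur CURSOR WITH HOLD FOR SELECT generate_series(1, 42);
COMMIT;
SELECT get_cursor_size('my_cur');  -- returns 42; the cursor is left positioned at the end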
Alternative approaches using MOVE
Provided here for completeness.
Another approach is to LOOP through the cursor until FOUND becomes false. However, this approach is notably slower than even the FETCH ALL method from the original example in the question.
-- Increment the cursor position and count the rows
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int := 0;
BEGIN
LOOP
-- Move cursor position
MOVE FORWARD 1 IN _cursor;
-- While not found
EXIT WHEN NOT FOUND;
-- Increment counter
_n_rows := _n_rows + 1;
END LOOP;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Increasing the step size does improve performance, but the result will be rounded up to a multiple of the step size because FOUND reports success if the cursor has moved at all. To rectify this we can look up ROW_COUNT and increment by the actual number of rows moved instead.
-- Count the actual number of rows incremented
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int := 0;
_move_count int;
BEGIN
LOOP
-- Move cursor position
MOVE FORWARD 100 IN _cursor;
-- Until not found
EXIT WHEN NOT FOUND;
-- Increment counter
GET DIAGNOSTICS _move_count := ROW_COUNT;
_n_rows := _n_rows + _move_count;
END LOOP;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
With 50,000 records I had to increase the step size to 1000 before I noticed any improvement over FETCH ALL, so unless there is something else worth doing at the same time, the incremental approach is less optimal.
Passing cursor functions
We will probably want to use this with cursor-producing functions, like:
-- Get a reference cursor
CREATE FUNCTION get_refcursor()
RETURNS refcursor AS
$func$
DECLARE
return_cursor refcursor;
BEGIN
OPEN return_cursor FOR SELECT 'One Record';
RETURN return_cursor;
END
$func$ LANGUAGE plpgsql;
We can simplify this by using an OUT argument instead, allowing us to omit both the declaration block and the return statement; see the CREATE FUNCTION documentation.
-- Get a reference cursor
CREATE FUNCTION get_refcursor(OUT return_cursor refcursor)
RETURNS refcursor AS
$func$
BEGIN
OPEN return_cursor FOR SELECT 'One Record';
END
$func$ LANGUAGE plpgsql;
Using one of our get-cursor functions, we can now easily count the number of returned cursor records (only 1) by passing the function call as argument.
SELECT get_cursor_size(get_refcursor());
nJoy!

Related

How to save a result from Postgres queries from different tables inside a function

I have a Postgres function that needs to iterate over an ARRAY of table names and should save the value returned by the query for each table into an array.
Maybe this is not the correct way, so if there are better ways to do it I'll be glad to know :)
I've tried using the format function to generate a different query each time.
CREATE OR REPLACE FUNCTION array_iter(tables_name text[],idd integer)
RETURNS void
LANGUAGE 'plpgsql'
AS $BODY$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
FOREACH current_table IN ARRAY $1
LOOP
quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table);
current_height:=EXECUTE quer;
SELECT array_append(heights, current_height);
END LOOP;
RAISE NOTICE '%', heights;
end;
$BODY$;
First off, you desperately need to update your Postgres, as version 9.1 is no longer supported and has not been for 5 years (since Oct 2016). I would suggest going to v13 as it is the latest, but upgrade to an absolute minimum of 10.12. That still has slightly over a year (Nov 2022) before it loses support. So with that in mind:
The statement quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table); is invalid: it contains 2 format specifiers but only 1 argument. You could reuse the single argument by including the argument number in each specifier, as in quer:=format('SELECT height FROM %1$s WHERE %1$s.idd=$2', current_table);. But that is not necessary, as the 2nd specifier is just a table qualifier which need not be the table name, and since you only have 1 table it is not needed at all. I would, however, move the parameter ($2) out of the SELECT and use a format specifier/argument for it instead.
The statement current_height:=EXECUTE quer; is likewise invalid; you cannot make the right-hand value of an assignment an EXECUTE. For this you use the INTO option, which follows the statement: EXECUTE quer INTO ....
While SELECT array_append(heights, current_height); would work as a query, a simple assignment heights := heights || current_height; seems easier (at least imho).
Finally, there are a couple of omissions. Prior to running a dynamic SQL statement, it is good practice to 'print' or log the statement, and to handle what happens when the statement has an error. And why build a function to do all this work just to throw away the results? So instead of void, return an integer array (integer[]).
So we arrive at:
create or replace function array_iter(tables_name text[],idd integer)
returns integer[]
language plpgsql
as $$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
foreach current_table in array tables_name
loop
quer:=format('select height from %I where id=%s', current_table,idd);
raise notice 'Running Query:: %',quer;
execute quer into current_height;
heights = heights || current_height;
end loop;
raise notice '%', heights;
return heights;
exception
when others then
raise notice 'Query failed:: SqlState:%, ErrorMessage:%', sqlstate,sqlerrm;
raise;
end;
$$;
This does run on versions as old as 9.5 (see fiddle), although I cannot say whether it runs on the even older 9.1.
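A hypothetical call, assuming tables t1 and t2 both have id and height columns:
SELECT array_iter(ARRAY['t1', 't2'], 42);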

How to include a grand total count along with pagination and sorting?

I have a PostgreSQL function findbooks() that accepts params for WHERE, OFFSET and LIMIT and dynamically builds a SELECT returning rows, for example:
create or replace function sa.findbooks(...) returns SETOF sa.books language plpgsql
as $$
declare dynamicsql text;
begin
if ... then
dynamicsql := 'select ...';
end if;
-- RETURN QUERY EXECUTE returns the rows produced by the dynamic statement
return query execute dynamicsql;
end;
$$;
Suppose the frontend wants to show how many total books met the criteria BEFORE pagination. One way I can think of is to append a new column for this piece of info; obviously all cells of this column would hold the same repeated value. Is there any better way?
PostgreSQL v12.
To get the total count, there is no better way than to execute an extra query that counts the data you want. This will be inefficient, but then paging with LIMIT and OFFSET is inefficient too, so it might not matter too much.
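A minimal sketch of that extra query, written here as a hypothetical companion function that reuses the same dynamically built WHERE text (the function name, the where_clause parameter and the sa.books table are assumptions):
create or replace function sa.findbooks_count(where_clause text)
returns bigint
language plpgsql
as $$
declare
    total bigint;
begin
    -- count everything that matches, ignoring OFFSET and LIMIT
    execute 'select count(*) from sa.books ' || where_clause into total;
    return total;
end;
$$;
The frontend then issues one call for the page of rows and one call for the grand total.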

Should i use STABLE or VOLATILE in a function that performs a query?

I'm trying to use the right volatility modifier on a particular function that checks whether a count is greater than or equal to 2, but I'm not sure which one to use. Here is the function:
CREATE FUNCTION check_table_ids()
RETURNS trigger AS
$$
DECLARE
counter integer := (SELECT count(*) FROM table WHERE fk_id = NEW.fk_id AND status <> 'CANCELED');
BEGIN
IF counter >= 2 THEN
RAISE EXCEPTION 'The number of entries for "%" is greater or equal than 2', NEW.fk_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql VOLATILE;
BTW this function will be called by a trigger on insert.
According to the manual (https://www.postgresql.org/docs/current/static/sql-createfunction.html)
STABLE indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same argument values, but that its result could change across SQL statements.
So if you modify the database, or give different results without the database changing, use VOLATILE; otherwise use STABLE.
For the code in your question, STABLE should be fine.
The fact that this is called by a trigger doesn't make a difference, as it is the content of the function you are declaring as STABLE, not its usage.
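For illustration, a hedged sketch of wiring this up: keep the function body as above but end it with LANGUAGE plpgsql STABLE, then attach it to a hypothetical table (EXECUTE FUNCTION requires PostgreSQL 11+; on older versions use EXECUTE PROCEDURE):
CREATE TRIGGER check_ids_on_insert
BEFORE INSERT ON some_table  -- hypothetical table name
FOR EACH ROW
EXECUTE FUNCTION check_table_ids();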

Execution time issue - Postgresql

Below is the function which I am running on two different tables that contain the same column names.
-- Function: test(character varying)
-- DROP FUNCTION test(character varying);
CREATE OR REPLACE FUNCTION test(table_name character varying)
RETURNS SETOF void AS
$BODY$
DECLARE
recordcount integer;
j integer;
hstoredata hstore;
BEGIN
recordcount:=getTableName(table_name);
FOR j IN 1..recordcount LOOP
RAISE NOTICE 'RECORD NUMBER IS: %',j;
EXECUTE format('SELECT hstore(t) FROM datas.%I t WHERE id = $1', table_name) USING j INTO hstoredata;
RAISE NOTICE 'hstoredata: %', hstoredata;
END LOOP;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100
ROWS 1000;
When the above function is run on a table containing 1000 rows, the time taken is around 536 ms.
When the above function is run on a table containing 10000 rows, the time taken is around 27994 ms.
Logically, the time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is much higher.
In order to reduce the execution time, please suggest what changes should be made.
Logically, the time taken for 10000 rows should be around 5360 ms based on the calculation from 1000 rows, but the execution time is much higher.
That assumes that reading any particular row takes the same time as reading any other row, but this is not true.
For instance, if there is a text column in the table and it sometimes contains large contents, those values are fetched from TOAST storage (out of page) and dynamically decompressed.
In order to reduce the execution time, please suggest what changes should be made.
To read all the table rows without necessarily fetching them all into memory at once, you may use a cursor. That would avoid running a new query at every loop iteration. Cursors accept dynamic queries through EXECUTE.
See Cursors in plpgsql documentation.
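A hedged sketch of that cursor-based approach, keeping the dynamic table name and hstore output from the question (the function name test_cursor is made up):
CREATE OR REPLACE FUNCTION test_cursor(table_name character varying)
RETURNS void AS
$BODY$
DECLARE
    cur refcursor;
    hstoredata hstore;
BEGIN
    -- one dynamic query opened as a cursor, instead of one query per id
    OPEN cur FOR EXECUTE format('SELECT hstore(t) FROM datas.%I t ORDER BY id', table_name);
    LOOP
        FETCH cur INTO hstoredata;
        EXIT WHEN NOT FOUND;
        RAISE NOTICE 'hstoredata: %', hstoredata;
    END LOOP;
    CLOSE cur;
END;
$BODY$
LANGUAGE plpgsql;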
As far as I can tell you are over-complicating things. As "recordcount" is only used to iterate over the ID values, I think you can do everything with a single statement instead of querying for each and every ID separately.
CREATE OR REPLACE FUNCTION test(table_name varchar)
RETURNS void AS
$BODY$
DECLARE
rec record;
begin
for rec in execute format ('select id, hstore(t) as hs from datas.%I', table_name) loop
RAISE NOTICE 'RECORD NUMBER IS: %',rec.id;
RAISE NOTICE 'hstoredata: %', rec.hs;
end loop;
end;
$BODY$
language plpgsql;
The only way this would differ from your solution is that if an ID smaller than the count of rows in the table does not exist, you won't see a RECORD NUMBER message for it; but you would see IDs that are bigger than the row count of the table.
Any time you execute the same statement again and again in a loop, very, very loud alarm bells should ring in your head. SQL is optimized to deal with sets of data, not to do row-by-row processing (which is what your loop is doing).
You didn't tell us what the real problem is that you are trying to solve (and I fear that you have over-simplified your example), but given the code from the question, the above should be a much better solution (and definitely much faster).

Calculate number of rows affected by batch query in PostgreSQL

First of all, yes, I've read the documentation for the DO statement :)
http://www.postgresql.org/docs/9.1/static/sql-do.html
So my question:
I need to execute some dynamic block of code that contains UPDATE statements and calculate the number of all affected rows. I'm using Ado.Net provider.
In Oracle the solution would have 4 steps:
add an InputOutput parameter "N" to the command
add BEGIN ... END; to the command
add :N := :N + sql%rowcount after each statement.
It's done! We can read the N parameter from the command after executing it.
How can I do it with PostgreSQL? I'm using the npgsql provider but can migrate to Devart if it helps.
DO statement blocks are good for executing dynamic SQL. They are no good for returning values. Use a plpgsql function for that.
The key statement you need is:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Example code:
CREATE OR REPLACE FUNCTION f_upd_some()
RETURNS integer AS
$func$
DECLARE
ct int;
i int;
BEGIN
EXECUTE 'UPDATE tbl1 ...'; -- something dynamic here
GET DIAGNOSTICS ct = ROW_COUNT; -- initialize with 1st count
UPDATE tbl2 ...; -- nothing dynamic here
GET DIAGNOSTICS i = ROW_COUNT;
ct := ct + i; -- add up
RETURN ct;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_upd_some();
My solution is quite simple. In Oracle I need to use variables to calculate the sum of updated rows because command.ExecuteNonQuery() returns only the count of rows affected by the last UPDATE in the batch.
However, npgsql returns the sum of all rows updated by all UPDATE queries. So I only need to call command.ExecuteNonQuery() and get the result without any variables. Much easier than with Oracle.