My query doesn't even work, but hopefully the logic comes through. Basically I'm using datavault.dimdates_csv to produce a row for each date/day. Then for each day I'm trying to get all account ids, and for each one call a function using the date and the account.
Is there a better approach to getting my data? I know nested loops aren't that great for SQL.
do
$$
declare
date_record record;
account record;
begin
for date_record in select d."date" from datavault.dimdates_csv d
for account in select sad.id from datavault.sat_account_details sad
select datavault.account_active_for_date(date_record , account)
loop
loop
end loop;
end;
$$
It's hard to follow your business logic but syntax-wise your block needs correction. Please note that d."date" and sad.id are scalars (I assume a date and an integer) and not records.
do
$$
declare
running_date date;
running_id integer;
begin
for running_date in select d."date" from datavault.dimdates_csv d loop
for running_id in select sad.id from datavault.sat_account_details sad loop
perform datavault.account_active_for_date(running_date, running_id);
end loop;
end loop;
end;
$$;
As far as I can see, you are calling the function datavault.account_active_for_date for every pair of d."date" and sad.id. If this is true then you can simply run
select datavault.account_active_for_date(d."date", sad.id)
from datavault.dimdates_csv d, datavault.sat_account_details sad;
and ignore the resultset.
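If you would rather keep the results than discard them, the same query can be written with an explicit CROSS JOIN. This is only a minimal sketch, assuming the function returns a single scalar value; the column alias active_result is made up for illustration.

```sql
-- One row per (date, account) pair; the function is called for each pair.
-- active_result is a hypothetical alias, not part of the original schema.
select d."date",
       sad.id,
       datavault.account_active_for_date(d."date", sad.id) as active_result
from datavault.dimdates_csv d
cross join datavault.sat_account_details sad;
```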
I have a Postgres function that needs to iterate over an ARRAY of table names and save the value returned from each query into an array.
Maybe this is not the correct way, so if there are better ways to do it I'll be glad to know :)
I've tried using the format function to generate a different query each time.
CREATE OR REPLACE FUNCTION array_iter(tables_name text[],idd integer)
RETURNS void
LANGUAGE 'plpgsql'
AS $BODY$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
FOREACH current_table IN ARRAY $1
LOOP
quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table);
current_height:=EXECUTE quer;
SELECT array_append(heights, current_height);
END LOOP;
RAISE NOTICE '%', heights;
end;
$BODY$;
First off, you desperately need to update your Postgres: version 9.1 is no longer supported and has not been for 5 years (Oct 2016). I would suggest going to v13 as it is the latest, but at an absolute minimum to 10.12. That will still have slightly over a year (Nov 2022) before it loses support. So with that in mind:
The statement quer:=format('SELECT height FROM %s WHERE %s.idd=$2', current_table); is invalid: it contains 2 format specifiers but only 1 argument. You could reuse the single argument by including the argument position on each specifier: quer:=format('SELECT height FROM %1$s WHERE %1$s.idd=$2', current_table);. But that is not necessary, as the 2nd is a table alias, which need not be the table name, and since you only have 1 table it is not needed at all. I would however move the parameter ($2) out of the select and use a format specifier/argument for it.
The statement current_height:=EXECUTE quer; is likewise invalid: you cannot use EXECUTE as the right-hand side of an assignment. For this you use the INTO option, which follows the statement: execute quer into ...
While SELECT array_append(heights, current_height); is valid SQL, inside PL/pgSQL a bare SELECT has no destination for its result and does not change heights. A simple assignment heights = heights || current_height; is easier (at least imho).
Finally, there are a couple of omissions. Prior to running a dynamic SQL statement it is good practice to 'print' or log the statement. What happens when the statement has an error? And why build a function to do all this work just to throw away the results? So instead of void, return an integer array (integer[]).
So we arrive at:
create or replace function array_iter(tables_name text[],idd integer)
returns integer[]
language plpgsql
as $$
declare
current_table text;
current_height integer :=0;
quer text;
heights integer[];
begin
foreach current_table in array tables_name
loop
quer:=format('select height from %I where id=%s', current_table,idd);
raise notice 'Running Query:: %',quer;
execute quer into current_height;
heights = heights || current_height;
end loop;
raise notice '%', heights;
return heights;
exception
when others then
raise notice 'Query failed:: SqlState:%, ErrorMessage:%', sqlstate,sqlerrm;
raise;
end;
$$;
This does run on versions as old as 9.5 (see fiddle), although I cannot say whether it runs on the even older 9.1.
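As an alternative sketch (not what the function above does), the value can be passed as a real parameter with USING instead of being formatted into the statement text, so that only the table name is interpolated. This assumes the same tables and the id column used above; the function name array_iter_using is hypothetical.

```sql
create or replace function array_iter_using(tables_name text[], idd integer)
returns integer[]
language plpgsql
as $$
declare
    current_table  text;
    current_height integer;
    heights        integer[] := '{}';
begin
    foreach current_table in array tables_name
    loop
        -- %I quotes the table name as an identifier; $1 is bound through USING
        execute format('select height from %I where id = $1', current_table)
           into current_height
          using idd;
        heights := heights || current_height;
    end loop;
    return heights;
end;
$$;
```

Binding the value this way keeps it out of the SQL text entirely, which matters more once the parameter is text rather than an integer.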
I have a cursor created using the WITH HOLD option that allows the cursor to be used for subsequent transactions.
I would like to retrieve the number of rows that can be fetched by the cursor. Since the rows represented by a held cursor are copied into a temporary file or memory area, I am wondering if it is possible to retrieve that number in a straightforward way or if the only solution is to fetch all the records to count them.
In that case, a MOVE FORWARD ALL FROM <cursor> statement returns MOVE x, where x is the number of rows moved. The result is a command tag written to stdout, and I do not know how to retrieve that value in a PL/pgSQL function. GET DIAGNOSTICS <var> := ROW_COUNT only works for FETCH but not MOVE.
Here is a proposed solution. How do you think I can improve it? (And is it possible to use MOVE instead of FETCH to retrieve the x value?)
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor_name TEXT)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int;
BEGIN
-- Set cursor at the end of records
EXECUTE format('FETCH FORWARD ALL FROM %I', _cursor_name);
-- Retrieve number of rows
GET DIAGNOSTICS _n_rows := ROW_COUNT;
-- Set cursor at the beginning
EXECUTE format('MOVE ABSOLUTE 0 FROM %I', _cursor_name);
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Thank you very much for your help
I don't believe there is a way to do that in PL/pgSQL.
As a workaround, you could define the cursor so that it includes a row number:
DECLARE c CURSOR WITH HOLD FOR
SELECT *, row_number() OVER ()
FROM (SELECT /* your original query */) AS s;
That is only marginally more expensive than the original query, and it allows you to position the cursor on the last row with MOVE and retrieve the row_number, which is the total number of rows.
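A minimal sketch of that workaround, using generate_series as a stand-in for the original query. The cursor name c and the column alias rn are illustrative, and SCROLL is declared explicitly here so the cursor can be rewound afterwards.

```sql
BEGIN;
DECLARE c SCROLL CURSOR WITH HOLD FOR
    SELECT s.*, row_number() OVER () AS rn
    FROM (SELECT val FROM generate_series(1, 5) AS g(val)) AS s;
COMMIT;                 -- the held cursor survives the commit

FETCH LAST FROM c;      -- rn on this row is the total number of rows
MOVE ABSOLUTE 0 IN c;   -- rewind to before the first row
CLOSE c;                -- release the held cursor when done
```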
Yes it is possible to use MOVE instead of FETCH to count the records in a cursor, with slightly improved performance. As it turns out we can indeed retrieve ROW_COUNT diagnostics from a MOVE cursor statement.
GOTCHA: Using EXECUTE does not update GET DIAGNOSTICS for MOVE while it does for FETCH, and under EXECUTE neither statement updates the FOUND variable.
This is not clearly stipulated in the PostgreSQL documentation per se, but considering that MOVE does not produce any actual results it might make sense enough to be excused.
NOTE: The following examples do not reset the cursor position back to 0 as with the original example, allowing the function to be used with all cursor types especially NO SCROLL cursors which will reject backward movement by raising an error.
Using MOVE instead of FETCH
The holy grail of PostgreSQL cursor record count methods we've all been waiting for. Modify the function to take a refcursor as argument and instead execute the MOVE cursor statement directly which then makes ROW_COUNT available.
-- Function returning the number of rows available in the cursor
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int;
BEGIN
-- Set cursor at the end of records
MOVE FORWARD ALL FROM _cursor;
-- Retrieve number of rows
GET DIAGNOSTICS _n_rows := ROW_COUNT;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Alternative approaches using MOVE
Provided here for completeness.
Another approach is to LOOP through the cursor until FOUND is false; however, this approach is notably slower than even the FETCH ALL method from the original example in the question.
-- Increment the cursor position and count the rows
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int := 0;
begin
LOOP
-- Move cursor position
MOVE FORWARD 1 IN _cursor;
-- While not found
EXIT WHEN NOT FOUND;
-- Increment counter
_n_rows := _n_rows + 1;
END LOOP;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
Increasing the step size does improve performance but the result will be rounded up because FOUND will report success if the cursor has moved at all. To rectify this we can look up ROW_COUNT and increment by the actual amount moved instead.
-- Count the actual number of rows incremented
CREATE FUNCTION get_cursor_size(_cursor refcursor)
RETURNS TEXT AS
$func$
DECLARE
_n_rows int := 0;
_move_count int;
begin
LOOP
-- Move cursor position
MOVE FORWARD 100 IN _cursor;
-- Until not found
EXIT WHEN NOT FOUND;
-- Increment counter
GET DIAGNOSTICS _move_count := ROW_COUNT;
_n_rows := _n_rows + _move_count;
END LOOP;
RETURN _n_rows;
END
$func$ LANGUAGE plpgsql;
With 50,000 records I had to increase the step size to 1000 before I noticed any improvement over FETCH ALL so unless there is something else worth doing simultaneously the incremental approach is less optimal.
Passing cursor functions
We will probably want to use this with cursor producing functions, like:
-- Get a reference cursor
CREATE FUNCTION get_refcursor()
RETURNS refcursor AS
$func$
DECLARE
return_cursor refcursor;
BEGIN
OPEN return_cursor FOR SELECT 'One Record';
RETURN return_cursor;
END
$func$ LANGUAGE plpgsql;
We can simplify this by using an OUT argument instead allowing us to omit both the declaration block and return statement, see the Create Function documentation.
-- Get a reference cursor
CREATE FUNCTION get_refcursor(OUT return_cursor refcursor)
RETURNS refcursor AS
$func$
BEGIN
OPEN return_cursor FOR SELECT 'One Record';
END
$func$ LANGUAGE plpgsql;
Using one of our get cursor functions we can now easily count the number of returned cursor records (only 1) by passing the function as argument.
SELECT get_cursor_size(get_refcursor());
nJoy!
As I am a newbie to PL/pgSQL,
I got stuck while migrating an Oracle query to PostgreSQL.
Oracle query:
create or replace FUNCTION employee_all_case(
p_ugr_id IN integer,
p_case_type_id IN integer
)
RETURN number_tab_t PIPELINED
-- LANGUAGE 'plpgsql'
-- COST 100
-- VOLATILE
-- AS $$
-- DECLARE
is
l_user_id NUMBER;
l_account_id NUMBER;
BEGIN
l_user_id := p_ugr_id;
l_account_id := p_case_type_id;
FOR cases IN
(SELECT ccase.case_id, ccase.employee_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype
ON (ccase.case_type_id=ctype.case_type_id)
WHERE ccase.employee_id = l_user_id)
LOOP
IF cases.employee_id IS NOT NULL THEN
PIPE ROW (cases.case_id);
END IF;
END LOOP;
RETURN;
END;
--$$
When I execute this function then I get the following result
select * from table(select employee_all_case(14533,1190) from dual)
My question here is: I really do not understand pipelined functions. How can I obtain the same result in PostgreSQL as with the Oracle query?
Please help.
Thank you guys, your solution was very helpful.
I got the desired result:
-- select * from employee_all_case(14533,1190);
-- drop function employee_all_case
create or replace FUNCTION employee_all_case(p_ugr_id IN integer ,p_case_type_id IN integer)
returns table (case_id double precision)
-- PIPELINED
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $$
DECLARE
-- is
l_user_id integer;
l_account_id integer;
BEGIN
l_user_id := cp_lookup$get_user_id_from_ugr_id(p_ugr_id);
l_account_id := cp_lookup$acctid_from_ugr(p_ugr_id);
RETURN QUERY SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and ccase.employee_id IS NOT NULL;
--return NEXT;
END;
$$
You would rewrite that to a set returning function:
Change the return type to
RETURNS SETOF integer
and do away with the PIPELINED.
Change the PIPE ROW statement to
RETURN NEXT cases.case_id;
Of course, you will have to do the obvious syntactic changes, like using integer instead of NUMBER and putting the IN before the parameter name.
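Applied to the function from the question, those changes might look like the sketch below. It assumes ct_case.case_id is an integer column, which the question does not confirm, and it keeps the original loop structure rather than simplifying it.

```sql
CREATE OR REPLACE FUNCTION employee_all_case(p_ugr_id integer, p_case_type_id integer)
RETURNS SETOF integer
LANGUAGE plpgsql
AS $$
DECLARE
    cases record;   -- loop variable must be declared in PL/pgSQL
BEGIN
    FOR cases IN
        SELECT ccase.case_id, ccase.employee_id
        FROM ct_case ccase
        INNER JOIN ct_case_type ctype
                ON ccase.case_type_id = ctype.case_type_id
        WHERE ccase.employee_id = p_ugr_id
    LOOP
        IF cases.employee_id IS NOT NULL THEN
            RETURN NEXT cases.case_id;   -- replaces PIPE ROW
        END IF;
    END LOOP;
    RETURN;
END;
$$;
```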
But actually, it is quite unnecessary to write a function for that. Doing it in a single SELECT statement would be both simpler and faster.
Pipelined functions are best translated to a simple SQL function returning a table.
Something like this:
create or replace function employee_all_case(p_ugr_id integer, p_case_type_id integer)
returns table (case_id integer)
as
$$
SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and ccase.employee_id IS NOT NULL;
$$
language sql;
Note that your sample code did not use the second parameter p_case_type_id.
Usage is also more straightforward:
select *
from employee_all_case(14533,1190);
Before diving into the solution, I will provide some background that will help you understand it better.
Basically, PIPELINED was introduced to improve memory allocation at run time.
As you know, collections occupy memory whenever they are created, so the more you use, the more memory gets allocated.
Pipelining negates the need to build huge collections by piping rows out of the function, saving memory and allowing subsequent processing to start before all the rows are generated.
Pipelined table functions include the PIPELINED clause and use the PIPE ROW call to push rows out of the function as soon as they are created, rather than building up a table collection.
How does pipelining optimize memory usage?
It's very simple: instead of storing data in an array, you just process the data using pipe row(desired type). This returns the row immediately and moves on to process the next row.
Coming to the solution in PL/pgSQL, there are two options.
Option 1 (simple, but not preferred when storing large data): remove PIPELINED from the return declaration and return an array of the desired type, something like RETURNS typrec2[]. Wherever you are using pipe row(), add that entry to the array, and finally return the array.
Option 2: create a temp table like
CREATE TEMPORARY TABLE temp_table (required fields) ON COMMIT DROP;
and insert data into it. Replace pipe row with an insert statement, and finally add a return statement like
return query select * from temp_table
The best link for understanding PIPELINED in Oracle: https://oracle-base.com/articles/misc/pipelined-table-functions
A pretty ordinary reference for Postgres: http://manojadinesh.blogspot.com/2011/11/pipelined-in-oracle-as-well-in.html
Hope this helps someone conceptually.
I wrote the following trigger:
CREATE FUNCTION trig_func() RETURNS trigger AS $$
BEGIN
IF NEW = OLD
THEN -- update would do nothing, doing something...
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trig BEFORE UPDATE ON some_table
FOR EACH ROW EXECUTE PROCEDURE trig_func();
It makes it clear what I'd like to achieve, but what is the proper thing to put in place of NEW = OLD?
The IS DISTINCT FROM operator can compare complete rows and will handle nulls correctly.
So you want
if new is not distinct from old then
...
end if;
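Put together, the trigger function from the question could look like the sketch below. The RAISE NOTICE is only a placeholder for whatever should happen on a no-op update.

```sql
CREATE OR REPLACE FUNCTION trig_func() RETURNS trigger AS $$
BEGIN
    -- True when every column is unchanged, with NULLs treated as equal
    IF NEW IS NOT DISTINCT FROM OLD THEN
        RAISE NOTICE 'update on % changes nothing', TG_TABLE_NAME;  -- placeholder action
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
```

The same test can also go into the trigger definition itself, as FOR EACH ROW WHEN (OLD.* IS NOT DISTINCT FROM NEW.*), so that the function is only called for updates that change nothing.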
I had a function with a performance issue:
totalCharge := 0;
FOR myRecord IN ... LOOP
......
IF severalConditionsAreMet THEN
BEGIN
SELECT t1.charge INTO STRICT recordCharge
FROM t1
WHERE t1.id = myRecord.id AND otherComplexConditionsHere;
totalCharge := totalCharge + recordCharge;
...........
EXCEPTION
WHEN OTHERS THEN
NULL;
END;
END IF;
END LOOP;
The function was being called 232 times (not counting the number of times the code from the FOR was accessed).
The IF from the FOR LOOP ended up being accessed 4466 times and was taking 561 seconds to complete all 4466 iterations.
For the particular data set that I had, the IF was always entered, the SELECT above never returned data, and the code reached the EXCEPTION branch each and every time.
I have changed the code to:
totalCharge := 0;
FOR myRecord IN ... LOOP
......
IF severalConditionsAreMet THEN
SELECT t1.charge INTO recordCharge
FROM t1
WHERE t1.id = myRecord.id AND otherComplexConditionsHere;
IF (recordCharge IS NULL) THEN
CONTINUE;
END IF;
totalCharge := totalCharge + recordCharge;
...........
END IF;
END LOOP;
Please note that for the table t1, the t1.charge column has a NOT NULL condition defined on it.
This time, the code from the IF takes 1-2 seconds to complete all 4466 iterations.
Basically, all I did was replace the
BEGIN
…
EXCEPTION
….
END;
With
IF conditionIsNotMet THEN
CONTINUE;
END IF;
Can someone please explain to me why this worked?
What happened behind the scenes?
I suspect that when you catch exceptions inside of a LOOP and the code ends up generating an exception, Postgres can’t use cached plans to optimize that code so it ends up planning the code at each iteration and this causes performance issues.
Is my assumption correct?
Later Edit:
I altered the example provided by Vao Tsun to reflect the case that I want to illustrate.
CREATE OR REPLACE FUNCTION initialVersion()
RETURNS VOID AS $$
declare
testDate DATE;
begin
for i in 1..999999 loop
begin
select now() into strict testDate where 1=0;
exception when others
then null;
end;
end loop;
end;
$$ Language plpgsql;
CREATE OR REPLACE FUNCTION secondVersion()
RETURNS VOID AS $$
declare
testDate DATE;
begin
for i in 1..999999 loop
select now() into testDate where 1=0;
if testDate is null then
continue;
end if;
end loop;
end;
$$ Language plpgsql;
select initialVersion(); -- 19.7 seconds
select secondVersion(); -- 5.2
As you can see there is a difference of almost 15 seconds.
In the example that I provided initially, the difference is bigger because the SELECT FROM t1 runs against complex data and takes more time to execute than the simple SELECT provided in this second example.
I asked the same question here, in the PostgreSQL - general mailing group and got some responses that elucidated this "mystery" for me:
David G. Johnston:
"Tip: A block containing an EXCEPTION clause is significantly
more expensive to enter and exit than a block without one. Therefore,
don't use EXCEPTION without need."
I'm somewhat doubting "plan caching" has anything to do with this; I
suspect its basically that there is high memory and runtime overhead
to deal with the possibilities of needing to convert a exception into
a branch instead of allowing it to be fatal.
Tom Lane:
Yeah, it's about the overhead of setting up and ending a
subtransaction. That's a fairly expensive mechanism, but we don't have
anything cheaper that is able to recover from arbitrary errors.
and an addition from David G. Johnston:
[...] setting up the pl/pgsql execution layer to trap "arbitrary SQL-layer
exceptions" is fairly expensive. Even if the user specifies specific
errors the error handling mechanism in pl/pgsql is code for generic
(arbitrary) errors being given to it.
These answers helped me understand a bit how things work.
I am posting this answer here 'cause I hope that this answer will help someone else.
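Following that advice, one way to avoid the EXCEPTION block without relying on a NULL test is to check FOUND after a plain SELECT INTO. This is a sketch modeled on secondVersion() above; the name foundVersion is made up and no timings were measured for it.

```sql
CREATE OR REPLACE FUNCTION foundVersion()
RETURNS VOID AS $$
declare
    testDate DATE;
begin
    for i in 1..999999 loop
        -- Without STRICT, a query that returns no rows raises no error;
        -- it sets testDate to NULL and FOUND to false.
        select now() into testDate where 1=0;
        if not found then
            continue;
        end if;
    end loop;
end;
$$ Language plpgsql;
```

Checking FOUND also works when the selected column is nullable, where a NULL test alone could not tell "no row" from "row with a NULL value".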
With the given details, I can't reproduce:
t=# do
$$
declare
begin
for i in 1..999999 loop
perform now();
/* exception when others then null; */
if null then null; end if;
end loop;
end;
$$
;
DO
Time: 1920.568 ms
t=# do
$$
declare
begin
for i in 1..999999 loop
begin
perform now();
exception when others then null;
end;
end loop;
end;
$$
;
DO
Time: 2417.425 ms
As you can see, with about 1 million iterations the difference is clear but insignificant. Please try the same test on your machine; if you get the same results, you need to provide more details...