Using variables in a PL/pgSQL function - postgresql

Postgres PL/pgSQL docs say:
For any SQL command that does not return rows, for example INSERT
without a RETURNING clause, you can execute the command within a
PL/pgSQL function just by writing the command.
Any PL/pgSQL variable name appearing in the command text is treated as
a parameter, and then the current value of the variable is provided as
the parameter value at run time.
But when I use variable names in my queries I get an error:
ERROR: syntax error at or near "email"
LINE 16: ...d,email,password) values(identity_id,current_ts,''email'',''...
This is my function:
CREATE OR REPLACE FUNCTION app.create_identity(email varchar,passwd varchar)
RETURNS integer as $$
DECLARE
current_ts integer;
new_identity_id integer;
int_max integer;
int_min integer;
BEGIN
SELECT extract(epoch FROM now())::integer INTO current_ts;
int_min:=-2147483648;
int_max:= 2147483647;
LOOP
BEGIN
SELECT floor(int_min + (int_max - int_min + 1) * random()) INTO new_identity_id;
IF new_identity_id != 0 THEN
INSERT into app.identity(identity_id,date_inserted,email,password) values(identity_id,current_ts,''email'',''passwd'');
RETURN new_identity_id;
END IF;
EXCEPTION
WHEN unique_violation THEN
END;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Why when I use variables in the query, Postgres throws an error. How is this supposed to be written?

You can not put the parameter names in single quotes (''email'' and you can't use the parameter email "as is" because it has the same name as a column in the table. This name clash is one of the reasons it is highly recommended to not use variables or parameters that have the same name as a column in one of the tables. You have three choices to deal with this:
rename the variable. A common naming convention is to prefix parameters with p_ e.g. p_email, then use the un-ambigous names in the insert
INSERT into app.identity(identity_id,date_inserted,email,password)
values(identity_id,current_ts,p_email,p_password);
use the $1 for the first parameter and $2 for the second:
INSERT into app.identity(identity_id,date_inserted,email,password)
values(identity_id,current_ts,$1,$2);
prefix the parameter name with the function name:
INSERT into app.identity(identity_id,date_inserted,email,password)
values(identity_id,current_ts,create_identity.email,create_identity.password);
I would highly recommend to go with option 1
Unrelated, but: you don't need SELECT statements to assign variable values if you don't retrieve those values from a table.
SELECT extract(epoch FROM now())::integer INTO current_ts;
can be simplified to:
current_ts := extract(epoch FROM now())::integer;
and
SELECT floor(int_min + (int_max - int_min + 1) * random()) INTO new_identity_id;
to
new_identity_id := floor(int_min + (int_max - int_min + 1) * random());

#a_horse answers your actual question and clarifies quoting issues and naming conflicts.
About quoting:
Insert text with single quotes in PostgreSQL
About naming conflicts (behavior of plpgsql changed slightly over time):
Set a default return value for a Postgres function
Postgres function returning a row as JSON value
Better solution
I suggest a completely different approach to begin with:
CREATE OR REPLACE FUNCTION app.create_identity(_email text, _passwd text
, OUT new_identity_id int) AS
$func$
DECLARE
_current_ts int := extract(epoch FROM now());
BEGIN
LOOP
--+ Generate compeltely random int4 numbers +-----------------------------
-- integer (= int4) in Postgres is a signed integer occupying 4 bytes --
-- int4 ranges from -2147483648 to +2147483647, i.e. -2^31 to 2^31 - 1 --
-- Multiply bigint 4294967296 (= 2^32) with random() (0.0 <= x < 1.0) --
-- trunc() the resulting (positive!) float8 - cheaper than floor() --
-- add result to -2147483648 and cast the next result back to int4 --
-- The result fits the int4 range *exactly* --
--------------------------------------------------------------------------
INSERT INTO app.identity
(identity_id, date_inserted, email , password)
SELECT _random_int, _current_ts , _email , _passwd
FROM (SELECT (bigint '-2147483648' -- could be int, but sum is bigint anyway
+ bigint '4294967296' * random())::int) AS t(_random_int) -- random int
WHERE _random_int <> 0 -- exclude 0 (no insert)
ON CONFLICT (identity_id) DO NOTHING -- no exception raised!
RETURNING identity_id -- return *actually* inserted identity_id
INTO new_identity_id; -- OUT parameter, returned at end
EXIT WHEN FOUND; -- exit after success
-- maybe add counter and raise exception when exceeding n (100?) iterations
END LOOP;
END
$func$ LANGUAGE plpgsql;
Major points
Your random integer calculation would result in an integer out of range error, because the intermediate term int_max - int_min + 1 operates with integer, but the result won't fit.
I suggest the cheaper, correct algorithm above.
Entering a block with exception clause is considerably more expensive than without. Luckily, you do not actually need to raise an exception to begin with. Use an UPSERT (INSERT ... ON CONFLICT ... DO NOTHING), to solve this cheaply and elegantly (Postgres 9.5+).
The manual:
Tip: A block containing an EXCEPTION clause is significantly more
expensive to enter and exit than a block without one. Therefore, don't
use EXCEPTION without need.
You don't need the extra IF construct either. Use SELECT with WHERE.
Make new_identity_id an OUT parameter to simplify.
Use the RETURNING clause and insert the resulting identity_id into the OUT parameter directly. Simpler code and faster execution aside, there is an additional, subtle benefit: you get the value that was actually inserted. If there are triggers or rules on the table, this might differ from what you sent with the INSERT.
Assignments are relatively expensive in PL/pgSQL. Reduce those to a minimum for efficient code.
You might remove the last remaining variable _current_ts as well, and do the calculation in the subquery, then you don't need a DECLARE at all. I left that one, since it might make sense to calculate it once, should the function loop multiple times ...
All that remains is one SQL command, wrapped into a LOOP to retry until success.
If there is a chance that your table might overflow (using all or most int4 numbers) - and strictly speaking, there is always the chance - I would add a counter and raise an exception after maybe 100 iterations to avoid endless loops.

Related

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite some databases and different schemas and tables I wanted to start using variables to reduce chance of errors and the amount of work it will take me.
This is the best example query I could think off, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the times my variables are perceived as columns or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters :
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <# %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys and then deletes these rows from the table my_source_schema.my_source_table .
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per database. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use a the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing. Using RAISE NOTICE '%', _sql; for that purpose.
I suggest to comment the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
select quote_ident(_source_table) ||'_'|| array_to_string(_unique_key,'_') into _target_table from quote_ident(_source_table);
raise notice 'the deleted table records will store in %.%',_target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows influenced, %',_row_count;
end
$$;
--
if your _unique_key is not that much, this solution also create a table for you. Obviously you need to create the target schema yourself.
If your unique_key is too much, you can customize to properly rename the dumped table.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is source table, {1,2} is the unique key you want to extract to the new table. s2 is the target schema

Error when creating a generated column in Postgresql

CREATE TABLE Person (
id serial primary key,
accNum text UNIQUE GENERATED ALWAYS AS (
concat(right(cast(extract year from current_date) as text), 2), cast(id as text)) STORED
);
Error: generation expression is not immutable
The goal is to populate the accNum field with YYid where YY is the last two letters of the year when the person was added.
I also tried the '||' operator but it was unsuccessful.
As you don't expect the column to be updated, when the row is changed, you can define your own function that generates the number:
create function generate_acc_num(id int)
returns text
as
$$
select to_char(current_date, 'YY')||id::text;
$$
language sql
immutable; --<< this is lying to Postgres!
Note that you should never use this function for any other purpose. Especially not as an index expression.
Then you can use that in a generated column:
CREATE TABLE Person
(
id integer generated always as identity primary key,
acc_num text UNIQUE GENERATED ALWAYS AS (generate_acc_num(id)) STORED
);
As #ScottNeville correctly mentioned:
CURRENT_DATE is not immutable. So it cannot be used int a GENERATED ALWAYS AS expression.
However, you can achieve this using a trigger nevertheless:
demo:db<>fiddle
CREATE FUNCTION accnum_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.accNum := right(extract(year from current_date)::text, 2) || NEW.id::text;
RETURN NEW;
END
$$;
CREATE TRIGGER tr_accnum
BEFORE INSERT
ON person
FOR EACH ROW
EXECUTE PROCEDURE accnum_trigger_function();
As #a_horse_with_no_name mentioned correctly in the comments: You can simplify the expression to:
NEW.accNum := to_char(current_date, 'YY') || NEW.id;
I am not exactly sure how to solve this problem (maybe a trigger), but current_date is a stable function not an immutable one. For the generated IDs I believe all function calls must be immutable. You can read more here https://www.postgresql.org/docs/current/xfunc-volatility.html
I dont think any function that gets the date can be immutable as Postgres defines this as "An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever." This will not be true for anything that returns the current date.
I think your best bet would be to do this with a trigger so on insert it sets the value.

Return variable in postgres select

This covers most use cases How do you use variables in a simple PostgreSQL script? but not the select clause.
This code produces an error column "ct" does not exist"
DO
$$
declare CT timestamp := '2020-09-04 23:59:59';
select CT,5 from job;
$$;
I can see why it would interpret CT as a column name. What's the Postgres syntax required to refer to a variable in the context of the select clause?
I would expect that query to return
'2020-09-04 23:59:59',5
for each row in the job table.
Addendum to the accepted answer
My use case doesn't return rows. Instead, the result of the select is consumed by an insert statement. I'm transforming rows from staging tables into other tables and adding value like the import date and the identity owning the inserts. It's these values that are provided by the variables - they are used in several such transforms and the point of the variable is to let me set each value once up the top of the script.
Because the rows are consumed like this, it turns out that I don't need a function wrapping this code. It's a bit inconvenient to test since I can't run the select and look at the outcome without copying it and pasting in literals, but at least it's possible to use variables. My working script looks like this:
do
$$
declare ct timestamp := '2020-09-04 23:59:59';
declare cb int := 2;
declare iso8601 varchar(50) := 'YYYY-MM-DD HH24:MI:SS';
declare USAdate varchar(50) := 'MM-DD-YYYY HH24:MI:SS';
begin
delete from dozer_wheel_loader_equipment_movement where created = ct;
INSERT INTO dozer_wheel_loader_equipment_movement
(site, primary_category_id, machine, machine_class, x, y, z, timestamp_local, created, created_by)
select site ,mc.id ,machine , machineclass ,x,y,z,to_timestamp(timestamplocal, iso8601), ct, cb
from stage_dozer_csv d join machine_category mc on d.primarycategory = mc.short_code;
...
end
$$
There is a lot of worthwhile related reading at How to declare a variable in a PostgreSQL query
There are few things about variables in PostgreSQL.
Variable can not be used in Plain SQL in Postgres. So you have to use any pl language i.e. plpgsql to use this. You have tried the same in your example.
In your DO block you have missed the Begin and End, So you have to write it like below
DO
$$
begin
declare CT timestamp := '2020-09-04 23:59:59';
select CT,5 from job;
end
$$;
But when you read the official documentation of DO Statement, it says DO will allow to run the anonymous code but it returns void, that's why above code will throw following error:
ERROR: query has no destination for result data
HINT: If you want to discard the results of a SELECT, use PERFORM instead.
CONTEXT: PL/pgSQL function inline_code_block line 4 at SQL statement
So there is only one way - wrap this code block in a Function like below:
create or replace function func() returns table(col1 timestamp, col2 int )
AS
$$
declare ct timestamp := '2020-09-04 23:59:59';
begin
return query
select CT,5 from job;
end;
$$
language plpgsql
and you can call it like below:
select * from func()
DEMO
Conclusion
You can not use variable in normal SQL statement in Postgres.
You have to use any Procedural Language i.e. plpgsql to use variable.
DO Block doesn't return any value so you can not use select statement like above in DO block. It is good for non-returning queries i.e. insert, update, delete or grant etc.
Only way to return a value from procedural language code block is - you have to wrap it in a suitable PostgreSQL Function.

How to migrate oracle's pipelined function into PostgreSQL

As I am newbie to plpgSQL,
I stuck while migrating a Oracle query into PostgreSQL.
Oracle query:
create or replace FUNCTION employee_all_case(
p_ugr_id IN integer,
p_case_type_id IN integer
)
RETURN number_tab_t PIPELINED
-- LANGUAGE 'plpgsql'
-- COST 100
-- VOLATILE
-- AS $$
-- DECLARE
is
l_user_id NUMBER;
l_account_id NUMBER;
BEGIN
l_user_id := p_ugr_id;
l_account_id := p_case_type_id;
FOR cases IN
(SELECT ccase.case_id, ccase.employee_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype
ON (ccase.case_type_id=ctype.case_type_id)
WHERE ccase.employee_id = l_user_id)
LOOP
IF cases.employee_id IS NOT NULL THEN
PIPE ROW (cases.case_id);
END IF;
END LOOP;
RETURN;
END;
--$$
When I execute this function then I get the following result
select * from table(select employee_all_case(14533,1190) from dual)
My question here is: I really do not understand the pipelined function and how can I obtain the same result in PostgreSQL as Oracle query ?
Please help.
Thank you guys, your solution was very helpful.
I found the desire result:
-- select * from employee_all_case(14533,1190);
-- drop function employee_all_case
create or replace FUNCTION employee_all_case(p_ugr_id IN integer ,p_case_type_id IN integer)
returns table (case_id double precision)
-- PIPELINED
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $$
DECLARE
-- is
l_user_id integer;
l_account_id integer;
BEGIN
l_user_id := cp_lookup$get_user_id_from_ugr_id(p_ugr_id);
l_account_id := cp_lookup$acctid_from_ugr(p_ugr_id);
RETURN QUERY SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and ccase.employee_id IS NOT NULL;
--return NEXT;
END;
$$
You would rewrite that to a set returning function:
Change the return type to
RETURNS SETOF integer
and do away with the PIPELINED.
Change the PIPE ROW statement to
RETURN NEXT cases.case_id;
Of course, you will have to do the obvious syntactic changes, like using integer instead of NUMBER and putting the IN before the parameter name.
But actually, it is quite unnecessary to write a function for that. Doing it in a single SELECT statement would be both simpler and faster.
Pipelined functions are best translated to a simple SQL function returning a table.
Something like this:
create or replace function employee_all_case(p_ugr_id integer, p_case_type_IN integer)
returns table (case_id integer)
as
$$
SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and cases.employee_id IS NOT NULL;
$$
language sql;
Note that your sample code did not use the second parameter p_case_type_id.
Usage is also more straightforward:
select *
from employee_all_case(14533,1190);
Before diving into the solution, I will provide some information which will help you to understand better.
So basically PIPELINED came into picture for improving memory allocation at run time.
As you all know collections will occupy space when ever they got created. So the more you use, the more memory will get allocated.
Pipelining negates the need to build huge collections by piping rows out of the function.
saving memory and allowing subsequent processing to start before all the rows are generated.
Pipelined table functions include the PIPELINED clause and use the PIPE ROW call to push rows out of the function as soon as they are created, rather than building up a table collection.
By using Pipelined how memory usage will be optimized?
Well, it's very simple. instead of storing data into an array, just process the data by using pipe row(desired type). This actually returns the row and process the next row.
coming to solution in plpgsql
simple but not preferred while storing large data.
Remove PIPELINED from return declaration and return an array of desired type. something like RETURNS typrec2[].
Where ever you are using pipe row(), add that entry to array and finally return that array.
create a temp table like
CREATE TEMPORARY TABLE temp_table (required fields) ON COMMIT DROP;
and insert data into it. Replace pipe row with insert statement and finally return statement like
return query select * from temp_table
**The best link for understanding PIPELINED in oracle [https://oracle-base.com/articles/misc/pipelined-table-functions]
pretty ordinary for postgres reference [http://manojadinesh.blogspot.com/2011/11/pipelined-in-oracle-as-well-in.html]
Hope this helps some one conceptually.

postgres inbuilt vs user defined function performance

I have to count the number of bits in postgresql on very large integer columns for which i wrote a postgresql funtion to count the number of bits in an integer.
CREATE OR REPLACE FUNCTION bitcount(i integer) RETURNS integer AS $$
DECLARE n integer;
DECLARE bitCount integer;
BEGIN
bitCount := 0;
LOOP
IF i = 0 THEN
EXIT;
END IF;
i := i & (i-1);
bitCount:= bitCount+1;
END LOOP;
RETURN bitCount;
END
$$ LANGUAGE plpgsql;
but i found one more way to do this using pg's inbuilt functions as well
like
SELECT char_length( replace(100::bit(31)::TEXT, '0', ''));
so i decided to compare performance of both of the ways
so i used below queries to compare
First
SELECT a.n, bitcount(a.n)
from generate_series(1, 100000) as a(n);
Second
SELECT a.n, char_length( replace(a.n::bit(31)::TEXT, '0', ''))
FROM generate_series(1, 100000) as a(n);
I was expecting that First method will outperform the second one
but to my surprise both of them performed almost same. In fact on my machine second one always completed a few seconds faster even with large number of integers.
can anyone explain me why second is almost as fast as first despite of using cast and string operation ?
I'd say it is because PL/pgSQL is known to be slow.
Write the function in PL/Perl or PL/Python for better performance.