I have a general question. I have a function that creates a file. However, within that function presently I am hard coding the file name pattern based on argument inputs. Now I have come to a point where I need to have more than one file name pattern. I devised a way of using another table as the file name map that the function can simply call if the user inputs that file name pattern id. Here is my example to help better illustrate my point:
Here is the table creation and data insertion for referential purposes:
create table some_schema.file_mapper(
mapper_id integer not null,
file_name_template varchar);
insert into some_schema.file_mapper (mapper_id, file_name_template)
values
(1, '||v_first_name||''_''||v_last_name||')
(2, '||v_last_name||''_''||v_first_name||')
(3, '||v_last_name||''_''||v_last_name||');
Now the function itself
create or replace function some_schema.some_function(integer)
returns varchar as
$body$
Declare
v_filename_config_id alias for $1;
v_filename varchar;
v_first_name varchar;
v_last_name varchar;
cmd text;
Begin
v_first_name :='Joe';
v_last_name :='Shmoe';
cmd := 'select file_name_template
from some_schema.file_mapper
where mapper_id = '||v_filename_config||'';
execute cmd into v_filename;
raise notice 'checking "%"',v_filename;
return 'done';
end;
$body$
LANGUAGE plpgsql;
Now that I have this. I want to be able to mix and match file name patterns. So for instance I wanted to use mapper_id 3, I would expect a returned file of "Shmoe_Shmoe.csv" if I execute the script:
select from some_schema.some_function(2)
The Problem is whenever I get it to read the "v_filename" variable it will not evaluate and return the values from the function's variables. Originally, I believed it to be a quoting issue(and it probably still is). After messing with the quoting I have gotten about as far the error below:
ERROR: syntax error at or near "_"
LINE 4: ...s/dir/someplace/||v_last_name||'_'||v_firs...
^
QUERY: copy(
select *
from some_schema.some_table)
to '/dir/someplace/||v_last_name||'_'||v_first_name||.csv/;
DELIMITER,
csv HEADER;
CONTEXT: PL/pgSQL function some_schema.some_function(integer) line 27 at EXECUTE statement
As you can tell it is pretty much telling me it is a quoting issue. Is there a way I can get the function to properly evaluate the variable and return the proper file name? Let me know if I am not clear and need to elaborate.
This more or less illustrates the usage of format(). (Slightly reduced wrt the original question):
CREATE TABLE file_mapper
( mapper_id INTEGER NOT NULL PRIMARY KEY
, file_name_template TEXT
);
INSERT INTO file_mapper(mapper_id, file_name_template) VALUES (1,'one'), (2, 'two');
CREATE OR REPLACE FUNCTION dump_the_shit(INTEGER)
RETURNS VARCHAR AS
$body$
DECLARE
v_filename_config_id alias for $1;
v_filename VARCHAR;
name_cmd TEXT;
copy_cmd TEXT;
BEGIN
name_cmd := format( e'select file_name_template
from file_mapper
where mapper_id = %L;', v_filename_config_id );
EXECUTE name_cmd into v_filename;
RAISE NOTICE 'V_filename := %', v_filename;
copy_cmd := format( e'copy(
select *
from %I)
to \'/tmp/%s.csv\'
csv HEADER;' , 'file_mapper' , v_filename);
EXECUTE copy_cmd;
RETURN copy_cmd;
END;
$body$
LANGUAGE plpgsql;
SELECT dump_the_shit(1);
format function description
summary:
use %I for identifiers (tablenames and column names) [ FROM %I ]
use %L for literals such as query constants [ WHERE the_date = %L ]
use %s for ordinary strings [ to \'/tmp/%s.csv\' ]
Related
I am writing a function in POSTGRES v13.3 that when passed an array of column names returns an array of JSONB objects each with the distinct values of one of the columns. I have an existing script that I wish to refactor using FORMAT in the declaration portion of the function.
The existing and working function looks like below. It is passed an array of columns and a dbase name. The a loop presents each column name to an EXECUTE statement that uses JSONB_AGG on the distinct values in the column, creates a JSONB object, and appends that to an array. The array is returned on completion. This is the function:
CREATE OR REPLACE FUNCTION foo1(text[], text)
RETURNS text[] as $$
declare
col text;
interim jsonb;
temp jsonb;
y jsonb[];
begin
foreach col in array $1
loop
execute
'select jsonb_agg(distinct '|| col ||') from ' || $2 into interim;
temp := jsonb_build_object(col, interim);
y := array_append(y,temp);
end loop;
return y;
end;
$$ LANGUAGE plpgsql;
I have refactored the function to the following. The script is now in the DECLARE portion of the function.
CREATE OR REPLACE FUNCTION foo2(_cols text[], _db text)
RETURNS jsonb[]
LANGUAGE plpgsql as
$func$
DECLARE
_script text := format(
'select jsonb_agg( distinct $1) from %1$I', _db
);
col text;
interim jsonb;
temp jsonb;
y jsonb[];
BEGIN
foreach col in array _cols
loop
EXECUTE _script USING col INTO interim;
temp := jsonb_build_object(col, interim);
y := array_append(y,temp);
end loop;
return y;
END
$func$;
Unfortunately the two functions give different results on a toy data set (see bottom):
Original: {"{\"id\": [1, 2, 3]}","{\"val\": [1, 2]}"}
Refactored: {{"id": ["id"]},{"val": ["val"]}}
Here is a db<>fiddle of the preceding.
The challenge is in the EXECUTE. In the first instance the col argument is treated as a column identifier. In the refactored function it seems to be treated as just a text string. I think my approach is consistent with the docs and tutorials (example), and the answer from this forum here and the links included therein. I have tried playing around with combinations of ", ', and || but those were unsuccessful and don't make sense in a format statement.
Where should I be looking for the error in my use of FORMAT?
NOTE 1: From the docs I have so possibly the jsonagg() and distinct are what's preventing the behaviour I want:
Another restriction on parameter symbols is that they only work in SELECT, INSERT, UPDATE, and DELETE commands. In other statement types (generically called utility statements), you must insert values textually even if they are just data values.
TOY DATA SET:
drop table if exists example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', 1),
(2, 'a', 2),
(3, 'b', 2);
https://www.postgresql.org/docs/14/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
The command string can use parameter values, which are referenced in
the command as $1, $2, etc. These symbols refer to values supplied in
the USING clause.
What you want is paramterize sql identifier(column name).
You cannot do that. Access column using variable instead of explicit column name
Which means that select jsonb_agg( distinct $1) from %1$I In here "$1" must be %I type. USING Expression in the manual (EXECUTE command-string [ INTO [STRICT] target ] [ USING expression [, ... ] ];) will pass the literal value to it. But it's valid because select distinct 'hello world' from validtable is valid.
select jsonb_agg( distinct $1) from %1$I In here $1 must be same type as %1$I namely-> sql identifier.
--
Based on the following debug code, then you can solve your problem:
CREATE OR REPLACE FUNCTION foo2(_cols text[], _db text)
RETURNS void
LANGUAGE plpgsql as
$func$
DECLARE
col text;
interim jsonb;temp jsonb; y jsonb[];
BEGIN
foreach col in array _cols
loop
EXECUTE format( 'select jsonb_agg( distinct ( %1I ) ) from %2I', col,_db) INTO interim;
raise info 'interim: %', interim;
temp := jsonb_build_object(col, interim);
raise info 'temp: %', temp;
y := array_append(y,temp);
raise info 'y: %',y;
end loop;
END
$func$;
begin;
create type public.ltree as (a int, b int);
create table public.parent_tree(parent_id int,l_tree ltree);
insert into public.parent_tree values(1,(2,2)),(2,(1,2)),(3, (1,28));
commit;
Trying to replicate the solution in this answer:
Format specifier for integer variables in format() for EXECUTE?
For a function with composite type:
CREATE OR REPLACE FUNCTION public.get_parent_ltree
(_parent_id int, tbl_name regclass , OUT _l_tree ltree)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('SELECT l_tree FROM %s WHERE parent_id = $1', tbl_name)
INTO _l_tree
USING _parent_id;
END
$func$;
The effective query executed:
select l_tree from parent_tree where parent_id = 1;
Executing the function:
select get_parent_ltree(1,'parent_tree');
select get_parent_ltree(1,'public.parent_tree');
I get this error:
ERROR: invalid input syntax for type integer: "(2,2)"
CONTEXT: PL/pgSQL function get_parent_ltree(integer,regclass) line 3 at EXECUTE
Context of line 3:
The output parameter _l_tree is a "row variable". (A composite type is treated as row variable.) SELECT INTO assigns fields of a row variable one-by-one. The manual:
The optional target is a record variable, a row variable, or a comma-separated list of simple variables and record/row fields, [...]
So, currently (pg 14), a row or record variables must stand alone as target. Or as the according Postgres error message would put it:
ERROR: record variable cannot be part of multiple-item INTO list
This works:
CREATE OR REPLACE FUNCTION public.get_parent_ltree (IN _parent_id int
, IN _tbl_name regclass
, OUT _l_tree ltree)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('SELECT (l_tree).* FROM %s WHERE parent_id = $1', _tbl_name)
INTO _l_tree
USING _parent_id;
END
$func$;
Or this:
CREATE OR REPLACE FUNCTION public.get_parent_ltree2 (IN _parent_id int
, IN _tbl_name regclass
, OUT _l_tree ltree)
LANGUAGE plpgsql AS
$func$
BEGIN
EXECUTE format('SELECT (l_tree).a, (l_tree).b FROM %s WHERE parent_id = $1', _tbl_name)
INTO _l_tree.a, _l_tree.b
USING _parent_id;
END
$func$;
db<>fiddle here
I agree that this is rather tricky. One might expect that a composite field is treated as a single field (like a simple type). But that's currently not so in PL/pgSQL assignments.
A related quote from the manual about composite types:
A composite type represents the structure of a row or record; it is
essentially just a list of field names and their data types.
PostgreSQL allows composite types to be used in many of the same ways that simple types can be used.
Bold emphasis mine.
Many. Not all.
Related:
Use of custom return types in a FOR loop in plpgsql
PostgreSQL: ERROR: 42601: a column definition list is required for functions returning "record"
Aside: Consider the additional module ltree instead of "growing your own". And if you continue working with your own composite type, consider a different name to avoid confusion / conflict with that module.
It looks like a Postgres bug but Erwin clarifies the issue in the adjacent answer. One of the natural workarounds is to use an auxiliary text variable in the way like this:
create or replace function get_parent_ltree(_parent_id int, tbl_name regclass)
returns ltree language plpgsql as
$func$
declare
rslt text;
begin
execute format('select l_tree from %s where parent_id = $1', tbl_name)
into rslt
using _parent_id;
return rslt::ltree;
end
$func$;
I'm new to plpgsql and I'm trying to create function that will check if a certain value exists in table and if not will add a row.
CREATE OR REPLACE FUNCTION hire(
id_pracownika integer,
imie character varying,
nazwisko character varying,
miasto character varying,
pensja real)
RETURNS TEXT AS
$BODY$
DECLARE
wynik TEXT;
sprawdzenie INT;
BEGIN
sprawdzenie = id_pracownika;
IF EXISTS (SELECT id_pracownika FROM pracownicy WHERE id_pracownika=sprawdzenie) THEN
wynik = "JUZ ISTNIEJE";
RETURN wynik;
ELSE
INSERT INTO pracownicy(id_pracownika,imie,nazwisko,miasto,pensja)
VALUES (id_pracownika,imie,nazwisko,miasto,pensja);
wynik = "OK";
RETURN wynik;
END IF;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
The issue is that I'm getting errors saying that id_pracownika is a column name and a variable.
How to specify that "id_pracownika" in such context refers to column name?
Assuming id_pracownika is The PRIMARY KEY of the table. Or at least defined UNIQUE. (If it's not NOT NULL, NULL is a corner case.)
SELECT or INSERT
Your function is another implementation of "SELECT or INSERT" - a variant of the UPSERT problem, which is more complex in the face of concurrent write load than it might seem. See:
Is SELECT or INSERT in a function prone to race conditions?
With UPSERT in Postgres 9.5 or later
In Postgres 9.5 or later use UPSERT (INSERT ... ON CONFLICT ...) Details in the Postgres Wiki. This new syntax does a clean job:
CREATE OR REPLACE FUNCTION hire(
_id_pracownika integer
, _imie varchar
, _nazwisko varchar
, _miasto varchar
, _pensja real)
RETURNS text
LANGUAGE plpgsql AS
$func$
BEGIN
INSERT INTO pracownicy
( id_pracownika, imie, nazwisko, miasto, pensja)
VALUES (_id_pracownika,_imie,_nazwisko,_miasto,_pensja)
ON CONFLICT DO NOTHING;
IF FOUND THEN
RETURN 'OK';
ELSE
RETURN 'JUZ ISTNIEJE'; -- already exists
END IF;
END
$func$;
About the special variable FOUND:
Why is IS NOT NULL false when checking a row type?
Table-qualify column names to disambiguate where necessary. (You can also prefix function parameters with the function name, but that gets awkward quickly.)
But column names in the target list of an INSERT may not be table-qualified. Those are never ambiguous anyway.
Best avoid ambiguities a priori. Some (including me) like to prefix all function parameters and variables with an underscore.
If you positively need a column name as function parameter name, one way to avoid naming collisions is to use an ALIAS inside the function. One of the rare cases where ALIAS is actually useful.
Or reference function parameters by ordinal position: $1 for id_pracownika in this case.
If all else fails, you can decide what takes precedence by setting #variable_conflict. See:
Naming conflict between function parameter and result of JOIN with USING clause
There is more:
There are intricacies to the RETURNING clause in an UPSERT. See:
How to use RETURNING with ON CONFLICT in PostgreSQL?
String literals (text constants) must be enclosed in single quotes: 'OK', not "OK". See:
Insert text with single quotes in PostgreSQL
Assigning variables is comparatively more expensive than in other programming languages. Keep assignments to a minimum for best performance in plpgsql. Do as much as possible in SQL statements directly.
VOLATILE COST 100 are default decorators for functions. No need to spell those out.
Without UPSERT in Postgres 9.4 or older
...
IF EXISTS (SELECT FROM pracownicy p
WHERE p.id_pracownika = hire.id_pracownika) THEN
RETURN 'JUZ ISTNIEJE';
ELSE
INSERT INTO pracownicy(id_pracownika,imie,nazwisko,miasto,pensja)
VALUES (hire.id_pracownika,hire.imie,hire.nazwisko,hire.miasto,hire.pensja);
RETURN 'OK';
END IF;
...
But there is a tiny race condition between the SELECT and the INSERT, so not bullet-proof under heavy concurrent write-load.
In an EXISTS expression, the SELECT list does not matter. SELECT id_pracownika, SELECT 1, or even SELECT 1/0 - all the same. Just use an empty SELECT list. Only the existence of any qualifying row matters. See:
What is easier to read in EXISTS subqueries?
It is a example tested by me where I use EXECUTE to run a select and put its result in a cursor, using dynamic column names.
1. Create the table:
create table people (
nickname varchar(9),
name varchar(12),
second_name varchar(12),
country varchar(30)
);
2. Create the function:
CREATE OR REPLACE FUNCTION fun_find_people (col_name text, col_value varchar)
RETURNS void AS
$BODY$
DECLARE
local_cursor_p refcursor;
row_from_people RECORD;
BEGIN
open local_cursor_p FOR
EXECUTE 'select * from people where '|| col_name || ' LIKE ''' || col_value || '%'' ';
raise notice 'col_name: %',col_name;
raise notice 'col_value: %',col_value;
LOOP
FETCH local_cursor_p INTO row_from_people; EXIT WHEN NOT FOUND;
raise notice 'row_from_people.nickname: %', row_from_people.nickname ;
raise notice 'row_from_people.name: %', row_from_people.name ;
raise notice 'row_from_people.country: %', row_from_people.country;
END LOOP;
END;
$BODY$ LANGUAGE 'plpgsql'
3. Run the function
select fun_find_people('name', 'Cristian');
select fun_find_people('country', 'Chile');
inspire with Erwin Brandstetter's answers.
CREATE OR REPLACE FUNCTION test_upsert(
_parent_id int,
_some_text text)
RETURNS text
LANGUAGE plpgsql AS
$func$
DECLARE a text;
BEGIN
INSERT INTO parent_tree (parent_id, some_text)
VALUES (_parent_id,_some_text)
ON CONFLICT DO NOTHING
RETURNING 'ok' into a;
return a;
IF NOT FOUND THEN
return 'JUZ ISTNIEJE';
END IF;
END
$func$;
Follow Erwin's answer. I make a variable hold the return type text.
If conflict do nothing then the function will return nothing. For example, already have parent_id = 10, Then the result would be as following:
test_upsert
------------
(1 row)
NOT Sure the usage of:
IF NOT FOUND THEN
return 'JUZ ISTNIEJE';
END IF;
I need to do the same deletion or purge operation (based on several conditions) on a set of tables. For that I am trying to pass the table names in an array to a function. I am not sure if I am doing it right. Or is there a better way?
I am pasting just a sample example this is not the real function I have written but the basic is same as below:
CREATE OR REPLACE FUNCTION test (tablename text[]) RETURNS int AS
$func$
BEGIN
execute 'delete * from '||tablename;
RETURN 1;
END
$func$ LANGUAGE plpgsql;
But when I call the function I get an error:
select test( {'rajeev1'} );
ERROR: syntax error at or near "{"
LINE 10: select test( {'rajeev1'} );
^
********** Error **********
ERROR: syntax error at or near "{"
SQL state: 42601
Character: 179
Array syntax
'{rajeev1, rajeev2}' or ARRAY['rajeev1', 'rajeev2']. Read the manual.
TRUNCATE
Since you are deleting all rows from the tables, consider TRUNCATE instead. Per documentation:
Tip: TRUNCATE is a PostgreSQL extension that provides a faster
mechanism to remove all rows from a table.
Be sure to study the details. If TRUNCATE works for you, the whole operation becomes very simple, since the command accepts multiple tables:
TRUNCATE rajeev1, rajeev2, rajeev3, ..
Dynamic DELETE
Else you need dynamic SQL like you already tried. The scary missing detail: you are completely open to SQL injection and catastrophic syntax errors. Use format() with %I (not %s to sanitize identifiers like table names. Or, better yet in this particular case, use an array of regclass as parameter instead:
CREATE OR REPLACE FUNCTION f_del_all(_tbls regclass)
RETURNS void AS
$func$
DECLARE
_tbl regclass;
BEGIN
FOREACH _tbl IN ARRAY _tbls LOOP
EXECUTE format('DELETE * FROM %s', _tbl);
END LOOP;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT f_del_all('{rajeev1,rajeev2,rajeev3}');
Explanation here:
Table name as a PostgreSQL function parameter
You used wrong syntax for text array constant in the function call. But even if it was right, your function is not correct.
If your function has text array as argument you should loop over the array to execute query for each element.
CREATE OR REPLACE FUNCTION test (tablenames text[]) RETURNS int AS
$func$
DECLARE
tablename text;
BEGIN
FOREACH tablename IN ARRAY tablenames LOOP
EXECUTE FORMAT('delete * from %s', tablename);
END LOOP;
RETURN 1;
END
$func$ LANGUAGE plpgsql;
You can then call the function for several tables at once, not only for one.
SELECT test( '{rajeev1, rajeev2}' );
If you do not need this feature, simply change the argument type to text.
CREATE OR REPLACE FUNCTION test (tablename text) RETURNS int AS
$func$
BEGIN
EXECUTE format('delete * from %s', tablename);
RETURN 1;
END
$func$ LANGUAGE plpgsql;
SELECT test('rajeev1');
I recommend using the format function.
If you want to execute a function (say purge_this_one_table(tablename)) on a group of tables identified by similar names you can use this construction:
create or replace function purge_all_these_tables(mask text)
returns void language plpgsql
as $$
declare
tabname text;
begin
for tabname in
select relname
from pg_class
where relkind = 'r' and relname like mask
loop
execute format(
'purge_this_one_table(%s)',
tabname);
end loop;
end $$;
select purge_all_these_tables('agg_weekly_%');
It should be:
select test('{rajeev1}');
I'm trying to create a function to get a field value from multiple tables in my database. I made script like this:
CREATE OR REPLACE FUNCTION get_all_changes() RETURNS SETOF RECORD AS
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOR tblrow IN SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname='public' LOOP /*FOREACH tblname IN ARRAY $1 LOOP*/
RAISE NOTICE 'r: %', tblrow.tablename;
FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP
RETURN NEXT row;
END LOOP;
END LOOP;
END
$$
LANGUAGE 'plpgsql' ;
SELECT get_all_changes();
But it is not working, everytime it shows this error
tblrow.tablename" not defined in line "FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP"
Your inner FOR loop must use the FOR...EXECUTE syntax as shown in the manual:
FOR target IN EXECUTE text_expression [ USING expression [, ... ] ] LOOP
statements
END LOOP [ label ];
In your case something along this line:
FOR row IN EXECUTE 'SELECT MAX("lastUpdate") FROM ' || quote_ident(tblrow.tablename) LOOP
RETURN NEXT row;
END LOOP
The reason for this is explained in the manual somewhere else:
Oftentimes you will want to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands (as discussed in Section 39.10.2) will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided[...]
Answer to your new question (mislabeled as answer):
This can be much simpler. You do not need to create a table just do define a record type.
If at all, you would better create a type with CREATE TYPE, but that's only efficient if you need the type in multiple places. For just a single function, you can use RETURNS TABLE instead :
CREATE OR REPLACE FUNCTION get_all_changes(text[])
RETURNS TABLE (tablename text
,"lastUpdate" timestamp with time zone
,nums integer) AS
$func$
DECLARE
tblname text;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
RETURN QUERY EXECUTE format(
$f$SELECT '%I', MAX("lastUpdate"), COUNT(*)::int FROM %1$I
$f$, tblname)
END LOOP;
END
$func$ LANGUAGE plpgsql;
A couple more points:
Use RETURN QUERY EXECUTE instead of the nested loop. Much simpler and faster.
Column aliases would only serve as documentation, those names are discarded in favor of the names declared in the RETURNS clause (directly or indirectly).
Use format() with %I to replace the concatenation with quote_ident() and %1$I to refer to the same parameter another time.
count() usually returns type bigint. Cast the integer, since you defined the column in the return type as such: count(*)::int.
Thanks,
I finally made my script like:
CREATE TABLE IF NOT EXISTS __rsdb_changes (tablename text,"lastUpdate" timestamp with time zone, nums bigint);
CREATE OR REPLACE FUNCTION get_all_changes(varchar[]) RETURNS SETOF __rsdb_changes AS /*TABLE (tablename varchar(40),"lastUpdate" timestamp with time zone, nums integer)*/
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
/*RAISE NOTICE 'r: %', tblrow.tablename;*/
FOR row IN EXECUTE 'SELECT CONCAT('''|| quote_ident(tblname) ||''') AS tablename, MAX("lastUpdate") AS "lastUpdate",COUNT(*) AS nums FROM ' || quote_ident(tblname) LOOP
/*RAISE NOTICE 'row.tablename: %',row.tablename;*/
/*RAISE NOTICE 'row.lastUpdate: %',row."lastUpdate";*/
/*RAISE NOTICE 'row.nums: %',row.nums;*/
RETURN NEXT row;
END LOOP;
END LOOP;
RETURN;
END
$$
LANGUAGE 'plpgsql' ;
Well, it works. But it seems I can only create a table to define the return structure instead of just RETURNS SETOF RECORD. Am I right?
Thanks again.