postgres plpgsql how to properly convert function to use FORMAT in DECLARE - postgresql

I am writing a function in POSTGRES v13.3 that when passed an array of column names returns an array of JSONB objects each with the distinct values of one of the columns. I have an existing script that I wish to refactor using FORMAT in the declaration portion of the function.
The existing and working function looks like below. It is passed an array of columns and a dbase name. The a loop presents each column name to an EXECUTE statement that uses JSONB_AGG on the distinct values in the column, creates a JSONB object, and appends that to an array. The array is returned on completion. This is the function:
CREATE OR REPLACE FUNCTION foo1(text[], text)
RETURNS text[] as $$
declare
col text;
interim jsonb;
temp jsonb;
y jsonb[];
begin
foreach col in array $1
loop
execute
'select jsonb_agg(distinct '|| col ||') from ' || $2 into interim;
temp := jsonb_build_object(col, interim);
y := array_append(y,temp);
end loop;
return y;
end;
$$ LANGUAGE plpgsql;
I have refactored the function to the following. The script is now in the DECLARE portion of the function.
CREATE OR REPLACE FUNCTION foo2(_cols text[], _db text)
RETURNS jsonb[]
LANGUAGE plpgsql as
$func$
DECLARE
_script text := format(
'select jsonb_agg( distinct $1) from %1$I', _db
);
col text;
interim jsonb;
temp jsonb;
y jsonb[];
BEGIN
foreach col in array _cols
loop
EXECUTE _script USING col INTO interim;
temp := jsonb_build_object(col, interim);
y := array_append(y,temp);
end loop;
return y;
END
$func$;
Unfortunately the two functions give different results on a toy data set (see bottom):
Original: {"{\"id\": [1, 2, 3]}","{\"val\": [1, 2]}"}
Refactored: {{"id": ["id"]},{"val": ["val"]}}
Here is a db<>fiddle of the preceding.
The challenge is in the EXECUTE. In the first instance the col argument is treated as a column identifier. In the refactored function it seems to be treated as just a text string. I think my approach is consistent with the docs and tutorials (example), and the answer from this forum here and the links included therein. I have tried playing around with combinations of ", ', and || but those were unsuccessful and don't make sense in a format statement.
Where should I be looking for the error in my use of FORMAT?
NOTE 1: From the docs I have so possibly the jsonagg() and distinct are what's preventing the behaviour I want:
Another restriction on parameter symbols is that they only work in SELECT, INSERT, UPDATE, and DELETE commands. In other statement types (generically called utility statements), you must insert values textually even if they are just data values.
TOY DATA SET:
drop table if exists example;
create temporary table example(id int, str text, val integer);
insert into example values
(1, 'a', 1),
(2, 'a', 2),
(3, 'b', 2);

https://www.postgresql.org/docs/14/plpgsql-statements.html#PLPGSQL-STATEMENTS-SQL-ONEROW
The command string can use parameter values, which are referenced in
the command as $1, $2, etc. These symbols refer to values supplied in
the USING clause.
What you want is paramterize sql identifier(column name).
You cannot do that. Access column using variable instead of explicit column name
Which means that select jsonb_agg( distinct $1) from %1$I In here "$1" must be %I type. USING Expression in the manual (EXECUTE command-string [ INTO [STRICT] target ] [ USING expression [, ... ] ];) will pass the literal value to it. But it's valid because select distinct 'hello world' from validtable is valid.
select jsonb_agg( distinct $1) from %1$I In here $1 must be same type as %1$I namely-> sql identifier.
--
Based on the following debug code, then you can solve your problem:
CREATE OR REPLACE FUNCTION foo2(_cols text[], _db text)
RETURNS void
LANGUAGE plpgsql as
$func$
DECLARE
col text;
interim jsonb;temp jsonb; y jsonb[];
BEGIN
foreach col in array _cols
loop
EXECUTE format( 'select jsonb_agg( distinct ( %1I ) ) from %2I', col,_db) INTO interim;
raise info 'interim: %', interim;
temp := jsonb_build_object(col, interim);
raise info 'temp: %', temp;
y := array_append(y,temp);
raise info 'y: %',y;
end loop;
END
$func$;

Related

PostgreSQL: Iterating over array of text and executing SQL

I am copying tables from one schema to another. I am trying to pass argument of name of tables that I want to copy. But no table is created in Schema when I execute the CALL.
Command: CALL copy_table('firstname', 'tableName1,tableName2,tableName3');
CREATE OR REPLACE PROCEDURE copy_table(user VARCHAR(50), strs TEXT)
LANGUAGE PLPGSQL
AS $$
DECLARE
my_array TEXT;
BEGIN
FOR my_array IN
SELECT string_to_array(strs, ',')
LOOP
EXECUTE 'CREATE TABLE ' || user || '.' || my_array || ' (LIKE public.' || my_array || ' INCLUDING ALL)';
END LOOP;
$$
Could you please help? Thank you.
The function string_to_array returns an array value. Looping through arrays is performed by FOREACH command, not FOR.
See documentation:
https://www.postgresql.org/docs/14/plpgsql-control-structures.html#PLPGSQL-FOREACH-ARRAY
CREATE FUNCTION sum(int[]) RETURNS int8 AS $$
DECLARE
s int8 := 0;
x int;
BEGIN
FOREACH x IN ARRAY $1
LOOP
s := s + x;
END LOOP;
RETURN s;
END;
$$ LANGUAGE plpgsql;
Loop over an array with FOREACH like Simon suggested. Or with FOR in old (or any) versions. See:
Iterating over integer[] in PL/pgSQL
Typically, a set-based solution is shorter and faster, though:
CREATE OR REPLACE PROCEDURE copy_tables(_schema text, VARIADIC _tables text[])
LANGUAGE plpgsql AS
$proc$
BEGIN
EXECUTE
(SELECT string_agg(format('CREATE TABLE %1$I.%2$I (LIKE public.%2$I INCLUDING ALL)', _schema, t), E';\n')
FROM unnest(_tables) t);
END
$proc$;
About VARIADIC:
Return rows matching elements of input array in plpgsql function
Call, passing list of table names:
CALL copy_tables('firstname', 'tableName1', 'tableName2', 'tableName3');
Or, passing genuine array:
CALL copy_tables('foo', VARIADIC '{tableName1,tableName2,tableName3}');
Or, passing (and converting) comma-separated string (your original input):
CALL copy_tables('foo', VARIADIC string_to_array('tableName1,tableName2,tableName3', ','));
I use format() to concatenate the SQL string safely. Note that identifiers must be passed as case-sensitive strings! See:
Define table and column names as arguments in a plpgsql function?
SQL injection in Postgres functions vs prepared queries

Postgres function - using table mapping to update value

I have a general question. I have a function that creates a file. However, within that function presently I am hard coding the file name pattern based on argument inputs. Now I have come to a point where I need to have more than one file name pattern. I devised a way of using another table as the file name map that the function can simply call if the user inputs that file name pattern id. Here is my example to help better illustrate my point:
Here is the table creation and data insertion for referential purposes:
create table some_schema.file_mapper(
mapper_id integer not null,
file_name_template varchar);
insert into some_schema.file_mapper (mapper_id, file_name_template)
values
(1, '||v_first_name||''_''||v_last_name||')
(2, '||v_last_name||''_''||v_first_name||')
(3, '||v_last_name||''_''||v_last_name||');
Now the function itself
create or replace function some_schema.some_function(integer)
returns varchar as
$body$
Declare
v_filename_config_id alias for $1;
v_filename varchar;
v_first_name varchar;
v_last_name varchar;
cmd text;
Begin
v_first_name :='Joe';
v_last_name :='Shmoe';
cmd := 'select file_name_template
from some_schema.file_mapper
where mapper_id = '||v_filename_config||'';
execute cmd into v_filename;
raise notice 'checking "%"',v_filename;
return 'done';
end;
$body$
LANGUAGE plpgsql;
Now that I have this. I want to be able to mix and match file name patterns. So for instance I wanted to use mapper_id 3, I would expect a returned file of "Shmoe_Shmoe.csv" if I execute the script:
select from some_schema.some_function(2)
The Problem is whenever I get it to read the "v_filename" variable it will not evaluate and return the values from the function's variables. Originally, I believed it to be a quoting issue(and it probably still is). After messing with the quoting I have gotten about as far the error below:
ERROR: syntax error at or near "_"
LINE 4: ...s/dir/someplace/||v_last_name||'_'||v_firs...
^
QUERY: copy(
select *
from some_schema.some_table)
to '/dir/someplace/||v_last_name||'_'||v_first_name||.csv/;
DELIMITER,
csv HEADER;
CONTEXT: PL/pgSQL function some_schema.some_function(integer) line 27 at EXECUTE statement
As you can tell it is pretty much telling me it is a quoting issue. Is there a way I can get the function to properly evaluate the variable and return the proper file name? Let me know if I am not clear and need to elaborate.
This more or less illustrates the usage of format(). (Slightly reduced wrt the original question):
CREATE TABLE file_mapper
( mapper_id INTEGER NOT NULL PRIMARY KEY
, file_name_template TEXT
);
INSERT INTO file_mapper(mapper_id, file_name_template) VALUES (1,'one'), (2, 'two');
CREATE OR REPLACE FUNCTION dump_the_shit(INTEGER)
RETURNS VARCHAR AS
$body$
DECLARE
v_filename_config_id alias for $1;
v_filename VARCHAR;
name_cmd TEXT;
copy_cmd TEXT;
BEGIN
name_cmd := format( e'select file_name_template
from file_mapper
where mapper_id = %L;', v_filename_config_id );
EXECUTE name_cmd into v_filename;
RAISE NOTICE 'V_filename := %', v_filename;
copy_cmd := format( e'copy(
select *
from %I)
to \'/tmp/%s.csv\'
csv HEADER;' , 'file_mapper' , v_filename);
EXECUTE copy_cmd;
RETURN copy_cmd;
END;
$body$
LANGUAGE plpgsql;
SELECT dump_the_shit(1);
format function description
summary:
use %I for identifiers (tablenames and column names) [ FROM %I ]
use %L for literals such as query constants [ WHERE the_date = %L ]
use %s for ordinary strings [ to \'/tmp/%s.csv\' ]

Postgres - Dynamically referencing columns from record variable

I'm having trouble referencing record variable type columns dynamically. I found loads of tricks online, but with regards to triggers mostly and I really hope the answer isn't "it can't be done"... I've got a very specific and simple need, see my example code below;
First I have an array containing a list of column names called "lCols". I loop through a record variable to traverse my data, replacing values in a paragraph which exactly match my column names.
DECLARE lTotalRec RECORD;
DECLARE lSQL text;
DECLARE lCols varchar[];
p_paragraph:= 'I am [names] and my surname is [surname]';
lSQL :=
'select
p.names,
p.surname
from
person p
';
FOR lTotalRec IN
execute lSQL
LOOP
-- Loop through the already created array of columns to replace the values in the paragraph
FOREACH lVal IN ARRAY lCols
LOOP
p_paragraph := replace(p_paragraph,'[' || lVal || ']',lTotalRec.lVal); -- This is where my problem is, because lVal is not a column of lTotalRec directly like this
END LOOP;
RETURN NEXT;
END LOOP;
My return value is the paragraph amended for each record in "lTotalRec"
You could convert your record to a json value using the row_to_json() function. Once in this format, you can extract columns by name, using the -> and ->> operators.
In Postgres 9.4 and up, you can also make use of the more efficient jsonb type.
DECLARE lJsonRec jsonb;
...
FOR lTotalRec IN
execute lSQL
LOOP
lJsonRec := row_to_json(lTotalRec)::jsonb;
FOREACH lVal IN ARRAY lCols
LOOP
p_paragraph := replace(p_paragraph, '[' || lVal || ']', lJsonRec->>lVal);
END LOOP;
RETURN NEXT;
END LOOP;
See the documentation for more details.
You can convert a row to JSON using row_to_json(), and then retrieve the column names using json_object_keys().
Here's an example:
drop table if exists TestTable;
create table TestTable (col1 text, col2 text);
insert into TestTable values ('a1', 'b1'), ('a2', 'b2');
do $$declare
sql text;
rec jsonb;
col text;
val text;
begin
sql := 'select row_to_json(row) from (select * from TestTable) row';
for rec in execute sql loop
for col in select * from jsonb_object_keys(rec) loop
val := rec->>col;
raise notice 'col=% val=%', col, val;
end loop;
end loop;
end$$;
This prints:
NOTICE: col=col1 val=a1
NOTICE: col=col2 val=b1
NOTICE: col=col1 val=a2
NOTICE: col=col2 val=b2
DO

Create a function to get column from multiple tables in PostgreSQL

I'm trying to create a function to get a field value from multiple tables in my database. I made script like this:
CREATE OR REPLACE FUNCTION get_all_changes() RETURNS SETOF RECORD AS
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOR tblrow IN SELECT tablename FROM pg_catalog.pg_tables WHERE schemaname='public' LOOP /*FOREACH tblname IN ARRAY $1 LOOP*/
RAISE NOTICE 'r: %', tblrow.tablename;
FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP
RETURN NEXT row;
END LOOP;
END LOOP;
END
$$
LANGUAGE 'plpgsql' ;
SELECT get_all_changes();
But it is not working, everytime it shows this error
tblrow.tablename" not defined in line "FOR row IN SELECT MAX("lastUpdate") FROM tblrow.tablename LOOP"
Your inner FOR loop must use the FOR...EXECUTE syntax as shown in the manual:
FOR target IN EXECUTE text_expression [ USING expression [, ... ] ] LOOP
statements
END LOOP [ label ];
In your case something along this line:
FOR row IN EXECUTE 'SELECT MAX("lastUpdate") FROM ' || quote_ident(tblrow.tablename) LOOP
RETURN NEXT row;
END LOOP
The reason for this is explained in the manual somewhere else:
Oftentimes you will want to generate dynamic commands inside your PL/pgSQL functions, that is, commands that will involve different tables or different data types each time they are executed. PL/pgSQL's normal attempts to cache plans for commands (as discussed in Section 39.10.2) will not work in such scenarios. To handle this sort of problem, the EXECUTE statement is provided[...]
Answer to your new question (mislabeled as answer):
This can be much simpler. You do not need to create a table just do define a record type.
If at all, you would better create a type with CREATE TYPE, but that's only efficient if you need the type in multiple places. For just a single function, you can use RETURNS TABLE instead :
CREATE OR REPLACE FUNCTION get_all_changes(text[])
RETURNS TABLE (tablename text
,"lastUpdate" timestamp with time zone
,nums integer) AS
$func$
DECLARE
tblname text;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
RETURN QUERY EXECUTE format(
$f$SELECT '%I', MAX("lastUpdate"), COUNT(*)::int FROM %1$I
$f$, tblname)
END LOOP;
END
$func$ LANGUAGE plpgsql;
A couple more points:
Use RETURN QUERY EXECUTE instead of the nested loop. Much simpler and faster.
Column aliases would only serve as documentation, those names are discarded in favor of the names declared in the RETURNS clause (directly or indirectly).
Use format() with %I to replace the concatenation with quote_ident() and %1$I to refer to the same parameter another time.
count() usually returns type bigint. Cast the integer, since you defined the column in the return type as such: count(*)::int.
Thanks,
I finally made my script like:
CREATE TABLE IF NOT EXISTS __rsdb_changes (tablename text,"lastUpdate" timestamp with time zone, nums bigint);
CREATE OR REPLACE FUNCTION get_all_changes(varchar[]) RETURNS SETOF __rsdb_changes AS /*TABLE (tablename varchar(40),"lastUpdate" timestamp with time zone, nums integer)*/
$$
DECLARE
tblname VARCHAR;
tblrow RECORD;
row RECORD;
BEGIN
FOREACH tblname IN ARRAY $1 LOOP
/*RAISE NOTICE 'r: %', tblrow.tablename;*/
FOR row IN EXECUTE 'SELECT CONCAT('''|| quote_ident(tblname) ||''') AS tablename, MAX("lastUpdate") AS "lastUpdate",COUNT(*) AS nums FROM ' || quote_ident(tblname) LOOP
/*RAISE NOTICE 'row.tablename: %',row.tablename;*/
/*RAISE NOTICE 'row.lastUpdate: %',row."lastUpdate";*/
/*RAISE NOTICE 'row.nums: %',row.nums;*/
RETURN NEXT row;
END LOOP;
END LOOP;
RETURN;
END
$$
LANGUAGE 'plpgsql' ;
Well, it works. But it seems I can only create a table to define the return structure instead of just RETURNS SETOF RECORD. Am I right?
Thanks again.

Reference a column of a record, from a variable holding the column name

In a function in plpgsql language and having:
"col" CHARACTER VARYING = 'a'; (can be 'b')
"rec" RECORD; Holding records from: (SELECT 1 AS "a", 2 AS "b")
"res" INTEGER;
I need to reference the named column in "col", from "rec". So if "col" has 'b' I will reference "rec"."b", and then save its value into "res".
You cannot reference the columns of an anonymous record type by name in plpgsql. Even though you spell out column aliases in your example in the SELECT statement, those are just noise and discarded.
If you want to reference elements of a record type by name, you need to use a well-known type. Either create and use a type:
CREATE TYPE my_composite_type(a int, b int);
Or use the row type associated with any existing table. You can just write the table name as data type.
DECLARE
rec my_composite_type;
...
Then you need a conditional statement or dynamic SQL to use the value of "col" as identifier.
Conditional statement:
IF col = 'a' THEN
res := rec.a;
ELSIF col = 'b' THEN
res := rec.b;
ELSE
RAISE EXCEPTION 'Unexpected value in variable "col": %', col;
END IF;
For just two possible cases, that's the way to go.
Or dynamic SQL:
EXECUTE 'SELECT $1.' || col
INTO res
USING rec;
I don't see a problem here, but be wary of SQL injection with dynamic SQL. If col can hold arbitrary data, you need to escape it with quote_ident() or format()
Demo
Demonstrating the more powerful, but also trickier variant with dynamic SQL.
Quick & dirty way to create a (temporary!) known type for testing:
CREATE TEMP TABLE rec_ab(a int, b int);
Function:
CREATE OR REPLACE FUNCTION f_test()
RETURNS integer AS
$func$
DECLARE
col text := 'a'; -- can be 'b'
rec rec_ab;
res int;
BEGIN
rec := '(1, 2)'::rec_ab;
EXECUTE 'SELECT $1.' || col
INTO res
USING rec;
RETURN res;
END
$func$
LANGUAGE plpgsql VOLATILE;
Call:
SELECT f_test();
Returns:
f_test
----
1