plpgsql : how to reference variable in sql statement - postgresql

I am rather new to PL/pgSQL and don't know how to reference a variable in a SELECT statement.
In this following function the SELECT INTO always returns NULL:
$body$
DECLARE
r RECORD;
c CURSOR FOR select name from t_people;
nb_bills integer;
BEGIN
OPEN c;
LOOP
FETCH c INTO r;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'name found: %', r.name;
SELECT count(bill_id) INTO nb_bills from t_bills where name = r.name;
END LOOP;
END;
$body$
RAISE NOTICE allows me to verify that my CURSOR is working well: names are properly retrieved, but for some reason still unknown to me, not properly passed to the SELECT INTO statement.
For debugging purpose, I tried to replace the variable in SELECT INTO with a constant value and it worked:
SELECT count( bill_id) INTO nb_bills from t_bills where name = 'joe';
I don't know how to reference r.name in the SELECT INTO statement.
I tried r.name, I tried to create another variable containing quotes, it is always returning NULL.
I am stuck. If anyone knows ...

the SELECT INTO always returns NULL
not properly passed to the SELECT INTO statement.
it is always returning NULL.
None of this makes sense.
SELECT statements do not return anything by itself in PL/pgSQL. You have to either assign results to variables or explicitly return results with one of the available RETURN variants.
SELECT INTO is only used for variable assignment and does not return anything, either. Not to be confused with SELECT INTO in plain SQL - which is generally discouraged:
Combine two tables into a new one so that select rows from the other one are ignored
It's not clear what's supposed to be returned in your example. You did not disclose the return type of the function and you did not actually return anything.
Start by reading the chapter Returning From a Function in the manual.
Here are some related answers with examples:
Can I make a plpgsql function return an integer without using a variable?
Return a query from a function?
How to return multiple rows from PL/pgSQL function?
Return setof record (virtual table) from function
And there may be naming conflicts with parameter names. We can't tell unless you post the complete function definition.
Better approach
That said, explicit cursors are only actually needed on rare occasions. Typically, the implicit cursor of a FOR loop is simpler to handle and cheaper:
Truncating all tables in a Postgres database
And most of the time you don't even need any cursors or loops at all. Set-based solutions are typically simpler and faster.

Related

Is it possible to write polymorphic Postgres functions using RECORD parameters?

I want to write a PL/pgSQL function that can take records of different types, check the type of record provided, then do something with the record. Example:
CREATE FUNCTION polymorphic_input(arg_rec RECORD) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
BEGIN
IF pg_typeof(arg_rec)::text = 'information_schema.tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF pg_typeof(arg_rec)::text = 'information_schema.columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
When you call the function with a row from the information_schema.tables table, it should return the name of the table and it does so when you call it like this:
-- this returns table name "pg_type"
SELECT polymorphic_input((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1));
When you call the function with a row from the information_schema.columns table, it should return the name of the column and it does so when you call it like this:
-- this returns column name "objsubid"
SELECT polymorphic_input((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1));
The problem is you CAN'T call the function twice in a row with different row types. For example, if you call it with a row from information_schema.columns it works, then when you call it with a row form information_schema.tables, you get an error like this:
type of parameter 1 (information_schema.tables) does not match that when preparing the plan (information_schema.columns)
The words "when preparing the plan" gave me a hint that Postgres is caching the plans, so I figured running DISCARD PLANS; before each call to the function would work, and indeed it does when you run this entire query:
DISCARD PLANS; SELECT polymorphic_input((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1));
DISCARD PLANS; SELECT polymorphic_input((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1));
Running DISCARD PLANS; seems like the nuclear option and would no doubt affect performance in a real-world scenario. After some experimentation, I saw that using the pg_typeof function is what forces the plans to be cached. We can rewrite the function to avoid pg_typeof by adding a parameter that specifies what record type to expect:
CREATE FUNCTION polymorphic_input2(arg_rec RECORD, arg_type text) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
BEGIN
IF arg_type = 'tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF arg_type = 'columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
You can then call polymorphic_input2 multiple times in a row with different row types without error as follows:
-- no need for DISCARD PLANS here...these calls work fine.
SELECT polymorphic_input2((SELECT t FROM information_schema.tables t WHERE table_name = 'pg_type' LIMIT 1), 'tables');
SELECT polymorphic_input2((SELECT t FROM information_schema.columns t WHERE t.column_name = 'objsubid' LIMIT 1), 'columns');
The problem with polymorphic_input2 is that you have to manually give it a hint as to the type of the record to expect. My question: is it possible to implement a polymorphic function that can figure out the type of record passed to it, without the cached plan errors?
The docs mention the plan_cache_mode setting:
Prepared statements (either explicitly prepared or implicitly generated, for example by PL/pgSQL) can be executed using custom or generic plans. Custom plans are made afresh for each execution using its specific set of parameter values, while generic plans do not rely on the parameter values and can be re-used across executions....The allowed values are auto (the default), force_custom_plan and force_generic_plan...
I tried removing the error by running SET plan_cache_mode = force_custom_plan; but that didn't help (which is probably a bug because the docs imply it should force a custom plan in each call, but Postgres is still caching the plan and causing errors). Only DISCARD PLANS worked.
The docs on plan caching seem to recognize this issue and say:
The mutable nature of record variables presents another problem in this connection. When fields of a record variable are used in expressions or statements, the data types of the fields must not change from one call of the function to the next, since each expression will be analyzed using the data type that is present when the expression is first reached. EXECUTE can be used to get around this problem when necessary.
...and a little further down the docs indicate this shouldn't be happening:
Likewise, functions having polymorphic argument types have a separate statement cache for each combination of actual argument types they have been invoked for, so that data type differences do not cause unexpected failures.
This is further confirmed by the docs EXECUTE which say:
Also, there is no plan caching for commands executed via EXECUTE. Instead, the command is always planned each time the statement is run. Thus the command string can be dynamically created within the function to perform actions on different tables and columns.
So I tried another variant that tries to run pg_typeof via EXECUTE:
CREATE FUNCTION polymorphic_input3(arg_rec RECORD) RETURNS TEXT LANGUAGE plpgsql AS
$plpgsql$
DECLARE
rec_type text;
BEGIN
EXECUTE 'SELECT pg_typeof($1)' INTO rec_type USING arg_rec;
IF rec_type = 'information_schema.tables' THEN
RETURN (arg_rec::information_schema.tables).table_name;
ELSIF rec_type = 'information_schema.columns' THEN
RETURN (arg_rec::information_schema.columns).column_name;
ELSE
RETURN 'unknown';
END IF;
END;
$plpgsql$;
...but that still produces the same error as the variant which calls pg_typeof directly.
My question once again: is it possible (in Postgres 14) to implement a polymorphic function that can figure out the type of record passed to it, without the cached plan errors?

Postgres: How to transform data in two cursors before comparison?

I am replacing a legacy function get_data in our database which takes some entity_id and returns a refcursor.
I am writing a new function get_data_new which is using different data sources, but the outputs are expected to be the same as get_data for the same input.
I'd like to verify this with pgtap, and am doing so as follows within a test (with _expected and _actual being the names of the returned cursors):
SELECT schema.get_data('_expected', 123);
SELECT schema.get_data_new('_actual', 123);
SELECT results_eq(
'FETCH ALL FROM _actual',
'FETCH ALL FROM _expected',
'get_data_new should return identical results to the legacy version'
);
This works as expected for other functions, but the query in get_data happens to return some json columns meaning that comparison expectedly fails with ERROR: could not identify an equality operator for type json.
I'd rather leave the legacy function untouched so refactoring to jsonb isn't possible. I'm imagining a workaround to be transforming the data before comparison, hypothetically with something like SELECT entity_id, json_column::jsonb FROM (FETCH ALL FROM _actual), but this specific attempt obviously isn't valid.
What would be a suggested approach here? Write a helper function to insert data from the cursors into a couple of temporary tables? I'm hoping there's a cleaner solution I haven't discovered.
Using postgres 11.14, pgtap11
Have solved it by creating a function to loop over the cursor and return the results as a table. Unfortunately this isn't a generic solution - it only works for the cursors with specific data.
In this specific case, json_column can be implicitly converted to type jsonb so this is all that is needed. However, we can now SELECT * FROM cursor_to_table('_actual') meaning we can do whatever transformations we require on the result.
CREATE OR REPLACE FUNCTION cursor_to_table(_cursor refcursor)
RETURNS TABLE (entity_id bigint, json_column jsonb)
AS $func$
BEGIN
LOOP
FETCH _cursor INTO entity_id, json_column
EXIT WHEN NOT FOUND;
RETURN NEXT;
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
SELECT results_eq(
'SELECT * FROM cursor_to_table(''_actual'')',
'SELECT * FROM cursor_to_table(''_expected'')',
'get_data_new should return identical results to the legacy version'
);

Loop through table and return a specific column

Here is my function:
CREATE OR REPLACE FUNCTION getAllFoo()
RETURNS SETOF "TraderMonthOutstanding" AS
$BODY$
DECLARE
r record;
BEGIN
FOR r IN (select * from "TraderMonthOutstanding"
where "TraderId"=1 and "IsPaid"=false)
LOOP
RETURN NEXT r.TraderId;
END LOOP;
RETURN;
END
$BODY$
LANGUAGE 'plpgsql' ;
I want only TraderId from this loop but I am not able to get it because it gives me an error.
What am I doing wrong?
You declared that getAllFoo() will RETURNS SETOF "TraderMonthOutstanding" (a set of rows with structure of table TraderMonthOutstanding) but then you RETURN NEXT r.TraderId (single field).
You need to either change the declared return type to RETURNS SETOF "TraderMonthOutstanding".TraderId%TYPE or RETURN NEXT r.
You should be returning a setof integer, or a setof TraderMonthOutstanding".TraderId%TYPE.
Or changing perhaps a setof record and returning next r, or event setof TraderMonthOutstanding and declaring r as a TraderMonthOutstanding row type instead of an anonymous record.
Better yet, this seems like the kind of case where you ought to create a view to avoid the overhead of the using a function altogether:
create view getAllFoo as
select *
from "TraderMonthOutstanding"
where "TraderId"=1 and "IsPaid"=false;
select * from getAllFoo;
It's usually terrible practice to loop over a set like that in a function if you're not doing anything useful. (Though I imagine the real one is doing something.)
Lastly, in the event you truly want this in a function, note that you can return query to avoid the loop altogether:
return query
select *
from "TraderMonthOutstanding"
where "TraderId"=1 and "IsPaid"=false;
What #Igor and #Denis said.
If you actually need the loop, which doesn't become apparent from your example, it would work like this:
CREATE OR REPLACE FUNCTION get_all_foo()
RETURNS SETOF "TraderMonthOutstanding"."TraderId"%TYPE AS
$func$
DECLARE
r "TraderMonthOutstanding"; -- use the well known type
BEGIN
FOR r IN
SELECT * FROM "TraderMonthOutstanding"
WHERE "TraderId" = 1
AND "IsPaid" = FALSE -- no parentheses needed
LOOP
-- do whatever necessitates a loop
RETURN NEXT r."TraderId";
END LOOP;
RETURN;
END
$func$ LANGUAGE plpgsql;
Do not quote the language name plpgsql. It's an identifier, not a literal. Current Postgres tolerates it, but it may be weeded out in future versions.
I would advice not to use CaMeL-case identifiers at all in Postgres. Even less if you double-quote some, but not others. That's a LoaDed Footgun. Read the manual about identifiers.
Since you already know the type of the record you are going to use, don't downgrade to an anonymous record, use the well known type to begin with: "TraderMonthOutstanding".
Like you have been told already, the type returned by RETURN has to match what you declared with RETURNS. Since you want only TraderId, adapt RETURNS accordingly. If you want to derive it from the existing table you can use the syntax displayed. Details in the manual. Or you can insert the actual type. Like integer or whatever.
And since you seem to have been using double-quoted CaMeL-case names when creating your table, you have to keep using them that way: double-quoted - even for fields of the derived type in the function body:
RETURN NEXT r."TraderId";
See what I mean with "LoaDed Footgun"?

Print ASCII-art formatted SETOF records from inside a PL/pgSQL function

I would love to exploit the SQL output formatting of PostgreSQL inside my PL/pgSQL functions, but I'm starting to feel I have to give up the idea.
I have my PL/pgSQL function query_result:
CREATE OR REPLACE FUNCTION query_result(
this_query text
) RETURNS SETOF record AS
$$
BEGIN
RETURN QUERY EXECUTE this_query;
END;
$$ LANGUAGE plpgsql;
..merrily returning a SETOF records from an input text query, and which I can use for my SQL scripting with dynamic queries:
mydb=# SELECT * FROM query_result('SELECT ' || :MYVAR || ' FROM Alice') AS t (id int);
id
----
1
2
3
So my hope was to find a way to deliver this same nicely formatted output from inside a PL/pgSQL function instead, but RAISE does not support SETOF types, and there's no magic predefined cast from SETOF records to text (I know I could create my own CAST..)
If I create a dummy print_result function:
CREATE OR REPLACE FUNCTION print_result(
this_query text
) RETURNS void AS
$$
BEGIN
SELECT query_result(this_query);
END;
$$ LANGUAGE plpgsql;
..I cannot print the formatted output:
mydb=# SELECT print_result('SELECT ' || :MYVAR || ' FROM Alice');
ERROR: set-valued function called in context that cannot accept a set
...
Thanks for any suggestion (which preferably works with PostgreSQL 8.4).
Ok, to do anything with your result set in print_result you'll have to loop over it. That'll look something like this -
Here result_record is defined as a record variable. For the sake of explanation, we'll also assume that you have a formatted_results variable that is defined as text and defaulted to a blank string to hold the formatted results.
FOR result_record IN SELECT * FROM query_result(this_query) AS t (id int) LOOP
-- With all this, you can do something like this
formatted_results := formatted_results ||','|| result_record.id;
END LOOP;
RETURN formatted_results;
So, if you change print_results to return text, declare the variables as I've described and add this in, your function will return a comma-separated list of all your results (with an extra comma at the end, I'm sure you can make use of PostgreSQL's string functions to trim that). I'm not sure this is exactly what you want, but this should give you a good idea about how to manipulate your result set. You can get more information here about control structures, which should let you do pretty much whatever you want.
EDIT TO ANSWER THE REAL QUESTION:
The ability to format data tuples as readable text is a feature of the psql client, not the PostgreSQL server. To make this feature available in the server would require extracting relevant code or modules from the psql utility and recompiling them as a database function. This seems possible (and it is also possible that someone has already done this), but I am not familiar enough with the process to provide a good description of how to do that. Most likely, the best solution for formatting query results as text will be to make use of PostgreSQL's string formatting functions to implement the features you need for your application.

Accessing columns inside a record using table alias in Postgresql

I'm trying to iterate over the result of a query using a record data type. Nevertheless, if I try to access one column using the table alias defined in the query, I get the following error:
ERRO: schema "inv_row" does not exist
CONTEXT: SQL command "SELECT inv_row.s.processor <> inv_row.d.processor"
PL/pgSQL function "teste" line 7 at IF
Here is the code that throws this error:
CREATE OR REPLACE FUNCTION teste() returns void as $$
DECLARE
inv_row record;
BEGIN
FOR inv_row in SELECT * FROM sa_inventory s LEFT JOIN dim_inventory d ON s.macaddr = d.macaddr LOOP
IF inv_row.s.processor <> inv_row.d.processor THEN
<do something>;
END IF;
END LOOP;
END;
$$ language plpgsql;
Is there another way to access a column of a particular table inside a record data type?
Fortunately the answer here is relatively simple. You have to use parentheses to indicate tuples:
IF (inv_row.s).processor <> (inv_row.d).processor THEN
This is because SQL specifies meaning to the depth of the namespaces and therefore without this PostgreSQL cannot safely determine what this means. So inv_row.s.processor means the processor column of the s table in the inv_row schema. However (inv_row.s).processor means take the s column of the inf_row table, treat it as a tuple, and take the processor column of that.