How to do variable substitution in plpgsql? - postgresql

I've got a bit of complex sql code that I'm converting from MSSql to Postgres (using Entity Framework Core 2.1), to deal with potential race conditions when inserting to a table with a unique index. Here's the dumbed-down version:
const string QUERY = #"
DO
$$
BEGIN
insert into Foo (Field1,Field2,Field3)
values (#value1,#value2,#value3);
EXCEPTION WHEN others THEN
-- do nothing; it's a race condition
END;
$$ LANGUAGE plpgsql;
select *
from Foo
where Field1 = #value1
and Field2 = #value2;
";
return DbContext.Foos
.FromSql(QUERY,
new NpgsqlParameter("value1", value1),
new NpgsqlParameter("value2", value2),
new NpgsqlParameter("value3", value3))
.First();
In other words, try to insert the record, but don't throw an exception if the attempt to insert it results in a unique index violation (index is on Field1+Field2), and return the record, whether it was created by me or another thread.
This concept worked fine in MSSql, using a TRY..CATCH block. As far as I can tell, the way to handle Postgres exceptions is as I've done, in a plpgsql block.
BUT...
It appears that variable substitution in plpgsql blocks doesn't work. The code above fails on the .First() (no elements in sequence), and when I comment out the EXCEPTION line, I see the real problem, which is:
Npgsql.PostgresException : 42703: column "value1" does not exist
When I test using regular Sql, i.e. doing the insert without using a plpgsql block, this works fine.
So, what is the correct way to do variable substitution in a plpgsql block?

The reason this doesn't work is that the body of the DO statement is actually a string, a text. See reference
$$ is just another way to delimit text in postgresql. It can be just as well be replaced with ' or $somestuff$.
As it is a string, Npgsql and Postgresql have no reason to mess with #value1 in it.
Solutions? Only a very ugly one, so not using this construction, as you're not able to pass it any values. And messing with string concatenation is no different than doing concatenation in C# in the first place.
Alternatives? Yes!
You don't need to handle exceptions in plpgsql blocks. Simply insert, use the ON CONFLICT DO NOTHING, and be on your way.
INSERT INTO Foo (Field1,Field2,Field3)
VALUES (#value1,#value2,#value3)
ON CONFLICT DO NOTHING;
select *
from Foo
where Field1 = #value1
and Field2 = #value2;
Or if you really want to keep using plpgsql, you can simply create a temporary table, using the ON COMMIT DROP option, fill it up with these parameters as one row, then use it in the DO statement. For that to work all your code must execute as part of one transaction. You can use one explicitly just in case.
The only ways to pass parameters to plpgsql code is via these 2 methods:
Declaring a function, then calling it with arguments
When already inside a plpgsql block you can call:
EXECUTE $$ INSERT ... VALUES ($1, $2, $3); $$ USING 3, 'text value', 5.234;
End notes:
As a fellow T-SQL developer who loved its freedom, but transitioned to Postgresql, I have to say that the BIG difference is that on one side there's T-SQL which gives the power, and on the other side it's a very powerful Postgresql-flavored SQL. plpgsql is very rarely warranted. In fact, in a code base of megabytes of complex SQL stuff, I can rewrite pretty much every plpgsql code in SQL. That's how powerful it really is compared to MSSQL-flavored SQL. It just takes some getting used to, and befriending the very ample documentation. Good luck!

Related

%rowtype in nested `declare...begin...end`

inside an anonymous block, I have nested a declare...begin...end. Inner block cannot "see" a temp table created in outer block. What is best way to solve, still maintaining creating of temp table inside?
The below FAILs with: CONTEXT: compilation of PL/pgSQL function "inline_code_block" near line 152
do
$$
declare
xyz...
being
create temp table x ... <depending on xyz>
...
declare
r x%rowtype -> FAIL
begin
...
end;
end;
$$
It cannot to work. This error is not related to block scopes, but it is related to timing. The temp table is created in runtime, but the x%rowtype is evaluated in validation time (before runtime). There is very simple and robust solution in Postgres. Use record type:
declare r record;
begin
The variable of record type takes real composite type in assign time.
Although PL/pgSQL is similar to PL/SQL, inside it is absolutely different technology, and not all patterns from Oracle are possible or are effective. And sometimes there are different (and sometimes much more comfortable) ways.

plpgsql : how to reference variable in sql statement

I am rather new to PL/pgSQL and don't know how to reference a variable in a SELECT statement.
In this following function the SELECT INTO always returns NULL:
$body$
DECLARE
r RECORD;
c CURSOR FOR select name from t_people;
nb_bills integer;
BEGIN
OPEN c;
LOOP
FETCH c INTO r;
EXIT WHEN NOT FOUND;
RAISE NOTICE 'name found: %', r.name;
SELECT count(bill_id) INTO nb_bills from t_bills where name = r.name;
END LOOP;
END;
$body$
RAISE NOTICE allows me to verify that my CURSOR is working well: names are properly retrieved, but for some reason still unknown to me, not properly passed to the SELECT INTO statement.
For debugging purpose, I tried to replace the variable in SELECT INTO with a constant value and it worked:
SELECT count( bill_id) INTO nb_bills from t_bills where name = 'joe';
I don't know how to reference r.name in the SELECT INTO statement.
I tried r.name, I tried to create another variable containing quotes, it is always returning NULL.
I am stuck. If anyone knows ...
the SELECT INTO always returns NULL
not properly passed to the SELECT INTO statement.
it is always returning NULL.
None of this makes sense.
SELECT statements do not return anything by itself in PL/pgSQL. You have to either assign results to variables or explicitly return results with one of the available RETURN variants.
SELECT INTO is only used for variable assignment and does not return anything, either. Not to be confused with SELECT INTO in plain SQL - which is generally discouraged:
Combine two tables into a new one so that select rows from the other one are ignored
It's not clear what's supposed to be returned in your example. You did not disclose the return type of the function and you did not actually return anything.
Start by reading the chapter Returning From a Function in the manual.
Here are some related answers with examples:
Can I make a plpgsql function return an integer without using a variable?
Return a query from a function?
How to return multiple rows from PL/pgSQL function?
Return setof record (virtual table) from function
And there may be naming conflicts with parameter names. We can't tell unless you post the complete function definition.
Better approach
That said, explicit cursors are only actually needed on rare occasions. Typically, the implicit cursor of a FOR loop is simpler to handle and cheaper:
Truncating all tables in a Postgres database
And most of the time you don't even need any cursors or loops at all. Set-based solutions are typically simpler and faster.

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
#Erwin Brandstetter, Many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things, min, max values, counts, count of nulls, mean etc... so a bunch of queries have to be ran for each columns, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types, i was sort of expecting this to throw a spanner in the works at some point, my main concern was with how to automate the query generation, and its execution, this last bit being my main concern.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get a error at the SELECT (t::tbl) :
ERROR: type "tbl" does not exist
btw, what is the (t::abc) notation referred as, in python this would be a list slice, but it’s probably not the case in PLPGSQL

Another missing FROM-clause entry for table

I'm new to postgresql and I have this problem with trigger, that should just take value from the sequence. If I understood correctly, (tried few examples) in case like this you don't need clause FROM (like in ORACLE you would add 'from dual').
Trigger looks something like that:
CREATE OR REPLACE FUNCTION public.trigger_fct_name_id_autoinc()
RETURNS trigger AS
$$
BEGIN
SELECT "NAME_ID_SEQ".NEXTVAL INTO NEW."ID" ;
RETURN NEW;
END
$$
LANGUAGE 'plpgsql';
Of course, I've done something wrong. Error I get is:
ERROR: missing FROM-clause entry for table "NAME_ID_SEQ" Where: PL/pgSQL function trigger_fct_name_id_autoinc() line 3 at SQL statement
Tried to find some answers in the web, but either there was none, or I don't understand postgresql well enough to understand what people write. In most topics the answer involves aliases, but I don't think that's the case here.
Should I somehow add FROM clause, or is it just a sign of a different issue?
You are using Oracle syntax to get the next value from your sequence. Use the Postgres' nextval:
SELECT nextval('NAME_ID_SEQ') INTO NEW."ID" ;

Why does PostgreSQL treat my query differently in a function?

I have a very simple query that is not much more complicated than:
select *
from table_name
where id = 1234
...it takes less than 50 milliseconds to run.
Took that query and put it into a function:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY SELECT *
FROM table_name
where id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
This function when executed select * from pie(123); takes 22 seconds.
If I hard code an integer in place of id_param, the function executes in under 50 milliseconds.
Why does the fact that I am using a parameter in the where statement cause my function to run slow?
Edit to add concrete example:
CREATE TYPE test_type AS (gid integer, geocode character varying(9))
CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
RETURNS SETOF test_type AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = $1
AND geo_type = 1
GROUP BY geography, gid, geocode' USING geocode_param;
END;
$BODY$
LANGUAGE plpgsql STABLE;
ALTER FUNCTION geocode_carrier_route_by_geocode(character)
OWNER TO root;
--Runs in 20 seconds
select * from geocode_route_by_geocode('999xyz');
--Runs in 10 milliseconds
SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = '9999xyz'
AND geo_type = 1
GROUP BY geography, gid, geocode
Update in PostgreSQL 9.2
There was a major improvement, I quote the release notes here:
Allow the planner to generate custom plans for specific parameter
values even when using prepared statements (Tom Lane)
In the past, a prepared statement always had a single "generic" plan
that was used for all parameter values, which was frequently much
inferior to the plans used for non-prepared statements containing
explicit constant values. Now, the planner attempts to generate custom
plans for specific parameter values. A generic plan will only be used
after custom plans have repeatedly proven to provide no benefit. This
change should eliminate the performance penalties formerly seen from
use of prepared statements (including non-dynamic statements in
PL/pgSQL).
Original answer for PostgreSQL 9.1 or older
A plpgsql functions has a similar effect as the PREPARE statement: queries are parsed and the query plan is cached.
The advantage is that some overhead is saved for every call.
The disadvantage is that the query plan is not optimized for the particular parameter values it is called with.
For queries on tables with even data distribution, this will generally be no problem and PL/pgSQL functions will perform somewhat faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, chose a better query plan for the particular values, you may end up with a sub-optimal query plan. Try an SQL function or use dynamic SQL with EXECUTE to force a the query to be re-planned for every call. Could look like this:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM table_name
where id = $1'
USING id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
Edit after comment:
If this variant does not change the execution time, there must be other factors at play that you may have missed or did not mention. Different database? Different parameter values? You would have to post more details.
I add a quote from the manual to back up my above statements:
An EXECUTE with a simple constant command string and some USING
parameters, as in the first example above, is functionally equivalent
to just writing the command directly in PL/pgSQL and allowing
replacement of PL/pgSQL variables to happen automatically. The
important difference is that EXECUTE will re-plan the command on each
execution, generating a plan that is specific to the current parameter
values; whereas PL/pgSQL normally creates a generic plan and caches it
for re-use. In situations where the best plan depends strongly on the
parameter values, EXECUTE can be significantly faster; while when the
plan is not sensitive to parameter values, re-planning will be a
waste.