Loop through array parameter WITHOUT foreach - postgresql

CREATE OR REPLACE FUNCTION fnMyFunction(recipients recipient[]) ...
FOREACH v_recipient IN ARRAY recipients
LOOP
    v_total := v_total + v_recipient.amount;
    INSERT INTO tmp_recipients(id, amount)
    VALUES (v_recipient.id, v_recipient.amount::numeric(10,2));
END LOOP;
...
This works great in the dev environment, but I just found out the release environment is 8.4, which doesn't support the FOREACH construct. I was hoping someone might shed some light on an alternative implementation for looping through the array parameter and using its values in a similar fashion, to avoid a complete refactor.
The error message I am receiving is:
ERROR: syntax error at or near "FOREACH"
SQL state: 42601
Context: SQL statement in PL/PgSQL function "fnMyFunction" near line ##
The db environment is on a shared host so I have no options for platform upgrade.
I tagged postgres 9.1 and 8.4 because the function works properly in 9.x but fails on 8.4.

Use unnest, which was introduced in 8.4. Untested, but I think this is right:
FOR v_recipient IN SELECT vr FROM unnest(recipients) x(vr)
LOOP
    ....
END LOOP;
If you can't do that, you'll have to loop over array_length() and index into the array.
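A minimal sketch of that fallback (untested), assuming the same v_total and v_recipient variables as in the question plus an integer loop variable i; array_length() itself was also added in 8.4:
FOR i IN 1 .. COALESCE(array_length(recipients, 1), 0)  -- zero iterations for an empty array
LOOP
    v_recipient := recipients[i];
    v_total := v_total + v_recipient.amount;
    INSERT INTO tmp_recipients(id, amount)
    VALUES (v_recipient.id, v_recipient.amount::numeric(10,2));
END LOOP;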

The way you have it, you execute one INSERT at a time. With relational databases, set-based operations are regularly much faster than iterating through records one at a time.
This should be simpler, faster and work with PostgreSQL 8.4 or later:
INSERT INTO tmp_recipients(id, amount)
SELECT (r.col).*
FROM (SELECT unnest(recipients) AS col) r
This assumes that the composite base type of the array consists of (id, amount) - in that order - and the type of amount can be coerced to numeric(10,2). Else, or just to be sure, be more explicit:
INSERT INTO tmp_recipients(id, amount)
SELECT (r.col).id, (r.col).amount::numeric(10,2)
FROM (SELECT unnest(recipients) AS col) r
The parentheses around (r.col) are not optional. They are required to disambiguate the syntax for the composite type.
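A side note, not part of the original answer: if you also need the v_total from the question's loop, that can be computed set-based as well, e.g. right after the INSERT:
SELECT COALESCE(sum(amount), 0) INTO v_total
FROM tmp_recipients;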

Related

How to do variable substitution in plpgsql?

I've got a bit of complex SQL code that I'm converting from MSSQL to Postgres (using Entity Framework Core 2.1) to deal with potential race conditions when inserting into a table with a unique index. Here's the dumbed-down version:
const string QUERY = @"
DO
$$
BEGIN
    insert into Foo (Field1, Field2, Field3)
    values (@value1, @value2, @value3);
EXCEPTION WHEN others THEN
    -- do nothing; it's a race condition
END;
$$ LANGUAGE plpgsql;
select *
from Foo
where Field1 = @value1
and Field2 = @value2;
";
return DbContext.Foos
.FromSql(QUERY,
new NpgsqlParameter("value1", value1),
new NpgsqlParameter("value2", value2),
new NpgsqlParameter("value3", value3))
.First();
In other words, try to insert the record, but don't throw an exception if the attempt to insert it results in a unique index violation (index is on Field1+Field2), and return the record, whether it was created by me or another thread.
This concept worked fine in MSSql, using a TRY..CATCH block. As far as I can tell, the way to handle Postgres exceptions is as I've done, in a plpgsql block.
BUT...
It appears that variable substitution in plpgsql blocks doesn't work. The code above fails on the .First() (no elements in sequence), and when I comment out the EXCEPTION line, I see the real problem, which is:
Npgsql.PostgresException : 42703: column "value1" does not exist
When I test using regular Sql, i.e. doing the insert without using a plpgsql block, this works fine.
So, what is the correct way to do variable substitution in a plpgsql block?
The reason this doesn't work is that the body of the DO statement is actually a string literal; see the reference.
$$ is just another way to delimit text in PostgreSQL; it could just as well be replaced with ' or $somestuff$.
Since it is a string, Npgsql and PostgreSQL have no reason to touch the @value1 inside it.
Solutions? Only very ugly ones, so don't use this construction, as you're not able to pass it any values. And messing with string concatenation server-side is no better than doing the concatenation in C# in the first place.
Alternatives? Yes!
You don't need to handle exceptions in plpgsql blocks. Simply insert, use the ON CONFLICT DO NOTHING, and be on your way.
INSERT INTO Foo (Field1, Field2, Field3)
VALUES (@value1, @value2, @value3)
ON CONFLICT DO NOTHING;
select *
from Foo
where Field1 = @value1
and Field2 = @value2;
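One refinement, not in the original answer: if you only want to ignore violations of that specific unique index, name its columns, i.e. ON CONFLICT (Field1, Field2) DO NOTHING. Note that ON CONFLICT requires PostgreSQL 9.5 or later.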
Or if you really want to keep using plpgsql, you can simply create a temporary table, using the ON COMMIT DROP option, fill it up with these parameters as one row, then use it in the DO statement. For that to work all your code must execute as part of one transaction. You can use one explicitly just in case.
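A rough sketch of that temp-table variant (untested; the tmp_params table and its columns are mine). The parameters travel through a plain INSERT, which Npgsql can bind, and the DO block reads them back:
BEGIN;  -- ON COMMIT DROP requires an explicit transaction
CREATE TEMPORARY TABLE tmp_params(value1 text, value2 text, value3 text) ON COMMIT DROP;
INSERT INTO tmp_params VALUES (@value1, @value2, @value3);
DO $$
BEGIN
    INSERT INTO Foo (Field1, Field2, Field3)
    SELECT value1, value2, value3 FROM tmp_params;
EXCEPTION WHEN unique_violation THEN
    NULL;  -- race condition; the row already exists
END;
$$;
COMMIT;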
The only ways to pass parameters to plpgsql code are these two methods:
Declaring a function, then calling it with arguments (sketched below)
When already inside a plpgsql block, you can call:
EXECUTE $$ INSERT ... VALUES ($1, $2, $3); $$ USING 3, 'text value', 5.234;
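For completeness, a minimal sketch of the first method; the function and parameter names are made up:
CREATE OR REPLACE FUNCTION insert_foo(_v1 text, _v2 text, _v3 text)
RETURNS void AS
$$
BEGIN
    INSERT INTO Foo (Field1, Field2, Field3)
    VALUES (_v1, _v2, _v3)
    ON CONFLICT DO NOTHING;
END;
$$ LANGUAGE plpgsql;
-- callable with ordinary bindable parameters:
-- SELECT insert_foo(@value1, @value2, @value3);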
End notes:
As a fellow T-SQL developer who loved its freedom but transitioned to PostgreSQL, I have to say the big difference is that on one side there's T-SQL, which puts the power in procedural code, and on the other side there's a very powerful PostgreSQL-flavored SQL. plpgsql is very rarely warranted. In fact, in a code base of megabytes of complex SQL stuff, I can rewrite pretty much every piece of plpgsql code in plain SQL. That's how powerful it really is compared to MSSQL-flavored SQL. It just takes some getting used to, and befriending the very ample documentation. Good luck!

Postgres using functions inside queries

I have a table with common word values to match against brands - so when someone types in "coke" I want to match any possible brand names associated with it as well as the original term.
CREATE TABLE word_association ( commonterm TEXT, assocterm TEXT);
INSERT INTO word_association VALUES ('coke', 'coca-cola'), ('coke', 'cocacola'), ('coke', 'coca cola');
I have a function to create a list of these values in a pipe-delim string for pattern matching:
CREATE OR REPLACE FUNCTION usp_get_search_terms(userterm text)
RETURNS text AS
$BODY$
DECLARE
    returnstr TEXT DEFAULT '';
BEGIN
    SET DATESTYLE TO DMY;
    returnstr := userterm;
    IF EXISTS (SELECT 1 FROM word_association WHERE LOWER(commonterm) = LOWER(userterm)) THEN
        SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
        FROM word_association
        WHERE commonterm = userterm;
    END IF;
    RETURN returnstr;
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
ALTER FUNCTION usp_get_search_terms(text)
OWNER TO customer_role;
If you call SELECT * FROM usp_get_search_terms('coke') you end up with
coke|coca-cola|cocacola|coca cola
EDIT: this function runs in <100ms, so it works fine.
I want to run a query with this text inserted e.g.
SELECT X.article_number, X.online_description
FROM articles X
WHERE LOWER(X.online_description) % usp_get_search_terms ('coke');
This takes approx 56s to run against my table of ~500K records.
If I take the raw text and use it in the query directly, it takes ~300ms, e.g.:
SELECT X.article_number, X.online_description
FROM articles X
WHERE X.online_description % '(coke|coca-cola|cocacola|coca cola)';
The result sets are identical.
I've tried modifying the output string from the function, e.g. enclosing it in quotes and parentheses, but it doesn't seem to make a difference.
Can someone please advise why there is a difference here? Is it the data type or something about calling functions inside queries? Thanks.
Your function might take 100ms, but the query isn't calling your function once; it's calling it 500,000 times.
It's because your function is declared VOLATILE. This tells Postgres that either the function returns different values when called multiple times within a query (like clock_timestamp() or random()), or that it alters the state of the database in some way (for example, by inserting records).
If your function contains only SELECTs, with no INSERTs, calls to other VOLATILE functions, or other side-effects, then you can declare it STABLE instead. This tells the planner that it can call the function just once and reuse the result without affecting the outcome of the query.
But your function does have side-effects, due to the SET DATESTYLE statement, which takes effect for the rest of the session. I doubt this was the intention, however. You may be able to remove it, as it doesn't look like date formatting is relevant to anything in there. But if it is necessary, the correct approach is to use the SET clause of the CREATE FUNCTION statement to change it only for the duration of the function call:
...
$BODY$
LANGUAGE plpgsql STABLE
SET DATESTYLE TO DMY
COST 100;
The other issue with the slow version of the query is the call to LOWER(X.online_description), which will prevent the query from utilising the index (since online_description is indexed, but LOWER(online_description) is not).
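As an illustration of that point (my sketch, not part of the original answer): with the pg_trgm extension, an expression index on the lowercased column lets the planner support the % comparison directly:
CREATE INDEX articles_online_desc_lower_trgm_idx
ON articles USING gist (LOWER(online_description) gist_trgm_ops);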
With these changes, the performance of both queries is the same; see this SQLFiddle.
So the answer came to me about dawn this morning - CTEs to the rescue!
Particularly as this is the "simple" version of a very large query, it helps to get this defined once in isolation, then do the matching against it. The alternative (given I'm calling this from a NodeJS platform) is to have one request retrieve the string of terms, then make another request to pass the string back. Not elegant.
WITH matches AS
( SELECT * FROM usp_get_search_terms('coke') )
, main AS
( SELECT X.article_number, X.online_description
FROM articles X
JOIN matches M ON X.online_description % M.usp_get_search_terms )
SELECT * FROM main
Execution time is somewhere around 300-500ms depending on term searched and articles returned.
Thanks for all your input, guys - I've learned a few things about Postgres that my MS-SQL background didn't necessarily prepare me for :)
Have you tried removing the IF EXISTS() and simply using:
SELECT returnstr || '|' || string_agg(assocterm, '|') INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm)
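One caveat with that simplification (my note, not part of the original suggestion): string_agg() returns NULL when nothing matches, which would overwrite returnstr with NULL, so guard it with COALESCE:
SELECT COALESCE(returnstr || '|' || string_agg(assocterm, '|'), returnstr)
INTO returnstr
FROM word_association
WHERE LOWER(commonterm) = LOWER(userterm);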
Instead of calling the function for each row, call it once:
select x.article_number, x.online_description
from woolworths.articles x
cross join woolworths.usp_get_search_terms('coke') c (s)
where lower(x.online_description) % s

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now, there are any number of places I had in mind for storing these queries (a variable of some sort, or a table column), the idea being to then have these queries executed.
I thought of storing the generated queries in a variable and then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement, which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function, and I've been scratching my head over how this would fit together and whether that's even the direction to follow; perhaps there's something simpler.
Would you have any pointers? I'm currently trying something like this, inspired by this other question, but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate functions. sum() returns numeric for sum(bigint_column) to accommodate a sum overflowing the base data type. Casting back to bigint might fail ...
@Erwin Brandstetter, many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things: min and max values, counts, counts of nulls, means, etc., so a bunch of queries have to be run for each column, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types; I was sort of expecting this to throw a spanner in the works at some point. My main concern was with how to automate the query generation and, above all, its execution.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get an error at the SELECT (t::tbl):
ERROR: type "tbl" does not exist
BTW, what is the (t::abc) notation referred to as? In Python this would be a list slice, but that's probably not the case in PL/pgSQL.
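For what it's worth: t::tbl casts the whole row t to the row type tbl, and tbl in the answer is a placeholder for your actual table name. For the salesdata table from the question, the call would presumably be:
SELECT * FROM f_min_of(NULL::salesdata);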

Error 28 "Out of Stack Space" executing a huge query with VB6 & ADO 2.8

Scenario:
Executing an SQL command from a Visual Basic 6 application using ADO Connection.Execute method through PostgreSQL OLEDB Provider to a PostgreSQL 9.2 database.
Query:
It's a simple EXECUTE prepared_statement_name (x, y, z), though it involves a PostGIS geometry type, thus it becomes something like:
EXECUTE prepared_statement_name (1, ST_GeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))', 900001));
Problem:
When the geometry is a huge and complex MULTIPOLYGON which contains many vertexes, the query becomes very lengthy (a few thousands characters) and the Connection.Execute method causes a Error 28: "Out Of Stack Space".
There is no recursion or nested loops involved in the process, it's quite clear the error is due to the excessive length of the query.
I think I could avoid the error if I passed the huge query in "chunks" to the provider before executing it, but this is just an idea and I don't know whether it is possible or not, and how.
I have no clue, any help is appreciated.
Since it sounds like this is a VB6-level problem and you're already on current Pg versions, I fear you may have to use some spectacularly ugly workarounds.
If at all possible, try to find a way to increase the VB6 query buffer size, send the query in chunks via the VB6 ODBC interface, etc. Consider the following an absolute last resort.
Maybe this will give you some saner clues. I don't speak VB6 (thankfully) so I can't evaluate it: http://www.mrexcel.com/forum/excel-questions/61340-error-28-out-stack-space.html
Use the following only as a last resort if all else fails:
Create a TEMPORARY table, like CREATE TEMPORARY TABLE my_query(id integer, querychunk text).
INSERT INTO the temporary table your statement, chunk by chunk, using parameterised queries to avoid quoting issues.
Create a wrapper PL/PgSQL function that does a RETURN QUERY EXECUTE format('EXECUTE stm_name(...)', ...), passing the string_agg of the temp table as a parameter. Yes, that's amazingly ugly.
Here's a demo, some of the most horrible code I've ever written:
CREATE TABLE real_table (blah text);
PREPARE test_stm2(text) AS INSERT INTO real_table VALUES ($1);
CREATE TEMPORARY TABLE data_chunks(datord integer, datchunk text);
PREPARE chunk_insert(integer, text) AS INSERT INTO data_chunks(datord,datchunk) VALUES ($1,$2);
-- You'll really want to do this via proper parameterised statements
-- to avoid quoting nightmares; I'm using dollar-quoting as a workaround
EXECUTE chunk_insert(0, $val$POLYGON((0 0, 0 1, 1 1,$val$);
EXECUTE chunk_insert(1, $val$ 1, 1 0, 0 0))$val$);
DO
$$
BEGIN
EXECUTE 'EXECUTE test_stm2($1);'
USING
(SELECT string_agg(datchunk,'' ORDER BY datord) FROM data_chunks);
END;
$$ LANGUAGE plpgsql;
Result:
regress=> SELECT * FROM real_table ;
blah
---------------------------------------
POLYGON((0 0, 0 1, 1 1, 1, 1 0, 0 0))
(1 row)
A similar approach is possible for SELECT. You'd use RETURN QUERY EXECUTE within a function defined by CREATE OR REPLACE FUNCTION since DO blocks can't return a result. For example, for a query that returns a SETOF INTEGER you might write:
CREATE OR REPLACE FUNCTION test_wrapper_func() RETURNS SETOF integer AS $$
BEGIN
RETURN QUERY EXECUTE format('EXECUTE test_stm(%L);', (SELECT string_agg(datchunk,'' ORDER BY datord) FROM data_chunks));
END;
$$ LANGUAGE plpgsql;
You'll note the two-level EXECUTE. That's because PL/PgSQL's EXECUTE is quite a different statement from the SQL-level EXECUTE. PL/PgSQL's EXECUTE runs a string as dynamic SQL, whereas the SQL EXECUTE runs a prepared statement. Here we're running a prepared statement via dynamic SQL. Ick.
Wondering why I'm using PL/PgSQL? Because you can't use a subquery as an EXECUTE parameter. You can avoid the PL/PgSQL wrapper for the query if you don't run it as a prepared statement.
regress=> EXECUTE test_stm2( (SELECT string_agg(datchunk,'' ORDER BY datord) FROM data_chunks) );
ERROR: cannot use subquery in EXECUTE parameter
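To illustrate that last point with the demo tables above: without the prepared statement, the aggregate can be used directly in plain SQL, no wrapper needed:
INSERT INTO real_table
SELECT string_agg(datchunk, '' ORDER BY datord)
FROM data_chunks;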

Nested query as PostGIS function parameter

I have a PostGIS query where I really need to have nested queries inside PostGIS function calls:
UPDATE raw.geocoding
SET the_geom = ST_Centroid(
ST_Collect(
SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C000081610F9CC5DC3341EE672E6E723B3241')::varchar),
SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C00002CF887E0C5DC3341C9E5B2DF2A383241')::varchar)
)
)
WHERE hash = ((E'3e638a27c6c38f05026252f4a0b57b2e')::varchar)
Unfortunately, this doesn't work. I get a syntax error at the beginning of the nested query:
ERROR: syntax error at or near "SELECT"
LINE 4: SELECT the_geom
^
********** Error **********
ERROR: syntax error at or near "SELECT"
SQL state: 42601
Character: 86
Looks like I cannot have a nested query as a PostGIS function parameter?
I've perused through the PostGIS documentation and cannot find any clear guidance for dealing with this.
It appears Postgres has a way of doing variables in pgSQL, but it's unclear to me how this would be pulled off in a standard query. This is a query that will be run tens or hundreds of thousands of times from a C# program. That aside, I could do a pgSQL stored procedure if required; just wanted to make sure there wasn't a simpler alternative first.
In case you were wondering, the query looks messy because it's the result of a npgsql-generated parameterized query. I think it's fair to say that npgsql is being extra-cautious with redundant typing and escaping.
I am running PostGIS 2.0.1, Postgres 9.1.5, and npgsql 2.0.12.
It sounds like you want a scalar subquery, an expression written like (SELECT ....) (note enclosing parentheses) that contains a query returning either zero rows (NULL result) or one field from one row.
You were most of the way there, you just needed the parens:
UPDATE raw.geocoding
SET the_geom = ST_Centroid(
ST_Collect(
(SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C000081610F9CC5DC3341EE672E6E723B3241')::varchar)),
(SELECT the_geom
FROM raw.geocoding
WHERE hash = ((E'0101000020090C00002CF887E0C5DC3341C9E5B2DF2A383241')::varchar))
)
)
WHERE hash = ((E'3e638a27c6c38f05026252f4a0b57b2e')::varchar)
Note that subqueries can be used in other places too - table-returning subqueries can appear in FROM, for example. The PostgreSQL manual teaches about all this, and is well worth a cover-to-cover read.
If you're doing a lot of these updates, you may find it more efficient to formulate the UPDATE as a join using the PostgreSQL extension UPDATE ... FROM ... WHERE, rather than running lots of individual UPDATEs over and over. I just wanted to raise the possibility. See the from-list section of the UPDATE documentation.
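A hypothetical sketch of that form (the staging table and its columns are made up for illustration):
UPDATE raw.geocoding g
SET the_geom = ST_Centroid(ST_Collect(s.geom_a, s.geom_b))
FROM pending_pairs s  -- hypothetical table holding the two source geometries per target row
WHERE g.hash = s.target_hash;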