syntax error when calling plpgsql function - postgresql

I'm working on a project that involves calculating the percent of an industry cluster's cost structure that comes from in-region transportation costs. I'll have one table for each industry cluster with the detailed cost breakdown (naics, amount, inregion_amt), a lookup table transpo_industries with all transportation naics, and a summary table cluster_costs that I want to eventually contain each industry cluster's name (c_name), the total cost (tot_cost), and the in-region transportation costs (inregion_transpo). The table is already populated with all the industry names, which match the table names for the corresponding industry clusters.
Since I need to run through at least 15 industry clusters and would potentially like to re-run this code with smaller subsets of the data, I'm trying to create a function. The following code creates the function without error, but when I try to call it, I get a syntax error ("ERROR: syntax error at or near "clustercosts" SQL state: 42601")
Can anyone help point out where I'm going wrong?
create or replace function clustercosts(tblname text) RETURNS void
AS $$
BEGIN
EXECUTE 'update cluster_costs set tot_cost= (select sum(amount) from '||tblname||'), inregion_transpo = (select sum(inregion_amt) from '||tblname||', transpo_industries where '||tblname||'.naics=transpo_industries.naics) where c_name='||tblname||;
END;
$$ Language plpgsql;
A version using format() gives me the same error:
CREATE OR REPLACE FUNCTION udate_clustercosts(tblname text)
RETURNS void AS
$BODY$
BEGIN
EXECUTE format(
'update cluster_costs'
'set tot_cost= (select sum(amount)from %I),'
'inregion_transpo = (select sum(inregion_amt) from %I, transpo_industries where %I.naics=transpo_industries.naics)'
'where c_name=%I',tblname);
END;
$BODY$
LANGUAGE plpgsql;

Your problems start at the design stage. With a proper DB design you wouldn't need dynamic SQL for this to begin with.
I'll have one table for each industry cluster ...
Don't. This should be a single table (like cluster_details) with a FK column (like cluster_id) referencing the PK of the table listing industry clusters (like industry_cluster).
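A minimal sketch of that layout (table and column names as assumed above; data types are guessed from the question):
CREATE TABLE industry_cluster (
  cluster_id serial PRIMARY KEY
, name       text UNIQUE NOT NULL          -- 'Cluster 1', 'Cluster 2', ...
);
CREATE TABLE cluster_details (
  cluster_id   int NOT NULL REFERENCES industry_cluster
, naics        int
, amount       numeric
, inregion_amt numeric
);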
It's also questionable that you materialize a computed aggregate with your UPDATE. Use a VIEW (or function) instead to get current sums. Your base query would be something like:
SELECT ic.*
     , sum(cd.amount) AS sum_amount
     , sum(cd.inregion_amt) FILTER (WHERE ti.naics IS NOT NULL) AS sum_inregion_amt
FROM   industry_cluster ic
LEFT   JOIN cluster_details cd USING (cluster_id)
LEFT   JOIN transpo_industries ti ON ti.naics = cd.naics
WHERE  ic.name = 'Cluster 1'
GROUP  BY ic.cluster_id;
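Wrapped in a view (just a sketch; the view name and column aliases are my own, chosen to mirror the cluster_costs columns from the question):
CREATE VIEW cluster_costs_v AS
SELECT ic.name AS c_name
     , sum(cd.amount) AS tot_cost
     , sum(cd.inregion_amt) FILTER (WHERE ti.naics IS NOT NULL) AS inregion_transpo
FROM   industry_cluster ic
LEFT   JOIN cluster_details cd USING (cluster_id)
LEFT   JOIN transpo_industries ti ON ti.naics = cd.naics
GROUP  BY ic.cluster_id, ic.name;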
As for the question asked: since the error is triggered by the function call and the error message clearly references the function name, the problem lies with the call, which is missing in the question.
There are other problems in your function definition, as has been pointed out in the comments - none of which are related to the error message you presented.

You have been bitten by the fact that you want to use single quotes within a quoted string. You can avoid that using dollar-quoted string constants, as explained in the documentation.
The problem arises because you need a single quote inside the SQL statement: you want to pass the value of tblname as a string constant.
Here I use $a$ to quote within the function body, quoted with $$:
create or replace function clustercosts(tblname text) RETURNS void
AS $$
BEGIN
EXECUTE $a$update cluster_costs
   set tot_cost = (select sum(amount) from $a$ || tblname || $a$)
     , inregion_transpo = (select sum(inregion_amt)
                           from $a$ || tblname || $a$, transpo_industries
                           where $a$ || tblname || $a$.naics = transpo_industries.naics)
   where cluster_costs.c_name = '$a$ || tblname || $a$'$a$;
END;
$$ language plpgsql;
It's valid to insert nearly any identifier between the dollar signs, and this is a common pattern for nesting quotes in functions, exactly as in your case.
Example
I create the tables you describe:
create table tblname (naics int, amount int, inregion_amt int);
create table transpo_industries (naics int);
create table cluster_costs (c_name text, tot_cost int, inregion_transpo int);
testdb=> SELECT clustercosts('tblname');
clustercosts
--------------
(1 row)
No errors, SQL executed.
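For comparison, a format() version avoids the nested quoting entirely. The variant in the question has two separate problems: the adjacent string literals are concatenated without spaces, and four %I placeholders receive only a single argument. A minimal corrected sketch, assuming the same tables (the numbered form %1$I lets one argument be reused, and %L quotes the string comparison):
CREATE OR REPLACE FUNCTION update_clustercosts(tblname text)
RETURNS void AS
$func$
BEGIN
EXECUTE format(
  'UPDATE cluster_costs
   SET    tot_cost = (SELECT sum(amount) FROM %1$I)
        , inregion_transpo = (SELECT sum(inregion_amt)
                              FROM %1$I, transpo_industries
                              WHERE %1$I.naics = transpo_industries.naics)
   WHERE  c_name = %1$L', tblname);
END
$func$ LANGUAGE plpgsql;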

Related

Declare and return value for DELETE and INSERT

I am trying to remove duplicated data from some of our databases based upon unique id's. All deleted data should be stored in a separate table for auditing purposes. Since it concerns quite some databases and different schemas and tables I wanted to start using variables to reduce chance of errors and the amount of work it will take me.
This is the best example query I could think of, but it doesn't work:
do $$
declare #source_schema varchar := 'my_source_schema';
declare #source_table varchar := 'my_source_table';
declare #target_table varchar := 'my_target_schema' || source_table || '_duplicates'; --target schema and appendix are always the same, source_table is a variable input.
declare #unique_keys varchar := ('1', '2', '3')
begin
select into #target_table
from #source_schema.#source_table
where id in (#unique_keys);
delete from #source_schema.#source_table where export_id in (#unique_keys);
end ;
$$;
The query syntax works with hard-coded values.
Most of the time my variables are interpreted as column names or not recognized at all. :(
You need to create and then call a plpgsql procedure with input parameters:
CREATE OR REPLACE PROCEDURE duplicates_suppress
(my_target_schema text, my_source_schema text, my_source_table text, unique_keys text[])
LANGUAGE plpgsql AS
$$
BEGIN
EXECUTE FORMAT(
'WITH list AS (INSERT INTO %1$I.%3$I_duplicates SELECT * FROM %2$I.%3$I WHERE array[id] <@ %4$L :: integer[] RETURNING id)
DELETE FROM %2$I.%3$I AS t USING list AS l WHERE t.id = l.id', my_target_schema, my_source_schema, my_source_table, unique_keys :: text) ;
END ;
$$ ;
The procedure duplicates_suppress inserts into my_target_schema.my_source_table || '_duplicates' the rows from my_source_schema.my_source_table whose id is in the array unique_keys, and then deletes these rows from the table my_source_schema.my_source_table.
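A call would look like this (the values are just the placeholders from the question; the target table my_target_schema.my_source_table_duplicates must already exist):
CALL duplicates_suppress('my_target_schema', 'my_source_schema', 'my_source_table', ARRAY['1','2','3']);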
See the test result in dbfiddle.
As has been commented, you need some kind of dynamic SQL. In a FUNCTION, PROCEDURE or a DO statement to do it on the server.
You should be comfortable with PL/pgSQL. Dynamic SQL is no beginners' toy.
Example with a PROCEDURE, like Edouard already suggested. You'll need a FUNCTION instead if you want to wrap it in an outer transaction (like you very well might). See:
When to use stored procedure / user-defined function?
CREATE OR REPLACE PROCEDURE pg_temp.f_archive_dupes(_source_schema text, _source_table text, _unique_keys int[], OUT _row_count int)
LANGUAGE plpgsql AS
$proc$
-- target schema and appendix are always the same, source_table is a variable input
DECLARE
_target_schema CONSTANT text := 's2'; -- hardcoded
_target_table text := _source_table || '_duplicates';
_sql text := format(
'WITH del AS (
DELETE FROM %I.%I
WHERE id = ANY($1)
RETURNING *
)
INSERT INTO %I.%I TABLE del', _source_schema, _source_table
, _target_schema, _target_table);
BEGIN
RAISE NOTICE '%', _sql; -- debug
EXECUTE _sql USING _unique_keys; -- execute
GET DIAGNOSTICS _row_count = ROW_COUNT;
END
$proc$;
Call:
CALL pg_temp.f_archive_dupes('s1', 't1', '{1, 3}', 0);
db<>fiddle here
I made the procedure temporary, since I assume you don't need to keep it permanently. Create it once per session. See:
How to create a temporary function in PostgreSQL?
Passed schema and table names are case-sensitive strings! (Unlike unquoted identifiers in plain SQL.) Either way, be wary of SQL-injection when concatenating SQL dynamically. See:
Are PostgreSQL column names case-sensitive?
Table name as a PostgreSQL function parameter
Made _unique_keys type int[] (array of integer) since your sample values look like integers. Use the actual data type of your id columns!
The variable _sql holds the query string, so it can easily be debugged before actually executing. Using RAISE NOTICE '%', _sql; for that purpose.
I suggest to comment the EXECUTE line until you are sure.
I made the PROCEDURE return the number of processed rows. You didn't ask for that, but it's typically convenient. At hardly any cost. See:
Dynamic SQL (EXECUTE) as condition for IF statement
Best way to get result count before LIMIT was applied
Last, but not least, use DELETE ... RETURNING * in a data-modifying CTE. Since that has to find rows only once, it comes at about half the cost of separate SELECT and DELETE. And it's perfectly safe. If anything goes wrong, the whole transaction is rolled back anyway.
Two separate commands can also run into concurrency issues or race conditions which are ruled out this way, as DELETE implicitly locks the rows to delete. Example:
Replicating data between Postgres DBs
Or you can build the statements in a client program. Like psql, and use \gexec. Example:
Filter column names from existing table for SQL DDL statement
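A minimal psql sketch of that client-side approach (schemas s1/s2, tables t1/t2 and the ids are placeholders matching the example above; the *_duplicates tables must already exist):
SELECT format(
  'WITH del AS (DELETE FROM s1.%I WHERE id = ANY(%L::int[]) RETURNING *)
   INSERT INTO s2.%I SELECT * FROM del;'
 , tbl, '{1,3}', tbl || '_duplicates')
FROM   unnest(ARRAY['t1', 't2']) AS t(tbl) \gexec
Each statement produced by the SELECT is then executed in turn by \gexec.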
Based on Erwin's answer, minor optimization...
create or replace procedure pg_temp.p_archive_dump
(_source_schema text, _source_table text,
_unique_key int[],_target_schema text)
language plpgsql as
$$
declare
_row_count bigint;
_target_table text := '';
BEGIN
_target_table := quote_ident(_source_table) || '_' || array_to_string(_unique_key, '_');
raise notice 'the deleted records will be stored in %.%', _target_schema, _target_table;
execute format('create table %I.%I as select * from %I.%I limit 0',_target_schema, _target_table,_source_schema,_source_table );
execute format('with mm as ( delete from %I.%I where id = any (%L) returning * ) insert into %I.%I table mm'
,_source_schema,_source_table,_unique_key, _target_schema, _target_table);
GET DIAGNOSTICS _row_count = ROW_COUNT;
RAISE notice 'rows affected: %', _row_count;
end
$$;
--
If your _unique_key array is small, this solution also creates the target table for you (you still need to create the target schema yourself).
If the array is large, you may want to customize how the dumped table is named, since all key values are appended to the table name.
Let's call it.
call pg_temp.p_archive_dump('s1','t1', '{1,2}','s2');
s1 is the source schema, t1 is the source table, {1,2} are the unique keys you want to extract to the new table, and s2 is the target schema.

Error when creating a generated column in Postgresql

CREATE TABLE Person (
id serial primary key,
accNum text UNIQUE GENERATED ALWAYS AS (
concat(right(cast(extract year from current_date) as text), 2), cast(id as text)) STORED
);
Error: generation expression is not immutable
The goal is to populate the accNum field with YYid where YY is the last two digits of the year when the person was added.
I also tried the '||' operator but it was unsuccessful.
As you don't expect the column to be updated when the row is changed, you can define your own function that generates the number:
create function generate_acc_num(id int)
returns text
as
$$
select to_char(current_date, 'YY')||id::text;
$$
language sql
immutable; --<< this is lying to Postgres!
Note that you should never use this function for any other purpose. Especially not as an index expression.
Then you can use that in a generated column:
CREATE TABLE Person
(
id integer generated always as identity primary key,
acc_num text UNIQUE GENERATED ALWAYS AS (generate_acc_num(id)) STORED
);
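Quick usage sketch; the generated value depends on the year the row is inserted (e.g. '251' for id 1 in 2025):
INSERT INTO Person DEFAULT VALUES;
SELECT id, acc_num FROM Person;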
As @ScottNeville correctly mentioned:
CURRENT_DATE is not immutable. So it cannot be used in a GENERATED ALWAYS AS expression.
However, you can achieve this using a trigger nevertheless:
demo:db<>fiddle
CREATE FUNCTION accnum_trigger_function()
RETURNS TRIGGER
LANGUAGE PLPGSQL
AS $$
BEGIN
NEW.accNum := right(extract(year from current_date)::text, 2) || NEW.id::text;
RETURN NEW;
END
$$;
CREATE TRIGGER tr_accnum
BEFORE INSERT
ON person
FOR EACH ROW
EXECUTE PROCEDURE accnum_trigger_function();
As @a_horse_with_no_name correctly mentioned in the comments: You can simplify the expression to:
NEW.accNum := to_char(current_date, 'YY') || NEW.id;
I am not exactly sure how to solve this problem (maybe a trigger), but current_date is a stable function, not an immutable one. For generated columns, I believe all function calls must be immutable. You can read more here: https://www.postgresql.org/docs/current/xfunc-volatility.html
I don't think any function that gets the date can be immutable, as Postgres defines this as "An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever." This will not be true for anything that returns the current date.
I think your best bet would be to do this with a trigger, so that on insert it sets the value.

Execute a SELECT with dynamic ORDER BY expression inside a function

I'm trying to EXECUTE some SELECTs to use inside a function, my code is something like this:
DECLARE
result_one record;
BEGIN
EXECUTE 'WITH Q1 AS
(
SELECT id
FROM table_two
INNER JOINs, WHERE, etc, ORDER BY... DESC
)
SELECT Q1.id
FROM Q1
WHERE, ORDER BY...DESC';
RETURN final_result;
END;
I know how to do it in MySQL, but in PostgreSQL I'm failing. What should I change or how should I do it?
For a function to be able to return multiple rows it has to be declared as returns table() (or returns setof)
And to actually return a result from within a PL/pgSQL function you need to use return query (as documented in the manual)
To build dynamic SQL in Postgres it is highly recommended to use the format() function to properly deal with identifiers (and to make the source easier to read).
So you need something like:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
'with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc', p_sort_column);
end;
$$
language plpgsql;
Note that the order by inside the CTE is pretty much useless if you are sorting the final query unless you use a LIMIT or distinct on () inside the query.
You can make your life even easier if you use another level of dollar quoting for the dynamic SQL:
create or replace function get_data(p_sort_column text)
returns table (id integer)
as
$$
begin
return query execute
format(
$query$
with q1 as (
select id
from table_two
join table_three on ...
)
select q1.id
from q1
order by %I desc
$query$, p_sort_column);
end;
$$
language plpgsql;
What a_horse said. And:
How to return result of a SELECT inside a function in PostgreSQL?
Plus, to pick a column for ORDER BY dynamically, you have to add that column to the SELECT list of your CTE, which leads to complications if the column can be duplicated (like with passing 'id') ...
Better yet, remove the CTE entirely. There is nothing in your question to warrant its use anyway. (Only use CTEs when needed in Postgres; they are typically slower than equivalent subqueries or simple queries.)
CREATE OR REPLACE FUNCTION get_data(p_sort_column text)
RETURNS TABLE (id integer) AS
$func$
BEGIN
RETURN QUERY EXECUTE format(
$q$
SELECT t2.id -- assuming you meant t2?
FROM table_two t2
JOIN table_three t3 on ...
ORDER BY t2.%I DESC NULLS LAST -- see below!
$q$, $1);
END
$func$ LANGUAGE plpgsql;
I appended NULLS LAST - you'll probably want that, too:
PostgreSQL sort by datetime asc, null first?
If p_sort_column is from the same table all the time, hard-code that table name / alias in the ORDER BY clause. Else, pass the table name / alias separately and auto-quote them separately to be safe:
Define table and column names as arguments in a plpgsql function?
I suggest to table-qualify all column names in a bigger query with multiple joins (t2.id not just id). Avoids various kinds of surprising results / confusion / abuse.
And you may want to schema-qualify your table names (myschema.table_two) to avoid similar troubles when calling the function with a different search_path:
How does the search_path influence identifier resolution and the "current schema"
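Either version is then called like this (the column name here is hypothetical; it must exist in table_two):
SELECT * FROM get_data('created_at');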

Dynamic table name for INSERT INTO query

I am trying to figure out how to write an INSERT INTO query with table name and column name of the source as parameter.
For starters I was just trying to parametrize the source table name. I have written the following query. For now I am declaring and assigning the value of the variable tablename directly, but in actual example it would come from some other source/list. The target table has only one column.
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
tablename text;
BEGIN
tablename := 'Table_1';
EXECUTE 'INSERT INTO "Schemaname"."targettable"
SELECT "Col_A"
FROM "schemaname".'
||quote_ident(tablename);
END
$$ LANGUAGE PLPGSQL;
Although the query runs without any error, no changes are reflected in the target table. On running the query I get the following output.
Query OK, 0 rows affected (execution time: 296 ms; total time: 296 ms)
I want the changes to be reflected in the target table. I don't know how to resolve the problem.
Audited code
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$func$
DECLARE
_tbl text := 'Table_1'; -- or 'table_1'?
BEGIN
EXECUTE 'INSERT INTO schemaname.targettable(column_name)
SELECT "Col_A"
FROM schemaname.' || quote_ident(_tbl); -- or "Schemaname"?
END
$func$ LANGUAGE plpgsql;
Always use an explicit target list for persisted INSERT statements.
You can assign variables at declare time.
It's a widespread folly to use double-quoted identifiers to preserve otherwise illegal spelling. You have to keep double-quoting the name for the rest of its existence. One or more of those errors seem to have crept into your code: "Schemaname" or "schemaname"? Table_1 or "Table_1"?
Are PostgreSQL column names case-sensitive?
When you provide an identifier like a table name as a text parameter and escape it with quote_ident(), it is case-sensitive!
Identifiers in SQL code are folded to lower case unless double-quoted. But quote_ident() (which you must use to defend against SQL injection) preserves the spelling you provide, adding double quotes where necessary.
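A quick illustration of that behavior:
SELECT quote_ident('table_1') AS lower_case   -- table_1     (no quoting needed)
     , quote_ident('Table_1') AS mixed_case;  -- "Table_1"   (double-quoted, case preserved)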
Function with parameter
CREATE OR REPLACE FUNCTION foo(_tbl text)
RETURNS void AS
$func$
BEGIN
EXECUTE 'INSERT INTO schemaname.targettable(column_name)
SELECT "Col_A"
FROM schemaname.' || quote_ident(_tbl);
END
$func$ LANGUAGE plpgsql;
Call:
SELECT foo('tablename'); -- tablename is case sensitive
There are other ways:
Table name as a PostgreSQL function parameter

Putting EXPLAIN results into a table?

I see from the Postgres 8.1 docs that EXPLAIN generated something like tabular data:
Prior to PostgreSQL 7.3, the plan was emitted in the form of a NOTICE
message. Now it appears as a query result (formatted like a table with
a single text column).
I'm working with 9.0, for which, the docs say, the output can be a variety of types, including TEXT and XML. What I'd really like to do is treat the output as a standard query result so that I could generate a simple report for a query or a set of queries, e.g.,
SELECT maxcost FROM (
EXPLAIN VERBOSE
SELECT COUNT(*)
FROM Mytable
WHERE value>17);
The above doesn't work in any form that I've tried, and I made up the attribute maxcost to demonstrate how neat it would be to pull out specific bits of data (in this case, the maximum estimated cost of the query). Is there anything I can do that would get me part of the way there? I'd prefer to be able to work within a simple SQL console.
No other answers so far, so here's my own stab at it.
It's possible to read the results of EXPLAIN into a variable within plpgsql, and since the output can be in XML, one can wrap EXPLAIN in a stored function to yield the top-level costs using xpath:
CREATE OR REPLACE FUNCTION estimate_cost(IN query text,
OUT startup numeric,
OUT totalcost numeric,
OUT planrows numeric,
OUT planwidth numeric)
AS
$BODY$
DECLARE
query_explain text;
explanation xml;
nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain := 'EXPLAIN (FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
Hence the example from the question becomes:
SELECT totalcost
FROM estimate_cost('SELECT COUNT(*)
FROM Mytable
WHERE value>17');
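All four OUT parameters come back as one result row, so you can also select everything at once (the actual numbers depend on your table and statistics):
SELECT * FROM estimate_cost('SELECT COUNT(*) FROM Mytable WHERE value>17');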
This message suggests using "FOR rec IN EXECUTE EXPLAIN..." which seems to do the trick e.g.:
drop table if exists temp_a;
create temp table temp_a
(
"QUERY PLAN" text
);
DO
$$
DECLARE
rec record;
BEGIN
FOR rec IN EXECUTE 'EXPLAIN VERBOSE select version()'
LOOP
-- RAISE NOTICE 'rec=%', row_to_json(rec);
insert into temp_a
select rec."QUERY PLAN";
END LOOP;
END
$$ language plpgsql;
select *
from temp_a;