Query Execution in Postgresql - postgresql

How to execute query within format() function in Postgres? can any one please guide me.
IF NOT EXISTS ( SELECT format ('select id
from '||some_table||' where emp_id
='||QUOTE_LITERAL(m_emp_id)||' )
) ;

You may combine EXECUTE FORMAT and an IF condition using GET DIAGNOSTICS
Here's an example which you can reuse with minimal changes.I have passed table name using %I identifier and parameterised argument ($1) for employee_id. This is safe against SQL Injection.LIMIT 1 is used since we are interested if at least one row exists. This will improve query performance and is equivalent(or efficient) to using EXISTS for huge datasets with multiple matching rows.
DO $$
DECLARE
some_table TEXT := 'employees';
m_emp_id INT := 100;
cnt INT;
BEGIN
EXECUTE format ('select 1
from %I where employee_id
= $1 LIMIT 1',some_table ) USING m_emp_id ;
GET DIAGNOSTICS cnt = ROW_COUNT;
IF cnt > 0
THEN
RAISE NOTICE 'FOUND';
ELSE
RAISE NOTICE 'NOT FOUND';
END IF;
END
$$;
Result
NOTICE: FOUND
DO

The #Kaushik Nayak reply is correct - I will try just little bit more explain this issue.
PLpgSQL knows two types of queries to database:
static (embedded) SQL - SQL is written directly to plpgsql code. Static SQL can be parametrized (can use a varible), but the parameters should be used as table or column identifier. The semantic (and execution plan) should be same every time.
dynamic SQL - this style of queries is similar to classic client side SQL - SQL is written as string expression (that is evaluated) in running time, and result of this string expression is evaluated as SQL query. There are not limits for parametrization, but there is a risk of SQL injections (parameters must be sanitized against SQL injection (good examples are in #Kaushik Nayak reply)). More, there is a overhead with every time replanning - so dynamic SQL should be used only when it is necessary. Dynamic SQL is processed by EXECUTE command. The syntax of IF commands is:
IF expression THEN
statements; ...
END IF;
You cannot to place into expression PLpgSQL statements. So this task should be divided to more steps:
EXECUTE format(...) USING m_emp_id;
GET DIAGNOSTICS rc = ROW_COUNT;
IF cnt > 0 THEN
...
END IF;

Related

What Identifier in "EXECUTE FORMAT" with "FOR ... IN" loop "record" variable/iterator that uses dot/period "." for Column name in PLPGSQL Procedure?

I am using a basic plpgsql EXECUTE... FORMAT dynamic script with a FOR...IN loop. The problem is, the LOOP has a variable/pointer (like any FOR LOOP in any language) that iterates through the query result set of the SELECT... query in the EXECUTE. I need to make the column name part after the period/dot in temprecord.column_name dynamic, and thus use an Identifier (i.e. %I, %s, %L) for it, As you know, the way to get data (In this column is an ALTER TABLE sql statement) out of this variable/pointer is to use the "dot notation", that is, i.e.
/* Output employee names from column "names" in Employee table */
FOR temprecord IN
EXECUTE format('SELECT *
FROM %I t', ''Employee'')
LOOP
EXECUTE temprecord.names; -- THIS WORKS FINE WHEN I HARDCODE IT. I CAN'T SEEM TO MAKE IT DYNAMIC
END LOOP;
So the above works fine when I hardcord temprecord.names. The problem is I want the column name dynamic, so if different callers/methods call my function, I can select different columns through the temprecord iterator, and EXECUTE this data.
I tried many times what I have below and the best response I have gotten so far is that the sql query that I have in the column (as I stated above) executed, however, it showed an syntax error and truncation error, but I clearly noticed it was returned the all three columns of my table and concatenating the columns together. But I know the real problem is the variable temprecord not picking up on the dot notation that specifies column name i.e. temprecord.column_name_here, that's why it returns all columns and throws truncation here. As I stated above, it works when hardcoded.
i.e.
/* Using the $$ for string formatting here */
CREATE OR REPLACE PROCEDURE my_proc(drop_or_add text)
LANGUAGE plpgsql
AS
$procedure$
DECLARE
temprecord record;
col_nm text;
BEGIN
col_nm := concat_ws('_','sql',drop_or_add); -- SHOULD CONCATENATE TO COLUMN NAME, I.E. sql_drop OR sql_add WHICH HOLDS SQL "ALTER TABLE..." QUERY
FOR temprecord IN
EXECUTE format($f$ SELECT t.col1, t.col2, t.col3%s
FROM some_tbl t
WHERE t.col4 = %L, drop_or_add, 'blue')
LOOP
EXECUTE format($f$ %I.%s $f$, "temprecord", col_nm);
END LOOP;
END
$procedure$;
Hopefully none of this syntax is confusing, nor my logic. Again, I am simply looping with the temprecord iterator variable and trying to access some columns of my database table, that holds some SQL statements and needs to be dynamic based on the argument passed to my Procedure. So, potentially EXECUTE temprecord.sql_drop or EXECUTE temprecord.sql_add can execute.
I always used the %s in my other plpgsql scripts when I access a column of table using dot notation i.e. tbl1.id = tbl2.% and it's always worked fine. I have tried basically every combination possible, but I'll list a few here, i.e.
EXECUTE format($f$ %I.%s $f$, "temprecord", col_nm);
EXECUTE format($f$ %I%s $f$, "temprecord", '.sql_drop');
EXECUTE format($f$ %I.%s $f$, temprecord, col_nm);
EXECUTE format($f$ %I.%I $f$, "temprecord", col_nm);
EXECUTE format($f$ %I.%I $f$, "temprecord", "col_nm");
.
..
...
the string passed to EXECUTE statement have to be valid SQL command. It cannot be an expression.
The PL/pgSQL is almost static language - there are some dynamic features, but almost it is static typed language. Overusing EXECUTE statement is generally bad idea. It can work, but the code will be very unreadable and not well maintainable.
For dynamic access to record, the best way is (today) using transformation to json:
postgres=# create or replace function fx(nm varchar(64))
returns void as $$
declare
r record;
r2 record;
begin
for r in execute format('select v as %I from generate_series(1,3) g(v)', nm)
loop
raise notice '%', (row_to_json(r))->>nm;
end loop;
end;
$$ language plpgsql;
CREATE FUNCTION
postgres=# select fx('foo');
NOTICE: 1
NOTICE: 2
NOTICE: 3
┌────┐
│ fx │
╞════╡
│ │
└────┘
(1 row)
I was able to find a solution, however, I absolutely tried every combination of Identifiers and I could not get this to work when concatenating the record iterator and the column name in the EXECUTE FORMAT block. So my solution was just stick stick a CASE statement before the second EXECUTE... and then directly pass that String variable to the EXECUTE and use only one identifier. It seemed like the problem was any time I was trying to divide/split up the temprecord.column_name inside the FORMAT i.e. EXECUTE FORMAT($f$ %I.$s $f$, temprecord,column_name), it would just not work. However, I am able to do this with regular Tables and Columns in SELECT.. FROM WHERE t1.col1 = t2.%, so I'm not sure if its just different with the record variable because its part of a FOR... IN LOOP or something.
Solution
CASE
WHEN drop_or_add ILIKE '%drop%' THEN
v_exect_sql := temprow.sql_drop;
WHEN drop_or_add ILIKE '%add%' THEN
v_exect_sql := temprow.sql_add;
END CASE;
BEGIN
EXECUTE format($f$ %s $f$, v_exect_sql); -- Have to pass the `temprecord.column_name` all as one Identifier or would NOT work for me

use variable as the result dataset

I'm new to PostgreSql, used to work with SQL Server.
I'm having trouble understanding one scenario, quite simple from Sql Server perspective.
I need to have as the query result (e.g. dataset) the value of a variable.
I Sql Server i'd have something like this:
Declare #cnt int
Delete from MyTable
set #cnt = ##ROWCOUNT
select #cnt as FinalResult
The expected result was to use the value of the variable (e.g. #cnt as my resultset).
I tried the same in PostgreSql, using the follwoing code:
DO $$
DECLARE
affected_rows integer;
BEGIN
delete from mySchema.MyTable;
GET DIAGNOSTICS affected_rows := ROW_COUNT;
raise notice 'affected records: %', affected_rows;
select affected_rows as result;
END $$;
Obviously it throws me an error.
Could you please advise on how to approach it?
Many thanks.
I assume the error you get is "query has no destination for result data" which is caused by the select affected_rows as result; line. In Postgres' PL/pgSQL (and essentially every other procedural language in relational databases - except for T-SQL), the result of a query needs to be stored somewhere.
A do block is like a temporary function that does not return anything, so the only way to give feedback from there, is to use raise notice to print the number of deleted rows. You can't have a SELECT statement returning something from a DO block.
But you don't need PL/pgSQL for that to begin with:
with deleted as (
delete from table
returning *
)
select count(*) as final_result
from deleted;

Calculate number of rows affected by batch query in PostgreSQL

First of all, yes I've read documentation for DO statement :)
http://www.postgresql.org/docs/9.1/static/sql-do.html
So my question:
I need to execute some dynamic block of code that contains UPDATE statements and calculate the number of all affected rows. I'm using Ado.Net provider.
In Oracle the solution would have 4 steps:
add InputOutput parameter "N" to command
add BEGIN ... END; to command
add :N := :N + sql%rowcount after each statement.
It's done! We can read N parameter from command, after execute it.
How can I do it with PostgreSQL? I'm using npgsql provider but can migrate to devard if it helps.
DO statement blocks are good to execute dynamic SQL. They are no good to return values. Use a plpgsql function for that.
The key statement you need is:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Example code:
CREATE OR REPLACE FUNCTION f_upd_some()
RETURNS integer AS
$func$
DECLARE
ct int;
i int;
BEGIN
EXECUTE 'UPDATE tbl1 ...'; -- something dynamic here
GET DIAGNOSTICS ct = ROW_COUNT; -- initialize with 1st count
UPDATE tbl2 ...; -- nothing dynamic here
GET DIAGNOSTICS i = ROW_COUNT;
ct := ct + i; -- add up
RETURN ct;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_upd_some();
My solution is quite simple. In Oracle I need to use variables to calculate the sum of updated rows because command.ExecuteNonQuery() returns only the count of rows affected by the last UPDATE in the batch.
However, npgsql returns the sum of all rows updated by all UPDATE queries. So I only need to call command.ExecuteNonQuery() and get the result without any variables. Much easier than with Oracle.

Why does PostgreSQL treat my query differently in a function?

I have a very simple query that is not much more complicated than:
select *
from table_name
where id = 1234
...it takes less than 50 milliseconds to run.
Took that query and put it into a function:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY SELECT *
FROM table_name
where id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
This function when executed select * from pie(123); takes 22 seconds.
If I hard code an integer in place of id_param, the function executes in under 50 milliseconds.
Why does the fact that I am using a parameter in the where statement cause my function to run slow?
Edit to add concrete example:
CREATE TYPE test_type AS (gid integer, geocode character varying(9))
CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
RETURNS SETOF test_type AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = $1
AND geo_type = 1
GROUP BY geography, gid, geocode' USING geocode_param;
END;
$BODY$
LANGUAGE plpgsql STABLE;
ALTER FUNCTION geocode_carrier_route_by_geocode(character)
OWNER TO root;
--Runs in 20 seconds
select * from geocode_route_by_geocode('999xyz');
--Runs in 10 milliseconds
SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = '9999xyz'
AND geo_type = 1
GROUP BY geography, gid, geocode
Update in PostgreSQL 9.2
There was a major improvement, I quote the release notes here:
Allow the planner to generate custom plans for specific parameter
values even when using prepared statements (Tom Lane)
In the past, a prepared statement always had a single "generic" plan
that was used for all parameter values, which was frequently much
inferior to the plans used for non-prepared statements containing
explicit constant values. Now, the planner attempts to generate custom
plans for specific parameter values. A generic plan will only be used
after custom plans have repeatedly proven to provide no benefit. This
change should eliminate the performance penalties formerly seen from
use of prepared statements (including non-dynamic statements in
PL/pgSQL).
Original answer for PostgreSQL 9.1 or older
A plpgsql functions has a similar effect as the PREPARE statement: queries are parsed and the query plan is cached.
The advantage is that some overhead is saved for every call.
The disadvantage is that the query plan is not optimized for the particular parameter values it is called with.
For queries on tables with even data distribution, this will generally be no problem and PL/pgSQL functions will perform somewhat faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, chose a better query plan for the particular values, you may end up with a sub-optimal query plan. Try an SQL function or use dynamic SQL with EXECUTE to force a the query to be re-planned for every call. Could look like this:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM table_name
where id = $1'
USING id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
Edit after comment:
If this variant does not change the execution time, there must be other factors at play that you may have missed or did not mention. Different database? Different parameter values? You would have to post more details.
I add a quote from the manual to back up my above statements:
An EXECUTE with a simple constant command string and some USING
parameters, as in the first example above, is functionally equivalent
to just writing the command directly in PL/pgSQL and allowing
replacement of PL/pgSQL variables to happen automatically. The
important difference is that EXECUTE will re-plan the command on each
execution, generating a plan that is specific to the current parameter
values; whereas PL/pgSQL normally creates a generic plan and caches it
for re-use. In situations where the best plan depends strongly on the
parameter values, EXECUTE can be significantly faster; while when the
plan is not sensitive to parameter values, re-planning will be a
waste.

Putting EXPLAIN results into a table?

I see from the Postgres 8.1 docs that EXPLAIN generated something like tabular data:
Prior to PostgreSQL 7.3, the plan was emitted in the form of a NOTICE
message. Now it appears as a query result (formatted like a table with
a single text column).
I'm working with 9.0, for which, the docs say, the output can be a variety of types, including TEXT and XML. What I'd really like to do is treat the output as a standard query result so that I could generate a simple report for a query or a set of queries, e.g.,
SELECT maxcost FROM (
EXPLAIN VERBOSE
SELECT COUNT(*)
FROM Mytable
WHERE value>17);
The above doesn't work in any form that I've tried, and I made up the attribute maxcost to demonstrate how neat it would be to pull out specific bits of data (in this case, the maximum estimated cost of the query). Is there anything I can do that would get me part of the way there? I'd prefer to be able to work within a simple SQL console.
No other answers so far, so here's my own stab at it.
It's possible to read the results of explain into a variable within plpgsql, and since the output can be in XML one can wrap EXPLAIN in a stored function to yield the top-level costs using xpath:
CREATE OR REPLACE FUNCTION estimate_cost(IN query text,
OUT startup numeric,
OUT totalcost numeric,
OUT planrows numeric,
OUT planwidth numeric)
AS
$BODY$
DECLARE
query_explain text;
explanation xml;
nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain :=e'EXPLAIN(FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
Hence the example from the question becomes:
SELECT totalcost
FROM estimate_cost('SELECT COUNT(*)
FROM Mytable
WHERE value>17');
This message suggests using "FOR rec IN EXECUTE EXPLAIN..." which seems to do the trick e.g.:
drop table if exists temp_a;
create temp table temp_a
(
"QUERY PLAN" text
);
DO
$$
DECLARE
rec record;
BEGIN
FOR rec IN EXECUTE 'EXPLAIN VERBOSE select version()'
LOOP
-- RAISE NOTICE 'rec=%', row_to_json(rec);
insert into temp_a
select rec."QUERY PLAN";
END LOOP;
END
$$ language plpgsql;
select *
from temp_a;