How to identify slow queries in PostgreSQL function? - postgresql

How can I identify slow queries in a Postgres function?
For example:
CREATE OR REPLACE FUNCTION my_function ()
RETURNS void AS $$
BEGIN
query#1;
query#2; --> slow query (durration 4 sec)
query#3;
query#4;
END
$$ LANGUAGE plpgsql;
After executing my_function() I get something like this in my Postgres log file:
duration: 4.904 ms statement: select my_function ();",,,,,,,,,"psql"
So I can't identify slow queries in my function.

By default, PL/pgSQL functions are black boxes to the Postgres query planner and logging.
The additional module auto_explain allows for more insights. It can be loaded dynamically, but you have to be a superuser. (Does not have to be installed like most other modules.)
To load it an individual session:
LOAD 'auto_explain';
-- SET auto_explain.log_min_duration = 1; -- exclude very fast trivial queries?
SET auto_explain.log_nested_statements = ON; -- statements inside functions
-- SET auto_explain.log_analyze = ON; -- get actual times, too?
Any query running in the same session will get extra logging. Just:
SELECT my_function(); -- your function
See:
Postgres query plan of a function invocation written in plpgsql

Related

PostgreSQL - dblink_exec statement_timeout

I have a function that used dblink_exec inside. Sometimes this execution takes a long time and in some cases produces deadlocks.
I know that I can set local lock_timeout and local statement_timeout but when I do this inside a function, It doesn't take any effect over dblink_exec.
Is that any way to set those parameters on dblink_connect? I know that I can set these two parameters on .conf file but I want to do this locally (per connection).
Is this possible?
Thanks!
Julio
It is quite ugly, but you can specify the statement timeout (to be implemented on the remote side) in the connection string, using the options construct.
create or replace function dblink_timeout() returns int language plpgsql as $$
declare xx int;
begin
select x into xx from
dblink(
'dbname=jjanes options = ''-c statement_timeout=500 -c lock_timeout=100''',
'select count(*) from pgbench_accounts'
) as (x int);
return xx;
end
$$;

How to use DO in postgres

I am attempting to get a better understanding of the DO command in postgreSQL 9.1
I have following code block,
DO
$do$
BEGIN
IF 1=1 THEN
SELECT 'foo';
ELSE
SELECT 'bar';
END IF;
END
$do$
However it returns the following error:
[42601] ERROR: query has no destination for result data Hint: If you want to discard the results of a SELECT, use PERFORM instead. Where: PL/pgSQL function "inline_code_block" line 4 at SQL statement
PostgreSQL DO command creates and executes some specific short life function. This function has not any interface, and then it cannot to return any output other then changes data in tables and some debug output.
Your example has more than one issues:
Only PostgreSQL table functions can returns some tabular data. But the mechanism is significantly different than MSSQL. PostgreSQL user's should to use RETURN NEXT or RETURN QUERY commands.
CREATE OR REPLACE FUNCTION foo(a int)
RETURNS TABLE(b int, c int) AS $$
BEGIN
RETURN QUERY SELECT i, i + 1 FROM generate_series(1,a) g(i);
END;
$$ LANGUAGE plpgsql;
SELECT * FROM foo(10);
DO anonymous functions are not table functions - so no output is allowed.
PostgreSQL 9.1 is not supported version, please upgrade
If you have some experience only from MSSQL, then forget it. A concept of stored procedures of PostgreSQL is very similar to Oracle or DB2, and it is significantly different to MSSQL does. T-SQL integrates procedural and SQL constructs to one set. Oracle, PostgreSQL, ... procedural functionality can embedded SQL, but procedural functionality is not integrated to SQL.
Please, read PostgreSQL PLpgSQL documentation for better imagine about this technology.

VACUUM cannot be executed from a function or multi-command string

I have written a script, using PL/pgSQL, that I run in pgAdmin III. The script deletes existing DB contents and then adds a bunch of "sample" data for the desired testing scenario (usually various types of load tests). Once the data is loaded, I would like to "vacuum analyze" the affected tables, both to recover the space from the deleted records and to accurately reflect the new contents.
I can use various workarounds (e.g. do the VACUUM ANALYZE manually, include drop/create statements for the various structures within the script, etc.) But, what I would really like to do is:
DO $$
BEGIN
-- parent table
FOR i IN 1..10000 LOOP
INSERT INTO my_parent_table( ... ) VALUES ...;
END LOOP;
VACUUM ANALYZE my_parent_table;
-- child table
FOR i IN 1..50000 LOOP
INSERT INTO my_child_table( ... ) VALUES ...;
END LOOP;
VACUUM ANALYZE my_child_table;
END;
$$;
When I run this, I get:
ERROR: VACUUM cannot be executed from a function or multi-command string
So then I tried moving the vacuum statements to the end like so:
DO $$
BEGIN
-- parent table
FOR i IN 1..10000 LOOP
INSERT INTO my_parent_table( ... ) VALUES ...;
END LOOP;
-- child table
FOR i IN 1..50000 LOOP
INSERT INTO my_child_table( ... ) VALUES ...;
END LOOP;
END;
$$;
VACUUM ANALYZE my_parent_table;
VACUUM ANALYZE my_child_table;
This give me the same error. Is there any way I can incorporate the vacuum analyze into the same script that adds the data?
I am using PostgreSQL v 9.2.
If you are running this from the pgAdmin3 query window with the "execute query" button, that sends the entire script to the server as one string.
If you execute it from the query window "execute pgScript" button, that sends the commands separately and so would work, except that it does not tolerate the DO syntax for anonymous blocks. You would have to create a function that does the currently anonymous work, and then invoke the "execute pgScript" with something like:
select inserts_function();
VACUUM ANALYZE my_parent_table;
VACUUM ANALYZE my_child_table;
The final solution I implemented ended up being a composite of suggestions made in the comments by a_horse_with_no_name and xzilla:
Using TRUNCATE instead of DELETE avoids the need for a VACUUM
ANALYZE can then be used by itself in-line during the script as needed

Changing message output in PostgreSQL and pgAdminIII

This is a small thing, but it's a bit annoying to me and seems like there is probably a way to configure it. Let's say I have the following:
CREATE OR REPLACE FUNCTION baz()
RETURNS void AS
$BODY$
DECLARE
BEGIN
RAISE NOTICE 'I also did some work!';
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE OR REPLACE FUNCTION bar()
RETURNS void AS
$BODY$
DECLARE
BEGIN
RAISE NOTICE 'I did a bunch of work and want you to know about it';
PERFORM baz();
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$BODY$
DECLARE
BEGIN
PERFORM bar();
END;
$BODY$
LANGUAGE plpgsql VOLATILE
COST 100;
select foo();
I get the following output messages:
NOTICE: I did a bunch of work and want you to know about it
CONTEXT: SQL statement "SELECT bar()"
PL/pgSQL function "foo" line 4 at PERFORM
NOTICE: I also did some work!
CONTEXT: SQL statement "SELECT baz()"
PL/pgSQL function "bar" line 5 at PERFORM
SQL statement "SELECT bar()"
PL/pgSQL function "foo" line 4 at PERFORM
Total query runtime: 31 ms.
1 row retrieved.
What I want (usually) is to just see something like:
NOTICE: I did a bunch of work and want you to know about it
NOTICE: I also did some work!
Total query runtime: 31 ms.
1 row retrieved.
Is there a way to control/alter this? Again, it's a small thing and hardly worth a question on Stackoverflow, but if you have a lot of stuff going on it starts to introduce a lot of "noise" into the output and it makes my already overloaded brain hurt trying to sift through it. :)
I'm using PostgreSQL 9.1.5 with pgAdminIII 1.16.0
Try connecting with -q option. The q stands for Quiet and may be what you need.
psql -q -d foo_db
you may also try:
\set verbosity terse
\set quiet on
If you want more control over messaging and interface, it might be worth moving to a real scripting language. Writing database scripts in tools like Python with psycopg2 or Perl with DBD::Pg and DBI is pretty simple.
Not only does using a real scripting language give you total control over messaging, but it also gives you control over error handling, gives you loops and other control structures, and generally offers a much nicer model than raw SQL for scripting tasks.

Why does PostgreSQL treat my query differently in a function?

I have a very simple query that is not much more complicated than:
select *
from table_name
where id = 1234
...it takes less than 50 milliseconds to run.
Took that query and put it into a function:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY SELECT *
FROM table_name
where id = id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
This function when executed select * from pie(123); takes 22 seconds.
If I hard code an integer in place of id_param, the function executes in under 50 milliseconds.
Why does the fact that I am using a parameter in the where statement cause my function to run slow?
Edit to add concrete example:
CREATE TYPE test_type AS (gid integer, geocode character varying(9))
CREATE OR REPLACE FUNCTION geocode_route_by_geocode(geocode_param character)
RETURNS SETOF test_type AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = $1
AND geo_type = 1
GROUP BY geography, gid, geocode' USING geocode_param;
END;
$BODY$
LANGUAGE plpgsql STABLE;
ALTER FUNCTION geocode_carrier_route_by_geocode(character)
OWNER TO root;
--Runs in 20 seconds
select * from geocode_route_by_geocode('999xyz');
--Runs in 10 milliseconds
SELECT gs.geo_shape_id AS gid,
gs.geocode
FROM geo_shapes gs
WHERE geocode = '9999xyz'
AND geo_type = 1
GROUP BY geography, gid, geocode
Update in PostgreSQL 9.2
There was a major improvement, I quote the release notes here:
Allow the planner to generate custom plans for specific parameter
values even when using prepared statements (Tom Lane)
In the past, a prepared statement always had a single "generic" plan
that was used for all parameter values, which was frequently much
inferior to the plans used for non-prepared statements containing
explicit constant values. Now, the planner attempts to generate custom
plans for specific parameter values. A generic plan will only be used
after custom plans have repeatedly proven to provide no benefit. This
change should eliminate the performance penalties formerly seen from
use of prepared statements (including non-dynamic statements in
PL/pgSQL).
Original answer for PostgreSQL 9.1 or older
A plpgsql functions has a similar effect as the PREPARE statement: queries are parsed and the query plan is cached.
The advantage is that some overhead is saved for every call.
The disadvantage is that the query plan is not optimized for the particular parameter values it is called with.
For queries on tables with even data distribution, this will generally be no problem and PL/pgSQL functions will perform somewhat faster than raw SQL queries or SQL functions. But if your query can use certain indexes depending on the actual values in the WHERE clause or, more generally, chose a better query plan for the particular values, you may end up with a sub-optimal query plan. Try an SQL function or use dynamic SQL with EXECUTE to force a the query to be re-planned for every call. Could look like this:
CREATE OR REPLACE FUNCTION pie(id_param integer)
RETURNS SETOF record AS
$BODY$
BEGIN
RETURN QUERY EXECUTE
'SELECT *
FROM table_name
where id = $1'
USING id_param;
END
$BODY$
LANGUAGE plpgsql STABLE;
Edit after comment:
If this variant does not change the execution time, there must be other factors at play that you may have missed or did not mention. Different database? Different parameter values? You would have to post more details.
I add a quote from the manual to back up my above statements:
An EXECUTE with a simple constant command string and some USING
parameters, as in the first example above, is functionally equivalent
to just writing the command directly in PL/pgSQL and allowing
replacement of PL/pgSQL variables to happen automatically. The
important difference is that EXECUTE will re-plan the command on each
execution, generating a plan that is specific to the current parameter
values; whereas PL/pgSQL normally creates a generic plan and caches it
for re-use. In situations where the best plan depends strongly on the
parameter values, EXECUTE can be significantly faster; while when the
plan is not sensitive to parameter values, re-planning will be a
waste.