how to find out memory consumption of sql statements inside PL/pgSQL function without using auto-explain - plpgsql

I have a dynamic PL/pgSQL function with input parameters like (source table name, target table name, timestamps). It consists of few select statements and creation of temp tables, update/insert in final table.
I need to find out memory allocation and consumption of individual sql statements when this function is executed. Basically I need an 'explain analyze' kind of o/p for a PL/pgSQL function.
do not have access to auto-explain.

Related

How do I chain a VACUUM off of a purge routine running with pg_cron?

Postgres 13.4
I've got some pg_cron jobs set up to periodically delete older records out of log-like files. What I'd like to do is to run VACUUM ANALYZE after performing a purge. Unfortunately, I can't work out how to do this in a stored function. Am I missing a trick? Is a stored procedure more appropriate?
As an example, here's one of my purge routines
CREATE OR REPLACE FUNCTION dba.purge_event_log (
retain_days_in integer_positive default 14)
RETURNS int4
AS $BODY$
WITH -- Use a CTE so that we've got a way of returning the count easily.
deleted AS (
-- Normal-looking code for this requires a literal:
-- where your_dts < now() - INTERVAL '14 days'
-- Don't want to use a literal, SQL injection, etc.
-- Instead, using a interval constructor to achieve the same result:
DELETE
FROM dba.event_log
WHERE dts < now() - make_interval (days => $1)
RETURNING *
),
----------------------------------------
-- Save details to a custom log table
----------------------------------------
logit AS (
insert into dba.event_log (name, details)
values ('purge_event_log(' || retain_days_in::text || ')',
'count = ' || (select count(*)::text from deleted)
)
)
----------------------------------------
-- Return result count
----------------------------------------
select count(*) from deleted;
$BODY$
LANGUAGE sql;
COMMENT ON FUNCTION dba.purge_event_log (integer_positive) IS
'Delete dba.event_log records older than the day count passed in, with a default retention period of 14 days.';
The truth is, I don't really care about the count(*) result from this routine, in this case. But I might want a result and an additional action in some other, similar context. As you can see, the routine deletes records, uses a CTE to insert a report into another table, and then returns a result. No matter what, I figure this example is a good way to get me head around the alternatives and options in stored functions. The main thing I want to achieve here is to delete records, and then run maintenance. if this is an awkward fit for a stored function or procedure, I could write out an entry to a vacuum_list table with the table name, and have another job to run though that list.
If there's a smarter way to approach vacuum without the extra, I'm of course interested in that. But I'm also interested in understanding the limits on what operationa you can combine in PL/PgSQL routines.
Pavel Stehule' answer is correct and complete. I decided to follow-up a bit here as I like to dig in on bugs in my code, behaviors in Postgres, etc. to get a better sense of what I'm dealing with. I'm including some notes below for anyone who finds them of use.
COMMAND cannot be executed...
The reference to "VACUUM cannot be executed inside a transaction block" gave me a better way to search the docs for similarly restricted commands. The information below probably doesn't cover everything, but it's a start.
Command Limitation
CREATE DATABASE
ALTER DATABASE If creating a new table space.
DROP DATABASE
CLUSTER Without any parameters.
CREATE TABLESPACE
DROP TABLESPACE
REINDEX All in system catalogs, database, or schema.
CREATE SUBSCRIPTION When creating a replication slot (the default behavior.)
ALTER SUBSCRIPTION With refresh option as true.
DROP SUBSCRIPTION If the subscription is associated with a replication slot.
COMMIT PREPARED
ROLLBACK PREPARED
DISCARD ALL
VACUUM
The accepted answer indicates that the limitation has nothing to do with the specific server-side language used. I've just come across an older thread that has some excellent explanations and links for stored functions and transactions:
Do stored procedures run in database transaction in Postgres?
Sample Code
I also wondered about stored procedures, as they're allowed to control transactions. I tried them out in PG 13 and, no, the code is treated like a stored function, down to the error messages.
For anyone that goes in for this sort of thing, here are the "hello world" samples of sQL and PL/PgSQL stored functions and procedures to test out how VACCUM behaves in these cases. Spoiler: It doesn't work, as advertised.
SQL Function
/*
select * from dba.vacuum_sql_function();
Fails:
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL function "vacuum_sql_function" statement 1. 0.000 seconds. (Line 13).
*/
DROP FUNCTION IF EXISTS dba.vacuum_sql_function();
CREATE FUNCTION dba.vacuum_sql_function()
RETURNS VOID
LANGUAGE sql
AS $sql_code$
VACUUM ANALYZE activity;
$sql_code$;
select * from dba.vacuum_sql_function(); -- Fails.
PL/PgSQL Function
/*
select * from dba.vacuum_plpgsql_function();
Fails:
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL statement "VACUUM ANALYZE activity"
PL/pgSQL function vacuum_plpgsql_function() line 4 at SQL statement. 0.000 seconds. (Line 22).
*/
DROP FUNCTION IF EXISTS dba.vacuum_plpgsql_function();
CREATE FUNCTION dba.vacuum_plpgsql_function()
RETURNS VOID
LANGUAGE plpgsql
AS $plpgsql_code$
BEGIN
VACUUM ANALYZE activity;
END
$plpgsql_code$;
select * from dba.vacuum_plpgsql_function();
SQL Procedure
/*
call dba.vacuum_sql_procedure();
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL function "vacuum_sql_procedure" statement 1. 0.000 seconds. (Line 20).
*/
DROP PROCEDURE IF EXISTS dba.vacuum_sql_procedure();
CREATE PROCEDURE dba.vacuum_sql_procedure()
LANGUAGE SQL
AS $sql_code$
VACUUM ANALYZE activity;
$sql_code$;
call dba.vacuum_sql_procedure();
PL/PgSQL Procedure
/*
call dba.vacuum_plpgsql_procedure();
ERROR: VACUUM cannot be executed from a function
CONTEXT: SQL statement "VACUUM ANALYZE activity"
PL/pgSQL function vacuum_plpgsql_procedure() line 4 at SQL statement. 0.000 seconds. (Line 23).
*/
DROP PROCEDURE IF EXISTS dba.vacuum_plpgsql_procedure();
CREATE PROCEDURE dba.vacuum_plpgsql_procedure()
LANGUAGE plpgsql
AS $plpgsql_code$
BEGIN
VACUUM ANALYZE activity;
END
$plpgsql_code$;
call dba.vacuum_plpgsql_procedure();
Other Options
Plenty. As I understand it, VACUUM, and a handful of other commands, are not supported in server-side code running within Postgres. Therefore, you code needs to start from somewhere else. That can be:
Whatever cron you've got in your server's OS.
Any exteral client you like.
pg_cron.
As we're deployed on RDS, those last two options are where I'll look. And there's one more:
Let AUTOVACCUM and an occasional VACCUM do their thing.
That's pretty easy to do, and seems to work fine for the bulk of our needs.
Another Idea
If you do want a bit more control and some custom logging, I'm imagining a table like this:
CREATE TABLE IF NOT EXISTS dba.vacuum_list (
database_name text,
schema_name text,
table_name text,
run boolean,
run_analyze boolean,
run_full boolean,
last_run_dts timestamp)
ALTER TABLE dba.vacuum_list ADD CONSTRAINT
vacuum_list_pk
PRIMARY KEY (database_name, schema_name, table_name);
That's just a sketch. The idea is like this:
You INSERT into vacuum_list when a table needs some vacuuming, at least as far as you're concerned.
In my case, that would be an UPSERT as I don't need a full log-like table, just a single row per table of interest with the last outcome and/or pending state.
Periodically, a remote client, etc. connects, reads the table, and executes each specified VACUUM, according to the options specified in the record.
The external client updates the row with the last run timestamp, and whatever else you're including in the row.
Optionally, you could include fields for duration and change in relation size pre:post vacuuming.
That last option is what I'm interested in. None of our VACUUM calls were working for quite some time as there was a months-old dead connection from something sever-side. VACUUM appears to run fine, in such a case, it just can't delete a whole lot of rows. (Because of the super old "open" transaction ID, visibility maps, etc.) The only way to see this sort of thing seems to be to VACUUM VERBOSE and study the output. Or to record vacuum time and, more important, relation size change to flag cases where nothing seems to happen, when it seems like it should.
VACUUM is "top level" command. It cannot be executed from PL/pgSQL ever or from any other PL.

How to detect if Postgresql function utilizing index on the tables or not

I have created a PL/pgSQL table-returning function that executes a SELECT statement and uses the input parameter in the WHERE clause of the query.
I frame the statement dynamically and execute it like this: EXECUTE sqlStmt USING empID;
sqlStmt is a variable of data type text that has the SELECT query which joins 3 tables.
When I execute that query in pgAdmin and analyze I could see that 'Index scan' on the tables are utilized as expected. However, when I do EXPLAIN ANALYZE SELECT * from fn_getDetails(12), the output just says "Function scan".
How do we know if the table indexes are utilized? Other SO answers to use auto_explain module did not provide details of the function body statements. And I am unable to use the PREPARE inside my function body.
The time taken by execution of the direct SELECT statement is almost the same as the use of function, just couple of milliseconds, but how can I know if the index was used?
auto_explain will certainly provide the requested information.
Set the following parameters:
shared_preload_libraries = 'auto_explain' # requires a restart
auto_explain.log_min_duration = 0 # log all statements
auto_explain.log_nested_statements = on # log statements in functions too
The last parameter is required for tracking SQL statements inside functions.
To activate the module, you need to restart the database.
Of course, testing if the index is used in a query on a small table won't give you a reliable result. You need about as many test data as you expect to have in reality.

Redshift: Cannot use aggregate function inside UDF's?

I have written the below code:
create or replace function max_price()
returns real
volatile
as
$$
select
max(main_amount)
from
table
$$
language sql;
I am receiving this error message:
ERROR: The select expression can not have aggregate or window function.
CONTEXT: Create SQL function "max_price" body
How can I work around this?
No, Redshift UDFs are scalar - each "row" of input values returns one output.
https://docs.aws.amazon.com/redshift/latest/dg/udf-creating-a-scalar-sql-udf.html
You may be able to use a Stored Procedure to obtain the result you are looking for.
https://docs.aws.amazon.com/redshift/latest/dg/stored-procedure-create.html
A scalar User-Defined Function in Amazon Redshift cannot issue a SELECT command that retrieves data from a table. It is intended as a means of calculating a number, rather than querying the database.
From Creating a scalar SQL UDF - Amazon Redshift:
The SELECT clause can't include any of the following types of clauses: FROM, INTO, WHERE, GROUP BY, ORDER BY, LIMIT
If you need to consult another table as part of the function, use a Stored procedure.

How to log queries inside PLPGSQL with inlined parameter values

When a statement in my PLPGSQL function (Postgres 9.6) is being run I can see the query on one line, and then all the parameters on another line. A 2-line logging. Something like:
LOG: execute <unnamed>: SELECT * FROM table WHERE field1=$1 AND field2=$2 ...
DETAIL: parameters: $1 = '-767197682', $2 = '234324' ....
Is it possible to log the entire query in pg_log WITH the parameters already replaced in the query and log it in a SINGLE line?
Because this would make it much easier to copy/paste the query to reproduce it in another terminal, especially if queries have dozens of parameters.
The reason behind this: PL/pgSQL treats SQL statements as prepared statements internally.
First: With default settings, there is no logging of SQL statements inside PL/pgSQL functions at all. Are you using auto_explain?
Postgres query plan of a UDF invocation written in pgpsql
The first couple of invocations in the same session, the SPI manager (Server Programming Interface) generates a fresh execution plan, based on actual parameter values. Any kind of logging should report parameter values inline.
Postgres keeps track and after a couple of invocations in the current session, if execution plans don't seem sensitive to actual parameter values, it will start reusing a generic, cached plan. Then you should see the generic plan of a prepared statements with $n parameters (like in the question).
Details in the chapter "Plan Caching" in the manual.
You can observe the effect with a simple demo. In the same session (not necessarily same transaction):
CREATE TEMP TABLE tbl AS
SELECT id FROM generate_series(1, 100) id;
PREPARE prep1(int) AS
SELECT min(id) FROM tbl WHERE id > $1;
EXPLAIN EXECUTE prep1(3); -- 1st execution
You'll see the actual value:
Filter: (id > 3)
EXECUTE prep1(1); -- several more executions
EXECUTE prep1(2);
EXECUTE prep1(3);
EXECUTE prep1(4);
EXECUTE prep1(5);
EXPLAIN EXECUTE prep1(3);
Now you'll see a $n parameter:
Filter: (id > $1)
So you can get the query with parameter values inlined on the first couple of invocations in the current session.
Or you can use dynamic SQL with EXECUTE, because, per documentation:
Also, there is no plan caching for commands executed via EXECUTE.
Instead, the command is always planned each time the statement is run.
Thus the command string can be dynamically created within the function
to perform actions on different tables and columns.
That can actually affect performance, of course.
Related:
PostgreSQL Stored Procedure Performance

Executing queries dynamically in PL/pgSQL

I have found solutions (I think) to the problem I'm about to ask for on Oracle and SQL Server, but can't seem to translate this into a Postgres solution. I am using Postgres 9.3.6.
The idea is to be able to generate "metadata" about the table content for profiling purposes. This can only be done (AFAIK) by having queries run for each column so as to find out, say... min/max/count values and such. In order to automate the procedure, it is preferable to have the queries generated by the DB, then executed.
With an example salesdata table, I'm able to generate a select query for each column, returning the min() value, using the following snippet:
SELECT 'SELECT min('||column_name||') as minval_'||column_name||' from salesdata '
FROM information_schema.columns
WHERE table_name = 'salesdata'
The advantage being that the db will generate the code regardless of the number of columns.
Now there's a myriad places I had in mind for storing these queries, either a variable of some sort, or a table column, the idea being to then have these queries execute.
I thought of storing the generated queries in a variable then executing them using the EXECUTE (or EXECUTE IMMEDIATE) statement which is the approach employed here (see right pane), but Postgres won't let me declare a variable outside a function and I've been scratching my head with how this would fit together, whether that's even the direction to follow, perhaps there's something simpler.
Would you have any pointers, I'm currently trying something like this, inspired by this other question but have no idea whether I'm headed in the right direction:
CREATE OR REPLACE FUNCTION foo()
RETURNS void AS
$$
DECLARE
dyn_sql text;
BEGIN
dyn_sql := SELECT 'SELECT min('||column_name||') from salesdata'
FROM information_schema.columns
WHERE table_name = 'salesdata';
execute dyn_sql
END
$$ LANGUAGE PLPGSQL;
System statistics
Before you roll your own, have a look at the system table pg_statistic or the view pg_stats:
This view allows access only to rows of pg_statistic that correspond
to tables the user has permission to read, and therefore it is safe to
allow public read access to this view.
It might already have some of the statistics you are about to compute. It's populated by ANALYZE, so you might run that for new (or any) tables before checking.
-- ANALYZE tbl; -- optionally, to init / refresh
SELECT * FROM pg_stats
WHERE tablename = 'tbl'
AND schemaname = 'public';
Generic dynamic plpgsql function
You want to return the minimum value for every column in a given table. This is not a trivial task, because a function (like SQL in general) demands to know the return type at creation time - or at least at call time with the help of polymorphic data types.
This function does everything automatically and safely. Works for any table, as long as the aggregate function min() is allowed for every column. But you need to know your way around PL/pgSQL.
CREATE OR REPLACE FUNCTION f_min_of(_tbl anyelement)
RETURNS SETOF anyelement
LANGUAGE plpgsql AS
$func$
BEGIN
RETURN QUERY EXECUTE (
SELECT format('SELECT (t::%2$s).* FROM (SELECT min(%1$s) FROM %2$s) t'
, string_agg(quote_ident(attname), '), min(' ORDER BY attnum)
, pg_typeof(_tbl)::text)
FROM pg_attribute
WHERE attrelid = pg_typeof(_tbl)::text::regclass
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
);
END
$func$;
Call (important!):
SELECT * FROM f_min_of(NULL::tbl); -- tbl being the table name
db<>fiddle here
Old sqlfiddle
You need to understand these concepts:
Dynamic SQL in plpgsql with EXECUTE
Polymorphic types
Row types and table types in Postgres
How to defend against SQL injection
Aggregate functions
System catalogs
Related answer with detailed explanation:
Table name as a PostgreSQL function parameter
Refactor a PL/pgSQL function to return the output of various SELECT queries
Postgres data type cast
How to set value of composite variable field using dynamic SQL
How to check if a table exists in a given schema
Select columns with particular column names in PostgreSQL
Generate series of dates - using date type as input
Special difficulty with type mismatch
I am taking advantage of Postgres defining a row type for every existing table. Using the concept of polymorphic types I am able to create one function that works for any table.
However, some aggregate functions return related but different data types as compared to the underlying column. For instance, min(varchar_column) returns text, which is bit-compatible, but not exactly the same data type. PL/pgSQL functions have a weak spot here and insist on data types exactly as declared in the RETURNS clause. No attempt to cast, not even implicit casts, not to speak of assignment casts.
That should be improved. Tested with Postgres 9.3. Did not retest with 9.4, but I am pretty sure, nothing has changed in this area.
That's where this construct comes in as workaround:
SELECT (t::tbl).* FROM (SELECT ... FROM tbl) t;
By casting the whole row to the row type of the underlying table explicitly we force assignment casts to get original data types for every column.
This might fail for some aggregate function. sum() returns numeric for a sum(bigint_column) to accommodate for a sum overflowing the base data type. Casting back to bigint might fail ...
#Erwin Brandstetter, Many thanks for the extensive answer. pg_stats does indeed provide a few things, but what I really need to draw a complete profile is a variety of things, min, max values, counts, count of nulls, mean etc... so a bunch of queries have to be ran for each columns, some with GROUP BY and such.
Also, thanks for highlighting the importance of data types, i was sort of expecting this to throw a spanner in the works at some point, my main concern was with how to automate the query generation, and its execution, this last bit being my main concern.
I have tried the function you provide (I probably will need to start learning some plpgsql) but get a error at the SELECT (t::tbl) :
ERROR: type "tbl" does not exist
btw, what is the (t::abc) notation referred as, in python this would be a list slice, but it’s probably not the case in PLPGSQL