Benchmarking/Performance of Postgresql Query

Benchmarking/Performance of Postgresql Query - postgresql

I want to measure the performance of postgresql code I wrote. In the code tables get created, selfwritten functions get called etc.
Looking around, I found EXPLAIN ANALYSE is the way to go.
However, as far as I understand it, the code only gets executed once. For a more realistic analysis I want to execute the code many many times and have the results of each iteration written somewhere, ideally in a table (for statistics later).
Is there a way to do this with a native postgresql function? If there is no native postgresql function, would I accomplish this with a simple loop? Further, how would I write out the information of every EXPLAIN ANALYZE iteration?

One way to do this is to write a function that runs an explain and then spool the output of that into a file (or insert that into a table).
E.g.:
create or replace function show_plan(to_explain text)
returns table (line_nr integer, line text)
as
$$
declare
l_plan_line record;
l_line integer;
begin
l_line := 1;
for l_plan_line in execute 'explain (analyze, verbose)'||to_explain loop
return query select l_line, l_plan_line."QUERY PLAN";
l_line := l_line + 1;
end loop;
end;
$$
language plpgsql;
Then you can use generate_series() to run a statement multiple times:
select g.i as run_nr, e.*
from show_plan('select * from foo') e
cross join generate_series(1,10) as g(i)
order by g.i, e.line_nr;
This will run the function 10 times with the passed SQL statement. The result can either be spooled to a file (how you do that depends on the SQL client you are using) or inserted into a table.
For an automatic analysis it's probably easer to use a more "parseable" explain format, e.g. XML or JSON. This is also easier to handle in the output as the plan is a single XML (or JSON) value instead of multiple text lines:
create or replace function show_plan_xml(to_explain text)
returns xml
as
$$
begin
return execut 'explain (analyze, verbose, format xml)'||to_explain;
end;
$$
language plpgsql;
Then use:
select g.i as run_nr, show_plan_xml('select * from foo')
from join generate_series(1,10) as g(i)
order by g.i;

Related

What Identifier in "EXECUTE FORMAT" with "FOR ... IN" loop "record" variable/iterator that uses dot/period "." for Column name in PLPGSQL Procedure?

I am using a basic plpgsql EXECUTE... FORMAT dynamic script with a FOR...IN loop. The problem is, the LOOP has a variable/pointer (like any FOR LOOP in any language) that iterates through the query result set of the SELECT... query in the EXECUTE. I need to make the column name part after the period/dot in temprecord.column_name dynamic, and thus use an Identifier (i.e. %I, %s, %L) for it, As you know, the way to get data (In this column is an ALTER TABLE sql statement) out of this variable/pointer is to use the "dot notation", that is, i.e.
/* Output employee names from column "names" in Employee table */
FOR temprecord IN
EXECUTE format('SELECT *
FROM %I t', ''Employee'')
LOOP
EXECUTE temprecord.names; -- THIS WORKS FINE WHEN I HARDCODE IT. I CAN'T SEEM TO MAKE IT DYNAMIC
END LOOP;
So the above works fine when I hardcord temprecord.names. The problem is I want the column name dynamic, so if different callers/methods call my function, I can select different columns through the temprecord iterator, and EXECUTE this data.
I tried many times what I have below and the best response I have gotten so far is that the sql query that I have in the column (as I stated above) executed, however, it showed an syntax error and truncation error, but I clearly noticed it was returned the all three columns of my table and concatenating the columns together. But I know the real problem is the variable temprecord not picking up on the dot notation that specifies column name i.e. temprecord.column_name_here, that's why it returns all columns and throws truncation here. As I stated above, it works when hardcoded.
i.e.
/* Using the $$ for string formatting here */
CREATE OR REPLACE PROCEDURE my_proc(drop_or_add text)
LANGUAGE plpgsql
AS
$procedure$
DECLARE
temprecord record;
col_nm text;
BEGIN
col_nm := concat_ws('_','sql',drop_or_add); -- SHOULD CONCATENATE TO COLUMN NAME, I.E. sql_drop OR sql_add WHICH HOLDS SQL "ALTER TABLE..." QUERY
FOR temprecord IN
EXECUTE format($f$ SELECT t.col1, t.col2, t.col3%s
FROM some_tbl t
WHERE t.col4 = %L, drop_or_add, 'blue')
LOOP
EXECUTE format($f$ %I.%s $f$, "temprecord", col_nm);
END LOOP;
END
$procedure$;
Hopefully none of this syntax is confusing, nor my logic. Again, I am simply looping with the temprecord iterator variable and trying to access some columns of my database table, that holds some SQL statements and needs to be dynamic based on the argument passed to my Procedure. So, potentially EXECUTE temprecord.sql_drop or EXECUTE temprecord.sql_add can execute.
I always used the %s in my other plpgsql scripts when I access a column of table using dot notation i.e. tbl1.id = tbl2.% and it's always worked fine. I have tried basically every combination possible, but I'll list a few here, i.e.
EXECUTE format($f$ %I.%s $f$, "temprecord", col_nm);
EXECUTE format($f$ %I%s $f$, "temprecord", '.sql_drop');
EXECUTE format($f$ %I.%s $f$, temprecord, col_nm);
EXECUTE format($f$ %I.%I $f$, "temprecord", col_nm);
EXECUTE format($f$ %I.%I $f$, "temprecord", "col_nm");
.
..
...

the string passed to EXECUTE statement have to be valid SQL command. It cannot be an expression.
The PL/pgSQL is almost static language - there are some dynamic features, but almost it is static typed language. Overusing EXECUTE statement is generally bad idea. It can work, but the code will be very unreadable and not well maintainable.
For dynamic access to record, the best way is (today) using transformation to json:
postgres=# create or replace function fx(nm varchar(64))
returns void as $$
declare
r record;
r2 record;
begin
for r in execute format('select v as %I from generate_series(1,3) g(v)', nm)
loop
raise notice '%', (row_to_json(r))->>nm;
end loop;
end;
$$ language plpgsql;
CREATE FUNCTION
postgres=# select fx('foo');
NOTICE: 1
NOTICE: 2
NOTICE: 3
┌────┐
│ fx │
╞════╡
│ │
└────┘
(1 row)

I was able to find a solution, however, I absolutely tried every combination of Identifiers and I could not get this to work when concatenating the record iterator and the column name in the EXECUTE FORMAT block. So my solution was just stick stick a CASE statement before the second EXECUTE... and then directly pass that String variable to the EXECUTE and use only one identifier. It seemed like the problem was any time I was trying to divide/split up the temprecord.column_name inside the FORMAT i.e. EXECUTE FORMAT($f$ %I.$s $f$, temprecord,column_name), it would just not work. However, I am able to do this with regular Tables and Columns in SELECT.. FROM WHERE t1.col1 = t2.%, so I'm not sure if its just different with the record variable because its part of a FOR... IN LOOP or something.
Solution
CASE
WHEN drop_or_add ILIKE '%drop%' THEN
v_exect_sql := temprow.sql_drop;
WHEN drop_or_add ILIKE '%add%' THEN
v_exect_sql := temprow.sql_add;
END CASE;
BEGIN
EXECUTE format($f$ %s $f$, v_exect_sql); -- Have to pass the `temprecord.column_name` all as one Identifier or would NOT work for me

Return variable in postgres select

This covers most use cases How do you use variables in a simple PostgreSQL script? but not the select clause.
This code produces an error column "ct" does not exist"
DO
$$
declare CT timestamp := '2020-09-04 23:59:59';
select CT,5 from job;
$$;
I can see why it would interpret CT as a column name. What's the Postgres syntax required to refer to a variable in the context of the select clause?
I would expect that query to return
'2020-09-04 23:59:59',5
for each row in the job table.
Addendum to the accepted answer
My use case doesn't return rows. Instead, the result of the select is consumed by an insert statement. I'm transforming rows from staging tables into other tables and adding value like the import date and the identity owning the inserts. It's these values that are provided by the variables - they are used in several such transforms and the point of the variable is to let me set each value once up the top of the script.
Because the rows are consumed like this, it turns out that I don't need a function wrapping this code. It's a bit inconvenient to test since I can't run the select and look at the outcome without copying it and pasting in literals, but at least it's possible to use variables. My working script looks like this:
do
$$
declare ct timestamp := '2020-09-04 23:59:59';
declare cb int := 2;
declare iso8601 varchar(50) := 'YYYY-MM-DD HH24:MI:SS';
declare USAdate varchar(50) := 'MM-DD-YYYY HH24:MI:SS';
begin
delete from dozer_wheel_loader_equipment_movement where created = ct;
INSERT INTO dozer_wheel_loader_equipment_movement
(site, primary_category_id, machine, machine_class, x, y, z, timestamp_local, created, created_by)
select site ,mc.id ,machine , machineclass ,x,y,z,to_timestamp(timestamplocal, iso8601), ct, cb
from stage_dozer_csv d join machine_category mc on d.primarycategory = mc.short_code;
...
end
$$
There is a lot of worthwhile related reading at How to declare a variable in a PostgreSQL query

There are few things about variables in PostgreSQL.
Variable can not be used in Plain SQL in Postgres. So you have to use any pl language i.e. plpgsql to use this. You have tried the same in your example.
In your DO block you have missed the Begin and End, So you have to write it like below
DO
$$
begin
declare CT timestamp := '2020-09-04 23:59:59';
select CT,5 from job;
end
$$;
But when you read the official documentation of DO Statement, it says DO will allow to run the anonymous code but it returns void, that's why above code will throw following error:
ERROR: query has no destination for result data
HINT: If you want to discard the results of a SELECT, use PERFORM instead.
CONTEXT: PL/pgSQL function inline_code_block line 4 at SQL statement
So there is only one way - wrap this code block in a Function like below:
create or replace function func() returns table(col1 timestamp, col2 int )
AS
$$
declare ct timestamp := '2020-09-04 23:59:59';
begin
return query
select CT,5 from job;
end;
$$
language plpgsql
and you can call it like below:
select * from func()
DEMO
Conclusion
You can not use variable in normal SQL statement in Postgres.
You have to use any Procedural Language i.e. plpgsql to use variable.
DO Block doesn't return any value so you can not use select statement like above in DO block. It is good for non-returning queries i.e. insert, update, delete or grant etc.
Only way to return a value from procedural language code block is - you have to wrap it in a suitable PostgreSQL Function.

How to migrate oracle's pipelined function into PostgreSQL

As I am newbie to plpgSQL,
I stuck while migrating a Oracle query into PostgreSQL.
Oracle query:
create or replace FUNCTION employee_all_case(
p_ugr_id IN integer,
p_case_type_id IN integer
)
RETURN number_tab_t PIPELINED
-- LANGUAGE 'plpgsql'
-- COST 100
-- VOLATILE
-- AS $$
-- DECLARE
is
l_user_id NUMBER;
l_account_id NUMBER;
BEGIN
l_user_id := p_ugr_id;
l_account_id := p_case_type_id;
FOR cases IN
(SELECT ccase.case_id, ccase.employee_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype
ON (ccase.case_type_id=ctype.case_type_id)
WHERE ccase.employee_id = l_user_id)
LOOP
IF cases.employee_id IS NOT NULL THEN
PIPE ROW (cases.case_id);
END IF;
END LOOP;
RETURN;
END;
--$$
When I execute this function then I get the following result
select * from table(select employee_all_case(14533,1190) from dual)
My question here is: I really do not understand the pipelined function and how can I obtain the same result in PostgreSQL as Oracle query ?
Please help.

Thank you guys, your solution was very helpful.
I found the desire result:
-- select * from employee_all_case(14533,1190);
-- drop function employee_all_case
create or replace FUNCTION employee_all_case(p_ugr_id IN integer ,p_case_type_id IN integer)
returns table (case_id double precision)
-- PIPELINED
LANGUAGE 'plpgsql'
COST 100
VOLATILE
AS $$
DECLARE
-- is
l_user_id integer;
l_account_id integer;
BEGIN
l_user_id := cp_lookup$get_user_id_from_ugr_id(p_ugr_id);
l_account_id := cp_lookup$acctid_from_ugr(p_ugr_id);
RETURN QUERY SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and ccase.employee_id IS NOT NULL;
--return NEXT;
END;
$$

You would rewrite that to a set returning function:
Change the return type to
RETURNS SETOF integer
and do away with the PIPELINED.
Change the PIPE ROW statement to
RETURN NEXT cases.case_id;
Of course, you will have to do the obvious syntactic changes, like using integer instead of NUMBER and putting the IN before the parameter name.
But actually, it is quite unnecessary to write a function for that. Doing it in a single SELECT statement would be both simpler and faster.

Pipelined functions are best translated to a simple SQL function returning a table.
Something like this:
create or replace function employee_all_case(p_ugr_id integer, p_case_type_IN integer)
returns table (case_id integer)
as
$$
SELECT ccase.case_id
FROM ct_case ccase
INNER JOIN ct_case_type ctype ON ccase.case_type_id = ctype.case_type_id
WHERE ccase.employee_id = p_ugr_id
and cases.employee_id IS NOT NULL;
$$
language sql;
Note that your sample code did not use the second parameter p_case_type_id.
Usage is also more straightforward:
select *
from employee_all_case(14533,1190);

Before diving into the solution, I will provide some information which will help you to understand better.
So basically PIPELINED came into picture for improving memory allocation at run time.
As you all know collections will occupy space when ever they got created. So the more you use, the more memory will get allocated.
Pipelining negates the need to build huge collections by piping rows out of the function.
saving memory and allowing subsequent processing to start before all the rows are generated.
Pipelined table functions include the PIPELINED clause and use the PIPE ROW call to push rows out of the function as soon as they are created, rather than building up a table collection.
By using Pipelined how memory usage will be optimized?
Well, it's very simple. instead of storing data into an array, just process the data by using pipe row(desired type). This actually returns the row and process the next row.
coming to solution in plpgsql
simple but not preferred while storing large data.
Remove PIPELINED from return declaration and return an array of desired type. something like RETURNS typrec2[].
Where ever you are using pipe row(), add that entry to array and finally return that array.
create a temp table like
CREATE TEMPORARY TABLE temp_table (required fields) ON COMMIT DROP;
and insert data into it. Replace pipe row with insert statement and finally return statement like
return query select * from temp_table
**The best link for understanding PIPELINED in oracle [https://oracle-base.com/articles/misc/pipelined-table-functions]
pretty ordinary for postgres reference [http://manojadinesh.blogspot.com/2011/11/pipelined-in-oracle-as-well-in.html]
Hope this helps some one conceptually.

Calculate number of rows affected by batch query in PostgreSQL

First of all, yes I've read documentation for DO statement :)
http://www.postgresql.org/docs/9.1/static/sql-do.html
So my question:
I need to execute some dynamic block of code that contains UPDATE statements and calculate the number of all affected rows. I'm using Ado.Net provider.
In Oracle the solution would have 4 steps:
add InputOutput parameter "N" to command
add BEGIN ... END; to command
add :N := :N + sql%rowcount after each statement.
It's done! We can read N parameter from command, after execute it.
How can I do it with PostgreSQL? I'm using npgsql provider but can migrate to devard if it helps.

DO statement blocks are good to execute dynamic SQL. They are no good to return values. Use a plpgsql function for that.
The key statement you need is:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Details in the manual.
Example code:
CREATE OR REPLACE FUNCTION f_upd_some()
RETURNS integer AS
$func$
DECLARE
ct int;
i int;
BEGIN
EXECUTE 'UPDATE tbl1 ...'; -- something dynamic here
GET DIAGNOSTICS ct = ROW_COUNT; -- initialize with 1st count
UPDATE tbl2 ...; -- nothing dynamic here
GET DIAGNOSTICS i = ROW_COUNT;
ct := ct + i; -- add up
RETURN ct;
END
$func$ LANGUAGE plpgsql;
Call:
SELECT * FROM f_upd_some();

My solution is quite simple. In Oracle I need to use variables to calculate the sum of updated rows because command.ExecuteNonQuery() returns only the count of rows affected by the last UPDATE in the batch.
However, npgsql returns the sum of all rows updated by all UPDATE queries. So I only need to call command.ExecuteNonQuery() and get the result without any variables. Much easier than with Oracle.

Putting EXPLAIN results into a table?

I see from the Postgres 8.1 docs that EXPLAIN generated something like tabular data:
Prior to PostgreSQL 7.3, the plan was emitted in the form of a NOTICE
message. Now it appears as a query result (formatted like a table with
a single text column).
I'm working with 9.0, for which, the docs say, the output can be a variety of types, including TEXT and XML. What I'd really like to do is treat the output as a standard query result so that I could generate a simple report for a query or a set of queries, e.g.,
SELECT maxcost FROM (
EXPLAIN VERBOSE
SELECT COUNT(*)
FROM Mytable
WHERE value>17);
The above doesn't work in any form that I've tried, and I made up the attribute maxcost to demonstrate how neat it would be to pull out specific bits of data (in this case, the maximum estimated cost of the query). Is there anything I can do that would get me part of the way there? I'd prefer to be able to work within a simple SQL console.

No other answers so far, so here's my own stab at it.
It's possible to read the results of explain into a variable within plpgsql, and since the output can be in XML one can wrap EXPLAIN in a stored function to yield the top-level costs using xpath:
CREATE OR REPLACE FUNCTION estimate_cost(IN query text,
OUT startup numeric,
OUT totalcost numeric,
OUT planrows numeric,
OUT planwidth numeric)
AS
$BODY$
DECLARE
query_explain text;
explanation xml;
nsarray text[][];
BEGIN
nsarray := ARRAY[ARRAY['x', 'http://www.postgresql.org/2009/explain']];
query_explain :=e'EXPLAIN(FORMAT XML) ' || query;
EXECUTE query_explain INTO explanation;
startup := (xpath('/x:explain/x:Query/x:Plan/x:Startup-Cost/text()', explanation, nsarray))[1];
totalcost := (xpath('/x:explain/x:Query/x:Plan/x:Total-Cost/text()', explanation, nsarray))[1];
planrows := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Rows/text()', explanation, nsarray))[1];
planwidth := (xpath('/x:explain/x:Query/x:Plan/x:Plan-Width/text()', explanation, nsarray))[1];
RETURN;
END;
$BODY$
LANGUAGE plpgsql;
Hence the example from the question becomes:
SELECT totalcost
FROM estimate_cost('SELECT COUNT(*)
FROM Mytable
WHERE value>17');

This message suggests using "FOR rec IN EXECUTE EXPLAIN..." which seems to do the trick e.g.:
drop table if exists temp_a;
create temp table temp_a
(
"QUERY PLAN" text
);
DO
$$
DECLARE
rec record;
BEGIN
FOR rec IN EXECUTE 'EXPLAIN VERBOSE select version()'
LOOP
-- RAISE NOTICE 'rec=%', row_to_json(rec);
insert into temp_a
select rec."QUERY PLAN";
END LOOP;
END
$$ language plpgsql;
select *
from temp_a;