How to profile plpgsql procedures - postgresql

I'm trying to improve the performance of a long-running plpgsql stored procedure, but I have no idea what, if any, profiling tools are available. Can anyone offer suggestions for how to go about profiling such a procedure?

Raise some notices from the procedure that include clock_timestamp() to see where the database spends its time. And keep the procedures as simple as possible.
Could you show us an example?
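For example, a minimal sketch (the function body and the pg_sleep placeholder are purely illustrative):
CREATE OR REPLACE FUNCTION my_slow_function() RETURNS void AS $$
DECLARE
    t_start timestamptz := clock_timestamp();
BEGIN
    RAISE NOTICE 'step 1 starting at %', clock_timestamp();
    PERFORM pg_sleep(1);  -- stands in for the first expensive statement
    RAISE NOTICE 'step 1 done, % elapsed so far', clock_timestamp() - t_start;
    -- ...bracket each suspect statement with a timestamped NOTICE...
END;
$$ LANGUAGE plpgsql;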

We are currently looking for a better answer to this question, and have stumbled across this tool:
http://www.openscg.com/2015/02/postgresql-plpgsql-profiler/
Hosted at:
https://bitbucket.org/openscg/plprofiler
It claims to give you what you are looking for, including the total time spent on each line of the function. We have not investigated it further yet, but based on the author's claims, we are optimistic.

To start with, you could turn on logging of all statements, including their runtime, in the Postgres log file. That way you can identify the slowest queries and try to optimize them.
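A sketch of one way to enable that (ALTER SYSTEM needs 9.4 or later; on older versions set the same parameter in postgresql.conf):
-- A threshold of 0 ms logs every statement together with its duration.
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();  -- apply without restarting the server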
But reading your comment to Frank's post, I'd guess that the looping is your problem. Try to get rid of the looping and do everything in a single query. One statement that reads a lot of rows is usually more efficient than many statements each reading only a few rows.
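To illustrate the difference (table and column names are made up):
-- Slow: one UPDATE per row, issued from a loop
DO $$
DECLARE
    r record;
BEGIN
    FOR r IN SELECT id FROM orders WHERE status = 'new' LOOP
        UPDATE orders SET status = 'processed' WHERE id = r.id;
    END LOOP;
END;
$$;

-- Usually much faster: one set-based statement
UPDATE orders SET status = 'processed' WHERE status = 'new';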

Try the pg_stat_statements extension ( http://www.postgresql.org/docs/9.2/static/pgstatstatements.html ).
It can show the number of calls and the total execution time of all statements (including sub-statements within plpgsql procedures).
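A sketch of typical usage (the extension must be listed in shared_preload_libraries first; column names vary by version, e.g. total_time became total_exec_time in v13):
CREATE EXTENSION pg_stat_statements;
SET pg_stat_statements.track = 'all';  -- also track statements nested inside functions

-- The ten statements with the highest cumulative runtime:
SELECT calls, total_time, query
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;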

The tool to use is https://github.com/bigsql/plprofiler
If you installed PostgreSQL using the PGDG yum repository, installing plprofiler is straightforward: just run the command below. Keep in mind it is only packaged for PostgreSQL versions 11 and higher (replace XX with your version number):
yum install plprofiler_XX-server plprofiler_XX-client plprofiler_XX
Then add the profiler extension to your database:
CREATE EXTENSION plprofiler;
Then to generate a profile report on a plpgsql function, run a command like this:
plprofiler run -U your_username -d your_database --command "SELECT * FROM your_custom_plpgsql_function()" --output profile.html

Related

What functions are called when working with the Postgres database

I need to implement transparent data encryption (TDE) in Postgres. To do this, I found out which functions are called when INSERT and SELECT are executed; I used LLVM-LLDB on SELECT.
I am trying to do the same with INSERT, but it does not work: the backend process stops and does not allow the insertion. I followed this guide: https://eax.me/lldb/.
What could be wrong? How can I find out which functions are called on insertion (for SELECT it is secure_read, etc.)? And does anyone know how to change the function code in the source?
For context: the client and server are on the same machine, and the same user both adds and reads the data.
Unfortunately, I do not have enough reputation to add screenshots.
The SQL statements are the wrong level to start debugging. You should look at the code where blocks are read and written. That would be in src/backend/storage/smgr.
Look at the functions mdread and mdwrite in md.c. This is probably where you'd start hacking.
PostgreSQL v12 has introduced “pluggable storage”, so you can write your own storage manager. See the documentation. If you don't want to patch PostgreSQL, but have an extension that will work with standard PostgreSQL, that would be the direction to take.
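On the SQL level (v12 and later), a table built on a custom table access method would then be created like this (a sketch; the access method name encrypted_heap is hypothetical):
-- Assumes an extension has installed a table access method named encrypted_heap:
CREATE TABLE secret_stuff (id int, payload text) USING encrypted_heap;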
So far I have only covered block storage, but you must not forget WAL. Encrypting that will require hacking PostgreSQL.
This is a complex question which you should post to the pgsql-hackers mailing list: https://www.postgresql.org/list/pgsql-hackers/.
You could start by setting a GDB breakpoint on ExecutorStart in execMain.c.

How to set up background worker for PostgreSQL 11 server?

Recently I was assigned to migrate part of a database from Oracle to a PostgreSQL environment as a testing experiment. During that process, the major drawback I ran into was the lack of a simple way to implement parallelism, which is required for several design reasons that aren't so relevant here. Then I discovered background workers ( https://www.postgresql.org/docs/11/bgworker.html ), which looked like a way to solve my problems.
Yet not quite, as I couldn't easily find any tutorial or example of how to implement them, even for a task as simple as writing debug messages to the logger while the process is running. I tried some older approaches presented in plugin examples from version 9.3, but they weren't much help.
I would like to know how to set up those workers properly. Any help would be appreciated.
PS: Also, if some good soul has found a workaround to implement BULK COLLECT for cursors in PostgreSQL, it would be most kind of you to share it.
The documentation for bgworker that you linked to is for writing C code, which is probably not what you want. You can use the pg_background extension instead, which will do what you want. ora2pg will optionally use pg_background when converting Oracle procedures with the autonomous transaction pragma. The other option is to use dblink to open a connection to the current database.
Neither solution is great, but it's the only way to go if you need to store data in a table whether or not the enclosing transaction succeeds. If you can get by with just putting stuff into the logs, you can use RAISE NOTICE instead.
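A sketch of the pg_background route (the function names below are from that extension's README; verify them against the version you install):
CREATE EXTENSION pg_background;

-- Run a statement in its own background worker, in its own transaction:
SELECT pg_background_launch('VACUUM ANALYZE my_table');

-- Launch a query and fetch its result (a column definition list is required):
SELECT *
FROM pg_background_result(pg_background_launch('SELECT 42')) AS (answer int);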
As far as BULK COLLECT for cursors goes, I'm not sure exactly how you are using them, but set-returning functions may help you: functions in Postgres can return multiple rows without fiddling with cursors.
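For instance, a set-returning function can replace a typical cursor loop (table and column names here are hypothetical):
CREATE OR REPLACE FUNCTION recent_orders(since date)
RETURNS TABLE (order_id int, total numeric) AS $$
BEGIN
    RETURN QUERY
        SELECT o.order_id, o.total
        FROM orders o
        WHERE o.order_date >= since;
END;
$$ LANGUAGE plpgsql;

-- Callers treat it like a table:
SELECT * FROM recent_orders(date '2019-01-01');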

Upgrading from Postgres 7.4 to 9.4.1

I'm upgrading Postgres from ancient 7.4 to 9.4.1 and seeing some errors.
On the old machine, I did:
pg_dumpall | gzip > db_pg_bu.gz
On the new machine, I did:
gunzip -c db_pg_bu.gz | psql
While restoring I got a number of errors which I don't understand, and don't know the importance of. I'm not a DBA, just a lowly developer, so if someone could help me understand what I need to do to get this migration done I would appreciate it.
Here are the errors:
ERROR: cannot delete from view "pg_shadow"
DETAIL: Views that do not select from a single table or view are not automatically updatable.
HINT: To enable deleting from the view, provide an INSTEAD OF DELETE trigger or an unconditional ON DELETE DO INSTEAD rule.
I also got about 15 of these:
NOTICE: SYSID can no longer be specified
And this, although it looks harmless, since I saw that plpgsql is installed by default starting in version 9.2:
ERROR: could not access file "/usr/lib/postgresql/lib/plpgsql.so": No such file or directory
SET
NOTICE: using pg_pltemplate information instead of CREATE LANGUAGE parameters
ERROR: language "plpgsql" already exists
A big concern is that, as it restores the databases, for each one I see something like this:
COMMENT
You are now connected to database "landrush" as user "postgres".
SET
ERROR: could not access file "/usr/lib/postgresql/lib/plpgsql.so": No such file or directory
There are basically two ways. Both are difficult for the inexperienced (and maybe even for the experienced):
Do a stepwise migration, using a few intermediate versions (which will probably have to be compiled from source). Between versions you'd have to do a pg_dump --> pg_restore (or just psql < dumpfile, as in the question). A possible first hop could be 7.4 -> 8.3, but an additional hop might be needed.
Edit the (uncompressed) dump file: remove (or comment out) anything the new version does not like. This will be an iterative process, and it assumes your dump fits into your editor (and that you know what you are doing). You might need to re-dump, separating schema and data (options --schema-only and --data-only; I don't even know if these were available in PG 7.4).
BTW: it is advisable to use the pg_dump from the newer version (the one you will import into); you'll need to specify the source host via the -h flag. The newer (target) version knows what it needs and will try to adapt (up to a certain point; you may still need more than one step). It will also refuse to work if it cannot produce a usable dump (in which case you'll have to make smaller steps...).
Extra:
if the result of your failed conversion is complete enough, and if you are only interested in the basic data, you could just stop here and maybe polish it a bit.
NOTICE: using pg_pltemplate information instead of CREATE LANGUAGE parameters : I don't know what this is. Maybe it is the way that additional languages, such as plpgsql, used to be added to the core DBMS.
ERROR: language "plpgsql" already exists : You can probably ignore this error. -->> comment out the offending lines.
DETAIL: Views that do not select from a single table or view are not automatically updatable. This implies that the postgres RULE rewrite system is used in the old DB. It will need serious work to get it working again.

Could I script my monthly postgres maintenance?

I have to perform a monthly maintenance to a postgres database.
I PuTTY into the system, navigate to the database, and then run 3 commands on 40 different tables:
CLUSTER [table1] USING [primarykey];
ANALYZE [table1];
REINDEX TABLE [table1];
I have to wait for each command to finish executing before I can run the next one (i.e. CLUSTER, wait up to a few minutes, ANALYZE, wait, REINDEX, wait).
It's very simple to do but it takes around 30-45 minutes of me just copying and pasting 120 lines, one line at a time... is there any way to automate this process?
I have zero experience with scripting and I know very little about postgreSQL.
My question is somewhat unique because I cannot install anything in the postgreSQL database. I want to have this script localized on my computer and then be able to run it when it's time for the maintenance.
Clustering automatically reindexes the table. There is no reason to reindex the table immediately after you cluster it.
Do you actually need to do this stuff? Do you have evidence that your tables are in need of clustering? Or are you just assuming they do because of something you read on the internet, referring to a decade-old version of PostgreSQL, written by someone who didn't know what they were talking about in the first place? It is possible you really would benefit from this. It is even more possible you wouldn't, and it is just a waste of time.
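If you want evidence rather than guesswork, one rough heuristic is the physical-versus-logical ordering statistic in pg_stats (a sketch; adjust the schema name to yours):
-- correlation near 1.0 or -1.0 means the table is already well ordered
-- on that column, so clustering on it has little left to gain.
SELECT tablename, attname, correlation
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY abs(correlation);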
If you know nothing about scripting, then you need to learn something about scripting. You should probably tag your post as being about scripting, in whichever shell/language you would like to use.
At the core, all you have to do is write a series of commands to be executed from the command line, and shove them into a text file. The easiest way is probably to install psql on your local computer, if it is not already there.
psql -c 'cluster foobar' -h thehost.example.com
psql -c 'analyze foobar' -h thehost.example.com
You might need to do some configuration to make this connection work with whatever authentication method you have in place, but without knowing which authentication method that is I can't comment further.
If the cluster for some reason fails, there is little reason to proceed to try to analyze it. (But there is also little harm in doing so). If you want to fine tune this situation, there are a variety of ways to do it, depending on which shell you are writing your script for, and what you want it to do.
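As a sketch (table and index names are placeholders): put the whole run in one SQL file and execute it with psql -f maintenance.sql -h thehost.example.com. psql runs the statements one after another, waiting for each to finish, and per the note above the REINDEX step can be dropped because CLUSTER already rebuilds the indexes:
-- maintenance.sql
CLUSTER table1 USING table1_pkey;
ANALYZE table1;
CLUSTER table2 USING table2_pkey;
ANALYZE table2;
-- ...repeat for the remaining tables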

How to run multiple transactions concurrently in PostgreSQL

I want to do some basic experiment on PostgreSQL, for example to generate deadlocks, to create non-repeatable reads, etc. But I could not figure out how to run multiple transactions at once to see such behavior.
Can anyone give me some Idea?
Open more than one psql session, one terminal per session.
If you're on Windows you can do that by launching psql via the Start menu multiple times. On other platforms open a couple of new terminals or terminal tabs and start psql in each.
I routinely do this when I'm examining locking and concurrency issues, used in answers like:
https://stackoverflow.com/a/12456645/398670
https://stackoverflow.com/a/12831041/398670
... probably more. A useful trick when you want to set up a race condition is to open a third psql session and BEGIN; LOCK TABLE the_table_to_race_on;. Then run statements in your other sessions; they'll block on the lock. ROLLBACK the transaction holding the table lock and the other sessions will race. It's not perfect, since it doesn't simulate offset-start-time concurrency, but it's still very helpful.
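A sketch of that trick (table and column names are illustrative):
-- Session 3: take the lock and leave the transaction open
BEGIN;
LOCK TABLE the_table_to_race_on;

-- Sessions 1 and 2: run the statements you want to race; they block here
UPDATE the_table_to_race_on SET n = n + 1 WHERE id = 1;

-- Session 3: release the lock; sessions 1 and 2 now race to proceed
ROLLBACK;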
Other alternatives are outlined in this later answer on a similar topic.
pgbench is probably the best solution in your case. It lets you test complex database resource contention, deadlocks, and multi-client, multi-threaded access.
To get deadlocks you can simply write a script like this (bench_script.sql):
-- bench_script.sql: each client takes a SHARE lock, then INSERTs.
-- INSERT needs ROW EXCLUSIVE, which conflicts with SHARE, so two
-- concurrent clients block each other and deadlock.
BEGIN;
LOCK TABLE schm.tbl IN SHARE MODE;
SELECT count(*) FROM schm.tbl;
INSERT INTO schm.tbl VALUES (1 + 9999 * random(), 'test descr');
END;
and pass it to pgbench with the -f parameter, for example: pgbench -n -c 10 -t 100 -f bench_script.sql your_database (the -n skips vacuuming the standard pgbench tables, which don't exist here).
For more detailed pgbench usage I recommend reading the official PostgreSQL pgbench manual
and getting acquainted with my recently resolved pgbench question here.
Craig Ringer provided a way to open multiple transactions manually; if you find that inconvenient, you can use pgbench to run multiple transactions at once.