Postgres query performance monitoring on production servers - postgresql

My goal here is to identify slow queries. That can be done with the slow query log, i.e. set log_min_duration_statement in postgresql.conf to some acceptable query time (say 100 ms) and then look at the queries that get logged.
This has a number of issues, one being that a query may have been slow at that particular moment but runs fast when executed again. The better way is to get an average and then work on the queries that are slow on average.
That requires the pg_stat_statements extension, which is supposed to cause around 10% server performance degradation (taken from here); the blog post is pretty old and things might have improved.
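For reference, a minimal sketch of the postgresql.conf settings I have in mind (the 100 ms threshold is just an example value):

# log every statement that runs longer than 100 ms
log_min_duration_statement = 100
# load pg_stat_statements (needs a server restart to take effect)
shared_preload_libraries = 'pg_stat_statements'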
My questions:
I could not find any authoritative source on how much of a performance impact the pg_stat_statements extension has. Is it negligible in recent versions?
If it does cause a noticeable impact, would it be sufficient to back up / export the data collected by pg_stat_statements and then analyze it offline, perhaps on a separate machine? Is there any extension/module which does this automatically?
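For the offline idea, I imagine something along these lines, run from psql (the output file name is just a placeholder):

\copy (SELECT * FROM pg_stat_statements) TO 'pg_stat_statements_snapshot.csv' CSV HEADER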
Any alternative solutions are also welcome.

Related

RDS | Is there a way to identify what queries consume the most IOPS?

We are currently hitting the limits of the allocated IOPS on our DB, and this got me wondering: how can we identify what is contributing the most to our IOPS usage?
We can see what queries are run, how long they took, and how many times they were run, but this doesn't directly answer the question of which queries are using up all the IOPS of the PostgreSQL RDS DB, so we have to make educated guesses. Is there a better way?
That is simple if you can install pg_stat_statements and set track_io_timing = on.
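A rough sketch of the setup (on a self-managed server; on RDS, shared_preload_libraries and track_io_timing are set through the parameter group instead):

-- pg_stat_statements must already be in shared_preload_libraries (restart required)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
ALTER SYSTEM SET track_io_timing = on;
SELECT pg_reload_conf();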
The pg_stat_statements extension provides a view of the same name that you can query like this:
SELECT query,
blk_read_time + blk_write_time AS io_time
FROM pg_stat_statements
ORDER BY blk_read_time + blk_write_time DESC
LIMIT 20;
There are two drawbacks with this:
Since PostgreSQL uses buffered I/O, a read or write does not necessarily go to disk; it can be served by the operating system's kernel cache, not actually causing I/O. But since such requests are fast, they should only account for a small fraction of the result.
The bigger problem is that because of the architecture of a relational database, you won't see most write requests there, because writes normally happen during checkpoints and cannot be attributed to an individual query.
Because of that, it might be better to track shared_blks_dirtied + shared_blks_written rather than blk_write_time; that won't tell you the I/O time, but it gives you a good idea of how many 8kB blocks a query caused to be written.
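A variant of the query above along those lines (a sketch; adjust the LIMIT to taste):

SELECT query,
shared_blks_dirtied + shared_blks_written AS blocks_written
FROM pg_stat_statements
ORDER BY shared_blks_dirtied + shared_blks_written DESC
LIMIT 20;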
Even if you don't get exact numbers that way, pg_stat_statements will give you a good idea which statements cause the most I/O.

When and why should I run pg_stat_reset()?

I am trying to understand how to monitor and tune PostgreSQL performance. I started by exploring the views pg_stat_all_tables and pg_stat_statements in order to gather information about live tuples, dead tuples, last autovacuum time, etc. There was some useful information in n_live_tup (close to the real row count of the table) and n_dead_tup until I ran pg_stat_reset(). After that I got some strange results: there are fewer n_live_tup than n_dead_tup. I can't find any articles/docs about why and when (some use cases) I should run pg_stat_reset(). Can somebody explain that to me or provide some useful resources?
It is ok to run pg_stat_reset() occasionally, like once per month, to get a fairly up-to-date view on what is going on in your database.
But don't do that too often, as there is a downside to it: the autovacuum process, which the system relies on, uses these statistics, so you will miss a couple of autovacuum (and autoanalyze) runs if you reset them. That may or may not be a problem in your database, but at any rate I wouldn't do it too often. If you can, manually VACUUM and ANALYZE the database after calling pg_stat_reset().
There is no such problem with pg_stat_statements_reset(), so run that as often as you please.
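A minimal sketch of that sequence (pg_stat_reset() typically requires superuser rights):

SELECT pg_stat_reset();              -- resets the cumulative statistics of the current database
SELECT pg_stat_statements_reset();   -- harmless, can be run as often as you like
VACUUM ANALYZE;                      -- as suggested above, to compensate for the reset counters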
The best thing for you would be to have monitoring software that checks the values of the statistics regularly and shows you their development (the differences to the previous run). Then you never have to reset the statistics and still have a good overview of what is going on.
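If you have no monitoring tool, a very crude way to get such differences is to snapshot the counters yourself and diff between runs (table name and column selection are just an illustration; total_exec_time is called total_time before PostgreSQL 13):

CREATE TABLE IF NOT EXISTS statements_snapshot AS
SELECT now() AS taken_at, queryid, query, calls, total_exec_time
FROM pg_stat_statements WHERE false;

INSERT INTO statements_snapshot
SELECT now(), queryid, query, calls, total_exec_time
FROM pg_stat_statements;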

Run vacuum by schedule

I'm using Postgres version 9.6
Most of my tables are used for queries, updates and inserts.
Most of them have around 200K-700K rows.
There are bigger ones (millions of rows) and smaller ones.
Is it a good idea to perform a VACUUM (and ANALYZE?) operation once a day? Once a week? Regardless of whether autovacuum runs?
Advantages vs disadvantages?
Autovacuum (and autoanalyze) run when needed; among other things, autoanalyze maintains the statistics that are used when planning a query.
Basically you never need to do this manually, unless you have made vast changes to a table (filled it with data, for example) and want to query it immediately, before autoanalyze has had a chance to run. In that scenario, stale statistics will result in the query planner coming up with a very bad query plan and will lead to a significantly slower query.
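For example (the table name is made up):

-- after bulk-loading big_table, refresh its statistics before querying it right away
ANALYZE big_table;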
What you might want to do once per day / per week, or whatever, is to cluster tables and rebuild degraded indexes on tables that were modified a lot. Research these topics more to decide if / when / how to do it.
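A sketch of that kind of maintenance (object names are made up; both commands take heavy locks, so schedule them in a quiet window):

CLUSTER orders USING orders_created_at_idx;   -- rewrite the table in the order of that index
REINDEX INDEX orders_created_at_idx;          -- rebuild a bloated index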

Database for long running transactions with huge updates

I am building a tool for data extraction and transformation. The typical use case is transactionally processing lots of data.
The numbers: about 10 sec - 5 min duration, 200-10000 rows updated (the long duration is caused not by the database itself but by outside services that are used during the transaction).
There are two types of agents that access the database: multiple read agents and only one write agent (so there are never multiple concurrent writes).
During the transaction:
Read agents should be able to read the database and see it in its current (committed) state.
The write agent should be able to read the database (it both reads and writes during the transaction) and see it in the new, not yet committed state.
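As a small illustration of the visibility I am after (table and values are made up; this is the default READ COMMITTED behaviour as I understand it):

-- write agent, inside its transaction
BEGIN;
UPDATE items SET state = 'processed' WHERE id = 42;
SELECT state FROM items WHERE id = 42;   -- sees 'processed', its own uncommitted change

-- a read agent, concurrently in another session
SELECT state FROM items WHERE id = 42;   -- still sees the previously committed value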
Is PostgreSQL a good choice for that type of load? I know it uses MVCC - so it should be ok in general, but is it ok to use long and big transactions extensively?
What other open-source transactional databases may be a good choice (I am not limited to SQL)?
P.S.
I do not know whether sharding may affect performance. The database will be sharded: for every shard there will be multiple readers and only one writer, but multiple different shards can be written to at the same time.
I know that it's better not to use outside services during a transaction, but in this case that is the goal. The database is used as a reliable and consistent index for a heavy, huge, slow and eventually-consistent data processing tool.
Huge disclaimer: as always, only a real-life test can tell you the truth.
But I think PostgreSQL will not let you down if you use the most recent version (at least 9.1, better 9.2) and tune it properly.
I have a somewhat similar load on my server, but with a slightly worse read/write ratio of about 10:1. Transactions range from a few milliseconds up to 1 hour (and sometimes even more), and one transaction can insert or update up to 100k rows. The total number of concurrent writers with long transactions can reach 10 or more.
So far so good - I don't really have any serious issues, performance is great (certainly not worse than I expected).
What really helps is that my hot working data set almost fits into available memory.
So, give it a try, it should work great for your load.
Have a look at this link: Maximum transaction size in PostgreSQL
Basically there can be some technical limits on the software side to how large your transaction can be.

PostgreSQL. Slow queries in log file are fast in psql

I have an application written with Play Framework 1.2.4 and Hibernate (default C3P0 connection pooling), backed by a PostgreSQL database (9.1).
Recently I turned on slow query logging (>= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5-1 ms), compared to 200-250 ms in the log. The same thing happened with other queries.
The application and database server are running on the same machine and communicate over the localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log is only the database processing time, excluding external factors like network round trips.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that take a long time or happen too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow while others always perform well.
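For example, in postgresql.conf:

log_checkpoints = on
log_min_messages = info    # 'info' or lower, as mentioned above, so the checkpoint messages reach the log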
Possibility 2: Your query plans are different because you're running the queries directly in psql, while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level, so you won't see the actual statements). To check for this, compare query performance with PREPARE test_query(...) AS SELECT ... followed by EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1, $2, etc.); the parameters in the EXECUTE are values.
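For example, for a query with a single integer parameter (query and parameter are made up):

PREPARE test_query(integer) AS
SELECT * FROM users WHERE id = $1;

EXPLAIN ANALYZE EXECUTE test_query(42);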
If the prepared plan is different to the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
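As far as I know, that is the prepareThreshold connection parameter; setting it to 0 in the JDBC URL is the usual way to keep the driver from switching to server-side prepared statements (check the PgJDBC documentation for your driver version):

jdbc:postgresql://localhost/mydb?prepareThreshold=0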
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the upcoming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The query results are cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly the second time; PostgreSQL maintains a cache of recently retrieved data for just this purpose. If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries take 100 ms during peak hours on your website, when a lot of users are connected, but only 1 ms when you try them again late at night, this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, they may have run more slowly. So you may want to record the OS load at that moment for later analysis.
Second, the plan used at the time may have been different from the plan you get in your current session. So you may want to install auto_explain to see the actual plan of the slow query.
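For example, in postgresql.conf (the 100 ms threshold mirrors the slow query log setting above):

# log the execution plan of every statement slower than 100 ms
shared_preload_libraries = 'auto_explain'   # comma-separate if pg_stat_statements is loaded too
auto_explain.log_min_duration = '100ms'
auto_explain.log_analyze = on               # include actual row counts and timings; adds some overhead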