Is there a shared query plan cache for Postgres?

I have a complex Postgres query that I've optimised with pg_hint_plan. Planning time is about 150ms while execution time is about 30ms. The plan should never change, so there's no point in re-planning it from scratch each and every time the query runs. The structural problem with the query is that it hits too many tables.
Tweaking join_collapse_limit and from_collapse_limit has limited effect.
Most 'enterprise' databases have a shared query plan cache, but as far as I can see Postgres does not.
What are ways around this? Prepared statements aren't really suitable as their lifetime is bound to the connection.

There is no way around this. The best solution I can think of is to use connection pooling, so that your connections live for a long time, and use a prepared statement.
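A minimal sketch of that pattern (the orders and customers tables are hypothetical stand-ins for the real query):

-- run once when the pool hands out a new connection
PREPARE big_report(date) AS
SELECT c.name, count(*)
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at >= $1
GROUP BY c.name;

-- every later call on the same connection reuses the prepared statement
EXECUTE big_report('2024-01-01');

Note that the first five executions of a prepared statement still generate custom plans; only after that may PostgreSQL switch to a cached generic plan, so the 150 ms planning cost only disappears on long-lived connections.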

No, PostgreSQL does not have a plan cache area like Microsoft SQL Server or Oracle.
It is one of the many differences between it and commercial RDBMSs such as SQL Server.

Related

Run vacuum by schedule

I'm using Postgres version 9.6
Most of my tables see a mix of queries, updates, and inserts.
Most of them hold around 200K-700K rows.
There are bigger ones (millions of rows) and smaller ones.
Is it a good idea to perform a VACUUM (and ANALYZE?) operation once a day? Once a week? Regardless of whether autovacuum is running?
Advantages vs disadvantages?
Autovacuum runs when needed, and its ANALYZE step collects the statistics that are used when planning a query.
Basically you never need to do this manually, unless you have made vast changes to a table (filled it with data, for example) and want to query it immediately, before autovacuum has had a chance to run. In that scenario, stale statistics will lead the query planner to a very bad query plan and a significantly slower query.
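For that bulk-load case, a minimal sketch (big_table and the file path are hypothetical):

-- load a large amount of data, then refresh statistics before querying
COPY big_table FROM '/tmp/data.csv' (FORMAT csv);
ANALYZE big_table;  -- collect fresh statistics so the planner sees the new contents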
What you might want to do once per day / per week, or whatever, is to CLUSTER tables and rebuild degraded indexes on tables that are modified a lot. Research these topics more to decide if / when / how to do it.
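A sketch of what such a maintenance job might contain (table and index names are hypothetical; note that plain CLUSTER and REINDEX take exclusive locks, so schedule them off-peak):

-- physically reorder the table along an index it is usually scanned by
CLUSTER big_table USING big_table_pkey;
-- rebuild an index that has become bloated by heavy updates
REINDEX INDEX big_table_created_at_idx;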

PostgreSQL approaching plan caching on identical queries?

I am running some benchmark tests on a lot of queries. I have a set of queries and they will be run multiple times, one after another. I know that PostgreSQL caches query plans, so this is important to consider, but as far as I know this does not always happen.
So I have two approaches: I am considering either (a) forcing the query plan to be generated each time I run a query, or (b) 'warming up' a bit so that a plan is cached and reused each time. How can I accomplish either, and what precautions can I take to ensure that one or the other is happening?
It would be great if I could monitor plans in the cache but I am not sure if it is possible.
UPDATE: My queries are complex SELECTs to retrieve data, no DELETEs/INSERTs etc. Does this mean I should not give so much respect to the query planner in benchmarks?
PostgreSQL only caches query plans if
- you use prepared statements, or
- the statement is executed inside a PL/pgSQL function.
So if you want to benchmark how much faster your queries become when you avoid the overhead of planning, you should create a prepared statement and execute it at least six times (because the first five runs will always generate a custom plan).
If your queries are complex, odds are that you might even lose performance by caching query plans, particularly if the runtime of the queries is long. In such cases it is usually better to spend the extra effort of planning each query. The biggest win with prepared statements comes when the execution time of the queries is low.
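To 'warm up' and then observe the cached plan, a minimal sketch (bench and the orders table are hypothetical):

-- the first five executions always plan from scratch (custom plans)
PREPARE bench(int) AS SELECT count(*) FROM orders WHERE customer_id = $1;
EXECUTE bench(1);
EXECUTE bench(2);
EXECUTE bench(3);
EXECUTE bench(4);
EXECUTE bench(5);
-- from the sixth execution on, PostgreSQL may switch to the cached generic plan
EXPLAIN ANALYZE EXECUTE bench(6);
DEALLOCATE bench;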

SQL Server 2008 R2 table access times

Does SQL Server maintain statistics for each table on read, write, and update times, etc.?
What we want to know is which tables our ERP application spends the most time in, so we can begin looking for ways to optimize those tables.
Well, SQL Server doesn't keep track of those statistics by table name. But you could look at DMVs like sys.dm_exec_query_stats to see which queries are taking the longest.
SELECT [sql] = SUBSTRING
    (
        st.[text],
        (s.statement_start_offset / 2) + 1,
        (CASE s.statement_end_offset
             WHEN -1 THEN DATALENGTH(CONVERT(NVARCHAR(MAX), st.[text]))
             ELSE s.statement_end_offset
         END - s.statement_start_offset) / 2
    ),
    s.*
FROM sys.dm_exec_query_stats AS s
CROSS APPLY sys.dm_exec_sql_text(s.[sql_handle]) AS st
WHERE s.execution_count > 1
    AND st.[dbid] = DB_ID('Your_ERP_Database_Name')
ORDER BY total_worker_time * 1.0 / execution_count DESC;
Of course you can order by any metrics you want, and quickly eyeball the first column to see if you identify anything that looks suspicious.
You can also look at sys.dm_exec_procedure_stats to identify procedures that are consuming high duration or reads.
Keep in mind that these and other DMVs reset for various events including reboots, service restarts, etc. So if you want to keep a running history of these metrics for trending / benchmarking / comparison purposes, you're going to have to snapshot them yourself, or get a 3rd party product (e.g. SQL Sentry Performance Advisor) that can help with that and a whole lot more.
Disclaimer: I work for SQL Sentry.
You could create a SQL Server Audit as per the following link:
http://msdn.microsoft.com/en-us/library/cc280386(v=sql.105).aspx
SQL Server does capture the information you're asking about, but it's on a per index basis, not per table - look in sys.dm_db_index_operational_stats and sys.dm_db_index_usage_stats. You'll have to aggregate the data based on object_id to get table information. However, there are caveats - for example, if an index is not used (no reads and no writes), it won't show up in the output. These statistics are reset on instance restart, and there's a bug that causes them to be reset in index_usage_stats when an index is rebuilt (https://connect.microsoft.com/SQLServer/feedback/details/739566/rebuilding-an-index-clears-stats-from-sys-dm-db-index-usage-stats). And, there are notable differences between the outputs from the DMVs - check out Craig Freedman's post for more information (http://blogs.msdn.com/b/craigfr/archive/2008/10/30/what-is-the-difference-between-sys-dm-db-index-usage-stats-and-sys-dm-db-index-operational-stats.aspx).
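If per-table totals are the goal, the per-index rows can be rolled up by object_id; a sketch against the current database (the column aliases are mine):

SELECT OBJECT_NAME(ius.[object_id]) AS table_name,
       SUM(ius.user_seeks + ius.user_scans + ius.user_lookups) AS total_reads,
       SUM(ius.user_updates) AS total_writes
FROM sys.dm_db_index_usage_stats AS ius
WHERE ius.database_id = DB_ID()
GROUP BY ius.[object_id]
ORDER BY total_reads DESC;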
The bigger question is, what problem are you trying to solve by having this information? I would agree with Aaron that finding queries that are taking a long time is a better place to start in terms of optimization. But, I wanted you to be aware that SQL Server does have this information.
We use sp_WhoIsActive from Adam Machanic's blog.
It gives us a snapshot of what is currently going on on the server, and which execution plans the statements are using.
It is easy to use and free of charge.

Does PostgreSQL cache Prepared Statements like Oracle

I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html) -
Prepared statements only last for the duration of the current database session. When the session ends, the prepared statement is forgotten, so it must be recreated before being used again. This also means that a single prepared statement cannot be used by multiple simultaneous database clients; however, each client can create their own prepared statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these that would mean every application that expects a lot of database transactions needs to develop some sort of prepared statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.
Yes, your understanding is correct. Typically, if you have a set of prepared queries that are that critical, you'd have the application call a custom setup routine that prepares them on each new connection.
There are three key reasons for this afaik:
There's a long todo list, and items get done when a developer is interested in (or paid for) tackling them. Presumably no one has thought this worth funding yet, or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without high-transaction performance requirements, or for that matter a DBA to notice whether such a cache is needed or not.
Cached plans don't always provide a win. Considerable work has gone into delaying planning and invalidating cached plans so that the plan fits the actual data and query parameters as well as possible.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
HTH

PostgreSQL. Slow queries in log file are fast in psql

I have an application written on Play Framework 1.2.4 with Hibernate(default C3P0 connection pooling) and PostgreSQL database (9.1).
Recently I turned on slow query logging (>= 100 ms) in postgresql.conf and found some issues.
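The relevant postgresql.conf knob for this is log_min_duration_statement; a minimal sketch:

# log any statement that runs for 100 ms or longer
log_min_duration_statement = 100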
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5-1 ms) compared to the 200-250 ms in the log. The same thing happened with the other queries.
The application and database server is running on the same machine and communicating using localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because the query duration in the log is calculated from database processing time only, excluding external factors like network round trips.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that are taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow while others always perform well.
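The corresponding postgresql.conf settings, as a sketch:

# log every checkpoint so slow bursts can be correlated with checkpoint times
log_checkpoints = on
log_min_messages = info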
Possibility 2: Your query plans are different because you're running them directly in psql while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level so you won't see actual statements). For this, compare query performance with PREPARE test_query(...) AS SELECT ... then EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1,$2,etc); the parameters in the EXECUTE are values.
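A minimal sketch of that comparison (test_query is the name used above; the orders table and parameter value are hypothetical):

PREPARE test_query(int) AS SELECT * FROM orders WHERE customer_id = $1;
-- plan taken by the prepared (protocol-level) path
EXPLAIN ANALYZE EXECUTE test_query(42);
-- one-off plan for the same statement, for comparison
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;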
If the prepared plan is different from the one-off plan, you can set PgJDBC's prepareThreshold connection parameter to tell it never to use server-side prepared statements.
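For example (assuming the standard prepareThreshold parameter, where 0 disables server-side prepared statements):

jdbc:postgresql://localhost:5432/mydb?prepareThreshold=0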
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the upcoming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The data is cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass, because PostgreSQL keeps recently read data pages in memory (it does not cache query results themselves). If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries are taking 100ms during peak hour on your website when a lot of users are connected but only 1ms when you try them again late at night this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, they may simply have run slower; you may need to observe the load on the OS at that moment for future analysis.
Second, the historical plan of the SQL may be different from the plan in your current session, so you may need to install auto_explain to see the actual plan of the slow query.
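A sketch of an auto_explain setup in postgresql.conf (the threshold is illustrative):

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '100ms'   # log plans of statements at or over 100 ms
auto_explain.log_analyze = on             # include actual row counts and timings (adds overhead)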