Does PostgreSQL cache Prepared Statements like Oracle

I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
The PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html):
Prepared statements only last for the duration of the current database
session. When the session ends, the prepared statement is forgotten,
so it must be recreated before being used again. This also means that
a single prepared statement cannot be used by multiple simultaneous
database clients; however, each client can create their own prepared
statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these that would mean every application that expects a lot of database transactions needs to develop some sort of prepared statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.

Yes, your understanding is correct. Typically if you had a set of prepared queries that are that critical then you'd have the application call a custom function to set them up on connection.
There are three key reasons for this afaik:
There's a long todo list and they get done when a developer is interested/paid to tackle them. Presumably no-one has thought it worth funding yet or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without high-transaction performance requirement, or for that matter a DBA to notice whether it's needed or not.
Planned queries don't always provide a win. There's been considerable work done on delaying planning/invalidating caches to provide as good a fit as possible to the actual data and query parameters.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
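The "set them up on connection" approach mentioned above could look roughly like the following. This is only a sketch: the class name, the statement names, and the users/orders tables are all made up for illustration, and prepareAll would need to be wired into whatever hook your connection pool offers for newly opened connections.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that re-creates session-local prepared statements on each new connection. */
final class SessionStatements {
    // Statement name -> "(parameter types) AS query". Names, tables and columns are invented.
    private static final Map<String, String> DEFINITIONS = new LinkedHashMap<>();
    static {
        DEFINITIONS.put("find_user", "(integer) AS SELECT * FROM users WHERE id = $1");
        DEFINITIONS.put("orders_for_user", "(integer) AS SELECT * FROM orders WHERE user_id = $1");
    }

    /** Builds the SQL-level PREPARE command for one named statement. */
    static String prepareSql(String name) {
        String definition = DEFINITIONS.get(name);
        if (definition == null) {
            throw new IllegalArgumentException("unknown statement: " + name);
        }
        return "PREPARE " + name + " " + definition;
    }

    /** Runs every PREPARE once; call this right after a connection is opened. */
    static void prepareAll(Connection connection) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            for (String name : DEFINITIONS.keySet()) {
                statement.execute(prepareSql(name));
            }
        }
    }
}
```

After that, the same session can run EXECUTE find_user(42). Because the statements live only as long as the session, the setup has to be repeated for every connection the pool opens.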
HTH

Related

SYSIBM.SQLSTATISTICS and SYSIBM.SQLPRIMARYKEYS using most of CPU in DB2 on Windows

I have a fairly busy DB2 on Windows server - 9.7, fix pack 11.
About 60% of the CPU time used by all queries in the package cache is being used by the following two statements:
CALL SYSIBM.SQLSTATISTICS(?,?,?,?,?,?)
CALL SYSIBM.SQLPRIMARYKEYS(?,?,?,?)
I'm fairly decent with physical tuning and have spent a lot of time on SQL tuning on this system as well. The applications are all custom, and educating developers is something I also spend time on.
I get the impression that these two stored procedures are something that perhaps ODBC calls? Reading their descriptions, they also seem like things that are completely unnecessary to do the work being done. The application doesn't need to know the primary key of a table to be able to query it!
Is there anything I can tell my developers to do that will either eliminate/reduce the execution of these or cache the information so that they're not executing against the database millions of times and eating up so much CPU? Or alternately anything I can do at the database level to reduce their impact?
6.5 years later, and I have the answer to my own question. This is a side effect of using an ORM: part of what it does is discover the database schema. Rails generates a similar workload. In Rails, you can avoid this by using the schema cache, which becomes particularly important at scale. I'm not sure whether there are equivalents for other ORMs, but I hope so!

PostgreSQL. Slow queries in log file are fast in psql

I have an application written on Play Framework 1.2.4 with Hibernate(default C3P0 connection pooling) and PostgreSQL database (9.1).
Recently I turned on slow queries logging ( >= 100 ms) in postgresql.conf and found some issues.
But when I tried to analyze and optimize one particular query, I found that it is blazing fast in psql (0.5 - 1 ms) in comparison to 200-250 ms in the log. The same thing happened with the other queries.
The application and the database server are running on the same machine and communicate over the localhost interface.
JDBC driver - postgresql-9.0-801.jdbc4
I wonder what could be wrong, because query duration in the log is calculated considering only database processing time excluding external things like network turnarounds etc.
Possibility 1: If the slow queries occur occasionally or in bursts, it could be checkpoint activity. Enable checkpoint logging (log_checkpoints = on), make sure the log level (log_min_messages) is 'info' or lower, and see what turns up. Checkpoints that're taking a long time or happening too often suggest you probably need some checkpoint/WAL and bgwriter tuning. This isn't likely to be the cause if the same statements are always slow and others always perform well.
Possibility 2: Your query plans are different because you're running them directly in psql while Hibernate, via PgJDBC, will at least sometimes be doing a PREPARE and EXECUTE (at the protocol level so you won't see actual statements). For this, compare query performance with PREPARE test_query(...) AS SELECT ... then EXPLAIN ANALYZE EXECUTE test_query(...). The parameters in the PREPARE are type names for the positional parameters ($1,$2,etc); the parameters in the EXECUTE are values.
If the prepared plan is different to the one-off plan, you can set PgJDBC's prepare threshold via connection parameters to tell it never to use server-side prepared statements.
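As a rough illustration of both steps, the sketch below builds the PREPARE / EXPLAIN ANALYZE EXECUTE pair to paste into psql and appends PgJDBC's prepareThreshold connection parameter to a JDBC URL. The class name, the test_query statement, the items table, and the URL are placeholders, and it assumes that a threshold of 0 turns server-side prepare off, which is how the driver documents it; check the docs for your driver version.

```java
/** Hypothetical helper for comparing prepared and one-off plans. */
final class PlanComparison {

    /** Appends PgJDBC's prepareThreshold parameter to a JDBC URL. */
    static String withPrepareThreshold(String jdbcUrl, int threshold) {
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "prepareThreshold=" + threshold;
    }

    /** PREPARE takes type names for the positional parameters ($1, $2, ...). */
    static String prepare(String name, String parameterTypes, String query) {
        return "PREPARE " + name + "(" + parameterTypes + ") AS " + query;
    }

    /** EXECUTE takes actual values; EXPLAIN ANALYZE shows the plan that was really used. */
    static String explainExecute(String name, String values) {
        return "EXPLAIN ANALYZE EXECUTE " + name + "(" + values + ")";
    }

    public static void main(String[] args) {
        // Table, column and URL below are invented for the example.
        System.out.println(prepare("test_query", "integer", "SELECT * FROM items WHERE owner_id = $1") + ";");
        System.out.println(explainExecute("test_query", "42") + ";");
        System.out.println(withPrepareThreshold("jdbc:postgresql://localhost/app", 0));
    }
}
```

Run the two printed SQL statements in psql and compare the plan with a plain EXPLAIN ANALYZE of the same query with the value inlined.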
This difference between the plans of prepared and unprepared statements should go away in PostgreSQL 9.2. It's been a long-standing wart, but Tom Lane dealt with it for the up-coming release.
It's very hard to say for sure without knowing all the details of your system, but I can think of a couple of possibilities:
The underlying data is cached. If you run the same query twice in a short space of time, it will almost always complete much more quickly on the second pass. PostgreSQL keeps recently read data pages in memory for just this purpose. If you are pulling the queries from the tail of your log and executing them immediately, this could be what's happening.
Other processes are interfering. The execution time for a query varies depending on what else is going on in the system. If the queries are taking 100ms during peak hour on your website when a lot of users are connected but only 1ms when you try them again late at night this could be what's happening.
The point is you are correct that the query duration isn't affected by which library or application is calling it, so the difference must be coming from something else. Keep looking, good luck!
There are several possible reasons. First, if the database was very busy when the slow queries executed, they may simply have run slower. So you may need to record the OS load at that moment for later analysis.
Second, the plan used when the query was logged may differ from the plan you get in your current session. So you may need to install auto_explain to see the actual plan of the slow query.

How would you use EF in a typical Business Layer/Data Access Layer/Stored Procedures set up?

Whenever I watch a demo regarding the Entity Framework the demonstrator simply sets up some tables and performs Inserts, Updates and Deletes using automatically created code stubs but never shows any use of stored procedures. It seems to me that this is executing SQL from the client.
In my experience this is not particularly good practice, so I am presuming that my understanding of the Entity Framework is wrong.
Similarly WCF RIA Services demos use the EF and the demos are always the same. Can anyone shed any light on how you would use EF in a typical Business Layer/Data Access Layer/Stored Procedures set up.
I think I am confused and shouldn't be!!?
There's nothing wrong with executing SQL from the client. Most (if not all) of the problems that it might cause are in fact not there when using something like EF. For instance:
Client generated SQL might cause runtime syntax errors. This is unlikely, since the description of your query is mostly checked at compile time (assuming that the generator itself doesn't generate invalid SQL, which is also unlikely).
Client generated SQL might be inefficient. This is not true with modern database software, which has query caches. EF works in a way that's compatible with query caches, i.e. it generates the same SQL consistently (as long as you use the same code consistently) and uses parameters for varying data.
Client generated SQL might be insecure (SQL injections and whatnot). This is all handled by the generator, which uses parameters for your values and does not interpolate user input into the query itself.
Back in the old Client / Server days, it used to be considered good practice to do all db updates using stored procedures.
But now, it's perfectly acceptable to have an O/RM generate SQL and run directly against DB.
Well, part of the reason why executing sql in stored procedures is a good idea is that it gives you a level of abstraction - when db changes inevitably occur, you make a change in a single place (the proc) rather than a dozen places (all the places where you were calling the client sql). Entity Framework provides this layer of abstraction through the data model, and you have the same advantage.
There are some other reasons why you might want to look at procs, like security granularity (only allowing certain users the right to execute), and some minor performance differences. Ultimately, you have to decide for yourself what the right trade-off is. EF is an attempt to dramatically reduce the developer time spent creating a data layer, with the trade-offs listed above.
never shows any use of stored procedures
Take a look at this video: Using Your Own Stored Procedures to Insert, Update and Delete Entities in Entity Framework.
Note that there are a lot of other videos on that topic there that are certainly worth watching!
The legend is that Scott Hanselman once said "It's not a real demo unless someone drags a datagrid" (pg 478 Silverlight 4 In Action, Pete Brown)
You have to remember that demos are all about selling software, and not at all about communicating best practice. So your observations about the demos are absolutely correct: they cover the basics and leave it to the observer to fill in the blanks.
As to your comment about stored procedures, and the various answers to your question about the generator: the generator is good, and getting better. However, there are certain circumstances when it will generate completely unusable queries (see my SO question here, also discussed on the ADO.NET team blog).
Therefore there are occasions when hand crafted queries are your only recourse (either by way of stored proc, table value functions, views etc)

Should I activate c3p0 statement pooling?

We are running a Java 6/Hibernate/c3p0/PostgreSQL stack.
Our JDBC Driver is 8.4-701.jdbc3
I have a few questions about prepared statements. I have read an excellent document about prepared statements, but I still have a question about how to configure c3p0 with PostgreSQL.
At the moment we have
c3p0.maxStatements = 0
c3p0.maxStatementsPerConnection = 0
In my understanding the prepared statements and statement pooling are two different things:
Our hibernate stack uses prepared statements. Postgresql is caching the
execution plan. Next time the same statement is used, postgresql reuses the
execution plan. This saves time planning statements inside DB.
Additionally c3p0 can cache java instances of "java.sql.PreparedStatement"
which means it is caching the java object. So when using
c3p0.maxStatementsPerConnection = 100 it caches at most 100 different
objects. It saves time on creating objects, but this has nothing to do with
the postgresql database and its prepared statements.
Right?
As we use about 100 different statements I would set
c3p0.maxStatementsPerConnection = 100
But the c3p0 docs say in c3p0 known shortcomings
The overhead of Statement pooling is too high. For drivers that do not perform significant preprocessing of PreparedStatements, the pooling overhead outweighs any savings. Statement pooling is thus turned off by default. If your driver does preprocess PreparedStatements, especially if it does so via IPC with the RDBMS, you will probably see a significant performance gain by turning Statement pooling on. (Do this by setting the configuration property maxStatements or maxStatementsPerConnection to a value greater than zero.)
So: Is it reasonable to activate maxStatementsPerConnection with c3p0 and Postgresql?
Is there a real benefit activating it?
kind regards
Janning
I don't remember offhand if Hibernate actually stores PreparedStatement instances itself, or relies on the connection provider to reuse them. (A quick scan of BatcherImpl suggests it reuses the last PreparedStatement if executing the same SQL multiple times in a row)
I think the point that the c3p0 documentation is trying to make is that for many JDBC drivers, a PreparedStatement isn't useful: some drivers will end up simply splicing the parameters in client-side and then passing the built SQL statement to the database anyway. For these drivers, PreparedStatements are no advantage at all, and any effort to reuse them is wasted. (The PostgreSQL JDBC FAQ says this was the case for PostgreSQL before server protocol version 3, and there is more detailed information in the documentation.)
For drivers that do handle PreparedStatements usefully, it's still likely necessary to actually reuse PreparedStatement instances to get any benefit. For example if the driver implements:
Connection.prepareStatement(sql) - create a server-side statement
PreparedStatement.execute(..) etc - execute that server-side statement
PreparedStatement.close() - deallocate the server-side statement
Given this, if the application always opens a prepared statement, executes it once and then closes it again, there's still no benefit; in fact, it might be worse since there are now potentially more round-trips. So the application needs to hang on to PreparedStatement instances. Of course, this leads to another problem: if the application hangs on to too many, and each server-side statement consumes some resources, then this can lead to server-side issues. In the case where someone is using JDBC directly, this might be managed by hand: some statements are known to be reusable and hence are prepared; some aren't and just use transient Statement instances instead. (This is skipping over the other benefit of prepared statements: handling argument escaping.)
So this is why c3p0 and other connection pools also have prepared statement caches: they allow application code to avoid dealing with all this. The statements are usually kept in some limited LRU pool, so common statements reuse a PreparedStatement instance.
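To make the "limited LRU pool" idea concrete, here is a minimal sketch of a per-connection statement cache built on LinkedHashMap in access order. It is not how c3p0 is actually implemented; the class name and the preparer callback are invented for the example, and a real caller would pass something that wraps connection.prepareStatement(sql) and deals with SQLException.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-connection LRU cache of statements, keyed by SQL text. */
final class LruStatementCache<S extends AutoCloseable> {
    private final int capacity;
    private final Function<String, S> preparer;
    private final LinkedHashMap<String, S> entries;

    LruStatementCache(int capacity, Function<String, S> preparer) {
        this.capacity = capacity;
        this.preparer = preparer;
        // accessOrder = true keeps the least recently used entry first.
        this.entries = new LinkedHashMap<String, S>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, S> eldest) {
                if (size() <= LruStatementCache.this.capacity) {
                    return false;
                }
                // Closing the evicted statement lets the driver free server-side resources.
                closeQuietly(eldest.getValue());
                return true;
            }
        };
    }

    /** Returns the cached statement for this SQL, preparing it on first use. */
    S get(String sql) {
        S statement = entries.get(sql);
        if (statement == null) {
            statement = preparer.apply(sql);
            entries.put(sql, statement);
        }
        return statement;
    }

    boolean contains(String sql) {
        return entries.containsKey(sql);
    }

    int size() {
        return entries.size();
    }

    private static void closeQuietly(AutoCloseable resource) {
        try {
            resource.close();
        } catch (Exception ignored) {
            // Best effort only; a real pool would log this.
        }
    }
}
```

With a capacity of 2, asking for statements A, B, A, C in that order evicts and closes B, because A was touched more recently. The capacity bound is what keeps the application from holding too many server-side statements at once.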
The final pieces of the puzzle are that JDBC drivers may themselves decide to be clever and do this; and servers may themselves also decide to be clever and detect a client submitting a statement that is structurally similar to a previous one.
Given that Hibernate doesn't itself keep a cache of PreparedStatement instances, you need to have c3p0 do that in order to get the benefit of them. (Which should be reduced overhead for common statements due to reusing cached plans). If c3p0 doesn't cache prepared statements, then the driver will just see the application preparing a statement, executing it, and then closing it again. Looks like the JDBC driver has a "threshold" setting for avoiding the prepare/execute server overhead in the case where the application always does this. So, yes, you need to have c3p0 do statement caching.
Hope that helps, sorry it's a bit long winded. The answer is yes.
Remember that statements have to be cached per connection, which means the cache can consume quite a chunk of memory, and it will take a long time before you see any benefit. If you set maxStatementsPerConnection to 100, that is actually up to 100 * number of connections statements in total; if you set the pool-wide maxStatements to 100 instead, each connection gets roughly 100 / number of connections. Either way, it will take quite some time until your cache has any meaningful effect.

Is it a good idea to re-use ADO.NET command objects?

I'm working on a .NET program that executes arbitrary scripts against a database.
When a colleague started writing the database access code, he simply exposed one command object to the rest of the application, which is re-used (setting CommandText/Type, calling ExecuteNonQuery() etc.) for each statement.
I imagine this is a big performance hit for repeated, identical statements, because they are parsed anew each time.
What I'm wondering about, though, is: will this also degrade execution speed if each statement is different from the previous one (not only different parameters, but an entirely different statement)? I couldn't easily find an answer on that in the documentation.
Btw, the RDBMS used is Oracle, but I guess this question is not really database specific.
P.S. I know exposing the same Command object is not thread safe, but that's not an issue here.
There is some overhead involved in creating new command objects, and so in certain circumstances it can make sense to re-use the same command. But as the general case enforced for an entire application it seems more than a little odd.
The performance hit usually comes from establishing a connection to the database, but ADO.NET creates a connection pool to help here.
If you wish to avoid parsing statements each time anew, you can put them into stored procedures.
I imagine your colleague just uses some old style approach that he's inherited from working on other platforms where reusing a command object did make a difference.