It's my understanding that you can use prepared statements or connection pooling (with tools like pgPool/pgBouncer) with Postgresql, but can benefit from only one at the same time (at least with Npgsql driver for .NET, plus library author suggests turning off clients-side connection pooling when using PgBouncer). Am I right?
If so - is this true for other runtimes and languages, like Java, Python, Go? Or is it a implementation-specific issue?
It's a complex question, but here are some answers.
As #laurenz-albe writes, you can use pgbouncer and prepared statements but need to use session pooling. This allows you to use prepared statements for the duration of your connection (i.e. as long as your NpgsqlConnection instance is open). However, if you're in a short-lived connection scenario (e.g. web app which opens and closes a connection for each HTTP request), you're out of luck. In this sense, one could say that pooling and prepared statements aren't compatible.
However, if you use Npgsql's internal pooling mechanism (on by default) instead of pgbouncer, then your prepared statements are automatically persisted across connection open/close. In other words, when you call NpgsqlCommand.Prepare(), if the physical connection happened to have already prepared the SQL, then the prepared statement is reused. This is done specifically to unlock the speed advantages of prepared statements for short-lived connection scenarios. This is a pretty unique behavior of Npgsql, see the docs for more info.
This is one of the advantages of an in-process connection pool, as opposed to an out-of-process pool such as pgbouncer - Npgsql retains information on the physical connection as it is passed around, in this case a table of which statements are prepared (name and SQL).
I think this is a generic question, so I'll give a generic answer. What aspect is applicable to a specific connection pool implementation will probably vary.
There are several modes of connection pooling:
A thread retains a connection for the duration of a session (session pooling):
In that case, persistent state like prepared statements can be held for the duration of the session, but you should clean the state when the session is returned to the pool.
A thread retains a connection for the duration of a database transaction (transaction pooling):
In that case, you'd have to clean the state after each transaction, so prepared statements don't make much sense.
A thread retains a connectoin for the duration of a statement (statement poling):
This is only useful in very limited cases where you don't need transactions spanning more than a single statement. Obviously, no state like prepared statements can be shared.
It depends what kind of connection pool you use. Basically, the longer a thread retains a connection, the more sense it makes to use prepared statements.
Of course, if you know what you are doing, you can also create a prepared statement right after the database connection is established and never deallocate it. This will only work if all threads need the same prepared statements. It is easy to screw up with a setup like that.
Related
I am using MongoDB via its driver for node.js
I typically open a connection (via the connect() method) any time I need to perform an operation and close it (via the close() method) as soon as I am finished. In my programs, as natural, I need to perform many operations against the MongoDB and therefore it happens that I open and close many times the connection.
I am wondering whether this is a good practice or whether it would be better to open the connection as the first operation is executed, store it in a variable and use the already opened connections for the following operations closing it when the program ends.
Any advice is very much appreciated.
It is best practice to open the connection once, store it in a variable and close it at the end. MongoDB explicitly recommends this. This is the reason why opening and closing a connection is part of the MongoDB API rather than have it happen automatically for each query.
Opening and closing connections for each query will introduce a significant overhead both in terms of performance (CPU + latency), network traffic, memory management (creating and deleting objects), not only for the client but also for the server itself, which also impacts other clients.
About the terminology of connection: in some drivers like Java, what is actually created and stored in a variable is not a physical connection, but a MongoClient instance. It looks like a connection from an abstract (API) perspective, but it actually encapsulates the actual physical connection(s) and hides complexity from the user.
Creating the MongoClient instance only once, for the drivers that support this, will also allow you to benefit from connection pooling where the driver maintains active connections in parallel for you, so that you also only need to create one MongoClient instance across multiple threads.
I am currently working with a very large database (>50GB) and trying to understand the most efficient, usable approach that plays well with Akka's inherent threading.
Regarding the "wrapping everything inside withSession{ }" approach, while this would be an easier fix, I am concerned that this would restrict Akka's threading between actors. I am not that knowledgeable on how Akka's threading works, and how wrapping an entire actor system inside of withSession would effect it.
Another approach is to call withSession whenever the database is accessed, which is too inefficient. The "withSession {" code segment takes ~6ms to execute, and we are making millions of queries.
Essentially: what is the best way to rapidly access a database with Slick and Akka without breaking threading?
I have heard of approaches using implicit sessions and transactions, but I am struggling to find documentation on either of these.
Better late than never:
The recommended way is using a jdbc connection pool (e.g. c3p0). You need to make sure a session is acquired and returned from and in between kept on the same thread. withSession lazily acquires a connection from the pool and returns it at the end of the scope. Quickly acquire connections when you need them and return them to the pool immediately after.
I have just moved to PostgreSQL after having worked with Oracle for a few years.
I have been looking into some performance issues with prepared statements in the application (Java, JDBC) with the PostgreSQL database.
Oracle caches prepared statements in its SGA - the pool of prepared statements is shared across database connections.
PostgreSQL documentation does not seem to indicate this. Here's the snippet from the documentation (https://www.postgresql.org/docs/current/static/sql-prepare.html) -
Prepared statements only last for the duration of the current database
session. When the session ends, the prepared statement is forgotten,
so it must be recreated before being used again. This also means that
a single prepared statement cannot be used by multiple simultaneous
database clients; however, each client can create their own prepared
statement to use.
I just want to make sure that I am understanding this right, because it seems so basic for a database to implement some sort of common pool of commonly executed prepared statements.
If PostgreSQL does not cache these that would mean every application that expects a lot of database transactions needs to develop some sort of prepared statement pool that can be re-used across connections.
If you have worked with PostgreSQL before, I would appreciate any insight into this.
Yes, your understanding is correct. Typically if you had a set of prepared queries that are that critical then you'd have the application call a custom function to set them up on connection.
There are three key reasons for this afaik:
There's a long todo list and they get done when a developer is interested/paid to tackle them. Presumably no-one has thought it worth funding yet or come up with an efficient way of doing it.
PostgreSQL runs in a much wider range of environments than Oracle. I would guess that 99% of installed systems wouldn't see much benefit from this. There are an awful lot of setups without high-transaction performance requirement, or for that matter a DBA to notice whether it's needed or not.
Planned queries don't always provide a win. There's been considerable work done on delaying planning/invalidating caches to provide as good a fit as possible to the actual data and query parameters.
I'd suspect the best place to add something like this would be in one of the connection pools (pgbouncer/pgpool) but last time I checked such a feature wasn't there.
HTH
we are running java6/hibernate/c3p0/postgresql stack.
Our JDBC Driver is 8.4-701.jdbc3
I have a few questions about Prepared Statements. I have read
excellent document about Prepared Statements
But i still have a question how to configure c3p0 with postgresql.
At the moment we have
c3p0.maxStatements = 0
c3p0.maxStatementsPerConnection = 0
In my understanding the prepared statements and statement pooling are two different things:
Our hibernate stack uses prepared statements. Postgresql is caching the
execution plan. Next time the same statement is used, postgresql reuses the
execution plan. This saves time planning statements inside DB.
Additionally c3p0 can cache java instances of "java.sql.PreparedStatement"
which means it is caching the java object. So when using
c3p0.maxStatementsPerConnection = 100 it caches at most 100 different
objects. It saves time on creating objects, but this has nothing to do with
the postgresql database and its prepared statements.
Right?
As we use about 100 different statements I would set
c3p0.maxStatementsPerConnection = 100
But the c3p0 docs say in c3p0 known shortcomings
The overhead of Statement pooling is
too high. For drivers that do not
perform significant preprocessing of
PreparedStatements, the pooling
overhead outweighs any savings.
Statement pooling is thus turned off
by default. If your driver does
preprocess PreparedStatements,
especially if it does so via IPC with
the RDBMS, you will probably see a
significant performance gain by
turning Statement pooling on. (Do this
by setting the configuration property
maxStatements or
maxStatementsPerConnection to a value
greater than zero.).
So: Is it reasonable to activate maxStatementsPerConnection with c3p0 and Postgresql?
Is there a real benefit activating it?
kind regards
Janning
I don't remember offhand if Hibernate actually stores PreparedStatement instances itself, or relies on the connection provider to reuse them. (A quick scan of BatcherImpl suggests it reuses the last PreparedStatement if executing the same SQL multiple times in a row)
I think the point that the c3p0 documentation is trying to make is that for many JDBC drivers, a PreparedStatement isn't useful: some drivers will end up simply splicing the parameters in client-side and then passing the built SQL statement to the database anyway. For these drivers, PreparedStatements are no advantage at all, and any effort to reuse them is wasted. (The Postgresql JDBC FAQ says this was the case for Postgresql before sever protocol version 3 and there is more detailed information in the documentation).
For drivers that do handle PreparedStatements usefully, it's still likely necessary to actually reuse PreparedStatement instances to get any benefit. For example if the driver implements:
Connection.prepareStatement(sql) - create a server-side statement
PreparedStatement.execute(..) etc - execute that server-side statement
PreparedStatement.close() - deallocate the server-side statement
Given this, if the application always opens a prepared statement, executes it once and then closes it again, there's still no benefit; in fact, it might be worse since there are now potentially more round-trips. So the application needs to hang on to PreparedStatement instances. Of course, this leads to another problem: if the application hangs on to too many, and each server-side statement consumes some resources, then this can lead to server-side issues. In the case where someone is using JDBC directly, this might be managed by hand- some statements are known to be reusable and hence are prepared; some aren't and just use transient Statement instances instead. (This is skipping over the other benefit of prepared statements: handling argument escaping)
So this is why c3p0 and other connection pools also have prepared statement caches- it allows application code to avoid dealing with all this. The statements are usually kept in some limited LRU pool, so common statements reuse a PreparedStatement instance.
The final pieces of the puzzle are that JDBC drivers may themselves decide to be clever and do this; and servers may themselves also decide to be clever and detect a client submitting a statement that is structurally similar to a previous one.
Given that Hibernate doesn't itself keep a cache of PreparedStatement instances, you need to have c3p0 do that in order to get the benefit of them. (Which should be reduced overhead for common statements due to reusing cached plans). If c3p0 doesn't cache prepared statements, then the driver will just see the application preparing a statement, executing it, and then closing it again. Looks like the JDBC driver has a "threshold" setting for avoiding the prepare/execute server overhead in the case where the application always does this. So, yes, you need to have c3p0 do statement caching.
Hope that helps, sorry it's a bit long winded. The answer is yes.
Remember that statements have to be cached per connection which will mean you're going to have to consume quite a chunk of memory and it will take a long time before you'll see any benefit. So if you set it to use 100 statements to be cached, that's actually 100*number of connections or else 100/no of connections but you will still need to take quite some time until your cache will have any meaningful effect.
I'm working on a .NET program that executes arbitrary scripts against a database.
When a colleage started writing the database access code, he simply exposed one command object to the rest of the application which is re-used (setting CommandText/Type, calling ExecuteNonQuery() etc.) for each statement.
I imagine this is a big performance hit for repeated, identical statements, because they are parsed anew each time.
What I'm wondering about, though, is: will this also degrade execution speed if each statement is different from the previous one (not only different parameters, but an entirely different statement)? I couldn't easily find an answer on that in the documentation.
Btw, the RDBMS used is Oracle, but I guess this question is not really database specific.
P.S. I know exposing the same Command object is not thread safe, but that's not an issue here.
There is some overhead involved in creating new command objects, and so in certain circumstances it can make sense to re-use the same command. But as the general case enforced for an entire application it seems more than a little odd.
The performance hit usually comes from establishing a connection to the database, but ADO.NET creates a connection pool to help here.
If you wish to avoid parsing statements each time anew, you can put them into stored procedures.
I imagine your colleague just uses some old style approach that he's inherited from working on other platforms where reusing a command object did make a difference.