Postgres: Concurrent queries in a connection

In Postgres is there a limitation of having just one executing query per connection (and other queries in the connection wait for the first to complete before they start)? I think I am seeing this in one driver so I want to be sure this is a db behavior and not a specific driver limitation.

In Postgres is there a limitation of having just one executing query per connection
Yes. PostgreSQL doesn't let you suspend and resume transactions, nor does it support background (asynchronous) queries on the server back-end.
You can still run multiple concurrent queries, you just need one connection per concurrent query. You can use threads (one thread per connection) but it's usually better to use asynchronous query interfaces in your client library.
Without knowing what you're trying to achieve, and what programming language (and thus what client library) you're using, it's hard to offer more detailed advice.
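For illustration, here is a minimal sketch in Java/JDBC (the question doesn't name a client library, so the driver choice, URL and credentials are placeholder assumptions): two separate connections let two queries run concurrently, whereas issuing both on one connection would run them back to back.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ConcurrentQueries {
        // One connection per concurrent query; a single shared connection would
        // execute these statements one after the other.
        static void runQuery(String url, String sql) {
            try (Connection conn = DriverManager.getConnection(url, "app_user", "secret"); // placeholder credentials
                 Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(Thread.currentThread().getName() + ": " + rs.getString(2));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder URL
            Thread t1 = new Thread(() -> runQuery(url, "SELECT pg_sleep(2), 'first'"));
            Thread t2 = new Thread(() -> runQuery(url, "SELECT pg_sleep(2), 'second'"));
            t1.start();
            t2.start();
            t1.join();
            t2.join(); // both finish after roughly 2 seconds because each query has its own backend
        }
    }

Asynchronous client interfaces (for example libpq's PQsendQuery) avoid the thread-per-connection pattern, but the one-active-query-per-connection rule still applies.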

Related

Multithreading with PostgreSQL JDBC

I'm still a student and not so experienced with multithreading and databases so I might have missed some obvious stuff - hoping for an answer at a more beginner level.
I'm busy creating a dummy Java application that allows users to submit subway station locations and then look up the nearest station to their location. This is all happening over HTTP.
The backend for this application is PostgreSQL (with PostGis) and I connect to the database via the PostgreSQL JDBC.
I want my application to be as multithreaded as possible. Every time I receive a new HTTP connection, I spin up a new thread and service the user's request. But I'm not sure how much point there is to this if reads and writes to the database themselves cannot be parallel.
According to this, PostgreSQL JDBC is not thread safe. But what does that mean exactly? Does that just mean that reads and writes within a single connection are not thread safe (i.e. in each instance of DriverManager.getConnection())? But what about if I made a new connection every time an HTTP request came in? Would that be safe to do in parallel? And would that affect performance badly?
Any other suggestions on broad approach to take?
JDBC in general is not thread-safe, not just the Postgres driver.
This means you cannot run multiple statements created from the same Connection instance in multiple threads at the same time.
If you want to run two statements in parallel, you need to create two different physical connections to the database.
As creating connections isn't cost-free, the usual approach is to have a connection pool (e.g. through a ConnectionPoolDataSource) that keeps a set of connections open. The application then takes connections from the pool and puts them back when the query (or transaction) is done.
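A rough sketch of that approach, using HikariCP as the pool implementation (my choice, not something the answer specifies) and a hypothetical stations table with a PostGIS geom column; the URL and credentials are placeholders:

    import com.zaxxer.hikari.HikariConfig;
    import com.zaxxer.hikari.HikariDataSource;

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class PooledLookup {
        private final HikariDataSource pool;

        public PooledLookup() {
            HikariConfig cfg = new HikariConfig();
            cfg.setJdbcUrl("jdbc:postgresql://localhost:5432/subway"); // placeholder URL
            cfg.setUsername("app_user");                               // placeholder
            cfg.setPassword("secret");                                 // placeholder
            cfg.setMaximumPoolSize(10); // cap on physical connections
            pool = new HikariDataSource(cfg);
        }

        // Each HTTP-handling thread borrows its own connection from the pool;
        // try-with-resources returns it when the block exits.
        public String nearestStation(double lon, double lat) throws Exception {
            String sql = "SELECT name FROM stations "
                       + "ORDER BY geom <-> ST_SetSRID(ST_MakePoint(?, ?), 4326) LIMIT 1";
            try (Connection conn = pool.getConnection();
                 PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setDouble(1, lon);
                ps.setDouble(2, lat);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getString(1) : null;
                }
            }
        }
    }

This way the number of physical connections stays bounded no matter how many HTTP threads you spin up, and no Connection instance is ever shared between threads.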

Should a Postgres connection always be on, or should I connect before running each query?

I am debating whether I should keep my Postgres connection always on and check/reconnect before running each query, or whether I should connect before each query and close the connection as soon as it is done. Thanks!
As long as the Postgres server isn't totally jammed with connections (i.e. this is not an app that will be creating a gigantic number of perpetual connections), I don't think it's a problem to maintain the connection. However, I would also recommend checking the connection and handling reconnects prior to each query. Many libraries offer ways to do this. For example, with MyBatis (Java), you can have it issue a test query of your choosing each time; I use the lightweight SELECT 1 for this.
I would say the key thing is to keep the connection in the idle-in-transaction state for as little time as possible, because that state can have a variety of impacts on performance (such as slowing down other queries, preventing high-turnover tables from being vacuumed in a timely manner, etc.). This is not to say that any time spent idle in transaction is automatically bad, but it should be considered and minimized where possible (e.g. if you have some calculations that are going to take several minutes to run, make sure to either commit or roll back before doing them; which one depends on context).
If you're doing a bunch of SELECTs, and don't have anything you need to commit, I would recommend doing a rollback to help keep the idle in transaction states to a minimum.
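A minimal JDBC sketch of both suggestions, assuming you manage the connection yourself rather than through MyBatis (the table name is hypothetical):

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public final class ConnectionCheck {

        // Returns true if the connection still answers a trivial test query.
        static boolean isAlive(Connection conn) {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT 1")) {
                return rs.next();
            } catch (Exception e) {
                return false; // reconnect before running the real query
            }
        }

        static void runReadOnlyWork(Connection conn) throws Exception {
            conn.setAutoCommit(false); // explicit transaction
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT count(*) FROM some_table")) { // hypothetical table
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            } finally {
                // Nothing to persist, so end the transaction right away instead of
                // leaving the session idle in transaction.
                conn.rollback();
            }
        }
    }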
I just realized the Postgres connection string has a number of settings for connection pooling, for example:
User ID=root;Password=myPassword;Host=localhost;Port=5432;Database=myDataBase;
Pooling=true;Min Pool Size=0;Max Pool Size=100;Connection Lifetime=0;
So in my code, I can just close the connection after the command has finished executing. But behind the scenes, the connection is actually still alive and stored in the connection pool to be used again.

Will queries fall into a queue and run in the same order that they were requested (with a mysqli persistent connection)?

If hundreds of queries are sent to MySQL at the same time over a mysqli (PHP extension) persistent connection, do the queries fall into a queue and run in the same order that they were requested?
Will queries fall into a queue and run in the same order that they were requested?
Yes.
It doesn't matter whether it's mysqli or whether the connection is persistent, though.
And it won't really be "the same time", either.
Each persistent connection is used by only one PHP process at a time, and that process executes sequentially, running one query after another.

Does it make sense to use pg_pconnect (php-fpm)

I have about 11000 hits a second on 10 servers with php-fpm. I'm migrating from MySQL to Postgres, so my question is: does it make sense to use pg_pconnect?
It's better to use a dedicated connection pooler like PgBouncer.
Performance should be comparable to pg_pconnect, but PgBouncer will allow you to perform a cleanup after an error in your PHP code. pg_pconnect will not automatically clean up open transactions, locks, prepared statements, etc.
Establishing a connection to a PostgreSQL server is expected to be significantly more expensive than to a MySQL server. This is due to different design choices of these databases in how they handle resource allocation and privilege separation between independent connections.
Therefore, for a website, it totally makes sense to reuse connections to PostgreSQL whenever possible.
The generally recommended way is not to use pg_pconnect but rather an external connection pooler like pgBouncer or pgPool-II, which are better suited for this task. When using PHP-FPM, however, you already have a middleware layer that lets you somewhat control the number of open connections through the fpm process manager options, so it may be good enough. You may consider setting pm.max_requests to a non-zero value to make sure that connections get cleaned up at a reasonable frequency and to avoid keeping a pile of unused connections during off-peak hours.
Well, pg_pconnect means you have one connection per PHP backend, so it depends on how many backends you have. With a traditional Apache mod-php setup it'd be a non-starter, but with php-fpm you might get away with it.
The database server can handle hundreds of idle connections, but it will almost certainly grind to a halt if they all have queries being issued concurrently. I've seen a rule of thumb of no more than two connections per core - and that's assuming I/O doesn't limit you first.
The common approach is to run a connection pooler like pgbouncer and have PHP connect per-request. That reduces your connection overhead while keeping concurrency at a manageable level.
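For reference, a minimal pgbouncer.ini along the lines the answers suggest might look like this (the addresses, paths and pool sizes are illustrative assumptions, not values taken from the answers):

    [databases]
    mydb = host=127.0.0.1 port=5432 dbname=mydb

    [pgbouncer]
    listen_addr = 127.0.0.1
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; return server connections to the pool at commit/rollback
    pool_mode = transaction
    ; physical connections per database/user pair
    default_pool_size = 20
    ; many cheap client connections from the php-fpm workers
    max_client_conn = 1000

With pool_mode = transaction a small number of server connections can serve many php-fpm workers, though session-level features such as named prepared statements need care in that mode.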

PostgreSQL consuming large amount of memory for persistent connection

I have a C++ application which is making use of PostgreSQL 8.3 on Windows. We use the libpq interface.
We have a multi-threaded app where each thread opens a connection and keeps using it without calling PQfinish.
We notice that for each query (especially the SELECT statements) postgres.exe memory consumption would go up. It goes up as high as 1.3 GB. Eventually, postgres.exe crashes and forces our program to create a new connection.
Has anyone experienced this problem before?
EDIT: shared_buffers is currently set to 128MB in our conf file.
EDIT2: a workaround that we have in place right now is to call PQfinish for every transaction. But then, this slows down our processing a bit since establishing a connection every time is quite slow.
In PostgreSQL, each connection has a dedicated backend. This backend not only holds connection and session state, but is also an execution engine. Backends aren't particularly cheap to leave lying around, and they cost both memory and synchronization overhead even when idle.
There's an optimum number of actively working backends for any given Pg server on any given workload, where adding more working backends slows things down rather than speeding it up. You want to find that point, and limit the number of backends to around that level. Unfortunately there's no magic recipe for this, it mostly involves benchmarking - on your hardware and with your workload.
If you need more connections than that, you should use a proxy or pooling system that allows you to separate "connection state" from "execution engine". Two popular choices are PgBouncer and PgPool-II. You can maintain light-weight connections from your app to the proxy/pooler, and let it schedule the workload to keep the database server working at its optimum load. If too many queries come in, some wait before being executed instead of competing for resources and slowing down all queries on the server.
See the postgresql wiki.
Note that if your workload is read-mostly, and especially if it has items that don't change often for which you can determine a reliable cache invalidation scheme, you can also potentially use memcached or Redis to reduce your database workload. This requires application changes. PostgreSQL's LISTEN and NOTIFY will help you do sane cache invalidation.
Many database engines have some separation of execution engine and connection state built in to the core database engine's design. Sybase ASE certainly does, and I think Oracle does too, but I'm not too sure about the latter. Unfortunately, because of PostgreSQL's one-process-per-connection model it's not easy for it to pass work around between backends, making it harder for PostgreSQL to do this natively, so most people use a proxy or pool.
I strongly recommend that you read PostgreSQL High Performance. I don't have any relationship/affiliation with Greg Smith or the publisher*, I just think it's great and will be very useful if you're concerned about your DB's performance.
* ... well, I didn't when I wrote this. I work for the same company now.
The memory usage is not necessarily a problem. PostgreSQL uses shared memory for some caching, and this memory does not count towards the process's reported memory usage until it's actually used. The more you use the process, the larger the part of the shared buffers that will be active in its address space.
If you have a large value for shared_buffers, this will happen. If you have it too large, the process can run out of address space and crash, yes.
The problem is probably that you don't close the transaction.
In PostgreSQL, even if you only run SELECTs without any DML, they still run inside a transaction, which needs to be rolled back.
Adding a rollback at the end of each transaction will reduce your memory problem.