What exactly does a connection pool for databases like PostgreSQL do? - postgresql

I know the general idea that a connection pool is a pool of reusable connections that speeds up traffic to the database because it can reuse connections instead of constantly creating new ones.
But this is a very high level explanation. It doesn't explain what is meant by a connection and why the connection pool works, since even with a connection pool such as for example client -> PgBouncer -> PostgreSQL, while the client does not have to create a connection to the databasee, it still has to connect to create a connection to the proxy.
So what is the connection created from (e.g.) client -> PgBouncer and why is creating this connection faster than creating the connection PgBouncer -> PostgreSQL?

There are two uses of a connection pool:
it prevents opening and closing database connections all the time
There is certainly a certain overhead with establishing a TCP connection to pgBouncer, but that is way cheaper than establishing a database connection. When you start a database connection, additional work is done:
a server process is started, which is way more expensive than a TCP connection
PostgreSQL loads cached metadata tables
it puts a limit on the number of client connections, thereby preventing database overload
The advantage over limiting max_connections is that connections in excess of the limit won't receive an error, but will be queued waiting for a connection to become free.

Related

How to monitor/list outgoing connections to foreign servers in postgres?

I currently use the native foreign data wrapper extension to connect to some other postgres instances (sharding architecture). Now I would like to understand how those underlying outgoing connections are managed by the fdw. I read in the documentation that libpq is used to handle the connections. There is no connection pooling in place but those connections are cached.
This connection is kept and re-used for subsequent queries in the same session.
I would like to list those connections to monitor those? We can list incoming connections via SELECT * FROM pg_stat_activity; - can we do something similar for outgoing connections?
The connections to the foreign server are kept open until the database session ends. To monitor that, set log_connections and log_disconnections to on on the target PostgreSQL server.

What's the difference between loginTimeout, connectTimeout and socketTimeout in pgjdbc

In pgjdbc we have:
loginTimeout
connectTimeout
socketTimeout
cancelSignalTimeout
But it isn't clear to me what's the difference (when are they applied) between loginTimeout, connectTimeout and socketTimeout.
As documented in the PostgreSQL JDBC documentation:
loginTimeout = int
Specify how long to wait for establishment of a database connection.
The timeout is specified in seconds.
connectTimeout = int
The timeout value used for socket connect operations. If connecting
to the server takes longer than this value, the connection is broken.
The timeout is specified in seconds and a value of zero means that it
is disabled.
socketTimeout = int
The timeout value used for socket read operations. If reading from
the server takes longer than this value, the connection is closed.
This can be used as both a brute force global query timeout and a
method of detecting network problems. The timeout is specified in
seconds and a value of zero means that it is disabled.
cancelSignalTimeout = int
Cancel command is sent out of band over its own connection, so
cancel message can itself get stuck. This property controls "connect
timeout" and "socket timeout" used for cancel commands. The timeout is
specified in seconds. Default value is 10 seconds.
The connectTimeout and socketTimeout are timeouts on low-level socket operations. The connectTimeout governs the time needed to establish a TCP socket connection. Establishing a TCP connection doesn't guarantee a login (it doesn't even guarantee that you're connecting to a PostgreSQL server, just that you connected to something that accepted your TCP connection). A socketTimeout governs the time a socket can be blocked waiting to read from a socket. This involves all reads from the server, not just during connect, but also during subsequent interaction with the server (eg executing queries).
On the other hand loginTimeout governs the PostgreSQL protocol operation of connecting and authenticating to the PostgreSQL server. This involves establishing a TCP connection followed by one or more exchanges of packets for the handshake and authentication to the PostgreSQL server (I'm not familiar with the details of the PostgreSQL protocol, so I can't be very specific).
Exchanging these packets can take additional time, or if you connected to something that isn't a PostgreSQL server the packet exchange may stall. It might be possible to solve this with careful control of both connectTimeout and socketTimeout, but there are no guarantees (eg data is being exchanged, but the login never completes). In addition, as the socketTimeout also governs all other operations on the connection, you may want to set it higher (eg for other operations that take a long time to get a response back) than you are willing to wait for the login to complete.
The cancelSignalTimeout is used as the connect and socket timeout of the separate TCP connection used for cancel commands.
After reading the source, I'd say it is like this:
connectTimeout specifies how long to wait for a TCP network connection to get established
loginTimeout specifies how long the whole process of logging into the database is allowed to take
socketTimeout specifies how long the client will wait for a response to a command from the server before throwing an error
The first two are related to establishing a connection, the third is relevant for the whole database session.
Establishing a TCP connection is part of establishing a database connection.

Postgres terminology: client vs connection

In Postgres, is there a one-to-one relationship between a client and a connection? In other word, is a client always one connection and no client can open more than one connection?
For example, when Postgres says:
org.postgresql.util.PSQLException: FATAL: sorry, too many clients already.
is that equivalent to "too many connections already"?
Also, as far as I understand, Postgres uses one process for each client. So does this mean that each process is used for one connection only?
Refer - https://www.postgresql.org/docs/9.6/static/connect-estab.html
PostgreSQL is implemented using a simple "process per user"
client/server model. In this model there is one client process
connected to exactly one server process. As we do not know ahead of
time how many connections will be made, we have to use a master
process that spawns a new server process every time a connection is
requested.
So yes, one server process serves one connection.
You can have as many connections from a single client (machine, application) as the server can manage. The server can support a given number of connections, whether or not these come from different clients (machine, application) is irrelevant to the server.
The connection is made to the postmaster process that is listening on the port that PG is configured to listen to (5432 by default). When a connection is established (after authentication), the server spawns a process which is used exclusively by a single client. That client can make multiple connections to the same server, for instance to connect to different databases, or the same database using different credentials, etc.

How to get the local port of a jdbc connection?

As far as I know when one establishes multiple Connection objects via JDBC to one database then each connection occupies a separate port on the machine where the Connection is established (they all connect to one port on the server where the DBMS is running).
I tried to extract the port that corresponds to the Connection objects. Unfortunately I did not find any way to do so.
Background: I'm doing performance analysis where I setup multiple clients which issue queries on the db. I'm logging the execution time of queries on the database server. In the resulting log I have - among others - information about the connection who initiated the query e.g. localhost.localdomain:44760 I hope it is possible to use this information to map each query to the client or more precisely the Connection object who initiated the query (which is my ultimate goal and serves analysis purposes).
Just run this select through the JDBC connection:
select inet_client_port()
More functions like that are in the manual:
http://www.postgresql.org/docs/current/static/functions-info.html

How do you know if PGBouncer is working?

I've set up PGBouncer and configured it to connect to my postgres DB and it all connects just fine, however I'm not sure if its actually working.
I have a php script that runs as a daemon and picks up beanstalk jobs. The problem is for every distinct user/action on the system it opens a new connection to postgres and then leaves that connection idling because the daemon doesn't actually stop running so the connection is never terminated (a quick fix for this was to reset the connection at the end of the script loop, but that is going to be inefficient with many connects).
Anyway, this caused postgres to eventually run out of connections and lock up...
So PGBouncer seems like the answer.
But now when I run it, I see the same db connection multiple times when i do ps ax | grep postgres.
Isn't PGBouncer supposed to only ever have 1 connection open to the DB and route all traffic through that connection? Then open a new connection if that one is full?
At present I have 3 for one db connection (my access control system) and 2 for the other database (my client specific data).
To me, it feels like if I roll out these changes then I will just be faced with the same problem that the connections will just get eaten up again because they're not being released.
I hope that explains enough for someone to offer any advice.
It sounds like an important step here is to fix the scripts so they release the connections when they're done. Once you do that, PgBouncer will help reduce the connection setup/teardown overhead, but in connection pooling mode it won't give you the ability to maintain more connections to Pg than you otherwise could.
However, you can also use PgBouncer in transaction-pooling mode. When used for transaction pooling, PgBouncer keeps a pool of idle backend transactions and only assigns them when a client does a BEGIN. Connections are returned to the pool after the client does a COMMIT or ROLLBACK. That means that in transaction pooling mode you can have large numbers of open connections to PgBouncer; they don't each need a corresponding connection to a PostgreSQL backend, so long as most of them are idle at any point in time and don't have any transaction open.
The only real downside of transaction pooling mode is that it breaks applications that expect to SET session-level variables, keep prepared statements across transactions, etc. Applications may need modification, e.g. to use SET LOCAL.