Opening DB connections to Postgres takes a long time - postgresql

Several of our applications have issues with the connection pool. I maintain one of them: a JEE application on Payara 4.1 that uses PostgreSQL 9.5.8.
I have practically no problems when running the application locally against a local DB instance. In the remote environment, however, the application became unresponsive about every 10 minutes (strictly speaking, it answered every request with HTTP status 503). Suspecting that opening connections was taking too long, we set the parameter idleTimeoutInSeconds="0" on the jdbc-resource. Now the problem occurs about 4 times a day, which is an improvement, but neighbouring systems are still complaining.
We usually run with 5 steady connections and allow a maximum of 30. Our application normally needs only 1 or 2 to handle its traffic. With tcpdump I saw that at a certain point the connection pool tries to open many connections at once: the pool realizes that the connections it holds have been closed by the DB without any notification such as a TCP FIN, and opening each connection takes about 1 second. During this window of about 30 seconds, not all requests can be safely queued, and some of them get a 503.
Locally everything is fine: opening a connection takes ~50 ms and everyone is happy. Our Postgres team is not helping at all, and I am stuck. As I don't see any way to improve the JEE connection pool, I am considering some radical ideas:
1. Refreshing the connections myself. All the time. Constantly. (This would be hard to implement in JEE, where I cannot simply look into the connection pool and tell each connection to refresh itself just in case.)
2. Replacing the not-helping-at-all JEE connection pool implementation with something that works better. (Future generations of developers maintaining our app will hate me...)
3. Replacing the DB with one managed by myself. (An even dumber idea.)
Does anyone have an idea how I could implement 1 or 2 above, or any other ideas that could help?
Here is my current JDBC resource definition, in case it helps:
<jdbc-resource poolName="<poolName>" jndiName="<jndiName>" isConnectionValidationRequired="true"
connectionValidationMethod="table" validationTableName="version()" maxPoolSize="30"
validateAtmostOncePeriodInSeconds="30" statementTimeoutInSeconds="30" isTimerPool="true" steadyPoolSize="5"
idleTimeoutInSeconds="0" connectionCreationRetryAttempts="100000" connectionCreationRetryIntervalInSeconds="30"
maxWaitTimeInMillis="2000">
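Idea 1 boils down to retiring connections proactively, before whatever sits between the application and the DB drops them silently. The following is a minimal, language-neutral sketch in Python, not Payara code: a toy pool (hypothetical class `RecyclingPool`) that closes and replaces any connection older than a maximum lifetime. The `connect`/`close` callables, the fake clock and the 300-second lifetime are assumptions for illustration; the lifetime would have to be shorter than the roughly 10-minute interval at which the connections seem to be dropped. HikariCP's `maxLifetime` setting is based on a similar principle.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, Deque


@dataclass
class PooledConnection:
    raw: Any        # the underlying driver connection
    created: float  # clock reading when the connection was opened


class RecyclingPool:
    """Toy pool that retires connections older than max_lifetime seconds."""

    def __init__(self, connect: Callable[[], Any], close: Callable[[Any], None],
                 max_lifetime: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._connect = connect
        self._close = close
        self._max_lifetime = max_lifetime
        self._clock = clock
        self._idle: Deque[PooledConnection] = deque()

    def _expired(self, pc: PooledConnection) -> bool:
        return self._clock() - pc.created >= self._max_lifetime

    def acquire(self) -> PooledConnection:
        # Close idle connections that have lived too long, so they are replaced
        # before an idle-killing firewall (assumed to act later than
        # max_lifetime) would drop them.
        while self._idle:
            pc = self._idle.popleft()
            if not self._expired(pc):
                return pc
            self._close(pc.raw)
        return PooledConnection(self._connect(), self._clock())

    def release(self, pc: PooledConnection) -> None:
        # Expired connections are closed on return instead of being pooled.
        if self._expired(pc):
            self._close(pc.raw)
        else:
            self._idle.append(pc)


# Demo with a fake clock and fake connections (plain strings).
fake_now = [0.0]
opened, closed = [], []


def fake_connect() -> str:
    conn = f"conn{len(opened)}"
    opened.append(conn)
    return conn


pool = RecyclingPool(fake_connect, closed.append, max_lifetime=300,
                     clock=lambda: fake_now[0])
c = pool.acquire()   # opens conn0 at t=0
pool.release(c)
fake_now[0] = 100.0
c = pool.acquire()   # reuses conn0 (100 s old)
pool.release(c)
fake_now[0] = 400.0
c = pool.acquire()   # conn0 is 400 s old: it is closed and conn1 is opened
```

The sketch recycles lazily, on borrow and return, which means a request occasionally pays the cost of opening a new connection (about 1 second in the environment described above). A production pool would rather replace old connections from a background task, and stagger the replacements so they do not all expire at once.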

Related

Postgres 11.8 on AWS Losing Connection/Terminating Query After 15 Minutes with No Notification or Error

I am running into an issue where multiple different client apps (DataGrip, DBeaver, Looker) have their queries cancelled after exactly 15 minutes, but no termination message or connection error is ever sent to the app. As far as the app is concerned, the query is still running even though it has been terminated in Postgres.
For example, if I run the following query, the client app shows it as running forever, while pg_stat_activity shows that the query is no longer running after 15 minutes.
SELECT pg_sleep(16 * 60);
Does anyone know of a Postgres or AWS setting that would cause this? I've checked the configuration and couldn't find any settings set to a value of 15 minutes (or 900 seconds).
There is probably an ill-configured firewall that closes your session.
Assuming that the clients you are mentioning use libpq to connect to PostgreSQL, include this in the connection string:
keepalives_idle=300
See the documentation for details.
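For context, libpq's keepalives_idle makes the operating system start sending TCP keepalive probes after the given number of idle seconds, so a stateful firewall keeps seeing traffic on the connection. A minimal Python sketch of the same socket options on a plain socket (hypothetical helper `enable_keepalive`); `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` are the Linux option names and may be missing on other platforms, hence the guards. The interval and count defaults are assumptions.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 300,
                     interval: int = 30, count: int = 3) -> None:
    """Turn on TCP keepalive for `sock`.

    idle:     seconds of inactivity before the first probe
              (what libpq's keepalives_idle controls)
    interval: seconds between unanswered probes (keepalives_interval)
    count:    unanswered probes before the kernel gives up on the
              connection (keepalives_count)
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tunables (Linux names); skipped where the platform lacks them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
```

With a libpq-based client you would not touch sockets yourself; you would pass keepalives_idle=300 (optionally with keepalives_interval and keepalives_count) in the connection string, as described above.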
You could of course also configure the TCP stack on your operating system to use that value, so the problem will never surface again.
Your DB log might be able to tell you what happened.
In addition, check your statement_timeout setting. The units are milliseconds so you should be looking for 900000, not 900.
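Because a bare statement_timeout value is read as milliseconds, 900 means less than a second while 900000 means 15 minutes. A small Python helper (hypothetical name `pg_duration_to_seconds`) makes the conversion explicit; it assumes only the time units ms, s, min, h and d that PostgreSQL accepts in settings.

```python
import re

# Milliseconds per unit for the time units PostgreSQL accepts in settings.
_UNIT_MS = {"ms": 1, "s": 1000, "min": 60_000, "h": 3_600_000, "d": 86_400_000}


def pg_duration_to_seconds(value: str, default_unit: str = "ms") -> float:
    """Convert a setting such as '900000', '900s' or '15min' to seconds.

    A bare number uses `default_unit`, which is 'ms' for statement_timeout.
    """
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*", value)
    if m is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    number, unit = float(m.group(1)), m.group(2) or default_unit
    if unit not in _UNIT_MS:
        raise ValueError(f"unknown unit: {unit!r}")
    return number * _UNIT_MS[unit] / 1000


for setting in ("900000", "900s", "15min", "900"):
    print(setting, "->", pg_duration_to_seconds(setting), "s")
```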
If it's not that, there exist firewalls that kill idle connections. Setting tcp_keepalives_idle could help avoid those types of problems.

Heroku Postgres Connection Limit?

I'm building a website attached to a Heroku Postgres database and am using the free hobby dev plan. Per Heroku, this means there's a "Maximum of 20 connections." Does this mean that a maximum of 20 people can be using the website with data being collected by the database on the back end? Any idea what happens if connections go above that level? The paid plans go up to a maximum connection limit of 500, but even that seems low to me if people are using this at the enterprise level. Any color on this would be greatly appreciated. There was a prior question on this but the answer wasn't quite clear to me.
Thanks!
What does the database connection limit mean?
PostgreSQL can be configured to limit the number of simultaneous connections to the database. Heroku offers plans with connection limits: the 'Hobby' plans come with 20 connections, whereas the standard plans start at 120 connections. When we start developing and testing, especially with automated tests, the hobby plans raise the error PG::Error (FATAL: too many connections for role "xxxxxxx"). If we check the connections with the Heroku CLI, we get
[Image: Heroku CLI output]
The immediate solution is to kill all connections with the command:
$ heroku pg:killall --app <app name>
This is not a permanent solution. We had the same issue with this website too, and tried many of the solutions available on the internet, especially on Stack Overflow.
It is very important to know how to calculate the number of connections required. The Heroku documentation says:
Assuming that you are not manually creating threads in your application code, you can use your web server settings to guide the number of connections that you need. The Unicorn web server scales out using multiple processes, if you aren’t opening any new threads in your application, each process will take up 1 connection. So in your unicorn config file if you have worker_processes set to 3 like this:
worker_processes 3
Then your app will use 3 connections for workers. This means each dyno will require 3 connections. If you’re on a “Dev” plan, you can scale out to 6 dynos which will mean 18 active database connections, out of a maximum of 20. However, it is possible for a connection to get into a bad or unknown state.
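The arithmetic in the quoted guidance can be written as two small helpers (hypothetical names): each dyno uses one connection per worker process, so the number of dynos you can run is the plan limit divided by the workers per dyno, rounded down. This assumes no application-created threads and no other clients (such as a console session) holding connections.

```python
def connections_needed(workers_per_dyno: int, dynos: int) -> int:
    """Connections used when every worker process holds one connection."""
    return workers_per_dyno * dynos


def max_dynos(workers_per_dyno: int, connection_limit: int) -> int:
    """Largest dyno count that stays within the plan's connection limit."""
    return connection_limit // workers_per_dyno


# Example from the quote: worker_processes 3 on a 20-connection plan.
dynos = max_dynos(workers_per_dyno=3, connection_limit=20)
print(dynos, connections_needed(3, dynos))  # 6 18
```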
Solution - Limit connections with PgBouncer
The easiest fix is to limit the connections with PgBouncer. For many frameworks, you must disable prepared statements in order to use PgBouncer. Then add the PgBouncer buildpack to your app.
$ heroku buildpacks:add https://github.com/heroku/heroku-buildpack-pgbouncer
The output will be something like
Buildpack added. Next release on will use:
heroku/python
https://github.com/heroku/heroku-buildpack-pgbouncer
Run git push heroku master to create a new release using these buildpacks.
Now you must modify your Procfile to start PgBouncer. In your Procfile add the command bin/start-pgbouncer-stunnel to the beginning of your web entry. So if your Procfile was
web: gunicorn .wsgi:application --worker-class gevent
Change it to:
web: bin/start-pgbouncer-stunnel gunicorn .wsgi:application --worker-class gevent
Commit the results to git, test on a staging app, and then deploy to production.
On deployment, you will see
[Image: deployment output]
This can differ depending on the web framework you use, but:
Typically you will have a maximum of one database connection per server process. This could be one per running web or worker dyno, or more if your framework runs multiple threads or worker processes per dyno (most do).
These connections are only used while your application is actually handling a request, not while the user is just viewing a page.
When you're running an async framework (node.js, for example, or greenlets in Python), this gets a little more complicated.
The easy way: just test it. You'll see the current connection count in the Heroku interface. There are frameworks and services in the wild that let you test with concurrent users.
The even easier way (since this runs on hobby plans, it seems like a hobby application): just see when it breaks :) .
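To illustrate why the connection count tracks concurrent requests rather than visitors, here is a small, self-contained Python simulation (all names and numbers are hypothetical): 50 requests arrive, but only 3 worker threads handle them, and each worker holds a connection only while it is processing a request. The peak number of connections in use never exceeds the worker count. A real check would use a load-testing tool against the deployed app.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

WORKERS = 3      # worker threads/processes per dyno (assumed)
REQUESTS = 50    # simulated requests from many users (assumed)

lock = threading.Lock()
in_use = 0       # connections currently checked out
peak = 0         # highest value in_use ever reached


def handle_request(_: int) -> None:
    global in_use, peak
    with lock:
        in_use += 1              # borrow a connection for this request
        peak = max(peak, in_use)
    try:
        time.sleep(0.01)         # pretend to run a query
    finally:
        with lock:
            in_use -= 1          # give the connection back


with ThreadPoolExecutor(max_workers=WORKERS) as executor:
    list(executor.map(handle_request, range(REQUESTS)))

print(f"{REQUESTS} requests, peak connections in use: {peak}")
```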

Intermittent connection failures with heroku postgres while using play-slick

I have a play app on heroku connecting to a postgres instance with play-slick. Around 30% of the time when I deploy a new application I get this in my logs:
java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 1007ms.
When I restart the application it will usually start again, though sometimes it takes a few tries.
Any advice for what I can do to debug this?
Most likely, there is a period of time where both the old app and the new app are trying to get connections to the database, which means you have double your maximum allowed connections active.
There are two solutions:
Upgrade your database plan to allow more connections
Reduce your max db connections by half
play-slick uses HikariCP to pool connections, so you can probably configure your max connections with maximumPoolSize.
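The reasoning in this answer is a simple budget: during a deploy the old and new releases briefly run side by side, so each release may use at most half of the plan's limit. A minimal sketch (hypothetical helper `max_pool_size_per_dyno`), assuming every dyno runs exactly one HikariCP pool and nothing else connects to the database:

```python
def max_pool_size_per_dyno(connection_limit: int, dynos: int,
                           overlapping_releases: int = 2) -> int:
    """Largest maximumPoolSize such that all pools fit during a deploy.

    During a restart the old and the new release both hold pools, so the
    limit is shared by `overlapping_releases` copies of every dyno.
    """
    size = connection_limit // (dynos * overlapping_releases)
    if size < 1:
        raise ValueError("plan limit too small for this many dynos")
    return size


print(max_pool_size_per_dyno(connection_limit=20, dynos=1))  # 10
print(max_pool_size_per_dyno(connection_limit=20, dynos=2))  # 5
```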
I believe I've figured out what the issue was. I used the default heroku play Procfile which contains -Ddb.default.url=${DATABASE_URL} and additionally had the slick db url specified in my conf. Removing the former solved the problem.

Multiple could not receive data from client: Connection reset by peer Postgresql and Resque

I have a server that runs PostgreSQL. In the logs I see this message for my Resque-based 'worker' box multiple times a minute. Some minutes there isn't a message; in others it could appear 10 times.
2016-01-12 13:40:36 EST:1.1.8.2(33899):[16141]: LOG: could not receive data from client: Connection reset by peer
Now when I go into the 1.1.8.2 box to look at netstat -ntp, I don't see port 33899, and most of the ports are at least in the 40xxx range by now. That may be conjecture, but I'm at a loss to find out why a Redis/Resque/Puma Rails stack would be printing out these messages, let alone what they mean even if I get to the bottom of it.
Will I gain memory back if they are closed 'normally'?
Is this a thing to be wary of?
How does one debug OLD ports that are open when the db box and the worker box both don't display the ports any more?
This message is probably due to the resque worker task not closing the database connection before it exits. It's not a huge problem, but presumably Postgres is doing a little extra work to clean it up, and it makes a mess of your log file...
One solution is to add a hook to your resque worker's task file (the same file that contains the self.perform definition):
def self.after_perform(*args)
ActiveRecord::Base.connection.disconnect!
end
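The same idea outside Resque: a short-lived worker should close its connection when the job finishes, even if the job raises. A minimal Python sketch using the standard-library `sqlite3` module as a stand-in for a PostgreSQL driver (DB-API drivers such as psycopg2 also expose `close()`); `run_job` and `count_rows` are hypothetical names.

```python
import sqlite3
from contextlib import closing
from typing import Callable, TypeVar

T = TypeVar("T")


def run_job(job: Callable[[sqlite3.Connection], T], dsn: str = ":memory:") -> T:
    """Open a connection, run `job`, and always close the connection.

    contextlib.closing() is used because sqlite3's own context manager only
    commits or rolls back the transaction; it does not close the connection.
    """
    with closing(sqlite3.connect(dsn)) as conn:
        return job(conn)


def count_rows(conn: sqlite3.Connection) -> int:
    conn.execute("CREATE TABLE jobs (id INTEGER)")
    conn.executemany("INSERT INTO jobs VALUES (?)", [(1,), (2,), (3,)])
    return conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]


print(run_job(count_rows))  # 3
```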

Sqlalchemy: Connections aren't closed when pool is overflowed

When I run ab (Apache Benchmark) against my site (SQLAlchemy and PostgreSQL, hosted on an Apache web server), SQLAlchemy opens many connections to PostgreSQL and I get a "too many connections" error.
I traced the problem, and found that problem is the pool (actually QueuePool).
The documentation at http://www.sqlalchemy.org/docs/core/pooling.html#sqlalchemy.pool.Pool says that when the pool is full, connections being returned to it (the extra ones opened because max_overflow allowed them) are discarded and disconnected.
But it seems connections actually didn't close! They silently dropped out of pool without closing.
So SQLAlchemy continuously opens new connections, ignores them (without closing!) and opens new ones.
Increasing the pool size is not the real solution; the problem is that the additional connections aren't closed.
(The default settings for QueuePool are pool_size=5 and max_overflow=10.)
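For reference, the documented behaviour can be modelled in a few lines: up to pool_size connections are kept idle for reuse, up to max_overflow extra connections may be opened under load, and a connection returned while the idle set is already full must be closed rather than just dropped. This is a simplified toy model in Python (hypothetical `ToyQueuePool` with fake connections), not SQLAlchemy's implementation; it leaves out waiting and timeouts when the limit is reached.

```python
from typing import List


class FakeConnection:
    def __init__(self, ident: int) -> None:
        self.ident = ident
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ToyQueuePool:
    def __init__(self, pool_size: int = 5, max_overflow: int = 10) -> None:
        self.pool_size = pool_size
        self.max_overflow = max_overflow
        self.idle: List[FakeConnection] = []
        self.checked_out = 0
        self.opened = 0

    def connect(self) -> FakeConnection:
        if self.idle:
            conn = self.idle.pop()                      # reuse an idle one
        elif self.checked_out < self.pool_size + self.max_overflow:
            conn = FakeConnection(self.opened)          # open a new one
            self.opened += 1
        else:
            # SQLAlchemy would wait up to pool_timeout here; the toy just fails.
            raise RuntimeError("pool limit reached")
        self.checked_out += 1
        return conn

    def release(self, conn: FakeConnection) -> None:
        self.checked_out -= 1
        if len(self.idle) < self.pool_size:
            self.idle.append(conn)   # keep it for reuse
        else:
            conn.close()             # overflow connection: really close it


pool = ToyQueuePool()
burst = [pool.connect() for _ in range(15)]   # 5 pooled + 10 overflow
for conn in burst:
    pool.release(conn)
print(len(pool.idle), sum(c.closed for c in burst))  # 5 10
```

The behaviour described in the question corresponds to the `else` branch in `release` discarding the connection without calling `close()`, which leaves the server-side session open.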
Looks like a bug in SQLAlchemy, fixed 2 weeks ago: http://hg.sqlalchemy.org/sqlalchemy/rev/aff95843c12f#l2.17
There was no release with this fix, so you have to patch it manually.
I think it's a bug, and it has been fixed... install from source and have fun ;)