Heroku Postgres Connection Limit? - postgresql

I'm building a website attached to a Heroku Postgres database and am using the free hobby dev plan. Per Heroku, this means there's a "Maximum of 20 connections." Does this mean that a maximum of 20 people can be using the website with data being collected by the database on the back end? Any idea what happens if connections go above that level? The paid plans go up to a maximum connection limit of 500, but even that seems low to me if people are using this at the enterprise level. Any color on this would be greatly appreciated. There was a prior question on this but the answer wasn't quite clear to me.
Thanks!
What does database connection limit mean?

PostgreSQL can be configured to limit the number of simultaneous connections to the database, and Heroku's plans come with fixed connection limits: the 'Hobby' plans allow 20 connections, while standard plans start at 120. When we start developing and testing, especially automated tests, the hobby plans soon raise the error PG::Error (FATAL: too many connections for role "xxxxxxx"). We can check the current connections with the Heroku CLI.
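For instance, pg:info reports the plan's connection limit and how many connections are currently in use (the --app flag takes your app's name, as with killall below):
$ heroku pg:info --app <app name>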
The immediate solution is to kill all connections with the command:
$ heroku pg:killall --app <app name>
This is not a permanent solution. We had the same issue with this website as well, and tried many of the solutions available on the internet, especially on Stack Overflow.
It is very important to know how to calculate the number of connections required. The Heroku documentation says:
Assuming that you are not manually creating threads in your application code, you can use your web server settings to guide the number of connections that you need. The Unicorn web server scales out using multiple processes; if you aren't opening any new threads in your application, each process will take up 1 connection. So in your unicorn config file, if you have worker_processes set to 3 like this:
worker_processes 3
Then your app will use 3 connections for workers. This means each dyno will require 3 connections. If you’re on a “Dev” plan, you can scale out to 6 dynos which will mean 18 active database connections, out of a maximum of 20. However, it is possible for a connection to get into a bad or unknown state.
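The same arithmetic applies to any pre-fork server: total connections are roughly dynos x processes per dyno x threads per process. A minimal sketch of the budget check (the numbers are illustrative, not from the original post):
# Rough connection-budget check for a pre-fork web server.
PLAN_LIMIT = 20          # hobby-dev plan limit
DYNOS = 6                # web dynos
WORKERS_PER_DYNO = 3     # e.g. unicorn worker_processes
THREADS_PER_WORKER = 1   # plain pre-fork workers use one thread each

total = DYNOS * WORKERS_PER_DYNO * THREADS_PER_WORKER
print(f"{total} connections needed, out of a maximum of {PLAN_LIMIT}")
assert total <= PLAN_LIMIT, "over the plan's connection limit"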
Solution - Limit connections with PgBouncer
The easiest fix is to limit the connections with PgBouncer. For many frameworks, you must disable prepared statements in order to use PgBouncer (see the sketch at the end of this answer). Then add the PgBouncer buildpack to your app:
$ heroku buildpacks:add https://github.com/heroku/heroku-buildpack-pgbouncer
The output will be something like
Buildpack added. Next release on <app name> will use:
heroku/python
https://github.com/heroku/heroku-buildpack-pgbouncer
Run git push heroku master to create a new release using these buildpacks.
Now you must modify your Procfile to start PgBouncer. In your Procfile add the command bin/start-pgbouncer-stunnel to the beginning of your web entry. So if your Procfile was
web: gunicorn .wsgi:application --worker-class gevent
Change it to:
web: bin/start-pgbouncer-stunnel gunicorn .wsgi:application --worker-class gevent
Commit the results to git, test on a staging app, and then deploy to production.
On deployment, you will see PgBouncer's startup output in your logs.
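On the "disable prepared statements" point mentioned above: the details depend on your framework, and the snippet below is only a hedged illustration for a Django app (matching the gunicorn Procfile above); the database name is hypothetical. Django does not issue prepared statements by default, but under PgBouncer's transaction pooling you do need to disable server-side cursors:
# settings.py -- illustrative; relevant when PgBouncer runs in
# transaction pooling mode.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # hypothetical; Heroku supplies credentials via DATABASE_URL
        # Server-side cursors can outlive the pooled connection that
        # created them, so they break under transaction pooling.
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}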

Depending on the web-framework you are using this can be different, but:
Typically you will have a maximum of one database connection per server process. This could be one per running web or worker dyno, or more if your framework runs multiple threads/worker processes per dyno (most do).
These connections are then only used when there is an actual request to your application, not while the user is just viewing a page.
When you're running an async framework (node.js for example, or greenlets in Python), this gets a little more complicated.
The easy way: just test it. You'll see the current connection count in the Heroku interface, and there are frameworks and services in the wild that let you simulate concurrent users.
The even easier way (since this runs on a hobby plan, it seems to be a hobby application): just see when it breaks :)
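To make the per-process picture concrete, here is a hedged Python sketch; SQLAlchemy is an assumption, not something the answer prescribes. Each worker process gets its own small pool, so the total is roughly processes x (pool_size + max_overflow):
# Per-process pool sizing, plus a quick look at the live connection count.
import os
from sqlalchemy import create_engine, text

engine = create_engine(
    os.environ["DATABASE_URL"],  # newer SQLAlchemy needs a postgresql:// scheme
    pool_size=1,       # one steady connection per worker process
    max_overflow=1,    # allow one extra connection under burst load
    pool_timeout=10,   # fail fast instead of queueing forever
)

with engine.connect() as conn:
    # pg_stat_activity lists every open server connection
    open_conns = conn.execute(text("SELECT count(*) FROM pg_stat_activity")).scalar()
    print(f"currently open connections: {open_conns}")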

Related

Opening DB connections to Postgres taking long

Some of our applications are facing issues with the connection pool. I run one of them: a JEE application on Payara 4.1 that uses PostgreSQL 9.5.8.
I have virtually no problems when running the application locally against a local DB instance. When running in the remote environment, I saw the application become unresponsive about every 10 minutes (well, it actually answered everything with HTTP status 503). Guessing it was related to opening connections taking long, we set the parameter idleTimeoutInSeconds="0" in the jdbc-resource. Now we have the same issue about 4 times a day, which is an improvement, but - well - neighbouring systems are still complaining.
We usually run with 5 steady connections and a maximum of 30, and the application usually uses 1 to 2 of them to handle traffic. With tcpdump I have seen that at a certain point in time the connection pool tries to open many connections (the pool realizes the connections it holds have been closed by the DB without any notification such as a TCP FIN, and opening each connection takes about 1 second). During this window of about 30 seconds, not all requests can be safely queued and some 503s happen.
Locally everything is fine: opening a connection takes ~50ms and everyone is happy. Our Postgres team is not helping at all and I am stuck with the problem. As I don't see any way to improve the connection pool within JEE, I have radical ideas going in the direction of:
Refreshing the connections myself. All the time. Constantly. (Which would be hard to implement in JEE, where I cannot simply look into the connection pool and tell each connection to refresh itself just in case.)
Replacing the not-helping-at-all JEE implementation of the connection pool with something that works better. (Future generations of developers maintaining our app will hate me...)
Replacing the DB with something managed by myself. (Even dumber idea)
Does anyone:
have any idea how I could implement options 1 or 2 above?
have any other ideas about what could help?
Here is my current JDBC resource definition, if needed:
<jdbc-resource poolName="<poolName>" jndiName="<jndiName>" isConnectionValidationRequired="true"
    connectionValidationMethod="table" validationTableName="version()" maxPoolSize="30"
    validateAtmostOncePeriodInSeconds="30" statementTimeoutInSeconds="30" isTimerPool="true" steadyPoolSize="5"
    idleTimeoutInSeconds="0" connectionCreationRetryAttempts="100000" connectionCreationRetryIntervalInSeconds="30"
    maxWaitTimeInMillis="2000"/>
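One way to confirm what tcpdump suggests (connections being dropped server-side without the pool noticing) is to watch pg_stat_activity from outside the application. A hedged Python sketch, assuming direct access to the same PostgreSQL server; the DSN is a placeholder:
# Poll pg_stat_activity to watch the pool's connections come and go
# around the incidents.
import time
import psycopg2  # assumes psycopg2 is installed and the DB is reachable

conn = psycopg2.connect("dbname=mydb user=monitor")  # hypothetical DSN
conn.autocommit = True

while True:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT state, count(*) FROM pg_stat_activity "
            "WHERE datname = current_database() GROUP BY state"
        )
        print(time.strftime("%H:%M:%S"), cur.fetchall())
    time.sleep(5)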

Deploy code to multiple production servers under the load balancer without continuous deployments

I am the only (full-stack) developer in my company, and right now I have too much other work to automate the deployments. In the future, we may hire a DevOps person for this.
Problem: We have 3 servers under a load balancer. I don't want to take the 2nd and 3rd servers out of rotation until the 1st is updated, and then repeat the same with the 2nd and 3rd, because all the traffic might initially hit a single server, which may fail at some specific moment before the other servers go live.
                                Server 1
Users ----> Load Balancer ----> Server 2 ----> Database
                                Server 3
Personal opinion: Is there a way we can pull the code by writing scripts on the load balancer? I could replace the standard DigitalOcean load balancer with an Nginx server, making it a reverse proxy.
NOTE: I know there are plenty of other questions on Stack Overflow about this, but none of them solves my problem.
Solutions I know
Git hooks - I know a bit about Git hooks, but I don't want to use them, because if I commit to the master branch by mistake it must not sync to production and wreak havoc on the live server and live users.
Open multiple tabs to the servers and do it manually (current scenario). Believe me, it's a pain in the ass :)
Any suggestions or redirects to the solutions will be really helpful for me. Thanks in advance.
One of the solutions is to write an Ansible playbook for this. With Ansible, you can specify that it runs against one host at a time, and as the last step you can include a verification check that confirms your application responds with status code 200, or queries some endpoint that reports the application's status. If the check fails, Ansible stops the execution. So in your case: Server 1 deploys fine, but the deploy fails on Server 2; the playbook stops, and you still have Servers 1 and 3 running.
I have done it myself. Works fine in environments without continuous deployments.
Here is one example of the general shape (a hedged sketch; the host group, repository, service, and health-check URL are placeholders, not details from the original answer):
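# deploy.yml -- illustrative rolling deploy; all names are placeholders.
- hosts: webservers
  serial: 1                # update one server at a time behind the LB
  tasks:
    - name: Pull the new release
      git:
        repo: https://example.com/your/repo.git   # hypothetical repository
        dest: /var/www/app
        version: production                       # deploy branch, not master

    - name: Restart the application
      service:
        name: app                                 # hypothetical service name
        state: restarted

    - name: Verify the app answers with HTTP 200 before moving on
      uri:
        url: "http://{{ inventory_hostname }}/health"   # hypothetical endpoint
        status_code: 200
      register: health
      retries: 5
      delay: 3
      until: health.status == 200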

Intermittent connection failures with heroku postgres while using play-slick

I have a Play app on Heroku connecting to a Postgres instance with play-slick. Around 30% of the time when I deploy a new version of the application, I get this in my logs:
java.sql.SQLTransientConnectionException: db - Connection is not available, request timed out after 1007ms.
When I restart the application it will usually start again, though sometimes it takes a few tries.
Any advice for what I can do to debug this?
Most likely, there is a period of time where both the old app and the new app are trying to get connections to the database, which means you have double your maximum allowed connections active.
There are two solutions:
Upgrade your database plan to allow for more connections
Reduce your maximum DB connections by half
play-slick uses HikariCP to pool connections, so you can probably configure your max connections with maximumPoolSize.
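For play-slick, the pool settings live in application.conf. A hedged sketch (the key names follow Slick 3.x's HikariCP options, and the values are illustrative):
# application.conf -- illustrative values
slick.dbs.default.db.numThreads = 5        # threads in Slick's executor
slick.dbs.default.db.maxConnections = 5    # maps onto HikariCP's maximumPoolSize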
I believe I've figured out what the issue was. I used the default heroku play Procfile which contains -Ddb.default.url=${DATABASE_URL} and additionally had the slick db url specified in my conf. Removing the former solved the problem.

Grails shareable vs unshareable connection pools for postgres datasource

My problem is that I have two apps that are both getting these exceptions:
Caused by: org.apache.tomcat.jdbc.pool.PoolExhaustedException:
[pool-2-thread-273] Timeout: Pool empty. Unable to fetch a connection
in 30 seconds, none available[size:100; busy:99; idle:0;
lastwait:30000].
There are two apps:
grails app war running in tomcat connecting to postgres data source A
standalone jar connecting to data source B which is a different db on the same server as postgres data source A.
Both apps use org.apache.tomcat.jdbc.pool.ConnectionPool by default, it seems (because I didn't configure the default pool anywhere and both apps use this). Also, my max connection limit is 200 and I'm only using < 130 connections, so I'm not hitting a max-connection issue. Since the two apps are using separate data sources, I read that this would mean they cannot share the same connection pool.
When I log in to my Postgres server, I can see that app 2 has 100 idle connections, and the max idle size of the pool is 100. So this is fine. However, what I was not expecting is that app 1 would use connections from app 2's pool; or rather, since it appears that the apps share a connection pool, I suppose app 1 is trying to take from this common pool, which already has 100 connections allocated. I would not have expected that, because they use a Tomcat connection pool and app B doesn't even use Tomcat, so why would they be shared?
So my questions are (since I really am having a hard time finding docs about this):
Is it accurate that by default different apps would use the same conn pool?
If they're using the same conn pool then how can they share a conn pool if they're using different data sources?
Is it possible in Grails to specify a shared vs. unshared connection pool? https://tomcat.apache.org/tomcat-7.0-doc/jndi-datasource-examples-howto.html mentions Postgres specifically, and there does seem to be a shared-vs-unshared concept (though I can't find any good documentation about it), but it's configured outside of Grails. Is there any way to do it in Grails?
Notes: Using Grails 2.4.5 and PostgreSQL 9.5.2 (server version number 90502)

mongodb i/o timeout when using clustered mongo instances

I have an application that is using the upper.io/db package for communication with a Mongo database server (which is a fairly simple wrapper around gopkg.in/mgo.v2). The way the application works is that it creates a session in the main thread on start-up, and then each individual go routine that needs to make requests to the mongo server calls Clone on the session and does a defer session.Close on the resulting value. As far as I can tell, this is all standard operating procedure.
This setup works without any errors in our development environments, where we are either using a locally run MongoDB or a sandbox instance on MongoLab. Recently we promoted the application up to our staging environment, where it talks to a Shared Cluster instance of MongoDB on MongoLab (the cheapest $15 option). This is where the weirdness starts happening. The first request that goes through (from the first goroutine that gets invoked) comes back with the expected response, but the subsequent ones all return
read tcp <ip address>:47112: i/o timeout
This happens both from our local development machines pointed at the cluster and from the AWS host for the staging environment. Since the Mongo cluster is from MongoLab, I am going to assume that they've configured everything correctly on their end.
The code is somewhat boring TBH: It literally just opens the session in the main function and maintains a reference to it, and then there are multiple goroutines with this basic structure:
sess := session.Clone() // per-goroutine copy of the main session
defer sess.Close()      // return the copy's resources to the pool
// make requests to Mongo
During testing, I even restricted it to run only one thing at once (i.e. only one goroutine is active at any given time), and it still fails in the same fashion.
Has anybody run into this before? Do I need to configure upper.io/db in a specific fashion? Maybe use mgo directly? I am at my wits' end with this :(
After a rather long and grueling process, we finally tracked down where this issue, and similar ones like it, came from in our program. It ended up being a session leak in the v1 version of the upper.io/db library. The bug and fix are outlined here, but the v1 version of this library is horribly outdated at this point and the later versions do not exhibit this issue.
I doubt this answer will be useful for anybody so late in the game (especially since we ourselves solved it like.. 3 years ago at this point), but just wanted to leave the answer here for completeness.