Flask PostgreSQL connection pool using uWSGI to run behind nginx

I'm relatively new to web servers and web applications.
We have a basic Python application implemented in Flask, deployed with uWSGI behind an nginx server. Currently our app has to create a PostgreSQL connection on every request, which takes longer than ideal. What I need is a persistent connection pool that is created only once and reused on each request.
I tried to create a connection pool using psycopg2's built-in functionality:
import psycopg2.pool

# create pool with min number of connections of 1, max of 10
a = psycopg2.pool.SimpleConnectionPool(1, 10, database='YOURDB', otherstuff...)
But I couldn't get a global persistent connection pool that is reused when processing requests, possibly because of uWSGI.
I looked up connection pooling with Flask and uWSGI but couldn't find sufficient information on creating connection pools.
Question 1: What is the best way to implement connection pooling in the environment mentioned above (Flask + uWSGI + nginx)?
Question 2: If I implement an ORM like SQLAlchemy, will it be able to provide efficient connection pooling in the above case, or will uWSGI block its ability to provide a connection pool?

There are a few PostgreSQL pooling options:
pgpool
PgBouncer
psycopg2.pool
I'm using psycopg2.pool.ThreadedConnectionPool with a small Flask app and it is working fine.
from psycopg2 import pool

connection_pool = pool.ThreadedConnectionPool(MINCONN, MAXCONN, database=DB, user=USER, password=PASSWORD)
conn = connection_pool.getconn()
# use conn, then:
connection_pool.putconn(conn)
It's up to you to handle what happens when the pool is full, as getconn() doesn't block.
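To make the pool persist under uWSGI, one approach is to create it lazily so each forked worker builds its own pool on first use (a pool created before the fork can end up sharing sockets across workers). Here is a minimal sketch under those assumptions; the connection parameters and route are hypothetical:
import psycopg2.pool
from contextlib import contextmanager
from flask import Flask

app = Flask(__name__)
_pool = None  # one pool per uWSGI worker process

def get_pool():
    global _pool
    if _pool is None:  # first use inside this worker, after the fork
        _pool = psycopg2.pool.ThreadedConnectionPool(
            1, 10, database='YOURDB', user='YOURUSER', password='YOURPASSWORD')
    return _pool

@contextmanager
def pooled_conn():
    pool_ = get_pool()
    conn = pool_.getconn()
    try:
        yield conn
    finally:
        pool_.putconn(conn)

@app.route('/ping')
def ping():
    with pooled_conn() as conn:
        with conn.cursor() as cur:
            cur.execute('SELECT 1')
            cur.fetchone()
    return 'ok'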

Related

What exactly does a connection pool for databases like PostgreSQL do?

I know the general idea that a connection pool is a pool of reusable connections that speeds up traffic to the database because it can reuse connections instead of constantly creating new ones.
But this is a very high-level explanation. It doesn't explain what is actually meant by a connection, or why the pool helps: even with a connection pool such as client -> PgBouncer -> PostgreSQL, the client does not have to create a connection to the database, but it still has to create a connection to the proxy.
So what is the connection from client -> PgBouncer (for example), and why is creating this connection faster than creating the connection PgBouncer -> PostgreSQL?
There are two uses of a connection pool:
1. It prevents opening and closing database connections all the time.
There is certainly some overhead in establishing a TCP connection to PgBouncer, but that is far cheaper than establishing a database connection. When you start a database connection, additional work is done:
a server process is started, which is far more expensive than a TCP connection
PostgreSQL loads cached metadata tables
2. It puts a limit on the number of client connections, thereby preventing database overload.
The advantage over limiting max_connections is that connections in excess of the limit won't receive an error, but will be queued waiting for a connection to become free.
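You can see the difference directly. Here is a rough, hypothetical measurement sketch using psycopg2 against a database named 'YOURDB': fresh connections pay the backend-startup cost on every iteration, while pool checkouts reuse one live backend.
import time
import psycopg2
import psycopg2.pool

N = 50

start = time.perf_counter()
for _ in range(N):
    conn = psycopg2.connect(database='YOURDB')  # new backend process each time
    conn.close()
fresh = time.perf_counter() - start

pool_ = psycopg2.pool.SimpleConnectionPool(1, 1, database='YOURDB')
start = time.perf_counter()
for _ in range(N):
    conn = pool_.getconn()  # hands back the same live connection
    pool_.putconn(conn)
pooled = time.perf_counter() - start
pool_.closeall()

print('fresh: %.3fs  pooled: %.3fs' % (fresh, pooled))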

How to find connection leaks on PostgreSQL Cloud SQL

I'm using Postgres provisioned by Google Cloud SQL.
Recently we have seen the number of connections increase considerably.
We had to raise the limit from 200 to 500, then to 1000. In the Google Cloud console, Postgres reports 800 current connections.
However, I have no idea where these connections come from. We have one App Engine service with not much traffic accessing it at the moment, another application hosted on Kubernetes, and a dozen or so batch jobs that connect to it. Clearly there must be some connection leakage somewhere.
Is there any way I can see where these connections originate from?
All applications connecting to it are Java based at the moment.
They use the HikariCP connection pool. I'm considering changing the "test query" run on connection to insert a record in a log table, so I could perhaps find out where the connections originate.
But are there better ways available?
Thanks,
Consider monitoring connection activity with pg_stat_activity, e.g.: SELECT * FROM pg_stat_activity;
As per the documentation:
Connections that show an IP address, such as 1.2.3.4, are connecting using IP. Connections with cloudsqlproxy~1.2.3.4 are using the Cloud SQL Proxy, or else they originated from App Engine. Connections from localhost are usually to a First Generation instance from App Engine, although that path is also used by some internal Cloud SQL processes.
Also, take a look at the best practices for managing database connections that contain information on opening and closing connections, connection count, or on how to set a connection duration in the Java programming language.
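To narrow down the source, it can also help to aggregate pg_stat_activity by origin. A minimal sketch with psycopg2 (the DSN is hypothetical; application_name is only populated if the clients set it):
import psycopg2

conn = psycopg2.connect(host='YOUR_CLOUD_SQL_IP', database='postgres',
                        user='postgres', password='YOURPASSWORD')
with conn.cursor() as cur:
    # count current connections per origin, application and state,
    # so a leaking client stands out at the top
    cur.execute('''
        SELECT client_addr, application_name, state, count(*)
        FROM pg_stat_activity
        GROUP BY client_addr, application_name, state
        ORDER BY count(*) DESC;
    ''')
    for client_addr, app_name, state, n in cur.fetchall():
        print(client_addr, app_name, state, n)
conn.close()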

Connection timeout to MongoDB on Azure VM

I have some timeout problems when connecting my Azure Web App to a MongoDB instance hosted on an Azure VM.
2015-12-19T15:57:47.330+0100 I NETWORK Socket recv() errno:10060 A connection attempt
failed because the connected party did not properly respond after a period of time,
or established connection failed because connected host has failed to respond.
2015-12-19T15:57:47.343+0100 I NETWORK SocketException: remote: 104.45.x.x:27017 error:
9001 socket exception [RECV_ERROR] server [104.45.x.x:27017]
2015-12-19T15:57:47.350+0100 I NETWORK DBClientCursor::init call() failed
Currently MongoDB is configured on a single server (just for dev) and it is exposed through a public IP. The website connects to it using an Azure domain name (*.westeurope.cloudapp.azure.com) and without a Virtual Network.
Usually everything works well, but after some minutes of inactivity I get that timeout exception. The same happens when using the MongoDB shell from my PC, so I'm quite sure that it is a problem on the MongoDB side.
Am I missing some configuration?
After some searching, here are my considerations:
It is usually good practice to implement some sort of retry logic for every resource that you access on Azure (database, VM, ...). For MongoDB there is only a partial implementation, so you may have to write your own. See also this issue and this.
If possible, all resources on Azure should be in the same Azure Virtual Network (this way all connections are made using Azure private IPs instead of public IPs; it is also useful for security reasons, because you don't need to open endpoints to the public).
When deploying MongoDB on Azure, try to follow the official MongoDB guidelines.
In this particular case you should set net.ipv4.tcp_keepalive_time to a value lower than Azure's TCP idle timeout, which by default is 240 seconds. This way the connection is closed locally by the OS and the MongoDB driver can detect this condition and open a new connection; if the connection is silently dropped by Azure instead, the driver cannot detect it. If you want to change this timeout on Azure (not recommended), you can find it inside the Public IP configuration.
In my development environment I have set net.ipv4.tcp_keepalive_time to 120 and now everything seems to work fine. Note that if you host MongoDB inside a Docker container, you should apply this setting on the Docker host.
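For reference, a minimal sketch of the kernel change described above (Linux only, needs root; equivalent to running sysctl -w net.ipv4.tcp_keepalive_time=120, and only persistent across reboots if also added to /etc/sysctl.conf):
# send the first TCP keepalive after 120s of idle time,
# below Azure's default 240s idle timeout
with open('/proc/sys/net/ipv4/tcp_keepalive_time', 'w') as f:
    f.write('120')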
Here some other useful links:
http://focusmatic.tumblr.com/post/39569711018/solving-mongodb-connection-losses-on-windows-azure
https://docs.mongodb.org/ecosystem/platforms/windows-azure/
https://michaelmckeownblog.wordpress.com/2013/12/04/resolving-internal-ips-vs-dns-names-between-vms/
https://gist.github.com/davideicardi/f2094c4c3f3e00fbd490
MongoDB connection problems on Azure
MongoDB connection timeouts (Azure)
When using the C# Mongo driver, we resolved this by setting the following:
MongoDefaults.MaxConnectionIdleTime = TimeSpan.FromMinutes(1);
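If you are on PyMongo instead, a rough equivalent (assuming a driver version that supports the maxIdleTimeMS pool option; the host name is hypothetical) would be:
from pymongo import MongoClient

# drop pooled connections that sit idle for more than a minute,
# well below Azure's ~4-minute idle timeout
client = MongoClient('mongodb://myvm.westeurope.cloudapp.azure.com:27017/',
                     maxIdleTimeMS=60000)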

How to disable connection pooling with PostgreSQL in flask-sqlalchemy?

I want to use pgpool or pgbouncer as an external connection pooler with my flask app. The flask-sqlalchemy extension does not seem to expose a way to change the connection pooler to NullPool. Is there some way to do this?
While it should be possible with the apply_driver_hacks method, I would strongly recommend against it.
The TCP overhead is negligible on a local machine, but the authentication and negotiation (of the client encoding, for example) certainly isn't. Keeping a pool is always useful within Flask, and if needed it can be configured with the SQLALCHEMY_POOL_SIZE, SQLALCHEMY_POOL_TIMEOUT, SQLALCHEMY_POOL_RECYCLE and SQLALCHEMY_MAX_OVERFLOW settings.
If you simply want to cut down on overhead (albeit completely negligible) and your single-instance Flask app is the only thing connecting to Postgres, then removing PgBouncer/PgPool from the mix would be better.
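That said, if you do want NullPool, newer Flask-SQLAlchemy releases (2.4 and later, as far as I know) let you pass arbitrary engine options through configuration instead of overriding apply_driver_hacks. A minimal sketch, with a hypothetical database URI:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.pool import NullPool

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql://localhost/YOURDB'
# hand every engine option straight to SQLAlchemy's create_engine(),
# including the pool class
app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {'poolclass': NullPool}
db = SQLAlchemy(app)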

Difference between server connection and server instance?

I was using MySQL Workbench and I am not able to figure out the difference between the following:
1. Server instance
2. Connection to server
In general, I want to know whether we can use Open Connection to start querying without creating a server instance for the connection we are trying to make. Are these two things independent?
You need one or two connections depending on what you want to do with your server. For MySQL work (i.e. running queries) you need a MySQL connection. For server work (e.g. shutting the MySQL server down or managing other aspects that require shell access) you need a second connection (which is called a Server Instance).
Beginning with MySQL Workbench 6.0, we merged both connection settings into one interface.