Airflow psycopg2.OperationalError: FATAL: sorry, too many clients already - postgresql

I have a four node clustered Airflow environment that's been working fine for me for a few months now.
ec2-instances
Server 1: Webserver, Scheduler, Redis Queue, PostgreSQL Database
Server 2: Webserver
Server 3: Worker
Server 4: Worker
Recently I've been working on a more complex DAG with a few dozen tasks in it, compared to the relatively small ones I was working on before. I'm not sure if that's why I'm only now seeing this, but I'll sporadically get this error:
On the Airflow UI under the logs for the task:
psycopg2.OperationalError: FATAL: sorry, too many clients already
And on the Webserver (output from running airflow webserver) I get the same error too:
[2018-07-23 17:43:46 -0400] [8116] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py", line 2158, in _wrap_pool_connect
return fn()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 403, in connect
return _ConnectionFairy._checkout(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 788, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 532, in checkout
rec = pool._do_get()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1193, in _do_get
self._dec_overflow()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
compat.reraise(exc_type, exc_value, exc_tb)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/util/compat.py", line 187, in reraise
raise value
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 1190, in _do_get
return self._create_connection()
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 350, in _create_connection
return _ConnectionRecord(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 477, in __init__
self.__connect(first_connect_check=True)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 671, in __connect
connection = pool._invoke_creator(self)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 106, in connect
return dialect.connect(*cargs, **cparams)
File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 410, in connect
return self.dbapi.connect(*cargs, **cparams)
File "/usr/local/lib64/python3.6/site-packages/psycopg2/__init__.py", line 130, in connect
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: FATAL: sorry, too many clients already
I can fix this by running sudo /etc/init.d/postgresql restart and restarting the DAG but then after about three runs I'll start seeing the error again.
I can't find anything specific to Airflow on this issue, but other posts I've found, such as this one, say it's because my client (I guess in this case that's Airflow) is trying to open more connections to PostgreSQL than PostgreSQL is configured to handle. I ran this command and found that my PostgreSQL accepts 100 connections:
[ec2-user@ip-1-2-3-4 ~]$ sudo su
root@ip-1-2-3-4
[/home/ec2-user]# psql -U postgres
psql (9.2.24)
Type "help" for help.
postgres=# show max_connections;
max_connections
-----------------
100
(1 row)
In this solution the post says I can increase my PostgreSQL max connections, but I'm wondering if I should instead set a value in my airflow.cfg file so that the number of connections Airflow is allowed matches my PostgreSQL max_connections size. Does anyone know where I can set this value in Airflow? Here are the fields I think are relevant:
# The SqlAlchemy pool size is the maximum number of database connections
# in the pool.
sql_alchemy_pool_size = 5
# The SqlAlchemy pool recycle is the number of seconds a connection
# can be idle in the pool before it is invalidated. This config does
# not apply to sqlite.
sql_alchemy_pool_recycle = 3600
# The amount of parallelism as a setting to the executor. This defines
# the max number of task instances that should run simultaneously
# on this airflow installation
parallelism = 32
# The number of task instances allowed to run concurrently by the scheduler
dag_concurrency = 32
# When not using pools, tasks are run in the "default pool",
# whose size is guided by this config element
non_pooled_task_slot_count = 128
# The maximum number of active DAG runs per DAG
max_active_runs_per_dag = 32
Open to any suggestions for fixing this issue. Is this something related to my Airflow configuration or is it an issue with my PostgreSQL configuration?
Also, because I'm testing a new DAG I'll sometimes terminate the running tasks and start them over. Perhaps doing this is causing some of the processes to not die correctly and they're keeping dead connections open to PostgreSQL?
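One way to check that last theory before restarting anything is to look at what is actually holding connections in pg_stat_activity. A hedged diagnostic sketch (the state and client_addr columns exist in PostgreSQL 9.2, the version shown above):

```sql
-- Count open connections per client address and state; the state column
-- distinguishes active sessions from "idle" ones that a killed task or
-- crashed worker may have left behind.
SELECT client_addr, state, count(*)
FROM pg_stat_activity
GROUP BY client_addr, state
ORDER BY count(*) DESC;
```

If most connections are idle and come from the worker nodes, that supports the theory that terminated tasks are not closing their sessions.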

Ran into a similar issue. I changed max_connections in Postgres to 10000 and sql_alchemy_pool_size in the Airflow config to 1000. Now I am able to run hundreds of tasks in parallel.
PS: My machine has 32 cores and 60GB memory, so it's handling the load.

Quoting the airflow documentation:
sql_alchemy_max_overflow: The maximum overflow size of the pool. When the number of checked-out connections reaches the size set in pool_size, additional connections will be returned up to this limit. When those additional connections are returned to the pool, they are disconnected and discarded. It follows then that the total number of simultaneous connections the pool will allow is pool_size + max_overflow, and the total number of “sleeping” connections the pool will allow is pool_size. max_overflow can be set to -1 to indicate no overflow limit; no limit will be placed on the total number of concurrent connections. Defaults to 10.
It seems that the variables you'll want to set on your airflow.cfg are both sql_alchemy_pool_size and sql_alchemy_max_overflow. Your PostgreSQL max_connections must be equal to or greater than the sum of those two Airflow configuration variables, since Airflow can have at most sql_alchemy_pool_size + sql_alchemy_max_overflow open connections with your database.
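To make the sizing concrete, here is a minimal back-of-envelope sketch of that budget, assuming the defaults quoted above and assuming (worth verifying for your setup) that each Airflow process — every webserver gunicorn worker, the scheduler, and each running task — keeps its own SQLAlchemy pool:

```python
# Hedged back-of-envelope for the PostgreSQL connection budget.
# pool_size is the airflow.cfg default quoted above; max_overflow is
# SQLAlchemy's default; the process count is an illustrative assumption.
sql_alchemy_pool_size = 5
sql_alchemy_max_overflow = 10
airflow_processes = 8  # webserver workers + scheduler + task processes (assumption)

per_process_max = sql_alchemy_pool_size + sql_alchemy_max_overflow
total_possible = per_process_max * airflow_processes

print(per_process_max)  # 15 connections at most per process
print(total_possible)   # 120 -- already above max_connections = 100
```

If the total can exceed max_connections, either lower the pool settings, raise max_connections, or put a pooler such as PgBouncer in front of PostgreSQL.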


MongoDB replica set error reading from secondary (using primaryPreferred) after primary shut down

I created a MongoDB replica set and I am testing the failover scenarios.
I am using pymongo as my driver, connecting with a mongodb+srv:// connection string, and I set my readPreference to primaryPreferred.
However, when I shut down my primary server I get the following error from pymongo (prior to the secondary assuming the role of primary; once the secondary is primary it works as expected). My question: since I am using a readPreference of primaryPreferred, shouldn't the read be directed to the secondary rather than erroring out?
Traceback (most recent call last):
File ".\main.py", line 86, in <module>
main()
File ".\main.py", line 18, in main
commands[command](*args)
File ".\main.py", line 26, in print_specs
for db in client.list_databases():
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1890, in list_databases
res = admin._retryable_read_command(cmd, session=session)
File "C:\Python38\lib\site-packages\pymongo\database.py", line 748, in _retryable_read_command
return self.__client._retryable_read(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1453, in _retryable_read
server = self._select_server(
File "C:\Python38\lib\site-packages\pymongo\mongo_client.py", line 1253, in _select_server
server = topology.select_server(server_selector)
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 233, in select_server
return random.choice(self.select_servers(selector,
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 192, in select_servers
server_descriptions = self._select_servers_loop(
File "C:\Python38\lib\site-packages\pymongo\topology.py", line 208, in _select_servers_loop
raise ServerSelectionTimeoutError(
pymongo.errors.ServerSelectionTimeoutError: No replica set members match selector "Primary()"
You have either:
Not specified the read preference at all, or specified it incorrectly, or
Have issued an operation that doesn't respect read preference, or
Are attempting a write, to which read preference doesn't apply.
Going by the stack trace, you are listing databases which is a read operation so it's one of the first two reasons.
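For reference, the read preference can be set directly in the SRV connection string. A hedged example (host name and credentials are placeholders):

```
mongodb+srv://user:password@cluster0.example.mongodb.net/test?readPreference=primaryPreferred&retryWrites=true
```

It can also be passed as a MongoClient keyword argument, readPreference='primaryPreferred'.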

Are these parameters correct for pgbouncer.ini and postgresql.conf?

I have pgbouncer.ini file with the below configuration
[databases]
test_db = host=localhost port=5432 dbname=test_db
[pgbouncer]
logfile = /var/log/postgresql/pgbouncer.log
pidfile = /var/run/postgresql/pgbouncer.pid
listen_addr = 0.0.0.0
listen_port = 5433
unix_socket_dir = /var/run/postgresql
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
#pool_mode = transaction
pool_mode = session
server_reset_query = RESET ALL;
ignore_startup_parameters = extra_float_digits
max_client_conn = 25000
autodb_idle_timeout = 3600
default_pool_size = 250
max_db_connections = 250
max_user_connections = 250
and I have in my postgresql.conf file
max_connections = 2000
Does that affect performance badly because of the max_connections in my postgresql.conf? Or does it not matter, since the connections are already handled by PgBouncer?
One more question: in the pgbouncer configuration, is listen_addr = 0.0.0.0 right, or should it be listen_addr = *?
Is it better to set default_pool_size on PgBouncer equal to the number of CPU cores available on this server?
Should default_pool_size, max_db_connections and max_user_connections all be set to the same value?
So the idea of using pgbouncer is to pool connections when you can't afford to have a higher number of max_connections in PG itself.
NOTE: Please DO NOT set max_connections to a number like 2000 just like that.
Let's start with an example: if you have a connection limit of 20 but your app or organization wants 1000 connections at a given time, that is where a pooler comes into the picture. In this specific case you want those 20 connections to serve the 1000 coming in from the application.
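As a toy calculation of that example (numbers from the paragraph above):

```python
# 1000 client connections multiplexed over a pool of 20 server connections:
# each PostgreSQL backend is shared, on average, by 50 clients.
client_connections = 1000
pool_size = 20

multiplex_ratio = client_connections // pool_size
print(multiplex_ratio)  # 50
```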
To understand how it actually works, let's take a step back and look at what happens when you do not have a connection pooler and rely only on the PG setting for max connections, which in our case is 20.
When a connection comes in from a client/application, the main postgres process (the postmaster) forks a child process for it. So each new connection spawns a child process under the main postgres process, like so:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
24379 postgres 20 0 346m 148m 122m R 61.7 7.4 0:46.36 postgres: sysbench sysbench ::1(40120)
24381 postgres 20 0 346m 143m 119m R 62.7 7.1 0:46.14 postgres: sysbench sysbench ::1(40124)
24380 postgres 20 0 338m 137m 121m R 57.7 6.8 0:46.04 postgres: sysbench sysbench ::1(40122)
24382 postgres 20 0 338m 129m 115m R 57.4 6.5 0:46.09 postgres: sysbench sysbench ::1(40126)
So once a connection request is sent, it is received by the postmaster process, which creates a child process at the OS level under the main parent process. This connection then has an unlimited lifespan unless it is closed by the application or you have a timeout set for idle connections in PostgreSQL.
Here is where it gets costly: managing connections on a given amount of compute becomes very expensive once they exceed a certain limit. Each connection served has a compute cost, and past some point the OS can no longer handle a huge number of connections, causing contention on every resource (memory, CPU, I/O).
What if you could reuse the already-spawned child processes (backends) when they are not doing any work? You would save the time and cost of creating new backends. This is where a pool of connections that are kept open to serve different client requests comes in, and that is what we call pooling.
So now you have only n connections available, but the pooler can manage n+i client connections with them.
This is where PgBouncer helps reuse connections. It can be configured with three types of pooling: session pooling, statement pooling and transaction pooling. The bouncer returns a connection to the pool once it has finished its statement-level or transaction-level work; only with session pooling does it keep the connection until the client disconnects.
So basically: lower the number of connections in the PostgreSQL config file and tune all the settings in pgbouncer.ini.
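As a sketch of that proportioning (all numbers are illustrative assumptions, not recommendations for your hardware):

```ini
; pgbouncer.ini -- keep the pool well below PostgreSQL's limit
[pgbouncer]
pool_mode = transaction
default_pool_size = 50
max_db_connections = 80
max_client_conn = 2000

; postgresql.conf (separate file) -- a modest limit with headroom
; max_connections = 100
```

With transaction pooling, up to 2000 client connections are served by at most 80 real server connections, so PostgreSQL never sees more than its configured limit.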
To answer the second part:
one more question: in the pgbouncer configuration, is listen_addr = 0.0.0.0 right, or should it be listen_addr = *?
It depends on whether you have a standalone deployment, a shared server, etc.
Basically, if PgBouncer is on the server itself and you want it to accept incoming connections from everywhere, use *; if you want to allow only local connections, use 127.0.0.1.
For the rest of your questions check this link: pgbouncer docs
I have tried to share a little of what I know; feel free to ask if anything was unclear, or correct me if anything was stated incorrectly.

Connecting Apache Superset with PostgreSQL

Suppose I run Apache Superset on Docker and I want it to connect to my local PostgreSQL server. I used the following URI but I got an error:
postgresql+psycopg2://username:password@localhost:5432/mydb
The error is:
ERROR: {"error": "Connection failed!\n\nThe error message returned was:\n(psycopg2.OperationalError) could not connect to server: Connection refused\n\tIs the server running on host \"localhost\" (127.0.0.1) and accepting\n\tTCP/IP connections on port 5432?\ncould not connect to server: Cannot assign requested address\n\tIs the server running on host \"localhost\" (::1) and accepting\n\tTCP/IP connections on port 5432?\n\n(Background on this error at: http://sqlalche.me/e/e3q8)", "stacktrace": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/base.py\", line 2265, in _wrap_pool_connect\n return fn()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 303, in unique_connection\n return _ConnectionFairy._checkout(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 760, in _checkout\n fairy = _ConnectionRecord.checkout(pool)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 492, in checkout\n rec = pool._do_get()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/impl.py\", line 238, in _do_get\n return self._create_connection()\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 308, in _create_connection\n return _ConnectionRecord(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 437, in __init__\n self.__connect(first_connect_check=True)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/pool/base.py\", line 639, in __connect\n connection = pool._invoke_creator(self)\n File \"/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/strategies.py\", line 114, in ...
How can I solve it?
Instead of using localhost or 127.0.0.1, open up your pgAdmin.
The servers are on the left.
Click the dropdown.
Right click on the now opened cluster (level above "Databases") & open properties.
Navigate to the opened connection tab and the Hostname/Address is your replacement for "localhost"
Also make sure the final part of your connection string is pointed at your database which is one level below "Databases" in your pgAdmin.
I encountered the same problem connecting Superset to a local PostgreSQL database, and after consulting many sites on the internet, this trick solved it. Instead of localhost, try putting this in the SQLAlchemy URI:
postgresql+psycopg2://user:password@host.docker.internal:5432/database
I understand that it is bad practice to connect a Docker container to a host database, so I changed my approach and now use the postgres image inside Docker, pushing my data to that postgres server.
It would be helpful if you let me know whether I am wrong.

Unable to insert or retrieve data with MongoDB

I am trying to insert and pull some data from MongoDB.
The connection was set up correctly following the instructions on mongodb.com:
from pymongo import MongoClient

try:
    client = MongoClient(
        'mongodb+srv://user:pw!@cluster0-nghj0.gcp.mongodb.net/test?retryWrites=true',
        ssl=True)
    print("connected")
except:
    print('failed')
I manually created a database messager.messager and put some JSON documents in it. Then I try to use collection.find() or collection.insert_one(...):
db = client.messager
collection = db.messager
for i in collection.find():
    print(i)
It returns a timeout error:
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/cursor.py", line 1225, in next
if len(self.__data) or self._refresh():
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/cursor.py", line 1117, in _refresh
self.__session = self.__collection.database.client._ensure_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1598, in _ensure_session
return self.__start_session(True, causal_consistency=False)
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1551, in __start_session
server_session = self._get_server_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/mongo_client.py", line 1584, in _get_server_session
return self._topology.get_server_session()
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/topology.py", line 434, in get_server_session
None)
File "/Users/anhnguyen/Documents/GitHub/GoogleCloud_Flask/comming soon/env/lib/python3.7/site-packages/pymongo/topology.py", line 200, in _select_servers_loop
self._error_message(selector))
pymongo.errors.ServerSelectionTimeoutError: connection closed,connection closed,connection closed
Where did it go wrong?
Here is my MongoDB.com setup:
From the pymongo documentation for errors you have the following issue.
exception pymongo.errors.ServerSelectionTimeoutError(message='', errors=None)
Thrown when no MongoDB server is available for an operation
If there is no suitable server for an operation PyMongo tries for serverSelectionTimeoutMS (default 30 seconds) to find one, then throws this exception. For example, it is thrown after attempting an operation when PyMongo cannot connect to any server, or if you attempt an insert into a replica set that has no primary and does not elect one within the timeout window, or if you attempt to query with a Read Preference that the replica set cannot satisfy.
You have to check whether your network has good connectivity to the network where your MongoDB server resides.
There might be a case, where the Primary Node of the Replica Set becomes unresponsive. In such cases you need to restart your cluster(if you have the access permissions).
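While debugging, you can make the failure surface faster by lowering the server selection timeout (default 30 seconds, as quoted above), e.g. in the URI:

```
mongodb+srv://cluster0-nghj0.gcp.mongodb.net/test?retryWrites=true&serverSelectionTimeoutMS=5000
```

(With Atlas clusters, repeated "connection closed" errors are often an IP access list issue, which is worth ruling out in the cluster's network settings.)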
Also, create the connections as follows:
mongo_conn = MongoClient(
    'mongodb+srv://cluster0-nghj0.gcp.mongodb.net/test?retryWrites=true',
    username=your_username,
    password=pwd,
    authSource='admin',
    authMechanism='SCRAM-SHA-1')
The above is the best practice to follow; mongodb+srv URLs do not need ssl=True.

Heroku postgres connection: "Is the server running locally and accepting connections on Unix domain socket"

I am setting up a new dev environment, followed the Django setup tutorial, and am having issues. Here is what I get when I try to run syncdb:
Running `python doccal/manage.py syncdb` attached to terminal... up, run.1
Traceback (most recent call last):
  File "doccal/manage.py", line 14, in <module>
    execute_manager(settings)
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 459, in execute_manager
    utility.execute()
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/__init__.py", line 382, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/base.py", line 196, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/base.py", line 232, in execute
    output = self.handle(*args, **options)
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/base.py", line 371, in handle
    return self.handle_noargs(**options)
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/core/management/commands/syncdb.py", line 57, in handle_noargs
    cursor = connection.cursor()
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/db/backends/__init__.py", line 306, in cursor
    cursor = self.make_debug_cursor(self._cursor())
  File "/app/.heroku/venv/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 177, in _cursor
    self.connection = Database.connect(**conn_params)
  File "/app/.heroku/venv/lib/python2.7/site-packages/psycopg2/__init__.py", line 179, in connect
    connection_factory=connection_factory, async=async)
psycopg2.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
I have setup this same project before using the same steps and have never had a problem. I did, a few weeks ago, get an email that Heroku was migrating away from shared databases and assume this is somehow involved.
Also, I did notice two NEW steps in the tutorial, namely, installing dj-database-url and adding these lines to settings.py
import dj_database_url
DATABASES = {'default': dj_database_url.config(default='postgres://localhost')}
I have tried to run this both with and without these lines and get the same issue regardless.
Another post suggested the fix was to do this
heroku addons:add shared-database
I tried that, but got a message that shared-database is deprecated and to use heroku-postgresql instead; that had no effect either.
Thanks for any help
Configure HOST as localhost, like:
'HOST': 'localhost',
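For context, here is a minimal sketch of a Django DATABASES setting with an explicit TCP host, so psycopg2 connects over localhost:5432 instead of looking for the Unix socket file. All names and credentials below are illustrative assumptions:

```python
# Hedged sketch of a settings.py DATABASES block; NAME/USER/PASSWORD
# are placeholders, not values from the question.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'postgres',
        'PASSWORD': 'secret',
        'HOST': 'localhost',  # forces a TCP connection rather than the socket file
        'PORT': '5432',
    }
}
```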
The simplest (not comprehensive, but one to get you up and running as quickly as possible) solution to this error: in your settings.py file, in the part where you set up the database settings, revert the settings to
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': BASE_DIR / 'db.sqlite3',
    }
}
Make migrations (i.e. python manage.py makemigrations and then python manage.py migrate).
Push your changes to GitHub. In your Heroku account, create a new app, go to the settings portion of the app, connect it to GitHub, and deploy from GitHub. The database resources will then be provisioned for you (i.e. a PostgreSQL database will be created with the same schema and data as your local SQLite database).
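Alternatively, to keep PostgreSQL on Heroku rather than falling back to SQLite, the dj-database-url approach from the question can be kept; it reads the DATABASE_URL config var that the heroku-postgresql add-on sets. A hedged settings.py fragment (not runnable standalone, since it needs the dj-database-url package and the environment variable):

```python
# settings.py fragment: read DATABASE_URL set by the heroku-postgresql add-on;
# conn_max_age enables persistent connections.
import dj_database_url

DATABASES = {'default': dj_database_url.config(conn_max_age=600)}
```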