How to reduce/specify number of connections in celery flower? - celery

I am running an app on a limited-service host, where my Redis server allows a maximum of 50 connections. I have configured my web app to hold a steady 20 Redis connections. However, if I launch Flower to inspect Redis, my connection count jumps up to 45. How can I reduce the number of connections Flower opens?
I thought I found a way by making my flower start command:
celery -A myapp flower --redis_max_connections=2 --concurrency=4 -l info
But this didn't seem to change anything... What am I missing?
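The thread doesn't confirm a fix, but one avenue worth checking, as an assumption rather than an answer from the thread, is Celery's own pool settings: Flower loads the same app passed via -A, so caps set in the app's configuration apply to it too. broker_pool_limit and redis_max_connections are documented Celery settings; the values below are illustrative.

```python
# Hedged sketch (not from the thread): cap connection pools in the Celery
# app's configuration, which Flower picks up when started with -A myapp.
# The specific numbers are assumptions for illustration.

CELERY_CONFIG = {
    "broker_pool_limit": 2,        # max broker (Redis) connections per process
    "redis_max_connections": 10,   # cap for the Redis result-backend pool
}

# Applied to the app object as: app.conf.update(**CELERY_CONFIG)
print(CELERY_CONFIG["broker_pool_limit"])
```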

Related

Airflow Celery Worker celery-hostname

On Airflow 2.1.3, the CLI documentation (airflow celery -h) shows:
-H CELERY_HOSTNAME, --celery-hostname CELERY_HOSTNAME
Set the hostname of celery worker if you have multiple workers on a single machine
I am familiar with Celery and I know you can run multiple workers on the same machine. But with Airflow (and the celery executor) I don't understand how to do so.
if you do, on the same machine:
> airflow celery worker -H 'foo'
> airflow celery worker -H 'bar'
The second command will fail, complaining about the pid, so then:
> airflow celery worker -H 'bar' --pid some-other-pid-file
This will run another worker and it will sync with the first worker BUT you will get a port binding error since airflow binds the worker process to http://0.0.0.0:8793/ no matter what (unless I missed a parameter?).
It seems you are not supposed to run multiple workers per machine... Then my question is, what is the -H (--celery-hostname) option for? How would I use it?
The port that celery listens to is also configurable - it is used to serve logs to the webserver:
https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#worker-log-server-port
You can run multiple celery workers with different settings for that port or run them with --skip-serve-logs to not open the webserver in the first place (for example if you send logs to sentry/s3/gcs etc.).
By the way, running several celery workers like this sounds strange, because you can utilise the machine's CPUs through the worker's own parallelism. That seems much more practical and easier to manage.

Utilizing multiple database pool connections within multiple gunicorn workers

I am using a Flask server which initialises the app by creating 10 connections in a psycopg2 connection pool (backed by Postgres). My Flask server receives 40 requests every second.
Every request uses 1 connection and takes approximately 5 seconds in the database. If a connection is not found in the connection pool, new connections are created.
There is a limit of 150 maximum database connections on the PostgreSQL server. However, I am facing challenges in specifying the maximum number of connections in the connection pool. For the pool initialization, I use:
app.config['pool'] = psycopg2.pool.ThreadedConnectionPool(
    10, 145,
    host=config["HOST"],
    database=config["DATABASE"],
    user=config["USER"],
    password=config["PASSWORD"],
)
I know it may not be possible to share connections within multiple workers. What is the best practice to utilize these 150 connections across multiple workers?
For reference, my tech stack is Flask + PostgreSQL (on Azure). For deployment, I use gunicorn and nginx with Flask.
Following is my gunicorn command:
gunicorn --bind 0.0.0.0:8000 --worker-class=gevent --worker-connections=1000 --workers=3 --timeout=1000 manage:app
The easiest solution is to change the worker-class from gevent to sync or possibly gthread.
It is worth paying attention to the entry straight from the gunicorn documentation: "For full greenlet support applications might need to be adapted. When using, e.g., Gevent and Psycopg it makes sense to ensure psycogreen is installed and setup." (https://docs.gunicorn.org/en/stable/design.html)
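As a back-of-the-envelope sketch (the worker count and headroom below are assumptions, not from the thread), the pool bounds follow from dividing the server's connection limit across the gunicorn workers, since each worker process holds its own pool:

```python
# Sketch: size each worker's psycopg2 pool so the total across all gunicorn
# workers stays under PostgreSQL's max_connections.
# RESERVED and GUNICORN_WORKERS are illustrative assumptions
# (GUNICORN_WORKERS matches --workers=3 in the command above).

PG_MAX_CONNECTIONS = 150
RESERVED = 5              # headroom for admin tools and other clients
GUNICORN_WORKERS = 3

per_worker_max = (PG_MAX_CONNECTIONS - RESERVED) // GUNICORN_WORKERS
per_worker_min = min(10, per_worker_max)

print(per_worker_min, per_worker_max)  # prints "10 48"
# Each worker then creates its pool as:
#   psycopg2.pool.ThreadedConnectionPool(per_worker_min, per_worker_max, ...)
```

With these bounds, even if all three workers max out their pools simultaneously, the total (3 × 48 = 144) stays under the 150-connection server limit.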

docker swarm - connections from wildfly to postgres randomly hang

I'm experiencing a weird problem when deploying a docker stack (compose file).
I have a three node docker swarm - master and two workers.
All machines are CentOS 7.5 with kernel 3.10.0 and docker 18.03.1-ce.
Most things run on the master, one of which is a wildfly (v9.x) application server.
On one of the workers is a postgres database.
After deploying the stack things work normally, but after a while (or maybe after a specific action in the web app) requests start to hang.
Running netstat -ntp inside the wildfly container shows 52 bytes stuck in the Send-q:
tcp 0 52 10.0.0.72:59338 10.0.0.37:5432 ESTABLISHED -
On the postgres side the connection is also in ESTABLISHED state, but the send and receive queues are 0.
It's always exactly 52 bytes. I read somewhere that ACK packets with timestamps are also 52 bytes. Is there any way I can verify that?
We have the following sysctl tunables set:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_intvl = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_timestamps = 0
The first three were needed because of this.
All services in the stack are connected to the same default network that docker creates.
Now if I move the postgres service onto the same host as the wildfly service, the problem doesn't seem to surface. Likewise, if I declare a separate network for postgres and attach it only to the services that need the database (plus the database itself), the problem also doesn't show.
Has anyone come across a similar issue? Can anyone provide any pointers on how I can debug the problem further?
Turns out this is a known issue with pooled connections in swarm when services are on different nodes.
Basically the workaround is to set the above tunables + enable TCP keepalive on the socket. See here and here for more details.
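The answer doesn't include code, but enabling keepalive on a socket looks roughly like this in Python (a sketch; the TCP_KEEP* constants are Linux-specific, hence the guard):

```python
import socket

# Sketch: per-socket TCP keepalive mirroring the sysctl values above.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

if hasattr(socket, "TCP_KEEPIDLE"):  # Linux only
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 600)  # idle secs before first probe
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # secs between probes
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # probes before drop
```

For a connection pool you don't control socket creation directly; psycopg2 accepts the equivalent libpq parameters (keepalives=1, keepalives_idle, keepalives_interval, keepalives_count) on connect, which is the more practical place to set this for the wildfly/postgres case above.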

Using Celery queues with multiple apps

How do you use a Celery queue with the same name for multiple apps?
I have an application with N client databases, which all require Celery task processing on a specific queue M.
For each client database, I have a separate celery worker that I launch like:
celery worker -A client1 -n client1#%h -P solo -Q long
celery worker -A client2 -n client2#%h -P solo -Q long
celery worker -A client3 -n client3#%h -P solo -Q long
When I ran all the workers at once and tried to kick off a task for client1, I found it never seemed to execute. Then I killed all workers except the first, and now the first worker receives and executes the task. It turned out that even though each worker's app used a different BROKER_URL, using the same queue name caused them to steal each other's tasks.
This surprised me, because if I don't specify -Q, meaning Celery pulls from the "default" queue, this doesn't happen.
How do I prevent this with my custom queue? Is the only solution to include a client ID in the queue name? Or is there a more "proper" solution?
For multiple applications I use different Redis databases, like:
redis://localhost:6379/0
redis://localhost:6379/1
etc.
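A sketch of that layout (the client names and base URL are illustrative assumptions): each client app gets its own Redis database number, so identically named queues like "long" no longer share a key space:

```python
# Sketch: one Redis database per client app, so each app's "long" queue
# lives in its own key space and workers can't steal across clients.
BROKER_BASE = "redis://localhost:6379/{db}"

clients = ["client1", "client2", "client3"]
broker_urls = {name: BROKER_BASE.format(db=i) for i, name in enumerate(clients)}

print(broker_urls["client1"])  # prints "redis://localhost:6379/0"
# Each app is then created as Celery(name, broker=broker_urls[name]),
# and its worker started with: celery worker -A <name> -Q long
```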

Docker blocking outgoing connections on high load?

We have a node.js web server that makes some outgoing http requests to an external API. It's running in docker using dokku.
After some time of load (30req/s) these outgoing requests aren't getting responses anymore.
Here's a graph I made while testing with constant req/s:
incoming and outgoing are the number of concurrent requests (not the number of initiated requests). (It's hard to see in the graph, but both are fairly constant at ~10 requests.)
response time is for external requests only.
You can clearly see that they start failing all of a sudden (hitting our 1000ms timeout).
The more req/s we send, the faster we run into this problem, so we must have some sort of limit we're getting closer to with each request.
I used netstat -ant | tail -n +3 | wc -l on the host to get the number of open connections, but it was only ~450 (most of them TIME_WAIT). That shouldn't hit the socket limit. We aren't hitting any RAM or CPU limits, either.
I also tried running the same app on the same machine outside docker and it only happens in docker.
It could be due to the Docker userland proxy. If you are running a recent version of Docker, try running the daemon with the --userland-proxy=false option. This makes Docker handle port forwarding with iptables alone, which has less overhead.