Can a Heroku worker job with an external socket connection run in parallel?

Can a worker job on Heroku make a socket connection (e.g. POP3) to an external server?
I assume that scaling the worker process to 2 or more will run jobs in parallel, and that they will all try to connect to the same server/port from the same client/port. Am I right, or am I missing something?

Yes, Heroku workers can connect to the outside world. However, there is no built-in provision for handling the kind of problem you mention; you would need to handle that part yourself.
Think of the workers as a set of separate EC2 instances.
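On the "same client/port" worry: even on a single machine, the operating system gives each outgoing TCP connection its own ephemeral source port, so parallel clients do not collide on the client side. The following is a minimal sketch that uses a local listener as a stand-in for the external server; the function name is hypothetical, and whether the remote server accepts several simultaneous sessions (for example, for one POP3 mailbox) is a separate question that this does not address.

```python
import socket

def local_ports_for_parallel_clients(n=2):
    """Open n simultaneous connections to one server and return the clients' source ports."""
    # Stand-in for the external server; port 0 lets the OS pick a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(n)
    host, port = server.getsockname()

    clients, accepted = [], []
    try:
        for _ in range(n):
            # Each client connects to the same server address and port.
            clients.append(socket.create_connection((host, port)))
            conn, _addr = server.accept()
            accepted.append(conn)
        # getsockname() on a client socket returns its (local address, local port).
        return [c.getsockname()[1] for c in clients]
    finally:
        for s in clients + accepted + [server]:
            s.close()

ports = local_ports_for_parallel_clients(2)
print(ports)  # two different ephemeral port numbers
```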

Related

How do I connect Locust workers to a master running with an https URL?

My company writes medical software and as such is subject to the HIPAA requirements. We are running our code in GCP and I am trying to implement load testing using Locust.
I am able to get the locust master up and running on one of our clusters with an external address but only via https://locustmaster.gcp.mycompany
I am trying to figure out how to get the workers to connect to this. There are TLS and web auth command line options but those are for connecting to the target URL and not the locust master.
Any ideas on how to get this to work?
Oh, I am using Locust V1.4.4.
Workers and the master communicate over ZeroMQ (unfortunately they can't work over HTTP). You point the workers at the master using --master-host=X.X.X.X and --master-port=XYZ
https://docs.locust.io/en/stable/running-locust-distributed.html#options

Dynamic port mapping for ECS tasks

I want to run a socket program in AWS ECS with the client and server in one task definition. I am able to run it when I use awsvpc network mode and connect to the server on localhost every time. This is good because I don't need to know the IP address of the server. The issue is that the server has to start on some port, and if I run 10 of these tasks, only 3 tasks (= the number of running instances) run at a time. This is clearly because 10 tasks cannot open the same port. I could manually check for open ports before starting the server and somehow write the port to a Docker shared volume where the client can read it and connect. But this seems complicated, and it adds unnecessary code to my server. For services there is dynamic port mapping through an Application Load Balancer, but there isn't anything for simply running tasks.
How can I run multiple socket programs without having to manage the port number in AWS ECS?
If you're using awsvpc mode, each task will get its own ENI and there shouldn't be any port conflict. But each instance type has a limited number of ENIs available. You can increase that by enabling ENI trunking, which, however, is supported by only a handful of instance types:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/container-instance-eni.html#eni-trunking-supported-instance-types
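For setups where several servers do end up sharing one network namespace (an assumption here, not the awsvpc case described above), a common alternative to scanning for open ports is to bind to port 0 and let the operating system choose a free one. This is a minimal sketch with a hypothetical helper name; the client would still need some way to learn the chosen port.

```python
import socket

def start_server_on_free_port(host="127.0.0.1"):
    """Bind a listening socket to an OS-chosen free port and return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))   # port 0 asks the OS for any free port
    srv.listen()
    return srv, srv.getsockname()[1]

# Two servers on the same host get different ports without manual checking.
srv_a, port_a = start_server_on_free_port()
srv_b, port_b = start_server_on_free_port()
print(port_a, port_b)
srv_a.close()
srv_b.close()
```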

Load balancer and celery result backends

I have a task that takes approximately 3 minutes to run. It pulls data from a remote server and runs a CPU-intensive analysis on it. This task will be invoked by an API call. Upon the API call, I am planning to give the client a unique task id and assign the task to a Celery worker. The client will then poll the server with the given task id to see whether the Celery worker has completed the task and its result has been saved to a result backend. I am thinking of using nginx, gunicorn, and Flask, and dockerizing them for easy deployment in case I need to distribute this architecture across multiple machines.
The problem is that the client may poll different servers because of the load balancer, and if this is not handled well, the polled server's Celery result backend might not have the task's result while another server's result backend does.
Is it possible to use a single result backend across multiple Celery instances and have the different instances query the same result backend? What other ways are there to solve this, other than using cloud storage like S3?
Would I have this problem only if I have multiple machines, or would it also happen with multiple gunicorn instances on a single machine where nginx acts as a load balancer over them?
Not only is it possible to use a single result backend for all Celery workers, it is the only setup that makes sense! The same goes for the broker in most cases, unless you have a complicated Celery infrastructure with exchanges and complicated routes.
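A rough sketch of what a shared result backend could look like: every web process and every worker imports the same app object, so whichever server receives the poll looks the task id up in the same place. The host names, URLs, task body, and helper names below are assumptions for illustration, and the Redis backend additionally requires the redis client package.

```python
from celery import Celery
from celery.result import AsyncResult

# Hypothetical addresses: one broker and one result backend shared by all machines.
BROKER_URL = "amqp://guest@broker.internal//"
RESULT_BACKEND_URL = "redis://results.internal:6379/0"

app = Celery("analysis", broker=BROKER_URL, backend=RESULT_BACKEND_URL)

@app.task
def analyze(source_url):
    # Placeholder for the roughly 3-minute fetch-and-analyze job.
    return {"source": source_url, "status": "done"}

def start_analysis(source_url):
    """Called by the API handler: enqueue the task and return its id to the client."""
    return analyze.delay(source_url).id

def poll(task_id):
    """Called by any server behind the load balancer: check the shared backend."""
    result = AsyncResult(task_id, app=app)
    if not result.ready():
        return {"state": result.state}
    if result.successful():
        return {"state": result.state, "result": result.result}
    # On failure, result.result holds the raised exception.
    return {"state": result.state, "error": str(result.result)}
```

Because the lookup goes through the shared backend rather than local state, it should not matter whether the polling request lands on a different gunicorn process or a different machine.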

Can I tie Celery workers to a particular instance given a shared database?

I have a number of machines each with a Django instance, sharing a single Postgres database.
I want to run Celery, preferably using the Django broker and the Postgres database for simplicity. I do not have a high volume of tasks to run, so there is no need to use a different broker for that reason.
I want to run celery tasks which operate on local file storage. This means that I want the celery worker only to run tasks which are on the same machine that triggered the event.
Is this possible with the current setup? If not, how can I do it? A local Redis instance for each machine?
I worked out how to make this work. No need for fancy routing or brokers.
I run each celeryd instance with a special queue named after the host. This can be done automatically, like:
./manage.py celeryd -Q celery,`hostname`
I then add a setting in settings.py that stores the hostname:
import socket
CELERY_HOSTNAME = socket.gethostname()
In each Django instance this will have a different value.
I can then specify this queue when I asynchronously call my task:
my_task.apply_async(args=[one, two], queue=settings.CELERY_HOSTNAME)

Questions Concerning Using Celery with Multiple Load-Balanced Django Application Servers

I'm interested in using Celery for an app I'm working on. It all seems pretty straightforward, but I'm a little confused about what I need to do if I have multiple load-balanced application servers. All of the documentation assumes that the broker will be on the same server as the application. Currently, all of my application servers sit behind an Amazon ELB and tasks need to be able to come from any one of them.
This is what I assume I need to do:
Run a broker server on a separate instance
Configure each application instance to connect to that broker server
Each application instance will also be a Celery worker (running celeryd)?
My only beef with that is: what happens if my broker instance dies? Can I run 2 broker instances somehow so I'm safe if one goes under?
Any tips or information on what to do in a setup like mine would be greatly appreciated. I'm sure I'm missing something or not understanding something.
For future reference, for those who do prefer to stick with RabbitMQ...
You can create a RabbitMQ cluster from 2 or more instances. Add those instances to your ELB and point your celeryd workers at the ELB. Just make sure you connect the right ports and you should be all set. Don't forget to allow your RabbitMQ machines to talk among themselves to run the cluster. This works very well for me in production.
One exception here: if you need to schedule tasks, you need a celerybeat process. For some reason, I wasn't able to connect the celerybeat to the ELB and had to connect it to one of the instances directly. I opened an issue about it and it is supposed to be resolved (didn't test it yet). Keep in mind that celerybeat by itself can only exist once, so that's already a single point of failure.
You are correct in all points.
To make the broker reliable, set up a clustered RabbitMQ installation, as described here:
http://www.rabbitmq.com/clustering.html
Celery beat also doesn't have to be a single point of failure if you run it on every worker node with:
https://github.com/ybrs/single-beat