Selecting specific worker for connection - perl

So, I'm using a hypnotoad server for my application and am trying to maintain state for connections. It turns out that a different worker is spawned or selected for every connection. Can I somehow make this selection explicit? Also, is there a way to know which worker handled my last request and use it again for the corresponding requests?

You can run multiple masters (each with one worker, or single processes as with morbo).
Then put a load balancer in front of them that is responsible for selecting a concrete process per connection.
I typically use nginx with the upstream module's sticky setting.
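To illustrate what "sticky" selection means, here is a minimal sketch in Python (not nginx configuration) that maps a client address to one backend by hashing it, similar in spirit to nginx's `ip_hash`. The backend addresses and the `pick_backend` helper are assumptions made up for this example.

```python
import hashlib

# Hypothetical backends: one single-worker server process per port.
BACKENDS = ["127.0.0.1:8081", "127.0.0.1:8082", "127.0.0.1:8083"]


def pick_backend(client_ip: str, backends=BACKENDS) -> str:
    """Deterministically map a client address to one backend."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    # The same client address always lands on the same backend.
    for ip in ("203.0.113.5", "203.0.113.5", "198.51.100.7"):
        print(ip, "->", pick_backend(ip))
```

The same client always reaches the same process as long as the backend list does not change, which is what lets per-process state survive across requests.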

Related

Load balancer and celery result backends

I have a task that takes approximately 3 minutes to run. It pulls data from a remote server and performs CPU-intensive analysis on it. This task will be invoked by an API call. Upon the API call, I am planning to give the client a unique task id and assign the task to a Celery worker. The client will then poll the server with the given task id to see whether the Celery worker has completed the task and saved its result to a result backend. I am thinking of using nginx, gunicorn and Flask, and dockerizing them for an easy deploy in case I need to distribute this architecture across multiple machines.
The problem is that the client may poll different servers because of the load balancer, and if this is not handled well, the polled server's Celery result backend might not have the task's result while another server's result backend has it.
Is it possible to use a single result backend across multiple Celery instances and make the different Celery instances query the same result backend? What other ways are there to solve this, other than using cloud storage like S3?
Would I have this problem only if I have multiple machines, or would it also happen with multiple gunicorn instances on a single machine where nginx acts as a load balancer for them?
Not only is it possible for all Celery workers to use a single result backend, it is the only setup that makes sense! The same goes for the broker in most cases, unless you have a complicated Celery infrastructure with exchanges and complicated routes...
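The point can be sketched without Celery itself. In the following illustration the `ResultBackend` class is a stand-in for a shared store (for example Redis or a database), and `AppServer` and `worker_finish` are hypothetical names, not Celery APIs. Because both servers hold a reference to the same backend, it does not matter which one the load balancer sends the poll to.

```python
import uuid


class ResultBackend:
    """Stand-in for a shared result store such as Redis or a database."""

    def __init__(self):
        self._results = {}

    def store(self, task_id, result):
        self._results[task_id] = result

    def get(self, task_id):
        return self._results.get(task_id)


class AppServer:
    """Hypothetical API server sitting behind the load balancer."""

    def __init__(self, name, backend):
        self.name = name
        self.backend = backend

    def submit(self):
        # A real server would also enqueue the task on the broker here.
        return str(uuid.uuid4())

    def poll(self, task_id):
        result = self.backend.get(task_id)
        if result is None:
            return {"state": "PENDING"}
        return {"state": "SUCCESS", "result": result}


def worker_finish(backend, task_id, result):
    """What a worker does when the task completes."""
    backend.store(task_id, result)


if __name__ == "__main__":
    shared = ResultBackend()
    server_a = AppServer("a", shared)
    server_b = AppServer("b", shared)

    task_id = server_a.submit()          # request handled by server a
    print(server_b.poll(task_id))        # poll lands on server b: pending
    worker_finish(shared, task_id, 42)
    print(server_b.poll(task_id))        # still server b: result is visible
```

If each server had its own private backend instead, the second poll would stay pending forever on server b, which is the failure mode described in the question.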

Advice on how to monitor (micro)services?

We are transitioning from building applications on monolith application servers, to more microservices oriented applications on Spring Boot. We will publish health information with SB Actuator through HTTP or JMX.
What are the options/best practices to monitor services, that will be around 30-50 in total? Thanks for your input!
Not knowing too much detail about your architecture and services, here are some suggestions that represent (a subset of) the strategies that have proven themselves in production systems I've worked on. For this I am assuming you are using one container/VM per microservice:
If your services are stateless (as they should be :-) and you have redundancy (as you should have :-) then you set up your load balancer to call your /health on each instance and if the health check fails then the load balancer should take the instance out of rotation. Depending on how tolerant your system is, you can set up various rules that define failure instead of just a single failure (e.g. 3 consecutive, etc.)
On each instance run a Nagios agent that calls your health check (/health) on the localhost. If this fails, generate an alert that specifies which instance failed.
You also want to ensure that a higher level alert is generated if none of your instances are healthy for a given service. You might be able to set this up in your load balancer or you can set up a monitor process outside the load balancer that calls your service periodically and if it does not get any response (i.e. none of the instances are responding) then it should sound all alarms. Hopefully this condition is never triggered in production because you dealt with the other alarms.
Advanced: In a cloud environment you can connect the alarms to automatic scaling features. That way, unhealthy instances are torn down and healthy ones are brought up automatically every time the monitoring system deems an instance of a service unhealthy.
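The "3 consecutive failures" rule and the higher-level "no instance is healthy" alert can be sketched as follows. This is a minimal illustration, not the behaviour of any particular load balancer or of Nagios; `InstanceHealth` and `any_healthy` are names invented for the example.

```python
class InstanceHealth:
    """Tracks /health results for one instance of a service."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.in_rotation = True

    def record(self, check_passed: bool) -> bool:
        """Record one health check result; return whether the instance stays in rotation."""
        if check_passed:
            # A passing check resets the counter and restores the instance.
            self.consecutive_failures = 0
            self.in_rotation = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.in_rotation = False
        return self.in_rotation


def any_healthy(instances) -> bool:
    """Higher-level check: sound all alarms when this returns False."""
    return any(instance.in_rotation for instance in instances)


if __name__ == "__main__":
    first, second = InstanceHealth(), InstanceHealth()
    for passed in (False, False, False):
        first.record(passed)
    print("first in rotation:", first.in_rotation)
    print("service still served:", any_healthy([first, second]))
```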

LiveRebel Update Strategy

I am trying to use LiveRebel in my production environment. After most parts were configured, I tried to update my application from, let's say, version 1.1 to 1.3, as shown below
Does this mean that LiveRebel requires two server installations on 2 physical IP addresses? Can I have two servers on 2 virtual IP addresses?
Rolling restarts use request routing to achieve zero downtime for the users. Sessions are first drained by waiting for old sessions to expire and routing new ones to an identical application on another server. When all sessions are drained, the application is updated while the other server handles the requests.
So, as you can see, for zero downtime you need an additional server to handle the requests while the application is updated. A full restart doesn't have that requirement, but results in downtime for users.
As for the question about IPs: as long as the two server (virtual) machines can see each other, it doesn't really make much difference.

How to make a RESTful service truly Highly Available with a Hardware load balancer

When we have a cluster of machines behind a load balancer (lb), the hardware load balancer generally keeps persistent connections.
Now, when we need to deploy an update on all machines (rolling update), the way to do it is to take one machine out of rotation and watch until no more requests are sent to that server via the lb. When the app has reached the no-request state, we update it manually.
With 70-80 servers in the picture this becomes very painful.
Does someone have a better way of doing it?
70-80 servers is a very horizontally scaled implementation... good job! Better is a very relative term; hopefully one of these suggestions counts as "better".
1. Implement an intelligent health check for the application, with the ability to adjust the health check while the application is running. What we do is have the health check start failing while the application is running just fine. This allows the load balancer to automatically take the system out of rotation. Our stop scripts query the load balancer to make sure that the system is out of rotation and then shut down normally, which allows the existing connections to drain.
2. Batch multiple groups of systems together. I am assuming that you have 70 servers to handle peak load. This means that you should be able to restart several at a time. A standard way to do this is to implement a simple token granting service with a maximum of 10 tokens. Have your shutdown scripts check out a token before continuing.
3. Another way to do this is with blue/green deploys. That means that you have an entire second server farm, and once the second server farm is updated you switch load balancing to point to the new server farm.
4. This is an alternative to option 3. Install both versions of the app on the same servers and then have an internal proxy service (like haproxy) switch the connections between the versions of the app that are deployed. For example:
haproxy listening on 8080
app version 0.1 listening on 9001
app version 0.2 listening on 9002
Once you are happy with the deploy of app version 0.2 switch haproxy to send traffic to 9002. When you release version 0.3 then switch load balancing back to 9001 etc.
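The token idea from the batching suggestion can be sketched like this. It is only an in-process illustration: a `threading.BoundedSemaphore` stands in for the token granting service, and `rolling_restart` and the `restart` callback are hypothetical names. A real deployment would need a token service that the shutdown scripts on all servers can reach over the network.

```python
import threading
import time

MAX_TOKENS = 10  # at most this many servers may be restarting at once


def rolling_restart(servers, restart, max_tokens=MAX_TOKENS):
    """Restart every server, never holding more than max_tokens tokens."""
    tokens = threading.BoundedSemaphore(max_tokens)

    def worker(server):
        # Check out a token before continuing; it is returned on exit.
        with tokens:
            restart(server)

    threads = [threading.Thread(target=worker, args=(s,)) for s in servers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    lock = threading.Lock()
    state = {"active": 0, "peak": 0}

    def fake_restart(server):
        # Placeholder for: fail health check, drain, update, start.
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)
        with lock:
            state["active"] -= 1

    rolling_restart([f"app{i:02d}" for i in range(70)], fake_restart)
    print("peak concurrent restarts:", state["peak"])
```

The token limit should be chosen so that the servers still in rotation can carry the load while the others restart.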

Questions Concerning Using Celery with Multiple Load-Balanced Django Application Servers

I'm interested in using Celery for an app I'm working on. It all seems pretty straightforward, but I'm a little confused about what I need to do if I have multiple load-balanced application servers. All of the documentation assumes that the broker will be on the same server as the application. Currently, all of my application servers sit behind an Amazon ELB and tasks need to be able to come from any one of them.
This is what I assume I need to do:
Run a broker server on a separate instance
Configure each application instance to connect to that broker server
Each application instance will also be a Celery worker (running celeryd)?
My only beef with that is: what happens if my broker instance dies? Can I run 2 broker instances somehow so I'm safe if one goes under?
Any tips or information on what to do in a setup like mine would be greatly appreciated. I'm sure I'm missing something or not understanding something.
For future reference, for those who do prefer to stick with RabbitMQ...
You can create a RabbitMQ cluster from 2 or more instances. Add those instances to your ELB and point your celeryd workers at the ELB. Just make sure you connect the right ports and you should be all set. Don't forget to allow your RabbitMQ machines to talk among themselves to run the cluster. This works very well for me in production.
One exception here: if you need to schedule tasks, you need a celerybeat process. For some reason, I wasn't able to connect the celerybeat to the ELB and had to connect it to one of the instances directly. I opened an issue about it and it is supposed to be resolved (didn't test it yet). Keep in mind that celerybeat by itself can only exist once, so that's already a single point of failure.
You are correct in all points.
To make the broker reliable, set up a clustered RabbitMQ installation, as described here:
http://www.rabbitmq.com/clustering.html
Celery beat also doesn't have to be a single point of failure if you run it on every worker node with:
https://github.com/ybrs/single-beat