Marathon: how to health check a background worker

I have a standard Rails app with two process types, a web process and a worker process, both running as tasks on Marathon.
Is there a way to define a health check for the worker process? It does not listen on any port. What's recommended here?
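Marathon can't probe a process that listens on no port with its HTTP or TCP checks, but it does support COMMAND health checks, which run a shell command inside the task and treat a zero exit code as healthy. One common pattern is to have the worker touch a heartbeat file on every loop iteration and let the health check verify the file is fresh. A minimal sketch of the app definition (the worker command, heartbeat path, and thresholds are illustrative, and assume the worker itself keeps updating /tmp/worker-heartbeat):

{
  "id": "/rails-app/worker",
  "cmd": "bundle exec rake jobs:work",
  "instances": 2,
  "cpus": 0.5,
  "mem": 512,
  "healthChecks": [
    {
      "protocol": "COMMAND",
      "command": { "value": "test $(( $(date +%s) - $(stat -c %Y /tmp/worker-heartbeat) )) -lt 120" },
      "gracePeriodSeconds": 300,
      "intervalSeconds": 60,
      "maxConsecutiveFailures": 3
    }
  ]
}

The check fails once the heartbeat file is older than 120 seconds, at which point Marathon kills and restarts the task. The alternative is to expose a tiny HTTP status endpoint from the worker purely for health checking and use an ordinary HTTP health check against it.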

Related

Stateless Worker service in Service Fabric restarted in the same process

I have a stateless service that pulls messages from an Azure queue and processes them. The service also starts some threads in charge of cleanup operations. We recently ran into an issue where these threads, which ideally should have been killed when the service shut down, remained active (definitely a bug in our service shutdown process).
Looking further at the logs, it seemed that the RunAsync method's cancellation token received a cancellation request, and that later, within the same process, a new instance of the stateless service registered via ServiceRuntime.RegisterServiceAsync was created.
Is it expected behavior that Service Fabric can reuse the same process to start a new instance of the stateless service after shutting down the current instance?
The Service Fabric documentation (https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-hosting-model) does suggest that different services can be hosted in the same process, but I could not find the above behavior mentioned there.
In the shared process model, there's one process per ServicePackage instance on every node. Adding or replacing services will reuse the existing process.
This is done to save some (process-level) overhead on the node that runs the service, so hosting is more cost-efficient. Also, it enables port sharing for services.
You can change this default behavior by configuring 'exclusive process' mode in the application manifest, so that every replica/instance runs in a separate process.
<Service Name="VotingWeb" ServicePackageActivationMode="ExclusiveProcess">
As you mentioned, you can monitor the CancellationToken to abort the separate worker threads, so they can stop when the service stops.
More info about the application model and about configuring the process activation mode can be found in the Service Fabric docs.
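For context, here is a sketch of where that attribute lives in ApplicationManifest.xml (the service names are borrowed from the Voting sample; the rest of the manifest is omitted):

<DefaultServices>
  <Service Name="VotingWeb" ServicePackageActivationMode="ExclusiveProcess">
    <StatelessService ServiceTypeName="VotingWebType" InstanceCount="-1">
      <SingletonPartition />
    </StatelessService>
  </Service>
</DefaultServices>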

Service Fabric upgrades keep active connections alive

I am trying to upgrade an application deployed to Service Fabric.
How can I only upgrade nodes that have no active connections and wait for the busy nodes to finish before upgrading them?
Most of the time you don't really have to worry about upgrades at the node level, as the SF runtime handles this internally when the upgrade is configured in Monitored mode. This is what we've been using with a high level of success, and we never really had to do much. It also fits our requirement that all upgrade domains (nodes) have to match our health-state policies before being considered healthy.
If you want more advanced control over your upgrades, like using request draining, have a look at the info mentioned here. But to be honest, we've been quite happy just using Monitored mode and investigating why things fail when they do. We had some apps with a long-running background task in a stateful actor that sometimes failed an upgrade, and almost always it was due to an issue in the background task itself rather than anything to do with Service Fabric.
Service Fabric knew when no active connections or background tasks were running before upgrading a node, and we could actually see the nodes that were temporarily 'stuck' waiting for an active background task to finish.

Load balancer and celery result backends

I have a task that takes approximately 3 minutes to run. It pulls data from a remote server and runs CPU-intensive analysis on it. The task will be invoked by an API call. Upon the API call, I am planning to give the client a unique task id and hand the task to a Celery worker. The client will then poll the server with that task id to see whether the worker has completed the task and saved its result to a result backend. I'm thinking of using nginx, gunicorn and Flask, and dockerizing them for easy deployment in case I need to distribute this architecture across multiple machines.
The problem is that the client may poll different servers because of the load balancer, and if this is not handled well, the polled server's Celery result backend might not have the task's result while another server's result backend does.
Is it possible to use a single result backend for multiple Celery instances and have the different instances query the same backend? What other ways are there to solve this, short of using cloud storage like S3?
Would I have this problem only with multiple machines, or would it also happen with multiple gunicorn instances on a single machine, where nginx acts as a load balancer over them?
Not only is it possible to use a single result backend for all Celery workers, it is the only setup that makes sense! The same goes for the broker in most cases, unless you have a complicated Celery infrastructure with exchanges and complicated routes...
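A minimal sketch of that setup, assuming a central Redis instance reachable from every machine (the host name and task body are illustrative):

# celery_app.py -- imported by every gunicorn/Flask node and every worker node.
# All of them point at the same central broker and result backend, so any
# web node can look up the result of a task queued by any other node.
from celery import Celery

app = Celery(
    "tasks",
    broker="redis://redis.internal:6379/0",
    backend="redis://redis.internal:6379/1",
)

@app.task
def analyze(source_url):
    # pull data from the remote server and run the CPU-intensive analysis
    ...

The polling endpoint on any web node can then resolve the task id against the shared backend:

# status endpoint sketch: returns the result once the worker has stored it
from celery.result import AsyncResult
from celery_app import app

def task_status(task_id):
    res = AsyncResult(task_id, app=app)
    return {"ready": res.ready(), "result": res.result if res.ready() else None}

This also covers the single-machine case: several gunicorn instances behind nginx all share the one backend, so it doesn't matter which instance serves the poll.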

Acumatica scheduler

This might be a strange question. Is there any way to make the scheduler stay active perpetually? I have a couple of instances: one on a server for testing, and a development instance on my laptop. I set up some business events in both instances that fire accurately, as designed. My question comes from the fact that the scheduler seems to stall if no one logs into the instance. Once I log in to the instance with any ID, the scheduler restarts, runs for about 12 hours, and then stalls again. I thought it was only the test instance on the server, but I took a couple of days off and my laptop instance also stalled. Is there a setting to overcome this? I know the assumption is that there will be users in the system in production, but what about over the weekend or on holidays?
The scheduler runs in the IIS worker process (w3wp.exe) of the assigned Application Pool. Normally, the worker process is started only when the first web request is received.
If you restart the test instance on the server or your laptop instance, you may experience this delay until someone logs in.
However, you can set the worker process to start automatically whenever an Application Pool starts.
Check your IIS configuration, look for the Application Pool assigned to your Acumatica instance and edit its Advanced Settings.
There you can change StartMode to AlwaysRunning.
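The same setting can also be applied from an elevated command prompt with appcmd (the application pool name here is illustrative; use the pool assigned to your instance):

%windir%\system32\inetsrv\appcmd.exe set apppool "AcumaticaAppPool" /startMode:AlwaysRunning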
Also, your app pool might be getting recycled. The following posts can help you:
How to know who kills my threads
https://serverfault.com/questions/333907/what-should-i-do-to-make-sure-that-iis-does-not-recycle-my-application

Chronos Cluster with High Availability

I have three servers, A, B and C, and on each machine I'm running Chronos, ZooKeeper, mesos-master and mesos-slave.
Chronos contacts the mesos-master through the ZooKeeper URL, so it automatically picks the leading master even if some node is down. I have high availability there.
Chronos also runs in cluster mode, so whichever Chronos instance I access, I see the same list of jobs and everything works fine.
The problem I have is that Chronos is accessible through any of the three URLs:
http://server_node_1:4400
http://server_node_2:4400
http://server_node_3:4400
I have another application which schedules jobs in Chronos using the REST API. Which URL does my application have to talk to in order to run in high-availability mode?
Let's say my application talks to http://server_node_1:4400 to schedule a job; if Chronos on server_node_1 is down, I'm not able to schedule the job.
My application needs to talk to a single URL to schedule jobs in Chronos, and even if some Chronos node is down, scheduling should still work. Do I need some kind of load balancer between my application and the Chronos cluster to pick a running Chronos node? How can I achieve high availability in my scenario?
Use HAProxy for routing to a Chronos instance. This way you can access a Chronos instance using e.g. curl loadbalancer:8081.
haproxy.cfg:
listen chronos_8081
    bind 0.0.0.0:8081
    mode http
    balance roundrobin
    option allbackups
    option http-no-delay
    # 'check' enables active health checks, so requests are only
    # routed to Chronos nodes that are actually up
    server chronos01 server_node_1:4400 check
    server chronos02 server_node_2:4400 check
    server chronos03 server_node_3:4400 check
Or, even better, start Chronos via Marathon, which will ensure that a given number of instances is kept running. The HAProxy configuration can then be generated by:
marathon-lb
bamboo
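For illustration, a minimal Marathon app definition that keeps three Chronos instances running (the binary path, flags, and ZooKeeper addresses are assumptions to adapt to your setup; the /ping path assumes Chronos's usual health endpoint):

{
  "id": "/chronos",
  "cmd": "/usr/bin/chronos --http_port $PORT0 --zk_hosts server_node_1:2181,server_node_2:2181,server_node_3:2181 --master zk://server_node_1:2181,server_node_2:2181,server_node_3:2181/mesos",
  "instances": 3,
  "cpus": 1,
  "mem": 1024,
  "healthChecks": [
    { "protocol": "HTTP", "path": "/ping", "portIndex": 0, "intervalSeconds": 30 }
  ]
}

marathon-lb or bamboo can then watch Marathon and regenerate the HAProxy backend list as instances move between nodes.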