What's a good way to deploy and update a long running process? - deployment

I have a process running on a server.
FYI I'm trying to solve the same problem for both nodejs and Python. I don't think the specific server or language matters, as the question is more about the approach to deployment.
The work the process does might take anywhere from seconds to hours to run.
I'm trying to work out how to deploy updated code for the process.
I don't want to just stop the process in the middle of what it is doing, for fear of losing all the work done so far in the long-running process.
So what's a good way to get the process to gracefully exit and restart when new code has arrived?
I use systemd for running the nodejs service.
I use Ansible to deploy updates, not that this is really relevant.
I thought maybe at the end of each execution of the long running process the server could check to see if some file has been placed on the disk as some sort of flag to indicate it should exit and restart, but that seems kinda brittle and hacky.
Anyone got any better mechanisms for this sort of thing?
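For reference, the flag-file idea described above might look roughly like this in Python. This is only a sketch of that approach, not a recommendation: the flag path and the names run_until_flagged, clear_flag, jobs and run_job are made up for illustration, and it assumes the work can be split into jobs that each run to completion.

```python
from pathlib import Path

# Hypothetical location; the deploy step would create this file after the
# new code has been copied into place.
RESTART_FLAG = Path("/var/run/myservice/restart.flag")


def run_until_flagged(jobs, run_job, flag=RESTART_FLAG):
    """Run jobs in order, stopping between jobs once the flag file exists.

    A job that has started is always allowed to finish. Returns the number
    of jobs completed before the flag was seen (or the jobs ran out).
    """
    completed = 0
    for job in jobs:
        if flag.exists():
            break  # new code has arrived: do not start more work
        run_job(job)
        completed += 1
    return completed


def clear_flag(flag=RESTART_FLAG):
    """Remove the flag so the restarted process does not exit immediately."""
    flag.unlink(missing_ok=True)
```

After run_until_flagged returns, the process would call clear_flag() and exit normally. With a systemd unit that sets Restart=always, systemd would then start it again, this time with the new code.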

Related

NestJS schedulers are not working in production

I have a BE service in NestJS that is deployed in Vercel.
I need several schedulers, so I have used the @nestjs/schedule lib, which is super easy to use.
Locally, everything works perfectly.
For some reason, the only thing that is not working in my production environment is those schedulers. Everything else is working: endpoints, database access, and so on.
Does anyone have an idea why? Is it something with my deployment? Maybe Vercel has some issue with that? Maybe this schedule library requires something that Vercel doesn't have?
I am clueless..
Cold boot is the process of starting a computer from shutdown or a powerless state and setting it to normal working condition.
This means that code deployed in a serverless manner runs only when an endpoint is called. The platform spins up a virtual machine to execute your code and keeps it running for a certain period in case another API request arrives. It is cheaper and easier for the provider to keep the machine running for, let's say, 60 seconds or 5 minutes than to redeploy it on every call after shutting it down when the function finishes.
So in your case, most likely the machine you set the cron on is killed after a period of time. Cron jobs are tied to the machine they are scheduled on: system cron is a daemon running on that machine, and a scheduler library runs its timers inside the application process. If the machine is shut down, the cron dies with it. The only case where the cron would run is if it was due to fire before the machine was shut down.
Certain cloud providers give you the option to keep machines alive. As I remember it, Google Cloud used to shift a serverless function that is called frequently from cold boot to hot start, which doesn't kill the machine entirely, so the machines stay alive as long as you have traffic.
From quick research, Vercel isn't the best at handling crons because of the nature of its infrastructure, and that is what you are running into. In general, crons aren't for serverless functions. You can deploy the crons using queues, for example, or another third-party service; check out this link by Vercel.

Wakanda Server solution.quitServer() sequence of operations

I have already read the thread:
Wakanda Server scripted clean shutdown
This does not address my question.
We are running Wakanda Server 11.197492.
We want an automated, orderly, ensured shut-down of Wakanda Server - no matter which version we are running.
Before we give the "shutdown" command, we will stop inbound traffic for 1 to 2 minutes, to ensure that no httpHandlers are running when we shut-down.
We have scripted a single SharedWorker process to look for the "shutdown" command, and execute solution.quitServer().
At this time no other SharedWorker processes are running, and no active threads should be executing. This will likely not always be the case.
When this is executed, is a "solution quit" guaranteed?
Is solution.quitServer() the best way to initiate an automated solution shutdown?
Will there be a better way?
Is there a way to know if any of the Solution's Projects are currently executing threads prior to shutting down?
If more than one Project calls solution.quitServer() within a few seconds of each other, will that be a problem?
solution.quitServer() is probably not the best way to shut down your server, as it will be deprecated in the next major release.
I would recommend sending a sigkill, as you point out in your question.
Wakanda Server scripted clean shutdown
Some fixes were made in v1.1.0 to close Wakanda Server safely after a kill.

Production sailsjs app with no downtime with pm2

I have a sailsjs app running in cluster mode with pm2 and two instances. One of the main reasons for wanting the two instances was so I could restart/update the app without having to bring the entire app down.
However, in the middle of a restart of one instance (pm2 restart 4), the site is all wonky (that's the technical term) if I refresh it. I'm assuming this is because grunt is doing its thing and the .tmp folder gets destroyed for both instances?
Is the only real approach with sailsjs to have two complete instances running on different ports and use something like nginx as the load balancer, or am I missing something with PM2 that would allow for staged restarts without any downtime or hiccups in the resources being available?
There are a few issues here.
You need to provide what versions of sails.js/node.js/pm2 you're running. In short, describe your environment as completely as possible.
Describing your issue more completely helps people write more concise and clear answers.
node.js cluster mode may change (as of v0.12.4) and is still considered "Unstable": https://nodejs.org/api/cluster.html#cluster_cluster
In the following thread, "mikermcneil commented on Dec 3, 2014" says to disable Grunt for production with pm2: https://github.com/balderdashy/sails/pull/1716
Let me clarify by saying I've used pm2 until just recently. In addition to Grunt, it has issues with socket connections while nginx handles it just fine. Trust me, chasing down that bug was not fun. Here's a link to the thread: https://github.com/Unitech/PM2/issues/389
As an alternate solution I chose to use nginx with parallel sails.js apps, using redis for sockets and sessions. Use forever to keep the apps running and disable grunt. Point nginx to the assets folder to serve static files quickly, bypassing sails.js and add caching to those assets.
Hope this helps!

How to properly check if a slow starting java tomcat application is running so you can restart it?

I want to implement automatic service restarts for several Tomcat applications that take a long time to start, sometimes over 10 minutes.
Mainly the test would check if the application is responding on HTTP with a valid response.
Still, that is not the problem. The problem is how to keep this uptime check from failing while the service is under maintenance, scheduled or not.
I don't want this service to be started if it was stopped manually with `service appname stop`.
I considered creating .maintenance files on stop or restart actions of the daemon and checking for them before triggering an automated restart.
So far the only problem I haven't been able to solve properly is how to detect that the app has finished starting up and then remove the .maintenance file, so that the automatic restart works properly.
Note, an init.d script is not supposed to wait, so the daemon should start a background command that solves this problem.
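The background command mentioned above could be sketched along these lines in Python. It is an illustration only: the names is_up and wait_then_clear, the intervals, and the choice of "HTTP 200" as the definition of a valid response are all assumptions, and the init.d script would have to launch it detached so that the script itself does not wait.

```python
import http.client
import time
import urllib.request
from pathlib import Path


def is_up(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts and HTTP error statuses all land here.
        return False


def wait_then_clear(marker, probe, interval=10.0, max_wait=1800.0,
                    sleep=time.sleep):
    """Poll probe() until it succeeds, then delete the marker file.

    Returns True if the app came up and the marker was removed. Returns
    False, leaving the marker in place, once more than max_wait seconds
    have been spent sleeping between probes.
    """
    waited = 0.0
    while waited <= max_wait:
        if probe():
            marker.unlink(missing_ok=True)
            return True
        sleep(interval)
        waited += interval
    return False
```

A start action might then run something like wait_then_clear(Path("/var/run/appname/.maintenance"), lambda: is_up("http://localhost:8080/health")) in the background, where both the path and the URL are hypothetical. Leaving the marker in place on timeout is a deliberate choice here, so that an app that never comes up is not restarted in a loop.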

What's a good strategy to restart downed FastCGI processes automatically?

I've got a Perl-based FastCGI app that rarely goes down. However, when it does go down, the restart is not automatic. Restarting Apache manually always does the trick, but that does not address improving the uptime of the app.
I'm thinking of using a cron job in conjunction with a script that uses WWW::Mechanize to periodically check on the app and restart it as required, as suggested by the folks at Perl Monks:
Keep FastCGI Processes Up and Running
Before I do that, I want to know if anyone knows of better ways to monitor a FastCGI process and restart it automatically when it dies, or is the method suggested above the optimal one?
Thanks.
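To make the cron idea concrete, here is a rough sketch of such a check script. It is written in Python rather than Perl with WWW::Mechanize, so it is a stand-in for the suggested script and not the Perl Monks code. The function names, the retry count, and the restart command are assumptions.

```python
import http.client
import subprocess
import urllib.request


def app_responds(url, expected=b"", timeout=10.0):
    """Return True if the URL answers 200 and the body contains `expected`."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and expected in resp.read()
    except (OSError, http.client.HTTPException):
        return False


def run_restart(command):
    """Run the restart command; raises CalledProcessError if it fails."""
    subprocess.run(command, check=True)


def check_and_restart(probe, restart, attempts=3):
    """Call restart() only if probe() fails `attempts` times in a row.

    Returns True if a restart was triggered, False if the app answered.
    """
    for _ in range(attempts):
        if probe():
            return False
    restart()
    return True
```

A cron entry could run a small script that calls check_and_restart(lambda: app_responds("http://localhost/app"), lambda: run_restart(["apachectl", "restart"])), with the URL and the restart command adjusted to the actual setup. Requiring several failed probes in a row is meant to avoid restarting Apache because of a single slow response.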
Monit is a nice monitoring daemon that can do automatic restarts and/or notification.
How about not having the process supervised by Apache, but using a mechanism similar to the way init(8) starts getty processes? I have found daemon to be quite useful.
Most web servers already offer this as a configuration option.