How to handle deploy process of capistrano if it fails on a machine in case of multiserver deployment? - capistrano

I am trying to use capistrano in which i have 4 application servers. I am not sure about the behaviour of capistrano in case deploy fail on one machine. If the network connection is disconnected for a machine then the processes on all the machines go on interrupted sleep and does not notify anything. Please let me know how to trac such issues.
Thanks

Capistrano has its own transaction management, if it fails on any of the machine then the deploy:rollback is called and all the machines always remain in same valid state.

Related

NestJS schedualers are not working in production

I have a BE service in NestJS that is deployed in Vercel.
I need several schedulers, so I have used #nestjs/schedule lib, which is super easy to use.
Locally, everything works perfectly.
For some reason, the only thing that is not working in my production environment is those schedulers. Everything else is working - endpoints, data base access..
Does anyone has an idea why? is it something with my deployment? maybe Vercel has some issue with that? maybe this schedule library requires something the Vercel doesn't have?
I am clueless..
Cold boot is the process of starting a computer from shutdown or a powerless state and setting it to normal working condition.
Which means that the code you deployed in a serveless manner, will run when the endpoint is called. The platform you are using spins up a virtual machine, to execute your code. And keeps the machine running for a certain period of time, incase you get another API hit, it's cheaper and easier on them to keep the machine running for lets say 5 minutes or 60 seconds, than to redeploy it on every call after shutting the machine when function execution ends.
So in your case, most likely what is happening is that the machine that you are setting the cron on, is killed after a period of time. Crons are system specific tasks which run in the kernel. But if the machine is shutdown, the cron dies with it. The only case where the cron would run, is if the cron was triggered at a point of time, before the machine was shut down.
Certain cloud providers give you the option to keep the machines alive. I remember google cloud used to follow the path of that if a serveless function is called frequently, it shifts from cold boot to hot start, which doesn't kill the machine entirely, and if you have traffic the machines stay alive.
From quick research, vercel isn't the best to handle crons, due to the nature of the infrastructure, and this is what you are looking for. In general, crons aren't for serveless functions. You can deploy the crons using queues for example or another third party service, check out this link by vercel.

Running a Node.js server in Azure Release Pipeline

I have a Node.js web server which, as part of a CD process, I want to deploy to a staging server using Azure Release Pipeline. The problem is, that if I just run a Powershell script:
# Run-Server.ps1
node my-server.js
The Pipeline will hold since the node process blocks the Powershell session.
What I want is to be able to launch the service, and then in the next deployment just kill the node process and run it again with the new code.
So I figured I'll use Start-Process. If I run it locally:
> Start-Process node -ArgumentList ./server.js
I can now exit the Powershell session and the server will continue running. So I thought I can implement it the same way in my Release Pipeline.
But it turns out that once the Release Pipeline ends running, the server is no longer available - the node process is gone.
Can you help me figure out why is that? Is there another way of achieving this? I suppose it's a pretty common use case so there must be best-practices out there regarding to how this should be done.
Another way to achieve this is to use a full-blown web server to host andmanage node process. I.e. on Windows you could use IIS with iisnode module. This is more reliable and gives you a few other benefits:
process management (automatic start, restart on failure, etc.)
security - you can configure the user that node process will run as
scalability on multi-core CPUs
Then the process of app deployment would be just copying files to the right directory - the web server should pick up the change automatically.
By default, A pipeline job cleans up all of the child processes it spins up when it exits. This is killing your node server.
Set Process.Clean variable to false to override the default behavior.

How do I upgrade concourse from 3.4.0 to 3.5.0 without causing jobs to abort with state error?

When I did the upgrade of concourse from 3.4.0 to 3.5.0, suddenly all running jobs changed their state from running to errored. I can see the string 'no workers' appearing at the start of their log now. Starting the jobs manually or triggered by the next changes didn't have any problem.
The upgrade of concourse itself was successful.
I was watching what bosh did at the time and I saw this change of job states took place all at once while either the web or the db VM was upgraded (I don't know which one). I am pretty sure that the worker VMs were not touched yet by bosh.
Is there a way to avoid this behavior?
We have one db, one web VM and six workers.
With only one web VM it's possible that it was out of service for long enough that all workers expired. Workers continuously heartbeat and if they miss two heartbeats (which takes 1 minute by default) they'll stall. They should come back after the deploy is finished but if scheduling happened before they heartbeats, that would cause those errors.

Wakanda Server solution.quitServer() sequence of operations

I have already read the thread:
Wakanda Server scripted clean shutdown
This does not address my question.
We are running Wakanda Server 11.197492.
We want an automated, orderly, ensured shut-down of Wakanda Server - no matter which version we are running.
Before we give the "shutdown" command, we will stop inbound traffic for 1 to 2 minutes, to ensure that no httpHandlers are running when we shut-down.
We have scripted a single SharedWorker process to look for the "shutdown" command, and execute solution.quitServer().
At this time no other ShareWorker processes are running, and no active threads should be executing. This will likely not always be the case.
When this is executed, is a "solution quit" guaranteed?
Is solution.quitServer() the best way to initiate an automated solution shutdown?
Will there be a better way?
Is there a way to know of any of the Solution's Projects are currently executing threads prior to shutting down?
If more than 1 Project issues a solution.quitServer() method, within a few seconds of eachother, will that be a problem?
solution.quitServer() is probably not the best way to shutdown your server as it will be deprecated in the next major release.
I would recommend to send a sigkill as you point out in your question.
Wakanda Server scripted clean shutdown
Some fix have been done on v1.1.0 to safely close wakanda server after a kill.

Having Capistrano skip over down hosts

My setup
I am deploying a Ruby on Rails application to 70+ hosts. These hosts are located behind consumer-grade ADSL connections which may or may not be up. Probability of being up is aroud 99% but definently not 100%.
The deploy process works perfectly fine and I have no problem specific to it.
The problem
When Capistrano encounters a down host, it stops the entire process. This is a problem because if host n°30 is down, then the 40 other hosts after it do not get the deployment.
What I would like is definently an error for the hosts that are down but I would also like Capistrano to continue deploying to all the hosts that are up.
Is there any setting or configuration that would enable me to do this ?
I ended up running a Capistrano instance for each IP then parsing the logs to see which one has failed and which one has succeeded.
A little Python script adjusted to my needs does this fine.