How exactly does the Heroku deployment process work?

When I deploy a new version of my service to Heroku, what happens exactly?
Suppose I have N web dynos online right now, M of them currently servicing requests.
Do all of them shut down before the new version starts coming online? What happens to any pending requests currently being serviced?
Is there downtime? (let's assume I just have a stateless service without any migrations)
Is there a hook for doing custom migrations (e.g. migrate database tables)?
Can I bring up N servers running the new version, have them start servicing requests, and bring the previous N servers down only once they're not servicing any requests?
Does the answer depend on the stack/language? (Aspen/Bamboo/Cedar, Ruby/Node.js/Java/...)
I didn't find any official documentation about this, just contradictory posts (some saying hot migrations are not possible, while others say there is no downtime). Are there any official details about the deployment process and the questions above?

Here is what happens during a Heroku deploy (current as of 10/20/2011*)[1]:
1. Heroku receives your git push
2. A new release is compiled from the latest version of your app and stored
3. [These happen roughly simultaneously]
   - The dyno grid is signalled to terminate[2] all running processes for your app
   - The dyno grid is signalled to start new processes for your app
   - The dyno grid is signalled to unidle all idle processes of your app
4. The HTTP router is signalled to begin routing HTTP traffic to the web dynos running the new release
The general takeaway is that to minimize any possible downtime you should minimize the boot time of your app.
By following careful migration practices, it is possible to push new code and then migrate while the app is running.
Here's an example for Rails:
To minimize dropped connections during a restart, use a web server that responds appropriately to SIGTERM by beginning a graceful shutdown (finish existing connections, don't take new ones). Newer versions of thin will handle SIGTERM correctly.
[1] This subject is the topic of much discussion internally and may change in the future.
[2] SIGTERM followed 10s later by SIGKILL if still running
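
The graceful shutdown described above (finish existing work, refuse new work once SIGTERM arrives) can be sketched in Python. This is a minimal illustration, not how thin or Heroku implement it; the `GracefulWorker` class and its method names are hypothetical.

```python
import signal
import threading


class GracefulWorker:
    """Accepts jobs until SIGTERM arrives, then only finishes what it already accepted."""

    def __init__(self):
        self.shutting_down = threading.Event()
        self.in_flight = []   # jobs accepted but not yet finished
        self.completed = []   # results of finished jobs

    def handle_sigterm(self, signum, frame):
        # Only flip a flag here; the draining itself happens outside the handler.
        self.shutting_down.set()

    def accept(self, job):
        """Return True if the job was accepted, False once shutdown has begun."""
        if self.shutting_down.is_set():
            return False
        self.in_flight.append(job)
        return True

    def drain(self):
        """Finish every accepted job; this has to fit inside the SIGTERM-to-SIGKILL window."""
        while self.in_flight:
            job = self.in_flight.pop(0)
            self.completed.append(job())


def install(worker):
    """Register the handler; call once from the main thread at startup."""
    signal.signal(signal.SIGTERM, worker.handle_sigterm)
```

A process would call `install(worker)` at boot and `worker.drain()` once `worker.shutting_down` is set. Since the kill follows roughly 10 seconds after the SIGTERM, anything still in flight after that is lost.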

I can answer the "Is there a hook for doing custom migrations (e.g. migrate database tables)?" part of this question. I've handled running migrations by writing a shell script that does a "heroku rake db:migrate" immediately after I issue "git push heroku". I don't know if there is a more "hook"-y way to do that.
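
A push-then-migrate wrapper like the one described could look roughly like this in Python. It is a sketch only: the two commands are the ones quoted above, while the `deploy` function and the injectable `run` parameter are illustrative choices, not part of any Heroku tooling.

```python
import subprocess

# The two commands named in the answer, in the order they should run.
DEPLOY_STEPS = [
    ["git", "push", "heroku"],
    ["heroku", "rake", "db:migrate"],
]


def deploy(run=subprocess.call, steps=DEPLOY_STEPS):
    """Run each step in order; stop at the first non-zero exit code and return it."""
    for command in steps:
        code = run(command)
        if code != 0:
            # Do not migrate if the push failed.
            return code
    return 0
```

Calling `deploy()` runs the real commands. Passing a different `run` callable makes it possible to dry-run the sequence without touching git or Heroku.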


NestJS schedulers are not working in production

I have a BE service in NestJS that is deployed in Vercel.
I need several schedulers, so I have used the @nestjs/schedule library, which is super easy to use.
Locally, everything works perfectly.
For some reason, the only thing that is not working in my production environment is those schedulers. Everything else is working: endpoints, database access...
Does anyone have an idea why? Is it something with my deployment? Maybe Vercel has some issue with that? Maybe this schedule library requires something that Vercel doesn't have?
I am clueless..
Cold boot is the process of starting a computer from shutdown or a powerless state and setting it to normal working condition.
This means that code deployed in a serverless manner runs only when an endpoint is called. The platform spins up a virtual machine to execute your code and keeps it running for a certain period of time in case you get another API hit. It is cheaper and easier for them to keep the machine running for, let's say, 60 seconds or 5 minutes than to redeploy it on every call after shutting it down when the function execution ends.
So in your case, most likely the machine you are setting the cron on is killed after a period of time. The schedulers are timers that live inside the running process, so if the machine is shut down, the cron dies with it. The only case where the cron would run is if it was due to trigger before the machine was shut down.
Certain cloud providers give you the option to keep the machines alive. As I remember, on Google Cloud a serverless function that is called frequently shifts from cold boot to hot start, which doesn't kill the machine entirely, so the machines stay alive as long as you have traffic.
From quick research, Vercel isn't the best at handling crons, due to the nature of its infrastructure, and crons are what you are looking for. In general, crons aren't for serverless functions. You can deploy the crons using queues, for example, or another third-party service; check out this link by Vercel.
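
The effect described above can be illustrated with a small model: a timer only fires while an instance is alive, and it restarts from zero on every cold boot. This is a simplified sketch, not Vercel's actual lifecycle; the function name and the window lengths in the example are made up for illustration.

```python
def ticks_that_fire(interval_s, alive_windows):
    """Return the tick times (in seconds) that fall inside a window where the process is up.

    interval_s: how often the in-process timer is meant to fire.
    alive_windows: (start, end) second ranges during which an instance exists;
                   the timer restarts at each window's start, as it would on a cold boot.
    """
    fired = []
    for start, end in alive_windows:
        t = start + interval_s          # first tick is one full interval after boot
        while t < end:
            fired.append(t)
            t += interval_s
    return fired
```

With an hourly job and instances that live for five minutes after a request, `ticks_that_fire(3600, [(1000, 1300), (50000, 50300)])` returns an empty list: the instance never lives long enough to reach a tick. A job that runs every minute on the same kind of instance, `ticks_that_fire(60, [(0, 300)])`, fires at 60, 120, 180 and 240 seconds and then stops when the instance goes away.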

K8s graceful upgrade of service with long-running connections

tl;dr: I have a server that handles WebSocket connections. The nature of the workload is that it is necessarily stateful (i.e., each connection has long-running state). Each connection can last ~20m-4h. Currently, I only deploy new revisions of this service at off hours to avoid interrupting users too much.
I'd like to move to a new model where deploys happen whenever, and the services gracefully drain connections over the course of ~30 minutes (typically the frontend can find a "good" time to make that switchover within 30 minutes, and if not, we just forcibly disconnect them). I can do that pretty easily with K8s by setting terminationGracePeriodSeconds.
However, what's less clear is how to do rollouts such that new connections only go to the most recent deployment. Suppose I have five replicas running. Normal deploys have an undesirable mode where a client is on R1 (replica 1) and then K8s deploys R1' (upgraded version) and terminates R1; frontend then reconnects and gets routed to R2; R2 terminates, frontend reconnects, gets routed to R3.
Is there any easy way to ensure that after the upgrade starts, new clients get routed only to the upgraded versions? I'm already running Istio (though not using very many of its features), so I could imagine doing something complicated with some custom deployment infrastructure (currently just using Helm) that spins up a new deployment, cuts over new connections to the new deployment, and gracefully drains the old deployment... but I'd rather keep it simple (just Helm running in CI) if possible.
Any thoughts on this?
This is already how things work with normal Services. Once a pod is terminating, it has already been removed from the Endpoints. You'll probably need to raise maxSurge (the "max burst") in the rolling update settings of the Deployment to 100%, so that it will spawn all new pods at once and then start the shutdown process on all the old ones.
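
The settings mentioned in this thread map onto a handful of Deployment fields. The sketch below builds them as a Python dictionary and prints them as JSON. `maxSurge`, `maxUnavailable` and `terminationGracePeriodSeconds` are standard Kubernetes fields, but setting `maxUnavailable` to 0 and the 30-minute default are assumptions made here for illustration, and `rollout_patch` is a hypothetical helper name.

```python
import json


def rollout_patch(drain_seconds=30 * 60):
    """Build Deployment settings: surge a full new set of pods, then drain the old ones."""
    return {
        "spec": {
            "strategy": {
                "type": "RollingUpdate",
                "rollingUpdate": {
                    "maxSurge": "100%",    # start all replacement pods at once
                    "maxUnavailable": 0,   # assumption: never drop below the desired count
                },
            },
            "template": {
                "spec": {
                    # Time a terminating pod gets between SIGTERM and SIGKILL.
                    "terminationGracePeriodSeconds": drain_seconds,
                }
            },
        }
    }


print(json.dumps(rollout_patch(), indent=2))
```

The same fields can be written straight into the Deployment template of a Helm chart, which keeps the rollout a plain Helm run in CI.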

How do I upgrade concourse from 3.4.0 to 3.5.0 without causing jobs to abort with state error?

When I did the upgrade of concourse from 3.4.0 to 3.5.0, suddenly all running jobs changed their state from running to errored. I can see the string 'no workers' appearing at the start of their log now. Starting the jobs manually or triggered by the next changes didn't have any problem.
The upgrade of concourse itself was successful.
I was watching what bosh did at the time and I saw this change of job states took place all at once while either the web or the db VM was upgraded (I don't know which one). I am pretty sure that the worker VMs were not touched yet by bosh.
Is there a way to avoid this behavior?
We have one db, one web VM and six workers.
With only one web VM it's possible that it was out of service for long enough that all workers expired. Workers continuously heartbeat, and if they miss two heartbeats (which takes 1 minute by default) they'll stall. They should come back after the deploy is finished, but if scheduling happened before they heartbeat again, that would cause those errors.
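
The arithmetic in this answer can be written down as a tiny model. The 30-second interval is derived from the statement that two missed heartbeats take one minute. The function below is a simplification for reasoning about outage length, not Concourse's actual logic, and its name and parameters are hypothetical.

```python
def workers_stall(outage_s, heartbeat_interval_s=30.0, missed_before_stall=2):
    """True if the web node was unreachable long enough for workers to be marked stalled."""
    return outage_s >= heartbeat_interval_s * missed_before_stall
```

Under this model a 45-second restart of the single web VM is survivable, while a 90-second one stalls every worker at once, which would match all running jobs erroring at the same moment.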

How to do zero-downtime rolling updates for app with (long-lived) sticky sessions using containers

I am trying to figure out how to provide zero-downtime rolling updates of a webapp that has long-lived interactive user sessions that should be sticky, based on a JSESSIONID cookie.
For this (and other) reasons I'm looking at container technology, like say Docker Swarm or Kubernetes.
I am having difficulties finding a good answer on how to:
- Make sure new sessions go to the latest version of the app
- While existing sessions remain being served by whichever version of the app they were initiated on
- Properly clean up the old version once all sessions to/on it have ended
Some more info:
- Requests are linked to a session based on a JSESSIONID cookie
- Sessions could potentially live for days, but I am able to terminate them from within the app within, say, a 24hr timeframe (sending the user a notification to "logout/login again as there is a new version or that they are otherwise automatically logged out at 12pm", for example)
- Of course, for each version of the app there are multiple containers already running in load-balanced fashion
- I don't mind the total number of containers growing, for example if all of the old version's containers are still up and running because each of them still hosts 1 session, while the majority of the users are already on the new version of the app
So, my idea of the required flow is something along these lines:
- Put up the new version of the app
- let all new connections (the ones w/o the JSESSIONID cookie set) go to the new version of the app
- once a container of the old version of the app is not serving sessions anymore, remove the container/....
As I mentioned, I'm looking into Kubernetes and Docker Swarm, but I'm open to other suggestions. The end solution should be able to run on cloud platforms (currently using Azure, but Google or Amazon clouds might be used in the future).
Any pointers/tips/suggestions or ideas appreciated
In answer to @Tarun's question and for general clarification: yes, I want no downtime. The way I envision this is that the containers hosting the old version will keep running to serve all existing sessions. Once all sessions on the old servers have ended, the old server is removed.
The new containers are only going to serve new sessions for users that startup the app after the rollout of the new version has started.
So, to give an example:
- I launch a new session A of the old version of the app at 9am
- At 10am a new version is rolled out.
- I continue to use session A, which remains hosted on a container running the old version.
- at noon I go for lunch and log out
- as I was the last session connected to the container running the old version, the container will now be destroyed
- at 1pm I come back, log back in and I get the new version of the app
Makes sense?
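
The flow in this question can be sketched as routing logic, independent of any particular orchestrator. This is a toy in-memory model under the assumption that a router can see the session id from the cookie; the `VersionRouter` class and its methods are hypothetical, and it tracks versions rather than individual containers.

```python
import uuid


class VersionRouter:
    """Pins each session to the version that created it; new sessions go to the latest."""

    def __init__(self, initial_version):
        self.latest = initial_version
        self.sessions = {}                  # session id -> version serving it
        self.running = {initial_version}    # versions that still have containers

    def roll_out(self, version):
        """Start a new version and make it the target for all new sessions."""
        self.running.add(version)
        self.latest = version
        self._reap()

    def route(self, session_id=None):
        """Return (session_id, version); a missing or unknown id starts a new session."""
        if session_id not in self.sessions:
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = self.latest
        return session_id, self.sessions[session_id]

    def end_session(self, session_id):
        """Forget a session (logout or forced expiry) and clean up idle old versions."""
        self.sessions.pop(session_id, None)
        self._reap()

    def _reap(self):
        """Stop every non-latest version that no longer serves any session."""
        in_use = set(self.sessions.values()) | {self.latest}
        self.running &= in_use
```

Replaying the 9am to 1pm example: a session created on `"v1"` keeps routing to `"v1"` after `roll_out("v2")`, both versions stay in `running`, and the `end_session` call for that last session removes `"v1"`, so the next `route()` lands on `"v2"`.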
Your workload might not be a good fit for Kubernetes/containers with its current architecture. The best way I can come up with to solve this is to move the state to a PV/PVC and migrate the PV to the new containers, so that the new container can have the state from the old session. How to route the calls for that session to the proper node efficiently, I'm not sure.
Ideally you would separate your data/caching layer from your service into something like redis, and then it wouldn't matter which of the nodes services the request.

Managing resources (database, elasticsearch, redis, etc) for tests using Docker and Jenkins

We need to use Jenkins to test some web apps that each need:
a database (postgres in our case)
a search service (ElasticSearch in our case, but only sometimes)
a cache server, such as redis
So far, we've just had these services running on the Jenkins master, but this causes problems when we want to upgrade Postgres, ES or Redis versions. Not all apps can move in lock step, and we want to run the tests on new versions before committing to move an app in production.
What we'd like to do is have these services provided on a per-job-run basis, each one running in its own container.
What's the best way to orchestrate these containers?
How do you start up these ancillary containers and tear them down, regardless of whether the job succeeds or not?
How do you prevent port collisions between, say, the database in a run of a job for one web app and the database in the job for another web app?
Check docker-compose and write a docker-compose file for your tests.
The latest network features of Docker (private network) will help you to isolate builds running in parallel.
However, start learning docker-compose as if you only had one build running at a time. When you are confident with this, look further into the advanced Docker documentation around networking.
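
One way to get per-job services that are always torn down is to wrap the test run in a small script. The sketch below assumes a docker-compose file already exists in the job's workspace. The `-p` (project name), `up -d` and `down -v` options are standard docker-compose usage, while the function names and the injectable `run` parameter are illustrative.

```python
import re
import subprocess


def project_name(build_tag):
    """Turn a build identifier into a Compose project name (lowercase, digits, - and _)."""
    return re.sub(r"[^a-z0-9_-]", "-", build_tag.lower())


def run_with_services(test_command, build_tag, run=subprocess.call):
    """Start the compose services, run the tests, and always tear the services down."""
    compose = ["docker-compose", "-p", project_name(build_tag)]
    try:
        code = run(compose + ["up", "-d"])
        if code == 0:
            code = run(test_command)
        return code
    finally:
        # Runs on success, failure, or exception; -v also removes the volumes.
        run(compose + ["down", "-v"])
```

In a Jenkins job the `build_tag` argument could come from the `BUILD_TAG` environment variable, which gives each run its own project name and therefore its own containers and network. If the tests themselves run as a compose service, no host ports need to be published, which sidesteps collisions between parallel jobs.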