InitContainer or Helm Hook Database Migrations Rollback - kubernetes

I have read a lot about best practices for applying database migrations for Kubernetes apps. There are three common solutions:
Separate CI/CD stage for DB migrations (usually we run it before deploying the new app version, but in some cases we can deploy first; it doesn't matter for this question). Just decouple the app and the migrations, and run them in different stages, one by one.
InitContainer: we can run DB migrations before the main app container starts. It is a good solution, but we have to be careful to tune it so that migrations run only once for all pods and replicas, do not run again on pod restarts, and to check the probes (the kubelet can kill the container on timeout while the migration is still running).
Separate Kubernetes Job or Helm hook. This is close to an initContainer, except it cannot share data with the app container (which is not really necessary for DB migrations anyway, so that's ok). We must be careful with timeouts here too - Helm can kill the Job before the migration is completed.
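For illustration, here is a minimal sketch of the third option as a Helm pre-upgrade hook; the image, command and resource names are placeholders, not taken from any real chart:

```yaml
# templates/db-migrate-job.yaml (hypothetical chart file)
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade             # run before the release is installed/upgraded
    "helm.sh/hook-delete-policy": before-hook-creation  # clean up the previous hook Job first
spec:
  backoffLimit: 0              # do not blindly retry a half-applied migration
  activeDeadlineSeconds: 600   # give the migration enough time; align with `helm upgrade --timeout`
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myrepo/app-migrations:1.2.3   # placeholder image containing the changelog
          command: ["liquibase", "update"]     # or whatever migration tool you use
```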
But the question is: how do we apply rollbacks with these solutions?
Let me try to lay out my ideas:
Separate stage in CI/CD: we can save the previous migration name and then roll back to it in another stage. Pipeline: Migrations -> Deploy -> Test -> Rollback DB and Redeploy (see the sketch after this list).
But for the InitContainer and Helm hook options I have no idea how to implement a rollback! Do we need additional containers for that? Does helm rollback affect the DB too (I don't think so)? What are the best practices here?
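As a rough illustration of that CI/CD idea, a hypothetical GitLab-CI-style pipeline could look like this; the stage names, image, tag and test script are all made up for the sketch:

```yaml
stages: [migrate, deploy, test, rollback]

migrate:
  stage: migrate
  image: myrepo/app-migrations:1.2.3           # placeholder migration image
  script:
    - liquibase tag "pre-${CI_PIPELINE_ID}"    # remember the state we could roll back to
    - liquibase update

deploy:
  stage: deploy
  script:
    - helm upgrade --install example-app ./chart --atomic

smoke-test:
  stage: test
  script:
    - ./run-smoke-tests.sh                     # placeholder test script

rollback-db-and-redeploy:
  stage: rollback
  when: on_failure                             # only runs if an earlier stage failed
  image: myrepo/app-migrations:1.2.3
  script:
    - liquibase rollback "pre-${CI_PIPELINE_ID}"
    - helm rollback example-app
```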
I would be very glad to hear any suggestions!

I started investigating this issue too, and it looks like all manuals, tutorials and practices are meant to be forward-only. Should you ever need to roll back a database migration, you run into the limitations of Helm rollback. There is an open issue in Helm (https://github.com/helm/helm/issues/5825) that addresses this very problem, but it looks like there is no solution so far.
So Helm rollback is probably not a suitable mechanism for databases, as you end up creating a batch/v1 Job with a container image that knows nothing about the changes you need to roll back. I use Liquibase, and it requires that the changesets you need to roll back are present in the changelog.
I think for now this problem can be solved with a forward-only approach:
Build the container image that introduces a new changeset into your database.
In the Helm release values, specify the container image that should be run to perform the migration, and a command (like liquibase update).
If the migration fails and you see that a rollback is required, you do not invoke helm rollback. Instead you deploy a new release, and in its values you specify the container image that should perform the database migration rollback, and the command (like liquibase rollback <tag>). I am going this way myself now, and it looks like the best solution I could come up with.
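To make that concrete, the release values could parameterize the migration step roughly like this; the key names, image and tag are only an assumption about how such a chart might be structured:

```yaml
# values-forward.yaml (hypothetical) - normal release that applies the new changeset
migration:
  image: myrepo/app-migrations:1.5.0     # image containing the changelog up to 1.5.0
  command: ["liquibase", "update"]
---
# values-rollback.yaml (hypothetical) - a *new forward release* whose migration step
# undoes the changeset, instead of calling `helm rollback`
migration:
  image: myrepo/app-migrations:1.5.0     # must still contain the changesets being rolled back
  command: ["liquibase", "rollback", "v1.4.0"]   # placeholder tag
```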
Hope that helps.

Related

Liquibase with Kubernetes, how to prevent DB being left in a locked state

Firstly, yes, I have read this https://www.liquibase.com/blog/using-liquibase-in-kubernetes
and I also read many SO threads where people answer "I solved the issue by using an init-container".
I understand that for most people this might have fixed the issue, because the reason their pods were going down was that the migration was taking too long and the k8s probes killed the pods.
But what about when a new deployment is applied and the previous deployment was stuck in a failed state (k8s trying again and again to launch the pods without success)?
When this new deployment is applied, it will simply wipe/replace all the failing pods, and if this happens while Liquibase holds the lock, the pods (and their init containers) are killed and the DB is left in a locked state requiring manual intervention.
Unless I missed something about k8s init-containers, using them doesn't really solve the issue described above, right?
Is that the only solution currently available? What other solution could be used to avoid manual intervention?
My first thought was to add some kind of custom code, either directly in the app before the Liquibase migration happens, or in an init-container that runs before the Liquibase init-container, to automatically unlock the DB if the lock is, let's say, 5 minutes old.
Would that be acceptable, or will it cause other issues I'm not thinking about?
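For what it's worth, a sketch of that idea as an extra initContainer might look like the following. It assumes PostgreSQL, a db-credentials Secret and Liquibase's standard DATABASECHANGELOGLOCK table; all names are placeholders. (Liquibase also has a release-locks command, but that clears the lock unconditionally rather than only when it is stale.)

```yaml
initContainers:
  - name: release-stale-liquibase-lock
    image: postgres:16                   # used only for the psql client
    env:
      - name: PGPASSWORD
        valueFrom:
          secretKeyRef: { name: db-credentials, key: password }
    command:
      - sh
      - -c
      - >
        psql -h db-host -U app -d appdb -c
        "UPDATE DATABASECHANGELOGLOCK
         SET LOCKED = FALSE, LOCKGRANTED = NULL, LOCKEDBY = NULL
         WHERE LOCKED = TRUE AND LOCKGRANTED < NOW() - INTERVAL '5 minutes';"
  - name: liquibase-migrations
    image: myrepo/app-migrations:1.2.3   # placeholder image containing the changelog
    command: ["liquibase", "update"]
```

The obvious risk is releasing a lock that belongs to a migration which is slow but still alive, so the threshold has to be chosen with the longest legitimate migration in mind.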

Whole Application level rolling update

My Kubernetes application is made of several flavors of nodes: a couple of "scheduler" nodes which send tasks to quite a few more "worker" nodes. For this app to work correctly, all the nodes must run exactly the same code version.
The deployment is performed using a standard ReplicaSet, and when my CI/CD kicks in it just does a simple rolling update. This causes a problem, though, since during the rolling update nodes of different code versions co-exist for a few seconds, so a few tasks during this time get wrong results.
Ideally what I would want is that deploying a new version would create a completely new application that only communicates with itself and has time to warm its cache, then on a flick of a switch this new app would become active and start to get new client requests. The old app would remain active for a few more seconds and then shut down.
I’m using Istio sidecar for mesh communication.
Is there a standard way to do this? How is such a requirement usually handled?
I also had such a situation. Kubernetes alone cannot satisfy your requirement, and I was also not able to find any tool that allows coordinating multiple deployments together (although Flagger looks promising).
So the only way I found was via CI/CD: Jenkins in my case. I don't have the code, but the idea is the following:
Deploy all application deployments using a single Helm chart. Every Helm release name and the corresponding Kubernetes labels must be based on some sequential number, e.g. the Jenkins $BUILD_NUMBER. The Helm release can be named something like example-app-${BUILD_NUMBER}, and all deployments must have the label version: $BUILD_NUMBER. The important part here is that your Services should not be part of your Helm chart, because they will be handled by Jenkins.
Start your build by detecting the current version of the app (using a bash script, or you can store it in a ConfigMap).
Run helm install example-app-${BUILD_NUMBER} with the --atomic flag set. The atomic flag will make sure that the release is properly removed on failure. Don't delete the previous version of the app yet.
Wait for Helm to complete, and in case of success run kubectl set selector service/example-app version=$BUILD_NUMBER. That will instantly switch the Kubernetes Service from one version to another (a sketch of such a Service is shown below). If you have multiple Services you can issue multiple set selector commands (each command takes effect immediately).
Delete the previous Helm release and optionally update the ConfigMap with the new app version.
Depending on your app, you may want to run tests on the non-user-facing Services as part of step 4 (after the Helm release succeeds).
Another good idea is to have preStop hooks on your worker pods so that they can finish their jobs before being deleted.
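As a rough sketch, the Service that Jenkins manages outside the chart could look like this; the names and labels only mirror the hypothetical example-app-$BUILD_NUMBER convention above. Note that kubectl set selector replaces the whole selector, so every key has to be repeated in the command:

```yaml
# Service created once, outside the Helm chart, and re-pointed by Jenkins
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  selector:
    app: example-app
    version: "41"    # currently active build; switched with e.g.:
                     # kubectl set selector service/example-app app=example-app,version=42
  ports:
    - port: 80
      targetPort: 8080
```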
You should consider the Blue/Green deployment strategy.

Best practice for running database schema migrations

Build servers are generally detached from the VPC running the instance. Whether it's Cloud Build on GCP or one of the many CI tools out there (CircleCI, Codeship, etc.), this makes running DB schema updates particularly challenging.
So, it makes me wonder.... When's the best place to run database schema migrations?
From my perspective, there are four opportunities to automatically run schema migrations or seeds within a CD pipeline:
Within the build phase
On instance startup
Via a warm-up script (synchronously or asynchronously)
Via an endpoint, either automatically or manually called post deployment
The primary issue with option 1 is security. With Google Cloud SQL and Google Cloud Build, it has been possible for me to run schema migrations/seeds via a build step and a SQL proxy, with much struggle. To be honest, it was a real pain to set up... but it works.
My latest project is using MongoDB, for which I've wired in migrate-mongo in case I ever need to move some data around or seed some data. Unfortunately there is no equivalent of the SQL proxy to securely connect MongoDB (Atlas) to Cloud Build (or any other CI tool), as the build doesn't run in the instance's VPC. Thus, it's a dead end in my eyes.
I'm therefore warming (no pun intended) to the warm-up script concept.
With App Engine, the warm-up script is called before traffic is served, and it runs on a host which already has access via the VPC. The warm-up script is meant to be used for opening up database connections to speed up connectivity, and assuming there are no outstanding migrations, the migration check would amount to exactly that - a very lightweight select statement.
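For reference, enabling warm-up requests on App Engine standard is just a configuration flag plus a handler; the check-and-migrate step is the idea I'm proposing here, not something App Engine provides out of the box:

```yaml
# app.yaml - enable warmup requests (App Engine standard environment)
runtime: python312          # example runtime
inbound_services:
  - warmup                  # App Engine sends GET /_ah/warmup before an instance receives traffic
automatic_scaling:
  min_instances: 1
# The application's /_ah/warmup handler would open DB connections and, under this
# proposal, also run any outstanding schema migrations before serving traffic.
```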
Can anyone think of any issues with this approach?
Option 4 is also suitable (it's essentially the same thing). There may be a bit more protection required on these endpoints though - especially if a "down" migration script exists(!)
It's hard to answer you because it's an opinion-based question!
Here are my thoughts on your propositions:
Option 1 (build phase): this is the best solution for me. Of course you have to take care to only add fields and not to delete or rename existing schema fields. That way, you can update your schema during the build phase and then deploy. The new deployment will use the new schema, and the obsolete fields will no longer be used. On the next schema update, you will be able to delete these obsolete fields and clean up your schema.
Option 2 (instance startup): this will hurt your cold-start performance. It's not a suitable solution.
Option 3 (warm-up script): same remark as before, and in addition you become tied to App Engine's infrastructure and way of working.
Option 4 (endpoint): no real advantage compared to option 1.
About security: Cloud Build will be able to work with worker pools soon. It isn't available yet, but I expect an alpha release of it in the next month.

Kubernetes deployment with Recreate strategy and maxSurge?

Summary
Can I give a deployment the rollout strategy Recreate and also set a fixed maxSurge for the deployment?
More details
I am developing an application that runs in Kubernetes. The backend will have multiple replicas, and runs EF Core with database migrations. I understand there are several ways to solve this; here's my idea at the moment.
On a new release, I would like all replicas to be stopped. Then a single replica at a time should start, and for each replica there should be an init container that runs the migrations (if needed).
This seems to be possible, using the following two configuration values:
.spec.strategy.type==Recreate and
.spec.strategy.rollingUpdate.maxSurge==1
Is it possible to use these two together? If not, is there any way to control how many replicas a controller will start at once with the Recreate strategy?
"No! You should do this in a completely different way!"
Feel free to suggest other methods as well, if you think I am coming at this from the completely wrong angle.
A StatefulSet might help you in this case (a minimal sketch follows the list below).
StatefulSets are valuable for applications that require one or more of the following:
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
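A StatefulSet creates its pods in order and rolls them out one at a time, which gives the "one replica at a time" behaviour being asked about. Here is a minimal sketch of the backend described in the question; all names and images are placeholders:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: backend
spec:
  serviceName: backend                 # headless Service required by StatefulSets
  replicas: 3
  podManagementPolicy: OrderedReady    # default: pods are started and updated one by one, in order
  selector:
    matchLabels: { app: backend }
  template:
    metadata:
      labels: { app: backend }
    spec:
      initContainers:
        - name: ef-migrations
          image: myrepo/backend-migrations:1.0.0   # placeholder; would run the EF Core migrations
      containers:
        - name: backend
          image: myrepo/backend:1.0.0              # placeholder application image
```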

Flyway - Springboot - Kubernetes

Do I still need to run the Flyway image as a Kubernetes Job to run database migrations for a Spring Boot app deployed in Kubernetes (EKS & OpenShift)?
I see some references to Flyway being configured that way. However, with Spring Boot, Flyway migrations are run as part of the startup cycle.
Am I missing something?
You have multiple options: you can either use Flyway as part of your application, or you can use a separate container, for example as an initContainer in Kubernetes.
If you feel more comfortable having Flyway as part of your application and running it similarly to your local development environment, just go ahead.
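If you do want to take the migration out of the Spring Boot startup path, a hypothetical initContainer variant could look roughly like this (images, hosts and names are placeholders), with Flyway disabled in the app so the migration does not run a second time at startup:

```yaml
# Deployment pod template (sketch): Flyway as an initContainer, Spring Boot app
# with its built-in Flyway run disabled.
spec:
  initContainers:
    - name: flyway-migrate
      image: flyway/flyway:10                         # official Flyway image (tag is an example)
      args: ["migrate"]
      env:
        - name: FLYWAY_URL
          value: jdbc:postgresql://db-host:5432/appdb # placeholder JDBC URL
        - name: FLYWAY_USER
          value: app
        - name: FLYWAY_PASSWORD
          valueFrom:
            secretKeyRef: { name: db-credentials, key: password }
      # The SQL migrations would need to be baked into the image or mounted via a volume.
  containers:
    - name: app
      image: myrepo/springboot-app:1.0.0              # placeholder application image
      env:
        - name: SPRING_FLYWAY_ENABLED
          value: "false"                              # let the initContainer own the migration
```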