Implementing a no-downtime upgrade in Kubernetes - deployment

I'm trying to deploy a Piwik site onto Kubernetes, and I am struggling to figure out how to do a hot-database-upgrade without loosing stats. So the process is as follows(upgrading from Piwik 3.0 to 3.1):
Spin up redis
Spin up Piwik 3.0 in maintenance mode which will stop it from connecting to the database
Spin down old Piwik 3.0(in regular mode)
Spin up Piwik 3.1(in maintenance mode)
Run the database migrations.
Spin down old Piwik 3.0(in maintenance mode)
Spin up Piwik 3.1(in regular mode)
Spin down Piwik 3.1(in maintenance mode)
Can I script this inside Kubernetes, or is my best approach going to be running a script inside my master. My aim is to be able to do repeatedly and reliably(so failures prior to step 5 are rolled back, and failures afterwards rely on human intervention).
I have tried to do some research around the best approach but I cannot find much information regarding this sort of upgrade process. Most guides just recommend taking the service off-line for a few seconds/minutes to do the migration which in this application isn't acceptable.

If your instance of Piwik was created as a Deployment, you're in luck. The details are here. What you need to do is a RollingUpdate.
The process will look like this. The syntax is "Action: Result".
Enable maintenance mode in Piwik Deployment: k8s progressively starts new Pods of Piwik 3.0 in maintenance mode, and spin down old Piwik 3.0. This occurs Pod by Pod.
Update image tag in Piwik Deployment to 3.1: k8s progressively start new Pods of Piwik 3.1, and spin down Piwik 3.0 maintenance mode. This occurs Pod by Pod.
Run the database migrations: Do your schtuff.
Disable maintenance mode in Piwik Deployment: k8s progressively start new Pods of Piwik 3.1 regular mode, and spin down Piwik 3.1 maintenance mode. This occurs Pod by Pod.

Related

Synchronize and rollback independent deployments in kubernetes

I have k8s setup that contains 2 deployments: client and server deployed from different images. Both deployments have replica sets inside, liveness and readiness probes defined. The client communicates with the server via k8s' service.
Currently, the deployment scripts for both client and server are separated (separate yaml files applied via kustomization). Rollback works correctly for both parts independently but let's consider the following scenario:
1. deployment is starting
2. both deployment configurations are applied
3. k8s master starts replacing pods of server and client
4. server pods start correctly so new replica set has all the new pods up and running
5. client pods have an issue, so the old replica set is still running
In many cases it's not a problem, because client and server work independently, but there are situations when breaking change to the server API is released and both client and server must be updated. In that case if any of these two fails then both should be rolled back (doesn't matter which one fails - both needs to be rolled back to be in sync).
Is there a way to achieve that in k8s? I spent quite a lot of time searching for some solution but everything I found so far describes deployments/rollbacks of one thing at the time and that doesn't solve the issue above.
The problem here is something covered in Blue/Green deployments.
Here is a good reference of Blue/Green deployments with k8s.
The basic idea is, you deploy the new version (Green deployment) while keeping the previous version (Blue deployment) up and running and only allow traffic to the new version (Green deployment) when everything went fine.

Is there any way to deploy multi-container application in K8S single node for production?

What i want do is deployment of multiple container application in...
In RHEL os
RedHat Supportable product (if possible)
In single node K8S cluster (Bare metal machine)
So I found several way but I concerned about..
minikube, minishift, OKD, CodeReady Container
First, they run in VM but what I want is run in HOST.
Second, their doc said they are not for production environment.
So, Is there any PaaS for single-node cluster as production environment?
Docker, Docker-compose
Deployment target OS should maybe RHEL8. I guess it is not good idea to use docker because RedHat product is moving away from docker. Even in RHEL8 repository, there is no docker rpm for el8 yet.
My question is
Is there any PaaS for single-node cluster as production environment?
If not exist, docker-compose is best?
It was already mentioned, you should not use single node setup in production environment.
You should not do that because, if your servers drops you have service offline. There is nothing to switch to, nothing that might continue the process that was being worked on.
If you still want to setup a single node Kubernetes cluster you can do that using kubeadm. I think this would be closest to production grade as you can get.
Other then that as an alternative you can play with Installing Kubernetes with Minikube or Install a local Kubernetes with MicroK8s.
It's up to you which one you will choose but you need to remember this should not be running as a production, this should be a lab or a test environment which if works as expected will be migrated into few node production grade cluster.
As for PaaS as a single node there is Dokku.
Docker powered mini-Heroku. The smallest PaaS implementation you've ever seen.
And if you would consider using a cloud for PaaS, you can choose from AWS Cloud9, Azure App Service or Google App Engine.
Single node cluster is not recommended for production applications. You need scalability, high availability, fault tolerance for production apps. You must have more than one node to have these features.

Service Fabric Cluster Upgrade Failing

I've an on-premise, secure, development cluster that I wish to upgrade. The current version is 5.7.198.9494. I've followed the steps listed here.
At the time of writing, the latest version of SF is 6.2.283.9494. However, running Get-ServiceFabricRuntimeUpgradeVersion -BaseVersion 5.7.198.9494 shows that I first must update to 6.0.232.9494, before upgrade to 6.2.283.9494.
I run the following in Powershell, and the upgrade does start:
Copy-ServiceFabricClusterPackage -Code -CodePackagePath .\MicrosoftAzureServiceFabric.6.0.232.9494.cab -ImageStoreConnectionString "fabric:ImageStore"
Register-ServiceFabricClusterPackage -Code -CodePackagePath MicrosoftAzureServiceFabric.6.0.232.9494.cab
Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 6.0.232.9494 -Monitored -FailureAction Rollback
However, after a few minutes the following happens:
Powershell IDE crashes
The Service Fabric Cluster becomes unreachable
Service Fabric Local Cluster Manager disappears from the task bar
Event Viewer will log the events, see below.
Quite some time later, the vm will reboot. Service Fabric Local Cluster Manager will only give options to Setup or Restart the local cluster.
Event viewer has logs in the under Applications and Services Logs -> Microsoft-Service Fabric -> Operational. Most are information about opening, closing, and aborting one of the upgrade domains. There are some warnings about a vm failing to open an upgrade domain stating error: Lease Failed.
This behavior happens consistently, and I've not yet been able to update the cluster. My guess is that we are not able to upgrade a development cluster, but I've not found an article that states that.
Am I doing something incorrectly here, or is it impossible to upgrade a development cluster?
I will assume you have a development cluster with a single node or multiple nodes in a single VM.
As described in the first section of the documentation from the same link your provided:
service-fabric-cluster-upgrade-windows-server
You can upgrade your cluster to the new version only if you're using a
production-style node configuration, where each Service Fabric node is
allocated on a separate physical or virtual machine. If you have a
development cluster, where more than one Service Fabric node is on a
single physical or virtual machine, you must re-create the cluster
with the new version.

How do I enable audit logging for Google Container Engine?

Running a GKE cluster with 1.8.1 - when I look at /logs/kube-apiserver-audit.log, it's completely empty. I've taken actions like creating deployments and deleting pods that have been visible in audit logs for clusters I've provisioned outside of GKE.
Is there a better way to view or access these kinds of events with GKE?
That would be because Container Engine 1.8 release does not enable the audit logging feature yet. From Release Notes:
KNOWN ISSUE: Audit Logging, a beta feature in Kubernetes 1.8, is currently not enabled on Container Engine.
It will probably be enabled at some point in the future, I’d keep an eye on the Release Notes.

How to restart unresponsive kubernetes master in GKE

The kubernetes master in one of my GKE clusters became unresponsive last night following the infrastructure issue in us-central1-a.
Whenever I run "kubectl get pods" in the default namespace I get the following error message:
Error from server: an error on the server has prevented the request from succeeding
If I run "kubectl get pods --namespace=kube-system", I only see the kube-proxy and the fluentd-logging daemon.
I have trying scaling the cluster down to 0 and then scaling it back up. I have also tried downgrading and upgrading the cluster but that seems to apply only to the nodes (not the master). Is there any GKE/K8S API command to issue a restart to the kubernetes master?
There is not a command that will allow you to restart the Kubernetes master in GKE (since the master is considered a part of the managed service). There is automated infrastructure (and then an oncall engineer from Google) that is responsible for restarting the master if it is unhealthy.
In this particular cases, restarting the master had no effect on restoring it to normal behavior because Google Compute Engine Incident #16011 caused an outage on 2016-06-28 for GKE masters running in us-central1-a (even though that isn't indicated on the Google Cloud Status Dashboard). During the incident, many masters were unavailable.
If you had tried to create a GCE cluster using kube-up.sh during that time, you would have similarly seen that it would be unable to create a functional master VM due to the SSD Persistent disk latency issues.
I'm trying to have at least one version to upgrade ready, if you trying to upgrade the master, it will restart and work within few minutes. Otherwise you should wait around 3 days while Google team will reboot it. On e-mail/phone, then won't help you. And unless you have payed support (transition to which taking few days), they won't give a bird.