Why are some Kubernetes resources immutable after creation?

According to the Kubernetes validation source code, at least these resources are immutable after creation:
Persistent Volumes
Storage Classes
Why is that?

This is a core concept in Kubernetes. A few specs are immutable because changing them would affect the basic structure of the resources connected to them.
For example, changing a Persistent Volume may impact the pods that are using that PV. Suppose you have a MySQL pod running on a PV and you change the PV in a way that wipes all the data.
In Kubernetes 1.18, the option to mark individual Secrets and ConfigMaps as immutable was added as an Alpha feature. It is opt-in per object (via the immutable field) and does not make these resources immutable by default. Check the GitHub Issue here.
What is it good for?
The most popular and the most convenient way of consuming Secrets and
ConfigMaps by Pods is consuming it as a file. However, any update to a
Secret or ConfigMap object is quickly (roughly within a minute)
reflected in updates of the file mounted for all Pods consuming them.
That means that a bad update (push) of Secret and/or ConfigMap can
very quickly break the entire application.
Here you can read more about the motivation behind this decision.
In this KEP, we are proposing to introduce an ability to specify that
contents of a particular Secret/ConfigMap should be immutable for its
whole lifetime. For those Secrets/ConfigMap, Kubelets will not be
trying to watch/poll for changes to updated mounts for their Pods.
Given there are a lot of users not really taking advantage of
automatic updates of Secrets/ConfigMaps due to consequences described
above, this will allow them to:
protect themselves better for accidental bad updates that could cause outages of their applications
achieve better performance of their cluster thanks to significant reduction of load on apiserver
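As a rough illustration of what the KEP describes, here is a minimal Python sketch. It is a toy model of the update check, not the API server's actual validation code; the function name validate_update and the ConfigMap name app-settings are made up for the example. The only things taken from Kubernetes itself are the immutable field and the rule that, once it is set to true, the data and binaryData contents can no longer change and the flag cannot be unset.

```python
import copy


def validate_update(old: dict, new: dict) -> None:
    """Toy model: reject updates that would change an immutable ConfigMap."""
    if not old.get("immutable"):
        return  # mutable objects accept any update in this sketch
    if not new.get("immutable"):
        raise ValueError("immutable cannot be unset once it is true")
    for field in ("data", "binaryData"):
        if old.get(field) != new.get(field):
            raise ValueError(f"{field} cannot be changed on an immutable ConfigMap")


# Manifest expressed as a Python dict; "app-settings" is a hypothetical name.
config_map = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-settings"},
    "data": {"LOG_LEVEL": "info"},
    "immutable": True,
}

changed = copy.deepcopy(config_map)
changed["data"]["LOG_LEVEL"] = "debug"

try:
    validate_update(config_map, changed)
    rejected = False
except ValueError as err:
    rejected = True
    print("update rejected:", err)
```

In practice this means that changing the contents of an immutable ConfigMap or Secret requires deleting and recreating the object (or creating a new one under a different name and pointing the Pods at it).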

Related

EKS cluster PVC and namespace snapshots

I am having difficulty finding a proper tool, or combination of tools, to safely create selective backups of resources in EKS.
Velero seems to be a good option, but it is not clear how the PVC snapshots are performed: can Velero take them directly, or does it require a special driver exposed as an available StorageClass? In the latter case, if production is already in place, replacing the storage class of every deployed service that uses a PVC would require surgical integration work and risk data loss.
The Volume Snapshot Custom Resource Definitions (CRDs) and the Volume Snapshot controller seem to work only when driven manually, not as an out-of-the-box solution. This would probably require a CronJob pod that has access to all available storage classes through an injected service account. It seems like the best fit when engineers take the snapshots themselves before upgrades or migrations.
Does anybody have experience with other open-source tools that provide all of the above functionality and offer a simple UI to preview all backups and select the storage platform for snapshots?
Thanks in advance

Multiple apps in single K8S deployment

I'm exploring K8S possibilities and I'm wondering whether there is any way to create deployments for two or more apps in a single deployment so that it is transactional: when something goes wrong after deployment, all apps are rolled back. I also want to mention that I'm not talking about a pod with multiple containers, because additional sidecar containers are intended for cross-cutting concerns like monitoring, authentication (like Kerberos) and others, and it is not recommended to put different apps in a single pod. Having this in mind, is it possible to have single deployment that can produce 2+ kind of pods?
Is it possible to have single deployment that can produce 2+ kind of pods?
No. A Deployment creates only one kind of Pod. You can update a Deployment's contents, and it will incrementally replace existing Pods with new ones that match the updated Pod spec.
Nothing stops you from creating multiple Deployments, one for each kind of Pod, and that's probably the approach you're looking for here.
... when something goes wrong after deployment, all apps are rolled back.
Core Kubernetes doesn't have this capability on its own; indeed, it has somewhat limited capacity to tell that something has gone wrong, other than a container failing its health checks or exiting.
Of the various tools in @SYN's answer, I at least have some experience with Helm. It's not quite "transactional" in the sense you might take from a DBMS, but it can manage a collection of related resources (a "release" of a "chart") and it can roll back an entire version of a release across multiple Deployments if required. See the helm rollback command.
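To make the "release with revisions" idea concrete, here is a small Python toy model. It is not how Helm is implemented; the Release class and the image tags are invented for illustration. It only mirrors two behaviours described above: a release groups several workloads together, and rolling back restores all of them at once (recorded as a new revision with the old contents).

```python
class Release:
    """Toy model of a release: a revision history of grouped manifests."""

    def __init__(self, name: str):
        self.name = name
        self.history = []  # history[i] holds the manifests of revision i + 1

    def upgrade(self, manifests: dict) -> int:
        """Record a new revision and return its number."""
        self.history.append(dict(manifests))
        return len(self.history)

    def rollback(self, revision: int) -> int:
        """Re-apply an earlier revision as a new revision, for every workload."""
        return self.upgrade(self.history[revision - 1])

    @property
    def current(self) -> dict:
        return self.history[-1]


# Two Deployments managed together; names and tags are hypothetical.
release = Release("shop")
release.upgrade({"web": "web:1.0", "worker": "worker:1.0"})  # revision 1
release.upgrade({"web": "web:2.0", "worker": "worker:2.0"})  # revision 2

new_revision = release.rollback(1)  # both workloads go back together
print(new_revision, release.current)
```

Note that nothing in this model decides when to roll back. As with core Kubernetes, detecting that "something is wrong" is still up to you or your tooling.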
Helm
As pointed out in comments, one way to go about this would be to use something like Helm.
Helm is a client-side tool as of v3. Previous versions also involved "Tiller", a controller running in your Kubernetes cluster, which is now deprecated and can be ignored here.
Helm uses "Charts" (more or less: templates, with default values you can override).
Kustomize
Another solution, similar to Helm, is Kustomize. It works from plain-text files (not templates) while making it simple to override or customize your objects before applying them to your Kubernetes cluster.
ArgoCD
While Kustomize and Helm are both standalone clients, we could also mention solutions such as ArgoCD.
The ArgoCD controller would run inside your Kubernetes cluster, allowing you to create "Application" objects.
Those Applications are processed by ArgoCD, driving deployment of your workloads (common sources for those applications would involve Helm Charts, Git repositories, ...).
The advantage of ArgoCD is that its controller may (depending on your configuration) be responsible for upgrading your applications over time. For example, if your source is a Git repository, branch XXX, and someone pushes changes to that branch, ArgoCD would apply them pretty much right away.
Operators
However, most of those solutions are largely unaware of how your application is running. Say you upgrade a deployment, driven by Helm, Kustomize or ArgoCD, and end up with some database pods stuck in CrashLoopBackOff: your application pods would get updated nevertheless, and there is no automatic rollback to a previous working configuration.
Which brings us to another way to ship applications to Kubernetes: operators.
Operators are aware of the state of your workloads, and may be able to fix common errors (depending on how the operator was coded; there's no magic).
An operator is an application (it can be written in Go, Java, Python, Ansible playbooks, or anything else that comes with a library for talking to the Kubernetes cluster API).
An operator is constantly connected to your Kubernetes cluster API. You would usually find some CustomResourceDefinitions specific to your operator, allowing you to describe the deployment of some component in your cluster (e.g. the Elasticsearch operator introduces an object kind "ElasticSearch", and another one, "Kibana").
The operator watches for instances of the objects it manages (e.g. ElasticSearch), eventually creating Deployments, StatefulSets, Services, and so on.
If someone deletes an object that was created by your operator, it would/should be re-created by that operator, in a timely manner (mileage may vary, depending on which operator we're talking about ...)
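The watch-and-recreate behaviour described in the points above is often called a reconcile loop. Below is a minimal Python sketch of one pass of such a loop, assuming a hypothetical operator that owns one StatefulSet and one Service per custom resource. Real operators talk to the cluster API through a client library; here the cluster is just a set of (kind, name) pairs so that the logic stays visible.

```python
# Kinds this hypothetical operator creates for every custom resource it sees.
OWNED_KINDS = ("StatefulSet", "Service")


def reconcile(custom_resources: list, existing: set) -> list:
    """One reconcile pass: create whatever is desired but missing.

    `existing` stands in for the cluster state and is updated in place.
    Returns the list of (kind, name) pairs that were created.
    """
    created = []
    for resource in custom_resources:
        for kind in OWNED_KINDS:
            key = (kind, resource["name"])
            if key not in existing:
                existing.add(key)
                created.append(key)
    return created


cluster = set()
desired = [{"kind": "ElasticSearch", "name": "logs"}]

first_pass = reconcile(desired, cluster)    # creates both owned objects
cluster.discard(("Service", "logs"))        # someone deletes the Service
second_pass = reconcile(desired, cluster)   # the operator puts it back
print(first_pass, second_pass)
```

A real operator would also compare specs and handle updates and deletions, not only missing objects. This sketch covers just the "re-created if deleted" case.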
A good example of operators in action is OpenShift 4 (OKD4): a Kubernetes cluster that comes with dozens of operators (SDN, DNS, machine configurations, ingress controller, Kubernetes API server, etcd database, ...). The whole cluster is an assembly of operators. When you upgrade the cluster, each operator manages the upgrade of its corresponding services in an orchestrated way, one after the other. If anything fails, you are usually still left with enough replicas running to troubleshoot the issue.
Depending on what you're looking for, each option has advantages and drawbacks. Now if you're looking for a "single deployment that can produce 2+ kind of pods", then ArgoCD or some home-grown operator would qualify.

Kubernetes workload for stateful application but no need of persistent disk

I have a stateful application: I keep data in users' sessions (basically data in the HttpSession object), but I have no requirement to write anything to a persistent disk.
From what I have read so far, StatefulSet workloads are meant for stateful applications. My understanding is that even though my application is stateful, a Deployment workload would also meet my requirements, because I do not need to write anything to persistent disks.
However, there is one point I am not sure about. Suppose I use a Deployment workload and a lot of user data is held in my HttpSession object. If Kubernetes restarts my Pod for some reason, all of that user session data will of course be lost. So, my questions are the following:
Does StatefulSet handle this situation any better than a Deployment workload?
So, is the only difference between a Deployment workload and a StatefulSet workload the absence/presence of a persistent disk, or does StatefulSet also have something to do with application session management?
Does StatefulSet handle this situation any better than a Deployment workload?
No. Neither Deployment nor StatefulSet will preserve memory contents. To preserve session information, you'll need to store it somewhere. One common approach is to use Redis.
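A small Python sketch of why an external store helps. ExternalStore is an in-memory stand-in for something like Redis (it is not the Redis client API), and AppPod is a made-up model of one application instance. The point is only that state kept inside the pod disappears when the pod is replaced, while state written to a store outside the pod does not.

```python
class ExternalStore:
    """In-memory stand-in for an external session store such as Redis."""

    def __init__(self):
        self._data = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str):
        return self._data.get(key)


class AppPod:
    """Hypothetical app instance: local memory plus a shared external store."""

    def __init__(self, store: ExternalStore):
        self.local_sessions = {}  # lost whenever the pod is replaced
        self.store = store        # lives outside the pod

    def login(self, session_id: str, user: str) -> None:
        self.local_sessions[session_id] = user
        self.store.set(session_id, user)


store = ExternalStore()
pod = AppPod(store)
pod.login("abc123", "alice")

# Simulate a restart: Kubernetes replaces the pod with a fresh instance.
pod = AppPod(store)
print(pod.local_sessions.get("abc123"), pod.store.get("abc123"))
```

The same reasoning applies whether the replacement pod comes from a Deployment or a StatefulSet.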
So, is the only difference between a Deployment workload and a StatefulSet workload the absence/presence of a persistent disk, or does StatefulSet also have something to do with application session management?
No, there are other differences:
StatefulSets create (and re-create) deterministic, consistent pod names (identifiers).
StatefulSet pods are deployed, scaled, and updated one by one in a deterministic, consistent order. The next pod is created only after the previous one is Running and Ready.
Additionally, it's worth mentioning that persistent disks can be attached to pods that aren't part of a StatefulSet. It's just convenient to have disks always attached to a pod with a consistent ID. For instance, if you have pods running a replicated database, you can use a StatefulSet to ensure that the master replica's disk is always attached to the same pod (for example, the first one, which has ordinal 0).
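The naming and ordering rules can be written down in a few lines of Python. This is only an illustration of the conventions (pods are named after the StatefulSet plus an ordinal starting at 0, and a claim created from a volumeClaimTemplate is named after the template plus the pod name). The StatefulSet name mysql and the template name data are hypothetical.

```python
def pod_names(statefulset: str, replicas: int) -> list:
    """Pods are created in this order: <name>-0, <name>-1, ..."""
    return [f"{statefulset}-{ordinal}" for ordinal in range(replicas)]


def scale_down_order(statefulset: str, replicas: int) -> list:
    """Pods are removed in reverse ordinal order."""
    return list(reversed(pod_names(statefulset, replicas)))


def claim_name(template: str, pod: str) -> str:
    """PVC created from a volumeClaimTemplate: <template>-<pod name>."""
    return f"{template}-{pod}"


names = pod_names("mysql", 3)
print(names)
print(scale_down_order("mysql", 3))
print([claim_name("data", pod) for pod in names])
```

Because the claim name is derived from the pod name, a rescheduled mysql-0 picks up the same claim again, which is what keeps a given replica's disk attached to the same identity.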
Edit:
Link to official documentation about StatefulSets
From the documentation:
Like a Deployment, a StatefulSet manages Pods that are based on an
identical container spec. Unlike a Deployment, a StatefulSet maintains
a sticky identity for each of their Pods. These pods are created from
the same spec, but are not interchangeable: each has a persistent
identifier that it maintains across any rescheduling.
...
StatefulSets are valuable for applications that require one or more of
the following.
Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment and scaling.
Ordered, automated rolling updates.
In the above, stable is synonymous with persistence across Pod
(re)scheduling. If an application doesn't require any stable
identifiers or ordered deployment, deletion, or scaling, you should
deploy your application using a workload object that provides a set of
stateless replicas. Deployment or ReplicaSet may be better suited to
your stateless needs.

k8s - what happens to persistent storage when a cluster is deleted?

Do they get permanently deleted as well?
I imagine they do, since they are a part of a cluster, but I'm new to k8s and I can't find this info online.
If they do get deleted, what would be the preferred solution to keep the data for a cluster that sometimes gets completely deleted and re-deployed?
Thanks
According to the documentation, you can avoid losing the data behind a PersistentVolume by using the Retain reclaim policy.
In this case, even after the PersistentVolume object is deleted, the underlying storage asset still exists in the external infrastructure, such as AWS EBS. So it is possible to recover or reuse the existing data.
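As a sketch, here is what the relevant part of a PersistentVolume looks like (written as a Python dict rather than YAML), together with a toy function modelling the two common reclaim policies. The PV name and size are hypothetical, and on_claim_released is not a Kubernetes API; it only summarises the documented behaviour of Retain and Delete when the claim bound to the volume is removed.

```python
# PersistentVolume manifest as a dict; "db-data" and "20Gi" are made up.
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": "db-data"},
    "spec": {
        "capacity": {"storage": "20Gi"},
        "accessModes": ["ReadWriteOnce"],
        "persistentVolumeReclaimPolicy": "Retain",
    },
}


def on_claim_released(pv: dict) -> dict:
    """Toy model of what happens after the bound claim is deleted."""
    policy = pv["spec"]["persistentVolumeReclaimPolicy"]
    if policy == "Delete":
        # Both the PV object and the backing disk are removed.
        return {"pv_object": "deleted", "backing_storage": "deleted"}
    if policy == "Retain":
        # The PV stays (phase Released) and the disk is left untouched.
        return {"pv_object": "Released", "backing_storage": "kept"}
    raise ValueError(f"policy not modelled in this sketch: {policy}")


print(on_claim_released(persistent_volume))
```

Dynamically provisioned volumes usually default to Delete, so the policy has to be changed explicitly, for example with kubectl patch pv <your-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'. Whether the cloud disks themselves survive deletion of the whole cluster also depends on how the provider tears the cluster down, so it is worth testing on a throwaway cluster first.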
You can find more details here and here

Recreating GCP kubernetes cluster

I'm looking to understand how to recreate my cluster. There's a cluster-level setting to specify the IP range for nodes created within it, which I want to use so I can set a decent firewall rule. However, it looks like that can't be changed once the cluster is created.
I have a number of namespaces, deployments, services, secrets, persistent volumes and claims. If I wanted to transfer them all to a new cluster, should I just kubectl get all --namespace=whatever --output=yaml, kubectl delete -f, and then kubectl apply -f on the new cluster?
Would something so crude work for mapping to the same load balancers / public IPs, persistent volumes, secrets, etc?
As you can see, the backup and migration of whole clusters is a much-discussed matter and still an open issue on the Kubernetes GitHub as well:
https://github.com/kubernetes/kubernetes/issues/24229
Therefore I do not believe that the commands you posted can be considered a solution, or that they will work. I think they will fail because of resources that are cluster-dependent, and because of IPs. Moreover, since this kind of use is not supported, it will lead to multiple issues.
Let's say that you change the zone of the cluster: how would it be possible to move the PV if the disk cannot be attached to an instance in a different zone (or if you migrate to a different cloud service)?
More importantly, I would not risk deleting my production environment to run a command that is not documented or indicated as best practice. You could try it on a test namespace, but I would not suggest going further.
You can check ReShifter and Ark (since renamed Velero), since they might cover your needs. I have never tested them, but they are mentioned in the thread, so they might be of interest to you.
I tried this approach on one of my test clusters and got:
Error from server (Conflict): Operation cannot be fulfilled
Error from server (Conflict): Operation cannot be fulfilled
Error from server (Forbidden): [...]
Honestly, I believe that for a limited subset of resources it might be possible (note that some resources were created correctly), but it cannot be considered a way to migrate at all.
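One source of these conflicts is that exported objects still carry fields set by the old cluster. Below is a hedged Python sketch of stripping the most common ones before re-applying a manifest elsewhere. The helper name strip_cluster_specific and the sample Service are invented for the example, and the field list is an assumption about what typically gets in the way, not a complete or official list. It does nothing for the harder problems mentioned above (zonal disks, load balancer IPs).

```python
import copy

# metadata fields that are populated by the API server, not by the user
SERVER_SET_METADATA = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "selfLink",
    "generation",
    "managedFields",
)


def strip_cluster_specific(obj: dict) -> dict:
    """Return a copy of an exported object without cluster-assigned fields."""
    cleaned = copy.deepcopy(obj)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)
    cleaned.pop("status", None)  # status is reported by the cluster
    if cleaned.get("kind") == "Service":
        spec = cleaned.get("spec", {})
        spec.pop("clusterIP", None)   # allocated from the old cluster's range
        spec.pop("clusterIPs", None)
    return cleaned


exported = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "web", "uid": "1234", "resourceVersion": "42"},
    "spec": {"clusterIP": "10.0.0.12", "ports": [{"port": 80}]},
    "status": {"loadBalancer": {}},
}
print(strip_cluster_specific(exported))
```

Even with such cleanup this stays a crude approach. Keeping the original manifests in version control, or using a dedicated backup tool, avoids having to reverse-engineer them from a live cluster.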