New PVC for an active pod - kubernetes

Is it possible to plug and play storage into an active pod? I want to bind new storage to a running pod without restarting it. Does Kubernetes support this?

Most things in a Pod are immutable. In particular, if you look at the API definition of a PodSpec, it says in part (emphasis mine):
containers: List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
Typically you don't directly work with Pods; you work with a higher-level controller like a Deployment. There you can edit these things, and it reacts by creating new Pods with the new pod spec and then deleting the old Pods.
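For example, to attach a new PVC you would add it to the Deployment's pod template rather than to the running Pod. A minimal sketch, assuming a hypothetical Deployment my-app and an existing PVC named data-pvc:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                       # hypothetical Deployment name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: nginx:1.25          # placeholder image
          volumeMounts:
            - name: data             # newly added mount
              mountPath: /data
      volumes:
        - name: data                 # newly added volume backed by the PVC
          persistentVolumeClaim:
            claimName: data-pvc      # hypothetical existing PVC
```

Applying this change (e.g. with kubectl apply) does not modify the existing Pod in place; the Deployment rolls out new Pods that have the volume mounted and deletes the old ones.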
Also remember that sometimes the cluster itself will delete or restart a Pod (if its Node is over capacity or fails, for example) and you don't have any control over this. It's better to plan for your Pods to be periodically restarted than to try to prevent it.

Related

Assigning a volume for log dispatching with K8s

I want to ship my application logs from my K8s pods to hosted ELK. I want to use a PVC. This link mentions that the pod should be owned by a StatefulSet. Is this a general recommendation or specific to DigitalOcean?
This is a general recommendation. The reason is that you have to ensure your logging pods are ready before the application pods can start. Additionally, say you had a pod named logging-pod-1 sending the logs of application pod app-pod-1 to ELK, and logging-pod-1 crashes for some reason. Using a StatefulSet ensures that the new pod assumes the identity of logging-pod-1, and this way the "state" of the operation is maintained across failures, rescheduling, and restarts, ensuring that no logs are missed and logging isn't affected.
"Assuming the identity" means, among other things, that when the new pod comes up, Kubernetes knows which PVC to attach to it. This is the same PVC the crashed pod was using, so the new pod simply takes over the task by mounting the same PV and carrying on with its work.
This is a very common design pattern, and since you need to ensure ordering, a StatefulSet is a natural choice for deploying this logging functionality. You can learn more about StatefulSets in the docs.
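As a rough sketch of that pattern, assuming a hypothetical log shipper called log-shipper (the image, mount path, and storage size are placeholders, not something from the linked guide):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: log-shipper                  # hypothetical name
spec:
  serviceName: log-shipper           # headless Service providing stable hostnames
  replicas: 1
  selector:
    matchLabels:
      app: log-shipper
  template:
    metadata:
      labels:
        app: log-shipper
    spec:
      containers:
        - name: shipper
          image: docker.elastic.co/beats/filebeat:8.13.0   # placeholder log-shipper image
          volumeMounts:
            - name: logs
              mountPath: /var/log/app                      # shared log directory
  volumeClaimTemplates:              # one PVC per ordinal, e.g. logs-log-shipper-0
    - metadata:
        name: logs
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
```

If log-shipper-0 crashes, its replacement is also named log-shipper-0 and re-mounts the PVC logs-log-shipper-0, which is what lets it resume where the old pod left off.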

Is there an intermediate layer/cache between a Kubernetes pod and a Persistent Volume, or does a pod access the PV directly

Recently I ran into a strange problem. We have two pods running in an OpenShift cluster that share a persistent volume (GlusterFS).
For the sake of this explanation, let's call one of the pods PodA and the other PodB. PodB had been running for three months. There is automation in PodA which creates/updates files in the shared persistent volume, and PodB reads them and performs some operation based on the input.
Now coming to the problem: whenever PodA created a new file in the shared PV, it was visible and accessible from PodB. However, there were a few files that PodA was updating periodically, and those changes were not reflected in PodB; in PodB we could only see the old version of those files. To solve the problem, we forcefully deleted PodB, OpenShift recreated it, and the problem was gone.
I thought that with the PV mechanism, Kubernetes mounts the external storage/folder into the pod (container), and that there is no intermediate layer or cache. From what we have experienced so far, it seems every container (or pod) creates a local copy of those files, or maybe there is a cache between the PV and the pod.
I have searched for this on Google and could not find a detailed explanation of how this PV mount works in Kubernetes. I would love to know the actual reason behind this problem.
There is no caching mechanism for PVs provided by Kubernetes, so the problem you are observing must be located in either the GlusterFS CSI driver or GlusterFS itself.

Force Kubernetes Pod shutdown before starting a new one in case of disruption

I'm trying to set up a stateful Apache Flink application in Kubernetes and I need to save the current state in case of a disruption, such as someone deleting the pod or it being rescheduled due to cluster resizing.
I added a preStop hook to the container that accomplishes this behaviour, but when I delete a pod using kubectl delete pod, the Deployment spins up a new Pod before the old one terminates.
Guides such as this one use the Recreate update strategy to make sure only one pod runs at a time. This works fine when updating a deployment, but it does not cover disruptions like the ones I described above. I also tried setting spec.strategy.rollingUpdate.maxSurge to 0, but that made no difference.
Is it possible to configure my Deployment in such a way that no pod ever starts before another one is terminated, or do I need to switch to StatefulSets?
I agree with @Cosmic Ossifrage, as StatefulSets make it easy to achieve your goal. Each Pod in a StatefulSet has a unique, persistent identity and a stable hostname that Kubernetes maintains regardless of where it is scheduled.
StatefulSet Pods are created in sequential order and terminated in reverse ordinal order, and the StatefulSet controller does not start a replacement Pod until the previous instance has been completely deleted, so at most one Pod with a given identity runs at a time.
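A minimal sketch of what that could look like for your case, assuming a hypothetical name flink-app and a placeholder state-saving script in the preStop hook (adapt the image and command to your existing Deployment):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: flink-app                        # hypothetical name for the Flink workload
spec:
  serviceName: flink-app
  replicas: 1                            # a single ordinal, so at most one Pod
  selector:
    matchLabels:
      app: flink-app
  template:
    metadata:
      labels:
        app: flink-app
    spec:
      terminationGracePeriodSeconds: 120 # give the preStop hook time to save state
      containers:
        - name: flink
          image: flink:1.17              # placeholder image
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "/opt/save-state.sh"]  # hypothetical state-saving script
```

Because the StatefulSet controller waits for flink-app-0 to terminate fully (including the preStop hook and grace period) before creating its replacement, you don't get the overlap you see with a Deployment.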

Redistribute pods after adding a node in Kubernetes

What should I do with pods after adding a node to the Kubernetes cluster?
I mean, ideally I want some of them to be stopped and started on the newly added node. Do I have to manually pick some for stopping and hope that they'll be scheduled for restarting on the newly added node?
I don't care about affinity, just semi-even distribution.
Maybe there's a way to always have the number of pods be equal to the number of nodes?
For the sake of having an example:
I'm using juju to provision a small Kubernetes cluster on AWS: one master and two workers. This is just a playground.
My application is Apache serving PHP and static files. So I have a Deployment, a Service of type NodePort, and an Ingress using the nginx-ingress-controller.
I turned off one of the worker instances and my application pods were recreated on the one that remained working.
I then started the instance back up; the master picked it up and started the nginx ingress controller there. But when I tried deleting my application pods, they were recreated on the instance that had kept running, not on the one that was restarted.
Not sure if it's important, but I don't have any DNS set up. I just added the IP of one of the instances to /etc/hosts with the host value from my ingress.
The descheduler, a Kubernetes incubator project, could be helpful here; see the example policy after the list below. From its introduction:
As Kubernetes clusters are very dynamic and their state changes over time, it may be desirable to move already-running pods to other nodes for various reasons:
Some nodes are under- or over-utilized.
The original scheduling decision does not hold true any more, as taints or labels are added to or removed from nodes, and pod/node affinity requirements are no longer satisfied.
Some nodes failed and their pods moved to other nodes.
New nodes are added to clusters.
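The example policy mentioned above, as a sketch: this uses the descheduler's v1alpha1 policy format from the incubator project, so check the project's README for the current schema; the strategy choice and thresholds below are arbitrary examples:

```yaml
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  RemoveDuplicates:                # spread pods owned by the same controller across nodes
    enabled: true
  LowNodeUtilization:              # evict from busy nodes when under-utilized nodes exist
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:                # nodes below all of these count as under-utilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:          # nodes above any of these are eviction candidates
          cpu: 50
          memory: 50
          pods: 50
```

Evicted pods are recreated by their controllers and can then land on the newly added node.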
There is no automatic redistribution of existing pods in Kubernetes when you add a new node. You can force a redistribution of individual pods by deleting them and having a host-based anti-affinity policy in place. Otherwise, Kubernetes will prefer the new node when scheduling newly created pods and thus achieve a redistribution over time.
What are your reasons for a manually triggered redistribution?
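For the host-based anti-affinity part, a minimal sketch of what that could look like in a Deployment (the app: my-app label and the image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                     # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname   # spread replicas across hosts
                labelSelector:
                  matchLabels:
                    app: my-app
      containers:
        - name: app
          image: php:8.2-apache    # placeholder image matching the Apache/PHP example
```

With this in place, deleting a pod makes the scheduler prefer a node that isn't already running another replica, which is typically the newly added one.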

Steps involved in creating a pod in kubernetes

How does Kubernetes create Pods?
I.e., what are the sequential steps involved in creating a Pod, and how is this implemented in Kubernetes?
Any code reference in Kubernetes repo would also be helpful.
A Pod is described in a definition file and run as a set of Docker containers on a given host which is part of the Kubernetes cluster, much like docker-compose does, but with several differences.
More precisely, a Pod always contains at least one more Docker container than the ones the user has defined, even though only the user-defined containers are normally visible through the API: Kubernetes adds a placeholder container (commonly known as the "pause" or infrastructure container) that holds the IP for the Pod. When a Pod is restarted, it is actually the client containers that are restarted; the placeholder container remains and keeps the same IP, unlike in plain Docker or docker-compose, where recreating a composition or a container changes the IP.
How Pods are scheduled, created, started, restarted if needed, re-scheduled, and so on is a much longer story and a very broad question.
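For reference, the "definition file" mentioned above is a plain manifest. A minimal sketch, with an arbitrary name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod            # arbitrary name
spec:
  containers:
    - name: web
      image: nginx:1.25        # placeholder image
      ports:
        - containerPort: 80
```

Roughly, submitting this to the API server (e.g. kubectl apply -f pod.yaml) stores the Pod object, the scheduler assigns it to a node, and the kubelet on that node asks the container runtime to create the pause container plus the containers listed in spec.containers.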