I am working on a client requirement: the worker nodes need to have a specific time zone configured for their apps to run properly. We have tried things such as setting the TZ environment variable and mounting a volume on /etc/localtime that points to the right file under /usr/share/zoneinfo/. These work to some extent, but it seems I will need to use DaemonSets to modify the node configuration for some of the apps.
The concern I have is that the pod that makes this change on the nodes will have to run with host privileges, and leaving such pods running on all nodes doesn't sound good. The documentation says that pods in a DaemonSet must have a restart policy of Always, so I can't have them exit after making the changes either.
I believe I can address this concern with an init container that runs with host privileges, makes the appropriate changes on the node, and exits. The main container of the DaemonSet pod then runs (unprivileged) after the init container completes successfully, and finally all the other pods get scheduled on the node. I also believe this sequence works the same way when I add new nodes to the cluster. Something like the sketch below.
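A sketch of what I have in mind (the time zone, image, and names are placeholders, and I haven't tested this end to end):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: set-node-timezone
spec:
  selector:
    matchLabels:
      app: set-node-timezone
  template:
    metadata:
      labels:
        app: set-node-timezone
    spec:
      initContainers:
      - name: set-tz                  # privileged; runs once per node, then exits
        image: alpine:3.19
        securityContext:
          privileged: true
        # The symlink target is resolved on the host, so it points at the
        # node's own zoneinfo, not the container's.
        command: ["sh", "-c", "ln -sf /usr/share/zoneinfo/Europe/Berlin /host-etc/localtime"]
        volumeMounts:
        - name: host-etc
          mountPath: /host-etc
      containers:
      - name: pause                   # unprivileged placeholder, keeps the pod Running
        image: registry.k8s.io/pause:3.9
      volumes:
      - name: host-etc
        hostPath:
          path: /etc
```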
Does that sound about right? Are there better approaches?
I am currently using the Hetzner CSI-Driver (https://github.com/hetznercloud/csi-driver) in Kubernetes, which works fine for the most part.
But sometimes I run into the issue that two pods using the same persistentVolumeClaim get scheduled onto different nodes. Since the persistentVolume can only be mounted onto one node, all pods running on the other node fail with the error 'Unable to attach or mount volumes'.
That makes sense to me, but I can't wrap my head around what the correct solution would be. I thought that CSI drivers which mount volumes told Kubernetes in some way "oh, this pod needs that volumeClaim? Then you need to schedule it onto that node, because the mounted volume is currently in use there by another pod", so I don't understand why pods using the same claim even get scheduled onto different nodes.
Is my understanding of CSI drivers in general incorrect, or is there some way in which I can enforce that behaviour? Or am I using this wrong altogether and should change the underlying configuration?
Any help is appreciated.
Currently I simply restart the pod until I get lucky and it is moved to the correct node, and then everything works fine. But I assume there is a more elegant solution.
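A more structured workaround than restarting (a sketch, not Hetzner-specific; the label app: shared-data and the claim name are assumptions) is to give every pod that mounts the claim the same label plus a required inter-pod affinity on that label, so they are all co-scheduled onto the node where the volume is attached:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: consumer-2               # hypothetical
  labels:
    app: shared-data
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: shared-data     # co-locate with other pods carrying this label
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: busybox:1.36
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-pvc      # hypothetical claim name
```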
Kubernetes tends to assume apps are small/lightweight/stateless microservices which can be stopped on one node and restarted on another node with no downtime.
We have a slow-starting (20 min) legacy (stateful) application which, once running as a set of pods, should not be rescheduled without due cause. The reason is that all user sessions will be killed and the users will have to log in again. There is NO way to serialize the sessions and externalize them. We want 3 instances of the pod.
Can we tell k8s not to move a pod unless absolutely necessary (i.e. it dies)?
Additional information:
The app is a Tomcat/Java monolith
Assume for the sake of argument we would like to run it in Kubernetes
We do have a liveness test endpoint available (probe sketch below)
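Given the 20-minute startup, the probe configuration matters here regardless of scheduling: a liveness probe that fires too early would kill the app mid-startup and cause exactly the restarts we want to avoid. A sketch of the container part of the spec, assuming the endpoint is GET /health on port 8080 (both are assumptions):

```yaml
containers:
- name: legacy-app
  image: registry.example.com/legacy-app:1.0   # hypothetical image
  ports:
  - containerPort: 8080
  startupProbe:                  # allows up to ~25 min (50 x 30s) to start
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 30
    failureThreshold: 50
  livenessProbe:                 # only takes over once the startup probe succeeds
    httpGet:
      path: /health
      port: 8080
    periodSeconds: 10
    failureThreshold: 3
```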
There is no benefit in telling k8s to use only one pod; that is not the "spirit" of k8s. In that case, it might be better to use a dedicated machine for your app.
But you can assign a pod to a specific node (see Assigning Pods to Nodes). That should only be necessary when there are special hardware requirements (e.g. an AI microservice needs a GPU, which is only available on node xy).
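For completeness, such an assignment is just a nodeSelector in the pod spec (a sketch; the label key/value and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ai-service
spec:
  nodeSelector:
    hardware: gpu        # only schedule onto nodes carrying this label
  containers:
  - name: app
    image: registry.example.com/ai-service:1.0   # hypothetical image
```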
k8s doesn't restart your pod for fun. It will restart it when there is a reason (node died, app died, ...), and I have never noticed a "random reschedule" in a cluster. It is hard to say, without any further information (like deployment, logs, cluster), what exactly happened in your case.
And for your comment: there are different types of recreation; one of them starts a fresh instance and only kills the old one once the new one has started successfully. Look here: Kubernetes deployment strategies
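That variant corresponds to a rolling update that never takes an old pod down before its replacement is ready; a sketch of the relevant Deployment fields (values are illustrative):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # start one fresh pod first...
      maxUnavailable: 0    # ...and never kill an old one early
```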
All points together:
Don't force your app onto a specific node; k8s will select the node smartly.
There are normally no planned reschedules in k8s.
k8s will recreate pods only if there is a reason. Maybe your app didn't answer on the liveness endpoint? Or someone/something deleted your pod?
I'm running a cluster with kind - one worker node.
However, when I do kubectl get nodes I can't see the worker node; instead I see 'kind-control-plane', which makes no sense to me. The control plane is a node??
The worker node must be running, because I can do kubectl exec --stdin --tty <name of the pod> -- /bin/sh and see inside the container that's running my app.
Is this some weird WSL2 interaction? Or I'm simply doing something wrong?
control-plane is just a name. If you just run kind create cluster, the default is to create a single-node cluster whose one node has the control-plane role (hence the node name kind-control-plane). From your description, everything is working properly.
One of kind's core features is the ability to run a "multi-node" cluster, but all locally in containers. If you want to test your application's behavior when, for example, you drain its pods from a node, you can run a kind cluster with one control-plane node (running etcd, the API server, and other core Kubernetes processes) and three worker nodes; let the application start up, then kubectl drain worker-1 and watch what happens. The documentation also notes that this is useful if you're developing Kubernetes itself and need a multi-node control plane to test HA support.
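A config for that layout might look like this (a sketch following the kind docs):

```yaml
# kind-config.yaml: one control-plane node plus three workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker
```

Create it with kind create cluster --config kind-config.yaml.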
Is it possible to plug and play storage into an active pod without restarting it? I want to bind new storage to a running pod without a restart. Does Kubernetes support this?
Most things in a Pod are immutable. In particular, if you look at the API definition of a PodSpec, it says in part (emphasis mine):
containers: List of containers belonging to the pod. Containers cannot currently be added or removed. There must be at least one container in a Pod. Cannot be updated.
Typically you don't directly work with Pods; you work with a higher-level controller like a Deployment. There you can edit these things, and it reacts by creating new Pods with the new pod spec and then deleting the old Pods.
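For example, adding a volume through a Deployment's pod template triggers exactly that replace-and-delete cycle (a sketch; all names are illustrative). The running Pods themselves are never edited in place:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                   # hypothetical
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        volumeMounts:            # newly added mount
        - name: data
          mountPath: /data
      volumes:                   # newly added volume
      - name: data
        persistentVolumeClaim:
          claimName: my-claim    # hypothetical claim
```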
Also remember that sometimes the cluster itself will delete or restart a Pod (if its Node is over capacity or fails, for example) and you don't have any control over this. It's better to plan for your Pods to be periodically restarted than to try to prevent it.
How does Kubernetes create Pods?
I.e., what are the sequential steps involved in creating a Pod, and how is this implemented in Kubernetes?
Any code reference in the Kubernetes repo would also be helpful.
A Pod is described in a definition file and run as a set of Docker containers on a given host that is part of the Kubernetes cluster, much like docker-compose does, but with several differences.
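For reference, a minimal definition file of that kind (the image is just an illustration):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: nginx:1.25
```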
Precisely, a Pod (on a Docker-based runtime) always contains more containers than the ones the user defined, even though only the user-defined containers are usually visible through the API: the kubelet adds a placeholder ("pause") container that holds the network namespace, and therefore the IP, for the Pod. When a Pod's containers are restarted, it's actually the user's containers that are restarted; the placeholder container remains and keeps the same IP, unlike in straight Docker or docker-compose, where recreating a composition or container changes the IP.
How Pods are scheduled, created, started, restarted if needed, rescheduled, etc. is a much longer story and a very broad question.