Is it possible to add/modify kubernetes container spec based on clusterwide setting - kubernetes

I have a kubernetes-based application that uses an operator to build and deploy containers in pods. Sometimes I'd like to run containers in privileged mode to enable performance tracing, but since I'm not deploying the pod/containers directly from a manifest, I cannot simply add privileged mode and the debugfs filesystem mount.
That leaves me to fork the operator code, change where it builds the container spec, and redeploy with the modified operator. Doable, but awkward.
So my question is, is it possible to impose additional attributes to be added to container specs based on some clusterwide setting, either before pods are deployed by the operator? Or to modify the container spec after deployment? I tried that with kubectl edit pod mypod, but that didn't work.
This is on a physical cluster installed with kubespray.

There are three things to consider:
Your operator can create a controller (e.g. Deployment) instead of Pod, which allows modifications in the Pod Spec area, thus triggering Deployment's rollout (see rolling update strategy).
Use MutatingAdmissionWebhook
so before creating the Pod, its manifest would be modified/overwritten on the fly.
More info regarding MutatingAdmissionWebhook can be found here and here.
A workaround solution in a form of modifying the supply spec -> swapping the pod-a.
More about this was discussed here.
Please let me know if any of the above helped.

Related

Facing "The Pod "web-nginx" is invalid: spec.initContainers: Forbidden: pod updates may not add or remove containers" applying pod with initcontainers

I was trying to make file before application gets up in kubernetes cluster with initcontainers,
But when i am setting up the pod.yaml and trying to apply it with "kubectl apply -f pod.yaml" it throws below error
error-image
Like the error says, you cannot update a Pod adding or removing containers. To quote the documentation ( https://kubernetes.io/docs/concepts/workloads/pods/#pod-update-and-replacement )
Kubernetes doesn't prevent you from managing Pods directly. It is
possible to update some fields of a running Pod, in place. However,
Pod update operations like patch, and replace have some limitations
This is because usually, you don't create Pods directly, instead you use Deployments, Jobs, StatefulSets (and more) which are high-level resources that defines Pods templates. When you modify the template, Kubernetes simply delete the old Pod and then schedule the new version.
In your case:
you could delete the pod first, then create it again with the new specs you defined. But take into consideration that the Pod may be scheduled on a different node of the cluster (if you have more than one) and that may have a different IP Address as Pods are disposable entities.
Change your definition with a slightly more complex one, a Deployment ( https://kubernetes.io/docs/concepts/workloads/controllers/deployment/ ) which can be changed as desired and, each time you'll make a change to its definition, the old Pod will be removed and a new one will be scheduled.
From the spec of your Pod, I see that you are using a volume to share data between the init container and the main container. This is the optimal way but you don't necessarily need to use a hostPath. If the only needs for the volume is to share data between init container and other containers, you can simply use emptyDir type, which acts as a temporary volume that can be shared between containers and that will be cleaned up when the Pod is removed from the cluster for any reason.
You can check the documentation here: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

How to change the behaviour of kube-scheduler in EKS?

I am new at Kubernetes and completely new to setting it up in EKS.
I am trying to achieve sharing of GPU between multiple pods, but for that going through few of the documents and articles, I found out I should update the kube-scheduler configuration with parameters which will then allow me the make the necessary changes for enabling sharing of GPU between pods.
Question
How do I update the kube-scheduler configuration in EKS. If update for the configuration is not possible, is there some other way I can setup kube-scheduler for only those pods which require GPU ?
I think you need a custom kubescheduler, and for your pods to be able to specify whether they want to use the default or the custom scheduler.
Kubernetes supports this: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/ -- basically, create a .yaml file, run kubectl create -f on it, and you should see your scheduler. You'll want to run it in the kube-system namespace, and give it a unique name (so your pods have a way of saying which scheduler they want).
I haven't done this in EKS myself, but would be very surprised if you couldn't run a custom scheduler in EKS. Moreover, this aws blog post https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/ , which sounds similar to what you're looking for, seems like it would require a custom scheduler.

Add Sidecar container to running pod(s)

I have helm deployment scripts for a vendor application which we are operating. For logging solution, I need to add a sidecar container for fluentbit to push the logs to aggregated log server (splunk in this case).
Now to define this sidecar container, I want to avoid changing vendor defined deployment scripts. Instead i want some alternative way to attach the sidecar container to the running pod(s).
So far I have understood that sidecar container can be defined inside the same deployment script (deployment configuration).
Answering the question in the comments:
thanks #david. This has to be done before the deployment. I was wondering if I could attach a sidecar container to an already deployed (running) pod.
You can't attach the additional container to a running Pod. You can update (patch) the resource definition. This will force the resource to be recreated with new specification.
There is a github issue about this feature which was closed with the following comment:
After discussing the goals of SIG Node, the clear consensus is that the containers list in the pod spec should remain immutable. #27140 will be better addressed by kubernetes/community#649, which allows running an ephemeral debugging container in an existing pod. This will not be implemented.
-- Github.com: Kubernetes: Issues: Allow containers to be added to a running pod
Answering the part of the post:
Now to define this sidecar container, I want to avoid changing vendor defined deployment scripts. Instead i want some alternative way to attach the sidecar container to the running pod(s).
Below I've included two methods to add a sidecar to a Deployment. Both of those methods will reload the Pods to match new specification:
Use $ kubectl patch
Edit the Helm Chart and use $ helm upgrade
In both cases, I encourage you to check how Kubernetes handles updates of its resources. You can read more by following below links:
Kubernetes.io: Docs: Tutorials: Kubernetes Basics: Update: Update
Medium.com: Platformer blog: Enable rolling updates in Kubernetes with zero downtime
Use $ kubectl patch
The way to completely avoid editing the Helm charts would be to use:
$ kubectl patch
This method will "patch" the existing Deployment/StatefulSet/Daemonset and add the sidecar. The downside of this method is that it's not automated like Helm and you would need to create a "patch" for every resource (each Deployment/Statefulset/Daemonset etc.). In case of any updates from other sources like Helm, this "patch" would be overridden.
Documentation about updating API objects in place:
Kubernetes.io: Docs: Tasks: Manage Kubernetes objects: Update api object kubectl patch
Edit the Helm Chart and use $ helm upgrade
This method will require editing the Helm charts. The changes made like adding a sidecar will persist through the updates. After making the changes you will need to use the $ helm upgrade RELEASE_NAME CHART.
You can read more about it here:
Helm.sh: Docs: Helm: Helm upgrade
A kubernetes ressource is immutable, as mention by dawid-kruk . Therefore modifing the pod description will cause the containers to restart.
You can modify the pod using the kubectl patch command, don't forget to reapply the. Patch as necessary.
Alternatively The two following options will inject the sidecar without having to modify/fork upstream chart or mangling deployed ressources.
#1 mutating admission controller
A mutating admission controller (webhook) can modify ressources see https://kubernetes.io/docs/reference/access-authn-authz/admission-controllers/
You can use a generic framework like opa.
Or a specific webhook like fluentd-sidecar-injector (not tested)
#2 support arbitrary sidecar in helm
You could submit a feature request to the chart mainter to supooort arbitrary sidecar injection, like in Prometheus, see https://stackoverflow.com/a/62910122/1260896

Designing K8 pod and proceses for initialization

I have a problem statement where in there is a Kubernetes cluster and I have some pods running on it.
Now, I want some functions/processes to run once per deployment, independent of number of replicas.
These processes use the same image like the image in deployment yaml.
I cannot use initcontainers and sidecars, because they will run along with main container on pod for each replica.
I tried to create a new image and then a pod out of it. But this pod keeps on running, which is not good for cluster resource, as it should be destroyed after it has done its job. Also, the main container depends on the completion on this process, in order to run the "command" part of K8 spec.
Looking for suggestions on how to tackle this?
Theoretically, You could write an admission controller webhook for intercepting create/update deployments and triggering your functions as you want. If your functions need to be checked, use ValidatingWebhookConfiguration for validating the process and then deny or accept commands.

Kubernetes rolling update vs set image

After some intense google and SO search i couldn't find any document that mentions both rolling update and set image, and can stress the difference between the two.
Can anyone shed light? When would I rather use either of those?
EDIT: It's worth mentioning that i'm already working with deployments (rather than replication controller directly) and that I'm using yaml configuration files. It would also be nice to know if there's a way to perform any of those using configuration files rather than direct commands.
In older k8s versions the ReplicationController was the only resource to manage a group of replicated pods. To update the pods of a ReplicationController you use kubectl rolling-update.
Later, k8s introduced the Deployment which manages ReplicaSet resources. The Deployment could be updated via kubectl set image.
Working with Deployment resources (as you already do) is the preferred way. I guess the ReplicationController and its rolling-update command are mainly still there for backward compatibility.
UPDATE: As mentioned in the comments:
To update a Deployment I used kubectl patch as it could also change things like adding new env vars whereas kubectl set image is rather limited and can only change the image version. Also note, that patch can be applied to all k8s resources and is not restricted to be used with a Deployment.
Later, I shifted my deployment processes to use helm - a really neat and k8s native package management tool. Can highly recommend to have a look at it.