How to change the behaviour of kube-scheduler in EKS? - kubernetes

I am new at Kubernetes and completely new to setting it up in EKS.
I am trying to share a GPU between multiple pods. Going through a few documents and articles, I found that I should update the kube-scheduler configuration with parameters that would then allow me to make the necessary changes to enable GPU sharing between pods.
Question
How do I update the kube-scheduler configuration in EKS? If updating the configuration is not possible, is there some other way I can set up a kube-scheduler only for those pods which require a GPU?

I think you need a custom kube-scheduler, and your pods need to be able to specify whether they want to use the default or the custom scheduler.
Kubernetes supports this: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/ -- basically, create a .yaml file, run kubectl create -f on it, and you should see your scheduler. You'll want to run it in the kube-system namespace, and give it a unique name (so your pods have a way of saying which scheduler they want).
I haven't done this in EKS myself, but I would be very surprised if you couldn't run a custom scheduler in EKS. Moreover, this AWS blog post https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/, which sounds similar to what you're looking for, seems like it would require a custom scheduler.
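For reference, a pod opts into a custom scheduler through the schedulerName field of its spec; pods without that field keep using the default kube-scheduler. A minimal sketch, assuming the custom scheduler registers itself as my-custom-scheduler (the name, image and GPU resource name are placeholders that depend on your setup):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  schedulerName: my-custom-scheduler             # must match the name your custom scheduler runs under
  containers:
    - name: app
      image: nvidia/cuda:11.4.2-base-ubuntu20.04 # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1                      # or the shared/virtual GPU resource your device plugin exposes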

Related

Any way we can add an ENV to a pod or a new pod in kubernetes?

Summarize the problem:
Any way we can add an ENV to a pod or a new pod in kubernetes?
For example, I want to add HTTP_PROXY to many pods, and to the new pods that Kubeflow 1.4 will generate, so that these pods can access the internet.
Describe what you’ve tried:
I searched and found that Istio can maybe do that, but it's too complex for me.
Second, there are too many YAMLs in Kubeflow, so I cannot modify them one by one to use a ConfigMap or add the ENV directly.
So does anyone have a good, simple way to do this? For example, doing it in the Kubernetes configuration.
Use "PodPreset" object to inject common environment variables and other params to all the matching pods.
Please follow the article below:
https://v1-19.docs.kubernetes.io/docs/tasks/inject-data-application/podpreset/
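On clusters at v1.19 or earlier (PodPreset is an alpha API that has to be enabled, and it was removed in v1.20), a minimal sketch of a PodPreset that injects proxy variables into every pod carrying a matching label; the preset name, label and proxy address are placeholders:

apiVersion: settings.k8s.io/v1alpha1
kind: PodPreset
metadata:
  name: proxy-env
spec:
  selector:
    matchLabels:
      inject-proxy: "true"                     # pods need this label to be matched
  env:
    - name: HTTP_PROXY
      value: "http://proxy.example.com:3128"   # placeholder proxy address
    - name: HTTPS_PROXY
      value: "http://proxy.example.com:3128"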
PodPreset was indeed removed in v1.20, so on newer clusters you will need a webhook instead.
You will have to run an additional service in your cluster that will change the configuration of the pods.
Here is an example on the basis of which I created my own webhook that changed the configuration of pods in the cluster. In this example the developer adds a sidecar to the pod, but you can implement your own logic to inject the required ENV:
https://github.com/morvencao/kube-mutating-webhook-tutorial/blob/master/medium-article.md
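Once a webhook server like the one in that tutorial is running in the cluster, it is registered with a MutatingWebhookConfiguration. A rough sketch, assuming a hypothetical Service named env-injector in kube-system serving the mutation endpoint over TLS:

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: env-injector
webhooks:
  - name: env-injector.example.com         # placeholder; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore                  # don't block pod creation if the webhook is down
    clientConfig:
      service:
        name: env-injector                 # hypothetical Service fronting your webhook server
        namespace: kube-system
        path: /mutate
      # caBundle: <base64-encoded CA that signed the webhook's serving certificate>
    rules:
      - operations: ["CREATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]

The webhook server itself then returns a JSON patch that appends the HTTP_PROXY/HTTPS_PROXY environment variables to each container.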

Designing K8s pods and processes for initialization

I have a problem statement wherein there is a Kubernetes cluster and I have some pods running on it.
Now, I want some functions/processes to run once per deployment, independent of number of replicas.
These processes use the same image as the one in the deployment YAML.
I cannot use init containers or sidecars, because they would run alongside the main container for each replica.
I tried to create a new image and then a pod out of it, but this pod keeps running, which is not good for cluster resources, as it should be destroyed after it has done its job. Also, the main container depends on the completion of this process in order to run the "command" part of the K8s spec.
Looking for suggestions on how to tackle this?
Theoretically, you could write an admission controller webhook that intercepts deployment create/update requests and triggers your functions as you want. If the result of your functions needs to be checked, use a ValidatingWebhookConfiguration to validate the process and then deny or accept the request.
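A rough sketch of such a registration, assuming a hypothetical Service named init-runner in kube-system that runs the per-deployment work and then accepts or rejects the request:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: deployment-init-check
webhooks:
  - name: deployment-init-check.example.com   # placeholder; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail                        # reject the deployment if the check cannot run
    clientConfig:
      service:
        name: init-runner                      # hypothetical Service for your webhook server
        namespace: kube-system
        path: /validate
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]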

Kubernetes: run a job on node reboot only once

Is there a way to run a job only once upon machine reboot in Kubernetes?
I thought of running a CronJob as a static pod, but it seems kubelet does not like it.
Edit: Based on the replies, I'd like to clarify: I'm only looking at doing this through native Kubernetes. I'm OK with writing a CronJob in Kubernetes, but I need it to run only once, and only upon node reboot.
I'm not sure what your platform is.
For example, in AWS an EC2 instance has user_data (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html).
It will run commands on your Linux/Windows instance at launch.
You should be able to find similar solutions for other cloud providers or on-premise servers.
If I understand you correctly, you should consider using a DaemonSet:
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
That way you could create a container with a job that would be run from a DaemonSet.
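A minimal sketch of that pattern: the one-shot work goes into an init container, and a pause container keeps the DaemonSet pod alive afterwards (the image names and command are placeholders). Because the pod's containers are recreated after a node reboot, the init step should run again on each boot:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-boot-task
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: node-boot-task
  template:
    metadata:
      labels:
        app: node-boot-task
    spec:
      initContainers:
        - name: run-once
          image: busybox:1.36                    # placeholder; use the image containing your job
          command: ["sh", "-c", "echo boot task on $(NODE_NAME)"]   # placeholder command
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9       # keeps the pod alive after the init step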
Alternatively, you could consider a DaemonJob:
This is an example CompositeController that's similar to Job, except that a pod will be scheduled to each node, similar to DaemonSet.
There are also:
Kubebuilder: a framework for building Kubernetes APIs using custom resource definitions (CRDs).
Metacontroller: an add-on for Kubernetes that makes it easy to write and deploy custom controllers in the form of simple scripts.
But the first option that I have provided would be easier to implement in my opinion.
Please let me know if that helped.
Consider using a CronJob for this. It takes the same format as the normal cron scheduler on Linux.
Besides the usual format (minute / hour / day of month / month / day of week) that is widely used to indicate a schedule, the cron scheduler also allows the use of @reboot.
This directive, followed by the absolute path to the script, will cause it to run when the machine boots.

Is it possible to add/modify kubernetes container spec based on clusterwide setting

I have a kubernetes-based application that uses an operator to build and deploy containers in pods. Sometimes I'd like to run containers in privileged mode to enable performance tracing, but since I'm not deploying the pod/containers directly from a manifest, I cannot simply add privileged mode and the debugfs filesystem mount.
That leaves me to fork the operator code, change where it builds the container spec, and redeploy with the modified operator. Doable, but awkward.
So my question is: is it possible to impose additional attributes on container specs based on some cluster-wide setting, either before the pods are deployed by the operator, or by modifying the container spec after deployment? I tried the latter with kubectl edit pod mypod, but that didn't work.
This is on a physical cluster installed with kubespray.
There are three things to consider:
1. Your operator can create a controller (e.g. a Deployment) instead of a bare Pod, which allows modifications in the pod spec area, thus triggering the Deployment's rollout (see rolling update strategy).
2. Use a MutatingAdmissionWebhook, so that before the Pod is created its manifest is modified/overwritten on the fly (see the sketch below for the kind of fields such a webhook would add). More info regarding MutatingAdmissionWebhook can be found here and here.
3. A workaround in the form of modifying the supplied spec and swapping out the pod. More about this was discussed here.
Please let me know if any of the above helped.
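For reference, whichever route you take, the fields that need to end up in the container spec for this kind of tracing look roughly like the following; the pod name and image are placeholders, since in practice the operator builds the real spec:

apiVersion: v1
kind: Pod
metadata:
  name: traced-app
spec:
  containers:
    - name: app
      image: nginx:1.25              # placeholder; in practice the operator-built image
      securityContext:
        privileged: true             # the attribute the webhook/operator would add
      volumeMounts:
        - name: debugfs
          mountPath: /sys/kernel/debug
  volumes:
    - name: debugfs
      hostPath:
        path: /sys/kernel/debug
        type: Directory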

Kubernetes: Is there a way to update "nodeAffinity" and "schedulerName" in Pod configuration via the Kubernetes API when the Pod is in Pending state?

I have some specific pod-scheduling needs. So I am writing my own custom scheduler.
However, I wonder if the custom scheduler can simply update the pod configuration, specifically the "nodeAffinity" and "schedulerName" fields, so the scheduling can be delegated back to kube-scheduler.
This way, the custom scheduler need not emulate the entire kube-scheduler, and can make use of any updates in kube-scheduler in the future versions.
This is why I am asking whether a Job or Pod running in the cluster can update the nodeAffinity and schedulerName fields of the pod configuration - say via the Kubernetes API or in any other way.
You can modify the validating part of validation.go (the API server code that rejects updates to those pod fields). I did that and it succeeded.