See Every Configuration Field (a Schema?) for Kubernetes REST Objects - kubernetes

I'm new to Kubernetes (K8s). It's my understanding that in order to "do things" in a Kubernetes cluster, we interact with a Kubernetes REST API endpoint and create/update/delete objects. When these objects are created/updated/deleted, K8s will see those changes and take steps to bring the system in line with the state of your objects.
In other words, you tell K8s you want a "deployment object" with container image foo/bar and 10 replicas and K8s will create 10 running pods with the foo/bar image. If you update the deployment to say you want 20 replicas, K8s will start more pods.
My Question: Is there a canonical description of all the possible configuration fields for these objects? That is -- tutorials like this one do a good job of describing the simplest possible configuration to get an object like a deployment working, but now I'm curious what else it's possible to do with deployments that goes beyond these hello world examples.

Is there a canonical description of all the possible configuration fields for these objects?
Yes, there is the Kubernetes API reference e.g. for Deployment.
But when developing, the easiest way is to use kubectl explain <resource> and navigate deeper, e.g.:
kubectl explain Deployment.spec
and then deeper, e.g.:
kubectl explain Deployment.spec.template
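To make the "navigate deeper" idea concrete, here is a toy Python sketch of walking a dotted path through a nested schema. It is not how kubectl explain is implemented (kubectl reads the real schema from the API server). TOY_SCHEMA and the explain function are hypothetical and heavily trimmed, although the field names in it are real Deployment fields.

```python
# Toy model only: kubectl explain reads the real schema from the API server.
# TOY_SCHEMA is a hypothetical, heavily trimmed stand-in for that schema.
TOY_SCHEMA = {
    "Deployment": {
        "spec": {
            "replicas": "integer",
            "selector": {"matchLabels": "map of string to string"},
            "template": {
                "metadata": {"labels": "map of string to string"},
                "spec": {"containers": "list of Container"},
            },
        },
    },
}


def explain(path, schema=TOY_SCHEMA):
    """Return the sorted field names found under a dotted path."""
    node = schema
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(f"unknown field {part!r} in path {path!r}")
        node = node[part]
    # A leaf is stored as a type description string, so it has no sub-fields.
    return sorted(node) if isinstance(node, dict) else []


print(explain("Deployment.spec"))
print(explain("Deployment.spec.template"))
```

Each extra path segment simply descends one level, which is why you can keep appending field names to the kubectl explain argument until you reach a leaf.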

Related

How to change the behaviour of kube-scheduler in EKS?

I am new at Kubernetes and completely new to setting it up in EKS.
I am trying to share a GPU between multiple pods. After going through a few documents and articles, I found that I should update the kube-scheduler configuration with parameters that would let me make the changes needed to enable GPU sharing between pods.
Question
How do I update the kube-scheduler configuration in EKS? If updating the configuration is not possible, is there some other way I can set up kube-scheduler for only those pods that require a GPU?
I think you need a custom kube-scheduler, and for your pods to be able to specify whether they want to use the default or the custom scheduler.
Kubernetes supports this: https://kubernetes.io/docs/tasks/extend-kubernetes/configure-multiple-schedulers/ -- basically, create a .yaml file, run kubectl create -f on it, and you should see your scheduler. You'll want to run it in the kube-system namespace, and give it a unique name (so your pods have a way of saying which scheduler they want).
I haven't done this in EKS myself, but would be very surprised if you couldn't run a custom scheduler in EKS. Moreover, this AWS blog post https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/, which sounds similar to what you're looking for, seems like it would require a custom scheduler.
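As a rough illustration of how a pod says which scheduler it wants: the Pod spec has a schedulerName field, and leaving it out means the default scheduler is used. The sketch below builds such a manifest in Python and prints it as JSON (kubectl accepts JSON manifests as well as YAML). The names "my-scheduler" and "gpu-pod", the image, and the pod_manifest helper are all placeholders I made up, not anything EKS-specific.

```python
import json

# Hypothetical names: "my-scheduler", "gpu-pod" and the image are placeholders.
CUSTOM_SCHEDULER = "my-scheduler"


def pod_manifest(name, image, scheduler_name=None):
    """Build a minimal Pod manifest; omit schedulerName to use the default scheduler."""
    spec = {"containers": [{"name": name, "image": image}]}
    if scheduler_name is not None:
        # spec.schedulerName must match the name the custom scheduler runs under.
        spec["schedulerName"] = scheduler_name
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = pod_manifest("gpu-pod", "example.com/gpu-app:1.0", CUSTOM_SCHEDULER)
print(json.dumps(manifest, indent=2))
```

Only the pods that set schedulerName are handled by the custom scheduler, so the rest of the cluster keeps using the default one.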

Manually creating and editing Kubernetes objects

Most Kubernetes objects can be created with kubectl create, but if you need e.g. a DaemonSet — you're out of luck.
On top of that, the objects being created through kubectl can only be customized minimally (e.g. kubectl create deployment allows you to only specify the image to run and nothing else).
So, considering that Kubernetes actually expects you to either edit a minimally configured object with kubectl edit to suit your needs or write a spec from scratch and then use kubectl apply to apply it, how does one figure out all possible keywords and their meanings to properly describe the object they need?
I expected to find something similar to the Docker Compose file reference, but when looking at the DaemonSet docs, I found only a single example spec that doesn't even explain most of its keys.
The spec of the resources in .yaml file that you can run kubectl apply -f on is described in Kubernetes API reference.
Considering DaemonSet, its spec is described here. Its template is actually the same as in the Pod resource.

kubernetes - kubectl run vs create and apply

I'm just getting started with kubernetes and setting up a cluster on AWS using kops. In many of the examples I read (and try), there will be commands like:
kubectl run my-app --image=mycompany/myapp:latest --replicas=1 --port=8080
kubectl expose deployment my-app --port=80 --type=LoadBalancer
This seems to do several things behind the scenes, and I can view the manifest files created using kubectl edit deployment, and so forth. However, I see many examples where people are creating the manifest files by hand, and using commands like kubectl create -f or kubectl apply -f.
Am I correct in assuming that both approaches accomplish the same goals, but that by creating the manifest files yourself, you have a finer grain of control?
Would I then have to be creating Service, ReplicationController, and Pod specs myself?
Lastly, if you create the manifest files yourself, how do people generally structure their projects as far as storing these files? Are they simply in a directory alongside the project they are deploying?
The fundamental question is how to apply all of the K8s objects into the k8s cluster. There are several ways to do this job.
Using Generators (Run, Expose)
Using Imperative way (Create)
Using Declarative way (Apply)
All of the above ways have a different purpose and level of simplicity. For instance, if you want to check quickly whether the container is working as you desired, then you might use generators.
If you want to version control the k8s objects, then it's better to use the declarative way, which helps us determine the accuracy of the data in the k8s objects.
Deployments, ReplicaSets and Pods are different layers which solve different problems. All of these concepts provide flexibility to k8s.
Pods: They make sure that related containers run together, which provides efficiency.
ReplicaSet: It makes sure that the k8s cluster has the desired number of replicas of the pods.
Deployment: It makes sure that you can have different versions of Pods and provides the capability to roll back to the previous version.
Lastly, it depends on the use case how you want to use these concepts or methodologies. It's not about which is good or which is bad.
There is a little more nuance to the difference between apply and create than what is already mentioned here. Kubectl create can be used imperatively on the command line or declaratively against a manifest file.
Kubectl apply is used declaratively against a manifest file. You can't use kubectl apply imperatively.
One key difference is when you already have an object and you want to update something. Even if you used a manifest file with kubectl create, you will get an error when you use kubectl create again to update the same resource. But, if you use kubectl apply, you will not get an error. It will update the resource without any issues.
So, the convention is to use kubectl apply to create AND update resources, kubectl create is used to create resources, and kubectl run is used to create a pod with a specific image, namespace, etc. for experimentation and testing with the --dry-run=client option.
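The create-versus-apply difference described above can be sketched with a toy in-memory store in Python. This is an illustration only: ToyCluster and AlreadyExists are hypothetical, and the real kubectl apply computes and sends a patch rather than overwriting the stored object.

```python
class AlreadyExists(Exception):
    """Raised when create targets an object that is already stored."""


class ToyCluster:
    """In-memory stand-in for the API server's object store (illustration only)."""

    def __init__(self):
        self.objects = {}

    @staticmethod
    def _key(manifest):
        meta = manifest["metadata"]
        return (manifest["kind"], meta.get("namespace", "default"), meta["name"])

    def create(self, manifest):
        # create refuses to touch an object that already exists.
        key = self._key(manifest)
        if key in self.objects:
            raise AlreadyExists(f"{key} already exists")
        self.objects[key] = manifest

    def apply(self, manifest):
        # Simplification: real kubectl apply computes a patch; here we just overwrite.
        self.objects[self._key(manifest)] = manifest


cluster = ToyCluster()
v1 = {"kind": "Deployment", "metadata": {"name": "my-app"}, "spec": {"replicas": 1}}
v2 = {"kind": "Deployment", "metadata": {"name": "my-app"}, "spec": {"replicas": 3}}

cluster.create(v1)
try:
    cluster.create(v2)
    second_create_failed = False
except AlreadyExists:
    second_create_failed = True

cluster.apply(v2)
replicas = cluster.objects[("Deployment", "default", "my-app")]["spec"]["replicas"]
print(second_create_failed, replicas)
```

The second create fails because the object already exists, while apply on the same manifest goes through and leaves the stored object with the new replica count.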

How does k8s know which pod to update?

I'm currently getting started with Kubernetes, and so far, I have a question that I could not find answered anywhere.
Until now, I have learned what containers, pods, and replica sets are. I basically understand the things, but one thing I did not get is: If I update a manifest of a pod (or of a replica set), and re-POST it to k8s - how does k8s know which existing manifest this refers to?
Is this matching done by the manifest's name, i.e. by the name of the pod or the replica set? Or …?
In other words: If I update a manifest, how does k8s know that it is an updated one, and how does it detect which one is the one with the previous version?
You are right, k8s uses metadata.name for identifying resources. That name is unique per resource type (Pod/ReplicaSet/...) and namespace.
Well, for starters, let's get things straight. When you update a manifest, it is obvious what to update in the first place: the object you updated, i.e. the Deployment or ReplicaSet. Now, when that is updated, the RollingUpdate kicks in, and this is what I assume you wonder about, as well as how ownership of a pod is established in general. If you run kubectl get pod -o yaml you can find keys like ownerReferences, pod-template-hash and kubernetes.io/created-by, which should be rather obvious when you see the content. In the other direction (so not from the Pod but from the Deployment) you have a selector field which defines what labels are used to filter pods to find the right ones.
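The selector direction can be sketched in a few lines of Python. This is a simplified model that only handles matchLabels (real selectors also support matchExpressions), and the pod names, label values and the matches helper are made up for illustration.

```python
def matches(selector, labels):
    """True if every key/value pair in the selector's matchLabels appears in labels."""
    wanted = selector.get("matchLabels", {})
    return all(labels.get(key) == value for key, value in wanted.items())


# Hypothetical pods; only the labels matter for selection.
pods = [
    {"name": "web-5d9c-abcde", "labels": {"app": "web", "pod-template-hash": "5d9c"}},
    {"name": "web-5d9c-fghij", "labels": {"app": "web", "pod-template-hash": "5d9c"}},
    {"name": "db-7f6b-klmno", "labels": {"app": "db", "pod-template-hash": "7f6b"}},
]

selector = {"matchLabels": {"app": "web"}}
selected = [pod["name"] for pod in pods if matches(selector, pod["labels"])]
print(selected)
```

Extra labels on a pod (such as pod-template-hash here) do not prevent a match; the selector only requires that its own key/value pairs are present.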

What is stored in a kubernetes job and how do I check resource use of old job(s)?

This morning I learned about the (unfortunate) default in kubernetes of all previously run cronjobs' jobs instances being retained in the cluster. Mea culpa for not reading that detail in the documentation. I also notice that deleting jobs (kubectl delete job [<foo> or --all]) takes quite a long time. Further, I noticed that even a reasonably provisioned kubernetes cluster with three large nodes appears to fail (get timeouts of all sorts when trying to use kubectl) when there are just ~750 such old jobs in the system (plus some other active containers that otherwise had not entailed heavy load) [Correction: there were also ~7k pods associated with those old jobs that were also retained :-o]. (I did learn about the configuration settings to limit/avoid storing old jobs from cronjobs, so this won't be a problem [for me] in the future.)
So, since I couldn't find documentation for kubernetes about this, my (related) questions are:
what exactly is stored when kubernetes retains old jobs? (Presumably it's the associated pod's logs and some metadata, but this doesn't explain why they seemed to place such a load on the cluster.)
is there a way to see the resources (disk only, I assume, but maybe there is some other resource) that individual or collective old jobs are using?
why does deleting a kubernetes job take on the order of a minute?
I don't know if k8s provides that kind of detail about which job is consuming how much disk space, but here is something you can try.
Try to find the pods associated with the job:
kubectl get pods --selector=job-name=<job name> --output=jsonpath='{.items..metadata.name}'
Once you know the pod then find the docker container associated with it:
kubectl describe pod <pod name>
In the above output, look for Node and Container ID. Now go to that node and, on that node, go to the path /var/lib/docker/containers/<container id found above>. There you can do some investigation to find out what is wrong.