Consider a cluster in which each node has a given taint (let's say NodeType) and a Pod can tolerate a set of NodeType values. For example, there are nodes tainted NodeType=A, NodeType=B and NodeType=C.
I'd like to be able to specify for example that some Pods tolerate NodeType=A or NodeType=C, but not NodeType=B. Other Pods (in different Deployments) would tolerate different sets. Is there a way to do this?
Yes, it appears it is possible to do so by adding multiple tolerations with the same key to the Pod's spec. An example is given in the official docs.
Here is a demo I tried which produces the desired result.
The cluster has three nodes:
kubectl get nodes
NAME STATUS AGE VERSION
dummy-0 Ready 3m17s v1.17.14
dummy-1 Ready 26m v1.17.14
dummy-2 Ready 26m v1.17.14
I tainted them as mentioned in the question using the kubectl taint command:
kubectl taint node dummy-0 NodeType=A:NoSchedule
kubectl taint node dummy-1 NodeType=B:NoSchedule
kubectl taint node dummy-2 NodeType=C:NoSchedule
I created a Deployment with three replicas and the matching tolerations:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx-nfs
      tolerations:
      - key: "NodeType"
        operator: "Equal"
        value: "A"
        effect: "NoSchedule"
      - key: "NodeType"
        operator: "Equal"
        value: "B"
        effect: "NoSchedule"
From the kubectl get pods command, we can see that the pods of the Deployment were scheduled only on the nodes dummy-0 and dummy-1 and not on dummy-2 which has a different taint:
kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE
nginx-deployment-5fc8f985d8-2pfvm 1/1 Running 0 8s 100.96.2.11 dummy-0
nginx-deployment-5fc8f985d8-hkrcz 1/1 Running 0 8s 100.96.6.10 dummy-1
nginx-deployment-5fc8f985d8-xfxsx 1/1 Running 0 8s 100.96.6.11 dummy-1
Further, it is important to understand that taints and tolerations only make sure that Pods do not get scheduled onto particular nodes; a toleration allows scheduling but does not guarantee it.
To make sure that Pods are scheduled onto particular nodes, use node affinity (affinity and anti-affinity).
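For example, here is a minimal sketch of a nodeAffinity rule that attracts the Pods to specific nodes. It assumes the nodes also carry a label such as nodetype=A (affinity matches labels, not taints, so the nodetype label and its values here are illustrative):
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nodetype        # illustrative label, e.g. kubectl label node dummy-0 nodetype=A
            operator: In
            values: ["A", "C"]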
Imagine the following Deployment definition in Kubernetes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    env: staging
spec:
  ...
I have two questions in particular:
1). The label env: staging won't be available on the created Pods. How can I access this data programmatically in client-go?
2). When a Pod is created/updated, how can I find which Deployment it belongs to?
1). The label env: staging won't be available on the created Pods. How can I access this data programmatically in client-go?
You can get the Deployment using client-go. See the example Create, Update & Delete Deployment for operations on a Deployment.
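As a quick check of where that data lives, the label sits under the Deployment's metadata.labels, which is the same field exposed by the Deployment object that client-go returns. A hedged command-line equivalent (assuming the Deployment above exists in the current namespace):
kubectl get deployment nginx-deployment -o jsonpath='{.metadata.labels.env}'
# prints: staging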
2). When a Pod is created/updated, how can I find which Deployment it belongs to?
When a Deployment is created, a ReplicaSet is created that manages the Pods.
See the ownerReferences field of a Pod to see which ReplicaSet manages it. This is described in How a ReplicaSet works.
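A hedged one-liner to print a Pod's owner directly (the Pod name is a placeholder):
kubectl get pod <pod-name> -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'
Running the same lookup on the owning ReplicaSet then points back to the Deployment.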
Hope you are enjoying your Kubernetes journey!
In fact, the label won't be available on the created Pods, but you can add it to the manifest, in the Pod template section:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
  labels:
    # Here you have the Deployment labels
    app: nginx
spec:
  selector:
    matchLabels:
      # Here you have the selector that tells the Deployment
      # (more exactly, the ReplicaSets of the Deployment)
      # which Pods to track to check that the number of replicas is respected.
      app: nginx
  ...
  template:
    metadata:
      labels:
        # Here you have the Pod labels that need to match the selector.matchLabels section
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:latest
        ...
You can check the Pods' labels by typing:
❯ k get po --show-labels
NAME READY STATUS RESTARTS AGE LABELS
nginx-deploy-6bdc4445fd-5qlhg 1/1 Running 0 7m13s app=nginx,pod-template-hash=6bdc4445fd
nginx-deploy-6bdc4445fd-pgkhb 1/1 Running 0 7m13s app=nginx,pod-template-hash=6bdc4445fd
nginx-deploy-6bdc4445fd-xdz59 1/1 Running 0 7m13s app=nginx,pod-template-hash=6bdc4445fd
You can get the Deployments' labels by typing:
❯ k get deploy --show-labels
NAME READY UP-TO-DATE AVAILABLE AGE LABELS
nginx-deploy 3/3 3 3 7m39s app=nginx
You can add a custom column to your "kubectl get po" command to display the value of the "app" label for each Pod:
❯ k get pod -L app
NAME READY STATUS RESTARTS AGE APP
nginx-deploy-6bdc4445fd-5qlhg 1/1 Running 0 8m30s nginx
nginx-deploy-6bdc4445fd-pgkhb 1/1 Running 0 8m30s nginx
nginx-deploy-6bdc4445fd-xdz59 1/1 Running 0 8m30s nginx
And you can use multiple -L flags:
❯ k get pod -L app -L test
NAME READY STATUS RESTARTS AGE APP TEST
nginx-deploy-6bdc4445fd-5qlhg 1/1 Running 0 9m46s nginx
nginx-deploy-6bdc4445fd-pgkhb 1/1 Running 0 9m46s nginx
nginx-deploy-6bdc4445fd-xdz59 1/1 Running 0 9m46s nginx
In general, Pod names begin with the name of their owner (Deployment, ReplicaSet, StatefulSet, Job, etc.).
When you use a Deployment to create a Pod, you can be sure that between the Deployment and the Pod there is a ReplicaSet (the Deployment only manages the different versions of the ReplicaSet, while the ReplicaSet only ENSURES that the current number of actual replicas matches the desired number in the manifest, using label selectors!).
So you can in fact check the ownerReferences field of a Pod by typing:
❯ kubectl get po -o custom-columns=NAME:'{.metadata.name}',OWNER:'{.metadata.ownerReferences[0].name}',OWNER_KIND:'{.metadata.ownerReferences[0].kind}'
NAME OWNER OWNER_KIND
nginx-deploy-6bdc4445fd-5qlhg nginx-deploy-6bdc4445fd ReplicaSet
nginx-deploy-6bdc4445fd-pgkhb nginx-deploy-6bdc4445fd ReplicaSet
nginx-deploy-6bdc4445fd-xdz59 nginx-deploy-6bdc4445fd ReplicaSet
You can do the same with ReplicaSets to get their owning Deployment:
❯ kubectl get rs -o custom-columns=NAME:'{.metadata.name}',OWNER:'{.metadata.ownerReferences[0].name}',OWNER_KIND:'{.metadata.ownerReferences[0].kind}'
NAME OWNER OWNER_KIND
nginx-deploy-6bdc4445fd nginx-deploy Deployment
That's how you can quickly see with kubectl who owns whom.
Here is a little reading about owners and dependents: https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/
Hope this has helped you. bguess
I'm using Oracle Cloud Infrastructure with Kubernetes and Docker. I've got the following pod:
$ kubectl describe pod $podname -n $namespace
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 19m default-scheduler 0/1 nodes are available: 1 node(s) had taint {nvidia.com/gpu: }, that the pod didn't tolerate.
Warning FailedScheduling 18m default-scheduler 0/1 nodes are available: 1 node(s) had taint {nvidia.com/gpu: }, that the pod didn't tolerate.
I want to add a toleration to this pod. Is there a command to do so without creating the pod's config YAML file? The pod is created by another system that I don't want to edit; I just want to add the toleration to resolve this issue.
Thanks.
====================
gpu-config.yaml
apiVersion: v1 # What version of the Kubernetes API to use
kind: Pod # What kind of object you want to create
metadata: # Data that helps uniquely identify the object, including a name string, UID and optional namespace
  name: nvidia-gpu-workload
spec: # What state you desire for the object; differs for every type of Kubernetes object
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: k8s.gcr.io/cuda-vector-add:v0.1
    resources:
      limits:
        nvidia.com/gpu: 1
  tolerations:
  - key: "nvidia.com/gpu"
    operator: "Equal"
    effect: "NoSchedule"
# Update command
$ kubectl create -f ./gpu-config.yaml
# All this seems to do is create a pod named nvidia-gpu-workload-v2; it doesn't add these configurations to the pod that I need them on.
Just to note that this issue is occurring on a pod called hook-image-awaiter-5tq5 and I don't think I should re-create that pod with a different config as it seems to be configured by part of the system.
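For reference, a hedged sketch of one way this could be done without a YAML file: the Pod API allows additions (and only additions) to spec.tolerations on an existing Pod, so a JSON patch along these lines may let the pending pod schedule without recreating it. The namespace placeholder and the Exists operator are assumptions, the pod must already have a tolerations list (it normally does, because default not-ready/unreachable tolerations are injected), and if the pod is ever recreated by whatever system owns it, the toleration would have to be added to that system's template instead:
$ kubectl patch pod hook-image-awaiter-5tq5 -n <namespace> --type=json \
  -p='[{"op": "add", "path": "/spec/tolerations/-", "value": {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"}}]'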
Nearly 3 years ago, Kubernetes would not carry out a rolling deployment if you had a single replica (Kubernetes deployment does not perform a rolling update when using a single replica).
Is this still the case? Is there any additional configuration required for this to work?
You are no longer required to have a minimum number of replicas to roll out an update using a Kubernetes rolling update.
I tested it in my lab (v1.17.4) and it worked like a charm with only one replica.
You can test it yourself using this Katacoda lab: Interactive Tutorial - Updating Your App
This lab is set up to create a deployment with 3 replicas. Before starting the lab, edit the deployment, change the number of replicas to one, and follow the lab steps.
I created a lab using a different example, similar to your previous scenario. Here is my deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx-container
        image: nginx:1.16.1
        ports:
        - containerPort: 80
The Deployment is running with only one replica:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-6c4699c59c-w8clt 1/1 Running 0 5s
Here I edited my nginx-deployment.yaml, changed the nginx version to nginx:latest, and rolled out my deployment by running kubectl replace:
$ kubectl replace -f nginx-deployment.yaml
deployment.apps/nginx-deployment replaced
Another option is to change the nginx version using the kubectl set image command:
kubectl set image deployment/nginx-deployment nginx-container=nginx:latest --record
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-566d9f6dfc-hmlf2 0/1 ContainerCreating 0 3s
nginx-deployment-6c4699c59c-w8clt 1/1 Running 0 48s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-566d9f6dfc-hmlf2 1/1 Running 0 6s
nginx-deployment-6c4699c59c-w8clt 0/1 Terminating 0 51s
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-566d9f6dfc-hmlf2 1/1 Running 0 13s
As you can see, everything worked normally with only one replica.
In the latest version of the documentation we can read:
Deployment ensures that only a certain number of Pods are down while they are being updated. By default, it ensures that at least 75% of the desired number of Pods are up (25% max unavailable).
Deployment also ensures that only a certain number of Pods are created above the desired number of Pods. By default, it ensures that at most 125% of the desired number of Pods are up (25% max surge).
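With replicas: 1 those defaults round to maxUnavailable: 0 (25% rounded down) and maxSurge: 1 (25% rounded up), so the new Pod is started before the old one is terminated. A sketch that simply spells those values out explicitly in the Deployment:
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never take the single replica down first
      maxSurge: 1         # allow one extra Pod during the rollout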
I would like to reserve some worker nodes for a namespace. I have seen the notes on Stack Overflow and Medium:
How to assign a namespace to certain nodes?
https://medium.com/#alejandro.ramirez.ch/reserving-a-kubernetes-node-for-specific-nodes-e75dc8297076
I understand we can use taints and nodeSelector to achieve that.
My question is: if people get to know the details of the nodeSelector or taint, how can we prevent them from deploying pods onto these dedicated worker nodes?
Thank you.
To accomplish what you need, basically you have to use taints.
Let's suppose you have a Kubernetes cluster with one Master and 2 Worker nodes:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
knode01 Ready <none> 8d v1.16.2
knode02 Ready <none> 8d v1.16.2
kubemaster Ready master 8d v1.16.2
As an example, I'll set up knode01 as Prod and knode02 as Dev.
$ kubectl taint nodes knode01 key=prod:NoSchedule
$ kubectl taint nodes knode02 key=dev:NoSchedule
To run a pod on these nodes, we have to specify a toleration in the spec section of the YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "key"
    operator: "Equal"
    value: "dev"
    effect: "NoSchedule"
This pod (pod1) will always run on knode02 because it's set up as dev. If we want to run it on prod, our tolerations should look like this:
tolerations:
- key: "key"
  operator: "Equal"
  value: "prod"
  effect: "NoSchedule"
Since we have only 2 nodes and both are set to run only prod or dev, if we try to run a pod without specifying tolerations, the pod will enter a Pending state:
$ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod0 1/1 Running 0 21m 192.168.25.156 knode01 <none> <none>
pod1 1/1 Running 0 20m 192.168.32.83 knode02 <none> <none>
pod2 1/1 Running 0 18m 192.168.25.157 knode01 <none> <none>
pod3 1/1 Running 0 17m 192.168.32.84 knode02 <none> <none>
shell-demo 0/1 Pending 0 16m <none> <none> <none> <none>
To remove a taint:
$ kubectl taint nodes knode02 key:NoSchedule-
This is how it can be done:
Add a new label, say ns=reserved, to a specific worker node.
Add a taint to that node and matching tolerations to the target pods so that only those pods land on this worker node (see the sketch below).
Define RBAC roles and role bindings in that namespace to control what other users can do.
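A minimal sketch of the first two steps, with illustrative names (the ns=reserved label and the reserved taint key are assumptions; pick your own):
$ kubectl label node <worker-node> ns=reserved
$ kubectl taint node <worker-node> reserved=true:NoSchedule
And a pod that targets that node:
apiVersion: v1
kind: Pod
metadata:
  name: reserved-pod
spec:
  containers:
  - name: app
    image: nginx
  nodeSelector:
    ns: reserved            # only consider nodes carrying the label
  tolerations:
  - key: "reserved"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"    # allow scheduling onto the tainted node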
I am new to Kubernetes and trying to deploy OpenStack on a Kubernetes cluster; below is the error I see when I try to deploy OpenStack. I am following the OpenStack docs to deploy.
kube-system ingress-error-pages-56b4446784-crl85 0/1 Pending 0 1d
kube-system ingress-error-pages-56b4446784-m7jrw 0/1 Pending 0 5d
I have a Kubernetes cluster with one master and one node running on Debian 9. I encountered this error during the OpenStack installation on Kubernetes.
kubectl describe pod shows the events below:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m (x7684 over 1d) default-scheduler 0/2 nodes are available: 1 PodToleratesNodeTaints, 2 MatchNodeSelector.
All I see is a failed scheduling. Even the container logs for the kube-scheduler show that it failed to schedule the pod, but they don't say why. I have been stuck at this step for the past few hours trying to debug...
PS: I am running Debian 9, Kubernetes v1.9.2+coreos.0, Docker 17.03.1-ce
Any help appreciated ....
It looks like you have a toleration on your Pod but don't have nodes with the taints for those tolerations. It would help to post the definition of your Ingress and its corresponding Deployment or DaemonSet.
You would generally taint your node(s) like this:
kubectl taint nodes <your-node> key=value:NoSchedule
Then on your PodSpec something like this:
tolerations:
- key: "key"
  operator: "Equal"
  value: "value"
  effect: "NoSchedule"
It could also be because of missing labels on your node that your Pod needs in the nodeSelector field:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    cpuType: haswell
Then you'd add a label to your node:
kubectl label nodes kubernetes-foo-node-1 cpuType=haswell
Hope it helps!