Cilium topologyKey: kubernetes.io/hostname: pending pod, didn't match pod anti-affinity rules - cilium

I am trying to install Cilium, but I get the error: didn't match pod anti-affinity rules
kubectl get pod cilium-operator-69b677f97c-m4rjw -n kube-system -o yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          io.cilium/app: operator
      topologyKey: kubernetes.io/hostname
I am running on a single node and have already done the following (in case that is the issue):
kubectl taint nodes --all node-role.kubernetes.io/control-plane-

You don't need the pod anti-affinity if you are running Cilium on a single node.
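If the pending pod is a second cilium-operator replica (the Helm chart commonly runs two replicas for high availability, and the required anti-affinity forbids two replicas on the same node), the simplest fix on a single-node cluster is to run only one replica. A minimal sketch, assuming Cilium was installed with Helm under the release name cilium and that the chart exposes an operator.replicas value:

# run a single cilium-operator replica on a single-node cluster
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set operator.replicas=1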

Related

Kubernetes Pods accessible from outside cluster

I have two Kubernetes clusters. I have run an Nginx server pod on one cluster; its pod IP is 10.40.0.1. I can ping 10.40.0.1 from any node of that cluster, but when I ping it from a node of the second cluster it does not work. How should I make the pod accessible from the second cluster's nodes as well?
I have deployed Nginx server with the below YAML file.
apiVersion: v1
kind: Pod
metadata:
  name: serverpod
spec:
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - Node1
I have tried hostNetwork: true but it is not working well.
You have posted a pod spec with nodeAffinity in your question, which means your pod will always run on Node1.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - Node1
If you set hostNetwork: true, you can reach the pod with curl <IP of Node1, or just Node1 if the name resolves to an IP>. You can also expose the pod via kubectl expose pod serverpod --type NodePort --name serverpod --port 80; the Service is then assigned a NodePort in the 30000-32767 range (check it with kubectl get svc serverpod), and curl <any node IP>:<nodePort> will be routed to your pod by kube-proxy. Both methods work out of the box and do not require you to install any load balancer, ingress controller or service mesh.
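If you need a fixed node port such as 31000, kubectl expose cannot set the nodePort field directly, but a Service manifest can. A minimal sketch, assuming you add a label such as app: serverpod to the pod (the pod spec above has no labels, so the selector below is an assumption):

apiVersion: v1
kind: Service
metadata:
  name: serverpod
spec:
  type: NodePort
  selector:
    app: serverpod     # assumed label; add it to the pod metadata
  ports:
  - port: 80           # Service port
    targetPort: 80     # containerPort of the nginx container
    nodePort: 31000    # must be within the cluster's NodePort range (default 30000-32767)

After applying this, curl <any node IP>:31000 reaches the pod from outside the cluster.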

Can a Pod with an affinity for one node's label, but without a toleration for that node's taint, be scheduled onto that node?

Say you have Node1 with taint node1taint:NoSchedule and label node1specialkey=asdf.
And Node2 with no taints.
Then you create PodA with affinity to Node1:
apiVersion: v1
kind: Pod
metadata:
  name: poda
  labels:
    name: PodA
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node1specialkey
            operator: Exists
  containers:
  - image: busybox
    name: poda
Which node should the pod schedule to? Will the affinity override the taint?
Thanks!
The pod will not schedule anywhere, because it does not tolerate Node1's taint and it does not have an affinity for Node2.
Here is the missing toleration that would, in combination with the affinity, allow PodA to be scheduled on Node1.
tolerations:
- key: "node1taint"
  operator: "Exists"
  effect: "NoSchedule"
A taint is more powerful than an affinity. The pod needs the toleration, too, because affinity alone is not strong enough here in Kubernetes-land.
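For completeness, a minimal sketch of PodA with both pieces in place (the affinity from the question plus the toleration above); the sleep command is only there to keep the busybox container running:

apiVersion: v1
kind: Pod
metadata:
  name: poda
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node1specialkey
            operator: Exists
  tolerations:
  - key: "node1taint"
    operator: "Exists"
    effect: "NoSchedule"
  containers:
  - image: busybox
    name: poda
    command: ["sleep", "3600"]   # keep the container alive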

Kubernetes: How To Ensure That Pod A Only Ends Up On Nodes Where Pod B Is Running

I have two use cases where teams only want Pod A to end up on a Node where Pod B is running. They often have many copies of Pod B running on a Node, but they only want one copy of Pod A running on that same Node.
Currently they are using DaemonSets to manage Pod A, which is not effective because Pod A then ends up on a lot of nodes where Pod B is not running. I would prefer not to restrict the nodes they can end up on with labels, because that would limit the node capacity for Pod B (i.e. if we have 100 nodes and only 20 are labeled, then Pod B's possible capacity is only 20 nodes).
In short, how can I ensure that one copy of Pod A runs on any Node with at least one copy of Pod B running?
The current scheduler doesn’t really have anything like this. You would need to write something yourself.
As per my understanding, you have a Kubernetes cluster with N nodes and some pods of type B are scheduled on it. Now you want at most one pod of type A on each node that runs at least one pod of type B. I assume that A <= N and A <= B, while B itself may be greater or smaller than N.
You are currently using a DaemonSet controller to schedule the A pods, and it doesn't work as you want. But you can fix that by letting the DaemonSet pods be scheduled by the default scheduler instead of the DaemonSet controller, which schedules its pods without considering pod priority and preemption.
ScheduleDaemonSetPods allows you to schedule DaemonSets using the default scheduler instead of the DaemonSet controller, by adding the NodeAffinity term to the DaemonSet pods, instead of the .spec.nodeName term. The default scheduler is then used to bind the pod to the target host. If node affinity of the DaemonSet pod already exists, it is replaced. The DaemonSet controller only performs these operations when creating or modifying DaemonSet pods, and no changes are made to the spec.template of the DaemonSet.
In addition, node.kubernetes.io/unschedulable:NoSchedule toleration is added automatically to DaemonSet Pods. The default scheduler ignores unschedulable Nodes when scheduling DaemonSet Pods.
So if we add podAffinity/podAntiAffinity to a DaemonSet, N replicas (one per node) will still be created, but the pods will only be scheduled on the nodes that match the (anti-)affinity condition; the rest will remain in the Pending state.
Here is an example of such a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: ds-splunk-sidecar
  namespace: default
  labels:
    k8s-app: ds-splunk-sidecar
spec:
  selector:
    matchLabels:
      name: ds-splunk-sidecar
  template:
    metadata:
      labels:
        name: ds-splunk-sidecar
    spec:
      affinity:
        # podAntiAffinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - splunk-app
            topologyKey: "kubernetes.io/hostname"
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: ds-splunk-sidecar
        image: nginx
      terminationGracePeriodSeconds: 30
The output of kubectl get pods -o wide | grep splunk:
ds-splunk-sidecar-26cpt 0/1 Pending 0 4s <none> <none> <none> <none>
ds-splunk-sidecar-8qvpx 1/1 Running 0 4s 10.244.2.87 kube-node2-2 <none> <none>
ds-splunk-sidecar-gzn7l 0/1 Pending 0 4s <none> <none> <none> <none>
ds-splunk-sidecar-ls56g 0/1 Pending 0 4s <none> <none> <none> <none>
splunk-7d65dfdc99-nz6nz 1/2 Running 0 2d 10.244.2.16 kube-node2-2 <none> <none>
The output of kubectl get pod ds-splunk-sidecar-26cpt -o yaml (the pod that is in the Pending state). The nodeAffinity section is automatically added to pod.spec without affecting the parent DaemonSet configuration:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-04-02T13:10:23Z"
  generateName: ds-splunk-sidecar-
  labels:
    controller-revision-hash: 77bfdfc748
    name: ds-splunk-sidecar
    pod-template-generation: "1"
  name: ds-splunk-sidecar-26cpt
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: DaemonSet
    name: ds-splunk-sidecar
    uid: 4fda6743-74e3-11ea-8141-42010a9c0004
  resourceVersion: "60026611"
  selfLink: /api/v1/namespaces/default/pods/ds-splunk-sidecar-26cpt
  uid: 4fdf96d5-74e3-11ea-8141-42010a9c0004
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchFields:
          - key: metadata.name
            operator: In
            values:
            - kube-node2-1
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - splunk-app
        topologyKey: kubernetes.io/hostname
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: ds-splunk-sidecar
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-mxvh9
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/disk-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/pid-pressure
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/unschedulable
    operator: Exists
  volumes:
  - name: default-token-mxvh9
    secret:
      defaultMode: 420
      secretName: default-token-mxvh9
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-04-02T13:10:23Z"
    message: '0/4 nodes are available: 1 node(s) didn''t match pod affinity rules,
      1 node(s) didn''t match pod affinity/anti-affinity, 3 node(s) didn''t match
      node selector.'
    reason: Unschedulable
    status: "False"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
Alternatively, you can achieve similar results using a Deployment controller.
Since Deployments can only be autoscaled based on Pod metrics (unless you write your own HPA), you have to set the number of A replicas equal to N manually. If there is a node without any Pod B, one Pod A will stay in the Pending state.
There is an almost exact example of the setup described in the question, using the requiredDuringSchedulingIgnoredDuringExecution directive. Please see the section "More Practical Use-cases: Always co-located in the same node" of the "Assigning Pods to Nodes" documentation page:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deplA
spec:
  selector:
    matchLabels:
      app: deplA
  replicas: N # <---- number of nodes in the cluster <= replicas of deplB
  template:
    metadata:
      labels:
        app: deplA
    spec:
      affinity:
        podAntiAffinity: # prevents scheduling more than one Pod A on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - deplA
            topologyKey: "kubernetes.io/hostname"
        podAffinity: # ensures that Pod A is scheduled only if Pod B is present on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - deplB
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.16-alpine
There is only one problem, the same for both approaches: if Pod B is rescheduled to a different node for any reason and no Pod B remains on the node, Pod A will not be evicted from that node automatically.
That problem could be solved by a CronJob that runs a kubectl image with a proper service account and, every ~5 minutes, deletes every Pod A that has no corresponding Pod B on the same node. (Please search for an existing solution on Stack or ask another question about the script content.)
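A rough sketch of such a CronJob, assuming Pod A is labelled app=deplA and Pod B app=deplB (the labels from the Deployment example above), and that a ServiceAccount named pod-cleaner with permissions to list and delete pods already exists:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: poda-orphan-cleaner
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: pod-cleaner    # assumed SA with list/delete permissions on pods
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl           # any image with kubectl on the PATH works
            command:
            - /bin/sh
            - -c
            - |
              # for every node that currently runs a Pod A...
              for node in $(kubectl get pods -l app=deplA -o jsonpath='{.items[*].spec.nodeName}' | tr ' ' '\n' | sort -u); do
                # ...count the Pod B instances on that node...
                b_count=$(kubectl get pods -l app=deplB --field-selector spec.nodeName=$node -o name | wc -l)
                # ...and delete the Pod A copies if there are none left
                if [ "$b_count" -eq 0 ]; then
                  kubectl delete pods -l app=deplA --field-selector spec.nodeName=$node
                fi
              done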
As already explained by coderanger, the current scheduler doesn't support this function. The ideal solution would be to create your own scheduler that supports such functionality.
However, you can use podAffinity to partially achieve this and schedule the pods on the same node.
spec:
  affinity:
    podAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - <your_value>
          topologyKey: "kubernetes.io/hostname"
It will try to schedule pods as tightly as possible.

Kubernetes pods scheduled to non-tainted node

I have created a GKE Kubernetes cluster with two workloads deployed on it, and there is a separate node pool for each workload. The node pool for the celery workload is tainted with celery-node-pool=true.
The pod's spec has the following toleration:
tolerations:
- key: "celery-node-pool"
  operator: "Exists"
  effect: "NoSchedule"
Despite the node taint and the toleration, some of the pods from the celery workload are deployed to the non-tainted node. Why is this happening, and am I doing something wrong? What other taints and tolerations should I add to keep the pods on specific nodes?
Using Taints:
Taints allow a node to repel a set of pods. You have not specified the effect in the taint; it should be celery-node-pool=true:NoSchedule. Also, a toleration only lets the pod onto the tainted node, it does not keep it off other nodes, so the other node pool needs a different taint (one that the celery pods do not tolerate) in order to repel them.
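As a sketch, assuming the GKE node pools are named celery-pool and default-pool (the names are assumptions; GKE sets the cloud.google.com/gke-nodepool label on every node automatically):

# taint the celery pool so only pods with the matching toleration can land there
kubectl taint nodes -l cloud.google.com/gke-nodepool=celery-pool celery-node-pool=true:NoSchedule

# taint the other pool so celery pods (which do not tolerate this taint) are repelled from it
kubectl taint nodes -l cloud.google.com/gke-nodepool=default-pool workload=general:NoSchedule

On GKE it is usually better to configure the taint on the node pool itself (via the node pool's node taints setting) so that newly created nodes inherit it.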
Using Node Selector:
You can constrain a Pod to only be able to run on particular node(s), or to prefer to run on particular nodes.
You can label the node
kubectl label nodes kubernetes-foo-node-1.c.a-robinson.internal node-pool=true
Add a node selector in the pod spec:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    node-pool: "true"
Using Node Affinity
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express.
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-pool
            operator: In
            values:
            - "true"
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
What other taints and tolerations should I add to keep the pods on specific nodes?
You should also add a node selector to pin your pods to the tainted node; otherwise the pod is free to go to a non-tainted node if the scheduler chooses.
kubectl taint node node01 hostname=node01:NoSchedule
If I taint node01 and want my pods to be placed on it, the pod needs a node selector in addition to the toleration.
nodeSelector provides a very simple way to constrain (attract) pods to nodes with particular labels.
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  tolerations:
  - key: "hostname"
    operator: "Equal"
    value: "node01"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    kubernetes.io/hostname: node01

In k8s, how to let a node choose by itself which pods it will accept

I want one of my nodes to accept only certain kinds of pods.
So I wonder: is there a way to make one node accept only pods with some specific labels?
You have two options:
Node Affinity: a property of Pods that attracts them to a set of nodes.
Taints & Tolerations: taints are the opposite of node affinity; they allow a node to repel a set of Pods.
Using Node Affinity
You need to label your nodes:
kubectl label nodes node1 mylabel=specialpods
Then, when you launch Pods, specify the affinity:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: mylabel
            operator: In
            values:
            - specialpods
  containers:
  - name: nginx-container
    image: nginx
Using Taint & Toleration
Taints and tolerations work together: you taint a node and then specify a toleration on the pod; only Pods whose toleration "matches" the taint will be scheduled on that node:
Taint: kubectl taint nodes node1 mytaint=specialpods:NoSchedule
Add toleration in Pod Spec:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  tolerations:
  - key: "mytaint"
    operator: "Equal"
    value: "specialpods"
    effect: "NoSchedule"
  containers:
  - name: nginx-container
    image: nginx
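Note that the two mechanisms solve different halves of the problem: the taint keeps other pods off the node, while the affinity keeps these pods off other nodes. If you want both guarantees, you can combine them in one pod spec; a sketch reusing the label and taint from above:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  tolerations:
  - key: "mytaint"             # allows the pod onto the tainted node
    operator: "Equal"
    value: "specialpods"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:              # forbids scheduling on nodes without the label
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: mylabel
            operator: In
            values:
            - specialpods
  containers:
  - name: nginx-container
    image: nginx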