How to ensure only one pod runs on my node in GKE? - kubernetes

In my application, I have a REST server which locally interacts with a database via the command line (it's a long story). Anyway, the database lives on a local SSD on the node. I can guarantee that only pods of that type will be scheduled in the node pool, as I have tainted the nodes and added tolerations to my pods.
What I want to know is, how can I prevent Kubernetes from scheduling multiple instances of my pod on a single node? I want to avoid this because I want my pod to be able to consume as much CPU as possible, and I also don't want multiple pods to interact via the local SSD.
How do I prevent scheduling of more than one pod of my type onto a node? I was thinking of DaemonSets at first, but down the line I want to set my node pool to autoscale, so that when I have n nodes in my pool and request n+1 replicas, the node pool automatically scales up.

You can use DaemonSets in combination with nodeSelector or affinity. Alternatively, you could configure podAntiAffinity on your Pods, for example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rest-server
spec:
  selector:
    matchLabels:
      app: rest-server
  replicas: 3
  template:
    metadata:
      labels:
        app: rest-server
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - rest-server
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: rest-server
        image: nginx:1.12-alpine

Depending on what you are trying to achieve, DaemonSets might not be a complete answer, because a DaemonSet is NOT autoscaled; it only places a pod on a new node when you add nodes to your pool.
If you are looking to run your workload with n+1 replicas, it's better to use podAntiAffinity, controlling scheduling with node taints and the cluster autoscaler; this guarantees that a new node is added when you increase your pods and removed when you scale them down:
apiVersion: v1
kind: ReplicationController
metadata:
  name: echoheaders
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: echoheaders
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - echoheaders
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: echoheaders
        image: k8s.gcr.io/echoserver:1.4
        ports:
        - containerPort: 8080
      tolerations:
      - key: dedicated
        operator: Equal
        value: experimental
        effect: NoSchedule
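For reference, a rough sketch of how the matching tainted, autoscaling node pool could be created in GKE (the pool and cluster names are placeholders of my own; verify the flags against the current gcloud documentation):
$ gcloud container node-pools create rest-server-pool \
    --cluster=my-cluster \
    --node-taints=dedicated=experimental:NoSchedule \
    --enable-autoscaling --min-nodes=1 --max-nodes=10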

I can suggest two ways of going about this. One is to restrict the number of pods schedulable on a node, and the other is to assign the pod to a given node while requesting all of the resources available on that node.
1. Restricting number of schedulable pods per node
You can set this restriction when you're creating a new cluster; however, it is limiting if you later change your mind. The maximum-pods-per-node setting is under the advanced options as you create the cluster.
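If you prefer the CLI, this roughly corresponds to the --max-pods-per-node flag at cluster (or node pool) creation time; note that GKE enforces a minimum for this value and system pods also count toward it, so treat this as a sketch only:
$ gcloud container clusters create my-cluster --max-pods-per-node=8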
2. Assigning pod to specific node and occupying all resources
Another option is to set the resource requests so that they match a node's resources, and assign the pod to a given node using nodeSelector and labels.
See the Kubernetes documentation on assigning Pods to nodes for how to do this.
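A rough sketch of option 2 (the dedicated=rest-server label and the request sizes are placeholders of my own; size the requests close to the node's allocatable resources):
apiVersion: v1
kind: Pod
metadata:
  name: rest-server
spec:
  nodeSelector:
    dedicated: rest-server     # label you have applied to the target node(s)
  containers:
  - name: rest-server
    image: nginx:1.12-alpine   # placeholder image
    resources:
      requests:
        cpu: "3500m"           # close to the node's allocatable CPU
        memory: "12Gi"         # close to the node's allocatable memory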

Related

Is there a way to deploy DaemonSet only on nodes with active deployments?

I'm deploying my service under my namespace. We have a worker node farm and my service will only utilize a small subset of its nodes.
I want to deploy a DaemonSet of cAdvisors, but I only want them to run on the nodes not related to my namespace. If possible, how do I do that?
Thanks!
ScheduleDaemonSetPods is a Kubernetes feature that lets DaemonSet pods be scheduled by the default scheduler instead of the DaemonSet controller, by adding a NodeAffinity term to the DaemonSet pods instead of the .spec.nodeName term (see the Kubernetes documentation).
For example, the manifest below will create Pods only on nodes labeled type=target-host-name:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: testdaemon
spec:
  selector:
    matchLabels:
      app: testdaemon
  template:
    metadata:
      labels:
        app: testdaemon
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: type
                operator: In
                values:
                - target-host-name
      containers:
      - name: testdaemon
        image: nginx
A DaemonSet runs one pod per matching node; you can't skip a node that matches.

Hybrid between replicaset and daemonset

Is there such a thing as a hybrid between a ReplicaSet and a DaemonSet?
I want to specify that I always want to have 2 pods up, but those pods must
never be on the same node (and I have about 10 nodes).
Is there a way I can achieve this?
In a Deployment or ReplicaSet you can use podAffinity and podAntiAffinity.
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on, based on labels on pods that are already running on the node rather than on labels on nodes.
The rules are of the form “this pod should (or, in the case of anti-affinity, shouldn't) run in an X if that X is already running one or more pods that meet rule Y”. Y is expressed as a LabelSelector with an optional associated list of namespaces.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: "kubernetes.io/hostname"
In the above example, the two nginx pods will never be scheduled on the same node.
Find more details in the official docs.

Kubernetes pod in more replicas - anti affinity rule

I have a question about a K8S pod anti-affinity rule: what I have and what I need.
I have a K8S cluster spanning 2 data centers. Each DC has, for example, 5 nodes. I have a Deployment for a Pod which runs in 10 replicas across all 10 nodes, one Pod replica per node.
I want to set up a rule for the case where one DC crashes, so that the 5 replicas from the crashed DC are not migrated to the healthy DC.
I found that it might be possible to do this through an "anti-affinity" rule, but I can't find any example for this scenario. Do you have an example for it?
From the documentation
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
you need to set a selector on your Deployment and, in the anti-affinity section, indicate the label value to match so the anti-affinity takes effect:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
You can see that it uses a label selector that looks for the key app with value store; this means that if a node already has a pod with that label and value, Kubernetes will apply the anti-affinity rule.
Alternatively, look at DaemonSets. A DaemonSet deploys one replica on each node.
If one DC crashes, the pods will not be redeployed to the other DC.

Are 2 OpenShift pods replicas deployed on two different nodes (when #nodes > 2)?

Assume I have a cluster with 2 nodes and a Pod with 2 replicas. Can I have the guarantee that my 2 replicas are deployed on 2 different nodes, so that when a node is down the application keeps running? By default, does the scheduler work on a best-effort basis to assign the 2 replicas to distinct nodes?
Pod AntiAffinity
Pod anti-affinity can be used to repel the pods from each other, so that no two such pods are scheduled on the same node.
Use the following configuration:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: nginx
        image: nginx
This uses the anti-affinity feature, so if you have more than 2 nodes it is guaranteed that no two of these pods will be scheduled on the same node.
You can use kind: DaemonSet. Here is a link to the Kubernetes DaemonSet documentation.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Here is a link to the documentation about DaemonSets in OpenShift.
Example might look like the following:
This is available on OpenShift >= 3.2. The use case is to run a specific Docker container (veermuchandi/welcome) on all nodes (or on a set of nodes with a specific label).
Enable host port exposure on OpenShift:
$ oc edit scc restricted   # as system:admin user
Change allowHostPorts to true and save.
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: welcome
spec:
  template:
    metadata:
      name: welcome
      labels:
        daemon: welcome
    spec:
      containers:
      - name: c
        image: veermuchandi/welcome
        ports:
        - containerPort: 8080
          hostPort: 8080
          name: serverport

$ oc create -f myDaemonset.yaml   # with system:admin user
Source available here
A DaemonSet is not a good option here. It schedules one pod on every node, so if you scale your cluster in the future, the pods scale to as many as there are nodes. Instead, use pod anti-affinity to schedule no more than one pod on any node.

Avoiding kubernetes scheduler to run all pods in single node of kubernetes cluster

I have one Kubernetes cluster with 4 nodes and one master. I am trying to run 5 nginx pods across all nodes. Currently the scheduler sometimes runs all the pods on one machine and sometimes on different machines.
What happens if my node goes down and all my pods were running on that same node? We need to avoid this.
How do I enforce the scheduler to run pods on the nodes in a round-robin fashion, so that if any node goes down then at least one node still has an NGINX pod in running state?
Is this possible or not? If possible, how can we achieve this scenario?
Use podAntiAffinity
Reference: Kubernetes in Action, Chapter 16. Advanced scheduling
podAntiAffinity with requiredDuringSchedulingIgnoredDuringExecution can be used to prevent the same pod from being scheduled to the same hostname. If you prefer a more relaxed constraint, use preferredDuringSchedulingIgnoredDuringExecution.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 5
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # hard requirement: do not schedule an "nginx" pod onto a node that already runs one
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname   # anti-affinity scope is the host
            labelSelector:
              matchLabels:
                app: nginx
      containers:
      - name: nginx
        image: nginx:latest
Kubelet --max-pods
You can specify the maximum number of pods for a node in the kubelet configuration, so that in the scenario of node(s) going down, it prevents K8S from saturating the remaining nodes with pods from the failed node.
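As a minimal sketch, maxPods can be set in the kubelet's config file (KubeletConfiguration); note that on managed offerings such as GKE the kubelet is configured for you, so this is illustrative only:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
maxPods: 10   # upper bound on the number of pods this kubelet will run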
Use Pod Topology Spread Constraints
As of 2021 (v1.19 and up), you can use Pod Topology Spread Constraints (topologySpreadConstraints) by default, and I found them more suitable than podAntiAffinity for this case.
The major difference is that anti-affinity restricts you to one pod per node, whereas Pod Topology Spread Constraints can allow N pods per node.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-example-deployment
spec:
  replicas: 6
  selector:
    matchLabels:
      app: nginx-example
  template:
    metadata:
      labels:
        app: nginx-example
    spec:
      containers:
      - name: nginx
        image: nginx:latest
      # This sets how evenly the pods are spread.
      # For example, if there are 3 nodes available,
      # 2 pods are scheduled on each node.
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: nginx-example
For more details see KEP-895 and an official blog post.
I think the inter-pod anti-affinity feature will help you.
Inter-pod anti-affinity allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels of pods that are already running on the node. Here is an example.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service
        service-type: nginx
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: service-type
                  operator: In
                  values:
                  - nginx
              topologyKey: kubernetes.io/hostname
      containers:
      - name: nginx-service
        image: nginx:latest
Note: I use preferredDuringSchedulingIgnoredDuringExecution here since you have more pods than nodes.
For more detailed information, you can refer to the "Inter-pod affinity and anti-affinity" (beta feature) part of the following link:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
The scheduler should spread your pods if your containers specify resource requests for the amount of memory and CPU they need. See
http://kubernetes.io/docs/user-guide/compute-resources/
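For example, a container spec with explicit requests might look like this (the values are placeholders):
containers:
- name: nginx
  image: nginx:latest
  resources:
    requests:
      cpu: "500m"        # the scheduler uses these numbers when placing the pod
      memory: "512Mi"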
We can use taints and tolerations to control whether pods are (or are not) deployed onto a node.
Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
A sample deployment yaml will be like
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    run: nginx-service
  name: nginx-service
spec:
  replicas: 3
  selector:
    matchLabels:
      run: nginx-service
  template:
    metadata:
      labels:
        run: nginx-service
        service-type: nginx
    spec:
      containers:
      - name: nginx-service
        image: nginx:latest
      tolerations:
      - key: "key1"
        operator: "Equal"
        value: "value1"
        effect: "NoSchedule"
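For completeness, the matching taint would be applied to a node with kubectl (the node name is a placeholder):
$ kubectl taint nodes node1 key1=value1:NoSchedule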
You can find more information at https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/