Kubernetes node affinity - Assigning a pod to a specific node?

The k8s node affinity documentation explains how to deploy a pod to a specific node by first labeling the node and then using nodeSelector in the pod spec to pick that node.
However, I have a use case where I have 40-50 deployments in a cluster, and I want to add a new node to the cluster and dedicate it to one of those deployments/pods, without altering all the other deployments, which have no nodeSelector specified.
For example, let's say I have 3 deployments with no nodeSelector defined and 3 worker nodes. This means k8s decides where the pods are deployed, and they could land on any of those 3 nodes. Now I have to create a 4th deployment and add a 4th server, and I want to dedicate the 4th deployment to the 4th server while also making sure that k8s won't schedule the first 3 deployments onto this 4th node. How can I do this without going through all 3 existing deployment specs and applying a nodeSelector filter to keep them off the 4th node? (It would be OK to make this change on 3 deployments, but in the real-life scenario I am talking about ~50 deployments.)
The only thing I can think of is to taint the node, but if I do that, none of the pods will be scheduled there.
Is there a better approach here to achieve this goal that I am not aware of?

If you taint the 4th node and add a matching toleration to the 4th deployment, that deployment can be scheduled on the node while the other three (and any future deployments without the toleration) cannot. Note that a toleration only permits scheduling onto the tainted node; if you also want to force the 4th deployment onto it, combine the toleration with a nodeSelector or node affinity on that one deployment.
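A minimal sketch of that approach (the node name node-4, the dedicated=app-4 key/value and the image are assumptions, not from the question):

```yaml
# Taint the new node first, e.g.:
#   kubectl taint nodes node-4 dedicated=app-4:NoSchedule
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-4
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app-4
  template:
    metadata:
      labels:
        app: app-4
    spec:
      # The toleration lets these pods onto the tainted node; the other ~50
      # deployments have no toleration, so the taint keeps them off it.
      tolerations:
        - key: dedicated
          operator: Equal
          value: app-4
          effect: NoSchedule
      # Optional: also pin this deployment to the node so it cannot land elsewhere
      # (requires: kubectl label nodes node-4 dedicated=app-4).
      nodeSelector:
        dedicated: app-4
      containers:
        - name: app
          image: nginx:1.25   # placeholder image
```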

You could use an admission controller to dynamically add a nodeSelector to a pod spec at admission time rather than modifying existing deployment YAMLs. You could write any custom logic in the mutating webhook to cater to your use case. For example, you could put a label/annotation in the new deployment's pod spec and, based on the existence of that label/annotation, dynamically add the nodeSelector.
The PodNodeSelector admission plugin is a good example to refer to.
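As a lighter-weight sketch of that idea (assuming the cluster admin can enable the built-in PodNodeSelector admission plugin via --enable-admission-plugins, and using an example namespace name), a namespace annotation can inject a default nodeSelector into every pod created in that namespace without touching the deployment manifests:

```yaml
# Requires the PodNodeSelector admission plugin to be enabled on the API server.
# Every pod created in this namespace gets the nodeSelector below merged into its spec.
apiVersion: v1
kind: Namespace
metadata:
  name: team-x            # example namespace
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: "dedicated=app-4"
```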

Related

Can a pod run on multiple nodes?

I have one Kubernetes master and three Kubernetes nodes. I made one pod which is running on a specific node. I want to run that pod on 2 nodes. How can I achieve this? Does the replica concept help me? If yes, how?
Yes, you can assign pods to one or more nodes of your cluster, and here are some options to achieve this:
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
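A minimal sketch (the disktype=ssd label is just an example):

```yaml
# Label a node first, e.g.:
#   kubectl label nodes node-1 disktype=ssd
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  nodeSelector:
    disktype: ssd        # the pod is only eligible for nodes carrying this label
  containers:
    - name: nginx
      image: nginx
```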
affinity and anti-affinity
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature greatly expands the types of constraints you can express (a short example follows the list below). The key enhancements are:
The affinity/anti-affinity language is more expressive. The language offers more matching rules besides exact matches created with a logical AND operation;
you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler can't satisfy it, the pod will still be scheduled;
you can constrain against labels on other pods running on the node (or other topological domain), rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
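A sketch combining a hard requirement with a soft preference (the disktype and zone label keys/values are example assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the node must carry this label.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
      # Soft preference: the scheduler favours, but does not require, this label.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          preference:
            matchExpressions:
              - key: zone
                operator: In
                values:
                  - us-east-1a
  containers:
    - name: nginx
      image: nginx
```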
DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
running a cluster storage daemon on every node
running a logs collection daemon on every node
running a node monitoring daemon on every node
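A minimal DaemonSet sketch in the spirit of the log-collection use above (the name and image are examples):

```yaml
# Runs one copy of the pod on every node; new nodes automatically get one.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      name: log-collector
  template:
    metadata:
      labels:
        name: log-collector
    spec:
      containers:
        - name: log-collector
          image: fluentd:v1.16   # example image
```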
Please check this link to read more about how to assign pods to nodes.
It's not good practice to run pods directly (unmanaged) on the nodes, as nodes/pods can crash at any time. It's better to use the K8S controllers, as described in the K8S documentation here.
K8S supports multiple controllers, and depending on the requirement the appropriate controller can be used. Looking at the OP, it's difficult to say which controller to use.
You can use a DaemonSet if you want to run a pod on each node.
What I see is that you are trying to deploy a pod on each node; it's usually better to let the scheduler decide where the pods need to be deployed, based on available resources.
That holds up best in worst-case scenarios, I mean in case of node failures.

How to deploy deployments in multiple nodes in kubernetes?

I have a bare-metal Kubernetes cluster with 1 master node and 4 worker nodes.
I want to deploy my deployment objects across all 4 worker nodes, but I can't.
I tried nodeSelector, but it looks like it only works with the last key:value pair label.
Please help me.
If you want to ensure that all nodes have that pod on them you can use a DaemonSet.
You can also use affinity/anti-affinity selectors.
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on, based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y”.
If you don't want two instances to be located on the same host, check the following link:
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#never-co-located-in-the-same-node
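For the 4-worker case, a sketch along those lines (the name, image and replica count are assumptions) uses required pod anti-affinity keyed on the hostname, so no two replicas share a node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          # A new replica may not land on a node that already runs a pod
          # with the app=web label, so the 4 replicas spread over 4 nodes.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web
              topologyKey: kubernetes.io/hostname
      containers:
        - name: web
          image: nginx
```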

What happens when we scale the Kubernetes deployment and change the configuration of one of the pods or containers?

When I scale the application by creating a deployment, let's say I am running the nginx service on a 3-node cluster.
Nginx is running in containers in multiple pods.
If I change the nginx configuration in one of the pods, does it propagate to all the nodes and pods, because it is running in a cluster and scaled?
does it propagate to all the nodes and pods because it is running in a cluster and scaled?
No. Only when you change the Deployment YAML. Then it re-creates the pods one by one with the new configuration.
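For example (the Deployment name and image tag are assumptions), the one-by-one replacement is triggered by changing the pod template in the Deployment and re-applying it, not by editing a single running pod:

```yaml
# deployment.yaml - changing anything under spec.template and running
#   kubectl apply -f deployment.yaml
# makes the Deployment replace its pods one by one with the new configuration.
# Editing nginx.conf inside a single running pod changes only that pod and is
# lost when the pod is recreated.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25   # bump this tag (or mount a new ConfigMap) and re-apply
```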
I would like to add a few more things to what was already said. First of all, you are not even supposed to make any changes to Pods which are managed by, let's say, a ReplicaSet, ReplicationController or Deployment. These are objects which provide an additional abstraction layer, and it is their responsibility to ensure that the given number of Pods of a certain kind is running in your Kubernetes cluster.
It doesn't matter how many nodes your cluster consists of, as the mentioned controllers span all nodes in the cluster.
Changes made in a single Pod will not only fail to propagate to other Pods, but may easily be lost if that Pod with the changed configuration crashes.
Remember that one of the tasks of the Deployment is to make sure that a certain number of Pods of a given type (specified in the Pod template section of the Deployment) are always up and running. When your manually reconfigured Pod goes down, your Deployment (actually the ReplicaSet created by the Deployment) acts behind the scenes and recreates that Pod. But how does it recreate it? Does it take into consideration the changes you introduced to that Pod? Of course not; it recreates it based on the template it is given in the Deployment.
If you want to make changes in your Pods one by one, Kubernetes allows you to do so via the so-called rolling update mechanism.
Here you can read about the old-fashioned approach using a ReplicationController, which is not used any more as it has been replaced by Deployments and ReplicaSets, but I think it's still worth reading just to grasp the concept.
Currently, Deployment is the way to go. You can read about updating a Deployment here. Note that the default update strategy is RollingUpdate, which ensures that changes are not applied to all Pods at once but one by one.
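For reference, those defaults can be written out explicitly; this excerpt (the stock apps/v1 default values, meant to be merged into a Deployment's spec such as the sketch above) controls how many Pods may be replaced at once during a rollout:

```yaml
# Excerpt of a Deployment spec; governs the pace of a rolling update.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most a quarter of the pods may be down at once
      maxSurge: 25%         # at most a quarter extra pods may exist temporarily
```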

How does a ReplicaSet include pods with specific labels?

If I give some specific label to pods and define a ReplicaSet saying to include pods with the same labels, it includes those pods in it. That is all fine and good.
(I know pods are not meant to be created separately, but are supposed to be created via Deployments or ReplicaSets; but still, how do Deployments/ReplicaSets include pods whose labels match the definition if those pods are already there for some reason?)
BUT how does this work behind the scenes? How does the ReplicaSet know that a pod is to be included because it has the same label? Let's say I already have a pod with those labels; how does a newly created ReplicaSet know that this pod is to be included if it has fewer pods than the desired number?
Does it get that information from etcd? Or do pods expose their labels somehow? How does this really work behind the scenes?
As stated in the Kubernetes documentation regarding ReplicaSet:
A ReplicaSet is defined with fields, including a selector that specifies how to identify Pods it can acquire, a number of replicas indicating how many Pods it should be maintaining, and a pod template specifying the data of new Pods it should create to meet the number of replicas criteria. A ReplicaSet then fulfills its purpose by creating and deleting Pods as needed to reach the desired number. When a ReplicaSet needs to create new Pods, it uses its Pod template.
It's recommended to use Deployments instead of ReplicaSets.
Deployment is an object which can own ReplicaSets and update them and their Pods via declarative, server-side rolling updates. While ReplicaSets can be used independently, today they’re mainly used by Deployments as a mechanism to orchestrate Pod creation, deletion and updates. When you use Deployments you don’t have to worry about managing the ReplicaSets that they create. Deployments own and manage their ReplicaSets. As such, it is recommended to use Deployments when you want ReplicaSets.
Like you mentioned, if you have a Pod with labels matching the ReplicaSet's selector, the ReplicaSet will take control over that Pod. If you deploy a ReplicaSet with 3 replicas and such a Pod was deployed before that, then the RS will spawn only 2 additional Pods with the matching label. It's explained in detail, with examples, under Non-Template Pod acquisitions.
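A sketch showing the two halves of that mechanism, the selector (what the RS may acquire) and the template (what it creates when it needs more pods); names and labels are examples:

```yaml
# This ReplicaSet adopts any running Pod whose labels match spec.selector (and that
# has no other owner), then creates Pods from spec.template only for the remaining
# replicas needed to reach spec.replicas.
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      tier: frontend
  template:
    metadata:
      labels:
        tier: frontend
    spec:
      containers:
        - name: frontend
          image: nginx
```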
As to how it works behind the scenes, you can have a look at slides #47-56 of Kubernetes Architecture - beyond a black box - Part 1

Kubernetes node affinity requires a pod restart even when the pod already meets the node affinity rule

I have a running pod (pod-1), deployed from a k8s Deployment (deploy-1), on k8s node-1. Now I want to patch a node affinity rule into this Deployment; for example, the target node must have the label 'data=allowed'.
My steps:
Add label 'data=allowed' to node-1 first
Patch the node affinity definition to deploy-1
My expectation was that pod-1 should not be rescheduled by k8s, since it is already on node-1, which already meets the node affinity rule (Step 1). But the result is that pod-1 was recreated, although still on node-1.
Is there any configuration to prevent the recreation when the running pod/deployment already meets the newly defined node affinity rule? Thanks.
By adding a new rule to the deployment you are changing the desired state of the cluster, and k8s makes sure that the current state matches the desired state; since the pod template changed, the pods are recreated. This is fundamental to the design.
In order to utilise the above functionality, we need to use a declarative approach rather than an imperative one.
For instance, it's better to use the apply operation rather than the create operation against a k8s cluster.
Now, if you wish to change or modify other fields of a k8s resource (outside the pod template), it makes sure that independent parts, such as running containers or external IPs, are not changed or restarted.
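As a sketch of that distinction, using the deploy-1 and data=allowed names from the question (the patch-file workflow and the strategic-merge patch type are assumptions about how you apply the change):

```yaml
# affinity-patch.yaml - applied with something like:
#   kubectl patch deployment deploy-1 --patch-file affinity-patch.yaml
# Anything under spec.template changes the pod-template hash, so the Deployment
# rolls the pod even if it already satisfies the new rule; a change outside
# spec.template (e.g. spec.replicas) leaves the running pods untouched.
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: data
                    operator: In
                    values:
                      - allowed
```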
I have added references for further research:
kubectl-apply-vs-kubectl-create
object-management/