Can a pod run on multiple nodes? - kubernetes

I have one kubernetes master and three kubernetes nodes. I made one pod which is running on specific node. I want to run that pod on 2 nodes. how can I achieve this? do replica concept help me? if yes how?

Yes, you can assign pods to one or more nodes of your cluster, and here are some options to achieve this:
nodeSelector
nodeSelector is the simplest recommended form of node selection constraint. nodeSelector is a field of PodSpec. It specifies a map of key-value pairs. For the pod to be eligible to run on a node, the node must have each of the indicated key-value pairs as labels (it can have additional labels as well). The most common usage is one key-value pair.
affinity and anti-affinity
Node affinity is conceptually similar to nodeSelector -- it allows you to constrain which nodes your pod is eligible to be scheduled on, based on labels on the node.
nodeSelector provides a very simple way to constrain pods to nodes with particular labels. The affinity/anti-affinity feature, greatly expands the types of constraints you can express. The key enhancements are
The affinity/anti-affinity language is more expressive. The language offers more matching rules besides exact matches created with a logical AND operation;
you can indicate that the rule is "soft"/"preference" rather than a hard requirement, so if the scheduler can't satisfy it, the pod will still be scheduled;
you can constrain against labels on other pods running on the node (or other topological domain), rather than against labels on the node itself, which allows rules about which pods can and cannot be co-located
DaemonSet
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
Some typical uses of a DaemonSet are:
running a cluster storage daemon on every node
running a logs collection daemon on every node
running a node monitoring daemon on every node
Please check this link to read more about how to assign pods to nodes.

It's not a good practice to run the pods directly on the nodes as the nodes/pods can crash at any time. It's better use the K8S controllers as mentioned in the K8S documentation here.
K8S supports multiple containers and depending on the requirement the appropriate controller can be used. By looking at the OP it's difficult to say which controller to use.

You can use daemonset, if you want to run pod on each node.
What I see is you are trying to deploy pod on each node, it's better if you allow the scheduler to make decision where the pod need to be deployed based on the resources.
This would be best in all worst scenario's.
I'm mean in case of node failures.

Related

Kubernetes Autoscaler: no downtime for deployments when downscaling is possible?

In a project, I'm enabling the cluster autoscaler functionality from Kubernetes.
According to the documentation: How does scale down work, I understand that when a node is used for a given time less than 50% of its capacity, then it is removed, together with all of its pods, which will be replicated in a different node if needed.
But the following problem can happen: what if all the pods related to a specific deployment are contained in a node that is being removed? That would mean users might experience downtime for the application of this deployment.
Is there a way to avoid that the scale down deletes a node whenever there is a deployment which only contains pods running on that node?
I have checked the documentation, and one possible (but not good) solution, is to add an annotation to all of the pods containing applications here, but this clearly would not down scale the cluster in an optimal way.
In the same documentation:
What happens when a non-empty node is terminated? As mentioned above, all pods should be migrated elsewhere. Cluster Autoscaler does this by evicting them and tainting the node, so they aren't scheduled there again.
What is the Eviction ?:
The eviction subresource of a pod can be thought of as a kind of policy-controlled DELETE operation on the pod itself.
Ok, but what if all pods get evicted at the same time on the node?
You can use Pod Disruption Budget to make sure minimum replicas are always working:
What is PDB?:
A PDB limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions.
In k8s docs you can also read:
A PodDisruptionBudget has three fields:
A label selector .spec.selector to specify the set of pods to which it applies. This field is required.
.spec.minAvailable which is a description of the number of pods from that set that must still be available after the eviction, even in the absence of the evicted pod. minAvailable can be either an absolute number or a percentage.
.spec.maxUnavailable (available in Kubernetes 1.7 and higher) which is a description of the number of pods from that set that can be unavailable after the eviction. It can be either an absolute number or a percentage.
So if you use PDB for your deployment it should not get deleted all at once.
But please notice that if the node fails for some other reason (e.g hardware failure), you will still experience downtime. If you really care about High Availability consider using pod antiaffinity to make sure the pods are not scheduled all on one node.
Same document you referred to, has this:
How is Cluster Autoscaler different from CPU-usage-based node autoscalers? Cluster Autoscaler makes sure that all pods in the
cluster have a place to run, no matter if there is any CPU load or
not. Moreover, it tries to ensure that there are no unneeded nodes in
the cluster.
CPU-usage-based (or any metric-based) cluster/node group autoscalers
don't care about pods when scaling up and down. As a result, they may
add a node that will not have any pods, or remove a node that has some
system-critical pods on it, like kube-dns. Usage of these autoscalers
with Kubernetes is discouraged.

Kubernetes node affinity - Assigning a pod to a specific node?

The k8s node affinity documentation explains how to deploy a pod to a specific node by first tagging the node with a label and using nodeSelector to pick the node.
However, I have a use case where I have 40-50 deployments in a cluster and I want to add a new node to the cluster and set that node dedicated to one of those deployments/pods without altering all those deployments which don't have any nodeSelector specified
For example, lets say I have 3 deployments, with no nodeSelector defined and 3 worker nodes. This means k8s decides where the pods will be deployed and it could be deployed in one of those 3 nodes. Now I have to create a 4th deployment, and add 4th server and I want to dedicate the 4th deployment to the 4th server and also want to make sure that k8s won't schedule the first 3 deployments to this 4th node. How can I do this without going through all those 3 deployment scheme and apply a nodeSelector filter to not deploy on 4th node? (It would be ok to perform this change on 3 deployments, but I am talking about 50s of deployments in real life scenario)
The only thing that I can think of is to taint the node but if I do it, none of the pods will be scheduled there.
Is there a better approach here to achieve this goal that I am not aware of?
If you taint the 4th node, the 4th deployment with the toleration will be deployed on that node, while the other three won't.
You could use an admission controller to dynamically add NodeSelector into a pod spec at runtime rather than modifying existing deployment yamls. You could write any custom logic in the mutating web-hook to cater to your use case. For example you could have a label/annotation in the new deployment spec and based on the existence of that label/annotation you could dynamically add a NodeSelector.
PodNodeSelector admission web-hook is a good example to refer to.

How to deploy deployments in multiple nodes in kubernetes?

i have a bare-metal kubernetes cluster with 1 master node and 4 worker nodes.
I want to deploy my deployment objects on every 4 worker nodes but i can't.
I try nodeSelector but looks like it only works on last key:value pair label.
Please help me.
If you want to ensure that all nodes have that pod on them you can use a DaemonSet.
You can also use affinity/anti-affinity selectors.
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form “this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y”
If you don't want to two instances are located on the same host, check following link
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#never-co-located-in-the-same-node

Set replicas on different nodes

I am developing an application for dealing with kubernetes runtime microservices. I actually did some cool things, like moving a microservice from a node to another one. The problem is that all replicas go together.
So, Imagine that a microservice has two replicas and it is running on a namespaces with two nodes.
I want to set one replica in each node. Is that possible? Even in a yaml file, is that possible?
I am trying to do my own scheduler to do that, but I got no success until now.
Thank you all
I think what you are looking for is a NodeSelector for your replica Set. From the documentation:
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled based on labels on pods that are already running on the node rather than based on labels on nodes.
Here is the documentation: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity-beta-feature
I can't find where it's documented, but I recently read somewhere that replicas will be distributed across nodes when you create the kubernetes service BEFORE the deployment / replicaset.

How do I schedule the same pod on different nodes using kubectl scale?

New to kubernetes. Can I use kubectl scale --replicas=N and start pods on different nodes?
By default the scheduler attempts to spread pods across nodes, so that you don't have multiple pods of the same type on the same node. So there's nothing special required if you're just aiming for best-effort pod spreading.
If you want to express the requirement that the pod must not run on a node that already has a pod of that type on it you can use pod anti-affinity, which is currently an Alpha feature.
If you want to ensure that all nodes (or all nodes matching a certain selector) have that pod on them you can use a DaemonSet.
Scaling a Deployment (or RC) tells controller-manager to create more pods, new pods are then subject to scheduling. K8S scheduler will attempt to find most reasonable placement to schedule your pods to. This does not guarantee that pods will launch on different nodes, but makes it a rather likely scenario, if you have the required resources. Unfortunately it also means that if all pods can fit on one node, there are situations where scheduler might actually do just that (ie. all other nodes in unschedulable state for some reason). If that happens, the pods will not reschedule when conditions change.
To have a solid guarantee that pods wil not get colocated on the same node you have two options:
legacy hack : define a hostPort in your pod template. As given host port is a resource that can be assigned only once per node, your pods will never exist more then once per node
alpha feature : you can look into Pod AntiAffinity, quite early and not really battle proven yet
First one has a dissadvantage - you can never have more then one pod of this type per node, so it ie. affects rolling deployments and limits your capacity for scaling (you can never have more active pods then number of nodes)