Affinity - Only run x number of pods per node in Kubernetes?

I can only find documentation online for attaching pods to nodes based on labels.
Is there a way to attach pods to nodes based on labels and a count, so that only x pods with label y run on a given node?
Our scenario is that we only want to run 3 of our API pods per node.
If a 4th API pod is created, it should be scheduled onto a different node with less than 3 API pods running currently.
Thanks

No, you cannot schedule by a count of pods with a specific label, but you can keep your pods from being co-located on the same node.
Avoid co-locating your pods on the same node
You can use podAntiAffinity with a topologyKey (and taints, if needed) to avoid scheduling pods on the same node. See the "Never co-located in the same node" example in the Kubernetes documentation.
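A minimal sketch of that pattern, assuming the API pods carry a hypothetical app: api label. Since a per-node count cannot be expressed, the hard requiredDuringSchedulingIgnoredDuringExecution rule below enforces at most one such pod per node rather than three:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api                    # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api                 # hypothetical label
  template:
    metadata:
      labels:
        app: api
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never place two pods with label app=api on the
          # same node (node identity comes from kubernetes.io/hostname).
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - api
            topologyKey: kubernetes.io/hostname
      containers:
      - name: api
        image: example/api:latest   # placeholder image
```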

Related

Kubernetes StatefulSets - run pod on every worker node

What is the easiest way to run a single Pod on every available worker node as part of a StatefulSet, i.e. a one-to-one mapping?
Am I right that every Pod will run on a different Node by default with a StatefulSet? If so, is it sufficient to give the StatefulSet x replicas when x worker nodes exist in the cluster?
Thanks.
Use a DaemonSet instead.
A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them. As nodes are removed from the cluster, those Pods are garbage collected. Deleting a DaemonSet will clean up the Pods it created.
If you really want to use a StatefulSet, you can take a look at features like nodeSelector or node affinity and anti-affinity.
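For reference, a minimal DaemonSet sketch (name, labels, and image are placeholders); it runs exactly one copy of the pod on every schedulable node:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: my-agent               # placeholder name
spec:
  selector:
    matchLabels:
      app: my-agent
  template:
    metadata:
      labels:
        app: my-agent
    spec:
      containers:
      - name: my-agent
        image: example/agent:latest   # placeholder image
```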

Kubernetes Statefulset Downscaling

Currently I am running a Solr cluster on Kubernetes as a StatefulSet. My Solr cluster has 39 pods running in it, one pod per physical node. The cluster has just 1 collection, divided into 3 shards; each shard has 13 nodes (or pods) running in it, and of those 13, 3 are TLOG replicas and 10 are PULL replicas.
The problem I want to discuss is this: I want to autoscale my Solr cluster. On the basis of some condition, I want to downscale my PULL replica nodes (or pods) to a minimum, so that unnecessary resource consumption is reduced. I know I can use the HPA in Kubernetes to autoscale, but while downscaling I don't want to stop my TLOG nodes (or pods). Similarly, while scaling up, I want to add only PULL replicas to my cluster.
Can anyone please help me with this problem?
You can have a separate Deployment for each pod type, e.g. one Deployment for the TLOG pods and another for the PULL pods. Then you can define a fixed number of replicas for the TLOG pods and an HPA for the PULL pods. This allows adding/removing PULL pods only, without any impact on the TLOG pods.
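As a sketch, assuming the PULL replicas run in a Deployment named solr-pull (a hypothetical name): the TLOG Deployment keeps a fixed replicas: 3 with no autoscaler attached, and only the PULL Deployment gets an HPA, for example:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: solr-pull              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: solr-pull            # hypothetical Deployment of PULL replicas
  minReplicas: 3               # floor for downscaling; adjust as needed
  maxReplicas: 30
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # illustrative threshold
```

Note the HPA only changes the pod count; registering or removing the corresponding PULL replicas in the Solr collection still has to be handled on the Solr side.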

Kubernetes: Evenly distribute the replicas across the cluster

We can use a DaemonSet to deploy one replica on each node. How can we deploy, say, 2 or 3 replicas per node? How can we achieve that? Please let us know.
There is no way to force x pods per node the way a DaemonSet does. However, with some planning, you can force a fairly even pod distribution across your nodes using pod anti-affinity.
Let's say we have 10 nodes. First, we need a Deployment (ReplicaSet) with 30 pods (3 per node). Next, we set pod anti-affinity to preferredDuringSchedulingIgnoredDuringExecution with a relatively high weight and a label selector matching the Deployment's labels. This makes the scheduler prefer nodes where the same pod does not already run. Once there is 1 pod per node, the cycle starts over: a node with 2 pods is weighted lower than one with 1 pod, so the next pod should go to the less loaded node.
Note this is not as precise as a DaemonSet and may run into some limitations when it comes time to scale the cluster up or down.
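A minimal sketch of that anti-affinity stanza, to be placed under the Deployment's spec.template.spec (the app: my-app label is an assumption):

```yaml
# Goes under spec.template.spec of the Deployment.
affinity:
  podAntiAffinity:
    # Soft rule: prefer nodes that do not already run a pod with the
    # same app label; 100 is the maximum weight.
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - my-app           # hypothetical Deployment label
        topologyKey: kubernetes.io/hostname
```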
A more reliable way, if you plan to scale the cluster, is to simply create multiple DaemonSets that differ only in name but are identical in every other way. Since the DaemonSets will share the same labels, all their pods can be exposed through the same Service.
By default, the Kubernetes scheduler prefers to schedule pods on different nodes.
The scheduler first determines all feasible nodes for a pod, based on your affinity/anti-affinity rules, resource requests and limits, etc.
It then picks the best of those nodes, automatically spreading the pods across separate availability zones and separate nodes where possible.
You can try this on your own: with 3 nodes, deploy 9 replicas of a pod and you will see that each node ends up running 3 of them.
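A minimal sketch to try this yourself (name and image are placeholders); after applying it to a 3-node cluster, kubectl get pods -o wide should show the pods spread 3 per node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spread-test            # placeholder name
spec:
  replicas: 9                  # 9 replicas for a 3-node cluster
  selector:
    matchLabels:
      app: spread-test
  template:
    metadata:
      labels:
        app: spread-test
    spec:
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9   # tiny no-op container
```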

Kubernetes - Scheduling pod replicas on nodes as per the resources available

I have a node with 64 cores and another with just 8. I need multiple replicas of my Kubernetes pod (at least 6), and the 8-core node can only handle 1 instance. How can I ask Kubernetes to schedule the remaining 5 on the more powerful node?
It would be good if I could scale up only on the required node; is that possible?
While Kubernetes is smart enough to spread pods across nodes with sufficient resources (CPU cores in this case), the following mechanisms can be used to fine-tune how pods are spread/load-balanced across the nodes in a cluster:
Adding labels to nodes and pods
Resource requests and limits for pods
nodeSelector, node affinity/anti-affinity, nodeName
Horizontal Pod Autoscaler
K8s Descheduler
In general, you should define resources.requests in your workload to let the scheduler know your application's requirements. That way, the scheduler takes care of placing pods where resources are available.
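A minimal sketch of such a definition, with illustrative numbers: if each replica requests 6 CPUs, the 8-core node only has room for one pod, so the scheduler places the remaining five on the 64-core node on its own:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api                 # placeholder name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
      - name: my-api
        image: example/api:latest   # placeholder image
        resources:
          requests:
            cpu: "6"           # illustrative: only one such pod fits on the
            memory: 4Gi        # 8-core node; the rest go to the 64-core one
          limits:
            cpu: "8"
            memory: 8Gi
```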

How to convert Daemonsets to kind Deployment

I have already deployed pods using a DaemonSet with a nodeSelector. My requirement is to use kind Deployment while retaining the DaemonSet-like behaviour:
I have a nodeSelector defined so that the same pod is installed on each labelled node.
How can I achieve this? Your help is appreciated.
My requirement is that pods should be placed automatically based on the nodeSelector, but with kind Deployment.
In other words:
Using a replication controller, when I schedule 2 (two) replicas of a pod, I expect 1 (one) replica on each Node (VM). Instead, I find both replicas created on the same node. This makes that node a single point of failure, which I need to avoid.
I have labelled the two nodes properly, and yet both pods are spawned on a single node. How can I get the pods to always schedule onto both nodes?
Look into affinity and anti-affinity, specifically, inter-pod affinity and anti-affinity.
From official documentation:
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on, based on labels on pods that are already running on the node rather than based on labels on nodes. The rules are of the form "this pod should (or, in the case of anti-affinity, should not) run in an X if that X is already running one or more pods that meet rule Y".
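A minimal sketch combining the nodeSelector from the question with a hard anti-affinity rule (all label names are assumptions); with 2 replicas and 2 matching nodes, each pod lands on a different node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      nodeSelector:
        role: my-role          # hypothetical node label
      affinity:
        podAntiAffinity:
          # Hard rule: no two pods with label app=my-app on the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: my-app
            topologyKey: kubernetes.io/hostname
      containers:
      - name: my-app
        image: example/app:latest   # placeholder image
```

Note that with the required rule, a third replica (or a node failure) leaves a pod Pending; switch to the preferred variant if that is not acceptable.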