Kubernetes: spread pods across node pools

I'm running a managed Kubernetes cluster in GCP with 2 node pools: one on regular VMs and one on spot VMs, with autoscaling configured for both.
Currently I'm running batch jobs and async tasks on the spot VMs and web apps on the regular VMs, but to reduce costs I'd like to move most of the web app pods to spot VMs. I usually have 3-5 pods of each app running, so I'd like to keep 1 on regular VMs and move the other 2-4 to spot.
I found the nodeAffinity and podAffinity settings and set up preferred pod placement with preferredDuringSchedulingIgnoredDuringExecution and a spot VM node selector, but now all of my pods have moved to spot VMs.

Try something like:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/type   # placeholder: the node label that distinguishes your pools
            operator: In
            values:
            - regular
            - spot/preemptible
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - app-label                      # placeholder: your app's pod label value
          topologyKey: spot-node-label         # placeholder: a node label key present on spot nodes
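If the goal is for the scheduler to merely prefer the spot pool while still being able to fall back to regular nodes, a soft nodeAffinity is usually enough. A minimal sketch, assuming a GKE cluster where Spot VM nodes carry the cloud.google.com/gke-spot="true" label (preemptible pools use cloud.google.com/gke-preemptible instead); both label names are assumptions to check against your own node labels:
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50                             # prefer spot nodes, but do not require them
        preference:
          matchExpressions:
          - key: cloud.google.com/gke-spot     # assumed GKE label on Spot VM nodes
            operator: In
            values:
            - "true"
A preference alone will not guarantee that exactly one pod stays on a regular node; combining it with the soft podAntiAffinity above gets closer to that split, but the scheduler still treats it as best effort.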

Related

Is it possible to trigger a node scale-up with Pod Affinity/Anti-affinity rules?

I have a node pool in AKS with autoscaling enabled. Here is a simple scenario I am trying to achieve.
I have 1 node running a single pod (A) with the label deployment=x. I have another pod (B) with a podAntiAffinity rule to avoid nodes running pods with the deployment=x label.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: deployment
          operator: In
          values:
          - x
      topologyKey: kubernetes.io/hostname
The behavior I am seeing is that Pod B gets scheduled onto the same node as Pod A. I would have wanted Pod B to stay in the Pending state until the autoscaler added a new node that satisfies the podAntiAffinity rule, and then have Pod B scheduled to the new node. Is this possible to do?
Kubernetes Version: 1.22.6
-- EDIT --
This example does trigger a node scale-up, so it's doing what I expect and it's working for me; the podAntiAffinity example I posted works as expected.
You may need to change the key you match on. The labelSelector in podAntiAffinity matches pod labels, so key: deployment only works if that label is actually set on the running pods; it does not refer to the Deployment object itself. Try this example:
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app.kubernetes.io/instance
        operator: In
        values:
        - name-of-deployment
    topologyKey: "kubernetes.io/hostname"
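Whichever key you pick, it has to appear in the pod template's labels, otherwise the anti-affinity rule has nothing to match. A minimal sketch, where name-of-deployment is just a placeholder:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: name-of-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/instance: name-of-deployment
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: name-of-deployment   # this is what the podAntiAffinity labelSelector matches
    spec:
      containers:
      - name: app
        image: nginx   # placeholder image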

Kubernetes: how to spread pods across nodes with preferredDuringSchedulingIgnoredDuringExecution

I want my api deployment pods to be spread across all of the cluster's nodes.
So I came up with this:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - api
        topologyKey: "kubernetes.io/hostname"
But this allows exactly one pod per node and no more.
My problem is that when I roll out an update, Kubernetes leaves the newly created pod in the Pending state.
How can I change the requiredDuringSchedulingIgnoredDuringExecution to preferredDuringSchedulingIgnoredDuringExecution?
I have tried, but I got many errors, since preferredDuringSchedulingIgnoredDuringExecution apparently requires a different structure than requiredDuringSchedulingIgnoredDuringExecution.
This is the right implementation:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - api
        topologyKey: kubernetes.io/hostname
This will spread the pods evenly across the nodes and will allow more than one per node, so you can deploy 6 replicas to a cluster of 3 nodes without a problem. You can also roll out an update even though it creates an extra new pod before shutting down the old one.
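An alternative worth mentioning for even spreading is topologySpreadConstraints, which caps the imbalance between nodes while still allowing extra pods during a rollout. A minimal sketch, reusing the app: api label from the example above:
spec:
  topologySpreadConstraints:
  - maxSkew: 1                            # allow at most one pod of difference between nodes
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: ScheduleAnyway     # soft constraint, comparable to the "preferred" anti-affinity
    labelSelector:
      matchLabels:
        app: api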

Kubernetes: only run specific deployments on nodes with a specific label

Using Kubernetes, I have a set of high-CPU nodes, and I am using an affinity policy for a given deployment to specifically target these high-CPU nodes:
# deployment.yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: high-cpu-node
            operator: In
            values:
            - "true"
That works; however, it does not prevent all the other deployments from scheduling pods on these high-CPU nodes. How do I specify that these high-CPU nodes should ONLY run pods where high-cpu-node=true? Is it possible to do this without going through and modifying all the other deployment configurations (I have dozens of deployments)?
To get this behaviour you should taint the nodes and add tolerations to the deployments that should run on them: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
You would still have to modify the deployments that are meant to run on these nodes by adding a toleration alongside the node affinity, but the other deployments can stay untouched, since pods without the toleration are repelled by the taint. It's not possible to achieve this via labels alone.
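A minimal sketch of that approach, assuming the high-CPU nodes are labelled high-cpu-node=true as in the question (the taint reuses the same key and value, and <node-name> is a placeholder):
# Taint every high-CPU node so that pods without a matching toleration are kept off it
kubectl taint nodes <node-name> high-cpu-node=true:NoSchedule

# deployment.yaml (only the deployment that should run on these nodes)
spec:
  tolerations:
  - key: "high-cpu-node"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:                       # keep the existing affinity so these pods still land on the high-CPU pool
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: high-cpu-node
            operator: In
            values:
            - "true"
The taint keeps everything else off the nodes, while the toleration plus the node affinity keeps the high-CPU pods on them.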

What is the recommended way to deploy Kafka so that it runs on all the available nodes?

I have 3 nodes in k8s and I'm running Kafka (a 3-node cluster).
While deploying zk/broker/rest-proxy, the pods are not getting spread across all the available nodes. How can I make sure that all pods are deployed on different nodes? Do I need to use nodeAffinity or podAffinity?
If you want all pods to run on different nodes, you must use podAntiAffinity. If this is a hard requirement, use a requiredDuringSchedulingIgnoredDuringExecution rule; if it's not, use preferredDuringSchedulingIgnoredDuringExecution.
topologyKey should be kubernetes.io/hostname, and in labelSelector put your pod's labels.
I recommend using soft anti-affinity, which will look like:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - <your app label>
        topologyKey: kubernetes.io/hostname
      weight: 100
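For completeness, the hard variant mentioned above (strictly one pod per node) is the same labelSelector under the required rule:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - <your app label>
      topologyKey: kubernetes.io/hostname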
Here I explained the difference between anti-affinity types with examples applied to a live cluster:
https://blog.verygoodsecurity.com/posts/kubernetes-multi-az-deployments-using-pod-anti-affinity/

Use of Labels in Kubernetes deployments

I am interested in knowing how pervasively labels/selectors are used in Kubernetes. Are they a widely used feature in the field for segregating container workloads?
If not, what other ways are used to segregate workloads in Kubernetes?
I've been running Kubernetes in production for some months and I use labels on some pods to spread them out over the nodes with podAntiAffinity rules, so that these pods aren't all located on a single node. Mind you, I'm running a small cluster of three nodes.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - your-deployment-name
      topologyKey: "kubernetes.io/hostname"
I've found this a useful way to use labels.
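Another everyday use of labels for segregating workloads is a plain nodeSelector, which pins pods onto nodes carrying a given label. A minimal sketch; the disktype=ssd label is only an example:
# Label the nodes once:
#   kubectl label nodes <node-name> disktype=ssd
spec:
  nodeSelector:
    disktype: ssd      # the pod only schedules onto nodes with this label
  containers:
  - name: app
    image: nginx       # placeholder image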