Kubernetes - Are containers automatically allocated to nodes after adding a worker node?

I apologize for my poor English.
I created a cluster with 1 master node and 1 worker node, and deployed a container with replicas: 4.
Then kubectl get all shows the following (output abbreviated):
NAME  NODE
pod/container1 k8s-worker-1.local
pod/container2 k8s-worker-1.local
pod/container3 k8s-worker-1.local
pod/container4 k8s-worker-1.local
Next, I added 1 worker node to the cluster, but all containers stay on worker1.
Ideally, I want 2 of the containers to stop and start up on worker2, like this:
NAME  NODE
pod/container1 k8s-worker-1.local
pod/container2 k8s-worker-1.local
pod/container3 k8s-worker-2.local
pod/container4 k8s-worker-2.local
Do I need to run some command after adding the additional node?

Scheduling only happens when a pod is started; after that, it won't be moved. There are tools out there for deleting (evicting) pods when nodes get too imbalanced, but if you're just starting out I wouldn't go that far for now. If you delete your 4 pods and recreate them (or let the Deployment recreate them, as is more common in a real situation), they should end up more balanced, though possibly not 2 and 2, since the scheduler isn't exact and spreading out is only one of the factors it uses.
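For example, if the 4 pods belong to a Deployment, a rolling restart is enough to get them rescheduled; a minimal sketch, assuming a Deployment named container (the name is a guess based on the pod names above):
# Restart the Deployment; the scheduler places the replacement pods,
# and with two Ready workers they should spread across both nodes.
kubectl rollout restart deployment/container   # "container" is an assumed name
# Alternatively, delete the pods and let the Deployment controller
# recreate (and reschedule) them:
kubectl delete pod container1 container2 container3 container4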

Related

Azure Kubernetes (AKS): How to avoid the deployment's pods being killed when the cluster's nodes scale down?

I have a deployment (called moon2) with 2 replicas (let's say moon2-pod1 and moon2-pod2), deployed on Azure Kubernetes Service (AKS) with cluster autoscaling enabled (min=2, max=10 nodes).
When the cluster scales down, sometimes the workers hosting the pods of this deployment get removed, and those pods then get rescheduled onto other workers.
My question: how can I avoid moon2-pod1 and moon2-pod2 being killed? In other words, can I tell AKS: when you scale down, do not delete the worker(s) hosting these 2 pods? If the answer is yes, how can I do that? Or is there another way?
Thank you in advance for your help!
I think a Pod Disruption Budget (PDB) is probably what you're looking for. You can set a PDB for the pods in your deployment with maxUnavailable: 0, which means "do not touch any pods of this deployment".
Please check this doc for more details and how to set it: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
One thing to note: on AKS, with a PDB like that you will need to remove the PDB temporarily before you do a version upgrade or node-image update, otherwise those operations get stuck, because the nodes cannot be drained while such pods are on them.
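As a rough sketch of such a PDB, assuming the moon2 pods carry a label like app: moon2 (the label is an assumption; use whatever selector your Deployment actually matches):
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: moon2-pdb            # hypothetical PDB name
spec:
  maxUnavailable: 0          # never allow a voluntary eviction of these pods
  selector:
    matchLabels:
      app: moon2             # assumed label; must match the moon2 pods
EOF
On clusters older than 1.21 the API group is policy/v1beta1 rather than policy/v1.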

Helm chart/StatefulSet: dynamically assign pods to specific nodes

This is somewhat related to this other post, but I have a specific example that I think justifies the use case.
I am installing a Redis cluster using the bitnami/redis-cluster Helm chart. It runs 6 instances, 3 masters + 3 slaves, deployed across 3 nodes. The idea is that these instances are spread over the 3 nodes so that, for example, master 1 and slave 1 do not end up on the same node; that way, if one node fails, its replica is still available on one of the other nodes.
What I am currently doing is setting a custom scheduler for these instances and, at the end of the deployment, running a script that manually places each instance on the node I want.
Is there any way to do this simply using nodeSelector, affinity, taints, etc. directly from the values file? I would like to stick with this chart and not create a YAML file for each Redis instance.
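For reference, the rule being described is a pod anti-affinity on the StatefulSet's pod template. Whether the chart passes such a stanza through its values (and under which key) varies by chart version, so check the chart's values.yaml; the sketch below assumes it does, and assumes the pods are labelled app.kubernetes.io/name: redis-cluster.
# Hypothetical values file; the key the chart expects for affinity may
# differ, verify it in the chart's values.yaml before using this.
cat > redis-affinity-values.yaml <<'EOF'
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: redis-cluster   # assumed pod label
          topologyKey: kubernetes.io/hostname
EOF
helm install my-redis bitnami/redis-cluster -f redis-affinity-values.yaml   # release name is arbitrary
Note that anti-affinity (or a topology spread) can only keep the 6 pods spread across the nodes; which pod ends up as the slave of which master is decided by Redis when the cluster is formed, not by the Kubernetes scheduler.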

How to delete/decrease nodes in an eksctl Kubernetes cluster

I want to delete a single node of the cluster.
Here is my problem: I created the cluster with only 2 nodes running.
But sometimes I need more nodes for a few minutes only; then, after scaling down, I want to delete only the drained node from the cluster.
I scale up/down manually.
Here are the steps I follow:
Create the cluster with 2 nodes.
Scale up the cluster and add 2 more.
Afterwards, I want to delete only those 2 extra nodes (with all their backup pods).
I tried it with the command:
eksctl scale nodegroup --cluster=cluster-name --name=name --nodes=4 --nodes-min=1 --nodes-max=4
But it doesn't help: it deletes random nodes, and the manager will also crash.
One option is to use a separate node group for the transient load: use taints/tolerations so that load is scheduled onto that node group, then drain/delete that particular node group when it is no longer needed (see the sketch below).
Do you scale nodes up/down manually? If you are using something like the cluster autoscaler, there are annotations like "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to protect pods from being evicted during scale-down.
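A rough sketch of that separate-node-group approach with eksctl (cluster and node-group names are placeholders, and the eksctl node label should be verified on your own nodes):
# Create a separate node group just for the transient load
# (cluster-name and transient-ng are placeholders).
eksctl create nodegroup --cluster=cluster-name --name=transient-ng --nodes=2
# Taint its nodes so only pods carrying a matching toleration land there
# (eksctl normally labels its nodes with alpha.eksctl.io/nodegroup-name).
kubectl taint nodes -l alpha.eksctl.io/nodegroup-name=transient-ng transient=true:NoSchedule
# When the extra capacity is no longer needed, drain and delete just this
# node group; the original 2 nodes stay untouched.
eksctl delete nodegroup --cluster=cluster-name --name=transient-ng
The transient workloads then need a toleration for transient=true:NoSchedule (and ideally a nodeSelector or node affinity on the same node-group label) so they run only on those nodes.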

Specifying an exact number of pods per node, then performing an image version upgrade

I have 3 nodes in a k8s cluster and I need exactly 2 pods to be scheduled on each node, so I would end up with 3 nodes running 2 pods each (6 replicas).
I found that k8s has a Pod Affinity/Anti-Affinity feature, and that seems to be the correct way of doing this.
My problem is: I want to run 2 pods per node, but I often use kubectl apply to upgrade my Docker image version, and in that case k8s should be able to schedule 2 new pods on each node before terminating the old ones. Will the newer pods be scheduled if I use Pod Affinity/Anti-Affinity to allow only 2 pods per node?
How can I do this in my deployment configuration? I cannot get it to work.
I believe it is part of the kubelet's settings, so you would have to look into the kubelet's --max-pods flag, depending on what your cluster configuration is.
The following links could be useful:
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/#kubelet
and
https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/
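Note that --max-pods counts every pod on the node, including system pods, so it is a blunt tool for this. If the goal is just an even spread of 6 replicas over 3 nodes that keeps working during a rolling update, topologySpreadConstraints is a different mechanism worth considering; a sketch with hypothetical names and labels:
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                 # hypothetical name
spec:
  replicas: 6
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxSurge: 3              # bring new pods up before old ones go away
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway   # soft, so surge pods still schedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: nginx:1.25    # placeholder image
EOF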

How to set a label on a Kubernetes node at creation time?

I am following guide [1] to create a multi-node K8s cluster with 1 master and 2 nodes. A label also needs to be set on each node:
Node 1 - label name=orders
Node 2 - label name=payment
I know the above could be achieved by running kubectl commands:
kubectl get nodes
kubectl label nodes <node-name> <label-key>=<label-value>
But I would like to know how to set the label when creating a node. Node creation guidance is in [2].
Appreciate your input.
[1] https://coreos.com/kubernetes/docs/latest/getting-started.html
[2] https://coreos.com/kubernetes/docs/latest/deploy-workers.html
In fact, there has been a trivial way to achieve that since roughly Kubernetes 1.3.
What is responsible for registering your node is the kubelet process launched on it; all you need to do is pass it a flag like --node-labels 'role=kubemaster'. This is how I differentiate nodes between different autoscaling groups in my AWS k8s cluster.
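On a kubeadm-provisioned node, for example, that flag is usually injected through KUBELET_EXTRA_ARGS before the node registers; a sketch, with file locations that vary by distribution (treat them as assumptions):
# On the new worker, before the kubelet registers the node
# (/etc/default/kubelet is the kubeadm/Debian convention; adjust for your distro):
echo 'KUBELET_EXTRA_ARGS="--node-labels=name=orders"' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
# The node then comes up already carrying the label:
kubectl get nodes -L name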
There are a few options available to you. The easiest IMHO would be to use a systemd unit to install and configure kubectl, then run the kubectl label command. Alternatively, you could just use curl to update the labels in the node's metadata directly.
That being said, while I don't know your exact use case, the way you are using labels on the nodes seems to be an effort to bypass some of Kubernetes' key features, like dynamic scheduling of components across nodes. I would suggest that rather than working on labeling the nodes automatically, you try to address why you need to identify the nodes in the first place.
This answer is now incorrect (and has been for several versions of Kubernetes). Please see the correct answer by Radek 'Goblin' Pieczonka above.
I know this isn't at creation time, but the following is pretty easy (labels follow the key=value pattern):
k label node minikube gpu.nvidia.com/model=Quadro_RTX_4000 node.coreweave.cloud/cpu=intel-xeon-v2
node/minikube labeled