Kubernetes version: 1.20
Cloud being used: Bare metal
Installation method: Kubeadm
Host OS: Redhat
We have a 7-node cluster and have the following requirements:
1. Deploy specific application pods to 2 nodes only.
2. Ensure the 2 nodes have enough memory/CPU before deploying them.
For #1, we tainted the two nodes, added the tolerations as well as node affinity to our statefulsets/deployments so no other pods are allowed to run besides our application.
For #2, one of the nodes keeps crashing because the scheduler keeps placing pods on it even though its memory/CPU usage is over 80%, while the other node is at less than 30%.
I don’t understand why this is happening but suspect it’s because for some reason the scheduler is coming up with a score based on the entire cluster and not the two nodes we’re interested in.
Does anyone know if it’s possible to configure a custom kube-scheduler to focus on two nodes only? I think this is the only way to achieve a balance.
Thanks & Regards,
John
Related
I'm experiencing downtime whenever the GKE cluster gets upgraded during the maintenance window. My services (APIs) become unreachable for about 5 minutes.
The cluster Location type is set to "Zonal", and all my pods have 2 replicas. The only affected pods seem to be the ones using nginx ingress controller.
Is there anything I can do to prevent this? I read that using Regional clusters should prevent downtimes in the control plane, but I'm not sure if it's related to my case. Any hints would be appreciated!
You mention "downtime", but is this downtime for you when using the control plane (i.e. kubectl stops working), or downtime in the sense that end users of the services see them stop working?
A GKE upgrade upgrades two parts of the cluster: the control plane or master nodes, and the worker nodes. These are two separate upgrades although they can happen at the same time depending on your configuration of the cluster.
Regional clusters can help with that. They cost more because you run more nodes, but the upside is that the cluster is more resilient.
Going back to the earlier point about control plane vs. node upgrades: the control plane upgrade does NOT affect the end-user/customer perspective. The services will remain running.
The node upgrade WILL affect the customer so you should consider various techniques to ensure high availability and resiliency on your services.
A common technique is to increase replicas and also to add pod anti-affinity. This ensures the replicas are scheduled on different nodes, so when a node is upgraded it does not take out the entire service, which is what happens if the cluster has scheduled all the replicas on that one node.
You mention the nginx ingress controller in your question. If you are using Helm to install it into your cluster, then out of the box it is not set up to use anti-affinity, so it is liable to be taken out of service if all of its replicas get scheduled onto the same node and that node then gets marked for upgrade or similar.
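As a rough illustration of the anti-affinity suggestion above, the sketch below builds a patch for a Deployment's pod template as a Python dict and prints it as JSON. The label value `my-ingress` and the assumption that the pods carry an `app` label are hypothetical; check the labels your chart actually sets before using anything like this.

```python
import json


def anti_affinity_patch(app_label: str) -> dict:
    """Build a pod-template patch that keeps replicas with the same
    `app` label on different nodes (one replica per hostname)."""
    return {
        "spec": {
            "template": {
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    "labelSelector": {
                                        "matchLabels": {"app": app_label}
                                    },
                                    # One replica per node: the topology
                                    # domain is the node's hostname label.
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    }
                }
            }
        }
    }


# "my-ingress" is a hypothetical label value used for illustration only.
patch = anti_affinity_patch("my-ingress")
print(json.dumps(patch, indent=2))
```

Note that with the "required" form, a replica stays Pending if no node without a matching pod is available; `preferredDuringSchedulingIgnoredDuringExecution` is the softer alternative.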
I'm new to Kubernetes and I'm testing locally with Minikube. I need some advice on Kubernetes horizontal scaling.
In the following scenario :
Cluster composed of only 1 node
There is only 1 pod on this node
Only one application running on this pod
Is there any benefit in deploying a new pod on this same node just to scale my application?
If I understand correctly, pods share the node's resources. So if I deploy 2 pods instead of 1 on the same node, there will be no performance increase.
There will be no availability increase either, because if the node fails, both pods will go down with it.
Am I right about my two previous statements?
Thanks
Yes, you are right. Pods on the same node share the same CPU and memory resources, and they will all go down if the node fails.
But you also need to consider this at the pod level. A pod can fail while the node itself keeps working fine. In such cases, running multiple pods lets you keep serving requests and makes the application more available.
From a performance perspective, more pods can also serve requests in parallel, which can reduce latency for your application as long as the node has spare CPU and memory.
I have an EKS cluster with two worker nodes. I would like to "switch off" the nodes or do something to reduce costs of my cluster outside working hours. Is there any way to turn off the nodes at night and turn on again at morning?
Thanks a lot.
This is a very common concern for anyone using a managed K8s cluster, and people take different approaches to it. What works best for us is a combination of kube-downscaler and cluster-autoscaler.
kube-downscaler helps you scale down or "pause" Kubernetes workloads (Deployments, StatefulSets, HorizontalPodAutoscalers, and CronJobs too) during non-work hours.
cluster-autoscaler is a tool that automatically:
Scales-down the size of the Kubernetes cluster when there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes.
Scales-up the size of the Kubernetes cluster when there are pods that failed to run in the cluster due to insufficient resources.
So, essentially, at night kube-downscaler scales down the pods and other objects, and cluster-autoscaler notices the underutilized nodes and removes them after moving any remaining pods onto other nodes. The opposite happens in the morning.
Of course, some fine-tuning of the configuration of the two may be needed to make it work best for you.
Unrelated to your specific question but, if you are in "savings" mode you may want to have a look at EC2 Spot Instances for EKS assuming you can operate within their boundaries. See here for the details.
I am running a deployment on a cluster of 1 master and 4 worker nodes (two 32GB and two 4GB machines). I want to run a maximum of 10 pods on 4GB machines and 50 pods in 32GB machines.
Is there a way to assign different number of pods to different nodes in Kubernetes for same deployment?
"I want to run a maximum of 10 pods on 4GB machines and 50 pods in 32GB machines."
This is possible by configuring the kubelet to limit the maximum pod count on the node:
// maxPods is the number of pods that can run on this Kubelet.
MaxPods int32 `json:"maxPods,omitempty"`
The source on GitHub can be found here.
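For illustration, the `maxPods` field above maps to a key in the kubelet's configuration file. The sketch below generates a minimal fragment per node size; the values 10 and 50 come from the question, while the helper name `kubelet_config` is made up here. A real kubelet configuration file contains many other fields, so in practice you would merge `maxPods` into the existing file on each node and restart the kubelet rather than replace the file.

```python
import json


def kubelet_config(max_pods: int) -> dict:
    """Return a minimal KubeletConfiguration fragment that caps the
    number of pods the kubelet on this node will run."""
    if max_pods < 1:
        raise ValueError("max_pods must be at least 1")
    return {
        "apiVersion": "kubelet.config.k8s.io/v1beta1",
        "kind": "KubeletConfiguration",
        "maxPods": max_pods,
    }


# Limits taken from the question: 10 pods on 4GB nodes, 50 on 32GB nodes.
small_node = kubelet_config(10)
large_node = kubelet_config(50)

print(json.dumps(small_node, indent=2))
print(json.dumps(large_node, indent=2))
```

This caps the total number of pods on the node, not the pods of one particular deployment.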
"Is there a way to assign different number of pods to different nodes in Kubernetes for same deployment?"
Adding this to your request makes it impossible. Kubernetes currently has no native mechanism for this, which is more or less in the spirit of how Kubernetes works: you schedule your application and let the scheduler decide where it should go, unless a very specific resource such as a GPU is required. That case can be handled with labels, affinity, etc.
If you look at the Kubernetes API, you will notice that there is no field that makes your request possible. However, the API can be extended with custom resources, and this problem can be tackled by writing your own scheduler. But this is not an easy fix.
You may also want to set appropriate memory requests. Because the scheduler only places a pod on a node that still has enough unreserved memory for the pod's request, nodes with more memory will end up with more pods. It's not ideal, but it is something.
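To make the memory-request idea concrete, here is a small back-of-the-envelope sketch. It assumes the whole of a node's memory is allocatable to pods, which is not true in practice (the OS and system daemons reserve some), and the 400 MiB request is an arbitrary example value.

```python
def pods_that_fit(allocatable_mib: int, request_mib: int) -> int:
    """Upper bound on how many pods with the given memory request the
    scheduler can place on a node, looking at memory requests only."""
    if request_mib <= 0:
        raise ValueError("request_mib must be positive")
    return allocatable_mib // request_mib


# Assumption: all physical memory is allocatable (real nodes have less).
small = pods_that_fit(4 * 1024, 400)   # 4GB node
large = pods_that_fit(32 * 1024, 400)  # 32GB node

print(f"4GB node:  up to {small} pods")   # 10
print(f"32GB node: up to {large} pods")   # 81
```

A single request size cannot hit both 10 and 50 exactly, because the ratio between the nodes is fixed by their memory sizes, which is why this approach is only an approximation.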
Well, in general the scheduler filters out the nodes that cannot fit the pod and then scores the remaining ones, for example preferring less-utilized nodes.
We are also free to add node affinities via selectors, but that won't control the count either.
You may have to rebalance the pods across the worker nodes manually.
For example:
Run kubectl top nodes to see the resource usage on each node once the deployment has been done.
Then kubectl get po -o wide will show you which node each pod is running on.
Now, to force pods onto a specific node, say one with 32GB, you can temporarily mark the 4GB nodes as unschedulable (shown as "SchedulingDisabled") by executing the following command:
kubectl cordon {node_name}
Now kill the pods that are running on the 4GB machines and that you want to run on the 32GB machines. After they are killed, they will automatically be recreated on one of the 32GB nodes.
Then you can execute
kubectl uncordon {node_name} to mark the node as schedulable again.
This is a fairly involved process and needs a lot of calculation as well.
I am new to Kubernetes and clusters.
I would like to bring up a high-availability, master-only Kubernetes cluster (Need Not to!).
I have 2 instances/servers running the Kubernetes daemon, with different kinds of pods running on both nodes.
Now I would like to somehow create the cluster so that if one of the hosts (2) goes down, all the pods from that host (2) move to the other host (1).
Once host (2) comes back up, the pods should float back.
Please let me know if there is any way I can achieve this.
Since your requirement is a 2-node master-only cluster that also has HA capabilities, unfortunately there is no straightforward way to achieve it.
The reason is that a 2-node master-only cluster deployed by kubeadm has only 2 etcd pods (one on each node). This gives you no fault tolerance: if one of the nodes goes down, the etcd cluster loses quorum and the remaining k8s master won't be able to operate.
Now, if you were ok with having an external etcd cluster where you can maintain an odd number of etcd members then yes, you can have a 2 node k8s cluster and still have HA capabilities.
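The quorum reasoning above comes down to simple arithmetic: a majority of members must be reachable, so quorum is floor(n/2) + 1. A short sketch shows why 2 members tolerate no failures and why odd member counts are preferred:

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with the given member count."""
    if members < 1:
        raise ValueError("members must be at least 1")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while the cluster still has quorum."""
    return members - quorum(members)


for n in (1, 2, 3, 4, 5):
    print(f"{n} members: quorum={quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Going from 1 to 2 members, or from 3 to 4, adds a member without adding any fault tolerance.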
It is possible for a master node to also serve as a worker node; however, this is not advisable in production environments, mainly for performance reasons.
By default, kubeadm configures the master node so that no workload can run on it, and only regular nodes added later can handle workloads. But you can easily override this default behaviour.
To allow workloads to be scheduled on the master node as well, you need to remove the following taint from it, which is added by default:
kubectl taint nodes --all node-role.kubernetes.io/master-
To install and configure a multi-master Kubernetes cluster you can follow this tutorial. It describes a scenario with 3 master nodes, but you can easily customize it to your needs.