Does HorizontalPodAutoscaler make sense when there is only one Deployment on GKE (Google Container Engine) Kubernetes cluster?

I have a "homogeneous" Kubernetes setup. By this I mean that I am only running instances of a single type of pod (an http server) with a load balancer service distributing traffic to them.
By my reasoning, to get the most out of my cluster (edit: to be concrete -- getting the best average response times to http requests) I should have:
At least one pod running on every node: Not having a pod running on a node means that I am paying for the node without having it ready to serve a request.
At most one pod running on every node: The pods are threaded http servers that can saturate a node on their own, so running multiple pods on a node does not gain me anything.
This means that I should have exactly one pod per node. I achieve this using a DaemonSet.
The alternative way is to configure a Deployment and apply a HorizontalPodAutoscaler to it and have Kubernetes handle the number of pods and pod to node mapping. Is there any disadvantage of my approach in comparison to this?
My evaluation is that the HorizontalPodAutoscaler is relevant mainly in heterogeneous situations, where one HorizontalPodAutoscaler can scale up a Deployment at the expense of another Deployment. But since I have only one type of pod, I would have only one Deployment and I would be scaling up that deployment at the expense of itself, which does not make sense.

HorizontalPodAutoscaler is actually a valid solution for your needs. To address your two concerns:
1. At least one pod running on every node
This isn't your real concern. The concern is underutilizing your cluster. However, you can be underutilizing your cluster even if you have a pod running on every node. Consider a three-node cluster:
Scenario A: pod running on each node, 10% CPU usage per node
Scenario B: pod running on only one node, 70% CPU usage
Even though Scenario A has a pod on each node, the cluster is actually less utilized than in Scenario B, where only one node has a pod.
2. At most one pod running on every node
The Kubernetes scheduler tries to spread pods around so that you don't end up with multiple pods of the same type on a single node. Since in your case the other nodes should be empty, the scheduler should have no problems starting the pods on them. Additionally, if you have each pod request resources roughly equivalent to the node's allocatable resources, that will prevent the scheduler from scheduling a second pod on a node that already has one.
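For illustration, a minimal sketch of what such a resource request might look like, assuming nodes with roughly 4 vCPUs and 15 GB of memory (the names, image, and sizes below are placeholders, not from the question):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: http-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: http-server
  template:
    metadata:
      labels:
        app: http-server
    spec:
      containers:
      - name: server
        image: example/http-server:latest  # hypothetical image
        resources:
          requests:
            cpu: "3500m"    # sized close to a 4-vCPU node's allocatable CPU
            memory: "12Gi"  # likewise, close to a 15 GB node's allocatable memory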
Now, you can achieve the same effect whether you go with a DaemonSet or an HPA, but I personally would go with the HPA, since I think it fits your semantics better and would also work much better if you eventually decide to add other types of pods to your cluster.
Using a DaemonSet means that the pod has to run on every node (or some subset). This is a great fit for something like a logger or a metrics collector that runs per-node. But you really just want to use available cluster resources to power your pod as needed, which matches up better with the intent of the HPA.
As an aside, I believe GKE supports cluster autoscaling, so you should never be paying for nodes that aren't needed.
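As a rough sketch of that recommendation, an HPA targeting the Deployment might look like this (the autoscaling/v2 API shown is the current one; older clusters used autoscaling/v1, and the name and thresholds here are assumptions):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: http-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: http-server   # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # scale up when average CPU usage exceeds 70%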

Related

What will happen when a node is almost out of resource, when deploying K8s daemonset?

When deploying a Kubernetes DaemonSet, what will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted? Though Kubernetes can be scaled horizontally, I believe it is meaningless to scale horizontally here, as a DaemonSet needs a pod on every node.
Though Kubernetes can be scaled horizontally, I believe it is meaningless to scale horizontally here, as a DaemonSet needs a pod on every node.
DaemonSet is a workload type that is mostly for operational workloads, e.g. shipping logs from the node or similar "system services". It is rarely a good fit for workloads that serve your users, but it can be.
what will happen when a single node (out of a few nodes) is almost out of resources, a pod can't be created, and there are no pods that can be evicted?
As described above, workloads deployed with a DaemonSet are typically operational workloads with, e.g., an infrastructure role in your cluster. Since these pods may be more critical (or less, depending on what you want), I would use a higher Quality of Service class for them, so that other pods are evicted first when a node runs low on resources.
See Configure Quality of Service for Pods for how to configure your Pods to be in a Quality of Service class, one of:
Guaranteed
Burstable
BestEffort
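For example, a pod lands in the Guaranteed class when every container's limits equal its requests; a minimal sketch (the name, image, and sizes are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: critical-agent
spec:
  containers:
  - name: agent
    image: example/log-agent:latest  # hypothetical image
    resources:
      requests:
        cpu: "200m"
        memory: "256Mi"
      limits:
        cpu: "200m"      # limits equal to requests => Guaranteed QoS
        memory: "256Mi"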
You might also consider using Pod Priority and Preemption.
The question was about DaemonSet, but as a final note: workloads that serve requests from your users are typically deployed as a Deployment, and for those it is very easy to do horizontal scaling with the Horizontal Pod Autoscaler.

Kubernetes: Evenly distribute the replicas across the cluster

We can use a DaemonSet object to deploy one replica on each node. How can we deploy, say, 2 or 3 replicas per node? How can we achieve that?
There is no way to force x pods per node the way a DaemonSet does. However, with some planning, you can force a fairly even pod distribution across your nodes using pod anti-affinity.
Let's say we have 10 nodes. First, we need a Deployment (ReplicaSet) with 30 pods (3 per node). Next, we set the pod anti-affinity to use preferredDuringSchedulingIgnoredDuringExecution with a relatively high weight and match the deployment's labels. This causes the scheduler to prefer not to schedule a pod where the same pod already exists. Once there is 1 pod per node, the cycle starts over again: a node with 2 pods is scored lower than one with 1 pod, so the next pod should go to the less crowded node.
Note this is not as precise as a DaemonSet and may run into some limitations when it comes time to scale up or down the cluster.
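A hedged sketch of the approach described above, assuming the 10-node, 3-pods-per-node scenario (names and image are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 30  # 3 pods per node across 10 nodes
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100  # relatively high weight, as suggested above
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname  # spread by node
      containers:
      - name: my-app
        image: example/my-app:latest  # hypothetical image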
A more reliable approach, if the cluster will be scaled up or down, is to simply create multiple DaemonSets with different names but otherwise identical. Since the DaemonSets will share the same labels, the pods can all be exposed through the same Service.
By default, the Kubernetes scheduler prefers to schedule pods on different nodes.
The scheduler first determines all feasible nodes for a pod based on your affinity/anti-affinity rules, resource requests and limits, etc.
Afterward, it picks the best node among them, automatically spreading pods across separate availability zones and separate nodes where possible.
You can try this on your own. For example, if you have 3 nodes, try deploying 9 replicas of a pod. You will see that each node ends up with 3 pods running.

How to migrate the pods automatically to another node in kubernetes?

I am new to Kubernetes. I am wondering whether Kubernetes automatically moves pods to another node when a node's resource usage becomes critical.
For example, suppose Pod A, Pod B, and Pod C are running on Node A and Pod D is running on Node B, and the pods on Node A are using most of that node's resources. In this case, will Kubernetes migrate any of the pods running on Node A to Node B?
I have learnt about node affinity and nodeSelector, which are used to run pods on certain nodes. It would be helpful if Kubernetes offered a feature to migrate pods to another node automatically when resource usage is high.
Does anyone know how we can achieve this in Kubernetes?
Yes, Kubernetes can migrate the pods to another node automatically if resources are used highly. The pod would be killed and a new pod would be started on another node. You would probably want to learn about Quality of Service Classes, to understand which pod would be killed first.
That said, you may want to read about Automatic Horizontal Pod Autoscaling. This may give you more control.
With Horizontal Pod Autoscaling, Kubernetes automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with alpha support, on some other, application-provided metrics).
As load increases, it makes more sense to spin up a new pod than to move a pod between nodes, to avoid disrupting processes already running inside the pod on the busy node.
You can also set a nodeSelector in the Deployment spec to pin the pods to particular nodes:
https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/
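A minimal sketch of that, assuming the target nodes carry a disktype=ssd label (the label and image follow the linked documentation's canonical example):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        disktype: ssd  # pods are only scheduled on nodes labeled disktype=ssd
      containers:
      - name: web
        image: nginx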

What's the purpose of Kubernetes DaemonSet when replication controllers have node anti-affinity

DaemonSet is a Kubernetes beta resource that can ensure that exactly one pod is scheduled to a group of nodes. The group of nodes is all nodes by default, but can be limited to a subset using nodeSelector or the Alpha feature of node affinity/anti-affinity.
It seems that DaemonSet functionality can be achieved with replication controllers/replica sets with proper node affinity and anti-affinity.
Am I missing something? If that's correct should DaemonSet be deprecated before it even leaves Beta?
As you said, DaemonSet guarantees one pod per node for a subset of the nodes in the cluster. If you use a ReplicaSet instead, you need to:
1. use node affinity/anti-affinity and/or a node selector to control the set of nodes to run on (similar to how DaemonSet does it);
2. use inter-pod anti-affinity to spread the pods across the nodes;
3. make sure the number of pods > the number of nodes in the set, so that every node has one pod scheduled.
However, ensuring (3) is a chore as the set of nodes can change over time. With DaemonSet, you don't have to worry about that, nor would you need to create extra, unschedulable pods. On top of that, DaemonSet does not rely on the scheduler to assign its pods, which makes it useful for cluster bootstrap (see How Daemon Pods are scheduled).
See the "Alternative to DaemonSet" section in the DaemonSet doc for more comparisons. DaemonSet is still the easiest way to run a per-node daemon without external tools.

Allow only one pod of a type on a node in Kubernetes

How can I allow only one pod of a type on a node in Kubernetes? DaemonSets don't fit this use case.
For example: restricting scheduling so that only one Elasticsearch pod runs on each node, to prevent data loss in case the node goes down.
It can be achieved by carefully planning the pod's CPU/memory resources against the cluster's machine type.
Is there any other way to do so?
Kubernetes 1.4 introduced Inter-pod affinity and anti-affinity. From the documentation: Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to schedule on based on labels on pods that are already running on the node.
That won't strictly prevent a pod from being scheduled onto a node, but at least the scheduler will only place a second pod on a node that already has one when it has no other choice.
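If you want a hard guarantee rather than a preference, the required form of pod anti-affinity enforces at most one matching pod per node; a sketch for the Elasticsearch case (the names, replica count, and image tag are assumptions):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
spec:
  serviceName: elasticsearch
  replicas: 3
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: elasticsearch
            topologyKey: kubernetes.io/hostname  # at most one matching pod per node
      containers:
      - name: elasticsearch
        image: elasticsearch:7.17.9  # example image tag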
If you assign a constraint to your pod that can only be met at most once per node, then the scheduler will only be able to place one pod per node. A good example of such a constraint is a host port (the scheduler won't try to put two pods that both require the same host port onto the same node because the second one will never be able to run).
Also see How to require one pod per minion/kubelet when configuring a replication controller?
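In manifest form, the host-port trick from the answer above is just a hostPort on the container; a brief sketch (the names, image, and port number are placeholders):
apiVersion: v1
kind: Pod
metadata:
  name: single-per-node
spec:
  containers:
  - name: app
    image: example/app:latest  # hypothetical image
    ports:
    - containerPort: 9200
      hostPort: 9200  # only one pod can bind host port 9200 on a given node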
I'm running MC jobs on a kubernetes/GCE cluster, and for me M:N scheduling is important in the sense that out of M jobs, I want one job/pod per node across N nodes (M >> N).
For me, the solution was to set an explicit CPU limit in the pod's JSON file:
"resources": {
"limits": {
"cpu": "700m"
}
}
And I have no replication controller, just a pure batch-style cluster.
The number of nodes N is typically 100, 200, or 300, and M is about 10K-20K.
Is creating one Elasticsearch deployment per node an option? If so, I found it easiest to use a nodeSelector combined with the Always restart policy. You can match on different labels; here I simply use the zone of the Azure availability set, e.g. like this:
spec:
  containers:
    ...
  nodeSelector:
    failure-domain.beta.kubernetes.io/zone: "2"
  ...
  restartPolicy: Always