Is there a way to use vertical and horizontal pod autoscaler without a controller? - kubernetes

I want to know if there is a way to use autoscalers in Kubernetes with pods created directly from a pod YAML file, rather than with pods created as part of a higher-level controller such as a Deployment or ReplicaSet.

The short answer to your question is no.
The Horizontal Pod Autoscaler changes the number of replicas of a Deployment in reaction to changes in observed load, so you need a Deployment for it to work.
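For illustration, this is roughly what a minimal HPA looks like; the scaleTargetRef has to point at something scalable such as a Deployment (the name my-app is just a placeholder):

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment          # a bare Pod cannot be referenced here
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70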
Regarding the Vertical Pod Autoscaler, I think it should work with bare pods as well, but only at Pod creation time. In fact, I read the following statement in the Known limitations section of the README:
VPA does not evict pods which are not run under a controller. For such
pods Auto mode is currently equivalent to Initial.
That sentence makes me conclude that VPA should work on Pods not backed by a controller, but only in a limited way. In fact, the documentation about the Initial mode states:
VPA only assigns resource requests on pod creation and never changes
them later.
That makes it basically useless for a bare Pod once it has been created.

I think it is not possible to use a Pod object as the target resource for an HPA.
The documentation describes the HPA as follows:
The Horizontal Pod Autoscaler automatically scales the number of Pods
in a replication controller, deployment, replica set or stateful set
based on observed CPU utilization (or, with custom metrics support, on
some other application-provided metrics). Note that Horizontal Pod
Autoscaling does not apply to objects that can't be scaled, for
example, DaemonSets.
The documentation also describes how the algorithm is implemented on the backend:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
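As an illustrative example (the numbers are made up): with 4 current replicas, a current CPU utilization of 90% and a target of 60%, the controller computes desiredReplicas = ceil[4 * (90 / 60)] = 6 and scales the target to 6 replicas.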
Since the Pod resource does not have a replicas field as part of its spec, we can say that bare Pods are not supported for autoscaling with the HPA.
The VPA, on the other hand, does seem to support working with the Pod object, but there is a limitation when using the VPA with bare Pods:
VPA does not evict pods which are not run under a controller. For such
pods Auto mode is currently equivalent to Initial.
You can read about the different updatePolicy.updateModes in the docs.
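For reference, the update mode is set under spec.updatePolicy of the VerticalPodAutoscaler object; a rough sketch with placeholder names:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"     # only assign requests at pod creation; other modes are "Off", "Recreate" and "Auto"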

Related

Is it possible to use Kubernetes autoscale on cron job pods

Some context: I have multiple cron jobs running daily, weekly and hourly, some of which require significant processing power.
I would like to add requests and limits to these cron job pods to enable vertical scaling and ensure that the assigned node has enough capacity when the pod is initialized. This would keep me from having to have multiple large nodes available at all times, and would also let me easily change how many crons I run in parallel.
I would like to avoid timed scaling, since the cron jobs' processing time can increase as the application grows.
Edit - Additional Information :
Currently I am using DigitalOcean and its UI for cluster autoscaling. I have it working with HPAs on deployments, but not on crons. Adding limits to crons does not trigger cluster autoscaling, to my knowledge.
I have tried to enable HPA scaling with the cron, but with no success: the pod basically just sits in a Pending status, signalling that there is insufficient CPU available, and no new node is generated.
Does HPA scaling work with cron job pods, and is there a way to achieve the same type of scaling?
HPA is used to scale out more pods when pod load is high, but this won't increase the resources of your cluster.
I think you're looking for the Cluster Autoscaler (it works on AWS, GKE and Azure), which will increase cluster capacity when pods can't be scheduled.
This is a Community Wiki answer so feel free to edit it and add any additional details you consider important.
As Dom already mentioned, "this won't increase the resources on your cluster." More specifically, it won't create an additional node, as the Horizontal Pod Autoscaler doesn't have such a capability and in fact has nothing to do with cluster scaling. Its name is pretty self-explanatory: HPA is only able to scale Pods, and it scales them horizontally; in other words, it can automatically increase or decrease the number of replicas of your "replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics)", as per the docs.
As for cluster autoscaling, as already said by Dom, such solutions are implemented in so-called managed Kubernetes offerings such as GKE on GCP, EKS on AWS or AKS on Azure, among others. You typically don't need to do anything special to enable them, as they are available out of the box.
You may wonder how HPA and CA fit together. It's really well explained in the FAQ section of the Cluster Autoscaler project:
How does Horizontal Pod Autoscaler work with Cluster Autoscaler?
Horizontal Pod Autoscaler changes the deployment's or replicaset's
number of replicas based on the current CPU load. If the load
increases, HPA will create new replicas, for which there may or may
not be enough space in the cluster. If there are not enough resources,
CA will try to bring up some nodes, so that the HPA-created pods have
a place to run. If the load decreases, HPA will stop some of the
replicas. As a result, some nodes may become underutilized or
completely empty, and then CA will terminate such unneeded nodes.
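To tie this back to the cron question: for the Cluster Autoscaler to bring up a node for a heavy cron job, it is usually enough to set resource requests on the cron pods, so that a pod that doesn't fit goes Pending and triggers a scale-up. A rough sketch (image, schedule and sizes are placeholders):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: heavy-report
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: report
            image: registry.example.com/report:latest
            resources:
              requests:             # what the scheduler (and the Cluster Autoscaler) react to
                cpu: "2"
                memory: 4Gi
              limits:
                cpu: "2"
                memory: 4Gi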

Dynamic scaling for statefulset best practices

Background
I have an app running in a Kubernetes cluster that uses sharded MongoDB and Elasticsearch StatefulSets. I set up horizontal pod autoscalers for the Deployment components of my app and everything works well.
Problems
Problems arise when traffic goes up. My server Deployment scales out just fine, but the MongoDB shards and Elasticsearch nodes cannot handle that much traffic and throttle the overall response time.
The simple solution is to configure those StatefulSets with more shards and more replicas. What bugs me is that the traffic spike only happens for about 3-4 hours a day, so it's kind of wasteful to let all those extra replicas sit idle for the rest of the day.
I did some research, and it looks like databases in general are not supposed to scale out/in dynamically, as doing so consumes a lot of network and disk I/O just for replication between nodes. There is also a potential for data loss and inconsistency while scaling up or down.
Questions
If it is possible, what is the proper way to handle dynamic scaling in MongoDB, Elasticsearch... and databases in general?
If it isn't, what can I do to shave some cents off my cloud bill, given that we only need maximum power from the database pods for a short period each day?
You should read about Kubernetes autoscaling - HPA.
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Note that Horizontal Pod Autoscaling does not apply to objects that can't be scaled, for example, DaemonSets.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by user.
With HPA you also have to take care of volume mounting and data latency.
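For reference, an HPA can target a StatefulSet directly, since StatefulSets implement the scale subresource; a minimal sketch with placeholder names:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: es-data
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: es-data
  minReplicas: 3
  maxReplicas: 9
  targetCPUUtilizationPercentage: 80

Keep in mind that each new replica gets its own PersistentVolumeClaim from the volumeClaimTemplates, and scaling down does not delete those PVCs, which is part of the volume and data caveat above.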
As @Serge mentioned in the comments, I would suggest checking the native scaling options provided by MongoDB and Elasticsearch themselves.
Take a look at
MongoDB operator documentation
Elasticsearch operator documentation
Elasticsearch future release autoscaling
I am not very familiar with MongoDB and Elasticsearch on Kubernetes, but maybe these tutorials will help you:
https://medium.com/faun/scaling-mongodb-on-kubernetes-32e446c16b82
https://www.youtube.com/watch?v=J7h0F34iBx0
https://kubernetes.io/blog/2017/01/running-mongodb-on-kubernetes-with-statefulsets/
https://sematext.com/blog/elasticsearch-operator-on-kubernetes/#toc-what-is-the-elasticsearch-operator-1
If you use Helm, take a look at the Banzai Cloud Horizontal Pod Autoscaler operator:
You may not want to, or may not be able to, edit a Helm chart just to add an autoscaling feature. Nearly all charts support custom annotations, so we believe it is a good idea to be able to set up autoscaling just by adding some simple annotations to your deployment.
We have open-sourced a Horizontal Pod Autoscaler operator. This operator watches your Deployment or StatefulSet and automatically creates a HorizontalPodAutoscaler resource, should you provide the correct autoscale annotations.
Hope you find this useful.

Kubernetes Autoscaler: no downtime for deployments when downscaling is possible?

In a project, I'm enabling the cluster autoscaler functionality from Kubernetes.
According to the documentation (How does scale down work?), I understand that when a node has been using less than 50% of its capacity for a given amount of time, it is removed together with all of its pods, which will then be re-created on a different node if needed.
But the following problem can happen: what if all the pods belonging to a specific deployment are running on a node that is being removed? That would mean users might experience downtime for that deployment's application.
Is there a way to prevent the scale-down from deleting a node when all of a deployment's pods are running only on that node?
I have checked the documentation, and one possible (but not good) solution is to add an annotation to all of the pods running applications, but this clearly would not scale the cluster down in an optimal way.
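For reference, the annotation in question is presumably the safe-to-evict one documented in the Cluster Autoscaler FAQ, set on the pods whose node should not be removed:

metadata:
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"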
In the same documentation:
What happens when a non-empty node is terminated? As mentioned above, all pods should be migrated elsewhere. Cluster Autoscaler does this by evicting them and tainting the node, so they aren't scheduled there again.
What is eviction?
The eviction subresource of a pod can be thought of as a kind of policy-controlled DELETE operation on the pod itself.
Ok, but what if all pods get evicted at the same time on the node?
You can use a PodDisruptionBudget to make sure a minimum number of replicas is always running.
What is a PDB?
A PDB limits the number of Pods of a replicated application that are down simultaneously from voluntary disruptions.
In k8s docs you can also read:
A PodDisruptionBudget has three fields:
A label selector .spec.selector to specify the set of pods to which it applies. This field is required.
.spec.minAvailable which is a description of the number of pods from that set that must still be available after the eviction, even in the absence of the evicted pod. minAvailable can be either an absolute number or a percentage.
.spec.maxUnavailable (available in Kubernetes 1.7 and higher) which is a description of the number of pods from that set that can be unavailable after the eviction. It can be either an absolute number or a percentage.
So if you use a PDB for your deployment, its pods should not all get deleted at once.
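A minimal sketch of such a PDB, assuming the deployment's pods carry the label app: my-app:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 1             # alternatively use maxUnavailable, but not both
  selector:
    matchLabels:
      app: my-app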
But please note that if the node fails for some other reason (e.g. a hardware failure), you will still experience downtime. If you really care about high availability, consider using pod anti-affinity to make sure the pods are not all scheduled on one node.
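A sketch of such an anti-affinity rule in the pod template, again assuming the label app: my-app:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # a "hard" rule: never co-locate two of these pods
    - labelSelector:
        matchLabels:
          app: my-app
      topologyKey: kubernetes.io/hostname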
The same document you referred to also has this:
How is Cluster Autoscaler different from CPU-usage-based node autoscalers? Cluster Autoscaler makes sure that all pods in the
cluster have a place to run, no matter if there is any CPU load or
not. Moreover, it tries to ensure that there are no unneeded nodes in
the cluster.
CPU-usage-based (or any metric-based) cluster/node group autoscalers
don't care about pods when scaling up and down. As a result, they may
add a node that will not have any pods, or remove a node that has some
system-critical pods on it, like kube-dns. Usage of these autoscalers
with Kubernetes is discouraged.

What is needed to enable a pod to control another deployment in kubernetes?

I'm trying to figure out the pieces, and how to fit them together, for having a pod control aspects of a deployment, like scaling. I'm thinking I need to set up a service account for it, but I'm not finding information on how to link it all together, and then how to get the pod to use the service account. I'll be writing this in Python, which might add to the complexity of how to use the service account.
Try setting up the Horizontal Pod Autoscaler.
The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics). Horizontal Pod Autoscaling does not apply to objects that can’t be scaled, for example, DaemonSets.
The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by user.
Documentation: hpa-setup, autoscaling.
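The HPA covers the common case, but if you do want a pod to control a Deployment itself, as asked, the usual pieces are a ServiceAccount, a Role granting access to Deployments (including the scale subresource), a RoleBinding, and serviceAccountName in the controlling pod's spec; clients such as the official Python client then pick up the in-cluster credentials automatically. A rough sketch with placeholder names:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: scaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-scaler
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "deployments/scale"]
  verbs: ["get", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: scaler-binding
subjects:
- kind: ServiceAccount
  name: scaler
  namespace: default          # adjust to your namespace
roleRef:
  kind: Role
  name: deployment-scaler
  apiGroup: rbac.authorization.k8s.io

The controlling pod then sets serviceAccountName: scaler in its spec.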

Kubernetes PodDisruptionBudget, HorizontalPodAutoscaler & RollingUpdate Interaction?

If I have the following Kubernetes objects:
Deployment with rollingUpdate.maxUnavailable set to 1.
PodDisruptionBudget with maxUnavailable set to 1.
HorizontalPodAutoscaler set up to allow autoscaling.
Cluster auto-scaling is enabled.
If the cluster is under load and in the middle of scaling up, what happens:
During a rolling update? Do the new Pods added due to the scale-up use the new version of the Pod?
When a node needs to be restarted or replaced? Does the PodDisruptionBudget stop the restart completely? Does the HorizontalPodAutoscaler scale up the number of nodes before taking down another node?
When the Pod affinity is set to avoid placing two Pods from the same Deployment on the same node?
As in the documentation:
Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but controllers (like deployment and stateful-set) are not limited by PDBs when doing rolling upgrades – the handling of failures during application updates is configured in the controller spec.
So it partially depends on the controller configuration and implementation. I believe new pods added by the autoscaler will use the new version of the Pod, because that's the version present in the Deployment's definition at that point.
That depends on the way you execute the node restart. If you just cut the power, nothing can be done ;) If you execute a proper drain before shutting the node down, then the PodDisruptionBudget will be taken into account and the draining procedure won't violate it. The disruption budget is respected by the Eviction API, but can be violated by low-level operations like manual pod deletion. It is more of a suggestion that some APIs respect than a hard limit enforced by Kubernetes as a whole.
According to the official documentation, if the anti-affinity is set to be a "soft" one, the pods may still be scheduled on the same node. If it's "hard", then the deployment will get stuck, unable to schedule the required number of pods. A rolling update will still be possible, but the HPA won't be able to grow the pod pool any further.
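For context, the "soft" and "hard" variants referred to here are the two podAntiAffinity fields; a sketch of the soft one (placeholder label again):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:  # "soft": prefer to spread, but co-locate if there is no other option
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: my-app
        topologyKey: kubernetes.io/hostname

The "hard" variant uses requiredDuringSchedulingIgnoredDuringExecution instead and leaves pods Pending rather than violate the rule.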