Kubernetes vertical-pod-autoscaler without Updater and how to use it with benchmarks

Good morning all. I am currently working with the Kubernetes Vertical Pod Autoscaler (VPA) https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler and I'd like to use it to optimise resource consumption on the cluster. There are 40+ microservices deployed; I can easily automate creation of the VPA objects in a shell script, and I do get recommendations. I cannot use the Admission Controller on the cluster, as it is not deployed.
Do I understand the documentation correctly that the Updater needs the Admission Controller to work and that without it, no eviction / update is done at all?
https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/pkg/updater/README.md
Updater runs in Kubernetes cluster and decides which pods should be restarted based on resources allocation recommendation calculated by Recommender. (...) Updater does not perform the actual resources update, but relies on Vertical Pod Autoscaler admission plugin to update pod resources when the pod is recreated after eviction.
Another question I have is about the difference between updatePolicy.updateMode: Auto and Off. How do they differ in a deployment where the Updater is not used?
Again, the documentation / VPA code says:
// UpdateModeOff means that autoscaler never changes Pod resources.
// The recommender still sets the recommended resources in the
// VerticalPodAutoscaler object. This can be used for a "dry run".
UpdateModeOff UpdateMode = "Off"
// UpdateModeInitial means that autoscaler only assigns resources on pod
// creation and does not change them during the lifetime of the pod.
UpdateModeInitial UpdateMode = "Initial"
My VPA cannot evict / recreate a pod, since there is no admission controller. So I should use the Off mode, if I understand correctly? Recreating a pod (scale -> 0, scale -> 1) recreates it with the same resources as defined in the Deployment, which in my opinion is expected. Is there any difference if I use Off or Initial?
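For reference, a recommendation-only VPA object of the kind described above might look like the sketch below. The names my-service and my-service-vpa are hypothetical placeholders; updateMode "Off" is the "dry run" mode quoted from the VPA code.

```yaml
# Sketch only: a VPA object that collects recommendations and never changes pods.
# "my-service" and "my-service-vpa" are hypothetical names.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    # "Off": the recommender still writes recommendations into the
    # VPA object's status, but nothing is evicted or mutated.
    updateMode: "Off"
```

The recommendations can then be read from the object's status, for example with kubectl describe vpa my-service-vpa.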
And the last question: as I cannot use the Updater part of VPA, I've set --history-length=14d so that 2 weeks of Prometheus history are used, and I can run performance tests on demand. Should I point VPA to an environment with a steadier load, or can I use a DEV-like environment with artificial load (via JMeter in my case) to get the recommendations? Should VPA be deployed at the same time as the load tests are being run, to get live metrics for the recommendation?
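To make the setup concrete, the recommender arguments for such a configuration could look roughly like the fragment below. This is a sketch that assumes the recommender reads its history from Prometheus; the Prometheus address is a hypothetical in-cluster URL and has to be adapted.

```yaml
# Sketch only: fragment of the vpa-recommender container spec.
# The Prometheus address is an assumption, not taken from the question.
containers:
- name: recommender
  args:
  - --storage=prometheus            # load history from Prometheus instead of checkpoints
  - --prometheus-address=http://prometheus.monitoring.svc:9090
  - --history-length=14d            # look back two weeks when building recommendations
```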
Thank you for your responses.

Related

Is interrupt functionality available in Kubernetes minikube autoscaling?

I have used the custom metrics API and have successfully autoscaled the service based on some metrics. But here is the point: the autoscaling works in such a way that minikube's HPA (Horizontal Pod Autoscaler) checks a particular URL/API and fetches the metric value repeatedly with some polling period. For example, the HPA checks the value every 15 seconds. So this is continuous polling from the HPA to the URL/API to fetch the value of that metric. After that it simply compares the value with the given target reference value and tries to scale.
What I want is for the API/URL to trigger the minikube HPA whenever needed; in simple words, the HPA should work like an interrupt here.
The call to autoscale should go from the Service/API to the HPA, not from the HPA to the Service!
Is this feature available in Kubernetes? Or do you have any comments on this scenario? Please share your view on this; I am currently in the last stage and this question is stopping my progress!

Designing K8s pod and processes for initialization

I have a problem statement wherein there is a Kubernetes cluster and I have some pods running on it.
Now, I want some functions/processes to run once per deployment, independent of the number of replicas.
These processes use the same image as the one in the deployment YAML.
I cannot use init containers or sidecars, because they run along with the main container in the pod for each replica.
I tried to create a new image and then a pod out of it. But this pod keeps on running, which is not good for cluster resources, as it should be destroyed after it has done its job. Also, the main container depends on the completion of this process in order to run the "command" part of the K8s spec.
Looking for suggestions on how to tackle this?
Theoretically, you could write an admission controller webhook that intercepts create/update requests for deployments and triggers your functions as you want. If your functions need to be checked, use a ValidatingWebhookConfiguration to validate the process and then deny or accept the request.
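As a rough sketch of what registering such a webhook could look like: the configuration below sends every create/update of a Deployment to a webhook service. The service name, namespace, path and webhook name are all hypothetical, and the webhook server itself (which would run the functions and answer allow/deny) is not shown.

```yaml
# Sketch only: route Deployment create/update requests to a custom webhook.
# All names below are hypothetical placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: deployment-init-hook
webhooks:
- name: deployment-init.example.com
  admissionReviewVersions: ["v1"]
  # The webhook triggers external work, so it must not do so on dry-run requests.
  sideEffects: NoneOnDryRun
  # Reject the request if the webhook cannot be reached.
  failurePolicy: Fail
  clientConfig:
    service:
      namespace: default
      name: deployment-init-webhook
      path: /validate
    caBundle: "<base64-encoded CA certificate>"
  rules:
  - apiGroups: ["apps"]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["deployments"]
    scope: "Namespaced"
```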

How will a scheduled (rolling) restart of a service be affected by an ongoing upgrade (and vice versa)

Due to a memory leak in one of our services I am planning to add a k8s CronJob to schedule a periodic restart of the leaking service. Right now we do not have the resources to look into the mem leak properly, so we need a temporary solution to quickly minimize the issues caused by the leak. It will be a rolling restart, as outlined here:
How to schedule pods restart
I have already tested this in our test cluster, and it seems to work as expected. The service has 2 replicas in test, and 3 in production.
My plan is to schedule the CronJob to run every 2 hours.
I am now wondering: How will the new CronJob behave if it should happen to execute while a service upgrade is already running? We do rolling upgrades to achieve zero downtime, and we sometimes roll out upgrades several times a day. I don't want to limit the people who deploy upgrades by saying "please ensure you never deploy near to 08:00, 10:00, 12:00 etc". That will never work in the long term.
And vice versa, I am also wondering what will happen if an upgrade is started while the CronJob is already running and the pods are restarting.
Does kubernetes have something built-in to handle this kind of conflict?
This answer to the linked question recommends using kubectl rollout restart from a CronJob pod. That command internally works by adding an annotation to the deployment's pod spec; since the pod spec is different, it triggers a new rolling upgrade of the deployment.
Say you're running an ordinary redeployment; that will change the image: setting in the pod spec. At about the same time, the kubectl rollout restart happens that changes an annotation setting in the pod spec. The Kubernetes API forces these two changes to be serialized, so the final deployment object will always have both changes in it.
This question then reduces to "what happens if a deployment changes and needs to trigger a redeployment, while a redeployment is already running?" The Deployment documentation covers this case: it will start deploying new pods on the newest version of the pod spec and treat all older ones as "old", so a pod with the intermediate state might only exist for a couple of minutes before getting replaced.
In short: this should work consistently and you shouldn't need to take any special precautions.
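For illustration, a CronJob along these lines might look like the sketch below. The deployment name leaky-service, the service account and role names, and the container image are hypothetical; any image that ships kubectl should do. The schedule runs every 2 hours on the hour.

```yaml
# Sketch only: periodic rolling restart via "kubectl rollout restart".
# All names and the image are hypothetical placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: deployment-restarter
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-restarter
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "patch"]          # rollout restart reads, then patches the pod template
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployment-restarter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-restarter
subjects:
- kind: ServiceAccount
  name: deployment-restarter
---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: restart-leaky-service
spec:
  schedule: "0 */2 * * *"          # every 2 hours, at minute 0
  concurrencyPolicy: Forbid        # do not start a new run while one is still active
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: deployment-restarter
          restartPolicy: Never
          containers:
          - name: kubectl
            image: example.com/tools/kubectl:latest   # hypothetical image containing kubectl
            command: ["kubectl", "rollout", "restart", "deployment/leaky-service"]
```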

HPA scale-down-kubernetes pods

My requirement is to scale up pods on custom metrics: when the number of pending messages in a queue increases, the number of pods has to increase to process the jobs. In Kubernetes, scale-up is working fine with the Prometheus adapter and Prometheus operator.
I have long-running processes in the pods, but the HPA checks the custom metric and tries to scale down. Because of this, processes are killed in the middle of an operation and the message is lost. How can I make the HPA kill only free pods, where no process is running?
AdapterService to collect custom metrics
seriesQuery: '{namespace="default",service="hpatest-service"}'
resources:
  overrides:
    namespace:
      resource: "namespace"
    service:
      resource: "service"
name:
  matches: "msg_consumergroup_lag"
metricsQuery: 'avg_over_time(msg_consumergroup_lag{topic="test",consumergroup="test"}[1m])'
HPA Configuration
type: Object
object:
  describedObject:
    kind: Service
    name: custommetric-service
  metric:
    name: msg_consumergroup_lag
  target:
    type: Value
    value: 2
At present the HPA cannot be configured to accommodate workloads of this nature. The HPA simply sets the replica count on the deployment to a desired value according to the scaling algorithm, and the deployment chooses one or more pods to terminate.
There is a lot of discussion on this topic in this Kubernetes issue that may be of interest to you. It is not solved by the HPA, and may never be. There may need to be a different kind of autoscaler for this type of workload. Some suggestions are given in the link that may help you in defining one of these.
If I was to take this on myself, I would create a new controller, with corresponding CRD containing a job definition and the scaling requirements. Instead of scaling deployments, I would have it launch jobs. I would have the jobs do their work (process the queue) until they became idle (no items in the queue) then exit. The controller would only scale up, by adding jobs, never down. The jobs themselves would scale down by exiting when the queue is empty.
This would require that your jobs be able to detect when they become idle, by checking the queue and exiting if there is nothing there. If your queue read blocks forever, this would not work and you would need a different solution.
The kubebuilder project has an excellent example of a job controller. I would start with that and extend it with the ability to check your published metrics and start the jobs accordingly.
Also see Fine Parallel Processing Using a Work Queue in the Kubernetes documentation.
I will suggest an idea here: you can run a custom script that disables the HPA as soon as it scales up. The script should keep checking the resources and processes, and when no process is running, either enable the HPA so it scales down, or kill the pods using kubectl and enable the HPA again.
I had a similar use case, scaling deployments based on queue length. I used KEDA (keda.sh), which does exactly that. Just know that it will scale down the additional pods created for that deployment even if a pod is currently processing data/input, so you will have to configure the cooldown parameter to scale down appropriately.
KEDA ScaledJobs are best for such scenarios and can be triggered through Queue, Storage, etc. (the currently available scalers can be found here). The ScaledJobs are not killed in between the execution and are recommended for long running executions.
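A minimal ScaledJob sketch for the setup in the question could look like this. It assumes KEDA is installed and uses KEDA's Prometheus scaler with the same lag query as the adapter config; the worker image, the Prometheus address and the object name are hypothetical. Each job is expected to process its message(s) and then exit on its own.

```yaml
# Sketch only: one Job per unit of queue lag instead of scaling a Deployment.
# Image, Prometheus address and names are hypothetical placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: queue-worker
spec:
  jobTargetRef:
    template:
      spec:
        restartPolicy: Never
        containers:
        - name: worker
          image: example.com/queue-worker:latest
  pollingInterval: 30        # seconds between metric checks
  maxReplicaCount: 10        # upper bound on jobs running in parallel
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc:9090
      query: avg_over_time(msg_consumergroup_lag{topic="test",consumergroup="test"}[1m])
      threshold: "2"
```

Because running jobs are left to finish rather than being terminated on scale-down, a long-running message is not interrupted halfway.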

Is it possible to add/modify kubernetes container spec based on clusterwide setting

I have a kubernetes-based application that uses an operator to build and deploy containers in pods. Sometimes I'd like to run containers in privileged mode to enable performance tracing, but since I'm not deploying the pod/containers directly from a manifest, I cannot simply add privileged mode and the debugfs filesystem mount.
That leaves me to fork the operator code, change where it builds the container spec, and redeploy with the modified operator. Doable, but awkward.
So my question is: is it possible to have additional attributes added to container specs based on some clusterwide setting before pods are deployed by the operator? Or to modify the container spec after deployment? I tried the latter with kubectl edit pod mypod, but that didn't work.
This is on a physical cluster installed with kubespray.
There are three things to consider:
1. Your operator can create a controller (e.g. a Deployment) instead of a bare Pod. That allows modifications in the Pod spec area, which trigger the Deployment's rollout (see rolling update strategy).
2. Use a MutatingAdmissionWebhook, so that before the Pod is created its manifest is modified/overwritten on the fly. More info regarding MutatingAdmissionWebhook can be found here and here.
3. A workaround in the form of modifying the supply spec -> swapping the pod-a. More about this was discussed here.
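To illustrate the MutatingAdmissionWebhook option, the registration side might look like the sketch below. It only covers the cluster-side configuration: the webhook server, which would return a JSON patch adding privileged mode and the debugfs mount, is not shown. The names, namespace, path and the perf-tracing label are hypothetical; the label acts as the switch that decides which namespaces get mutated.

```yaml
# Sketch only: send Pod creations in labelled namespaces to a mutating webhook.
# All names and the label are hypothetical placeholders.
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: perf-tracing-injector
webhooks:
- name: perf-tracing.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  # Let pods through unmodified if the webhook is unavailable.
  failurePolicy: Ignore
  clientConfig:
    service:
      namespace: kube-system
      name: perf-tracing-webhook
      path: /mutate
    caBundle: "<base64-encoded CA certificate>"
  # Only namespaces carrying this label are affected.
  namespaceSelector:
    matchLabels:
      perf-tracing: "enabled"
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods"]
    scope: "Namespaced"
```

The mutation has to happen at creation time because most fields of a running Pod's spec are immutable, which is why kubectl edit pod does not work for this.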
Please let me know if any of the above helped.