How to increase load on stateful application in kubernetes cluster using a script - kubernetes

I have implemented a horizontal pod autoscaler for my stateful application. How can I increase the load on this application so that CPU utilisation goes up and the pods are autoscaled?

You can use "Kubernetes performance testing tutorial: Load test a cluster" as an example of how to load test a cluster.
You can use kubernetes-jmeter to generate workloads for load testing.
You can write a script in any language you know to generate load.
In any case, I recommend starting with "13-Step Guide to Performance Testing in Kubernetes" to see how to set up performance testing with monitoring.
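For the "write a script" option, a minimal sketch in Python might look like the following. It only uses the standard library and sends GET requests from several threads at once. The function names, the worker counts and the idea of passing in a `send` callable are my own choices for illustration, not something from the tools above, and whether this is enough to push CPU over your HPA target depends on how expensive each request is for your application.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url):
    """Send one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def generate_load(url, workers=20, requests_per_worker=100, send=http_get):
    """Call send(url) repeatedly from `workers` threads.

    Returns a (succeeded, failed) tuple. Any exception raised by `send`
    (timeouts, connection errors, HTTP error statuses) counts as a failure.
    """

    def worker(_index):
        ok = failed = 0
        for _ in range(requests_per_worker):
            try:
                send(url)
                ok += 1
            except Exception:
                failed += 1
        return ok, failed

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(worker, range(workers)))
    return sum(r[0] for r in results), sum(r[1] for r in results)
```

You would call it with the address of your own Service, for example generate_load("http://my-app.default.svc.cluster.local:8080/") where that URL is a placeholder, and watch the HPA with kubectl get hpa -w while it runs.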

Related

How to deploy workload with K8s on-demand (GKE)?

I need to deploy a GPU intensive task on GCP. I want to use a Node.js Docker image and within that container to run a Node.js server that listens to HTTP requests and runs a Python image processing script on-demand (every time that a new HTTP request is received containing the images to be processed). My understanding is that I need to deploy a load balancer in front of the K8s cluster that has a static public IP address which then builds/launches containers every time a new HTTP request comes in? And then destroy the container once processing is completed. Is container re-use not a concern? I never worked with K8s before and I want to understand how it works and after reading the GKE documentation this is how I imagine the architecture. What am I missing here?
runs a Python image processing script on-demand (every time that a new HTTP request is received containing the images to be processed)
This can be solved on Kubernetes, but it is not a very common kind of workload.
The project that best fits your problem is Knative, with its per-request autoscaler. Google Cloud Run is the easiest way to use it, but if you want to run it within your own GKE cluster, you can enable it there.
That said, you can also design your Node.js service to integrate with the Kubernetes API server to create Jobs, but it is not good design to have ordinary workloads talk to the API server. It is better to use Knative or Google Cloud Run.

Is it possible to schedule a pod to run for say 24 hours and then remove deployment/statefulset? or need to use jobs?

We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way without jenkins/k8s jobs to simply put some parameter on the pod manifest to tell it to self destruct say in 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded.
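To show where the field sits in a full manifest, here is a small sketch that builds a bare Pod manifest as a Python dict and prints it as JSON (kubectl accepts JSON as well as YAML). The function name, pod name and image are placeholders I made up; only activeDeadlineSeconds comes from the answer above.

```python
import json


def pod_with_deadline(name, image, deadline_seconds=86400):
    """Build a Pod manifest that Kubernetes stops after `deadline_seconds`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # 86400 seconds = 24 hours, counted from when the Pod starts.
            "activeDeadlineSeconds": deadline_seconds,
            "containers": [{"name": name, "image": image}],
        },
    }


# Placeholder name and image, for illustration only.
manifest = pod_with_deadline("short-lived-task", "busybox:1.36")
print(json.dumps(manifest, indent=2))
```

If you save the output to a file, kubectl apply -f on that file should create the Pod. Note that this sketch is for a Pod you create directly; I have not shown how it interacts with a Deployment or StatefulSet template.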
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes can autoscale your application in a cluster. That is, it can start additional pods when the load increases and terminate surplus pods when the load decreases.
It is possible to scale the application down to zero pods, but in that case the first request is delayed while a pod starts.
This functionality relies on performance metrics. In practice, this means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The Kubernetes feature mentioned above, HPA (Horizontal Pod Autoscaler), is described in this document.
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
Last but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create and manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

Triggering a Kubernetes-based application from AppEngine

I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based nodepool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in that nodepool with autoscaling, so that the nodes are used only while your pods are scheduled or running there.
Consider that this requires an additional default-non-GPU-based nodepool, where system pods are being run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally, considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can run a sporadic computational load to completion and let the nodepool downsize once it has finished.
Wrapping up, you can have a minimum cluster with a zero-sized GPU-based nodepool that will autoscale when a tainted job is requested to be run there, and once the workload is finished, it should automatically downscale. These actions can be triggered from GAE, using one of the client libraries.
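As a rough sketch of what such a Job could look like, the snippet below builds a batch/v1 Job manifest as a Python dict. It assumes the GPU nodepool carries a taint with the key nvidia.com/gpu and effect NoSchedule and that GPUs are exposed as the nvidia.com/gpu resource; check the taints on your own nodes, since yours may differ. The function name, job name and image are placeholders.

```python
import json


def gpu_job(name, image, gpus=1):
    """Build a Job manifest that requests GPUs and tolerates a GPU taint."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 0,  # do not retry a failed render automatically
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Assumed taint key; adjust to match your GPU nodepool.
                    "tolerations": [
                        {
                            "key": "nvidia.com/gpu",
                            "operator": "Exists",
                            "effect": "NoSchedule",
                        }
                    ],
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            # Requesting a GPU is what makes the pod need a GPU node.
                            "resources": {"limits": {"nvidia.com/gpu": gpus}},
                        }
                    ],
                }
            },
        },
    }


# Placeholder name and image, for illustration only.
print(json.dumps(gpu_job("render-task", "example.com/renderer:latest"), indent=2))
```

The same dict could be submitted through one of the Kubernetes client libraries from App Engine instead of being printed; that part is not shown here.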

Deploy service dynamically according to load with Google Kubernetes Engine

I'm currently working on an application deployed with Google Kubernetes Engine. I want to be able to change the behavior of a service if the load on my application reaches a certain point. The idea is to deploy a similar service which consumes fewer resources so that my application can still work under a bigger load.
Is it possible with Google Kubernetes Engine?
Yes, it can be done with HPA and custom metrics in Prometheus. We use this setup to autoscale our deployments based on requests per minute.
Prometheus scrapes this metric from the application, and the Prometheus adapter makes it available to Kubernetes.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
https://github.com/DirectXMan12/k8s-prometheus-adapter
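For illustration, an HPA that scales on a per-pod custom metric could be described roughly as below, built here as a Python dict in the autoscaling/v2 format. The metric name http_requests_per_minute, the deployment name and the target value are hypothetical; the real metric name depends on what your application exports and how the adapter is configured.

```python
import json


def hpa_for_pod_metric(deployment, metric_name, target_per_pod,
                       min_replicas=1, max_replicas=10):
    """Build an autoscaling/v2 HPA that targets an average per-pod metric value."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                {
                    "type": "Pods",
                    "pods": {
                        "metric": {"name": metric_name},
                        "target": {
                            "type": "AverageValue",
                            "averageValue": str(target_per_pod),
                        },
                    },
                }
            ],
        },
    }


# Hypothetical names and target, for illustration only.
hpa = hpa_for_pod_metric("my-service", "http_requests_per_minute", 600)
print(json.dumps(hpa, indent=2))
```

With a manifest like this, the HPA adds replicas when the average value across pods rises above the target, provided the metric is actually served through the custom metrics API.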

Kubernetes automatic shutdown after some idle time

Does Kubernetes or Helm support shutting down pods that have been idle for longer than a given threshold?
This would be very useful in a development environment, to free up resources for other processes and to save cost.
Kubernetes can autoscale your application in a cluster. That is, it can start additional pods when the load increases and terminate surplus pods when the load decreases.
It is possible to scale the application down to zero pods, but in that case the first request is delayed while a pod starts.
This functionality relies on performance metrics collected by a component that must be running in the cluster (Heapster when this answer was written; Heapster has since been retired in favor of metrics-server). In practice, this means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The Kubernetes feature mentioned above, HPA (Horizontal Pod Autoscaler), is described in this document.
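To make the scaling behaviour more concrete, the HPA documentation gives the rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and skips scaling when the ratio is within a tolerance of 1 (0.1 by default). The sketch below is just that formula in Python; it ignores min/max replica limits, stabilization windows and pods that are not ready, so treat it as a simplification.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Simplified HPA rule: scale in proportion to current/target metric ratio."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 3 pods at 90% average CPU with a 50% target: 3 * 1.8 = 5.4, rounded up.
print(desired_replicas(3, 90, 50))  # 6
# 4 pods at 25% average CPU with a 50% target: load halves, so do the pods.
print(desired_replicas(4, 25, 50))  # 2
```

This also shows why idle pods are not removed entirely by this rule alone: the result is further clamped to the HPA's configured minimum replica count, which is not modelled here.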
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics