How to scale a GKE deployment according to an AWS SQS queue size - kubernetes

Might be a strange one to ask, but found myself with:
An AWS Simple Queue that holds messages to process.
A deployment in a Kubernetes cluster on Google Cloud (GKE) that processes this queue.
I want to scale the deployment according to the queue size. A simple logic for example:
Queue size = 0 => deploy 3 pods
Queue size between 1 and 1000 => deploy 20 pods
Queue size > 1000 => deploy 100 pods
Turns out that this isn't such a simple task, and I'm looking for ideas.
I tried to achieve this via the Horizontal pod autoscaler, but it looks like an impossible task.
My best idea is an AWS Lambda that monitors the queue (by messages or a cron schedule), and updates the Kubernetes deployment via API.
The easy part was monitoring the queue size and getting the desired scale for the deployment, but I'm not managing to physically control the deployment size via the AWS Lambda.
TL;DR, I would like to achieve kubectl functionality (scale deployment), but via an external Lambda running Node.js code while authenticating to my Google Cloud Platform, and that seems really tricky as well. There are a few client libraries, but none of them really documents how to authenticate and connect to my cluster.
I even thought about running the bash script from my deployment system, but running that through a Lambda function using Node.js 'exec' seems very, very wrong.
Am I missing an easier way?

There's a project called Keda: https://keda.sh/docs/2.0/scalers/aws-sqs/. It supports horizontal scaling based on a variety of queue types, and SQS is supported.
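As a rough sketch (the deployment name, queue URL, region and thresholds below are placeholders to adapt to your setup), a KEDA 2.0 ScaledObject targeting the consumer deployment could look like this. Note that KEDA scales proportionally to the queue length rather than in the exact steps described in the question:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-consumer-scaler          # placeholder name
spec:
  scaleTargetRef:
    name: sqs-consumer               # the deployment that processes the queue
  minReplicaCount: 3                 # matches "queue empty => 3 pods"
  maxReplicaCount: 100
  triggers:
    - type: aws-sqs-queue
      authenticationRef:
        name: keda-aws-credentials   # TriggerAuthentication with AWS credentials (see below)
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue  # placeholder
        queueLength: "50"            # target messages per replica
        awsRegion: "eu-west-1"
```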
To securely access SQS/CloudWatch from GKE you can use https://github.com/doitintl/gtoken, which lets you assume an AWS role from GKE. A simpler but less secure alternative is a dedicated AWS user with periodic key rotation. Also look at https://cloud.google.com/pubsub/docs/overview; perhaps you can replace SQS with it and stay within one stack.
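If you go the dedicated-AWS-user route, a hedged sketch of wiring those credentials into KEDA might look like the following (secret and resource names are made up, and the credential values obviously belong in a real Secret managed outside source control):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-sqs-secret               # placeholder name
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access-key-id>         # fill in from your AWS user
  AWS_SECRET_ACCESS_KEY: <secret-access-key> # fill in from your AWS user
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: keda-aws-credentials         # referenced by the ScaledObject above
spec:
  secretTargetRef:
    - parameter: awsAccessKeyID
      name: aws-sqs-secret
      key: AWS_ACCESS_KEY_ID
    - parameter: awsSecretAccessKey
      name: aws-sqs-secret
      key: AWS_SECRET_ACCESS_KEY
```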

You can use WPA: https://github.com/practo/k8s-worker-pod-autoscaler to scale a GKE deployment based on an SQS queue. The project scales based on a combination of SQS metrics: https://medium.com/practo-engineering/launching-worker-pod-autoscaler-3f6079728e8b
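As a rough sketch only (the field names are taken from the project's README and may differ between versions, so double-check against the repo; the deployment name, queue URI and numbers are placeholders), a WorkerPodAutoScaler resource looks roughly like this:

```yaml
apiVersion: k8s.practo.dev/v1
kind: WorkerPodAutoScaler
metadata:
  name: sqs-consumer-wpa                   # placeholder name
spec:
  deploymentName: sqs-consumer             # deployment that processes the queue
  queueURI: https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue  # placeholder
  minReplicas: 3
  maxReplicas: 100
  targetMessagesPerWorker: 200             # backlog per pod before scaling out
```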

Related

Can we spin off a kubernetes cronjob automatically and dynamically? How can we do it in AWS EKS, Azure AKS based on queues or notifications?

For my microservice based application, I am designing a component which is as follows:
The task that we want to execute is periodic in nature. For this, I planned to make use of Kubernetes CronJobs. The job runs every hour, and this works perfectly fine.
In a few scenarios, I want to execute this task on-demand (instead of waiting for the next hourly window). For example, if the next job time is 2:00pm, I want to execute it earlier, say at 1:20pm.
There is a related question - How can I trigger a Kubernetes Scheduled Job manually?
But I am not looking for a manual way of achieving it or explicitly calling kubectl commands. Is there a way to do it automatically, based on events/queues?
Our application is deployed on AWS EKS and Azure AKS. Can I integrate the k8s clusters to read from some queues/pub-subs (e.g. aws-sqs, aws-sns) and do it dynamically?
Your help would be immensely appreciated!
If your application is running on Kubernetes and you don't want to migrate it to a serverless function but would rather keep everything inside the Kubernetes cluster, you can use Knative.
Scale to Zero With Knative
Knative is a serverless platform that is built on top of Kubernetes. It provides higher-level abstractions for common application use cases.
One key feature is its ability to run generic (micro)service-based applications as serverless workloads, thanks to built-in scale-to-zero support. Knative has introduced its own autoscaler, the Knative Pod Autoscaler (KPA), which supports scale to zero for any service using non-CPU-based scaling metrics.
Updating your microservice to run with Knative requires only minor changes, and you can keep running it on Kubernetes.
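For illustration (the service name and image are placeholders, not from the question), a Knative Service configured for scale-to-zero could look like this:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: periodic-task                # placeholder name
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/minScale: "0"   # scale to zero when idle
        autoscaling.knative.dev/maxScale: "10"
    spec:
      containers:
        - image: gcr.io/my-project/periodic-task:latest  # placeholder image
```

The hourly schedule can then remain a plain CronJob that calls the service's endpoint, while on-demand runs are triggered by sending an HTTP request (or a Knative Eventing message, e.g. from a queue) to the same endpoint.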

How to deploy workload with K8s on-demand (GKE)?

I need to deploy a GPU-intensive task on GCP. I want to use a Node.js Docker image and, within that container, run a Node.js server that listens for HTTP requests and runs a Python image-processing script on-demand (every time a new HTTP request containing the images to be processed is received). My understanding is that I need to deploy a load balancer with a static public IP address in front of the K8s cluster, which then builds/launches a container every time a new HTTP request comes in and destroys it once processing is completed - is that right? And is container re-use not a concern? I have never worked with K8s before, but after reading the GKE documentation this is how I imagine the architecture. What am I missing here?
runs a Python image processing script on-demand (every time that a new HTTP request is received containing the images to be processed)
This can be solved on Kubernetes, but it is not a very common kind of workload.
The project that supports your use case best is Knative, with its per-request autoscaler. Google Cloud Run is the easiest way to use it, but if you want to run this within your own GKE cluster, you can enable it there as well.
That said, you could also design your Node.js service to talk to the Kubernetes API server and create Jobs - but it is not a good design to have ordinary workloads talk to the API server. It is better to use Knative or Google Cloud Run.

Triggering a Kubernetes-based application from AppEngine

I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based nodepool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in that nodepool with autoscaling, so that it only has nodes when your pods actually need to be scheduled and run.
Keep in mind that this requires an additional, non-GPU default nodepool where the system pods run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally, considering the nature of your workload (3D rendering), it might be best to use Kubernetes Jobs. With these, you can run a sporadic computational workload and let the nodepool scale down once it is finished.
Wrapping up: you can have a minimal cluster with a zero-sized GPU-based nodepool that scales up when a Job tolerating its taint is scheduled there, and scales back down once the workload finishes. These actions can be triggered from GAE using one of the client libraries.
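To make the taints/tolerations and Job pieces concrete, here is a hedged sketch of a rendering Job targeting a GPU nodepool (the nodepool name and image are assumptions; on GKE, GPU node pools carry the nvidia.com/gpu taint by default):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: render-job                             # placeholder name
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-nodepool: gpu-pool  # hypothetical GPU nodepool name
      tolerations:
        - key: nvidia.com/gpu                  # taint GKE adds to GPU nodes
          operator: Equal
          value: present
          effect: NoSchedule
      containers:
        - name: renderer
          image: gcr.io/my-project/renderer:latest  # placeholder rendering image
          resources:
            limits:
              nvidia.com/gpu: 1                # GPU request triggers scale-up of the GPU nodepool
```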

How to best run Apache Airflow tasks on a Kubernetes cluster?

What we want to achieve:
We would like to use Airflow to manage our machine learning and data pipelines while using Kubernetes to manage the resources and schedule the jobs. What we would like is for Airflow to orchestrate the workflow (e.g. task dependencies, re-running jobs upon failure) and Kubernetes to orchestrate the infrastructure (e.g. cluster autoscaling and assigning individual jobs to nodes). In other words, Airflow tells the Kubernetes cluster what to do and Kubernetes decides how to distribute the work. At the same time, we would also want Airflow to be able to monitor the status of individual tasks. For example, if we have 10 tasks spread across a cluster of 5 nodes, Airflow should be able to communicate with the cluster and report something like: 3 "small tasks" are done, 1 "small task" has failed and will be scheduled to re-run, and the remaining 6 "big tasks" are still running.
Questions:
Our understanding is that Airflow has no Kubernetes operator; see the open issue at https://issues.apache.org/jira/browse/AIRFLOW-1314. That being said, we don't want Airflow to manage resources (service accounts, env variables, creating clusters, etc.) but simply to send tasks to an existing Kubernetes cluster and be notified when a job is done. An alternative would be Apache Mesos, but it looks less flexible and less straightforward compared to Kubernetes.
I guess we could use Airflow's bash_operator to run kubectl, but that does not seem like the most elegant solution.
Any thoughts? How do you deal with that?
Airflow has both a Kubernetes Executor as well as a Kubernetes Operator.
You can use the Kubernetes Operator to send tasks (in the form of Docker images) from Airflow to Kubernetes via whichever AirflowExecutor you prefer.
Based on your description though, I believe you are looking for the KubernetesExecutor to schedule all your tasks against your Kubernetes cluster. As you can see from the source code it has a much tighter integration with Kubernetes.
This also means you do not have to worry about building the Docker images ahead of time, as is required with the Kubernetes Operator.

Feasibility of using multi master Kubernetes cluster architecture

I am trying to implement a CI/CD pipeline using Kubernetes and Jenkins. My application has 25 microservices, and I need to deploy them for 5 different clients. The microservice code is the same for every client, but the configuration for each client is different.
So here I am configuring a Spring Cloud Config server with 5 different profiles/configurations. When I build the Docker images, I define which config server profile is active by setting the active profile in the Dockerfile. So from 25 microservices I am building 25 * 5 = 125 Docker images and deploying them, i.e. 125 microservice instances in the Kubernetes cluster. These microservices are called from my Angular 2 front-end application.
Considering the performance of the application and the speed of response, is a single master enough for this application architecture, or should I use a multi-master Kubernetes cluster? How can I manage this application?
I am new to cloud and CI/CD pipeline architecture tasks, so I am unsure how to design the workflow. If a single master is enough, I can continue with the current setup; otherwise I need to implement a multi-master Kubernetes HA cluster.
The performance of the application and the speed of response do not depend on the number of master nodes; multiple masters address high availability, not performance. That said, you should still consider having at least 3 masters for this implementation: if your single master goes down, you lose the ability to manage the cluster.
In Kubernetes, the master gets the API calls and acts upon them, by setting the desired state of the cluster to the current state. But in the end that's the nodes (slaves) doing the heavy work. So your performance issues will depend mostly, if not exclusively, on your nodes. If you have enough memory and CPU, you should be fine.
Multi master sounds like a good idea for HA.
You could also look at using Helm, which lets you configure microservices on a per-installation basis, so that you don't have to keep re-releasing Docker images each time you need to configure a new environment. You can then inject the Helm values into, say, a ConfigMap that is mounted as an application.yml, so that Spring Boot automatically loads the settings.
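As a minimal sketch of that idea (the chart layout, names, and the Spring Boot config location are assumptions to adapt to your chart), the Helm templates could render per-client values into a ConfigMap and mount it where Spring Boot can pick it up:

```yaml
# templates/configmap.yaml - renders per-installation values into application.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}-app-config
data:
  application.yml: |
{{ toYaml .Values.application | indent 4 }}
---
# templates/deployment.yaml (minimal) - mounts the ConfigMap where Spring Boot will read it
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-my-service        # placeholder service name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: my-service
          image: gcr.io/my-project/my-service:latest   # placeholder image
          env:
            - name: SPRING_CONFIG_ADDITIONAL_LOCATION  # assumed Spring Boot 2.x env binding
              value: file:/config/
          volumeMounts:
            - name: app-config
              mountPath: /config
      volumes:
        - name: app-config
          configMap:
            name: {{ .Release.Name }}-app-config
```

With this approach, each client installation is just a different values file passed to helm install, rather than a separate Docker image per configuration.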