Can we spin up a Kubernetes CronJob automatically and dynamically? How can we do it in AWS EKS or Azure AKS based on queues or notifications?

For my microservice-based application, I am designing a component as follows:
The task we want to execute is periodic in nature. For it, I planned to make use of Kubernetes CronJobs, which run the job every hour. This works perfectly fine.
In a few scenarios, I want to execute this task on demand (instead of waiting for the next hourly window). For example, if the next job time is 2:00 pm, I want to execute it early, say at 1:20 pm.
There is a related question - How can I trigger a Kubernetes Scheduled Job manually?
But I am not looking for a manual way of achieving it or for explicitly calling kubectl
commands. Is there a way to do it automatically, based on events/queues?
Our application is deployed on AWS EKS and Azure AKS. Can I integrate the Kubernetes clusters to read from queues/pub-subs (e.g. AWS SQS, AWS SNS) and do it dynamically?
Your help would be immensely appreciated!

If your application is running on Kubernetes and you don't want to migrate to serverless functions but want to keep everything inside the Kubernetes cluster, you can use Knative.
Scale to Zero With Knative
Knative is a serverless platform that is built on top of Kubernetes. It provides higher-level abstractions for common application use cases.
One key feature is its ability to run generic (micro)service-based applications as serverless workloads with the help of built-in scale-to-zero support. Knative has introduced its own autoscaler, the Knative Pod Autoscaler (KPA), which supports scale to zero for any service that uses non-CPU-based scaling metrics.
Updating your microservice to run with Knative requires only minor changes, and you can keep running it on Kubernetes.
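For illustration, a minimal sketch of deploying such a service as a Knative Service through the official Python kubernetes client, assuming Knative Serving is already installed in the cluster; the service name, namespace and image are placeholders:

```python
# Sketch: create a Knative Service that scales to zero when idle.
# Assumes Knative Serving is installed and a kubeconfig is available;
# "task-runner" and the image reference are placeholder names.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

knative_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "task-runner", "namespace": "default"},
    "spec": {
        "template": {
            "metadata": {
                # Allow the revision to scale all the way down to zero pods.
                "annotations": {"autoscaling.knative.dev/min-scale": "0"}
            },
            "spec": {
                "containers": [
                    {"image": "registry.example.com/task-runner:latest"}
                ]
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="default",
    plural="services",
    body=knative_service,
)
```

With min-scale set to 0, the service consumes no pod resources while idle and is scaled up again when a request or event reaches it, which is what makes on-demand triggering cheap.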

Related

Flink and k8s deployment

Per Flink's doc, we can deploy a standalone Flink cluster on top of Kubernetes, using Flink’s standalone deployment,
or deploy Flink on Kubernetes using native Kubernetes deployments.
The document says
We generally recommend new users to deploy Flink on Kubernetes using native Kubernetes deployments.
Is it because native Kubernetes is easier to get started with, or is it because the standalone mode is kind of a legacy option?
In native Kubernetes mode, Flink is able to dynamically allocate and de-allocate TaskManagers depending on the required resources. While in standalone mode, task managers have to be provisioned manually.
It sounds to me that native Kubernetes mode is a better choice.
Posted a community wiki based on other answers - David Anderson's answer and austin_ce's answer. Feel free to expand it.
A good explanation from David Anderson's answer:
Standalone mode:
In a Kubernetes session or per-job deployment, Flink has no idea it's running on Kubernetes. In this mode, Flink behaves as it does in any standalone deployment (where there is no cluster framework available to do resource management). Kubernetes just happens to be how the infrastructure was created, but as far as Flink is concerned, it could have been bare metal. You will have to arrange for kubernetes to create the infrastructure that you will have configured Flink to expect.
Native mode:
Session deployment
In a Native Kubernetes session deployment, Flink uses its KubernetesResourceManager, which submits a description of the cluster it wants to the Kubernetes ApiServer, which creates it. As jobs come and go, and the requirements for task managers (and slots) go up and down, Flink is able to obtain and release resources from kubernetes as appropriate.
Application mode
In Application Mode (blog post) (details) you end up with Flink running as a kubernetes application, which will automatically create and destroy cluster components as needed for the job(s) in one Flink application.
The native mode is recommended because it is simply easier; I would not say the standalone mode is legacy:
The Native mode is the current recommendation for starting out on Kubernetes as it is the simplest option, like you noted. In Flink 1.13 (to be released in the coming weeks), there is added support for specifying Pod templates. One of the drawbacks to this approach is its limited ability to integrate with CI/CD.

How to scale a GKE deployment according to an AWS SQS queue size

Might be a strange one to ask, but found myself with:
An AWS Simple Queue that holds messages to process.
A deployment in a Kubernetes cluster on Google Cloud (GKE) that processes this queue.
I want to scale the deployment according to the queue size. A simple example of the logic:
Queue size = 0 => deploy 3 pods
Queue size between 1 and 1000 => deploy 20 pods
Queue size > 1000 => deploy 100 pods
Turns out that this isn't such a simple task, and I'm looking for ideas.
I tried to achieve this via the Horizontal pod autoscaler, but it looks like an impossible task.
My best idea is an AWS Lambda that monitors the queue (by messages or a cron schedule), and updates the Kubernetes deployment via API.
The easy part was monitoring the queue size and getting the desired scale for the deployment, but I'm not managing to physically control the deployment size via the AWS Lambda.
TL;DR: I would like to achieve kubectl functionality (scale deployment), but via an external Lambda running Node.js code, while authenticating to my Google Cloud Platform, and that seems really tricky as well. There are a few client libraries, but none of them really documents how to authenticate and connect to my cluster.
I even thought about running the bash script from my deployment system - but running that through a Lambda function using Node.js 'exec' seems very wrong.
Am I missing an easier way?
There's a project called KEDA: https://keda.sh/docs/2.0/scalers/aws-sqs/. It supports horizontal scaling based on a number of queue types, and SQS is supported.
To securely access SQS/CloudWatch from GKE, you can use https://github.com/doitintl/gtoken, which lets you assume an AWS role from GKE. Or, in a simpler and less secure way, use a dedicated AWS user with periodic key rotation. Also look at https://cloud.google.com/pubsub/docs/overview; perhaps you can replace SQS with it to stay within one stack.
You can use WPA: https://github.com/practo/k8s-worker-pod-autoscaler to scale a GKE deployment based on an SQS queue. The project scales based on a combination of SQS metrics: https://medium.com/practo-engineering/launching-worker-pod-autoscaler-3f6079728e8b
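To make the KEDA suggestion concrete, here is a rough sketch of a ScaledObject using the aws-sqs-queue scaler, written as a Python dict and applied with the official kubernetes client; the deployment name, queue URL, region and TriggerAuthentication name are placeholders, and KEDA plus working AWS credentials are assumed to be in place:

```python
# Sketch: KEDA ScaledObject that scales the "queue-worker" Deployment
# based on the approximate number of messages in an SQS queue.
# Assumes KEDA is installed and a TriggerAuthentication named
# "aws-credentials" already exists; names, URL and region are placeholders.
from kubernetes import client, config

config.load_kube_config()

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "queue-worker-scaler", "namespace": "default"},
    "spec": {
        "scaleTargetRef": {"name": "queue-worker"},
        "minReplicaCount": 3,    # baseline when the queue is empty
        "maxReplicaCount": 100,  # upper bound for a very deep queue
        "triggers": [
            {
                "type": "aws-sqs-queue",
                "metadata": {
                    "queueURL": "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",
                    "queueLength": "10",  # target messages per replica
                    "awsRegion": "eu-west-1",
                },
                "authenticationRef": {"name": "aws-credentials"},
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="default",
    plural="scaledobjects",
    body=scaled_object,
)
```

KEDA derives the desired replica count from the queue length divided by queueLength (capped at maxReplicaCount), so the step function from the question is approximated rather than matched exactly.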

Does AWS CDK provide constructs to deploy ECS applications?

We generally use blue-green and rolling deployment strategies to deploy and update Docker containers on ECS container instances.
Ansible's ECS modules allow implementing such deployment strategies with the modules below:
https://docs.ansible.com/ansible/latest/modules/ecs_taskdefinition_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_task_module.html
https://docs.ansible.com/ansible/latest/modules/ecs_service_module.html
Does AWS CDK provide such constructs for implementing deployment strategies?
CDK supports higher-level constructs for ECS called "ECS patterns". One of them is ApplicationLoadBalancedFargateService, which allows you to define an ECS Fargate service behind an Application Load Balancer. Rolling updates are supported out of the box in this case. You simply run cdk deploy with a newer Docker image and ECS will take care of the deployment (see the sketch after the list below). It will:
Start a new task with the new Docker image.
Wait for several successful health checks of the new deployment.
Start sending new traffic to the new task, while letting the existing connections gracefully finish on the old task.
Once all old connections are done, ECS will automatically stop the old task.
If your new task does not start or is not healthy, ECS will keep running the original task.
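For reference, a rough sketch of that pattern in CDK's Python flavor (CDK v1-style imports; the stack name, sizing and Dockerfile path are placeholders):

```python
# Sketch: an ECS Fargate service behind an ALB using the CDK "ECS patterns"
# construct. Re-running `cdk deploy` with a newer image triggers the rolling
# update described in the list above. Names and values are placeholders.
from aws_cdk import core as cdk
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns


class WebServiceStack(cdk.Stack):
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "WebService",
            cpu=256,
            memory_limit_mib=512,
            desired_count=2,
            # Builds the image from a local Dockerfile; a new image on the
            # next `cdk deploy` rolls the service.
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_asset("./app"),
                container_port=8080,
            ),
            public_load_balancer=True,
        )


app = cdk.App()
WebServiceStack(app, "WebServiceStack")
app.synth()
```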
Regarding blue-green deployment, I think it is yet to be supported in CloudFormation. Once that's done, it can be implemented in CDK. If you can live without blue-green as IaC, you can define your CodeDeploy setup manually.
Check this npm package, which helps with blue-green deployment using CDK:
https://www.npmjs.com/package/@cloudcomponents/cdk-blue-green-container-deployment
Blue-green deployments are supported in CloudFormation now:
https://aws.amazon.com/about-aws/whats-new/2020/05/aws-cloudformation-now-supports-blue-green-deployments-for-amazon-ecs/
I don't think the CDK implementation is done yet.

Triggering a Kubernetes-based application from AppEngine

I'm currently looking into triggering some 3D rendering from an AppEngine-based service.
The idea is that input data is submitted by an API client to this web service, which then invokes an internal Kubernetes GPU enabled application ("rendering backend") to do the hard work.
GPU-enabled clusters are relatively expensive ($$$), so I really want the cluster to be up and running on demand. I am trying to achieve that by setting the autoscaling minimum to 0 for the rendering backend.
The only pretty way of "triggering" a rendering task on such a cluster I could think of is via Pub/Sub Push. Basically, I need something like Cloud Tasks, but those seem to be aimed at long running tasks executed in AppEngine, not Kubernetes. Plus I like the way Pub/Sub decouples the web service from the rendering backend.
Google's Pub/Sub only allows pushing via HTTPS and only to a validated domain. It appears that Google is forcing me to completely "expose" my internal rendering backend by assigning a domain name to it, which feels ridiculous. I cannot just tell Pub/Sub to invoke http://loadbalancer.IP.address/handle_push.
This is making me doubt my architecture.
How would you go about building something like this on GCP?
From the GKE perspective:
You can have a cluster with a dedicated GPU-based nodepool and schedule your pods there using taints and tolerations. Additionally, you can control the number of nodes in your nodepool using autoscaling, so that you use them only when your pods are to be scheduled/run.
Consider that this requires an additional default-non-GPU-based nodepool, where system pods are being run.
For triggering, as long as your default pool is running, you'd be able to deploy your application and the autoscaling should start automatically. For deploying from an App Engine application, you might want to consider talking to the Kubernetes API directly through a library.
Finally, and considering the nature of your current goal (3D rendering), it might be best to use Kubernetes Jobs. With these, you can run a sporadic computational load and allow the nodepool to downsize once it is finished.
Wrapping up, you can have a minimum cluster with a zero-sized GPU-based nodepool that will autoscale when a tainted job is requested to be run there, and once the workload is finished, it should automatically downscale. These actions can be triggered from GAE, using one of the client libraries.
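For illustration, a rough sketch of that last step using the official Python kubernetes client (the taint key, nodepool label, image and names are placeholders; obtaining GKE credentials from App Engine via the Google Cloud client libraries is omitted here):

```python
# Sketch: submit a one-off render Job to a tainted, autoscaled GPU nodepool.
# Assumes the GPU nodes carry a taint such as "gpu=true:NoSchedule"; the
# image, nodepool name and resource counts are placeholders.
from kubernetes import client, config

config.load_kube_config()  # from GAE you would build credentials via the GKE API instead

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="render-job-12345"),
    spec=client.V1JobSpec(
        backoff_limit=1,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                # Tolerate the GPU nodepool taint so the pod may land there...
                tolerations=[client.V1Toleration(
                    key="gpu", operator="Equal", value="true", effect="NoSchedule"
                )],
                # ...and pin it to that pool so it does not run anywhere else.
                node_selector={"cloud.google.com/gke-nodepool": "gpu-pool"},
                containers=[client.V1Container(
                    name="renderer",
                    image="gcr.io/my-project/renderer:latest",
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}
                    ),
                )],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

The cluster autoscaler then brings up a GPU node for the pending pod and removes it again some time after the Job completes.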

How to best run Apache Airflow tasks on a Kubernetes cluster?

What we want to achieve:
We would like to use Airflow to manage our machine learning and data pipelines while using Kubernetes to manage the resources and schedule the jobs. What we would like to achieve is for Airflow to orchestrate the workflow (e.g. task dependencies, re-running jobs upon failures) and for Kubernetes to orchestrate the infrastructure (e.g. cluster autoscaling and assignment of individual jobs to nodes). In other words, Airflow will tell the Kubernetes cluster what to do and Kubernetes decides how to distribute the work. At the same time, we would also want Airflow to be able to monitor the individual tasks' status. For example, if we have 10 tasks spread across a cluster of 5 nodes, Airflow should be able to communicate with the cluster and report something like: 3 “small tasks” are done, 1 “small task” has failed and will be scheduled to re-run, and the remaining 6 “big tasks” are still running.
Questions:
Our understanding is that Airflow has no Kubernetes operator; see the open issue at https://issues.apache.org/jira/browse/AIRFLOW-1314. That being said, we don't want Airflow to manage resources (service accounts, environment variables, creating clusters, etc.), but simply to send tasks to an existing Kubernetes cluster and have Airflow know when a job is done. An alternative would be to use Apache Mesos, but it looks less flexible and less straightforward compared to Kubernetes.
I guess we could use Airflow's bash_operator to run kubectl, but this does not seem like the most elegant solution.
Any thoughts? How do you deal with that?
Airflow has both a Kubernetes Executor and a Kubernetes Operator.
You can use the Kubernetes Operator to send tasks (in the form of Docker images) from Airflow to Kubernetes via whichever AirflowExecutor you prefer.
Based on your description though, I believe you are looking for the KubernetesExecutor to schedule all your tasks against your Kubernetes cluster. As you can see from the source code it has a much tighter integration with Kubernetes.
This also means you do not have to worry about creating the Docker images ahead of time, as is required with the Kubernetes Operator.
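For illustration, a minimal sketch of a DAG using the KubernetesPodOperator (the provider import path varies between Airflow versions; the image, namespace and schedule are placeholders):

```python
# Sketch: a DAG with one task that runs as a pod on the Kubernetes cluster.
# Uses the cncf.kubernetes provider import path; image name, namespace and
# schedule are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
    KubernetesPodOperator,
)

with DAG(
    dag_id="ml_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    train = KubernetesPodOperator(
        task_id="train_model",
        name="train-model",
        namespace="airflow",
        image="registry.example.com/train:latest",
        cmds=["python", "train.py"],
        get_logs=True,                 # stream pod logs back to the Airflow UI
        is_delete_operator_pod=True,   # clean up the pod after it finishes
    )
```

Airflow then tracks each pod's success or failure and handles retries, while the Kubernetes scheduler and cluster autoscaler decide where, and on how many nodes, the work actually runs.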