How can I have a GKE cluster "expire" and delete itself? - kubernetes

We stand up a lot of clusters for testing/poc/deving and its up to us to remember to delete them
What I would like is a way of setting a ttl on an entire gke cluster and having it get deleted/purged automatically.
I could tag the clusters with a timestamp at creation and have an external process running on a schedule that reaps old clusters, but it'd be great if I didn't have to do that- it might be the only way but maybe there is a gke/k8s feature for this?
Is there a way to have the cluster delete itself without relying on an external service? I suppose it could spawn a cloud function itself- but Im wondering if there is a native gke/k8s feature to do this more elegantly

You can spawn GKE cluster with Alpha features. Such clusters exist for one month maximum and then are auto-deleted.
Read more: https://cloud.google.com/kubernetes-engine/docs/concepts/alpha-clusters

Try Cloud Scheduler and hook it up with your build server. Cloud Scheduler supports Http , App Engine , Pub/Sub endpoints.

I don't believe there is a native way to do this, but it doesn't seem unreasonable to use cloud scheduler to every so often trigger a cloud function which looks for appropriately labeled clusters and triggers their deletion via the API.

Related

GKE automated pod recycling ideas

I'm thinking of a solution to do a rolling update on a schedule without really releasing something. I was thinking of an ENV variable change through kubectl patch to kick off the update in GKE.
The context is we have containers that don't do garbage collection, and the temporary fix and easiest path forward and is cycling out pods frequently that we can schedule on a nightly.
Our naive approach would be to schedule this through our build pipeline, but seems like there's a lot of moving parts.
I haven't looked at Cloud Functions, but I'm sure there's an API that could do this and I'm leaning towards automating this with Cloud Functions.
Or is there already a GKE solution to do this?
Any other approaches to this problem?
I know AWS's ec2 has this feature for ASG, I was looking for the same thing so I don't to do a hacky ENV var change on manifest.
I can think of two possibilities:
Cronjobs. You can use CronJobs to run tasks at a specific time or interval. Suggested for automatic tasks, such as backups, reporting, sending emails, or cleanup tasks. More details here.
CI/CD with CloudBuild. When you push a change to your repository, Cloud Build automatically builds and deploys the container to your GKE cluster.

GKE and Task Queues

I am working on a cloud service platform that consists of getting tasks from users, executing them, and giving back the results.
TL;DR
Is there a way to have a "task queue", where tasks can be inserted via a REST API, and extracted automatically by the Google Kubernetes Engine cluster by guaranteeing an automatic scaling?
Long description
Users can send tasks in parallel, and each task is time consuming and need to be performed on a GPU. So, setting up an auto-scaling GPU cluster is what I thought of.
More in particular, in my idea, users could send tasks/data through a REST API, the REST API provides in filling a task queue, and the task queue itself will feed tasks to workers on the GPU auto-scaling cluster. Of course, there are other details (authentication, database, storage, etc.) that have to be addressed but are not the point of my question.
For reasons I don't specify here, the project is already started on the Google Cloud Platform, so switching to AWS or other providers is not an option.
For what I understood, things seem a bit different from standard Docker-only clusters in AWS, that is, we have to use the Google Kubernetes Engine (GKE) to setup the auto-scaling cluster, even for "simple" GPU-enabled Docker containers.
By looking at the not-so-exhaustive documentation, I know that queues are used, but what I don't know is whether feeding of tasks to the cluster is automatically handled. Also, the so-called "Task Queue" service has been deprecated.
Thank you!
First I thought Cloud Tasks queues may be the answer to your troubles, but more this post seems to promote Cloud Pub/Sub as a better alternative.
After a quick chat with batch developers, the current solution (before the batch service become public) is to adopt a third-party queue system like Slurm.

Kubernetes: Policy check before container execution

I am new to Kubernetes, I am looking to see if its possible to hook into the container execution life cycle events in the orchestration process so that I can call an API to pass the details of the container and see if its allowed to execute this container in the given environment, location etc.
An example check could be: container can only be run in a Europe or US data centers. so before someone tries to execute this container, outside this region data centers, it should not be allowed.
Is this possible and what is the best way to achieve this?
You can possibly set up an ImagePolicy admission controller in the clusters, were you describes from what registers it is allowed to pull images.
kube-image-bouncer is an example of an ImagePolicy admission controller
A simple webhook endpoint server that can be used to validate the images being created inside of the kubernetes cluster.
If you don't want to start from scratch...there is a Cloud Native Computing Foundation (incubating) project - Open Policy Agent with support for Kubernetes that seems to offer what you want. (I am not affiliated with the project)

How to run multiple Kubernetes jobs in sequence?

I would like to run a sequence of Kubernetes jobs one after another. It's okay if they are run on different nodes, but it's important that each one run to completion before the next one starts. Is there anything built into Kubernetes to facilitate this? Other architecture recommendations also welcome!
This requirement to add control flow, even if it's a simple sequential flow, is outside the scope of Kubernetes native entities as far as I know.
There are many workflow engine implementations for Kubernetes, most of them are focusing on solving CI/CD but are generic enough for you to use however you want.
Argo: https://applatix.com/open-source/argo/
Added a custom resource deginition in Kubernetes entity for Workflow
Brigade: https://brigade.sh/
Takes a more serverless like approach and is built on Javascript which is very flexible
Codefresh: https://codefresh.io
Has a unique approach where you can use the SaaS to easily get started without complicated installation and maintenance, and you can point Codefresh at your Kubernetes nodes to run the workflow on.
Feel free to Google for "Kubernetes Workflow", and discover the right platform for yourself.
Disclaimer: I work at Codefresh
I would try to use cronjobs and set the concurrency policy to forbid so it doesn't run concurrent jobs.
I have worked on IBM TWS (Workload Automation) which is a scheduler similar to cronjob where you can mention the dependencies of the jobs.
You can specify a job to run only after it's dependencies has run using follows keyword.

Akka cluster and OpenShift

I'm new to Akka Clusters, however as I am understanding its documentation, I need to know at least one "seed node" to join an existing cluster.
So when using clusters with OpenShift I would need to know if the current gear is the first node - then I would create a new cluster - or if there are already some other gears around - I would need to know at least one of their IPs to join them.
Is this possible with OpenShift cloud? (I'm using the DIY catridge, so customizing the start up script wouldn't be a problem. However I can't find any environment variable which provides me relevant data.)
DIY gears on OpenShift Online do not scale. And if you are spinning up separate applications for each of the nodes in your cluster, you are going to (probably) run into inter-gear communication issues. You might need to create your own akka cartridge (http://docs.openshift.org/origin-m4/oo_cartridge_developers_guide.html), then you can set your own scaling options. You might check out this cartridge (https://github.com/smarterclayton/openshift-redis-cart) which supports scaling and might give you some ideas about how to implement yours.