I want to stop/hibernate the cluster to save cost, any best approach/practice for it? - kubernetes

I have created A GKE cluster for a POC, later on, I want to stop/hibernate the cluster to save cost, any best approach/practice for it?

You can put all your node pool to 0 VM but be careful to data lost (according to your node pool configuration, if you delete all the VM you can loose data). However, you will continue to pay for the control plane.
Another approach is to backup your data and to use IaC (Infra as code, such as terraform) to detroy and rebuild your cluster as needed.
Both approach are valid, they depend on your use case and how long to you need to hibernate your cluster.
An alternative is to use GKE Autopilot if your workloads are compliant with this deployment mode.

Related

How to upsize volume of Terraformed EKS node

We have been using Terraform for almost a year now to manage all kinds of resources on AWS from bastion hosts to VPCs, RDS and also EKS.
We are sometimes really baffled by the EKS module. It could however be due to lack of understanding (and documentation), so here it goes:
Problem: Upsizing Disk (volume)
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "12.2.0"
cluster_name = local.cluster_name
cluster_version = "1.19"
subnets = module.vpc.private_subnets
#...
node_groups = {
first = {
desired_capacity = 1
max_capacity = 5
min_capacity = 1
instance_type = "m5.large"
}
}
I thought the default value for this (dev) k8s cluster's node can easily be the default 20GBs but it's filling up fast so I know want to change disk_size to let's say 40GBs.
=> I thought I could just add something like disk_size=40 and done.
terraform plan tells me I need to replace the node. This is a 1 node cluster, so not good. And even if it were I don't want to e.g. drain nodes. That's why I thought we are using managed k8s like EKS.
Expected behaviour: since these are elastic volumes I should be able to upsize but not downsize, why is that not possible? I can def. do so from the AWS UI.
Sure with a slightly scary warning:
Are you sure that you want to modify volume vol-xx?
It may take some time for performance changes to take full effect.
You may need to extend the OS file system on the volume to use any newly-allocated space
But I can work with the provided docs on that: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html?icmpid=docs_ec2_console
Any guidelines on how to up the storage? If I do so with the UI but don't touch Terraform then my EKS state will be nuked/out of sync.
To my knowledge, there is currently no way to resize an EKS node volume without recreating the node using Terraform.
Fortunately, there is a workaround: As you also found out, you can directly change the node size via the AWS UI or API. To update your state file afterward, you can run terraform apply -refresh-only to download the latest data (e.g., the increased node volume size). After that, you can change the node size in your Terraform plan to keep both plan and state in sync.
For the future, you might want to look into moving to ephemeral nodes as (at least my) experience shows that you will have unforeseeable changes to clusters and nodes from time to time. Already planning with replaceable nodes in mind will make these changes substantially easier.
By using the terraform-aws-eks terraform module you are actually following the "ephemeral nodes" paradigm, because for both ways of creating instances (self-managed workers or managed node groups) the module is creating Autoscaling Groups that create EC2 instances out of a Launch Template.
ASG and Launch Templates are specifically designed so that you don't care anymore about specific nodes, and rather you just care about the number of nodes. This means that for updating the nodes, you just replace them with new ones, which will use the new updated launch template (with more GBs for example, or with a new updated AMI, or a new instance type).
This is called "rolling updates", and it can be done manually (adding new instances, then draining the node, then deleting the old node), with scripts (see: eks-rolling-update in github by Hellofresh), or it can be done automagically if you use the AWS managed nodes (the ones you are actually using when specifying "node_groups", that is why if you add more GB, it will replace the node automatically when you run apply).
And this paradigm is the most common when operating Kubernetes in the cloud (and also very common on-premise datacenters when using virtualization).
Option 1) Self Managed Workers
With self managed nodes, when you change a parameter like disk_size or instance_type, it will change the Launch Template. It will update the $latest version tag, which is commonly where the ASG is pointing to (although can be changed). This means that old instances will not see any change, but new ones will have the updated configuration.
If you want to change the existing instances, you actually want to replace them with new ones. That is what this ephemeral nodes paradigm is.
One by one you can drain the old instances while increasing the number of desired_instances on the ASG, or let the cluster autoscaler do the job. Alternatively, you can use an automated script which does this for you for each ASG: https://github.com/hellofresh/eks-rolling-update
In terraform_aws_eks module, you create self managed workers by either using worker_groups or worker_groups_launch_template (recommended) field
Option 2) Managed Nodes
Managed nodes is an EKS-specific feature. You configure them very similarly, but in reality, it is an abstraction, and AWS will create the actual underlying ASG.
You can specify a Launch Template to be used by the ASG and its version. Some config can be specified at the managed node level (i.e. AMI and instance_types) and at the Launch Template (if it wasn't specified in the former).
Any change on the node group level config, or on the Launch Template version, will trigger an automatic rolling update, which will replace all old instances.
You can delay the rolling update by just not pointing to the $latest version (or pointing to $default, and not updating the $default tag when changing the LT).
In terraform_aws_eks module, you create self managed workers by using the node_groups field. You can also play with these settings: create_launch_template=true and set_instance_types_on_lt=true if you want the module to create the LT for you (alternatively you can just not use it, or pass a reference to one); and to set the instance_type on such LT as specified above.
But behavior is similar to worker groups. In no case you will have your existing instances changed. You can only change them manually.
However, there is an alternative: The manual way
You can use the EKS module to create the control plane, but then use a regular EC2 resource in terraform (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance) to create one ore multiple (using count or for_each) instances.
If you create the instances using the aws_instance resource, then terraform will patch those instances (updated-in-place) when any change is allowed (i.e. increasing the root volue GB or the instance type; whereas changing the AMI will force a replacement).
The only tricky part, is that you need to configure the cloud-init script to make the instance join the cluster (something that is automatically done by the EKS module when using self/managed node groups).
However, it is very possible, and you can borrow the script from the module and plug it into the aws_instance's user_data field (https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/instance#user_data)
In this case (when talking about disk_size), however, you still need to manually (either by SSH, or by running an hacky exec using terraform) to patch the XFS filesystem so it sees the increased disk space.
Another alternative: Consider Kubernetes storage
That said, there is also another alternative for certain use cases. If you want to increase the disk space of those instances because of one of your applications using a hostPath, then it might be the case that you can use a kubernetes built-in storage solution using the EBS CSI driver.
For example, I manage an ElasticSearch cluster in Kubernetes (and deploy it from terraform with the helm module), and it uses dynamic storage provisioning to request an EBS volume (note that performance is the same, because both root and this other volume are EBS volumes). EBS CSI driver supports volume expansion, so I can just increase this disk by changing a terraform variable.
To conclude, I would not recommend the aws_instance way, unless you understand it and are sure you really want it. It may make sense in certain cases, but definitely not common

Spring Batch Restartability on Kubernetes for File Operations

I want to learn what is the proper way to reach the processed files when restarting the spring batch application on Kubernetes. Especially if the target type is file, it is being deleted together with the pod after the job failed.
We are considering to use persistent volume or backing up the created file somewhere such as DB or sftp server by implementing a listener.
Is there anyone have the experience of persistent volume usage(nfs or other solutions) for file operations. We are concerned about the performance and unexpected problems. Do you have any suggestions?
Thank you.
You should not rely on the ephemeral file system of a Pod for files that should persist and survive a Job (Pod) failure.
You need to use a persistent volume for that, so that Spring Batch can find the (incomplete) output file in a restart scenario and resume writing where it left off.
If you want data persistence, you may begin by using hostPath volumes first. This will restrict which nodes your pods may be spawned on. But is the simplest and gives you the best performance.
https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
If you want dynamic allocation, you will need to configure storage solutions such as GlusterFS, NFS, CEPH etc.

How can I have a GKE cluster "expire" and delete itself?

We stand up a lot of clusters for testing/poc/deving and its up to us to remember to delete them
What I would like is a way of setting a ttl on an entire gke cluster and having it get deleted/purged automatically.
I could tag the clusters with a timestamp at creation and have an external process running on a schedule that reaps old clusters, but it'd be great if I didn't have to do that- it might be the only way but maybe there is a gke/k8s feature for this?
Is there a way to have the cluster delete itself without relying on an external service? I suppose it could spawn a cloud function itself- but Im wondering if there is a native gke/k8s feature to do this more elegantly
You can spawn GKE cluster with Alpha features. Such clusters exist for one month maximum and then are auto-deleted.
Read more: https://cloud.google.com/kubernetes-engine/docs/concepts/alpha-clusters
Try Cloud Scheduler and hook it up with your build server. Cloud Scheduler supports Http , App Engine , Pub/Sub endpoints.
I don't believe there is a native way to do this, but it doesn't seem unreasonable to use cloud scheduler to every so often trigger a cloud function which looks for appropriately labeled clusters and triggers their deletion via the API.

How do I setup a Active / Passive environment with two nodes in OpenShift?

I am trying to configure a Active/Passive cluster with two nodes (using OpenShift). The second passive node should be a hot standby, in other words it is up and running but not doing anything, until the first node dies. Then the passive node becomes active and a new passive node is started.
I have read the High Availability documentation, however it just seems to cover the theory. Furthermore it seems like overkill ( I am thinking there might be an easier way to meet my goal).
Where would I start?
What you are asking for goes against the usual practice for how Kubernetes/OpenShift is used. You wouldn't have hot standby nodes, you would always use all nodes in the cluster. You would then allow for enough additional capacity in your cluster such that loosing a node doesn't cause a problem as other nodes would have enough capacity to then run the applications. In this scenario the Kubernetes scheduler would automatically restart any applications which were on a failed node on the other nodes in the cluster, without you needing to perform any explicit failover steps.
So don't try and do anything special, setup your cluster with the two nodes, with applications being distributed across both. If you need to have the ability to run with only a single node, make sure it has enough capacity to run everything. If over time you add more applications and one node is not enough, add a third node, with all three being used in normal case. You can then handle failure of a single node again.

Persistent storage for Apache Mesos

Recently I've discovered such a thing as a Apache Mesos.
It all looks amazingly in all that demos and examples. I could easily imagine how one would run for stateless jobs - that fits to the whole idea naturally.
Bot how to deal with long running jobs that are stateful?
Say, I have a cluster that consists of N machines (and that is scheduled via Marathon). And I want to run a postgresql server there.
That's it - at first I don't even want it to be highly available, but just simply a single job (actually Dockerized) that hosts a postgresql server.
1- How would one organize it? Constraint a server to a particular cluster node? Use some distributed FS?
2- DRBD, MooseFS, GlusterFS, NFS, CephFS, which one of those play well with Mesos and services like postgres? (I'm thinking here on the possibility that Mesos/marathon could relocate the service if goes down)
3- Please tell if my approach is wrong in terms of philosophy (DFS for data servers and some kind of switchover for servers like postgres on the top of Mesos)
Question largely copied from Persistent storage for Apache Mesos, asked by zerkms on Programmers Stack Exchange.
Excellent question. Here are a few upcoming features in Mesos to improve support for stateful services, and corresponding current workarounds.
Persistent volumes (0.23): When launching a task, you can create a volume that exists outside of the task's sandbox and will persist on the node even after the task dies/completes. When the task exits, its resources -- including the persistent volume -- can be offered back to the framework, so that the framework can launch the same task again, launch a recovery task, or launch a new task that consumes the previous task's output as its input.
Current workaround: Persist your state in some known location outside the sandbox, and have your tasks try to recover it manually. Maybe persist it in a distributed filesystem/database, so that it can be accessed from any node.
Disk Isolation (0.22): Enforce disk quota limits on sandboxes as well as persistent volumes. This ensures that your storage-heavy framework won't be able to clog up the disk and prevent other tasks from running.
Current workaround: Monitor disk usage out of band, and run periodic cleanup jobs.
Dynamic Reservations (0.23): Upon launching a task, you can reserve the resources your task uses (including persistent volumes) to guarantee that they are offered back to you upon task exit, instead of going to whichever framework is furthest below its fair share.
Current workaround: Use the slave's --resources flag to statically reserve resources for your framework upon slave startup.
As for your specific use case and questions:
1a) How would one organize it? You could do this with Marathon, perhaps creating a separate Marathon instance for your stateful services, so that you can create static reservations for the 'stateful' role, such that only the stateful Marathon will be guaranteed those resources.
1b) Constraint a server to a particular cluster node? You can do this easily in Marathon, constraining an application to a specific hostname, or any node with a specific attribute value (e.g. NFS_Access=true). See Marathon Constraints. If you only wanted to run your tasks on a specific set of nodes, you would only need to create the static reservations on those nodes. And if you need discoverability of those nodes, you should check out Mesos-DNS and/or Marathon's HAProxy integration.
1c) Use some distributed FS? The data replication provided by many distributed filesystems would guarantee that your data can survive the failure of any single node. Persisting to a DFS would also provide more flexibility in where you can schedule your tasks, although at the cost of the difference in latency between network and local disk. Mesos has built-in support for fetching binaries from HDFS uris, and many customers use HDFS for passing executor binaries, config files, and input data to the slaves where their tasks will run.
2) DRBD, MooseFS, GlusterFS, NFS, CephFS? I've heard of customers using CephFS, HDFS, and MapRFS with Mesos. NFS would seem an easy fit too. It really doesn't matter to Mesos what you use as long as your task knows how to access it from whatever node where it's placed.
Hope that helps!