Kubernetes - How is high availability ensured if I deploy a containerised app?

I am new to the Kubernetes environment. While deploying an application, I could figure out how to do autoscaling, but did not quite understand how high availability is ensured. If it's not, how can I configure it?
Edit: By HA, I mean how to ensure that pods are scheduled across multiple nodes to ensure HA at the pod/service level.
Please guide. Thanks in advance! :)

By HA, I mean how to ensure that pods are scheduled across multiple nodes to ensure HA at the pod/service level.
I'm guessing your app is cloud compatible and can be scaled. In this situation there are multiple features you can take advantage of:
DaemonSets: containers in a DaemonSet will be run on every single node, unless you include/exclude certain nodes.
Deployments: Deployments are the next generation of Replication Controllers. Using Deployments you can easily scale your application as well as ensure availability of a certain number of pods. Please note that in order to stay available on node failure, you need to set affinity rules on the pods, and you do that in the pod template. Since 1.6, affinity can be specified as a field in the PodSpec rather than using annotations (a sketch follows below).
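As a rough sketch of what that looks like on a current cluster (the name, labels, and image below are placeholders, not anything from your setup), a Deployment can ask the scheduler to prefer spreading its replicas across nodes via pod anti-affinity:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                    # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname   # prefer placing replicas on different nodes
      containers:
      - name: my-app
        image: my-app:1.0          # placeholder image

With a rule like that in place, losing a single node should still leave at least one replica serving traffic.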

Related

How to avoid downtime during scheduled maintenance window

I'm experiencing downtimes whenever the GKE cluster gets upgraded during the maintenance window. My services (APIs) become unreachable for like ~5min.
The cluster Location type is set to "Zonal", and all my pods have 2 replicas. The only affected pods seem to be the ones using nginx ingress controller.
Is there anything I can do to prevent this? I read that using Regional clusters should prevent downtimes in the control plane, but I'm not sure if it's related to my case. Any hints would be appreciated!
You mention "downtime" but is this downtime for you using the control plane (i.e. kubectl stop working) or is it downtime in that the end user who is using the services stops seeing the service working.
A GKE upgrade upgrades two parts of the cluster: the control plane or master nodes, and the worker nodes. These are two separate upgrades although they can happen at the same time depending on your configuration of the cluster.
Regional clusters can help with that. They will cost more, as you are running more nodes, but the upside is that the cluster is more resilient.
Going back to the earlier point about control plane vs node upgrades: the control plane upgrade does NOT affect the end-user/customer perspective. The services will remain running.
The node upgrade WILL affect the customer so you should consider various techniques to ensure high availability and resiliency on your services.
A common technique is to increase replicas and also to include pod anti-affinity. This will ensure the pods are scheduled on different nodes, so when the node upgrade comes around, it doesn't take the entire service out because the cluster scheduled all the replicas on the same node.
You mention the nginx ingress controller in your question. If you are using Helm to install it into your cluster, then out of the box it is not set up to use anti-affinity, so it is liable to be taken out of service if all of its replicas get scheduled onto the same node, and then that node gets marked for upgrade or similar.
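If that matches your setup, one option (a sketch, assuming the chart version you use exposes a controller.affinity value and that the controller pods carry the usual app.kubernetes.io labels; adjust both to your release) is to pass anti-affinity through the Helm values:

controller:
  replicaCount: 2
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
              app.kubernetes.io/component: controller
          topologyKey: kubernetes.io/hostname   # spread controller replicas across nodes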

Is it possible to schedule a pod to run for, say, 24 hours and then remove the deployment/statefulset? Or do I need to use jobs?

We have a bunch of pods running in dev environment. The pods are auto-provisioned by an application on every business action. The problem is that across various namespaces they are accumulating and eating available resources in EKS.
Is there a way, without Jenkins/k8s Jobs, to simply put some parameter in the pod manifest to tell it to self-destruct in, say, 24 hours?
Add to your pod.spec:
activeDeadlineSeconds: 86400
After the deadline, your Pod will be stopped for good with the status DeadlineExceeded.
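For reference, a minimal sketch of where that field sits in a Pod manifest (the name, image, and command are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: business-action           # placeholder name
spec:
  activeDeadlineSeconds: 86400    # terminate the pod after 24 hours
  restartPolicy: Never
  containers:
  - name: worker
    image: my-app:1.0             # placeholder image
    command: ["run-business-action"]   # placeholder command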
If I understood your situation properly, you would like to scale your cluster down in order to save resources.
Kubernetes has a built-in ability to autoscale your application in a cluster. In practice, it means that Kubernetes can start additional pods when the load is increasing and terminate excess pods when the load is decreasing.
It is possible to downscale the application to zero pods, but, in this case, you will have a delay serving the first request while the pod is starting.
This functionality relies on performance metrics. From the practical side, it means that autoscaling doesn't happen instantly, because it takes some time for the performance metrics to reach the configured threshold.
The mentioned Kubernetes feature, called HPA (Horizontal Pod Autoscaler), is described in this document.
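As a rough sketch (assuming a metrics source such as metrics-server is installed and that a Deployment named my-app exists; both names are placeholders, and on older clusters the apiVersion may be autoscaling/v2beta2 or autoscaling/v1 instead), an HPA object looks like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                  # placeholder target Deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # add pods when average CPU passes 70%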
In case you are running your cluster on GCP or GKE, you are able to go further and automatically start additional nodes for your cluster when you need more computing capacity and shut down nodes when they are not running application pods anymore.
More information about this functionality can be found following the link.
Last, but not least, you can use a tool like Ansible to manage all your Kubernetes assets (it can create/manage deployments via playbooks).
If you decide to give it a try, you might find this information useful:
Creating a Container cluster in GKE
70% cheaper Kubernetes cluster on AWS
How to build a Kubernetes Horizontal Pod Autoscaler using custom metrics

AWS EKS Communication Between Node Groups

I have an EKS cluster with a single node group (3 nodes) that is currently only running Jenkins.
I want to start utilising this EKS cluster for other things but want to separate out deployments into specific node groups.
For example, I want to create a 'monitoring' node group to which I will deploy prometheus and grafana. I also want another larger node group for application deployments.
I know I can create a second node group in EKS and label it with 'monitoring' so I can use nodeSelector to deploy to the correct node group.
My question is around whether I need to consider networking between the node groups. For example, Prometheus needs to be able to scrape exporters running on pods in the other node groups.
Is that something which requires some sort of ingress rule, or is it not required? If it is required, what is the correct way to implement it?
As long as the nodes are in the same cluster and belong to the same master and no custom network policy prevents node groups from reaching each other you should be able to rely on ClusterIPs.
My concern is more about the reason why you would prefer dedicated node groups for separating tasks. Is that because of specific requirements? As long as you have available resources in your cluster, I would leverage the existing nodes and deploy Kubernetes resources (deployments/services/etc.) in dedicated namespaces, which is the kind of separation that looks most appropriate to me in your case. Then, when you need more horsepower, you can scale your cluster horizontally, even with different hardware, specific labels, and NodeAffinity (instead of NodeSelector, for better customisation).
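As a sketch of that last point, assuming you put a label such as nodegroup=monitoring on the new node group (the label key and value here are made up), the Prometheus pod template could use node affinity like this:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: nodegroup          # hypothetical label applied to the monitoring node group
          operator: In
          values:
          - monitoring

Cross-node-group traffic itself still just goes through the pods' ClusterIPs/Services, as described above.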
Hope I helped.

Is There a Way To Control a DaemonSet's Rolling Update In Kubernetes?

I have three DaemonSet pods, each containing a Hadoop ResourceManager container. One of the three is the active node, and the other two are standby nodes.
So there are two questions:
Is there a way to let Kubernetes know whether the Hadoop ResourceManager inside a pod is an active node or a standby node?
I want to control the rolling update so that the standby nodes are updated first and the active node last, to reduce the number of active-node switchovers, which may cause risk.
Consider the following: Deployments, DaemonSets and ReplicaSets are abstractions meant to manage a uniform group of objects.
In your specific case, although you're running the same application, you can't say it's a uniform group of objects, as you have two types: active and standby objects.
There is no way of telling Kubernetes which is which if they're grouped in what is supposed to be a uniform set of objects.
As suggested by @wolmi, having them in a Deployment instead of a DaemonSet still leaves you with the issue that deployment strategies can't individually identify objects to control when they're updated, because of the aforementioned logic.
My suggestion would be that, in addition to using a Deployment with node affinity to ensure a highly available environment, you separate active and standby objects into different Deployments/Services and base your rolling update strategy on that scenario.
This will ensure that you're updating the standby nodes first, removing the risk of updating the active node before the others.
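A rough sketch of that separation, with entirely hypothetical names, labels, and image for the ResourceManager pods:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hadoop-rm-standby         # hypothetical Deployment for the standby pool
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hadoop-rm
      role: standby
  template:
    metadata:
      labels:
        app: hadoop-rm
        role: standby
    spec:
      containers:
      - name: resourcemanager
        image: my-hadoop-rm:3.2   # placeholder image

A second Deployment (say hadoop-rm-active, with role: active and replicas: 1) would mirror this one; you roll out to hadoop-rm-standby first and to hadoop-rm-active last.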
I think this is not the best way to do that. I totally understand that you use a DaemonSet to be sure Hadoop exists in an HA environment, one per node, but you can have that same scenario using a Deployment and affinity parameters, more concretely pod anti-affinity; then you can be sure only one Hadoop pod exists per K8s node.
With that new approach, you can use a replication-controller to control the rolling-update, some resources from the documentation:
https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
https://kubernetes.io/docs/tasks/run-application/rolling-update-replication-controller/

Set replicas on different nodes

I am developing an application for dealing with Kubernetes runtime microservices. I actually did some cool things, like moving a microservice from one node to another. The problem is that all the replicas go together.
So, imagine that a microservice has two replicas and it is running in a namespace in a cluster with two nodes.
I want to set one replica in each node. Is that possible? Even in a yaml file, is that possible?
I am trying to write my own scheduler to do that, but I have had no success so far.
Thank you all
I think what you are looking for is inter-pod anti-affinity for your ReplicaSet. From the documentation:
Inter-pod affinity and anti-affinity allow you to constrain which nodes your pod is eligible to be scheduled on based on labels on pods that are already running on the node rather than based on labels on nodes.
Here is the documentation: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#inter-pod-affinity-and-anti-affinity-beta-feature
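As a hedged sketch of what that looks like in the pod template of your Deployment/ReplicaSet (the app label is a placeholder and must match the labels on your replicas), a required anti-affinity rule keyed on the node hostname forces the replicas onto different nodes:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: my-microservice    # placeholder: must match your replicas' labels
      topologyKey: kubernetes.io/hostname   # at most one replica per node

Note that with a hard (required) rule and only two nodes, a third replica would stay Pending; preferredDuringSchedulingIgnoredDuringExecution is the softer variant.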
I can't find where it's documented, but I recently read somewhere that replicas will be distributed across nodes when you create the Kubernetes Service BEFORE the Deployment/ReplicaSet.