Kubernetes - Single Cluster or Multiple Clusters

I'm migrating a number of applications from AWS ECS to Azure AKS and being the first production deployment for me in Kubernetes I'd like to ensure that it's set up correctly from the off.
The applications being moved all use resources at varying degrees with some being more memory intensive and others being more CPU intensive, and all running at different scales.
After some research, I'm not sure which would be the better approach: running a single large cluster with each application in its own namespace, or running a separate cluster per application with Federation.
I should note that I'll need to monitor resource usage per application for cost management (amongst other things), and communication is needed between most of the applications.
I'm able to set up both layouts and I'm sure both would work, but I'm not sure of the pros and cons of each approach, whether I should be avoiding one altogether, or whether I should be considering other options?

Because you are at the beginning of your Kubernetes journey, I would go with separate clusters for each stage you have (or at least separate dev and prod). You can very easily take a cluster down (I did it several times through resource starvation). Also, without correctly configured network policies you might find that services from different stages/namespaces (like test and sandbox) communicate with each other, or that a pipeline meant to deploy to dev changes something in another namespace.
Why risk production being affected by dev work?
Even if you don't have to upgrade the control plane yourself, AKS still has its versions and feature flags, and it is better to test them on a separate cluster before moving to production.
So my initial decision would be to set some hard boundaries: different clusters. Later, once you gain more experience with AKS and Kubernetes, you can revisit that decision.

Since you said communication is needed among the applications, I suggest you go with one cluster. Application isolation can be achieved by deploying each application in a separate namespace. You can collect metrics at the namespace level and set resource quotas at the namespace level; that way you can take action at the application level.
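A minimal sketch of that layout, assuming a hypothetical application named app-a (the quota figures are placeholders, not sizing advice):

```yaml
# Namespace per application ("app-a" is a placeholder name)
apiVersion: v1
kind: Namespace
metadata:
  name: app-a
---
# Cap the total resources the application can consume in its namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-a-quota
  namespace: app-a
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
```

Most metrics stacks can then group usage by namespace, which gives the per-application cost view mentioned above.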

A single cluster (with namespaces and RBAC) is easier to set up and manage, and a single k8s cluster can support high load.
If you really want multiple clusters, you could also try Istio multi-cluster (the Istio service mesh spanning multiple clusters).

It depends... Be aware that, at the time of writing, AKS still doesn't support multiple node pools (it's on the short-term roadmap), so you'll need to run those workloads on a single pool's VM type. Also, when thinking about multiple clusters, think about multi-tenancy requirements and the blast radius of a single cluster. I typically see users deploying multiple clusters even though there is some management overhead; good SCM and configuration management practices can help with this overhead.

Related

Does Kubernetes support non-distributed applications?

Our store applications are not distributed applications. We deploy one on each node, and it is then configured to store node-specific details, so it is tightly coupled to the node. Can I use Kubernetes for this use case? Would I get benefits from it?
Based on only this information, it is hard to tell. But Kubernetes is designed so that it should be easy to migrate existing applications. E.g. you can use a PersistentVolumeClaim for the directories where your application stores information.
That said, it will probably be challenging. A cluster administrator wants to treat the nodes in the cluster as "cattle" and throw them away when it's time to upgrade. If your app has only one instance, it will have some downtime, and your PersistentVolume should be backed by a storage system accessed over the network - otherwise the data will be lost when the node is thrown away.
If you want to run more than one instance for fault tolerance, the app needs to be stateless - but it is likely not stateless if it stores local data on disk.
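For example, the PersistentVolumeClaim approach might be sketched like this (claim name and size are assumptions; the default storage class used here should be backed by network storage, for the reasons above):

```yaml
# Hypothetical claim for the directory the store application writes to
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: store-data
spec:
  accessModes:
    - ReadWriteOnce        # one node mounts it read-write at a time
  resources:
    requests:
      storage: 10Gi
```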
There are several ways to have applications running on fixed nodes of the cluster. It really depends on how those applications behave and why they need to run on a fixed node.
Usually such applications are stateful and may need to interact with a specific node's resources, or to write directly to a volume mounted on specific nodes for performance reasons, and so on.
This can be achieved with a simple nodeSelector or with node affinity ( https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ ),
or with local persistent volumes ( https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/ )
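A minimal nodeSelector sketch of that pinning (the label key/value and the image are assumptions; node affinity allows richer expressions):

```yaml
# Pin a pod to nodes carrying a hypothetical "disktype=local-ssd" label
apiVersion: v1
kind: Pod
metadata:
  name: store-app
spec:
  nodeSelector:
    disktype: local-ssd       # nodes must be labelled beforehand
  containers:
    - name: store-app
      image: example/store-app:1.0   # placeholder image
```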
With this said, if all the applications that need to run on the Kubernetes cluster must be pinned to a single node, you lose a lot of the benefits, as Kubernetes works really well with stateless applications, which can be moved between nodes to obtain high availability and strong resilience to node failures.
The thing is that Kubernetes is complex and brings you a lot of tools to work with, but if you end up using only a small number of them, I think it's overkill.
I would weigh the benefits you could get from adopting Kubernetes (an easy way to check the whole cluster's health; easy monitoring of logs, metrics and resource usage; strong resilience to node failure for stateless applications; load balancing of requests; and a lot more) against the cons and the complexity it may bring (migrating to it can require a good amount of effort, especially if you weren't already using containers to host your applications, and so on).

Should I run my testing environment in the same Kubernetes cluster as my production environment?

I am planning on deploying my services on digitalocean's managed Kubernetes Service and I'm trying to determine which is the best approach when working with testing and production environments.
These are the approaches I'm considering:
Have one performant cluster on which I create 2 namespaces prod and test to run the 2 environments on the same cluster.
Have one cluster entirely dedicated to my production environment and another cluster with lower specs that's dedicated to testing.
Is one of these approaches preferred/recommended, or is it very subjective?
Feel free to suggest a completely different approach if the ones I have suggested aren't good!
Thank you for your time,
Eliot
I would always prefer two isolated clusters; that is the better option, but it will cost you more.
If you want to use one cluster for both environments, you can isolate physical resources but not traffic (Istio would help there). You can isolate the test environment with a hard resource quota so it cannot consume production resources, always set resource limits on every deployment, and use node affinity to confine the test environment:
Resource quota - Namespace
Request limit - Deployment
Node affinity - Node
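The Deployment-level part of that mapping might be sketched as follows (the test namespace, the env=test node label and the image are assumptions):

```yaml
# Test-env deployment with per-container limits and node affinity
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: test
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: env
                    operator: In
                    values: ["test"]   # keep test pods on test nodes
      containers:
        - name: myapp
          image: example/myapp:1.0     # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:                    # hard per-container cap
              cpu: 500m
              memory: 256Mi
```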

Benefits of running k8s pods in a non-default namespace

Pardon me for my limited knowledge of k8s. As per k8s best practices, we should run pods in a non-default namespace. A few reasons for this approach are:
to create logical isolation and set up UAT, SIT and dev environments on the same k8s cluster
the default namespace is OK when we are running fewer than 10 microservices in the same cluster
Do we have any other benefits from a security, performance and maintenance point of view?
I would say the best practice is to think about how you will use your cluster and take namespaces into account: what you'll run in the cluster, how much resource you want to dedicate to it, and who can do what. Namespaces can help with controlling all of these things.
In terms of what you run, it's important to note that Kubernetes object names have to be unique within a namespace. So if you want to run two instances of the same app, you either install them in different namespaces or distinguish the resource names - Helm charts, for example, default to adding prefixes to ensure uniqueness.
Also, role-based access control permissions can be set per namespace, and resource usage quotas can be applied to namespaces. So if you had a dev namespace on the same cluster as UAT, you could ensure that permissions are more restricted on UAT and that it has more resource availability guaranteed for it.
For more on these points see https://dzone.com/articles/kubernetes-namespaces-explained and https://kubernetes.io/blog/2016/08/kubernetes-namespaces-use-cases-insights/
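As a sketch of the namespace-scoped RBAC mentioned above (the dev namespace, role name and group are placeholders):

```yaml
# Read-only access to pods, granted only inside the dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: dev
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-pod-readers
  namespace: dev
subjects:
  - kind: Group
    name: dev-team               # placeholder group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```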

Linking kubernetes namespace to nodes

It seems to be good practice to map environments such as dev, qa and production to Kubernetes namespaces. To achieve "true" separation, it seems to be a good idea to label nodes exclusively dedicated to one of those namespaces and ensure resources in those environments get scheduled on those nodes only; that's at least our current thinking. There may be manifests one might want to use in those namespaces that should/must not be tampered with. Kubernetes does not seem to support associating namespaces with nodes out of the box. The PodNodeSelector admission controller seems close but is not quite what we are looking for. The only option to achieve what we want seems to be a custom mutating admission webhook as documented here.
I guess other people have been here before and there is a solution to address our primary concern which is that we don't want load on dev or qa environments impacting production performance.
Is there any off the shelf solution linking namespaces to nodes?
If not, are there any alternative approaches ensuring that environments do not interfere in terms of load/performance?
I guess other people have been here before and there is a solution to address our primary concern which is that we don't want load on dev or qa environments impacting production performance.
Been there, got burned by it.
Multiple environments in one cluster might be a good idea under certain circumstances, but mixing dev/qa/stage with production in a single cluster spells trouble. Load itself might not be the main issue, especially if you mitigate its effects with proper resource allocation, but any tweak, modification or dev-process-induced outage of kube-system pods affects production directly. You can't test updates to Kubernetes system components beforehand, any CNI issue on dev can slow down or render production inoperable, and so on... We went down that path and don't recommend it.
With that being said, separation as such is rather easy. On one of our clusters we do keep dev/qa/stage environments for some projects in a single cluster, and we separate some of the resources with labels. Strictly speaking it's not really env-separated, but we do have dedicated nodes for ELK covering all three environments, separate GitLab-runner nodes, database nodes and so on; the principle is the same. We label nodes and use nodeAffinity with nodeSelectorTerms to target the group of nodes sharing a label for a certain task/service (or environment, in your case). As a side note, the official documentation recommends node affinity over nodeSelector as the more expressive mechanism.
In my opinion having multiple environments in one cluster is a bad idea, for many reasons.
If you are sure you want to do it, and don't want to kill the performance of production pods, you can easily attach resource requests and limits to deployments/pods.
Another approach is to attach labels to nodes and force particular pods to be scheduled on them using the PodNodeSelector admission controller.
In general it is not recommended to use namespaces to separate software environments (dev, test, staging, prod..).
The best practice is to use a dedicated cluster for each environment.
To save on costs, you can take the compromise of using:
1 cluster: dev, test, staging
1 cluster: prod
With this setup, managing the per-environment namespaces in the cluster shared by dev, testing and staging gets a little more annoying.
I found the motivations for using namespaces from the docs very useful.
Still, if you need to ensure a set of nodes is dedicated only to the resources in a namespace, you should use a combination of:
a nodeSelector (or node affinity) to force scheduling of the resources only onto nodes of the set
and a taint to deny scheduling of any resource not in the namespace onto the nodes of the set
How to create a namespace dedicated to use only a set of nodes: https://stackoverflow.com/a/74617601/5482942
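The selector-plus-taint combination might be sketched like this, assuming the nodes were first tainted and labelled with a hypothetical dedicated=prod key (e.g. kubectl taint nodes <node> dedicated=prod:NoSchedule and kubectl label nodes <node> dedicated=prod):

```yaml
# Pod that both targets the dedicated nodes and tolerates their taint
apiVersion: v1
kind: Pod
metadata:
  name: prod-only-pod
  namespace: prod
spec:
  nodeSelector:
    dedicated: prod          # schedule only onto the dedicated nodes
  tolerations:
    - key: dedicated         # tolerate the taint that keeps others out
      operator: Equal
      value: prod
      effect: NoSchedule
  containers:
    - name: app
      image: example/app:1.0  # placeholder image
```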

Multiple environments (Staging, QA, production, etc) with Kubernetes

What is considered a good practice with K8S for managing multiple environments (QA, Staging, Production, Dev, etc)?
As an example, say that a team is working on a product which requires deploying a few APIs, along with a front-end application. Usually, this will require at least 2 environments:
Staging: For iterations/testing and validation before releasing to the client
Production: This is the environment the client has access to. It should contain stable and well-tested features.
So, assuming the team is using Kubernetes, what would be a good practice to host these environments? This far we've considered two options:
Use a K8s cluster for each environment
Use only one K8s cluster and keep them in different namespaces.
(1) Seems the safest option since it minimizes the risk of potential human mistakes and machine failures that could put the production environment in danger. However, this comes at the cost of more master machines and more infrastructure management.
(2) Looks like it simplifies infrastructure and deployment management because there is one single cluster, but it raises a few questions:
How does one make sure that a human mistake won't impact the production environment?
How does one make sure that a high load in the staging environment won't cause a loss of performance in the production environment?
There might be some other concerns, so I'm reaching out to the K8s community on Stack Overflow to get a better understanding of how people deal with these sorts of challenges.
Multiple Clusters Considerations
Take a look at this blog post from Vadim Eisenberg (IBM / Istio): Checklist: pros and cons of using multiple Kubernetes clusters, and how to distribute workloads between them.
I'd like to highlight some of the pros/cons:
Reasons to have multiple clusters
Separation of production/development/test: especially for testing a new version of Kubernetes, of a service mesh, of other cluster software
Compliance: according to some regulations some applications must run in separate clusters/separate VPNs
Better isolation for security
Cloud/on-prem: to split the load between on-premises and cloud services
Reasons to have a single cluster
Reduce setup, maintenance and administration overhead
Improve utilization
Cost reduction
Considering a not-too-expensive environment with average maintenance that still ensures security isolation for production applications, I would recommend:
1 cluster for DEV and STAGING (separated by namespaces, maybe even isolated with network policies, as in Calico)
1 cluster for PROD
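The namespace isolation mentioned for the DEV/STAGING cluster could start from a default-deny policy like this sketch (the namespace name is a placeholder; enforcement requires a NetworkPolicy-capable CNI such as Calico):

```yaml
# Deny all ingress and egress for pods in "dev" unless another policy allows it
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev
spec:
  podSelector: {}        # empty selector = every pod in the namespace
  policyTypes:
    - Ingress
    - Egress
```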
Environment Parity
It's a good practice to keep development, staging, and production as similar as possible:
Differences between backing services mean that tiny incompatibilities crop up, causing code that worked and passed tests in development or staging to fail in production. These types of errors create friction that disincentivizes continuous deployment.
Combine a powerful CI/CD tool with Helm. You can use the flexibility of Helm values to set default configurations, overriding only the configs that differ from one environment to another.
GitLab CI/CD with AutoDevops has a powerful integration with Kubernetes, which allows you to manage multiple Kubernetes clusters already with helm support.
Managing multiple clusters (kubectl interactions)
When you are working with multiple Kubernetes clusters, it's easy to mess up contexts and run kubectl against the wrong cluster. Beyond that, Kubernetes has restrictions on version mismatch between the client (kubectl) and the server (Kubernetes master), so running commands in the right context does not mean running the right client version.
To overcome this:
Use asdf to manage multiple kubectl versions
Set the KUBECONFIG env var to change between multiple kubeconfig files
Use kube-ps1 to keep track of your current context/namespace
Use kubectx and kubens to change fast between clusters/namespaces
Use aliases to combine them all together
I have an article that exemplifies how to accomplish this: Using different kubectl versions with multiple Kubernetes clusters
I also recommend the following reads:
Mastering the KUBECONFIG file by Ahmet Alp Balkan (Google Engineer)
How Zalando Manages 140+ Kubernetes Clusters by Henning Jacobs (Zalando Tech)
Definitely use a separate cluster for development and for building Docker images, so that your staging/production clusters can be locked down security-wise. Whether you use separate clusters for staging and production is up to you, based on risk/cost; certainly keeping them separate will help avoid staging affecting production.
I'd also highly recommend using GitOps to promote versions of your apps between your environments.
To minimise human error I also recommend you look into automating as much as you can for your CI/CD and promotion.
Here's a demo of how to automate CI/CD with multiple environments on Kubernetes, using GitOps for promotion between environments and preview environments on pull requests. It was done live on GKE, though Jenkins X supports most Kubernetes clusters.
It depends on what you want to test in each of the scenarios. In general I would try to avoid running test scenarios on the production cluster to avoid unnecessary side effects (performance impact, etc.).
If your intention is to test with a staging system that exactly mimics the production system, I would recommend firing up an exact replica of the complete cluster, shutting it down after you're done testing, and moving the deployments to production.
If your purpose is a staging system for testing the application to be deployed, I would run a smaller staging cluster permanently and update its deployments (with scaled-down versions of the deployments) as required for continuous testing.
To control the different clusters, I prefer having a separate CI/CD machine that is not part of the cluster but is used for firing up and shutting down clusters, as well as performing deployment work, initiating tests, etc. This allows you to set up and shut down clusters as part of automated testing scenarios.
It's clear that by keeping the production cluster apart from the staging one, the risk of potential errors impacting the production services is reduced. However, this comes at the cost of more infrastructure/configuration management, since it requires at least:
3 masters for the production cluster and at least one master for the staging one
2 kubectl config files to be added to the CI/CD system
Let's also not forget that there could be more than one environment. For example, I've worked at companies with at least 3 environments:
QA: where we did daily deploys and our internal QA before releasing to the client
Client QA: where we deployed before production so that the client could validate the environment before release
Production: where production services are deployed
I think ephemeral/on-demand clusters make sense, but only for certain use cases (load/performance testing or very "big" integration/end-to-end testing); for more persistent/sticky environments I see an overhead that could be reduced by running them within a single cluster.
I guess I wanted to reach out to the k8s community to see what patterns are used for such scenarios like the ones I've described.
Unless compliance or other requirements dictate otherwise, I favor a single cluster for all environments. With this approach, attention points are:
Make sure you also group nodes per environment using labels. You can then use the nodeSelector on resources to ensure that they are running on specific nodes. This will reduce the chances of (excess) resource consumption spilling over between environments.
Treat your namespaces as subnets and forbid all egress/ingress traffic by default. See https://kubernetes.io/docs/concepts/services-networking/network-policies/.
Have a strategy for managing service accounts. ClusterRoleBindings imply something different if a cluster hosts more than one environment.
Apply scrutiny when using tools like Helm. Some charts blatantly install service accounts with cluster-wide permissions, but permissions for service accounts should be limited to the environment they are in.
I think there is a middle ground. I am working with EKS and node groups. The master is managed, scaled and maintained by AWS. You could then create 3 kinds of node groups (just an example):
1 - General Purpose -> labels: environment=general-purpose
2 - Staging -> labels: environment=staging (taints if necessary)
3 - Prod -> labels: environment=production (taints if necessary)
You can use tolerations and node selectors on the pods so they are placed where they are supposed to be.
This allows you to use more robust or powerful nodes for production's node groups and, for example, Spot instances for staging, uat, qa, etc., and it has a couple of big upsides:
Environments are physically separated (and virtually too, in namespaces)
You can reduce costs by sharing not only the masters but also some nodes with pods shared by the two environments, and by using Spot or cheaper instances in staging/uat/...
No cluster-management overhead
You have to pay attention to roles and policies to keep it secure. You can implement network policies using, for example, EKS + Calico.
Update:
I found a doc that may be useful when using EKS. It has some details on how to safely run a multi-tenant cluster, and some of those details may help isolate production pods and namespaces from those in staging.
https://aws.github.io/aws-eks-best-practices/security/docs/multitenancy/
Using multiple clusters is the norm, at the very least to enforce a strong separation between production and "non-production".
In that spirit, do note that GitLab 13.2 (July 2020) now includes:
Multiple Kubernetes cluster deployment in Core
Using GitLab to deploy multiple Kubernetes clusters with GitLab previously required a Premium license.
Our community spoke, and we listened: deploying to multiple clusters is useful even for individual contributors.
Based on your feedback, starting in GitLab 13.2, you can deploy to multiple group and project clusters in Core.
See documentation and issue.
A few thoughts here:
Do not trust namespaces to protect the cluster from catastrophe. Having separate production and non-prod (dev, stage, test, etc.) clusters is the minimum necessary requirement. Noisy neighbors have been known to bring down entire clusters.
Separate repositories for code and k8s deployments (Helm, Kustomize, etc.) will make best practices like trunk-based development and feature-flagging easier as the teams scale.
Using Environments as a Service (EaaS) will allow each PR to be tested in its own short-lived (ephemeral) environment. Each environment is a high-fidelity copy of production (including custom infrastructure like databases, buckets, DNS, etc.), so devs can remotely code against a trustworthy environment (NOT minikube). This can help reduce configuration drift, improve release cycles, and improve the overall dev experience. (Disclaimer: I work for an EaaS company.)
I think running a single cluster makes sense because it reduces overhead and monitoring effort. But you have to make sure to put network policies and access control in place.
Network policy - to prohibit dev/qa environment workloads from interacting with prod/staging stores.
Access control - who has access to different environments' resources, using ClusterRoles, Roles, etc.
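For instance, the dev/qa-to-prod prohibition might be sketched with a namespaceSelector like this (the env label on namespaces is an assumption you would have to set yourself):

```yaml
# Allow ingress to prod pods only from namespaces labelled env=prod
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prod-from-prod-only
  namespace: prod
spec:
  podSelector: {}          # applies to every pod in prod
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              env: prod
```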