Merging Kubernetes clusters to minimize the number of clusters

This should be the topic of my bachelor thesis. At the moment I am looking for literature or general information, but I can't really find any. Do you have more information on this topic? I want to find out whether it makes sense to run the dev and test stages on one cluster instead of running each stage on its own cluster.
I also want to find out, if it is a good idea, how I can consolidate the clusters.

That's a nice question. It's actually a huge topic to cover; the short answer is yes and no: you can set up a single cluster for all of your environments.
In general, though, you need to consider several things before merging all the environments into a single cluster: the number of services you are running on Kubernetes, the number of Ops engineers you have on hand to manage and maintain the existing cluster without issues, and the locations of the different teams who use the cluster (if you take latency into consideration).
There are many advantages and disadvantages in merging everything into one.
Advantages include that one cluster is easier to manage and maintain than several, and you can spread workloads across your nodes with labels, deploying your applications with a dev label onto dev nodes, and so on. But this can also waste resources if you restrict where pods may be scheduled rather than letting Kubernetes make the placement decision. People could argue about this topic for hours if we set up a debate.
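As an illustration of the labelling approach just mentioned, here is a minimal, hypothetical sketch; the env label key, the sample-api name and the image are placeholders, not anything prescribed by Kubernetes.

    # One-time step: label the nodes reserved for dev, e.g.
    #   kubectl label node <node-name> env=dev
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sample-api        # hypothetical dev application
      namespace: dev
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: sample-api
      template:
        metadata:
          labels:
            app: sample-api
        spec:
          nodeSelector:
            env: dev          # schedule only onto nodes labelled env=dev
          containers:
          - name: sample-api
            image: nginx:1.25 # placeholder image
            resources:
              requests:
                cpu: 100m
                memory: 128Mi

Note that pinning pods like this is exactly the trade-off described above: it overrides the scheduler's own placement decisions and can leave capacity on the dev nodes unused.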
Cost of resources: imagine you have a cluster of 3 worker nodes and 3 masters for prod, similarly a cluster of 2 workers and 3 masters for dev, and a cluster of 2 workers and 3 masters for test. The cost is high because you are allocating 9 masters in total; if you merge dev and test into one cluster you save the cost of 3 VMs.
Kubernetes is still very new to many DevOps engineers in many organisations, and many of us need a place to experiment and figure things out with the latest versions of the software before they can be rolled out to prod. This is the biggest point of all, because downtime is very costly and many teams cannot afford it no matter what. If everything is in a single cluster, it is harder to debug problems. One example is upgrading from Helm 2 to Helm 3, which involves migrating Helm release data (and risks losing it); you need somewhere to research and work that out.
As I said, team location is another factor. Imagine you have an offshore testing team and you merge dev and test into a single cluster: there might be network latency for the testing team working with the product, and all of us have deadlines. This is arguable, but you still need to consider network latency.
In short, yes and no. This is a very debatable question and we could keep adding pros and cons to the list forever, but it is advisable to have a separate cluster for each environment until you become the kind of Kubernetes guru who understands every packet of data inside your cluster.

This is already achievable using namespaces:
https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/
With namespaces you are able to isolate workloads from each other.
Merging the clusters and separating the stages with different namespaces would achieve what you are thinking of.
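As a minimal sketch, the stages from the question could be declared like this and applied with kubectl apply -f (the names dev and test are just the ones discussed above):

    apiVersion: v1
    kind: Namespace
    metadata:
      name: dev
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: test

Each namespace then gives its stage a separate scope for object names, RBAC, resource quotas and network policies while sharing the cluster's nodes.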

Related

k8s - Nested Firecracker Environments

Sorry if this question might sound "convoluted" but here it goes...
I'm currently designing a k8s solution based on Firecracker and Kata-containers. I'd like the environment to be as isolated/secure as possible. My thoughts around this are:
1. deploy k8s masters as Firecracker nodes running the API server, controller, scheduler and etcd
2. deploy k8s workers as Firecracker nodes running kubelet and kube-proxy, and using Kata-containers + Firecracker for the deployed workload. The workload will be a combination of MQTT cluster components and in-house developed FaaS components (probably using OpenFaaS)
It's point 2 above which makes me feel a little awkward/convoluted. Am I overcomplicating things, introducing complexity which will cause problems related to (CNI) networking among worker nodes etc.? Isolation and minimizing attack vectors are important, but maybe I'm trying "to be too much of a s.m.a.r.t.a.s.s" here :)
I really like the concept of Firecracker's microVM architecture, with reduced security risks and a reduced footprint, and it would make for a wonderful solution to tenant isolation. However, am I better off using another CRI-conformant runtime together with Kata for the actual workload being deployed on the workers?
Many thanks in advance for your thoughts/comments on this!
You might want to take a look at https://github.com/weaveworks-liquidmetal and consider whether contributing to that would get you further towards your goal. Alternative runtimes (like Kata) for different workloads are welcomed as PRs. There is a liquid-metal Slack channel in the Weaveworks user group if you have any queries. Disclosure: I currently work at Weaveworks :)

Openshift vs Rancher, what are the differences? [closed]

I am totally new to these two technologies (I know Docker and Kubernetes, btw).
I haven't found much on the web about this comparison.
I have read that OpenShift is used by more companies, but that it is a nightmare to install, pricier, and that data loss can occur on upgrade.
But nothing else.
What should be the deciding factor for which one to use for Kubernetes cluster orchestration?
I currently work for Rancher. I've also been building Internet infrastructure since 1996 and owned an MSP for 14 years that built and managed Internet datacenters for large US media companies. I've been working with containers since 2014, and since then I've tried pretty much everything that exists for managing containers and Kubernetes.
"The deciding factor" varies by individual and organization. Many companies use OpenShift. Many companies use Rancher. Many companies use something else, and everyone will defend their solution because it fits their needs, or because of the psychological principle of consistency, which states that because we chose to walk a certain path, that path must be correct. More specifically, the parameters around the solution we chose must be what we need because that was the choice we made.
Red Hat's approach to Kubernetes management comes from OpenShift being a PaaS before it was ever a Kubernetes solution. By virtue of being a PaaS, it is opinionated, which means it's going to be prescriptive about what you can do and how you can do it. For many people, this is a great solution -- they avoid the "analysis paralysis" that comes from having too many choices available to them.
Rancher's approach to Kubernetes management comes from a desire to integrate cloud native tooling into a modular platform that still lets you choose what to do. Much like Kubernetes itself, it doesn't tell you how to do it, but rather gives fast access to the tooling to do whatever you want to do.
Red Hat's approach is to create large K8s clusters and manage them independently.
Rancher's approach is to unify thousands of clusters into a single management control plane.
Because Rancher is designed for multi-cluster management, it applies global configuration where it benefits the operator (such as authentication and identity management) but keeps tight controls on individual clusters and namespaces within them.
Within the security boundaries Rancher gives developers access to clusters and namespaces, easy app deployment, monitoring and metrics, service mesh, and access to Kubernetes features without having to go and learn all about Kubernetes first.
But wait! Doesn't OpenShift give developers those things too?
Yes, but often with Red Hat-branded solutions that are modified versions of open source software. Rancher always deploys unadulterated versions of upstream software and adds management value to it from the outside.
The skills you learn using software with Rancher will transfer to using that same software anywhere else. That's not always the case with skills you learn while using OpenShift.
There are a lot of things in Kubernetes that are onerous to configure, independent of the value of using the thing itself. It's easy to spend more time fussing around with Kubernetes than you do using it, and Rancher wants to narrow that gap without compromising your freedom of choice.
What is it that you want to do, not only now, but in the future? You say that you already know Kubernetes, but something has you seeking a management solution for your K8s clusters. What are your criteria for success?
No one can tell you what you need to be successful. Not me, not Red Hat, not Rancher.
I chose to use Rancher and to work there because I believe that they are empowering developers and operators to hit the ground running with Kubernetes. Everything that Rancher produces is free and open source, and although they're a business, the vast majority of Rancher deployments make no money for Rancher.
This forces Rancher to create a product that has true value, not a product that they can convince other people to buy.
The proof is in the deployments - Red Hat has roughly 1,000 OpenShift customers, which means roughly 1,000 OpenShift deployments. Rancher has fewer paying customers than Red Hat, but Rancher has over 30,000 deployments that we know about.
You can be up and running with Rancher in under ten minutes, and you can import the clusters you already have and start working with them a few minutes later. Why not just take it for a spin and see if you like it?
I also invite you to join the Rancher Users slack. There you will not only find a community of Rancher users, but you will be able to find other people who compared Rancher and OpenShift and chose Rancher. They will be happy to help you with information that will lead you to feel confident about whatever choice you make.

Which is the best approach to create namespaces?

I have an application with 5 microservices (iam, courses, ...). I want to know the best approach for migrating them to Kubernetes. I was thinking of creating namespaces by environment, as Google recommends:
1. prod
2. dev
3. staging
Then I thought it might be better to create namespaces by environment and microservice:
1. iam-prod
2. iam-dev
3. iam-staging
1. courses-prod
2. courses-dev
3. courses-staging
...
but this approach can be a little bit difficult to handle, because the microservices need to communicate with each other.
Which approach do you think is better?
Just like the other answer, you should create namespace isolation for prod, dev and staging. This takes care of a couple of nuances:
1. Ideally, your pods in one environment should not be talking to pods in another environment
2. You can manage your network policies in a much cleaner and more maintainable way with this organisation of Kubernetes objects
You can run multiple microservices in the same namespace, so I would go with prod, dev and staging namespaces, where you can have one or more instances of each microservice.
That said, if you want to use separate namespaces for separate microservice environments, they can still communicate using Services. The DNS name will be SERVICE_NAME.NAMESPACE.svc.cluster.local.
ref: https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/
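As a small sketch of that cross-namespace call, using the hypothetical names from the question: a Service called iam in the iam-dev namespace is reachable from pods in courses-dev at iam.iam-dev.svc.cluster.local.

    apiVersion: v1
    kind: Service
    metadata:
      name: iam
      namespace: iam-dev
    spec:
      selector:
        app: iam              # assumes the iam pods carry this label
      ports:
      - port: 80
        targetPort: 8080      # assumed container port
    # A pod in courses-dev can then call http://iam.iam-dev.svc.cluster.local
    # (or the short form http://iam.iam-dev).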
If you go with the second approach you will create unnecessary complexity without gaining any benefit. Also think about what happens as your microservices grow: are you going to create a new set of namespaces for each one? That is not recommended at all.
The concept of a namespace should not be tied to applications; it is related to users. Refer to the Kubernetes docs:
"Namespaces are intended for use in environments with many users spread
across multiple teams, or projects. For clusters with a few to tens of users,
you should not need to create or think about namespaces at all. Start using namespaces when you need the features they provide."
Also, even if the first approach is recommended, please keep a separate cluster for prod, as it should be more secure and highly available, with a proper disaster recovery plan ready and tested.
Go with one namespace for each environment. You can also define a resource quota per namespace. That way each application environment can be managed independently.
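A minimal sketch of such a per-namespace quota; the numbers are arbitrary placeholders to adjust per environment:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: dev-quota
      namespace: dev
    spec:
      hard:
        requests.cpu: "4"        # total CPU all pods in dev may request
        requests.memory: 8Gi
        limits.cpu: "8"
        limits.memory: 16Gi
        pods: "30"               # cap on the number of pods in the namespace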
None of the above are ideal solutions. I’ll go over why.
Security
Namespaces are the easiest boundary to use for managing RBAC permissions. In general, you will want to use the pre-provisioned admin and edit ClusterRoles to constrain user access to namespaces. This means people and services that share a namespace also share visibility of its Secrets, so the namespace becomes the blast radius for compromised secrets.
In order to reduce the blast radius of secrets exposure you can either micromanage resource-level role bindings (which is unreasonable overhead without additional automation and tooling) or segregate services across namespaces so that only tightly coupled services share a namespace.
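For illustration, a sketch of the namespace-scoped binding described above, using the built-in edit ClusterRole; the namespace and group names are placeholders:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: team-a-edit
      namespace: team-a-dev          # hypothetical namespace
    subjects:
    - kind: Group
      name: team-a                   # hypothetical group from your identity provider
      apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: ClusterRole
      name: edit                     # built-in role: read/write most namespaced objects
      apiGroup: rbac.authorization.k8s.io

Everyone bound this way can also read every Secret in that namespace, which is exactly the blast-radius problem described above.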
Isolation
Kubernetes resource isolation is relatively poor between namespaces. There’s no way to force a namespace to deploy into a different node pool than another namespace without custom admission controllers. So resource isolation is effectively opt-in, which is both insecure and unenforceable.
Because of this, it is actually more secure and better isolated, resource-wise, to have different environments (dev, staging, prod) in separate Kubernetes clusters altogether. But this is obviously more expensive and more management overhead, so it's only cost-effective when you have enough services and enough resource usage to justify the added overhead.
The consequence of poor resource isolation is that your dev and staging workloads can effectively DoS your prod workloads simply by consuming shared resources. CPU, memory and disk are the obvious culprits; these can be enforced by quotas and custom admission controllers. But the more insidious problem is sharing ingress proxies, load balancers and networking, which are harder to isolate between namespaces.
Another consequence of poor isolation is that dev services with weaker security can be compromised, allowing horizontal movement to prod services. Realistically, no one deploys dev apps as production-ready and fully hardened, so without hard isolation your security is at risk too.
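One partial mitigation for that horizontal movement is a default-deny ingress policy in the sensitive namespace; a sketch, assuming your CNI actually enforces NetworkPolicy:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: deny-from-other-namespaces
      namespace: prod
    spec:
      podSelector: {}          # applies to every pod in prod
      policyTypes:
      - Ingress
      ingress:
      - from:
        - podSelector: {}      # allow traffic only from pods in this same namespace

This limits network reachability between namespaces, but it does not address the resource contention or RBAC issues above.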
Quotas
Quotas are managed at the namespace level. So if you want to isolate quota by environment AND team, you can't use namespaces for both. And if you want to have quota by project, you'd need a project per namespace. The only way to handle all three is with multiple clusters, multiple namespaces, and multiple node pools, with custom deployment/admission enforcement that creates a makeshift hierarchy or matrix.
Namespace Hierarchy
Namespaces are flat. If you use them for environments, you can't use them for org- or team-level access control. If you use them for team-level access control, your engineers can't use them for component/project/system-level abstraction boundaries. You can only choose one, or the chaos will be unmanageable.
Conclusion
Unfortunately, the namespace abstraction is being used for 3 or 4 different use cases in the Kubernetes community, and it's not really ideal for any of them. So either you pick a non-ideal use case to optimize for, or you manage multiple clusters and write a bunch of custom automation to handle all the use cases.

Multiple apps running in one Kubernetes cluster or a cluster for each app

I have some apps in production, running on Azure. All these applications belong to the same company and communicate with each other. I want to migrate them to Kubernetes.
My question is: what are the best practices in this case, and why?
Some people recommend one cluster and multiple namespaces, and I don't understand why.
For example, https://www.youtube.com/watch?v=xygE8DbwJ7c recommends intra-cluster multi-tenancy, with the apps sharing one cluster, but the arguments for this choice are not enough for me.
Answer is: it depends...
To try to summarize it from our experience:
A cluster for each app is usually quite a waste of resources, especially given HA cluster requirements, and it can mainly be justified when a single app is composed of a large number of microservices that naturally cluster together, or when special security considerations have to be taken into account. That is, however, rarely the case in our experience (but it depends)...
Namespaces for apps in a cluster are more in line with our experience and needs, but again, this should not be overdone either (so, again, it depends), since, for example, your CNI can become a bottleneck, with one rogue app (or setup) degrading performance for other apps in seemingly unrelated ways. Load balancing, rollout downtimes, clashes over resources and other issues can happen if everything is crammed into one cluster at all costs. So this has its limits as well.
Best of both worlds: we started with a single cluster, and when we reached naturally separate (and separately performant) use cases (say, qa, dev and stage environments, or a different client with special security considerations, etc.) we migrated to more clusters, keeping reasonably namespaced apps in each cluster.
So, all in all: depending on the available machine pool (number of nodes), the size of the cluster, the size of the apps themselves (microservice/service complexity), HA requirements, redundancy, security considerations, etc., you might want to fit everything into one cluster with namespaced apps, or separate it into several clusters (again with namespaced apps within each cluster), or keep everything totally separate with one app per cluster. So: it depends.
It really depends on the scenario. I can think of one where some of the apps need dedicated, higher-specification nodes (say, with GPUs).
In such scenarios, having a dedicated cluster with GPU nodes can be beneficial for those apps, with normal CPU nodes for the other, ordinary apps.
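Whether you use a dedicated cluster or just a dedicated node pool, the usual pattern is to taint the GPU nodes and let only GPU workloads tolerate the taint; a rough sketch, where the taint key, label and image are illustrative assumptions:

    # Applied once to each GPU node, e.g.:
    #   kubectl taint nodes <gpu-node> dedicated=gpu:NoSchedule
    #   kubectl label nodes <gpu-node> dedicated=gpu
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-job                    # hypothetical workload
    spec:
      tolerations:
      - key: dedicated
        operator: Equal
        value: gpu
        effect: NoSchedule             # allowed onto the tainted GPU nodes
      nodeSelector:
        dedicated: gpu                 # and only onto the labelled GPU nodes
      containers:
      - name: trainer
        image: nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
        command: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 1          # requires the NVIDIA device plugin on the node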

What is the cheapest Google Compute Engine architecture for sharded MongoDB development setup?

After weeks of developing my various microservices, GC Pub/Sub and GC Functions using a basic MongoDB server, I would like to test the entire data flow using what I would use in production: a sharded MongoDB cluster. I've never used one and would like to get familiar with setting it up, updating it, etc.
Costs are an issue at this stage, especially for testing. Therefore, what is the most cost-effective way to set up a (test) sharded MongoDB cluster on Google Compute Engine?
The easiest approach for you is to use Cloud Launcher for your deployment. It will let you choose the number of nodes and the machine types, so you can deploy something that suits your budget. You will be billed according to the resources you deploy, and you can use the online pricing calculator to get an estimate. A drawback is that there does not seem to be a direct way to add nodes or change machine types without manual reconfiguration.
While configuring your deployment, the appropriate number of nodes and an arbiter will be created. Once you have tested, you might want to think about more complex architectures that are redundant against failures in a region (those will certainly increase your cost, since they mean additional nodes).
You can also consider running MongoDB on GKE; it would be easier to scale, but it requires getting familiar with Kubernetes. Kubernetes Engine is also charged according to the resources used by the cluster.
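If you go the GKE route, a cheap test shard can be a single-member replica set run as a StatefulSet; below is a heavily simplified, hypothetical sketch (names, disk size and the mongo image tag are assumptions, and a real sharded cluster additionally needs config servers, mongos routers and an rs.initiate() step, all omitted here):

    # Headless service so the shard member gets a stable DNS name
    apiVersion: v1
    kind: Service
    metadata:
      name: shard0
    spec:
      clusterIP: None
      selector:
        app: shard0
      ports:
      - port: 27017
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: shard0
    spec:
      serviceName: shard0
      replicas: 1                      # single member to keep test costs down
      selector:
        matchLabels:
          app: shard0
      template:
        metadata:
          labels:
            app: shard0
        spec:
          containers:
          - name: mongod
            image: mongo:6.0           # placeholder tag
            args: ["--shardsvr", "--replSet", "shard0", "--port", "27017"]
            ports:
            - containerPort: 27017
            volumeMounts:
            - name: data
              mountPath: /data/db
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi            # small test disk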