What is the difference between having multiple namespaces and multiple clusters in Kubernetes?

I am a beginner and learning about Kubernetes.
As per my understanding, a namespace is a virtual cluster backed by the same physical cluster.
In which use cases do we go for a separate physical Kubernetes cluster?
What are the main resources that can be saved by opting for namespaces instead of separate physical Kubernetes clusters? (Can Kubernetes objects present in one namespace of the physical cluster, like the ones in kube-system, be shared by all other namespaces? And are the nodes in the physical Kubernetes cluster shared by all the namespaces, while it is not possible to share nodes between multiple physical Kubernetes clusters?)

A namespace isn't a "virtual cluster" in any meaningful way; it's just a way to group together resources. For instance, these Services are different because they're in different namespaces:
kubectl describe service --namespace n1 foo
kubectl describe service --namespace n2 foo
But a workload in n1 can make a call to foo.n2.svc.cluster.local without doing any special setup.
A namespace is a natural boundary for Kubernetes RBAC settings. If one object directly references another (e.g., a pod mounts a persistent volume claim or gets environment variables from a config map) they generally must be in the same namespace.
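For instance, a minimal sketch of a pod reading environment variables from a config map (the namespace n1 and the names app-config and app are made up for illustration); both objects have to live in the same namespace:
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: n1
data:
  LOG_LEVEL: debug
---
apiVersion: v1
kind: Pod
metadata:
  name: app
  namespace: n1
spec:
  containers:
    - name: app
      image: nginx
      envFrom:
        - configMapRef:
            name: app-config   # looked up in the pod's own namespace (n1)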
In an actual cluster, the nodes are shared. A given node can run any pod from any namespace (unless that's specifically configured at the pod level); kubectl describe node will show this. If a pod makes very heavy use of some resource (CPU, memory, disk I/O) this can impact other pods running on the same node. That completely ignores namespace boundaries.
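One common way to bound that impact (a sketch, not something namespaces give you for free; the pod name and numbers are arbitrary) is to set resource requests and limits on each container, so the scheduler reserves capacity and the kubelet enforces a ceiling:
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      resources:
        requests:
          cpu: "500m"      # what the scheduler reserves on the node
          memory: 256Mi
        limits:
          cpu: "1"         # hard ceiling enforced at runtime
          memory: 512Mi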
You want different clusters when you want things to actually be separated: when a service in one environment shouldn't be able to call a service in a different environment, when cluster-level resources like NodePort services need to be separated, if you have different policies around things like PersistentVolume allocation.
Sharing a cluster means that you need fewer copies of some cluster-global processes (the Kubernetes core, service meshes like Istio) and you can share nodes. That could result in better utilization of large nodes.
You might, for example, separate your test and production environments into separate clusters. These would have different external-DNS settings, separate ingress controllers, and separate node pools. You couldn't accidentally send requests into the test environment from outside, and a load test on the test environment wouldn't impact the production environment.

Generally, a separate physical cluster is necessary:
To meet compliance and security standards such as PCI DSS, HIPAA, etc.
To provide dedicated physical resources to critical workloads.
To separate different environments such as DEV, TEST, PROD
A multi-tenant cluster shared by many tenants, each using their own namespace, is useful for saving cost. Namespace separation is only logical: the resources of all namespaces still reside in the same etcd store, just under different keys. This is not an issue with a separate dedicated physical cluster, because in that case the cluster has its own etcd as well.
Access to resources across namespaces is controlled by RBAC via the Kubernetes API server. But you can access everything in all namespaces if you get access to etcd directly, bypassing the API server.
You need to put a lot of best practices and protections in place in a multi-tenant cluster so that tenants in different namespaces do not step on each other's toes. This is much less necessary in a separate dedicated physical cluster.
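As one example of such a protection, a per-tenant ResourceQuota (a rough sketch; the namespace name and the numbers are made up) keeps one tenant's namespace from consuming the whole cluster:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-a-quota
  namespace: tenant-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"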

Related

Can you make a kubernetes container deployment conditional on whether a configmap variable is set?

If I have a k8s deployment file for a service with multiple containers like api and worker1, can I make it so that there is a configmap with a variable worker1_enabled, such that if my service is restarted, container worker1 only runs if worker1_enabled=true in the configmap?
The short answer is No.
According to k8s docs, Pods in a Kubernetes cluster are used in two main ways:
Pods that run a single container. The "one-container-per-Pod" model is the most common Kubernetes use case; in this case, you can think of a Pod as a wrapper around a single container; Kubernetes manages Pods rather than managing the containers directly.
Pods that run multiple containers that need to work together. A Pod can encapsulate an application composed of multiple co-located containers that are tightly coupled and need to share resources. These co-located containers form a single cohesive unit of service—for example, one container serving data stored in a shared volume to the public, while a separate sidecar container refreshes or updates those files. The Pod wraps these containers, storage resources, and an ephemeral network identity together as a single unit.
Unless your application requires it, it is better to separate the worker and api containers into their own pods. So you may have one deployment for worker and one for api.
As for deploying worker only when worker1_enabled=true, that can be done with Helm. You create the chart such that the worker Deployment is only rendered when the value worker1_enabled is set to true (see the sketch below).
One last note: a Service in Kubernetes is an abstract way to expose an application running on a set of Pods as a network service.
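A rough sketch of what such a chart template could look like (the file name, the value name worker1_enabled, and the image are assumptions taken from the question, not an existing chart):
# templates/worker1-deployment.yaml
{{- if .Values.worker1_enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-worker1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: worker1
  template:
    metadata:
      labels:
        app: worker1
    spec:
      containers:
        - name: worker1
          image: example/worker1:latest   # hypothetical image
{{- end }}
With worker1_enabled: false in values.yaml (or --set worker1_enabled=false), the Deployment is simply not rendered, so the worker does not run.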

Why is there no concept of a node pool in Kubernetes?

I can see that GKE, AKS, and EKS all have a node pool concept built in, but Kubernetes itself doesn't provide that support. What could be the reason behind this?
We usually need different node types for different requirements, such as the ones below:
Some pods are CPU- or memory-intensive and need optimized nodes.
Some pods are processing ML/AI algorithms and need GPU-enabled nodes. These GPU-enabled nodes should be used only by certain pods as they are expensive.
Some pods/jobs want to leverage spot/preemptible nodes to reduce the cost.
Is there any specific reason Kubernetes doesn't have such support built in?
Node Pools are cloud-provider specific technologies/groupings.
Kubernetes is intended to be deployed on various infrastructures, including on-prem/bare metal. Node Pools would not mean anything in this case.
Node Pools generally are a way to provide Kubernetes with a group of identically configured nodes to use in the cluster.
You would specify the node you want using node selectors and/or taints/tolerations.
So you could taint nodes with a GPU and then require pods to have the matching toleration in order to schedule onto those nodes. Node Pools wouldn't make a difference here. You could join a physical server to the cluster and taint that node in exactly the same way -- Kubernetes would not see that any differently to a Google, Amazon or Azure-based node that was also registered to the cluster, other than some different annotations on the node.
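For instance (a sketch; the taint key gpu=true and the pod/image names are arbitrary), you could taint the GPU nodes and give only the GPU workloads the matching toleration:
kubectl taint nodes gpu-node-1 gpu=true:NoSchedule
apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: trainer
      image: example/trainer:latest   # hypothetical image
      resources:
        limits:
          nvidia.com/gpu: 1   # only if a GPU device plugin is installed on the cluster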
As Blender Fox mentioned, node groups/node pools are a cloud-provider-specific grouping and targeting option.
In AWS (EKS) there are managed and self-managed node groups (plus target groups), while GKE calls them node pools.
You configure the Cluster Autoscaler and it scales the node count in the node pool or node group up and down.
If you are running Kubernetes on-prem there may not be a node pool option, since a node group is mostly a group of VMs in the cloud, whereas on-prem the bare-metal machines themselves work as worker nodes.
For scaling up and down there is the Cluster Autoscaler (CA) in Kubernetes, which adds or removes nodes by creating/deleting VMs through the cloud provider's node group API; on bare metal it may not work as simply.
Each provider has its own implementation and logic, selected on the Kubernetes side by the --cloud-provider flag (code link).
So if you are on an on-prem private cloud, you would write your own cloud client and interface.
Having node groups is not necessary; it is more of a cloud-provider-side implementation.
For the scenario:
Some pods require CPU- or memory-intensive and optimized nodes.
Some pods are processing ML/AI algorithms and need GPU-enabled nodes. These GPU-enabled nodes should be used only by certain pods as they are expensive.
Some pods/jobs want to leverage spot/preemptible nodes to reduce the cost.
You can use taints and tolerations, node affinity, or node selectors as needed to schedule pods onto the specific type of nodes.
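As a minimal sketch of the node-selector approach (the label key/value node-type=spot and the pod name are assumptions; use whatever label you or your provider apply to spot/preemptible nodes):
apiVersion: v1
kind: Pod
metadata:
  name: spot-worker
spec:
  nodeSelector:
    node-type: spot        # assumed label on your spot/preemptible nodes
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
Node affinity works the same way but allows richer match expressions and preferred (soft) rules instead of a hard requirement.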

What's the maximum number of Kubernetes namespaces?

Is there a maximum number of namespaces supported by a Kubernetes cluster? My team is designing a system to run user workloads via K8s and we are considering using one namespace per user to offer logical segmentation in the cluster, but we don't want to hit a ceiling with the number of users who can use our service.
We are using Amazon's EKS managed Kubernetes service and Kubernetes v1.11.
This is quite difficult to answer because it depends on a lot of factors. The Kubernetes scalability thresholds document (kubernetes-thresholds), based on a k8s 1.7 cluster, lists the number of namespaces (ns) as 10000, with a few assumptions.
There are no limits from the code point of view, because a namespace is just a Go type that gets instantiated as a variable.
In addition to the link that @SureshVishnoi posted, the limits will depend on your setup, but some of the factors that can contribute to how your namespaces (and resources in a cluster) scale are:
Physical or VM hardware size where your masters are running
Unfortunately, EKS doesn't provide that yet (it's a managed service after all)
The number of nodes your cluster is handling.
The number of pods in each namespace
The number of overall K8s resources (deployments, secrets, service accounts, etc)
The hardware size of your etcd database.
Storage: how many resources can you persist.
Raw performance: how much memory and CPU you have.
The network connectivity between your master components and etcd store if they are on different nodes.
If they are on the same nodes then you are bound by the server's memory, CPU and storage.
There is no limit on the number of namespaces. You can create as many as you want. A namespace itself doesn't actually consume cluster resources like CPU or memory.

Benefits of running k8s pods in a non-default namespace

Pardon me for my limited knowledge of k8s. As per k8s best practices, we need to run pods in a non-default namespace. A few reasons for this approach are:
to create logical isolation and have UAT, SIT, and dev environments on the same k8s cluster;
the default namespace is OK when we have fewer than 10 microservices running in the same pods.
Do we have any other benefits from a security, performance, and maintenance point of view?
I would say the best practice is to think about how you will use your cluster and take namespaces into account: what you'll run in the cluster, how much resource you want to dedicate to it, and who can do what. Namespaces can help with controlling all of these things.
In terms of what you run, it's important that Kubernetes object names have to be unique within a namespace. So if you want to run two instances of the same app, then you either install them in different namespaces or distinguish the resource names - helm charts, for example, default to adding prefixes to ensure uniqueness.
Also, role-based access control permissions can be set as namespace-specific, and resource usage quotas can be applied to namespaces. So if you had a dev namespace on the same cluster as UAT, then you could ensure that permissions are more restricted on UAT and that it has more resource availability guaranteed for it.
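As an illustration (the namespace uat, the role names, and the group are made up), a namespace-scoped Role and RoleBinding that only grants read access inside uat could look like this:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-only
  namespace: uat
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "services", "deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: qa-read-only
  namespace: uat
subjects:
  - kind: Group
    name: qa-team          # hypothetical group
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: read-only
  apiGroup: rbac.authorization.k8s.io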
For more on these points see https://dzone.com/articles/kubernetes-namespaces-explained and https://kubernetes.io/blog/2016/08/kubernetes-namespaces-use-cases-insights/

Can Pods be thought of as Namespaces?

My understanding is that since a pod is defined as a group of containers that share resources such as storage and network, can it be thought of as a namespace on a worker node? That is to say, do different pods represent different namespaces on a worker node machine?
Or, alternatively, is a pod actually a process that is first started (or run, or executed) by the deployment and then starts the containers inside it?
Can I see it through the ps command? (I did try it; there are only Docker containers running, so I am ruling out the pod being a process.)
If we start from the basics
What is a namespace (in a generic manner)?
A namespace is a declarative region that provides a scope to the identifiers (the names of types, functions, variables, etc) inside it. Namespaces are used to organize code into logical groups and to prevent name collisions that can occur especially when your code base includes multiple libraries.
What is a Pod (in K8s)?
A pod is a group of one or more containers (such as Docker containers), with shared storage/network, and a specification for how to run the containers. A pod’s contents are always co-located and co-scheduled, and run in a shared context. A pod models an application-specific “logical host” - it contains one or more application containers which are relatively tightly coupled — in a pre-container world, being executed on the same physical or virtual machine would mean being executed on the same logical host.
While Kubernetes supports more container runtimes than just Docker, Docker is the most commonly known runtime, and it helps to describe pods in Docker terms.
The shared context of a pod is a set of Linux namespaces, cgroups, and potentially other facets of isolation - the same things that isolate a Docker container. Within a pod’s context, the individual applications may have further sub-isolations applied.
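As a small illustration of that shared context (a sketch; the names and images are arbitrary), two containers in one pod can share a volume and reach each other over localhost:
apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
    - name: shared-data
      emptyDir: {}
  containers:
    - name: web
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: refresher          # sidecar rewriting the content the web container serves
      image: busybox
      command: ["sh", "-c", "while true; do date > /data/index.html; sleep 5; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /data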
Some deep dive into Pods
What is a Namespace (in k8s terms)?
Namespaces are intended for use in environments with many users spread across multiple teams, or projects.
Namespaces provide a scope for names. Names of resources need to be unique within a namespace, but not across namespaces.
Namespaces are a way to divide cluster resources between multiple users.
So I think it suffices to say:
Yes, pods can be thought of as having a namespace-like role:
Pods kind of represent a namespace, but at the container level (where a set of containers shares the same network and volume/storage context).
But namespaces (in K8s terms) are a bigger level of isolation -- at the cluster level, shared by all the resources in them (services, deployments, DNS names, IPs, config maps, secrets, roles, etc.).
Also you should see this link
Hope this clears a bit of fog on the issue.
Yes, you could say that a pod is a namespace that is shared by containers. When using the Docker executor, a pause container is created which establishes the network, file-system, and process namespace for subsequent containers to utilise.
This is because Docker doesn't understand pods as a first-class primitive, and you won't see the pause container with other runtimes.