Copy-on-write style memory reuse for Kubernetes pods? To make pods spawn faster and more memory efficient

Can Kubernetes pods share significant amount of memory?
Does copy-on-write style forking exist for pods?
The purpose is to make pods spawn faster and use less memory.
Our scenario is that we have a dedicated game server to host in Kubernetes. The problem is that one instance of the dedicated game server takes up a few GB of memory upfront (e.g. 3 GB).
Also, we have a few such Docker images of game servers, one for each of game A, game B, ... Let's call a pod running game A's image "pod A".
Let's say we now have 3 x pod A and 5 x pod B. Now players are rushing into game B, so I urgently need, say, another 4 x pod B.
I can surely spawn 4 more pod B; Kubernetes supports this perfectly. However, there are 2 problems:
The booting of my game server is very slow (30 s - 1 min). Players don't want to wait.
More importantly for us, the cost of running this many pods that each take up so much memory is very high, because as far as I know pods do not share memory. Whereas on a plain old EC2 machine or on bare metal, processes can share memory, because they can fork and then copy-on-write.
Copy-on-write style forking and memory sharing seems to solve both problems.

One of Kubernetes' assumptions is that pods may be scheduled on different nodes, which conflicts with the idea of sharing common resources (storage is an exception, with many options and plenty of documentation available). The situation is different when it comes to sharing resources between containers in one pod, but that doesn't apply to your issue.
However, it seems there is some possibility to share memory, although it is not well documented and, I guess, very uncommon in Kubernetes. Check my answers with more details below:
Can Kubernetes pods share significant amount of memory?
What I found is that pods can share a common IPC namespace with the host (node).
You can check Pod Security Policies, especially the hostIPC field:
HostIPC - Controls whether the pod containers can share the host IPC namespace.
Some usage examples and possible security issues can be found here:
Shared /dev/shm directory
Use existing IPC facilities
Keep in mind that this solution is not common in Kubernetes, and pods with elevated privileges are easily granted broader permissions than needed:
The way PSPs are applied to Pods has proven confusing to nearly everyone that has attempted to use them. It is easy to accidentally grant broader permissions than intended, and difficult to inspect which PSP(s) apply in a given situation.
That's why the Kubernetes team marked Pod Security Policies as deprecated as of Kubernetes v1.21 - you can find more information in this article.
Also, if you are using multiple nodes in your cluster, you should use a nodeSelector to make sure that the pods are assigned to the same node, which means they will be able to share that one (host's) IPC namespace.
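Putting the two pieces together, a sketch of such a pod spec might look like the following; the pod name, node label and image are made up for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: game-b-1                # hypothetical name
spec:
  hostIPC: true                 # share the host's IPC namespace (/dev/shm, SysV IPC)
  nodeSelector:
    game-pool: game-b           # hypothetical node label; pins the pods to the same node(s)
  containers:
  - name: game-server
    image: example.com/game-b:latest   # hypothetical image
```

All pods with the same nodeSelector and hostIPC: true would then see the same host /dev/shm, which is the closest Kubernetes gets to cross-pod shared memory.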
Does copy-on-write style forking exist for pods?
I did some research and didn't find any information about such a capability, so I think it is not possible.
I think the main issue is that your game architecture is not "very suitable" for Kubernetes. Check these articles and websites about dedicated game servers in Kubernetes - maybe you will find them useful:
Agones
Scaling Dedicated Game Servers with Kubernetes: Part 3 – Scaling Up Nodes
Google Cloud - Dedicated Game Server Solution
Google Cloud - Game Servers

A different way to resolve the issue would be to bake some of the initialisation into the image.
As part of the docker image build, start up the game server and do as much of the 30s - 1min initialisation as possible, then dump that part of the memory into a file in the image. On game server boot-up, use mmap (with MAP_PRIVATE and possibly even MAP_FIXED) to map the pre-calculated file into memory.
That would solve the problem with the game server boot-up time, and probably also with the memory use; everything in the stack should be doing copy-on-write all the way from the image through to the pod (although you'd have to confirm whether it actually does).
It would also have the benefit that it's plain k8s with no special tricks: no requirement for special permissions or node selection, nothing to break or require reimplementation on upgrades or otherwise get in the way. You would be able to run it on any k8s cluster, whether your own or any of the cloud offerings, as well as in your CI/CD pipeline and on a dev machine.

Related

Does kubernetes support non distributed applications?

Our store applications are not distributed applications. We deploy one on each node and then configure it to store node-specific details, so it is tightly coupled to the node. Can I use Kubernetes for this use case? Would I get benefits from it?
Based on this information alone, it is hard to tell. But Kubernetes is designed so that it should be easy to migrate existing applications; e.g. you can use a PersistentVolumeClaim for the directories where your application stores information.
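A minimal sketch of such a claim, with a made-up name and size:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: store-data        # hypothetical name
spec:
  accessModes:
  - ReadWriteOnce         # mountable read-write by a single node at a time
  resources:
    requests:
      storage: 10Gi       # size is an assumption
```

The application's pod would then mount this claim where it currently writes its local directory.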
That said, it will probably be challenging. A cluster administrator wants to treat the nodes in the cluster as "cattle" and throw them away when it's time to upgrade. If your app only has one instance, it will have some downtime, and your PersistentVolume should be backed by a storage system reachable over the network - otherwise the data will be lost when the node is thrown away.
If you want to run more than one instance for fault tolerance, the app needs to be stateless - but it is likely not stateless if it stores local data on disk.
There are several ways to have applications run on fixed nodes of the cluster. It really depends on how those applications behave and why they need to run on a fixed node.
Usually such applications are stateful and may need to interact with a specific node's resources, or write directly to a volume mounted on specific nodes for performance reasons, and so on.
This can be obtained with a simple nodeSelector or with node affinity ( https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/ )
Or with local persistent volumes ( https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/ )
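As an illustration of the node-affinity option, a sketch pinning a pod to one specific node; the pod name, node name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: store-app               # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname   # well-known label set on every node
            operator: In
            values:
            - node-1            # hypothetical node name
  containers:
  - name: store-app
    image: example.com/store-app:latest   # hypothetical image
```

The simpler nodeSelector form expresses the same constraint with less flexibility.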
With this said, if all the applications that need to be executed on the Kubernetes cluster are apps that need to run on a single node, you lose a lot of the benefits, as Kubernetes works really well with stateless applications, which can be moved between nodes to obtain high availability and strong resilience to node failure.
The thing is that Kubernetes is complex and brings you a lot of tools to work with, but if you end up using only a small number of them, I think it's overkill.
I would weigh the benefits you could get from adopting Kubernetes (an easy way to check the whole cluster's health; easy monitoring of logs, metrics and resource usage; strong resilience to node failure for stateless applications; load balancing of requests; and a lot more) against the cons and complexity it may bring (migrating to it can require a good amount of effort, especially if you weren't already using containers to host your applications, and so on).

Scaling down video conference software in Kubernetes

I'm planning to deploy a WebRTC custom videoconference software (based on NodeJS, using websockets) with Kubernetes, but I have some doubts about scaling down this environment.
Actually, I'm planning to use cloud-hosted Kubernetes (GKE, EKS, AKS or any) to be able to auto-scale nodes in the cluster as demand increases and decreases. But scaling up is not the problem; my concern is scaling down.
The cluster will scale down based on some average CPU usage metric across the cluster, as I understand it, and when it tries to remove a node, it will start to drain connections and stop receiving new ones, right? But now imagine that there's a videoconference still running on this "pending deletion" node. There are two problems:
1 - Stopping the node before the videoconference finishes (it will drop the meeting).
2 - With the draining behaviour when scale-down starts, the node will stop receiving new connections, so if someone tries to join this running videoconference, they will get a timeout, right?
So, which is the best strategy to scale down nodes for a video conference solution? Any ideas?
Thanks
I would say this is not a matter of resolving it at the Kubernetes level with some specific scaling strategy, but rather of the application's ability to handle such situations. It isn't even specific to Kubernetes. Imagine that you deploy it directly on compute instances which are also subject to autoscaling: you'll end up in exactly the same situation when the load decreases and one of the instances is removed from the set.
You should rather ask yourself whether such an application is suitable to be deployed as a Kubernetes workload. I can imagine that such a videoconference session doesn't have to rely on a backend deployed on a single node only. You can even define affinity or anti-affinity rules to prevent your Pods from being scheduled on the same node. So if the whole application cluster is still up and running (its Pods are running on different nodes), eviction of a limited subset of Pods should not have a big impact.
You can actually face the same issue with any other application, as the vast majority of them are based on some session which needs to be established between the client software and the server part. I would say it's the application's responsibility to handle such scenarios. If a user unexpectedly loses the connection, it should be possible to immediately redirect them to a running instance, e.g. a different Pod which is still able to accept new requests.
So basically, if the application is designed to be highly available, scaling in (when we talk about horizontal scaling we actually talk about scaling in and scaling out) the underlying VMs, or more specifically Kubernetes nodes, shouldn't affect its high availability capabilities. On the other hand, if it is not designed to be highly available, a solution such as Kubernetes probably won't help much.
There is no single best strategy for your use case. When a cloud provider scales down, it is going to pick a node more or less at random and kill it. It's not going to check which node has the lowest resource consumption and kill that one; it might end up killing the node with the most pods running on it.
I would focus on how you schedule your pods. I would try to schedule them, if possible, on a node that already has running pods (pod inter-affinity), and would set up a PodDisruptionBudget for all Deployments/StatefulSets/etc. (depending on how you run the pods). As a result, the cluster would only scale down a node with none of these pods on it, because the pods on the other nodes are protected by a PDB.
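A sketch of such a PodDisruptionBudget; the name and label are hypothetical:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: conference-pdb      # hypothetical name
spec:
  maxUnavailable: 0         # blocks voluntary evictions (e.g. node drain) of matching pods
  selector:
    matchLabels:
      app: conference       # hypothetical label on the conference pods
```

With maxUnavailable: 0, the cluster autoscaler cannot drain a node that still hosts a matching pod, so only empty nodes get removed.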

In a Kubernetes cluster. Does the Master Node need always to run alone in a cluster node?

I am aware that it is possible to allow the master node to execute pods, and that is my concern, since the default configuration does not allow the master to run pods. Should I change it? What is the reason for the default configuration being as it is?
If the change is appropriate in some situations, I would like to ask whether my cluster is one of them. It has only three nodes with exactly the same hardware, and more nodes are probably not going to be added in the foreseeable future. In my opinion, as I have three equal nodes, it would be a waste of resources to use 1/3 of my cluster's computational power just to run the Kubernetes master. Am I right?
[Edit1]
I have found the following reason in the Kubernetes documentation.
Is security the only reason?
Technically, it doesn't need to run on a dedicated node. But for your Kubernetes cluster to run, you need your masters to work properly. And one of the ways to ensure they stay secure, stable and performant is to use a separate node which runs only the master components and no regular pods. If you share the node with other pods, there are several ways this can impact the master. For example:
The other pods will impact the performance of the masters (network or disk latencies, CPU cache, etc.)
They might be a security risk (if someone manages to break out of another pod into the master node)
A badly written application can cause stability issues on the node
While it can be seen as wasting resources, you can also see it as the price to pay for the stability of your master / Kubernetes cluster. However, it doesn't have to be a waste of 1/3 of your resources. Depending on how you deploy your Kubernetes cluster, you can use different hosts for different nodes, for example a small host for the master and bigger hosts for the workers.
No, this is not required, but it is strongly recommended. Security is one aspect, but performance is another. etcd usually runs on those control plane nodes and it tends to chug if it runs out of IOPS. So a rogue pod running application code could destabilize the control plane, which then reduces your ability to fix the problem.
When running small clusters for testing purposes, it is common to run everything (control plane and workloads) on a single node specifically to save money/complexity.
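If you do decide to let workloads run on the control-plane nodes, the default is enforced by a taint, and a pod opts in with a matching toleration. A sketch, with hypothetical pod name and image (older clusters use the taint key node-role.kubernetes.io/master instead):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: small-workload        # hypothetical name
spec:
  tolerations:
  - key: node-role.kubernetes.io/control-plane   # taint applied to masters by default
    operator: Exists
    effect: NoSchedule
  containers:
  - name: app
    image: example.com/app:latest   # hypothetical image
```

Without this toleration, the scheduler simply never places the pod on a tainted master, which is exactly the default behaviour the question asks about.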

Will the master know the data on workers/nodes in k8s

I am trying to deploy a Kubernetes setup on the cloud, and there are two options: the masters are entrusted to the cloud provider, or maintained by myself.
So I wonder: if the masters are entrusted to the provider, could they leak the data on the workers?
In short, will the master know the data on the workers/nodes?
The abstractions in Kubernetes are very well defined with clear boundaries. You have to understand the concept of Volumes first. As defined here,
A Kubernetes volume is essentially a directory accessible to all
containers running in a pod. In contrast to the container-local
filesystem, the data in volumes is preserved across container
restarts.
Volumes are attached to the containers in a pod, and there are several types of volumes.
You can see the layers of abstraction in the linked diagram (source).
Master to Cluster communication
There are two primary communication paths from the master (apiserver) to the cluster. The first is from the apiserver to the kubelet process which runs on each node in the cluster. The second is from the apiserver to any node, pod, or service through the apiserver’s proxy functionality.
Also, you should check the cloud controller manager (CCM) concept (not to be confused with the binary), which was originally created to allow cloud-specific vendor code and the Kubernetes core to evolve independently of one another. The cloud controller manager runs alongside other master components such as the Kubernetes controller manager, the API server, and the scheduler. It can also be started as a Kubernetes addon, in which case it runs on top of Kubernetes.
Hope this answers all your questions related to Master accessing the data on Workers.
If you are still looking for more secure ways, check 11 Ways (Not) to Get Hacked
Short answer: yes the control plane can access all of your data.
Longer and more realistic answer: probably don't worry about it. It is far more likely that any successful attack against the control plane would be just as successful as if you were running it yourself. The exact internal details of GKE/AKS/EKS are a bit fuzzy, but all three providers have a lot of experience running multi-tenant systems and it wouldn't be negligent to trust that they have enough protections in place against lateral escalations between tenants on the control plane.

Clusters and nodes formation in Kubernetes

I am trying to deploy my Docker images using Kubernetes orchestration tools. While reading about Kubernetes, I have seen documentation and many YouTube video tutorials on working with Kubernetes. There I only found the creation of pods and services, and the creation of those .yml files. I have some doubts, which I am adding below:
When I am using Kubernetes, how do I create clusters and nodes?
Can I deploy my current docker-compose-built image directly using pods only? Why do I need to create a Service .yml file?
I am new to the containerizing, Docker and Kubernetes world.
My favorite way to create clusters is kubespray, because I find Ansible very easy to read and troubleshoot, unlike more monolithic "run this binary" mechanisms for creating clusters. The kubespray repo has a Vagrant configuration file, so you can even try out a full cluster on your local machine, to see what it will do "for real".
But with the popularity of Kubernetes, I'd bet that if you ask 5 people you'll get 10 answers to that question, so ultimately pick the one you find easiest to reason about, because almost without fail you will need to debug those mechanisms when something inevitably goes wrong.
The short version, as Hitesh said, is "yes", but the long version is that one needs to be careful, because local Docker containers and Kubernetes clusters are trying to solve different problems, and (as a general rule) one cannot easily be swapped in place of the other.
As for the second part of your question, a Service in kubernetes is designed to decouple the current provider of some networked functionality from the long-lived "promise" that such functionality will exist and work. That's because in kubernetes, the Pods (and Nodes, for that matter) are disposable and subject to termination at almost any time. It would be severely problematic if the consumer of a networked service needed to constantly update its IP address/ports/etc to account for the coming-and-going of Pods. This is actually the exact same problem that AWS's Elastic Load Balancers are trying to solve, and kubernetes will cheerfully provision an ELB to represent a Service if you indicate that is what you would like (and similar behavior for other cloud providers)
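That "long-lived promise" is what the Service manifest expresses; a minimal sketch, with hypothetical name, labels and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: game-frontend         # hypothetical name; clients use this stable name/IP
spec:
  type: LoadBalancer          # on AWS, Kubernetes provisions an ELB for this
  selector:
    app: game-frontend        # hypothetical label; pods matching it can come and go
  ports:
  - port: 80                  # port the Service exposes
    targetPort: 8080          # port the pod's container actually listens on
```

Clients only ever talk to the Service's stable address; Kubernetes keeps the set of backing pods up to date behind it.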
If you are not yet comfortable with containers and docker as concepts, then I would strongly recommend starting with those topics, and moving on to understanding how kubernetes interacts with those two things after you have a solid foundation. Else, a lot of the terminology -- and even the problems kubernetes is trying to solve -- may continue to seem opaque