Highly available stateful session implementation with K8S

How to implement memory state/session replication with K8S? For instance, a web shopping cart system replicates user HTTP sessions among cluster nodes over the network, so that if a node goes down, a process on another node can take over the user's sessions.
K8S has StatefulSet, which I think uses disk storage to ensure state persistence. If a pod goes down, the restarted pod takes over the state from the disk. However, the overhead of persisting in-memory user sessions to disk is high and may not be fast enough.
I suppose the solution could be using an in-memory cache server or an etcd-like system. Is that the established practice? In my understanding, K8S is good for stateless processing at scale, and StatefulSet was introduced to address stateful situations, but I am not sure it is a good fit for situations where fast stateful handover is required.
Please advise.

How to implement memory state/session replication with K8S? For instance, a web shopping cart system replicates user HTTP sessions among cluster nodes over the network, so that if a node goes down, a process on another node can take over the user's sessions.
To store the state, it's best to use Redis or another in-memory database.
K8S has StatefulSet, which I think uses disk storage to ensure state persistence. If a pod goes down, the restarted pod takes over the state from the disk. However, the overhead of persisting in-memory user sessions to disk is high and may not be fast enough.
You are right, but perhaps you have not tried it before. I have been using Redis in production on K8s with millions of users and never faced issues. Redis has two options for backing up keys when you deploy it on K8s:
RDB snapshots and the append-only file (AOF). So far I have never seen Redis itself crash; the only crashes were due to running out of memory, so make sure your key eviction policy is set properly (e.g. LRU).
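As an illustration of that advice, here is a minimal sketch of an eviction and persistence setup, assuming Redis reads its configuration from a ConfigMap mounted as redis.conf (the name and the exact values below are placeholders, not the defaults of any particular chart):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: redis-session-config        # illustrative name
    data:
      redis.conf: |
        # Cap memory and evict least-recently-used keys instead of failing with OOM
        maxmemory 2gb
        maxmemory-policy allkeys-lru
        # AOF persistence: log every write, fsync once per second
        appendonly yes
        appendfsync everysec
        # RDB persistence: snapshot to disk if at least 1000 keys changed within 60s
        save 60 1000

With allkeys-lru, Redis evicts the least recently used sessions once maxmemory is reached rather than dying with an out-of-memory error.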
In my understanding, K8S is good for stateless processing at scale
You are right, but people have been using Deployments and StatefulSets to run Redis and Elasticsearch clusters in K8s as well, with full backup and scaling options.
It's easy to configure and manage a database with K8s, whereas VMs don't offer much scalability.
We have been running StatefulSets with Redis, Elasticsearch, and RabbitMQ in production for a long time and have not seen many issues. Make sure you attach a high-IOPS SSD disk to the pod and you are good to go (see the sketch below).
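As a rough sketch of what "attach a high-IOPS SSD disk to the pod" can look like, assuming a GKE cluster with the Compute Engine CSI driver (the StorageClass parameters differ per cloud provider, and all names and sizes here are placeholders):

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: fast-ssd
    provisioner: pd.csi.storage.gke.io    # GKE-specific; use your provider's SSD class elsewhere
    parameters:
      type: pd-ssd
    volumeBindingMode: WaitForFirstConsumer
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: redis
    spec:
      serviceName: redis
      replicas: 3
      selector:
        matchLabels:
          app: redis
      template:
        metadata:
          labels:
            app: redis
        spec:
          containers:
            - name: redis
              image: redis:7
              volumeMounts:
                - name: data
                  mountPath: /data              # where Redis writes its RDB/AOF files
      volumeClaimTemplates:                     # one SSD-backed volume per pod
        - metadata:
            name: data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: fast-ssd
            resources:
              requests:
                storage: 10Gi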
A nice example: https://github.com/loopbackio/loopback4-example-shopping/blob/master/kubernetes/README.md

Related

Fault Tolerance and Kubernetes StatefulSet

As I understand it, most databases enable the use of replicas that can take over from a leader in case the leader is unavailable.
I'm wondering about the necessity of having these replicas in a Kubernetes environment when using, say, a StatefulSet. Once the pod becomes unresponsive, Kubernetes will restart it, right? And the PVC will make sure the data isn't lost.
Is it that leader election is a faster process than bringing up a new application?
Or is it that the only advantage of the replicas is to provide load balancing for read queries?
As I understand it, most databases enable the use of replicas that can take over from a leader in case the leader is unavailable.
I'm wondering about the necessity of having these replicas in a Kubernetes environment when using, say, a StatefulSet.
There has been a move from single-node databases to distributed databases. Distributed databases typically run with 3 or 5 replicas/instances in a cluster. The primary purpose of this is high availability and fault tolerance against, for example, node or disk failure. This is the same when the database runs on Kubernetes.
the PVC will make sure the data isn't lost.
The purpose of PVCs is to decouple the application configuration from the choice of storage system. This allows you, for example, to deploy the same application on Google Cloud, AWS, and Minikube without any configuration differences, even though you will use different storage systems underneath. It does not change how the storage systems themselves work.
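For example, the claim below only names a StorageClass; which storage system actually backs it (GCE persistent disks, EBS, or Minikube's hostPath provisioner) is decided per cluster, so the application manifest stays the same everywhere (the claim name, class name, and size are illustrative):

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-data
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: standard     # maps to different provisioners on different clusters
      resources:
        requests:
          storage: 20Gi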
Is it that leader election is a faster process than bringing up a new application?
Many different things can fail: the node, the storage system, or the network can be partitioned so that you cannot reach a certain node.
Leader election is just one piece of the mitigation against these problems in a clustered setup; you also need to replicate all data in a consistent way. The Raft consensus algorithm is a common solution for this in modern distributed databases.
Or is it that the only advantage of the replicas is to provide load balancing for read queries?
This might be an advantage in distributed databases, yes. But in my experience it is seldom the primary reason for using them.

Does Kubernetes support non-distributed applications?

Our store applications are not distributed applications. We deploy one on each node and then configure it to store node-specific details, so it is tightly coupled to the node. Can I use Kubernetes for this use case? Would I get benefits from it?
Our store applications are not distributed applications. We deploy one on each node and then configure it to store node-specific details, so it is tightly coupled to the node. Can I use Kubernetes for this use case?
Based on this information alone, it is hard to tell. But Kubernetes is designed so that it should be easy to migrate existing applications; for example, you can use a PersistentVolumeClaim for the directories where your application stores information.
That said, it will probably be challenging. A cluster administrator wants to treat the nodes in the cluster as "cattle" and throw them away when it's time to upgrade. If your app has only one instance, it will have some downtime, and your PersistentVolume should be backed by a storage system accessed over the network; otherwise the data will be lost when the node is thrown away.
If you want to run more than one instance for fault tolerance, the app needs to be stateless, but it is likely not stateless if it stores local data on disk.
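A minimal sketch of the single-instance case, assuming a network-backed PersistentVolumeClaim named store-app-data already exists (the image and all names are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: store-app
    spec:
      replicas: 1
      strategy:
        type: Recreate              # avoid two pods mounting the ReadWriteOnce volume at once
      selector:
        matchLabels:
          app: store-app
      template:
        metadata:
          labels:
            app: store-app
        spec:
          containers:
            - name: store-app
              image: example.com/store-app:1.0     # placeholder image
              volumeMounts:
                - name: data
                  mountPath: /var/lib/store
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: store-app-data          # assumed PVC backed by network storage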
There are several ways to have applications running on fixed nodes of the cluster. It really depends on how those applications behave and why they need to run on a fixed node of the cluster.
Usually such applications are stateful and may need to interact with a specific node's resources, or to write directly to a volume mounted on specific nodes for performance reasons, and so on.
This can be achieved with a simple nodeSelector or with node affinity (https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), or with local persistent volumes (https://kubernetes.io/blog/2019/04/04/kubernetes-1.14-local-persistent-volumes-ga/).
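For instance, a pod can be pinned to a specific node with a nodeSelector on the well-known kubernetes.io/hostname label (the node name, image, and paths below are assumptions for illustration):

    apiVersion: v1
    kind: Pod
    metadata:
      name: store-app
    spec:
      nodeSelector:
        kubernetes.io/hostname: node-01      # the fixed node this app is coupled to
      containers:
        - name: store-app
          image: example.com/store-app:1.0   # placeholder image
          volumeMounts:
            - name: local-data
              mountPath: /var/lib/store
      volumes:
        - name: local-data
          hostPath:
            path: /mnt/store-data            # assumed directory on that node
            type: DirectoryOrCreate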
With that said, if all the applications that need to run on the Kubernetes cluster are apps that must run on a single node, you lose a lot of the benefits, as Kubernetes works really well with stateless applications, which can be moved between nodes to obtain high availability and strong resilience to node failure.
The thing is that Kubernetes is complex and brings you a lot of tools to work with, but if you end up using only a small number of them, I think it's overkill.
I would weigh the benefits you could get from adopting Kubernetes (an easy way to check the whole cluster's health; easy monitoring of logs, metrics, and resource usage; strong resilience to node failure for stateless applications; load balancing of requests; and a lot more) against the cons and the complexity it may bring (in particular, migrating to it can require a good amount of effort if you weren't already using containers to host your applications, and so on).

How can I make a Kubernetes multi-cluster Redis solution?

This week I deployed a Redis instance into a GKE (Google Kubernetes Engine) cluster using Bitnami's Helm chart. Although I've been successful with this part, the challenge now is to build a failover/disaster recovery strategy that replicates the data to another Redis instance in another GKE cluster (but in the same GCP project). How can I do this? I tried Persistent Volume Claims, but they are only visible inside the cluster.
Redis Enterprise does have a WAN (multi-geo) replication mode; however, I've never used it, and it looks like it dramatically limits which features of Redis you can use to those compatible with a CRDT model. Basically, first you need to answer how you would do this by hand, and then investigate automating it. WAN failover is a very complex thing, and generally you wouldn't even want to fail over at the Redis level; instead you would fail over your entire DC (or whatever you want to call your failure zones). Distributed database modelling and management is very tricky; here be dragons.

Question about 100 pods per node limitation

I'm trying to build a web app where each user gets their own instance of the app, running in its own container. I'm new to Kubernetes, so I'm probably not understanding something correctly.
I will have a few physical servers to use, which in Kubernetes, as I understand, are called nodes. For each node there is a limit of 100 pods. So if I build the app so that each user gets their own pod, will I be limited to 100 users per physical server? (If I have 10 servers, can I only have 1000 users?) I suppose I could run multiple VMs that act as nodes on each physical server, but doesn't that defeat the purpose of containerization?
The main issue with having too many pods on a node is that it degrades the node's performance and makes managing the containers slower (and sometimes unreliable). Each pod is managed individually, so increasing their number takes more time and more resources.
When you create a pod, the runtime needs to keep constant track of it: running probes (readiness and liveness), monitoring, routing rules, and many other small bits that add up to the load on the node.
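For example, a single pod with HTTP readiness and liveness probes already gives the kubelet recurring work to do for as long as the pod exists (the paths, port, and image below are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      containers:
        - name: web
          image: example.com/web:1.0      # placeholder image
          readinessProbe:                 # polled to decide whether the pod receives traffic
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 5
          livenessProbe:                  # polled to decide whether the container must be restarted
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10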
Containers also require processor time to run properly. Even though you can allocate fractions of a CPU, adding too many containers/pods will increase context switching and degrade performance when the pods are consuming their quota.
Each platform provider also sets their own limits to provide a good quality of service and SLAs. Overloading the nodes is also a risk, because a node is a single point of failure, and any fault in a high-density node can have a huge impact on the cluster and its applications.
You should consider either:
using smaller nodes and adding more nodes to the cluster, or
using actors instead, where each client is one actor and many actors run in a single container. To balance the load across the cluster, you partition the actors into multiple container instances.
Regarding the limits, this thread has a good discussion of the concerns.
Because of the hard limit, if you have 10 servers you're limited to 1000 pods.
You might also want to count system pods against your 1000 available pods. Usually located in the kube-system namespace, they can include (but are not limited to):
node log exporters (1 per node)
metrics exporters
kube-proxy (usually 1 per node)
kubernetes dashboard
DNS (scaling according to the number of nodes)
controllers like cert-manager
A pretty good rule of thumb could be 80-90 application pods per node, so 10 nodes will be able to handle 800-900 clients, assuming you don't have any other big deployments on those nodes.
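For reference, the per-node pod limit is a kubelet setting (maxPods, 110 by default on vanilla Kubernetes). Where the platform allows it, it can be raised in the kubelet configuration, roughly like the sketch below; managed offerings typically expose this limit through their own node or node-pool settings instead of a raw kubelet file:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    maxPods: 150        # default is 110; raising it increases per-pod overhead on the node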
If you're using containers in order to gain performance, creating VMs to act as nodes works against your goal. But if you're using containers as a way to deploy consistent environments and scale stateless applications, then using VMs as nodes can make sense.
There are no magic rules and your context will dictate what to do.
Since managing both a virtualization cluster and a Kubernetes cluster may skyrocket your infrastructure complexity, Kubernetes may not be the most efficient tool for managing your workload.
You may also want to take a look at Nomad, which does not seem to have those kinds of limitations and may provide features closer to your needs.

Schedule legacy applications as single instance on Kubernetes

A lot of legacy applications are deployed as containers. Most of them only need a few changes to work in a container, but many of them are not built to scale, for example because they maintain session data or write to a volume (concurrency issues).
I was wondering whether those applications are intended to run on Kubernetes, and if so, what a good way to do so would be. Pods are not durable, so the desired way to start an application is to use a replication controller and set replicas to 1. The RC ensures that the right number of pods is running. The documentation also specifies that it kills pods if there are too many. I was wondering if that's ever the case (if a pod is not started manually).
I guess a database like Postgres (with an external data volume) is a good example. I have seen tutorials deploying those using a replication controller.
Creating a ReplicationController with 1 replica is indeed a good approach; it's more reliable than starting a single pod, since you benefit from the self-healing mechanism: if the node your app is running on dies, your pod will be terminated and restarted somewhere else.
Data persistence in the context of a cluster management system like Kubernetes means that your data should be available outside the cluster itself (separate storage). I personally use EC2 EBS volumes since our app runs on AWS, but Kubernetes supports a lot of other volume types. If your pod runs on node A, the volumes it uses will be mounted locally on that node and inside your pod's containers. Now, if your pod is destroyed and restarted on node B, the volume will be unmounted from node A and mounted on node B before your pod's containers are recreated. Pretty neat.
Take a look at persistent volumes; this should be particularly interesting for you.
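A minimal sketch of that setup, using the Postgres example mentioned above with a pre-existing claim (the claim name and image tag are illustrative; today a Deployment or StatefulSet would usually replace the ReplicationController, but the shape is the same):

    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: postgres
    spec:
      replicas: 1                          # exactly one pod, rescheduled if its node dies
      selector:
        app: postgres
      template:
        metadata:
          labels:
            app: postgres
        spec:
          containers:
            - name: postgres
              image: postgres:9.5          # version is illustrative
              volumeMounts:
                - name: pgdata
                  mountPath: /var/lib/postgresql/data
          volumes:
            - name: pgdata
              persistentVolumeClaim:
                claimName: postgres-data   # assumed PVC, e.g. backed by an EBS volume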