Properly manage user sessions in Keycloak and Kubernetes

I have Keycloak deployed to Kubernetes.
When the pod restarts for any reason (such as a modification to the deployment), all user sessions are lost.
The documentation says that sessions can only be stored in memory. They are replicated, but I found no documentation on how to ensure that all sessions are replicated before the old pod goes down.
Strangely, my searches don't turn up anyone having this issue. Am I missing something?
My ideal solution would be to store the session data in a Redis cluster.

Related

Running a stateless app as a statefulset (Kubernetes)

In the Kubernetes world, a typical/classic pattern is using Deployment for Stateless Applications and using StatefulSet for a stateful application.
I am using a vendor product (Ping Access) which is meant to be a stateless application (it plays the role of a Proxy in front of other Ping products such as Ping Federate).
The GitHub repo for Ping Cloud (where they run these components as containers) shows them running Ping Access (a stateless application) as a StatefulSet.
I am reaching out to their support team to understand why anyone would run a stateless application as a StatefulSet.
Are there other examples of such usage (as this appears strange/bizarre IMHO)?
I also observed a scenario where a customer is running a stateful app (Ping Federate) as a regular Deployment instead of hosting it as a StatefulSet.
The Ping Cloud repository does build and deploy Ping Federate as a StatefulSet.
Honestly, both these usages, running a stateless app as a StatefulSet (Ping Access) and running a stateful app as a deployment (Ping Federate) sound like classic anti-patterns.

Apart from the ability to attach dedicated Volumes, StatefulSets give you the following features, some of which might be useful for stateless applications:
Ordered startup and shutdown of Pods with K8s doing them one by one in an ordered fashion.
Possibility to guarantee that not more than a single Pod is running at a time even during unscheduled Pod restarts.
Stable DNS names for Pods.
I can only speculate why Ping Federate uses a StatefulSet. Possibly it has to do with access limitations of the downstream services it connects to.
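To make the "stable DNS names" point above concrete, here is a minimal Python sketch of how the per-Pod names of a StatefulSet are formed when it is governed by a headless Service. The StatefulSet, Service, and namespace names are hypothetical, and `cluster.local` is assumed as the cluster domain (the usual default, but configurable).

```python
def statefulset_pod_dns_names(statefulset, service, namespace, replicas,
                              cluster_domain="cluster.local"):
    """Return the stable DNS name of each Pod in a StatefulSet.

    Pods are named <statefulset>-<ordinal>, and each one is resolvable as
    <pod>.<headless-service>.<namespace>.svc.<cluster-domain>.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Hypothetical names, for illustration only.
names = statefulset_pod_dns_names("pingaccess-engine", "pingaccess", "ping", 3)
for name in names:
    print(name)
```

Because the ordinal is part of the name, a Pod that is recreated keeps the same DNS name, which is not the case for Pods created by a Deployment.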

The consumption of PingAccess is stateless, but the operation is very much stateful. Namely, the PingAccess admin console maintains a database for configuration, and part of that configuration includes clustered engine mapping and session keys.
Thus, if you were to take away the persistent volume, restarting the admin console would decouple all the engines in the cluster. The engines would then no longer receive configuration, and web session keys would be mismatched.
The ping-cloud-base repo uses StatefulSet for engines not for persistent volumes, but for the StatefulSet naming scheme. I personally disagree with this and recommend using Deployment for engines. The only downside is that you then have to remove orphaned engines from the admin configuration. Orphaned engines are engine configs that stay in the admin console database after the engine deployment is rolled or updated. These can be removed from the admin UI or the API, which is pretty easy to script in a hook.
It would be ideal for an application that is not a datastore to run without persistent volume, but for the reasons mentioned above, the PingAccess admin console does require and act like a persistent datastore so I think StatefulSet is okay.
Finally, the Ping DevOps team focuses support on their helm chart (where engines are also deployments by default). I'd suspect the community and enterprise support is much larger there for folks deploying on their own. ping-cloud-base is a good place to pick up some hooks though.

Highly available stateful session implementation with K8S

How do I implement in-memory state/session replication with K8S? For instance, a web shopping cart system replicates user HTTP sessions among cluster nodes over the network so that if a node goes down, a process on another node can take over the user sessions.
K8S has StatefulSet, which uses disk storage to ensure state persistence, I think. If a pod goes down, the restarted pod takes over the state from disk. However, the overhead of persisting in-memory user sessions to disk is high and may not be fast enough.
I suppose the solution could be to use an in-memory cache server or an etcd-like system. Is that the established practice? In my understanding, K8S is good for stateless processing at scale, and StatefulSet was introduced to address stateful situations, but I am not sure it is a good fit where fast stateful handover is required.
Please advise.

"How to implement memory state/session replications with K8S? For instance, a web shopping cart system replicates the user HTTP sessions among cluster nodes over the network so that if a node is down, a process in another node can take over the user sessions."
To store the state, it's best to use Redis or another in-memory database.
"K8S has StatefulSet which uses the disk storages to assure the state persistency, I think. If a pod is down, the restarted pod takes over the state from the disk. However, the overhead of persisting in-memory user sessions to disk is high and may not be fast enough."
You are right, but maybe you have not tried it before. I have been using Redis in production on K8s with millions of users and have never faced issues. Redis has two options for persisting keys when deployed on K8s: RDB and AOF (append-only file).
So far Redis has never crashed on me except when it ran out of memory, so make sure your key eviction policy is set properly, for example to LRU.
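As a rough illustration of what an LRU key policy means for session data, here is a small Python sketch of a bounded session store that evicts the least recently used entry when it is full. It is only a model of the idea: the class name and limits are hypothetical, and Redis itself uses an approximated LRU controlled by its `maxmemory` and `maxmemory-policy` settings rather than an exact structure like this one.

```python
from collections import OrderedDict


class LRUSessionStore:
    """Bounded in-memory store that evicts the least recently used session."""

    def __init__(self, max_entries):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, session_id):
        """Return the session, marking it as recently used, or None."""
        if session_id not in self._data:
            return None
        self._data.move_to_end(session_id)
        return self._data[session_id]

    def put(self, session_id, value):
        """Store a session and evict the oldest entries if over capacity."""
        self._data[session_id] = value
        self._data.move_to_end(session_id)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # drop the least recently used


store = LRUSessionStore(max_entries=2)
store.put("alice", {"cart": ["book"]})
store.put("bob", {"cart": []})
store.get("alice")               # alice is now the most recently used
store.put("carol", {"cart": []})  # capacity exceeded, so bob is evicted
print(store.get("bob"))
```

With a policy like this, running out of memory costs you the coldest sessions instead of crashing the store, which is the trade-off the answer recommends.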
"In my understanding, K8S is good for stateless processing in scale"
You are right, but people have also been using Deployments and StatefulSets to run Redis clusters and Elasticsearch clusters in K8s, with all the backup and scaling options.
It's easy to configure and manage the DB with K8s, whereas VMs don't offer much scalability.
We have been running StatefulSets with Redis, Elasticsearch, and RabbitMQ in production for a long time and have not seen many issues. Make sure you attach a high-IOPS SSD disk to the pod and you are good to go.
Nice example: https://github.com/loopbackio/loopback4-example-shopping/blob/master/kubernetes/README.md

Is there a way to tell where a pod was before it is migrated to another app node

Is there a way to tell which node a pod was on before it was migrated, or to get the previous node and pod name of a pod that was migrated?
Thanks,

Without any sort of logging system? No.
You need to set up logging or some kind of monitoring to do it. Basically, this means that if you don't keep a history of what was happening in your cluster, you won't be able to tell.
K8s does not store historical information. All you can do is look up the current state.
This means that you can check where the current pods are running, but you cannot tell what happened to all the pods ever created since the creation of the cluster.
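Keeping that history yourself could look roughly like the Python sketch below. It assumes you periodically capture a snapshot of pod-to-node placements (for example from the output of `kubectl get pods -o wide`) and feed it in; the class and method names are hypothetical. Note that it tracks pods by name, so it is most useful where names are stable, since pods recreated by a Deployment come back under a new name.

```python
from datetime import datetime, timezone


class PodPlacementHistory:
    """Records node changes observed across periodic placement snapshots."""

    def __init__(self):
        self._current = {}  # pod name -> node name from the latest snapshot
        self.events = []    # (observed_at, pod, old_node, new_node)

    def record_snapshot(self, placements, observed_at=None):
        """Compare a {pod: node} snapshot with the previous one and log changes."""
        observed_at = observed_at or datetime.now(timezone.utc)
        for pod, node in placements.items():
            previous = self._current.get(pod)
            if previous != node:
                self.events.append((observed_at, pod, previous, node))
        # Pods that disappeared since the last snapshot.
        for pod in sorted(set(self._current) - set(placements)):
            self.events.append((observed_at, pod, self._current[pod], None))
        self._current = dict(placements)

    def previous_nodes(self, pod):
        """Return the nodes a pod was seen on before it moved, oldest first."""
        return [old for _, p, old, _ in self.events
                if p == pod and old is not None]


history = PodPlacementHistory()
history.record_snapshot({"web-0": "node-a", "web-1": "node-b"})
history.record_snapshot({"web-0": "node-c", "web-1": "node-b"})
print(history.previous_nodes("web-0"))
```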

Is Kubernetes' ETCD exposed for us to use?

We are working on provisioning our service using Kubernetes, and the service needs to register/unregister some data for scaling purposes. Let's say the service handles long-held transactions, so when it starts or scales out, it needs to store the starting and ending transaction ids somewhere. When it scales out further, it will need to find the next transaction id and save it along with the ending transaction id that is covered. When it scales in, it needs to delete the transaction ids, etc. ETCD seems to make the cut, as it is used by Kubernetes to store deployment data; not only is it close to Kubernetes, it is actually inside and maintained by Kubernetes. We'd therefore like to find out whether it is open for our use. I'd like to ask the question for EKS, AKS, and self-installed clusters. Any advice welcome. Thanks.

Do not use the kubernetes etcd directly for an application.
Access to read/write data in the kubernetes etcd store is equivalent to root access to every node in your cluster. Even if you are well versed in etcd v3's role-based security model, avoid sharing that specific etcd instance so you don't increase your cluster's attack surface.
For EKS and GKE, the etcd cluster is hidden in the provided cluster service so you can't break things. I would assume AKS takes a similar approach unless they expose the instances to you that run the management nodes.
If the data is small and not heavily updated, you might be able to reuse the kubernetes etcd store via the kubernetes API. Create a ConfigMap or a custom resource definition for your data and edit it via the easily securable and namespaced functionality in the kubernetes API.
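As a sketch of the ConfigMap approach, the Python snippet below builds a ConfigMap manifest that holds one transaction id range per service instance and prints it as JSON, which can be applied with `kubectl apply -f` just like YAML. The ConfigMap name, namespace, instance names, and the "start-end" value format are all assumptions made for illustration; a real service would more likely write this through a Kubernetes API client.

```python
import json


def build_range_configmap(name, namespace, ranges):
    """Build a ConfigMap manifest mapping instance name -> "start-end".

    ConfigMap data values must be strings, so each (start, end) tuple is
    encoded as text.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": {
            instance: f"{start}-{end}"
            for instance, (start, end) in ranges.items()
        },
    }


# Hypothetical instances and transaction id ranges.
manifest = build_range_configmap(
    "txn-ranges",
    "my-service",
    {"worker-0": (1, 1000), "worker-1": (1001, 2000)},
)
print(json.dumps(manifest, indent=2))
```

Scaling in would then amount to removing that instance's key from `data` and updating the ConfigMap.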
For most application uses, run your own etcd cluster (or whatever service) to keep Kubernetes free to do its workload scheduling. The coreos etcd operator will let you define and create new etcd clusters easily.

What happens when the Kubernetes master fails?

I've been trying to figure out what happens when the Kubernetes master fails in a cluster that only has one master. Do web requests still get routed to pods if this happens, or does the entire system just shut down?
According to the OpenShift 3 documentation, which is built on top of Kubernetes (https://docs.openshift.com/enterprise/3.2/architecture/infrastructure_components/kubernetes_infrastructure.html), if a master fails, nodes continue to function properly, but the system loses its ability to manage pods. Is this the same for vanilla Kubernetes?

In typical setups, the master nodes run both the API and etcd and are either largely or fully responsible for managing the underlying cloud infrastructure. When they are offline or degraded, the API will be offline or degraded.
In the event that they, etcd, or the API are fully offline, the cluster ceases to be a cluster and is instead a bunch of ad-hoc nodes for this period. The cluster will not be able to respond to node failures, create new resources, move pods to new nodes, etc., until both of the following are true:
Enough etcd instances are back online to form a quorum and make progress (for a visual explanation of how this works and what these terms mean, see this page).
At least one API server can service requests.
In a partially degraded state, the API server may be able to respond to requests that only read data.
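The quorum requirement mentioned above comes down to simple majority arithmetic, sketched here in Python. The function names are hypothetical; the rule itself (a strict majority of members must be healthy) is how etcd's consensus works.

```python
def quorum_size(members):
    """Smallest number of healthy members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a quorum still exists."""
    return members - quorum_size(members)


def has_quorum(members, healthy):
    """True if enough members are healthy for etcd to make progress."""
    return healthy >= quorum_size(members)


for size in (1, 3, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

A single-member etcd tolerates no failures at all, while three members tolerate one and five tolerate two, which is why multi-master setups usually use an odd number of etcd instances.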
However, in any case, life for applications will continue as normal unless nodes are rebooted or there is a dramatic failure of some sort during this time, because TCP/UDP services, load balancers, DNS, the dashboard, etc. should all continue to function for at least some time. Eventually, these things will all fail on different timescales. In single-master setups or complete API failure, DNS failure will probably happen first as caches expire (on the order of minutes, though the exact timing is configurable; see the coredns cache plugin documentation). This is a good reason to consider a multi-master setup: DNS and service routing can continue to function indefinitely in a degraded state, even if etcd can no longer make progress.
There are actions that you could take as an operator which would accelerate failures, especially in a fully degraded state. For instance, rebooting a node would cause DNS queries, and in fact probably all pod and service networking functionality, to fail until at least one master comes back online. Restarting DNS pods or kube-proxy would also be bad.
If you'd like to test this out yourself, I recommend kubeadm-dind-cluster, kind or, for more exotic setups, kubeadm on VMs or bare metal. Note: kubectl proxy will not work during API failure, as that routes traffic through the master(s).

A Kubernetes cluster without a master is like a company running without a manager.
No one other than the manager (the master node) can instruct the workers (the k8s components). Even you, the owner of the cluster, can only instruct the manager.
Everything works as usual until the work is finished or something stops the workers, because the master node died after assigning the work.
As there is no manager to re-assign any work to them, the workers will wait and wait until the manager comes back.
The best practice is to assign multiple managers (masters) to your cluster.

Although your data plane and running applications do not immediately start breaking, there are several scenarios where cluster admins will wish they had a multi-master setup. The key to understanding the impact is understanding which components talk to the master, for what, how, and, more importantly, when they will fail if the master fails.
Although your application pods running on the data plane will not be immediately impacted, imagine a very possible scenario: your traffic suddenly surges and your horizontal pod autoscaler kicks in. The autoscaling would not work, because Metrics Server collects resource metrics from kubelets and exposes them in the Kubernetes API server through the Metrics API for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler (but your API server is already dead). If your pod's memory shoots up because of high load, it will eventually be killed by the OOM killer. If any of the pods die, the controller manager and scheduler, which talk to the API server to watch the current state of pods, will fail too. In short, a new pod will not be scheduled and your application may stop responding.
One thing to highlight is that Kubernetes system components communicate only with the API server. They don't talk to each other directly, so I guess their own functionality could fail as well. An unavailable master plane can mean several things: failure of any or all of these components (API server, etcd, kube-scheduler, controller manager) or, at worst, a crash of the entire node.
If the API server is unavailable, no one can use kubectl, as generally all commands talk to the API server. This means you cannot connect to the cluster and cannot log in to any pods to check anything on the container file system. You will not be able to see application logs unless you have an additional centralized log management system.
If the etcd database failed or got corrupted, your entire cluster state data is gone, and the admins would want to restore it from backups as early as possible.
In short, a failed single-master control plane may not immediately impact the ability to serve traffic, but it cannot be relied on for serving your traffic.