What is the best way to persist user sessions in Keycloak?

I'm using a standalone Keycloak server. I have created one realm and a user inside that realm. When I log the user in and then restart the Keycloak server, the session is lost.
I am aware that Keycloak stores user session data in Infinispan. Is there any way I can save/persist this user session data?
Or should I create a multi-node cluster and replicate the Keycloak session data?
Please suggest what's best.

We also have this problem.
So far I think there are two solutions, neither of which is perfect:
1. Keep sessions in Infinispan
You can use an external Infinispan instance as described in the docs. This is cumbersome because you need to run and maintain that external Infinispan instance.
If you don't want to use an external Infinispan instance, you can set CACHE_OWNERS_COUNT in the Docker image to 2 or more (see the sketch after this list). This rebalances the cache between nodes and makes sure that each entry is stored on at least CACHE_OWNERS_COUNT nodes. If you have many sessions (more than about 1 million) you will run out of memory, and startup time will increase substantially because the cache needs to be rebalanced on every deploy.
Another issue is that you would lose the sessions when updating the Infinispan instance.
2. Use offline sessions
Offline sessions are kept in the database, and some of them are cached in the offline sessions cache. You can limit the number of offline sessions Keycloak keeps in memory and stop preloading offline sessions for faster startup, as described here.
This also has drawbacks:
SSO will not work after the server is restarted
You cannot change any offline session notes
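For option 1, here is a minimal sketch of how this could look with the legacy WildFly-based jboss/keycloak Docker image; the hostnames, credentials and the JDBC_PING discovery choice are illustrative assumptions, not taken from the question:

```bash
# One of two (or more) clustered Keycloak nodes sharing the same database.
# CACHE_OWNERS_COUNT=2 asks Infinispan to keep every session entry on at
# least two nodes, so a single node can restart without losing sessions.
docker run -d --name keycloak-node1 \
  -e KEYCLOAK_USER=admin -e KEYCLOAK_PASSWORD=change-me \
  -e DB_VENDOR=postgres -e DB_ADDR=postgres.internal -e DB_DATABASE=keycloak \
  -e DB_USER=keycloak -e DB_PASSWORD=change-me \
  -e JGROUPS_DISCOVERY_PROTOCOL=JDBC_PING \
  -e CACHE_OWNERS_COUNT=2 \
  -e CACHE_OWNERS_AUTH_SESSIONS_COUNT=2 \
  jboss/keycloak:16.1.1
```

Note that this only protects against losing a single node; restarting all nodes at the same time still drops the in-memory sessions.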
I think there are some PRs in Keycloak to build a persistent session store into the server, so please keep an eye on the progress in the Keycloak GitHub repository.

Related

User limitation on Postgresql synthetic monitoring using Airflow

I am trying to write synthetic monitoring for my on-prem PostgreSQL service using Airflow. The monitoring should report whether a cluster is available for creating tables, writing and reading data, and deleting tables.
The clusters on my service use SSL certificates for authentication, which means a client is required to provide a suitable client certificate in order to connect to the cluster.
Currently, I have implemented my monitoring by creating a global user which will have a certificate with permissions to all the clusters. The user will have permissions to create, write and read only on one schema, dedicated to this monitoring. Using Airflow, I will connect with this user to each of my PostgreSQL clusters and try to create a table, write to it, read from it, and then delete it. If one of the actions fails, the DAG will write a log describing the reason for the failure.
My main problem with this solution is not being able to limit such a powerful user, which has access to all of my clusters. If an intruder got hold of the user's client certificate, they could blow up the DB storage by writing a huge amount of data, or overload the cluster with queries until it fails.
I am looking for ideas for limiting this user so that it can act only for its purpose (the simple actions required for this monitoring) and cannot be exploited by an attacker. Alternatively, I would appreciate any suggestions for a different implementation of this monitoring.
I searched for built-in PostgreSQL configurations that would allow me to limit the dedicated monitoring schema or limit the number of queries performed by the user.
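One common direction, sketched below under the assumption of a dedicated schema named monitoring and a database named mydb (both names are illustrative, not from the question), is to combine a connection limit, schema-scoped grants and per-role timeouts so the monitoring role can do little beyond its create/write/read/delete cycle:

```sql
-- Hypothetical locked-down monitoring role; all object names are illustrative.
CREATE ROLE synthetic_monitor LOGIN CONNECTION LIMIT 2;

-- Allow connecting, but grant USAGE/CREATE only on the dedicated schema.
GRANT CONNECT ON DATABASE mydb TO synthetic_monitor;
GRANT USAGE, CREATE ON SCHEMA monitoring TO synthetic_monitor;

-- Cap runaway statements and forgotten transactions for this role only.
ALTER ROLE synthetic_monitor SET statement_timeout = '5s';
ALTER ROLE synthetic_monitor SET idle_in_transaction_session_timeout = '10s';
```

PostgreSQL has no built-in per-schema disk quota, so a periodic size check on the monitoring schema (or a cleanup job) would still be needed to guard against bulk writes.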

How To Design a Distributed Logging System in Kubernetes?

I'm designing a distributed application, comprised of several Spring microservices that will be deployed with Kubernetes. It is a batch processing app, and a typical request could take several minutes of processing, with the processing getting distributed across the services, using Kafka as a message broker.
A requirement of the project is that each request will generate a log file, which will need to be stored on the application file store for retrieval. The current design is, all the processing services write log messages (with the associated unique request ID) to Kafka, and there is a dedicated logging microservice that reads these messages down, does some formatting and should persist them to the log file associated with the given request ID.
I'm very unfamiliar with how files should be stored in web applications. Should I be storing these log files on the local file system? If so, wouldn't that mean this "logging service" couldn't be scaled? For example, if I scaled the log service to 2 instances, then each instance would only have access to half of the log files in theory. And if a user makes a request to retrieve a log file, there is no guarantee that the requested log file will be at whatever log service instance the Kubernetes load balancer routed them to.
What is the currently accepted "best practice" for having a file system in a distributed application? Or should I just accept that the logging service can never be scaled up?
A possible solution I can think of would be to store the text log files in our MySQL database as TEXT rows, making the logging service effectively stateless. If someone could point out any potential issues with this, that would be much appreciated.
deployed with Kubernetes
each request will generate a log file, which will need to be stored on the application file store
Don't do this. Use a Fluentd / Filebeat / promtail / Splunk forwarder sidecar that gathers stdout from the container processes.
Or have your services write to a Kafka logs topic rather than create files.
With either option, use a collector like Elasticsearch, Grafana Loki, or Splunk.
https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
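As one concrete illustration of the sidecar option, the pod sketch below pairs the application container with a fluent-bit forwarder that ships the per-request log files from a shared emptyDir to whatever backend its configuration points at; the image names and the fluent-bit-config ConfigMap are assumptions, not part of the question:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  containers:
  - name: app
    image: example/batch-worker:latest      # placeholder for the Spring service
    volumeMounts:
    - name: request-logs
      mountPath: /var/log/requests          # app writes <request-id>.log here
  - name: log-forwarder
    image: fluent/fluent-bit:2.2            # sidecar logging agent
    volumeMounts:
    - name: request-logs
      mountPath: /var/log/requests
      readOnly: true
    - name: fluent-bit-config
      mountPath: /fluent-bit/etc/           # fluent-bit.conf pointing at Elasticsearch/Loki/...
  volumes:
  - name: request-logs
    emptyDir: {}
  - name: fluent-bit-config
    configMap:
      name: fluent-bit-config               # assumed to exist in the namespace
```

Retrieval then goes through the collector's API rather than the logging service's local disk, which is what leaves the service free to scale.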
wouldn't that mean this "logging service" couldn't be scaled?
No, each of these services is designed to be scaled.
possible solution I can think of would just store the text log files in our MySQL database as TEXT rows,
Sure, but Elasticsearch or Solr are purpose-built for gathering and searching plaintext, not MySQL.
Don't treat logs as something application-specific. In other words, your solution shouldn't be unique to Spring.

High-Availability of Keycloak across remote sites

I’ve been looking into Keycloak as an on-prem IAM and SSO solution for my company. One thing that I’m unclear on from reading the documentation is if Keycloak’s clustered mode can handle our requirements for instance federation across sites.
We have some remote manned sites that occasionally run critical telemetry-gathering processes. Our AD domain is replicated to those sites.
The issue is that there is a single internet link to the sites. If we had keycloak at the main office, and the internet link went down for a day, any software at the remote site that relies on keycloak to authenticate wouldn’t work (which would be a big problem).
Can we set up Keycloak in cluster mode (i.e., putting an instance at each site), so that if this link goes down, remote users are able to connect to their local instance automatically and authenticate with local apps? What happens when the connection is restored and the databases are out of sync - does Keycloak automatically repair this?
Cheers
In general the answer is "yes": you can set up two Keycloak instances in different locations and link them to each other in a cluster (under the hood this is Infinispan cache replication). But it depends on the details of your infrastructure.
The main goal of a Keycloak cluster is to replicate the session caches between nodes. So in the simplest case you can set up two nodes that point to the same DB instance; when the first node goes down, the second handles the whole job, but if the DB also goes down the second node is useless. In that case each site should have both its own Keycloak node and a DB replica (how to achieve DB replication is out of scope of this topic). A third option is to use the multi-tenancy feature of the Keycloak application adapter, in which case you secure the application with two separate Keycloak instances that know nothing about each other.
Try to start from this documentation article:
https://www.keycloak.org/docs/latest/server_installation/index.html#crossdc-mode

Keycloak cluster standalone-ha issue data users

I have a cluster with 2 servers in HA. Is there some configuration so that when I make a change, for example to a user's password or a role assignment, the change is made immediately on both servers?
The problem is that when a user's password is changed it does not update on the other server immediately. The same happens when a user is assigned a role mapping: it never updates on both servers until the server is rebooted.
OS: Linux (ubuntu 16.04)
keycloak version: 11.0
Thanks for the help
I can't tell from the first paragraph of your question whether Keycloak has ever propagated the changes to the other servers or not.
Does the setup you use usually propagate changes?
If not it sounds like there are issues with your cluster setup.
Do the nodes discover each other? You can check the logs on startup, there is a good illustration on the Keycloak blog on how to check this.
In general it would be a good idea to look over the recommended clustering setup in the docs.
You could also change the number of owners in the cluster; that way both nodes own the data before it is put in the database (a sketch of that change is below). This might help if the issue is that the changes are not propagated quickly enough.
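If you try that with the WildFly-based distribution (Keycloak 11 in HA mode uses standalone-ha.xml), the change is roughly the sketch below; the cache-container is abbreviated, so take the exact list of caches from your own standalone-ha.xml:

```xml
<!-- Inside the "keycloak" cache-container of standalone-ha.xml (abbreviated).
     owners="2" makes each entry live on both nodes instead of only one. -->
<cache-container name="keycloak">
    <distributed-cache name="sessions" owners="2"/>
    <distributed-cache name="authenticationSessions" owners="2"/>
    <distributed-cache name="clientSessions" owners="2"/>
    <distributed-cache name="offlineSessions" owners="2"/>
    <distributed-cache name="offlineClientSessions" owners="2"/>
    <distributed-cache name="loginFailures" owners="2"/>
    <!-- remaining caches left as shipped -->
</cache-container>
```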

How to do zero-downtime rolling updates for app with (long-lived) sticky sessions using containers

I'm trying to figure out how to provide zero-downtime rolling updates of a webapp that has long-lived interactive user sessions that should be sticky, based on a JSESSIONID cookie.
For this (and other) reasons I'm looking at container technology, like say Docker Swarm or Kubernetes.
I am having difficulty finding a good answer on how to:
Make sure new sessions go to the latest version of the app
While existing sessions keep being served by whichever version of the app they were initiated on
Properly clean up the old version once all sessions to/on it are closed
Some more info:
Requests are linked to a session based on a JSESSIONID cookie
Sessions could potentially live for days, but I am able to terminate them from within the app within, say, a 24-hour timeframe (sending the user a notification to "log out/log in again as there is a new version", or telling them that they are otherwise automatically logged out at 12pm, for example)
Of course for each version of the app there are multiple containers already running in load-balanced fashion
I don't mind the total number of containers growing, for example if the old version's containers are all still up and running because each still hosts one session, while the majority of the users are already on the new version of the app
So, my idea of the required flow is something along these lines:
Put up the new version of the app
Let all new connections (the ones w/o the JSESSIONID cookie set) go to the new version of the app
Once a container of the old version of the app is not serving sessions anymore, remove the container/....
As I mentioned, I'm looking into Kubernetes and Docker Swarm, but I am open to other suggestions; the end solution should be able to run on cloud platforms (currently using Azure, but Google or Amazon clouds might be used in the future).
Any pointers/tips/suggestions or ideas appreciated
Paul
EDIT:
In answer to @Tarun's question and for general clarification: yes, I want no downtime. The way I envision this is that the containers hosting the old version will keep running to serve all existing sessions. Once all sessions on the old servers have ended, the old server is removed.
The new containers are only going to serve new sessions for users that startup the app after the rollout of the new version has started.
So, to give an example:
- I launch a new session A of the old version of the app at 9am
- At 10am a new version is rolled out.
- I continue to use session A, which remains hosted on a container running the old version.
- at noon I go for lunch and log out
- as I was the last session connected to the container running the old version, the container will now be destroyed
- at 1pm I come back, log back in and I get the new version of the app
Makes sense?
Your workload might not be a good fit for Kubernetes/containers with its current architecture. The best way I can come up with to solve this is to move the state to a PV/PVC and migrate the PV to the new containers so the new container has the state from the old session; however, I'm not sure how to efficiently migrate the calls for that session to the proper node.
Ideally you would separate your data/caching layer from your service into something like Redis, and then it wouldn't matter which of the nodes serves the request.
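As a sketch of that last suggestion, assuming the webapp is a Spring Boot servlet application (the question doesn't say which stack produces the JSESSIONID), Spring Session can move the HTTP session into Redis so any replica, old or new, can serve any request:

```java
// Minimal sketch: externalize the servlet session into Redis with Spring Session.
// Assumes the spring-session-data-redis and spring-data-redis dependencies are on
// the classpath and a Redis service named "redis" is reachable from every replica.
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.redis.connection.RedisConnectionFactory;
import org.springframework.data.redis.connection.lettuce.LettuceConnectionFactory;
import org.springframework.session.data.redis.config.annotation.web.http.EnableRedisHttpSession;

@Configuration
@EnableRedisHttpSession
public class SessionConfig {

    // Connection to the shared Redis instance that holds all HTTP sessions.
    @Bean
    public RedisConnectionFactory redisConnectionFactory() {
        return new LettuceConnectionFactory("redis", 6379);
    }
}
```

Spring Session then manages the session cookie itself, and the per-container in-memory session store is no longer what ties a user to a specific replica, so old-version containers no longer have to be kept alive just for the sessions they started.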