Based on the documentation:
Domain Clustered Mode:
Domain mode is a way to centrally manage and publish the configuration for your servers.
Running a cluster in standard mode can quickly become aggravating as
the cluster grows in size. Every time you need to make a configuration
change, you have to perform it on each node in the cluster. Domain
mode solves this problem by providing a central place to store and
publish configurations. It can be quite complex to set up, but it is
worth it in the end. This capability is built into the WildFly
Application Server which Keycloak derives from.
I tried the example setup from the user manual and it really eases the maintenance of multiple configurations.
However, as far as High Availability is concerned, this is not quite resilient. When the master node goes down, the Auth Server will stop functioning, since all the slave nodes listen to the domain controller.
Is my understanding correct here? Or am I missing something?
If this is the case, then Standalone-HA is the way to go to ensure High Availability, right?
WildFly node management and clustering are orthogonal features.
Clustering in Keycloak is in fact just cache replication (all kinds of sessions, login failures, etc.). So if you want fault tolerance for your sessions, you just have to configure cache replication properly (and usually node discovery). To do that, you can simply set the owners parameter to a value greater than 1:
<distributed-cache name="sessions" owners="2"/>
<distributed-cache name="authenticationSessions" owners="2"/>
<distributed-cache name="offlineSessions" owners="2"/>
<distributed-cache name="clientSessions" owners="2"/>
<distributed-cache name="offlineClientSessions" owners="2"/>
<distributed-cache name="loginFailures" owners="1"/>
<distributed-cache name="actionTokens" owners="2"/>
Now all new sessions that were initiated on the first node will be replicated to another node, so if the first node goes down, the end user can be served by another node. For example, you can have 3 nodes in total and require at least 2 replicas of each session distributed among those 3 nodes.
Now if we compare domain and HA mode, it is all about how the JBoss/WildFly server configuration is delivered to the target node. In HA mode, the configuration is supplied with the server runtime; in domain mode, it is fetched from the domain controller.
I suggest you achieve replication with HA mode first and then, if required, move to domain mode. Also, given the modern approach of containerizing everything, HA mode is more appropriate for containerization. Parameterized clustering settings can be injected during the container build, with the ability to alter them at runtime via the environment (e.g. the owners parameter could be read from a container environment variable).
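To make the effect of the owners setting concrete, here is a minimal sketch in Python. It is a toy model only: the round-robin placement below is an assumption made for illustration and is not how Infinispan actually distributes entries, and the function names are hypothetical. It only shows why keeping more than one copy per session lets sessions survive the loss of one node.

```python
def place_entries(keys, nodes, owners):
    """Assign each key to `owners` distinct nodes (toy round-robin placement).

    Assumes owners <= len(nodes), otherwise copies would land on the same node.
    """
    placement = {}
    for i, key in enumerate(keys):
        placement[key] = {nodes[(i + j) % len(nodes)] for j in range(owners)}
    return placement


def surviving_keys(placement, failed_nodes):
    """Return the keys that still have at least one copy on a live node."""
    failed = set(failed_nodes)
    return {key for key, holders in placement.items() if holders - failed}


nodes = ["node1", "node2", "node3"]
sessions = [f"session-{n}" for n in range(6)]

for owners in (1, 2):
    placement = place_entries(sessions, nodes, owners)
    survivors = surviving_keys(placement, ["node1"])
    print(f"owners={owners}: {len(survivors)} of {len(sessions)} sessions survive losing node1")
```

With one owner, every session that lived only on the failed node is gone; with two owners, a single node failure loses nothing in this model.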
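As a rough illustration of the environment-driven approach, the sketch below renders the cache lines from an environment variable at container start. The variable name CACHE_OWNERS and the helper render_caches are hypothetical names chosen for this example; check the container image you use for the variables it really supports. The loginFailures cache is left out because the configuration above keeps it at one owner.

```python
import os

# Cache names taken from the configuration shown above.
SESSION_CACHES = [
    "sessions",
    "authenticationSessions",
    "offlineSessions",
    "clientSessions",
    "offlineClientSessions",
    "actionTokens",
]


def render_caches(env=None):
    """Render distributed-cache elements using the owners count from the environment."""
    env = os.environ if env is None else env
    # CACHE_OWNERS is a hypothetical variable name; default to a single owner.
    owners = int(env.get("CACHE_OWNERS", "1"))
    if owners < 1:
        raise ValueError("owners must be at least 1")
    return "\n".join(
        f'<distributed-cache name="{name}" owners="{owners}"/>'
        for name in SESSION_CACHES
    )


print(render_caches({"CACHE_OWNERS": "2"}))
```

The same image can then run as a single node or as a replicated cluster, depending only on the environment it is started with.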
There were some articles on the Keycloak blog about clustering, like:
this
I also suggest checking out the Keycloak Docker container image repository:
here
I am trying to configure an Active/Passive cluster with two nodes (using OpenShift). The second, passive node should be a hot standby; in other words, it is up and running but not doing anything until the first node dies. Then the passive node becomes active and a new passive node is started.
I have read the High Availability documentation, but it just seems to cover the theory. Furthermore, it seems like overkill (I am thinking there might be an easier way to meet my goal).
Where would I start?
What you are asking for goes against the usual practice for how Kubernetes/OpenShift is used. You wouldn't have hot standby nodes; you would always use all nodes in the cluster. You would then allow for enough additional capacity in your cluster that losing a node doesn't cause a problem, because the other nodes have enough capacity to run the applications. In this scenario the Kubernetes scheduler automatically restarts any applications that were on a failed node on the other nodes in the cluster, without you needing to perform any explicit failover steps.
So don't try to do anything special. Set up your cluster with the two nodes, with applications distributed across both. If you need the ability to run with only a single node, make sure it has enough capacity to run everything. If over time you add more applications and one node is not enough, add a third node, with all three being used in the normal case. You can then handle the failure of a single node again.
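The capacity rule described above can be written down as a small check. This is only a sketch: the node names, the capacity unit (say, CPU cores) and the function name are assumptions made for illustration, and a real scheduler also considers memory, affinity rules and more.

```python
def survives_single_node_loss(node_capacity, total_demand):
    """Return True if the remaining nodes can run everything after any one node fails."""
    if len(node_capacity) < 2:
        return False  # with a single node there is nothing left to fail over to
    total = sum(node_capacity.values())
    # Check the worst case for each node in turn: remove it and compare what is left.
    return all(total - cap >= total_demand for cap in node_capacity.values())


two_nodes = {"node-a": 8, "node-b": 8}  # hypothetical capacities
print(survives_single_node_loss(two_nodes, 6))    # demand fits on one node
print(survives_single_node_loss(two_nodes, 10))   # demand needs both nodes

three_nodes = {"node-a": 8, "node-b": 8, "node-c": 8}
print(survives_single_node_loss(three_nodes, 10))  # two remaining nodes are enough
```

When the check starts failing for your current demand, that is the point at which adding another node restores the ability to lose one.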
I am trying to set up a very simple cluster of 2 ejabberd nodes. However, when I follow the official ejabberd documentation and use the join_cluster argument of the ejabberdctl script, I always end up with a multi-master cluster where both Mnesia databases hold replicated data.
Is it possible to set up an ejabberd cluster in master-slave mode? And if yes, what am I missing?
In my understanding, a slave gets the data replicated but would simply not be active. The slave needs the data to be able to take over the task of the master at some point.
It seems to me that the core of the setup you describe is not about disabling replication but about not sending traffic to the slave, no?
In that case, this is just a matter of configuring your load balancing mechanism to route the traffic according to your preference.
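A minimal sketch of that routing preference, seen from the load balancer's side: all traffic goes to the first node in a priority list, and the second node only receives traffic once the first is reported unhealthy. The host names and the health check are hypothetical placeholders; a real load balancer would express the same idea in its own configuration.

```python
def pick_backend(backends, is_healthy):
    """Return the first healthy backend in priority order, or None if all are down."""
    for backend in backends:
        if is_healthy(backend):
            return backend
    return None


# Hypothetical hosts, listed with the preferred ("master") node first.
backends = ["ejabberd1.example.com", "ejabberd2.example.com"]
down = set()


def is_healthy(host):
    # Stand-in for a real health probe such as a TCP connect to the XMPP port.
    return host not in down


print(pick_backend(backends, is_healthy))  # the preferred node while it is up
down.add("ejabberd1.example.com")
print(pick_backend(backends, is_healthy))  # the standby node takes over
```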
So far what I've come across is this -
1. If an ejabberd cluster is set up in a master-slave configuration, there is a single point of failure, and people have experienced cases where the cluster does not become operable again even after the master is fixed (if it goes down). Sometimes the ejabberd instance of every slave has to be revisited to get it working properly, or Mnesia commands have to be entered again to make the master communicate with the slaves.
2. If an ejabberd cluster is set up in a multi-master configuration, any of the nodes can be taken out of the cluster without bringing the whole cluster down. Basically, there is no single point of failure, and this is also the way the official ejabberd documentation tells you to do it, via the join_cluster argument exposed by the ejabberdctl script. HOWEVER, in this case all the data is replicated across both nodes, which is a big performance overhead in my opinion.
So it boils down to this.
What is the best/recommended/popular mode in which an ejabberd cluster of 2 nodes should be set up, mostly with respect to performance but keeping other critical factors (fault tolerance, load balancing) in mind as well?
There is only a single mode in ejabberd. Basically, it works like what you describe as multi-master. Master-slave would be the same setup, just without any traffic sent to the second node by the load balancing mechanism.
So case 2 is the way to go.
If I set up a replication controller for something like a database, how does it keep the data in the replicas in sync? If one of the replicas goes down, how does it bring it back up with the latest data?
A replication controller ensures that the desired number of pods with the same template are kept running in the system. The replication controller itself does not know anything about what it is running, and doesn't have any special hooks for containers running databases. This means that if you want to run a container with a database with more than one replica, then it is easiest to run a database that can natively do replication and discovery (possibly with the injection of some environment variables).
An alternative is to run a pod with two containers, where one container is a vanilla database, and the second "side-car" container is used to implement the necessary replication / synchronization / master election or whatever extra functionality you need to provide to make the database run in a clustered environment. This is more flexible (you can run a database that wasn't initially designed to run in a clustered environment) but also requires more custom work to make it scale.
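To illustrate why the replication controller cannot keep data in sync by itself, here is a toy version of its reconciliation step. This is a simplified sketch, not the Kubernetes implementation; the pod dictionaries, the template fields and the image name are hypothetical. Note that a replacement pod is built from the template only, so any data has to come from the database's own replication or from a side-car.

```python
import itertools

_pod_ids = itertools.count(1)  # source of unique suffixes for new pod names


def reconcile(running_pods, desired_replicas, template):
    """Bring the pod list to the desired count; knows nothing about the payload."""
    pods = list(running_pods)
    while len(pods) < desired_replicas:
        # A new pod is created purely from the template: no state is copied over.
        pods.append({
            "name": f"{template['name']}-{next(_pod_ids)}",
            "image": template["image"],
        })
    del pods[desired_replicas:]  # scale down if there are too many pods
    return pods


template = {"name": "db", "image": "example/database:1.0"}  # hypothetical image
pods = reconcile([], 2, template)
print([p["name"] for p in pods])

pods = pods[1:]                      # simulate one pod dying
pods = reconcile(pods, 2, template)  # a fresh, empty replacement is started
print([p["name"] for p in pods])
```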
I have created a cluster consisting of three RabbitMQ nodes using the join_cluster command.
i.e.
rabbitmqctl -n rabbit2@MYPC1 join_cluster rabbit2@MYPC1
(currently the cluster runs on a single computer)
Questions:
The documentation says there is one implementation for active/passive and one for active/active.
What did I configure?
How do I know?
How can it be changed?
Is there a big performance trade off between Active Active & Active Passive?
What is the best practice to interact with active/active?
i.e. install a load balancer? apache that will round robin
What is the best practice to interact with active/passive?
if I interact with only the active - this is a single point of failure
Thanks.
I have been doing some research into availability options with RabbitMQ and while I am still fairly new, I'll attempt to answer your questions with the knowledge I do have. Please understand that these answers are not intended to be comprehensive.
Before getting to the questions and answers, it's worth pointing out that the terms Active/Active and Active/Passive do not really apply to a cluster running on a single computer. They are typically used to describe highly available clusters where you have a system of more than one logical server (in your case, multiple RabbitMQ clusters), shared/redundant storage, network capabilities, power, etc.
What did I configure?
Without any load balancing for the nodes in your cluster or queue mirroring you have neither, meaning you do not have a highly available cluster.
How do I know?
RabbitMQ does not provide any connection management so traffic with a failed node will not automatically be passed on to a different node, which is required for an active/active cluster. Without queue mirroring you do not have fully redundant nodes in your cluster, which is required for active/passive.
How can it be changed?
Even if you implement load balancing and/or queue mirroring you are missing a number of requirements to offer a highly-available RabbitMQ cluster. Primarily, with a RabbitMQ cluster you only have a single logical broker (at least two are required for an HA cluster).
Is there a big performance trade off between Active Active & Active Passive?
I think you will start seeing performance penalties as you introduce data replication and/or redundancy, which affects both Active/Active and Active/Passive. If you use synchronous data replication, you will see a bigger performance hit than if you replicate data asynchronously. There's a lot more to it, but to me it feels like Active/Active may carry the bigger performance hit, although this depends heavily on how fast all of the pieces work together. In Active/Passive, where you may be using asynchronous replication across servers, performance may appear better, but in a failover situation you would need to wait for that replication to complete before you can switch to your secondary server.
What is the best practice to interact with active/active? i.e. install a load balancer? apache that will round robin
RabbitMQ recommends using a load balancer so that you do not have to leak details about the nodes in your cluster to the clients.
What is the best practice to interact with active/passive? if I interact with only the active - this is a single point of failure
It is a point of failure but with Active/Passive you can implement a failure strategy to retry the next available server or all remaining servers. With these strategies in place you can establish a scenario where the capabilities of your cluster are merely degraded while a failover is happening instead of totally unavailable. Also, you can interact with the passive side but the types of interactions may be very different (i.e. read-only access) since there may be fewer resources available on the passive side and there may be delays in data replication.
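The "retry the next available server" strategy can be sketched on the client side as follows. This is a generic illustration, not RabbitMQ client code: the connect callable stands in for whatever your client library uses to open a connection, and the host names are hypothetical.

```python
def connect_with_failover(servers, connect):
    """Try each server in order; return (server, connection) for the first that works."""
    errors = {}
    for server in servers:
        try:
            return server, connect(server)
        except ConnectionError as exc:
            errors[server] = exc  # remember the failure and try the next server
    raise ConnectionError(f"all servers failed: {errors}")


def fake_connect(server):
    # Stand-in for a real client connection; the active node is simulated as down.
    if server == "active.example.com":
        raise ConnectionRefusedError("active node is down")
    return f"connection to {server}"


servers = ["active.example.com", "passive.example.com"]  # hypothetical hosts
server, connection = connect_with_failover(servers, fake_connect)
print(server, "->", connection)
```

The client keeps working in a degraded way while the failover happens, instead of failing outright when the active node is unreachable.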
Here are some references used to gather this information:
High-Availability Cluster on Wikipedia
Clustering with RabbitMQ
Highly Available Queues in a RabbitMQ Cluster
High Availability in RabbitMQ