I have a question about the sample configuration below.
Does it make sense to configure all the nodes of an Akka cluster as seed nodes?
example:
cluster {
  seed-nodes = [
    "akka://application@127.0.0.1:2551",
    "akka://application@127.0.0.1:2552",
    "akka://application@127.0.0.1:2553",
    "akka://application@127.0.0.1:2554",
    "akka://application@127.0.0.1:2555",
    "akka://application@127.0.0.1:2556",
    "akka://application@127.0.0.1:2557",
    "akka://application@127.0.0.1:2558",
    "akka://application@127.0.0.1:2559",
    "akka://application@127.0.0.1:2560",
    "akka://application@127.0.0.1:2561",
    "akka://application@127.0.0.1:2562"]
  downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  split-brain-resolver {
    active-strategy = static-quorum
    static-quorum {
      quorum-size = 7
    }
  }
}
Are there disadvantages to this configuration?
I guess the answer has to be "it depends".
Seed nodes are one mechanism that enables new nodes to join an Akka cluster.
For your example to work you have to run all the nodes on the same host. I am guessing you're passing some JVM argument like -Dakka.remote.artery.canonical.port=2*** to bind each node to a different port. That's fine; it will work. A new node starting up will try to join the cluster by contacting the seed nodes, starting from the first, until one of them responds.
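For reference, here is a minimal sketch (assuming classic Akka with Artery, the Typesafe Config library, and the actor system name "application" from your config) of starting each node on the same host with a different port programmatically instead of via the JVM argument; the port handling here is just an example:

import akka.actor.ActorSystem;
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class Node {
  public static void main(String[] args) {
    // Port passed as the first program argument, e.g. 2551, 2552, ... (example only).
    String port = args.length > 0 ? args[0] : "2551";

    // Override only the remoting port; everything else comes from application.conf.
    Config config = ConfigFactory
        .parseString("akka.remote.artery.canonical.port=" + port)
        .withFallback(ConfigFactory.load());

    // The actor system name must match the name used in the seed-node addresses.
    ActorSystem.create("application", config);
  }
}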
In practice you probably want the cluster nodes running on different machines, and that's when a static configuration like the one in your example can become a bit of a pain. This is because you'd need to know all the IP addresses beforehand and would need to guarantee that they will not change over time. This is perhaps possible in a network with statically assigned IPs but is nearly impossible with dynamically assigned IPs or in environments like Kubernetes. This is why other methods of cluster joining are available (https://doc.akka.io/docs/akka/current/discovery/index.html).
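For example, with the Akka Management and Cluster Bootstrap modules (additional dependencies, with a discovery method such as kubernetes-api configured separately; this is only a hedged sketch of the approach, not a drop-in replacement for your setup), joining is driven by discovery instead of a static seed-nodes list:

import akka.actor.ActorSystem;
import akka.management.cluster.bootstrap.ClusterBootstrap;
import akka.management.javadsl.AkkaManagement;

public class BootstrapNode {
  public static void main(String[] args) {
    ActorSystem system = ActorSystem.create("application");

    // Starts the Akka Management HTTP endpoint used as the bootstrap contact point.
    AkkaManagement.get(system).start();

    // Discovers peer nodes via the configured discovery method and forms/joins
    // the cluster without any seed-nodes list.
    ClusterBootstrap.get(system).start();
  }
}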
So the disadvantage I see here is the limitation of this configuration in any real-life scenario. As long as you're doing this to learn or experiment with Akka Cluster, it's all fine, though you could also argue that in that case a list of 12 seed nodes does not give you much advantage over, say, 2 seed nodes, as long as you can keep those 2 up and running for the duration of your experiment so that all the nodes can join the cluster.
I know it is recommended to have an odd number of master nodes. But will k8s work if we have an even number of nodes? And what are the downsides?
The reason I'm asking is that I'm building an IoT cluster, where every node is a master node. All devices are the same and any device must be able to take up the master role if the current master fails.
Also, the number of devices could be anything, so the system should work with both odd and even numbers of nodes.
https://discuss.kubernetes.io/t/high-availability-host-numbers/13143/2 says that you should avoid ever having more than 7 master nodes due to the overhead of membership algorithms, so depending on how many IoT nodes you have, you should consider a different architecture.
Nodes are supposed to be abstracted away from their purpose, so you shouldn't need your user nodes to also be the system nodes; coupling them this tightly might introduce problems later on.
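The reason even counts don't help is quorum arithmetic: etcd (like other Raft-based stores) needs floor(n/2)+1 members reachable, so adding a fourth or sixth master raises the quorum without raising the number of failures you can survive. A tiny illustrative snippet (plain Java, not tied to any Kubernetes API):

public class QuorumMath {
  public static void main(String[] args) {
    for (int members = 1; members <= 8; members++) {
      int quorum = members / 2 + 1;        // members that must stay reachable
      int tolerated = members - quorum;    // failures the cluster can survive
      System.out.printf("members=%d quorum=%d tolerated=%d%n", members, quorum, tolerated);
    }
    // 3 and 4 members both tolerate 1 failure, 5 and 6 both tolerate 2:
    // the extra even member adds no fault tolerance, only more machines that can fail.
  }
}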
Hi!
I was wondering if it would be possible to replicate a VMware-like architecture in Kubernetes.
What I mean by that:
Instead of having the control plane always separated from the worker nodes, I would like to put them all together, so in the end we would obtain a cluster of master nodes on which we can schedule applications. For now I'm using Kata Containers with containerd, so all applications are deployed in 'mini' VMs and there isn't the 'escape from the container' problem. The management of the cluster would be done through a dedicated interface (eth0, 1 Gb). The users would be able to communicate with the apps deployed within the cluster through another interface (eth1, 10 Gb). I would use Keepalived and HAProxy to elect my 'main master' and load balance the traffic.
The question might be 'why would you do that?'. Well, to ensure high availability at all times and reduce the management overhead: instead of having two sets of "entities" to manage (the control plane and the worker nodes), simply reduce it to one. That way there wouldn't be problems like 'I don't have more than 50% of my masters online, so there won't be a leader election', where I would have to remove master nodes from my cluster until the percentage of online master nodes is above 50%; that would require technical intervention, as fast as possible, which might result in human errors, etc.
Another positive point would be scaling: instead of having two parts of the cluster to scale (masters and workers), there would be only one; I would just need to add another master/worker to the cluster and that's it. All the management traffic would go to the main master, which holds a virtual IP (VIP), and in case of overload requests would be redirected to another node.
In the end I would have something resembling this:
Photo: VMware-like architecture
I'm trying to find disadvantages to this kind of architecture. I know that there would be etcd traffic on each node, but how impactful is it? I know that there will be wasted resources for the control-plane pods on each node, but given that these pods (except etcd) won't do much besides waiting, how impactful would it be? With each node capable of taking the master role, there wouldn't be any downtime. Right now, if my control plane (3 masters) goes down, I have to reboot the masters or find a solution as fast as possible before there's a problem with one of the apps that run on the worker nodes.
The topology I'm using right now resembles the following:
Basic Kubernetes architecture
I'm new to Kubernetes, so the question might seem stupid, but I would really like to know the advantages and disadvantages of the two approaches and understand why it wouldn't be a good idea.
Thanks a lot for any help !! :slightly_smiling_face:
There are two reasons for keeping control planes on their own. The big one is that you only want a small number of etcd nodes, usually 3 or 5, and that's usually the bounding factor on the size of the control plane. You usually want the ability to scale worker nodes independently of that. The second issue is that etcd is very sensitive to IOPS brownouts and can suffer bad cascading failures if the machine runs low on IOPS.
And given that you are doing things on top of VMware anyway, the overhead of managing 3 vs 6 VMs is not generally a difference in kind. This seems like false savings in the long run.
There's a worker dial-in pattern described for Akka, particularly here: http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2. It describes a way to spread load fairly between multiple remote workers. It assumes there's only one master, and workers discover and register with it. Is there a way to support multiple masters with the worker dial-in pattern, one that supports fair and deterministic sharing of workers between multiple masters?
I imagine the following situation. Let's say there's a cluster with 2 different node roles: front-end and worker. There are multiple front-end nodes which run HTTP servers. Those front-ends delegate the business logic to actors running on worker nodes. The front-ends are behind a simple HTTP round-robin load balancer (Nginx).
I'd like to have a shared pool of worker nodes that can be used by any of the front-ends. If one node has more load than another, it should consume more of the worker nodes' capacity. If the load is too heavy, I should be able to add more worker nodes (probably automatically via auto-scaling), and they should, again, support all of the front-ends fairly, on an as-needed basis.
There are a couple of naive implementations leading to different deficiencies. If workers somehow decide which single front-end to support, then worker capacity might not be spread fairly, because front-end load is highly dynamic. Alternatively, if workers register with all of the front-ends, there might be a race condition when multiple front-ends request work from a single worker. All in all, I don't see a good way of supporting this. Does anyone have a better idea?
By using the cluster's current state, the worker can register with more than one master:
// Fragment of the worker's receive: when the CurrentClusterState event arrives
// (after subscribing to cluster events), register with every member that is already Up.
.match(CurrentClusterState.class, state -> {
  for (Member member : state.getMembers()) {
    if (member.status().equals(MemberStatus.up())) {
      register(member);
    }
  }
})
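For context, here is a hedged sketch of what the surrounding worker actor could look like, loosely based on the classic Akka cluster samples; the "frontend" role name, the /user/frontend actor path and the plain-string registration message are assumptions for illustration, not part of the original pattern:

import akka.actor.AbstractActor;
import akka.cluster.Cluster;
import akka.cluster.ClusterEvent.CurrentClusterState;
import akka.cluster.ClusterEvent.MemberUp;
import akka.cluster.Member;
import akka.cluster.MemberStatus;

public class Worker extends AbstractActor {
  private final Cluster cluster = Cluster.get(getContext().getSystem());

  @Override
  public void preStart() {
    // Delivers the current cluster state once, then MemberUp events for later joiners.
    cluster.subscribe(getSelf(), MemberUp.class);
  }

  @Override
  public void postStop() {
    cluster.unsubscribe(getSelf());
  }

  @Override
  public Receive createReceive() {
    return receiveBuilder()
        // Register with every master that is already Up when this worker joins.
        .match(CurrentClusterState.class, state -> {
          for (Member member : state.getMembers()) {
            if (member.status().equals(MemberStatus.up())) {
              register(member);
            }
          }
        })
        // Register with masters that come up after this worker.
        .match(MemberUp.class, up -> register(up.member()))
        .build();
  }

  private void register(Member member) {
    // Only dial in to nodes carrying the master/front-end role (assumed role name).
    if (member.hasRole("frontend")) {
      getContext()
          .actorSelection(member.address() + "/user/frontend")
          .tell("WorkerRegistration", getSelf());
    }
  }
}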
With the understanding that Ubernetes is designed to fully solve this problem, is it currently possible (not necessarily recommended) to span a single K8s/OpenShift cluster across multiple internal corporate datacenters?
Additionally, assume that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.
Example: given 3 corporate DCs, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC with pods/rc's/services/... being spun up across all 3 DCs.
Has someone implemented something like this as a stopgap solution before Ubernetes drops, and if so, how has it worked, and what considerations should be taken into account when running like this?
is it currently possible (not necessarily recommended) to span a single K8s/OpenShift cluster across multiple internal corporate datacenters?
Yes, it is currently possible. Nodes are given the address of an apiserver and client credentials and then register themselves into the cluster. Nodes don't know (or care) if the apiserver is local or remote, and the apiserver allows any node to register as long as it has valid credentials, regardless of where the node exists on the network.
Additionally, assume that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.
This is important, as many of the settings in Kubernetes assume (either implicitly or explicitly) a high bandwidth, low-latency network between the apiserver and nodes.
Example: given 3 corporate DCs, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC with pods/rc's/services/... being spun up across all 3 DCs.
The downside of this approach is that if you have one global cluster you have one global point of failure. Even if you have replicated, HA master components, data corruption can still take your entire cluster offline. And a bad config propagated to all pods in a replication controller can take your entire service offline. A bad node image push can take all of your nodes offline. And so on. This is one of the reasons that we encourage folks to use a cluster per failure domain rather than a single global cluster.
My understanding is:
The seed node maintains the list of all nodes in the cluster.
Let's say we have to add a new node to the cluster: we enter the new node's name in the seed list on the seed server, and then the new node becomes part of the ring.
I am assuming we don't have to mention anything about the seed server on the peer nodes.
Correct me if my understanding is incorrect.
I read somewhere that a failure of a "seed node" doesn't cause any problem. Let's say the seed node crashes; how is the ring information maintained?
I want to clarify, because that quote from the docs is old and was never exactly precise.
Even after bootstrapping, seed nodes still play a role in Gossip.
There is no additional impact if you have a seed node that goes down. Though if you need to replace a seed node you should follow the guide in the docs.
Details:
In addition to helping new nodes bootstrap, seed nodes are also used to prevent split brain in your cluster. A node finds out about other nodes when it handshakes with a node that already has information about other nodes from recent gossip operations.
Gossip.run() happens every second. In a single gossip run a node will handshake with:
1. one random live node;
2. one random dead node (if any), based on some probability;
3. one random seed node, if the node chosen in step 1 wasn't a seed, also based on some probability.
The larger your list of seed nodes, the more nodes you will be handshaking with. Per this logic, the probabilistic frequency of handshakes with the seed nodes will increase as your proportion of seed nodes increases.
However, as noted above, step 3 only happens if step 1 did not land on a seed node. So the probability of having to do step 3 increases with added seeds, maxing out at the point where half your nodes are seeds (a .25 chance), and then decreases again.
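To make the effect of the seed-node proportion concrete, here is a rough sketch of the selection logic described above; it is an illustration of the description only, not Cassandra's actual Gossiper code, and the probability formulas are placeholders:

import java.util.List;
import java.util.Random;

public class GossipRoundSketch {
  private final Random random = new Random();

  // One simulated gossip round following the three steps described above.
  void runOnce(List<String> live, List<String> dead, List<String> seeds) {
    if (live.isEmpty()) {
      return;
    }

    // Step 1: handshake with one random live node.
    String liveTarget = live.get(random.nextInt(live.size()));
    handshake(liveTarget);

    // Step 2: maybe handshake with one random dead node (placeholder probability).
    if (!dead.isEmpty() && random.nextDouble() < (double) dead.size() / (live.size() + 1)) {
      handshake(dead.get(random.nextInt(dead.size())));
    }

    // Step 3: maybe handshake with one random seed, but only if the node from
    // step 1 was not already a seed (placeholder probability).
    if (!seeds.isEmpty()
        && !seeds.contains(liveTarget)
        && random.nextDouble() < (double) seeds.size() / (live.size() + 1)) {
      handshake(seeds.get(random.nextInt(seeds.size())));
    }
  }

  private void handshake(String node) {
    System.out.println("gossip handshake with " + node);
  }
}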
It is recommended to keep 3 seed nodes per DC; do not add all your nodes as seed nodes.
It is the other way round: in the configuration of your new node you point to another, already existing node as the seed provider. The seed provider is the initial contact point for a new node joining a cluster. After the node has joined the cluster it remembers the topology and does not require the seed provider anymore.
From the Cassandra docs:
Note: The seed node designation has no purpose other than bootstrapping the gossip process for new nodes joining the cluster. Seed nodes are not a single point of failure, nor do they have any other special purpose in cluster operations beyond the bootstrapping of nodes.