Is it possible to assign a different master node to each floating IP in Keepalived? - haproxy

We have X floating IPs that deal with different types of traffic. At the moment, only one node of Y nodes is assigned to be the master so all of the traffic goes through it before being distributed. I'm looking to run multiple instances such that each floating IP gets a different master node. Is this possible?
This seems somewhat applicable to my case. The person was in a situation wherein they needed to set up two VRRP instances on the same interface. I'm not sure, however, if this then gives the desired behaviour I'm looking for.
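One way to picture the multi-instance idea is to give every floating IP its own VRRP instance and rotate the priorities, so that a different node has the highest priority for each IP. The following is only a sketch under assumptions: the helper names, the `eth0` interface, the base priority of 150 and the step of 10 are all hypothetical, and the rendered block uses only the basic `vrrp_instance` keywords. Each instance on the same interface needs its own `virtual_router_id`, and VRRP priorities must stay within 1 to 254.

```python
def vrrp_priorities(vips, nodes, base=150, step=10):
    """Return {node: {vip: priority}} so that each VIP prefers a different node.

    Hypothetical helper: the preference order is rotated by one node per VIP,
    so VIP 0 prefers nodes[0], VIP 1 prefers nodes[1], and so on.
    """
    plan = {node: {} for node in nodes}
    for i, vip in enumerate(vips):
        for j, node in enumerate(nodes):
            rank = (j - i) % len(nodes)  # 0 means "preferred master" for this VIP
            plan[node][vip] = base - rank * step
    return plan


def render_instance(name, vip, priority, router_id, interface="eth0"):
    """Render one vrrp_instance block of keepalived.conf as text."""
    return (
        f"vrrp_instance {name} {{\n"
        f"    state BACKUP\n"
        f"    interface {interface}\n"
        f"    virtual_router_id {router_id}\n"
        f"    priority {priority}\n"
        f"    virtual_ipaddress {{\n"
        f"        {vip}\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    vips = ["10.0.0.10", "10.0.0.11"]  # example addresses, not from the question
    nodes = ["node-a", "node-b", "node-c"]
    plan = vrrp_priorities(vips, nodes)
    # Print the configuration that node-b would carry.
    for idx, vip in enumerate(vips):
        print(render_instance(f"VI_{idx}", vip, plan["node-b"][vip], 51 + idx))
```

With this rotation, the node holding the highest priority for a given instance wins the election for that IP, and the next node in the rotation takes over if it fails. Whether this matches the desired traffic split would still need to be verified on the actual setup.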

Related

Can k8s work with an even number of master nodes

I know it is recommended to have an odd number of master nodes. But will k8s work if we have an even number of nodes? And what are the downsides?
The reason I'm asking is that I'm building an IoT cluster, where every node is a master node. All devices are the same and any device must be able to take up the master role if the current master fails.
Also, the number of devices is arbitrary, so the system should work with both odd and even numbers of nodes.
https://discuss.kubernetes.io/t/high-availability-host-numbers/13143/2 says that you should avoid ever having more than 7 master nodes due to the overhead of membership algorithms, so depending on how many IoT nodes you have, you should consider a different architecture.
Nodes are supposed to be abstracted away from their purpose, so you shouldn't need your user nodes to be the system nodes, and doing so might introduce tight-coupling problems later on.

akka cluster : configuration seed node

I have a simple question.
Does it make sense to configure all the nodes of the akka cluster as seed nodes?
example:
cluster {
  seed-nodes = [
    "akka://application@127.0.0.1:2551",
    "akka://application@127.0.0.1:2552",
    "akka://application@127.0.0.1:2553",
    "akka://application@127.0.0.1:2554",
    "akka://application@127.0.0.1:2555",
    "akka://application@127.0.0.1:2556",
    "akka://application@127.0.0.1:2557",
    "akka://application@127.0.0.1:2558",
    "akka://application@127.0.0.1:2559",
    "akka://application@127.0.0.1:2560",
    "akka://application@127.0.0.1:2561",
    "akka://application@127.0.0.1:2562"]
  downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  split-brain-resolver {
    active-strategy = static-quorum
    static-quorum {
      quorum-size = 7
    }
  }
}
Are there disadvantages for this configuration?
I guess the answer has to be "it depends".
Seed nodes are one mechanism that lets new nodes join an Akka cluster.
For your example to work you have to run all the nodes on the same host. I am guessing you're passing some JVM argument like -Dakka.remote.artery.canonical.port=2*** to bind each node to different port. That's fine, it will work. A new node starting up will try to join the cluster by contacting the seed nodes starting from the first until one of them responds.
In practice you probably want the cluster nodes running on different machines and that's when a static configuration like the one in your example can become a bit of a pain. This is because you'd need to know all the IP addresses beforehand and would need to guarantee that they will not change over time. This is perhaps possible in a network with statically assigned IPs but is nearly impossible with dynamically assigned IPs or in environments like Kubernetes. This is why there are other methods of cluster joining implemented (https://doc.akka.io/docs/akka/current/discovery/index.html).
So the disadvantage I see here is how limiting this configuration is in any real-life scenario. As long as you're doing this to learn or experiment with Akka Cluster, it's all fine. You can also argue, though, that in that case a list of 12 seed nodes does not give you much advantage over, say, 2 seed nodes, as long as you can keep those up and running for the duration of your experiment so that all the nodes can join the cluster.
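As a side note on the `static-quorum` settings in the question's configuration, the arithmetic can be sketched as follows. This is a simplified model, not Akka's implementation: the function names are hypothetical, and it only captures the rule that a side of a partition survives when it can still reach at least `quorum-size` members, together with the documented constraint that the cluster should not grow beyond `quorum-size * 2 - 1` members.

```python
def static_quorum_decision(reachable_members, quorum_size):
    """Return 'keep' if this side of a partition may stay up, else 'down'."""
    return "keep" if reachable_members >= quorum_size else "down"


def max_safe_cluster_size(quorum_size):
    """Largest cluster in which two disjoint sides cannot both reach the quorum."""
    return 2 * quorum_size - 1


if __name__ == "__main__":
    quorum_size = 7  # value from the question's configuration
    print("max safe size:", max_safe_cluster_size(quorum_size))
    # A 12-node cluster split 7/5 keeps the larger side.
    print("7 reachable:", static_quorum_decision(7, quorum_size))
    print("5 reachable:", static_quorum_decision(5, quorum_size))
    # A 12-node cluster split 6/6 leaves no side with a quorum.
    print("6 reachable:", static_quorum_decision(6, quorum_size))
```

Under this model, 12 nodes with a quorum size of 7 stays within the limit, but an even 6/6 split would down both halves.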

Kubernetes : Disadvantages of an all Master cluster

Hi!
I was wondering whether it is possible to replicate a VMware-style architecture in Kubernetes.
What I mean by that:
Instead of always keeping the control plane separate from the worker nodes, I would like to put them all together, so that we end up with a cluster of master nodes on which we can also schedule applications. For now I'm using kata-container with containerd, so all applications are deployed in 'mini' VMs and there is no 'escape from the container' problem. The cluster would be managed through a dedicated interface (eth0, 1Gb). Users would reach the apps deployed within the cluster through another interface (eth1, 10Gb). I would use Keepalived and HAProxy to elect my 'Main Master' and load balance the traffic.
The question might be 'why would you do that?' The goal is to ensure high availability at all times and to reduce management overhead: instead of having two sets of "entities" to manage (the control plane and the worker nodes), there would be just one. That way there wouldn't be problems such as 'fewer than 50% of my masters are online, so no leader can be elected', where I would have to remove master nodes from the cluster until more than 50% of the remaining masters are online. That calls for fast technical intervention, which invites human error.
Another positive point would be scaling: instead of having two parts of the cluster to scale (masters and workers), there would be only one. I would add another master/worker to the cluster and that's it. All the management traffic would be directed to the Main Master, which holds a Virtual IP (VIP), and in case of overload the requests would be redirected to another node.
In the end I would have something resembling this:
Photo - Architecture VMWare-like
I'm trying to find the disadvantages of this kind of architecture. I know that there would be etcd traffic on each node, but how impactful is it? I know that resources will be wasted on the control-plane Pods on each node, but given that these Pods (except etcd) won't do much besides waiting, how impactful would that be? With every node capable of taking the master role, there won't be any downtime. Right now, if my control plane (3 masters) goes down, I have to reboot the masters or find a fix as fast as possible, before a problem shows up in one of the apps running on the worker nodes.
The topology I'm using right now resembles the following :
Architecture basic Kubernetes
I'm new to Kubernetes, so the question might seem stupid, but I would really like to know the advantages and disadvantages of the two and understand why it wouldn't be a good idea.
Thanks a lot for any help!
There are two reasons for keeping control planes on their own. The big one is that you only want a small number of etcd nodes, usually 3 or 5, and that's usually the bounding factor on the size of the control plane. You usually want the ability to scale worker nodes independently of that. The second issue is that etcd is very sensitive to IOPS brownouts and can suffer bad cascading failures if the machine runs low on IOPS.
And given that you are doing things on top of VMWare anyway, the overhead of managing 3 vs 6 VMs is not generally a difference in kind. This seems like a false savings in the long run.

Single Kubernetes/OpenShift cluster/instance across datacenters?

With the understanding that Ubernetes is designed to fully solve this problem, is it currently possible (not necessarily recommended) to span a single K8/OpenShift cluster across multiple internal corporate datacenters?
Additionally assuming that latency between data centers is relatively low and that infrastructure across the corporate data centers is relatively consistent.
Example: Given 3 corporate DC's, deploy 1..* masters at each datacenter (as a single cluster) and have 1..* nodes at each DC with pods/rc's/services/... being spun up across all 3 DC's.
Has someone implemented something like this as a stop gap solution before Ubernetes drops and if so, how has it worked and what would be some considerations to take into account on running like this?
is it currently possible (not necessarily recommended) to span a
single K8/OpenShift cluster across multiple internal corporate
datacenters?
Yes, it is currently possible. Nodes are given the address of an apiserver and client credentials and then register themselves into the cluster. Nodes don't know (or care) whether the apiserver is local or remote, and the apiserver allows any node to register as long as it has valid credentials, regardless of where the node exists on the network.
Additionally assuming that latency between data centers is relatively
low and that infrastructure across the corporate data centers is
relatively consistent.
This is important, as many of the settings in Kubernetes assume (either implicitly or explicitly) a high bandwidth, low-latency network between the apiserver and nodes.
Example: Given 3 corporate DC's, deploy 1..* masters at each
datacenter (as a single cluster) and have 1..* nodes at each DC with
pods/rc's/services/... being spun up across all 3 DC's.
The downside of this approach is that if you have one global cluster you have one global point of failure. Even if you have replicated, HA master components, data corruption can still take your entire cluster offline. And a bad config propagated to all pods in a replication controller can take your entire service offline. A bad node image push can take all of your nodes offline. And so on. This is one of the reasons that we encourage folks to use a cluster per failure domain rather than a single global cluster.

Why does a mongodb replica set need an odd number of voting members?

I find the replica set requirement a bit confusing, and I'm probably missing something obvious (like under which conditions there are elections).
I understand that in normal operations you need quorum, that a vote takes place, and that to get a majority you need an odd number of machines.
But since we use a replica set for failover, if the master dies, we are left with an even number of voting members, which based on my limited experience lengthens the time to elect a primary.
Also, according to the documentation, the addition of a voting member doesn't start an election, so it would seem that starting (booting) your replica set with an even number of nodes would make more sense?
So if we start say with 4 machines in the replica set, and one machine dies, there is a re-election with 3 machines, fast quorum. We add a machine back to get back to our normal operation state, no re-election and we are back to our normal operation conditions.
Can someone shed some light on this?
TL;DR: With single master systems, even partitions make it impossible to determine which remainder still has a majority, taking both systems down.
Let N be a cluster of four machines:
One machine dies, the others resume operation. Good.
Two machines die, we're offline because we no longer get a majority. Bad.
Let M be a cluster of three machines:
One machine dies, the others resume operation. Good.
Two machines die, we're offline because we no longer get a majority. Bad.
=> Same result at 3/4 of the cost.
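The comparison of N and M above comes down to simple majority arithmetic. A minimal sketch, assuming a plain strict-majority rule (more than half of the voting members must be reachable) and using hypothetical function names:

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def tolerated_failures(voting_members):
    """How many members can fail while a majority is still available."""
    return voting_members - majority(voting_members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"members={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Going from an odd number of members to the next even number raises the required majority without raising the number of failures that can be tolerated, which is the "same result at 3/4 of the cost" observation for three versus four machines.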
Now, let's add an assumption or two:
We're also going to operate some kind of server application that uses the database
The network can be partitioned
Let's say you have two datacenters, one with two database instances and the backend server machines. If the connection to the backup center (which has one MongoDB instance) fails, you're still online.
Now if you added a second MongoDB instance at the backup data center, a network partition would, despite seemingly higher redundancy, yield lower availability since we'd lose the majority in case of a network partition and can't continue to operate.
=> Less availability at higher cost. But that doesn't answer the question yet.
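The two-datacenter scenario can be checked the same way. The sketch below is an illustration under a strict-majority assumption, with a hypothetical function name: it takes the total number of voting members and the sizes of the groups a partition produces, and reports which groups can still elect a primary.

```python
def sides_with_majority(total_members, side_sizes):
    """For each side of a partition, report whether it holds a strict majority."""
    needed = total_members // 2 + 1
    return [size >= needed for size in side_sizes]


if __name__ == "__main__":
    # Three members: two in the main datacenter, one in the backup center.
    print("2 + 1:", sides_with_majority(3, [2, 1]))
    # Four members: two in each datacenter.
    print("2 + 2:", sides_with_majority(4, [2, 2]))
```

In the 2 + 1 layout the main datacenter keeps its majority when the link fails, while in the 2 + 2 layout neither side does, so the extra member lowers availability during a partition.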
Let's say you're really worried about availability: you have two data centers, with backend servers in both, anycast IPs, the whole deal. Now the network between the two DCs is partitioned, but some clients connect to DC A while others reach DC B. How do you now determine which datacenter may accept writes? It's not possible, and this is why the odd number is necessary.
You don't actually need anycast IPs, BGP or any fancy stuff for the problem to become real. Any writing application (like a worker, a stale request, anything) would later require merging different writes, which is a completely different concurrency scheme.