Avoid a k8s master node / etcd node becoming a leader - kubernetes

For HA and quorum I will install three master / etcd nodes in three different data centers.
But I want to configure one node to never become a leader, acting only as a follower for etcd quorum.
Is this possible?

I believe this is not a supported option today, and it is not recommended.
What you want is a 3-node control plane (including etcd) where one of the nodes participates in leader election but never becomes leader and doesn't store data. You are looking for something like the ARBITER feature that exists in MongoDB HA clusters.
An ARBITER feature is not supported in etcd; you would need to raise a PR to get that addressed.
Also, the controller manager and scheduler always connect to the local apiserver. If you wanted to route those calls to the apiserver on the active master instead, you would need to open another PR with the Kubernetes community to get that addressed.
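That said, while etcd has no "never become leader" flag, leadership can be nudged away after the fact: the v3 maintenance API exposes a leadership-transfer call. Below is a minimal sketch of an external watchdog using the Go client; the endpoints and the member name "dc3-arbiter" are assumptions for illustration, and this only reacts to an unwanted leader rather than preventing one.

```go
// Hypothetical watchdog: if the member we never want leading has become
// leader, transfer leadership to any other member via the maintenance API.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	const neverLead = "dc3-arbiter" // assumed name of the member we never want leading

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://dc1:2379", "https://dc2:2379", "https://dc3:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Ask one endpoint who the current leader is.
	status, err := cli.Status(ctx, cli.Endpoints()[0])
	if err != nil {
		log.Fatal(err)
	}
	members, err := cli.MemberList(ctx)
	if err != nil {
		log.Fatal(err)
	}

	var leaderIsArbiter bool
	var transferee uint64
	for _, m := range members.Members {
		if m.ID == status.Leader && m.Name == neverLead {
			leaderIsArbiter = true
		}
		if m.ID != status.Leader && m.Name != neverLead && transferee == 0 {
			transferee = m.ID // any other member is an acceptable leader
		}
	}

	// If the unwanted member is leading, hand leadership to another member.
	// (The MoveLeader request must reach the current leader.)
	if leaderIsArbiter && transferee != 0 {
		if _, err := cli.MoveLeader(ctx, transferee); err != nil {
			log.Fatal(err)
		}
		log.Printf("leadership moved off %s to member %x", neverLead, transferee)
	}
}
```

The same operation is also available from the command line as etcdctl move-leader.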

Related

Can etcd detect problems and elect leaders for other clusters?

To my knowledge, etcd uses Raft as its consensus and leader election algorithm, maintaining a leader that is in charge of keeping the ensemble of etcd nodes in sync with data changes within the etcd cluster. Among other things, this allows etcd to recover from node failures in the cluster where etcd runs.
But what about etcd managing other clusters, i.e. clusters other than the one where etcd runs?
For example, say we have an etcd cluster and, separately, a DB cluster (e.g. MySQL or Redis) comprising master (read and write) node(s) and (read-only) replicas. Can etcd manage node roles for this other cluster?
More specifically:
Can etcd elect a leader for clusters other than the one running etcd and make that information available to other clusters and nodes?
To make this more concrete, using the example above: say a master node in the MySQL DB cluster goes down. Note again that the master and replicas for the MySQL DB run on a different cluster from the nodes running and hosting etcd data.
Does etcd provide capabilities to automatically detect this type of node failure in clusters other than its own? If yes, how is this done in practice (e.g. for a MySQL DB or any other cluster where nodes can take on different roles)?
After detecting such a failure, can etcd re-arrange node roles in this separate cluster (i.e. designate a new master and replicas), and would it use the Raft leader election algorithm for this as well?
Once it has done so, can etcd also notify client (application) nodes that depend on this DB, so they can reconfigure accordingly?
Finally, does any of the above require Kubernetes? Or can etcd manage external clusters all by its own?
In case it helps, here's a similar question for ZooKeeper.
etcd's master election is strictly for electing a leader for etcd itself.
Other clusters, however, can use a distributed, strongly consistent key-value store (such as etcd) to implement their own failure detection and leader election, and to allow clients of that cluster to respond.
Etcd doesn't manage clusters other than its own. It's not magic awesome sauce.
If you want to use etcd to manage a MySQL cluster, you will need a component that manages the MySQL nodes and stores the cluster state in etcd. Clients can watch for changes in that cluster state and adjust, as in the sketch below.
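As a sketch of that pattern, etcd's Go client ships a concurrency package with ready-made leader election that such a manager component could use. The endpoints, the key prefix, and the advertised address below are assumptions:

```go
// A manager component campaigns in etcd to decide which MySQL node is
// primary; client applications observe the result.
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://etcd:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// The session's lease is the failure detector: if this manager process
	// dies, the lease expires and its candidacy is dropped automatically.
	sess, err := concurrency.NewSession(cli, concurrency.WithTTL(10))
	if err != nil {
		log.Fatal(err)
	}
	defer sess.Close()

	election := concurrency.NewElection(sess, "/mysql/primary")

	// Campaign blocks until this candidate wins; the value is what clients
	// will read (here, the address of the MySQL node this manager fronts).
	if err := election.Campaign(context.Background(), "mysql-1:3306"); err != nil {
		log.Fatal(err)
	}
	log.Println("won election: promote the local MySQL node to primary here")

	// Client applications watch the same key to follow primary changes.
	for resp := range election.Observe(context.Background()) {
		log.Printf("current primary: %s", resp.Kvs[0].Value)
	}
}
```

Note that etcd only stores the election state; promoting and demoting the actual MySQL nodes remains the job of the component you write.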

Impact of master node and etcd failure in a k8s cluster

I want to understand the possible impact of a master node failure in a k8s cluster with only one master node and an internal etcd store.
As per my understanding, all kinds of deployed workload containers (including stateless and stateful sets with persistent volume claims) running on the worker nodes would keep running until recreation of a container is required, as they have no direct functional dependency on the master node or the etcd store for their core functions. The unavailability of the master node only affects the control plane operations for the cluster.
Is my understanding correct? If not, could you please explain the impact of the master node failure on my workload running on that cluster?
I understand that the best way to achieve HA for a k8s cluster is to set up a multi-master cluster, possibly with externalized etcd stores as well for decoupling. This question is to understand the exact impact of a master node failure, to make an informed call before configuring a multi-master cluster.
Etcd operates on a quorum system, so as long as the cluster sees a majority it will continue operating. If the failed node was the current leader, the others trigger an election after the heartbeat timeout.
For kube-apiserver, it's a horizontally scaled service, so losing a node is not interesting, just like any other web app. Some (most) controllers are singletons, but they run on every control plane node and use the kube-apiserver for leader election, so as with etcd, if the leader dies then a few seconds later another copy will get the leader lock and take over.
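For reference, this lock-based election is available to your own controllers through client-go's leaderelection package, which is the same mechanism the built-in controllers use. A minimal sketch follows; the lease name, namespace, and timings are illustrative assumptions:

```go
// Every replica runs this; only the one holding the Lease lock is active.
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	id, _ := os.Hostname() // each replica needs a distinct identity

	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "my-controller", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long a dead leader's lock lingers
		RenewDeadline: 10 * time.Second,
		RetryPeriod:   2 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Println("got the lock: this replica is the active singleton")
			},
			OnStoppedLeading: func() {
				log.Println("lost the lock: a standby replica takes over shortly")
			},
		},
	})
}
```

The LeaseDuration is why failover takes "a few seconds": a standby cannot grab the lock until the dead leader's lease has gone unrenewed for that long.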

Why would we need more than 3 masters in a Kubernetes HA cluster

I see that most of the K8s master components have a leader election process, except the apiserver. If only one node will be the leader at any point in time, why would we need more than 3 masters for a bigger k8s cluster?
The requirement of a minimum of 3 hosts comes from the fact that a Kubernetes HA cluster uses etcd for storing and syncing configuration, and etcd requires a minimum of 3 nodes to ensure HA. In the general case we need to use the n+1 model when deploying a Kubernetes HA cluster.
In a single-master setup, the master node manages the etcd database, API server, controller manager and scheduler, along with the worker nodes. However, if that single master node fails, the worker nodes can no longer be managed, and if its etcd data cannot be recovered the entire cluster is lost.
In a multi-master setup, by contrast, the masters provide high availability for a single cluster and improve network performance because they behave like a unified data center.
A multi-master setup protects against a wide range of failure modes, from the loss of a single worker node to the failure of a master node's etcd service. By providing redundancy, a multi-master cluster serves as a highly available system for your end users.
Do not use a cluster with two master replicas: consensus in a two-replica cluster requires both replicas to be running when changing persistent state, so the failure of either replica turns the cluster into a majority-failure state. A two-replica cluster is thus inferior, in terms of HA, to a single-replica cluster.
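The quorum arithmetic behind that advice fits in a few lines; a quick sketch:

```go
// A Raft cluster of n members needs floor(n/2)+1 of them up to make
// progress, so it tolerates n - (floor(n/2)+1) failures.
package main

import "fmt"

func main() {
	for n := 1; n <= 5; n++ {
		quorum := n/2 + 1 // Go integer division = floor
		fmt.Printf("members=%d quorum=%d tolerated failures=%d\n", n, quorum, n-quorum)
	}
	// members=1 quorum=1 tolerated failures=0
	// members=2 quorum=2 tolerated failures=0  <- no better than 1, more ways to fail
	// members=3 quorum=2 tolerated failures=1
	// members=4 quorum=3 tolerated failures=1  <- no better than 3
	// members=5 quorum=3 tolerated failures=2
}
```

This is why clusters are sized with odd member counts: each even step adds a failure mode without adding fault tolerance.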
Here is some useful documentation: kubernetes-ha-cluster, creating-ha-cluster.
Articles: ha-cluster, ha.

Kubernetes: Disaster recovery

I am currently working on a Kube deployment. Unfortunately we have only 2 locations, and the requirement is that we must tolerate the failure of one location. It is not required to be automatic or zero-downtime (so it is not expected to be HA; we just want disaster recovery).
I am completely aware that a 2-master cluster has lower availability than a 1-master cluster. But the idea is that with two masters, when one location fails (or is taken down), an admin will spin up a new master in the surviving location, restoring the majority.
Expected advantage over a single master: the state of the cluster should survive a location failure (we also have Postgres and other stateful components in Kube, and we want Postgres to promote its slave to master if the node with the master is taken down).
Is this approach sane? Or is there some other way how to deploy Kube in 2 locations?
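Whatever topology is chosen, the manual recovery described above hinges on the etcd state surviving the failed location. As one building block, here is a minimal sketch of pulling an etcd snapshot through the Go client's maintenance API; the endpoint and backup path are assumptions, and the file can later be restored into a fresh member with etcdctl snapshot restore:

```go
// Periodically pull an etcd snapshot so the cluster state survives the
// loss of a location.
package main

import (
	"context"
	"io"
	"log"
	"os"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-loc1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()

	// Stream the backend database from one member.
	rc, err := cli.Snapshot(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer rc.Close()

	f, err := os.Create("/backups/etcd.snapshot")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if _, err := io.Copy(f, rc); err != nil {
		log.Fatal(err)
	}
	log.Println("snapshot saved; restore into a fresh member with 'etcdctl snapshot restore'")
}
```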

What is the difference between the master node gateway and other node gateways in Elasticsearch? Both of them store metadata, don't they?

What information can we get from this metadata that Elasticsearch stores?
The master node is the same as any other node in the cluster, except that it has been elected to be the master.
It is responsible for coordinating any cluster-wide changes, such as the addition or removal of a node, creation, deletion or change of state (i.e. open/close) of an index, and the allocation of shards to nodes. When any of these changes occur, the "cluster state" is updated by the master and published to all other nodes in the cluster. It is the only node that may publish a new cluster state.
The tasks that a master performs are lightweight. Any tasks that deal with data (e.g. indexing, searching, etc.) do not need to involve the master. If you choose to run the master as a non-data node (i.e. a node that acts as master and as a router, but doesn't contain any data) then the master can run happily on a smallish box.
A node is allowed to become a master if it is marked as "master eligible" (which all nodes are by default). If the current master goes down, a new master will be elected by the cluster.
An important configuration option in your cluster is minimum_master_nodes. This specifies the number of "master eligible" nodes that a node must be able to see in order to be part of a cluster. Its purpose is to avoid "split brain", i.e. having the cluster separate into two clusters, both of which think that they are functioning correctly.
For instance, if you have 3 nodes, all of which are master eligible, and set minimum_master_nodes to 1, then if the third node is separated from the other two, it still sees one master-eligible node (itself) and thinks that it can form a cluster by itself.
If you instead set minimum_master_nodes to 2 in this case (number of nodes / 2 + 1), then when the third node is separated it won't see enough master-eligible nodes, and thus won't form a cluster by itself. It will keep trying to join the original cluster.
While Elasticsearch tries very hard to choose the correct defaults, minimum_master_nodes is impossible to guess, as it has no way of knowing how many nodes you intend to run. This is something you must configure yourself.
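For the 3-node example above, the corresponding setting in pre-7.x Elasticsearch is a single line of elasticsearch.yml (from 7.0 onwards this setting is removed and the quorum is managed automatically):

```yaml
# elasticsearch.yml on each of the 3 master-eligible nodes:
# quorum = (master-eligible nodes / 2) + 1 = 3/2 + 1 = 2
discovery.zen.minimum_master_nodes: 2
```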