I have 1 master node and 2 slave nodes on JBoss EAP 7.
All 3 nodes are currently up and running.
- If the master node goes down suddenly, will the slave nodes still be able to run on their own and work normally? Is this possible, and what is the disadvantage of running the slave nodes alone in this scenario?
- If one of the slave nodes goes down while the master is down, can that slave node be brought back up independently and work normally?
If the DC (master node) goes down, the slave nodes will still be up and running.
The problem is that the slave nodes cannot be restarted unless the master node (DC) is up, because the DC holds the whole domain configuration. That's why it is very important to consider the HA of the Domain Controller from the beginning.
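A common way to soften that dependency (a sketch, assuming the default EAP 7 domain layout and the shipped host-slave.xml; adjust names and paths to your setup) is to start the slave host controllers so that they keep a local copy of the domain configuration and can boot from it if the DC is unreachable:

# on each slave host controller, while the DC is reachable:
# keep a local cache of the domain configuration
$EAP_HOME/bin/domain.sh --host-config=host-slave.xml --backup

# if the DC is down and the slave must be (re)started anyway:
# boot from the cached copy (read-only until the DC is back)
$EAP_HOME/bin/domain.sh --host-config=host-slave.xml --cached-dc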
Is there any way to convert/change a multi-master setup (3 masters, HA & LB) to a single master in a stacked etcd configuration?
With 3 master nodes, the cluster only tolerates 1 failure, right?
So if 2 of these master nodes go down, the control plane won't work.
What I need to do is convert these 3 masters to a single master. Is there any way to do this to minimize control-plane downtime (in case the other 2 masters need some time to come back up)?
The test I've done:
I've tried restoring the etcd snapshot to a completely different environment with a fresh setup of 1 master & 2 workers, and it seems to work fine: the status of the other 2 master nodes is NotReady, the 2 worker nodes are Ready, and requests to the api-server work normally.
But if I restore the etcd snapshot to the original environment, after resetting the last master node with kubeadm reset, the cluster seems to be broken: the status of the 2 workers is NotReady, and it looks like the certificates no longer match.
Any suggestion on how to make this work?
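For context, the restore on the test environment was done roughly like this (paths and flags are from memory, so treat them as approximate):

# snapshot taken on the surviving master
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# restored on the target master, then the etcd static pod manifest
# (/etc/kubernetes/manifests/etcd.yaml) was pointed at the new data dir
ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db --data-dir=/var/lib/etcd-restored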
UPDATE: So apparently I can restore the etcd snapshot directly without doing "kubeadm reset"; and even if I do reset, as long as I update the certificates, the cluster can be restored successfully.
BUT now I run into a different issue. After restoring the etcd snapshot everything works fine, and I now want to add a new control plane node to this cluster. The current node status is:
master1 ready
master2 not-ready
master3 not-ready
Before adding the new control plane, I removed the 2 failed master nodes from the cluster. After removing them I tried to join the new control plane to the cluster, and the join process got stuck at:
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
The original master node is broken again, and now I can't access the api-server. Do you guys have any idea what's going wrong?
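For reference, the removal of the two failed masters was done with something along these lines (node names as in the status output above):

# remove the Node objects for the failed control-plane nodes
kubectl delete node master2
kubectl delete node master3
# note: this only deletes the Kubernetes Node objects; the stacked etcd
# member list is tracked separately and may still contain the old members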
I have a K8s cluster with 3 nodes (VMs) acting as both master and worker ("untainted" masters), with etcd installed on all 3 nodes, deployed via kubespray (a kubeadm-based tool).
Now I would like to replace one VM with another.
Is there a direct way to do so? The only workaround I can see is adding 2 nodes (e.g. node4 and node5) via kubespray's scale.yml, so that the etcd member count always stays odd, and then removing node3 and node5 while keeping node4.
I don't like that approach.
Any ideas are welcome.
Best regards
If you have 3 main (please avoid using "master") control plane nodes, you should be fine replacing 1 at a time. With 2 of the 3 members still up, etcd keeps its quorum, so the cluster can still make decisions and schedule new workloads, and the existing workloads keep running. The caveat is that while one node is out, you cannot afford a second failure.
The recommendation of 5 main nodes is based on keeping a comfortable majority for etcd quorum even while a node is down: with 5 nodes you can lose 2 and still reach quorum on state decisions, so taking one node down for replacement still leaves room for one unexpected failure.
In other words:
3 main nodes
Can tolerate a failure of one node; 2 of 3 is still a majority, so decisions can still be made.
A second failure loses quorum, and no further decisions can be made.
5 main nodes
Can tolerate failures of up to two nodes.
Can still make decisions with one or two nodes down, because 4 or 3 members is still a majority.
A third failure loses quorum.
To summarize, Raft, the consensus protocol used by etcd, tolerates up to (N-1)/2 failures and requires a majority, or quorum, of (N/2)+1. The recommended procedure is to update the nodes one at a time: bring one node down, bring another one up, and wait for it to join the cluster (all control plane components) before moving on to the next.
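With kubespray, a one-node-at-a-time replacement could look roughly like this (a sketch based on my reading of the kubespray docs; the inventory path and node names are placeholders):

# remove the VM being replaced; remove-node.yml drains it and removes it
# from the cluster (including its etcd membership)
ansible-playbook -i inventory/mycluster/hosts.yaml remove-node.yml -e node=node3

# add the replacement VM to the inventory, then run the full playbook again;
# for control-plane/etcd nodes kubespray expects cluster.yml rather than scale.yml
ansible-playbook -i inventory/mycluster/hosts.yaml cluster.yml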
I have a Kubernetes cluster working fine, with one master node and 5 worker nodes, all of them running pods. If the Kubernetes master goes down / is powered off while all the nodes are up, will the worker nodes keep working normally?
If the master node is down and one of the worker nodes also goes down and then comes back online after some time, will its pods automatically be started on that worker while the master node is still down?
If the Kubernetes master goes down / is powered off while all the nodes are up, will the worker nodes keep working normally?
Yes, they will work in their last state.
If the master node is down and one of the worker nodes also goes down and then comes back online after some time, will its pods automatically be started on that worker while the master node is still down?
No.
As you can read in the Kubernetes Components section:
Master components provide the cluster's control plane. Master components make global decisions about the cluster (for example, scheduling), and detecting and responding to cluster events (starting up a new pod when a replication controller's "replicas" field is unsatisfied).
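One way to see this (the exact command depends on your container runtime, so take it as an illustration) is to look at a worker directly while the API server is unreachable; kubectl itself will not work, but the containers keep running:

# fails while the master / API server is down
kubectl get pods
# on the worker node itself, the containers are still there
docker ps     # Docker runtime
crictl ps     # containerd / CRI-O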
I have a three-node RabbitMQ cluster (version 3.4.4 on Ubuntu 14.04.3) with approximately 80 queues, 40 of which are ha queues (policy {"ha-mode":"exactly","ha-params":3}). The three-node cluster was increased to 5 nodes. When it was at 5 nodes the queues rebalanced as expected, so that some queues moved ownership to the new cluster nodes.
A few hours later the new nodes were removed from the cluster via the following command sequence on the two new nodes:
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
The cluster_status on all three remaining nodes is consistent and shows only the three remaining nodes as expected.
Cluster status of node 'rabbit@prod-api-02' ...
[{nodes,[{disc,['rabbit@prod-api-01',
                'rabbit@prod-api-02',
                'rabbit@prod-api-03']}]},
 {running_nodes,['rabbit@prod-api-03',
                 'rabbit@prod-api-01',
                 'rabbit@prod-api-02']},
 {cluster_name,<<"rabbit@localhost.localdomain">>},
 {partitions,[]}]
However, the cluster was left in a state where the three nodes of the cluster believe that two of the ha queues that moved ownership when the cluster was expanded are still owned by a node that is no longer part of the cluster, and it considers them to be in a 'down' state as a consequence. Further, it no longer shows the policy associated with the queues or the slave nodes.
rabbitmqctl list_queues name state policy pid slave_pids
production-process-queue down <rabbit@prod-rabbit-01.0.0.0>
delete_unused_things_queue down <rabbit@prod-rabbit-01.0.0.0>
Trying to delete the two queues returns an error saying the home node is down or inaccessible.
Has anyone experienced this type of problem before? Is there any way to change the ownership of these two queues back to one of the remaining cluster nodes, or a way to delete them? I do not need to preserve the queue content.
Had a similar problem.
It is possible to move the home node if you use the command forget_cluster_node for each of the two departed nodes. See https://www.rabbitmq.com/rabbitmqctl.8.html#forget_cluster_node.
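A sketch of that, run on one of the surviving cluster members (the node name below is the stale owner shown by list_queues above; repeat for each departed node):

# make the remaining cluster forget a departed node that will never come back
rabbitmqctl forget_cluster_node rabbit@prod-rabbit-01

# then re-check the queues
rabbitmqctl list_queues name state policy pid slave_pids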
Can Apache Mesos 'master' nodes be co-located on the same machine as Mesos 'slave' nodes? Similarly (for high-availability (HA) deploys), can the Apache Zookeeper nodes used in Mesos 'master' election be deployed on the same machines as Mesos 'slave' nodes?
Mesos recommends 3 'masters' be used for HA deploys, and Zookeeper recommends 5 nodes be used for its quorum election system. It would be nice to have these services running alongside Mesos 'slave' processes instead of committing 8 machines to effectively 'non-productive' tasks.
If such a setup is feasible, what are the pros/cons of such a setup?
Thanks!
You can definitely run a master, slave, and zk process all on the same node. You can even run multiple master and slave processes on the same node, provided you give them each unique ports, but that's only useful for a test cluster.
Typically we recommend running ZK on the same nodes as your masters, but if you have extra ZKs, you can certainly run them on slaves, or mix-and-match as you see fit, as long as all master/slave/framework nodes can reach the ZK nodes, and all slaves can reach the masters.
For a smaller cluster (<10 nodes) it could make sense to run a slave process on each master, especially since the standby masters won't be doing much. Even an active master for a small cluster uses only a small amount of cpu, memory, and network resources. Just make sure you adjust the --resources on that slave to account for the master's resource usage.
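For example (a sketch; the ZooKeeper endpoints and the amounts to reserve are placeholders, since the real master overhead depends on your cluster):

# on a machine that also runs a master and ZooKeeper, advertise less than
# the full box to the slave so the master keeps some headroom
mesos-slave --master=zk://zk1:2181,zk2:2181,zk3:2181/mesos \
    --resources="cpus:6;mem:14336;disk:100000"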
Once your cluster grows larger (especially >100 nodes) the network traffic to/from the master as well as its cpu/memory utilization becomes significant enough that you wouldn't want to run a mesos slave on the same node as the master. It should be fine to co-locate ZK with your master even at large scale.
You didn't specifically ask, but I'll also discuss where to run your framework schedulers (e.g. Spark, Marathon, or Chronos). These could be co-located with any of the other components, but they only really need to be able to reach the master and zk nodes, since all communication to slaves goes through the master. Some customers run the schedulers on master nodes, some run them on edge nodes (so users don't have access to the slaves), and others use meta-frameworks like Marathon to run other schedulers on slaves as Mesos tasks.