How to stop the entire Akka Cluster (Sharding) - scala

How do I stop the ENTIRE cluster with sharding (spanning multiple machines/nodes) from one actor?
I know I can stop the actor system on 'this' node with context.system.terminate().
I know I can stop the local Sharding Region.
I found .prepareForFullClusterShutdown() but it doesn't actually stop the nodes.
I suppose there is no single command to do that, but there must be some way to do this.

There's no out-of-the-box way to do this that I'm aware of: the overall expectation is that there's an external control plane (e.g. Kubernetes) which manages this.
However, one could have an actor on every node of the cluster that listens for membership events and also subscribes to a pub-sub topic. This actor would track the current cluster membership and, when told to begin a cluster shutdown, it publishes a ShutdownCluster message (for example) to the topic and tracks which nodes leave. After some length of time (since distributed pub-sub is at-most-once), if there are nodes besides this one that haven't left, it publishes the message again. Eventually, after all other nodes in the cluster have left, this actor shuts down its own node. When the other nodes see a ShutdownCluster message, they immediately shut themselves down.
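A rough sketch of such a coordinator, assuming Akka 2.6 classic actors with Cluster and Distributed PubSub (the ShutdownCoordinator actor, the message names and the "cluster-shutdown" topic are all illustrative, not an existing API):

// Sketch only: one ShutdownCoordinator runs on every node of the cluster.
import akka.actor.{Actor, ActorLogging, CoordinatedShutdown}
import akka.cluster.Cluster
import akka.cluster.ClusterEvent.{InitialStateAsEvents, MemberRemoved}
import akka.cluster.pubsub.DistributedPubSub
import akka.cluster.pubsub.DistributedPubSubMediator.{Publish, Subscribe}
import scala.concurrent.duration._

case object ShutdownCluster      // published on the topic to every node
case object BeginClusterShutdown // sent locally to the node that initiates the shutdown
private case object Retry        // internal re-publish tick

class ShutdownCoordinator extends Actor with ActorLogging {
  private val cluster      = Cluster(context.system)
  private val mediator     = DistributedPubSub(context.system).mediator
  private var coordinating = false

  override def preStart(): Unit = {
    cluster.subscribe(self, initialStateMode = InitialStateAsEvents, classOf[MemberRemoved])
    mediator ! Subscribe("cluster-shutdown", self)
  }

  def receive: Receive = {
    case BeginClusterShutdown =>
      coordinating = true
      import context.dispatcher
      // Distributed pub-sub is at-most-once, so keep re-publishing until everyone has left.
      context.system.scheduler.scheduleWithFixedDelay(Duration.Zero, 5.seconds, self, Retry)

    case Retry if coordinating =>
      val othersStillUp = cluster.state.members.exists(_.uniqueAddress != cluster.selfUniqueAddress)
      if (othersStillUp) mediator ! Publish("cluster-shutdown", ShutdownCluster)
      else CoordinatedShutdown(context.system).run(CoordinatedShutdown.JvmExitReason)

    case MemberRemoved(member, _) if coordinating =>
      log.info("{} left during cluster shutdown", member.address)

    case ShutdownCluster if !coordinating =>
      // Another node started the shutdown: stop this node gracefully
      // (CoordinatedShutdown leaves the cluster and stops sharding first).
      CoordinatedShutdown(context.system).run(CoordinatedShutdown.JvmExitReason)

    case _ => () // ignore other cluster events
  }
}

You would start one of these on every node (e.g. system.actorOf(Props[ShutdownCoordinator](), "shutdown-coordinator")) and send BeginClusterShutdown to the local instance when the whole cluster should go down.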
Of course, this sort of scheme will probably not play nicely with any form of external orchestration (whether it's a container scheduler like Kubernetes, Mesos, or Nomad, or even something simple like Monit, which notices that the service isn't running and restarts it).

Related

MSMQ in cluster behavior on node switch

I just installed MSMQ in a cluster and am now testing how it behaves. It appears that when the active cluster node is switched, all messages which were in the queue are lost (even when we switch back to the original node). To me this seems like undesired behavior. I thought that all messages from the source node would be moved to the destination node on a node switch.
I tested the node switch via the Pause > Drain roles menu item and via the Move > Select node menu item.
I want to know whether the described behavior is how clustered MSMQ should behave, or whether it might be a misconfiguration issue.
Update: I found a similar question here: MSMQ Cluster losing messages on failover. But the solution did not help in my situation.
It turned out that I was sending messages to the queue that were not recoverable (as described here: https://blogs.msdn.microsoft.com/johnbreakwell/2009/06/03/i-restarted-msmq-and-all-my-messages-have-vanished). That's why those messages didn't survive a service restart. When I send messages with the Recoverable flag set, they survive both a service restart and a cluster node switch.

Is there downtime when a partition is moved to a new node?

Service Fabric offers the capability to rebalance partitions whenever a node is removed or added to the cluster. The Service Fabric Cluster Resource Manager will move one or more partitions to this node so more work can be done.
Imagine a reliable actor service which has thousands of actors running who are distributed across multiple partitions. If the Resource Manager decides to move one or more partitions, will this cause any downtime? Or does rebalancing partitions work the same as upgrading a service?
They act in pretty much the same way. The main difference I can point out is that upgrades might affect only the services being updated, while re-balancing might affect multiple services at once. During an upgrade, the cluster might also re-balance other services to fit the new service instances on a node.
Adding or removing nodes I would compare more closely with node failures. In either case the services will be rebalanced because the cluster capacity changed, not because of changes in the services' metrics/load.
The main difference between a node failure and cluster scaling (adding/removing a node) is that during scaling the rebalancing takes the services' state into account: when an infrastructure notification comes in saying that a node is being shut down (for updates, maintenance, or scaling down), Service Fabric asks the infrastructure to wait so it can prepare for this announced 'failure', and then starts re-balancing the services.
Even though re-balancing cares about service state during a scale-down, it should not be considered much more reliable than a node failure. The infrastructure will only wait for a limited time before shutting down the node (the limit depends on the durability tier you defined for your cluster) while Service Fabric checks that the services meet health conditions: shutting down the old services, creating new ones, and checking that they run without errors. If this process takes too long, the services might be killed once the timeout is reached and the infrastructure proceeds with the changes. Also, the new instances of the services might fail on the new nodes, forcing the services to move again.
When you design your services, it is safer to treat re-balancing like a node failure, because in the end it is not much different: your services will move around, data stored in memory will be lost if not persisted, the service address will change, and so on. Services should have replicated data, and clients should always use retry logic and refresh the service location to reduce downtime.
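To illustrate that last point, here is a minimal, framework-agnostic sketch of the retry-and-re-resolve pattern, written in Scala only for brevity; resolveEndpoint and invoke are placeholders for whatever resolution and transport your client actually uses (in a .NET Service Fabric client that would be the SDK's partition resolver and your communication client):

// Generic retry-with-re-resolve helper; nothing here is a Service Fabric API.
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object ResilientCall {
  /** Call `invoke` against a freshly resolved endpoint, re-resolving and
    * retrying when the call fails (e.g. because the partition moved). */
  @tailrec
  def callWithRetry[A](resolveEndpoint: () => String, // placeholder for the real resolver
                       invoke: String => A,           // the actual remote call
                       attemptsLeft: Int = 5): A = {
    val endpoint = resolveEndpoint()                  // re-resolve on every attempt
    Try(invoke(endpoint)) match {
      case Success(result)                => result
      case Failure(_) if attemptsLeft > 1 =>
        Thread.sleep(500)                             // fixed backoff; exponential is better in practice
        callWithRetry(resolveEndpoint, invoke, attemptsLeft - 1)
      case Failure(error)                 => throw error
    }
  }
}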
The main difference between a service upgrade and service rebalancing is that during an upgrade all replicas from all partitions on a particular node get turned off. According to the documentation here, balancing is done on a per-replica basis, i.e. only some replicas from some partitions will get moved, so there shouldn't be any outage.

How do I setup a Active / Passive environment with two nodes in OpenShift?

I am trying to configure an active/passive cluster with two nodes (using OpenShift). The second, passive node should be a hot standby; in other words, it is up and running but not doing anything until the first node dies. Then the passive node becomes active and a new passive node is started.
I have read the High Availability documentation, but it seems to only cover the theory. Furthermore, it seems like overkill (I am thinking there might be an easier way to meet my goal).
Where would I start?
What you are asking for goes against the usual practice for how Kubernetes/OpenShift is used. You wouldn't have hot-standby nodes; you would always use all nodes in the cluster. You would then allow enough additional capacity in your cluster so that losing a node doesn't cause a problem, as the other nodes would have enough capacity to run the applications. In this scenario the Kubernetes scheduler would automatically restart any applications which were on a failed node on the other nodes in the cluster, without you needing to perform any explicit failover steps.
So don't try to do anything special. Set up your cluster with the two nodes, with applications distributed across both. If you need to be able to run with only a single node, make sure it has enough capacity to run everything. If over time you add more applications and one node is not enough, add a third node, with all three being used in the normal case. You can then again handle the failure of a single node.

Zookeeper Failover Strategies

We are a young team building an application using Storm and Kafka.
We have a common Zookeeper ensemble of 3 nodes, which is used by both Storm and Kafka.
I wrote a test case to test Zookeeper failover:
1) Check that all three nodes are running and confirm that one is elected as the leader.
2) Using the Zookeeper command-line client, create a znode and set a value. Verify the value is reflected on the other nodes (see the sketch below the list).
3) Modify the znode: set the value on one node and verify the other nodes have the change reflected.
4) Kill one of the worker nodes and make sure the master/leader is notified about the crash.
5) Kill the leader node. Verify that one of the other two nodes is elected as the new leader.
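For reference, a minimal sketch of steps 2) and 3) using the ZooKeeper Java client from Scala (the connection string and znode path are just examples; the zkCli.sh create/set/get commands do the same thing interactively):

// Create a znode, update it, and read it back through the Java client.
import org.apache.zookeeper.{CreateMode, WatchedEvent, Watcher, ZooDefs, ZooKeeper}
import org.apache.zookeeper.data.Stat

object ZkFailoverCheck extends App {
  val connect = "zk1:2181,zk2:2181,zk3:2181" // the 3-node ensemble
  val zk = new ZooKeeper(connect, 5000, new Watcher {
    override def process(event: WatchedEvent): Unit = () // no-op watcher
  })

  // Step 2: create a znode with an initial value.
  zk.create("/failover-test", "v1".getBytes("UTF-8"),
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)

  // Step 3: modify the znode (version -1 means "don't check the current version").
  zk.setData("/failover-test", "v2".getBytes("UTF-8"), -1)

  // Read it back; point a second client at another ensemble member to verify
  // the same value and version are visible there.
  val stat  = new Stat()
  val value = new String(zk.getData("/failover-test", false, stat), "UTF-8")
  println(s"value=$value version=${stat.getVersion}")

  zk.close()
}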
Do I need to add any more test cases? Any additional ideas/suggestions/pointers?
From the documentation
Verifying automatic failover
Once automatic failover has been set up, you should test its operation. To do so, first locate the active NameNode. You can tell which node is active by visiting the NameNode web interfaces -- each node reports its HA state at the top of the page.
Once you have located your active NameNode, you may cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or, you could power cycle the machine or unplug its network interface to simulate a different kind of outage. After triggering the outage you wish to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a fail-over depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.
more on setting up automatic failover

Supervisors in STORM

I have a doubt about Storm, and here it goes:
Can multiple supervisors run on a single node, or can we run only one supervisor per machine?
Thanks.
In principle, there should be one Supervisor daemon per physical machine. Why?
Answer: Nimbus receives heartbeats from the Supervisor daemon and tries to restart it in case the Supervisor dies; if the restart attempt fails permanently, Nimbus will assign that work to another Supervisor.
Imagine two Supervisors going down at the same time because they are on the same physical machine: poor fault tolerance!
Running two Supervisor daemons is also a waste of memory resources.
If you have machines with a lot of memory, simply increase the number of workers by adding more ports to supervisor.slots.ports in storm.yaml instead of adding another Supervisor.
It is theoretically possible; in practice you may not need to do it unless you are doing a PoC/demo. I did this for one of the demos I gave by making multiple copies of Storm and changing the ports for one of the supervisors; you can do it by changing supervisor.slots.ports.
It is basically designed per node, so one node should have only one Supervisor. This daemon manages the worker processes that you configured, based on ports.
So there is no need for an extra Supervisor daemon per node.
It is possible to run multiple supervisors on a single host. Have a look at this post on the storm-user mailing list:
Just copy multiple Storm, and change the storm.yaml to specify different ports for each supervisor (supervisor.slots.ports)
The Supervisor is configured on a per-node basis. Running multiple Supervisors on a single node does not make much sense. The sole purpose of the Supervisor daemon is to start/stop worker processes (each of these workers is responsible for running a subset of the topologies). From the doc page:
The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it.