How do Akka nodes discover other nodes? - scala

In Akka Cluster I would like to know a few more details about how the cluster works.
If I have a seed from A to B and C, from C to D, and from D to E,
then what if nodes D and E are restarted and D does not come up? Will E know about the rest of the cluster? If not, isn't that considered a problem?

Assuming you have multiple seed nodes (A to E) and the question is not aimed at cluster state management/convergence:
Yes, seed node E will try to join the other seed nodes (A, B, C) even though node D is not up. As stated in the Cluster Usage documentation:
Once more than two seed nodes have been started it is no problem to shut down the first seed node. If the first seed node is restarted, it will first try to join the other seed nodes in the existing cluster.
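For reference, here is a minimal sketch of how such a seed list is usually wired up programmatically; the transport, actor-system name, host names and port are placeholders, and the same list can instead go under akka.cluster.seed-nodes in the configuration:

import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object SeedNodeExample extends App {
  // Assumes akka.actor.provider = "cluster" is set in application.conf
  val system = ActorSystem("ClusterSystem")

  // The joining node contacts the configured seeds and joins the first one that
  // replies, so a restarted seed (here E) still finds A, B or C even if D is down.
  val seeds = List("hostA", "hostB", "hostC", "hostD", "hostE").map { host =>
    Address("akka.tcp", "ClusterSystem", host, 2551)
  }
  Cluster(system).joinSeedNodes(seeds)
}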

Related

How to prevent data inconsistency when one node loses network connectivity in Kubernetes

I have a situation where I have a cluster with a service (we named it A1) whose data is on remote storage, like CephFS in my case. The number of replicas for my service is 1. Assume I have 5 nodes in my cluster and service A1 resides on node 1. Something happens to node 1's network and it loses connectivity with the CephFS cluster and with my Kubernetes cluster (or Docker Swarm) as well. The cluster marks it as unreachable and starts a new service (we named it A2) on node 2 to keep the replica count at 1. After, for example, 15 minutes, node 1's network is fixed and node 1 gets back into the cluster, with service A1 still running (assume it didn't crash while it had lost its connectivity with the remote storage).
I worked with Docker Swarm and recently switched to Kubernetes. I see Kubernetes has a feature called StatefulSet, but when I read about it, it doesn't answer my question (or I may have missed something when I read about it).
Question A: What does the cluster do? Does it keep A2 and shut down A1, or let A1 keep working and shut down A2? (Logically it should shut down A1.)
Question B (and my primary question as well!): Assume that the cluster wants to shut down one of these services (for example A1). This service saves some state to storage when it shuts down. In this case A1 would save its old state to disk after A2, with a newer state, has already saved something, before A1's network got fixed.
There must be some lock when we mount the volume to the container, so that while it is attached to one container other containers can't write to it (letting A1 fail when it wants to save its old state data to disk).
The way it works - using Docker Swarm terminology -
You have a service. A service is a description of some image you'd like to run, how many replicas, and so on. Assuming the service specifies that at least 1 replica should be running, it will create a task that will schedule a container on a swarm node.
So the service is associated with 0 to many tasks, where each task has 0 containers (if it's still starting) or 1 container (if the task is running or stopped), which lives on a node.
So, when swarm (the orchestrator) detects a node go offline, it essentially sees that a number of tasks associated with a service have lost their containers, so the replication (in terms of running tasks) is no longer correct for the service, and it creates new tasks which in turn will schedule new containers on the available nodes.
On the disconnected node, the swarm worker notices that it has lost its connection to the swarm managers, so it cleans up all the tasks it is holding onto, since it no longer has current information about them. In the process of cleaning up the tasks, the associated containers get stopped.
This is good because when the node finally reconnects there is no race condition where there are two tasks running. Only "A2" is running and "A1" has been shut down.
This is bad if you have a situation where nodes can lose connectivity to the managers frequently, but you need the services to keep running on those nodes regardless, as they will be shut down each time the workers detach.
The process on K8s is pretty much the same; just change the terminology.
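To make the mechanism concrete, here is a toy Scala model of the two behaviours described above; it is only an illustration under my own naming, not swarm's or Kubernetes' actual code:

// Toy model of the orchestration behaviour described above (illustration only).
case class Task(id: String, node: String, running: Boolean)

// Manager side: if fewer tasks are running than desired, schedule replacements
// on the nodes that are still available.
def reconcile(desiredReplicas: Int, tasks: List[Task], availableNodes: List[String]): List[Task] = {
  val running = tasks.filter(_.running)
  val missing = (desiredReplicas - running.size).max(0)
  val replacements = availableNodes.take(missing).zipWithIndex.map {
    case (node, i) => Task(id = s"replacement-$i", node = node, running = true)
  }
  running ++ replacements
}

// Worker side: after losing contact with the managers, stop every local task.
// This is why "A1" is already shut down by the time the node reconnects.
def onManagerConnectionLost(localTasks: List[Task]): List[Task] =
  localTasks.map(_.copy(running = false))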

Problem syncing replicas with Solr 8.3 and ZooKeeper 3.5.6

I recently converted a Solr 7.x + ZooKeeper 3.4.14 setup to Solr 8.3 + ZooKeeper 3.5.6, and depending on how I start the Solr nodes I'm getting a sync exception.
My setup uses 3 ZooKeeper nodes and 2 Solr nodes (let's call them A and B). The collection that has this problem has 1 shard and 2 replicas. I've noticed 2 situations: (1) which works fine and (2) which does not work.
1) This works: I start Solr node A and wait until its replica is elected leader ("green" in the Solr interface, 'Cloud'->'Graph'), which takes about 2 minutes, and only then start Solr node B. Both replicas are active and the one on A is the leader.
2) This does NOT work: I start Solr node A, and a few seconds later I start Solr node B (that is, before the A replica is elected leader - it is still "Down" in the Solr interface). In this case I get the following exception:
ERROR (coreZkRegister-1-thread-2-processing-n:192.168.15.20:8986_solr x:alldata_shard1_replica_n1 c:alldata s:shard1 r:core_node3) [c:alldata s:shard1 r:core_node3 x:alldata_shard1_replica_n1] o.a.s.c.SyncStrategy Sync Failed:java.lang.IndexOutOfBoundsException: Index -1 out of bounds for length 99
It seems that if both Solr nodes are started soon after each other, then ZooKeeper cannot elect one as leader.
This error only appears in the solr.log of node A, even if I invert the order of starting nodes.
Has anyone seen this before?
I have several other collections which do not show this problem.
Thanks!

CrateDB 2-Node Setup

I'm trying to set up a 2-node CrateDB cluster, and I have set the following configuration values on the 2 nodes:
gateway.recover_after_nodes: 1
gateway.expected_nodes: 2
However the check is failing as per the documentation:
(E / 2) < R <= E where R is the number of recovery nodes, E is the
number of expected nodes.
I see that most of the available documentation describes a 3-node cluster; however, at this point I can only start a 2-node cluster as a failover setup.
The behaviour I'm expecting is that if one of the nodes goes down, the other node should be able to take over the traffic, and once the 2nd node comes back up it should sync up with the other node.
If anyone has been able to successfully bring up a 2 node Crate cluster, please share the configuration required for the same.
Cheers
It doesn't make sense to run a two-node cluster with 1 required recovery node, because this could easily end up in a split brain and put the cluster into a state it won't be able to recover from. That's why you always need more than half of the number of expected nodes.
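To spell out the check with the numbers above: with E = 2 expected nodes, (E / 2) < R <= E becomes 1 < R <= 2, so the only valid value is R = 2. gateway.recover_after_nodes: 1 therefore fails the check, and a configuration in which a single surviving node recovers on its own cannot satisfy this rule.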

Akka cluster keeps storing wrong info about membership on EC2/Docker

Environment: scala-2.11.x, akka-2.5.9
There are two hosts on EC2: H1 and H2.
There are three modules in the sbt project: master, client and worker.
Each module implements an akka-cluster node, which subscribes to cluster events and logs them. Each node also logs the cluster state every 1 minute (for debugging). The following ports are used for the cluster nodes: master: 2551, worker: 3000, client: 5000.
The project is available on GitHub.
More details about the infrastructure: my previous question.
A module can be redeployed to H1 or H2 at random.
There is a strange behavior of the akka-cluster when one of the nodes (for example worker) is redeployed. The following steps illustrate the deployment history:
The initial state - when worker is deployed on H1 and master and client are deployed on H2
----[state-of-deploying-0]---
H1 = [worker]
H2 = [master, client]
cluster status: // cluster works correctly
Member(address = akka.tcp://ClusterSystem#H1:3000, status = Up)
Member(address = akka.tcp://ClusterSystem#H2:2551, status = Up)
Member(address = akka.tcp://ClusterSystem#H2:5000, status = Up)
----------------
After that, the worker module is redeployed on host H2:
----[state-of-deploying-1]---
H1 = [-]
H2 = [master, client, worker (Redeployed)]
cluster status: // WRONG cluster state!
Member(address = akka.tcp://ClusterSystem#H1:3000, status = Up) // ???
Member(address = akka.tcp://ClusterSystem#H2:2551, status = Up)
Member(address = akka.tcp://ClusterSystem#H2:3000, status = WeaklyUp)
Member(address = akka.tcp://ClusterSystem#H2:5000, status = Up)
----------------
The above situation happens occasionally. In this case the cluster stores a wrong membership state and will not repair it:
Member(address = akka.tcp://ClusterSystem#H1:3000, status = Up) // ???
Host H1 doesn't contain any instances of worker, and > telnet H1 3000 returns connection refused.
But why does the akka-cluster keep storing this wrong info?
This behaviour is intended: an Akka cluster in production should be run with no automatic downing, to prevent split-brain problems.
Imagine a two-node cluster (A and B) with two clients (X and Y):
At the beginning, everything is OK, and a request from Y, connected to B, can be forwarded through remoting to Actor1 running on A for processing.
Then, because of a network partition, A becomes unreachable from B, and B might be tempted to mark A as down and restart Actor1 locally.
Client Y sends messages to the Actor1 running on B.
Client X, on the other side of the network partition, is still connected to A and will send messages to the Actor1 there.
This is a split-brain problem: the same actor with the same identifier is running on two nodes with two different states. It is very hard or impossible to rebuild the correct actor state.
To prevent this from happening, you have to pick a reasonable downing strategy for your problem or use case:
If you can afford it, use manual downing. An operator will recognize that the node is really down and mark it as down.
If your cluster has a dynamic number of nodes, then you need something sophisticated such as the Lightbend Split Brain Resolver.
If your cluster is static, you can use quorum strategies to avoid split brain. You always need to run an odd number of nodes. A configuration sketch follows below.
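As a rough illustration, here is a minimal sketch of starting a node with automatic downing left off and downing decisions delegated to a split brain resolver. The provider class and strategy name shown are assumptions based on the Lightbend Split Brain Resolver module for Akka 2.5; they differ between Akka versions, so check the documentation for the version you actually run:

import akka.actor.ActorSystem
import akka.cluster.Cluster
import com.typesafe.config.ConfigFactory

object ClusterNode extends App {
  val config = ConfigFactory.parseString(
    """
      |akka.actor.provider = "cluster"
      |# Never enable auto-downing in production (off is also the default).
      |akka.cluster.auto-down-unreachable-after = off
      |# Assumed settings for the Lightbend Split Brain Resolver (Akka 2.5 module);
      |# verify the class name and strategy for your version.
      |akka.cluster.downing-provider-class = "com.lightbend.akka.sbr.SplitBrainResolverProvider"
      |akka.cluster.split-brain-resolver.active-strategy = keep-majority
      |""".stripMargin).withFallback(ConfigFactory.load())

  val system = ActorSystem("ClusterSystem", config)
  Cluster(system).registerOnMemberUp {
    system.log.info("Node is Up: {}", Cluster(system).selfAddress)
  }
}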

Akka cluster doesn't start when using manual join

I have an application where I cannot know the seed nodes ahead of time to put into the application configuration. Therefore, the application starts on one node and when it's started on the other nodes, they use Cluster.join to join the cluster on the first node. The problem is that the join never completes and the cluster never starts. What is the problem?
The problem is that there is no cluster yet to join. Simply instantiating a cluster object on the first node does not initiate the cluster. There is a small note in the documentation that may be easily missed:
Joining can also be performed programmatically with Cluster(system).join. Note that you can only join to an existing cluster member, which means that for bootstrapping some node must join itself.
So, the first node should join itself to initiate the cluster. This causes the creation of a "leader" that is responsible for adding and removing nodes from the cluster.
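A minimal sketch of that bootstrap step; the actor-system name, host and port below are placeholders:

import akka.actor.{ActorSystem, Address}
import akka.cluster.Cluster

object FirstNode extends App {
  val system = ActorSystem("ClusterSystem")   // assumes akka.actor.provider = "cluster"
  val cluster = Cluster(system)
  // Bootstrapping: the first node joins itself, which initiates the cluster
  // and makes it a member that others can join.
  cluster.join(cluster.selfAddress)
}

object OtherNode extends App {
  val system = ActorSystem("ClusterSystem")
  val cluster = Cluster(system)
  // Later nodes join via the address of any existing member, e.g. the first node.
  cluster.join(Address("akka.tcp", "ClusterSystem", "first-node-host", 2551))
}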