Infinispan - Node Failover and Control over Recovery - jboss

Hope all are doing well. I am new to Infinispan and need some help. Say I have a cluster of 3 nodes running in Distributed Mode. Consider the following scenario:
Infinispan Version : 7.1.1
No. of Nodes = 3 (NodeA, NodeB, NodeC)
Mode = Distributed
numOwners = 2
No. of Key/Values in the cluster = 3 [(k1,v1),(k2,v2),(k3,v3)]
Distribution of keys in each of the nodes :
NodeA --> k1,k2
NodeB --> k2,k3
NodeC --> k3,k1
Now, say Node B is down.
Q1. Would the resulting distribution look like this?
NodeA --> k1,k2, k3
NodeC --> k3,k1, k2
Q2. If Node B comes back up, I want the cluster to regain its original state:
NodeA --> k1,k2
NodeB --> k2,k3
NodeC --> k3,k1
Is there any mechanism by which I can achieve the above two states (after node failure and after node recovery)?
Can anyone help me out?
Any help would be highly appreciated.

Q1: Yes. With numOwners = 2 and only 2 nodes left, all data will be on both remaining nodes.
Q2: It won't necessarily return to the original state, but it will spread the entries roughly evenly across the cluster. It is therefore possible that it ends up e.g. like:
A -> k1, k3
B -> k3, k2
C -> k2, k1
However, the keys don't have to be spread exactly evenly. Infinispan defines the distribution in terms of segments, and the number of segments is configurable. Each segment contains a portion of the keys according to their hashCode(), and the segments themselves are spread across the nodes as evenly as possible.
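For reference, a minimal sketch of the relevant settings using Infinispan's programmatic configuration API (the numSegments value of 60 below is only illustrative, not a recommendation):

import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.Configuration;
import org.infinispan.configuration.cache.ConfigurationBuilder;

// Distributed cache with 2 copies of each entry; the key space is split into
// segments that Infinispan rebalances across whichever nodes are currently up.
Configuration cfg = new ConfigurationBuilder()
    .clustering().cacheMode(CacheMode.DIST_SYNC)
        .hash().numOwners(2).numSegments(60)
    .build();

Because rebalancing happens at the segment level and depends on the current topology, the layout after NodeB rejoins is generally not the same as the one before it left.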

Related

How to isolate two MirrorMaker clusters where each link is a one-way synchronization

I want to deploy two MirrorMaker 2 clusters, where each link is a one-way synchronization. The connect-mirror-maker.properties configs look like this:
One MM2:
clusters = A,B
A.bootstrap.servers=...
B.bootstrap.servers=...
A->B.enabled = true
B->A.enabled = false
A->B.topics = .* ...
The other MM2:
clusters = A,B
A.bootstrap.servers=...
B.bootstrap.servers=...
B->A.enabled = true
A->B.enabled = false
B->A.topics = .* ...
The reason I want to deploy it like this is that each MirrorMaker cluster is close to its corresponding Kafka cluster. However, after I start both MirrorMaker processes, only one direction synchronizes data; the other does not work. I want the two clusters to be isolated, but at the moment they seem to affect each other. Has anyone encountered this problem? I need your help.
Thank you very much.
I have tried this setup with Kafka 3.1.
Your specification implies that you want to mirror all topics, including those that are already mirrored and/or internal topics.
You need to modify A->B.topics and B->A.topics in such a way that their intersection is empty.
Otherwise you run into a typical mirroring-a-mirror situation:
* we have a topic T in cluster A
* MM (A->B) mirrors topic T into cluster B (where it does not exist)
* MM (B->A) mirrors topic T into cluster A (where T already exists)
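For example (the topic names here are purely hypothetical), giving each direction its own disjoint pattern keeps the intersection empty:

# first MM2 process: only A -> B
A->B.topics = orders.*, payments.*
# second MM2 process: only B -> A
B->A.topics = inventory.*

Note also that with the default replication policy, topics mirrored from A show up in B prefixed with the source alias (e.g. A.orders), so making sure the B->A pattern does not match A\..* is another way to keep already-mirrored topics out of the reverse flow.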

How to partition a space between n pods in Kubernetes

We are using Kubernetes and we need to do "smart partitioning" of data. We want to split the space from 1 to 1000 between n running pods,
and each pod should know which part of the space it is responsible for (for polling partitioned tasks).
So, for example, if we have 1 pod, it will handle the whole space from 1-1000.
When we scale out to 3 pods, each of them will get an equal share:
Pod 1 - 1-333
Pod 2 - 334-666
Pod 3 - 667-1000
Right now the best way we have found to handle this is to create a StatefulSet whose pods poll the number of running pods and use their own ordinal to decide which part of the space to handle.
Is there a smarter/built-in way in Kubernetes to partition the space between nodes in this manner?
Service Fabric has this feature built in.
There are no native tools for scaling at the partition level in Kubernetes yet,
only custom solutions similar to what you have come up with in your original post.
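For what it's worth, here is a minimal sketch of that StatefulSet-ordinal approach in Java. HOSTNAME is set by Kubernetes to the pod name; REPLICA_COUNT is a hypothetical environment variable you would have to set on the pods yourself to match the StatefulSet's replica count:

public class StatefulSetPartitioner {

    // Derives this pod's slice of [spaceStart, spaceEnd] from its StatefulSet ordinal.
    static int[] myRange(String podName, int replicas, int spaceStart, int spaceEnd) {
        // StatefulSet pods are named <set-name>-<ordinal>, e.g. "worker-2"
        int ordinal = Integer.parseInt(podName.substring(podName.lastIndexOf('-') + 1));
        int total = spaceEnd - spaceStart + 1;
        int chunk = total / replicas;
        int start = spaceStart + ordinal * chunk;
        // the last pod absorbs the remainder so the whole space is covered
        int end = (ordinal == replicas - 1) ? spaceEnd : start + chunk - 1;
        return new int[] { start, end };
    }

    public static void main(String[] args) {
        String podName = System.getenv().getOrDefault("HOSTNAME", "worker-0");
        int replicas = Integer.parseInt(System.getenv().getOrDefault("REPLICA_COUNT", "3"));
        int[] range = myRange(podName, replicas, 1, 1000);
        System.out.println(podName + " handles " + range[0] + "-" + range[1]);
    }
}

With 3 replicas this yields 1-333, 334-666 and 667-1000. The catch is that each pod still has to learn the current replica count; the Airbnb-style approach in the next answer queries the API server for that instead of relying on an env var.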
Here is another custom way of doing this, for your reference, based on this Airbnb tech blog.
Given the list of pods and their names, each pod is able to
deterministically calculate a list of partitions that it should work
on. When we add or remove pods from the ReplicaSet, the pods will
simply pick up the change, and work on the new set of partitions
instead
How they do it is based on their repo. I have summarized the key components here (note: the repo is written in Java).
Get the number of pods running in the k8s namespace, sorted by pod name (code). Code snippet:
String podName = System.getenv("K8S_POD_NAME");
String namespace = System.getenv("K8S_NAMESPACE");
NamespacedKubernetesClient namespacedClient = kubernetesClient.inNamespace(namespace);
ReplicaSet replicaSet;
// see above code link to know how to get activePods, remove it here because it is too long
int podIndex = activePods.indexOf(podName);
int numPods = activePods.size();
Every time you call the above code, you get a deterministic podIndex and numPods. Then use this information to calculate the range this pod is responsible for:
// spaceRange is the total number of partitions to spread across the pods
List<Integer> partitions = new ArrayList<>();
int split = spaceRange / numPods;
int start = podIndex * split;
// the last pod takes whatever is left so the whole range is covered
int end = (podIndex == numPods - 1) ? spaceRange - 1 : ((podIndex + 1) * split) - 1;
for (int i = start; i <= end; i++) {
    partitions.add(i);
}
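As a quick sanity check with numbers in the spirit of the question (assuming spaceRange = 1000 and 3 pods, so the partition ids run from 0 to 999 rather than 1 to 1000): split = 333, podIndex 0 gets 0-332, podIndex 1 gets 333-665, and podIndex 2 gets 666-999, with the last pod absorbing the remainder.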
Since the number of pods can change at any time, you may need an executorService.scheduleWithFixedDelay to periodically update the list, as here:
executorService.scheduleWithFixedDelay(this::updatePartitions, 0, 30, TimeUnit.SECONDS);
This approach is not the best: if you set scheduleWithFixedDelay to 30 seconds, a pod change may not be picked up for up to 30 seconds. Also, for a short period of time two pods may be responsible for the same space, and you need to handle this special case in your business logic, as the Airbnb tech blog does.

How to create a simple nodes-to-sink communication pattern (multi-hop topology) in Castalia Simulator

I am facing some teething problems in the Castalia Simulator while creating a simple nodes-to-sink communication pattern.
I want to create a unidirectional topology as described below:
node 0 <-------> node 1<----------->node 2<-------->node 3
Source = node 0
Relay nodes = node 1, node 2
Sink node = node 3
Here messages flow from left to right, so node 0 sends only to node 1, node 1 sends only to node 2, and node 2 sends only to node 3. When node 0 wants to send a data packet to node 3, node 1 and node 2 work as intermediate nodes (relay/forwarding nodes). Neighbouring nodes can also send data in this unidirectional fashion (left to right), e.g. node 0 sends to node 1, node 1 sends to node 2, and so on.
I read the manual and understand the ApplicationName = "ThroughputTest" application, but according to my understanding, there all nodes send data to the sink (node 0).
I added the following lines to my omnetpp.ini file:
SN.node[0].Application.nextRecipient = "1"
SN.node[1].Application.nextRecipient = "2"
SN.node[2].Application.nextRecipient = "3"
SN.node[3].Application.nextRecipient = "3"
But I am not getting my desired result.
Please help me with this.
Regards
Gulshan Soni
We really need more information to figure out what you have done.
The part of your omnetpp.ini file you copied here just shows that you are defining some static app-level routing using the app module ThroughputTest.
There are many other parts to a network. Firstly, the definition of the MAC plays a crucial role. For example, if you have chosen MAC 802.15.4 or BaselineBANMAC, you cannot have multihop routing, since there is only hub-to-slave communication. Furthermore, how you define the radio and the channel can also impact communication. For example, the signal might not be strong enough to reach from one node to another.
Read the Castalia User's Manual carefully, and provide enough information in your questions so that others can replicate your results.

How to determine last write wins on concurrent vector clocks?

I'd like to keep track of only the most recent data, and to employ vector clocks to resolve conflicts so I can easily discard stale data via the last-write-wins (LWW) rule.
Say we have 3 nodes:
- Node1
- Node2
- Node3
Then we would use vector clocks to keep track of causality and concurrency for each event/change. We represent the vector clocks initially as
{Node1:0, Node2:0, Node3:0}.
For instance, if Node1 gets 5 local changes, we increment its clock 5 times, resulting in
{Node1: 5, Node2:0, Node3:0}.
This would normally be okay, right?
Then what if, at the same time, Node2 updates its local data and also increments its clock, resulting in
{Node1:0, Node2:1, Node3:0}.
At some point Node1 sends an event to Node3, passing the updates and piggybacking its vector clock. Node3, which has a VC of {Node1:0, Node2:0, Node3:0}, would simply merge the data and the clock, as it has no changes of its own yet.
The problem I'm trying to figure out is what happens if Node2 then sends an update event to Node3, passing its own VC and updates.
What happens to the data and the clocks? How do I apply last-write-wins here, when the first write that reached Node3 (the one from Node1) would appear to be the later write, since it has a greater counter value in its own clock slot?
Node3's clock before merging: {Node1: 5, Node2: 0, Node3: 1}
The message VC from Node2 that Node3 received: {Node1: 0, Node2: 1, Node3: 0}
How do I handle resolving data on concurrent VCs?
This is a good question. You're running into this issue because you are using counters in your vector clocks, and you are not synchronizing the counters across nodes. You have a couple of options:
Submit all writes through one primary server. The primary server can apply a total order to all of the writes and then send them to the individual nodes to be stored. It would be helpful to have some background on your system. For example, why are there three independent nodes? Do they exist to provide replication and availability? If so, this primary server approach would work well.
Keep your server's times in sync, as described in Google's Spanner paper. Then, instead of using a monotonically increasing counter for each node in the vector clock, you can use a timestamp based off of the server's time. Again, having background on your system would be helpful. If your system just consists of human users submitting writes, then you might be able to get away with keeping your server's times loosely in sync using NTP without violating the LWW invariant.
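To make the second option concrete, here is a minimal sketch (not tied to any particular library; the class and method names are made up) where the vector clock is used only to detect concurrency, and a loosely synchronized wall-clock timestamp carried with each write breaks ties for last-write-wins:

import java.util.HashMap;
import java.util.Map;

public class VersionedValue {
    final String value;
    final Map<String, Long> clock;   // vector clock: node id -> counter
    final long wallClockMillis;      // LWW tie-breaker; assumes node clocks are kept loosely in sync (e.g. NTP)

    VersionedValue(String value, Map<String, Long> clock, long wallClockMillis) {
        this.value = value;
        this.clock = clock;
        this.wallClockMillis = wallClockMillis;
    }

    // true when every counter in a is <= the matching counter in b, i.e. a happened before (or equals) b
    static boolean dominatedBy(Map<String, Long> a, Map<String, Long> b) {
        for (Map.Entry<String, Long> e : a.entrySet()) {
            if (e.getValue() > b.getOrDefault(e.getKey(), 0L)) return false;
        }
        return true;
    }

    // element-wise max of the two clocks, kept by whichever value survives
    static Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
        Map<String, Long> out = new HashMap<>(a);
        b.forEach((node, counter) -> out.merge(node, counter, Math::max));
        return out;
    }

    static VersionedValue resolve(VersionedValue local, VersionedValue incoming) {
        Map<String, Long> merged = merge(local.clock, incoming.clock);
        if (dominatedBy(local.clock, incoming.clock)) {   // incoming is causally newer
            return new VersionedValue(incoming.value, merged, incoming.wallClockMillis);
        }
        if (dominatedBy(incoming.clock, local.clock)) {   // incoming is stale
            return new VersionedValue(local.value, merged, local.wallClockMillis);
        }
        // concurrent: the counters alone cannot order the writes, so fall back to LWW on the timestamp
        VersionedValue winner = incoming.wallClockMillis >= local.wallClockMillis ? incoming : local;
        return new VersionedValue(winner.value, merged, winner.wallClockMillis);
    }
}

In the question's example, {Node1: 5, Node2: 0, Node3: 1} and {Node1: 0, Node2: 1, Node3: 0} are concurrent (neither dominates the other), so the timestamp decides which value survives, not the size of the counters.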

Akka cluster nodes with actors leave

Let's say that I have 4 nodes (N1, N2, N3, N4) in an Akka cluster. Suppose I have an actor named A deployed on N4 (by the Akka system, transparently to the user). If I decide that I no longer need a lot of computing power, I would scale the servers down to only 2 nodes, so node N3 and node N4 are powered down. What would happen to actor A? Would it be dead and need to be recreated manually by application logic? Would it be automatically recreated on another node (even with its state lost)?
If you have a regular actor on a node and you shut down that node, the actor will be shut down together with the actor system. There are some tools you can use if you want a specific actor to (almost) always be alive on some node: ClusterSingleton keeps an actor alive on one node as continuously as possible without ever having multiple instances of it in the cluster; ClusterSharding makes it possible to keep actors alive and redistributable across the cluster using an identifier; and Akka Persistence allows the state of an actor to survive being stopped on one node and started on another.
Read more about all of this in the docs, and I really recommend reading the general sections on what akka cluster is to get a firm understanding before starting to use it: http://doc.akka.io/docs/akka/2.4.0/scala/index-network.html
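As a rough illustration of the ClusterSingleton option, here is a sketch against the classic Java API from the akka-cluster-tools module (WorkerActor is a placeholder for your own actor A, and the usual cluster configuration is assumed to be in place):

import akka.actor.ActorRef;
import akka.actor.ActorSystem;
import akka.actor.PoisonPill;
import akka.actor.Props;
import akka.actor.UntypedActor;
import akka.cluster.singleton.ClusterSingletonManager;
import akka.cluster.singleton.ClusterSingletonManagerSettings;
import akka.cluster.singleton.ClusterSingletonProxy;
import akka.cluster.singleton.ClusterSingletonProxySettings;

public class SingletonSetup {

    // placeholder standing in for actor A from the question
    public static class WorkerActor extends UntypedActor {
        @Override
        public void onReceive(Object message) {
            System.out.println("got: " + message);
        }
    }

    public static void main(String[] args) {
        ActorSystem system = ActorSystem.create("ClusterSystem");

        // Started on every node; the cluster runs exactly one WorkerActor on the oldest
        // node and hands it over to another node when that node leaves the cluster.
        system.actorOf(
            ClusterSingletonManager.props(
                Props.create(WorkerActor.class),
                PoisonPill.getInstance(),
                ClusterSingletonManagerSettings.create(system)),
            "workerSingletonManager");

        // The proxy always routes messages to wherever the singleton currently lives.
        ActorRef worker = system.actorOf(
            ClusterSingletonProxy.props(
                "/user/workerSingletonManager",
                ClusterSingletonProxySettings.create(system)),
            "workerSingletonProxy");

        worker.tell("do-work", ActorRef.noSender());
    }
}

Note that this keeps a single instance alive somewhere in the cluster, but its in-memory state is still lost on hand-over unless you also use Akka Persistence.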