Why does a heartbeat take O(log N) time to propagate?

I was reading about gossip style failure detection.
In the notes I was reading, it's stated that a single heartbeat takes O(log(N)) time to propagate, but this statement is not explained.
Any idea why this is?

Because the most effective way to propagate in such a case is a binary tree structure (or any k-ary tree). The first node sends the message to its children, they send it to their children, and so on. A binary tree has height log n, and every level of the tree represents one stage of propagating the message, so the overall time is O(log n).

You start by sending the message to k nodes. Each of them sends the message to k more nodes and collects back their responses. Each hop multiplies the number of nodes that have received the message by k, so all the nodes have received it once k^t >= N. The clock time it takes for this to happen is proportional to t, the number of hops:
k^t = N => t = log_k(N)
Since the clock time is proportional to t, it must be proportional to log_k(N).
I'm not familiar with gossip in particular but this answer applies to most broadcast messages on most cluster fabrics.
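The k^t argument can be checked with a short sketch of the idealized model above (hypothetical best case: every hop reaches only fresh nodes; real gossip wastes some messages on nodes that already have the update):

```python
def rounds_to_inform(n, k):
    """Rounds until all n nodes have the message in the idealized model:
    each hop multiplies the number of informed nodes by k."""
    informed = 1
    rounds = 0
    while informed < n:
        informed *= k   # one more hop: k times as many nodes informed
        rounds += 1
    return rounds       # smallest t with k^t >= n, i.e. ceil(log_k(n))

print(rounds_to_inform(1024, 2))   # prints 10: fanout 2 covers 1024 nodes in 10 hops
```

Doubling the cluster size adds only one more hop, which is exactly the O(log N) behavior the notes describe.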

Related

Multi pickup locations for a single delivery

I have a list of orders, each consisting of a disjunction of pickup nodes and a disjunction of delivery nodes (using AddDisjunction with a positive penalty and a max cardinality of 1).
Some of these orders form groups that must be delivered to the same location at the same time by the same vehicle, or not at all.
AddPickupAndDelivery/AddPickupAndDeliverySets cannot be used on the same node/disjunction twice, so I cannot merge the delivery disjunctions into one and link all pickup disjunctions to it.
I have tried setting the NextVar of one delivery disjunction to the other delivery disjunction; however, the other disjunction was still sometimes reached without the first one (though not vice versa).
I have tried combining the NextVar method with a penalty for reaching only part of the delivery nodes, in two different ways:
first, by using AddSoftSameVehicleConstraint; however, it did not penalize unperformed nodes;
second, by creating a new dimension with positive arc values for reaching every disjunction in the NextVar chain except one, and a negative arc value for reaching that one, which can only be reached if all the rest of the disjunctions were reached. Combined with SetSpanCostCoefficientForAllVehicles and a big cumul var start value at the start nodes, the idea was that reaching only part of the nodes would induce a positive span, while reaching all of them would reset the span back to 0.
However, at this point the algorithm stopped reaching any nodes, presumably because the local search operators do not include a single move that adds multiple nodes, and each addition of a single node induces a higher cost. Is there a way of implementing multiple pickups to a single delivery that abides by the constraints I have stated, using the Python version of or-tools?

In paxos, what happens if a proposer is down after its proposal is rejected?

In this figure, the proposal of X is rejected.
At the end of the timeline, S1 and S2 accept X while S3, S4 and S5 accept Y. Proposer X is now supposed to re-send the proposal with value Y.
But what happens if proposer X goes down at that time? How do S1 and S2 eventually learn the value Y?
Thanks in advance!
It is a little hard to answer this from the fragment of a diagram that you've shared since it is not clear what exactly it means. It would be helpful if you could link to the source of that diagram so we can see more of the context of your question. The rest of this answer is based on a guess as to its meaning.
There are three distinct roles in Paxos, commonly known as proposer, acceptor and learner, and I think it aids understanding to divide things into these three roles. The diagram you've shared looks like it is illustrating a set of five acceptors and the messages that they have sent as part of the basic Synod algorithm (a.k.a. single-instance Paxos). In general there's no relationship between the sets of learners and acceptors in a system: there might be a single learner, or there might be thousands, and I think it helps to separate these concepts out. Since S1 and S2 are acceptors, not learners, it doesn't make sense to ask about them learning a value. It is, however, valid to ask about how to deal with a learner that didn't learn a value.
In practical systems there is usually also another role of leader which takes responsibility for pushing the system forward using timeouts and retries and fault detectors and so on, to ensure that all learners eventually learn the chosen value or die trying, but this is outside the scope of the basic algorithm that seems to be illustrated here. In other words, this algorithm guarantees safety ("nothing bad happens") but does not guarantee liveness ("something good happens"). It is acceptable here if some of the learners never learn the chosen value.
The leader can do various things to ensure that all learners eventually learn the chosen value. One of the simplest strategies is to get the learned value from any learner and broadcast it to the other learners, which is efficient and works as long as there is at least one running learner that's successfully learned the chosen value. If there is no such learner, the leader can trigger another round of the algorithm, which will normally result in the chosen value being learned. If it doesn't then its only option is to retry, and keep retrying until eventually one of these rounds succeeds.
In this figure, the proposal of X is rejected.
My reading of the diagram is that it is an "accept request" that is rejected. Page 5, paragraph 1 of Paxos Made Simple describes this message type.
Proposer X is now supposed to re-send the proposal with value Y.
The diagram does not indicate that. Only if Y had been seen in response to the blue initial propose messages would the blue proposer have to choose Y. Yet the blue proposer chose X as the value in its "accept request". If it is properly following Paxos, it could not have seen Y in response to its initial proposal message; if it had seen Y, it would have been forced to choose it, and so it would not have sent X.
In order to really know what is happening, you would need to know what responses were seen by each proposer. We cannot see from the diagram what values, if any, were returned in response to the first three blue propose messages. We don't see whether X was previously accepted at any node. We don't know if the blue proposer was free to choose its own X or had to use an X that was already accepted at one or more nodes.
But what happens if proposer X goes down at that time?
If the blue proposer dies then this is not a problem. The green proposer has successfully fixed the value Y at a majority of the nodes.
How do S1 and S2 eventually learn the value Y?
The more interesting scenario is what happens if the green proposer dies. The green proposer may have sent its accept request messages containing Y and immediately died. As three of the messages were successful, the value Y has been fixed, yet the original proposer may not be alive to see the accept response messages. For any further progress to be made, a new proposer needs to send a new propose message. As three of the nodes will reply with Y, the new proposer will choose Y as the value of its accept request message. This will be sent to all nodes, and if all messages get through, and no other proposer interrupts, then S1 and S2 will become consistent.
The essence of the algorithm is collaboration. If a proposer dies, the next proposer will collaborate and choose the highest-numbered previously accepted value, if any exists.
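That recovery can be sketched in a few lines of code (a hypothetical minimal single-decree Paxos, not any particular production implementation): because Y sits at a majority of acceptors, any majority of promises must report Y with the highest ballot, forcing any later proposer to re-propose Y.

```python
class Acceptor:
    """Single-decree Paxos acceptor (sketch)."""
    def __init__(self):
        self.promised = -1    # highest ballot number promised
        self.accepted = None  # (ballot, value) of last acceptance, or None

    def prepare(self, ballot):
        # Phase 1b: promise to ignore lower ballots; report any prior acceptance.
        if ballot > self.promised:
            self.promised = ballot
            return self.accepted
        return "nack"

    def accept(self, ballot, value):
        # Phase 2b: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False

def propose(acceptors, ballot, my_value):
    # Phase 1a: collect promises from a majority.
    promises = [r for r in (a.prepare(ballot) for a in acceptors) if r != "nack"]
    if len(promises) <= len(acceptors) // 2:
        return None  # no majority; caller retries with a higher ballot
    # Collaborate: adopt the value of the highest-ballot acceptance seen, if any.
    prior = [p for p in promises if p is not None]
    value = max(prior)[1] if prior else my_value
    # Phase 2a: ask everyone to accept; the value is chosen once a majority accepts.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks > len(acceptors) // 2 else None

# Re-create the scenario: X accepted at S1, S2 (ballot 1); Y fixed at S3-S5
# (ballot 2) by the green proposer, which then dies.
acceptors = [Acceptor() for _ in range(5)]
for a in acceptors[:2]:
    a.prepare(1); a.accept(1, "X")
for a in acceptors[2:]:
    a.prepare(2); a.accept(2, "Y")

# Any later proposer, even one arriving with its own value Z, is forced to choose Y.
chosen = propose(acceptors, ballot=3, my_value="Z")
print(chosen)  # Y
```

After this round, all five acceptors (including S1 and S2) hold Y, which is how the system converges even though the proposer that fixed Y never saw its accept responses.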

How to model time in matlab?

I am implementing a scheme in Matlab in which a particular node A, after sending a message, waits for a time period t (defined by the distance to the farthest node within A's range and the propagation speed of the signal) for acknowledgements from a set of nodes. If it does not receive any acknowledgement within time period t, it takes some action.
I have no idea how to implement time in Matlab. Is it possible, or will I have to find some way around it?
You can use MATLAB's datetime and duration types.
For example, suppose you want to check whether the signal is received within an acceptable delay (here, 40 milliseconds):
% t = datetime(Y,M,D,H,MI,S,MS)
send    = datetime(2016,08,31,06,01,00,000);
receive = datetime(2016,08,31,06,01,00,100);
acceptableDelay = milliseconds(40);  % a duration, comparable with datetime differences
if (receive - send) < acceptableDelay
    disp('Well received!')
else
    disp('Late!')
end

How to guarantee that all nodes get infected in gossip-based protocols?

In gossip-based protocols, how do we guarantee that all nodes get infected by the message?
If we selected a random subset of nodes and sent the message to them, and those nodes did the same, there is a probability that some node will not receive the message.
Although I couldn't calculate it, it seems small. However, if the system runs for a long time, at some point one node will be unlucky and will be left over.
It's a bit hard to answer, for two reasons:
There isn't really a single gossip-based protocol; at most, there are families of gossip-based algorithms.
The algorithms actually guarantee infection only under specific assumptions. E.g., if, as you put it, the system "is running for a long time" and any given link eventually fails permanently under some exponential process (a very likely scenario), then with probability 1 some node will become completely isolated, and no protocol can overcome that.
However, IIUC, you're asking about a protocol with the following assumptions:
For any group V' ⊂ V of nodes, there is an active link u ∈ V' → v ∈ V ∖ V'.
Each node chooses uniformly d of its neighbors at each step, irrespective of their state, choices made by other nodes, total update state, etc.
Under these conditions, the problem you raised will have probability 0.
You can think about the infection as a Markov chain where the system is at state i if i nodes are infected. Suppose some change originated at some s ∈ V, so the system starts at state 1.
By property 1., there is a link from the i infected nodes to one of the n - i others.
By property 2., the probability of selecting this link is at least 1 / n. This is because the node whose link happens to cross the cut, has at most n neighbors, but at least one neighbor across the cut. Even if its selection is entirely stateless and uninformed, that is the chance that it will choose this neighbor.
Therefore, the probability that this will not happen for j steps is at most (1 - 1/n)^j. Using the union bound, the probability that the process stalls at any state i is at most n(1 - 1/n)^j. Take j = n^2, and this becomes roughly n e^(-n); take j = n^3, and it becomes roughly n e^(-n^2); etc.
(Of course, gossip algorithm infection happens much sooner; this is an upper bound for the worst-possible conditions.)
So, if the system runs long enough, the probability that some node does not become infected, decreases to 0 (very quickly). For Anti-Entropy Gossip Protocols, this is enough. For some other protocols, as you suspected, there is a chance that some node will be missed for some update.
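As an empirical check of the argument above, here is a minimal push-gossip simulation on a complete graph (the topology implied by property 1; the model and function name are hypothetical). Every run terminates with all nodes infected, typically in O(log n) rounds, far below the worst-case union bound:

```python
import random

def push_gossip_rounds(n, d, seed):
    """Push gossip on a complete graph of n nodes: each round, every
    infected node picks d targets uniformly at random (possibly
    already-infected ones, matching property 2's uninformed choice)."""
    rng = random.Random(seed)
    infected = {0}          # node 0 originates the update
    rounds = 0
    while len(infected) < n:
        newly = set()
        for _ in infected:  # every infected node gossips this round
            for _ in range(d):
                newly.add(rng.randrange(n))
        infected |= newly
        rounds += 1
    return rounds

print(push_gossip_rounds(1024, 1, seed=0))
```

The round count grows like log n plus a coupon-collector tail for the last few stragglers, which is why anti-entropy protocols rely on "run long enough" rather than on any single round reaching everyone.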
We can't provide an answer because you don't understand your problem (hence the question is ambiguous)
The topology of the network is unknown, but the answer depends on it.
What's the stop condition of the algorithm? Does it stop or not?
Suppose that a given node is connected to all the other nodes (that's the topology) and each node performs the same action when it receives a message.
You could simplify your problem into smaller sub-problems (the divide-and-conquer approach): imagine that each node performs just one attempt (i.e. i = 1).
Since any node picks the receiver completely at random, and since this operation is repeated indefinitely, eventually all the nodes will receive the message. How many iterations are required to reach a given confidence (the ratio of nodes that received the message to the total number of nodes) is up to you.
Once you get this, including the repeated attempts i is straightforward.
I made a little simulation of what you're trying to do. http://jsfiddle.net/ut78sega/
function gossip(nodes, tries, startNode, reached) {
    var stack = [startNode, tries];
    while (stack.length > 0) {
        var ttl = stack.pop();
        var n = stack.pop();
        reached[n] = 1;
        if (ttl <= 0) { continue; }
        for (var i = 0; i < ttl; i++) {
            // push a random target node along with its decremented try budget
            stack.push(Math.floor(Math.random() * nodes), ttl - 1);
        }
    }
    return reached;
}
nodes - total number of nodes
tries - the starting amount of random selections
startNode - the node that gets the first message
reached - a hash set of nodes that were reached by the current simulation
At each level of the recursion, the number of tries is decreased by one. It takes ~9 tries to get 100% coverage of 65536 (2^16) nodes.

Simulation: send packets according to exponential distribution

I am trying to build a network simulation (Aloha-like) where n nodes decide at any instant whether to send or not, according to an exponential distribution (exponentially distributed arrival times).
What I have done so far: I set a master clock in a for loop which ticks, and a node starts sending at a given tick only if a sample drawn from a uniform [0,1] for that instant is greater than 0.99999; i.e., at any time instant a node has a 0.00001 probability of sending (very close to zero, as the exponential distribution requires).
Can these arrival times be considered exponentially distributed at each node and if yes with what parameter?
What you're doing is called a time-step simulation, and can be terribly inefficient. Each tick in your master clock for loop represents a delta-t increment in time, and in each tick you have a laundry list of "did this happen?" possible updates. The larger the time ticks are, the lower the resolution of your model will be. Small time ticks will give better resolution, but really bog down the execution.
To answer your direct questions, you're actually generating a geometric distribution. That will provide a discrete time approximation to the exponential distribution. The expected value of a geometric (in terms of number of ticks) is 1/p, while the expected value of an exponential with rate lambda is 1/lambda, so effectively p corresponds to the exponential's rate per whatever unit of time a tick corresponds to. For instance, with your stated value p = 0.00001, if a tick is a millisecond then you're approximating an exponential with a rate of 1 occurrence per 100 seconds, or a mean of 100 seconds between occurrences.
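To see the correspondence, you can sample the geometric directly instead of looping over ticks (a sketch; the tick-equals-one-millisecond mapping and the enlarged p are assumptions for speed) and check that its mean is 1/p ticks:

```python
import math
import random

random.seed(42)
p = 0.001  # per-tick send probability (larger than the question's 0.00001, for speed)

def geometric(p, rng=random):
    # Inverse-transform sampling of a geometric on {1, 2, ...}:
    # K = floor(ln(U) / ln(1-p)) + 1, with U uniform on (0, 1].
    u = 1.0 - rng.random()
    return int(math.log(u) / math.log(1.0 - p)) + 1

samples = [geometric(p) for _ in range(50_000)]
mean_ticks = sum(samples) / len(samples)

# The geometric mean is 1/p = 1000 ticks; if a tick is 1 ms, this approximates
# an exponential inter-send time with rate p per ms, i.e. a mean of ~1 s.
print(mean_ticks)
```

The sample mean lands near 1000 ticks, confirming that the per-tick Bernoulli scheme is a discretized exponential with rate p per tick.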
You'd probably do much better to adopt a discrete-event modeling viewpoint. If the time between network sends follows the exponential distribution, then once a send event occurs you can schedule when the next one will occur. You maintain a priority queue of pending events, and after handling the logic of the current event you poll the priority queue to see what happens next: pull the event notice off the queue, update the simulation clock to the time of that event, and dispatch control to a method/function corresponding to the state-update logic of that event. Since nothing happens between events, you can skip over large swaths of time. That makes the discrete-event paradigm much more efficient than the time-step approach unless the model state needs updating in pretty much every time step. If you want more information about how to implement such models, check out this tutorial paper.
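A minimal sketch of that event loop (names and the three-node setup are hypothetical), using a heap as the pending-event priority queue:

```python
import heapq
import random

random.seed(1)
RATE = 1.0 / 100.0   # one send per 100 s on average, matching the example rate

events = []          # priority queue of (time, node) pending send events
clock = 0.0

# Seed the queue: each node schedules its first send at an exponential draw.
for node in range(3):
    heapq.heappush(events, (random.expovariate(RATE), node))

sends = 0
while sends < 10:
    clock, node = heapq.heappop(events)   # jump straight to the next event
    sends += 1                            # ...send-handling logic goes here...
    # Schedule this node's next send one exponential draw in the future.
    heapq.heappush(events, (clock + random.expovariate(RATE), node))

print(sends, round(clock, 1))
```

Note that the clock advances directly from event to event, so simulating hours of idle network costs nothing, whereas the time-step version would burn millions of ticks drawing uniform samples that almost always come up "no send".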