Do proposal numbers need to be unique in Paxos? - consensus

What can happen if two proposals are issued with the same ProposalId? It shouldn't create any problem if they have the same value. But what about different values? I can work out a scenario where it risks liveness at the time of failure, but would it create any problem for the protocol's safety?

It's a nice idea because it would ease an annoying design requirement for a practical system: designing a scheme to ensure proposers have different, yet increasing round numbers.
It doesn't work, unfortunately.
Let's define PaxosPrime to be a variation of Paxos where different proposers are allowed to have the same round numbers. We show through contradiction that PaxosPrime is not a consensus algorithm.
Proof sketch: Assume PaxosPrime is a consensus algorithm. Let's consider where each acceptor has a different value, but with the same round (w.l.o.g we'll pick 3). Each will also have promised at 3. We then have a pair of proposers interact in the system.
A1 | A2 | A3
v p | v p | v p
------+-------+------
x#3 3 | y#3 3 | z#3 3
P1 prepares for round 4 (i.e. Phase 1.A) and receives all three values x#3, y#3, and z#3. There is no provision in Paxos for tie-breaking, so we'll have it choose x.
P2 also prepares for round 4 and receives y#3 and z#3 from A2 and A3, respectively. We'll have it choose y.
P1 sends accepts for x#4 (i.e. Phase 2.A) and the acceptors all process it. At this point all the acceptors have value v#4 and promises for 4. The system has consensus on the value x.
P2 sends accepts for round y#4 and the acceptors A2 and A3 successfully process it. A majority of the acceptors now have value y#4; The system has consensus on the value y.
Here is the trace from the acceptors' point of view:
A1 | A2 | A3
v p | v p | v p
------+-------+------
x#3 3 | y#3 3 | z#3 3 # Start
x#3 4 | y#3 4 | z#3 4 # Process P1's propose command
x#3 4 | x#3 4 | x#3 4 # Process P1's accept command
x#3 4 | y#3 4 | y#3 4 # Process P2's accept command
The system first had consensus on the value x then consensus on the value y. This is a contradiction on the definition of consensus; thus PaxosPrime is not a consensus algorithm.
∎

The Synod algorithm technically does not require the proposal numbers to be unique among all processes. And the monotonically increasing requirement is only an optimization. We could technically choose a random proposal number, and still have a correct algorithm.
The algorithm starts off with a proposer sending a prepare(proposal number) to all acceptors. It can then only uses that proposal number in a proposal if a majority of acceptors haven't seen a proposal number as high as or higher than that yet and successfully send back a promise which can only happen once per proposal number. After this stage, if the proposer is able to continue, then that proposal number is effectively unique. So the algorithm already enforces the uniqueness of proposal numbers that are actually used in a proposal.

Related

Questiins about DTCM, how to model this process?

In a certain manufacturing system, there are 2 machines M 1 and M 2.
M 1 is a fast and high precision machine whereas M 2 is a slow and low
precision machine. M 2 is employed only when M 1 is down, and it is
assumed that M 2 does not fail. Assume that the processing time of
parts on M 1, the processing time of parts on M 2, the time to failure
of M 1, and the repair time of M 1 are independent geometric random
variables with parameters p 1, p 2, f, and r, respectively. Identify a
suitable state space for the DTMC model of the above system and
compute the TPM. Investigate the steady-state behavior of the DTMC.
How to model this into. DTMC? What is the state space? I have tried to use state-space like this:
0: M1 is working, M1 does not fail
1: M1 failed, M2 is working, M1 is repairing, M1 does not finish repairing
But there are still some problems, like what will happen after M1 finishes 1 part? Immediately process the next one or will change decide whether to fail or not? What will happen if M1 fails during process one part? What is the probability transfer matrix?
Thank you very much for your help!!!!

Online Algorithm approach for alternating subsequence

Consider a sequence A = a1, a2, a3, ... an of integers. A subsequence B of A is a sequence B = b1, b2, .... ,bn which is created from A by removing some elements but by keeping the order. Given an integer sequence A, the goal is to compute an alternating subsequence B, i.e. a sequence b1, ... bn such that for all i in {2, 3, ... , m-1}, if b{i-1} < b{i} then b{i} > b{i+1} and if b{i-1} > b{i} then b{i} < b{i+1}**
Consider an online version of the problem, where the sequence A is given element-by-element and each time, one needs to directly decide whether to include the next element in the subsequence B. Is it possible to achieve a constant competitive ratio (by using a deterministic online algorithm)? Either give an online algorithm which achieves a constant competitive ratio or show that it is not possible to find such an online algorithm.
Assume sequence [9,8,9,8,9,8, .... , 9,8,9,8,2,1,2,9,8,9, ... , 8,9,8,9,8,9]
My Argumentation:
The algorithm must decide immediately if it inserts an incoming number into the subsequence. If the algorithm now gets the numbers 1 then 2 then 2 it will eventually decide that they are part of the sequence and thus by a nonlinear factor worse than the optimal solution of n-3.
-> No constant competitive ratio!
Is this a proper argumentation?
If I understood what you meant, your argument is correct, but the sequence you gave in the example is wrong. for example the algorithm may choose all the 9's and 8's.
You can alter your argument slightly to make it more accurate, for example consider the sequence
3,4,3,4,3,4,......, 1/5,2/6,1/5,2/6,....
Explanation:
You start the sequence with 3,4,3,4,... etc. until the algorithm picks two numbers. If it never does, it's obviously not competitive (it gets 0/1 out of n)
If the algorithm picked a 3, then 4, the algorithm must next take a number lower than 4. By continuing with 5,6,5,6,... the algorithm cannot take another number.
If the algorithm chose to take a 4 then a 3, by a similar resoning we can easily see how continuing with 1,2,1,2,... prevents the algorithm from taking another nubmer.
Thus, in any case, the algorithm cannot take more than 2 numbers for every n, which, as you stated, isn't a constant competitive ratio.

Stata: Keep only observations with minimum, maximum and median value of a given variable

In Stata, I have a dataset with two variables: id and var, and say 1000 observations. The variable var is of type float and takes distinct values for all observations. I would like to keep only the three observations where var is either the minimum of var, the maximum of var, or the median of var.
The way I currently do this:
summarize var, detail
local varmax = r(max)
local varmin = r(min)
local varmedian= r(p50)
keep if inlist(float(var),float(`varmax') , float(`varmedian'), float(`varmin'))
The problem that I face is that sometimes the inlist condition will not match one of the value. E.g. I end up with two observations instead of three, for instance the one with min and the one with max, but not the one with median. I suspect this has to do with a precision problem. As you see, I tried to convert all numbers to float, but this is apparently not sufficient.
Any fix to my solution, or alternative solution would be greatly appreciated (if possible without installing additional packages), thanks!
This is not in the first instance a precision problem.
It is an inevitable problem when (1) the number of values is even and (2) the median is the mean of two central values that are different. Then the median itself is not a value in the dataset and will not be found by keep.
Consider a data set 1, 2, 3, 4. The median 2.5 is not in the data. This is very common; indeed it is what is expected with all values distinct and the number of observations even.
Other problems can arise because two or even three of the minimum, median and maximum could be equal to each other. This is not your present problem, but it can bite with other variables (e.g. indicator variables).
Precision problems are possible.
Here is a general solution purported to avoid all these difficulties.
If you collapse to min, median. max and then reshape you can avoid the problem. You will always get three results, even if they are numerically equal and/or not present in the data.
In the trivial example below, the identifier is needed only to appease reshape. In other problems, you might want to collapse using by() and then your identifier comes ready-made. However, you will be less likely to want to reshape in that case.
. clear
. set obs 4
number of observations (_N) was 0, now 4
. gen y = _n
. collapse (min)ymin=y (max)ymax=y (median)ymedian=y
. gen id = _n
. reshape long y, i(id) j(statistic) string
(note: j = max median min)
Data wide -> long
-----------------------------------------------------------------------------
Number of obs. 1 -> 3
Number of variables 4 -> 3
j variable (3 values) -> statistic
xij variables:
ymax ymedian ymin -> y
-----------------------------------------------------------------------------
. list
+---------------------+
| id statis~c y |
|---------------------|
1. | 1 max 4 |
2. | 1 median 2.5 |
3. | 1 min 1 |
+---------------------+
All that said, having (lots of?) datasets with just three observations sounds poor data management strategy. Perhaps this is extracted from some larger question.
UPDATE
Here is another way to keep precisely 3 observations. Apart from the minimum and maximum, we use the rule that we keep the "low median", i.e. the lower of two values averaged for the median, when the number of observations is even, and a single value that is the median otherwise. (In Stephen Stigler's agreeable terminology, we can talk of "comedians" in the first case.)
. sysuse auto, clear
(1978 Automobile Data)
. sort mpg
. drop if missing(mpg)
(0 observations deleted)
. keep if inlist(_n, 1, cond(mod(_N, 2), ceil(_N/2), floor(_N/2)), _N)
(71 observations deleted)
. l mpg
+-----+
| mpg |
|-----|
1. | 12 |
2. | 20 |
3. | 41 |
+-----+
mod(_N, 2) is 1 if _N is odd and 0 if _N is even. The expression in cond() selects ceil(_N/2) if the number of observations is odd and floor(_N/2) if it is even.

Simplifying a 9 variable boolean expression

I am trying to create a tic-tac-toe program as a mental exercise and I have the board states stored as booleans like so:
http://i.imgur.com/xBiuoAO.png
I would like to simplify this boolean expression...
(a&b&c) | (d&e&f) | (g&h&i) | (a&d&g) | (b&e&h) | (c&f&i) | (a&e&i) | (g&e&c)
My first thoughts were to use a Karnaugh Map but there were no solvers online that supported 9 variables.
and heres the question:
First of all, how would I know if a boolean condition is already as simple as possible?
and second: What is the above boolean condition simplified?
2. Simplified condition:
The original expression
a&b&c|d&e&f|g&h&i|a&d&g|b&e&h|c&f&i|a&e&i|g&e&c
can be simplified to the following, knowing that & is more prioritary than |
e&(d&f|b&h|a&i|g&c)|a&(b&c|d&g)|i&(g&h|c&f)
which is 4 chars shorter, performs in the worst case 18 & and | evaluations (the original one counted 23)
There is no shorter boolean formula (see point below). If you switch to matrices, maybe you can find another solution.
1. Making sure we got the smallest formula
Normally, it is very hard to find the smallest formula. See this recent paper if you are more interested. But in our case, there is a simple proof.
We will reason about a formula being the smallest with respect to the formula size, where for a variable a, size(a)=1, for a boolean operation size(A&B) = size(A|B) = size(A) + 1 + size(B), and for negation size(!A) = size(A) (thus we can suppose that we have Negation Normal Form at no cost).
With respect to that size, our formula has size 37.
The proof that you cannot do better consists in first remarking that there are 8 rows to check, and that there is always a pair of letter distinguishing 2 different rows. Since we can regroup these 8 checks in no less than 3 conjuncts with the remaining variable, the number of variables in the final formula should be at least 8*2+3 = 19, from which we can deduce the minimal tree size.
Detailed proof
Let us suppose that a given formula F is the smallest and in NNF format.
F cannot contain negated variables like !a. For that, remark that F should be monotonic, that is, if it returns "true" (there is a winning row), then changing one of the variables from false to true should not change that result. According to Wikipedia, F can be written without negation. Even better, we can prove that we can remove the negation. Following this answer, we could convert back and from DNF format, removing negated variables in the middle or replacing them by true.
F cannot contain a sub-tree like a disjunction of two variables a|b.
For this formula to be useful and not exchangeable with either a or b, it would mean that there are contradicting assignments such that for example
F[a|b] = true and F[a] = false, therefore that a = false and b = true because of monotonicity. Also, in this case, turning b to false makes the whole formula false because false = F[a] = F[a|false] >= F[a|b](b = false).
Therefore there is a row passing by b which is the cause of the truth, and it cannot go through a, hence for example e = true and h = true.
And the checking of this row passes by the expression a|b for testing b. However, it means that with a,e,h being true and all other set to false, F is still true, which contradicts the purpose of the formula.
Every subtree looking like a&b checks a unique row. So the last letter should appear just above the corresponding disjunction (a&b|...)&{c somewhere for sure here}, or this leaf is useless and either a or b can be removed safely. Indeed, suppose that c does not appear above, and the game is where a&b&c is true and all other variables are false. Then the expression where c is supposed to be above returns false, so a&b will be always useless. So there is a shorter expression by removing a&b.
There are 8 independent branches, so there is at least 8 subtrees of type a&b. We cannot regroup them using a disjunction of 2 conjunctions since a, f and h never share the same rows, so there must be 3 outer variables. 8*2+3 makes 19 variables appear in the final formula.
A tree with 19 variables cannot have less than 18 operators, so in total the size have to be at least 19+18 = 37.
You can have variants of the above formula.
QED.
One option is doing the Karnaugh map manually. Since you have 9 variables, that makes for a 2^4 by 2^5 grid, which is rather large, and by the looks of the equation, probably not very interesting either.
By inspection, it doesn't look like a Karnaugh map will give you any useful information (Karnaugh maps basically reduce expressions such as ((!a)&b) | (a&b) into b), so in that sense of simplification, your expression is already as simple as it can get. But if you want to reduce the number of computations, you can factor out a few variables using the distributivity of the AND operators over ORs.
The best way to think of this is how a person would think of it. No person would say to themselves, "a and b and c, or if d and e and f," etc. They would say "Any three in a row, horizontally, vertically, or diagonally."
Also, instead of doing eight checks (3 rows, 3 columns, and 2 diagonals), you can do just four checks (three rows and one diagonal), then rotate the board 90 degrees, then do the same checks again.
Here's what you end up with. These functions all assume that the board is a three-by-three matrix of booleans, where true represents a winning symbol, and false represents a not-winning symbol.
def win?(board)
winning_row_or_diagonal?(board) ||
winning_row_or_diagonal?(rotate_90(board))
end
def winning_row_or_diagonal?(board)
winning_row?(board) || winning_diagonal?(board)
end
def winning_row?(board)
3.times.any? do |row_number|
three_in_a_row?(board, row_number, 0, 1, 0)
end
end
def winning_diagonal?(board)
three_in_a_row?(board, 0, 0, 1, 1)
end
def three_in_a_row?(board, x, y, delta_x, delta_y)
3.times.all? do |i|
board[x + i * delta_x][y + i * deltay]
end
end
def rotate_90(board)
board.transpose.map(&:reverse)
end
The matrix rotate is from here: https://stackoverflow.com/a/3571501/238886
Although this code is quite a bit more verbose, each function is clear in its intent. Rather than a long boolean expresion, the code now expresses the rules of tic-tac-toe.
You know it's a simple as possible when there are no common sub-terms to extract (e.g. if you had "a&b" in two different trios).
You know your tic tac toe solution must already be as simple as possible because any pair of boxes can belong to at most only one winning line (only one straight line can pass through two given points), so (a & b) can't be reused in any other win you're checking for.
(Also, "simple" can mean a lot of things; specifying what you mean may help you answer your own question. )

Reading multi-channel serial usb in pure data

I have multichannel serial data from an 8 channel ADC chip that I'm connecting to my computer via a serial-USB cable. I want to use these separate channels in Pure Data, but the pd_comport object doesn't read multi-channel serial data. I've scoured the Pd discussion fora but there's no mention of how to do this.
Any thoughts on how I can go about it?
per definition a serial connection is only single-channel. if you have multiple (synchronized) channels, it's called parallel.
so your problem is basically one of the two following:
parallel serial streams
if you are transmitting the 8 ADC-channels via different serial connections, your (special) cable should register 8 different devices (e.g. /dev/ttyUSB5, /dev/ttyUSB6, .../dev/ttyUSB12).
in this case, simply use multiple [comport] objects (one for each serial device you want to interface)
single multiplex stream
in the (more likely) case, that your ADC transmits it's 8 channels in a single serial connection by multiplexing the data, you will have to demultiplex the serial stream yourself. how to do this, is very much depending on the actual format of the data.
assuming your ADCs are only 8bit and you have only 4 channels (for simplicity), then you might receive a serial stream like: ... A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 .... (with A, B,... being the samples for the 4 channels; and 1,2,... being the sample frames), then you could demultiplex the signal into 4 streams with something like
|
[t b f]
| |
| +------------+ |
[i ]/[+ 1]/[% 4]/ |
| |
[pack 0 0]
|
[route 0 1 2 3]
| | | |
in practice your protocol might look slightly different (e.g. there ought to be a way to specify frame boundaries (there's no way by just looking at the numbers, whether you are actually seeing A1 B1 C1 D1 A2 B2 or B1 C1 D1 A2 B2 C2, so it's unclear whether the 1st sample belongs to channelA or channelB).
thus you really must get your hands on the protocol definition and interpret the data you get from [comport]