Why Paxos is designed in two phases - distributed-computing

Why does Paxos require two phases (prepare/promise + accept/accepted) instead of a single one? That is, using only the prepare/promise portion, if the proposer has heard back from a majority of acceptors, that value is chosen.
What is the problem? Does it break safety or liveness?

It breaks safety not to follow the full protocol.
Typical implementations of Multi-Paxos have a steady-state mode where a stable leader streams Accept messages containing fresh values. Only when a problem occurs (the leader crashes, stalls, or is partitioned by a network issue) does a new leader need to issue prepare messages to ensure safety. A full description of this is in the write-up of how TRex, an open source Paxos library, implements Paxos.
Consider the following crash scenario which TRex would handle properly:
Nodes A, B, C with A leading
Client application sends V1 to leader A
Node A is in steady state and so sends accept(n, V1) to nodes B and C. The network starts to fail, though, so only B sees the message, and it replies with accepted(n)
Node A sees the response and has a majority {A,B} so it knows the value is fixed due to the safety proof of the protocol.
Node A attempts to broadcast the outcome to everyone just as its server dies. Only the client application that issued V1 gets the message. Imagine that V1 is a customer order and, upon learning the order is fixed, the client application debits the customer's credit card.
Node C times out on the dead leader and attempts to lead. It never saw the value V1. It cannot arbitrarily choose a new value, since that would roll back the order V1 even though the customer has already been charged.
So Node C first issues a prepare(n+1) and node B responds with promise(n+1, V1).
Node C then issues accept(n+1, V1) and as long as the remaining messages get through nodes B and C will learn the value V1 was chosen.
Intuitively, we can say that Node C has chosen to collaborate with the dead node A by choosing A's value. So intuitively we can see why there must be two rounds: the first round is needed to discover whether there is any pending work to finish; the second round is used to fix the correct value, giving consistency across all processes within the system.
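As a minimal sketch of the rule node C applies above (illustrative Python, not TRex code; the promise format is an assumption):

# Hedged sketch: how a new leader picks its phase-2 value from the
# phase-1 responses. Assumes each promise carries the acceptor's
# highest accepted (ballot, value) pair, or (None, None) if none.

def value_to_propose(promises, own_new_value):
    accepted = [(b, v) for (b, v) in promises if b is not None]
    if accepted:
        # Pending work discovered: must adopt the value with the
        # highest ballot (here, V1 learnt from node B).
        _, value = max(accepted, key=lambda bv: bv[0])
        return value
    return own_new_value  # nothing pending: free to propose our own

# Node C's view after prepare(n+1): its own promise is empty, B's carries V1.
print(value_to_propose([(None, None), (7, "V1")], "V2"))  # -> V1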

It's not entirely accurate, but you can think of the two phases as 1) copying the data, and then 2) committing the data. If the data were just copied to the other servers, those servers would have no idea whether enough other servers have the data for it to be considered safe to serve. Thus there is a second phase to let the servers know that they can commit the data.
Paxos is a little more complex than that, which allows it to continue during failures of either phase. Part of the Paxos proof is that it is the minimal protocol for doing this completely. That is, other protocols do more work, either because they add more capabilities, or because they were poorly designed.

Related

How to distribute tasks between servers where each task must be done by only one server?

Goal: There are X backend servers. There are Y tasks. Each task must be done by only one server; the same task being run by two different servers should never happen.
There are tasks which involve continuous work for an indefinite amount of time, such as polling for data. The same server can keep doing such a task as long as it stays alive.
Problem: How to reassign a task if the server executing it dies? If the server dies, it can't mark the task as open. What are efficient ways to accomplish this?
Well, the way you define your problem makes it sloppy to reason about. What you are actually looking for is called a "distributed lock".
Let's start with a simpler problem: assume you have only two concurrent servers S1, S2 and a single task T. The safety property you stated remains as is: at no point in time may both S1 and S2 process task T. How could that be achieved? The following strategies come to mind:
Implement an algorithm that deterministically maps a task to a responsible server. For example, it could be as simple as if task.name.contains('foo') then server1.process(task) else server2.process(task). That works, and indeed might fit some real-world requirements out there, yet such an approach is a dead end: a) you have to know how many servers you will have upfront, statically, and, most dangerously, b) you cannot tolerate either server being down: if, say, S1 is taken offline, then there is nothing you can do with T right now except wait for S1 to come back online. These drawbacks can be softened and optimized, yet there is no way to get rid of them; escaping these deficiencies requires a more dynamic approach.
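For illustration, the static strategy might look like the following hedged sketch (hash-based instead of the 'foo' check; all names are made up):

import hashlib

SERVERS = ["server1", "server2"]  # must be known upfront, statically

def responsible_server(task_name: str) -> str:
    # Deterministic: every node computes the same owner without any
    # coordination. But if that owner is down, the task just waits.
    digest = hashlib.sha256(task_name.encode("utf-8")).digest()
    return SERVERS[digest[0] % len(SERVERS)]

print(responsible_server("poll-weather-feed"))  # always the same owner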
Implement an algorithm that allows S1 and S2 to agree upon who is responsible for T. Basically, you want both S1 and S2 to come to a consensus about the (assumed, not necessarily needed) T.is_processed_by = "S1" or T.is_processed_by = "S2" property's value. Then your requirement translates to "at any point in time is_processed_by is seen by both servers in the same way". Hence "consensus": "an agreement (between the servers) about the is_processed_by value". Having that eliminates all the "too static" issues of the previous strategy: you are no longer bound to 2 servers; you could have n, n > 1 servers (provided that your distributed consensus works for a chosen n). However, it is not prepared for accidents like an unexpected power outage. It could be that S1 won the competition, is_processed_by became equal to "S1", S2 agreed with that and... S1 went down without doing anything useful...
...so you're missing the last bit: the "liveness" property. In simple words, you'd like your system to continuously make progress whenever possible. To achieve that property (among many other things I am not mentioning) you have to make sure that a spontaneous server death is monitored and, once it happens, no task T gets stuck for indefinitely long. How do you achieve that? That's another story, but a typical practical solution is to copy-paste the good old TCP way of doing essentially the same thing: meet the keepalive approach.
OK, let's conclude what we have by now:
Take any implementation of "distributed locking", which is equivalent to "distributed consensus". It could be ZooKeeper done correctly, a PostgreSQL instance running a serializable transaction, or anything alike.
For each unprocessed or stuck task T in your system, make all the free servers S race for that lock. Only one of them is guaranteed to win, and all the rest will surely lose.
Frequently enough, push TCP-keepalive-style notifications per each in-flight task or, at least, per each alive server. Missing, let's say, three notifications in a row should be taken as a server death, and all of its tasks should be re-marked as "stuck" and (eventually) reprocessed per the previous step.
And that's it.
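Sketched in Python under stated assumptions (the lock service's try_acquire/refresh lease API and the task store are hypothetical stand-ins for ZooKeeper, PostgreSQL, or whatever you choose):

import time

KEEPALIVE_SECONDS = 10
LEASE_SECONDS = 3 * KEEPALIVE_SECONDS  # three missed beats = death

def worker_loop(locks, tasks, me):
    while True:
        # Race for every unprocessed or stuck task; the lock service
        # guarantees exactly one winner per task.
        for task in tasks.unprocessed_or_stuck():
            if locks.try_acquire(f"task/{task.id}", owner=me,
                                 ttl=LEASE_SECONDS):
                tasks.start_processing(task, owner=me)
        # The keepalive: refresh leases while we are still alive. If we
        # die, the leases expire and the tasks become "stuck" again.
        for task in tasks.owned_by(me):
            locks.refresh(f"task/{task.id}", owner=me, ttl=LEASE_SECONDS)
        time.sleep(KEEPALIVE_SECONDS)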
P.S. Safety and liveness properties are something you'd definitely want to be aware of once it comes to distributed computing.
Try RabbitMQ worker queues:
https://www.rabbitmq.com/tutorials/tutorial-two-python.html
It has an acknowledgement feature, so if a task fails or a server crashes it will automatically replay your task. Based on your specific use case you can set up retries, etc.
"Problem: How to reassign a task if the server executing it dies? If the server dies, it can't mark the task as open. What are efficient ways to accomplish this?"
You are getting into a known problem in distributed systems: how does a system make decisions when the system is partitioned? Let me elaborate on this.
A simple statement like "server dies" requires quite a deep dive into what it actually means. Did the server lose power? Is the network between your control plane and the server down (while the task keeps running)? Or, maybe, the task completed successfully, but the failure happened just before the task server was about to report it? If you want to be 100% correct in deciding the current state of the system, that is the same as saying the system has to be 100% consistent.
This is where the CAP theorem (https://en.wikipedia.org/wiki/CAP_theorem) comes into play. Since your system may be partitioned at any time (a worker server may get disconnected or die, which is the same state) and you want to be 100% correct/consistent, the system won't be 100% available.
To reiterate the previous paragraph: if the system suspects a task server is down, the system as a whole will have to come to a stop until it is able to determine what happened with that particular task server.
The trade-off between consistency and availability is the core of distributed systems. Since you want to be 100% correct, you won't have 100% availability.
While availability won't be 100%, you can still make the system as available as possible. Several approaches may help with that.
The simplest one is to alert a human when the system suspects a server is down. The human will get a notification (24/7), wake up, log in, and manually check what is going on. Whether this approach works for your case depends on how much availability you need, but it is completely legitimate and widely used in the industry (those engineers carrying pagers).
A more complicated approach is to let the system fail over to another task server automatically, if that is possible. A few options are available here, depending on the type of task.
The first type of task is re-runnable, but it has to exist as a single instance. In this case, the system uses the "STONITH" (shoot the other node in the head) technique to make sure the previous node is dead for good. For example, in a cloud the system would actually kill the whole container of the task server and then start a new container as a failover.
The second type of task is not re-runnable. For example, a task transferring money from account A to account B is not (automatically) re-runnable: the system does not know whether the task failed before or after the money was moved. Hence, the failover needs extra steps to determine the outcome, which may also be impossible if the network is not working correctly. In such cases the system usually halts until it can make a 100% correct decision.
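One common mitigation, sketched here under assumptions (a datastore offering atomic transactions; all names are hypothetical), is to turn the non-re-runnable task into a re-runnable one by recording an idempotency key atomically with the effect:

def transfer(db, transfer_id, src, dst, amount):
    # If the id was recorded, the money definitely moved; a failover
    # can then safely re-run the task and it becomes a no-op.
    with db.transaction():  # hypothetical transactional API
        if db.exists("applied_transfers", transfer_id):
            return "already applied"
        db.add("accounts", src, -amount)
        db.add("accounts", dst, +amount)
        db.insert("applied_transfers", transfer_id)
        return "applied"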
None of these options will give 100% availability, but they can do as well as possible given the nature of distributed systems.

Is ZooKeeper always consistent in terms of CAP theorem?

Is it correct that ZooKeeper is always CP (in terms of the CAP theorem)?
Or is there any way to use it as AP for service discovery needs?
ZooKeeper is not A, and can't drop P, so it's classified as CP. In terms of the CAP theorem, "C" actually means linearizability.
Linearizability: if operation B started after operation A successfully completed, then operation B must see the system in the same state as it was on completion of operation A, or a newer state.
But,
ZooKeeper has sequential consistency: updates from a client will be applied in the order that they were sent.
ZooKeeper is not, in fact, simultaneously consistent across client views.
http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#ch_zkGuarantees
ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API before it performs its read.
ZooKeeper provides "sequential consistency". This is weaker than linearizability but is still very strong, much stronger than "eventual consistency". ZooKeeper also provides a sync command. If you invoke a sync command and then a read, the read is guaranteed to see at least the last write that completed before the sync started.
Under linearizability, writes should appear to be instantaneous. Imprecisely, once a write completes, all later reads (where "later" is defined by wall-clock start time) should return the value of that write or the value of a later write. Once a read returns a particular value, all later reads should return that value or the value of a later write.
In ZooKeeper, the sync() method is what you use where you need something like linearizability.
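For example, with the kazoo Python client (the host address and znode path are illustrative):

from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Without sync(), this read may return a stale value if the server this
# client is connected to lags behind the leader.
zk.sync("/a")               # flush this session's view up to the leader
value, stat = zk.get("/a")  # now sees at least every write that
                            # completed before the sync started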
Serializability is a guarantee about transactions, or groups of one or more operations over one or more objects. It guarantees that the execution of a set of transactions (usually containing read and write operations) over multiple items is equivalent to some serial execution (total ordering) of the transactions.
Refer:
http://zookeeper-user.578899.n2.nabble.com/Consistency-in-zookeeper-td7578531.html
http://www.bailis.org/blog/linearizability-versus-serializability/
Difference between Linearizability and Serializability
No, you cannot change the consistency guarantees in current versions of ZooKeeper like you can in some other systems.
You can add a local cache to your clients, which will give them read-only data if the cluster goes down, but in terms of CAP that is still not A, because the system needs to be available for updates as well as reads.
If ZK offers too strong a level of consistency for your service discovery needs, you should research other options, e.g. Eureka, Consul, or etcd.
Possibly related reads:
https://tech.knewton.com/blog/2014/12/eureka-shouldnt-use-zookeeper-service-discovery/
https://github.com/Netflix/eureka/wiki/FAQ
https://www.consul.io/intro/vs/zookeeper.html
An excellent question.
In terms of CAP theorem, "C" actually means linearizability:
if operation B started after operation A successfully completed, then
operation B must see the system in the same state as it was on
completion of operation A, or a newer state.
Since a write in ZooKeeper is considered completed once a quorum confirms it, there can still be stale nodes with old data. Therefore, strictly speaking, ZooKeeper is by default not a CP system, even though it provides a reasonably high level of consistency. You can ensure linearizability by preceding a read with a sync command.
Regarding availability under a network partition: nodes that are not in the majority cannot process write requests anymore, because they don't have a quorum.
See also:
Please stop calling database CP or AP
CAP FAQ

Paxos algorithm in the context of distributed database transaction

I had some confusion about paxos, specifically in the context of database transactions:
In the paper "Paxos Made Simple", it says in the second phase that the proposer needs to choose the value with the highest sequence number that one of the acceptors has accepted before (if no such value exists, the proposer is free to choose the value it originally proposed).
Questions:
On one hand, I understand it does so to maintain the consensus.
But on the other hand, I am confused about what the value actually is. What's the point of "having to send acceptors the value that has been accepted before"?
In the context of database transactions, what if it needs to commit a new value? Does it need to start a new instance of Paxos?
If the answer to the above question is "yes", then how do acceptors reset their state? (In my understanding, if the state isn't reset, the proposer would be forced to send one of the old values that was accepted before, rather than being free to commit whatever the new value is.)
There are different kinds of Paxos in the "Paxos Made Simple" paper. One is Paxos (plain Paxos, single-decree Paxos, the Synod protocol); another is Multi-Paxos. From an engineer's point of view, the first is a distributed write-once register and the second is a distributed append-only log.
Answers:
In the context of Paxos, the actual value is the value that was successfully written to the write-once register, which happens when a majority of the acceptors accept a value of the same round. The paper shows that any newly chosen value will always be the same as the previous one (if one was already chosen). So to get the actual value we should initiate a new round and return the newly successfully written value.
In the context of Multi-Paxos the actual value is the latest value added to the log.
With Multi-Paxos we just add a new value to the log. To read the current value, we read the log and return the latest version. At a low level, Multi-Paxos is an array of Paxos registers. To write a new value, we put it, along with the position of the current value, into a free register, and then fill the previous free registers with no-ops. When two registers contain two different next values for the same previous value, we choose the register with the lowest position in the array.
It is possible, and trivial, with Multi-Paxos: we just start a new round of Paxos over a free register. Although plain Paxos doesn't cover it, we can "extend" it and turn it into a distributed variable instead of a distributed register. I described this idea and the proof in the "A memo on how Paxos works" post.
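A toy reading of that model (illustrative only; in reality each register's value is fixed by a majority of acceptors, not by a local write):

NOOP = "no-op"

# The log as an array of write-once registers: None = still free.
log = ["set x=1", "set x=2", NOOP, None]

def current_value(log):
    """The latest decided, non-no-op entry is the current value."""
    for entry in reversed(log):
        if entry is not None and entry != NOOP:
            return entry
    return None

print(current_value(log))  # -> "set x=2"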
Rather than answering your questions directly, I'll try explaining how one might go about implementing a database transaction with Paxos, perhaps that will help clear things up.
The first thing to notice is that there are two "values" in question here. First is the database value, the application-level data that is being modified. Second is the 'Commit'/'Abort' decision. For Paxos-based transactions, the consensus "value" is the 'Commit'/'Abort' decision.
An important point to keep in mind about database transactions with respect to Paxos consensus is that Paxos does not guarantee that all of the peers involved in the transaction will actually see the consensus decision. When this is needed, as it usually is with databases, it's left to the application to ensure that this happens. This means that the state stored by some peers can lag behind others and any database application built on top of Paxos will need some mechanism for handling this. This can be very complicated and is all application-specific so I'm going to ignore all that completely and focus on ensuring that a simple majority of all database replicas agree on the value of revision 2 of the database key FOO which, of course, is initially set to BAR.
The first step is to send the new value for FOO, let's say that's BAZ, and its expected current revision, 1, along with the Paxos Prepare message. When the database replicas receive this message, they'll first look up their local copy of FOO and check whether the current revision matches the expected revision included with the Prepare message. If they match, the database replica will bundle a "Vote Commit" flag along with the Promise message it sends in response to the Prepare. If they don't match, "Vote Abort" is sent instead. (The revision check protects against the case where the value was modified since the last time the application read it; allowing overwrites in that case could corrupt application state.)
Once the transaction driver receives a quorum of Promise messages along with their associated "Vote Commit"/"Vote Abort" values, it must choose to propose either "Commit" or "Abort". The first step in choosing this value is to follow the Paxos requirement of checking the Promise messages to see whether any database replica (the Acceptor in Paxos terms) has already accepted a "Commit"/"Abort" decision. If any of them has, then the transaction driver must choose the "Commit"/"Abort" value associated with the highest previously accepted proposal ID. If none has, it must decide on its own. This is done by looking at the "Vote Commit"/"Vote Abort" values bundled with the Promises. If a quorum of "Vote Commit"s are present, the transaction driver may propose "Commit"; otherwise it must propose "Abort".
From that point on, it's all standard Paxos messages exchanged back and forth until consensus is reached on the "Commit"/"Abort" decision. Assuming "Commit" is chosen, the database replicas will update the value and revision associated with FOO to BAZ and 2, respectively.
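The driver's decision step might be sketched like this (the field names and message shapes are my assumptions, not this answer's actual code):

def decide_proposal(promises, quorum):
    """promises: one dict per Promise, e.g.
    {"accepted_id": 3 or None, "accepted_value": "Commit"/"Abort"/None,
     "vote": "Vote Commit" or "Vote Abort"}."""
    # Paxos rule first: if any replica already accepted a decision,
    # re-propose the one with the highest proposal id.
    prior = [p for p in promises if p["accepted_id"] is not None]
    if prior:
        return max(prior, key=lambda p: p["accepted_id"])["accepted_value"]
    # Otherwise the driver decides from the bundled votes.
    commits = sum(1 for p in promises if p["vote"] == "Vote Commit")
    return "Commit" if commits >= quorum else "Abort"

print(decide_proposal(
    [{"accepted_id": None, "accepted_value": None, "vote": "Vote Commit"},
     {"accepted_id": None, "accepted_value": None, "vote": "Vote Commit"}],
    quorum=2))  # -> "Commit"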
I wrote a long blog post, with links to source code, on the topic of doing transaction log replication with Paxos as described in the "Paxos Made Simple" paper. Here I give short answers to your questions; the blog post and source code show the complete picture.
On one hand, I understand it does so to maintain the consensus. But on
the other hand, I had confusion about what the value actually is -
what's the point of "having to send acceptors the value that has been
accepted before"?
The value is the command the client is trying to run on the cluster. During an outage, the client value transmitted to all nodes by the last leader may have reached only one node in the surviving majority. The new leader may not be that node. The new leader discovers the client value from at least one surviving node and then transmits it to all the nodes in the surviving majority. In this manner, the new leader collaborates with the dead leader to complete any client work it may have had in progress.
In the context of database transactions, what if it needs to commit a
new value? Does it need to start a new instance of Paxos?
It cannot choose any new commands from clients until it has rebuilt the history of chosen values selected by the last leader. The blog post describes this as a "leader takeover phase", where after a crash of the old leader the new leader tries to bring all nodes fully up to date.
In effect, whatever the last leader transmitted that reached a majority of nodes is chosen; the new leader cannot change this history. During the takeover phase it is simply synchronising nodes to get them all up to date. Only when the new leader has finished this phase, and is known to be fully up to date with all chosen values, can it process any new client commands (i.e. process any new work).
If the answer to the above question is "Yes", then how do acceptors
reset the state?
They don't. There is a gap between a value being chosen and any node learning that the value has been chosen. In the context of a database, you cannot "commit" the value (apply it to the data store) until you have "learnt" the chosen value. Paxos ensures that a chosen value won't ever be undone. So don't commit the value until you learn that it has been chosen. The blog post gives more details on this.
This is from my experience of implementing Raft and reading the ZAB paper, which are the two prevalent incarnations of Paxos. I haven't really gotten into single-decree Paxos or Multi-Paxos.
When a client sends a commit to any node in the cluster, that node redirects the commit to the leader. The leader then sends the commit message to every node in the quorum, and once a quorum of nodes has confirmed it, the leader commits it to its own log.

In Paxos, can an Acceptor accept a different value after it has already accepted one?

In Multi-Paxos algorithm, consider this message flow from the viewpoint of an acceptor:
receive: Prepare(N)
reply: Promise(N, null)
receive: Accept!(N, V1)
reply: Accepted(N, V1)
receive: Accept!(N+1, V2)
reply: ?
What should the acceptor's reaction be in this case, according to the protocol? Should it reply with Accepted(N+1, V2), or should it ignore the second Accept!?
I believe this case may happen in Multi-Paxos when a second proposer comes online and believes he is (and always was) the leader, and therefore sends his Accept without Preparing. Or if his Prepare simply did not reach the acceptor. If this case may not happen, can you explain why?
I disagree with both other answers.
Multi-Paxos does not say that the Leader is the only proposer; that would give the system a single point of failure, and even during network partitions the system might not be able to progress. Multi-Paxos is an optimization allowing a single node (the Leader) to skip some prepare phases. Other nodes, thinking the leader is dead, may attempt to continue the instance on her behalf, but must still use the full Basic Paxos protocol.
Nacking the accept message violates the Paxos algorithm. An acceptor should accept all values unless it has promised not to accept them. (Ignoring is allowed, but not recommended; it is permitted only because dropped messages are allowed.)
There is also an elegant solution to this. The problem is with the Leader's round number (N+1 in the question).
Here are some assumptions:
You have a scheme such that round ids are disjoint across all nodes (required by Paxos anyway).
You have a way of determining which node is the Leader per Paxos instance (required by Multi-Paxos). The Leader is able to change from one Paxos instance to the next.
Aside: The Part-time Parliament suggests this is done by the Leader winning a prior Paxos instance (Section 3.1) and points out she can stay Leader as long as she's alive or the richest (Section 3.3.1). I have an explicit ELECT_NEW_LEADER:<node> value that is proposed via Paxos.
The Leader only skips the Prepare phase on the initial round per instance; and uses full Basic Paxos on subsequent rounds.
With these assumptions, the solution is very simple. The Leader merely picks a really low round id for its initial Accept phase. This id (which I'll call INITIAL_ROUND_ID) can be anything, as long as it is lower than all the nodes' round ids. Depending on your id-choosing scheme, either -1, 0, or Integer.MIN_VALUE will work.
It works because another node (I'll call him the Stewart) has to go through the full Paxos protocol to propose anything, and his round id is always greater than INITIAL_ROUND_ID. There are two cases to consider: whether or not the Leader's Accept messages reached any of the nodes that the Stewart's Prepare message did.
When the Leader's Accept phase has not reached any of those nodes, the Stewart will get back no value in any Promise and can proceed just as in regular Basic Paxos.
And when the Leader's Accept phase has reached such a node, the Stewart will get back a value in a Promise, which it uses to continue the algorithm, just as in Basic Paxos.
In either case, because the Stewart's round id is greater than INITIAL_ROUND_ID, any slow Accept messages a node receives from the Leader will always result in a Nack.
There is no special logic on either the Acceptor or the Stewart, and minimal special logic on the Leader (viz. pick a really low INITIAL_ROUND_ID).
Notice that if we change the OP's question by one character, then the OP's self-answer is correct: Nack.
receive: Prepare(N)
reply: Promise(N, null)
receive: Accept!(N, V1)
reply: Accepted(N, V1)
receive: Accept!(N-1, V2)
reply: Nack(N, V1)
But as it stands, his answer breaks the Paxos algorithm; the value should be accepted:
receive: Prepare(N)
reply: Promise(N, null)
receive: Accept!(N, V1)
reply: Accepted(N, V1)
receive: Accept!(N+1, V2)
reply: Accepted(N+1, V2)
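A minimal acceptor sketch of the rule this answer argues for (accept any round not lower than the promised one; Nack otherwise); purely illustrative:

class Acceptor:
    def __init__(self):
        self.promised = -1             # highest round promised so far
        self.accepted = (None, None)   # last accepted (round, value)

    def on_prepare(self, n):
        if n > self.promised:
            self.promised = n
            return ("Promise", n, self.accepted)
        return ("Nack", self.promised)

    def on_accept(self, n, v):
        if n >= self.promised:         # not promised to a higher round
            self.promised = n
            self.accepted = (n, v)
            return ("Accepted", n, v)
        return ("Nack", self.promised)

a = Acceptor()
N = 10
print(a.on_prepare(N))          # Promise(N, nothing accepted yet)
print(a.on_accept(N, "V1"))     # Accepted(N, V1)
print(a.on_accept(N + 1, "V2")) # Accepted(N+1, V2): must accept
print(a.on_accept(N - 1, "V2")) # Nack: promised a higher round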
The correctness of Multi-Paxos is conditioned on the requirement that the leader (i.e., proposer) does not change between successive Paxos instances. From The Part-Time Parliament Section 3.1 (The Protocol of the Multi-Decree Parliament):
Logically, the parliamentary protocol [a.k.a. Multi-Paxos] used a
separate instance of the complete Synod protocol [a.k.a. Paxos] for each decree number. However,
a single president [a.k.a. proposer/leader] was selected for all these instances, and he performed the first
two steps of the protocol just once.[Added emphasis is mine.]
Therefore, Multi-Paxos makes the assumption that the case which you describe—when a second proposer comes online and believes he is (and always was) leader—will never happen. If such a case can happen, then one should not use Multi-Paxos. With respect to the second possibility—when the second proposer's Prepare did not reach the acceptor—the fact that the second proposer already sent out an Accept! means that it previously sent out a Prepare that was Promised by a quorum of acceptors. Since the acceptors already promised to the first proposer on round N, then the second proposer's Prepare must have been sent out prior to round N. Therefore, the final Accept!(N+1,V2) must have a counter less than N.
Edit: It should also be noted that this version of the protocol is not robust to Byzantine failure:
[The Paxon Parliament's protocol] does not tolerate arbitrary, malicious failures, nor does it guarantee bounded-time response.—The Part-Time Parliament, Section 4.1
Perhaps a simpler answer is to observe that this is the case when the Prepare(N+1) command was accepted by a majority that did not include the acceptor in question.
To elaborate: once a leader knows that some majority has Promised(N+1), it then sends Accept(N+1, x) to all acceptors, and if some majority of acceptors (possibly a different one) replies with Accepted(N+1), then consensus has been reached.
This is not that unusual a scenario.
(Answering my own question.)
My current understanding is that I should not accept the value in N+1 (i.e., not answer at all, or send a NACK), thus forcing the leader to possibly start another round with Prepare (if the majority hasn't reached consensus yet). After I receive Prepare(N+2), I shall reply with Promise(N+2, V1) and continue as usual.

Akka and state among actors in cluster

I am working on my bachelor's thesis project, which is a Minecraft server written in Scala and Akka. The server should be easily deployable in the cloud or onto a cluster (not sure whether I am using the proper terminology; it should run on multiple nodes). I am, however, a newbie in Akka, and I have been wondering how to implement such a thing. The problem I'm trying to figure out right now is how to share state among actors on different nodes. My first idea was to have a Camel actor that would read the TCP stream from Minecraft clients and then send it to a load balancer, which would select a node to process the request and then send some response to the client via TCP. Let's say I have an actor implementing an AuthenticationService that checks whether the credentials provided by a user are valid. Every node would have such an actor (or perhaps several), and all the actors should have exactly the same database (or state) of users at all times. My question is: what is the best approach to keep this state? I have come up with some solutions I could think of, but I haven't done anything like this before, so please point out the faults:
Solution #1: Keep state in a database. This would probably work very well for the authentication example, where state is only something like a list of usernames and passwords, but it probably wouldn't work in cases where state contains objects that can't easily be broken down into integers and strings.
Solution #2: Every time there is a request to a certain actor that would change its state, the actor will, after processing the request, broadcast information about the change to all other actors of the same type, which would then update their state according to the info sent by the original actor. This seems very inefficient and rather clumsy.
Solution #3: Have a certain node serve as a sort of state node, containing actors that represent the state of the entire server. All other actors would be stateless and would ask the actors in the "state node" every time they needed some data. This also seems inefficient and not very fault-tolerant.
So there you have it. The only solution I actually like is the first one, but as I said, it probably works for only a very limited subset of problems (when state can be broken into Redis-style structures). Any response from more experienced gurus would be very appreciated.
Regards, Tomas Herman
Solution #1 could possibly be slow. Also, it is a bottleneck and a single point of failure (meaning the application stops working if the node with the database fails). Solution #3 has similar problems.
Solution #2 is less trivial than it seems. First, it is a single point of failure. Second, there are no atomicity or other ordering guarantees (such as regularity) for reads or writes, unless you do a total-order broadcast (which is more expensive than a regular broadcast). In fact, most distributed register algorithms do broadcasts under the hood, so, while inefficient, it may be necessary.
From what you've described, you need atomicity for your distributed register. What do I mean by atomicity? Atomicity means that any read or write in a sequence of concurrent reads and writes appears as if it occurs at a single point in time.
Informally, in Solution #2 with a single actor holding a register, this guarantees that if two subsequent writes W1 and then W2 to the register occur (meaning two broadcasts), then no other actor reading the values from the register will read them in an order different from first W1 and then W2 (it's actually more involved than that). If you go through a couple of examples of subsequent broadcasts where messages arrive at their destinations at different points in time, you will see that such an ordering property isn't guaranteed at all.
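A toy illustration of that (plain Python, no Akka): two broadcast writes with different per-replica delays get applied in different orders, so a plain broadcast alone gives no atomicity:

# (message, arrival time in ms) per replica, for two broadcasts W1, W2
deliveries = {
    "replica1": [("W1", 5), ("W2", 9)],
    "replica2": [("W1", 12), ("W2", 7)],  # W2 overtakes W1 here
}

for replica, msgs in deliveries.items():
    order = [m for m, _ in sorted(msgs, key=lambda mt: mt[1])]
    print(replica, "applies", order)
# replica1 applies ['W1', 'W2']
# replica2 applies ['W2', 'W1']  -> the replicas diverge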
If ordering guarantees or atomicity aren't an issue, some sort of gossip-based algorithm might do the trick to slowly propagate changes to all the nodes. That probably wouldn't be very helpful in your example, though.
If you want something fully fault-tolerant and atomic, I recommend reading the book on reliable distributed programming by Rachid Guerraoui and Luís Rodrigues, or at least the parts related to distributed register abstractions. These algorithms are built on top of a message-passing communication layer and maintain a distributed register supporting read and write operations. You can use such an algorithm to store distributed state information. However, they aren't applicable to thousands of nodes or large clusters, because they do not scale, typically having complexity polynomial in the number of nodes.
On the other hand, you may not need the state of the distributed register replicated across all of the nodes. Replicating it across a subset of your nodes (instead of just one node) and accessing those to read or write from it provides a certain level of fault tolerance (the register information is lost only if the entire subset of nodes fails). You can possibly adapt the algorithms in the book to serve this purpose.