What do matrix clocks solve but vector clocks can't? - distributed-computing

I understand the need for vector clocks in terms of scalar logical clocks failing to provide enough information to tell whether there is an update conflict in a key value store update for example.
But I am not sure what problem is still unsolved by vector clocks and then solved by the more bulky matrix clocks?

In an eventual consistency environment all messages ever created by a system need to be kept until every peer has received the message (== eventual consistency). But you don't want to keep messages for ever, so you need to have a way to tell which messages were received by all nodes and can be deleted, this is why you use matrix clocks.
Matrix clocks are a list of vector clocks, so you know the current state of each node in the system. Based on this you can know which peer received already which messages. When you exchange messages with another node in the system you compare the matrix clocks and remember always the highest values for each node. Afterwards you can delete messages which were sent before, because the node already must have received them.
This is a very brief description of TSAE (timestamped anti-entropy) protocol. You can read more about it in the dissertation project Weak-consistency group communication and membership by Richard Andrew Golding from 1992 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.88.7385&rep=rep1&type=pdf) starting from chapter 5.

The distinctions among Lamport clock (scalar logical clock, in your term), vector clock, and matrix clock lie in that they represent different levels of knowledge.
For vector clock $vt_i[1 \ldots n]$ in site $i$, the entry $vt_i[k]$ represents the knowledge the site $S_i$ has about site $S_k$. The knowledge has the form of "$i$ knows $k$ that $\ldots$".
For matrix clock $mt_i[1 \ldots n, 1 \ldots n]$ in site $S_i$, the entry $mt_i[k,l]$ represents the knowledge the site $S_i$ has about the knowledge by $S_k$ about site $S_l$. The knowledge here the form of "$i$ knows $k$ knows $l$ that $\ldots$".
Intuitively, we can do more things with more knowledge.
The following description is mainly quoted from [1]:
Vector clocks and matrix clocks are widely used in asynchronous distributed message-passing systems.
Some example areas using vector clocks are checkpointing, causal memory, maintaining consistency of replicated files, global snapshot, global time approximation, termination detection, bounded multiwriter construction of shared variables, mutual exclusion and debugging (predicate detection).
Some example areas that use matrix clocks are designing fault-tolerant protocols and distributed database protocols, including protocols to discard obsolete information in distributed databases, and protocols to solve the replicated log and replicated dictionary problems.
For matrix clock, we notice that
$min_k(mt_i[k,i]) \ge t$ means that site $S_i$ knows that every other site $k$ knows its progress till its local time $t$.
It is this property that allows a site to no longer send an information with a local time $\le t$ or to discard obsolete information.
[1] Concurrent Knowledge and Logical Clock Abstractions Ajay D. Kshemkalyani 2000

Related

Observation Space for race strategy development - Reinforcement learning

I refrained from asking for help until now, but as my thesis' deadline creeps ever closer and I do not know anybody with experience in RL, I'm trying my luck here.
TLDR;
I have not found an academic/online resource which helps me understand the correct representation of the environment as an observation space. I would be very thankful for any links or for giving me a starting point of how to model the specifics of my environment in an observation space.
Short thematic introduction
The goal of my research is to determine the viability of RL for strategy development in motorsports. This is currently achieved by simulating (lots of!) races and calculating the resulting race time (thus end-position) of different strategic decisions (which are the timing of pit stops + amount of laps to refuel for). This demands a manual input of expected inlaps (the lap a pit stop occurs) for all participants, which implicitly limits the possible strategies by human imagination as well as the amount of possible simulations.
Use of RL
A trained RL agent could decide on its own when to perform a pit stop and how much fuel should be added, in order to minizime the race time and react to probabilistic events in the simulation.
The action space is discrete(4) and represents the options to continue, pit and refuel for 2,4,6 laps respectively.
Problem
The observation space is of POMDP nature and needs to model the agent's current race position (which I hope is enough?). How would I implement the observation space accordingly?
The training is performed using OpenAI's Gym framework, but a general explanation/link to article/publication would also be appreciated very much!
Your observation could be just an integer which represents round or position the agent is in. This is obviously not a sufficient representation so you need to add more information.
A better observation could be the agents race position x1, the round the agent is in x2 and the current fuel in the tank x3. All three of these can be represented by a real number. Then you can create your observation by concating these to a vector obs = [x1, x2, x3].

SIMD programming: hybrid approch for data structure layout

The Intel Optimization Reference Manual
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf
discusses the advantage of Structure-Of-Arrays (SoA) data layout for SIMD processing compared to the traditional Array-Of-Structures (AoS) layout. This is clear.
However, there's one argument I don't understand. On page 4-23 it says "SoA can have the disadvantage of requiring more independent memory stream references. A computation that uses arrays X, Y, and Z (see Example 4-20) would require three separate data streams. This can require the use of more prefetches, additional address generation calculations, as well as having a greater impact on DRAM page access efficiency." To mitigate this problem they recommend a hybrid approach (Example 4-22).
Can somebody please explain the "three separate data streams", the "prefetches" and "additional address generation calculations", and "impact on DRAM page access efficiency"?
At https://stackoverflow.com/a/40169187/3852630 Peter Cordes discusses two effects: Three different data streams for X, Y, and Z would tie up three registers for the addresses, and if the three arrays would be mapped to the same cache lines, frequent cache eviction would be a problem. However, registers are not a sparse resource on modern CPUs, and multi-way caches should mitigate the cache problem.

How to make simulated electric components behave nicely?

I'm making a simple electric circuit simulator. It will (at least initially) only feature batteries, wires and resistors in series and parallel. However, I'm at a loss how best to simulate said circuit in a good way.
Specifically, I will have batteries and resistors with two contact points each, and wires that go between two contact points. I assume that each component will have a field for its resistance, the current through it and the voltage across it (current and voltage will, of course, be signed). Each component is given a resistance, and the batteries are given a voltage. The goal of the simulation is to assign correct values to all the other fields in real time as the player connects and disconnects components and wires.
These are the requirements:
It must be correct, including Ohm's and Kirchhoff's laws (I'm modeling real world circuits, and there is little point if the model does something completely different)
It must be numerically stable (we can't have uncontrolled oscillations or something just because two neighbouring resistors can't make up their minds together)
It should stabilize relatively quickly for, let's say, fewer than 30 components (having to wait a few seconds before the values are correct doesn't really satisfy "real time", but I really don't plan on using it for more than 10 or maybe 20 components)
The optimal formulation for me (how I envision this in my head) would be if I could assign a script to each component that took care of that component only, possibly by communicating field values with neighbouring components, and each component script works in parallel and adjusts as is needed
I only see problems here and no solutions. The biggest problem, I think, is Kirchhoff's voltage law (going around any sub-circuit, the voltage across all components, including signs, add up to 0), because that's a global law (it says somehting about a whole circuit and not just a single component / connection point). There is a mathematical reformulation saying that there exists a potential function on the points in the circuit (for instance, the voltage measured against the + pole of the battery), which is a bit more local, but I still don't see how to let a component know how much the voltage / potential drops across it.
Kirchhoff's current law (the net current flow into an intersection is 0) might also be trouble. It seems to force me to make intersections into separate objects to enforce it. I originally thought that I could just let each component have two lists (a left list and a right list) containing every other component that is connected to it at that point, but that might not make KCL easily enforcable.
I know there are circuit simulators out there, and they must have solved this exact problem somehow. I just can't find an explanation because if I try googling it, I only find the already made simulators and no explanations anywhere.

Synchronization in Manchester coding

Lately I have been reading about Manchester encoding and I think I'm beginning to understand most of it now, but still I have got some whys that need addressing. Mainly 3 for the moment:
1) Most articles on Internet when introducing Manchester coding start by telling how bad NRZI really was and one of the disadvantages that gets mentioned is that synchronization becomes a problem when lengthy 1's or 0's get sent. Why is that a problem, since most places where NRZI is used have got separate clock and data lines. As long as the clock signal is there why should that ever be a problem?
2) Also, is Manchester supposed to work on a fixed frequency? Or can it work like I2C where clock frequency can be variable?
3) The good thing that gets mentioned about Manchester encoding is that it does not require separate clock line and that clock is embedded in the data and can be recovered by the receiver. Frequent transitions in Manchester help in synchronization and that the transitions happen in the middle and so clock can be recovered from transition. But my question is, if there are repeated 1's or 0's transition can happen in the middle and in the end as well (see attached waveform pic, look at the transitions when sending 111). So when a receiver sees a transition how does it figure out whether it is in the middle or at the end?
If I'm talking rubbish I would love to be corrected.
regarding your third question: I'm also brushing up on manchester and it appears that to recover a clock you need a differential signal:
Reference: "Data Communications, Computer Networks and Open Systems" by Fred Halsall, page 104, figure 3.8
For the 3 question,
Whenever a signal is transmitted, initially a few redundant bits which contain info about clock are sent.
For example, 1111, now the receiver knows the real data will arrive next, and through those redundant bits clock signal is extracted as well as the “notification “ that a signal is going to come.
As for question 1, NRZ scheme can send lengthy 1’s and lengthy 0’s.... but here the problem is actually with lengthy 1’s, if you could check sending lengthy 1’s with some modulation scheme and a dipole antenna, you could observe that the power of carrier signal will start decaying exponentially.
And the other reason would be the power needed to send that many lengthy 1’s, which is not favourable!
For question 2, yes it is possible to use it variable clock frequency but the condition is you should send redundant bits before you could change the clock frequency so that the receiver understands that the clock is changed from this point onwards.
Hope it’s clear now ;)

How to synchronize readout of binary streams on serial port of Matlab

I'm having an issue which is partially Matlab- and partially general programming-related, I'm hoping that somebody can help me brainstorm for solutions.
I have an external microcontroller that generates a large stream of binary data (~40kb) every 400ms and sends it via UART to a PC running Matlab scripts. The data is not encoded in hexa or dec characters, but true binary (hence, there's no terminator defined as all 256 values are possible, valid combinations of data). Baudrate is set at 1024000. In short, it takes roughly 375ms for a whole stream of data to be sent, with 25ms of dead time in between streams
In Matlab, the serial port is configured correctly (also 1024000, 8x bits, 1x stop bit, no parity, no hardware flow control, etc.). I am able to readout the data I'm sending via the microcontroller correctly (i.e. there's no corruption of data), but I'm not being able to synchronize the serial readout on Matlab. My script is as follows:
function data_show = GetDATA
if ~isempty(instrfind)
fclose(instrfind);
end
DATA_TOTAL_SIZE = 38400;
DATA_buffer = uint8(zeros(DATA_TOTAL_SIZE,1));
DATA_show = reshape(DATA_buffer(1:2:end)',[160,120])';
f_data_in = false;
f_data_out = true;
serialport = serial('COM11','BaudRate',1024000,'DataBits',8,'FlowControl','none','Parity','none','StopBits',1,...
'BytesAvailableFcnCount',DATA_TOTAL_SIZE,'BytesAvailableFcnMode','byte','InputBufferSize',DATA_TOTAL_SIZE * 2,...
'BytesAvailableFcn',#GetPortData);
fopen(serialport);
while (get(serialport,'BytesAvailable') ~= 0) % Skip first packet which might be incomplete
fread(serialport,DATA_TOTAL_SIZE,'uint8');
end
f_data_out = true;
while (1)
if (f_data_in)
DATA_buffer = fread(serialport,DATA_TOTAL_SIZE,'uint8');
DATA_show = reshape(DATA_buffer(1:2:end)',[160,120])'; %Reshape array as matrix
DATAsc(DATA_show);
disp('DATA');
end
pause(0.01);
end
fclose(serialport);
delete(serialport);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function GetPortData (obj,~)
if f_data_out
f_data_in = true;
end
end
end
The problem I see is that what I end up reading is always the correct size, but belongs to multiple streams, because I haven't found a way to tell Matlab that these 25ms of no data should be used to synchronize (i.e. data from before and after that blank period should belong to different streams).
Does anyone have any suggestions for this?
Thanks a lot!
For completeness, I would like to post the current implementation I have fixing this issue, which is probably not a suitable solution in all cases but might be useful in some.
The approach I took consists in moving into a bi-directional communication protocol, in which Matlab initiates the streaming by sending a very short command as a trigger (e.g. single, non-printable character). Given the high baudrate it does not add significant delay due to processing in the microcontroller's side.
The microcontroller, upon reception of this trigger, proceeds to transmit only one full package (as opposed to continuously streaming package at a 5Hz rate). By forcing Matlab to pickup a serial package of the known length right after issuing the trigger, it ensures that only one package and without synchronization issues is received.
Then it becomes just a matter of encapsulating the Matlab script in a routine with a 5Hz tick given by a timer, in which the sequence is repeated (send trigger, retrieve package, do whatever processing, and repeat).
Advantages of this:
It solves the synchronization problems
Disadvantages of this:
Having Matlab running on a timer tick does not ensure perfect periodicity, and hence the triggers might not always be sent at exactly 5Hz. If triggers are sent at "inconvenient" times for the microcontroller, packages might need to be skipped in order to avoid that a package is updated in memory while it is still being transmitted (since transmission takes a significant part of the 200ms time slot)
From experience, performance can vary a lot depending on what the PC running Matlab is doing. For example, it works fine when the PC is left on its own to do the acquisition, but if another program is used (e.g. Chrome), Matlab begins to lag and that results in delays in transmission of triggers.
As mentioned above, it's not a complete answer, but it is an approach that might be sufficient in some situations. If someone has a more efficient option, please fell free to share!