Amend the above Promela model with a mechanism for corrupting messages - promela

I'm studying for the final exam in system validation, and this question was on the last exam paper. I need help solving it.
Intuitively, the sender sends the squares of the first five non-negative integers to two receiver processes. Each number is sent only once, and the receiver it is sent to is chosen at random.
a) Suppose that the channels can corrupt messages. Amend the above Promela model with a mechanism for corrupting messages, as well as mechanisms to detect and cope with corrupted messages. In your solution, you may add new channels, variables, processes, change the type of messages, etc.
chan linkA = [5] of {byte};
chan linkB = [5] of {byte};
proctype sender ()
{ byte n;
  do
  :: n < 5 -> linkA!n*n; n++
  :: n < 5 -> linkB!n*n; n++
  :: else -> break
  od
}
proctype receiver (chan link)
{ byte m, total;
  do
  :: link?m -> total=total+m
  od
}
init {
  run sender (); run receiver (linkA);
  run receiver (linkB)
}


Transferring arrays/classes/records between locales

In a typical N-Body simulation, at the end of each epoch, each locale would need to share its own portion of the world (i.e. all bodies) to the rest of the locales. I am working on this with a local-view approach (i.e. using on Loc statements). I encountered some strange behaviours that I couldn't make sense out of, so I decided to make a test program, in which things got more complicated. Here's the code to replicate the experiment.
proc log(args...?n) {
  writeln("[locale = ",, "] [",, "] => ", args);
}
const max: int = 50000;
record stuff {
  var x1: int;
  var x2: int;
  proc init() {
    this.x1 =;
    this.x2 =;
  }
}
class ctuff {
  var x1: int;
  var x2: int;
  proc init() {
    this.x1 =;
    this.x2 =;
  }
}
class wrapper {
  // The point is that total size (in bytes) of data in `r`, `c` and `a` are the same here, because the record and the class hold two ints per index.
  var r: [{1..max / 2}] stuff;
  var c: [{1..max / 2}] owned ctuff?;
  var a: [{1..max}] int;
  proc init() {
    this.a =;
  }
}
proc test() {
  var wrappers: [LocaleSpace] owned wrapper?;
  coforall loc in LocaleSpace {
    on Locales[loc] {
      wrappers[loc] = new owned wrapper();
    }
  }
  // rest of the experiment further down.
}
Two interesting behaviours happen here.
1. Moving data
Now, each instance of wrapper in array wrappers should live in its locale. Specifically, the references (wrappers) will live in locale 0, but the internal data (r, c, a) should live in the respective locale. So we try to move some from locale 1 to locale 3, as such:
on Locales[3] {
  var timer: Timer;
  var local_stuff = wrappers[1]!.r;
  log("get r from 1", timer.elapsed());
}
on Locales[3] {
  var timer: Timer;
  var local_c = wrappers[1]!.c;
  log("get c from 1", timer.elapsed());
}
on Locales[3] {
  var timer: Timer;
  var local_a = wrappers[1]!.a;
  log("get a from 1", timer.elapsed());
}
Surprisingly, my timings show that:
Regardless of the size (const max), the time to send the array and the record stays constant, which doesn't make sense to me. I even checked with chplvis, and the size of the GET actually increases, but the time stays the same.
The time to send the class field increases with size, which makes sense, but it is quite slow and I don't know which case to trust here.
2. Querying the locales directly.
To demystify the problem, I also query the locale of some variables directly. First, we query the data, which we expect to live in locale 2, from locale 2:
on Locales[2] {
var wrappers_ref = wrappers[2]!; // This is always 1 GET from 0, okay.
And the result is:
[locale = 2] [2020-12-26T19:36:26.834472] => (array, 2, 2)
[locale = 2] [2020-12-26T19:36:26.894779] => (record, 2, 2, 2)
[locale = 2] [2020-12-26T19:36:27.023112] => (class, 2, 2, 2)
Which is expected. Yet, if we query the locale of the same data on locale 1, then we get:
[locale = 1] [2020-12-26T19:34:28.509624] => (array, 2, 2)
[locale = 1] [2020-12-26T19:34:28.574125] => (record, 2, 2, 1)
[locale = 1] [2020-12-26T19:34:28.700481] => (class, 2, 2, 2)
This implies that wrappers_ref.r[1] lives in locale 1, even though it should clearly be on locale 2. My only guess is that by the time the query is executed, the data (i.e. the .x of the record) has already been moved to the querying locale (1).
So all in all, the second part of the experiment led to a secondary question, whilst not answering the first part.
NOTE: all experiments were run with -nl 4 in the chapel/chapel-gasnet docker image.
Good observations, let me see if I can shed some light.
As an initial note, any timings taken with the gasnet Docker image should be taken with a grain of salt since that image simulates the execution across multiple nodes using your local system rather than running each locale on its own compute node as intended in Chapel. As a result, it is useful for developing distributed memory programs, but the performance characteristics are likely to be very different than running on an actual cluster or supercomputer. That said, it can still be useful for getting coarse timings (e.g., your "this is taking a much longer time" observation) or for counting communications using chplvis or the CommDiagnostics module.
With respect to your observations about timings, I also observe that the array-of-class case is much slower, and I believe I can explain some of the behaviors:
First, it's important to understand that any cross-node communications can be characterized using a formula like alpha + beta*length. Think of alpha as representing the basic cost of performing the communication, independent of length. This represents the cost of calling down through the software stack to get to the network, putting the data on the wire, receiving it on the other side, and getting it back up through the software stack to the application there. The precise value of alpha will depend on factors like the type of communication, choice of software stack, and physical hardware. Meanwhile, think of beta as representing the per-byte cost of the communication where, as you intuit, longer messages necessarily cost more because there's more data to put on the wire, or potentially to buffer or copy, depending on how the communication is implemented.
In my experience, the value of alpha typically dominates beta for most system configurations. That's not to say that it's free to do longer data transfers, but that the variance in execution time tends to be much smaller for longer vs. shorter transfers than it is for performing a single transfer versus many. As a result, when choosing between performing one transfer of n elements vs. n transfers of 1 element, you'll almost always want the former.
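The trade-off between one transfer of n elements and n transfers of 1 element can be made concrete with a small calculation. The following Go sketch is only an illustration of the alpha + beta*length model described above; the function names and the values chosen for alpha and beta are hypothetical and were not measured on any system.

```go
package main

import "fmt"

// transferCost models one message of `bytes` bytes as alpha + beta*bytes.
// alpha and beta are in arbitrary time units.
func transferCost(alpha, beta float64, bytes int) float64 {
	return alpha + beta*float64(bytes)
}

// bulkCost is the modeled cost of one transfer carrying all n elements.
func bulkCost(alpha, beta float64, n, elemBytes int) float64 {
	return transferCost(alpha, beta, n*elemBytes)
}

// perElementCost is the modeled cost of n separate one-element transfers,
// so the fixed cost alpha is paid n times.
func perElementCost(alpha, beta float64, n, elemBytes int) float64 {
	return float64(n) * transferCost(alpha, beta, elemBytes)
}

func main() {
	// Hypothetical values: the fixed cost is much larger than the per-byte cost.
	const alpha, beta = 1000.0, 1.0
	n, elemBytes := 25000, 16 // e.g. 25,000 elements holding two 8-byte ints each

	fmt.Println("one bulk transfer:    ", bulkCost(alpha, beta, n, elemBytes))
	fmt.Println("per-element transfers:", perElementCost(alpha, beta, n, elemBytes))
}
```

Under these assumed values the per-element strategy pays alpha 25,000 times, which is why the number of communications matters more than their size.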
To investigate your timings, I bracketed your timed code portions with calls to the CommDiagnostics module as follows:
...code to time here...
and found, as you did with chplvis, that the number of communications required to localize the array of records or array of ints was constant as I varied max, for example:
This is consistent with what I'd expect from the implementation: That for an array of value types, we perform a fixed number of communications to access array meta-data, and then communicate the array elements themselves in a single data transfer to amortize the overheads (avoid paying multiple alpha costs).
In contrast, I found that the number of communications for localizing the array of classes was proportional to the size of the array. For example, for the default value of 50,000 for max, I saw:
I believe the reason for this distinction relates to the fact that c is an array of owned classes, in which only a single class variable can "own" a given ctuff object at a time. As a result, when copying the elements of array c from one locale to another, you're not just copying raw data, as with the record and integer cases, but also performing an ownership transfer per element. This essentially requires setting the remote value to nil after copying its value to the local class variable. In our current implementation, this seems to be done using a remote get to copy the remote class value to the local one, followed by a remote put to set the remote value to nil, hence, we have a get and put per array element, resulting in O(n) communications rather than O(1) as in the previous cases. With additional effort, we could potentially have the compiler optimize this case, though I believe it will always be more expensive than the others due to the need to perform the ownership transfer.
I tested the hypothesis that owned classes were resulting in the additional overhead by changing your ctuff objects from being owned to unmanaged, which removes any ownership semantics from the implementation. When I do this, I see a constant number of communications, as in the value cases:
I believe this represents the fact that once the language has no need to manage the ownership of the class variables, it can simply transfer their pointer values in a single transfer again.
Beyond these performance notes, it's important to understand a key semantic difference between classes and records when choosing which to use. A class object is allocated on the heap, and a class variable is essentially a reference or pointer to that object. Thus, when a class variable is copied from one locale to another, only the pointer is copied, and the original object remains where it was (for better or worse). In contrast, a record variable represents the object itself, and can be thought of as being allocated "in place" (e.g., on the stack for a local variable). When a record variable is copied from one locale to the other, it's the object itself (i.e., the record's fields' values) which are copied, resulting in a new copy of the object itself. See this SO question for further details.
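The reference-versus-value distinction is not specific to Chapel. As a rough analogy only (this is Go, not Chapel, and it says nothing about locales or communication), copying a struct value behaves like copying a record, while copying a pointer behaves like copying a class variable. The type and function names below are made up for the illustration.

```go
package main

import "fmt"

// body stands in for a two-field object such as stuff/ctuff in the question.
type body struct{ x1, x2 int }

// copyValue copies the struct itself (record-like): the result is independent.
func copyValue(b body) body { return b }

// copyPointer copies only the pointer (class-like): both names refer to one object.
func copyPointer(b *body) *body { return b }

func main() {
	rec := body{1, 2}
	recCopy := copyValue(rec)
	recCopy.x1 = 99
	fmt.Println("record-like:", rec.x1, recCopy.x1) // the original is not affected

	obj := &body{1, 2}
	objAlias := copyPointer(obj)
	objAlias.x1 = 99
	fmt.Println("class-like: ", obj.x1, objAlias.x1) // both names see the change
}
```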
Moving on to your second observation, I believe that your interpretation is correct, and that this may be a bug in the implementation (I need to stew on it a bit more to be confident). Specifically, I think you're correct that what's happening is that wrappers_ref.r[1].x1 is being evaluated, with the result being stored in a local variable, and that the query is being applied to the local variable storing the result rather than the original field. I tested this theory by taking a ref to the field and then printing the locale of that ref, as follows:
ref x1loc = wrappers_ref.r[1].x1;
and that seemed to give the right result. I also looked at the generated code which seemed to indicate that our theories were correct. I don't believe that the implementation should behave this way, but need to think about it a bit more before being confident. If you'd like to open a bug against this on Chapel's GitHub issues page, for further discussion there, we'd appreciate that.

Graphx : Is it possible to execute a program on each vertex without receiving a message?

When I was trying to implement an algorithm in GraphX with Scala, I couldn't find a way to activate all the vertices in the next iteration. How can I send a message to all my graph vertices?
In my algorithm, there are some super-steps that should be executed by all the vertices, whether they receive a message or not, because even not receiving a message is an event that should be handled in the next iteration.
Below is the official code of the SSSP algorithm implemented in Pregel's logic. You can see that only vertices that received a message will execute their program in the next iteration. In my case, I want the pregel function to run iteratively, i.e., in each super-step the vertices execute their programs and can vote to halt if needed. The reasoning in this example doesn't look like the logic of the Pregel paper. Any ideas on how to implement Pregel's real logic?
val graph: Graph[Long, Double] =
  GraphGenerators.logNormalGraph(sc, numVertices = 100).mapEdges(e => e.attr.toDouble)
val sourceId: VertexId = 42 // The ultimate source
// Initialize the graph such that all vertices except the root have distance infinity.
val initialGraph = graph.mapVertices((id, _) =>
  if (id == sourceId) 0.0 else Double.PositiveInfinity)
val sssp = initialGraph.pregel(Double.PositiveInfinity)(
  (id, dist, newDist) => math.min(dist, newDist), // Vertex Program
  triplet => { // Send Message
    if (triplet.srcAttr + triplet.attr < triplet.dstAttr) {
      Iterator((triplet.dstId, triplet.srcAttr + triplet.attr))
    } else {
      Iterator.empty
    }
  },
  (a, b) => math.min(a, b) // Merge Message
)
After reading the two replies from @Mahmoud Hanafy and @Shaido confirming that there is no way to activate the vertices or vote to halt in GraphX, I tried to implement this logic within the algorithm itself. So, here is what I did:
Pregel's API sends an init message to all the graph vertices in the first super-step where they can execute their routines at least one time before they become inactive.
At the end of this super-step, each vertex v may send messages to its neighbors and wait to receive messages from others.
In the second super-step, not all vertices will receive information from their neighbors, which means not all vertices will be activated in the second super-step. To solve this, we need to go back to super-step one and ensure that each vertex will receive a message. How? By having each vertex send a message to itself. (This is the only way I can guarantee the activation of my vertex in the next super-step, but I don't believe it's the best way to do it, because it increases the number of messages sent and received.)
In the second super-step, every vertex will receive at least one message and hence will be active so it can execute its program.
To ensure that a vertex will be activated in the next super-steps, we can do the same.
I repeat, this is the only way I came up with to solve my problem, but I don't encourage you to use it.
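The effect of the self-message workaround can be seen in a toy model. The following Go sketch does not use GraphX at all; it only simulates the rule described above (a vertex runs in a super-step only if it received a message) on a hypothetical three-vertex chain 0 -> 1 -> 2. All names are invented for the illustration.

```go
package main

import "fmt"

// superstep runs every vertex that has at least one message in inbox and
// returns the vertices that ran plus the inbox for the next super-step.
// edges[v] lists the out-neighbours of v. When selfMessage is true, each
// vertex that runs also sends a message to itself (the workaround).
func superstep(edges [][]int, inbox map[int][]int, selfMessage bool) ([]int, map[int][]int) {
	var ran []int
	next := make(map[int][]int)
	for v := range edges {
		if len(inbox[v]) == 0 {
			continue // no message received: the vertex stays inactive
		}
		ran = append(ran, v)
		for _, dst := range edges[v] {
			next[dst] = append(next[dst], v) // message payload is the sender id
		}
		if selfMessage {
			next[v] = append(next[v], v)
		}
	}
	return ran, next
}

// activeInSecondStep delivers an initial message to every vertex of the
// chain 0 -> 1 -> 2 and returns the vertices that run in the second super-step.
func activeInSecondStep(selfMessage bool) []int {
	edges := [][]int{{1}, {2}, {}}
	initial := map[int][]int{0: {-1}, 1: {-1}, 2: {-1}}
	_, inbox := superstep(edges, initial, selfMessage)
	ran, _ := superstep(edges, inbox, selfMessage)
	return ran
}

func main() {
	fmt.Println("without self-messages:", activeInSecondStep(false))
	fmt.Println("with self-messages:   ", activeInSecondStep(true))
}
```

In this model vertex 0 has no incoming edge, so without the self-message it never runs again after the first super-step. The cost is one extra message per vertex per super-step, which is the overhead mentioned above.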

Expected behavior when multiple things happen together in select

Assume one goroutine is waiting on the following select on two unbuffered channels one and two:
select {
case <-one:
    fmt.Println("read from one")
case <-two:
    fmt.Println("read from two")
}
and one goroutine is waiting on the following send
one <- 1
and another is waiting on the following
two <- 2
Since the first goroutine is waiting on a select, both the channels one and two have a receiver available. Which select case is then guaranteed to run? Is it deterministic, or can either case run, leaving one channel with an unread value at the end?
If there is only one guaranteed net output, then do selects ensure a total order across all operations on all the channels participating in the select? That seems very inefficient..
For example in the following code
package main
import (
func main() {
one_net := 0
two_net := 0
var mtx = &sync.Mutex{}
for i := 0; i < 8; i++ {
one, two := make(chan int), make(chan int)
go func() { // go routine one
select {
case <-one:
fmt.Println("read from one")
case <-two:
fmt.Println("read from two")
go func() { // go routine two
one <- 1
fmt.Println("Wrote to one")
go func() { // go routine three
two <- 2
fmt.Println("Wrote to two")
fmt.Println("one_net", one_net)
fmt.Println("two_net", two_net)
can there even be a mismatch in the number of reads vs the number of writes (i.e. can one_net and two_net be non 0 at the end)? For example in the case where the select statement is waiting on a read from both channels, and then goroutines two and three go through with their respective writes, but then the select only picks up on one of those writes.
The Go Programming Language Specification
Select statements
A "select" statement chooses which of a set of possible send or
receive operations will proceed.
If one or more of the communications can proceed, a single one that
can proceed is chosen via a uniform pseudo-random selection.
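The quoted rule can be observed directly. The sketch below is an illustration under one stated assumption: it uses buffered channels of capacity 1 (unlike the unbuffered channels in the question) so that both cases are guaranteed to be ready when the select runs. The function name is made up for the example.

```go
package main

import "fmt"

// countSelections makes both channels ready before each select and counts
// which case the runtime picks.
func countSelections(trials int) (fromOne, fromTwo int) {
	for i := 0; i < trials; i++ {
		// Capacity 1 lets both sends complete before anyone receives.
		one, two := make(chan int, 1), make(chan int, 1)
		one <- 1
		two <- 2
		select {
		case <-one:
			fromOne++
		case <-two:
			fromTwo++
		}
	}
	return fromOne, fromTwo
}

func main() {
	a, b := countSelections(1000)
	fmt.Println("chose one:", a, "chose two:", b)
}
```

The exact counts vary from run to run; over many trials both cases should be chosen a comparable number of times.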
Your question is imprecise: How to create a Minimal, Complete, and Verifiable example. For example,
package main

import (
	"fmt"
	"time"
)

func main() {
	for i := 0; i < 8; i++ {
		one, two := make(chan int), make(chan int)
		go func() { // goroutine one
			select {
			case <-one:
				fmt.Println("read from one")
			case <-two:
				fmt.Println("read from two")
			}
			select {
			case <-one:
				fmt.Println("read from one")
			case <-two:
				fmt.Println("read from two")
			}
		}()
		go func() { // goroutine two
			one <- 1
		}()
		go func() { // goroutine three
			two <- 2
		}()
		// Crude synchronization: give the goroutines time to run.
		time.Sleep(time.Millisecond)
	}
}
$ go run chan.go
read from two
read from one
read from one
read from two
read from one
read from two
read from two
read from one
read from one
read from two
read from two
read from one
read from one
read from two
read from two
read from one
What behavior do you expect and why?
The Go Programming Language Specification
Channel types
A channel provides a mechanism for concurrently executing functions to
communicate by sending and receiving values of a specified element type.
A new, initialized channel value can be made using the built-in
function make, which takes the channel type and an optional capacity
as arguments:
make(chan int, 100)
The capacity, in number of elements, sets the size of the buffer in
the channel. If the capacity is zero or absent, the channel is
unbuffered and communication succeeds only when both a sender and
receiver are ready. Otherwise, the channel is buffered and
communication succeeds without blocking if the buffer is not full
(sends) or not empty (receives). A nil channel is never ready for
communication.
Go statements
A "go" statement starts the execution of a function call as an
independent concurrent thread of control, or goroutine, within the
same address space.
The function value and parameters are evaluated as usual in the
calling goroutine, but unlike with a regular call, program execution
does not wait for the invoked function to complete. Instead, the
function begins executing independently in a new goroutine. When the
function terminates, its goroutine also terminates. If the function
has any return values, they are discarded when the function completes.
Analyzing your new example:
The channels are unbuffered. Goroutines two and three wait on goroutine one. A send on an unbuffered channel waits until there is a pending receive. When the goroutine one select is evaluated, there will be a pending receive on either channel one or channel two. The goroutine, two or three, that sends on that channel can now send and terminate. Goroutine one can now execute a receive on that channel and terminate. As a crude goroutine synchronization mechanism, goroutine main waits for one millisecond and then terminates, which terminates any other goroutines. This also terminates the goroutine, two or three, that didn't get to send because it is still waiting for a pending receive.
You ask "can there even be a mismatch in the number of reads vs the number of writes (i.e. can one_net and two_net be non 0 at the end)? For example in the case where the select statement is waiting on a read from both channels, and then goroutines two and three go through with their respective writes, but then the select only picks up on one of those writes."
Only one of goroutines two and three gets to send (write). There will be exactly one (send) write and one (receive) read. This assumes that goroutine main does not terminate before this occurs, that is, it occurs within one millisecond.
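This claim can be checked with a small program that counts completed sends and receives. The sketch below is not the code from the question: the done channel, the WaitGroup, and the function name are additions made for the illustration, so that the losing sender can give up instead of being killed when main exits.

```go
package main

import (
	"fmt"
	"sync"
)

// runOnce starts one receiver that selects on two unbuffered channels and
// two senders, then reports how many sends and receives completed.
func runOnce() (sends, receives int) {
	one, two := make(chan int), make(chan int)
	done := make(chan struct{}) // closed once the receiver has finished
	var mu sync.Mutex
	var wg sync.WaitGroup

	send := func(ch chan<- int, v int) {
		defer wg.Done()
		select {
		case ch <- v: // completes only if the receiver picked this channel
			mu.Lock()
			sends++
			mu.Unlock()
		case <-done: // the receiver is gone: give up
		}
	}

	wg.Add(3)
	go func() { // goroutine one: a single select, as in the question
		defer wg.Done()
		select {
		case <-one:
		case <-two:
		}
		mu.Lock()
		receives++
		mu.Unlock()
		close(done)
	}()
	go send(one, 1) // goroutine two
	go send(two, 2) // goroutine three
	wg.Wait()
	return sends, receives
}

func main() {
	for i := 0; i < 8; i++ {
		s, r := runOnce()
		fmt.Println("sends:", s, "receives:", r)
	}
}
```

Because the channels are unbuffered, a send completes only together with a matching receive, so the two counters cannot diverge.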
As peterSO points out, selection among multiple simultaneous channels that are ready is pseudo-random.
However, it is important to notice that in most cases, you will have race conditions between the sending and/or receiving goroutines, which also introduces indeterminism.
In fact, peterSO's example illustrates this very situation; at the point where the receiving goroutine reaches the first select statement, there is no guarantee that either or both of the sending goroutines have reached their respective send statement. The relevant snippet follows, with some added comments:
a, b := make(chan int), make(chan int)
go func() { // goroutine one
	// At this point, any or none of the channels could be ready.
	select {
	case <-a:
		fmt.Println("read from a")
	case <-b:
		fmt.Println("read from b")
	}
	// At this point, we will have read one, and will block waiting for the other.
	select {
	case <-a:
		fmt.Println("read from a")
	case <-b:
		fmt.Println("read from b")
	}
}()
go func() { // goroutine two
	a <- 1 // Does this execute first?
}()
go func() { // goroutine three
	b <- 2 // ...or does this?
}()
In general, when writing concurrent programmes, one should avoid relying on concurrent events happening in any particular determined order. Unless your program logic serialises things, as a rule of thumb, consider them happening in an indeterminate (though not necessarily random and evenly distributed) order, and you will be safe more often than sorry.

RX and buffering

I'm trying to obtain the following observable (with a buffer capacity of 10 ticks):
Time 0 5 10 15 20 25 30 35 40
Source A B C D E F G H
Result A E H
Phase |<------->|-------|<------->|<------->|
That is, the behavior is very similar to the Buffer observable, with the difference that the buffering phase is not aligned to fixed time slots but starts at the first symbol pushed during the idle phase. In the example above, the buffering phases start with the 'A', 'E', and 'H' symbols.
Is there a way to compose the observable or do I have to implement it from scratch?
Any help will be appreciated.
Try this:
IObservable<T> source = ...;
IScheduler scheduler = ...;
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))));
The buffer closing selector is called once at the start and then once whenever a buffer closes. The selector says "The buffer being started now should be closed duration after the first element of this buffer, or when the source completes, whichever occurs first."
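The windowing rule itself (a buffer opens at the first element seen while idle and closes a fixed duration later) can be sketched outside Rx. The Go code below is an offline illustration over timestamped events, not a reactive implementation. The event type, the function name, the timestamps in main, and the choice to put an event that arrives exactly at start+duration into the next buffer are all assumptions made for the example.

```go
package main

import "fmt"

// event is a value with a timestamp in ticks.
type event struct {
	t int
	v string
}

// bufferFromFirst groups time-ordered events into buffers. A buffer opens at
// the first event seen while idle and collects every event with
// t < start+duration.
func bufferFromFirst(events []event, duration int) [][]string {
	var out [][]string
	var cur []string // nil means idle (no buffer open)
	start := 0
	for _, e := range events {
		if cur != nil && e.t >= start+duration {
			out = append(out, cur) // window elapsed: close the buffer
			cur = nil
		}
		if cur == nil {
			start = e.t // idle: this event opens a new buffer
		}
		cur = append(cur, e.v)
	}
	if cur != nil {
		out = append(out, cur) // source completed: flush the open buffer
	}
	return out
}

func main() {
	// Hypothetical timestamps; the exact times in the question are not known.
	events := []event{
		{2, "A"}, {5, "B"}, {9, "C"}, {11, "D"},
		{20, "E"}, {23, "F"}, {27, "G"}, {33, "H"},
	}
	for _, buf := range bufferFromFirst(events, 10) {
		fmt.Println(buf)
	}
}
```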
Edit: Based on your comments, if you want to make multiple subscriptions to query share a single subscription to source, you can do that by appending .Publish().RefCount() to the query.
IObservable<IList<T>> query = source
    .Publish(obs => obs
        .Buffer(() => obs.Take(1).IgnoreElements()
            .Concat(Observable.Return(default(T)).Delay(duration, scheduler))))
    .Publish()
    .RefCount();

In Rx (or RxJava/RxScala), how to make an auto-resetting stateful latch map/filter for measuring in-stream elapsed time to touch a barrier?

Apologies if the question is poorly phrased, I'll do my best.
If I have a sequence of values with times as an Observable[(U,T)] where U is a value and T is a time-like type (or anything difference-able I suppose), how could I write an operator which is an auto-reset one-touch barrier, which is silent when abs(u_n - u_reset) < barrier, but spits out t_n - t_reset if the barrier is touched, at which point it also resets u_reset = u_n.
That is to say, the first value this operator receives becomes the baseline, and it emits nothing. Henceforth it monitors the values of the stream, and as soon as one of them is beyond the baseline value (above or below), it emits the elapsed time (measured by the timestamps of the events), and resets the baseline. These times then will be processed to form a high-frequency estimate of the volatility.
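To pin down the intended behaviour before looking for an operator-based solution, here is a plain Go sketch of the rule just described, applied to a slice rather than a stream. The type and function names are hypothetical, float64 is assumed for both value and time, and the sketch assumes that the reset timestamp is updated together with the reset value.

```go
package main

import (
	"fmt"
	"math"
)

// sample is one (value, time) pair from the stream.
type sample struct {
	u float64 // value
	t float64 // timestamp
}

// barrierTimes treats the first sample as the baseline and emits nothing for
// it. For each later sample with |u - baseline.u| >= barrier it emits the
// elapsed time since the baseline and makes that sample the new baseline.
func barrierTimes(samples []sample, barrier float64) []float64 {
	if len(samples) == 0 {
		return nil
	}
	base := samples[0]
	var out []float64
	for _, s := range samples[1:] {
		if math.Abs(s.u-base.u) >= barrier {
			out = append(out, s.t-base.t)
			base = s // auto-reset: both value and time move to this sample
		}
	}
	return out
}

func main() {
	// Made-up data for illustration only.
	samples := []sample{
		{100.0, 0}, {100.4, 1}, {101.2, 3}, {101.0, 4}, {99.9, 7},
	}
	fmt.Println(barrierTimes(samples, 1.0))
}
```

With the made-up data, the barrier is first touched at t = 3 (measured from t = 0) and again at t = 7 (measured from the reset at t = 3). In a stream setting the same state (the current baseline) would live inside the operator, one instance per subscription.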
For reference, I am trying to write a volatility estimator outlined in , where rather than measuring the standard deviation (deviations at regular homogeneous times), you repeatedly measure the time taken to breach a barrier for some fixed barrier amount.
Specifically, could this be written using existing operators? I'm a bit stuck on how the state would be reset, though maybe I need to make two nested operators, one which is one-shot and another which keeps creating that one-shot... I know it could be done by writing one by hand, but then I need to write my own publisher etc etc.
I don't fully understand the algorithm and your variables in the example, but you can use flatMap with some heap-state and return empty() or just() as needed:
int[] var1 = { 0 };
source.flatMap(v -> {
    var1[0] += v;
    if ((var1[0] & 1) == 0) {
        return Observable.just(v);
    }
    return Observable.empty();
});
If you need a per-sequence state because of multiple consumers, you can defer the whole thing:
Observable.defer(() -> {
    int[] var1 = { 0 };
    return source.flatMap(v -> {
        var1[0] += v;
        if ((var1[0] & 1) == 0) {
            return Observable.just(v);
        }
        return Observable.empty();
    });
});