What are the implications and semantics for choosing a mode in Unison's distributed package? - distributed-computing

I searched through the documentation but may have missed where unique type Mode = Parallel | Sequential is discussed. In particular I'm trying to use it in Seq.fromList, which says:
Seq.fromList : Mode -> [a] -> Seq k a
...
All branches in the result tree will have the provided Mode
Is this a means of controlling nested parallelism, as seems to be indicated, or is it something else?

Yes, that's right. Certain operations like reduce will examine the Mode and use that to decide whether to use parallelism when processing that node in the tree.
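To illustrate the idea (this is a Python sketch, not the Unison distributed API), the snippet below builds a tree in which every branch carries the same mode and has a reduce that consults the mode at each branch. The names `from_list`, `reduce_tree`, `Leaf` and `Branch` are hypothetical, and threads stand in for whatever the real library uses to run work in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from enum import Enum
from typing import Union


class Mode(Enum):
    PARALLEL = "Parallel"
    SEQUENTIAL = "Sequential"


@dataclass
class Leaf:
    value: int


@dataclass
class Branch:
    mode: Mode
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Branch]


def from_list(mode, items):
    """Build a balanced tree in which every branch carries the same mode."""
    if not items:
        raise ValueError("this sketch only handles non-empty lists")
    if len(items) == 1:
        return Leaf(items[0])
    mid = len(items) // 2
    return Branch(mode, from_list(mode, items[:mid]), from_list(mode, items[mid:]))


def reduce_tree(op, tree):
    """Reduce the tree with op, checking each branch's mode to decide how."""
    if isinstance(tree, Leaf):
        return tree.value
    if tree.mode is Mode.PARALLEL:
        # Evaluate both subtrees concurrently, then combine the results.
        with ThreadPoolExecutor(max_workers=2) as pool:
            left = pool.submit(reduce_tree, op, tree.left)
            right = pool.submit(reduce_tree, op, tree.right)
            return op(left.result(), right.result())
    # Sequential branch: evaluate the left subtree, then the right one.
    return op(reduce_tree(op, tree.left), reduce_tree(op, tree.right))
```

Because the mode is stored per branch, a tree could in principle mix parallel upper levels with sequential lower levels, which is one way to limit how fine-grained the parallelism becomes.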
Also see this article section discussing controlling the granularity of parallelism

Monitoring runtime use of concrete collections

Background:
Our Scala software consists of various components, developed by different teams, that pass Scala collections back and forth. The APIs usually use abstract collections such as Seq[T] and Set[T], and developers are currently essentially free to choose any implementation they like: e.g. when creating new instances, some go with List() or Vector(), others with Seq.empty.
Problem:
Different implementations have different performance characteristics, e.g. List might have been a good choice locally (for one component) because the collection is only sequentially iterated over or modified at the head, but it could have been a poor choice globally, because another component performs loads of random accesses.
Question:
Are there any tools (ideally Scala-specific, but JVM-general might also be OK) that can monitor runtime use of collections and record the information necessary to detect and report undesirable access/usage patterns of collections?
My feeling is that runtime monitoring would be more fruitful than static analyses (including simple linting) because (i) statically detecting usage patterns in hot code is virtually impossible, and (ii) static analysis would most likely miss collections that are internally created, e.g. when performing complex filter/map/fold/etc. operations on immutable collections.
Edits/Clarifications:
Changing the interfaces to enforce specific types such as List isn't an option; it would also not prevent purely internal use of "wrong" collections/usage patterns.
The goal is identifying a globally optimal (over many runs of the software) collection type rather than locally optimising for each applied algorithm
You don't need linting for this, let alone runtime monitoring. This is exactly what having a strictly-typed language does for you out of the box. If you want to ensure a particular collection type is passed to the API, just declare that that API accepts that collection type (e.g., def foo(x: Stream[Bar]), not def foo(x: Seq[Bar]), etc.).
Alternatively, when practical, just convert to the desired type as part of implementation: def foo(x: List[Bar]) = { val y = x.toArray ; lotsOfRandomAccess(y); }
Collections that are "internally created" are typically the same type as the parent object: List(1,2,3).map(_ + 1) returns a List etc.
Again, if you want to ensure you are using a particular type, just say so:
val mapped: List[Int] = List(1,2,3).map(_ + 1)
You can also change the type this way if there is a need for that (breakOut is available up to Scala 2.12; it was removed in 2.13):
val mappedStream: Stream[Int] = List(1,2,3).map(_ + 1)(breakOut)
As discussed in the comments, this is a problem that needs to be solved at a local level rather than via global optimisation.
Each algorithm in the system will work best with a particular data type, so using a single global structure will never be optimal. Instead, each algorithm should ensure that the incoming data is in a format that can be processed efficiently. If it is not in the right format, the data should be converted to a better format as the first part of the process. Since the algorithm works better on the right format, this conversion is always a performance improvement.
The output data format is more of a problem if the system does not know which algorithm will be used next. The solution is to use the most efficient output format for the algorithm in question, and rely on other algorithms to re-format the data if required.
If you do want to monitor the whole system, it would be better to track the algorithms rather than the collections. If you monitor which algorithms are called and in which order you can create multiple traces through the code. You can then play back those traces with different algorithms and data structures to see which is the most efficient configuration.

Query on various addressing modes?

Can an array be implemented using only indirect addressing mode? I think we can only access the first element but what about the other elements? For that, I think, we'll have to use immediate addressing mode.
An add instruction can generate an address in a register.
A CPU with only [register] addressing modes would work, but need more instructions than one with an immediate displacement as part of load/store instructions.
Instruction set design isn't about what's necessary for computation to be possible, but rather about how to make it efficient.
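As a rough illustration of the point about the add instruction, here is a small Python simulation of a machine whose only load is register-indirect. The register names and the word-addressed memory (one array element per address) are assumptions made for the sketch.

```python
def load_indirect(memory, registers, reg):
    """LOAD via register-indirect addressing: the address comes only from a register."""
    return memory[registers[reg]]


def read_element(memory, base_address, index):
    """Read array[index] using an ADD followed by a register-indirect LOAD."""
    registers = {"r1": base_address, "r2": index}
    # ADD r1, r1, r2: compute the element address in a register.
    registers["r1"] = registers["r1"] + registers["r2"]
    # LOAD [r1]: no displacement is needed in the load itself.
    return load_indirect(memory, registers, "r1")
```

With an immediate displacement in the load, the separate ADD would be unnecessary for constant indices, which is the efficiency argument made above.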
related:
Why does the lw instruction's second argument take in both an offset and regSource?
What is the minimum instruction set required for any Assembly language to be considered useful? (note the difference between useful and Turing-complete.)

How to handle the two signals depending on each other?

I read Deprecating the Observer Pattern with Scala.React and found reactive programming very interesting.
But there is a point I can't figure out: the author describes the signals as nodes in a DAG (directed acyclic graph). What if you have two signals (or event sources, or models, whatever) depending on each other, i.e. a 'two-way binding', like a model and a view in web front-end programming?
Sometimes it's just inevitable, because the user can change the view and the back end (an asynchronous request, for example) can change the model, and you want the other side to reflect the change immediately.
Loop dependencies in a reactive programming language can be handled with a variety of semantics. The one that appears to have been chosen in scala.React is that of synchronous reactive languages, and specifically that of Esterel. You can find a good explanation of this semantics and its alternatives in the paper "The synchronous languages 12 years later" by Benveniste, A.; Caspi, P.; Edwards, S.A.; Halbwachs, N.; Le Guernic, P.; de Simone, R., available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1173191&tag=1 or http://virtualhost.cs.columbia.edu/~sedwards/papers/benveniste2003synchronous.pdf.
Replying to @Matt Carkci here, because a comment wouldn't suffice
In the paper section 7.1 Change Propagation you have
Our change propagation implementation uses a push-based approach based on a topologically ordered dependency graph. When a propagation turn starts, the propagator puts all nodes that have been invalidated since the last turn into a priority queue which is sorted according to the topological order, briefly level, of the nodes. The propagator dequeues the node on the lowest level and validates it, potentially changing its state and putting its dependent nodes, which are on greater levels, on the queue. The propagator repeats this step until the queue is empty, always keeping track of the current level, which becomes important for level mismatches below. For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
and later at section 7.6 Level Mismatch
We therefore need to prepare for an opaque node n to access another node that is on a higher topological level. Every node that is read from during n’s evaluation, first checks whether the current propagation level which is maintained by the propagator is greater than the node’s level. If it is, it proceed as usual, otherwise it throws a level mismatch exception containing a reference to itself, which is caught only in the main propagation loop. The propagator then hoists n by first changing its level to a level above the node which threw the exception, reinserting n into the propagation queue (since it’s level has changed) for later evaluation in the same turn and then transitively hoisting all of n’s dependents.
While there's no mention about any topological constraint (cyclic vs acyclic), something is not clear. (at least to me)
First, there is the question of how the topological order is defined.
And then the implementation suggests that mutually dependent nodes would loop forever in the evaluation through the exception mechanism explained above.
What do you think?
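The push-based propagation quoted from section 7.1 can be sketched roughly as follows. This is a simplified Python illustration, not the Scala.React implementation: the `Node` class and `propagate` function are hypothetical, levels are assigned by hand, and the level-mismatch hoisting from section 7.6 is left out.

```python
import heapq


class Node:
    def __init__(self, name, level, compute, inputs=()):
        self.name = name          # unique name, also used as a tie-breaker in the queue
        self.level = level        # topological level: greater than all of its inputs
        self.compute = compute    # function of the input values
        self.inputs = list(inputs)
        self.dependents = []
        self.value = None
        for node in self.inputs:
            node.dependents.append(self)


def propagate(invalidated):
    """Run one propagation turn and return the order in which nodes were validated."""
    queue = [(n.level, n.name, n) for n in invalidated]
    heapq.heapify(queue)
    queued = {n.name for n in invalidated}
    order = []
    while queue:
        # Always dequeue the node on the lowest level first.
        _, _, node = heapq.heappop(queue)
        queued.discard(node.name)
        new_value = node.compute(*[i.value for i in node.inputs])
        order.append(node.name)
        if new_value != node.value:
            node.value = new_value
            # Only a changed node invalidates its dependents.
            for dep in node.dependents:
                if dep.name not in queued:
                    queued.add(dep.name)
                    heapq.heappush(queue, (dep.level, dep.name, dep))
    return order
```

In a diamond-shaped graph (one source, two intermediate nodes, one node depending on both), the last node is validated once, after both of its inputs, which is the glitch-freedom the quote refers to. Two nodes that depend on each other cannot both have a level greater than the other, which is where the question about cycles comes from.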
After scanning the paper, I can't find where they mention that it must be acyclic. There's nothing stopping you from creating cyclic graphs in dataflow/reactive programming. Acyclic graphs only allow you to create Pipeline Dataflow (e.g. Unix command line pipes).
Feedback and cycles are a very powerful mechanism in dataflow. Without them you are restricted in the types of programs you can create. Take a look at Flow-Based Programming - Loop-Type Networks.
Edit after second post by pagoda_5b
One statement in the paper made me take notice...
For correctly ordered graphs, this process monotonically proceeds to greater levels, thus ensuring data consistency, i.e., the absence of glitches.
To me that says that loops are not allowed within the Scala.React framework. A cycle between two nodes would seem to cause the system to continually try to raise the level of both nodes forever.
But that doesn't mean that you have to encode the loops within their framework. It could be possible to have one path from the item you want to observe and then another, separate, path back to the GUI.
To me, it always seems that too much emphasis is placed on a programming system completing and giving one answer. Loops make it difficult to determine when to terminate. Libraries that use the term "reactive" tend to subscribe to this thought process. But that is just a result of the Von Neumann architecture of computers... a focus of solving an equation and returning the answer. Libraries that shy away from loops seem to be worried about program termination.
Dataflow doesn't require a program to have one right answer or ever terminate. The answer is the answer at this moment of time due to the inputs at this moment. Feedback and loops are expected if not required. A dataflow system is basically just a big loop that constantly passes data between nodes. To terminate it, you just stop it.
Dataflow doesn't have to be so complicated. It is just a very different way to think about programming. I suggest you look at J. Paul Morrison's book "Flow-Based Programming" for a field-tested version of dataflow, or my book (once it's done).
Check your MVC knowledge. The view doesn't update the model, so it won't send signals to it. The controller updates the model. For a C/F converter, you would have two controllers (one for the F control, one for the C control). Both controllers would send signals to a single model (which stores the only real temperature, Kelvin, in a lossless format). The model sends signals to two separate views (one for the C view, one for the F view). No cycles.
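A minimal Python sketch of that C/F converter layout, assuming plain method calls in place of signals; the class names and the rounding to two decimals are choices made for the illustration.

```python
class TemperatureModel:
    """Single source of truth: the temperature in Kelvin."""

    def __init__(self, kelvin=273.15):
        self.kelvin = kelvin
        self.views = []

    def set_kelvin(self, kelvin):
        self.kelvin = kelvin
        # The model notifies the views; views never write back to the model.
        for view in self.views:
            view.render(kelvin)


class View:
    def __init__(self, from_kelvin):
        self.from_kelvin = from_kelvin
        self.shown = None

    def render(self, kelvin):
        self.shown = round(self.from_kelvin(kelvin), 2)


class Controller:
    def __init__(self, model, to_kelvin):
        self.model = model
        self.to_kelvin = to_kelvin

    def user_entered(self, value):
        # Controllers are the only components that update the model.
        self.model.set_kelvin(self.to_kelvin(value))


model = TemperatureModel()
celsius_view = View(lambda k: k - 273.15)
fahrenheit_view = View(lambda k: (k - 273.15) * 9 / 5 + 32)
model.views += [celsius_view, fahrenheit_view]
celsius_controller = Controller(model, lambda c: c + 273.15)
fahrenheit_controller = Controller(model, lambda f: (f - 32) * 5 / 9 + 273.15)
```

Data flows in one direction only (controller, model, views), so there is no cycle to resolve.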
Based on the answer from @pagoda_5b, I'd say that you are likely allowed to have cycles (7.6 should handle it, at the cost of performance) but you must guarantee that there is no infinite regress. For example, you could have the controllers also receive signals from the model, as long as you guaranteed that receipt of said signal never caused a signal to be sent back to the model.
I think the above is a good description, but it uses the word "signal" in a non-FRP style. "Signals" in the above are really messages. If the description in 7.1 is correct and complete, loops in the signal graph would always cause infinite regress as processing the dependents of a node would cause the node to be processed and vice-versa, ad inf.
As @Matt Carkci said, there are FRP frameworks that allow loops, at least to a limited extent. They will either not be push-based, use non-strictness in interesting ways, enforce monotonicity, or introduce "artificial" delays so that when the signal graph is expanded on the temporal dimension (turning it into a value graph) the cycles disappear.

What is the difference between serializability and linearizability?

I am very confused between these two consistency models. Please give some timeline examples along with explanation.
http://en.wikipedia.org/wiki/Consistency_model
It was hard to find information about this subject. However, at some point I found a statement that explained it clearly:
Linearizability gives isolation at the level of operations, while Serializability gives isolation at the level of transactions.
(summarized from the in-depth description found here)
As an example:
Here, A, B and C are three different transactions running at the same time. r(varname) means that the current transaction is accessing the value inside varname, and w(varname) means that the current transaction is writing a certain value in varname.
Now, to create a linearized history of these events, we have to make sure that no two operations happen at the same time. An operation that started while another operation was already in progress should appear after that first operation.
In this case:
Log1: A.r(X), B.r(X), B.r(Y), A.w(X), C.r(Y)
To create a Serialized history of these events, one has to separate all the operations of the transactions A, B and C so there are no interleaved operations from other transactions.
From our example this could result in:
Log2: A.r(X), A.w(X), B.r(X), B.r(Y), C.r(Y)
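The difference between the two logs can be checked mechanically. The Python sketch below tests only whether a log is serial, meaning each transaction's operations form one contiguous run; it does not decide whether an interleaved log is equivalent to some serial one. The function name `is_serial` and the "Txn.op(var)" string format are assumptions for the illustration.

```python
def is_serial(log):
    """True if each transaction's operations form one contiguous run in the log."""
    finished = set()
    current = None
    for entry in log:
        txn = entry.split(".", 1)[0]  # "A.r(X)" -> "A"
        if txn != current:
            if txn in finished:
                # This transaction resumed after another one ran: interleaving.
                return False
            if current is not None:
                finished.add(current)
            current = txn
    return True


log1 = ["A.r(X)", "B.r(X)", "B.r(Y)", "A.w(X)", "C.r(Y)"]
log2 = ["A.r(X)", "A.w(X)", "B.r(X)", "B.r(Y)", "C.r(Y)"]
```

Here `is_serial(log1)` is false because A's write comes after B's reads, while `is_serial(log2)` is true.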
Please have a look at this video: https://www.youtube.com/watch?v=noUNH3jDLC0&t
Martin Kleppmann is the author of Designing Data-Intensive Applications, which is a great book that I'd highly recommend to anyone interested in either serializability or linearizability.

How to start working with a large decision table

Today I've been presented with a fun challenge and I want your input on how you would deal with this situation.
So the problem is the following (I've converted it to demo data as the real problem wouldn't make much sense without knowing the company dictionary by heart).
We have a decision table that has a minimum of 16 conditions. Because it is an impossible feat to manage all of them (2^16 possibilities) we've decided to only list the exceptions. Like this:
As an example I've only added 10 conditions but in reality there are (for now) 16. The basic idea is that we have one baseline (the default) which is valid for everyone and all the exceptions to this default.
Example:
You have a foreigner who is also a pirate.
If you go through all the exceptions one by one, and condition by condition you remove the exceptions that have at least one condition that fails. In the end you'll end up with the following two exceptions that are valid for our case. The match is on the IsPirate and the IsForeigner condition. But as you can see there are 2 results here, well 3 actually if you count the default.
Our solution
What we came up with is that the GUI where you add these exceptions should run an algorithm which checks for such cases and forces you to define the exception more specifically. This is still only a theory and hasn't been tested, but we think it could work this way.
My Question
I'm looking for alternative solutions that make the rules manageable and prevent the problem I've shown in the example.
Your problem seems to be the resolution of conflicting rules. When multiple rules match your input (your foreigner and pirate) and they end up recommending different things (your cangetjob and cangetevicted), you need a strategy for resolving this conflict.
What you mentioned is one way of resolution -- which is to remove the conflict in the first place. However, this may not always be possible, and not always desirable because when a user adds a new rule that conflicts with a set of old rules (which he/she did not write), the user may not know how to revise it to remove the conflict.
Another possible resolution method is prioritization. Mark a priority on each rule (based on things like the user's own authority etc.), sort the matching rules according to priority, and apply in ascending sequence of priority. This usually works and is much simpler to manage (e.g. everybody knows that the top boss's rules are final!)
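A minimal Python sketch of priority-based resolution, under the assumption that each exception lists only the conditions it cares about and the outputs it overrides. The rule contents and priority values are made-up demo data, not the table from the question.

```python
def matches(rule, facts):
    """A rule matches when every condition it names has the required value."""
    return all(facts.get(name, False) == wanted for name, wanted in rule["when"].items())


def decide(default, rules, facts):
    """Start from the default and apply matching rules in ascending priority."""
    result = dict(default)
    matching = [rule for rule in rules if matches(rule, facts)]
    for rule in sorted(matching, key=lambda rule: rule["priority"]):
        # Later (higher-priority) rules overwrite earlier ones.
        result.update(rule["then"])
    return result


default = {"CanGetJob": True, "CanBeEvicted": False}
rules = [
    {"when": {"IsForeigner": True}, "then": {"CanGetJob": True, "CanBeEvicted": False}, "priority": 1},
    {"when": {"IsPirate": True}, "then": {"CanGetJob": False, "CanBeEvicted": True}, "priority": 2},
]
```

For a foreigner who is also a pirate, both rules match and the pirate rule, having the higher priority, is applied last and therefore wins.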
Prioritization may also be used to mark a certain rule as a "global override". In your example, you may want to make "IsPirate" an override rule, which means that it overrides settings for normal people. In other words, once you're a pirate, you're treated differently. This makes it very easy to design a system in which you have a bunch of normal business rules governing 90% of the cases, then a set of "exceptions" that are treated differently, automatically overriding certain things. In this case, you should also consider making "?" available in the output columns as well.
One other possible resolution method is to include attributes in each of your conditions. For example, certain conditions must have no "zeros" in order to pass (? doesn't matter). Some conditions must have at least one "one" in order to pass. In other words, mark each condition as either "AND", "OR", or "XOR". Some popular file-system security uses this model. For example, CanGetJob may be AND (you want to be stringent on rights-to-work). CanBeEvicted may be OR -- you may want to evict even a foreigner if he is also a pirate.
An enhancement on the AND/OR method is to provide a threshold that the total result must reach before passing that condition. For example, if CanGetJob has a threshold of 2, it must get at least two 1's in order to return 1. This is sometimes useful for conditions that are not clearly black-and-white.
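The AND, OR and threshold strategies can be sketched as one combining function applied per output column. This is an illustrative Python sketch: the function name `combine` is hypothetical, `None` stands for "?", and XOR is left out because its exact meaning is not spelled out above.

```python
def combine(votes, mode="AND", threshold=None):
    """Combine the 0/1 values proposed by the matching rules for one output.

    None stands for "?" (no opinion) and is ignored.
    """
    cast = [vote for vote in votes if vote is not None]
    if threshold is not None:
        # Threshold: pass only if enough rules voted 1.
        return int(sum(cast) >= threshold)
    if mode == "AND":
        # AND: pass only if there are no zeros (an empty vote list passes).
        return int(all(cast))
    if mode == "OR":
        # OR: pass if there is at least one 1.
        return int(any(cast))
    raise ValueError(f"unsupported mode: {mode}")
```

For example, with votes `[1, None, 0]` a stringent AND output returns 0 while a lenient OR output returns 1.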
You can mix resolution methods: e.g. first prioritize, then use AND/OR to resolve rules with similar priorities.
The possibilities are limitless and really depend on what your actual needs are.
This problem reminds me of a business rules engine, where there is no known algorithm to define outputs from inputs (e.g. using boolean logic) but the user (typically some sort of administrator) has to define all or some of the logic.
This might sound like a bit of overkill, but OTOH it provides virtually limitless extension capabilities: you don't have to code any new business logic, just define a new rule set.
As I understand your problem, you are looking for a nice way to visualise the editing for these rules. But this all depends on your programming language and the tool you select for this. Java, for example, has JBoss Drools. Quoting their page:
Drools Guvnor provides a (logically centralized) repository to store you business knowledge, and a web-based environment that allows business users to view and (within certain constraints) possibly update the business logic directly.
You could possibly use this generic tool or write your own.
Everything depends on what your actual rules will look like. Rules like 'IF has an even number of these properties THEN' would be painful to represent in this format, whereas rules like 'IF pirate and not geek THEN' are easy.
You can 'avoid the ambiguity' by stating that you'll always be taking the first actual match, in other words your rules have a priority. You'd then want to flag rules which have no effect because they are 'shadowed' by rules higher up. They're not hard to find, so it's something your program should do.
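Finding shadowed rules under first-match semantics can be sketched as below. This Python illustration assumes each rule's conditions are written as a string of '1', '0' and '?' (don't care), one character per condition; it only detects a rule shadowed by a single earlier rule, not one covered jointly by several earlier rules.

```python
def covers(earlier, later):
    """True if every input that matches `later` also matches `earlier`."""
    return all(e == "?" or e == l for e, l in zip(earlier, later))


def shadowed_rules(rules):
    """Return the indices of rules that can never fire because an earlier rule covers them."""
    return [
        j
        for j, later in enumerate(rules)
        if any(covers(rules[i], later) for i in range(j))
    ]
```

For the ordered rules `["1?", "11", "01"]`, the second rule is reported because any input matching "11" is already caught by "1?".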
Your interface could also indicate groups of rules where rules within the group can be in any order without changing the outcomes. This will add clarity to what the rules are really saying.
If some of your outputs are relatively independent of the others, you will also get a more compact and much clearer table by allowing question marks in the output. In that design the scan for first matching rule is done once for each output. Consider for example if 'HasChildren' is the only factor relevant to 'Can Be Evicted'. With question marks in the outputs (= no effect) you could be halving the number of exception rules.
My background for this is circuit logic design, not business logic. What you're designing is similar to, but not the same as, a PLA. As long as your actual rules are close to sum of products then it can work well. If your rules aren't, for example the 'even number of these properties' rule, then the grid like presentation will break down in a combinatorial explosion of cases. Your best hope if your rules are arbitrary is to get a clearer more compact presentation with either equations or with diagrams like a circuit diagram. To be avoided, if you can.
If you are looking for a Decision Engine with a GUI, then you can try this one: http://gandalf.nebo15.com/
We just released it, it's open source and production ready.
You probably need some kind of inference engine. Think about doing it in Prolog.