Force parfor to respect some order - matlab

I understand that some indeterminism stems from parfor's parallel nature but I don't understand why it should be entirely random. Is there any way to force parfor to respect (at least loosely) the order of the loop? More specifically I would like that in the case of:
parfor i=1:100
do_independent_stuff()
end
each worker of the pool when asking for a new task (i.e. in this case a new iteration of the loop) to be affected the lowest i that hasn't been computed or affected to a worker yet.

I think its by design that running something in parallel assumes that order is not important, at least in Matlab. Each thread/worker should be independent of each other. However, as indicated in this question, you could try job and task control interface to give you some level of control.

Firstly, in practice, PARFOR isn't "entirely random" - you can easily observe that it sends out chunks of loop iterates in reverse order. In R2013b and later, if you need more control over ordering (if, for example, you know that certain of your independent things are likely to take a long time, and therefore wish to start computing them first), you can use PARFEVAL.

If you need to loosely synchronize things, for instance wait until some thread as finished or has reach some point before to start another one, best should be to use semaphores, locks, mutex, etc...
I don't know if 'Parallel toolbox' includes such synchronization objects, but here is some workaround to create semaphore for instance:
https://stackoverflow.com/a/22874669/684399
You can also use objects in 'System.Threading' namespace (requires .NET):
Init:
someResultAvailable = System.Threading.ManualResetEvent(false);
In some job:
... do work ...
someResultAvailable .Set();
... continue ...
In another one:
... do work ...
if (!someResultAvailable.WaitOne(10000))
{
error('Timeout waiting for result from other thread');
}
... continue now knowing that results are available ...

Related

Abort execution of parsim

For the use case of being able to abort parallel simulations with a MATLAB GUI, I would like to stop all scheduled simulations after the user pressed the Stop button.
All simulations are submitted at once using the parsim command, hence something like a callback to my GUI variables (App Designer) would be the most preferable solution.
Approaches I have considered but were not exactly providing a desirable solution:
The Simulation Manager provides the functionality to close simulations using its own interface. If I only had the code its Stop button executes...
parsim uses the Simulink.SimulationInput class as input to run simulations, allowing to modify the preSimFcn at the beginning of each simulation. I have not found a way to "skip" the simulation at its initialization phase apart from intentionally throwing an error so far.
Thank you for your help!
Update 1: Using the preSimFcn to set the the termination time equal to the start time drastically reduces simulation time. But since the first step still is computed there has to be a better solution.
simin = simin.setModelParameter('StopTime',get_param(mdl,'StartTime'))
Update 2: Intentionally throwing an error executing the preSimFcn, for example by setting it to
simin = simin.setModelParameter('SimulationCommand','stop')
provides the shortest termination times for me so far. Though, it requires catching and identifying the error in the ErrorMessageof the Simulink.SimulationOutput object. As this is exactly the "ugly" implementation I wanted to avoid, the issue is still active.
If you are using 17b or later, parsim provides an option to 'RunInBackground'. It returns an array of Future objects.
F = parsim(in, 'RunInBackground', 'on')
Please note that is only available for parallel simulations. The Simulink.Simulation.Future object F provides a cancel method which will terminate the simulation. You can use the fetchOutputs methods to fetch the output from the simulation.
F.cancel();

Order in always_comb block

I have the impression that in an always_comb block, all the non-blocking assignment should work in parallel. That is, if I have
always_comb
begin
a = b;
b = c;
end
Then, a should be equal to c regardless of the order of above two lines in the always_comb block, as they are evaluated concurrently anyway. However, today I experienced an issue that change the order of above two lines, the results are different!!! Whay is that?
The statements within a begin/end block execute serially. It does not matter if you are using an always_comb or any other kind of always block. But you are using blocking assignments, not non-blocking assignments, which is the proper thing to do in an always_comb block. Non-blocking assignments are used to assign sequential logic, which implies storage of the current and next state.
This difference stems from that combinational always blocks cannot "self-trigger". The way the simulator works when a signal changes value is to locate all always blocks with that signal in the sensitivity list, then execute them one by one sequentially. But, only if that block is not already running! In your case the expected behavior would require the block to run twice, but instead only one iteration occurs for every update of c.
The situation is unfortunate since the sensitivity list is a simulator concept and generally ignored altogether for synthesis. Most synthesis tools would generate a wire from your code without producing any warning, creating a simulation-synthesis mismatch.
Note that an explicit sensitivity list (e.g., always #(b or c)) does not make any difference. One solution is to always ensure that the assignments are in the right order. Another is to use non-blocking assignments, but this is generally advised against since it slows down the simulator. (Note that VHDL does not have blocking assignments, and would thus always have this performance penalty. On the plus side you do not have problems like this.)

Implementing a priority queue in matlab in order to solve optimization problems using BRANCH AND BOUND

I'm trying to code a priority queue in MATLAB, I know there is the SIMULINK toolbox for priority queue, but I'm trying to code it in MATLAB. I have a pseudo code that uses priority queue for a method called BEST First Search with Branch and Bound. The branch and bound algorithm design strategy is a state space tree and it is used to solve optimization problems. simple explanation of what is branch and bound
I have read chapter 5: Branch and Bound from a book called 'FOUNDATIONS OF ALGORITHMS', it's the 4th edition by Richard Neapolitan and Kumarss Naimipour , and the text is about designing algorithms, complexity analysis of algorithms, and computational complexity (analysis of problems), very interesting book, and I came across this pseudocode:
Void BeFS( state_space_tree T, number& best)
{
priority _queue-of_node PQ;
node(u,v);
initialize (PQ) % initialize PQ to be empty
u=root of T;
best=value(v);
insert(PQ,v) insert(PQ,v) is a procedure that adds v to the priority queue PQ
while(!empty(PQ){ % remove node with best bound
remove(PQ,v);
remove(PQ,v) is a procedure that removes the node with the best bound and it assigns its value to v
if(bound(v) is better than best) % check if node is still promising
for (each child of u of v){
if (value (u) is better than best)
(best=value(u);
if (bound(u) is better than best)
insert(PQ,u)
}
}
}
I don't know how to code it in matlab, and branch and bound is an interesting general algorithm for finding optimal solutions of various optimization problems, especially in discrete and combinatorial optimization, instead of using heuristics to find an optimal solution, since branch and bound reduces calculation time and finds the optimal solution faster.
EDIT:
I have checked everywhere whether a solution already has been implemented , before posting a question here. And I came here to get ideas of how I can get started to implement this code
I have included this in your post so people can know better what you expect of them. However, 'ideas to get started to implement' is still not much more specific than 'how to write code in matlab'.
However, I will still try to answer:
Make the structure of the code, write the basic loops and fill them with comments of what you want to do
Pick (the easiest or first) one of those comments, and see whether you can make it happen in a few lines, you can test it by generating some dummy input for that piece of code
Keep repeating step 2 untill all comments have the required code
If you get stuck in one of the blocks, and have searched but not found the answer to a specific question. Then this is not a bad place to ask.

Objective-C: Calling and copying the same block from multiple threads

I'm dealing with neural networks here, but it's safe to ignore that, as the real question has to deal with blocks in objective-c. Here is my issue. I found a way to convert a neural network into a big block that can be executed all at once. However, it goes really, really slow, relative to activating the network. This seems a bit counterintuitive.
If I gave you a group of nested functions like
CGFloat answer = sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias))
//or in block notation
^(CGFloat x, CGFloat y, CGFloat d, CGFloat bias) {
return sin(cos(gaussian(1.5*x + 2.5*y)) + (.3*d + bias));
};
In theory, running that function multiple times should be easier/quicker than looping through a bunch of connections, and setting nodes active/inactive, etc, all of which essentially calculate this same function in the end.
However, when I create a block (see thread: how to create function at runtime) and run this code, it is slow as all hell for any moderately sized network.
Now, what I don't quite understand is:
When you copy a block, what exactly are you copying?
Let's say, I copy a block twice, copy1 and copy2. If I call copy1 and copy2 on the same thread, is the same function called? I don't understand exactly what the docs mean for block copies: Apple Block Docs
Now if I make that copy again, copy1 and copy2, but instead, I call the copies on separate threads, now how do the functions behave? Will this cause some sort of slowdown, as each thread attempts to access the same block?
When you copy a block, what exactly
are you copying?
You are copying any state the block has captured. If that block captures no state -- which that block appears not to -- then the copy should be "free" in that the block will be a constant (similar to how #"" works).
Let's say, I copy a block twice, copy1
and copy2. If I call copy1 and copy2
on the same thread, is the same
function called? I don't understand
exactly what the docs mean for block
copies: Apple Block Docs
When a block is copied, the code of the block is never copied. Only the captured state. So, yes, you'll be executing the exact same set of instructions.
Now if I make that copy again, copy1
and copy2, but instead, I call the
copies on separate threads, now how do
the functions behave? Will this cause
some sort of slowdown, as each thread
attempts to access the same block?
The data captured within a block is not protected from multi-threaded access in any way so, no, there would be no slowdown (but there will be all the concurrency synchronization fun you might imagine).
Have you tried sampling the app to see what is consuming the CPU cycles? Also, given where you are going with this, you might want to become acquainted with your friendly local disassembler (otool -TtVv binary/or/.o/file) as it can be quite helpful in determining how costly a block copy really is.
If you are sampling and seeing lots of time in the block itself, then that is just your computation consuming lots of CPU time. If the block were to consume CPU during the copy, you would see the consumption in a copy helper.
Try creating a source file that contains a bunch of different kinds of blocks; with parameters, without, with captured state, without, with captured blocks with/without captured state, etc.. and a function that calls Block_copy() on each.
Disassemble that and you'll gain a deep understanding on what happens when blocks are copied. Personally, I find x86_64 assembly to be easier to read than ARM. (This all sounds like good blog fodder -- I should write it up).

Does MATLAB perform tail call optimization?

I've recently learned Haskell, and am trying to carry the pure functional style over to my other code when possible. An important aspect of this is treating all variables as immutable, i.e. constants. In order to do so, many computations that would be implemented using loops in an imperative style have to be performed using recursion, which typically incurs a memory penalty due to the allocation a new stack frame for each function call. In the special case of a tail call (where the return value of a called function is immediately returned to the callee's caller), however, this penalty can be bypassed by a process called tail call optimization (in one method, this can be done by essentially replacing a call with a jmp after setting up the stack properly). Does MATLAB perform TCO by default, or is there a way to tell it to?
If I define a simple tail-recursive function:
function tailtest(n)
if n==0; feature memstats; return; end
tailtest(n-1);
end
and call it so that it will recurse quite deeply:
set(0,'RecursionLimit',10000);
tailtest(1000);
then it doesn't look as if stack frames are eating a lot of memory. However, if I make it recurse much deeper:
set(0,'RecursionLimit',10000);
tailtest(5000);
then (on my machine, today) MATLAB simply crashes: the process unceremoniously dies.
I don't think this is consistent with MATLAB doing any TCO; the case where a function tail-calls itself, only in one place, with no local variables other than a single argument, is just about as simple as anyone could hope for.
So: No, it appears that MATLAB does not do TCO at all, at least by default. I haven't (so far) looked for options that might enable it. I'd be surprised if there were any.
In cases where we don't blow out the stack, how much does recursion cost? See my comment to Bill Cheatham's answer: it looks like the time overhead is nontrivial but not insane.
... Except that Bill Cheatham deleted his answer after I left that comment. OK. So, I took a simple iterative implementation of the Fibonacci function and a simple tail-recursive one, doing essentially the same computation in both, and timed them both on fib(60). The recursive implementation took about 2.5 times longer to run than the iterative one. Of course the relative overhead will be smaller for functions that do more work than one addition and one subtraction per iteration.
(I also agree with delnan's sentiment: highly-recursive code of the sort that feels natural in Haskell is typically likely to be unidiomatic in MATLAB.)
There is a simple way to check this. Create this function tail_recursion_check:
function r = tail_recursion_check(n)
if n > 1
r = tail_recursion_check(n - 1);
else
error('error');
end
end
and run tail_recursion_check(10), for example. You are going to see a very long stack trace with 10 items that says error at line 3. If there were tail call optimization, you would only see one.