Efficient parallelization of two tasks - matlab

I have two tasks that take a fairly short time to compute (around half a second each). These two tasks (say A and B) are called repeatedly a large number of times. Each time they are called, the two tasks can be computed in parallel because they don't depend on each other. Here is an example code without parallelization:
A = 3;
B = 4;
for counter = 1:5000
A = task1(counter,A); %Run task 1
B = task2(counter,B); %Run task 2
disp(task3(A,B)); %Use the results for task 3
end
The most obvious way to parallelize this on MATLAB is to do the following:
A = 3;
B = 4;
pool = parpool("threads");
for counter = 1:5000
parfor p = 1:2
if p == 1
A = task1(counter,A); %Run task 1
end
if p == 2
B = task2(counter,B); %Run task 2
end
end
disp(task3(A,B)); %Use the results for task 3
end
delete(pool)
This implementation, however, turns out to be much slower (almost 10 times slower). My guess is that this is caused by the overhead involved in parallelization.
Is there any way to get around this?

Related

Fibonacci Recursion Value tracer

So I need to write a program which uses a recursive function to store the value of input arguments in the order they were made.
e.g. If my function is [f trace]=fibo_trace(6,[]) it should return
[f trace]=fibo_trace(6,[])
f=
8
trace=
6 4 2 3 1 2 5 3 1 2 4 2 3 1 2
With trace being the values with which the recursive call is being initialized and f being the 6th element in the fibonacci series.
Here is my code
function [f,trace] = fibo_trace(n,v)
persistent ptrace; % must specify persistent
v=n;
ptrace=cat(2,ptrace,v);
if n <= 2
f = 1;
else
f = fibo_trace(n-2) + fibo_trace(n-1);
end
trace=ptrace;
end
But using a persistent variable does not give proper output if multiple conditions are tested. I need to compute this without using a persistent or global variable, can anyone help me out here?
When I don't use a persistent variable only the latest value of n is stored in vector v instead of the entire set of values.
First, note that your input v is never used, you always overwrite it with n.
Next, note that your calls to fibo_trace(n-1) and (n-2) could return trace, you're just choosing not to. We can take advantage of the 2nd output to build the trace array.
These two points mean you can get rid of the persistent variable, and simplify the code a bit. We just need to make separate calls to fibo_trace for the two previous iterations.
function [f,trace] = fibo_trace(n)
if n <= 2
% The trace should just be 'n' for the starting elements f=1
f = 1;
trace = n;
else
% Get previous values and their traces
[f1, t1] = fibo_trace(n-1);
[f2, t2] = fibo_trace(n-2);
% Compute the outputs
f = f1 + f2;
trace = [n, t2, t1];
end
end
Result:
[f, trace] = fibo_trace(6)
f =
8
trace =
6 4 2 3 1 2 5 3 1 2 4 2 3 1 2
There's some optimisation to be had here, since fibo_trace(n-1) will itself call fibo_trace(n-2), so computing fibo_trace(n-2) separately is multiplying the computation time.
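One way to realise that optimisation, sketched here under the assumption that a bottom-up loop is acceptable (it departs from the recursive form the question asks for): since the trace for n is [n, trace(n-2), trace(n-1)], each trace can be built once from the two before it. The variable names are illustrative.

```matlab
n = 6;                          % index of the Fibonacci number wanted
fPrev = 1;  tPrev = 1;          % value and trace for n = 1
fCurr = 1;  tCurr = 2;          % value and trace for n = 2
for k = 3:n
    fNext = fPrev + fCurr;
    tNext = [k, tPrev, tCurr];  % same order as the recursive version
    fPrev = fCurr;  tPrev = tCurr;
    fCurr = fNext;  tCurr = tNext;
end
if n == 1
    f = fPrev;  trace = tPrev;
else
    f = fCurr;  trace = tCurr;
end
```

For n = 6 this gives f = 8 and the same trace as the recursive version, with each subproblem computed only once.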

To see the results of patternsearch optimization for each iteration in MATLAB

I have an optimization problem which is very time consuming, and I need to solve it many times (this is somewhat of a trial-and-error problem for me). However, I do not want to wait for the final result; I need to see the result of the optimization at each iteration. More specifically, I want to see the x value (the solution) and am not so interested in fval (the objective function value at x). Unfortunately, patternsearch only shows fval and not the solution at each iteration. I know that I should fix this through the "Output Function", but I spent a lot of time on it and could not understand how to do it. To keep things simple, consider the following optimization problem:
options = optimoptions('patternsearch');
options = optimoptions(options,'Display', 'iter','TolX',1e-6);
x0=2;lb=-3;ub=3;
x = patternsearch(@(x)x^2,x0,[],[],[],[],lb,ub,[],options);
The first few lines we see in the command window look like the following:
Iter f-count f(x) MeshSize Method
0 1 4 1
1 2 4 0.5 Refine Mesh
2 3 0 1 Successful Poll
Unfortunately, I see nothing about x.
Would you please change my code so that I can also see the solution (x) at each iteration? It would be greatly appreciated!
Babak
A valid output function handler for patternsearch should be specified as follows:
function [stop,options,optchanged] = fun(optimvalues,options,flag)
The following code should be enough to show the information you are looking for:
options = optimoptions('patternsearch');
options = optimoptions(options,'Display','iter','OutputFcns',@custom,'TolX',1e-6);
x0 = 2; lb = -3; ub = 3;
x = patternsearch(@(x)x^2,x0,[],[],[],[],lb,ub,[],options);
function [stop,options,optchanged] = custom(optimvalues,options,flag)
stop = false;
optchanged = false;
if (strcmp(flag,'iter'))
disp([' Iteration performed for X=' num2str(optimvalues.x)]);
end
end
Here is the output:
Iter Func-count f(x) MeshSize Method
0 1 4 1
Iteration performed for X=2
1 2 4 0.5 Refine Mesh
Iteration performed for X=0
2 3 0 1 Successful Poll
Iteration performed for X=0
3 3 0 0.5 Refine Mesh
Iteration performed for X=0
4 5 0 0.25 Refine Mesh
Iteration performed for X=0
...
It's just an example and you can, of course, tweak the function so that the text is displayed the way you prefer.

How to make a function that picks according to a changing distribution, without passing over and over?

This is not a question about MatLab, but it is a question about how to achieve something that would be easy in object-oriented programming when you're using a less sophisticated language.
I'm a mathematician who is writing some MatLab code to test an algorithm in linear algebra (I won't burden you with the details). The beginning of the program is to generate a random 500 by 50 matrix of floats (call it A). In the course of running my program, I will want to pick random rows of A, not uniformly at random, but rather according to a distribution where the likelihood of row i being picked is different, depending on the specific matrix that has been generated.
I want to write a function called "pickRandomRow" that I can call over and over when I need it. It will use the same probability distribution on the rows throughout each individual run of the program, but that distribution will change between runs of the program (because the random matrix will be different).
If I were using a more object-oriented language than MatLab, I would make a class called "rowPicker" which could be initialized with the information about the specific random matrix I'm using on this run. But here, I'm not sure how to make a function in MatLab that can know the information it needs to know about the random matrix A once and for all, without passing A to the function over and over (expensively), when it's not changing.
Possible options
Make pickRandomRow a script instead of a function, so it can see the workspace. Then I wouldn't be able to give pickRandomRow any arguments, but so far I don't see why I'd need to.
Start messing around with classes in MatLab.
As far as I remember, MATLAB supports closures.
Closures are something like an object with bunch of private member variables and a single method.
So, you could do something like this:
function rowPicker = createRowPicker(matrix, param)
expensivePreparations = ... (use 'matrix' and 'param' here) ...
function pickedRow = someComplicatedSamplingFunction
... (use 'matrix', 'expensivePreparations' and 'param' here) ...
end
rowPicker = @someComplicatedSamplingFunction;
end
and then you could generate a bunch of differently parameterized rowPickers in a loop, something like this:
for p = [p1, p2, p3]
matrix = generateMatrix()
picker = createRowPicker(matrix, p)
... (run expensive simulation, reuse 'picker')
end
In this way, the expensive intermediate result expensivePreparations will be saved inside the closure, and you won't have to recompute it in each step of your expensive simulation.
Warning: all of the above is MATLAB-esque pseudocode and not tested.
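An anonymous function gives the same closure effect in a form that runs as written. The sketch below is only an illustration: the weights (squared row norms) are a hypothetical stand-in for whatever distribution the matrix defines, and the sampling uses a plain inverse-CDF lookup rather than any toolbox function.

```matlab
A = rand(500, 50);                 % the random matrix for this run
w = sum(A.^2, 2);                  % hypothetical weights: squared row norms
cdf = cumsum(w);
cdf = cdf / cdf(end);              % cumulative distribution, last entry exactly 1

% The handle captures A and cdf as they are at creation time.
pickRandomRow = @() A(find(rand() <= cdf, 1, 'first'), :);

row = pickRandomRow();             % 1-by-50 row, drawn with probability w(i)/sum(w)
```

The preparation (here the cumulative sum) is done once, and each later call to pickRandomRow() takes no arguments.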
In order to achieve this task you could use the randsample function and, to be exact, its four arguments overload:
y = randsample(n,k,true,w) or y = randsample(population,k,true,w)
returns a weighted sample taken with replacement, using a vector of
positive weights w, whose length is n. The probability that the
integer i is selected for an entry of y is w(i)/sum(w). Usually, w is
a vector of probabilities. randsample does not support weighted
sampling without replacement.
An example:
M = [
1 1 1;
2 2 2;
3 3 3;
4 4 4;
5 5 5
];
idx = randsample(1:5,1,true,[0.2 0.2 0.1 0.1 0.4]);
row = M(idx,:);
If you have to pick more than one row each time and need weighted sampling without replacement, which randsample does not support, you could use the datasample function instead:
M = [
1 1 1;
2 2 2;
3 3 3;
4 4 4;
5 5 5
];
idx = datasample(1:5,2,'Replace',false,'Weights',[0.2 0.2 0.1 0.1 0.4]);
rows = M(idx,:);
As for the choice between a class and a script, I honestly think you are overcomplicating your problem a little bit. An OOP class, in this case, looks like overkill to me. If you want to use a script (actually, a function) without passing any argument to it, you could use the persistent modifier on an internally defined matrix and a variable representing its row probabilities. Let's assume that the first solution I proposed is the one that fits your needs; then:
a = pickRandomRow();
b = pickRandomRow();
c = pickRandomRow();
function row = pickRandomRow()
persistent M;
persistent W;
if (isempty(M))
M = [
1 1 1;
2 2 2;
3 3 3;
4 4 4;
5 5 5
];
W = [
0.2
0.2
0.1
0.1
0.4
];
end
idx = randsample(1:size(M,1),1,true,W);
row = M(idx,:);
end
If you want to provide different weights according to previous computations, you could change the above code as follows:
w1 = WeightsFromDistributionX();
w2 = WeightsFromDistributionY();
a = pickRandomRow(w1);
b = pickRandomRow(w2);
c = pickRandomRow(w2);
function row = pickRandomRow(W)
persistent M;
if (isempty(M))
M = [
1 1 1;
2 2 2;
3 3 3;
4 4 4;
5 5 5
];
end
M_size = size(M,1);
W_size = numel(W);
if (M_size ~= W_size)
error('The weights vector must have the same length of matrix rows.');
end
idx = randsample(1:M_size,1,true,W);
row = M(idx,:);
end
If creating a class is too much work (your first class will be; it's quite different than in other languages), you have several alternatives.
A single distribution at a time
You can accomplish this using persistent variables in a function. The function will become some sort of unique object.
function out = func(arg)
persistent M; % matrix to pick rows from
persistent S; % status
if nargin == 1
M = randn(...);
S = ...;
else
% use M, S
out = ...;
end
You call this using func('init') the first time, and data = func() after that.
Multiple different distributions
You can rewrite the above but returning a cell array with the internal data when called with 'init'. Other times you pass that cell array as input:
function out = func(arg)
if ischar(arg)
M = randn(...);
S = ...;
out = {M,S};
else % iscell(arg)
% use M=arg{1}, S=arg{2}
out = ...;
end
Of course, instead of a cell array it could be a struct. I see this as a "poor man's object". There's no control over the user modifying the status of the object, but if you're your own user, this is probably not a big deal.

Efficient operations of big non-sparse matrices in Matlab

I need to operate on big 3-dim non-sparse matrices in Matlab. Using pure vectorization gives a high computation time. So, I have tried to split the operations into 10 blocks and then combine the results.
I was surprised to see that pure vectorization does not scale very well with the data size, as presented in the following figure.
I include an example of the two approaches.
% Parameters:
M = 1e6; N = 50; L = 4; K = 10;
% Method 1: Pure vectorization
mat1 = randi(L,[M,N,L]);
mat2 = repmat(permute(1:L,[3 1 2]),M,N);
result1 = nnz(mat1>mat2)./(M+N+L);
% Method 2: Split computations
result2 = 0;
for ii=1:K
mat1 = randi(L,[M/K,N,L]);
mat2 = repmat(permute(1:L,[3 1 2]),M/K,N);
result2 = result2 + nnz(mat1>mat2);
end
result2 = result2/(M+N+L);
Hence, I wonder if there is any other approach that makes big matrix operations in Matlab more efficient. I know it is a quite broad question, but I will take the risk :)
Edit:
Using the implementation of @Shai
% Method 3
mat3 = randi(L,[M,N,L]);
result3 = nnz(bsxfun( @gt, mat3, permute( 1:L, [3 1 2] ) ))./(M+N+L);
The times are:
Why repmat and not bsxfun?
result = nnz(bsxfun( @gt, mat1, permute( 1:L, [3 1 2] ) ))./(M+N+L);
It seems like you are using up your RAM and the OS starts to allocate room in swap for the very large matrices. Memory swapping is always a very time-consuming operation, and it gets worse as the amount of memory you require increases.
I believe you are witnessing thrashing.
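To put rough numbers on the memory argument: with M = 1e6, N = 50 and L = 4, one double array holds 2e8 elements, which is about 1.6 GB, and the repmat approach needs two of them. The sketch below, on a deliberately small M, checks that repmat, bsxfun and implicit expansion (available from R2016b) give the same count, so the full-size mat2 is not needed. The variable names are illustrative.

```matlab
M = 1000; N = 50; L = 4;                     % small M so the check runs quickly
mat1 = randi(L, [M, N, L]);
thresholds = reshape(1:L, 1, 1, L);          % 1-by-1-by-L, same as permute(1:L,[3 1 2])

countRepmat = nnz(mat1 > repmat(thresholds, M, N));   % explicit full-size copy
countBsxfun = nnz(bsxfun(@gt, mat1, thresholds));     % no full-size second array
countExpand = nnz(mat1 > thresholds);                 % implicit expansion, R2016b or later

% Size of one double array at the question's M = 1e6
bytesPerArray = 1e6 * N * L * 8;             % 1.6e9 bytes
```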

How to delete zero components in a vector in Matlab?

I have a vector for example
a = [0 1 0 3]
I want to turn a into b which equals b = [1 3].
How do I perform this in general? So I have a vector with some zero components and I want to remove the zeroes and leave just the non-zero numbers?
If you just wish to remove the zeros, leaving the non-zeros behind in a, then the very best solution is
a(a==0) = [];
This deletes the zero elements, using a logical indexing approach in MATLAB. When the index to a vector is a boolean vector of the same length as the vector, then MATLAB can use that boolean result to index it with. So this is equivalent to
a(find(a==0)) = [];
And, when you set some array elements to [] in MATLAB, the convention is to delete them.
If you want to put the non-zeros into a new result b, while leaving a unchanged, the best way is probably
b = a(a ~= 0);
Again, logical indexing is used here. You could have used the equivalent version (in terms of the result) of
b = a(find(a ~= 0));
but mlint will end up flagging the line as one where the purely logical index was more efficient, and thus more appropriate.
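A small check of the equivalences described above, using the vector from the question (the intermediate names mask, b1, b2 and c are only for illustration):

```matlab
a = [0 1 0 3];
mask = (a ~= 0);        % logical vector [0 1 0 1]
b1 = a(mask);           % logical indexing
b2 = a(find(mask));     % same result via the linear indices [2 4]
c = a;
c(c == 0) = [];         % deletion, done on a copy so a stays unchanged
```

All three of b1, b2 and c end up as [1 3].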
As always, beware EXACT tests for zero or for any number, if you would have accepted elements of a that were within some epsilonic tolerance of zero. Do those tests like this
b = a(abs(a) >= tol);
This retains only those elements of a whose magnitude is at least as large as your tolerance.
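To illustrate the difference, here is a short sketch with made-up data in which two entries are tiny but not exactly zero; the tolerance value is a hypothetical choice.

```matlab
a = [0 1 1e-12 3 -1e-15];
tol = 1e-9;                      % hypothetical tolerance
exact = a(a ~= 0);               % exact test keeps 1e-12 and -1e-15 as well
b = a(abs(a) >= tol);            % tolerance test keeps only 1 and 3
```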
I just came across this problem and wanted to find something about the performance, but I couldn't, so I wrote a benchmarking script on my own:
% Config:
rows = 1e6;
runs = 50;
% Start:
orig = round(rand(rows, 1));
t1 = 0;
for i = 1:runs
A = orig;
tic
A(A == 0) = [];
t1 = t1 + toc;
end
t1 = t1 / runs;
t2 = 0;
for i = 1:runs
A = orig;
tic
A = A(A ~= 0);
t2 = t2 + toc;
end
t2 = t2 / runs;
t1
t2
t1 / t2
So you see, the solution using A = A(A ~= 0) is the quicker of the two :)
I often ended up doing things like this. Therefore I tried to write a simple function that 'snips' out the unwanted elements in an easy way. This turns matlab logic a bit upside down, but looks good:
b = snip(a,'0')
you can find the function file at:
http://www.mathworks.co.uk/matlabcentral/fileexchange/41941-snip-m-snip-elements-out-of-vectorsmatrices
It also works with all other 'x', nan or whatever elements.
b = a(find(a~=0))
Data
a=[0 3 0 0 7 10 3 0 1 0 7 7 1 7 4]
Do
aa=nonzeros(a)'
Result
aa=[3 7 10 3 1 7 7 1 7 4]
Why not just a=a(~~a) or a(~a)=[]? It's equivalent to the other approaches but certainly fewer keystrokes.
You could use sparse(a), which would return
(1,2) 1
(1,4) 3
This allows you to keep the information about where your non-zero entries used to be.
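If the positions matter but a sparse matrix is more than you need, a minimal alternative sketch is to keep the indices from find next to the values; the names idx, vals and restored are illustrative.

```matlab
a = [0 1 0 3];
idx = find(a);                  % positions of the non-zero entries: [2 4]
vals = a(idx);                  % the non-zero values: [1 3]

% The original vector can be rebuilt from the two pieces.
restored = zeros(size(a));
restored(idx) = vals;
```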