Matlab GPU use with functions that take arguments of different dimensions - matlab

I am trying to use parallel computing with GPU in Matlab, and I would like to apply a function to a large array (to avoid the use of a for loop, which is quite slow). I have read Matlab's documentation, and I can use arrayfun, but only if I want to do elementwise operations. Maybe I am confused, but I would appreciate if someone can help me to use it. As an example of what I want to do, imagine that I would like to perform the following operation,
$X_t = B Z_t + Q\varepsilon_t$
where $X_t$ is 2x1, $B$ is 2x5, and $Z_t$ is 5x1, with $Q$ 2x2. I define a function,
function X = propose(Z,B,Q)
X=Z*B+Q*rand(2,1);
end
Now, suppose that I have an array $Z_p$ which is 5x1000. To each of the 1000 columns I would like to apply the previous function, for given matrices $B$ and $Q$, to get an array $X_p$ which is 2x1000.
Given the documentation for arrayfun I can not do this,
X=arrayfun(#propose,Zp,B,Q)
So, is there any possibility to do it?
Thanks!
PS: Yes, I know that in this simple example I can just do the multiplication without a for loop, but the application I have in mind is more complicated and I cannot do it. I just put this example as an illustration.

Related

Relate outputs of parent function to input of nested function

I am going to try and explain myself simply in the hopes of getting a simple answer.
Let's say I have a function 'calculate' that takes the inputs [t,k,r,x] and outputs [A,B,C,D] as follows:
function [A,B,C,D] = calculate(t,k,r,x)
Now lets say I have another function that takes these outputs as the inputs, and spits out more, different outputs, eg.
function [M,N] = again(A,B,C,D)
How do I link [M,N] to say k and t? The overall aim is to minimise both M and N by optimising k and t, and I can guess that it has something to do with nested functions and passing parameters but I'm not sure how to start, and the start is all I want. Thanks
Take a look at the Matlab optimization toolbox. It provides functions for a multitude of optimization problems. Although I believe these functions only take one function as a parameter. Therefore for your case it would probably be best, if you do it like this if you can:
Write function calculate.m with parameters (t,k,r,x) and save it.
Write function again.m with parameters(t,k,r,x) and save it.
Function again calls function calculate with parameters (t,k,r,x) and then continues to determine (M,N) from the output of calculate.
in the Matlab toolbox optimization function, e.g.: fmincon(fun,x0,A,b), you then have to use again.m as the function you want to optimize (fun).
Hope that's good enough for a start.

Matlab: need some help for a seemingly simple vectorization of an operation

I would like to optimize this piece of Matlab code but so far I have failed. I have tried different combinations of repmat and sums and cumsums, but all my attempts seem to not give the correct result. I would appreciate some expert guidance on this tough problem.
S=1000; T=10;
X=rand(T,S),
X=sort(X,1,'ascend');
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(cc,:)-X(c,:))-(cc-c)/T;
Result=Result+abs(d');
end
end
Basically I create 1000 vectors of 10 random numbers, and for each vector I calculate for each pair of values (say the mth and the nth) the difference between them, minus the difference (n-m). I sum over of possible pairs and I return the result for every vector.
I hope this explanation is clear,
Thanks a lot in advance.
It is at least easy to vectorize your inner loop:
Result=zeros(S,1);
for c=1:T-1
d=(X(c+1:T,:)-X(c,:))-((c+1:T)'-c)./T;
Result=Result+sum(abs(d),1)';
end
Here, I'm using the new automatic singleton expansion. If you have an older version of MATLAB you'll need to use bsxfun for two of the subtraction operations. For example, X(c+1:T,:)-X(c,:) is the same as bsxfun(#minus,X(c+1:T,:),X(c,:)).
What is happening in the bit of code is that instead of looping cc=c+1:T, we take all of those indices at once. So I simply replaced cc for c+1:T. d is then a matrix with multiple rows (9 in the first iteration, and one fewer in each subsequent iteration).
Surprisingly, this is slower than the double loop, and similar in speed to Jodag's answer.
Next, we can try to improve indexing. Note that the code above extracts data row-wise from the matrix. MATLAB stores data column-wise. So it's more efficient to extract a column than a row from a matrix. Let's transpose X:
X=X';
Result=zeros(S,1);
for c=1:T-1
d=(X(:,c+1:T)-X(:,c))-((c+1:T)-c)./T;
Result=Result+sum(abs(d),2);
end
This is more than twice as fast as the code that indexes row-wise.
But of course the same trick can be applied to the code in the question, speeding it up by about 50%:
X=X';
Result=zeros(S,1);
for c=1:T-1
for cc=c+1:T
d=(X(:,cc)-X(:,c))-(cc-c)/T;
Result=Result+abs(d);
end
end
My takeaway message from this exercise is that MATLAB's JIT compiler has improved things a lot. Back in the day any sort of loop would halt code to a grind. Today it's not necessarily the worst approach, especially if all you do is use built-in functions.
The nchoosek(v,k) function generates all combinations of the elements in v taken k at a time. We can use this to generate all possible pairs of indicies then use this to vectorize the loops. It appears that in this case the vectorization doesn't actually improve performance (at least on my machine with 2017a). Maybe someone will come up with a more efficient approach.
idx = nchoosek(1:T,2);
d = bsxfun(#minus,(X(idx(:,2),:) - X(idx(:,1),:)), (idx(:,2)-idx(:,1))/T);
Result = sum(abs(d),1)';
Update: here are the results for the running times for the different proposals (10^5 trials):
So it looks like the transformation of the matrix is the most efficient intervention, and my original double-loop implementation is, amazingly, the best compared to the vectorized versions. However, in my hands (2017a) the improvement is only 16.6% compared to the original using the mean (18.2% using the median).
Maybe there is still room for improvement?

Apply a function in each tensor slice

Suppose we have an 3-D array (tensor) and we want to apply in each slice a function, e.g., myFun = #(x)mean(x). Is there any way to do this without for loop using build-in functions (possibly withbsxfun, or arrayfun, or accumarray)?
For loop example:
inputA = rand(10,5,20);
for sl = 1:size(A,3)
inputB = myFun(inputA(:,:,sl));
end
Thank you.
EDIT:
inputB = arrayfun(#(iterSlice) myFun(inputA(:,:,iterSlice), 1:size(inputA,3))
PS: I would like to mention, that the handler function applied is more complicated in each slice, mean was an example included in the handler function.
A for loop is already the best possibility to iterate. The only way to further improve the performance is to get rid of the iteration and have a single function call of myFun which processes all data in a vectorized way. For your example function that very simple:
myFun=#(x)permute(mean(x,1),[3,2,1])
Now it accepts 3d inputs and in the rows you find the results of the previous iterative code. You have to modify your function, there is no generic wrapper on top which can do this for you.
Regarding your edit: arrayfun is known to be slow

bsxfun() Invalid output dimensions

I have a function that takes upto seven arguments and returns a row vector. The first three arguments are vectors (column, column, row) and the remaining four are optional scalars.
I want to use bsxfun() to apply the function to a vector of its last argument. Below is my attempt to do that.
o = #(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff) ELE452Functions.EvaluateBER(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff);
oo = #(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff) bsxfun(#(N0,channel_cutoff) o(m,pulse,N0,samples_per_pulse,sample_select,filter,channel_cutoff), N0' , channel_cutoff);
when I try to call the function with a vector, oo(m,pulse,N0,1,1,1,[0.5 0.2]); for example, I get this error:
Error using bsxfun
Invalid output dimensions.
I am not experienced in using bsxfun and I tried to follow the documentation.
Update:
May be this is a clearer way to ask my question:
I want to use bsxfun to rewrite (improve) the code below with out a loop.
for i=1:length(channel_normalized_cuttoffs)
BER_LPchannel(i,:) = ELE452Functions.EvaluateBER(m,pulse,N0,1,1,1,channel_normalized_cuttoffs(i));
end
The idea behind bsxfun is to evaluate a certain function for all possible combinations of two elements (b in bsxfun stands for binary), each coming from one of the arrays. (NB: This is valid if you use it with a row and a column vector. But bsxfun can also do more.)
What you want to achieve is simply: For all entries of a single array, evaluate a function.
So bsxfun is just not the correct choice here.
You could use arrayfun instead, but this still may not perform a lot better than your original for loop, as it looks like the Matlab JIT Compiler would be able to optimize most of it, considering it's simplicity.
As I don't have the code of your function, I'm not able to test it, but your solution might look a lot like this:
evalBER = #(CNcutoffi) ELE452Functions.EvaluateBER(m,pulse,N0,1,1,1,CNcutoffi);
BER_LPchannel = arrayfun(evalBER, channel_normalized_cuttoffs, 'UniformOutput', false)

Vectorising 3d array

I am trying to vectorise a for loop. I have a set of coordinates listed in a [68x200] matrix called plt2, and I have another set of coordinates listed in a [400x1] matrix called trans1. I want to create a three dimensional array called dist1, where in dist1(:,:,1) I have all of the values of plt2 with the first value of trans1 subtracted, all the way through to the end of trans1. I have a for loop like this which works but is very slow:
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Thanks for any help.
If I understood correctly, this can be easily solved with bsxfun:
dist1 = bsxfun(#minus, plt2, shiftdim(trans1,-2));
Or, if speed is important, use this equivalent version (thanks to #chappjc), which seems to be much faster:
dist1 = bsxfun(#minus, plt2, reshape(trans1,1,1,[]));
In general, bsxfun is a very useful function for cases like this. Its behaviour can be summarized as follows: for any singleton dimension of any of its two input arrays, it applies an "implicit" for loop to the other array along the same dimension. See the doc for further details.
Vectorizing is a good first optimization, and is usually much easier than going all in writing your own compiled mex-function (in c).
However, the golden middle-way for power users is Matlab Coder (this also applies to slightly harder problems than the one posted, where vectorization is more or less impossible). First, create a small m-file function around the slow code, in your case:
function dist1 = do_some_stuff(source_points,dist1,plt2,trans1)
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Then create a simple wrapper function which calls do_some_stuff as well as defines the inputs. This file should really be only 5 rows, with only the bare essentials needed. Matlab Coder uses the wrapper function to understand what typical proper inputs to do_some_stuff are.
You can now fire up the Matlab Coder gui from the Apps section and simply add do_some_stuff under Entry-Point Files. Press Autodefine types and select your wrapper function. Go to build and press build, and you are good to go! This approach usually bumps up the execution speed substantially with almost no effort.
BR
Magnus