I am trying to vectorise a for loop. I have a set of coordinates listed in a [68x200] matrix called plt2, and I have another set of coordinates listed in a [400x1] matrix called trans1. I want to create a three dimensional array called dist1, where in dist1(:,:,1) I have all of the values of plt2 with the first value of trans1 subtracted, all the way through to the end of trans1. I have a for loop like this which works but is very slow:
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Thanks for any help.
If I understood correctly, this can be easily solved with bsxfun:
dist1 = bsxfun(#minus, plt2, shiftdim(trans1,-2));
Or, if speed is important, use this equivalent version (thanks to #chappjc), which seems to be much faster:
dist1 = bsxfun(#minus, plt2, reshape(trans1,1,1,[]));
In general, bsxfun is a very useful function for cases like this. Its behaviour can be summarized as follows: for any singleton dimension of any of its two input arrays, it applies an "implicit" for loop to the other array along the same dimension. See the doc for further details.
Vectorizing is a good first optimization, and is usually much easier than going all in writing your own compiled mex-function (in c).
However, the golden middle-way for power users is Matlab Coder (this also applies to slightly harder problems than the one posted, where vectorization is more or less impossible). First, create a small m-file function around the slow code, in your case:
function dist1 = do_some_stuff(source_points,dist1,plt2,trans1)
for i=1:source_points;
dist1(:,:,i)=plt2-trans1(i,1);
end
Then create a simple wrapper function which calls do_some_stuff as well as defines the inputs. This file should really be only 5 rows, with only the bare essentials needed. Matlab Coder uses the wrapper function to understand what typical proper inputs to do_some_stuff are.
You can now fire up the Matlab Coder gui from the Apps section and simply add do_some_stuff under Entry-Point Files. Press Autodefine types and select your wrapper function. Go to build and press build, and you are good to go! This approach usually bumps up the execution speed substantially with almost no effort.
BR
Magnus
Related
I have a differential equation that's a function of around 30 constants. The differential equation is a system of (N^2+1) equations (where N is typically 4). Solving this system produces N^2+1 functions.
Often I want to see how the solution of the differential equation functionally depends on constants. For example, I might want to plot the maximum value of one of the output functions and see how that maximum changes for each solution of the differential equation as I linearly increase one of the input constants.
Is there a particularly clean method of doing this?
Right now I turn my differential-equation-solving script into a large function that returns an array of output functions. (Some of the inputs are vectors & matrices). For example:
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
Here I loop through the function. The function solves a differential equation, and then returns the set of solution functions for that input parameter, and then each is appended as a row to a matrix.
There are a few issues I have with my method:
If I want to see the solution to the differential equation for a different parameter, I have to redefine the function so that it is an input of one of the thirty other parameters. For the sake of code readability, I cannot see myself explicitly writing all of the input parameters as individual inputs. (Although I've read that structures might be helpful here, but I'm not sure how that would be implemented.)
I typically get lost in parameter space and often have to update the same parameter across multiple scripts. I have a script that runs the differential-equation-solving function, and I have a second script that plots the set of simulated data. (And I will save the local variables to a file so that I can load them explicitly for plotting, but I often get lost figuring out which file is associated with what set of parameters). The remaining parameters that are not in the input of the function are inside the function itself. I've tried making the parameters global, but doing so drastically slows down the speed of my code. Additionally, some of the inputs are arrays I would like to plot and see before running the solver. (Some of the inputs are time-dependent boundary conditions, and I often want to see what they look like first.)
I'm trying to figure out a good method for me to keep track of everything. I'm trying to come up with a smart method of saving generated figures with a file tag that displays all the parameters associated with that figure. I can save such a file as a notepad file with a generic tagging-number that's listed in the title of the figure, but I feel like this is an awkward system. It's particularly awkward because it's not easy to see what's different about a long list of 30+ parameters.
Overall, I feel as though what I'm doing is fairly simple, yet I feel as though I don't have a good coding methodology and consequently end up wasting a lot of time saving almost-identical functions and scripts to solve fairly simple tasks.
It seems like what you really want here is something that deals with N-D arrays instead of splitting up the outputs.
If all of the OutputArray_ variables have the same number of rows, then the line
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
seems to suggest that what you really want your function to return is an M x K array (where in this case, K = 5), and you want to pack that output into an M x K x N array. That is, it seems like you'd want to refactor your DE_Simulation to give you something like
for i = 1:N
OutputArray(:,:,i) = DE_Simulation(Parameter1Array(i));
end
If they aren't the same size, then a struct or a table is probably the best way to go, as you could assign to one element of the struct array per loop iteration or one row of the table per loop iteration (the table approach would assume that the size of the variables doesn't change from iteration to iteration).
If, for some reason, you really need to have these as separate outputs (and perhaps later as separate inputs), then what you probably want is a cell array. In that case you'd be able to deal with the variable number of inputs doing something like
for i = 1:N
[OutputArray{i, 1:K}] = DE_Simulation(Parameter1Array(i));
end
I hesitate to even write that, though, because this almost certainly seems like the wrong data structure for what you're trying to do.
I am trying to use parallel computing with GPU in Matlab, and I would like to apply a function to a large array (to avoid the use of a for loop, which is quite slow). I have read Matlab's documentation, and I can use arrayfun, but only if I want to do elementwise operations. Maybe I am confused, but I would appreciate if someone can help me to use it. As an example of what I want to do, imagine that I would like to perform the following operation,
$X_t = B Z_t + Q\varepsilon_t$
where $X_t$ is 2x1, $B$ is 2x5, and $Z_t$ is 5x1, with $Q$ 2x2. I define a function,
function X = propose(Z,B,Q)
X=Z*B+Q*rand(2,1);
end
Now, suppose that I have an array $Z_p$ which is 5x1000. To each of the 1000 columns I would like to apply the previous function, for given matrices $B$ and $Q$, to get an array $X_p$ which is 2x1000.
Given the documentation for arrayfun I can not do this,
X=arrayfun(#propose,Zp,B,Q)
So, is there any possibility to do it?
Thanks!
PS: Yes, I know that in this simple example I can just do the multiplication without a for loop, but the application I have in mind is more complicated and I cannot do it. I just put this example as an illustration.
I have a function which does the following loop many, many times:
for cluster=1:max(bins), % bins is a list in the same format as kmeans() IDX output
select=bins==cluster; % find group of values
means(select,:)=repmat_fast_spec(meanOneIn(x(select,:)),sum(select),1);
% (*, above) for each point, write the mean of all points in x that
% share its label in bins to the equivalent row of means
delta_x(select,:)=x(select,:)-(means(select,:));
%subtract out the mean from each point
end
Noting that repmat_fast_spec and meanOneIn are stripped-down versions of repmat() and mean(), respectively, I'm wondering if there's a way to do the assignment in the line labeled (*) that avoids repmat entirely.
Any other thoughts on how to squeeze performance out of this thing would also be welcome.
Here is a possible improvement to avoid REPMAT:
x = rand(20,4);
bins = randi(3,[20 1]);
d = zeros(size(x));
for i=1:max(bins)
idx = (bins==i);
d(idx,:) = bsxfun(#minus, x(idx,:), mean(x(idx,:)));
end
Another possibility:
x = rand(20,4);
bins = randi(3,[20 1]);
m = zeros(max(bins),size(x,2));
for i=1:max(bins)
m(i,:) = mean( x(bins==i,:) );
end
dd = x - m(bins,:);
One obvious way to speed up calculation in MATLAB is to make a MEX file. You can compile C code and perform any operations you want. If you're searching for the fastest-possible performance, turning the operation into a custom MEX file would likely be the way to go.
You may be able to get some improvement by using ACCUMARRAY.
%# gather array sizes
[nPts,nDims] = size(x);
nBins = max(bins);
%# calculate means. Not sure whether it might be faster to loop over nDims
meansCell = accumarray(bins,1:nPts,[nBins,1],#(idx){mean(x(idx,:),1)},{NaN(1,nDims)});
means = cell2mat(meansCell);
%# subtract cluster means from x - this is how you can avoid repmat in your code, btw.
%# all you need is the array with cluster means.
delta_x = x - means(bins,:);
First of all: format your code properly, surround any operator or assignment by whitespace. I find your code very hard to comprehend as it looks like a big blob of characters.
Next of all, you could follow the other responses and convert the code to C (mex) or Java, automatically or manually, but in my humble opinion this is a last resort. You should only do such things when your performance is not there yet by a small margin. On the other hand, your algorithm doesn't show obvious flaws.
But the first thing you should do when trying to improve performance: profile. Use the MATLAB profiler to determine which part of your code is causing your problems. How much would you need to improve this to meet your expectations? If you don't know: first determine this boundary, otherwise you will be looking for a needle in a hay stack which might not even be in there in the first place. MATLAB will never be the fastest kid on the block with respect to runtime, but it might be the fastest with respect to development time for certain kinds of operations. In that respect, it might prove useful to sacrifice the clarity of MATLAB over the execution speed of other languages (C or even Java). But in the same respect, you might as well code everything in assembler to squeeze all of the performance out of the code.
Another obvious way to speed up calculation in MATLAB is to make a Java library (similar to #aardvarkk's answer) since MATLAB is built on Java and has very good integration with user Java libraries.
Java's easier to interface and compile than C. It might be slower than C in some cases, but the just-in-time (JIT) compiler in the Java virtual machine generally speeds things up very well.
I have just profiled my MATLAB code and there is a bottle-neck in this for loop:
for vert=-down:up
for horz=-lhs:rhs
y = y + x(k+vert.*length+horz).*DM(abs(vert).*nu+abs(horz)+1);
end
end
where y, x and DM are vectors I have already defined. I vectorised the loop by writing,
B=(-down:up)'*ones(1,lhs+rhs+1);
C=ones(up+down+1,1)*(-lhs:rhs);
y = sum(sum(x(k+length.*B+C).*DM(abs(B).*nu+abs(C)+1)));
But this ended up being sufficiently slower.
Are there any suggestions on how I can speed up this for loop?
Thanks in advance.
What you've done is not really vectorization. It's very difficult, if not impossible, to write proper vectorization procedures for image processing (I assume that's what you're doing) in Matlab. When we use the term vectorized, we really mean "vectorized with no additional computation". For example, this code
a = 1:1000000;
for i = a
n = n+i;
end
would run much slower then this code
a = 1:1000000;
sum(a)
Update: code above has been modified, thanks to #Rasman's keen suggestion. The reason is that Matlab does not compile your code into machine language before running it, and that's what causes it to be slower. Built-in functions like sum, mean and the .* operator run pre-compiled C code behind the scenes. For loops are a great example of code that runs slowly when not optimized for you CPU's registers.
What you have done, and please ignore my first comment, is rewriting your procedure with a vector operation and some additional operations. Those are the operations that take extra CPU simply because you're telling your computer to do more computations, even though each computation separately may (or may not) take less time.
If you are really after speeding up you code, take a look at MEX files. They allow you to write and compile C and C++ code, compile it and run as Matlab functions, just like those fast built-in ones. In any case, Matlab is not meant to be a fast general-purpose programming platform, but rather a computer simulation environment, though this approach has been changing in the recent years. My advise (from experience) is that if you do image processing, you will write for loops, and there's rarely a way around it. Vector operations were written for a more intuitive approach to linear algebra problems, and we rarely treat digital images as regular rectangular matrices in terms of what we do with them.
I hope this helps.
I would use matrices when handling images... you could then try to extract submatrices like so:
X = reshape(x,height,length);
kx = mod(k,length);
ky = floor(k/length);
xstamp = X( [kx-down:kx+up], [ky-lhs:ky+rhs]);
xstamp = xstamp.*getDMMMask(width, height);
y = sum(xstamp);
...
function mask = getDMMask(width, height, nu)
% I don't get what you're doing there .. return an appropriate sized mask here.
return mask;
end
I hope I am on topic here. I'm asking here since it said on the faq page: a question concerning (among others) a software algorithm :) So here it goes:
I need to solve a system of ODEs (like $ \dot x = A(t) x$. The Matrix A may change and is given as a string in the function call (Calc_EDS_v2('Sys_EDS_a',...)
Then I'm using ode45 in a loop to find my x:
function [intervals, testing] = EDS_calc_v2(smA,options,debug)
[..]
for t=t_start:t_step:t_end)
[Te,Qe]=func_int(#intQ_2_v2,[t,t+t_step],q);
q=Qe(end,:);
[..]
end
[..]
with func_int being ode45 and #intQ_2_v2 my m-file. q is given back to the call as the starting vector. As you can see I'm just using ode45 on the intervall [t, t+t_step]. That's because my system matrix A can force ode45 to use a lot of steps, leading it to hit the AbsTol or RelTol very fast.
Now my A is something like B(t)*Q(t), so in the m-file intQ_2_v2.m I need to evaluate both B and Q at the times t.
I first done it like so: (v1 -file, so function name is different)
function q=intQ_2_v1(t,X)
[..]
B(1)=...; ... B(4)=...;
Q(1)=...; ...
than that is naturally only with the assumption that A is a 2x2 matrix. With that setup it took a basic system somewhere between 10 and 15 seconds to compute.
Instead of the above I now use the files B1.m to B4.m and Q1.m to B4.m (I know that that's not elegant, but I need to use quadgk on B later and quadgk doesn't support matrix functions.)
function q=intQ_2_v2(t,X)
[..]
global funcnameQ, funcnameB, d
for k=1:d
Q(k)=feval(str2func([funcnameQ,int2str(k)]),t);
B(k)=feval(str2func([funcnameB,int2str(k)]),t);
end
[..]
funcname (string) referring to B or Q (with added k) and d is dimension of the system.
Now I knew that it would cost me more time than the first version but I'm seeing the computing times are ten times as high! (getting 150 to 160 seconds) I do understand that opening 4 files and evaluate roughly 40 times per ode-loop is costly... and I also can't pre-evalute B and Q, since ode45 uses adaptive step sizes...
Is there a way to not use that last loop?
Mostly I'm interested in a solution to drive down the computing times. I do have a feeling that I'm missing something... but can't really put my finger on it. With that one taking nearly three minutes instead of 10 seconds I can get a coffee in between each testrun now... (plz don't tell me to get a faster computer)
(sorry for such a long question )
I'm not sure that I fully understand what you're doing here, but I can offer a few tips.
Use the profiler, it will help you understand exactly where the bottlenecks are.
Using feval is slower than using function handles directly, especially when using str2func to build the handle each time. There is also a slowdown from using the global variables (and it's a good habit to avoid these unless absolutely necessary). Each of these really adds up when using them repeatedly (as it looks like here). Store function handles to each of your mfiles in a cell array and either pass them directly to the function or use nested function for the optimization so that the cell array of handles is visible to the function being optimized. Personally, I prefer the nested method, but passing is better if you will use those mfiles elsewhere.
I expect this will get your runtime back to close to what the first method gave. Be sure to tell us if this was the problem or if you found another solution.