Matlab slow performance using closures - matlab

I'm coding a solution for Poisson equation on a 2d rectangle using finite elements. In order to simplify the code I store handles to the basis functions in an array and then loop over these basis functions to create my matrix and right hand side. The problem with this is that even for very coarse grids it is prohibitively slow. For a 9x9 grid (using Dirichlet BC, there are 49 nodes to solve for) it takes around 20 seconds. Using the profile I've noticed that around half the time is spent accessing (not executing) my basis functions.
The profiler says matrix_assembly>#(x,y)bilinearBasisFunction(x,y,xc(k-1),xc(k),xc(k+1),yc(j-1),yc(j),yc(j+1)) (156800 calls, 11.558 sec), the self time (not executing the bilinear basis code) is over 9 seconds. Any ideas as to why this might be so slow?
Here's some of the code, I can post more if needed:
%% setting up the basis functions, storing them in cell array
basisFunctions = cell(nu, 1); %nu is #unknowns
i = 1;
for j = 2:length(yc) - 1
for k = 2:length(xc) - 1
basisFunctions{i} = #(x,y) bilinearBasisFunction(x,y, xc(k-1), xc(k),...
xc(k+1), yc(j-1), yc(j), yc(j+1)); %my code for bilinear basis functions
i = i+1;
end
end
%% Assemble matrices and RHS
M = zeros(nu,nu);
S = zeros(nu,nu);
F = zeros(nu, 1);
for iE = 1:ne
for iBF = 1:nu
[z1, dx1, dy1] = basisFunctions{iBF}(qx(iE), qy(iE));
F(iBF) = F(iBF) + z1*forcing_handle(qx(iE),qy(iE))/ae(iE);
for jBF = 1:nu
[z2, dx2, dy2] = basisFunctions{jBF}(qx(iE), qy(iE));
%M(iBF,jBF) = M(iBF,jBF) + z1*z2/ae(iE);
S(iBF,jBF) = S(iBF, jBF) + (dx1*dx2 + dy1*dy2)/ae(iE);
end
end
end

Try to change basisFunctions from being a cell array to being a regular array.
You can also try to inline the direct call to bilinearBasisFunctionwithin your jBF loop, rather than using basisFunctions. Creating and later using anonymous functions in Matlab is always slower than directly using the target function. The code may be slightly more verbose this way, but will be faster.

Related

On histogram equalization

So I'd like to do it without histeq, but my code seems to get out a rather peculiar, really whited out image, and doesn't seem all too much improved from the original picture. Is there a better way to apply the proper histogram?
Cumlative=zeros(256,1);
CumHisty=uint8(zeros(ROWS,COLS));
% First we need to find the probabilities and the frequencies
freq = zeros(256,1);
probab = zeros(256,1);
for i=1:ROWS
for j=1:COLS
value=I1(i,j);
freq(value+1)=freq(value+1)+1;
probab(value+1)=freq(value+1)/(ROWS*COLS);
end
end
count=0;
cumprobab=zeros(256,1);
distrib=zeros(256,1);
for i=1:size(probab)
count=count+freq(i);
Cumlative(i)=count;
cumprobab(i)=Cumlative(i)/(ROWS*COLS);
distrib(i)=round(cumprobab(i)*(ROWS*COLS));
end
for i=1:ROWS
for j=1:COLS
CumHisty(i,j)=distrib(I1(i,j)+1);
end
You probably want to do:
distrib(i) = round(cumprobab(i)*255);
EDIT:
Here is a version of your code without the redundant computations, and simplified looping:
freq = zeros(256,1);
for i = 1:numel(I1)
index = I1(i) + 1;
freq(index) = freq(index)+1;
end
count = 0;
distrib = zeros(256,1);
for i = 1:length(freq)
count = count + freq(i);
cumprobab = count/numel(I1);
distrib(i) = round(cumprobab*255);
end
CumHisty = zeros(size(I1),'uint8');
for i = 1:numel(I1)
CumHisty(i) = distrib(I1(i)+1);
end
I use linear indexing above, it's simpler (one loop instead of 2) and automatically helps you access the pixels in the same order that they are stored in. The way you looped (over rows in the outer loop, and columns in the inner loop) means that you are not accessing pixels in the optimal order, since arrays are stored column-wise (column-major order). Accessing data in the order in which it is stored in memory allows for an optimal cache usage (i.e. is faster).
The above can also be written as:
freq = histcounts(I1,0:256);
distrib = round(cumsum(freq)*(255/numel(I1)));
distrib = uint8(distrib);
CumHisty = distrib(I1+1);
This is faster than the loop code, but within the same order of magnitude. Recent versions of MATLAB are no longer terribly slow doing loops.
I clocked your code at 40 ms, with simplified loops at 19.5 ms, and without loops at 5.8 ms, using an image of size 1280x1024.

Speeding up matlab for loop

I have a system of 5 ODEs with nonlinear terms involved. I am trying to vary 3 parameters over some ranges to see what parameters would produce the necessary behaviour that I am looking for.
The issue is I have written the code with 3 for loops and it takes a very long time to get the output.
I am also storing the parameter values within the loops when it meets a parameter set that satisfies an ODE event.
This is how I have implemented it in matlab.
function [m,cVal,x,y]=parameters()
b=5000;
q=0;
r=10^4;
s=0;
n=10^-8;
time=3000;
m=[];
cVal=[];
x=[];
y=[];
val1=0.1:0.01:5;
val2=0.1:0.2:8;
val3=10^-13:10^-14:10^-11;
for i=1:length(val1)
for j=1:length(val2)
for k=1:length(val3)
options = odeset('AbsTol',1e-15,'RelTol',1e-13,'Events',#eventfunction);
[t,y,te,ye]=ode45(#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)]),0:time,[b,q,s,r,n],options);
if length(te)==1
m=[m;val1(i)];
cVal=[cVal;val2(j)];
x=[x;val3(k)];
y=[y;ye(1)];
end
end
end
end
Is there any other way that I can use to speed up this process?
Profile viewer results
I have written the system of ODEs simply with the a format like
function s=systemFunc(t,y,p)
s= zeros(2,1);
s(1)=f*y(1)*(1-(y(1)/k))-p(1)*y(2)*y(1)/(p(2)*y(2)+y(1));
s(2)=p(3)*y(1)-d*y(2);
end
f,d,k are constant parameters.
The equations are more complicated than what's here as its a system of 5 ODEs with lots of non linear terms interacting with each other.
Tommaso is right. Preallocating will save some time.
But I would guess that there is fundamentally not a lot you can do since you are running ode45 in a loop. ode45 itself may be the bottleneck.
I would suggest you profile your code to see where the bottleneck is:
profile on
parameters(... )
profile viewer
I would guess that ode45 is the problem. Probably you will find that you should actually focus your time on optimizing the systemFunc code for performance. But you won't know that until you run the profiler.
EDIT
Based on the profiler output and additional code, I see some things that will help
It seems like the vectorization of your values is hurting you. Instead of
#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)])
try
#(t,y)systemFunc(t,y,val1(i),val2(j),val3(k))
where your system function is defined as
function s=systemFunc(t,y,p1,p2,p3)
s= zeros(2,1);
s(1)=f*y(1)*(1-(y(1)/k))-p1*y(2)*y(1)/(p2*y(2)+y(1));
s(2)=p3*y(1)-d*y(2);
end
Next, note that you don't have to preallocate space in the systemFunc, just combine them in the output:
function s=systemFunc(t,y,p1,p2,p3)
s = [ f*y(1)*(1-(y(1)/k))-p1*y(2)*y(1)/(p2*y(2)+y(1)),
p3*y(1)-d*y(2) ];
end
Finally, note that ode45 is internally taking about 1/3 of your runtime. There is not much you will be able to do about that. If you can live with it, I would suggest increasing your 'AbsTol' and 'RelTol' to more reasonable numbers. Those values are really small, and are making ode45 run for a really long time. If you can live with it, try increasing them to something like 1e-6 or 1e-8 and see how much the performance increases. Alternatively, depending on how smooth your function is, you might be able to do better with a different integrator (like ode23). But your mileage will vary based on how smooth your problem is.
I have two suggestions for you.
Preallocate the vectors in which you store your results and use an
increasing index to populate them into each iteration.
Since the options you use are always the same, instantiate then
outside the loop only once.
Final code:
function [m,cVal,x,y] = parameters()
b = 5000;
q = 0;
r = 10^4;
s = 0;
n = 10^-8;
time = 3000;
options = odeset('AbsTol',1e-15,'RelTol',1e-13,'Events',#eventfunction);
val1 = 0.1:0.01:5;
val1_len = numel(val1);
val2 = 0.1:0.2:8;
val2_len = numel(val2);
val3 = 10^-13:10^-14:10^-11;
val3_len = numel(val3);
total_len = val1_len * val2_len * val3_len;
m = NaN(total_len,1);
cVal = NaN(total_len,1);
x = NaN(total_len,1);
y = NaN(total_len,1);
res_offset = 1;
for i = 1:val1_len
for j = 1:val2_len
for k = 1:val3_len
[t,y,te,ye] = ode45(#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)]),0:time,[b,q,s,r,n],options);
if (length(te) == 1)
m(res_offset) = val1(i);
cVal(res_offset) = val2(j);
x(res_offset) = val3(k);
y(res_offset) = ye(1);
end
res_offset = res_offset + 1;
end
end
end
end
If you only want to preserve result values that have been correctly computed, you can remove the rows containing NaNs at the bottom of your function. Indexing on one of the vectors will be enough to clear everything:
rows_ok = ~isnan(y);
m = m(rows_ok);
cVal = cVal(rows_ok);
x = x(rows_ok);
y = y(rows_ok);
In continuation of the other suggestions, I have 2 more suggestions for you:
You might want to try with a different solver, ODE45 is for non-stiff problems, but from the looks of it, it might seem like your problem could be stiff (parameters have a different order of magnitude). Try for instance with the ode23s method.
Secondly, without knowing which event you are looking for, maybe it is possible for you to use a logarithmic search rather than a linear one. e.g. the Bisection method. This will severely cut down on the number of times you have to solve the equation.

MATLAB Piecewise function

I have to construct the following function in MATLAB and am having trouble.
Consider the function s(t) defined for t in [0,4) by
{ sin(pi*t/2) , for t in [0,1)
s(t) = { -(t-2)^3 , for t in [1,3)*
{ sin(pi*t/2) , for t in [3,4)
(i) Generate a column vector s consisting of 512 uniform
samples of this function over the interval [0,4). (This
is best done by concatenating three vectors.)
I know it has to be something of the form.
N = 512;
s = sin(5*t/N).' ;
But I need s to be the piecewise function, can someone provide assistance with this?
If I understand correctly, you're trying to create 3 vectors which calculate the specific function outputs for all t, then take slices of each and concatenate them depending on the actual value of t. This is inefficient as you're initialising 3 times as many vectors as you actually want (memory), and also making 3 times as many calculations (CPU), most of which will just be thrown away. To top it off, it'll be a bit tricky to use concatenate if your t is ever not as you expect (i.e. monotonically increasing). It might be an unlikely situation, but better to be general.
Here are two alternatives, the first is imho the nice Matlab way, the second is the more conventional way (you might be more used to that if you're coming from C++ or something, I was for a long time).
function example()
t = linspace(0,4,513); % generate your time-trajectory
t = t(1:end-1); % exclude final value which is 4
tic
traj1 = myFunc(t);
toc
tic
traj2 = classicStyle(t);
toc
end
function trajectory = myFunc(t)
trajectory = zeros(size(t)); % since you know the size of your output, generate it at the beginning. More efficient than dynamically growing this.
% you could put an assert for t>0 and t<3, otherwise you could end up with 0s wherever t is outside your expected range
% find the indices for each piecewise segment you care about
idx1 = find(t<1);
idx2 = find(t>=1 & t<3);
idx3 = find(t>=3 & t<4);
% now calculate each entry apprioriately
trajectory(idx1) = sin(pi.*t(idx1)./2);
trajectory(idx2) = -(t(idx2)-2).^3;
trajectory(idx3) = sin(pi.*t(idx3)./2);
end
function trajectory = classicStyle(t)
trajectory = zeros(size(t));
% conventional way: loop over each t, and differentiate with if-else
% works, but a lot more code and ugly
for i=1:numel(t)
if t(i)<1
trajectory(i) = sin(pi*t(i)/2);
elseif t(i)>=1 & t(i)<3
trajectory(i) = -(t(i)-2)^3;
elseif t(i)>=3 & t(i)<4
trajectory(i) = sin(pi*t(i)/2);
else
error('t is beyond bounds!')
end
end
end
Note that when I tried it, the 'conventional way' is sometimes faster for the sampling size you're working on, although the first way (myFunc) is definitely faster as you scale up really a lot. In anycase I recommend the first approach, as it is much easier to read.

How to accumulate submatrices without looping (subarray smoothing)?

In Matlab I need to accumulate overlapping diagonal blocks of a large matrix. The sample code is given below.
Since this piece of code needs to run several times, it consumes a lot of resources. The process is used in array signal processing for a so-called subarray smoothing or spatial smoothing. Is there any way to do this faster?
% some values for parameters
M = 1000; % size of array
m = 400; % size of subarray
n = M-m+1; % number of subarrays
R = randn(M)+1i*rand(M);
% main code
S = R(1:m,1:m);
for i = 2:n
S = S + R(i:m+i-1,i:m+i-1);
end
ATTEMPTS:
1) I tried the following alternative vectorized version, but unfortunately it became much slower!
[X,Y] = meshgrid(1:m);
inds1 = sub2ind([M,M],Y(:),X(:));
steps = (0:n-1)*(M+1);
inds = repmat(inds1,1,n) + repmat(steps,m^2,1);
RR = sum(R(inds),2);
S = reshape(RR,m,m);
2) I used Matlab coder to create a MEX file and it became much slower!
I've personally had to fasten up some portions of my code lately. Being not an expert at all, I would recommend trying the following:
1) Vectorize:
Getting rid of the for-loop
S = R(1:m,1:m);
for i = 2:n
S = S + R(i:m+i-1,i:m+i-1)
end
and replacing it for an alternative based on cumsum should be the way to go here.
Note: will try and work on this approach on a future Edit
2) Generating a MEX-file:
In some instances, you could simply fire up the Matlab Coder app (given that you have it in your current Matlab version).
This should generate a .mex file for you, that you can call as it was the function that you are trying to replace.
Regardless of your choice (1) or 2)), you should profile your current implementation with tic; my_function(); toc; for a fair number of function calls, and compare it with your current implementation:
my_time = zeros(1,10000);
for count = 1:10000
tic;
my_function();
my_time(count) = toc;
end
mean(my_time)

faster method of interpolation in matlab

I am using interp1 to inteprolate some data:
temp = 4 + (30-4).*rand(365,10);
depth = 1:10;
dz = 0.5; %define new depth interval
bthD = min(depth):dz:max(depth); %new depth vector
for i = 1:length(temp);
i_temp(i,:) = interp1(depth,temp(i,:),bthD);
end
Here, I am increasing the resolution of my measurements by interpolating the measurements from 1 m increments to 0.5 m increments. This code works fine i.e. it gives me the matrix I was looking for. However, when I apply this to my actual data, it takes a long time to run, primarily as I am running an additional loop which runs through various cells. Is there a way of achieving what is described above without using the loop, in other words, is there a faster method?
Replace your for loop with:
i_temp = interp1(depth,temp',bthD)';
You can get rid of the transposes if you change the way that temp is defined, and if you are OK with i_temp being a 19x365 array instead of 365x19.
BTW, the documentation for interp1 is very clear that you can pass in an array as the second argument.