Can anyone help to understand what/where is the problem?
I am comparing the speed of a basic matlab function like the mean.m with two matlab version 2013b and 2014b with the same machine.
and surprising, the version 2013b is much faster than 2014b....
Some of you have/had the same problem??
Profile summary of mean with 2014b --> 0,024
Profile summary of mean with 2013b --> 0,013
like in my scripts I use the mean function really often the different in running time of the same program in one or the other version is huge.....
Whats going on?
the code to compute the profile time:
A=rand(100,1)
time_mean=zeros( 1000,1)
for i=1:1000
tic
mean(A);
time_mean(i)= toc;
end
Firstly, it's not wise to use the profiler to compare timings across releases - it's designed to identify slow portions in a single MATLAB release. Secondly, you should use timeit to time this sort of thing. I compared R2013b and R2014b on my Windows machine over a range of sizes, and can see what appears to be a small fixed overhead in R2014b of around 0.1ms.
Code is essentially:
for exp = 1:6
A = rand(10^exp, 1);
t(exp) = timeit(#()mean(A));
end
semilogy(1:6, t);
If you are making lots of individual calls to mean, you might be better off seeing if you can form these into a single call - MATLAB's mean can operate down columns or along rows of a matrix...
Related
The allfitdist function in MATLAB for normally distributed data return 'rayleigh' as best fit distribution!
Here is the link of function: http://www.mathworks.in/matlabcentral/fileexchange/34943-fit-all-valid-parametric-probability-distributions-to-data
data = normrnd(5,3,1e4,1);
[D PD] = allfitdist(data,'PDF');
D(1)
DistName: 'rayleigh'
NLogL: 2.4515e+04 - 1.5959e+03i
BIC: 4.9038e+04 - 3.1919e+03i
AIC: 4.9031e+04 - 3.1919e+03i
AICc: 4.9031e+04 - 3.1919e+03i
ParamNames: {'B'}
ParamDescription: {'scale'}
Params: 4.1166
Paramci: [2x1 double]
ParamCov: 4.2366e-04
Support: [1x1 struct]
So weird as it is an example included in file.
The second best fit is normal. The results are sorted by BIC, just wondering if I should change the sorting criteria or something is wrong with what I am doing.
Following user3278640's answer, I tried out several versions of matlab. 2012b showed the normal distribution, while 2013b and 2014b showed the rayleigh as the best approximation.
While inspecting the values, it seems that the negative log-likelihood of the rayleigh in 2013b and 2014b is a complex number. I don't think that makes much sense, so I changed the code around line 209 from:
%If NLL is non-finite number, produce error to ignore distribution
if ~isfinite(NLL)
error('non-finite NLL');
end
Into:
%If NLL is non-finite or complex number, produce error to ignore
%distribution
if ~isfinite(NLL) || ~isreal(NLL)
error('non-finite or non-real NLL');
end
That filters out the rayleigh distribution. Not really sure what changed between versions, but this might help.
For saving time of the others, I have to highlight that this problem arises when you are using function ALLFITDIST in MATLAB 2014a. In version 2010b, everything is fine.
I won't accept this answer as a correct one, since it does not contain any information relating to the part of function that cause this dependency and problem.
I am using MATLAB's PCG subroutine to solve a system of linear equations. However, I don't want it to solve exactly. I want it to run for only 20 iterations and if it doesn't converge, I want it to return the value at the 20th iteration.
What MATLAB (My version is the latest one) is doing however is returning a zero vector if it doesn't find an acceptable solution by 20 iterations. Is there any way to override this without changing the source code of pcg.m?
I have a code which I wrote which does that (I just copied from Wikipedia) but obviously, it is not close to how robust MATLAB's version is.
function [x] = conjgrad(A,b,x)
r=b-A*x;
p=r;
rsold=r'*r;
for i=1:20
Ap=A*p;
alpha=rsold/(p'*Ap);
x=x+alpha*p;
r=r-alpha*Ap;
rsnew=r'*r;
if sqrt(rsnew)<1e-10
break;
end
p=r+rsnew/rsold*p;
rsold=rsnew;
end
I have access to a 12 core machine and some matlab code that relies heavily on fftn. I would like to speed up my code.
Since the fft can be parallelized I would think that more cores would help but I'm seeing the opposite.
Here's an example:
X = peaks(1028);
ncores = feature('numcores');
ntrials = 20;
mtx_power_times = zeros(ncores,ntrials);
fft_times = zeros(ncores, ntrials);
for i=1:ncores
for j=1:ntrials
maxNumCompThreads(i);
tic;
X^2;
mtx_power_times(i,j) = toc;
tic
fftn(X);
fft_times(i,j) = toc;
end
end
subplot(1,2,1);
plot(mtx_power_times,'x-')
title('mtx power time vs number of cores');
subplot(1,2,2);
plot(fft_times,'x-');
title('fftn time vs num of cores');
Which gives me this:
The speedup for matrix multiplication is great but it looks like my ffts go almost 3x slower when I use all my cores. What's going on?
For reference my version is 7.12.0.635 (R2011a)
Edit: On large 2D arrays taking 1D transforms I get the same problem:
Edit: The problem appears to be that fftw is not seeing the thread limiting that maxNumCompThreads enforces. I'm getting all the cpus going full speed no matter what I set maxNumCompThreads at.
So... is there a way I can specify how many processors I want to use for an fft in Matlab?
Edit: Looks like I can't do this without some careful work in .mex files. http://www.mathworks.com/matlabcentral/answers/35088-how-to-control-number-of-threads-in-fft has an answer. It would be nice if someone has an easy fix...
Looks like I can't do this without some careful work in .mex files. http://www.mathworks.com/matlabcentral/answers/35088-how-to-control-number-of-threads-in-fft has an answer. It would be nice if someone has an easy fix...
To use different cores, you should use the Parallel Computing Toolbox. For instance, you could use a parfor loop, and you have to pass the functions as a list of handles:
function x = f(n, i)
...
end
m = ones(8);
parfor i=1:8
m(i,:) = f(m(i,:), i);
end
More info is available at:
High performance computing
Multithreaded computation
Multithreading
I have a function which does the following loop many, many times:
for cluster=1:max(bins), % bins is a list in the same format as kmeans() IDX output
select=bins==cluster; % find group of values
means(select,:)=repmat_fast_spec(meanOneIn(x(select,:)),sum(select),1);
% (*, above) for each point, write the mean of all points in x that
% share its label in bins to the equivalent row of means
delta_x(select,:)=x(select,:)-(means(select,:));
%subtract out the mean from each point
end
Noting that repmat_fast_spec and meanOneIn are stripped-down versions of repmat() and mean(), respectively, I'm wondering if there's a way to do the assignment in the line labeled (*) that avoids repmat entirely.
Any other thoughts on how to squeeze performance out of this thing would also be welcome.
Here is a possible improvement to avoid REPMAT:
x = rand(20,4);
bins = randi(3,[20 1]);
d = zeros(size(x));
for i=1:max(bins)
idx = (bins==i);
d(idx,:) = bsxfun(#minus, x(idx,:), mean(x(idx,:)));
end
Another possibility:
x = rand(20,4);
bins = randi(3,[20 1]);
m = zeros(max(bins),size(x,2));
for i=1:max(bins)
m(i,:) = mean( x(bins==i,:) );
end
dd = x - m(bins,:);
One obvious way to speed up calculation in MATLAB is to make a MEX file. You can compile C code and perform any operations you want. If you're searching for the fastest-possible performance, turning the operation into a custom MEX file would likely be the way to go.
You may be able to get some improvement by using ACCUMARRAY.
%# gather array sizes
[nPts,nDims] = size(x);
nBins = max(bins);
%# calculate means. Not sure whether it might be faster to loop over nDims
meansCell = accumarray(bins,1:nPts,[nBins,1],#(idx){mean(x(idx,:),1)},{NaN(1,nDims)});
means = cell2mat(meansCell);
%# subtract cluster means from x - this is how you can avoid repmat in your code, btw.
%# all you need is the array with cluster means.
delta_x = x - means(bins,:);
First of all: format your code properly, surround any operator or assignment by whitespace. I find your code very hard to comprehend as it looks like a big blob of characters.
Next of all, you could follow the other responses and convert the code to C (mex) or Java, automatically or manually, but in my humble opinion this is a last resort. You should only do such things when your performance is not there yet by a small margin. On the other hand, your algorithm doesn't show obvious flaws.
But the first thing you should do when trying to improve performance: profile. Use the MATLAB profiler to determine which part of your code is causing your problems. How much would you need to improve this to meet your expectations? If you don't know: first determine this boundary, otherwise you will be looking for a needle in a hay stack which might not even be in there in the first place. MATLAB will never be the fastest kid on the block with respect to runtime, but it might be the fastest with respect to development time for certain kinds of operations. In that respect, it might prove useful to sacrifice the clarity of MATLAB over the execution speed of other languages (C or even Java). But in the same respect, you might as well code everything in assembler to squeeze all of the performance out of the code.
Another obvious way to speed up calculation in MATLAB is to make a Java library (similar to #aardvarkk's answer) since MATLAB is built on Java and has very good integration with user Java libraries.
Java's easier to interface and compile than C. It might be slower than C in some cases, but the just-in-time (JIT) compiler in the Java virtual machine generally speeds things up very well.
I have just profiled my MATLAB code and there is a bottle-neck in this for loop:
for vert=-down:up
for horz=-lhs:rhs
y = y + x(k+vert.*length+horz).*DM(abs(vert).*nu+abs(horz)+1);
end
end
where y, x and DM are vectors I have already defined. I vectorised the loop by writing,
B=(-down:up)'*ones(1,lhs+rhs+1);
C=ones(up+down+1,1)*(-lhs:rhs);
y = sum(sum(x(k+length.*B+C).*DM(abs(B).*nu+abs(C)+1)));
But this ended up being sufficiently slower.
Are there any suggestions on how I can speed up this for loop?
Thanks in advance.
What you've done is not really vectorization. It's very difficult, if not impossible, to write proper vectorization procedures for image processing (I assume that's what you're doing) in Matlab. When we use the term vectorized, we really mean "vectorized with no additional computation". For example, this code
a = 1:1000000;
for i = a
n = n+i;
end
would run much slower then this code
a = 1:1000000;
sum(a)
Update: code above has been modified, thanks to #Rasman's keen suggestion. The reason is that Matlab does not compile your code into machine language before running it, and that's what causes it to be slower. Built-in functions like sum, mean and the .* operator run pre-compiled C code behind the scenes. For loops are a great example of code that runs slowly when not optimized for you CPU's registers.
What you have done, and please ignore my first comment, is rewriting your procedure with a vector operation and some additional operations. Those are the operations that take extra CPU simply because you're telling your computer to do more computations, even though each computation separately may (or may not) take less time.
If you are really after speeding up you code, take a look at MEX files. They allow you to write and compile C and C++ code, compile it and run as Matlab functions, just like those fast built-in ones. In any case, Matlab is not meant to be a fast general-purpose programming platform, but rather a computer simulation environment, though this approach has been changing in the recent years. My advise (from experience) is that if you do image processing, you will write for loops, and there's rarely a way around it. Vector operations were written for a more intuitive approach to linear algebra problems, and we rarely treat digital images as regular rectangular matrices in terms of what we do with them.
I hope this helps.
I would use matrices when handling images... you could then try to extract submatrices like so:
X = reshape(x,height,length);
kx = mod(k,length);
ky = floor(k/length);
xstamp = X( [kx-down:kx+up], [ky-lhs:ky+rhs]);
xstamp = xstamp.*getDMMMask(width, height);
y = sum(xstamp);
...
function mask = getDMMask(width, height, nu)
% I don't get what you're doing there .. return an appropriate sized mask here.
return mask;
end