I am trying to iterate through a set of samples that seems to show periodic changes. I need to continuously apply the fit function to get the Fourier series coefficients; each regression has to use the n most recent samples (in my case, around 30). The problem is, my code is extremely slow! It takes about 1 hour to do this for a set of 50,000 samples. Is there any way to optimize this? What am I doing wrong?
Here's my code:
function [coefnames,coef] = fourier_regression(vect_waves,n)
    % Sliding-window 'fourier4' fit over the n most recent samples.
    j = 1;
    coef = zeros(length(vect_waves)-n,10);  % fourier4 has 10 coefficients
    x = (1:n)';                             % fit expects column vectors
    for i = n+1:length(vect_waves)
        take_fourier = vect_waves(i-n+1:i);
        f = fit(x,take_fourier(:),'fourier4');
        current_coef = coeffvalues(f);
        coef(j,1:length(current_coef)) = current_coef;
        j = j + 1;
    end
    coefnames = coeffnames(f);
end
When I call [coefnames,coef] = fourier_regression(VECTOR,30); it takes forever to compute. Is there any way to fix it? What's wrong with my code?
Note: I have an Intel i7 5500U CPU, 16 GB RAM, and am using MATLAB 2015a.
As I am not familiar with your application, I am not sure whether it is possible to vectorize the code to improve performance. However, I have a couple of other tips.
One thing you should consider is preallocation of arrays. In this case, you should preallocate at least the array coef since I believe you do know its size before starting the loop.
Another thing I suggest is to profile your code. This will provide information on what parts of your code are consuming the most time, helping you focus your effort on improving those parts' performance.
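If the fundamental frequency can be fixed in advance, each window's fit becomes a linear least-squares problem, and all windows can be solved in one call. The following is only a sketch under that assumption: fit with 'fourier4' also estimates the frequency w for every window, which this does not, so the coefficients will generally differ. The test signal and the choice w = 2*pi/n are assumptions made for illustration.

```matlab
% Assumed example data: one sine period per n samples plus an offset.
n = 30;
w = 2*pi/n;                          % ASSUMPTION: fixed fundamental frequency
vect_waves = 0.5 + sin(w*(1:200)).';

% Design matrix for a0 + sum_k (a_k*cos(k*w*x) + b_k*sin(k*w*x)), k = 1..4.
x = (1:n).';
A = ones(n, 9);
for k = 1:4
    A(:, 2*k)   = cos(k*w*x);
    A(:, 2*k+1) = sin(k*w*x);
end

% Column j of Y holds the window vect_waves(j+1 : j+n), which matches
% vect_waves(i-n+1 : i) for i = n+j in the original loop.
v = vect_waves(:);
m = numel(v) - n;
idx = bsxfun(@plus, (1:n).', 1:m);
Y = v(idx);

% One least-squares solve for all windows; rows are [a0 a1 b1 ... a4 b4].
coef = (A \ Y).';
```

Because the design matrix A is built once and the solve is a single backslash call, the per-window cost of the nonlinear fit disappears. Whether a fixed w is acceptable depends on the data.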
This is sort of a weird question but hopefully someone will be able to help me.
I have MATLAB code where, due to its parallelized nature, I need to work with struct arrays.
After running the parfor loop I want to transform those structure arrays into three-dimensional arrays.
At the moment I am using the following code:
for k = 1:nsim
ksim(:,:,k) = st(k).ksim;
Msim(:,k) = st(k).Msim;
Vsim(:,:,k) = st(k).Vsim;
Psim(:,:,k) = st(k).Psim;
end
clearvars st
However, this seems to be extremely inefficient, as MATLAB momentarily needs to hold two copies of all the matrices, thus almost doubling memory use.
Is there any smarter way of achieving this without increasing memory use that much?
I do not consider this the answer you are looking for, but it is an improvement.
Define the new arrays and remove fields one by one. Since there seems to be three huge outputs, this will decrease the peak of memory usage to ~130% instead of 200%.
for k = 1:nsim
ksim(:,:,k) = st(k).ksim;
end
st = rmfield(st , 'ksim');
for k = 1:nsim
Msim(:,k) = st(k).Msim;
end
st = rmfield(st , 'Msim');
and so on.
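Concatenating through a comma-separated list is a more compact way to write the same field-by-field idea. This is a minimal sketch with made-up field sizes (the small st built here is purely illustrative). Each concatenation still needs room for one extra copy of the field being converted.

```matlab
% Illustrative struct array; the sizes are assumptions, not from the question.
nsim = 4;
st = struct('ksim', cell(1,nsim), 'Msim', cell(1,nsim));
for k = 1:nsim
    st(k).ksim = k*ones(2,3);
    st(k).Msim = k*ones(5,1);
end

ksim = cat(3, st.ksim);      % 2-by-3-by-nsim, same as ksim(:,:,k) = st(k).ksim
st = rmfield(st, 'ksim');    % release this field before converting the next
Msim = [st.Msim];            % 5-by-nsim, same as Msim(:,k) = st(k).Msim
st = rmfield(st, 'Msim');
```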
I want to calculate the Euclidean distance between two images using the Hyperbolic Tangent (Sigmoid) kernel. Please follow this link where I have discussed the same problem using Gaussian Kernel in detail.
If x=(i,j) and y=(i1,j1) are any two pixels in our image, then for the hyperbolic tangent kernel my H(x,y) is defined as:
H(x,y) = tanh(alpha*(x'*y) + c)
where alpha and c are parameters and x' is the transpose of x. Parameter alpha can be taken as 1/N where N is my image dimension(8192 x 200 in my case) and c can take any value according to the problem. More detailed description about Hyperbolic Tangent kernel can be found here.
To achieve my goal & keeping the running time under consideration, I have written the below MATLAB script.
gray1=zeros(8192,200);
gray2=zeros(8192,200);
s1 = 8192;
s2 = 200;
alpha = s1*s2;
perms = combvec(1:s2,1:s1);
perms = [perms(2,:);perms(1,:)]';
perms1 = perms;
gray1(4096,100) = 10;
gray2(10,100) = 10;
img_diff = gray1 - gray2;
display('Calculation of Sigmoid Kernel started');
for i = 1:length(perms1)
kernel = sum(bsxfun(@times,perms,perms1(i,:))');
kernel1 = tanh((1/alpha)*kernel + 1)';
g_temp(i) = img_diff(:)'*kernel1;
end
temp = g_temp*img_diff(:);
ans = sqrt(temp);
In spite of all my efforts, I couldn't vectorize it further to decrease its running cost. Currently it takes around 29 hours to complete, which is too much for me, as I want to run it for many different images. I want to give it a completely vectorized form using intrinsic MATLAB functions, as was done by @dan-man in the case of the Gaussian kernel. With his help the Gaussian version takes 1-2 secs to complete. I tried my best to use the same conv2fft function in this case too, but it seems difficult to find a way to achieve that.
Can someone please help me remove that one extra for loop, so that the running cost of the algorithm is in the same range as that of the Gaussian version of the same problem?
Thanks in advance.
Get rid of the nasty loop with matrix-multiplication -
g_temp = img_diff(:).'*tanh((1/alpha)*(perms*perms.')+1)
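For the 8192 x 200 image in the question, perms*perms.' would be a 1638400-by-1638400 matrix, which will not fit in memory, so the product has to be evaluated in column blocks. The sketch below shows the idea on a small image. The sizes, the block width blk, and the column-major coordinate list are assumptions for illustration, and the total work is still quadratic in the number of pixels.

```matlab
% Small assumed image so the example runs quickly.
s1 = 64; s2 = 20;
alpha = s1*s2;
img_diff = zeros(s1, s2);
img_diff(32, 10) =  10;
img_diff(5, 10)  = -10;

% Pixel coordinates [row, col], listed in the same column-major order
% as img_diff(:) (an assumption of this sketch).
[c, r] = meshgrid(1:s2, 1:s1);
perms = [r(:), c(:)];

d = img_diff(:);
N = numel(d);
blk = 256;                        % block width; choose it to fit in memory
g_temp = zeros(1, N);
for s = 1:blk:N
    e = min(s + blk - 1, N);
    % N-by-(e-s+1) slice of the kernel matrix tanh(x'*y/alpha + 1)
    Kblk = tanh((1/alpha) * (perms * perms(s:e, :).') + 1);
    g_temp(s:e) = d.' * Kblk;
end
temp = g_temp * d;                % the quantity passed to sqrt in the question
```

Each pass handles blk kernel columns at once instead of one, so the loop runs N/blk times rather than N times while keeping peak memory at roughly N*blk doubles.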
On my PC, running just 50 iterations of the code takes 2.07s.
Just changing the bsxfun line to
kernel = sum(bsxfun(@times,perms,perms1(i,:)),2)';
as the warning suggests, gets it down to 1.65s.
If you use the Neural Network Toolbox and substitute tansig for tanh, the time goes to 1.44s.
If you write your own tanh as
kernel1= (2./(1+exp(-2.*((1/alpha)*kernel + 1)))-1)';
the time goes to 1.28s
Just these changes would mean improvement from 29h to 18h
And remember to preallocate!
g_temp=zeros(length(perms1),1);
After having learned basic programming in Java, I have found that the most difficult part of transitioning to MatLab for my current algorithm course, is to avoid loops. I know that there are plenty of smart ways to vectorize operations in MatLab, but my mind is so "stuck" in loop-thinking, that I am finding it hard to intuitively see how I may vectorize code. Once I am shown how it can be done, it makes sense to me, but I just don't see it that easily myself. Currently I have the following code for finding the barycentric weights used in Lagrangian interpolation:
function w = barycentric_weights(x);
% The function is used to find the weights of the
% barycentric formula based on a given grid as input.
n = length(x);
w = zeros(1,n);
% Calculating the weights
for i = 1:n
prod = 1;
for j = 1:n
if i ~= j
prod = prod*(x(i) - x(j));
end
end
w(i) = prod;
end
w = 1./w;
I am pretty sure there must be a smarter way to do this in MatLab, but I just can't think of it. If anyone has any tips I will be very grateful :). And the only way I'll ever learn all the vectorizing tricks in MatLab is to see how they are used in various scenarios such as above.
One has to be creative in matlab to avoid for loop:
[X,Y] =meshgrid(x,x)
Z = X - Y
w =1./prod(Z+eye(length(x)))
Kristian, there are a lot of ways to vectorize code. You've already gotten two. (And I agree with shakinfree: you should always consider 1) how long it takes to run in non-vectorized form, so you'll have an idea of how much time you might save by vectorizing; 2) how long it might take you to vectorize, so you'll have a better sense of whether or not it's worth your time; 3) how many times you will call it (again: is it worth doing?); and 4) readability.) As shakinfree suggests, you don't want to come back to your code a year from now and scratch your head about what you've implemented. At least make sure you've commented well.
But at a meta-level, when you decide that you need to improve runtime performance by vectorizing, start with a small (3x1?) array and make sure you understand exactly what's happening in each iteration. Then spend some time reading this document and following the relevant links:
http://www.mathworks.com/help/releases/R2012b/symbolic/code-performance.html
It will help you determine when and how to vectorize.
Happy MATLABbing!
Brett
I can see the appeal of vectorization, but I often ask myself how much time it actually saves when I go back to the code a month later and have to decipher all that repmat gibberish. I think your current code is clean and clear and I wouldn't mess with it unless performance is really critical. But to answer your question here is my best effort:
function w = barycentric_weights_vectorized(x)
n = length(x);
w = 1./prod(eye(n) + repmat(x,n,1) - repmat(x',1,n),1);
end
Hope that helps!
And I am assuming x is a row vector here.
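One way to build confidence in a vectorized rewrite is to run it next to the loop on a tiny input. The sketch below uses bsxfun in place of repmat or meshgrid (a substitution, not what either answer used) and compares the result with the original double loop; the sample grid x is an arbitrary assumption.

```matlab
x = [0 1 3];                          % small assumed grid
n = numel(x);

% Vectorized: D(i,j) = x(i) - x(j); adding eye(n) turns the zero
% diagonal into ones so it does not affect the row products.
D = bsxfun(@minus, x(:), x(:).');
w_vec = 1 ./ prod(D + eye(n), 2).';

% Loop version from the question, kept as the reference.
w_loop = zeros(1, n);
for i = 1:n
    p = 1;
    for j = 1:n
        if i ~= j
            p = p * (x(i) - x(j));
        end
    end
    w_loop(i) = 1 / p;
end

max_diff = max(abs(w_vec - w_loop));  % should be at rounding level
```

Using x(:) makes the vectorized line work for both row and column inputs.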
I have access to a 12 core machine and some matlab code that relies heavily on fftn. I would like to speed up my code.
Since the fft can be parallelized I would think that more cores would help but I'm seeing the opposite.
Here's an example:
X = peaks(1028);
ncores = feature('numcores');
ntrials = 20;
mtx_power_times = zeros(ncores,ntrials);
fft_times = zeros(ncores, ntrials);
for i=1:ncores
for j=1:ntrials
maxNumCompThreads(i);
tic;
X^2;
mtx_power_times(i,j) = toc;
tic
fftn(X);
fft_times(i,j) = toc;
end
end
subplot(1,2,1);
plot(mtx_power_times,'x-')
title('mtx power time vs number of cores');
subplot(1,2,2);
plot(fft_times,'x-');
title('fftn time vs num of cores');
Which gives me this:
The speedup for matrix multiplication is great but it looks like my ffts go almost 3x slower when I use all my cores. What's going on?
For reference my version is 7.12.0.635 (R2011a)
Edit: On large 2D arrays taking 1D transforms I get the same problem:
Edit: The problem appears to be that fftw is not seeing the thread limiting that maxNumCompThreads enforces. I'm getting all the cpus going full speed no matter what I set maxNumCompThreads at.
So... is there a way I can specify how many processors I want to use for an fft in Matlab?
Edit: Looks like I can't do this without some careful work in .mex files. http://www.mathworks.com/matlabcentral/answers/35088-how-to-control-number-of-threads-in-fft has an answer. It would be nice if someone has an easy fix...
To use different cores, you should use the Parallel Computing Toolbox. For instance, you could use a parfor loop, and you have to pass the functions as a list of handles:
function x = f(n, i)
...
end
m = ones(8);
parfor i=1:8
m(i,:) = f(m(i,:), i);
end
More info is available at:
High performance computing
Multithreaded computation
Multithreading
This question is related to these two:
Introduction to vectorizing in MATLAB - any good tutorials?
filter that uses elements from two arrays at the same time
Basing on the tutorials I read, I was trying to vectorize some procedure that takes really a lot of time.
I've rewritten this:
function B = bfltGray(A,w,sigma_r)
dim = size(A);
B = zeros(dim);
for i = 1:dim(1)
for j = 1:dim(2)
% Extract local region.
iMin = max(i-w,1);
iMax = min(i+w,dim(1));
jMin = max(j-w,1);
jMax = min(j+w,dim(2));
I = A(iMin:iMax,jMin:jMax);
% Compute Gaussian intensity weights.
F = exp(-0.5*(abs(I-A(i,j))/sigma_r).^2);
B(i,j) = sum(F(:).*I(:))/sum(F(:));
end
end
into this:
function B = rngVect(A, w, sigma)
W = 2*w+1;
I = padarray(A, [w,w],'symmetric');
I = im2col(I, [W,W]);
H = exp(-0.5*(abs(I-repmat(A(:)', size(I,1),1))/sigma).^2);
B = reshape(sum(H.*I,1)./sum(H,1), size(A, 1), []);
Where
A is a matrix 512x512
w is half of the window size, usually equal 5
sigma is a parameter in range [0 1] (usually one of: 0.1, 0.2 or 0.3)
So the I matrix would have 512x512x121 = 31719424 elements
But this version seems to be as slow as the first one, and in addition it uses a lot of memory and sometimes causes memory problems.
I suppose I've done something wrong, probably some logic mistake in the vectorization. In fact, I'm not surprised: this method creates really big matrices, and the computations probably take proportionally longer.
I have also tried to write it using nlfilter (similar to the second solution given by Jonas), but that seems to be hard since I use Matlab 6.5 (R13) (there are no sophisticated function handles available).
So once again, I'm asking not for a ready-made solution, but for some ideas that would help me solve this in reasonable time. Maybe you can point out what I did wrong.
Edit:
As Mikhail suggested, the results of profiling are as follows:
65% of time was spent in the line H= exp(...)
25% of time was used by im2col
How big are I and H (i.e. numel(I)*8 bytes)? If you start paging, then the performance of your second solution is going to be affected very badly.
To test whether you really have a problem due to too large arrays, you can try and measure the speed of the calculation using tic and toc for arrays A of increasing size. If the execution time increases faster than by the square of the size of A, or if the execution time jumps at some size of A, you can try and split the padded I into a number of sub-arrays and perform the calculations like that.
Otherwise, I don't see any obvious places where you could be losing lots of time. Well, maybe you could skip the reshape, by replacing B with A in your function (saves a little memory as well), and writing
A(:) = sum(H.*I,1)./sum(H,1);
You may also want to look into upgrading to a more recent version of Matlab - they've worked hard on improving performance.
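Another way to avoid the large im2col matrix altogether is to loop over the window offsets instead of the pixels and accumulate the weighted sums. This is a different technique from the sub-array split suggested above. It is sketched here on assumed random test data, and it reproduces the truncated windows of the original bfltGray loop rather than the symmetric padding of rngVect.

```matlab
% Assumed small test data.
A = rand(32, 40);
w = 2;
sigma_r = 0.2;

[nr, nc] = size(A);
num = zeros(nr, nc);                  % running sum of F.*I
den = zeros(nr, nc);                  % running sum of F
for di = -w:w
    for dj = -w:w
        % Centre pixels whose neighbour (row+di, col+dj) lies inside A.
        ri = max(1, 1-di):min(nr, nr-di);
        cj = max(1, 1-dj):min(nc, nc-dj);
        nb = A(ri+di, cj+dj);                           % shifted neighbours
        F = exp(-0.5*((nb - A(ri, cj))/sigma_r).^2);    % intensity weights
        num(ri, cj) = num(ri, cj) + F.*nb;
        den(ri, cj) = den(ri, cj) + F;
    end
end
B = num ./ den;
```

The loop runs (2*w+1)^2 times on whole-image slices, so the working memory stays at a few arrays the size of A instead of 121 times that for w = 5.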