MATLAB: Faster method of finding Discrete Fourier Transform - matlab

I'm trying to write a code in matlab that does basically the same as the build-in fft function. Hence computing the discrete fourier transform of any given input vector.
The transform is given by
% N
% X(k) = sum x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
% n=1
Now I created my own code to do this, but the computational effort is about a factor 200 when I look at the computation times. Obviously I would like to reduce this.
Below the computational part of my code, where y is the output vector.
N=length(input_vector)
for k = 1:N
y(k)=0;
for n = 1:N
term = input_vector(n)*exp(-2*pi*1i*(n-1)*(k-1)/N);
y(k)=y(k)+term;
end
end
Now I think the computation is heavy because of the for loops and the line with y(k)=y(k)+term, since this happens at all iterations. I reckon I should be able to make this smaller by either using vector/matrix notation or by using functions with dummy variables and then iterate these functions. But I don't know how to start this process.
Any help or suggestions would be much appreciated.

Using implicit expansion you can greatly reduce the computation time of your algorithm:
% Vector length
N = length(input_vector);
% Vectorized DFT algorithm
y = sum(input_vector.*exp(-2*pi*1i*[0:N-1].'*[0:N-1]/N),2);
There is however two downsides:
The vectorization will consume a lot of memory (since a vector N*N
have to be created)
It won't be faster than the built-in function.

Related

Vector operations using Gpu in Matlab

I'm writing a Matlab code that needs to calculate distances of vectors and I execute
X = norm(A(:,i)-B(:,j));
%do something with X
%loop over i and j
quite often. It is a relatively small computation so it is not really suitable for parfor, so I thought the best idea would be to implement it with the gpu functions.
I found that pagefun and arrayfun do something like what I want, but they execute element-wise operations and not on vectors.
So my question is, is there a more clever way of calculating norms without for loops? Or if I actually need to use gpu, what is the best way to do it?
If you need the norm between all the elements of A and B, the fastest way is probably this:
N = 1000; % number of elements
dim = 3; % number of dimensions
A = rand(dim,N, 'gpuArray');
B = rand(dim,1,N, 'gpuArray');
C = sqrt(squeeze(sum(bsxfun(#minus, A, B).^2))); % C(i,j) norm
I get the execution time of 0.16 seconds on CPU and 0.13 seconds on the GPU.

Summation using for loop in MATLAB

J = 0;
sumTerm = 0;
for i=1:m
sumTerm = sumTerm + ((theta(1)+theta(2)*X(i))-y(i)).^2;
end
J = (1/2*m)*sumTerm;
Is this the right way to do summation ?
How about this:
J = 0.5 * sum(((theta(1)*ones(size(X))+theta(2)*X)-y).^2)/m
Or as #rayryeng pointed out, you can even drop the ones
J = 0.5 * sum(((theta(1)+theta(2)*X)-y).^2)/m
That's correct, but you'll want to implement that vectorized instead of using loops. You can take advantage of this by using linear algebra to compute the sum for you. You can compute theta(1) + theta(2)*X(i) - y(i) for each term by first creating the matrix X that is a matrix of points where the first column is appended with all ones and the next column contains your single feature / data points. You would finally compute the difference between the output from the prediction line and the true output for each data point by X*theta - y which would thus produce a vector of differences for each data point. This is also assuming that your array of points and theta are both column vectors, and I believe that this is the right structure since this looks like you're implementing the cost function for univariate linear regression from Andrew Ng's Machine Learning course.
You can then compute the dot product of this vector with itself to compute the sum of square differences, then you can divide by 2*m when you're done:
vec = [ones(m,1) X]*theta - y;
J = (vec.'*vec) / (2*m); %'
The reason why you should pursue a linear algebra solution instead is because native matrix and vector operations in MATLAB are very, very fast and if you can find a solution to your computational problems with linear algebra, it'll be the fastest you can ever get your code to compute things.
For example, see this post on why matrix multiplication in MATLAB is amongst the fastest when benchmarking with other platforms: Why is MATLAB so fast in matrix multiplication?

Obtaining the constant that makes the integral equal to zero in Matlab

I'm trying to code a MATLAB program and I have arrived at a point where I need to do the following. I have this equation:
I must find the value of the constant "Xcp" (greater than zero), that is the value that makes the integral equal to zero.
In order to do so, I have coded a loop in which the the value of Xcp advances with small increments on each iteration and the integral is performed and checked if it's zero, if it reaches zero the loop finishes and the Xcp is stored with this value.
However, I think this is not an efficient way to do this task. The running time increases a lot, because this loop is long and has the to perform the integral and the integration limits substitution every time.
Is there a smarter way to do this in Matlab to obtain a better code efficiency?
P.S.: I have used conv() to multiply both polynomials. Since cl(x) and (x-Xcp) are both polynomials.
EDIT: Piece of code.
p = [1 -Xcp]; % polynomial (x-Xcp)
Xcp=0.001;
i=1;
found=false;
while(i<=x_te && found~=true) % Xcp is upper bounded by x_te
int_cl_p = polyint(conv(cl,p));
Cm_cp=(-1/c^2)*diff(polyval(int_cl_p,[x_le,x_te]));
if(Cm_cp==0)
found=true;
else
Xcp=Xcp+0.001;
end
end
This is the code I used to run this section. Another problem is that I have to do it for different cases (different cl functions), for this reason the code is even more slow.
As far as I understood, you need to solve the equation for X_CP.
I suggest using symbolic solver for this. This is not the most efficient way for large polynomials, but for polynomials of degree 20 it takes less than 1 second. I do not claim that this solution is fastest, but this provides generic solution to the problem. If your polynomial does not change every iteration, then you can use this generic solution many times and not spend time for calculating integral.
So, generic symbolic solution in terms of xLE and xTE is obtained using this:
syms xLE xTE c x xCP
a = 1:20;
%//arbitrary polynomial of degree 20
cl = sum(x.^a.*randi([-100,100],1,20));
tic
eqn = -1/c^2 * int(cl * (x-xCP), x, xLE, xTE) == 0;
xCP = solve(eqn,xCP);
pretty(xCP)
toc
Elapsed time is 0.550371 seconds.
You can further use matlabFunction for finding the numerical solutions:
xCP_numerical = matlabFunction(xCP);
%// we then just plug xLE = 10 and xTE = 20 values into function
answer = xCP_numerical(10,20)
answer =
19.8038
The slight modification of the code can allow you to use this for generic coefficients.
Hope that helps
If you multiply by -1/c^2, then you can rearrange as
and integrate however you fancy. Since c_l is a polynomial order N, if it's defined in MATLAB using the usual notation for polyval, where coefficients are stored in a vector a such that
then integration is straightforward:
MATLAB code might look something like this
int_cl_p = polyint(cl);
int_cl_x_p = polyint([cl 0]);
X_CP = diff(polyval(int_cl_x_p,[x_le,x_te]))/diff(polyval(int_cl_p,[x_le,x_te]));

How to calculate matrix entries efficently using Matlab

I have a cell array myBasis of sparse matricies B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace (B^T_i * B_j).
Therefore, I wrote the following code:
for i=1:n
for j=1:n
B=myBasis{i};
C=myBasis{j};
Q(i,j)=trace(B'*C);
end
end
Which takes already 68 seconds when n=1226 and B_i has 50 rows, and 50 colums.
Is there any chance to speed this up? Usually I exclude for-loops from my matlab code in a c++ file - but I have no experience how to handle a sparse cell array in C++.
As noted by Inox Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually pre-computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize its calculation to save it from computing all the entries.
Here's a way
for ii = 1:n
B = myBasis{ii};
for jj = ii:n
C = myBasis{jj};
t = full( B(:).'*C(:) ); % equivalent to trace(B'*C)!
Q(ii,jj) = t;
Q(jj,ii) = t;
end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
You should notice that ' operator in Matlab is not matrix transpose, but hermitian conjugate, for actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing between the two operators makes debugging quite a mess...
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and trace(A) = trace(A'). Well, only this trick cut your calculations by almost 50%. Your Q(i,j) matrix is symmetric, and you only need to calculate n(n+1)/2 terms (and not n²)
2) To calculate the trace you don't need to calculate every term of B'*C, just the diagonal. Nevertheless, I don't know if it's easy to create a script in Matlab that is actually faster then just calculating B'*C (MatLab is pretty fast with matrix operations).
But I would definitely implement (1)

Speeding up sparse FFT computations

I'm hoping someone can review my code below and offer hints how to speed up the section between tic and toc. The function below attempts to perform an IFFT faster than Matlab's built-in function since (1) almost all of the fft-coefficient bins are zero (i.e. 10 to 1000 bins out of 10M to 300M bins are non-zero), and (2) only the central third of the IFFT results are retained (the first and last third are discarded -- so no need to compute them in the first place).
The input variables are:
fftcoef = complex fft-coef 1D array (10 to 1000 pts long)
bins = index of fft coefficients corresponding to fftcoef (10 to 1000 pts long)
DATAn = # of pts in data before zero padding and fft (in range of 10M to 260M)
FFTn = DATAn + # of pts used to zero pad before taking fft (in range of 16M to 268M) (e.g. FFTn = 2^nextpow2(DATAn))
Currently, this code takes a few orders of magnitude longer than Matlab's ifft function approach which computes the entire spectrum then discards 2/3's of it. For example, if the input data for fftcoef and bins are 9x1 arrays (i.e. only 9 complex fft coefficients per sideband; 18 pts when considering both sidebands), and DATAn=32781534, FFTn=33554432 (i.e. 2^25), then the ifft approach takes 1.6 seconds whereas the loop below takes over 700 seconds.
I've avoided using a matrix to vectorize the nn loop since sometimes the array size for fftcoef and bins could be up to 1000 pts long, and a 260Mx1K matrix would be too large for memory unless it could be broken up somehow.
Any advice is much appreciated! Thanks in advance.
function fn_fft_v1p0(fftcoef, bins, DATAn, FFTn)
fftcoef = [fftcoef; (conj(flipud(fftcoef)))]; % fft coefficients
bins = [bins; (FFTn - flipud(bins) +2)]; % corresponding fft indices for fftcoef array
ttrend = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1); % preallocate
start = round(DATAn/3)-1;
tic;
for nn = start+1 : round(2*DATAn/3) % loop over desired time indices
% sum over all fft indices having non-zero coefficients
arg = 2*pi*(bins-1)*(nn-1)/FFTn;
ttrend(nn-start) = sum( fftcoef.*( cos(arg) + 1j*sin(arg));
end
toc;
end
You have to keep in mind that Matlab uses a compiled fft library (http://www.fftw.org/) for its fft functions, which besides operating much faster then a Matlab script, it is well optimized for many use-cases. So a first step might be writing your code in c/c++ and compiling it as a mex file you can use within Matlab. That will surely speed up your code at least an order of magnitude (probably more).
Besides that, one simple optimization you can do is by considering 2 things:
You assume your time series is real valued, so you can use the symmetry of the fft coeffs.
Your time series is typically much longer then your fft coeffs vector, so it is better to iterate over bins instead of time points (thus vectorizing the longer vector).
These two points are translated to the following loop:
nn=(start+1 : round(2*DATAn/3))';
ttrend2 = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1);
tic;
for bn = 1:length(bins)
arg = 2*pi*(bins(bn)-1)*(nn-1)/FFTn;
ttrend2 = ttrend2 + 2*real(fftcoef(bn) * exp(i*arg));
end
toc;
Note you have to use this loop before you expand bins and fftcoef, since the symmetry is already taken into account. This loop takes 8.3 seconds to run with the parameters from your question, while it takes on my pc 141.3 seconds to run with your code.
I have posted a question/answer at Accelerating FFTW pruning to avoid massive zero padding which solves the problem for the C++ case using FFTW. You can use this solution by exploiting mex-files.