Speeding up sparse FFT computations - matlab

I'm hoping someone can review my code below and offer hints on how to speed up the section between tic and toc. The function below attempts to perform an IFFT faster than Matlab's built-in function, since (1) almost all of the fft-coefficient bins are zero (i.e. only 10 to 1000 bins out of 10M to 300M are non-zero), and (2) only the central third of the IFFT results is retained (the first and last thirds are discarded, so there is no need to compute them in the first place).
The input variables are:
fftcoef = complex fft-coef 1D array (10 to 1000 pts long)
bins = index of fft coefficients corresponding to fftcoef (10 to 1000 pts long)
DATAn = # of pts in data before zero padding and fft (in range of 10M to 260M)
FFTn = DATAn + # of pts used to zero pad before taking fft (in range of 16M to 268M) (e.g. FFTn = 2^nextpow2(DATAn))
Currently, this code takes a few orders of magnitude longer than Matlab's ifft approach, which computes the entire spectrum and then discards two-thirds of it. For example, if the input data for fftcoef and bins are 9x1 arrays (i.e. only 9 complex fft coefficients per sideband; 18 pts when considering both sidebands), and DATAn=32781534 and FFTn=33554432 (i.e. 2^25), then the ifft approach takes 1.6 seconds whereas the loop below takes over 700 seconds.
I've avoided using a matrix to vectorize the nn loop since the fftcoef and bins arrays can be up to 1000 pts long, and a 260M-by-1K matrix would be too large for memory unless it could be broken up somehow.
Any advice is much appreciated! Thanks in advance.
function fn_fft_v1p0(fftcoef, bins, DATAn, FFTn)
    fftcoef = [fftcoef; conj(flipud(fftcoef))];      % append conjugate-symmetric fft coefficients
    bins    = [bins; (FFTn - flipud(bins) + 2)];     % corresponding fft indices for fftcoef array
    ttrend  = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1); % preallocate
    start   = round(DATAn/3) - 1;
    tic;
    for nn = start+1 : round(2*DATAn/3)              % loop over desired time indices
        % sum over all fft indices having non-zero coefficients
        arg = 2*pi*(bins-1)*(nn-1)/FFTn;
        ttrend(nn-start) = sum( fftcoef .* (cos(arg) + 1j*sin(arg)) );
    end
    toc;
end

You have to keep in mind that Matlab uses a compiled fft library (http://www.fftw.org/) for its fft functions which, besides running much faster than interpreted Matlab code, is well optimized for many use cases. So a first step might be writing your code in C/C++ and compiling it as a mex file you can call from within Matlab. That will surely speed up your code by at least an order of magnitude (probably more).
Besides that, one simple optimization you can make comes from two observations:
Your time series is real-valued, so you can exploit the conjugate symmetry of the fft coefficients.
Your time series is typically much longer than your fft coefficient vector, so it is better to iterate over bins instead of time points (thus vectorizing over the longer vector).
These two points are translated to the following loop:
nn = (start+1 : round(2*DATAn/3))';
ttrend2 = zeros( (round(2*DATAn/3) - round(DATAn/3) + 1), 1);
tic;
for bn = 1:length(bins)
    arg = 2*pi*(bins(bn)-1)*(nn-1)/FFTn;
    ttrend2 = ttrend2 + 2*real( fftcoef(bn) * exp(1i*arg) );
end
toc;
Note that you have to use this loop before you expand bins and fftcoef, since the symmetry is already taken into account. With the parameters from your question, this loop takes 8.3 seconds on my pc, while your code takes 141.3 seconds.
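If you eventually want to vectorize over both bins and time, you can also process the time indices in blocks so the phase matrix never exceeds your memory budget. A minimal sketch, assuming the same unexpanded bins and fftcoef as above; chunk is an arbitrary block size you would tune to your RAM:
chunk = 1e5; % time points per block (tune to available memory)
nn = (start+1 : round(2*DATAn/3))';
ttrend3 = zeros(length(nn), 1);
for c = 1:chunk:length(nn)
    idx = c : min(c+chunk-1, length(nn));
    arg = 2*pi*(nn(idx)-1)*(bins.'-1)/FFTn;         % block-by-nbins phase matrix
    ttrend3(idx) = 2*real( exp(1i*arg) * fftcoef ); % symmetry handled via 2*real, as above
end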

I have posted a question/answer at Accelerating FFTW pruning to avoid massive zero padding, which solves the problem for the C++ case using FFTW. You can use that solution from Matlab via a mex file.

Related

How do I calculate an exponentially weighted moving mean using DSP signal processing toolbox

I am using MATLAB R2020a on macOS. I am trying to find the exponentially weighted moving mean of the cycle period of an ECG signal, and have used the dsp.MovingAverage function from the DSP System Toolbox, called with the commands shown. However, I am not sure how to specify how many of the elements of the vector to include in the weighted mean. At the moment, is it just adding a weight to all of the elements and then finding the moving mean?
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Whenever I set the 'WindowLength' property as specified in the DSP documentation, it produces a warning:
movavgExp = dsp.MovingAverage(10, 'Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
Warning: The WindowLength property is not relevant in this configuration of the System object.
I would really appreciate any suggestions for this, thanks in advance!
From the Mathworks page for dsp.MovingAverage:
"Exponential weighting — The block multiplies the samples by a set of weighting factors. The magnitude of the weighting factors decreases exponentially as the age of the data increases, but the magnitude never reaches zero. To compute the average, the algorithm sums the weighted data."
So there is no real averaging window, as you use all of your signal up to time t (exponentially weighted) for the mean value at that instant.
Of course older samples are weighted less than newer ones, and the parameter controlling that is the ForgettingFactor. I guess you could then define an "effective" averaging window as the number of samples whose weight is larger than a threshold.
Unfortunately it doesn't seem like dsp.MovingAverage can return the weights itself, but you can calculate them yourself. From the Mathworks page, the weights follow the recursion
w(N) = lambda*w(N-1) + 1
where w(N) is the weight for the N-th sample and lambda is your forgetting factor. Remember to initialize the weight for the first sample to 1, so that you could have something like:
lambda = 0.1;           % your forgetting factor
w = zeros(length(x),1); % where x is your signal
w(1) = 1;               % initialize the weight for the first sample
for i = 2:length(x)
    w(i) = lambda*w(i-1) + 1; % calculate the successive weights
end
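For reference, that recursion has the closed form w(N) = (1 - lambda^N)/(1 - lambda), which converges to 1/(1 - lambda) as N grows, so the total weight saturates after roughly 1/(1 - lambda) samples.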
To then obtain the effective averaging window for the N-th sample, I would normalize the weights from 1 to N with respect to their sum:
thr = 1.e-3; % your threshold, you'll probably have to play with this a bit
lengthAveragWdw = zeros(length(x),1);
for i = 1:length(x)
    wi = w(1:i);      % weights used to calculate the moving average up to the i-th sample
    wi = wi./sum(wi); % normalize the weights
    lengthAveragWdw(i) = sum(wi >= thr); % count the samples whose weight exceeds the threshold
end
where thr is a threshold value that you have to decide beforehand.
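For completeness, once the System object is configured you just call it on your signal. A minimal usage sketch, assuming x is your column-vector signal:
movavgExp = dsp.MovingAverage('Method', 'Exponential weighting', 'ForgettingFactor', 0.1);
yAvg = movavgExp(x); % exponentially weighted running mean, same length as x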

MATLAB: Faster method of finding Discrete Fourier Transform

I'm trying to write code in matlab that does basically the same as the built-in fft function, i.e. computes the discrete Fourier transform of any given input vector.
The transform is given by
% N
% X(k) = sum x(n)*exp(-j*2*pi*(k-1)*(n-1)/N), 1 <= k <= N.
% n=1
Now I created my own code to do this, but it is about a factor of 200 slower when I compare the computation times. Obviously I would like to reduce this.
Below is the computational part of my code, where y is the output vector.
N = length(input_vector);
y = zeros(N,1); % preallocate the output
for k = 1:N
    y(k) = 0;
    for n = 1:N
        term = input_vector(n)*exp(-2*pi*1i*(n-1)*(k-1)/N);
        y(k) = y(k) + term;
    end
end
Now I think the computation is heavy because of the for loops and the line y(k)=y(k)+term, since this runs on every iteration. I reckon I should be able to speed this up by either using vector/matrix notation or by using functions with dummy variables and then iterating over those functions, but I don't know how to start this process.
Any help or suggestions would be much appreciated.
Using implicit expansion you can greatly reduce the computation time of your algorithm:
% Vector length
N = length(input_vector);
% Vectorized DFT algorithm (input_vector must be a 1-by-N row vector here)
y = sum(input_vector.*exp(-2*pi*1i*(0:N-1).'*(0:N-1)/N),2);
There are, however, two downsides:
The vectorization will consume a lot of memory (since an N-by-N matrix has to be created).
It won't be faster than the built-in function.
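If the N-by-N matrix is the limiting factor, a middle ground (just a sketch of the same algorithm) is to vectorize only the inner sum and keep a single loop over k; this avoids the N-by-N matrix while remaining much faster than the double loop:
N = length(input_vector);
x = input_vector(:); % force a column vector
n = (0:N-1).';       % time indices as a column
y = zeros(N,1);      % preallocate the output
for k = 0:N-1
    y(k+1) = sum( x .* exp(-2*pi*1i*k*n/N) ); % one output bin at a time
end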

In Matlab, how to quickly generate a sparse random matrix, and quickly multiply it with a dense vector?

I am using the function X = randsrc(250,600,[[-1,0,1];[0.5/ps,1-1/ps,0.5/ps]]) with ps=2373. It generates a 250x600 matrix whose entries contain only -1, 0, or 1, chosen randomly according to the probability distribution 0.5/ps, 1-1/ps, 0.5/ps.
So the density is about 0.00042.
The above X is called a sparse random projection matrix; see https://web.stanford.edu/~hastie/Papers/Ping/KDD06_rp.pdf. It can be used to compress a data vector from dimension 600 to 250 with some nice geometric properties guaranteed.
The problem is that in Matlab, randsrc seems to be very slow (e.g., compared with randn(250,600)). So, how can I generate the above matrix quickly?
BTW, how can I quickly calculate X*y, where y may be a dense vector?
My code is:
ps = 2373;
tic;
X = randsrc(250,600,[[-1,0,1];[0.5/ps,1-1/ps,0.5/ps]]);
toc
a = randn(600,1);
tic;
X*a;
toc
Also, I have tried the equivalent Python function http://scikit-learn.org/stable/modules/generated/sklearn.random_projection.SparseRandomProjection.html; it is twice as fast as Matlab.
You can use sprand to generate the sparsity structure, then find to extract the rows and columns of the non-zero elements. Finally, randsample will select the values -1 and 1 with 50% probability each:
ps = 2373;
tic
[i,j,~] = find(sprand(250,600,1/ps));
X = sparse(i,j,randsample([-1,1],length(i),true),250,600); % pass the sizes so X is always 250x600
toc
MATLAB is very fast at multiplying matrices so X*a is very fast.
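As a quick sanity check on the generated matrix (a sketch; the exact density fluctuates from run to run):
nnz(X) / numel(X)     % should be roughly 1/ps, i.e. about 4.2e-4
unique(nonzeros(X)).' % should be [-1 1]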

How to calculate matrix entries efficiently using Matlab

I have a cell array myBasis of sparse matrices B_1,...,B_n.
I want to evaluate with Matlab the matrix Q(i,j) = trace(B_i^T * B_j).
Therefore, I wrote the following code:
for i = 1:n
    for j = 1:n
        B = myBasis{i};
        C = myBasis{j};
        Q(i,j) = trace(B'*C);
    end
end
This already takes 68 seconds when n=1226 and each B_i has 50 rows and 50 columns.
Is there any chance to speed this up? Usually I move for-loops out of my Matlab code into a C++ mex file - but I have no experience handling a sparse cell array in C++.
As noted by Inox, Q is symmetric and therefore you only need to explicitly compute half the entries.
Computing trace( B.'*C ) is equivalent to B(:).'*C(:):
trace(B.'*C) = sum_i [B.'*C]_ii = sum_i sum_j B_ij * C_ij
which is the sum of element-wise products and therefore equivalent to B(:).'*C(:).
When explicitly computing trace( B.'*C ) you are actually computing all k-by-k entries of B.'*C only to use the diagonal later on. AFAIK, Matlab does not optimize this calculation to avoid computing all the entries.
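You can verify the equivalence numerically on small random sparse matrices (a quick sketch):
B = sprand(50, 50, 0.1);
C = sprand(50, 50, 0.1);
full(abs( trace(B.'*C) - B(:).'*C(:) )) % ~0, up to floating-point roundoff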
Here's a way
Q = zeros(n); % preallocate the output
for ii = 1:n
    B = myBasis{ii};
    for jj = ii:n
        C = myBasis{jj};
        t = full( B(:).'*C(:) ); % equivalent to trace(B.'*C)!
        Q(ii,jj) = t;
        Q(jj,ii) = t;
    end
end
PS,
It is best not to use i and j as variable names in Matlab.
PPS,
You should notice that the ' operator in Matlab is not the matrix transpose but the Hermitian (complex-conjugate) transpose; for the actual transpose you need to use .'. In most cases complex numbers are not involved and there is no difference between the two operators, but once complex data is introduced, confusing the two makes debugging quite a mess...
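A minimal illustration of the difference:
v = [1+2i, 3];
v'  % complex-conjugate transpose: [1-2i; 3]
v.' % plain transpose: [1+2i; 3]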
Well, a couple of thoughts
1) Basic stuff: A'*B = (B'*A)' and, for real matrices, trace(A) = trace(A'). This trick alone cuts your calculations almost in half: your Q(i,j) matrix is symmetric, and you only need to calculate n(n+1)/2 terms (and not n²).
2) To calculate the trace you don't need to compute every term of B'*C, just the diagonal. Nevertheless, I don't know whether a Matlab script doing that would actually be faster than just calculating B'*C (Matlab is pretty fast with matrix operations).
But I would definitely implement (1)

Duplicating a 2d matrix in matlab along a 3rd axis MANY times

I'm looking to duplicate a 784x784 matrix in matlab along a 3rd axis. The following code seems to work:
mat = reshape(repmat(mat, 1,10000),784,784,10000);
Unfortunately, it takes so long to run that it's worthless (changing the 10,000 to 1,000 makes it take a few minutes, and using 10,000 practically freezes my whole machine). Is there a faster way to do this?
For reference, I'm looking to use mvnpdf on 10,000 vectors each of length 784, using the same covariance matrix for each. So my final call looks like
mvnpdf(X,mu,mat)
% size(X) = (10000,784), size(mu) = (10000,784), size(mat) = (784,784,10000)
If there's a way to do this that's not repeating the covariance matrix 10,000 times, that'd be helpful too. Thanks!
For replication in more than 2 dimensions, you need to supply the replication counts as an array:
out = repmat(mat,[1,1,10000]);
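For instance, on a small matrix (the same pattern applies to your 784x784 case):
A = magic(3);           % any 3x3 matrix
B = repmat(A, [1,1,4]); % stack four copies along the 3rd dimension
size(B)                 % returns 3 3 4
isequal(B(:,:,2), A)    % returns true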
Creating 10,000 copies of a 784x784 matrix isn't going to take advantage of the vectorization in MATLAB, which pays off most when it replaces many small operations. Avoiding a for loop also won't help much here, given the following:
The main speedup you can gain here is by computing the inverse of the covariance matrix once, and then computing the pdf yourself. Inverting sigma takes O(n^3), and you are needlessly doing that 10,000 times. (The square root of the determinant can also be precomputed.) For reference, the pdf of the multivariate normal distribution is computed as follows:
http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Properties
Better to just compute the inverse once, then compute z = x - mu for each row, form z'*S*z for each pdf value, and apply a simple function and a constant. But wait! You can vectorize that, too.
I don't have MATLAB in front of me, but this is basically what you need to do, and it'll run in an instant.
s = inv(sigma);                        % inverse covariance, computed once
k = size(x,2);                         % dimension (784 here)
c = 0.5*log(det(s)) - (k/2)*log(2*pi); % log normalizing constant; det(s) = 1/det(sigma), hence the + sign
z = x - mu;                            % 10000 x 784 matrix
p = exp( c - 0.5 .* dot(z*s, z, 2) );  % 10000 x 1 vector of pdf values
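As a sanity check, you can compare this vectorized computation against mvnpdf on a small problem (a sketch; the sizes here are arbitrary):
k = 4; n = 10;
sigma = cov(randn(50,k));              % a random positive-definite covariance
x = randn(n,k);
mu = zeros(n,k);
s = inv(sigma);
c = 0.5*log(det(s)) - (k/2)*log(2*pi);
z = x - mu;
p = exp( c - 0.5 .* dot(z*s, z, 2) );
max(abs( p - mvnpdf(x, mu, sigma) ))   % should be ~1e-15 or smaller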