FFT and changing frequency and vectorizing a FOR loop - matlab

Greetings all,
I can increase and decrease the frequency of a signal using a combination of fft and a Fourier series expansion FOR loop, as in the code below. However, if the signal/array is too large it becomes extremely slow: a 1x44100 array, which is only one second of audio, takes about 2 minutes to complete. I'm sure this has to do with the for loop, but I'm not sure how to vectorize it to improve performance. Please note that this will be used with audio signals that are 3 to 6 minutes long.
Any recommendations?
%create signal
clear all, clc,clf,tic
x= linspace(0,2*pi,44100)';
%Used in exporting to ycalc audio file make sure in sync with above
freq_orig=1;
freq_new=4
vertoff=0;
vertoffConj=0;
vertoffInv=0;
vertoffInvConj=0;
phaseshift=(0)*pi/180 ; %can use mod to limit to 180 degrees
y=sin(freq_orig*(x));
[size_r,size_c]=size(y);
N=size_r; %to test make 50
T=2*pi;
dt=T/N;
t=linspace(0,T-dt,N)';
phase = 0;
f0 = 1/T; % Exactly, one period
y=(y/max(abs(y))*.8)/2; %make the max amplitude here
C = fft(y)/N; % complex Fourier coefficients, scaled by N
A = real(C);
B = imag(C)*-1; %I needed to multiply by -1 to get the correct sign
% Single-Sided (f >= 0)
An = [A(1); 2*A(2:round(N/2)); A(round(N/2)+1)];
Bn = [B(1); 2*B(2:round(N/2)); B(round(N/2)+1)];
pmax=N/2;
ycalc=zeros(N,1); %preallocating space for ycalc
w=0;
for p=2:pmax
%
%%1 step) re-create signal using equation
ycalc=ycalc+An(p)*cos(freq_new*(p-1).*t-phaseshift) ...
+Bn(p)*sin(freq_new*(p-1).*t-phaseshift)+(vertoff/pmax);
w=w+(360/(pmax-1)); %used to create phaseshift
phaseshift=w;
end;
fprintf('\n- Completed in %4.4fsec or %4.4fmins\n',toc,toc/60);
subplot(2,1,1), plot(y),title('Original Signal');
subplot(2,1,2),plot(ycalc),title('FFT new signal');
Here's a picture of the plot if someone wants to see the output, which is correct; the FOR loop is just really, really slow.

It appears as though you are basically shifting the signal upwards in the frequency domain, and then your "series expansion" is simply implementing the inverse DFT on the shifted version. As you have seen, the naive iDFT is going to be exceedingly slow. Try changing that entire loop into a call to ifft, and you should be able to get a tremendous speedup.
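A minimal sketch of that suggestion, under stated assumptions: it ignores the drifting phaseshift term from the question's loop, drops any harmonic that would land at or above the Nyquist bin after scaling, and uses a short hypothetical test length N = 1024 instead of 44100. Harmonic k of the original spectrum is moved to bin freq_new*k, the mirrored bin is filled with the complex conjugate so that the output is real, and a single ifft call replaces the loop.

```matlab
N = 1024;                               % hypothetical short test length
t = (0:N-1)' * (2*pi/N);                % one period sampled at N points
y = 0.4 * sin(t);                       % 1-cycle test tone
freq_new = 4;                           % frequency multiplier, as in the question

C = fft(y);                             % two-sided spectrum
half = floor(N/2);
D = zeros(N, 1);                        % spectrum of the frequency-scaled signal
D(1) = C(1);                            % keep the DC term
k = 1:floor((half - 1) / freq_new);     % harmonics that stay below Nyquist after scaling
D(freq_new*k + 1) = C(k + 1);           % move harmonic k to bin freq_new*k
D(N - freq_new*k + 1) = conj(C(k + 1)); % mirrored bins keep the output real
ycalc = real(ifft(D));                  % one inverse FFT instead of the loop
```

For this test tone, ycalc matches 0.4*sin(4*t) to within rounding error. If the phase drift in the original loop is intentional, it would have to be applied to the bins of D before the ifft call.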

Related

comparing generated data to measured data

We have measured data, and we managed to determine the distribution type that it follows (Gamma) and its parameters (A, B).
We then generated n = 10000 samples from the same distribution, with the same parameters and in the same range (between 18.5 and 59), using a for loop:
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Then we tried to fit the generated data using:
h1=histfit(W);
After this we tried to plot the Gamma curve to compare the two curves on the same figure, using:
hold on
h2=histfit(W,[],'Gamma');
h2(1).Visible='off';
The problem is that the two curves are shifted, as in the following figure ("Figure 1" is the generated data from the previous code and "Figure 2" is without truncating the generated data).
Does anyone know why?
Thanks in advance
By default histfit fits a normal probability density function (PDF) on the histogram. I'm not sure what you were actually trying to do, but what you did is:
% fit a normal PDF
h1=histfit(W); % this is equal to h1 = histfit(W,[],'normal');
% fit a gamma PDF
h2=histfit(W,[],'Gamma');
Obviously that will result in different fits because a normal PDF != a gamma PDF. The only thing you see is that the gamma PDF fits the histogram better, because you sampled the data from that distribution.
If you want to check whether the data follows a certain distribution you can also use a KS-test. In your case
% check if the data follows the distribution specified in tot
[h p] = kstest(W,'CDF',tot)
If the data follows the specified distribution, the test should not reject it at the default 5% significance level (h = 0 and p > 0.05); otherwise h = 1 and p < 0.05.
Now some general comments on your code:
Please look up preallocation of memory, it will speed up loops greatly. E.g.
W = zeros(10000,1);
for i=1:1:10000
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W(i,:) =random(tot,1,1);
end
Also,
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
does not depend on the loop index and can therefore be moved in front of the loop to speed things up further. It is also good practice to avoid using i as a loop variable.
But you can actually skip the whole loop, because random() can return multiple samples at once:
tot=makedist('Gamma','A',11.8919,'B',2.9927);
tot= truncate(tot,18.5,59);
W =random(tot,10000,1);
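As a hedged follow-up to the shifted curves in the question: histfit(W,[],'Gamma') fits an untruncated Gamma to the samples, so its curve need not coincide with the truncated distribution the samples were drawn from. One possible way to compare like with like is to overlay the PDF of the truncated object tot on a histogram normalized to unit area. The sketch below assumes the Statistics and Machine Learning Toolbox and a release that has histogram; the grid xs and the fixed seed are illustrative choices, not part of the original code.

```matlab
rng(0);                                              % fixed seed so the sketch is repeatable
tot = makedist('Gamma', 'A', 11.8919, 'B', 2.9927);  % parameters from the question
tot = truncate(tot, 18.5, 59);                       % restrict to [18.5, 59]
W = random(tot, 10000, 1);                           % all samples in one call

histogram(W, 'Normalization', 'pdf');                % histogram scaled to unit area
hold on
xs = linspace(18.5, 59, 200);                        % illustrative evaluation grid
plot(xs, pdf(tot, xs), 'LineWidth', 1.5);            % density the samples were drawn from
hold off
```

Because the histogram and the curve are both densities of the same truncated distribution, they should line up without the shift seen when an untruncated fit is drawn over truncated data.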

matlab: running fft on short time intervals in a for-loop for the length of data

I have some EEG data that I would like to break down into 30-second windows and run a fast Fourier transform on each window of data. I've tried to implement a for-loop and increment the index value by the number of samples in the time window. When I run this, I can see that (1) this works for the first window of data, but not the rest of them because (I think) the "number of samples minus one" leads to fewer elements than necessary for data_fft and thus doesn't have the same dimensions as f, which are both being plotted in a figure. (2) I tried to update the index value by adding the number of samples in a window, but after i = 1, it goes to i = 2 in my workspace and not to i = 7681 as I'd hoped. I've spent an embarrassingly long time on trying to figure out how to change this so it works correctly, so any advice is appreciated! Code is below. Let me know if I can clarify anything.
data_ch6 = data(:,6); % looking at just 1 electrode
tmax = 2*60; % total time in sec I want to analyze; just keeping it to 2 minutes for this exercise
tmax_window = 30; %30 sec window
times = tmax/tmax_window; % number of times fft should be run
Nsamps = tmax*hdr.SPR; % total # samples in tmax; sample rate is 256 hz
Nsamps_window = tmax_window*hdr.SPR; % # samples in time window
f = hdr.SPR*(0:((Nsamps_window-1)/2))/Nsamps_window; % frequency for plotting
for i = 1:Nsamps; % need to loop through data in 30 second windows in tmax
data_fft = abs(fft(data_ch6(i:i+Nsamps_window-1))); %run fft on data window
data_fft = data_fft(i:((i+Nsamps_window-1)/2)); %discard half the points
figure
plot(f, data_fft)
i = i+Nsamps_window;
end
Well, there are a few things wrong in your code. First, let me start by saying that i is a very poor choice for a variable name, since in MATLAB it usually stands for sqrt(-1).
As for your code, I assume that you intend to perform a windowed FFT without overlap.
1) Your loop goes from 1 to Nsamps with an increment of 1. That means that each time you advance by 1 sample; in other words, consecutive windows overlap by Nsamps_window-1 samples. If you are not interested in overlapping, you can use i=1:Nsamps_window:Nsamps-Nsamps_window+1, which starts each window right after the previous one ends and stops at the last start index that still leaves a full window.
2) The length of data_fft is Nsamps_window, so I think what you wanted to do is data_fft = data_fft(1:round(Nsamps_window/2));
3) When plotting FFT results, I suggest using dB: plot(20*log10(abs(data_fft)));
4) The line i = i+Nsamps_window; is meaningless since i is your loop variable: the for loop overwrites it at the start of the next iteration, so the assignment has no effect.
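Putting points 1) to 4) together, a non-overlapping windowed FFT could be sketched as follows. The EEG data and hdr.SPR from the question are not available here, so a synthetic 10 Hz tone with a little noise stands in for data(:,6) and fs stands in for hdr.SPR; both are assumptions made only so the example runs on its own.

```matlab
fs = 256;                              % sample rate in Hz (hdr.SPR in the question)
tmax = 2*60;                           % total analysed time in seconds
tmax_window = 30;                      % window length in seconds
Nsamps = tmax*fs;                      % samples in the analysed stretch
Nsamps_window = tmax_window*fs;        % samples per window
nhalf = Nsamps_window/2;               % bins kept: 0 Hz up to just below fs/2
f = fs*(0:nhalf-1)/Nsamps_window;      % frequency axis for the kept bins

% Hypothetical stand-in for data(:,6): a 10 Hz tone plus a little noise
tt = (0:Nsamps-1)'/fs;
data_ch6 = sin(2*pi*10*tt) + 0.1*randn(Nsamps,1);

starts = 1:Nsamps_window:(Nsamps - Nsamps_window + 1);  % first sample of each window
spec = zeros(nhalf, numel(starts));    % one column of magnitudes per window
for w = 1:numel(starts)
    seg = data_ch6(starts(w):starts(w) + Nsamps_window - 1);
    mag = abs(fft(seg));               % magnitude spectrum of this window
    spec(:, w) = mag(1:nhalf);         % discard the mirrored half
end
[~, peakBin] = max(spec);              % strongest bin in each window (column-wise)
peakFreq = f(peakBin);                 % frequency of that bin, in Hz
```

Each column of spec can then be plotted with plot(f, 20*log10(spec(:,w))). With the synthetic tone, peakFreq comes out as 10 Hz for all four windows.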

K-means Stopping Criteria in Matlab?

I'm implementing the k-means algorithm in MATLAB without using the built-in kmeans function. The stopping criterion is that the centroids no longer change from one iteration to the next, but I cannot implement it in MATLAB. Can anybody help?
Thanks
Setting "no change" as the stopping criterion is a bad idea. There are a few main reasons you shouldn't use a zero-change condition:
1) Even for a well-behaved function, the difference between zero change and a very small change (say 1e-5) could be 1000+ iterations, so you waste time trying to get the values to be exactly the same. Computers usually keep far more digits than we are interested in. If you only need 1 digit of accuracy, why wait for the computer to find an answer within 1e-31?
2) Computers have floating-point errors everywhere. Try some easily reversible matrix operations such as a = rand(3,3); b = a*a*inv(a); a-b. Theoretically this should be 0, but you will see that it isn't. These errors alone could prevent your program from ever stopping.
3) Dithering. Say we have a 1-D k-means problem with 3 numbers and we want to split them into 2 groups. In one iteration the grouping can be a,b vs c; in the next it could be a vs b,c; then a,b vs c again, and so on. This is of course a simplified example, but there can be instances where a few data points dither between clusters, and you end up with a never-ending algorithm. Since those few points keep being reassigned, the change will never be 0.
The solution is to use a delta threshold: subtract the current values from the previous ones, and if the difference is less than a threshold you are done. This on its own is powerful, but as with any loop, you need a backup escape plan, and that is a max_iterations variable. Look at MATLAB's documentation for kmeans: even it has a MaxIter parameter (default 100), so even if your k-means doesn't converge, at least it won't run endlessly. Something like this might work (curr_mu must be initialized before the loop):
%problem specific
max_iter = 100;
%choose a small number appropriate to your problem
thresh = 1e-3;
%ensures it runs the first time
delta_mu = thresh + 1;
num_iter = 0;
%do your kmeans in the loop
while (delta_mu > thresh && num_iter < max_iter)
%save these right away
old_mu = curr_mu;
%calculate new means and variances, this is the standard kmeans iteration
%then store the values in a variable called curr_mu
curr_mu = newly_calculate_values;
%use the two norm to find the delta as a single number, no matter what
%the original dimensionality of mu was. If old_mu - curr_mu is
%0 the norm is also 0, so it behaves well as a distance measure.
delta_mu = norm(old_mu - curr_mu,2);
num_iter = num_iter + 1;
end
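Edit: in case you don't know, the 2-norm of a vector is essentially the Euclidean distance. (If mu is a matrix, norm(...,2) returns the largest singular value instead, which is still 0 only when nothing changed; norm(...,'fro') gives the element-wise Euclidean distance.)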
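To make the skeleton above concrete, here is a minimal self-contained sketch of the whole loop. The data X, the seed, and the choice of initial centroids are hypothetical, and the Frobenius norm is swapped in for the 2-norm so that the delta is the plain Euclidean distance over all centroid coordinates. It relies on implicit expansion (R2016b or later) in the distance computation.

```matlab
rng(1);                                     % fixed seed so the sketch is repeatable
X = [randn(50,2); randn(50,2) + 8];         % hypothetical data: two well-separated blobs
k = 2;
curr_mu = X([1 51], :);                     % initial centroids: one point from each blob
max_iter = 100;                             % backup escape plan
thresh = 1e-3;                              % small number appropriate to the problem
delta_mu = thresh + 1;                      % ensures the loop runs the first time
num_iter = 0;
while delta_mu > thresh && num_iter < max_iter
    old_mu = curr_mu;
    % assignment step: squared distance from every point to every centroid
    d = zeros(size(X,1), k);
    for c = 1:k
        d(:, c) = sum((X - curr_mu(c, :)).^2, 2);   % implicit expansion
    end
    [~, label] = min(d, [], 2);
    % update step: mean of the points assigned to each centroid
    for c = 1:k
        if any(label == c)                  % leave an empty cluster where it is
            curr_mu(c, :) = mean(X(label == c, :), 1);
        end
    end
    delta_mu = norm(old_mu - curr_mu, 'fro');
    num_iter = num_iter + 1;
end
```

On data this well separated the loop should stop after a handful of iterations because delta_mu falls below thresh; max_iter only matters when the centroids keep moving.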

Neural Network Toolbox in Matlab get different results each time even if the initial weights are all zeros

Why do I have to close and reopen the MATLAB windows before running the neural network again in order to get the same result? Which parameters affect this?
EDIT (More details on my problem)
If I don't close all MATLAB windows and reopen them before running another net (for example, with a different number of neurons), the results differ from the ones I get when I close and reopen the windows each time. For example: I run the ANN with 5 neurons in the hidden layer and get R(1)=0.97, then I close and reopen my m-file, run with 5 neurons again, and get R(2)=0.58. If I don't close and reopen, I may get R(1)=0.99 and R(2)=0.7 (R is the regression value). Which parameters cause these answers to differ?
My code is as follows:
clc
clear
for m=6:7
% P is input matrix for training
% T is output matrix
[Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
net=newff(minmax(Pn),[m,1],{'logsig','purelin'},'trainlm');
net.trainParam.show =100;
net.trainParam.lr = 0.09;
net.trainParam.epochs =1000;
net.trainParam.goal = 1e-3;
[net,tr]=train(net,Pn,Tn);
diff= sim(net,Pn);
diff1 = postmnmx(diff,minT,maxT)
%testing===================================================================
[Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
% Pt is input matrix data for testing
% Tt is output matrix data for testing
Ptn = tramnmx(Pt,minP,maxP)
diff= sim(net,Ptn);
diff2 = postmnmx(diff,minT,maxT)
msetr=mse(diff1-T)
msets=mse(diff2-Tt)
y=(1/n)*sum(diff2); % n is number of testing data
R2=((sum((Tt-y).^2))-(sum((diff2-Tt).^2)))/(sum((Tt-y).^2))
net.IW{1,1}=zeros(m,5);
net.LW{2,1}=zeros(2,m);
net.b{1,1}=zeros(m,1);
net.b{2,1}=zeros(2,1);
end
When I run this, the answers for each number of neurons differ from those I get when I don't use a "for ... end" loop and instead run each number of neurons separately after reopening the m-file and the MATLAB windows.
Setting the weights to zero didn't solve my problem either.
I'm not quite sure what you mean by MATLAB windows, but you can control the pop-up of the nntraintool GUI (nntraintool('close')) by putting
yournet.trainParam.showWindow = false;
yournet.trainParam.showCommandLine = false;
after your network yournet's definition but before the training function.
EDIT: my reply to the OP's EDIT
I attached my training and test code based on yours. I tried to learn y = x.^2; my training data is [1,3,5,7,9] for x, and [2,4,6,8] for the test. I should say that I got different weights every time, even with the initial weights all set to zero. That means that, with 6 or 7 hidden-layer nodes, backpropagation won't reach a unique solution. Please see my revisions below:
clc
clear
for m=6:7
% P is input matrix for training
% T is output matrix
P=[1 3 5 7 9];
T=P.^2;
[Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
clear net
net.IW{1,1}=zeros(m,1);
net.LW{2,1}=zeros(1,m);
net.b{1,1}=zeros(m,1);
net.b{2,1}=zeros(1,1);
net=newff(minmax(Pn),[m,1],{'logsig','purelin'},'trainlm');
net.trainParam.show =100;
net.trainParam.lr = 0.09;
net.trainParam.epochs =1000;
net.trainParam.goal = 1e-3;
[net,tr]=train(net,Pn,Tn);
diff= sim(net,Pn);
diff1 = postmnmx(diff,minT,maxT)
%testing===================================================================
[Pn,minP,maxP,Tn,minT,maxT] = premnmx(P,T);
% Pt is input matrix data for testing
% Tt is output matrix data for testing
Pt=[2 4 6 8];
Tt=Pt.^2;
n=length(Pt);
Ptn = tramnmx(Pt,minP,maxP)
diff= sim(net,Ptn);
diff2 = postmnmx(diff,minT,maxT)
msetr=mse(diff1-T)
msets=mse(diff2-Tt)
y=(1/n)*sum(diff2); % n is number of testing data
R2=((sum((Tt-y).^2))-(sum((diff2-Tt).^2)))/(sum((Tt-y).^2))
end
Actually if you add
aa=net.LW(2,1);
aa{1}
right before
[net,tr]=train(net,Pn,Tn);
you will find that the weights are different every time you run it. The Neural Network Toolbox gives different results for two reasons: (1) random data division, and (2) random weight initialization. Even if you zero the initial weights every time to avoid (2), (1) still exists, since dividerand randomizes the order of the input/target pairs.
One workaround is to record the weights from the first run. In my case, I added:
bb = [ -0.2013 -0.8314 0.4717 0.4266 0.1441 -0.6205];
net.LW{2,1} = bb;
bbb = [-16.7956 -16.8096 16.8002 16.8001 -16.8101 -16.8416]';
net.IW{1}=bbb;
bbbb=0.2039;
bbbbb=[-16.8044 -10.0608 3.3530 -3.3563 -10.0588 -16.7584]';
net.b{1}=bbbbb;
net.b{2}=bbbb;
before [net,tr]=train(net,Pn,Tn);, and then the result won't change. You may need to work this out for your own case by recording the net.b, net.IW, and net.LW values and reusing them every time in your loop (save your net after the first trial run, and load it to get the values of net.b, net.IW, and net.LW in your loop).
Yet I don't think this method makes much sense. What I highly recommend instead is to:
1) Initialize the weights randomly.
2) Use an outer loop that specifies the number of hidden nodes, m.
3) Use an inner loop that creates a net with a new set of random initial weights for each m; then trains, evaluates, and stores the R2 in a 2-D matrix.
4) Search the stored results for the smallest net with an acceptable performance, and record its m.
5) Run several times in a loop with the determined m values and only store the index or weights of the current best design.
6) Select the weights with the best performance.
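The nested-loop procedure above could be sketched roughly as follows. Several things here are assumptions rather than part of the original answer: feedforwardnet is used in place of the older newff, the number of restarts ntrials is arbitrary, the training set is extended to 1:9 for illustration, and rng is seeded before each net is created so that both the weight initialization and the random data division should repeat from run to run. It requires the neural network toolbox (Deep Learning Toolbox in recent releases).

```matlab
P = 1:9;                             % toy inputs
T = P.^2;                            % toy targets, y = x.^2 as in the answer above
hiddenSizes = 6:7;                   % outer loop: candidate hidden-layer sizes
ntrials = 5;                         % hypothetical number of random restarts per size
R2 = zeros(numel(hiddenSizes), ntrials);
for a = 1:numel(hiddenSizes)
    for trial = 1:ntrials
        rng(trial);                  % fixed seed: same initial weights and data split on every run
        net = feedforwardnet(hiddenSizes(a), 'trainlm');
        net.trainParam.showWindow = false;      % no training GUI
        net = train(net, P, T);
        Y = net(P);                  % network output on all samples
        R2(a, trial) = 1 - sum((T - Y).^2) / sum((T - mean(T)).^2);
    end
end
[bestR2, idx] = max(R2(:));                       % best design over all sizes and restarts
[bestSizeIdx, bestTrial] = ind2sub(size(R2), idx);
bestSize = hiddenSizes(bestSizeIdx);              % hidden-layer size of the best design
```

Re-creating the best net later only needs bestSize and bestTrial, since seeding with the same value before feedforwardnet and train is expected to reproduce the same weights.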

Find a Binary Data Sequence in a Signal

Here's my goal:
I'm trying to find a way to search through a data signal and find (index) locations where a known, repeating binary data sequence is located. Then, because the spreading code and demodulation is known, pull out the corresponding chip of data and read it. Currently, I believe xcorr will do the trick.
Here's my problem:
I can't seem to interpret my result from xcorr or xcorr2 to give me what I'm looking for. I'm either having a problem cross-referencing from the vector location of my xcorr function to my time vector, or a problem properly identifying my data sequence with xcorr, or both. Other possibilities may exist.
Where I am at/What I have:
I have created a random BPSK signal that consists of the data sequence of interest and garbage data over a repeating period. I have tried processing it using xcorr, which is where I am stuck.
Here's my code:
%% Clear Variables
clc;
clear all, close all;
%% Create random data
nbits = 2^10;
ngarbage = 3*nbits;
data = randi([0,1],1,nbits);
garbage = randi([0,1],1,ngarbage);
stream = horzcat(data,garbage);
%% Convert from Unipolar to Bipolar Encoding
stream_b = 2*stream - 1;
%% Define Parameters
%%% Variable Parameters
nsamples = 20*nbits;
nseq = 5 %# Iterate stream nseq times
T = 10; %# Number of periods
Ts = 1; %# Symbol Duration
Es = Ts/2; %# Energy per Symbol
fc = 1e9; %# Carrier frequency
%%% Dependent Parameters
A = sqrt(2*Es/Ts); %# Amplitude of Carrier
omega = 2*pi*fc %# Frequency in radians
t = linspace(0,T,nsamples) %# Discrete time from 0 to T periods with nsamples samples
nspb = nsamples/length(stream) %# Number of samples per bit
%% Creating the BPSK Modulation
%# First we have to stretch the stream to fit the time vector. We can quickly do this using
%# simple matrix manipulation.
% Replicate each bit nspb/nseq times
repStream_b = repmat(stream_b',1,nspb/nseq);
% Transpose and replicate nseq times to be able to fill to t
modSig_proto = repmat(repStream_b',1,nseq);
% Transpose column by column, then rearrange into a row vector
modSig = modSig_proto(:)';
%% The Carrier Wave
carrier = A*cos(omega*t);
%% Modulated Signal
sig = modSig.*carrier;
Using XCORR
I use xcorr2() to eliminate the zero padding effect of xcorr on unequal vectors. See comments below for clarification.
corr = abs(xcorr2(data,sig)); %# pull the absolute correlation between data and sig
[val,ind] = sort(corr(:),'descend') %# sort the correlation data and assign values and indices
ind_max = ind(1:nseq); %# pull the nseq highest valued indices and send to ind_max
Now, I think this should pull the five highest correlations between data and sig. These should correspond to the end bit of data in the stream for every iteration of stream, because I would think that is where the data would most strongly cross-correlate with sig, but they do not. Sometimes the maxes are not even one stream length apart. So I'm confused here.
Question
In a three part question:
Am I missing a certain step? How do I use xcorr in this case to find where data and sig are most strongly correlated?
Is my entire method wrong? Should I not be looking for the max correlations?
Or should I be attacking this problem from another angle, id est, not use xcorr and maybe use filter or another function?
Your overall method is great and makes a lot of sense. The problem you're having is that you're getting some actual correlation with your garbage data. I noticed that you shifted all of your stream to be zero-centered, but didn't do the same to your data. If you zero-center the data, your correlation peaks will be better defined (at least that worked when I tried it).
data = 2*data -1;
Also, I don't recommend using a simple sort to find your peaks. If you have a wide peak, which is especially possible with a noisy signal, you could have two high points right next to each other. Find a single maximum, and then zero that point and a few neighbors. Then just repeat however many times you like. Alternatively, if you know how long your epoch is, only do a correlation with one epoch's worth of data, and iterate through the signal as it arrives.
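The "find a maximum, suppress its neighbours, repeat" idea can be sketched as below. Everything in it is illustrative: the sequence is shortened to 64 bits, the stream is baseband +1/-1 with no carrier, the guard width is a guess, and the sliding correlation is written with conv so that no toolbox is needed. Suppressed samples are set to -Inf rather than 0 so that negative correlation values cannot be picked by mistake.

```matlab
rng(0);                                    % fixed seed so the sketch is repeatable
nbits = 64;                                % hypothetical short sequence length
nseq = 5;                                  % number of repetitions in the stream
data = 2*randi([0,1], 1, nbits) - 1;       % known sequence, bipolar (+1/-1)
garbage = 2*randi([0,1], 1, 3*nbits) - 1;  % filler bits, also bipolar
stream = repmat([data, garbage], 1, nseq); % data followed by garbage, nseq times

% Sliding correlation: c(n) is the match for data starting at stream(n)
c = conv(stream, fliplr(data), 'valid');

guard = nbits;                             % samples suppressed on each side of a peak
peaks = zeros(1, nseq);
work = c;                                  % working copy that gets suppressed
for r = 1:nseq
    [~, peaks(r)] = max(work);             % strongest remaining peak
    lo = max(1, peaks(r) - guard);
    hi = min(numel(work), peaks(r) + guard);
    work(lo:hi) = -Inf;                    % remove this peak and its neighbours
end
peaks = sort(peaks);                       % start index of each copy of data
```

Here the copies of data start every 4*nbits samples, so peaks should come out as 1, 257, 513, 769 and 1025.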
With @David K's and @Patrick Mineault's help I managed to track down where I went wrong. First, @Patrick Mineault suggested I flip the signals. The best way to see what you would expect from the result is to slide the small vector along the larger, searched vector. So
corr = xcorr2(sig,data);
Then I like to chop off the end, because it's just extra. I did this with a trim function I made that simply takes the signal you're sliding and trims its irrelevant pieces off the end of the xcorr result.
trim = @(x,s2) x(1:end - (length(s2) - 1));
corr = trim(corr,data);
Then, as @David K suggests, you need to have the data stream you're looking for encoded the same way as your searched signal. So in this case
data = 2*data-1;
Second, if you just have your data at its original bit length, and not at its stretched, iterated length, it can be found in the signal but it will be VERY noisy. To reduce the noise, simply stretch the data to match its stretched length in the iterated signal. So
rdata = repmat(data',1,nspb/nseq);
rdata = repmat(rdata',1,nseq);
data = rdata(:)';
Now finally, we should have crystal clear correlations for this case. And to pull out the maxes that should correspond to those correlations I wrote
[sortedValues sortIndex] = sort(corr(:),'descend');
c = 0 ;
for r = 1 : length(sortedValues)
if sortedValues(r,:) == max(corr)
c = c + 1;
maxIndex(1,c) = sortIndex(r,:);
else break % If you don't do this, you get loop lock
end
end
Now c should end up being nseq for this case and you should have 5 index times where the corrs should be! You can easily pull out the bits with another loop and c or length(maxIndex). I've also made this into a more "real world" toy script, where there is a data stream, doppler, fading, and it's over a time vector in seconds instead of samples.
Thanks for the help!
Try flipping the signal, i.e.:
corr = abs(xcorr2(data,sig(end:-1:1)));
Is that any better?