FFT MATLAB code correct? - matlab

I'd like to generate the frequency spectrum of seven concatenated cosine functions.
I am unsure whether my code is correct; in particular, whether N = time*freq*7 is correct, or whether it should be N = time*freq (without the times seven).
My code is as follow:
sn = [1, 2, 3, 4, 5, 6, 7, 8];
time = 1;
freq = 22050;
N = time*freq*7;
dt = 1/freq;
t = 0 : dt : time - dt;
y = #(sn, phasePosNeg) cos(2*pi*(1200-100*sn) * t + phasePosNeg * sn*pi/10);
f = [y(sn(1), 1), y(sn(2), -1), y(sn(3), 1), y(sn(4), -1), y(sn(5), 1), y(sn(6), -1), y(sn(7), 1)];
F = abs(fftshift(fft(f)))/N;
df = freq/N;
faxis = -freq/2 : df : (freq/2-1/freq);
plot(faxis, F);
grid on;
axis([-1500, 1500, 0, 0.6]);
title('Frequency Spectrum Of Concatenated Cosine Functions');
xlabel('Frequency (Hz)');
ylabel('Magnitude');
I suppose the essense of my question is:
Should the height of the spikes equal 1/7 of 0.5, or simply 0.5? (All cosine functions have an amplitude of 1.)
Thank you.

Let me correct/help you on a few things:
1) the fourier transform is typically displayed in dB for its magnitute. 20*log base10(FFT coeff)
2) there is no need to divide your FFT amplitudes by any value of N.
F = abs(fftshift(fft(f)))/N; %get rid of the N or N =1
3) if N is the number of points in your FFT N = size(t); because you've taken that many samples of the sin/cos functions
4) When plotting the function remember that the FFT spans from -Pi to + Pi and you need to remap it to the frequency spectrum using your sampling frequency
5) becasue of the large phase discontinuties between these functions, dont expect the forrier transform the be a bunch of large narrow peaks. (otherwise Phase Modulation would be the modulationscheme of choice... zero bandwidth)

This is clearly homework, so I'm just going to give some direction: In the concatenated case, think of it as though you're adding six waveforms, which are each padded by zeros (6x the length of the waveform), and then offset these so they don't overlap and then added together to form the concatenated waveform. In the case where you're adding the FFTs of the individual waveforms, also remember that you're assuming that they are periodic. So you basically need to handle the zero padding so that you're comparing apples to apples. One check, of course, is to just use one waveform throughout and make sure it works in this case (and this should return exactly the same result since the FFT assumes that waveform is periodic -- that is, infinitely concatenated to itself).
Thinking of this in terms of Parseval's Theorem will probably be helpful in figuring out how to interpret and normalize things.
It is correct to use N=(7*time)*freq, since your actual time of the waveform is 7*time, regardless of how you constructed it.
Several of the comments talk about discontinuities, but it should be noted that these usually exist in the FFT anyway, since the FFT waveform is assumed to be periodic, and that usually means that there are effectively discontinuities at the boundaries even in the non-concatenated case.

Related

Cross correlate data that contains "spikes"

When using xcorr in MATLAB to cross correlate 2 related data sets, everything works as expected - I see a correlation peak and the lag reported is correct. However, when I use xcorr to cross correlate unrelated data sets where both data sets contain 1 cluster of "spikes", I see a correlation peak and the lag reported is the distance between the 2 spikes.
In this image:
x is a random data series. y is also a random data series. Both x and y have 30 random peaks inserted into the series in sequence. In theory, there should be no correlation between the 2 data sets since they are both very different. However, it can be seen from the 3rd plot that there is a very strong correlation between the 2 data sets. The code used to generate this figure is at the bottom of this post.
I've tried to filter the spikes using a few different mechanisms (rolling rms power ... etc) before performing the xcorr. This has worked in some cases but not all. I feel like I need a different approach to the problem, maybe an alternative to xcorr. I do understand why x and y cross correlate using xcorr. Is there another cross correlation tool that I can use? Note x and y will never be exactly the same, they will only ever be approximately the same but in normal operation, it's not the spikes that should make them correlate.
Any suggestions on how to tell if x and y correlate while also ignoring the "spikes"?
Here is some my example code:
x = rand(1, 3000);
x = x - 0.5;
y = rand(1, 3000);
y = y - 0.5;
% insert the impulses into the data
impulse_width = 30;
impulse_max_height = 6;
x_impulse_start = 460;
y_impulse_start = 120;
rand_insert_x = rand(1, impulse_width);
rand_insert_x = (rand_insert_x - 0.5) * 2 * impulse_max_height;
rand_insert_y = rand(1, impulse_width);
rand_insert_y = (rand_insert_y - 0.5) * 2 * impulse_max_height;
x(1,x_impulse_start:x_impulse_start + impulse_width - 1) = rand_insert_x;
y(1,y_impulse_start:y_impulse_start + impulse_width - 1) = rand_insert_y;
subplot(3, 1, 1);
plot(x);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: x');
subplot(3, 1, 2);
plot(y);
ylim([-impulse_max_height impulse_max_height]);
title('random data series: y');
[c, l] = xcorr(x, y);
subplot(3, 1, 3);
plot(l, c);
title('correlation using xcorr');
The way to solve this is to use normalized cross-correlation.
In normalize cross-correlation the correlation is 1 when the signals are exactly the same, and less when they are not. You can see it as "percentage of similarity".
To do that in MATLAB, you just need to add 'coeff' as an argument to your code.
So, if I change your code to [c, l] = xcorr(x, y,'coeff'); the plot I get is the nest:
(note I changed sample size to 600 to make it more readable)
the cross-correlation gets to 0.3 there, so not much. However, if we change your code lines to
x(1,x_impulse_start:x_impulse_start + impulse_width - 1) = rand_insert_x;
y(1,y_impulse_start:y_impulse_start + impulse_width - 1) = rand_insert_x;
and insert the same random patter in both signals, then we get:
Now, the cross-correlation gets to a high value, almost 1, but not one, because the big random pattern there is the same, but the rest of the signal is not.
The cross-correlation is the convolution of two signals. Imagine that during the cross correlation, the two signals are at lags like I have shown here (x-axis labels should be completely ignored):
The positive (+) spike in series x (~ sample 490) is multiplied by the negative (-) spike in series y (~ sample 121), resulting in a large negative value in the xcorr, which we actually see in the bottom plot (~ sample 315). This large negative value will be added by something close to 0 since the rest of the signals are indeed low-power noise. I am afraid that no matter what xcorr function you use, you should get the same result. In fact, if there is another function that claims to be a cross-correlator, but doesn't give the same result as xcorr() then that function should not be called a cross-correlator. I hope this helps.
My understanding of the question is "How do I remove these spikes from my data?"
The answer is find something characteristic about those spikes, and then test each time window for that characteristic. If that test passes, then you have detected a spike, and you should remove that data.
For example, you might say "A spike is any time point that has an absolute value greater than some threshold." You determine the threshold using your data, say 0.2. Then you do something like
spikeless_data = data .* (abs(data)<0.2);
which copies data when abs(data)<0.2 and sets it to 0 when not.
You could also notice that a characteristic of spikes is that their derivative is very large, which might be more robust than a simple threshold. This would correspond to spikeless_data = data .* ([abs(diff(data)), 0] < some_threshold);
You will have to play around to find something that works for your data.

Cross Correlation with signals of different length in MATLAB

I have 2 signals of different lengths where the shorter signal is the same as the longer n samples shifted. I want to find the maximum normalized cross-correlation between these two signals. Since it is normalized should give 1. The xcorr function lags vary from -441 to 441 samples.
This will be used for slightly different signals later on, but for now I'm trying with equal signals.
Zero-padding (as I've done in the image) gives me a wrong correlation since the zeros become a part of the correlation calculation.
Any ideas how to accomplish this?
Suppose x and y are the shorter and longer signals you have.
nx = length(x);
ny = length(y);
cc = nan(1,ny-nx+1);
for ii = 0 : ny-nx
id = (1:nx) + ii;
cc(ii+1) = sum(x.*y(id))/(sqrt(sum(x.^2)*sum(y(id).^2)));
end
[ccmx,idmx] = max(cc);
Now you have the position of the maximum cross coefficient.
if the lag starts from -441 as you gave (where x and y are aligned at the left). The max should be at lag=idmx-442.
If you want to remove the first and last zeros of a vector you can use this here
A = A(find(A~=0, 1, 'first'):find(A~=0, 1, 'last'));
To use it directly in your correlation, try to use this here (where A is your red line and B the blue one)
xcorr(A(find(A~=0, 1, 'first'):find(A~=0, 1, 'last')), B(find(A~=0, 1, 'first'):find(A~=0, 1, 'last')));

How to find the frequency response of the Rosenberg Glottal Model

Is there an easy way to calculate the frequency response of the following function?
I tried using heaviside function but with no luck.
Basically I want to write a function to return the frequency response based on input N1 and N2 and also the number of points (lets say x) between 0 and pi
The output would be a vector which returns x values for the frequency response for corresponding frequencies => 0:pi/x:pi
Assuming that N1 + N2 < num_points, where num_points is the length of the sequence, you can simply write the function like so:
function [gr] = rosenburg(N1, N2, num_points)
gr = zeros(num_points,1);
range1 = 0:N1;
range2 = N1+1:N1+N2;
gr(range1+1) = 0.5*(1 - cos(pi*range1/N1));
gr(range2+1) = cos(pi*(range2-N1) / (2*N2));
end
The function prototype, rosenburg takes in N1, N2 and the total number of points you want this function to take in, num_points. How this code works is that we first allocate an array that is all zeroes of size num_points. We then compute two linear ranges: One from 0 <= n <= N1 and the other from N1 < n <= N2. Note that the second range starts by offsetting N1 by 1 because we have already computed the value at n = N1. Once we compute these ranges, we simply apply the right relationship in the right ranges. Note that when I'm assigning the relationships to the correct intervals in the array, I need to offset by 1 because MATLAB begins indexing arrays at index 1. The rest of the values are zero due to the initialization at the beginning of the function.
Now, if you want to find the frequency response of this signal, just use fft which is the Fast Fourier Transform. It's the classic method to find the frequency domain version of a discrete input signal on a numerical basis. As such, once you create your signal using the rosenburg function, then throw this into the FFT function. How you call it is like so:
X = fft(gr);
This computes the N point FFT, where N is the length of the signal gr. Alternatively, you can provide the number of points you want to compute the FFT for. Specifically:
X = fft(gr, N);
Basically, the higher N is, the finer or granular the frequency components will be. Note that the frequency axis is normalized between 0 to 2*pi, and so the higher N is, the finer resolution you will have between neighbouring points on the axis. Specifically, each point on this axis has the following frequency:
w = i*(2*pi)/x;
i would be the index on the x-axis (0, 1, 2, ..., num_points-1) and x would be the total number of points for the FFT. Normally, people show the spectrum between -pi <= w <= pi, and so some people apply fftshift to shift the spectrum so that the DC component is located at the centre of the spectrum, which is how we naturally perceive the spectrum to be.
When you say "frequency response", I believe you are referring to the magnitude, and so use abs to calculate the complex magnitude of each value, as the fft is generally complex valued. Therefore, assuming that you wish to compute the FFT to be as many points as the length of your signal, and let's say we choose N1 = 4, N2 = 8 and we want 64 points, and we want to plot the spectrum. Simply do this:
gr = rosenburg(4, 8, 64);
X = fft(gr);
Xshift = fftshift(X);
plot(linspace(-pi,pi,64), abs(Xshift));
grid;
The above code will shift the spectrum, then plot its magnitude between -pi to pi. This is what I get:
As an illustration, this is what the spectrum looks like before we apply fftshift:
Here's the code to generate the above figure:
plot(linspace(0,2*pi,64), abs(X));
grid;
You can see that the spectra is symmetric. Right at the frequency pi, you can see that it is mirror reflected, which makes sense as the range from pi to 2*pi, precisely maps to -pi to 0. Because the signal is real, the spectrum is symmetric. In fact, we can call this signal Hermitian symmetric. Obviously, the frequency components are a bit sparsely spaced. It may be better to increase the total number of points to something like 256. This is what I get when I change the number of points to 256:
Pretty smooth! Now, if you want to extract the frequency components from 0 to pi, you need to extract half of the frequency decomposition that is stored in X. Therefore, you would simply do:
f = X(1:numel(X)/2);
numel determines how many elements are in an array or matrix. However, remember that each frequency point was defined as:
w = i*(2*pi)/x
You specifically want:
w = i*pi/x
As such, you'll need to compute the FFT at twice the size of your signal first, then extract half of the spectra in the same way. For example, for 64 points:
gr = rosenburg(4, 8, 64);
X = fft(gr, 128);
f = X(1:numel(X)/2);
This should hopefully get you started. Good luck!

How does the sgolay function work in Matlab R2013a?

I have a question about the sgolay function in Matlab R2013a. My database has 165 spectra with 2884 variables and I would like to take the first and second derivatives of them. How might I define the inputs K and F to sgolay?
Below is an example:
sgolay is used to smooth a noisy sinusoid and compare the resulting first and second derivatives to the first and second derivatives computed using diff. Notice how using diff amplifies the noise and generates useless results.
K = 4; % Order of polynomial fit
F = 21; % Window length
[b,g] = sgolay(K,F); % Calculate S-G coefficients
dx = .2;
xLim = 200;
x = 0:dx:xLim-1;
y = 5*sin(0.4*pi*x)+randn(size(x)); % Sinusoid with noise
HalfWin = ((F+1)/2) -1;
for n = (F+1)/2:996-(F+1)/2,
% Zero-th derivative (smoothing only)
SG0(n) = dot(g(:,1), y(n - HalfWin: n + HalfWin));
% 1st differential
SG1(n) = dot(g(:,2), y(n - HalfWin: n + HalfWin));
% 2nd differential
SG2(n) = 2*dot(g(:,3)', y(n - HalfWin: n + HalfWin))';
end
SG1 = SG1/dx; % Turn differential into derivative
SG2 = SG2/(dx*dx); % and into 2nd derivative
% Scale the "diff" results
DiffD1 = (diff(y(1:length(SG0)+1)))/ dx;
DiffD2 = (diff(diff(y(1:length(SG0)+2)))) / (dx*dx);
subplot(3,1,1);
plot([y(1:length(SG0))', SG0'])
legend('Noisy Sinusoid','S-G Smoothed sinusoid')
subplot(3, 1, 2);
plot([DiffD1',SG1'])
legend('Diff-generated 1st-derivative', 'S-G Smoothed 1st-derivative')
subplot(3, 1, 3);
plot([DiffD2',SG2'])
legend('Diff-generated 2nd-derivative', 'S-G Smoothed 2nd-derivative')
Taking derivatives in an inherently noisy process. Thus, if you already have some noise in your data, indeed, it will be magnified as you take higher order derivatives. Savitzky-Golay is a very useful way of combining smoothing and differentiation into one operation. It's a general method and it computes derivatives to an arbitrary order. There are trade-offs, though. Other special methods exist for data with a certain structure.
In terms of your application, I don't have any concrete answers. Much depends on the nature of the data (sampling rate, noise ratio, etc.). If you use too much smoothing, you'll smear your data or produce aliasing. Same thing if you over-fit the data by using high order polynomial coefficients, K. In your demo code you should also plot the analytical derivatives of the sin function. Then play with different amounts of input noise and smoothing filters. Such a tool with known exact answers may be helpful if you can approximate aspects of your real data. In practice, I try to use as little smoothing as possible in order to produce derivatives that aren't too noisy. Often this means a third-order polynomial (K = 3) and a window size, F, as small as possible.
So yes, many suggest that you use your eyes to tune these parameters. However, there has also been some very recent research on choosing the coefficients automatically: On the Selection of Optimum Savitzky-Golay Filters (2013). There are also alternatives to Savitzky-Golay, e.g., this paper based on regularization, but you may need to implement them yourself in Matlab.
By the way, a while back I wrote a little replacement for sgolay. Like you, I only needed the second output, the differentiation filters, G, so that's all it calculates. This function is also faster (by about 2–4 times):
function G=sgolayfilt(k,f)
%SGOLAYFILT Savitzky-Golay differentiation filters
s = vander(0.5*(1-f):0.5*(f-1));
S = s(:,f:-1:f-k);
[~,R] = qr(S,0);
G = S/R/R';
A full version of this function with input validation is available on my GitHub.

Matlab - Signal Noise Removal

I have a vector of data, which contains integers in the range -20 20.
Bellow is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval -2, 2, as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest me a way of doing this?
I have tried mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking at using the wavelet transform toolbox, but I don't know exact how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend me a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right. If your samples are positive and negative (as yours are) you should take the abs. value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scale by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing, is by putting all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do not that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be removed, the other will be kept). So for displaying, this might be all you need, for further processing it might depend on what you are trying to do.
To eliminate the low amplitude peaks, you're going to equate all the low amplitude signal to noise and ignore.
If you have any apriori knowledge, just use it.
if your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy, and find this "on the fly" then, use kmeans of 3. It's in the statistics toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note, these and every other technique I've seen on this thread is assuming you are doing post processing. If you are doing this processing in real time, things will have to change.