Harmonic mean when a DC signal is present - matlab

I have an output from a noisy signal, saved as a set of cosines.
I have a set of frequencies from 0 to x Hz (x is a large number), and a set, of the same size, of amplitudes.
I want to work out the harmonic mean of the frequencies present, when the weighting of the frequency is the magnitude of the corresponding amplitude.
For example:
If I have a set of frequencies
[ 1 , 2 , 3] and amplitudes [ 10, 100, 1000 ] (such that the cosine with frequency 1 has amplitude 10, etc.). Then, the harmonic mean of the frequencies is 2.8647.
However, I run into problems when I have a zero frequency (a "DC" component) - the harmonic mean is just zero!
The real life problem is a very big set of cosines, starting with a zero frequency, going up to several GHz. Much of the signal is weighted in a portion of the spectrum and I want to compare a simple weighted mean of the spectrum with a harmonic mean.
The way around this (it seems a cheap way) is to ignore the zero frequency - it is only one frequency out of tens of thousands. But is there a correct way to do this?

Below is the equation for the weighted harmonic mean:
Applied to your example it's:
x = 1:3;
w = logspace(1,3,3); % [10 100 1000]
sum(w)/sum(w./x); % 2.8220
You can see that if one of the x values is 0, the sum in the denominator would be infinite. If you manually set the weight of this value to 0, you would have a 0/0 scenario in the bottom sum (which evaluates to NaN). Technically speaking - you can't have an x of 0 in the computation of this type of mean without getting a result of 0.
I think it's quite clear that this isn't the right tool to handle a DC signal. Several things come to mind in order to get some meaningful information:
It sounds reasonable to ignore the DC signal altogether in both means.
Perhaps you would be better off ignoring it for the purpose of the harmonic mean and add it afterwards for compatibility with the simple mean.
At the end of the day, you need to decide what is the point you're trying to make with this, and then process the data accordingly.

Related

How to change value of a signal's range in Matlab

Let's say that I have a signal in Matlab like this
x = cos(2*pi*10*t) + cos(2*pi*20*t) + cos(2*pi*50*t);
And I want to change the values between 20 and 30 hz into 0. How can I do that? I mean, those values generated from the x formula, I want to change them a little bit.
You can do it by performing FFT over x and setting to zero those values that are between 20 and 30 Hz then applying the FFT inverse on the previous values and you should get the signal without those frequencies. However, you may lose valuable information or the signal might just not look as you wish. Therefore, I recommend you to use a "Bandstop filter". The band stop filter will receive the cutoff frequencies (the limit frequencies you want to work with) and some other parameters. The bandstop filter basically removes from the signal the frequencies that you specify. And the good part is that it can be done as easy as doing what follows:
First you have to build the filter. To do so, you need to indicate the filter order which can be defined as you wish. Usually a second order works good. Also, you have to be aware of your sampling rate Fs.
d = designfilt('bandstopiir','FilterOrder',2, ...
'HalfPowerFrequency1',20,'HalfPowerFrequency2',30, ...
'SampleRate',Fs);
Now you only need to apply the filter to your desired signal.
filtered_signal_x = filtfilt(d, x)
Now, filtered_signal_x should not have the frequencies you wanted to delete. By using the bandstop you don't have to mess with the FFT and that kind of stuff and is a way faster so I think its the best option.
You can either use a filter, or you can filter it by yourself by going into Fourier space and explicitly setting the signal on the frequencies you need to zero. After that, you need to go back to the time domain. Here is a code:
t=0:0.01:0.99; % time
x = cos(2*pi*10*t) + cos(2*pi*20*t) + cos(2*pi*50*t); %signal
xf=fftshift(fft(x)); %Fourier signal
N=size(x,2); % Size of the signal
frequency=2*pi*[-N/2:N/2-1]; %frequency range
frequencyrangeplus=find(frequency/(2*pi)>=20 & frequency/(2*pi)<=30); %find positive frequencies in the required range
frequencyrangeminus=find(frequency/(2*pi)<=-20 & frequency/(2*pi)>=-30); %find negative frequencies in the required range
xf(frequencyrangeplus)=0; %set signal to zero at positive frequencies range
xf(frequencyrangeminus)=0; %set signal to zero at nagative frequencies range
xnew=ifft(ifftshift(xf)); %get the new signal in time domain
xcheck= cos(2*pi*10*t) + cos(2*pi*50*t); % to check the code
max(abs(xcheck-xnew)) % maximum difference

Confusion on how the frequency axis when plotting the FFT magnitude is created

This code takes FFT of a signal and plots it on a new frequency axis.
f=600;
Fs=6000;
t=0:1/Fs:0.3;
n=0:1:length(t);
x=cos(2*pi*(400/Fs)*n)+2*sin(2*pi*(1100/Fs)*n);
y=fft(x,512);
freqaxis=Fs*(linspace(-0.5,0.5, length(y)));
subplot(211)
plot(freqaxis,fftshift(abs(y)));
I understand why we used fftshift because we wanted to see the signal centered at the 0 Hz (DC) value and it is better for observation.
However I seem to be confused about how the frequency axis is defined. Specifically, why did we especially multiply the range of [-0.5 0.5] with Fs and we obtain the [-3000 3000] range? It could be [-0.25 0.25].
The reason why the range is between [-Fs/2,Fs/2] is because Fs/2 is the Nyquist frequency. This is the largest possible frequency that has the ability of being visualized and what is ultimately present in your frequency decomposition. I also disagree with your comment where the range "could be between [-0.25,0.25]". This is contrary to the definition of the Nyquist frequency.
From signal processing theory, we know that we must sample by at least twice the bandwidth of the signal in order to properly reconstruct the signal. The bandwidth is defined as the largest possible frequency component that can be seen in your signal, which is also called the Nyquist Frequency. In other words:
Fs = 2*BW
The upper limit of where we can visualize the spectrum and ultimately the bandwidth / Nyquist frequency is defined as:
BW = Fs / 2;
Therefore because your sampling frequency is 6000 Hz, this means the Nyquist frequency is 3000 Hz, so the range of visualization is [-3000,3000] Hz which is correct in your magnitude graph.
BTW, your bin centres for each of the frequencies is incorrect. You specified the total number of bins in the FFT to be 512, yet the way you are specifying the bins is with respect to the total length of the signal. I'm surprised why you don't get a syntax error because the output of the fft function should give you 512 points yet your frequency axis variable will be an array that is larger than 512. In any case, that is not correct. The frequency at each bin i is supposed to be:
f = i * Fs / N, for i = 0, 1, 2, ..., N-1
N is the total number of points you have in your FFT, which is 512. You originally had it as length(y) and that is not correct... so this is probably why you have a source of confusion when examining the frequency axis. You can read up about why this is the case by referencing user Paul R's wonderful post here: How do I obtain the frequencies of each value in an FFT?
Note that we only specify bins from 0 up to N - 1. To account for this when you specify the bin centres of each frequency, you usually specify an additional point in your linspace command and remove the last point:
freqaxis=Fs*(linspace(-0.5,0.5, 513); %// Change
freqaxis(end) = []; %// Change
BTW, the way you've declared freqaxis is a bit obfuscated to me. This to me is more readable:
freqaxis = linspace(-Fs/2, Fs/2, 513);
freqaxis(end) = [];
I personally hate using length and I favour numel more.
In any case, when I run the corrected code to specify the bin centres, I now get this plot. Take note that I inserted multiple data cursors where the peaks of the spectrum are, which correspond to the frequencies for each of the cosines that you have declared (400 Hz and 1100 Hz):
You see that there are some slight inaccuracies, primarily due to the number of bins you have specified (i.e. 512). If you increased the total number of bins, you will see that the frequencies at each of the peaks will get more accurate.

Sampling at exactly Nyquist rate in Matlab

Today I have stumbled upon a strange outcome in matlab. Lets say I have a sine wave such that
f = 1;
Fs = 2*f;
t = linspace(0,1,Fs);
x = sin(2*pi*f*t);
plot(x)
and the outcome is in the figure.
when I set,
f = 100
outcome is in the figure below,
What is the exact reason of this? It is the Nyquist sampling theorem, thus it should have generated the sine properly. Of course when I take Fs >> f I get better results and a very good sine shape. My explenation to myself is that Matlab was having hardtime with floating numbers but I am not so sure if this is true at all. Anyone have any suggestions?
In the first case you only generate 2 samples (the third input of linspace is number of samples), so it's hard to see anything.
In the second case you generate 200 samples from time 0 to 1 (including those two values). So the sampling period is 1/199, and the sampling frequency is 199, which is slightly below the Nyquist rate. So there is aliasing: you see the original signal of frequency 100 plus its alias at frequency 99.
In other words: the following code reproduces your second figure:
t = linspace(0,1,200);
x = .5*sin(2*pi*99*t) -.5*sin(2*pi*100*t);
plot(x)
The .5 and -.5 above stem from the fact that a sine wave can be decomposed as the sum of two spectral deltas at positive and negative frequencies, and the coefficients of those deltas have opposite signs.
The sum of those two sinusoids is equivalent to amplitude modulation, namely a sine of frequency 99.5 modulated by a sine of frequency 1/2. Since time spans from 0 to 1, the modulator signal (whose frequency is 1/2) only completes half a period. That's what you see in your second figure.
To avoid aliasing you need to increase sample rate above the Nyquist rate. Then, to recover the original signal from its samples you can use an ideal low pass filter with cutoff frequency Fs/2. In your case, however, since you are sampling below the Nyquist rate, you would not recover the signal at frequency 100, but rather its alias at frequency 99.
Had you sampled above the Nyquist rate, for example Fs = 201, the orignal signal could ideally be recovered from the samples.† But that would require an almost ideal low pass filter, with a very sharp transition between passband and stopband. Namely, the alias would now be at frequency 101 and should be rejected, whereas the desired signal would be at frequency 100 and should be passed.
To relax the filter requirements you need can sample well above the Nyquist rate. That way the aliases are further appart from the signal and the filter has an easier job separating signal from aliases.
† That doesn't mean the graph looks like your original signal (see SergV's answer); it only means that after ideal lowpass filtering it will.
Your problem is not related to the Nyquist theorem and aliasing. It is simple problem of graphic representation. You can change your code that frequency of sine will be lower Nyquist limit, but graph will be as strange as before:
t = linspace(0,1,Fs+2);
plot(sin(2*pi*f*t));
Result:
To explain problem I modify your code:
Fs=100;
f=12; %f << Fs
t=0:1/Fs:0.5; % step =1/Fs
t1=0:1/(10*Fs):0.5; % step=1/(10*Fs) for precise graphic representation
subplot (2, 1, 1);
plot(t,sin(2*pi*f*t),"-b",t,sin(2*pi*f*t),"*r");
subplot (2, 1, 2);
plot(t1,sin(2*pi*f*t1),"g",t,sin(2*pi*f*t),"r*");
See result:
Red star - values of sin(2*pi*f) with sampling rate of Fs.
Blue line - lines which connect red stars. It is usual data representation of function plot() - line interpolation between data points
Green curve - sin(2*pi*f)
Your eyes and brain can easily understand that these graphs represent the sine
Change frequency to more high:
f=48; % 2*f < Fs !!!
See on blue lines and red stars. Your eyes and brain do not understand now that these graphs represent the same sine. But your "red stars" are actually valid value of sine. See on bottom graph.
Finally, there is the same graphics for sine with frequency f=50 (2*f = Fs):
P.S.
Nyquist-Shannon sampling theorem states for your case that if:
f < 2*Fs
You have infinite number of samples (red stars on our plots)
then you can reproduce values of function in any time (green curve on our plots). You must use sinc interpolation to do it.
copied from Matlab Help:
linspace
Generate linearly spaced vectors
Syntax
y = linspace(a,b)
y = linspace(a,b,n)
Description
The linspace function generates linearly spaced vectors. It is similar to the colon operator ":", but gives direct control over the number of points.
y = linspace(a,b) generates a row vector y of 100 points linearly spaced between and including a and b.
y = linspace(a,b,n) generates a row vector y of n points linearly spaced between and including a and b. For n < 2, linspace returns b.
Examples
Create a vector of 100 linearly spaced numbers from 1 to 500:
A = linspace(1,500);
Create a vector of 12 linearly spaced numbers from 1 to 36:
A = linspace(1,36,12);
linspace is not apparent for Nyquist interval, so you can use the common form:
t = 0:Ts:1;
or
t = 0:1/Fs:1;
and change the Fs values.
The first Figure is due to the approximation of '0': sin(0) and sin(2*pi). We can notice the range is in 10^(-16) level.
I wrote the function reconstruct_FFT that can recover critically sampled data even for short observation intervals if the input sequence of samples is periodic. It performs lowpass filtering in the frequency domain.

implementation of Lomb-Scargle periodogram

from matlab official site , Lomb-Scargle periodogram is defined as
http://www.mathworks.com/help/signal/ref/plomb.html#lomb
suppose we have some random signal let say
x=rand(1,1000);
average of this signal can be easily implemented as
average=mean(x);
variance can be implemented as
>> average_vector=repmat(average,1,1000);
>> diff=x-average_vector;
>> variance= sum(diff.*diff)/(length(x)-1);
how should i continue? i mean how to choose frequencies ? calculation of time offset is not problem,let us suppose that we have time vector
t=0:0.1:99.9;
so that total length of time vector be 1000, in generally for DFT, frequencies bins are represented as a multiplier of 2*pi/N, where N is length of signal, but what about this case? thanks in advance
As can be seen from the provided link to MATLAB documentation, the algorithm does not depend on a specific sampling times tk selection. Note that for equally spaced sampling times (as you have selected), the same link indicates:
The offset depends only on the measurement times and vanishes when the times are equally spaced.
So, as you put it "calculation of time offset is not a problem".
Similar to the DFT which can be obtained from the Discrete-Time Fourier Transform (DTFT) by selecting a discrete set of frequencies, we can also choose f[n] = n * sampling_rate/N (where sampling_rate = 10 for your selection of tk). If we disregard the value of PLS(f[n]) for n=mN where m is any integer (since it's value is ill-formed, at least in the formula posted in the link), then:
Thus for real-valued data samples:
where Y can be expressed in terms of the diff vector you defined as:
Y = fft(diff);
That said, as indicated on wikipedia, the Lomb–Scargle method is primarilly intended for use with unequally spaced data.

Help me understand FFT function (Matlab)

1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
2) If it is zero how do we plot zero on a logarithmic scale?
3) The result is always symmetrical? Or it just appears to be symmetrical?
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
fft(y) returns a vector with the 0-th to (N-1)-th samples of the DFT of y, where y(t) should be thought of as defined on 0 ... N-1 (hence, the 'periodic repetition' of y(t) can be thought of as a periodic signal defined over Z).
The first sample of fft(y) corresponds to the frequency 0.
The Fourier transform of real, discrete-time, periodic signals has also discrete domain, and it is periodic and Hermitian (see below). Hence, the transform for negative frequencies is the conjugate of the corresponding samples for positive frequencies.
For example, if you interpret (the periodic repetition of) y as a periodic real signal defined over Z (sampling period == 1), then the domain of fft(y) should be interpreted as N equispaced points 0, 2π/N ... 2π(N-1)/N. The samples of the transform at the negative frequencies -π ... -π/N are the conjugates of the samples at frequencies π ... π/N, and are equal to the samples at frequencies
π ... 2π(N-1)/N.
2) If it is zero how do we plot zero on a logarithmic scale?
If you want to draw some sort of Bode plot you may plot the transform only for positive frequencies, ignoring the samples corresponding to the lowest frequencies (in particular 0).
3) The result is always symmetrical? Or it just appears to be symmetrical?
It has Hermitian symmetry if y is real: Its real part is symmetric, its imaginary part is anti-symmetric. Stated another way, its amplitude is symmetric and its phase anti-symmetric.
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
If you mean abs(fft(x - y)), this is OK and you can use it to get an idea of the frequency distribution of the difference (or error, if x is an estimate of y). If you mean abs(fft(x)) - abs(fft(y)) (???) you lose at least phase information.
Well, if you want to understand the Fast Fourier Transform, you want to go back to the basics and understand the DFT itself. But, that's not what you asked, so I'll just suggest you do that in your own time :)
But, in answer to your questions:
Yes, (excepting negatives, as you said) it is zero. The range is 0 to (N-1) for an N-point input.
In MATLAB? I'm not sure I understand your question - plot zero values as you would any other value... Though, as rightly pointed out by duffymo, there is no natural log of zero.
It's essentially similar to a sinc (sine cardinal) function. It won't necessarily be symmetrical, though.
You won't lose any accuracy, you'll just have the magnitude response (but I guess you knew that already).
Consulting "Numerical Recipes in C", Chapter 12 on "Fast Fourier Transform" says:
The frequency ranges from negative fc to positive fc, where fc is the Nyquist critical frequency, which is equal to 1/(2*delta), where delta is the sampling interval. So frequencies can certainly be negative.
You can't plot something that doesn't exist. There is no natural log of zero. You'll either plot frequency as the x-axis or choose a range that doesn't include zero for your semi-log axis.
The presence or lack of symmetry in the frequency range depends on the nature of the function in the time domain. You can have a plot in the frequency domain that is not symmetric about the y-axis.
I don't think that taking the absolute value like that is a good idea. You'll want to read a great deal more about convolution, correction, and signal processing to compare two signals.
result of fft can be 0. already answered by other people.
to plot 0 frequency, the trick is to set it to a very small positive number (I use exp(-15) for that purpose).
already answered by other people.
if you are only interested in the magnitude, yes you can do that. this is applicable, say, in many image processing problems.
Half your question:
3) The results of the FFT operation depend on the nature of the signal; hence there's nothing requiring that it be symmetrical, although if it is you may get some more information about the properties of the signal
4) That will compare the magnitudes of a pair of signals, but those being equal do no guarantee that the FFTs are identical (don't forget about phase). It may, however, be enough for your purposes, but you should be sure of that.