Remove noise on a wav file - matlab

I'm working on a small code to learn signal processing on Matlab. I have got a .wav sound with some noise and I just want to remove the noise. I tried the code under but noise is not removed correctly. My idea is to do a cut band filter to remove the different noise components on the fft. After a lot of researches, I don't understant where is my problem. Here my code :
clear all;
clc;
close all;
% We load and plot the signal
[sig,Fs] = audioread('11.wav');
N = length(sig);
figure,plot(sig); title 'signal'
% FFT of the signal
fft_sig = abs(fft(sig));
% Normalisation of the frequencies for the plot
k = 0 : length(sig) - 1;
f = k*Fs/length(sig);
plot(f,fft_sig);
% Loop with 2 elements because there are 2 amplitudes (not sure about
% this)
for i = 1:2
% I put noise components in an array
[y(i),x(i)] = max(fft_sig);
% Normalisation of the frequencies to eliminate
f(i) = x(i) * (Fs/N);
% Cut band filter with elimination of f from f-20 to f+20 Hz
b = fir1(2 , 2 * [f(i)-20 f(i)+20] / Fs, 'stop')
sig = filter(b,1,sig);
figure,plot(abs(fft(sig)));title 'fft of the signal'
end
Here the images I got, the fft plot is exactly the same before and after applying the filter, there is a modification only on the x axis :
The sampling frequency is Fs = 22050.
Thanks in advance for your help, I hope I'm clear enough in my description

Since you haven't explicitly said so, the code you provided essentially defines your noise as narrowband interference at two frequencies (even though that interference may look less 'noisy').
The first thing to notice is that the value x(i) obtained from max(fft_sig) corresponds to the 1-based index of the located maximum. It won't make a huge difference for large N, but it may have an impact for smaller values especially when trying to design a very narrow notch filter. The corresponding frequency would then be:
freq = (x(i)-1) * (Fs/N);
Also if you are going to iteratively attenuate frequency components you would need to update fft_sig which you use to pick the frequency components to be attenuated (otherwise you will always pick the same component). The simplest would be to recompute fft_sig from the filtered sig:
sig = filter(b,1,sig);
fft_sig = abs(fft(sig));
Finally, since you are trying to attenuate a very narrow frequency range, you may find that you need to increase the FIR filter order by a few orders of magnitudes to get a good attenuation at the desired frequency without attenuating everything else. As was pointed out in comments such a narrow notch filter can often be better implemented using an IIR filter.
The updated code would then look somewhat like:
% Loop with 2 elements because there are 2 amplitudes
for i = 1:2
% I put noise components in an array
% Limit peak search to the lower-half of the spectrum since the
% upper half is symmetric
[y(i),x(i)] = max(fft_sig(1:floor(N/2)+1));
% Normalisation of the frequencies to eliminate
freq = (x(i)-1) * (Fs/N);
% Cut band filter with elimination of f from f-20 to f+20 Hz
b = fir1(2000 , 2 * [freq-20 freq+20] / Fs, 'stop')
sig = filter(b,1,sig);
fft_sig = abs(fft(sig));
%figure;plot(f, abs(fft_sig));title 'fft of the signal'
figure;plot(f, 20*log10(abs(fft_sig)));title 'fft of the signal'
end
As far as the last FFT plot showing a different x-axis, it is simply because you omitted to provide an x-axis variable (in occurrence f used in the previous plot), so the plot shows the array index (instead of frequencies) as the x-axis.

Related

Applying sgolay for calculation velocity/acceleration with nonuniform Fs

I have some x and y (pixel) coordinates that were collected using a sensor that as not a steady Fs (sample rate) and want to apply a SGOLAY filter to my signal and compute the velocity and acceleration of the movement.
I'm following the example in the Mathworks help secction regarding Savitzky-Golay Differentiation. But they use a predetermined fixed Fs can someone help me and explain how can I apply the filter for an variable Fs (I have the time stamp for each coordinate).
The filter does not depend on the sample rate but is applied to a fixed number of points (i.e. the input framelen). Nevertheless, you are right that the signal should ideally be equally spaced.
apply the filter as it is to a variable-spaced signal (in terms of time)
make the signal equally spaced (i.e. interpolate it... as we don't know the signals, I cannot recommend any interpolation method for this)
make the filter time variable
For the last, I would recommend to use different frame-length and build your time-variable filtered signal afterwards. Here is a quick example of what I thought of
% create random signal
x = ([1:50,51:2:150].')/10; % "time" vector
sig = rand(100,1)+sin(x); % signal vector
% different factors => will be frame length
fct = round( 1./diff(x) );
fct_uq = unique(fct);
% allocate memory
Y = NaN(length(sig),length(fct_uq));
sig_flt = NaN(size(sig));
% loop over unique "factors"
for i = 1:length(fct_uq)
% ensure odd framelen
framelen = 2*floor(fct_uq(i)/2)+1;
% apply filter
y = sgolayfilt(sig,3,framelen);
% store results
Y(:,i) = y;
% build new filtered signal
lg = fct == fct_uq(i);
sig_flt(lg) = y(lg);
end
% plot
plot(x,[sig,sig_flt], x,Y,'--')

Linear regression -- Stuck in model comparison in Matlab after estimation?

I want to determine how well the estimated model fits to the future new data. To do this, prediction error plot is often used. Basically, I want to compare the measured output and the model output. I am using the Least Mean Square algorithm as the equalization technique. Can somebody please help what is the proper way to plot the comparison between the model and the measured data? If the estimates are close to true, then the curves should be very close to each other. Below is the code. u is the input to the equalizer, x is the noisy received signal, y is the output of the equalizer, w is the equalizer weights. Should the graph be plotted using x and y*w? But x is noisy. I am confused since the measured output x is noisy and the model output y*w is noise-free.
%% Channel and noise level
h = [0.9 0.3 -0.1]; % Channel
SNRr = 10; % Noise Level
%% Input/Output data
N = 1000; % Number of samples
Bits = 2; % Number of bits for modulation (2-bit for Binary modulation)
data = randi([0 1],1,N); % Random signal
d = real(pskmod(data,Bits)); % BPSK Modulated signal (desired/output)
r = filter(h,1,d); % Signal after passing through channel
x = awgn(r, SNRr); % Noisy Signal after channel (given/input)
%% LMS parameters
epoch = 10; % Number of epochs (training repetation)
eta = 1e-3; % Learning rate / step size
order=10; % Order of the equalizer
U = zeros(1,order); % Input frame
W = zeros(1,order); % Initial Weigths
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y = (W)*U'; % Calculating output of LMS
e = d(n) - y; % Instantaneous error
W = W + eta * e * U ; % Weight update rule of LMS
J(k,n) = e * e'; % Instantaneous square error
end
end
Lets start step by step:
First of all when using some fitting method it is a good practice to use RMS error . To get this we have to find error between input and output. As I understood x is an input for our model and y is an output. Furthermore you already calculated error between them. But you used it in loop without saving. Lets modify your code:
%% Algorithm
for k = 1 : epoch
for n = 1 : N
U(1,2:end) = U(1,1:end-1); % Sliding window
U(1,1) = x(n); % Present Input
y(n) = (W)*U'; % Calculating output of LMS
e(n) = x(n) - y(n); % Instantaneous error
W = W + eta * e(n) * U ; % Weight update rule of LMS
J(k,n) = e(n) * (e(n))'; % Instantaneous square error
end
end
Now e consists of errors at the last epoch. So we can use something like this:
rms(e)
Also I'd like to compare results using mean error and standard deviation:
mean(e)
std(e)
And some visualization:
histogram(e)
Second moment: we can't use compare function just for vectors! You can use it for dynamic system models. For it you have to made some workaround about using this method as dynamic model. But we can use some functions as goodnessOfFit for example. If you want something like error at each step that consider all previous points of data then make some math workaround - calculate it at each point using [1:currentNumber].
About using LMS method. There are built-in function calculating LMS. Lets try to use it for your data sets:
alg = lms(0.001);
eqobj = lineareq(10,alg);
y1 = equalize(eqobj,x);
And lets see at the result:
plot(x)
hold on
plot(y1)
There are a lot of examples of such implementation of this function: look here for example.
I hope this was helpful for you!
Comparison of the model output vs observed data is known as residual.
The difference between the observed value of the dependent variable
(y) and the predicted value (ŷ) is called the residual (e). Each data
point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
Both the sum and the mean of the residuals are equal to zero. That is,
Σ e = 0 and e = 0.
A residual plot is a graph that shows the residuals on the vertical
axis and the independent variable on the horizontal axis. If the
points in a residual plot are randomly dispersed around the horizontal
axis, a linear regression model is appropriate for the data;
otherwise, a non-linear model is more appropriate.
Here is an example of residual plots from a model of mine. On the vertical axis is the difference between the output of the model and the measured value. On the horizontal axis is one of the independent variables used in the model.
We can see that most of the residuals are within 0.2 units which happens to be my tolerance for this model. I can therefore make a conclusion as to the worth of the model.
See here for a similar question.
Regarding you question about the lack of noise in your models output. We are creating a linear model. There's the clue.

Filtering signal noise using Fourier Transforms and MATLAB

So, I've been given three different MATLAB (I'm using MATLAB R2014b) files with signals containing noise. I simply just plotted the values I was given for the first part. For example, the plot of the first signal looks like the one below.
Then, I did the Fourier Transform of the signal and plotted those values as well to determine where the noise and signal occur in the frequency spectrum. To show this, I added the plot image of the first signal below.
Finally, I am supposed to create a filter using the basic MATLAB commands and filter the noise out of the plot of the signal and then do the Fourier Transform of the signal again and plot the results. The filter portion will look something like this...
b = fir1(n,w,'type');
freqz(b,1,512);
in = filter(b,1,in);
Where n is the order of the filter, w is the cutoff frequency (cutoff frequency divided by half the sampling rate), and 'type' is something to the effect of low/high/stop/etc... So, my question is how do I figure out what the n, w, and type values of the filter I am creating are supposed to be?! Thanks in advance for any help!
I believe your high frequency components are noise, but it actually depends on your data.
See this example,
Fs = 2000;
L = 200;
t = (0 : L - 1)/Fs;
data = chirp(t,20,.05,50) + chirp(t,500,.1,700);
subplot(411)
plot(t,data,'LineWidth',2);
title('Original Data')
N = 2^nextpow2(L);
y = fft(data,N)/L;
f = Fs/2 * linspace(0,1,N/2+1);
subplot(412)
plot(f,abs(y(1:N/2+1)))
title('Spectrum of Original Data')
b = fir1(40,2*[1 200]/Fs);
newd = filter(b,1,data);
subplot(413)
plot(t,newd)
title('Filtered Data')
newy = fft(newd,N)/L;
subplot(414)
plot(f,abs(newy(1:N/2+1)))
title('Spectrum of Filtered Data')
You can use b = fir1(40,2*[200 800]/Fs); for high-pass filter.
If the second plot is correct, in the x-axis, I can assume:
A. The sampling frequency is 2000 Hz.
B. The "noise" is in the low frequencies. It seems also from the original signal, that you need to filter the low-frequency baseline.
If so, you need highpass filter, so 'type'='high'.
The order is depend in the sharpness that you want to the filter. from the plots it seems that you can use 'n'=12 or 20.
The cutoff frequency suppose to be about 0.1 if the peak in the low frequencies is indeed the noise that you want to filter, and if the 1000Hz x-axis is indeed the Nyquist frequency.

Find the height and length of waves in noisy data

My goal is to find the maximum values of wave heights and wave lengths.
dwcL01 though dwcL10 is arrays of <3001x2 double> with output from a numerical wave model.
Part of my script:
%% Plotting results from SWASH
% Examination of phase velocity on deep water with different number of layers
% Wave height 3 meters, wave peroid 8 sec on a depth of 30 meters
clear all; close all; clc;
T=8;
L0=1.56*T^2;
%% Loading results tabels.
load dwcL01.tbl; load dwcL02.tbl; load dwcL03.tbl; load dwcL04.tbl;
load dwcL05.tbl; load dwcL06.tbl; load dwcL07.tbl; load dwcL08.tbl;
load dwcL09.tbl; load dwcL10.tbl;
M(:,:,1) = dwcL01; M(:,:,2) = dwcL02; M(:,:,3) = dwcL03; M(:,:,4) = dwcL04;
M(:,:,5) = dwcL05; M(:,:,6) = dwcL06; M(:,:,7) = dwcL07; M(:,:,8) = dwcL08;
M(:,:,9) = dwcL09; M(:,:,10) = dwcL10;
%% Finding position of wave crest using diff and sign.
for i=1:10
Tp(:,1,i) = diff(sign(diff([M(1,2,i);M(:,2,i)]))) < 0;
Wc(:,:,i) = M(Tp,:,i);
L(:,i) = diff(Wc(:,1,i))
end
This works fine for finding the maximum values, if the data is "smooth". The following image shows a section of my data. I get all peaks, when I only need the one around x = 40. How do I filter so I only get the "real" wave crests. The solution needs to be general so that it still works if I change the domain size, wave height or wave period.
If you're basically trying to fit this curve of data to a sine wave, have you considered performing Fourier analysis (FFT in Matlab), then checking the magnitude of that fundamental frequency? The frequency will tell you the wave spacing, and the magnitude the height, and when used over multiple periods will find an average.
See the Matlab help page for an example of the usage
but the basic gist is:
y = [...] %vector of wave data points
N=length(y); %Make sure this is an even number
Y = fft(y); %Convert into frequency domain
figure;
plot(y(1:N)); %Plot original wave data
figure;
plot(abs(Y(1:N/2))./N); %Plot only the lower half of frequencies to hide aliasing
I have one more solution that might work for you. It involves computing the 2nd-order derivative using a 5-point central difference instead of the 2-point finite differences. When using diff twice you are performing two first-order derivatives consecutively (finite 2-point differences) which are very susceptible to noise/oscillations. The advantage of using a higher-order approximation is that the neighboring points help filter out the small oscillations, and this may work for your case.
Let f(:) = squeeze(M(:,2,i)) be the array of data points, and h is the uniform spacing distance between the points:
%Better approximation of the 2nd derivative using neighboring points:
for j=3:length(f)-2
Tp(j,i) = (-f(j-2) + 16*f(j-1) - 30*f(j) + 16*f(j+1) - f(j+2))/(12*h^2);
end
Note that since this 2nd-order derivative requires the 2 neighboring points to the left and right, that the range of the loop must start at the 3rd index and end 2 short of the array length.

MATLAB FFT xaxis limits messing up and fftshift

This is the first time I'm using the fft function and I'm trying to plot the frequency spectrum of a simple cosine function:
f = cos(2*pi*300*t)
The sampling rate is 220500. I'm plotting one second of the function f.
Here is my attempt:
time = 1;
freq = 220500;
t = 0 : 1/freq : 1 - 1/freq;
N = length(t);
df = freq/(N*time);
F = fftshift(fft(cos(2*pi*300*t))/N);
faxis = -N/2 / time : df : (N/2-1) / time;
plot(faxis, real(F));
grid on;
xlim([-500, 500]);
Why do I get odd results when I increase the frequency to 900Hz? These odd results can be fixed by increasing the x-axis limits from, say, 500Hz to 1000Hz. Also, is this the correct approach? I noticed many other people didn't use fftshift(X) (but I think they only did a single sided spectrum analysis).
Thank you.
Here is my response as promised.
The first or your questions related to why you "get odd results when you increase the frequency to 900 Hz" is related to the Matlab's plot rescaling functionality as described by #Castilho. When you change the range of the x-axis, Matlab will try to be helpful and rescale the y-axis. If the peaks lie outside of your specified range, matlab will zoom in on the small numerical errors generated in the process. You can remedy this with the 'ylim' command if it bothers you.
However, your second, more open question "is this the correct approach?" requires a deeper discussion. Allow me to tell you how I would go about making a more flexible solution to achieve your goal of plotting a cosine wave.
You begin with the following:
time = 1;
freq = 220500;
This raises an alarm in my head immediately. Looking at the rest of the post, you appear to be interested in frequencies in the sub-kHz range. If that is the case, then this sampling rate is excessive as the Nyquist limit (sr/2) for this rate is above 100 kHz. I'm guessing you meant to use the common audio sampling rate of 22050 Hz (but I could be wrong here)?
Either way, your analysis works out numerically OK in the end. However, you are not helping yourself to understand how the FFT can be used most effectively for analysis in real-world situations.
Allow me to post how I would do this. The following script does almost exactly what your script does, but opens some potential on which we can build . .
%// These are the user parameters
durT = 1;
fs = 22050;
NFFT = durT*fs;
sigFreq = 300;
%//Calculate time axis
dt = 1/fs;
tAxis = 0:dt:(durT-dt);
%//Calculate frequency axis
df = fs/NFFT;
fAxis = 0:df:(fs-df);
%//Calculate time domain signal and convert to frequency domain
x = cos( 2*pi*sigFreq*tAxis );
F = abs( fft(x, NFFT) / NFFT );
subplot(2,1,1);
plot( fAxis, 2*F )
xlim([0 2*sigFreq])
title('single sided spectrum')
subplot(2,1,2);
plot( fAxis-fs/2, fftshift(F) )
xlim([-2*sigFreq 2*sigFreq])
title('whole fft-shifted spectrum')
You calculate a time axis and calculate your number of FFT points from the length of the time axis. This is very odd. The problem with this approach, is that the frequency resolution of the fft changes as you change the duration of your input signal, because N is dependent on your "time" variable. The matlab fft command will use an FFT size that matches the size of the input signal.
In my example, I calculate the frequency axis directly from the NFFT. This is somewhat irrelevant in the context of the above example, as I set the NFFT to equal the number of samples in the signal. However, using this format helps to demystify your thinking and it becomes very important in my next example.
** SIDE NOTE: You use real(F) in your example. Unless you have a very good reason to only be extracting the real part of the FFT result, then it is much more common to extract the magnitude of the FFT using abs(F). This is the equivalent of sqrt(real(F).^2 + imag(F).^2).**
Most of the time you will want to use a shorter NFFT. This might be because you are perhaps running the analysis in a real time system, or because you want to average the result of many FFTs together to get an idea of the average spectrum for a time varying signal, or because you want to compare spectra of signals that have different duration without wasting information. Just using the fft command with a value of NFFT < the number of elements in your signal will result in an fft calculated from the last NFFT points of the signal. This is a bit wasteful.
The following example is much more relevant to useful application. It shows how you would split a signal into blocks and then process each block and average the result:
%//These are the user parameters
durT = 1;
fs = 22050;
NFFT = 2048;
sigFreq = 300;
%//Calculate time axis
dt = 1/fs;
tAxis = dt:dt:(durT-dt);
%//Calculate frequency axis
df = fs/NFFT;
fAxis = 0:df:(fs-df);
%//Calculate time domain signal
x = cos( 2*pi*sigFreq*tAxis );
%//Buffer it and window
win = hamming(NFFT);%//chose window type based on your application
x = buffer(x, NFFT, NFFT/2); %// 50% overlap between frames in this instance
x = x(:, 2:end-1); %//optional step to remove zero padded frames
x = ( x' * diag(win) )'; %//efficiently window each frame using matrix algebra
%// Calculate mean FFT
F = abs( fft(x, NFFT) / sum(win) );
F = mean(F,2);
subplot(2,1,1);
plot( fAxis, 2*F )
xlim([0 2*sigFreq])
title('single sided spectrum')
subplot(2,1,2);
plot( fAxis-fs/2, fftshift(F) )
xlim([-2*sigFreq 2*sigFreq])
title('whole fft-shifted spectrum')
I use a hamming window in the above example. The window that you choose should suit the application http://en.wikipedia.org/wiki/Window_function
The overlap amount that you choose will depend somewhat on the type of window you use. In the above example, the Hamming window weights the samples in each buffer towards zero away from the centre of each frame. In order to use all of the information in the input signal, it is important to use some overlap. However, if you just use a plain rectangular window, the overlap becomes pointless as all samples are weighted equally. The more overlap you use, the more processing is required to calculate the mean spectrum.
Hope this helps your understanding.
Your result is perfectly right. Your frequency axis calculation is also right. The problem lies on the y axis scale. When you use the function xlims, matlab automatically recalculates the y scale so that you can see "meaningful" data. When the cosine peaks lie outside the limit you chose (when f>500Hz), there are no peaks to show, so the scale is calculated based on some veeeery small noise (here at my computer, with matlab 2011a, the y scale was 10-16).
Changing the limit is indeed the correct approach, because if you don't change it you can't see the peaks on the frequency spectrum.
One thing I noticed, however. Is there a reason for you to plot the real part of the transform? Usually, it is abs(F) that gets plotted, and not the real part.
edit: Actually, you're frequency axis is only right because df, in this case, is 1. The faxis line is right, but the df calculation isn't.
The FFT calculates N points from -Fs/2 to Fs/2. So N points over a range of Fs yields a df of Fs/N. As N/time = Fs => time = N/Fs. Substituting that on the expression of df you used: your_df = Fs/N*(N/Fs) = (Fs/N)^2. As Fs/N = 1, the final result was right :P