Matlab - Signal Noise Removal - matlab

I have a vector of data, which contains integers in the range -20 20.
Bellow is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval -2, 2, as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest me a way of doing this?
I have tried mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking at using the wavelet transform toolbox, but I don't know exact how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend me a way of doing this?

One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)

If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):

You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right. If your samples are positive and negative (as yours are) you should take the abs. value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:

What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scale by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing, is by putting all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do not that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be removed, the other will be kept). So for displaying, this might be all you need, for further processing it might depend on what you are trying to do.

To eliminate the low amplitude peaks, you're going to equate all the low amplitude signal to noise and ignore.
If you have any apriori knowledge, just use it.
if your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy, and find this "on the fly" then, use kmeans of 3. It's in the statistics toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note, these and every other technique I've seen on this thread is assuming you are doing post processing. If you are doing this processing in real time, things will have to change.

Related

Normal distribution with the minimum values in Matlab

I tried to generate 1000 the random values in normal distribution by the normrnd function.
A = normrnd(4,1,[1000 1]);
I would like to set the minimum value is 2. However, that function just can define the mean and sd. How can I set the minimum value is 2 ?
You can't. Gaussian or normally distributed numbers are in a bell curve, with the tails tailing off to infinity. What you can do is "censor" them by eliminating every number beyond a cut-off.
Since you choose mean = 4 and sigma = 1, you will end up ~95% elements of A fall within range [2,6]. The number of elements whose values smaller than 2 is about 2.5%. If you consider this figure is small, you can wrap these elements to a minimum value. For example:
A = normrnd(4,1,[1000 1]);
A(A < 2) = A(A<2) + 2 - min(A(A<2))
Of course, it is technically not gaussian distribution. However if you have total control of mean and sigma, you can get a "more gaussian like" distribution by adding an offset to A:
A = A + 2 - min(A)
Note: This assumes you can have an arbitrarily set standard deviation, which may not be the case
As others have said, you cannot specify a lower bound for a true Gaussian. However, you can generate a Gaussian and estimate 1-p percent of values to be above and then ignore p percent of values (which will fall outside your cutoff).
For example, in the following code, I am generating a Gaussian where 95% of data-points fall above 2. Then I am removing all points below 2, knowing that 5% of data will be removed.
This is a solution because setting as p gets closer to zero, your chances of getting uncensored sample data that follows your Gaussian curve and is entirely above your cutoff goes to 100% (Really it's defined by the p/n ratio, but if n is fixed this is true).
n = 1000; % number of samples
cutoff = 2; % Cutoff point for min-value
mu = 4; % Mean
p = .05; % Percentile you would like to cutoff
z = -sqrt(2) * erfcinv(p*2); % compute z score
sigma = (cutoff - mu)/z; % compute standard deviation
A = normrnd(mu,sigma,[n 1]);
I would recommend removing values below the cutoff rather than re-attributing them to the lower bound of your distribution, but that is up to you.
A(A<cutoff) = []; % removes all values of A less than cutoff
If you want to be symmetrical (which you should to prevent sample skew) the following should work.
A(A>(2*mu-cutoff)) = [];

How to make a vector that follows a certain trend?

I have a set of data with over 4000 points. I want to exclude grooves from them, ideally from the point from which they start. The data look for example like this:
The problem with this is the noise I get at the top of the plateaus. I have an idea, in which I would take an average value of the most common within some boundaries (again, ideally sth like the red line here:
and then I would construct a temporary matrix, which would fill up one by one with Y if they are less than this average. If the Y(i) would rise above average, the matrix would find its minima and compare it with the global minima. If the temporary matrix's minima wouldn't be sth like 80% of the global minima, it would be discarded as noise.
I've tried using mean(Y), interpolating and fitting it in a polynomial (the green line) - none of those method would cut it to the point I would be satisfied.
I need this to be extremely robust and it doesn't need to be quick. The top and bottom values can vary a lot, as well as the shape of the plateaus. The groove width is more or less the same.
Do you have any ideas? Again, the point is to extract the values that would make the groove.
How about a median filter?
Let's define some noisy data similar to yours, and plot it in blue:
x = .2*sin((0:9999)/1000); %// signal
x(1000:1099) = x(1000:1099) + sin((0:99)/50*pi); %// noise: spike
x(5000:5199) = x(5000:5199) - sin((0:199)/100*pi); %// noise: wider spike
x = x + .05*sin((0:9999)/10); %// noise: high-freq ripple
plot(x)
Now apply the median filter (using medfilt2 from the Image Processing Toolbox) and plot in red. The parameter k controls the filter memory. It should chosen to be large compared to noise variations, and small compared to signal variations:
k = 500; %// filter memory. Choose as needed
y = medfilt2(x,[1 k]);
hold on
plot(y, 'r', 'linewidth', 2)
In case you don't have the image processing toolbox and can't use medfilt2 a method that's more manual. Skip the extreme values, and do a curve fit with sin1 as curve type. Note that this will only work if the signal is in fact a sine wave!
x = linspace(0,3*pi,1000);
y1 = sin(x) + rand()*sin(100*x).*(mod(round(10*x),5)<3);
y2 = 20*(mod(round(5*x),5) == 0).*sin(20*x);
y = y1 + y2; %// A messy sine-wave
yy = y; %// Store the messy sine-wave
[~, idx] = sort(y);
y(idx(1:round(0.15*end))) = y(idx(round(0.15*end))); %// Flatten out the smallest values
y(idx(round(0.85*end):end)) = y(idx(round(0.85*end)));%// Flatten out the largest values
[foo goodness output] = fit(x.',y.', 'sin1'); %// Do a curve fit
plot(foo,x,y) %// Plot it
hold on
plot(x,yy,'black')
Might not be perfect, but it's a step in the right direction.

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which i tried implementing in my code.:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]'; %'
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighing each value (as you guessed). the sum of that vector should always be equal to one. If you wish to weight each value evenly and do a size N moving filter then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,N);
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation. wts is the weighting vector, which from the Mathworks, is a 13 point average, with special attention on the first and last point of weightings half of the rest.

How do I sync two or more waveforms in MATLAB?

I'll try to be more specific: I have several time histories of a signal which have pretty much all the same behaviour (sine waves) but all start at a different point in time. How do I automatically detect the initial time lag and delete it such that all sine waves start at the same instant in time?
If you have two signals, x and y, each being a n x 1 matrix where y is a shifted version of x:
[c,lags] = xcorr(x,y); % c is the correlation, should have a clear peak
s = lags(c==max(c)); % s is the shift you need
y2 = circshift(y,s); % y2 should now overlap x
(Demo purposes only - I don't suggest you circshift your actual data). The shift you are looking for in this case should ideally be relatively small compared to the length of x and y. A lot depends on the noise level and the nature of the offset.
The following works pretty well under low noise conditions and fast sampling and may do depending on your requirements for accuracy. It uses a simple threshold and thus is subject to inaccuracy when things get noisy. Adjust thresh to a low value above the noise.
Nwav = 3;
Np = 100;
tmax = 50;
A = 1000;
Nz = Np/5;
%%%%%%%%%%%%%%
thresh = A/50;
%%%%%%%%%%%%%%
% generate some waveforms
t = [0:tmax/(Np-1):tmax].';
w = rand(1,Nwav);
offs = round(rand(1,Nwav)*100);
sig = [A*sin(t(1:end-Nz)*w) ; zeros(Nz,Nwav)] + randn(Np,Nwav);
for ii=1:Nwav
sig(:,ii) = circshift(sig(:,ii),round(rand()*Nz));
end
figure, plot(t,sig)
hold on, plot(t,repmat(thresh,length(t),1),'k--')
% use the threshold and align the waveforms
for ii=1:Nwav
[ir ic] = find(sig(:,ii)>thresh,1)
sig(:,ii) = circshift(sig(:,ii),-ir+1);
end
figure, plot(t,sig)
hold on, plot(t,repmat(thresh,length(t),1),'k--')
There is room for improvement (noise filtering, slope detection) but this should get you started.
I also recommend you look into waveform processing toolboxes, in matlab central for instance.

MATLAB FFT xaxis limits messing up and fftshift

This is the first time I'm using the fft function and I'm trying to plot the frequency spectrum of a simple cosine function:
f = cos(2*pi*300*t)
The sampling rate is 220500. I'm plotting one second of the function f.
Here is my attempt:
time = 1;
freq = 220500;
t = 0 : 1/freq : 1 - 1/freq;
N = length(t);
df = freq/(N*time);
F = fftshift(fft(cos(2*pi*300*t))/N);
faxis = -N/2 / time : df : (N/2-1) / time;
plot(faxis, real(F));
grid on;
xlim([-500, 500]);
Why do I get odd results when I increase the frequency to 900Hz? These odd results can be fixed by increasing the x-axis limits from, say, 500Hz to 1000Hz. Also, is this the correct approach? I noticed many other people didn't use fftshift(X) (but I think they only did a single sided spectrum analysis).
Thank you.
Here is my response as promised.
The first or your questions related to why you "get odd results when you increase the frequency to 900 Hz" is related to the Matlab's plot rescaling functionality as described by #Castilho. When you change the range of the x-axis, Matlab will try to be helpful and rescale the y-axis. If the peaks lie outside of your specified range, matlab will zoom in on the small numerical errors generated in the process. You can remedy this with the 'ylim' command if it bothers you.
However, your second, more open question "is this the correct approach?" requires a deeper discussion. Allow me to tell you how I would go about making a more flexible solution to achieve your goal of plotting a cosine wave.
You begin with the following:
time = 1;
freq = 220500;
This raises an alarm in my head immediately. Looking at the rest of the post, you appear to be interested in frequencies in the sub-kHz range. If that is the case, then this sampling rate is excessive as the Nyquist limit (sr/2) for this rate is above 100 kHz. I'm guessing you meant to use the common audio sampling rate of 22050 Hz (but I could be wrong here)?
Either way, your analysis works out numerically OK in the end. However, you are not helping yourself to understand how the FFT can be used most effectively for analysis in real-world situations.
Allow me to post how I would do this. The following script does almost exactly what your script does, but opens some potential on which we can build . .
%// These are the user parameters
durT = 1;
fs = 22050;
NFFT = durT*fs;
sigFreq = 300;
%//Calculate time axis
dt = 1/fs;
tAxis = 0:dt:(durT-dt);
%//Calculate frequency axis
df = fs/NFFT;
fAxis = 0:df:(fs-df);
%//Calculate time domain signal and convert to frequency domain
x = cos( 2*pi*sigFreq*tAxis );
F = abs( fft(x, NFFT) / NFFT );
subplot(2,1,1);
plot( fAxis, 2*F )
xlim([0 2*sigFreq])
title('single sided spectrum')
subplot(2,1,2);
plot( fAxis-fs/2, fftshift(F) )
xlim([-2*sigFreq 2*sigFreq])
title('whole fft-shifted spectrum')
You calculate a time axis and calculate your number of FFT points from the length of the time axis. This is very odd. The problem with this approach, is that the frequency resolution of the fft changes as you change the duration of your input signal, because N is dependent on your "time" variable. The matlab fft command will use an FFT size that matches the size of the input signal.
In my example, I calculate the frequency axis directly from the NFFT. This is somewhat irrelevant in the context of the above example, as I set the NFFT to equal the number of samples in the signal. However, using this format helps to demystify your thinking and it becomes very important in my next example.
** SIDE NOTE: You use real(F) in your example. Unless you have a very good reason to only be extracting the real part of the FFT result, then it is much more common to extract the magnitude of the FFT using abs(F). This is the equivalent of sqrt(real(F).^2 + imag(F).^2).**
Most of the time you will want to use a shorter NFFT. This might be because you are perhaps running the analysis in a real time system, or because you want to average the result of many FFTs together to get an idea of the average spectrum for a time varying signal, or because you want to compare spectra of signals that have different duration without wasting information. Just using the fft command with a value of NFFT < the number of elements in your signal will result in an fft calculated from the last NFFT points of the signal. This is a bit wasteful.
The following example is much more relevant to useful application. It shows how you would split a signal into blocks and then process each block and average the result:
%//These are the user parameters
durT = 1;
fs = 22050;
NFFT = 2048;
sigFreq = 300;
%//Calculate time axis
dt = 1/fs;
tAxis = dt:dt:(durT-dt);
%//Calculate frequency axis
df = fs/NFFT;
fAxis = 0:df:(fs-df);
%//Calculate time domain signal
x = cos( 2*pi*sigFreq*tAxis );
%//Buffer it and window
win = hamming(NFFT);%//chose window type based on your application
x = buffer(x, NFFT, NFFT/2); %// 50% overlap between frames in this instance
x = x(:, 2:end-1); %//optional step to remove zero padded frames
x = ( x' * diag(win) )'; %//efficiently window each frame using matrix algebra
%// Calculate mean FFT
F = abs( fft(x, NFFT) / sum(win) );
F = mean(F,2);
subplot(2,1,1);
plot( fAxis, 2*F )
xlim([0 2*sigFreq])
title('single sided spectrum')
subplot(2,1,2);
plot( fAxis-fs/2, fftshift(F) )
xlim([-2*sigFreq 2*sigFreq])
title('whole fft-shifted spectrum')
I use a hamming window in the above example. The window that you choose should suit the application http://en.wikipedia.org/wiki/Window_function
The overlap amount that you choose will depend somewhat on the type of window you use. In the above example, the Hamming window weights the samples in each buffer towards zero away from the centre of each frame. In order to use all of the information in the input signal, it is important to use some overlap. However, if you just use a plain rectangular window, the overlap becomes pointless as all samples are weighted equally. The more overlap you use, the more processing is required to calculate the mean spectrum.
Hope this helps your understanding.
Your result is perfectly right. Your frequency axis calculation is also right. The problem lies on the y axis scale. When you use the function xlims, matlab automatically recalculates the y scale so that you can see "meaningful" data. When the cosine peaks lie outside the limit you chose (when f>500Hz), there are no peaks to show, so the scale is calculated based on some veeeery small noise (here at my computer, with matlab 2011a, the y scale was 10-16).
Changing the limit is indeed the correct approach, because if you don't change it you can't see the peaks on the frequency spectrum.
One thing I noticed, however. Is there a reason for you to plot the real part of the transform? Usually, it is abs(F) that gets plotted, and not the real part.
edit: Actually, you're frequency axis is only right because df, in this case, is 1. The faxis line is right, but the df calculation isn't.
The FFT calculates N points from -Fs/2 to Fs/2. So N points over a range of Fs yields a df of Fs/N. As N/time = Fs => time = N/Fs. Substituting that on the expression of df you used: your_df = Fs/N*(N/Fs) = (Fs/N)^2. As Fs/N = 1, the final result was right :P