Cross-correlation with signals of different lengths in MATLAB

I have two signals of different lengths, where the shorter signal is a copy of part of the longer one, shifted by n samples. I want to find the maximum normalized cross-correlation between these two signals. Since it is normalized, it should give 1. The lags of the xcorr function vary from -441 to 441 samples.
This will be used for slightly different signals later on, but for now I'm trying with equal signals.
Zero-padding (as I've done in the image) gives me a wrong correlation, since the zeros become part of the correlation calculation.
Any ideas how to accomplish this?

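Suppose x and y are the shorter and longer signals, respectively (both rows or both columns).
nx = length(x);
ny = length(y);
cc = nan(1,ny-nx+1);          % one coefficient per position of x inside y
for ii = 0 : ny-nx
    id = (1:nx) + ii;         % samples of y currently overlapped by x
    cc(ii+1) = sum(x.*y(id))/(sqrt(sum(x.^2)*sum(y(id).^2)));
end
[ccmx,idmx] = max(cc);        % peak coefficient and its index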
Now you have the position of the maximum correlation coefficient.
If the lags start from -441 as you described (where x and y are aligned at the left), the maximum should be at lag = idmx - 442.
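As a sanity check of the sliding normalized correlation described above, here is a minimal sketch on made-up data. The chirp signal, the lengths, and the offset of 120 samples are all assumptions chosen for illustration; only base MATLAB functions are used.

```matlab
% Hypothetical test data: y is a chirp, x is an excerpt of y starting 120 samples in.
y = sin(0.002 * (1:500).^2);     % longer signal (non-periodic, so the match is unique)
shift = 120;                     % assumed true offset, in samples
nx = 200;
x = y(shift + (1:nx));           % shorter signal = shifted excerpt of y

ny = numel(y);
cc = nan(1, ny - nx + 1);
ex = sum(x.^2);                  % energy of x, the same for every window
for ii = 0 : ny - nx
    seg = y((1:nx) + ii);        % window of y currently overlapped by x
    cc(ii + 1) = sum(x .* seg) / sqrt(ex * sum(seg.^2));
end
[ccmx, idmx] = max(cc);
offset = idmx - 1;               % zero-based position of x inside y
```

Because each window is normalized by its own energy, the coefficient reaches 1 only where the window matches x up to a positive scale factor, so no zero-padding enters the calculation.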

If you want to remove the leading and trailing zeros of a vector, you can use this:
A = A(find(A~=0, 1, 'first'):find(A~=0, 1, 'last'));
To use it directly in your correlation, try this (where A is your red line and B the blue one):
xcorr(A(find(A~=0, 1, 'first'):find(A~=0, 1, 'last')), B(find(A~=0, 1, 'first'):find(A~=0, 1, 'last')));
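A small sketch of the trimming step on made-up vectors may make the indexing clearer. A and B here are hypothetical placeholders, not the signals from the question, and the index range is computed once and reused for both.

```matlab
A = [0 0 1 2 0 3 0 0];       % made-up signal with zero padding at both ends
B = [5 6 7 8 9 10 11 12];    % made-up second signal, same length as A
first = find(A ~= 0, 1, 'first');   % index of the first nonzero sample of A
last  = find(A ~= 0, 1, 'last');    % index of the last nonzero sample of A
Atrim = A(first:last);       % [1 2 0 3]; the interior zero is kept
Btrim = B(first:last);       % [7 8 9 10]; same index range applied to B
```

Note that only the padding at the ends is removed; zeros inside the signal survive.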

Related

Normal distribution with the minimum values in Matlab

I tried to generate 1000 random values from a normal distribution with the normrnd function.
A = normrnd(4,1,[1000 1]);
I would like to set the minimum value to 2. However, that function only lets me define the mean and sd. How can I set the minimum value to 2?
You can't. Gaussian or normally distributed numbers are in a bell curve, with the tails tailing off to infinity. What you can do is "censor" them by eliminating every number beyond a cut-off.
Since you chose mean = 4 and sigma = 1, about 95% of the elements of A will fall within the range [2,6]. About 2.5% of the elements will be smaller than 2. If you consider this fraction small, you can shift these elements up to a minimum value. For example:
A = normrnd(4,1,[1000 1]);
A(A < 2) = A(A<2) + 2 - min(A(A<2))
Of course, this is technically no longer a Gaussian distribution. However, if you have total control of the mean and sigma, you can get a "more Gaussian-like" distribution by adding an offset to A (which also shifts the mean by the same amount):
A = A + 2 - min(A)
Note: This assumes you can have an arbitrarily set standard deviation, which may not be the case
As others have said, you cannot specify a lower bound for a true Gaussian. However, you can choose the Gaussian so that a fraction 1-p of the values is expected to lie above your cutoff, and then discard the remaining fraction p (which falls below the cutoff).
For example, in the following code, I generate a Gaussian where 95% of the data points fall above 2. Then I remove all points below 2, knowing that about 5% of the data will be removed.
This works because, as p gets closer to zero, the chance of drawing an uncensored sample that follows your Gaussian curve and lies entirely above your cutoff goes to 100% (strictly, it depends on both p and n, but for fixed n this holds).
n = 1000; % number of samples
cutoff = 2; % Cutoff point for min-value
mu = 4; % Mean
p = .05; % Fraction of samples expected to fall below the cutoff
z = -sqrt(2) * erfcinv(p*2); % compute z score
sigma = (cutoff - mu)/z; % compute standard deviation
A = normrnd(mu,sigma,[n 1]);
I would recommend removing values below the cutoff rather than re-attributing them to the lower bound of your distribution, but that is up to you.
A(A<cutoff) = []; % removes all values of A less than cutoff
If you want the result to be symmetrical (which you should, to prevent sample skew), the following should work:
A(A>(2*mu-cutoff)) = [];
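To see that the computed sigma really puts about a fraction p of the samples below the cutoff, here is a rough check. It uses mu + sigma*randn as a base-MATLAB stand-in for normrnd (an assumption made so the sketch runs without the Statistics Toolbox), and a larger n than in the answer so the empirical fraction is stable.

```matlab
n      = 1e5;                        % large sample, for a stable estimate
cutoff = 2;                          % minimum value wanted
mu     = 4;                          % mean
p      = 0.05;                       % fraction expected below the cutoff
z      = -sqrt(2) * erfcinv(2*p);    % lower-tail z-score, about -1.645
sigma  = (cutoff - mu) / z;          % about 1.216
A      = mu + sigma * randn(n, 1);   % stand-in for normrnd(mu, sigma, [n 1])
fracBelow = mean(A < cutoff);        % empirical fraction, should be near p
A(A < cutoff) = [];                  % drop everything below the cutoff
```

After the last line A holds roughly (1-p)*n values, all at or above the cutoff; if exactly n values are needed, one would have to draw extra samples.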

Matlab Convolution regarding the conv() function and length()/size() function

I'm kind of new to MATLAB and Stack Overflow, so if I do something outside of the guidelines, please don't hesitate to point it out. Thanks!
I have been trying to do convolution between two functions and I have been having a hard time trying to get it to work.
t=0:.01:10;
h=exp(-t);
x=zeros(size(t)); % When I used length(t), I would get an error that says in conv(), A and B must be vectors.
x(1)=2;
x(4)=5;
y=conv(h,x);
figure; subplot(3,1,1);plot(t,x); % The discrete function would not show (at x=1 and x=4)
subplot(3,1,2);plot(t,h);
subplot(3,1,3);plot(t,y(1:length(t))); %Nothing is plotted here when ran
I've commented my issues in the code. I don't understand the difference between length and size in this case, and why it matters.
For the second comment, x=1 should have an amplitude of 2, while x=4 should have an amplitude of 5. When plotted, nothing shows at the locations specified, but the plot looks jumbled up at x=0. I'm assuming that's the reason why the convolved plot won't be displayed.
The original problem statement is given if it helps to understand what I was thinking throughout.
Consider an input signal x(t) that consists of two delta functions at t = 1 and t = 4 with amplitudes A1 = 5 and A2 = 2, respectively, to a linear system with impulse response h that is an exponential pulse (h(t) = e ^−t ). Plot x(t), h(t) and the output of the linear system y(t) for t in the range of 0 to 10 using increments of 0.01. Use the MATLAB built-in function conv.
The initial question regarding size vs length
length yields a scalar that is equal to the largest dimension of the input. In the case of your array, the size is 1 x N, so length yields N.
size(t)
% 1 1001
length(t)
% 1001
If you pass a scalar (N) to ones, zeros, or a similar function, it will create a square matrix that is N x N. This results in the error that you see when using conv since conv does not accept matrix inputs.
size(ones(length(t)))
% 1001 1001
When you pass a vector to ones or zeros, the output has the dimensions given by that vector. Since size returns a vector (as shown above), the output is the same size as t (and is a vector), so conv does not have any issues.
size(ones(size(t)))
% 1 1001
If you want a vector, you need to explicitly specify the number of rows and columns. Also, in my opinion, it's better to use numel to get the number of elements in a vector, as it's less ambiguous than length:
z = zeros(1, numel(t));
The second question regarding the convolution output:
First of all, the impulses that you create are at the first and fourth index of x and not at the locations where t = 1 and t = 4. Since you create t using a spacing of 0.01, t(1) actually corresponds to t = 0 and t(4) corresponds to t = 0.03
You instead want to use the value of t to specify where to put your impulses
x(t == 1) = 2;
x(t == 4) = 5;
Note that due to floating-point errors, you may not have exactly t == 1 and t == 4, so you can compare against a small tolerance instead (eps is a very tight choice; anything up to half the step size is more forgiving):
x(abs(t - 1) < eps) = 2;
x(abs(t - 4) < eps) = 5;
Once we make this change, the output consists of the expected scaled and shifted copies of the impulse response.
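Putting the two fixes together, a minimal sketch of the corrected script could look like the following. Computing the impulse positions from the step size (rather than comparing floating-point values) is my own choice here, and the amplitudes follow the question's code (2 at t = 1, 5 at t = 4).

```matlab
dt = 0.01;                       % time step
t  = 0:dt:10;
h  = exp(-t);                    % impulse response
x  = zeros(size(t));             % row vector, same size as t
x(round(1/dt) + 1) = 2;          % impulse at t = 1 (index 101)
x(round(4/dt) + 1) = 5;          % impulse at t = 4 (index 401)
y  = conv(h, x);                 % full convolution, 2*numel(t)-1 samples
y  = y(1:numel(t));              % keep the part that lines up with t
```

From here, plot(t, y) should show two decaying exponentials starting at t = 1 and t = 4, and stem(t, x) tends to display the impulses more clearly than plot(t, x).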

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which are themselves mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always be equal to one. If you wish to weight each value evenly and use a moving window of size N, then you would do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the Signal Processing Toolbox, you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,length(x)); % third argument is the output length, not the window size
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
    k = ones(1, w) / w;   % equal weights that sum to 1
    y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector, which in the MathWorks example is a 13-point average in which the first and last points get half the weight of the others.
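For the N = 9 case in the question, a short sketch comparing 'valid' and 'same' may help. The series M below is synthetic stand-in data (an assumption, since the real C{ii} is not shown); only conv from base MATLAB is used.

```matlab
d   = (1:365)';                                   % day index
M   = sin(2*pi*d/365) + 0.1*cos(2*pi*d/7);        % stand-in for one 365-value mean series
N   = 9;                                          % window length in days
wts = ones(N, 1) / N;                             % equal weights, sum to 1

MsValid = conv(M, wts, 'valid');   % 365 - N + 1 = 357 values, no edge effects
MsSame  = conv(M, wts, 'same');    % 365 values, zero-padded at both ends

half   = (N - 1) / 2;
dValid = d(1 + half : end - half); % days on which each 'valid' average is centered
```

Plotting with plot(d, M) and plot(dValid, MsValid, 'r') keeps the smoothed curve aligned with the original; plotting MsValid against its own index would shift it left by 4 days.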

FFT MATLAB code correct?

I'd like to generate the frequency spectrum of seven concatenated cosine functions.
I am unsure whether my code is correct; in particular, whether N = time*freq*7 is correct, or whether it should be N = time*freq (without the times seven).
My code is as follow:
sn = [1, 2, 3, 4, 5, 6, 7, 8];
time = 1;
freq = 22050;
N = time*freq*7;
dt = 1/freq;
t = 0 : dt : time - dt;
y = @(sn, phasePosNeg) cos(2*pi*(1200-100*sn) * t + phasePosNeg * sn*pi/10);
f = [y(sn(1), 1), y(sn(2), -1), y(sn(3), 1), y(sn(4), -1), y(sn(5), 1), y(sn(6), -1), y(sn(7), 1)];
F = abs(fftshift(fft(f)))/N;
df = freq/N;
faxis = -freq/2 : df : (freq/2-1/freq);
plot(faxis, F);
grid on;
axis([-1500, 1500, 0, 0.6]);
title('Frequency Spectrum Of Concatenated Cosine Functions');
xlabel('Frequency (Hz)');
ylabel('Magnitude');
I suppose the essence of my question is:
Should the height of the spikes equal 1/7 of 0.5, or simply 0.5? (All cosine functions have an amplitude of 1.)
Thank you.
Let me correct/help you on a few things:
1) The Fourier transform magnitude is typically displayed in dB: 20*log10(abs(FFT coefficient)).
2) There is no need to divide your FFT amplitudes by any value of N.
F = abs(fftshift(fft(f)))/N; % get rid of the N, or set N = 1
3) If N is the number of points in your FFT, then N = length(t), because you've taken that many samples of the sin/cos functions.
4) When plotting the function, remember that the FFT spans from -pi to +pi and you need to remap it to the frequency spectrum using your sampling frequency.
5) Because of the large phase discontinuities between these functions, don't expect the Fourier transform to be a bunch of large narrow peaks (otherwise phase modulation would be the modulation scheme of choice... zero bandwidth).
This is clearly homework, so I'm just going to give some direction. In the concatenated case, think of it as though you're adding seven waveforms, each padded by zeros (6x the length of the waveform) and offset so that they don't overlap, which are then added together to form the concatenated waveform. In the case where you're adding the FFTs of the individual waveforms, also remember that you're assuming that they are periodic. So you basically need to handle the zero padding so that you're comparing apples to apples. One check, of course, is to just use one waveform throughout and make sure it works in this case (this should return exactly the same result, since the FFT assumes that the waveform is periodic, that is, infinitely concatenated to itself).
Thinking of this in terms of Parseval's Theorem will probably be helpful in figuring out how to interpret and normalize things.
It is correct to use N=(7*time)*freq, since your actual time of the waveform is 7*time, regardless of how you constructed it.
Several of the comments talk about discontinuities, but it should be noted that these usually exist in the FFT anyway, since the FFT waveform is assumed to be periodic, and that usually means that there are effectively discontinuities at the boundaries even in the non-concatenated case.
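A quick numerical check with the question's own parameters can settle the peak-height question. This is only a sketch: the loop is my restructuring of the question's concatenation, and it looks at the 1100 Hz bin of the unshifted FFT.

```matlab
freq = 22050;  time = 1;  nSeg = 7;
t = (0:freq*time - 1) / freq;            % one segment's time axis
f = [];
for sn = 1:nSeg
    s = (-1)^(sn + 1);                   % phase sign alternates +1, -1, +1, ...
    f = [f, cos(2*pi*(1200 - 100*sn)*t + s*sn*pi/10)]; %#ok<AGROW>
end
N  = numel(f);                           % 7*freq*time = 154350 samples
F  = abs(fft(f)) / N;                    % magnitude, normalized by total length
df = freq / N;                           % bin spacing, 1/7 Hz
k  = round(1100/df) + 1;                 % index of the 1100 Hz bin
peak = F(k);                             % about 0.5/7 = 0.0714
```

Under this normalization each cosine is present for only 1/7 of the record, so its spike comes out at 0.5/7 rather than 0.5. Dividing by the segment length instead of N would bring the spikes back to 0.5.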

Matlab - Signal Noise Removal

I have a vector of data, which contains integers in the range [-20, 20].
Below is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements lie in the interval [-2, 2], as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low-amplitude peaks and keep the high-amplitude peaks, namely peaks like the one at index 74.
Basically, I just want to increase the contrast between the high-amplitude peaks and the low-amplitude peaks and, if possible, eliminate the low-amplitude peaks.
Could you please suggest a way of doing this?
I have tried the mapstd function, but the problem is that it also normalizes the high-amplitude peak.
I was thinking of using the Wavelet Toolbox, but I don't know exactly how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right (R). If your samples are positive and negative (as yours are), you should take the absolute value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
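The split window filter described above could be sketched like this. The band width of 6 and guard width of 3 are read off the k diagram; the test signal, the variable names, and the clipping of the bands at the array edges are my own assumptions.

```matlab
x = 0.5 * (-1).^(1:100);     % made-up low-amplitude alternating "noise"
x(50) = 20;                  % one large peak
guard = 3;                   % samples skipped on each side of the current sample
band  = 6;                   % samples averaged on each side (L and R)
nPasses = 1;                 % increase to repeat the filter

y = x;
for p = 1:nPasses
    src = y;                 % each pass reads the previous pass's output
    n = numel(src);
    for k = 1:n
        idx = [k-guard-band : k-guard-1, k+guard+1 : k+guard+band];
        idx = idx(idx >= 1 & idx <= n);          % clip the bands at the edges
        y(k) = src(k) / mean(abs(src(idx)));     % Inf/NaN if the bands are all zero
    end
end
```

With this signal the peak at index 50 is divided by a local level of 0.5, while samples whose bands contain the peak are pushed down, which is the flattening effect the answer describes. Setting nPasses = 2 repeats the filter on its own output.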
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scaled by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing is by setting all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do note that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be kept, the other will be removed). So for displaying, this might be all you need; for further processing, it might depend on what you are trying to do.
To eliminate the low-amplitude peaks, you're going to treat all the low-amplitude signal as noise and ignore it.
If you have any a priori knowledge, just use it.
If your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy and find this threshold "on the fly", then use kmeans with 3 clusters. It's in the Statistics Toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and then restore the sign.
Note that these and every other technique I've seen on this thread assume you are doing post-processing. If you are doing this processing in real time, things will have to change.