measure similarity between 1 dimensional vectors - matlab

EDITED QUESTION
I have n signals of equal length.
X_signal
Y_signal
...
Z_signal
I calculate minima of these signals and I store their location (in time) in the vectors
X = [x1 x2 x3 x4 ... x100]
Y = [y1 y2 y3 y4 ... y150]
...
Z = [z1 z2 z3 z4 ... z110]
You can think of X, Y, ..., Z as time series that can have different lengths.
I assume that the original signals are similar if they have their minima almost at the same locations.
I would like to know what would be a smart approach to measure this kind of similarity keeping in mind that some minima in X,Y,Z can be just noise.
For example, if X = [1 5 8 12 15 20] and Y = [1.5 5.5 7.5 10 12 15.5 20.2], they should be considered similar, since almost all the points have nearly the same value except for Y(4) = 10.
If you have time, code or pseudocode in Matlab would be appreciated; otherwise a suggestion, link, etc. is also fine.
Thanks
ORIGINAL QUESTION
I have n vectors of different length.
X = [x1 x2 x3 x4 ... x100]
Y = [y1 y2 y3 y4 ... y150]
...
Z = [z1 z2 z3 z4 ... z110]
Vectors (X Y ... Z) represent minima values of the energy of the corresponding signals (X_energy, Y_energy, etc).
To recap starting from the signals X_signal, Y_signal ... Z_signal I compute the energy in windows of 20 samples and I calculate the minima of the resulting energy signals.
Assume that two or more vectors are similar if they have almost equal values (i.e. X and Y are similar if x1 ≈ y1, x2 ≈ y2, etc.). In other words, I assume that the original signals are similar if they have minimum energy at the same (or almost the same) time instant. I would like to know what would be a smart approach to measure this kind of similarity.
PS.
It is almost impossible that two vectors are equal so I would like to have just an idea of how close their "points" are.
X and Y could also be similar if they are shifted (i.e. x1 ≈ y3, x2 ≈ y4, etc.)
It is always the case that the values are in ascending order (x1<x2<...<x100)
If you have time code or pseudo code in Matlab is appreciated, otherwise also a suggestion, link etc. is super fine.
Thanks

One possible approach (particularly if you do not have the Statistics and/or Signal Processing toolbox) is to generate a correlation matrix for all of your vectors with the Matlab function corrcoef.
Since your vectors are different sizes, you would have to either
zero pad the smaller vectors so they are the same size as the largest, or
take an aligned sample from each vector, with no more values than the smallest vector contains, before computing correlation.
Which procedure is more suitable depends on your application. Since your vectors are sorted in ascending order, zero padding would likely be inappropriate.
Then you would need to create a matrix M with the rows corresponding to the elements, and the columns corresponding to each (zero padded or sampled) vector.
You could do that with the Matlab function horzcat:
M=horzcat(V1,V2,...Vn)
where V1, V2, ..Vn are each column vectors of the same size.
Finally you could get a correlation matrix for all of your vectors with corrcoef:
Cmat=corrcoef(M)
The Matlab documentation for corrcoef will help you understand how to interpret the results statistically.
Note that this approach would not take into account any correlation between lagged versions of your vectors.
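As a rough sketch of the truncation option: the vectors X, Y and Z below are made-up minima locations (not data from the question), and taking the first nMin values of each vector is only one assumed way of producing an "aligned sample".

```matlab
% Hypothetical minima locations, stored as column vectors
X = [1 5 8 12 15 20].';
Y = [1.5 5.5 7.5 10 12 15.5 20.2].';
Z = [1.2 4.8 8.1 12.3 14.9 19.7 25 30].';
vecs = {X, Y, Z};

% Truncate every vector to the length of the shortest one (assumption)
nMin = min(cellfun(@numel, vecs));
M = zeros(nMin, numel(vecs));
for k = 1:numel(vecs)
    M(:, k) = vecs{k}(1:nMin);   % one column per vector
end

% Correlation matrix: Cmat(i,j) is the correlation between vectors i and j
Cmat = corrcoef(M);
disp(Cmat)
```

Because the vectors are sorted, the coefficients from this kind of comparison tend to be close to 1 even for dissimilar signals, so differences between entries matter more than absolute values.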

Edited answer
Now that it is clear that the X vector holds the time positions of all minima of signal 'X', the Y vector holds the time positions of all minima of signal 'Y', etc., here is some updated code.
The idea is still the same: we build a linearly sampled time vector covering the time positions of the minima in all signals (with some time sampling precision), then we build new signals that are 1.0 everywhere except at the minima time locations (set to 0.0), and finally we use the same correlation code as before.
NB Speed and memory optimized version is now available here
function [RMax] = MinimaCorrelation(c, ts)
%[
% Some default resolution and time-location of minima positions
if (nargin < 2), ts = 0.1; end
if (nargin < 1), c = { [1 3 8 7 3 4 12]; [3 8 7 3]; [4 12]; [5 3 8 -3 12]; [1 3 8 7 3 4 12]; }; end
% Number of channels
n = length(c);
% Build linearly sample time vector for all time locations
minTime = min(cellfun(@min, c));
maxTime = max(cellfun(@max, c));
timeVector = minTime:ts:maxTime;
timeVector(end+1) = timeVector(end) + ts; % just to really include min and max if step is not ok
% Build new signals being '1' everywhere except at minima locations (set to '0')
s = ones(n, length(timeVector));
for ni = 1:n
for mv = c{ni}
[~, ind] = min(abs(timeVector - mv));
s(ni, ind) = 0;
end
end
% Correlation (copied 3 times to avoid biased effect on sides ==> circular shifting is ok this way)
s = [s, s, s].';
RMax = max(xcorr(s, 'coeff'), [], 1);
% Put in R(i,j) format
RMax = reshape(RMax, [n n]);
%]
end
With default data, one obtains:
1.0000 0.9899 0.9866 0.9829 1.0000
0.9899 1.0000 0.9833 0.9865 0.9899
0.9866 0.9833 1.0000 0.9832 0.9866
0.9829 0.9865 0.9832 1.0000 0.9829
1.0000 0.9899 0.9866 0.9829 1.0000
Careful, it is a brute-force solution (time and memory consumption increase quickly with the number of signals and the required time resolution). Now that the question is clearer, maybe someone will find a smarter answer.
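One lighter alternative, which is not part of the answer above, is to match each minimum to its nearest neighbour in the other vector and count the matches that fall within a tolerance. The sketch below uses the example vectors from the question; the tolerance tol = 1 and the names D, matchedX, matchedY and score are assumptions chosen for illustration.

```matlab
% Example vectors from the question
X = [1 5 8 12 15 20];
Y = [1.5 5.5 7.5 10 12 15.5 20.2];
tol = 1;   % assumed tolerance: how far apart two minima may be and still match

% Pairwise distances: D(i,j) = |X(i) - Y(j)|
D = abs(bsxfun(@minus, X(:), Y(:).'));

% A point is "matched" if its nearest neighbour in the other vector is within tol
matchedX = min(D, [], 2) <= tol;   % one entry per element of X
matchedY = min(D, [], 1) <= tol;   % one entry per element of Y

% Fraction of all minima that have a partner (1 = every minimum is matched)
score = (sum(matchedX) + sum(matchedY)) / (numel(X) + numel(Y));
disp(score)
```

With these inputs every minimum in X has a partner in Y and only Y(4) = 10 is unmatched, so the score is 12/13. Unmatched points are treated as noise; a global shift between the two vectors is not handled by this sketch.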
Original answer
Here is rough code for an approach using the maximum of the cross-correlation and the xcorr routine (in the Signal Processing Toolbox):
function [RMax] = xcorrmax(c)
%[
% Default signals for test
if (nargin < 1),
c = cell(0,0);
c{end+1} = [1 3 8 7 3 4 12];
c{end+1} = [3 8 7 3];
c{end+1} = [4 12];
c{end+1} = [5 3 8 -3 12];
c{end+1} = [1 3 8 7 3 4 12];
end
% Number of channels
n = length(c);
% Padding to have vectors all of the same length
% See also `padarray` to do circular/symmetric padding (i don't have image toolbox)
maxlength = max(cellfun(@length, c));
c = cellfun(@(x)myquickpad(x, maxlength), c, 'UniformOutput', false);
c = cell2mat(c.').';
% Compute cross correlation (multichannel case) and keep max value
% NB1: May also use xcov if signal mean is not important
% NB2: Normalization at lag = 0
RMax = max(xcorr(c, 'coeff'), [], 1);
% Put in R(i,j) format
RMax = reshape(RMax, [n n]);
%]
end
function [a] = myquickpad(a, maxlength)
%[
if (length(a) < maxlength)
a(maxlength) = 0;
end
%]
end
For the following signals:
(1) [1 3 8 7 3 4 12]
(2) [3 8 7 3]
(3) [4 12]
(4) [5 3 8 -3 12]
(5) [1 3 8 7 3 4 12]
It returns the following correlation matrix R(i,j) between ith and jth signals:
1.0000 0.6698 0.7402 0.8016 1.0000
0.6698 1.0000 0.8012 0.4853 0.6698
0.7402 0.8012 1.0000 0.6587 0.7402
0.8016 0.4853 0.6587 1.0000 0.8016
1.0000 0.6698 0.7402 0.8016 1.0000
Some remarks:
It looks coherent, for instance signal (1) and (5) are identical and correlation is 1.0.
Because of the normalization used, it considers (1) closer to (3) than to (2), so this should be reviewed according to your needs (see, for instance, normalization as in corrcoef, as shown by @paisanco).
You can use xcov instead of xcorr if signal shifts in amplitude are not important.
Again, this is a coarse approach: it is not optimized for speed or memory, it does not exploit the fact that the values are sorted, and it may not be fully in line with what you would really like to have.


Octave Coding - I need help coding coefficients of polynomial

This question is fairly easy to do manually; however, I am struggling to write it in code.
There is a quartic polynomial:
P(x)=ax^4+bx^3+cx^2+dx+e
There is also a given matrix M:
5 0 -1 2 9
-2 -1 0 1 2
The first row gives P(x) and the second row gives the corresponding value of x.
Using the information in matrix M, find the coefficients:
a, b, c, d, e
I would know how to work this manually, subbing each column and solve simultaneously with the other columns to obtain a value for each coefficient or put it in a matrix.
I have an idea of what to do, but I don't know how to code it.
I do believe the last line would be linearsolve(M(,1),M(,2)) and thus be able to obtain each coefficient but I have no idea how to get to that line.
Welcome J Cheong
% Values of y and x (note: in your post you had 2 values at x = 1, I assumed that was an accident)
M = [5 0 -1 2 9 ; -2 -1 0 1 2];
% Separate for clarity
y = M(1,:);
x = M(2,:);
% Fit to highest order polynomial model
f = fit(x',y',['poly', num2str(length(y)-1)])
% Extract coefficients
coeff = coeffvalues(f);
% Plotting
X = linspace(min(x)-1, max(x) + 1, 1000) ;
plot(x,y,'.',X,f(X))
Edit
Sorry, I'm using Matlab. Looking at the Octave documentation, you should be able to get the coefficients using
p = polyfit(x,y,length(y)-1)';
Then to display the coefficients the way you specified try this
strcat(cellstr(char(96+(1:length(p))')), { ' = ' } , cellstr(num2str(p)))
y=[5 0 -1 2 9];
x=[-2 -1 0 1 2];
P=polyfit(x,y,2)
gives
P =
2.0000 1.0000 -1.0000
These are your coefficients for c, d and e; the others (a and b) are zero. You can check the result:
polyval(P, x)
ans =
5.0000e+00 2.2204e-16 -1.0000e+00 2.0000e+00 9.0000e+00
which gives you y
Btw, you can solve this very fast just inside your head without calculator because the function values for x=0 and x=+/-1 are very easy to calculate.
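The manual approach described in the question (substituting each column and solving the equations simultaneously) can also be written directly as a linear system. This is a minimal sketch, not taken from the answers above; the names V and coeffs are chosen here for illustration.

```matlab
% Data from the question: first row is P(x), second row is x
M = [5 0 -1 2 9; -2 -1 0 1 2];
y = M(1, :).';
x = M(2, :).';

% Each row of V is [x^4 x^3 x^2 x 1] for one data point,
% so V * [a; b; c; d; e] = y is the set of simultaneous equations
V = [x.^4, x.^3, x.^2, x, ones(size(x))];

% Solve the 5x5 system; coeffs holds [a; b; c; d; e]
coeffs = V \ y;
disp(coeffs.')
```

For this data the solution is, up to rounding, a = 0, b = 0, c = 2, d = 1, e = -1, which agrees with the polyfit result.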

Interpolate matrices for different times in Matlab

I have computed variables stored in a matrix for a specific time vector.
Now I want to interpolate between those whole matrices for a new time vector to get the matrices for the desired new time vector.
I've come up with the following solution, but it seems clunky and computationally demanding:
clear all;
a(:,:,1) = [1 1 1;2 2 2;3 3 3]; % Matrix 1
a(:,:,2) = [4 4 4;6 6 6;8 8 8]; % Matrix 2
t1 = [1 2]; % Old time vector
t2 = [1 1.5 2]; % New time vector
% Interpolation for each matrix element
for r = 1:1:size(a,1) % rows
for c = 1:1:size(a,2) % columns
tab(:) = a(r,c,:);
tabInterp(r,c,:) = interp1(t1,tab(:),t2);
end
end
The result is and should be:
[2.5000 2.5000 2.5000
4.0000 4.0000 4.0000
5.5000 5.5000 5.5000]
Any thoughts?
You can do the linear interpolation manually, and all at once...
m = ( t2 - t1(1) ) / ( t1(2) - t1(1) );
% Linear interpolation using the standard 'y = m*x + c' linear structure
tabInterp = reshape(m,1,1,[]) .* (a(:,:,2)-a(:,:,1)) + a(:,:,1);
This will work for any size t2, as long as t1 has 2 elements.
If you have a t1 with more than 2 elements, you can create the scaling vector m using interp1. This is relatively efficient because you're only using interp1 for your time vector, not the matrix:
m = interp1( t1, (t1-min(t1))/(max(t1)-min(t1)), t2, 'linear', 'extrap' );
This uses implicit expansion with the .* operation, which requires R2016b or newer. If you have an older MATLAB version then use bsxfun for the same functionality.
I don't really see a problem with a loop based approach, but if you're looking for a loopless method you can do the following.
[rows, cols, ~] = size(a);
aReshape = reshape(a, rows*cols, []).';
tabInterp = reshape(interp1(t1, aReshape, t2).', rows, cols, []);
Looking at the source code for interp1 it appears a for loop is being used anyway so I doubt this will result in any performance gain.
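The reshape-based call also works unchanged when there are more than two time points, in which case interp1 interpolates piecewise between neighbouring matrices. The sketch below extends the question's data with a third matrix at t = 4; that third matrix and the new time vectors are made up for illustration.

```matlab
% Question's two matrices plus a hypothetical third one
a = cat(3, [1 1 1; 2 2 2; 3 3 3], ...
           [4 4 4; 6 6 6; 8 8 8], ...
           [5 5 5; 7 7 7; 9 9 9]);
t1 = [1 2 4];        % old time vector (one entry per matrix)
t2 = [1 1.5 2 3];    % new time vector

[rows, cols, ~] = size(a);

% One row per time point, one column per matrix element
aReshape = reshape(a, rows*cols, []).';

% interp1 interpolates every column at once; reshape back to rows x cols x time
tabInterp = reshape(interp1(t1, aReshape, t2).', rows, cols, []);

disp(tabInterp(:, :, 2))   % slice at t = 1.5
disp(tabInterp(:, :, 4))   % slice at t = 3
```

The slice at t = 1.5 lies halfway between the first two matrices, and the slice at t = 3 lies halfway between the second and third.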

Extracting and storing non-zero entries in MATLAB

Could anyone help me build and correct my code which aims to only save the non-zero elements of an arbitrary square matrix and its index? Basically I need to write a script that does the same function as 'sparse' in MATLAB.
%Consider a 3x3 matrix
A=[ 0 0 9 ;-1 8 0;0 -5 0 ];
n=3; %size of matrix
%initialise following arrays:
RI= zeros(n,1); %row index
CI = zeros(n,1); %column index
V = zeros(n,1); %value in the matrix
for k = 1:n %row 1 to n
    for j = 1:n %column 1 to n
        if A(k,j)~=0
            RI(k)=k;
            CI(j)=j;
            V(k,j)=A(k,j);
        end
    end
end
You could use the find function to find all the non-zero elements.
So,
[RI, CI, V] = find(A);
% 2 1 -1
% 2 2 8
% 3 2 -5
% 1 3 9
EDIT :
I realize from your comments that your goal was to learn coding in Matlab and you might be wondering why your code didn't work as expected. So let me try to explain the issue along with an example code that is similar to yours.
% Given:
A=[ 0 0 9 ;-1 8 0;0 -5 0 ];
Firstly, instead of manually specifying the size as n = 3, I'd recommend using the built-in size function.
sz = size(A);
% note that this contains 2 elements:
% [number of rows, number of columns]
Next, to initialize the arrays RI, CI and V we would like to know their sizes. Since we do not know the number of non-zero elements to start with, we have two options: (1) choose a large number that is guaranteed to be equal to or greater than the number of non-zero elements, for example prod(sz). (Why is that true?) (2) Do not initialize them at all and let Matlab dynamically allocate memory as required. I'll follow the second option in the code below.
% we'll keep a count of non-zero elements as we find them
numNZ = 0; % this will increment every time a non-zero element is found
for iCol = 1:sz(2) %column 1 to end
for iRow = 1:sz(1) %row 1 to end
if A(iRow,iCol)~=0
numNZ = numNZ + 1;
RI(numNZ) = iRow;
CI(numNZ) = iCol;
V(numNZ) = A(iRow,iCol);
end
end
end
disp([RI, CI, V])
% 2 1 -1
% 2 2 8
% 3 2 -5
% 1 3 9
Makes sense?
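For comparison, option (1) could look roughly like the following sketch: preallocate for the worst case, fill as before, then trim the unused tail. The variable name maxNZ is an assumption introduced here.

```matlab
A = [0 0 9; -1 8 0; 0 -5 0];
sz = size(A);

% Worst case: every element is non-zero, so prod(sz) entries always suffice
maxNZ = prod(sz);
RI = zeros(maxNZ, 1);
CI = zeros(maxNZ, 1);
V  = zeros(maxNZ, 1);

numNZ = 0;                      % number of non-zero elements found so far
for iCol = 1:sz(2)              % column-major order, same as find
    for iRow = 1:sz(1)
        if A(iRow, iCol) ~= 0
            numNZ = numNZ + 1;
            RI(numNZ) = iRow;
            CI(numNZ) = iCol;
            V(numNZ)  = A(iRow, iCol);
        end
    end
end

% Drop the unused, still-zero entries at the end
RI = RI(1:numNZ);
CI = CI(1:numNZ);
V  = V(1:numNZ);
disp([RI, CI, V])
```

Preallocating also guarantees that RI, CI and V are column vectors, which is what disp([RI, CI, V]) needs to show one non-zero entry per row.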
So I think we've established that the point of this is to learn an unfamiliar programming language. The simplest solution is to use sparse itself but that gives you no insight into programming. Nor does find, which can be used similarly.
Now, we could go the same route you've started using: procedural for and if over each row and each column. Could be almost any programming language, but for a few quirks of punctuation. But what you'll find, even if you do correct the mistakes (like the fact that n should be the number of non-zero entries, not the number of rows) is that this is a very slow way of doing numerical work in Matlab.
Here's another (still inefficient, but less so) way which will hopefully provide some insight into the "vectorized" way of doing things, which is one of the things that makes Matlab as powerful as it is:
function [RI, CI, V] = mysparse(A) % first: use functions!
[nRows, nCols] = size(A);
[allRowIndices, allColIndices] = ndgrid(1:nRows, 1:nCols) % let's leave the semicolon off so you can see for yourself what it does.
% It's very similar to `meshgrid` which you'll see more often (it's heavily used in Matlab graphics)
% but `ndgrid` is "simpler" in that it's more in tune with the fundamental conventions of Matlab (rows, then columns)
isNonZero = A ~= 0; % this gives you a "logical array" which is a very powerful thing: it can be used as a subscript to select elements from another array, in one shot...
RI = allRowIndices(isNonZero); % like this
CI = allColIndices(isNonZero); % or this
V = A(isNonZero); % or even this
RI = RI(:); % have to do this explicitly, because the lines above will reshape the values into a single long string under some circumstances but not others
CI = CI(:);
V = V(:);
I will go with an N x 3 matrix, where N is the number of non-zero elements in the matrix.
% Define a matrix A as follows:
A = randi([0 1],[4 4])
for i=1:16
if A(i) ~= 0
A(i) = rand;
end
end
[row,col] = find(A);
elms = A(A~=0); % MATLAB always works in column-major order and is consistent,
% so no need to use sub2ind to access elements given by find
newSparse_A = [row col elms];
Output:
newSparse_A =
1.0000 1.0000 0.9027
2.0000 1.0000 0.9448
3.0000 1.0000 0.4909
1.0000 2.0000 0.4893
2.0000 2.0000 0.3377
4.0000 2.0000 0.9001
>> sparse(A)
ans =
(1,1) 0.9027
(2,1) 0.9448
(3,1) 0.4909
(1,2) 0.4893
(2,2) 0.3377
(4,2) 0.9001

Calculate the correlation coefficient using the probability density in Matlab

I'm trying to use Matlab to calculate the correlation coefficient for a bidimensional normal law.
mu = [1 2];
SIGMA = [9 4; 4 3];
X = mvnrnd(mu,SIGMA);
p = mvnpdf(X,mu,SIGMA);
The variable p stores the probability density of the vector X, which follows the bidimensional normal law. I must use the probability density p to calculate the correlation coefficient, and the function R = corrcoef(X) doesn't do that.
You are actually creating one multivariate normal random number (1x2) in the third line of your code, but you need more.
According to the documentation:
R = corrcoef(X) returns a matrix R of correlation coefficients calculated from an input matrix X whose rows are observations and whose columns are variables.
In your case you have one observation and two variables, for which the corrcoef function will return 1.
So you can do something like this:
mu = [1 -1];
SIGMA = [.9 .4; .4 .3];
X = mvnrnd(mu,SIGMA,10); % a 10 x 2 matrix
p = mvnpdf(X,mu,SIGMA); % a 10 x 1 vector
corrcoef(X)
ans =
1.0000 0.7846
0.7846 1.0000
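If the coefficient really has to come from the density rather than from samples, one possible reading is to integrate the density numerically on a grid. The sketch below is an assumption about what is wanted, not part of the answer above: it uses the mu and SIGMA from the question, writes the bivariate normal density out by hand instead of calling mvnpdf, and compares the result with the closed-form value SIGMA(1,2)/sqrt(SIGMA(1,1)*SIGMA(2,2)). The grid extent (8 standard deviations) and resolution (401 points) are arbitrary choices.

```matlab
mu = [1 2];
SIGMA = [9 4; 4 3];

% Closed-form correlation coefficient from the covariance matrix
rhoTheory = SIGMA(1,2) / sqrt(SIGMA(1,1) * SIGMA(2,2));

% Regular grid covering +/- 8 standard deviations in each variable
s = sqrt(diag(SIGMA));
x1 = linspace(mu(1) - 8*s(1), mu(1) + 8*s(1), 401);
x2 = linspace(mu(2) - 8*s(2), mu(2) + 8*s(2), 401);
[G1, G2] = ndgrid(x1, x2);

% Bivariate normal density evaluated at every grid point
d = [G1(:) - mu(1), G2(:) - mu(2)];
p = exp(-0.5 * sum((d / SIGMA) .* d, 2)) / (2*pi*sqrt(det(SIGMA)));

% Normalised weights (the constant grid cell area cancels out)
w = p / sum(p);

% Moments of the density, then the correlation coefficient
m1  = sum(w .* G1(:));
m2  = sum(w .* G2(:));
v1  = sum(w .* (G1(:) - m1).^2);
v2  = sum(w .* (G2(:) - m2).^2);
c12 = sum(w .* (G1(:) - m1) .* (G2(:) - m2));
rhoGrid = c12 / sqrt(v1 * v2);

disp([rhoTheory, rhoGrid])
```

Both values should agree closely; the sample-based corrcoef estimate only approaches them as the number of observations grows.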

Getting the N-dimensional product of vectors

I am trying to write code to get the 'N-dimensional product' of vectors. So for example, if I have 2 vectors of length L, x & y, then the '2-dimensional product' is simply the regular vector product, R=x*y', so that each entry of R, R(i,j) is the product of the i'th element of x and the j'th element of y, aka R(i,j)=x(i)*y(j).
The problem is how to elegantly generalize this in matlab for arbitrary dimensions. That is, if I had 3 vectors, x,y,z, I want the 3 dimensional array, R, such that R(i,j,k)=x(i)*y(j)*z(k).
Same thing for 4 vectors, x1,x2,x3,x4: R(i1,i2,i3,i4)=x1(i1)*x2(i2)*x3(i3)*x4(i4), etc...
Also, I do NOT know the number of dimensions beforehand. The code must be able to handle an arbitrary number of input vectors, and the number of input vectors corresponds to the dimensionality of the final answer.
Is there any easy matlab trick to do this and avoid going through each element of R specifically?
Thanks!
I think by "regular vector product" you mean outer product.
In any case, you can use the ndgrid function. I like this more than using bsxfun as it's a little more straightforward.
% make some vectors
w = 1:10;
x = w+1;
y = x+1;
z = y+1;
vecs = {w,x,y,z};
nvecs = length(vecs);
[grids{1:nvecs}] = ndgrid(vecs{:});
R = grids{1};
for i=2:nvecs
R = R .* grids{i};
end;
% Check results
for i=1:10
for j=1:10
for k=1:10
for l=1:10
V(i,j,k,l) = R(i,j,k,l) == w(i)*x(j)*y(k)*z(l);
end;
end;
end;
end;
all(V(:))
ans = 1
The built-in function bsxfun is a fast utility that should be able to help. It is designed to apply a two-input function element by element to two inputs with mismatching dimensions. Singleton dimensions are expanded, and non-singleton dimensions need to match. (It sounds confusing, but once grokked it is useful in many ways.)
As I understand your problem, you can adjust the dimension shape of each vector to define the dimension that it should be defined across. Then use nested bsxfun calls to perform the multiplication.
Example code follows:
%Some inputs, N-by-1 vectors
x = [1; 3; 9];
y = [1; 2; 4];
z = [1; 5];
%The computation you describe, using nested BSXFUN calls
bsxfun(@times, bsxfun(@times, ... %Nested BSXFUN calls, 1 per dimension
x, ... % First argument, in dimension 1
permute(y,2:-1:1) ) , ... % Second argument, permuted to dimension 2
permute(z,3:-1:1) ) % Third argument, permuted to dimension 3
%Result
% ans(:,:,1) =
% 1 2 4
% 3 6 12
% 9 18 36
% ans(:,:,2) =
% 5 10 20
% 15 30 60
% 45 90 180
To handle an arbitrary number of dimensions, this can be expanded using a recursive or loop construct. The loop would look something like this:
allInputs = {[1; 3; 9], [1; 2; 4], [1; 5]};
accumulatedResult = allInputs{1};
for ix = 2:length(allInputs)
    accumulatedResult = bsxfun(@times, ...
        accumulatedResult, ...
        permute(allInputs{ix},ix:-1:1));
end
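A variant of the same loop, sketched here under the assumption that the inputs may be either row or column vectors, uses reshape instead of permute to place the k-th vector along dimension k. The names vecs, shape and R are illustrative.

```matlab
% Inputs may be rows or columns; reshape puts vector k along dimension k
vecs = {[1; 3; 9], [1 2 4], [1; 5]};
nVecs = numel(vecs);

R = 1;                                   % neutral element for multiplication
for k = 1:nVecs
    shape = ones(1, max(nVecs, 2));      % reshape needs at least two sizes
    shape(k) = numel(vecs{k});           % all singletons except dimension k
    R = bsxfun(@times, R, reshape(vecs{k}, shape));
end

disp(size(R))
disp(R(:, :, 2))
```

On R2016b or newer the bsxfun call can be replaced by R .* reshape(vecs{k}, shape), since implicit expansion does the same thing.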