calculate similarity between two vectors - matlab

I have a matrix M which is a 29 x 18 double, something like this:
1 1 1 ...
2 1 1 ...
3 1 2 ...
2 2 2 ...
2 1 3 ...
3 1 3 ...
1 3 3 ...
...
For each possible pair of two columns in M, I want to calculate the number of times the values of the same row between two columns are identical. Take column 1 and 2 for instance, the number of times values of the same row are identical is 2 since M(1,1) = M(1,2) and M(4,1) = M(4,2). This calculation is repeated 18 time for each column as each column is paired with each of the total number of columns, including itself. Thus, the output (called N) would be 18 x 18 matrix with the each value indicating how many instances the values of the same row from the original two corresponding columns are identical. Something like this
29 4 5 3 ...
4 29 6 0 ...
5 6 29 7 ...
...
Since N(2,1) = 4, this would indicate that column 1 and column 2 matrix M have 4 matching values of the same row.
How do I do this?

you can do a double for loop like that:
result = zeros(18);
for i = 1:18
for j = 1:18
result(i,j) = nnz(M(:,i) == M(:,j));
end
end

Related

Summing specific columns for each row in a matrix of double

I would like to sum specific columns of each row in a matrix using a for loop. Below I have included a simplified version of my problem. As of right now, I am calculating the column sums individually, but this is not effective as my actual problem has multiple matrices (data sets).
a = [1 2 3 4 5 6; 4 5 6 7 8 9];
b = [2 2 3 4 4 6; 3 3 3 4 5 5];
% Repeat the 3 lines of code below for row 2 of matrix a
% Repeat the entire process for matrix b
c = sum(a(1,1:3)); % Sum columns 1:3 of row 1
d = sum(a(1,4:6)); % Sum columns 4:6 of row 1
e = sum(a(1,:)); % Sum all columns of row 1
I would like to know how to create a for loop that automatically loops through and sums the specific columns of each row for each matrix that I have.
Thank you.
Here is a solution that you don't need to use for loop.
Assuming that you have a matrix a of size 2x12, and you want to do the row sums every 4 columns, then you can use reshape() and squeeze() to get the final result:
k = 4;
a = [1:12
13:24];
% a =
% 1 2 3 4 5 6 7 8 9 10 11 12
% 13 14 15 16 17 18 19 20 21 22 23 24
s = squeeze(sum(reshape(a,size(a,1),k,[]),2));
and you will get
s =
10 26 42
58 74 90

indexing for loop matlab

I want to do the following:
I create a matrix with all possible permutations from 1:n, for example
n=4;
L=perms(1:n)';
I get as output as expected a 4-by-24 matrix:
L =
Columns 1 through 13
4 4 4 4 4 4 3 3 3 3 3 3
3 3 2 2 1 1 4 4 2 2 1 1
2 1 3 1 2 3 2 1 4 1 2 4
1 2 1 3 3 2 1 2 1 4 4 2
Columns 14 through 24
2 2 2 2 2 1 1 1 1 1 1
3 4 4 1 1 3 3 2 2 4 4
1 3 1 4 3 2 4 3 4 2 3
4 1 3 3 4 4 2 4 3 3 2
Now I want to use this matrix for the indexes of a for loop:
Using the first column, I want to feed the input of my loop the following indexes: i=4 j=3,2,1. Then for i=3 j=2,1. Then for i=2 j=1. i=1 is empty
This could be done just for the first column like this:
for u=4:-1:1
for v=u-1:-1:1
But will not work for other columns so I need to do the same but with the entries of matrix L, something like (it doesn't work in MATLB) for column i=1:
u=L(1:4,1)
v=L(u:L(4,1) , 1) %// where u corresponds to L(1,1) then L(2,1) then L(3,1)
(for all the columns it would look like:
for i=1:length(L)
for u=L(4*(i-1)+1:4*i)
for v=.. ?
)
This doesn't work because MATLAB takes the values of the entries and when I write L(1,1):L(4,1) it doesn't mean return the entries from line one to line four but rather all the numbers with increment 1 from the value of L(1,1) to the value of L(4,1) (here empty).
Any ideas ? thank you very much in advance
I believe something like this will solve you problem.
for col = 1:size(L,2)
rowIdx = 1;
for j = [L(:,col)]'
for k = [L(rowIdx:end,col)]'
% Do your stuff here
end
rowIdx = rowIdx + 1;
end
end
Notice how I use the values from columns from L directly as loop index variable. In a for loop statement you can basically write any row vector and the index takes those values. For example
for i = [1, 7, 11, 14, 23]
disp(i); % prints 1,7,11,14,23
end
This is true for arrays of objects, cell arrays, basically any single row matrix.
You can do it like this:
for col = 1:size(L, 2)
for I = 1:n-1
for J = I:n
i = L(I,col);
j = L(J,col);
%// As an example just print out the loop variable values
disp(sprintf('Col:%d\ti:%d\tj:%d\r\n',col,i,j))
end
end
end

Matlab: How I can make this transformation on the matrix A? (part 2)

N.B: This question is more complex than my previous question: Matlab: How I can make this transformation on the matrix A?
I have a matrix A 4x10000, I want to use it to find another matrix C, based on a predefined vector U.
I'll simplify my problem with a simple example:
from a matrix A
20 4 4 74 20 20 4
36 1 1 11 36 36 1
77 1 1 15 77 77 1
3 4 2 6 7 8 15
and
U=[2 3 4 6 7 8 2&4&15 7&8 4|6].
& : AND
| : OR
I want, first, to find an intermediate entity B:
2 3 4 6 7 8 2&4&15 7&8 4|6
[20 36 77] 0 1 0 0 1 1 0 1 0 4
[4 1 1] 1 0 1 0 0 0 1 0 1 4
[74 11 15] 0 0 0 1 0 0 0 0 1 2
we put 1 if the corresponding value of the first line and the vector on the left, made ​​a column in the matrix A.
the last column of the entity B is the sum of 1 of each line.
at the end I want a matrix C, consisting of vectors which are left in the entity B, but only if the sum of 1 is greater than or equal to 3.
for my example:
20 4
C = 36 1
77 1
This was a complex one indeed and because of the many restrictions and labeling processes involved, it won't be as efficient as the solution to the previous problem. Here's the code to solve the posted problem -
find_labels1 = 2:8; %// Labels to be detected - main block
find_labels2 = {[2 4 15],[7 8],[4 6]}; %// ... side block
A1 = A(1:end-1,:); %// all of A except the last row
A2 = A(end,:); %// last row of A
%// Find unique columns and their labels for all of A execpt the last row
[unqmat_notinorder,row_ind,inv_labels] = unique(A1.','rows'); %//'
[tmp_sortedval,ordered_ind] = sort(row_ind);
unqcols = unqmat_notinorder(ordered_ind,:);
[tmp_matches,labels] = ismember(inv_labels,ordered_ind);
%// Assign labels to each group
ctl = numel(unique(labels));
labelgrp = arrayfun(#(x) find(labels==x),1:ctl,'un',0);
%// Work for the main comparisons
matches = bsxfun(#eq,A2,find_labels1'); %//'
maincols = zeros(ctl,numel(find_labels1));
for k = 1:ctl
maincols(k,:) = any(matches(:,labelgrp{k}),2);
end
%// Work for the extra comparisons added that made this problem extra-complex
lens = cellfun('length',find_labels2);
lens(end) = 1;
extcols = nan(ctl,numel(find_labels2));
for k = 1:numel(find_labels2)
idx = find(ismember(A2,find_labels2{k}));
extcols(:,k)=arrayfun(#(n) sum(ismember(labelgrp{n},idx))>=lens(k),1:ctl).'; %//'
end
C = unqcols(sum([maincols extcols],2)>=3,:).' %//'# Finally the output
I will give you a partial answer. I think you can take from here. Idea is to concatenate first 3 rows of A with each element of U replicated as last column. After you get the 3D matrix, replicate your original A and then just compare the rows. The rows which are equal, that is equivalent to putting one in your table.
B=(A(1:3,:).';
B1=repmat(B,[1 1 length(U)]);
C=permute(U,[3 1 2]);
D=repmat(C,[size(B1,1),1,1]);
E=[B1 D];
F=repmat(A',[1 1 size(E,3)]);
Now compare F and E, row-wise. If the rows are equal, then you put 1 in your table. For replicating & and |, you can form some kind of indicator vector.
Say,
indU=[1 2 3 4 5 6 7 7 7 8 8 -9 -9];
Same positive value indicates &, same negative value indicates |. Different value indicate different entries.
I hope you can take from here.

MATLAB find mean of column in matrix using two different indices

I have a 22007x3 matrix with data in column 3 and two separate indices in columns 1 and 2.
eg.
x =
1 3 4
1 3 5
1 3 5
1 16 4
1 16 3
1 16 4
2 4 1
2 4 3
2 11 2
2 11 3
2 11 2
I need to find the mean of the values in column 3 when the values in column 1 are the same AND the values in column 2 are the same, to end up with something like:
ans =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.3333
Please bear in mind that in my data, the number of times the values in column 1 and 2 occur can be different.
Two options I've tried already are the meshgrid/accumarray option, using two distinct unique functions and a 3D array:
[U, ix, iu] = unique(x(:, 1));
[U2,ix2,iu2] = unique(x(:,2));
[c, r, j] = meshgrid((1:size(x(:, 1), 2)), iu, iu2);
totals = accumarray([r(:), c(:), j(:)], x(:), [], #nanmean);
which gives me this:
??? Maximum variable size allowed by the program is exceeded.
Error in ==> meshgrid at 60
xx = xx(ones(ny,1),:,ones(nz,1));
and the loop option,
for i=1:size(x,1)
if x(i,2)== x(i+1,2);
totals(i,:)=accumarray(x(:,1),x(:,3),[],#nanmean);
end
end
which is obviously so very, very wrong, not least because of the x(i+1,2) bit.
I'm also considering creating separate matrices depending on how many times a value in column 1 occurs, but that would be long and inefficient, so I'm loathe to go down that road.
Group on the first two columns with a unique(...,'rows'), then accumulate only the third column (always the best approach to accumulate only where accumulation really happens, thus avoiding indices, i.e. the first two columns, which you can reattach with unX):
[unX,~,subs] = unique(x(:,1:2),'rows');
out = [unX accumarray(subs,x(:,3),[],#nanmean)];
out =
1 3 4.6667
1 16 3.6667
2 4 2
2 11 2.33
This is an ideal opportunity to use sparse matrix math.
x = [ 1 2 5;
1 2 7;
2 4 6;
3 4 6;
1 4 8;
2 4 8;
1 1 10]; % for example
SM = sparse(x(:,1),x(:,2), x(:,3);
disp(SM)
Result:
(1,1) 10
(1,2) 12
(1,4) 8
(2,4) 14
(3,6) 7
As you can see, we did the "accumulate same indices into same container" in one fell swoop. Now you need to know how many elements you have:
NE = sparse(x(:,1), x(:,2), ones(size(x(:,1))));
disp(NE);
Result:
(1,1) 1
(1,2) 2
(1,4) 1
(2,4) 2
(3,6) 1
Finally, you divide one by the other to get the mean (only use elements that have a value):
matrixMean = SM;
nz = find(NE>0);
matrixMean(nz) = SM(nz) ./ NE(nz);
If you then disp(matrixMean), you get
(1,1) 10
(1,2) 6
(1,4) 8
(2,4) 7
(3,6) 7
If you want to access the individual elements differently, then after you have computed SM and NE you can do
[i j n] = find(NE);
matrixMean = SM(i,j)./NE(i,j);
disp([i(:) j(:) nonzeros(matrixMean)]);

Extracting unique values

I have data in two columns that looks as follows:
A B
1,265848208 3
-0,608043611 0
-0,285735893 0
0,006895134 7
0 7
-0,004526196 7
0,176326617 10
-0,159688071 2
0,22439945 2
-0,991045044 1
0,178022324 1
-0,270967397 4
0,285849994 4
1,881705539 23
1,057184204 10
NaN 10
For all unique values in B I want to extract the corresponding value in column A and move it to a new matrix. I'm looking to then compute the mean of all the corresponding values in A and use as a dependent variable (weighted by no of observations per value in B) in a regression with the common value of B being the independent variable to reduce noise. Any help would on how to do this in Matlab (except running the regression) would be great!
Thanks
Oscar
Here is an efficient solution:
X = [
1.265848208 3
-0.608043611 0
-0.285735893 0
0.006895134 7
0 7
-0.004526196 7
0.176326617 10
-0.159688071 2
0.22439945 2
-0.991045044 1
0.178022324 1
-0.270967397 4
0.285849994 4
1.881705539 23
1.057184204 10
NaN 10
];
%# unique values in B, and their indices
[valB,~,subs] = unique(X(:,2));
%# values of A for each unique number in B (cellarray)
valA = accumarray(subs, X(:,1), [], #(x) {x});
%# mean of each group
meanValA = cellfun(#nanmean, valA)
%# perform regression here...
The result:
%# B values, mean of corresponding values in A, number of A values
>> [valB meanValA cellfun(#numel,valA)]
ans =
0 -0.44689 2
1 -0.40651 2
2 0.032356 2
3 1.2658 1
4 0.0074413 2
7 0.00078965 3
10 0.61676 3
23 1.8817 1