How to create an adjacency/joint probability matrix in matlab

How to create an adjacency/joint probability matrix in matlab - matlab

From a binary matrix, I want to calculate a kind of adjacency/joint probability density matrix (not quite sure how to label it as so please feel free to rename).
For example, I start with this matrix:
A = [1 1 0 1 1
1 0 0 1 1
0 0 0 1 0]
I want to produce this output:
Output = [1 4/5 1/5
4/5 1 1/5
1/5 1/5 1]
Basically, for each row, I want to calculate the proportion of times where they agreed (1 and 1 or 0 and 0). A will always agree with itself and thus have it as 1 along the diagonal. No matter how many different js are added it will still result in a 3x3, but an extra i variable will result in a 4x4.
I like to think of the inputs along i in the A matrix as the person and Js as the question and so the final output is a 3x3 (number of persons) matrix.
I am having some trouble with this on matlab. If you could please help point me in the right direction that would be fabulous.

So, you can do this in two parts.
bothOnes = A*A';
gives you a matrix showing how many 1s each pair of rows share, and
bothZeros = (1-A)*(1-A)';
gives you a matrix showing how many 0s each pair of rows share.
If you just add them up, you get how many elements they share of either type:
bothSame = A*A' + (1-A)*(1-A)';
Then just divide by the row length to get the desired fractional representation:
output = (A*A' + (1-A)*(1-A)') / size(A, 2);
That should get you there.
Note that this only works if A contains only 1's and 0's, but it can be adapted for other cases.

Here are some alternatives, assuming A can only contain 0 and 1:
If you have the Statistics Toolbox:
result = 1-squareform(pdist(A, 'hamming'));
Manual approach with implicit expansion:
result = mean(permute(A, [1 3 2])==permute(A, [3 1 2]), 3);
Using bitwise operations. This is a more esoteric approach, and is only valid if A has at most 53 columns, due to floating-point limitations:
t = bin2dec(char(A+'0')); % convert each row from binary to decimal
u = bitxor(t, t.'); % bitwise xor
v = mean(dec2bin(u)-'0', 2); % compute desired values
result = 1 - reshape(v, size(A,1), []); % reshape to obtain result

Related

Finding equal rows in Matlab

I have a matrix suppX in Matlab with size GxN and a matrix A with size MxN. I would like your help to construct a matrix Xresponse with size GxM with Xresponse(g,m)=1 if the row A(m,:) is equal to the row suppX(g,:) and zero otherwise.
Let me explain better with an example.
suppX=[1 2 3 4;
5 6 7 8;
9 10 11 12]; %GxN
A=[1 2 3 4;
1 2 3 4;
9 10 11 12;
1 2 3 4]; %MxN
Xresponse=[1 1 0 1;
0 0 0 0;
0 0 1 0]; %GxM
I have written a code that does what I want.
Xresponsemy=zeros(size(suppX,1), size(A,1));
for x=1:size(suppX,1)
Xresponsemy(x,:)=ismember(A, suppX(x,:), 'rows').';
end
My code uses a loop. I would like to avoid this because in my real case this piece of code is part of another big loop. Do you have suggestions without looping?

One way to do this would be to treat each matrix as vectors in N dimensional space and you can find the L2 norm (or the Euclidean distance) of each vector. After, check if the distance is 0. If it is, then you have a match. Specifically, you can create a matrix such that element (i,j) in this matrix calculates the distance between row i in one matrix to row j in the other matrix.
You can treat your problem by modifying the distance matrix that results from this problem such that 1 means the two vectors completely similar and 0 otherwise.
This post should be of interest: Efficiently compute pairwise squared Euclidean distance in Matlab.
I would specifically look at the answer by Shai Bagon that uses matrix multiplication and broadcasting. You would then modify it so that you find distances that would be equal to 0:
nA = sum(A.^2, 2); % norm of A's elements
nB = sum(suppX.^2, 2); % norm of B's elements
Xresponse = bsxfun(#plus, nB, nA.') - 2 * suppX * A.';
Xresponse = Xresponse == 0;
We get:
Xresponse =
3×4 logical array
1 1 0 1
0 0 0 0
0 0 1 0
Note on floating-point efficiency
Because you are using ismember in your implementation, it's implicit to me that you expect all values to be integer. In this case, you can very much compare directly with the zero distance without loss of accuracy. If you intend to move to floating-point, you should always compare with some small threshold instead of 0, like Xresponse = Xresponse <= 1e-10; or something to that effect. I don't believe that is needed for your scenario.

Here's an alternative to #rayryeng's answer: reduce each row of the two matrices to a unique identifier using the third output of unique with the 'rows' input flag, and then compare the identifiers with singleton expansion (broadcast) using bsxfun:
[~, ~, w] = unique([A; suppX], 'rows');
Xresponse = bsxfun(#eq, w(1:size(A,1)).', w(size(A,1)+1:end));

Matlab matrix with fixed sum over rows

I'm trying to construct a matrix in Matlab where the sum over the rows is constant, but every combination is taken into account.
For example, take a NxM matrix where M is a fixed number and N will depend on K, the result to which all rows must sum.
For example, say K = 3 and M = 3, this will then give the matrix:
[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]
At the moment I do this by first creating the matrix of all possible combinations, without regard for the sum (for example this also contains [2,2,1] and [3,3,3]) and then throw away the element for which the sum is unequal to K
However this is very memory inefficient (especially for larger K and M), but I couldn't think of a nice way to construct this matrix without first constructing the total matrix.
Is this possible in a nice way? Or should I use a whole bunch of for-loops?

Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.
M=3;
K=3;
%S(k+1,m) will hold the intermediate result for k and m
S=cell(K+1,M);
%Initialisation, for M=1 there is only a trivial solution using one number.
S(:,1)=num2cell(0:K);
for iM=2:M
for temporary_k=0:K
for new_element=0:temporary_k
h=S{temporary_k-new_element+1,iM-1};
h(:,end+1)=new_element;
S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
end
end
end
final_result=S{K+1,M}

This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.
Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.
The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:
The subinterval edges are M-1 integer values (not necessarily different) between 0 and S. These can be generated as a Cartesian product using for example this answer.
Sort the interval edges, and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be less than in your original approach, because this does take into account the fixed sum.
Compute subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
Code:
%// Data
S = 3; %// desired sum
M = 3; %// number of pieces
%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);
%// Step 2
combs = unique(sort(combs,2), 'rows');
%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)]
result = diff(combs,[],2);
The result is sorted in lexicographical order. In your example,
result =
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0

sum matrix using logical matrix - index exceeds matrix dimensions

I have two matrices.
mcaps which is a double 1698 x 2
index_g which is a logical 1698 x 2
When using the line of code below I get the error message that Index exceeds matrix dimensions. I don't see how this is the case though?
tsp = nansum(mcaps(index_g==1, :));
Update
Sorry I should have mentioned that I need the sum of each column in the mcaps vector
** Example of data **
mcaps index_g
5 6 0 0
4 3 0 0
6 5 1 1
4 6 0 1
8 7 0 0

There are two problems here. I missed one. Original answer is below.
What I missed is that when you use the logical index in this way, you are picking out elements of the matrix that may have different numbers of elements in each column, so MATLAB can't return a well formed matrix back to nansum, and so returns a vector. To get around this, use the fact that 0 + anything = 0
% create a mask of values you don't want to sum. Note that since
% index_g is already logical, you don't have to test equal to 1.
mask = ~index_g & isnan(mcaps)
% create a temporary variable
mcaps_to_sum = mcaps;
% change all of the values that you don't want to sum to zero
mcaps_to_sum(mask) = 0;
% do the sum
sum(mcaps_to_sum,1);
This is basically all that the nansum function does internally, is to set all of the NaN values to zero and then call the sum function.
index_g == 1 returns a 1698 x 2 logical matrix, but then you add in an extra dimension with the colon. To sum the columns, use the optional dim input. You want:
tsp = nansum(mcaps(index_g == 1),1);

Eliminating zeros in a matrix - Matlab

Hi I have the following matrix:
A= 1 2 3;
0 4 0;
1 0 9
I want matrix A to be:
A= 1 2 3;
1 4 9
PS - semicolon represents the end of each column and new column starts.
How can I do that in Matlab 2014a? Any help?
Thanks

The problem you run into with your problem statement is the fact that you don't know the shape of the "squeezed" matrix ahead of time - and in particular, you cannot know whether the number of nonzero elements is a multiple of either the rows or columns of the original matrix.
As was pointed out, there is a simple function, nonzeros, that returns the nonzero elements of the input, ordered by columns. In your case,
A = [1 2 3;
0 4 0;
1 0 9];
B = nonzeros(A)
produces
1
1
2
4
3
9
What you wanted was
1 2 3
1 4 9
which happens to be what you get when you "squeeze out" the zeros by column. This would be obtained (when the number of zeros in each column is the same) with
reshape(B, 2, 3);
I think it would be better to assume that the number of elements may not be the same in each column - then you need to create a sparse array. That is actually very easy:
S = sparse(A);
The resulting object S is a sparse array - that is, it contains only the non-zero elements. It is very efficient (both for storage and computation) when lots of elements are zero: once more than 1/3 of the elements are nonzero it quickly becomes slower / bigger. But it has the advantage of maintaining the shape of your matrix regardless of the distribution of zeros.
A more robust solution would have to check the number of nonzero elements in each column and decide what the shape of the final matrix will be:
cc = sum(A~=0);
will count the number of nonzero elements in each column of the matrix.
nmin = min(cc);
nmax = max(cc);
finds the smallest and largest number of nonzero elements in any column
[i j s] = find(A); % the i, j coordinates and value of nonzero elements of A
nc = size(A, 2); % number of columns
B = zeros(nmax, nc);
for k = 1:nc
B(1:cc(k), k) = s(j == k);
end
Now B has all the nonzero elements: for columns with fewer nonzero elements, there will be zero padding at the end. Finally you can decide if / how much you want to trim your matrix B - if you want to have no zeros at all, you will need to trim some values from the longer columns. For example:
B = B(1:nmin, :);

Simple solution:
A = [1 2 3;0 4 0;1 0 9]
A =
1 2 3
0 4 0
1 0 9
A(A==0) = [];
A =
1 1 2 4 3 9
reshape(A,2,3)
ans =
1 2 3
1 4 9
It's very simple though and might be slow. Do you need to perform this operation on very large/many matrices?

From your question it's not clear what you want (how to arrange the non-zero values, specially if the number of zeros in each column is not the same). Maybe this:
A = reshape(nonzeros(A),[],size(A,2));

Matlab's logical indexing is extremely powerful. The best way to do this is create a logical array:
>> lZeros = A==0
then use this logical array to index into A and delete these zeros
>> A(lZeros) = []
Finally, reshape the array to your desired size using the built in reshape command
>> A = reshape(A, 2, 3)

Why sprank(A) and A\b report different rank in matlab?

I have a point set P and I construct it's adjacent matrix A by k-nearest neighbor. Each row of A is [...+1...-1...], indicates a pair of neighbor points. The size of A is 48348 x 8058, sprank(A) is 8058. But when I do the following, it gives me a warning: "Warning: Rank deficient, rank = 8055, tol = 8.307912e-10."
a=A*b;
c=A\a;
and norm(c-b) is quite large. It seems something is wrong with the adjacent matrix A, but I can't figure it out. Thanks in advance!

sprank only tells you how many rows/columns of your matrix have non-zero elements, while A\b is reporting the actual rank of the matrix which indicates how many rows of your matrix are linearly independent. For example, for following matrix:
A = [-1 1 0 0;
0 1 -1 0;
1 0 -1 0;
0 0 1 -1]
sprank(A) is 4 but rank(A) is only 3 because you can write the third row as a linear combination of the other rows, specifically A(2,:) - A(1,:).
The issue that you need to address is either in how you're computing A (if you expect that to generate a system of linearly independent equations) or you need to find a way to use A that doesn't require factorizing a rank deficient matrix.