Matlab: Covariance Matrix from matrix of combinations using E(X) and E(X^2) - matlab

I have a set of independent binary random variables (say A,B,C) which take a positive value with some probability and zero otherwise, for which I have generated a matrix of 0s and 1s of all possible combinations of these variables with at least a 1 i.e.
A B C
1 0 0
0 1 0
0 0 1
1 1 0
etc.
I know the values and probabilities of A,B,C so I can calculate E(X) and E(X^2) for each. I want to treat each combination in the above matrix as a new random variable equal to the product of the random variables which are present in that combination (show a 1 in the matrix). For example, random variable Row4 = A*B.
I have created a matrix of the same size as the one above, which shows the relevant E(X)s instead of the 1s, and 1s instead of the 0s. This allows me to easily calculate the vector of expected values of the new random variables (one per combination) as the product of each row. I have also generated a similar matrix which shows E(X^2) instead of E(X), and another one which shows prob(X>0) instead of E(X).
I'm looking for a Matlab script that computes the Covariance matrix of these new variables i.e. taking each row as a random variable. I presume it will have to use the formula:
Cov(X,Y)=E(XY)-E(X)E(Y)
For example, for rows (1 1 0) and (1 0 1):
Cov(X,Y)=E[(AB)(AC)]-E(X)E(Y)
=E[(A^2)BC]-E(X)E(Y)
=E(A^2)E(B)E(C)-E(X)E(Y)
These values I already have from the matrices I've mentioned above. For each Covariance, I'm just unsure how to know which two variables appear in both rows, because for those I will have to select E(X^2) instead of E(X).
Alternatively, the above can be written as:
Cov(X,Y)=E(X)E(Y)*[1/prob(A>0)-1]
But the problem remains as the probabilities in the denominator will only be the ones of the variables which are shared between two combinations.
Any advice on how to automate the computation of the Covariance matrix in Matlab would be greatly appreciated.

I'm pretty sure this is not the most efficient way to do it, but it's a start:
Let r1, ..., rn be the combinations of the random variables, and let R be the matrix:
A B C
r1 1 0 0
r2 0 1 0
r3 0 0 1
r4 1 1 0
Suppose you have the vectors E1, E2 and ER:
E1 = [E(A) E(B) E(C) ...]
E2 = [E(A²) E(B²) E(C²) ...]
ER = [E(r1) E(r2) E(r3) ...]
To compute E(r1*r2), the expectation of the product of two combinations, you can:
1) Extract rows 1 and 2 from R
v1 = R(1,:)
v2 = R(2,:)
2) Sum both vectors in vs
vs = v1 + v2
3) Loop over vs: a 2 means the variable appears in both rows, so use its value from E2; a 1 means it appears in only one row, so use its value from E1; a 0 means the variable is absent, so skip it.
4) Multiply the selected values together to get E(r1*r2), then subtract ER(1)*ER(2) to get the covariance.
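The steps above can be sketched as a double loop over all pairs of rows. This is only a minimal sketch: the values in vals and p are made-up example data, and the names vals, p, Eij and C are assumptions introduced for illustration, not part of the question.

```matlab
% Hypothetical example data (not from the question): the value each variable
% takes when it is positive, and the probability that it is positive.
vals = [2 3 5];
p    = [0.5 0.2 0.1];

E1 = vals    .* p;   % E(X)   for A, B, C
E2 = vals.^2 .* p;   % E(X^2) for A, B, C

% Each row of R is one combination, i.e. one new random variable.
R = [1 0 0; 0 1 0; 0 0 1; 1 1 0; 1 0 1];
n = size(R, 1);

% E(r_i): product of E(X) over the variables present in row i.
ER = zeros(n, 1);
for i = 1:n
    ER(i) = prod(E1(R(i,:) == 1));
end

% Covariance matrix: Cov(r_i, r_j) = E(r_i * r_j) - E(r_i) * E(r_j).
C = zeros(n);
for i = 1:n
    for j = 1:n
        vs  = R(i,:) + R(j,:);                        % 2 = shared, 1 = in one row only
        Eij = prod(E2(vs == 2)) * prod(E1(vs == 1));  % prod of an empty set is 1
        C(i,j) = Eij - ER(i) * ER(j);
    end
end
```

The diagonal of C holds the variances, and entries for rows that share no variable come out as zero, as expected for independent variables.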

Related

How to create an adjacency/joint probability matrix in matlab

From a binary matrix, I want to calculate a kind of adjacency/joint probability density matrix (not quite sure how to label it as so please feel free to rename).
For example, I start with this matrix:
A = [1 1 0 1 1
1 0 0 1 1
0 0 0 1 0]
I want to produce this output:
Output = [1 4/5 1/5
4/5 1 1/5
1/5 1/5 1]
Basically, for each row, I want to calculate the proportion of times where they agreed (1 and 1 or 0 and 0). A will always agree with itself and thus have it as 1 along the diagonal. No matter how many different js are added it will still result in a 3x3, but an extra i variable will result in a 4x4.
I like to think of the inputs along i in the A matrix as the person and Js as the question and so the final output is a 3x3 (number of persons) matrix.
I am having some trouble with this on matlab. If you could please help point me in the right direction that would be fabulous.
So, you can do this in two parts.
bothOnes = A*A';
gives you a matrix showing how many 1s each pair of rows share, and
bothZeros = (1-A)*(1-A)';
gives you a matrix showing how many 0s each pair of rows share.
If you just add them up, you get how many elements they share of either type:
bothSame = A*A' + (1-A)*(1-A)';
Then just divide by the row length to get the desired fractional representation:
output = (A*A' + (1-A)*(1-A)') / size(A, 2);
That should get you there.
Note that this only works if A contains only 1's and 0's, but it can be adapted for other cases.
Here are some alternatives, assuming A can only contain 0 and 1:
If you have the Statistics Toolbox:
result = 1-squareform(pdist(A, 'hamming'));
Manual approach with implicit expansion:
result = mean(permute(A, [1 3 2])==permute(A, [3 1 2]), 3);
Using bitwise operations. This is a more esoteric approach, and is only valid if A has at most 53 columns, due to floating-point limitations:
t = bin2dec(char(A+'0')); % convert each row from binary to decimal
u = bitxor(t, t.'); % bitwise xor
v = mean(dec2bin(u, size(A,2))-'0', 2); % fraction of differing bits; pad to size(A,2) digits so the mean uses the full row length
result = 1 - reshape(v, size(A,1), []); % reshape to obtain result
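As a quick check, the matrix-product formula and the implicit-expansion formula can be run side by side on the A from the question. This is a small sketch; the names out1, out2 and maxDiff are introduced here for illustration, and the == with implicit expansion assumes MATLAB R2016b or later.

```matlab
% Matrix from the question: rows are "persons", columns are "questions".
A = [1 1 0 1 1
     1 0 0 1 1
     0 0 0 1 0];

% Approach 1: count shared 1s and shared 0s with matrix products.
out1 = (A*A' + (1-A)*(1-A)') / size(A, 2);

% Approach 2: compare every pair of rows element-wise, then average over columns.
out2 = mean(permute(A, [1 3 2]) == permute(A, [3 1 2]), 3);

% Largest difference between the two results (should be zero).
maxDiff = max(abs(out1(:) - out2(:)));
```

Counting agreements on both 0s and 1s, both approaches give 0.8 for rows 1 and 2, 0.4 for rows 1 and 3, and 0.6 for rows 2 and 3, with 1 on the diagonal.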

Matlab matrix with fixed sum over rows

I'm trying to construct a matrix in Matlab where the sum over the rows is constant, but every combination is taken into account.
For example, take a NxM matrix where M is a fixed number and N will depend on K, the result to which all rows must sum.
For example, say K = 3 and M = 3, this will then give the matrix:
[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]
At the moment I do this by first creating the matrix of all possible combinations, without regard for the sum (for example, this also contains [2,2,1] and [3,3,3]), and then throwing away the rows whose sum is not equal to K.
However, this is very memory inefficient (especially for larger K and M), and I couldn't think of a nice way to construct this matrix without first constructing the total matrix.
Is this possible in a nice way? Or should I use a whole bunch of for-loops?
Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.
M=3;
K=3;
%S(k+1,m) will hold the intermediate result for k and m
S=cell(K+1,M);
%Initialisation, for M=1 there is only a trivial solution using one number.
S(:,1)=num2cell(0:K);
for iM=2:M
    for temporary_k=0:K
        for new_element=0:temporary_k
            h=S{temporary_k-new_element+1,iM-1};
            h(:,end+1)=new_element;
            S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
        end
    end
end
final_result=S{K+1,M}
This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.
Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.
The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:
1) The subinterval edges are M-1 integer values (not necessarily different) between 0 and S. These can be generated as a Cartesian product using for example this answer.
2) Sort the interval edges, and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be less than in your original approach, because this does take into account the fixed sum.
3) Compute subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
Code:
%// Data
S = 3; %// desired sum
M = 3; %// number of pieces
%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);
%// Step 2
combs = unique(sort(combs,2), 'rows');
%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)];
result = diff(combs,[],2);
The result is sorted in lexicographical order. In your example,
result =
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
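If avoiding discarded rows entirely matters, one alternative (not used in either answer above) is the "stars and bars" construction: choose M-1 distinct bar positions among K+M-1 slots with nchoosek and read the gaps between bars as the row entries. The sketch below assumes K and M are small enough for nchoosek to enumerate all combinations; the names bars and edges are introduced here for illustration.

```matlab
K = 3;   % desired row sum
M = 3;   % number of columns

% Every choice of M-1 distinct bar positions among K+M-1 slots gives one row.
bars = nchoosek(1:K+M-1, M-1);
n    = size(bars, 1);

% Add fixed edges at 0 and K+M; the gap between consecutive edges, minus the
% bar itself, is the number of "stars" (the entry) in that column.
edges  = [zeros(n,1) bars repmat(K+M, n, 1)];
result = diff(edges, [], 2) - 1;
```

For K = 3 and M = 3 this produces 10 rows, each generated exactly once, so no unique or filtering step is needed.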

sum matrix using logical matrix - index exceeds matrix dimensions

I have two matrices.
mcaps which is a double 1698 x 2
index_g which is a logical 1698 x 2
When using the line of code below I get the error message that Index exceeds matrix dimensions. I don't see how this is the case though?
tsp = nansum(mcaps(index_g==1, :));
Update
Sorry I should have mentioned that I need the sum of each column in the mcaps vector
Example of data:
mcaps    index_g
5 6      0 0
4 3      0 0
6 5      1 1
4 6      0 1
8 7      0 0
There are two problems here, and I missed one. The original answer is below.
What I missed is that when you use a logical matrix as an index in this way, the selected elements may differ in number from column to column, so MATLAB cannot return a well-formed matrix to nansum and returns a vector instead. To get around this, use the fact that adding 0 does not change a sum: set every value you don't want to 0 and sum the whole matrix.
% create a mask of values you don't want to sum: entries that are not
% selected OR that are NaN. Note that since index_g is already logical,
% you don't have to test equal to 1.
mask = ~index_g | isnan(mcaps);
% create a temporary variable
mcaps_to_sum = mcaps;
% change all of the values that you don't want to sum to zero
mcaps_to_sum(mask) = 0;
% do the sum down each column
tsp = sum(mcaps_to_sum,1);
This is basically all that the nansum function does internally: it sets all of the NaN values to zero and then calls the sum function.
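Original answer: index_g == 1 returns a 1698 x 2 logical matrix. Used as the first subscript together with the colon, it is treated as a single linear index into the rows, so any true entry in its second column points past row 1698, which causes the error. To sum along the first dimension, use the optional dim input:
tsp = nansum(mcaps(index_g == 1),1);
Because logical indexing with a matrix returns a column vector, this gives a single total over all selected elements rather than one sum per column.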
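Applied to the example data from the question, the masking approach can be sketched as follows. The variable lin is introduced here only to show which linear positions the logical matrix selects, which is why using it as a row subscript fails.

```matlab
% Example data from the question.
mcaps   = [5 6; 4 3; 6 5; 4 6; 8 7];
index_g = logical([0 0; 0 0; 1 1; 0 1; 0 0]);

% A logical matrix used as a single subscript acts as a linear index:
% positions 8 and 9 exceed the 5 rows of mcaps, so mcaps(index_g, :) errors.
lin = find(index_g);

% Zero out everything that is unselected or NaN, then sum each column.
mcaps_to_sum = mcaps;
mcaps_to_sum(~index_g | isnan(mcaps)) = 0;
tsp = sum(mcaps_to_sum, 1);
```

Here tsp is [6 11]: only the 6 is selected in the first column, and 5 and 6 are selected in the second.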

How to select Matrix elements with a filter-matrix

I have 2 matrices of the same size. The first contains values and the second contains only 0s and 1s (like booleans). I now want all elements of my first matrix for which the second matrix has a 1 at the same index, stored in an array.
Maybe an example makes that clear:
Matrix 1:
a b c
d e f
g h i
Matrix 2:
0 1 1
1 0 0
0 0 1
output:
[b c d i]
I think this will work in two steps, but I can't get it to work.
This will need two steps indeed.
%# transpose Matrix 1 because Matlab indexes column by column, and we want the elements row by row
matrix_1 = matrix_1';
%# read values (transpose M2 as well)
%# also transpose the result to get a row-vector
output = matrix_1(matrix_2')';
Note that this indexing operation only works if matrix_2 is logical. If it isn't, cast it by writing logical(matrix_2) instead.
If your arrays are a and b, with b the mask array, try
a(find(b))
This won't produce the output in the order given in your question. If order is important, use @Jonas' approach.
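The difference in ordering between the two answers can be seen in a small sketch. It uses a numeric stand-in for the letters a to i (1 to 9), which is an assumption made for illustration, as are the names rowOrder and colOrder.

```matlab
% Numeric stand-in for the letters: a=1, b=2, ..., i=9.
matrix_1 = [1 2 3; 4 5 6; 7 8 9];
matrix_2 = [0 1 1; 1 0 0; 0 0 1];

mask = logical(matrix_2);     % the mask must be logical to be used as an index

% Transpose both so that column-major indexing walks the original rows.
m1t = matrix_1.';
rowOrder = m1t(mask.').';     % elements in row-by-row order

% Direct logical indexing walks the columns instead.
colOrder = matrix_1(mask).';  % same elements, column-by-column order
```

rowOrder is [2 3 4 9], matching [b c d i] from the question, while colOrder is [4 2 3 9].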

Why sprank(A) and A\b report different rank in matlab?

I have a point set P and I construct its adjacency matrix A by k-nearest neighbor. Each row of A is [...+1...-1...], indicating a pair of neighboring points. The size of A is 48348 x 8058, and sprank(A) is 8058. But when I do the following, it gives me a warning: "Warning: Rank deficient, rank = 8055, tol = 8.307912e-10."
a=A*b;
c=A\a;
and norm(c-b) is quite large. It seems something is wrong with the adjacency matrix A, but I can't figure it out. Thanks in advance!
sprank returns the structural rank, which depends only on where the non-zero entries are and not on their values; it is an upper bound on the actual rank. A\b, on the other hand, reports the numerical rank of the matrix, which indicates how many of its rows are linearly independent. For example, for the following matrix:
A = [-1 1 0 0;
0 1 -1 0;
1 0 -1 0;
0 0 1 -1]
sprank(A) is 4 but rank(A) is only 3 because you can write the third row as a linear combination of the other rows, specifically A(2,:) - A(1,:).
To address this, you either need to change how you compute A (if you expect it to produce a system of linearly independent equations) or find a way to use A that doesn't require factorizing a rank-deficient matrix.
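The example matrix above can be checked directly. The sketch below also shows one reason a matrix of this kind cannot have full column rank: when every row holds a single +1 and a single -1 (as the question describes), each row sums to zero, so the all-ones vector lies in the null space. The names sr, r and residual are introduced here for illustration, and the input is converted to sparse before calling sprank as a precaution.

```matlab
A = [-1  1  0  0;
      0  1 -1  0;
      1  0 -1  0;
      0  0  1 -1];

sr = sprank(sparse(A));   % structural rank: depends only on the sparsity pattern
r  = rank(A);             % numerical rank: depends on the actual values

% Every row sums to zero, so A*ones(n,1) = 0 and rank(A) <= n - 1.
residual = norm(A * ones(size(A, 2), 1));
```

Here sr is 4 and r is 3. For the matrix in the question, a rank of 8055 with 8058 columns would be consistent with the neighbor graph splitting into three disconnected components, although that is an inference from the reported numbers rather than something the thread confirms.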