Matlab matrix with fixed sum over rows - matlab

I'm trying to construct a matrix in Matlab where the sum over the rows is constant, but every combination is taken into account.
For example, take a NxM matrix where M is a fixed number and N will depend on K, the result to which all rows must sum.
For example, say K = 3 and M = 3, this will then give the matrix:
[1,1,1
2,1,0
2,0,1
1,2,0
1,0,2
0,2,1
0,1,2
3,0,0
0,3,0
0,0,3]
At the moment I do this by first creating the matrix of all possible combinations, without regard for the sum (so it also contains, for example, [2,2,1] and [3,3,3]), and then throwing away the rows whose sum is not equal to K.
However, this is very memory inefficient (especially for larger K and M), and I couldn't think of a nice way to construct this matrix without first constructing the full matrix.
Is this possible in a nice way? Or should I use a whole bunch of for-loops?

Here is a very simple version using dynamic programming. The basic idea of dynamic programming is to build up a data structure (here S) which holds the intermediate results for smaller instances of the same problem.
M=3;
K=3;
%S(k+1,m) will hold the intermediate result for k and m
S=cell(K+1,M);
%Initialisation, for M=1 there is only a trivial solution using one number.
S(:,1)=num2cell(0:K);
for iM=2:M
    for temporary_k=0:K
        for new_element=0:temporary_k
            h=S{temporary_k-new_element+1,iM-1};
            h(:,end+1)=new_element;
            S{temporary_k+1,iM}=[S{temporary_k+1,iM};h];
        end
    end
end
final_result=S{K+1,M}

This may be more efficient than your original approach, although it still generates (and then discards) more rows than needed.
Let M denote the number of columns, and S the desired sum. The problem can be interpreted as partitioning an interval of length S into M subintervals with non-negative integer lengths.
The idea is to generate not the subinterval lengths, but the subinterval edges; and from those compute the subinterval lengths. This can be done in the following steps:
1. The subinterval edges are M-1 integer values (not necessarily different) between 0 and S. These can be generated as a Cartesian product using for example this answer.
2. Sort the interval edges, and remove duplicate sets of edges. This is why the algorithm is not totally efficient: it produces duplicates. But hopefully the number of discarded tentative solutions will be less than in your original approach, because this does take into account the fixed sum.
3. Compute subinterval lengths from their edges. Each length is the difference between two consecutive edges, including a fixed initial edge at 0 and a final edge at S.
Code:
%// Data
S = 3; %// desired sum
M = 3; %// number of pieces
%// Step 1 (adapted from linked answer):
combs = cell(1,M-1);
[combs{end:-1:1}] = ndgrid(0:S);
combs = cat(M+1, combs{:});
combs = reshape(combs,[],M-1);
%// Step 2
combs = unique(sort(combs,2), 'rows');
%// Step 3
combs = [zeros(size(combs,1),1) combs repmat(S, size(combs,1),1)];
result = diff(combs,[],2);
The result is sorted in lexicographical order. In your example,
result =
0 0 3
0 1 2
0 2 1
0 3 0
1 0 2
1 1 1
1 2 0
2 0 1
2 1 0
3 0 0
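For comparison, here is a minimal sketch of a different technique, the "stars and bars" construction, which is not the method used in either answer above. It chooses M-1 distinct bar positions among K+M-1 slots with nchoosek, so no candidate rows are generated and discarded. The variable names bars and edges are illustrative, and the sketch assumes M >= 2.

```matlab
% Stars-and-bars sketch (an alternative, not the code from the answers above)
K = 3;   % desired row sum
M = 3;   % number of columns; this sketch assumes M >= 2

% Choose M-1 distinct "bar" positions among K+M-1 slots
bars = nchoosek(1:K+M-1, M-1);
n = size(bars, 1);

% Add sentinel edges at 0 and K+M; consecutive differences are >= 1,
% so subtracting 1 yields non-negative entries that sum to K
edges = [zeros(n,1) bars repmat(K+M, n, 1)];
result = diff(edges, [], 2) - 1;
```

The number of rows is nchoosek(K+M-1, M-1), which is 10 for K = 3 and M = 3. Note that nchoosek itself can become slow or memory hungry for large arguments, so this is a sketch rather than a tuned solution.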

Related

How to create an adjacency/joint probability matrix in matlab

From a binary matrix, I want to calculate a kind of adjacency/joint probability density matrix (not quite sure how to label it as so please feel free to rename).
For example, I start with this matrix:
A = [1 1 0 1 1
1 0 0 1 1
0 0 0 1 0]
I want to produce this output:
Output = [1 4/5 1/5
4/5 1 1/5
1/5 1/5 1]
Basically, for each pair of rows, I want to calculate the proportion of columns in which they agree (1 and 1, or 0 and 0). A row always agrees with itself, so the diagonal is 1. No matter how many columns (j) are added, the result is still 3x3, but an extra row (i) makes it 4x4.
I like to think of the rows (i) of A as persons and the columns (j) as questions, so the final output is a 3x3 (number of persons) matrix.
I am having some trouble with this in Matlab. If you could please help point me in the right direction, that would be fabulous.
So, you can do this in two parts.
bothOnes = A*A';
gives you a matrix showing how many 1s each pair of rows share, and
bothZeros = (1-A)*(1-A)';
gives you a matrix showing how many 0s each pair of rows share.
If you just add them up, you get how many elements they share of either type:
bothSame = A*A' + (1-A)*(1-A)';
Then just divide by the row length to get the desired fractional representation:
output = (A*A' + (1-A)*(1-A)') / size(A, 2);
That should get you there.
Note that this only works if A contains only 1's and 0's, but it can be adapted for other cases.
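As a quick sanity check, the steps above can be run on the matrix A from the question and compared against a direct row-by-row comparison. This is only a sketch; the variable check and the double loop are added here for verification and are not part of the answer.

```matlab
A = [1 1 0 1 1
     1 0 0 1 1
     0 0 0 1 0];

% Matrix-product approach from the answer
bothOnes  = A*A.';
bothZeros = (1-A)*(1-A).';
output = (bothOnes + bothZeros) / size(A, 2);

% Cross-check: fraction of columns in which rows i and j agree
check = zeros(size(A,1));
for i = 1:size(A,1)
    for j = 1:size(A,1)
        check(i,j) = mean(A(i,:) == A(j,:));
    end
end
```

For this A, the agreement rule (1 and 1, or 0 and 0) gives 4/5 for rows 1 and 2, 2/5 for rows 1 and 3, and 3/5 for rows 2 and 3.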
Here are some alternatives, assuming A can only contain 0 and 1:
If you have the Statistics Toolbox:
result = 1-squareform(pdist(A, 'hamming'));
Manual approach with implicit expansion:
result = mean(permute(A, [1 3 2])==permute(A, [3 1 2]), 3);
Using bitwise operations. This is a more esoteric approach, and is only valid if A has at most 53 columns, due to floating-point limitations:
t = bin2dec(char(A+'0')); % convert each row from binary to decimal
u = bitxor(t, t.'); % bitwise xor
v = mean(dec2bin(u, size(A,2))-'0', 2); % pad to size(A,2) digits, then take the fraction of differing bits
result = 1 - reshape(v, size(A,1), []); % reshape to obtain result

Finding equal rows in Matlab

I have a matrix suppX in Matlab with size GxN and a matrix A with size MxN. I would like your help to construct a matrix Xresponse with size GxM with Xresponse(g,m)=1 if the row A(m,:) is equal to the row suppX(g,:) and zero otherwise.
Let me explain better with an example.
suppX=[1 2 3 4;
5 6 7 8;
9 10 11 12]; %GxN
A=[1 2 3 4;
1 2 3 4;
9 10 11 12;
1 2 3 4]; %MxN
Xresponse=[1 1 0 1;
0 0 0 0;
0 0 1 0]; %GxM
I have written a code that does what I want.
Xresponsemy=zeros(size(suppX,1), size(A,1));
for x=1:size(suppX,1)
    Xresponsemy(x,:)=ismember(A, suppX(x,:), 'rows').';
end
My code uses a loop. I would like to avoid this because in my real case this piece of code is part of another big loop. Do you have suggestions without looping?
One way to do this is to treat each row of the two matrices as a vector in N-dimensional space and compute the Euclidean (L2) distance between every pair of rows, one from each matrix. If the distance is 0, you have a match. Specifically, you can create a matrix whose element (i,j) is the distance between row i of one matrix and row j of the other.
You can then turn that distance matrix into the result you want, so that 1 means the two rows are identical and 0 otherwise.
This post should be of interest: Efficiently compute pairwise squared Euclidean distance in Matlab.
I would specifically look at the answer by Shai Bagon that uses matrix multiplication and broadcasting. You would then modify it so that you find distances that would be equal to 0:
nA = sum(A.^2, 2); % squared norm of each row of A
nB = sum(suppX.^2, 2); % squared norm of each row of suppX
Xresponse = bsxfun(@plus, nB, nA.') - 2 * suppX * A.';
Xresponse = Xresponse == 0;
We get:
Xresponse =
3×4 logical array
1 1 0 1
0 0 0 0
0 0 1 0
Note on floating-point precision
Because you are using ismember in your implementation, it's implicit to me that you expect all values to be integer. In this case, you can very much compare directly with the zero distance without loss of accuracy. If you intend to move to floating-point, you should always compare with some small threshold instead of 0, like Xresponse = Xresponse <= 1e-10; or something to that effect. I don't believe that is needed for your scenario.
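To illustrate the threshold idea, here is a small sketch with made-up non-integer data (the values of suppX and A, the names D, tol and matchTol, and the tolerance 1e-10 are all assumptions for illustration). The first row of suppX is written as 0.1+0.2, which is not bit-identical to 0.3 in double precision, so a tolerance is the safer comparison.

```matlab
% Hypothetical floating-point data; 0.1+0.2 differs from 0.3 by rounding
suppX = [0.1+0.2 1; 2 3];
A     = [0.3     1; 2 3];

nA = sum(A.^2, 2);        % squared norm of each row of A
nB = sum(suppX.^2, 2);    % squared norm of each row of suppX
D  = bsxfun(@plus, nB, nA.') - 2 * suppX * A.';   % squared distances

tol = 1e-10;              % assumed threshold, tune for your data scale
matchTol = D <= tol;      % rounding can make D slightly negative, still a match
```

The right tolerance depends on the magnitude of the data, so treat 1e-10 as a placeholder rather than a recommendation.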
Here's an alternative to @rayryeng's answer: reduce each row of the two matrices to a unique identifier using the third output of unique with the 'rows' input flag, and then compare the identifiers with singleton expansion (broadcast) using bsxfun:
[~, ~, w] = unique([A; suppX], 'rows');
Xresponse = bsxfun(@eq, w(1:size(A,1)).', w(size(A,1)+1:end));

Matlab: Covariance Matrix from matrix of combinations using E(X) and E(X^2)

I have a set of independent binary random variables (say A,B,C) which take a positive value with some probability and zero otherwise, for which I have generated a matrix of 0s and 1s of all possible combinations of these variables with at least a 1 i.e.
A B C
1 0 0
0 1 0
0 0 1
1 1 0
etc.
I know the values and probabilities of A,B,C so I can calculate E(X) and E(X^2) for each. I want to treat each combination in the above matrix as a new random variable equal to the product of the random variables which are present in that combination (show a 1 in the matrix). For example, random variable Row4 = A*B.
I have created a matrix of the same size to the above, which shows the relevant E(X)s instead of the 1s, and 1s instead of the 0s. This allows me to easily calculate the vector of Expected values of the new random variables (one per combination) as the product of each row. I have also generated a similar matrix which shows E(X^2) instead of E(X), and another one which shows prob(X>0) instead of E(X).
I'm looking for a Matlab script that computes the Covariance matrix of these new variables i.e. taking each row as a random variable. I presume it will have to use the formula:
Cov(X,Y)=E(XY)-E(X)E(Y)
For example, for rows (1 1 0) and (1 0 1):
Cov(X,Y)=E[(AB)(AC)]-E(X)E(Y)
=E[(A^2)BC]-E(X)E(Y)
=E(A^2)E(B)E(C)-E(X)E(Y)
These values I already have from the matrices I've mentioned above. For each Covariance, I'm just unsure how to know which two variables appear in both rows, because for those I will have to select E(X^2) instead of E(X).
Alternatively, the above can be written as:
Cov(X,Y)=E(X)E(Y)*[1/prob(A>0)-1]
But the problem remains as the probabilities in the denominator will only be the ones of the variables which are shared between two combinations.
Any advice on how to automate the computation of the Covariance matrix in Matlab would be greatly appreciated.
I'm pretty sure this is not the most efficient way to do that but that's a start:
Let r1, ..., rn be the combinations of the random variables, and let R be the matrix:
A B C
r1 1 0 0
r2 0 1 0
r3 0 0 1
r4 1 1 0
Suppose you have the vectors E1, E2 and ER:
E1 = [E(A) E(B) E(C) ...]
E2 = [E(A²) E(B²) E(C²) ...]
ER = [E(r1) E(r2) E(r3) ...]
If you want to compute E(r1*r2) you can:
1) Extract rows 1 and 2 from R
v1 = R(1,:)
v2 = R(2,:)
2) Sum both vectors into vs
vs = v1 + v2
3) Loop over vs: a 2 means the variable appears in both rows, so use its value from E2; a 1 means it appears in only one row, so use its value from E1; a 0 means the variable is not used.
4) Multiply the selected values to get E(r1*r2); the covariance is then E(r1*r2) - ER(1)*ER(2).
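The steps above can be sketched for all pairs of rows as follows. The numbers in E1 and E2 are hypothetical placeholders, not values from the question, and the loop relies on the variables being independent, as stated in the question.

```matlab
R  = [1 0 0; 0 1 0; 0 0 1; 1 1 0];   % combinations, one per row
E1 = [0.5 0.6 0.7];                  % hypothetical E(A),   E(B),   E(C)
E2 = [0.8 0.9 1.0];                  % hypothetical E(A^2), E(B^2), E(C^2)

n  = size(R, 1);
% Expected value of each row: product of E1 over the variables present
% (E1.^0 = 1 for absent variables)
ER = prod(bsxfun(@power, E1, R), 2);

C = zeros(n);
for i = 1:n
    for j = 1:n
        vs = R(i,:) + R(j,:);        % 2 = shared, 1 = in one row only, 0 = unused
        Eij = prod(E1(vs == 1)) * prod(E2(vs == 2));   % E(ri*rj)
        C(i,j) = Eij - ER(i)*ER(j);  % Cov = E(XY) - E(X)E(Y)
    end
end
```

With these placeholder values, rows that share no variable (for example rows 1 and 2) get covariance 0, and the diagonal holds the variance of each product.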

How to construct a 128x32 scrambled matrix?

How can I construct a scrambled matrix with 128 rows and 32 columns in vb.net or Matlab?
Entries of the matrix are numbers between 1 and 32 with the condition that each row mustn't contain duplicate elements and rows mustn't be duplicates.
This is similar to @thewaywewalk's answer, but makes sure that the matrix has no repeated rows by testing if it does and in that case generating a new matrix:
done = 0;
while ~done
    [~, matrix] = sort(rand(128,32),2);
    %// generate each row as a random permutation, independently of other rows.
    %// This line was inspired by randperm code
    done = size(unique(matrix,'rows'),1) == 128;
    %// in the event that there are repeated rows: generate matrix again
end
If my computations are correct, the probability that the matrix has repeated rows (and thus has to be generated again) is less than
>> 128*127/factorial(32)
ans =
6.1779e-032
Hey, it's more likely that a cosmic ray will spoil a given run of the program! So I guess you can safely remove the while loop :-)
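If you do drop the loop, a cheap check of both requirements is still easy to run. The following is a sketch; the names nRows, nCols, rowsArePerms and rowsAreUnique are introduced here for illustration.

```matlab
nRows = 128;
nCols = 32;

% Each row is an independent random permutation of 1:nCols
[~, matrix] = sort(rand(nRows, nCols), 2);

% Requirement 1: every row contains each of 1..nCols exactly once
sortedRows = sort(matrix, 2);
rowsArePerms = all(all(bsxfun(@eq, sortedRows, 1:nCols)));

% Requirement 2: no two rows are identical
rowsAreUnique = size(unique(matrix, 'rows'), 1) == nRows;
```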
With randperm you can generate one row:
row = randperm(32)
If this vector weren't so long, you could just use perms to find all permutations:
B = perms(randperm(32))
but that takes far too much memory (32! = 2.6313e+35 rows),
so you can use a small loop instead:
N = 200;
A = zeros(N,32);
for ii = 1:N
    A(ii,:) = randperm(32);
end
B = unique(A, 'rows');
B = B(1:128,:);
For my tests it was sufficient to use N = 128 directly and skip the last two lines, because with 2.6313e+35 possible permutations the probability that you get a correct matrix on the first try is very high. But to be sure that there are no duplicate rows, choose a higher number and select the first 128 rows at the end. If the input vector is relatively short and the number of desired rows is close to the total number of possible permutations, use the proposed perms(randperm( n )).
A small example for integers from 1 to 4 and a selection of 10 out of 24 possible permutations:
N = 20;
A = zeros(N,4);
for ii = 1:N
    A(ii,:) = randperm(4);
end
B = unique(A, 'rows');
B = B(1:10,:);
returns:
B =
1 2 3 4
1 2 4 3
1 3 4 2
2 3 1 4
2 3 4 1
2 4 1 3
2 4 3 1
3 1 2 4
3 1 4 2
3 2 1 4
Some additional remarks on the choice of N:
I made some test runs where I used the loop above to find all permutations, like perms does. For vector lengths of n=4 to n=7, and in each case N = factorial(n), 60-80% of the rows were unique.
So for small n I would recommend choosing N as follows to be on the safe side:
N = min( [Q factorial(n)] )*2;
where Q is the number of permutations you want. For bigger n you either run out of memory while searching for all permutations, or the desired subset is so small compared to the number of all possible permutations that repetition is very unlikely! (Cosmic Ray theory linked by Luis Mendo)
Your requirements are very loose and allow many different possibilities. The most efficient solution I can think of that meets these requirements is as follows:
p = perms(1:6);
[p(1:128,:) repmat(7:32,128,1)]

For large sparse matrices in MATLAB, compute the cumulative sum across the columns for non-zero entries?

In MATLAB I have a large matrix of transition probabilities, transition_probs, and an adjacency matrix adj_mat. I want to compute the cumulative sum of the transition matrix along the columns and then multiply it element-wise by the adjacency matrix, which acts as a mask, like this:
cumsumTransitionMat = cumsum(transition_probs,2) .* adj_mat;
I get a MEMORY error because with the cumsum all the entries of the matrix are then non-zero.
I would like to avoid this problem by only having the cumulative sum entries where there are non zero entries in the first place. How can this be done without the use of a for loop?
When cumsum is applied along rows, each row is filled with values from the first nonzero column it finds up to the last column; that's what it does by definition.
The worst case in terms of storage is when the sparse matrix contains values at the first column, the best case is when all nonzero values occur at the last column. Example:
% worst case
>> M = sparse([ones(5,1) zeros(5,4)]);
>> MM = cumsum(M,2); % completely dense matrix
>> nnz(MM)
ans =
25
% best case
>> MM = cumsum(fliplr(M),2);
If the resulting matrix does not fit in memory, I don't see what else you can do, except maybe use a for-loop over the rows and process the matrix in smaller batches...
Note that you cannot apply the masking operation before computing the cumulative sum, since this will alter the results. So you can't say cumsum(transition_probs .* adj_mat, 2).
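A minimal sketch of the batched loop suggested above might look like this. The example matrix P, the mask built with spones, the batch size, and the names I, J, V and result are all assumptions for illustration; in the question the inputs would be transition_probs and adj_mat. Each batch is filled in by cumsum only temporarily, then masked and reduced to its nonzero triplets, so the dense intermediate never covers more than batchSize rows.

```matlab
% Hypothetical small inputs standing in for transition_probs and adj_mat
P   = sparse([0.2 0 0.8 0; 0 1 0 0; 0.5 0 0 0.5; 0 0 0.3 0.7; 1 0 0 0]);
adj = spones(P);          % mask: 1 where P is nonzero

batchSize = 2;            % assumed; pick what fits in memory
[n, m] = size(P);
nb = ceil(n / batchSize);
I = cell(nb,1); J = cell(nb,1); V = cell(nb,1);

for b = 1:nb
    rows = (b-1)*batchSize+1 : min(b*batchSize, n);
    % cumsum fills in this batch only; the mask makes it sparse again
    block = cumsum(P(rows,:), 2) .* adj(rows,:);
    [i, j, v] = find(block);
    I{b} = i(:) + rows(1) - 1;   % shift to global row indices
    J{b} = j(:);
    V{b} = v(:);
end

% Assemble the masked cumulative sum from the collected triplets
result = sparse(vertcat(I{:}), vertcat(J{:}), vertcat(V{:}), n, m);
```

Collecting triplets and calling sparse once avoids repeatedly inserting rows into an existing sparse matrix, which tends to be slow in column-oriented storage.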
You can apply cumsum on the non-zero elements only. Here is some code for a sparse column vector:
A = sparse(round(rand(100,1))); %some sparse data
A_cum = A; %initialise A_cum as a copy of A
idx_A = find(A); %find non-zeros
A_cum(idx_A) = cumsum(A(idx_A)); %cumsum on non-zeros elements only
You can check the output with
B = cumsum(A);
A_cum B
1 1
0 1
0 1
2 2
3 3
4 4
5 5
0 5
0 5
6 6
and isequal(A_cum(find(A_cum)), B(find(A_cum))) gives 1.