Find the probability of each column in matrix - matlab

I'm using the Monte Carlo approach to generate N scenarios for my random variable. I store these scenarios in an M x N matrix. I want to compute the probability of each scenario (column) occurring in the matrix.
I tried to use the command histc() but it doesn't work. How can I find these probabilities and store them in a vector so that I can use them in an optimization problem?

You can use unique on the transpose of this "scenario" matrix and examine the third output of the function call. The third output assigns a unique ID to every unique occurrence seen in the matrix. You'll also want to specify the 'rows' flag so that each row is treated in its entirety as a single instance. If you don't specify 'rows', each individual element in your matrix is considered a single instance, whereas you want the entire column (or row of the transposed matrix) to be a single instance. Unfortunately, unique can't operate along the columns, which is why you need to transpose the matrix.
You'll also want the first output to examine how each ID maps to each unique occurrence. Make sure you transpose this output variable at the end so that the unique occurrences are stored as columns. You can then use histc or histcounts to count the occurrences of each "scenario" and thus find the probabilities, assuming each generated scenario is equally likely.
Here's a quick example. Suppose I have this matrix of events stored in A:
>> A = [0 0 0; 1 1 1; 1 0 0; 0 0 1; 0 0 0; 0 0 1; 1 1 1].'
A =
0 1 1 0 0 0 1
0 1 0 0 0 0 1
0 1 0 1 0 1 1
We see that the dimensionality of the scenario is 3 while there are 7 events. Doing what I said with unique gives:
>> [un, ~, id] = unique(A.', 'rows');
>> un = un.'
un =
0 0 1 1
0 0 0 1
0 1 0 1
>> id
id =
1
4
3
2
1
2
4
We can see that the column [0; 0; 0] belongs to ID 1, the column [0; 0; 1] maps to ID 2, the column [1; 0; 0] maps to ID 3 and finally [1; 1; 1] maps to ID 4. un stores all unique columns while id provides us this mapping. You can verify this by looking up each entry of id in the mapping above: doing so recovers the corresponding column of A.
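As a quick sanity check of that mapping (a small sketch on the same example matrix; the name rebuilt is only illustrative), indexing the unique columns with id should rebuild the original matrix:

```matlab
A = [0 0 0; 1 1 1; 1 0 0; 0 0 1; 0 0 0; 0 0 1; 1 1 1].';
[un, ~, id] = unique(A.', 'rows');   % unique rows of the transposed matrix
un = un.';                           % store the unique scenarios as columns
rebuilt = un(:, id);                 % column j is the unique column with ID id(j)
disp(isequal(rebuilt, A))            % displays 1 when the mapping is consistent
```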
We can then determine the probabilities of each occurrence:
N = size(un,2); %// Get total number of unique scenarios
M = size(A,2); %// Get total number of scenarios
prob = histc(id, 1 : N) / M; %// Finding probabilities
We get:
>> prob
prob =
0.2857
0.2857
0.1429
0.2857
This agrees with our data. For columns [0; 0; 0], [0; 0; 1] and [1; 1; 1], there are two occurrences of each column and so the probabilities are 2/7 = 0.2857. The other column of [1; 0; 0] only has one occurrence, and so the probability is 1/7 = 0.1429.
For the full code listing so you can easily copy and paste, assuming your matrix is stored in A:
[un, ~, id] = unique(A.', 'rows'); %// Assigning each event a unique ID
un = un.'; %// Transpose so that each unique scenario is a column
N = size(un,2); %// Get total number of unique scenarios
M = size(A,2); %// Get total number of scenarios
prob = histc(id, 1 : N) / M; %// Finding probabilities
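histc is discouraged in newer MATLAB releases. One possible alternative, sketched here on the example matrix from above, counts the IDs with accumarray instead. The variable name counts is only illustrative and not part of the original answer.

```matlab
A = [0 0 0; 1 1 1; 1 0 0; 0 0 1; 0 0 0; 0 0 1; 1 1 1].';
[un, ~, id] = unique(A.', 'rows');   % id(j) is the ID of column j of A
un = un.';                           % unique scenarios as columns
counts = accumarray(id(:), 1);       % counts(i) = number of columns of A with ID i
prob = counts / size(A, 2);          % relative frequency of each unique column
```

As with the histc version, prob(i) is the probability of the scenario stored in un(:, i).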

Related

Constructing vectors of different lengths

I want to find the row and column indices of the zeros in a 3-dimensional array. The problem is that I get output vectors (e.g. row) of a different length each time, so a dimension error occurs.
My attempt:
a (:,:,1)= [1 2 0; 2 0 1; 0 0 2]
a (:,:,2) = [0 2 8; 2 1 0; 0 0 0]
for i = 1 : 2
[row(:,i) colum(:,i)] = find(a(:,:,i)==0);
end
You can use linear indexing:
a(:,:,1) = [1 2 0; 2 0 1; 0 0 2];
a(:,:,2) = [0 2 8; 2 1 0; 0 0 0];
% Answer in linear indexing
idx = find(a == 0);
% Transforms linear indexing in rows-columns-3rd dimension
[rows , cols , third] = ind2sub(size(a) ,idx)
More on the topic can be found in MATLAB's help.
Let's assume your matrix has size N-by-M-by-P.
In your case
N = 3;
M = 3;
P = 2;
This would mean that the maximum length of rows and columns from your search (if all entries are zero) is N*M = 9.
So one possible solution would be
%alloc output
row=zeros(size(a,1)*size(a,2),size(a,3));
colum=row;
%loop over third dimension
n=size(a,3);
for i = 1 : n
[row_t colum_t] = find(a(:,:,i)==0);
%copy your current result depending on its length
row(1:length(row_t),i)=row_t;
colum(1:length(colum_t),i)=colum_t;
end
However, when you pass the result to the next function or script, keep in mind that only the non-zero entries are valid indices; the trailing zeros are padding.
I would go for Zep's vectorized solution: for bigger matrices a it is more memory efficient, and I am sure it must be much faster.
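Another way to hold per-slice results of different lengths, not taken from the answers above, is a cell array with one cell per slice. This is only a sketch; the names rowC and colC are made up for illustration.

```matlab
a(:,:,1) = [1 2 0; 2 0 1; 0 0 2];
a(:,:,2) = [0 2 8; 2 1 0; 0 0 0];
nSlices = size(a, 3);
rowC = cell(1, nSlices);   % rowC{i} holds the row indices of zeros in slice i
colC = cell(1, nSlices);   % colC{i} holds the matching column indices
for i = 1:nSlices
    [rowC{i}, colC{i}] = find(a(:,:,i) == 0);
end
```

Each cell can have its own length, so no padding is needed and rowC{i} can be used directly.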

Combination and Multiplying Rows of array in matlab

I have a matrix (89x42) of 0's and 1's, and I'd like to multiply combinations of its rows together.
For example, for matrix
input = [1 0 1
0 0 0
1 1 0];
and with 2 combinations, I want an output of
output = [0 0 0; % (row1*row2)
1 0 0; % (row1*row3)
0 0 0] % (row2*row3)
Which rows to multiply is dictated by "n Choose 2" (nCk), or all possible combinations of the rows n taken k at a time. In this case k=2.
Currently I am using a loop and it works fine for the 89C2 combinations of rows, but when I run it with 89C3 it takes far too long to run.
What would be the most efficient way to do this program so I can do more than 2 combinations?
You can do it using nchoosek and element-wise multiplication.
inp = [1 0 1; 0 0 0; 1 1 0]; %Input matrix
C = nchoosek(1:size(inp,1),2); %All combinations of row indices taken 2 at a time
out = inp(C(:,1),:) .* inp(C(:,2),:); %Multiplying those rows to get the desired output
Several things you can do:
Use logical ("binary") arrays (or even sparse logical arrays) instead of double arrays.
Use optimized combinatorial functions.
Use bitand or and instead of times (where applicable).
Vectorize:
function out = q44417404(I,k)
if nargin == 0
rng(44417404);
I = randi(2,89,42)-1 == 1;
k = 3;
end
out = permute(prod(reshape(I(nchoosek(1:size(I,1),k).',:).',size(I,2),k,[]),2),[3,1,2]);
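The two-row nchoosek approach shown earlier can be extended to k rows at a time by multiplying in one column of the combination table per iteration. This is a minimal sketch on the small example from the question, not a benchmarked solution; the loop runs only k times, so the work inside it stays vectorized.

```matlab
inp = [1 0 1; 0 0 0; 1 1 0];        % example input from the question
k = 2;                              % number of rows multiplied together
C = nchoosek(1:size(inp,1), k);     % each row of C lists one combination of row indices
out = inp(C(:,1), :);               % start with the first row of every combination
for j = 2:k
    out = out .* inp(C(:,j), :);    % multiply in the j-th row of every combination
end
```

For 0/1 data the product could also be replaced by a logical AND, as suggested in the list above.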

How do you generate a random sequence of only -1 and 1?

I want to know how to randomly generate a sequence whose elements are each either -1 or 1.
For example:
[1 1 -1 -1] or
[-1 1 -1 1 1 1 1 -1 ] or
[-1 1 1 -1 1 -1 -1] are what I expect.
but [-1 0 1 -1 -1] or [2 1 -1 -1 1 1] are not what I want.
One liner for N elements:
2*randi(2, 1, N) - 3
or maybe clearer
(-1).^randi(2, 1, N)
There are several ways to do this.
Method #1 - Selecting from a small array
You can create a small array that consist of [-1 1], then create random integers that contain either 1 or 2 and index into this sequence:
N = 10; %// Number of values in the array
%// Generate random indices
ind = randi(2, N, 1);
%// Create small array
arr = [-1; 1];
%// Get final array
out = arr(ind);
Method #2 - Generate values from a uniform random distribution and threshold
You can also generate uniformly distributed random floating point values, then set anything greater than or equal to 0.5 to 1 and anything less to -1.
N = 10; %// Number of values in the array
%// Generate uniformly distributed floating point values
out = rand(N, 1);
%// Find those locations that are >= 0.5
ind = out >= 0.5;
%// Set the right locations to +1/-1
out(ind) = 1;
out(~ind) = -1;
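The same thresholding idea can be written without the intermediate assignments. This is just a compact sketch of Method #2: the comparison yields 0 or 1, which is then mapped to -1 or 1.

```matlab
N = 10;                             % number of values in the array
out = 2*(rand(N, 1) >= 0.5) - 1;    % logical 0/1 mapped to -1/+1
```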
Method #3 - Use trigonometry
You can use the fact that cos(n*pi) is either 1 or -1 whenever n is an integer: odd values of n produce -1 while even values produce 1. As such, you can generate a bunch of random integers that are either 1 or 2, and calculate cos(n*pi):
N = 10; %// Number of values in the array
%// Generate random integers
ind = randi(2, N, 1);
%// Compute sequence via trigonometry
out = cos(ind*pi);

Replacing zeros (or NANs) in a matrix with the previous element row-wise or column-wise in a fully vectorized way

I need to replace the zeros (or NaNs) in a matrix with the previous element row-wise, so basically I need this Matrix X
[0,1,2,2,1,0;
5,6,3,0,0,2;
0,0,1,1,0,1]
To become like this:
[0,1,2,2,1,1;
5,6,3,3,3,2;
0,0,1,1,1,1],
Please note that if the first element of a row is zero it will stay that way.
I know that this has been solved for a single row or column vector in a vectorized way, and this is one of the nicest ways of doing that:
id = find(X);
X(id(2:end)) = diff(X(id));
Y = cumsum(X)
The problem is that linear indexing of a matrix in Matlab/Octave is consecutive and increments column-wise, so this works for a single row or column, but the same concept cannot be applied directly to multiple rows because each row/column starts fresh and must be treated as independent. I've tried my best and searched all over Google but couldn't find a solution. If I apply the same idea in a loop it gets too slow because my matrices contain at least 3000 rows. Can anyone help me with this, please?
Special case when zeros are isolated in each row
You can do it using the two-output version of find to locate the zeros and NaN's in all columns except the first, and then using linear indexing to fill those entries with their row-wise preceding values:
[ii jj] = find( (X(:,2:end)==0) | isnan(X(:,2:end)) );
X(ii+jj*size(X,1)) = X(ii+(jj-1)*size(X,1));
General case (consecutive zeros are allowed on each row)
X(isnan(X)) = 0; %// handle NaN's and zeros in a unified way
aux = repmat(2.^(1:size(X,2)), size(X,1), 1) .* ...
[ones(size(X,1),1) logical(X(:,2:end))]; %// positive powers of 2 or 0
col = floor(log2(cumsum(aux,2))); %// col index
ind = bsxfun(@plus, (col-1)*size(X,1), (1:size(X,1)).'); %// linear index
Y = X(ind);
The trick is to make use of the matrix aux, which contains 0 if the corresponding entry of X is 0 and its column number is greater than 1; or else contains 2 raised to the column number. Thus, applying cumsum row-wise to this matrix, taking log2 and rounding down (matrix col) gives the column index of the rightmost nonzero entry up to the current entry, for each row (so this is a kind of row-wise "cumulative max" function). It only remains to convert from column number to linear index (with bsxfun; could also be done with sub2ind) and use that to index X.
This is valid for moderate sizes of X only. For large sizes, the powers of 2 used by the code quickly approach realmax and incorrect indices result.
Example:
X =
0 1 2 2 1 0 0
5 6 3 0 0 2 3
1 1 1 1 0 1 1
gives
>> Y
Y =
0 1 2 2 1 1 1
5 6 3 3 3 2 3
1 1 1 1 1 1 1
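The "cumulative max" idea can also be expressed directly with cummax, which avoids the powers of 2 and therefore the size limitation mentioned above. This is a sketch under the assumption that cummax is available (MATLAB R2014b or later); the names mask, nr and nc are illustrative and not part of the original answer.

```matlab
X = [0 1 2 2 1 0 0; 5 6 3 0 0 2 3; 1 1 1 1 0 1 1];
X(isnan(X)) = 0;                                       % handle NaNs and zeros in a unified way
[nr, nc] = size(X);
mask = [true(nr,1), X(:,2:end) ~= 0];                  % first column always counts as "filled"
col = cummax(bsxfun(@times, double(mask), 1:nc), 2);   % column of the last filled entry so far
ind = bsxfun(@plus, (col-1)*nr, (1:nr).');             % convert to linear indices
Y = X(ind);
```

Here col plays the same role as in the log2-based version, so the final indexing step is unchanged.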
You can generalize your own solution as follows:
Y = X.'; %'// Make a transposed copy of X
Y(isnan(Y)) = 0;
idx = find([ones(1, size(X, 1)); Y(2:end, :)]);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)), [], size(X, 1)).'; %'// Reshape back into a matrix
This works by treating the input data as a long vector, applying the original solution and then reshaping the result back into a matrix. The first column is always treated as non-zero so that values don't propagate from one row into the next. Also note that the original matrix is transposed so that it is converted to a vector in row-major order.
Modified version of Eitan's answer to avoid propagating values across rows:
Y = X'; %'
tf = Y > 0;
tf(1,:) = true;
idx = find(tf);
Y(idx(2:end)) = diff(Y(idx));
Y = reshape(cumsum(Y(:)),fliplr(size(X)))';
x=[0,1,2,2,1,0;
5,6,3,0,1,2;
1,1,1,1,0,1];
%Do it column by column is easier
x=x';
rm=0;
while 1
%fields to replace
l=(x==0);
%do nothing for the first row/column
l(1,:)=0;
rm2=sum(sum(l));
if rm2==rm
%nothing to do
break;
else
rm=rm2;
end
%replace zeros
x(l) = x(find(l)-1);
end
x=x';
I have a function I use for a similar problem for filling NaNs. This can probably be cut down or sped up further - it's extracted from pre-existing code that has a bunch more functionality (forward/backward filling, maximum distance etc).
X = [
0 1 2 2 1 0
5 6 3 0 0 2
1 1 1 1 0 1
0 0 4 5 3 9
];
X(X == 0) = NaN;
Y = nanfill(X,2);
Y(isnan(Y)) = 0
function y = nanfill(x,dim)
if nargin < 2, dim = 1; end
if dim == 2, y = nanfill(x',1)'; return; end
i = find(~isnan(x(:)));
j = 1:size(x,1):numel(x);
j = j(ones(size(x,1),1),:);
ix = max(rep([1; i],diff([1; i; numel(x) + 1])),j(:));
y = reshape(x(ix),size(x));
function y = rep(x,times)
i = find(times);
if length(i) < length(times), x = x(i); times = times(i); end
i = cumsum([1; times(:)]);
j = zeros(i(end)-1,1);
j(i(1:end-1)) = 1;
y = x(cumsum(j));

Random binary matrix with two non-trivial constraints

I need to generate a random matrix of K columns and N rows containing ones and zeroes, such that:
a) Each row contains exactly k ones.
b) Each row is different from the other (combinatorics imposes that if N > nchoosek(K, k) there will be nchoosek(K,k) rows).
Assume I want N = 10000 (out of all the possible nchoosek(K, k) = 27405 combinations), different 1×K vectors (with K = 30) containing k (with k = 4) ones and K - k zeroes.
This code:
clear all; close
N=10000; K=30; k=4;
M=randi([0 1],N,K);
plot(sum(M,2)) % condition a) not satisfied
satisfies neither a) nor b).
This code:
clear all; close;
N=10000;
NN=N; K=30; k=4;
tempM=zeros(NN,K);
for ii=1:NN
ttmodel=tempM(ii,:);
ttmodel(randsample(K,k,false))=1; %satisfies condition a)
tempM(ii,:)=ttmodel;
end
Check=bi2de(tempM); %from binary to decimal
[tresh1,ind,tresh2] = unique(Check);%drop the vectors that appear more than once in the matrix
M=tempM(ind,:); %and satisfies condition b)
plot(sum(M,2)) %verify that condition a) is satisfied
%Effective draws, Wanted draws, Number of possible combinations to draw from
[sum(sum(M,2)==k) N nchoosek(K,k) ]
satisfies condition a) and partially condition b). I say partially because unless NN>>N the final matrix will contain fewer than N distinct rows.
Is there a better and faster way (that possibly avoids the for loop and the need to have NN>>N) to solve the problem?
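First, generate N random k-long selections of the positions of ones. Sorting each row of a random matrix yields an independent random permutation of 1:K per row (randperm(K, N) would return only a single 1-by-N vector), and the first k entries of each permutation give k distinct columns:
[~, cols] = sort(rand(N, K), 2);
cols = cols(:, 1:k);
Then generate the matching row indices:
rows = meshgrid(1:N, 1:k)';
and finally create the sparse matrix with:
A = sparse(rows, cols, 1, N, K);
To obtain the full form of the matrix, use full(A). This satisfies condition a) by construction; it does not by itself rule out repeated rows, so condition b) still has to be checked, for example with unique(full(A), 'rows').
Example
K = 10;
k = 4;
N = 5;
[~, cols] = sort(rand(N, K), 2);
cols = cols(:, 1:k);
rows = meshgrid(1:N, 1:k)';
A = sparse(rows, cols , 1, N, K);
full(A)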
The result I got is:
ans =
1 1 0 0 0 0 0 1 0 1
0 0 1 1 0 1 0 0 0 1
0 0 0 1 1 0 1 0 1 0
0 1 0 0 0 0 1 0 1 1
1 1 1 0 0 1 0 0 0 0
This computation should be pretty fast even for large values of K and N. For K = 30, k = 4, N = 10000 the result was obtained in less than 0.01 seconds.
You could use randperm(n) to generate random sequences of integers from 1 to n, and store the nonrepeated sequences as rows in a matrix M until size(unique(M,'rows'),1)==size(M,1). Then you could use M to index a logical matrix with the appropriate number of true values in each row.
If you have enough memory for nchoosek(K,k) integers, build an array of those, use a partial Fisher-Yates shuffle to get a proper uniformly random subset of N of those. Now, given the array of N integers, interpret each as the rank of the combination representing each row of your final array. If you use colexicographical ordering of combinations, computing the combination from a rank is pretty simple (though it uses lots of binomial combination functions, so it pays to have a fast one).
I'm not a Matlab guy, but I've done things similar to this in C. This code, for example:
for (i = k; i >= 1; --i) {
while ((b = binomial(n, i)) > r) --n;
buf[i-1] = n;
r -= b;
}
will fill the array buf[] with indices from 0 to n-1 for the rth combination of k out of n elements in colex order. You would interpret these as the positions of the 1s in your row.
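When nchoosek(K,k) is small enough to enumerate, as it is for K = 30 and k = 4, the "build all combinations and pick a random subset" idea can be sketched in MATLAB without any unranking. This is an illustration under that memory assumption, not the code from the answers above; the names combos and pick are made up.

```matlab
K = 30; k = 4; N = 10000;
combos = nchoosek(1:K, k);               % every possible set of k column positions, one per row
pick = randperm(size(combos, 1), N);     % N distinct row indices, chosen uniformly at random
cols = combos(pick, :);                  % positions of the ones for each output row
rows = repmat((1:N).', 1, k);            % matching row index for every position
M = full(sparse(rows, cols, 1, N, K));   % N-by-K matrix of zeros and ones
```

Because the rows of combos are all different and pick contains no repeats, every row of M has exactly k ones and no two rows coincide. For much larger K the table of combinations may not fit in memory, and the rank-based approach described above would be the alternative.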