Choose multiple combination (matlab) - matlab

I have 3 sets of array each contains 12 elements of same type
a=[1 1 1 1 1 1 1 1 1 1 1];
b=[2 2 2 2 2 2 2 2 2 2 2];
c=[3 3 3 3 3 3 3 3 3 3 3];
I have to find how many ways it can be picked up if I need to pickup 12 items at a time
here, 1 1 2 is same as 2 1 1
I found this link Generating all combinations with repetition using MATLAB.
Ho this can be done in matlab within reasonable time.
is that way correct
abc=[a b c];
allcombs=nmultichoosek(abc,12);
combs=unique(allcombs,'rows');

If you only need to find the number of ways to select the items, then using generating functions is a way to very efficiently compute that, even for fairly large values of N and k.
If you are not familiar with generating functions, you can read up on the math background here:
http://mathworld.wolfram.com/GeneratingFunction.html
and here:
http://math.arizona.edu/~faris/combinatoricsweb/generate.pdf
The solution hinges on the fact that the number of ways to choose k items from 36, with each of 3 items repeated 12 times, can be determined from the product of the generating function:
g(x) = 1 + x + x^2 + x^3 + ... + x^12
with itself 3 times. The 12 comes from the fact the elements are repeated 12 times (NOT from the fact you are choosing 12), and multiplying by itself 3 times is because there are three different sets of elements. The number of ways to choose 12 elements is then just the coefficient of the power of x^12 in this product of polynomials (try it for smaller examples if you want to prove to yourself that it works).
The great thing about that is that MATLAB has a simple function conv for multiplying polynomials:
>> g = ones(1,13); %% array of 13 ones, for a 12th degree polynomial with all `1` coefficents
>> prod = conv(g, conv(g, g)); %% multiply g by itself 3 times, as a polynomial
>> prod(13)
ans =
91
So there are 91 ways to select 12 elements from your list of 36. If you want to select 11 elements, that's z(12) = 78. If you want to select 13 elements, that's z(14) = 102.
Finally, if you had different numbers of elements in the sets, say 10 1's, 12 2's, and 14 3's, then you would have 3 distinct polynomials of the same form, 1 + x + x^2 + ..., with degrees 10, 12 and 14 respectively. Inspecting the coefficient of the degree k term again gives you the number of ways to choose k elements from this set.

Related

How to split a matrix based on how close the values are?

Suppose I have a matrix A:
A = [1 2 3 6 7 8];
I would like to split this matrix into sub-matrices based on how relatively close the numbers are. For example, the above matrix must be split into:
B = [1 2 3];
C = [6 7 8];
I understand that I need to define some sort of criteria for this grouping so I thought I'd take the absolute difference of the number and its next one, and define a limit upto which a number is allowed to be in a group. But the problem is that I cannot fix a static limit on the difference since the matrices and sub-matrices will be changing.
Another example:
A = [5 11 6 4 4 3 12 30 33 32 12];
So, this must be split into:
B = [5 6 4 4 3];
C = [11 12 12];
D = [30 33 32];
Here, the matrix is split into three parts based on how close the values are. So the criteria for this matrix is different from the previous one though what I want out of each matrix is the same, to separate it based on the closeness of its numbers. Is there any way I can specify a general set of conditions to make the criteria dynamic rather than static?
I'm afraid, my answer comes too late for you, but maybe future readers with a similar problem can profit from it.
In general, your problem calls for cluster analysis. Nevertheless, maybe there's a simpler solution to your actual problem. Here's my approach:
First, sort the input A.
To find a criterion to distinguish between "intraclass" and "interclass" elements, I calculate the differences between adjacent elements of A, using diff.
Then, I calculate the median over all these differences.
Finally, I find the indices for all differences, which are greater or equal than three times the median, with a minimum difference of 1. (Depending on the actual data, this might be modified, e.g. using mean instead.) These are the indices, where you will have to "split" the (sorted) input.
At last, I set up two vectors with the starting and end indices for each "sub-matrix", to use this approach using arrayfun to get a cell array with all desired "sub-matrices".
Now, here comes the code:
% Sort input, and calculate differences between adjacent elements
AA = sort(A);
d = diff(AA);
% Calculate median over all differences
m = median(d);
% Find indices with "significantly higher difference",
% e.g. greater or equal than three times the median
% (minimum difference should be 1)
idx = find(d >= max(1, 3 * m));
% Set up proper start and end indices
start_idx = [1 idx+1];
end_idx = [idx numel(A)];
% Generate cell array with desired vectors
out = arrayfun(#(x, y) AA(x:y), start_idx, end_idx, 'UniformOutput', false)
Due to the unknown number of possible vectors, I can't think of way to "unpack" these to individual variables.
Some tests:
A =
1 2 3 6 7 8
out =
{
[1,1] =
1 2 3
[1,2] =
6 7 8
}
A =
5 11 6 4 4 3 12 30 33 32 12
out =
{
[1,1] =
3 4 4 5 6
[1,2] =
11 12 12
[1,3] =
30 32 33
}
A =
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3
out =
{
[1,1] =
1 1 1 1 1 1 1
[1,2] =
2 2 2 2 2 2
[1,3] =
3 3 3 3 3 3 3
}
Hope that helps!

Weighted Random number?

How can I randomize and generate numbers from 0-50 in matrix of 5x5 with SUM or each row printed on the right side?
+
is there any way to give weight to individual numbers before generating the numbers?
Please help
Thanks!
To generate a random matrix of integers between 0 and 50 (sampled with replacement) you could use
M = randint(5,5,[0,50])
To print the matrix with the sum of each row execute the following command
[M sum(M,2)]
To use a different distribution there are a number of techniques but one of the easiest is to use the datasample function from the Statistics and Machine Learning toolbox.
% sample from a truncated Normal distribution. No need to normalize
x = 0:50;
weights = exp(-0.5*(x-25).^2 / 5^2);
M = reshape(datasample(x,25,'Weights',weights),[5,5])
Edit:
Based on your comment you want to perform random sampling without replacement. You can perform such a random sampling without replacement if the weights are non-negative integers by simulating the classic ball-urn experiment.
First create an array containing the appropriate number of each value.
Example: If we have the values 0,1,2,3,4 with the following weights
w(0) = 2
w(1) = 3
w(2) = 5
w(3) = 4
w(4) = 1
Then we would first create the urn array
>> urn = [0 0 1 1 1 2 2 2 2 2 3 3 3 3 4];
then, we would shuffle the urn using randperm
>> urn_shuffled = urn(randperm(numel(urn)))
urn_shuffled =
2 0 4 3 0 3 2 2 3 3 1 2 1 2 1
To pick 5 elements without replacement we would simple select the first 5 elements of urn_shuffled.
Rather than typing out the entire urn array, we can construct it programatically given an array of weights for each value. For example
weight = [2 3 5 4 1];
urn = []
v = 0
for w = weight
urn = [urn repmat(v,1,w)];
v = v + 1;
end
In your case, the urn will contain many elements. Once you shuffle you would select the first 25 elements and reshape them into a matrix.
>> M = reshape(urn_shuffled(1:25),5,5)
To draw random integer uniformly distributed numbers, you can use the randi function:
>> randi(50,[5,5])
ans =
34 48 13 28 13
33 18 26 7 41
9 30 35 8 13
6 12 45 13 47
25 38 48 43 18
Printing the sum of each row can be done by using the sum function with 2 as the dimension argument:
>> sum(ans,2)
ans =
136
125
95
123
172
For weighting the various random numbers, see this question.

MATLAB: sample from population randomly many times?

I am aware of MATLAB's datasample which allows to select k times from a certain population. Suppose population=[1,2,3,4] and I want to uniformly sample, with replacement, k=5 times from it. Then:
datasample(population,k)
ans =
1 3 2 4 1
Now, I want to repeat the above experiment N=10000 times without using a for loop. I tried doing:
datasample(repmat(population,N,1),5,2)
But the output I get is (just a short excerpt below):
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
1 3 2 1 3
Every row (result of an experiment) is the same! But obviously they should be different... It's as though some random seed is not updating between rows. How can I fix this? Or some other method I could use that avoids a for loop? Thanks!
You seem to be confusing the way datasample works. If you read the documentation on the function, if you specify a matrix, it will generate a data sampling from a selection of rows in the matrix. Therefore, if you simply repeat the population vector 10000 times, and when you specify the second parameter of the function - which in this case is how many rows of the matrix to extract, even though the actual row locations themselves are different, the actual rows over all of the matrix is going to be the same which is why you are getting that "error".
As such, I wouldn't use datasample here if it is your intention to avoid looping. You can use datasample, but you'd have to loop over each call and you explicitly said that this is not what you want.
What I would recommend you do is first create your population vector to have whatever you desire in it, then generate a random index matrix where each value is between 1 up to as many elements as there are in population. This matrix is in such a way where the number of columns is the number of samples and the number of rows is the number of trials. Once you create this matrix, simply use this to index into your vector to achieve the desired sampling matrix. To generate this random index matrix, randi is a fine choice.
Something like this comes to mind:
N = 10000; %// Number of trials
M = 5; %// Number of samples per trial
population = 1:4; %// Population vector
%// Generate random indices
ind = randi(numel(population), N, M);
%// Get the stuff
out = population(ind);
Here's the first 10 rows of the output:
>> out(1:10,:)
ans =
4 3 1 4 2
4 4 1 3 4
3 2 2 2 3
1 4 2 2 2
1 2 3 4 2
2 2 3 2 1
4 1 3 2 4
1 4 1 3 1
1 1 2 4 4
1 2 4 2 1
I think the above does what you want. Also keep in mind that the above code generalizes to any population vector you want. You simply have to change the vector and it will work as advertised.
datasample interprets each column of your data as one element of your population, sampling among all columns.
To fix this you could call datasample N times in a loop, instead I would use randi
population(randi(numel(population),N,5))
assuming your population is always 1:p, you could simplify to:
randi(p,N,5)
Ok so both of the current answers both say don't use datasample and use randi instead. However, I have a solution for you with datasample and arrayfun.
>> population = [1 2 3 4];
>> k = 5; % Number of samples
>> n = 1000; % Number of times to execute datasample(population, k)
>> s = arrayfun(#(k) datasample(population, k), n*ones(k, 1), 'UniformOutput', false);
>> s = cell2mat(s);
s =
1 4 1 4 4
4 1 2 2 4
2 4 1 2 1
1 4 3 3 1
4 3 2 3 2
We need to make sure to use 'UniformOutput', false with arrayfun as there is more than one output. The cell2mat call is needed as the result of arrayfun is a cell array.

Matlab, Sum Function for a Matrix row

Basically the sum function calculate the sum of the columns, that is to say if we have a 4x4 matrix we would get a 1X4 vector
A = magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
sum(A)
ans =
34 34 34 34
But if I want to get the Summation of the rows then i have 2 methods, the first is to get the transpose of the matrix then get the summation of the transposed matrix,and finally get the transpose of the result...., The Second method is to use dimension argument for the Sum function "sum(A, 2)"
A = magic(4)
A =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
sum(A,2)
ans =
34
34
34
34
The problem is here I cannot understand how this is done, If anyone could please tell me the idea/concept behind this method,
It's hard to tell exactly how sum internally works, but we can guess it does something similar to this.
Matlab stores matrices (or N-dimensional arrays) in memory using column-major order. This means the order for the elements in memory for a 3 x 4 matrix is
1 4 7 10
2 5 8 11
3 6 9 12
So it first stores element (1,1), then (1,2), then (13), then (2,1), ...
In fact, this is the order you use when you apply linear indexing (that is, index a matrix with a single number). For example, let
A = [7 8 6 2
9 0 3 5
6 3 2 1];
Then A(4) gives 8.
With this in mind, it's easy to guess that what sum(A,1) does is traverse elements consecutively: A(1)+A(2)+A(3) to obtain the sum of the first column, then A(4)+A(5)+A(6) to sum the second column, etc. In contrast, sum(A,2) proceeds in steps of size(A,1) (3 in this example): A(1)+A(4)+A(7)+A(10) to compute the sum of the first row, etc.
As a side note, this is probably related with the observed fact that sum(A,1) is faster than sum(A,2).
I'm really not sure what you are asking. sum takes two inputs, the first of which is a multidimensional array A, say.
Now let's take sA = size(A), and d between 1 and ndims(A).
To understand what B = sum(A,d) does, first we find out what the size of B is.
That's easy, sB = sA; sB(d) = 1;. So in a way, it will "reduce" the size of A along dimension d.
The rest is trivial: every element in B is the sum of elements in A along dimension d.
Basically, sum(A) = sum(A,1) which outputs the sum of the columns in the matrix. 1 indicates the columns. So, sum(A,2) outputs the sum of the rows in the matrix. 2 indicating the rows. More than that, the sum command will output the entire matrix because there is only 2 dimensions (rows and columns)

Creating combinations without repetitions

I have so far the following code:
data = xlsread('filename');
% 1000 samples without replacement
% each element of y contains 10 values without repetition
y = cell(10,1000);
for i = 1:1000
y{i} = datasample(data,10,'Replace',false);
end
Now I dont want to have the same vector twice in the cell y, and by twice I also mean vectors like [ 1 2 3 4 5 6 7 8 9 10] and [1 2 3 4 5 6 7 8 10 9], i.e the ordering of the elements does not matter, but if 2 vectors contain the same elements I want one to be deleted. How do I do that? Is there alternatively a way to sample some of combinations without replacement from data? Data contains 171 values, and all of the combinations without replacement would probably be some milions, whereas I basically only need around 1000 combinations without replacement.. Thanks