Matlab : create parameter table from parameters for sweep analysis [duplicate] - matlab

This question pops up quite often in one form or another (see for example here or here). So I thought I'd present it in a general form, and provide an answer which might serve for future reference.
Given an arbitrary number n of vectors of possibly different sizes, generate an n-column matrix whose rows describe all combinations of elements taken from those vectors (Cartesian product) .
For example,
vectors = { [1 2], [3 6 9], [10 20] }
should give
combs = [ 1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20 ]

The ndgrid function almost gives the answer, but has one caveat: n output variables must be explicitly defined to call it. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with ncells) to serve as output. The resulting n matrices are then concatenated into the desired n-column matrix:
vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix

A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec:
vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments
which returns a matrix in a slightly different order:
combs =
1 3 10
2 3 10
1 6 10
2 6 10
1 9 10
2 9 10
1 3 20
2 3 20
1 6 20
2 6 20
1 9 20
2 9 20
If you want the matrix that is in the question, you can use sortrows:
combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per #LuisMendo in the comments:
% combs = fliplr(combvec(vectors{end:-1:1}).')
which gives
combs =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than #LuisMendo's answer. I can't say which is more efficient overall.
If you happen to have a matrix whose rows are akin to the earlier cell array you can use:
vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')

I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit function, and is included at the end of this post.
I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I choose this value to avoid the use of virtual memory in my laptop computer).
The results are given in the following figure. The ndgrid-based solution is seen to consistently take less time than combvec. It's also interesting to note that the time taken by combvec varies a little less regularly in the different-size case.
Benchmarking code
Function for ndgrid-based solution:
function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);
Function for combvec solution:
function combs = f2(vectors)
combs = combvec(vectors{:}).';
Script to measure time by calling timeit on these functions:
nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
%//vectors = {1:n, 1:n, 1:n};
vectors = {1:n/10, 1:n, 1:n*10};
t = timeit(#() f1(vectors));
t1 = [t1; t];
t = timeit(#() f2(vectors));
t2 = [t2; t];
end

Here's a do-it-yourself method that made me giggle with delight, using nchoosek, although it's not better than #Luis Mendo's accepted solution.
For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following #Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:
Code:
tic;
v = {[1 2], [3 6 9], [10 20]};
L = [0 cumsum(cellfun(#length,v))];
V = cell2mat(v);
J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:) = [];
V(J)
toc
gives
ans =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
Elapsed time is 0.018434 seconds.
Explanation:
L gets the lengths of each vector using cellfun. Although cellfun is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.
V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)
nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than what we need.
J =
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 3 4
1 3 5
1 3 6
1 3 7
1 4 5
1 4 6
1 4 7
1 5 6
1 5 7
1 6 7
2 3 4
2 3 5
2 3 6
2 3 7
2 4 5
2 4 6
2 4 7
2 5 6
2 5 7
2 6 7
3 4 5
3 4 6
3 4 7
3 5 6
3 5 7
3 6 7
4 5 6
4 5 7
4 6 7
5 6 7
Since there are only two elements in v(1), we need to throw out any rows where J(:,1)>2. Similarly, where J(:,2)<3, J(:,2)>5, etc... Using L and repmat we can determine whether each element of J is in its appropriate range, and then use any to discard rows that have any bad element.
Finally, these aren't the actual values from v, just indices. V(J) will return the desired matrix.

Related

How to split a matrix based on how close the values are?

Suppose I have a matrix A:
A = [1 2 3 6 7 8];
I would like to split this matrix into sub-matrices based on how relatively close the numbers are. For example, the above matrix must be split into:
B = [1 2 3];
C = [6 7 8];
I understand that I need to define some sort of criteria for this grouping so I thought I'd take the absolute difference of the number and its next one, and define a limit upto which a number is allowed to be in a group. But the problem is that I cannot fix a static limit on the difference since the matrices and sub-matrices will be changing.
Another example:
A = [5 11 6 4 4 3 12 30 33 32 12];
So, this must be split into:
B = [5 6 4 4 3];
C = [11 12 12];
D = [30 33 32];
Here, the matrix is split into three parts based on how close the values are. So the criteria for this matrix is different from the previous one though what I want out of each matrix is the same, to separate it based on the closeness of its numbers. Is there any way I can specify a general set of conditions to make the criteria dynamic rather than static?
I'm afraid, my answer comes too late for you, but maybe future readers with a similar problem can profit from it.
In general, your problem calls for cluster analysis. Nevertheless, maybe there's a simpler solution to your actual problem. Here's my approach:
First, sort the input A.
To find a criterion to distinguish between "intraclass" and "interclass" elements, I calculate the differences between adjacent elements of A, using diff.
Then, I calculate the median over all these differences.
Finally, I find the indices for all differences, which are greater or equal than three times the median, with a minimum difference of 1. (Depending on the actual data, this might be modified, e.g. using mean instead.) These are the indices, where you will have to "split" the (sorted) input.
At last, I set up two vectors with the starting and end indices for each "sub-matrix", to use this approach using arrayfun to get a cell array with all desired "sub-matrices".
Now, here comes the code:
% Sort input, and calculate differences between adjacent elements
AA = sort(A);
d = diff(AA);
% Calculate median over all differences
m = median(d);
% Find indices with "significantly higher difference",
% e.g. greater or equal than three times the median
% (minimum difference should be 1)
idx = find(d >= max(1, 3 * m));
% Set up proper start and end indices
start_idx = [1 idx+1];
end_idx = [idx numel(A)];
% Generate cell array with desired vectors
out = arrayfun(#(x, y) AA(x:y), start_idx, end_idx, 'UniformOutput', false)
Due to the unknown number of possible vectors, I can't think of way to "unpack" these to individual variables.
Some tests:
A =
1 2 3 6 7 8
out =
{
[1,1] =
1 2 3
[1,2] =
6 7 8
}
A =
5 11 6 4 4 3 12 30 33 32 12
out =
{
[1,1] =
3 4 4 5 6
[1,2] =
11 12 12
[1,3] =
30 32 33
}
A =
1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 3
out =
{
[1,1] =
1 1 1 1 1 1 1
[1,2] =
2 2 2 2 2 2
[1,3] =
3 3 3 3 3 3 3
}
Hope that helps!

matlab shuffle elements of vector with the same sequent of the same number

I have the following vector
a = 3 3 5 5 20 20 20 4 4 4 2 2 2 10 10 10 6 6 1 1 1
does anyone know how to shuffle this vector with the same elementsnever be seperate?
something like bellow
a = 10 10 10 5 5 4 4 4 20 20 20 1 1 1 3 3 2 2 2 6 6
thank you, best regard...
You can use unique combined with accumarray to create a cell array where each group of values is placed into a separate cell element. You can then shuffle these elements and recombine them into an array.
% Put each group into a separate cell of a cell array
[~, ~, ind] = unique(a);
C = accumarray(ind(:), a(:), [], #(x){x});
% Shuffle it
shuffled = C(randperm(numel(C)));
% Now make it back into a vector
out = cat(1, shuffled{:}).';
% 20 20 20 1 1 1 3 3 10 10 10 5 5 4 4 4 6 6 2 2 2
Another option is to get the values using unique and then compute the number that each occurs. You can then shuffle the values and use repelem to expand out the result
u = unique(a);
counts = histc(a, u);
% Shuffle the values
inds = randperm(numel(u));
% Now expand out the array
out = repelem(u(inds), counts(inds));
A very similar answer to #Suever, using a loop and logical matrix rather than cells
a = [3 3 5 5 20 20 20 4 4 4 2 2 2 10 10 10 6 6 1 1 1];
vals = unique(a); %find unique values
vals = vals(randperm(length(vals))); %shuffle vals matrix
aout = []; %initialize output matrix
for ii = 1:length(vals)
aout = [aout a(a==(vals(ii)))]; %add correct number of each value
end
Here's another approach:
a = [3 3 5 5 20 20 20 4 4 4 2 2 2 10 10 10 6 6 1 1 1];
[~, ~, lab] = unique(a);
r = randperm(max(lab));
[~, ind] = sort(r(lab));
result = a(ind);
Example result:
result =
2 2 2 3 3 5 5 20 20 20 4 4 4 10 10 10 1 1 1 6 6
It works as follows:
Assign unique labels to each element of a depending on their values (this is vector lab);
Apply a random bijection from the values of lab to themselves (the random bijection is represented by r; the result of applying it is r(lab));
Sort r(lab) and get the indices of the sorting (this is ind);
Apply those indices to a.

How to align vectors with asynchronous time stamp in matlab?

I would like to align and count vectors with different time stamps to count the corresponding bins.
Let's assume I have 3 matrix from [N,edges] = histcounts in the following structure. The first row represents the edges, so the bins. The second row represents the values. I would like to sum all values with the same bin.
A = [0 1 2 3 4 5;
5 5 6 7 8 5]
B = [1 2 3 4 5 6;
2 5 7 8 5 4]
C = [2 3 4 5 6 7 8;
1 2 6 7 4 3 2]
Now I want to sum all the same bins. My final result should be:
result = [0 1 2 3 4 5 6 7 8;
5 7 12 16 ...]
I could loop over all numbers, but I would like to have it fast.
You can use accumarray:
H = [A B C].'; %//' Concatenate the histograms and make them column vectors
V = [unique(H(:,1)) accumarray(H(:,1)+1, H(:,2))].'; %//' Find unique values and accumulate
V =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
Note: The H(:,1)+1 is to force the bin values to be positive, otherwise MATLAB will complain. We still use the actual bins in the output V. To avoid this, as #Daniel says in the comments, use the third output of unique (See: https://stackoverflow.com/a/27783568/2732801):
H = [A B C].'; %//' stupid syntax highlighting :/
[U, ~, IU] = unique(H(:,1));
V = [U accumarray(IU, H(:,2))].';
If you're only doing it with 3 variables as you've shown then there likely aren't going to be any performance hits with looping it.
But if you are really averse to the looping idea, then you can do it using arrayfun.
rng = 0:8;
output = arrayfun(#(x)sum([A(2,A(1,:) == x), B(2,B(1,:) == x), C(2,C(1,:) == x)]), rng);
output = cat(1, rng, output);
output =
0 1 2 3 4 5 6 7 8
5 7 12 16 22 17 8 3 2
This can be beneficial for particularly large A, B, and C variables as there is no copying of data.

getting all possible combinations of multiple vectors of different size [duplicate]

This question pops up quite often in one form or another (see for example here or here). So I thought I'd present it in a general form, and provide an answer which might serve for future reference.
Given an arbitrary number n of vectors of possibly different sizes, generate an n-column matrix whose rows describe all combinations of elements taken from those vectors (Cartesian product) .
For example,
vectors = { [1 2], [3 6 9], [10 20] }
should give
combs = [ 1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20 ]
The ndgrid function almost gives the answer, but has one caveat: n output variables must be explicitly defined to call it. Since n is arbitrary, the best way is to use a comma-separated list (generated from a cell array with ncells) to serve as output. The resulting n matrices are then concatenated into the desired n-column matrix:
vectors = { [1 2], [3 6 9], [10 20] }; %// input data: cell array of vectors
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
A little bit simpler ... if you have the Neural Network toolbox you can simply use combvec:
vectors = {[1 2], [3 6 9], [10 20]};
combs = combvec(vectors{:}).' % Use cells as arguments
which returns a matrix in a slightly different order:
combs =
1 3 10
2 3 10
1 6 10
2 6 10
1 9 10
2 9 10
1 3 20
2 3 20
1 6 20
2 6 20
1 9 20
2 9 20
If you want the matrix that is in the question, you can use sortrows:
combs = sortrows(combvec(vectors{:}).')
% Or equivalently as per #LuisMendo in the comments:
% combs = fliplr(combvec(vectors{end:-1:1}).')
which gives
combs =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
If you look at the internals of combvec (type edit combvec in the command window), you'll see that it uses different code than #LuisMendo's answer. I can't say which is more efficient overall.
If you happen to have a matrix whose rows are akin to the earlier cell array you can use:
vectors = [1 2;3 6;10 20];
vectors = num2cell(vectors,2);
combs = sortrows(combvec(vectors{:}).')
I've done some benchmarking on the two proposed solutions. The benchmarking code is based on the timeit function, and is included at the end of this post.
I consider two cases: three vectors of size n, and three vectors of sizes n/10, n and n*10 respectively (both cases give the same number of combinations). n is varied up to a maximum of 240 (I choose this value to avoid the use of virtual memory in my laptop computer).
The results are given in the following figure. The ndgrid-based solution is seen to consistently take less time than combvec. It's also interesting to note that the time taken by combvec varies a little less regularly in the different-size case.
Benchmarking code
Function for ndgrid-based solution:
function combs = f1(vectors)
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n);
Function for combvec solution:
function combs = f2(vectors)
combs = combvec(vectors{:}).';
Script to measure time by calling timeit on these functions:
nn = 20:20:240;
t1 = [];
t2 = [];
for n = nn;
%//vectors = {1:n, 1:n, 1:n};
vectors = {1:n/10, 1:n, 1:n*10};
t = timeit(#() f1(vectors));
t1 = [t1; t];
t = timeit(#() f2(vectors));
t2 = [t2; t];
end
Here's a do-it-yourself method that made me giggle with delight, using nchoosek, although it's not better than #Luis Mendo's accepted solution.
For the example given, after 1,000 runs this solution took my machine on average 0.00065935 s, versus the accepted solution 0.00012877 s. For larger vectors, following #Luis Mendo's benchmarking post, this solution is consistently slower than the accepted answer. Nevertheless, I decided to post it in hopes that maybe you'll find something useful about it:
Code:
tic;
v = {[1 2], [3 6 9], [10 20]};
L = [0 cumsum(cellfun(#length,v))];
V = cell2mat(v);
J = nchoosek(1:L(end),length(v));
J(any(J>repmat(L(2:end),[size(J,1) 1]),2) | ...
any(J<=repmat(L(1:end-1),[size(J,1) 1]),2),:) = [];
V(J)
toc
gives
ans =
1 3 10
1 3 20
1 6 10
1 6 20
1 9 10
1 9 20
2 3 10
2 3 20
2 6 10
2 6 20
2 9 10
2 9 20
Elapsed time is 0.018434 seconds.
Explanation:
L gets the lengths of each vector using cellfun. Although cellfun is basically a loop, it's efficient here considering your number of vectors will have to be relatively low for this problem to even be practical.
V concatenates all the vectors for easy access later (this assumes you entered all your vectors as rows. v' would work for column vectors.)
nchoosek gets all the ways to pick n=length(v) elements from the total number of elements L(end). There will be more combinations here than what we need.
J =
1 2 3
1 2 4
1 2 5
1 2 6
1 2 7
1 3 4
1 3 5
1 3 6
1 3 7
1 4 5
1 4 6
1 4 7
1 5 6
1 5 7
1 6 7
2 3 4
2 3 5
2 3 6
2 3 7
2 4 5
2 4 6
2 4 7
2 5 6
2 5 7
2 6 7
3 4 5
3 4 6
3 4 7
3 5 6
3 5 7
3 6 7
4 5 6
4 5 7
4 6 7
5 6 7
Since there are only two elements in v(1), we need to throw out any rows where J(:,1)>2. Similarly, where J(:,2)<3, J(:,2)>5, etc... Using L and repmat we can determine whether each element of J is in its appropriate range, and then use any to discard rows that have any bad element.
Finally, these aren't the actual values from v, just indices. V(J) will return the desired matrix.

MATLAB: Keeping a number of random rows in a matrix which satisfy certain conditions

I have a matrix in Matlab which looks similar to this, except with thousands of rows:
A =
5 6 7 8
6 1 2 3
5 1 4 8
5 2 3 7
5 8 7 2
6 1 3 8
5 2 1 6
6 3 2 1
I would like to get out a matrix that has three random rows with a '5' in the first column and three random rows with a '6' in the first column. So in this case, the output matrix would look something like this:
A =
5 6 7 8
6 1 2 3
5 2 3 7
6 1 3 8
5 2 1 6
6 3 2 1
The rows must be random, not just the first three or the last three in the original matrix.
I'm not really sure how to begin this, so any help would be greatly appreciated.
EDIT: This is the most successful attempt I've had so far. I found all the rows with a '5' in the first column:
BLocation = find(A(:,1) == 5);
B = A(BLocation,:);
Then I was trying to use 'randsample' like this to find three random rows from B:
C = randsample(B,3);
But 'randsample' does not work with a matrix.
I also think this could be done a little more efficiently.
You need to run randsample on the row indices that satisfy the conditions, i.e. equality to 5 or 6.
n = size(A,1);
% construct the linear indices for rows with 5 and 6
indexA = 1:n;
index5 = indexA(A(:,1)==5);
index6 = indexA(A(:,1)==6);
% sample three (randomly) from each
nSamples = 3;
r5 = randsample(index5, nSamples);
r6 = randsample(index6, nSamples);
% new matrix from concatenation
B = [A(r5,:); A(r6,:)];
Update: You can also use find to replace the original index construction, as yuk suggested, which proves to be faster (and optimized!).
Bechmark (MATLAB R2012a)
A = randi(10, 1e8, 2); % 10^8 rows random matrix of 1-10
tic;
n = size(A,1);
indexA = 1:n;
index5_1 = indexA(A(:,1)==5);
toc
tic;
index5_2 = find(A(:,1)==5);
toc
Elapsed time is 1.234857 seconds.
Elapsed time is 0.679076 seconds.
You can do this as follows:
desiredMat=[];
mat1=A(A(:,1)==5,:);
mat1=mat1(randperm(size(mat1,1)),:);
desiredMat=[desiredMat;mat1(1:3,:)];
mat1=A(A(:,1)==6,:);
mat1=mat1(randperm(size(mat1,1)),:);
desiredMat=[desiredMat;mat1(1:3,:)];
The above code uses logical indexing. You can also do this with find function (logical indexing is always faster than find).