Enumerating combinations of cells - matlab

Say I have 3 cells:
M1={ [1,1,1], [2,2,2] }
M2={ [3,3], [4,4] }
M3={ [5], [6] }
I want to take every element in M1, combine it with every element of M2, combine that with every element of M3, etc.
For the input above, I would like to produce one giant cell like:
[1,1,1],[3,3],[5]
[1,1,1],[3,3],[6]
[1,1,1],[4,4],[5]
[1,1,1],[4,4],[6]
[2,2,2],[3,3],[5]
[2,2,2],[3,3],[6]
[2,2,2],[4,4],[5]
[2,2,2],[4,4],[6]
How can I do this? In general, the number of cells (M1,M2...Mn), and their size, are unknown (and changing).

This function does what you want:
function C = add_permutations(A,B)
% A is a cell array NxK, B is 1xM
% C is a cell array N*M x K+1
N = size(A,1);
A = reshape(A,N,1,[]);
C = cat(3,repmat(A,1,numel(B)),repmat(B,N,1));
C = reshape(C,[],size(C,3));
It creates all combinations of two cell arrays by replicating them in different dimensions, then concatenating along the 3rd dimension and collapsing the first two dimensions. Because we want to repeatedly call it with different cell arrays, input A (NxK) has K matrices in each row, these are the previous combinations. B is a cell vector, each element will be combined with each row of A.
You use it as follows:
M1 = { 'a', 'b', 'c', 'd' }; % These are easier for debugging than OP's input, but cell elements can be anything at all.
M2 = { 1, 2 };
M3 = { 10, 12 };
X = M1.';
X = add_permutations(X,M2);
X = add_permutations(X,M3);
X now contains:
X =
16×3 cell array
'a' [1] [10]
'b' [1] [10]
'c' [1] [10]
'd' [1] [10]
'a' [2] [10]
'b' [2] [10]
'c' [2] [10]
'd' [2] [10]
'a' [1] [12]
'b' [1] [12]
'c' [1] [12]
'd' [1] [12]
'a' [2] [12]
'b' [2] [12]
'c' [2] [12]
'd' [2] [12]
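Since the number of cells is unknown in the question, the repeated calls above could be driven by a loop. The following is a minimal sketch, assuming the inputs are gathered in a cell array called Ms (a name introduced here purely for illustration). The loop body inlines the same steps as add_permutations so that the snippet runs on its own:

```matlab
% Ms is a hypothetical container holding the input cells M1..Mn.
Ms = { {'a','b','c','d'}, {1,2}, {10,12} };

X = Ms{1}(:);                      % first cell as an N-by-1 column
for k = 2:numel(Ms)
    B = reshape(Ms{k}, 1, []);     % next cell as a 1-by-M row
    N = size(X, 1);
    A = reshape(X, N, 1, []);      % previous combinations: N-by-1-by-K
    % Replicate A across columns and B across rows, then stack along dim 3.
    C = cat(3, repmat(A, 1, numel(B)), repmat(B, N, 1));  % N-by-M-by-(K+1)
    X = reshape(C, [], size(C, 3));                       % (N*M)-by-(K+1)
end
```

With the three cells above, X ends up as the same 16x3 cell array shown in the listing.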

That's not a permutation, it's an enumeration: you have 3 symbols, each with 2 possible values, and you are simply enumerating all possible "numbers". You can think about it the same way as if you were counting binary numbers with 3 digits.
In this case, one way to enumerate all these possibilities is with ndgrid. If M1 has n1 elements, M2 has n2 elements, etc:
n1 = numel(M1);
n2 = numel(M2);
n3 = numel(M3);
[a,b,c] = ndgrid(1:n1, 1:n2, 1:n3);
Here a, b and c are each a 3-dimensional array; together they represent the "grid" of index combinations. You don't need the grid shape itself, so you can flatten each array to a vector with (:) and use the vectors to index into M1, M2, M3, like so:
vertcat( M1(a(:)), M2(b(:)), M3(c(:)) )
If you are interested in generalising this for any number of Ms, this can also be done, but keep in mind that these "grids" are growing very fast as you increase their dimensionality.
Note: vertcat stands for "vertical concatenation", the reason it is vertical and not horizontal is because the result of M1(a(:)) is a row-shaped cell, even though a(:) is a column vector. That's just indexing headache, but you can simply transpose the result if you want it Nx3.
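The generalisation to any number of cells mentioned above might look like the sketch below. It assumes the input cells are collected in a cell array named Ms (a hypothetical name, not from the question) and relies on ndgrid accepting and returning comma-separated lists. Note that ndgrid varies the first index fastest, so the rows come out in a different order than in the question's listing.

```matlab
% Ms is a hypothetical container holding the input cells M1..Mn.
Ms = { {[1,1,1],[2,2,2]}, {[3,3],[4,4]}, {5,6} };
n  = numel(Ms);

% One index range per input cell: 1:n1, 1:n2, ...
ranges = cellfun(@(M) 1:numel(M), Ms, 'UniformOutput', false);

% Capture all n outputs of ndgrid in a cell array.
idx = cell(1, n);
[idx{:}] = ndgrid(ranges{:});

% Column k of combos holds the elements picked from Ms{k}.
combos = cell(numel(idx{1}), n);
for k = 1:n
    combos(:, k) = reshape(Ms{k}(idx{k}(:)), [], 1);
end
```

The number of rows is the product of the sizes of all input cells (here 2*2*2 = 8), which is why the grids grow so quickly.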

Related

MATLAB: How to select rows from dataset where nominal variables is greater than a certain frequency?

I have a dataset like this in MATLAB:
stovdata =
state discipline gender avggradereceived
OH Humanities M A
TX Communication F B
CA Philosophy M B
CA Anthropology M A+
CA Mathematics M B+
NV English F A-
CA Communication M B-
MA Sociology M A-
OK Anthropology F B-
VA Languages F A
I want to select all rows containing disciplines which have a frequency of at least 2.
So I am calling the sortrows function after tabulating the stovdata to get the descending order of frequencies in discipline:
>> sortrows(tabulate(stovdata.discipline), -2)
ans =
'Anthropology' [2] [20]
'Communication' [2] [20]
'English' [1] [10]
'Humanities' [1] [10]
'Languages' [1] [10]
'Mathematics' [1] [10]
'Philosophy' [1] [10]
'Sociology' [1] [10]
Now I want a reduced dataset which looks like this:
new_stovdata =
state discipline gender avggradereceived
TX Communication F B
CA Anthropology M A+
CA Communication M B-
OK Anthropology F B-
Thanks.
For a given n x m cell array tmp in MATLAB executing something like tmp{:,j} or tmp{i,:} returns what is known as a Comma Separated List. You can construct an array from the list just by using square brackets around it. For example:
>> tmp = num2cell(eye(3))
tmp =
[1] [0] [0]
[0] [1] [0]
[0] [0] [1]
>> [tmp{:,1}]
ans =
1 0 0
In your case, assuming data holds the cell array returned by tabulate (names in column 1, counts in column 2), you can get a logical index for all rows with freq >= 2 as:
idx = [data{:,2}] >= 2;
You can use this idx to get the rows from the cell array as:
data_ = data(idx,:);
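Note that data_ holds rows of the frequency table, not rows of the original dataset. One way to map the frequent disciplines back to the original rows is ismember. The sketch below is only an illustration: it assumes the discipline column is available as a cell array of strings, and it swaps tabulate for unique and accumarray so that it runs without the Statistics Toolbox.

```matlab
% Discipline column from the question, as a cell array of strings (assumed layout).
discipline = {'Humanities'; 'Communication'; 'Philosophy'; 'Anthropology'; ...
              'Mathematics'; 'English'; 'Communication'; 'Sociology'; ...
              'Anthropology'; 'Languages'};

% Count occurrences of each distinct discipline.
[names, ~, grp] = unique(discipline);
counts = accumarray(grp(:), 1);

% Disciplines that occur at least twice.
frequent = names(counts >= 2);

% Logical mask over the ORIGINAL rows.
rowMask = ismember(discipline, frequent);

% Applying the mask to the dataset would then be:
%   new_stovdata = stovdata(rowMask, :);
```

For the sample data, rowMask selects rows 2, 4, 7 and 9, which are the four rows listed in the desired new_stovdata.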
If you are not looking for the most efficient solution, the following will do the job:
SortedData=sortrows(tabulate(stovdata.discipline), -2);
NumberOfRows=size(SortedData,1);
IndexItr=1;
for RowItr=1:NumberOfRows
if SortedData{RowItr,2}(1)>=2
IndicesOfRowsOfInterest(IndexItr)=RowItr;
IndexItr=IndexItr+1;
else
break;
end
end
PS: IndicesOfRowsOfInterest contains the indices of the rows of SortedData whose disciplines have a frequency of at least 2.

Apply cellfun to only one column in cell array

I know cellfun can be applied to an entire cell array and understand its syntax. However is it possible to apply cellfun only to one column in a cell array and not have it affect the other columns?
As user1543042 and It's magic said in the comments, you can apply cellfun to just one column by indexing with ':', but you need to add an assignment step. Also, since you want cellfun to return a cell array, you need to set 'UniformOutput' to false. So, you end up with:
C(:,i) = cellfun(@foo, C(:,i), 'UniformOutput', false)
To see an example in action:
>> C = {1,2,3;4 5 6};
>> C
C =
[1] [2] [3]
[4] [5] [6]
>> size(C)
ans =
2 3
>> cellfun(@(x)x.^2,C(:,1))
ans =
1
16
>> C(:,1) = cellfun(@(x)x.^2,C(:,1))
Conversion to cell from double is not possible.
>> C(:,1) = cellfun(@(x)x.^2,C(:,1),'UniformOutput',false)
C =
[ 1] [2] [3]
[16] [5] [6]
>>

In Matlab, How to eliminate empty columns from the cell array?

So in a 3 x 18 cell array, 7 columns are empty and I need a new cell array that's 3 x 11. Any suggestions that avoid looping?
Let's consider the following cell array. Its second column consists only of [], so it should be removed.
>> c = {1 , [], 'a'; 2, [], []; 3, [], 'bc'}
c =
[1] [] 'a'
[2] [] []
[3] [] 'bc'
You can compute a logical index to tell which columns should be kept and then use it to obtain the result:
>> keep = any(~cellfun('isempty',c), 1); %// keep columns that don't only contain []
keep =
1 0 1 %// column 2 should be removed
>> result = c(:,keep)
result =
[1] 'a'
[2] []
[3] 'bc'
How it works:
cellfun('isempty' ,c) is a matrix the same size as c. It contains 1 at entry (m,n) if and only if c{m,n} is empty.
~cellfun('isempty' ,c) is the logical negation of the above, so it contains 1 where c is not empty.
any(~cellfun('isempty' ,c), 1) applies any to each column of the above. So it's a row vector such that its n-th entry equals 1 if any of the cells of c in column n are non-empty, and 0 otherwise.
The above is used as a logical index to select the desired columns of c.
Use cellfun to detect empty elements, then from that find the columns containing empty elements and delete those:
cellarray(:, any(cellfun(@isempty, cellarray), 1)) = [];
If instead you'd like to keep columns with at least one non-empty element, use all instead of any.
For example:
>> cellarray = {1 2 ,[], 4;[], 5, [], 3}
[1] [2] [] [4]
[] [5] [] [3]
>> cellarray(:,any(cellfun(@isempty, cellarray), 1))=[]
cellarray =
[2] [4]
[5] [3]
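To make the difference between any and all concrete, here is a small side-by-side sketch on the same example. The variable names emptyMask, dropAny and dropAll are introduced here for illustration only:

```matlab
cellarray = {1, 2, [], 4; [], 5, [], 3};

% Logical matrix: true where a cell is empty.
emptyMask = cellfun(@isempty, cellarray);

% Drop every column that contains at least one empty cell.
dropAny = cellarray(:, ~any(emptyMask, 1));

% Drop only the columns in which every cell is empty.
dropAll = cellarray(:, ~all(emptyMask, 1));
```

Here dropAny keeps columns 2 and 4, while dropAll keeps columns 1, 2 and 4, because column 1 still has one non-empty cell.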

Sorting a cell array

I want to sort the rows according to their second entries, i.e. by the second column. Each entry of the second column is an array of chars (representing a time stamp). There might also be missing values, i.e. the entry in the second column can be []. How do I do this?
You need to use the sortrows() function.
If the matrix you want to sort is A, then use
sorted_matrix = sortrows(A,2);
http://www.mathworks.com/help/techdoc/ref/sortrows.html
I would first convert the time stamps from strings to numeric values using the function DATENUM. Then you will want to replace the contents of the empty cells with a placeholder, like NaN. Then you can use the function SORTROWS to sort based on the second column. Here is an example:
>> mat = {1 '1/1/10' 3; 4 [] 6; 7 '1/1/09' 9} %# Sample cell array
mat =
[1] '1/1/10' [3]
[4] [] [6]
[7] '1/1/09' [9]
>> validIndex = ~cellfun('isempty',mat(:,2)); %# Find non-empty indices
>> mat(validIndex,2) = num2cell(datenum(mat(validIndex,2))); %# Convert dates
>> mat(~validIndex,2) = {NaN}; %# Replace empty cells with NaN
>> mat = sortrows(mat,2) %# Sort based on the second column
mat =
[7] [733774] [9]
[1] [734139] [3]
[4] [ NaN] [6]
The NaN values will be sorted to the bottom in this case.
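If sortrows in your MATLAB version does not accept a cell array with mixed contents, an alternative is to sort a plain numeric key vector and reuse the resulting order to index the cell array. This is a sketch of that swapped-in approach, not the method from the answer above. The names keys, order and sorted are introduced for illustration, and datenum is assumed to parse the date strings as in the example above.

```matlab
mat = {1 '1/1/10' 3; 4 [] 6; 7 '1/1/09' 9};    % sample cell array

% Rows whose second entry is non-empty.
valid = ~cellfun('isempty', mat(:,2));

% Numeric sort keys; rows with a missing time stamp get NaN.
keys = NaN(size(mat,1), 1);
keys(valid) = datenum(mat(valid,2));

% Ascending sort places NaN last; apply that order to the rows of mat.
[~, order] = sort(keys);
sorted = mat(order, :);
```

This leaves the original date strings in place in the second column, since only the key vector is converted to numbers.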

Replace a string in cell array into 1x3 numeric cell array

Cell array data as below:
data=
'A' [0.006] 'B'
'C' [3.443] 'C'
I would like to convert each character in the first column into a 1x3 vector, meaning that
'A' is replaced by [0] [0] [0],
'C' is replaced by [0] [1] [0], ...
The result will be
[0] [0] [0] [0.006] 'B'
[0] [1] [0] [3.443] 'C'
The code I tried is below:
B=data(1:end,1);
B=regexprep(B,'C','[0 0 0]');
B=regexprep(B,'A','[0 1 0]');
The result shows me
B=
'[0 0 0]'
'[0 1 0]'
which is wrong: each character is not changed to a 1x3 array. Please help.
Since you did not specify the rule to convert letters to numbers,
I assumed you want to replace A with 000, B with 001, ..., H with 111
(ie numbers from 0 to 7 in binary, corresponding to letters A to H).
In case you want to go up to Z, the code below can be easily changed.
%# your data cell array
data = {
'A' [0.006] 'B'
'C' [3.443] 'C'
};
%# compute binary numbers equivalent to letters A to H
binary = num2cell(dec2bin(0:7)-'0'); %# use 0:25 to go up to Z
%# convert letters into row indices in the above cell array "binary"
idx = cellfun(@(c) c-'A'+1, upper(data(:,1)));
%# replace first column, and build new data
newData = [binary(idx,:) data(:,2:end)]
The result:
newData =
[0] [0] [0] [0.006] 'B'
[0] [1] [0] [3.443] 'C'