Count occurrences of cell array values in another cell array in MATLAB

I've got 2 string cell arrays; one is the unique version of the other. I would like to count how many times each value in the unique cell array occurs in the other cell array. The cell array is large, so I'm looking for something faster than looping.
An example:
x = {'the'
'the'
'aaa'
'b'
'the'
'c'
'c'
'd'
'aaa'}
y=unique(x)
I am looking for an output in any form that contains something like the following:
'aaa' = 2
'b' = 1
'c' = 2
'd' = 1
'the' = 3
Any ideas?

One way is to count the indices unique finds:
[y, ~, idx] = unique(x);
counts = histc(idx, 1:length(y));
which gives
counts =
2
1
2
1
3
in the same order as y.
histc is my default fallback for counting things, but the function I always forget about is probably better in this case:
counts = accumarray(idx, 1);
should give the same result and is probably more efficient.
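A minimal sketch putting the pieces together, using the data from the question. The fprintf loop is only an illustration of one possible output format; idx(:) forces a column so the call does not depend on the orientation unique returns.

```matlab
x = {'the'; 'the'; 'aaa'; 'b'; 'the'; 'c'; 'c'; 'd'; 'aaa'};

% y holds the sorted unique strings; idx maps each element of x to its row in y
[y, ~, idx] = unique(x);

% Add 1 into the bin of every index, i.e. count how often each index occurs
counts = accumarray(idx(:), 1);

% Print each unique string next to its count
for k = 1:numel(y)
    fprintf('''%s'' = %d\n', y{k}, counts(k));
end
```

Newer MATLAB documentation lists histc as not recommended, which is another reason to prefer accumarray here.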

Related

How to find a matching element (either number or string) in a multi-level cell?

I am trying to search a cell of cell arrays for a matching number (for example, 2) or string ('text'). Example for a cell:
A = {1 {2; 3};4 {5 'text' 7;8 9 10}};
There is a similar question. However, its solution only works if you want to find a numeric value in the cell. I need a solution that works for strings as well as numbers.
The output should be 1 or 0 (the value is or is not in the cell A) together with the nesting depth at which the matching element was found.
For your example input, you can match character vectors as well as numbers by replacing ismember in the linked solution with isequal. You can get the depth at which the search value was found by tracking how many times the function has to go round the while loop.
function [isPresent, depth] = is_in_cell(cellArray, value)
    depth = 1;
    f = @(c) isequal(value, c);
    cellIndex = cellfun(@iscell, cellArray);
    isPresent = any(cellfun(f, cellArray(~cellIndex)));
    while ~isPresent
        depth = depth + 1;
        cellArray = [cellArray{cellIndex}];
        cellIndex = cellfun(@iscell, cellArray);
        isPresent = any(cellfun(f, cellArray(~cellIndex)));
        if ~any(cellIndex)
            break
        end
    end
end
Using isequal works because f is only called for elements of cellArray that are not themselves cell arrays. Use isequaln if you want to be able to search for NaN values.
Note this now won't search inside numeric, logical or string arrays:
>> A = {1 {2; 3};4 {5 'text' 7;8 9 [10 11 12]}};
>> is_in_cell(A, 10)
ans =
logical
0
If you want that, you can define f as
f = @(c) isequal(value, c) || isequal(class(value), class(c)) && ismember(value, c);
which avoids calling ismember with incompatible data types, because of the 'short-circuiting' behaviour of || and &&. This last solution is still a bit inconsistent in how it matches strings with character vectors, just in case that's important to you - see if you can figure out how to fix that.
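A small sketch of how the short-circuiting plays out for a numeric search value. The parentheses only make the default precedence explicit (&& binds more tightly than ||); the variable names inArray, inChar and exact are made up for this illustration.

```matlab
value = 10;
f = @(c) isequal(value, c) || (isequal(class(value), class(c)) && ismember(value, c));

inArray = f([10 11 12]);  % same class, and 10 is a member of the array
inChar  = f('text');      % classes differ, so ismember is never evaluated
exact   = f(10);          % matched directly by the first isequal

disp([inArray, inChar, exact])
```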

Replacing string values in cell array with numbers

I have a cell array which contains some descriptions, namely my_des.
my_des = [{'FRD'} {'1'}; {'UNFRD'} {'2'}; {'OTH'} {'3'};];
I also have an approximately 5000x1 cell array. The elements in this array are either 'FRD', 'UNFRD' or 'OTH'.
What I want to do is replace these text values with the corresponding numeric values in my_des.
Currently my only idea (which I think isn't that great) is to loop through my_des and do a string replacement.
Example:
So say my current vector looks like this:
FRD
FRD
OTH
UNFRD
OTH
FRD
Then my desired output would be this:
1
1
3
2
3
1
The numbers come from the my_des array
Do you want to use the characters '1', '2', '3' or just the numbers 1, 2, 3? The distinction is the difference between a 1 line answer and a 2 line answer!
Based on your example, let's use the following data:
arr = {'FRD'; 'FRD'; 'OTH'; 'UNFRD'; 'OTH'; 'FRD'};
Get the row index within my_des of each element in arr, and use that to get the corresponding 2nd column values...
% If you just want the *number* then this is all you need
[~, idx] = ismember(arr, my_des(:,1));
% idx is the row within column 1 of my_des where the value in arr is found
% >> idx = [1; 1; 3; 2; 3; 1]
% If you want to get the values from my_des then use idx as a row index
out = my_des(idx, 2);
% out is the corresponding values from the 2nd column of my_des, whatever they may be.
% >> out = {'1'; '1'; '3'; '2'; '3'; '1'};
Aside: why are you declaring a cell array by concatenating 1-element cell arrays for my_des? Instead, you can just do this:
my_des = {'FRD', '1';
'UNFRD', '2';
'OTH', '3'};
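If the numeric values 1, 2, 3 are wanted rather than the characters, one possible extra step is str2double, which converts a cell array of character vectors to a numeric array. This is a sketch under the assumption that every entry in the second column of my_des is a valid number written as text; outNum is a name introduced here.

```matlab
my_des = {'FRD',   '1';
          'UNFRD', '2';
          'OTH',   '3'};
arr = {'FRD'; 'FRD'; 'OTH'; 'UNFRD'; 'OTH'; 'FRD'};

% Row of my_des whose first column matches each element of arr
[~, idx] = ismember(arr, my_des(:,1));

% Look up the second column and convert the text to doubles
outNum = str2double(my_des(idx, 2));
```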

How to convert char to number in Matlab

I am having trouble converting a character variable to a number in Matlab.
Each cell in the char variable contains one of two possible words. I need to convert word_one (for example) to represent '1', and word_two to represent '2'.
Is there a command that will let me do this?
So far I've tried:
%First I converted 'Word' from cell to char
Word = char(Word);
Word(Word == 'Word_one') = '1';
Word(Word == 'Word_two') = '2';
However, I get the:
Error using ==
Matrix dimensions must agree.
When I try to include the first letter only (ie. 'W'), it only changes the first letter in the full word (ie. 1ord_one).
Is there an easy way to do this?
Thanks for your help - any advice is much appreciated!
Use ismember:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, result] = ismember(words, possibleWords);
In this example,
result =
2 1 2
If you need more flexibility, you can specify the value corresponding to each word:
possibleWords = {'Word_one', 'Word_two'}; %// template: words corresponding to 1, 2, ...
correspondingValues = [1.1, 2.2]; %// template: value corresponding to each word
words = {'Word_two', 'Word_one', 'Word_two'}; %// data: words you need to convert
[~, ind] = ismember(words, possibleWords);
result = correspondingValues(ind);
which gives
result =
2.2000 1.1000 2.2000
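One thing to watch: ismember returns index 0 for a word that is not in the template, and indexing correspondingValues with 0 raises an error. A hedged sketch of one way to handle that, assuming NaN is an acceptable placeholder for unknown words ('Word_three' is a made-up unmatched entry):

```matlab
possibleWords = {'Word_one', 'Word_two'};
correspondingValues = [1.1, 2.2];
words = {'Word_two', 'Word_three', 'Word_one'};  % 'Word_three' has no match

% found is a logical mask, ind is 0 where there is no match
[found, ind] = ismember(words, possibleWords);

% Start from NaN and fill in only the matched positions
result = NaN(size(words));
result(found) = correspondingValues(ind(found));
```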
Looks like there are a couple of potential issues here.
Use strcmp() (string compare) in place of your current equivalence statement. Comparing strings using == compares element by element and returns a logical vector (where here you want a single logical value). String comparison, strcmp(), will compare the entire strings instead and return a single value.
It's also probably not necessary for you to convert your cell array. You can maintain the cell array structure and address each cell individually.
Try something along the lines of the following snippet.
for i = 1:length(Word)
    if strcmp(Word{i},'Word_one')
        Word{i} = '1';
    elseif strcmp(Word{i},'Word_two')
        Word{i} = '2';
    end
end
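strcmp also accepts a whole cell array as its first argument and returns a logical array, so the same replacement can be sketched without a loop. The sample contents of Word and the mask names isOne and isTwo are assumptions for illustration.

```matlab
Word = {'Word_one'; 'Word_two'; 'Word_one'};

% One logical mask per word, computed over the entire cell array
isOne = strcmp(Word, 'Word_one');
isTwo = strcmp(Word, 'Word_two');

% Assign the replacement text to every matching cell at once
Word(isOne) = {'1'};
Word(isTwo) = {'2'};
```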
There are a number of ways to solve this problem. Here's my approach.
% define your words
words = {'word_one','word_two','word_two','word_one','word_one'};
% define a function to get the indexes of the words of interest
getindex = @(c, y) cellfun(@(x) strcmp(x,y), c);
% replace 'word_one' with '1'
words(getindex(words, 'word_one'))={'1'};
% replace 'word_two' with '2'
words(getindex(words, 'word_two'))={'2'};
words =
'1' '2' '2' '1' '1'
You can use unique, which is short and simple -
input_cellarr = {'Word_two','Word_one','Word_two','Word_two','Word_one','Word_one'}
[~,~,out] = unique(input_cellarr)
Sample run -
input_cellarr =
'Word_two' 'Word_one' 'Word_two' 'Word_two' 'Word_one' 'Word_one'
out =
2
1
2
2
1
1
Explanation: unique works here because, for numeric arrays, it produces an array sorted in ascending order. When used on cell arrays of strings, that ascending order translates to alphabetical sorting. Thus, unique(input_cellarr) would always return {'Word_one' , 'Word_two'}, because 'Word_one' sorts before 'Word_two'. Therefore the out indices would always have the first unique ID as 1 for 'Word_one' and the second ID as 2 for 'Word_two'.
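Because the numbering depends on sort order, it is worth checking what unique returns before relying on it. The sketch below contrasts the default sorted behaviour with the 'stable' option, which numbers words by first appearance instead; the variable names are chosen for this example only.

```matlab
input_cellarr = {'Word_two', 'Word_one', 'Word_two'};

% Default: unique values are sorted, so 'Word_one' gets ID 1
[uSorted, ~, idSorted] = unique(input_cellarr);

% 'stable': unique values keep their order of first appearance,
% so 'Word_two' gets ID 1 here
[uStable, ~, idStable] = unique(input_cellarr, 'stable');
```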

Reshape Matlab table

I have the following table
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
T = table(name, value);
i.e.
T =
name value
____ _________
A 0.0015678
A -0.76226
A 0.98404
B -1.0942
B 0.71249
C 1.688
C 1.4001
C -0.9278
C -1.3725
D 0.11563
D 0.076776
E 1.0568
E 1.1972
E 0.29037
I want to transform it in the following way: for each distinct value in name, take the first two entries of value and put them in one row of a 5x2 matrix. The rows of this matrix correspond to the names A,B,C,D,E and the columns to the values, e.g. the first two rows are
0.0015678 -0.76226
-1.0942 0.71249
This can be done with accumarray using a custom function. The first step is to convert the name column of T into a numeric vector; and then accumarray can be applied.
This approach requires T being sorted according to column 1, because only in this case is accumarray guaranteed to preserve order (as indicated in its documentation). So if T may not be sorted (although it is in your example), sort it first using sortrows.
T = sortrows(T, 1); %// you can remove this line if T is guaranteed to be sorted
[~, ~, names] = unique(T(:,1)); %// names as a numeric vector
result = cell2mat(accumarray(names, T.value, [], @(x) {x([1 2]).'}));
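A self-contained sketch of the same idea with deterministic data, so the result can be checked by eye. Two things differ from the snippet above and are assumptions of this example: value is 1:14 instead of randn, and the group numbers are taken from T.name rather than T(:,1).

```matlab
name  = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = (1:14)';
T = table(name, value);

% Numeric group label for each row (1 for 'A', 2 for 'B', ...)
[~, ~, names] = unique(T.name);

% For each group, keep the first two values as a 1x2 row inside a cell,
% then stack the cells into a 5x2 matrix
result = cell2mat(accumarray(names, T.value, [], @(x) {x([1 2]).'}));
```

With this input, each row of result holds the first two values of one name, e.g. [1 2] for A and [4 5] for B.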
First figure out where each name has values located in the table, then cycle through each name and place the first two values encountered for each name into individual cell arrays. Once you're done, reshape the matrix to 5 x 2 as you have said. As such, do something like this:
names = unique(T.name); %// 1
ind = arrayfun(@(x) find(T.name == x), names, 'uni', 0); %// 2
vals = cellfun(@(x) T.value(x(1:2)), ind, 'uni', 0); %// 3
m = [vals{:}].'; %// 4
Let's go through each line of code slowly.
Line #1
The first line finds all unique names through unique and we store them into names.
Line #2
The next line goes through all of the unique names and finds those locations / rows in the table that share that particular name. I use arrayfun and go through each name in names, find those rows that share the same name as one we are looking for, and place those row locations into individual cells; these are stored into ind. To find the locations of each valid name in our table, I use find and the locations are placed into a column vector. As such, we will have five column vectors where each column vector is placed into an individual cell. These column vectors will tell us which rows match a particular name located in your table.
Line #3
The next line uses cellfun to go through each of the cells in ind and extracts the first two row locations that share a particular name, indexes into the value field for your table to pull those two values, and these are placed as two-element vectors into individual cells for each name.
Line #4
The last line of code simply unrolls each two-element vector. The first two elements of each name get stored into columns. To get them into rows, I simply transpose the unrolling. The output matrix is stored into m.
If you want to see what the output looks like, this is what I get when I run the above code with your example table:
m =
0.0016 -0.7623
-1.0942 0.7125
1.6880 1.4001
0.1156 0.0768
1.0568 1.1972
Be advised that only the first 5 digits of precision are shown, so there is some round-off at the end. However, this is only for display purposes, so what I got is equivalent to what you expect for the output.
Hope this helps!
If you want to use the table, you could try something like this:
count = 1;
U = unique(table2array(T(:,1)));
for ii = 1:size(U,1)
    A = find(table2array(T(:,1)) == U(ii));
    A = A(1:2);
    B(count,1:2) = table2array(T(A,2));
    count = count + 1;
end
Personally, I would find this simpler to do with your name and value arrays and forget about the table. If it is a requirement then I understand, however I will provide my solution still. It may provide some insight either way.
count = 1;
U = unique(name);
for ii = 1:size(U,1)
    A = find(name == U(ii));
    A = A(1:2);
    B(count,1:2) = value(A);
    count = count + 1;
end
Quick and dirty, but hopefully it's good enough. Good luck.
Another solution that is more manageable and easily scalable exists. Since MATLAB R2013b you can use a specialized function for pivoting a table (which is what you want to do): unstack.
In order to get exactly what you wanted, you need to add an extra variable to your table that will indicate replications:
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';
value = randn(14, 1);
rep = [1, 2, 3, 1, 2, 1, 2, 3, 4, 1, 2, 1, 2, 3]'; %// column vector, so it has the same number of rows as name and value
T = table(name, value, rep);
T =
name value rep
____ _________ ___
A 0.53767 1
A 1.8339 2
A -2.2588 3
B 0.86217 1
B 0.31877 2
C -1.3077 1
C -0.43359 2
C 0.34262 3
C 3.5784 4
D 2.7694 1
D -1.3499 2
E 3.0349 1
E 0.7254 2
E -0.063055 3
Then you just use unstack like this:
pivotTable = unstack(T, 'value','name')
pivotTable =
rep A B C D E
___ _______ _______ ________ _______ _________
1 0.53767 0.86217 -1.3077 2.7694 3.0349
2 1.8339 0.31877 -0.43359 -1.3499 0.7254
3 -2.2588 NaN 0.34262 NaN -0.063055
4 NaN NaN 3.5784 NaN NaN
Afterwards, it's a matter of re-arranging the table if you still want to.
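The replication counter does not have to be typed by hand. The following is one possible sketch for deriving it from name; the loop and the names g and members are assumptions of this example, not part of the answer above.

```matlab
name = ['A' 'A' 'A' 'B' 'B' 'C' 'C' 'C' 'C' 'D' 'D' 'E' 'E' 'E']';

% Numeric group label for each row
[~, ~, g] = unique(name);

% Number the rows 1, 2, 3, ... within each group
rep = zeros(size(g));
for k = 1:max(g)
    members = find(g == k);
    rep(members) = 1:numel(members);
end
```

The resulting rep is a column vector, so it can be passed to table alongside name and value.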
The easiest way is to first convert the table into a matrix form and then reshape it by using the "reshape" function in Matlab.
matrix = t{:,:};% t-- your table variable
reshape_matrix = reshape(matrix ,[2,3]) % [2,3]--> the size of the matrix you desire
These two steps can be done by one line of code
reshape_matrix = reshape(t{:,:},[2,3]);
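Note that reshape fills the output column by column, which affects where each element ends up. A tiny sketch with a hypothetical 3x2 numeric table t (the contents are made up for illustration):

```matlab
% Hypothetical table with two numeric variables of three rows each
t = table([1; 2; 3], [4; 5; 6]);

matrix = t{:,:};                     % 3x2 matrix: [1 4; 2 5; 3 6]
reshape_matrix = reshape(matrix, [2, 3]);

% Elements are read and written in column-major order (1,2,3,4,5,6)
disp(reshape_matrix)
```

The element count must match the requested size (here 6 elements for a 2x3 result), and t{:,:} assumes all table variables can be concatenated into one array.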

How to use reshape with cell arrays?

Unfortunately I have to work with a dataset of cell arrays, which don't even all hold the same kind of input.
My dataset (the relevant columns of cell arrays) look as follows:
Data =
1 'd2'
1 'd3'
2 'u2'
2 'd2'
2 'u3'
3 'e2'
... ...
I want to reshape them so that, for each number in the first column, the second-column entries of all rows with that number are stored in new columns. Because the number of rows per number in the first column isn't always the same (but is at most 4), I wrote the following code:
% creating 4 new cell arrays for the new columns
cells = cell(length(Data(:,1)),4);
Data = [Data,cells];
% reshaping Data
Data(:,3:6) = reshape(Data(Data(:,1) == 1,2),1,[]);
Data(:,3:6) = reshape(Data(Data(:,1) == 2,2),1,[]);
This would perfectly work with matrices. But unfortunately, it doesn't work on cell arrays!
Could you please help me work out where I have to place the curly brackets so that it works? I haven't figured it out so far, and maybe I'm just overlooking something! ;-)
Thank you a lot!
Personally I find a loop to be the most simple and flexible solution in this case:
mydata={1 'd2'
1 'd3'
2 'u2'
2 'd2'
2 'u3'
3 'e2'}
list = unique([mydata{:,1}])
result = {};
for t=1:numel(list)
    count=0;
    for u =1:size(mydata,1)
        if mydata{u,1}==list(1,t)
            count = count+1;
            result(t,count)=mydata(u,2)
        end
    end
end
Note that a vectorized approach will likely be more efficient, but unless your data is big it should not matter much.
This could be one approach that uses the masking capability of bsxfun -
%// Input
Data = {
1 'd2'
1 'd3'
2 'u2'
2 'd2'
2 'u3'
3 'e2'}
%// Find the IDs and the unique IDs
ids = cell2mat(Data(:,1))
id_out = num2cell([1:max(ids)]') %//'# To be used as the first col of desired o/p
%// Find the extents of each group/ID members
grp_extents = sum(bsxfun(@eq,[1:max(ids)],ids),1)
%// Or use accumarray which could be faster -
%// grp_extents = accumarray(ids,ones(1,numel(ids))).'
%// Get a cell array with the members (strings) from the second column of Data
%// put into specific columns based on their IDs
string_out = cell(max(grp_extents),numel(grp_extents))
string_out(bsxfun(@le,[1:max(grp_extents)]',grp_extents)) = Data(:,2) %//'# This is
%// where the masking is being used for logical indexing
%// Transpose the string cell array and horizontally concatenate with 1D
%// cell array containing the IDs to form the desired output
Data_out = [id_out string_out']
Output -
Data_out =
[1] 'd2' 'd3' []
[2] 'u2' 'd2' 'u3'
[3] 'e2' [] []
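To see why the masking step works, it can help to look at the mask on its own. This sketch hard-codes the group sizes from the example ([2 3 1]); the variable name mask is introduced here for illustration.

```matlab
% Number of members per ID in the example data: ID 1 has 2, ID 2 has 3, ID 3 has 1
grp_extents = [2 3 1];

% Row k of the mask is true in column j when group j has at least k members
mask = bsxfun(@le, (1:max(grp_extents))', grp_extents);

disp(mask)
% Column-major order of the true entries is: 2 slots for ID 1,
% 3 slots for ID 2, 1 slot for ID 3
```

Because logical indexing assigns in column-major order, the six true entries receive Data(:,2) in sequence. That only lines up when the rows of Data are already grouped by ID in ascending order, as they are in the example.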