Finding which letter has maximal occurrence - matlab

I searched the MATLAB documentation and the web for an answer, but in vain, so I need your help.
I have used the code below to count the occurrences of each letter in an array:
characterCell = {'a' 'b' 'b' 'a' 'b' 'd' 'c' 'c'}; %# Sample cell array
matchCell = {'a' 'b' 'c' 'd' 'e'}; %# Letters to count
[~,index] = ismember(characterCell,matchCell); %# Find indices in matchCell
counts = accumarray(index(:),1,[numel(matchCell) 1]); %# Accumulate indices
results = [matchCell(:) num2cell(counts)]
results =
'a' [2]
'b' [3]
'c' [2]
'd' [1]
'e' [0]
Now I need to find which letter has the highest occurrence.
How do I get its index?

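The mode function tells you the most frequent value. Applied to the indices returned by ismember, it gives the position of the most common letter in matchCell:
mostCommonLetter = matchCell{mode(index)};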

The index is the second output of the function max.
So you should do:
[~,index]=max(counts)
mostCommonLetter=matchCell{index};
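Note that max returns only the first position when several letters share the highest count. A minimal sketch of how all tied letters could be collected, reusing the data from the question (the names maxCount, firstIndex, allIndices and mostCommonLetters are illustrative, not from the original):

```matlab
characterCell = {'a' 'b' 'b' 'a' 'b' 'd' 'c' 'c'};   % sample data from the question
matchCell = {'a' 'b' 'c' 'd' 'e'};                   % letters to count
[~, index] = ismember(characterCell, matchCell);     % position of each letter in matchCell
counts = accumarray(index(:), 1, [numel(matchCell) 1]);

[maxCount, firstIndex] = max(counts);       % first maximum only
allIndices = find(counts == maxCount);      % every position that reaches the maximum
mostCommonLetters = matchCell(allIndices);  % cell array, one entry per tied letter
```

With the sample data only one letter reaches the maximum, so mostCommonLetters holds a single entry.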

Related

Enumerating combinations of cells

Say I have 3 cells:
M1={ [1,1,1], [2,2,2] }
M2={ [3,3], [4,4] }
M3={ [5], [6] }
I want to take every element in M1, combine it with every element of M2, combine that with every element of M3, etc.
For the input above, I would like to produce one giant cell like:
[1,1,1],[3,3],[5]
[1,1,1],[3,3],[6]
[1,1,1],[4,4],[5]
[1,1,1],[4,4],[6]
[2,2,2],[3,3],[5]
[2,2,2],[3,3],[6]
[2,2,2],[4,4],[5]
[2,2,2],[4,4],[6]
How can I do this? In general, the number of cells (M1,M2...Mn), and their size, are unknown (and changing).
This function does what you want:
function C = add_permutations(A,B)
% A is a cell array NxK, B is 1xM
% C is a cell array N*M x K+1
N = size(A,1);
A = reshape(A,N,1,[]);
C = cat(3,repmat(A,1,numel(B)),repmat(B,N,1));
C = reshape(C,[],size(C,3));
It creates all combinations of two cell arrays by replicating them in different dimensions, then concatenating along the 3rd dimension and collapsing the first two dimensions. Because we want to call it repeatedly with different cell arrays, input A (NxK) has K elements in each row; these are the combinations built so far. B is a cell row vector, and each of its elements will be combined with each row of A.
You use it as follows:
M1 = { 'a', 'b', 'c', 'd' }; % These are easier for debugging than OP's input, but cell elements can be anything at all.
M2 = { 1, 2 };
M3 = { 10, 12 };
X = M1.';
X = add_permutations(X,M2);
X = add_permutations(X,M3);
X now contains:
X =
16×3 cell array
'a' [1] [10]
'b' [1] [10]
'c' [1] [10]
'd' [1] [10]
'a' [2] [10]
'b' [2] [10]
'c' [2] [10]
'd' [2] [10]
'a' [1] [12]
'b' [1] [12]
'c' [1] [12]
'd' [1] [12]
'a' [2] [12]
'b' [2] [12]
'c' [2] [12]
'd' [2] [12]
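Since the question says the number of cells is not known in advance, the same steps could be run in a loop. This is only a sketch: it assumes the inputs are gathered in a cell array (here called Ms, a hypothetical name) and it repeats the body of add_permutations inline so that it runs on its own.

```matlab
% Hypothetical container holding M1, M2, ..., Mn
Ms = {{'a','b','c','d'}, {1,2}, {10,12}};

X = Ms{1}(:);                          % start with an N-by-1 column of the first cell
for k = 2:numel(Ms)
    B = Ms{k}(:).';                    % force a 1-by-M row, as add_permutations expects
    N = size(X,1);
    A = reshape(X, N, 1, []);          % N-by-1-by-K: previous combinations along dim 3
    C = cat(3, repmat(A, 1, numel(B)), repmat(B, N, 1));
    X = reshape(C, [], size(C,3));     % collapse to (N*M)-by-(K+1)
end
```

For the three inputs above, X ends up as a 16x3 cell array in the same row order as the output shown earlier.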
That's not a permutation, it's an enumeration: you have 3 symbols, each with 2 possible values, and you are simply enumerating all possible "numbers". You can think about it the same way as if you were counting binary numbers with 3 digits.
In this case, one way to enumerate all these possibilities is with ndgrid. If M1 has n1 elements, M2 has n2 elements, etc:
n1 = numel(M1);
n2 = numel(M2);
n3 = numel(M3);
[a,b,c] = ndgrid(1:n1, 1:n2, 1:n3);
Here a, b and c are each 3-dimensional arrays, which together represent the "grid" of index combinations. You don't need the grid shape itself, so you can flatten them into vectors and use them to pick the corresponding elements of M1, M2, M3, like so:
vertcat( M1(a(:)), M2(b(:)), M3(c(:)) )
If you are interested in generalising this for any number of Ms, this can also be done, but keep in mind that these "grids" are growing very fast as you increase their dimensionality.
Note: vertcat stands for "vertical concatenation", the reason it is vertical and not horizontal is because the result of M1(a(:)) is a row-shaped cell, even though a(:) is a column vector. That's just indexing headache, but you can simply transpose the result if you want it Nx3.
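A possible way to generalise the ndgrid approach to any number of cells is sketched below. It assumes the inputs are collected in a cell array (Ms is a hypothetical name) with at least two entries, and it uses a comma-separated list to capture a variable number of ndgrid outputs.

```matlab
% Hypothetical container holding M1, M2, ..., Mn (n >= 2 assumed)
Ms = {{[1,1,1],[2,2,2]}, {[3,3],[4,4]}, {5,6}};
n = numel(Ms);

% One index range per input cell: 1:n1, 1:n2, ...
ranges = cellfun(@(M) 1:numel(M), Ms, 'UniformOutput', false);

% Capture all n outputs of ndgrid in a cell array
idx = cell(1, n);
[idx{:}] = ndgrid(ranges{:});

% Build one column of the result per input cell
combos = cell(numel(idx{1}), n);
for k = 1:n
    combos(:,k) = reshape(Ms{k}(idx{k}(:)), [], 1);
end
```

Each row of combos is one combination. With ndgrid the first index varies fastest, so the row order differs from the listing in the question, although the set of rows is the same.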

Sum sub-matrices according to row index containing numerical data as strings

How can I sum each column of a sub-part of a cell?
Given a cell A
A = {'a' '546.8' '543.5' '544'
'a' '641.9' '637.4' '632.3'
'a' '214.7' '214.1' '231.8'
'a' '256.9' '255.6' '254.2'
'c' '356' '355.1' '354.4'
'c' '759' '759.6' '756.2'
'c' '352.2' '350.4' '350.8'
'f' '234' '230.3' '232.3'
'f' '225' '223.5' '221.8'}
I want to separate A into sub-cells according to the letter in the first column of A, and sum each column of each sub-cell.
The anticipated result is:
B = {'a' '1660.3' '1650.6' '1662.3'
'c' '1467.2' '1465.1' '1461.4'
'f' '459' '453.8' '454.1'}
There is no loop required:
%// get unique rows
[ids,~,subs] = unique(A(:,1))
%// transform string data to numeric data
vals = str2double(A(:,2:end))
%// sum unique rows
sums = accumarray(subs, 1:numel(subs), [], @(x) {sum(vals(x,:),1)} )
%// output result
out = [ids(:),num2cell(cell2mat(sums))]
One of the possible solutions is
[B,~,idxs]= unique(A(:,1))
for k=2:size(A,2)
B(:,k)= num2cell(accumarray(idxs,str2double(A(:,k))))
end
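Both answers return the sums as numbers, whereas the anticipated result in the question shows strings. If strings are really needed, one option is to convert at the end with num2str. This is a sketch only; the default num2str formatting is an assumption about what the asker wants, and the name sums is reused here for a plain numeric matrix.

```matlab
A = {'a' '546.8' '543.5' '544';   'a' '641.9' '637.4' '632.3'; ...
     'a' '214.7' '214.1' '231.8'; 'a' '256.9' '255.6' '254.2'; ...
     'c' '356'   '355.1' '354.4'; 'c' '759'   '759.6' '756.2'; ...
     'c' '352.2' '350.4' '350.8'; 'f' '234'   '230.3' '232.3'; ...
     'f' '225'   '223.5' '221.8'};

[ids, ~, subs] = unique(A(:,1));          % group labels and group index per row
vals = str2double(A(:,2:end));            % strings to numbers

sums = zeros(numel(ids), size(vals,2));   % one row per group, one column per data column
for k = 1:size(vals,2)
    sums(:,k) = accumarray(subs, vals(:,k));
end

% Convert every sum back to a string and prepend the labels
B = [ids, arrayfun(@num2str, sums, 'UniformOutput', false)];
```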

Count number of letter repetitions in a string

I am trying to find how many times each letter appears in a cell array.
I have to open this data file in Matlab
A 12 A 88
B 23 F 22
C 55 B 77
D 66 H 44
I named it Thor.dat and this is my code in Matlab
fid = fopen('Thor.dat')
if fid == -1
disp ('File open not successful')
else
disp ('File open is successful')
mat = textscan(fid,'%c %f %c %f')
[r c] = size(mat)
charoccur(mat)
fclose(fid)
end
and the charoccur function is
function occurence = charoccur(mat)
% charoccur finds the number of times each character appears in a column
[row, col] = size(mat);
[row, ccol] = size(mat{1});
[mat] = unique(mat{i})
d = hist(c,length(a))
end
Here is a way to do it using unique and strcmp. Basically, loop through the unique letters and sum the number of occurrences of each one in the cell array. strcmp returns a logical array of 0s and 1s; summing the 1s gives the total number of times a letter is found.
clear
clc
%// Input cell array
mat = {'A' 12 'A' 88;'B' 23 'F' 22;'C' 55 'B' 77;'D' 66 'H' 44;'W' 11 'C' 9;'H' 3 'H' 0};
mat = [mat(:,1) mat(:,3)]
%// Find unique letters
UniqueLetters = unique(mat);
%// Initialize output cell
OutputCell = cell(numel(UniqueLetters),2);
%// Loop through each unique letter and count number of occurrences
for k = 1:numel(UniqueLetters)
%// 1st column: letter
OutputCell(k,1) = UniqueLetters(k);
%// 2nd column: sum of occurrences
OutputCell(k,2) = {sum(sum(strcmp(UniqueLetters{k},mat)))};
end
OutputCell
OutputCell now looks like this:
OutputCell =
'A' [2]
'B' [2]
'C' [2]
'D' [1]
'F' [1]
'H' [3]
'W' [1]
Hope that helps get you started!
EDIT:
As per your comment, for the initialization of the output cell array:
OutputCell = cell(numel(UniqueLetters),2);
I create an Nx2 cell array, in which N (the number of rows) is the number of unique letters identified by the call to unique. In my example above there are 7 unique letters, so I create a 7x2 cell array to store:
1) In column 1 the actual letter
2) In column 2 the number of times this letter is found in mat
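The loop above could also be replaced by the third output of unique combined with accumarray. The sketch below swaps in that technique on the same example data; the variable names letters, idx and counts are illustrative.

```matlab
%// Same example data as above
mat = {'A' 12 'A' 88;'B' 23 'F' 22;'C' 55 'B' 77;'D' 66 'H' 44;'W' 11 'C' 9;'H' 3 'H' 0};

%// Stack both letter columns into one column cell array
letters = [mat(:,1); mat(:,3)];

%// idx(k) is the position of letters{k} in UniqueLetters
[UniqueLetters, ~, idx] = unique(letters);

%// Count how many times each position occurs
counts = accumarray(idx, 1);

%// Same Nx2 layout as OutputCell
OutputCell = [UniqueLetters, num2cell(counts)];
```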

Count Occurrence of Cell Array to Cell Array Matlab

I've got 2 string cell arrays, one of which is the unique version of the other. I would like to count the number of occurrences of each value of the unique cell array in the other cell array. The cell array is large, so I am looking for a faster approach than looping.
An example:
x = {'the'
'the'
'aaa'
'b'
'the'
'c'
'c'
'd'
'aaa'}
y=unique(x)
I am looking for an output in any form that contains something like the following:
'aaa' = 2
'b' = 1
'c' = 2
'd' = 1
'the' = 3
Any ideas?
One way is to count the indices unique finds:
[y, ~, idx] = unique(x);
counts = histc(idx, 1:length(y));
which gives
counts =
2
1
2
1
3
in the same order as y.
histc is my default fallback for counting things, but the function I always forget about is probably better in this case:
counts = accumarray(idx, 1);
should give the same result and is probably more efficient.
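To get the 'word' = count form the question asks for, the counts can be paired with y. The sketch below shows two options, a two-column cell array and a containers.Map for lookup by word; the names result and countOf are illustrative.

```matlab
x = {'the'; 'the'; 'aaa'; 'b'; 'the'; 'c'; 'c'; 'd'; 'aaa'};

[y, ~, idx] = unique(x);        % y: sorted unique strings, idx: position of each x in y
counts = accumarray(idx, 1);    % number of occurrences of each entry of y

% Option 1: two-column cell array, one row per unique string
result = [y, num2cell(counts)];

% Option 2: map from string to count
countOf = containers.Map(y, counts);
nThe = countOf('the');          % look up a single word
```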

Replace a string in cell array into 1x3 numeric cell array

Cell array data as below:
data=
'A' [0.006] 'B'
'C' [3.443] 'C'
I would like to convert each character in the first column into a 1x3 vector, meaning that
'A' is replaced by [0] [0] [0],
'C' is replaced by [0] [1] [0], and so on.
the result will be
[0] [0] [0] [0.006] 'B'
[0] [1] [0] [3.443] 'C'
the code i tried as below:
B=data(1:end,1);
B=regexprep(B,'A','[0 0 0]');
B=regexprep(B,'C','[0 1 0]');
the result show me
B=
'[0 0 0]'
'[0 1 0]'
which is wrong, each character does not change to 1x3 array...please help...
Since you did not specify the rule to convert letters to numbers,
I assumed you want to replace A with 000, B with 001, ..., H with 111
(i.e. numbers from 0 to 7 in binary, corresponding to letters A to H).
In case you want to go up to Z, the code below can be easily changed.
%# your data cell array
data = {
'A' [0.006] 'B'
'C' [3.443] 'C'
};
%# compute binary numbers equivalent to letters A to H
binary = num2cell(dec2bin(0:7)-'0'); %# use 0:25 to go up to Z
%# convert letters in to row indices in the above cell array "binary"
idx = cellfun(@(c) c-'A'+1, upper(data(:,1)));
%# replace first column, and build new data
newData = [binary(idx,:) data(:,2:end)]
The result:
newData =
[0] [0] [0] [0.006] 'B'
[0] [1] [0] [3.443] 'C'
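The lookup table relies on dec2bin returning a character array and on subtracting '0' to turn the characters into numeric digits. A short sketch of that step in isolation (the variable names are illustrative) also shows that extending the range to Z changes the width of each vector from 3 to 5 digits:

```matlab
% 'C' is at offset 2 from 'A'; the second argument pads to at least 3 digits
bitsForC = dec2bin('C' - 'A', 3) - '0';   % numeric row vector of binary digits

% Table for A..H: 8 rows, 3 digits each
tableAH = dec2bin(0:7) - '0';

% Table for A..Z: 26 rows, but 25 needs 5 binary digits, so 5 columns
tableAZ = dec2bin(0:25) - '0';
```

So with 0:25 the replacement for each letter would be a 1x5 vector rather than 1x3.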