I am trying to find how many times each letter appears in a cell array.
I have to open this data file in Matlab
A 12 A 88
B 23 F 22
C 55 B 77
D 66 H 44
I named it Thor.dat and this is my code in Matlab
fid = fopen('Thor.dat')
if fid == -1
disp ('File open not successful')
else
disp ('File open is successful')
mat = textscan(fid,'%c %f %c %f')
[r c] = size(mat)
charoccur(mat)
fclose(fid)
end
and the charoccur function is
function occurence = charoccur(mat)
% charoccur finds the number of times each character appears in a column
[row, col] = size(mat);
[row, ccol] = size(mat{1});
[mat] = unique(mat{i})
d = hist(c,length(a))
end
Here is a way to do it using unique and strcmp. Basically, loop through the unique letters and, for each one, count how many times it occurs in the cell array. strcmp returns a logical array of 0s and 1s; summing the 1s gives the total number of times that letter is found.
clear
clc
%// Input cell array
mat = {'A' 12 'A' 88;'B' 23 'F' 22;'C' 55 'B' 77;'D' 66 'H' 44;'W' 11 'C' 9;'H' 3 'H' 0};
mat = [mat(:,1) mat(:,3)]
%// Find unique letters
UniqueLetters = unique(mat);
%// Initialize output cell
OutputCell = cell(numel(UniqueLetters),2);
%// Loop through each unique letter and count the number of occurrences
for k = 1:numel(UniqueLetters)
%// 1st column: letter
OutputCell(k,1) = UniqueLetters(k);
%// 2nd column: number of occurrences
OutputCell(k,2) = {sum(sum(strcmp(UniqueLetters{k},mat)))};
end
OutputCell
OutputCell now looks like this:
OutputCell =
'A' [2]
'B' [2]
'C' [2]
'D' [1]
'F' [1]
'H' [3]
'W' [1]
Hope that helps get you started!
EDIT:
As per your comment, for the initialization of the output cell array:
OutputCell = cell(numel(UniqueLetters),2);
I create an Nx2 cell array, in which N (the number of rows) is the number of unique letters identified by the call to unique. In my example above there are 7 unique letters, so I create a 7x2 cell array to store:
1) In column 1 the actual letter
2) In column 2 the number of times this letter is found in mat
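The same counts can also be obtained without a loop. The following is only a sketch of an alternative, not the method used above: it relies on the third output of unique together with accumarray, and the variable name letters is an assumption introduced for illustration (it holds columns 1 and 3 of the example stacked into one column).

```matlab
% Letters from columns 1 and 3 of the example above, as one cell column
letters = {'A';'B';'C';'D';'W';'H';'A';'F';'B';'H';'C';'H'};

% idx(k) is the position of letters{k} within the sorted unique list u
[u, ~, idx] = unique(letters);

% Add 1 to the bin of each letter: counts(j) is how often u{j} occurs
counts = accumarray(idx(:), 1);

% Same layout as OutputCell: letter in column 1, count in column 2
OutputCell = [u(:), num2cell(counts)];
```

For this input the result should match the 7x2 cell array shown above.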
Related
Consider that I have a table of such type in MATLAB:
Location String Number
1 a 26
1 b 361
2 c 28
2 a 45
3 a 78
4 b 82
I would like to create a script which returns only 3 rows, which would include the largest Number for each string. So in this case the table returned would be the following:
Location String Number
3 a 78
1 b 361
2 c 28
The actual table that I want to tackle is much greater, though I wrote this like that for simplicity. Any ideas on how this task can be tackled? Thank you in advance for your time!
You could use splitapply, with an id for each row.
Please see the comments for details...
% Assign unique ID to each row
tbl.id = (1:size(tbl,1))';
% Get groups of the different strings
g = findgroups(tbl.String);
% create function which gets id of max within each group
% f must take arguments corresponding to each splitapply table column
f = @(num,id) id(find(num == max(num), 1));
% Use splitapply to apply the function f to all different groups
idx = splitapply( f, tbl(:,{'Number','id'}), g );
% Collect rows
outTbl = tbl(idx, {'Location', 'String', 'Number'});
>> outTbl =
Location String Number
3 'a' 78
1 'b' 361
2 'c' 28
Or just a simple loop. This loop is only over the unique values of String so should be pretty quick.
u = unique(tbl.String);
c = cell(numel(u), size(tbl,2));
for ii = 1:numel(u)
temp = tbl(strcmp(tbl.String, u{ii}),:);
[~, idx] = max(temp.Number);
c(ii,:) = table2cell(temp(idx,:));
end
outTbl = cell2table(c, 'VariableNames', tbl.Properties.VariableNames);
My idea for finding the max value of each string is as follows.
Create a vector containing each of your strings exactly once. Something like:
strs=['a','b','c'];
Then create a vector that will store maximum value of each string:
n=length(strs);
max_values=zeros(1,n);
Now loop over all rows of the table, compare each row's Number with the current maximum stored for its string, and replace the stored value if the row's Number is larger (tbl is the table from the question; the zeros used as starting values assume that all Numbers are positive):
for i=1:size(tbl,1)
m=find(strs==tbl.String{i}); % Index into max_values for the string in row i
if max_values(m)<tbl.Number(i) % Number in the i-th row of the table
max_values(m)=tbl.Number(i);
end
end
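As a rough, self-contained illustration of the "largest Number per String" idea, here is a sketch that works on plain arrays instead of a table. The variable names loc, str and num are assumptions made for this example; they hold the three columns of the table from the question. It uses unique and accumarray rather than the splitapply approach above.

```matlab
% Example data from the question, held in plain arrays (assumed names)
loc = [1; 1; 2; 2; 3; 4];
str = {'a'; 'b'; 'c'; 'a'; 'a'; 'b'};
num = [26; 361; 28; 45; 78; 82];

% g(k) is the group number of row k; u lists the unique strings
[u, ~, g] = unique(str);

% Largest Number within each group
maxNum = accumarray(g(:), num, [], @max);

% Row index of that maximum within each group (first one on ties)
idx = zeros(numel(u), 1);
for k = 1:numel(u)
    rows = find(g == k);
    [~, j] = max(num(rows));
    idx(k) = rows(j);
end

% Rows of the result, in the order Location, String, Number
result = [num2cell(loc(idx)), u(:), num2cell(num(idx))];
```

Keeping the row index idx, rather than only the maximum, is what allows the matching Location to be returned as well.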
I have a huge text file in the following format:
1 2
1 3
1 10
1 11
1 20
1 376
1 665255
2 4
2 126
2 134
2 242
2 247
First column is the x coordinate while second column is the y coordinate.
It indicates that if I had to construct a Matrix
M = zeros(N, N);
M(1, 2) = 1;
M(1, 3) = 1;
.
.
M(2, 247) = 1;
This text file is huge and can't be brought to main memory at once. I must read it line by line. And save it in a sparse matrix.
So I need the following function:
function mat = generate( path )
fid = fopen(path);
tline = fgetl(fid);
% initialize an empty sparse matrix. (I know I assigned Mat(1, 1) = 1)
mat = sparse(1);
while ischar(tline)
tline = fgetl(fid);
if ischar(tline)
C = strsplit(tline);
end
mat(C{1}, C{2}) = 1;
end
fclose(fid);
end
But unfortunately besides the first row it just puts trash in my sparse mat.
Demo:
1 7
1 9
2 4
2 9
If I print the sparse mat I get:
(1,1) 1
(50,52) 1
(49,57) 1
(50,57) 1
Any suggestions ?
Fixing what you have...
Your problem is that C is a cell array of characters, not numbers. You need to convert the strings you read from the file into integer values. Instead of strsplit you can use functions like str2num and str2double. Since tline is a space-delimited character array of integers in this case, str2num is the easiest to use to compute C:
C = str2num(tline);
Then you just index C like an array instead of a cell array:
mat(C(1), C(2)) = 1;
Extra tidbit: If you were wondering how your demo code still worked even though C contained characters, it's because MATLAB has a tendency to automatically convert variables to the correct type for certain operations. In this case, the characters were converted to their double ASCII code equivalents: '1' became 49, '2' became 50, etc. Then it used these as indices into mat.
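To see the difference between the character data and the numeric data, here is a small sketch using one line of the demo file. The variable names asIndex, v and w are made up for this illustration.

```matlab
tline = '2 9';                 % one line as returned by fgetl

C = strsplit(tline);           % cell array of character vectors: {'2', '9'}
asIndex = double(C{1});        % character code of '2', which is 50

v = str2num(tline);            % numeric row vector [2 9]
w = str2double(C);             % same values, converted from the strsplit result
```

This is why mat(C{1}, C{2}) = 1 ended up setting element (50,57) instead of (2,9).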
A simpler alternative...
You don't even have to bother with all that mess above, since you can replace your entire function with a much simpler approach using dlmread and sparse like so:
data = dlmread(filePath);
mat = sparse(data(:, 1), data(:, 2), 1);
clear data; % Save yourself some memory if you don't need it any more
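If the file really is too large to read in a single call, one possible compromise is to read it in blocks. The following is only a sketch under that assumption: it uses the optional repeat count of textscan, the block size is a hypothetical value, and the demo file is written to a temporary location so that the example is self-contained. Only the index pairs are kept in memory, not the text.

```matlab
% Write the small demo file from the question to a temporary location
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%d %d\n', [1 7; 1 9; 2 4; 2 9]');
fclose(fid);

blockSize = 2;      % lines per block (hypothetical; pick what fits in memory)
rows = [];
cols = [];

fid = fopen(fname, 'r');
while true
    % Apply the format at most blockSize times, i.e. read blockSize lines
    C = textscan(fid, '%f %f', blockSize);
    if isempty(C{1})
        break;      % nothing left to read
    end
    rows = [rows; C{1}];
    cols = [cols; C{2}];
end
fclose(fid);
delete(fname);

% Build the sparse matrix once from all collected index pairs
mat = sparse(rows, cols, 1);
```

Building the matrix once at the end avoids repeatedly resizing a sparse matrix inside the loop.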
How to sum each column of a sub-part of a cell array?
Given a cell A
A = {'a' '546.8' '543.5' '544'
'a' '641.9' '637.4' '632.3'
'a' '214.7' '214.1' '231.8'
'a' '256.9' '255.6' '254.2'
'c' '356' '355.1' '354.4'
'c' '759' '759.6' '756.2'
'c' '352.2' '350.4' '350.8'
'f' '234' '230.3' '232.3'
'f' '225' '223.5' '221.8'}
I want to split A into sub-cells according to the letter in the first column of A, and then sum each column of every sub-cell.
The anticipated result is:
B = {'a' '1660.3' '1650.6' '1662.3'
'c' '1467.2' '1465.1' '1461.4'
'f' '459' '453.8' '454.1'}
There is no loop required:
%// get unique rows
[ids,~,subs] = unique(A(:,1))
%// transform string data to numeric data
vals = str2double(A(:,2:end))
%// sum unique rows
sums = accumarray(subs, 1:numel(subs), [], @(x) {sum(vals(x,:),1)} )
%// output result
out = [ids(:),num2cell(cell2mat(sums))]
One of the possible solutions is
[B,~,idxs]= unique(A(:,1))
for k=2:size(A,2)
B(:,k)= num2cell(accumarray(idxs,str2double(A(:,k))))
end
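Both answers rely on the third output of unique combined with accumarray. As a minimal sketch of what these two calls do, here is a toy example; the data and the names keys, vals and totals are invented for illustration and are not taken from A.

```matlab
keys = {'a'; 'a'; 'c'; 'f'; 'c'};   % group label of each row
vals = [1; 2; 10; 100; 20];         % one numeric column to be summed

% subs maps every row to the index of its label in ids
[ids, ~, subs] = unique(keys);      % ids = {'a';'c';'f'}, subs = [1;1;2;3;2]

% Sum the values that share the same subscript
totals = accumarray(subs(:), vals); % [3; 30; 100]
```

Applying this to each numeric column of A in turn is what the loop in the second answer does.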
I have three cell arrays of strings and I want to find all possible combinations of the three. So if I have:
One= Two= Three=
[A B [M N [W X
C D] O P] Y Z]
I want to be able to find all combinations, which would give me something like:
Four=
[AMW AMX AMY AMZ ANW ANX ANY ANZ AOW AOX AOY AOZ APW APX APY APZ
BMW BMX BMY BMZ BNW BNX BNY BNZ BOW BOX BOY BOZ BPW BPX BPY BPZ
etc.
etc.]
Is there a simple way to do this or will I have to somehow change the strings to integer values?
So, you have three cell arrays:
One = {'A' 'B' 'C' 'D'};
Two = {'M' 'N' 'O' 'P'};
Three = {'W' 'X' 'Y' 'Z'};
We can try to work with their indices
a = 1:length(One);
ind = combvec(a,a,a);
ind is a matrix whose columns are all the combinations of three numbers from 1 to 4, e.g. 111 211 311 411 121 221 etc. (the first index varies fastest). Since there are 4 x 4 x 4 = 64 combinations, its dimensions are 3x64. The indices correspond to positions in your cell arrays, i.e. 214 corresponds to the 'BMZ' combination.
EDIT: developed an easy way to generate the combinations without combvec, with the help of @LuisMendo's answer:
a=1:4;
[A,B,C] = ndgrid(a,a,a);
ind = [A(:),B(:),C(:)]';
Then you create the blank cell array with length equal to 64 - that's how many combinations of three letters you are expected to get. Finally, you start a loop that concatenates the letters according to the combination in ind array:
Four = cell(1,length(ind(1,:)));
for i = 1:length(ind(1,:))
Four(i) = strcat(One(ind(1,i)),Two(ind(2,i)),Three(ind(3,i)));
end
So, you obtain:
Four =
'AMW' 'BMW' 'CMW' 'DMW' 'ANW' ...
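The final loop can probably be avoided as well, because strcat accepts cell arrays and concatenates them element by element. The following is a sketch of that variant; the index names i1, i2 and i3 are chosen for this example only.

```matlab
One   = {'A' 'B' 'C' 'D'};
Two   = {'M' 'N' 'O' 'P'};
Three = {'W' 'X' 'Y' 'Z'};

% Index grids: i1 varies fastest, then i2, then i3
[i1, i2, i3] = ndgrid(1:numel(One), 1:numel(Two), 1:numel(Three));

% Element-wise concatenation of three 64-element cell arrays
Four = strcat(One(i1(:)), Two(i2(:)), Three(i3(:)));
```

The order of the result follows the ndgrid ordering, so it starts with 'AMW' 'BMW' 'CMW' 'DMW' 'ANW', as in the loop version.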
This is a solution from another stackoverflow participant who helped me out.
Data is coming from a csv file:
States Damage Blizzards
Indiana 1 3
Alabama 2 3
Ohio 3 2
Alabama 4 2
%// Parse CSV file
[States, Damage, Blizzards] = textread(csvfilename, '%s %d %d', ...
'delimiter', ',', 'headerlines', 1);
%// Parse data and store in an array of structs
[U, ix, iu] = unique(States); %// Find unique state names
S = struct('state', U); %// Create a struct for each state
for k = 1:numel(U)
idx = (iu == k); %// Indices of rows matching current state
S(k).damage = Damage(idx); %// Add damage information
S(k).blizzards = Blizzards(idx); %// Add blizzards information
end
In MATLAB, I need to create a series of assigned variables (A1, A2, A3) in a loop. So I have a structure S with 3 fields: state, damage, blizzards.
I have attempted the following method to assign A1 =, A2 =, but I got an error because it does not work for structures:
for n = 1:numel(S)
eval(sprintf('A%d = [1:n]',S(n).state));
end
Output goal is a series of assigned variables in the loop to the fields of the structure:
A1 = 2 3
A2 = 2 3
A3 = 4 5
I'm not 100% sure I understand your question.
But maybe you are looking for something like this:
for n = 1:numel(S)
eval(sprintf('A%d = [S(n).damage S(n).blizzards]',n));
end
BTW using evalc instead of eval will suppress the command line output.
A little explanation of why
eval(sprintf('A%d = [1:n]',S(n).state));
does not work:
S(1).state
returns
ans =
Alabama
which is a string. However,
A%d
expects a number (see this for number formatting).
Additionally,
numel(S)
yields
ans =
3
Therefore,
eval(sprintf('A%d = [1:n]',n));
will simply return the following output:
A1 =
1
A2 =
1 2
A3 =
1 2 3
Hence, you want to use n as the counter in the variable name, and build the vector from the entries of the other struct fields (damage and blizzards), again indexed by n.
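If separate variables A1, A2, A3 are not strictly required, a cell array may be an easier container than eval-generated names. This is a sketch of that alternative, not what the question asked for: the struct array is rebuilt by hand from the CSV example above, and A{n} takes the place of An.

```matlab
% Rebuild the struct array from the CSV example (states in sorted order)
S = struct('state', {'Alabama', 'Indiana', 'Ohio'});
S(1).damage = [2; 4];  S(1).blizzards = [3; 2];
S(2).damage = 1;       S(2).blizzards = 3;
S(3).damage = 3;       S(3).blizzards = 2;

% One cell per state instead of separate variables A1, A2, A3
A = cell(1, numel(S));
for n = 1:numel(S)
    A{n} = [S(n).damage, S(n).blizzards];
end
```

Each entry can then be accessed as A{1}, A{2}, and so on, with n used as an ordinary index.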