Detect cell entries in MATLAB Table

I have a Matlab table (the new 'Table' class), let's call it A:
A=table([1;2;3],{'A';'B';'C'})
As you can see, some of the columns are double, some are cell.
I'm trying to figure out which ones are cells.
For some reason, there is no A.Properties.class I can use, and I can't seem to call iscell on it.
What's the "Matlab" way of doing this? Do I have to loop through each column of the table to figure out its class?

One approach -
out = cellfun(@(x) iscell(getfield(A,x)),A.Properties.VariableNames)
Or, a better way would be to access the fields (variables) dynamically like so -
out = cellfun(@(x) iscell(A.(x)), A.Properties.VariableNames)
Sample runs:
Run #1 -
A=table([1;2;3],{4;5;6})
A =
Var1 Var2
____ ____
1 [4]
2 [5]
3 [6]
out =
0 1
Run #2 -
>> A=table([1;2;3],{'A';'B';'C'})
A =
Var1 Var2
____ ____
1 'A'
2 'B'
3 'C'
out =
0 1
Run #3 -
>> A=table([1;2;3],{4;5;6},{[99];'a';'b'},{'m';'n';'p'})
A =
Var1 Var2 Var3 Var4
____ ____ ____ ____
1 [4] [99] 'm'
2 [5] 'a' 'n'
3 [6] 'b' 'p'
>> out
out =
0 1 1 1
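The same check can probably be written without going through the variable names at all. The following is a minimal sketch, assuming a release where varfun is available for tables; the names isCellVar, varClasses and cellVars are only illustrative and do not come from the question.

```matlab
% Same table as in Run #3
A = table([1;2;3], {4;5;6}, {99;'a';'b'}, {'m';'n';'p'});

% Apply iscell to every variable; 'uniform' collects the scalar results
% into one logical row vector
isCellVar = varfun(@iscell, A, 'OutputFormat', 'uniform');

% Class name of every variable, returned as a cell array
varClasses = varfun(@class, A, 'OutputFormat', 'cell');

% Keep only the cell-valued variables
cellVars = A(:, isCellVar);
```

Here isCellVar should match the out vector from Run #3, and varClasses gives the class names that the question was hoping to find in A.Properties.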

You could test whether the second variable is of type cell with iscell(A.Var2). More generally, you could reference columns by their index:
for k = 1 : width(A)
disp(iscell(A.(k)))
end

Related

Truncating a single table cell in Matlab

I would like to truncate/clean up a single cell of a matlab table, such as this:
important =
2×8 table
Var3 Var5 Var6 Var7 Var8 Var9 Var10 Var11
________ __________________ ___________ ____________ ___________ ____________ ____________ __________
09:13:30 'Zone="<0>"' 'Vset="19"' 'Vrdb="0.0"' 'Iset="10"' 'Irdb="0.0"' 'Pset="190"' 'Prdb="0"'
09:13:30 'Zone="<1>"' 'Vset="19"' 'Vrdb="0.0"' 'Iset="10"' 'Irdb="0.0"' 'Pset="190"' 'Prdb="0"'
I would like to be able to truncate Var5, to trim it down to just the number (1 or 0 in this case). I don't know if it would be best to pull out Var5 and modify it or something else.
Any guidance would be appreciated.
Thanks
Let's define
t = table;
t.Var5 = {'Zone="<0>"'; 'Zone="<1>"'};
t.Var6 = {'Vset="19"'; 'Vset="19"'};
Assuming that the number you want is just a sequence of digits (no sign, decimals etc.), that each table entry contains only one such sequence (or you only want the first) and that you want the results as character vectors:
t.Var5 = regexp(t.Var5, '\d+', 'match', 'once');
Before:
t =
2×2 table
Var5 Var6
__________________ ___________
'Zone="<0>"' 'Vset="19"'
'Zone="<1>"' 'Vset="19"'
After:
t =
2×2 table
Var5 Var6
____ ___________
'0' 'Vset="19"'
'1' 'Vset="19"'
If you want the results as numbers:
t.Var5 = str2double(regexp(t.Var5, '\d+', 'match', 'once'));
After
t =
2×2 table
Var5 Var6
____ ___________
0 'Vset="19"'
1 'Vset="19"'
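If the other columns need the same clean-up and may contain decimals (such as 'Vrdb="0.0"'), one possible variation is to strip everything around the quoted value instead of matching digits. This is only a sketch: the pattern below assumes every entry has the form name="value" or name="<value>", and Var7 is added here just to show a decimal value.

```matlab
t = table;
t.Var5 = {'Zone="<0>"'; 'Zone="<1>"'};
t.Var6 = {'Vset="19"'; 'Vset="19"'};
t.Var7 = {'Vrdb="0.0"'; 'Vrdb="0.0"'};

% Capture whatever sits between the quotes, ignoring optional < >
pattern = '^\w+="<?(.*?)>?"$';

names = t.Properties.VariableNames;
for k = 1:numel(names)
    % Replace each entry by its captured value, then convert to double
    t.(names{k}) = str2double(regexprep(t.(names{k}), pattern, '$1'));
end
```

Entries that do not match the assumed pattern are left unchanged by regexprep, so str2double turns them into NaN rather than raising an error.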

Table sort by month

I have a table in MATLAB with attributes in the first three columns and data from the fourth column onwards. I was trying to sort the entire table based on the first three columns. However, one of the columns (Column C) contains months ('January', 'February' ...etc). The sortrows function would only let me choose 'ascend' or 'descend' but not a custom option to sort by month. Any help would be greatly appreciated. Below is the code I used.
sortrows(Table, {'Column A','Column B','Column C'} , {'ascend' , 'ascend' , '???' } )
As @AnonSubmitter85 suggested, the best thing you can do is to convert your month names to numeric values from 1 (January) to 12 (December) as follows:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t.ColumnC = month(datenum(t.ColumnC,'mmmm'));
This lets you apply a standard sort order to ColumnC as well (in this example, ascending):
t = sortrows(t,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
If for some reason you have to keep your months as literals, a workaround is to sort a clone of the table using the approach described above and then apply the resulting row indices to the original table:
c = {
7 1 'February';
1 0 'April';
2 1 'December';
2 1 'January';
5 1 'January';
};
t_original = cell2table(c,'VariableNames',{'ColumnA' 'ColumnB' 'ColumnC'});
t_clone = t_original;
t_clone.ColumnC = month(datenum(t_clone.ColumnC,'mmmm'));
[~,idx] = sortrows(t_clone,{'ColumnA' 'ColumnB' 'ColumnC'},{'ascend', 'ascend', 'ascend'});
t_original = t_original(idx,:);
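Another option that keeps the month names readable is an ordinal categorical variable whose categories are listed in calendar order. This is a sketch rather than part of the approach above; the monthNames list is written out by hand and assumes full English month names.

```matlab
c = {
    7 1 'February';
    1 0 'April';
    2 1 'December';
    2 1 'January';
    5 1 'January';
    };
t = cell2table(c, 'VariableNames', {'ColumnA' 'ColumnB' 'ColumnC'});

% Categories in calendar order; sorting follows this order, not the alphabet
monthNames = {'January','February','March','April','May','June', ...
              'July','August','September','October','November','December'};
t.ColumnC = categorical(t.ColumnC, monthNames, 'Ordinal', true);

t = sortrows(t, {'ColumnA' 'ColumnB' 'ColumnC'}, {'ascend', 'ascend', 'ascend'});
```

With this the two rows with ColumnA equal to 2 should come out as January before December, and ColumnC still displays the month names.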

Create an index to table

I have a table T as below:
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'},...
{'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'},...
[4;2;4;1;2;5;2;1;1;5], ...
'VariableNames', {'memberId', 'productId','Rating'});
T =
memberId productId Rating
________ _________ ______
'A' 'xd' 4
'A' 'z' 2
'B' 'x' 4
'B' 'y' 1
'B' 'z' 2
'B' 'w' 5
'C' 'x' 2
'C' 'wh' 1
'D' 'z' 1
'D' 'w' 5
I need to index it by memberId and productId so the result is:
A: {'xd' 'z'}
B: {'x' 'y' 'z' 'w'}
C: {'x' 'wh'}
.......
You can use categorical arrays and a structure to do this:
% convert to categorical arrays
T.memberId = categorical(T.memberId);
T.productId = categorical(T.productId);
% cross-tabulate memberId vs. productId
cross_T = crosstab(T.memberId,T.productId);
% a function to return the productIds for all nonzero entries in a row
productId = categories(T.productId).';
row = @(x) productId(logical(cross_T(x,:)));
% perform on all rows
rBy_c = arrayfun(row,1:size(cross_T,1),'UniformOutput',false).';
% convert to structure for readability
s = cell2struct(rBy_c,categories(T.memberId))
To get the output (s):
A: {'xd' 'z'}
B: {'w' 'x' 'y' 'z'}
C: {'wh' 'x'}
D: {'w' 'z'}
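Note that the output above lists the products in alphabetical order, not in the order they appear in T. If the original order matters, a possible alternative is to group the row numbers with unique and accumarray. This is a sketch that assumes every memberId is a valid struct field name; members, g and products are illustrative names.

```matlab
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'}, ...
          {'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'}, ...
          [4;2;4;1;2;5;2;1;1;5], ...
          'VariableNames', {'memberId', 'productId', 'Rating'});

% g maps every row of T to its entry in members
[members, ~, g] = unique(T.memberId);

% Collect the row numbers of each member, then look up the products;
% sort restores the original row order within each group
products = accumarray(g(:), (1:height(T)).', [], ...
                      @(idx) {T.productId(sort(idx)).'});

% One field per member
s = cell2struct(products, members, 1);
```

This should give s.B as {'x' 'y' 'z' 'w'}, matching the order requested in the question.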

Finding Duplicate string values in two cell array 22124x1

I have a 22124x1 cell array that contains duplicate values. I want to know how many times each value is duplicated, and at which indices.
The first cell array, Datacell, contains these values:
'221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
symbol cell:
'OR1D4 '
' OR1D5'
' UTP14C'
'GTF2H2 '
'ZNF324B '
' LOC644504'
'JMJD7 '
'ZNF324B '
'JMJD7-PLA2G4B'
' OR2A5 '
'OR1D4 '
For example, I want the output from cell 1 like this
ID duplicated index
'221853_s_at' 1 1
'221971_x_at' 4 {2:5,1}
I tried to use unique but it does not work. Any help will be highly appreciated
Generating the indices in a visually pleasing manner isn't necessarily a trivial exercise. It's made simpler if you assume d is sorted.
An alternative utilizing accumarray:
d = {'221853_s_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; ...
'222031_at'; '222031_at'; '31637_s_at'; '37796_at'; '38340_at' ...
};
d = sort(d); % Sort to make indices easier
% Find unique strings and their locations
[uniquestrings, ~, stringbin] = unique(d);
counts = accumarray(stringbin, 1);
repeatidx = find(counts - 1 > 0);
repeatedstrings = uniquestrings(repeatidx);
repeatcounts = counts(repeatidx) - 1;
% Find where each run of identical strings starts and ends
startidx = find([true; diff(stringbin) > 0]);
endidx = [startidx(2:end) - 1; numel(d)]; % the last run ends at the end of d
repeatstart = startidx(repeatidx);
repeatend = endidx(repeatidx);
% Generate table, requires R2013b or newer
t = table(repeatedstrings, repeatcounts, repeatstart, repeatend, ...
'VariableNames', {'ID', 'Duplicated', 'StringStart', 'StringEnd'} ...
);
Which yields:
t =
ID Duplicated StringStart StringEnd
_____________ __________ ___________ _________
'221971_x_at' 3 2 5
'222031_at' 1 6 7
d = { '221853_s_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'221971_x_at'
'222031_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'};
[ids,ia,ic]=unique(d);
ids has the unique strings
ia has an index corresponding to an instance of the unique string within d
ic has an index corresponding to which entry in ids is in that index within d
[ncnt] = hist(ic,1:numel(ids)) - 1; % minus 1 since you only want duplicates
ncnt =
0 3 1 0 0 0
Gets you the number of duplicates for
ids =
'221853_s_at'
'221971_x_at'
'222031_at'
'31637_s_at'
'37796_at'
'38340_at'
ic is the lookup table for the indices: use find or logical indexing on it, e.g. find(ic == 2) returns the positions in d of the second entry of ids.
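To put the counts and the index lists side by side, roughly in the shape asked for in the question, something like the following might work. It is a sketch built on the same unique call; it does not require d to be sorted, and positions, isDup and result are illustrative names.

```matlab
d = {'221853_s_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; '221971_x_at'; ...
     '222031_at'; '222031_at'; '31637_s_at'; '37796_at'; '38340_at'};

[ids, ~, ic] = unique(d);

% Number of occurrences of each unique string
counts = accumarray(ic(:), 1);

% Row numbers in d at which each unique string occurs
positions = accumarray(ic(:), (1:numel(d)).', [], @(v) {sort(v).'});

% Keep only the strings that occur more than once
isDup = counts > 1;
result = table(ids(isDup), counts(isDup) - 1, positions(isDup), ...
               'VariableNames', {'ID', 'Duplicated', 'Index'});
```

The Duplicated column follows the convention of the answers above (occurrences minus one); drop the - 1 if the total number of occurrences is wanted instead.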

flatten matlab table by key

I have a large table whose entries are
KEY_A,KEY_B,VAL
where KEY_A and KEY_B are finite sets of keys. For argument's sake, we'll have 4 different KEY_B values and 4 different KEY_A values. An example table:
KEY_A KEY_B KEY_C
_____ _____ _________
1 1 0.45054
1 2 0.083821
1 3 0.22898
1 4 0.91334
2 1 0.15238
2 2 0.82582
2 3 0.53834
2 4 0.99613
3 1 0.078176
3 2 0.44268
3 3 0.10665
3 4 0.9619
4 1 0.0046342
4 2 0.77491
4 3 0.8173
4 4 0.86869
4 5 1
I want to elegantly flatten the table into
KEY_A KEY_B_1 KEY_B_2 KEY_B_3 KEY_B_4 KEY_B_5
_____ _________ ________ _______ _______ _______
1 0.45054 0.083821 0.22898 0.91334 -1
2 0.15238 0.82582 0.53834 0.99613 -1
3 0.078176 0.44268 0.10665 0.9619 -1
4 0.0046342 0.77491 0.8173 0.86869 1
I'd like to be able to handle missing B values (set them to a default like -1), but I think if I get an elegant way to do this to start then such things will fall into place.
The actual table has millions of records, so I do want to use a vectorized call.
The line I've got (which doesn't handle the missing key 5) is:
cell2mat(arrayfun(@(x)[x,testtable{testtable.KEY_A==x,3}'],unique(testtable{:,1}),'UniformOutput',false))
But it doesn't output a new table, and if there are missing keys in the table, it doesn't handle them.
I would think that this isn't that uncommon of an activity...has anyone done something like this before?
If the input table is T, then you could try this for the given case -
KEY_B_ =-1.*ones(max(T.KEY_A),max(T.KEY_B))
KEY_B_(sub2ind(size(KEY_B_),T.KEY_A,T.KEY_B)) = T.KEY_C
T1 = array2table(KEY_B_)
Output for the edited input -
T1 =
KEY_B_1 KEY_B_2 KEY_B_3 KEY_B_4 KEY_B_5
_________ ________ _______ _______ _______
0.45054 0.083821 0.22898 0.91334 -1
0.15238 0.82582 0.53834 0.99613 -1
0.078176 0.44268 0.10665 0.9619 -1
0.0046342 0.77491 0.8173 0.86869 1
Edit by MadScienceDreams: This answer led me to write the following function, which will smash together pretty much any table based on the input keys. Enjoy!
function [ OT ] = flatten_table( T,primary_keys,secondary_keys,value_key,default_value )
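%FLATTEN_TABLE Pivot a long table into a wide one.
% Rows are indexed by the unique combinations of primary_keys, columns by
% the unique combinations of secondary_keys. Cells hold the value_key
% entries, or default_value (NaN if omitted) where a key pair is missing.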
if nargin < 5
default_value = {NaN};
end
if ~iscell(default_value)
default_value={default_value};
end
if ~iscell(primary_keys)
primary_keys={primary_keys};
end
if ~iscell(secondary_keys)
secondary_keys={secondary_keys};
end
if ~iscell(value_key)
value_key={value_key};
end
primary_key_values = unique(T(:,primary_keys));
num_primary = size(primary_key_values,1);
[~,primary_key_map] = ismember(T(:,primary_keys),primary_key_values);
secondary_key_values = unique(T(:,secondary_keys));
num_secondary = size(secondary_key_values,1);
[~,secondary_key_map] = ismember(T(:,secondary_keys),secondary_key_values);
%out =-1.*ones(max(T.KEY_A),max(T.KEY_B))
try
values = num2cell(T{:,value_key},2);
catch
values = num2cell(table2cell(T(:,value_key)),2);
end
if (~iscell(values))
values=num2cell(values);
end
OT=repmat(default_value,num_primary,num_secondary);
OT(sub2ind(size(OT),primary_key_map,secondary_key_map)) = values;
label_array = num2cell(cellfun(@(x,y)[x '_' mat2str(y)],...
repmat (secondary_keys,size(secondary_key_values,1),1),...
table2cell(secondary_key_values),'UniformOutput',false),1);
label_array = strcat(label_array{:});
OT = [primary_key_values,cell2table(OT,'VariableNames',label_array)];
end
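For the simple numeric case in the question, a shorter variation on the sub2ind idea is to let accumarray place the values and supply the default as its fill value. This is a sketch on a reduced version of the example data; it assumes numeric keys and values, and that each (KEY_A, KEY_B) pair occurs at most once (repeated pairs would be summed).

```matlab
T = table([1;1;2;2;4], [1;2;1;2;5], [0.45;0.08;0.15;0.83;1], ...
          'VariableNames', {'KEY_A', 'KEY_B', 'KEY_C'});

% Map the key values to consecutive row and column numbers, so the keys
% do not have to be 1..n themselves
[a, ~, ia] = unique(T.KEY_A);
[b, ~, ib] = unique(T.KEY_B);

% Place every value at (row, column); cells without data get -1
M = accumarray([ia(:) ib(:)], T.KEY_C, [numel(a) numel(b)], [], -1);

% Column names such as KEY_B_1, KEY_B_2, KEY_B_5
names = strcat('KEY_B_', arrayfun(@num2str, b, 'UniformOutput', false)).';

T1 = [table(a, 'VariableNames', {'KEY_A'}), ...
      array2table(M, 'VariableNames', names)];
```

Unlike the max-based preallocation, this keeps the KEY_A column in the result and does not create empty rows or columns for key values that never occur.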