Create an index to table - matlab

I have a table T as below:
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'},...
{'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'},...
[4;2;4;1;2;5;2;1;1;5], ...
'VariableNames', {'memberId', 'productId','Rating'});
T =
memberId productId Rating
________ _________ ______
'A' 'xd' 4
'A' 'z' 2
'B' 'x' 4
'B' 'y' 1
'B' 'z' 2
'B' 'w' 5
'C' 'x' 2
'C' 'wh' 1
'D' 'z' 1
'D' 'w' 5
I need to index it by memberId and productId so the result is:
A: {'xd' 'z'}
B: {'x' 'y' 'z' 'w'}
C: {'x' 'wh'}
.......

You can use categorical arrays and a structure to do this:
% convert to categorical arrays
T.memberId = categorical(T.memberId);
T.productId = categorical(T.productId);
% cross-tabulate memberId vs. productId (crosstab requires the Statistics and Machine Learning Toolbox)
cross_T = crosstab(T.memberId,T.productId);
% a function returning the productIds that have a nonzero count in row x
productId = categories(T.productId).';
row = @(x) productId(logical(cross_T(x,:)));
% perform on all rows
rBy_c = arrayfun(row,1:size(cross_T,1),'UniformOutput',false).';
% convert to structure for readability
s = cell2struct(rBy_c,categories(T.memberId))
This gives the output s (the product IDs come out in sorted category order):
A: {'xd' 'z'}
B: {'w' 'x' 'y' 'z'}
C: {'wh' 'x'}
D: {'w' 'z'}
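If crosstab (a Statistics and Machine Learning Toolbox function) is not available, the same grouping can probably be done with base MATLAB's unique and accumarray. This is only a minimal sketch, not the method from the answer above; the names members, grp, collect and products are illustrative. Unlike the crosstab version, it keeps each member's products in their original table order.

```matlab
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'}, ...
          {'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'}, ...
          [4;2;4;1;2;5;2;1;1;5], ...
          'VariableNames', {'memberId','productId','Rating'});

% group number for every row; 'stable' keeps members in order of first appearance
[members, ~, grp] = unique(T.memberId, 'stable');

% collect the row numbers of each group, then look up the product IDs;
% sort(rows) restores table order, which accumarray does not guarantee
collect = @(rows) {T.productId(sort(rows)).'};
products = accumarray(grp, (1:height(T)).', [], collect);

% one field per member, as in the answer above
s = cell2struct(products, members, 1);
```

With this data, s.B holds {'x' 'y' 'z' 'w'}, which is the order asked for in the question.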

Related

RSA Algorithm - don't want to find 'd'

Are there any 'p', 'q' and 'e' for which no 'd' can be found?
For example, with these parameters:
q=13 , p=7 , e=25
we can find d=49 using:
(e * d) % (12 * 6) = 1
But I want 'p', 'q' and 'e' for which no 'd' exists.
Is that possible or not?
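A 'd' satisfying (e * d) % ((p-1) * (q-1)) = 1 exists exactly when e and (p-1)*(q-1) have no common factor, that is, when gcd(e, (p-1)*(q-1)) = 1. The quick check below is a sketch under assumptions: e = 3 is an illustrative value that is not from the question, and findD is a hypothetical helper that simply tries every candidate.

```matlab
p = 7; q = 13;
phi = (p-1)*(q-1);          % 6*12 = 72

% brute-force search for d in 1..phi-1 with mod(e*d, phi) == 1
findD = @(e) find(mod(e*(1:phi-1), phi) == 1);

g25 = gcd(25, phi);         % 1, so an inverse exists
d25 = findD(25);            % 49, as in the question

g3 = gcd(3, phi);           % 3, so no inverse exists
d3 = findD(3);              % empty: no d works for e = 3
```

So with p = 7 and q = 13, any e that shares a factor with 72 (any even e, or any multiple of 3) has no matching d.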

Detect cell entries in MATLAB Table

I have a Matlab table (the new 'Table' class), let's call it A:
A=table([1;2;3],{'A';'B';'C'})
As you can see, some of the columns are double, some are cell.
I'm trying to figure out which ones are cells.
For some reason, there is no A.Properties.class I can use, and I can't seem to call iscell on it.
What's the "Matlab" way of doing this? Do I have to loop through each column of the table to figure out its class?
One approach -
out = cellfun(@(x) iscell(getfield(A,x)),A.Properties.VariableNames)
Or, a better way would be to access the fields (variables) dynamically like so -
out = cellfun(@(x) iscell(A.(x)), A.Properties.VariableNames)
Sample runs:
Run #1 -
A=table([1;2;3],{4;5;6})
A =
Var1 Var2
____ ____
1 [4]
2 [5]
3 [6]
out =
0 1
Run #2 -
>> A=table([1;2;3],{'A';'B';'C'})
A =
Var1 Var2
____ ____
1 'A'
2 'B'
3 'C'
out =
0 1
Run #3 -
>> A=table([1;2;3],{4;5;6},{[99];'a';'b'},{'m';'n';'p'})
A =
Var1 Var2 Var3 Var4
____ ____ ____ ____
1 [4] [99] 'm'
2 [5] 'a' 'n'
3 [6] 'b' 'p'
>> out
out =
0 1 1 1
You could test with iscell(A.Var2) if the second variable is of type cell. More generally, you could reference columns by their index:
for k = 1 : width(A)
    disp(iscell(A.(k)))
end
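The per-variable loop can also be written with varfun, which applies a function to each variable of a table. A minimal sketch (the names isCellVar, cellNames and varClasses are illustrative):

```matlab
A = table([1;2;3], {'A';'B';'C'});

% apply iscell to every variable; 'uniform' returns a plain logical vector
isCellVar = varfun(@iscell, A, 'OutputFormat', 'uniform');

% names of the variables that are stored as cell arrays
cellNames = A.Properties.VariableNames(isCellVar);

% class name of every variable, if more than a cell/non-cell answer is needed
varClasses = varfun(@class, A, 'OutputFormat', 'cell');
```

Here cellNames should contain only 'Var2', matching the 0 1 result in the sample runs.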

Extract Cumulative N-grams Matlab

I have an array of words:
x=['ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe']
I would like to extract overlapping sets of N consecutive words. Assuming each set has N = 2 words, how can I get return values that look like this:
'ae' 'be'
'be' 'ce'
'ce' 'de'
'de' 'ee'
'ee' 'fe'
So if N = 2, I get back a matrix where each row contains the current and the previous word. If N = 3, I will get back the current and the previous 2 words in each row. I want to avoid loops if possible.
Any ideas?
You can use the circulant matrix that MATLAB provides through gallery, truncate it as needed, and use it as an index matrix:
x = {'ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe'}
N = 3;
n = numel(x);
A = gallery('circul',n:-1:1)
B = fliplr( A(1:n-N+1,n-N+1:end) )
result = x(B)
or a little shorter (note that the column range has to be 1:N; 1:n-N only works by coincidence when n = 6 and N = 3):
A = fliplr( gallery('circul',n:-1:1) )
result = x( A(1:n-N+1,1:N) )
or another option using a Hankel matrix:
A = hankel(1:n,1:N)
result = x( A(1:n-N+1,:) )
gives:
result =
'ae' 'be' 'ce'
'be' 'ce' 'de'
'ce' 'de' 'ee'
'de' 'ee' 'fe'
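The index matrix can also be built directly, since row i of the result just holds positions i, i+1, ..., i+N-1. A minimal sketch with bsxfun, shown for N = 2 as in the question (idx is an illustrative name; on R2016b and later, plain + with implicit expansion can replace the bsxfun call):

```matlab
x = {'ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe'};
N = 2;
n = numel(x);

% start position of every window as a column, offsets 0..N-1 as a row
idx = bsxfun(@plus, (1:n-N+1).', 0:N-1);   % (n-N+1)-by-N index matrix

% indexing with a matrix returns a cell array of the same shape
result = x(idx);
```

For this input, result is a 5-by-2 cell array whose first row is 'ae' 'be' and whose last row is 'ee' 'fe'.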

Deleting rows with specific rules

I have a 20*3 cell array and I need to delete the rows that contain "137", "2" or "n:T".
Original data:
'T' '' ''
'NP(*)' '' ''
[ 137] '' ''
[ 2] '' ''
'ARE' 'and' 'NP(FCC_A1#1)'
'' '' '1:T'
[ 1200] [0.7052] ''
[1.2051e+03] [0.7076] ''
'ARE' 'and' 'NP(FCC_A1#3)'
'' '' '2:T'
[ 1200] [0.0673] ''
[1.2051e+03] [0.0671] ''
'ARE' 'and' 'NP(M23C6)'
'' '' '3:T'
[ 1200] [0.2275] ''
[1.2051e+03] [0.2253] ''
[ 137] '' ''
[ 2] '' ''
And I want it to look like this:
'T' '' ''
'NP(*)' '' ''
'ARE' 'and' 'NP(FCC_A1#1)'
[ 1200] [0.7052] ''
[1.2051e+03] [0.7076] ''
'ARE' 'and' 'NP(FCC_A1#3)'
[ 1200] [0.0673] ''
[1.2051e+03] [0.0671] ''
'ARE' 'and' 'NP(M23C6)'
[ 1200] [0.2275] ''
[1.2051e+03] [0.2253] ''
I've tried regexp and strcmp, but they don't work well. The mixed cell array is also hard to deal with. Can anyone help?
Thank you in advance.
If you can somehow read your original data so that all cells are strings or empty arrays (not numeric values), you can do it with strcmp and regexprep:
% The variable 'data' is a 2D-cell array of strings or empty arrays
datarep = regexprep(data,'^\d+:T','2'); % replace 'n:T' with '2' for convenience
remove1 = strcmp('2',datarep); % this takes care of '2' and 'n:T'
remove2 = strcmp('137',datarep); % this takes care of '137'
rows_keep = find(~sum(remove1|remove2,2)); % rows that will be kept
solution = data(rows_keep,:)
For example, with this data
'aa' 'bb' 'cc'
'dd' 'dd' '2'
'137' 'dd' 'dd'
'dd' 'dd' '11:T'
'1:T' '1:137' 'dd'
'dd' '' []
the result in the variable solution is
'aa' 'bb' 'cc'
'dd' '' []
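If the data cannot be re-read as strings, one possibility is to build a string copy of the cell array just for the matching step and then index the original with the resulting mask. This is a minimal sketch on a shortened version of the question's data; the names a, txt, isNT and drop are illustrative, and using num2str to turn the numbers into text is an assumption about how they should be compared.

```matlab
a = {'T',   '',     '';
     137,   '',     '';
     2,     '',     '';
     'ARE', 'and',  'NP(FCC_A1#1)';
     '',    '',     '1:T';
     1200,  0.7052, ''};

% string copy used only for matching; numeric cells become text
txt = a;
isNum = cellfun(@isnumeric, a);
txt(isNum) = cellfun(@num2str, a(isNum), 'UniformOutput', false);

% cells of the form 'n:T', or equal to '137' or '2'
isNT = ~cellfun(@isempty, regexp(txt, '^\d+:T$', 'once'));
drop = any(strcmp(txt, '137') | strcmp(txt, '2') | isNT, 2);

% index the original array, so the kept numbers stay numeric
b = a(~drop, :);
```

On this sample, only the 'T', 'ARE' and 1200 rows remain in b.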
I just tried the following code on my desktop and it seems to do the trick. Here a is the cell array from your question.
L = size(a, 1);
mask = false(L, 1);
for ii = 1:L
    if isnumeric(a{ii, 1}) && (a{ii, 1} == 137 || a{ii, 1} == 2)
        mask(ii) = true;
    elseif ~isempty(a{ii, 3}) && strcmp(a{ii, 3}(end-1:end), ':T')
        mask(ii) = true;
    end
end
b = a(~mask, :)
Now, b should be the cell array you wanted. Basically, I created a logical mask that marks the rows that satisfy the rules, then used its negation to select the rows to keep.
Here is another simple option:
%Anonymous function that checks if a cell is equal to 137 or to 2 or fits the '*:T*' pattern
Eq137or2 = @(x) sum(x == 137 | x == 2) | sum(strfind(num2str(x), ':T') > 1)
%Use the anonymous function to find the rows you don't want
mask = sum(cellfun(Eq137or2, a),2)
%Remove the unwanted rows
a(~mask, :)

removing duplicates - ** only when the duplicates occur in sequence

I would like to do something similar to the following, except I would only like to remove 'g' and 'g' because they are duplicates that occur one after the other. I would also like to keep the sequence the same.
Any help would be appreciated!!!
I have this cell array in MATLAB:
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
ans =
'd' 'f' 'a' 'w' 'a' 'h'
There was an error in my first answer (below) when used on multiple duplicates (thanks grantnz). Here's an updated version:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};
>> i = find(diff(char(y)) == 0);
>> y([i; i+1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'j'
OLD ANSWER
If your "cell vector" always contains only single character elements you can do the following:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
y =
'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'
>> y(find(diff(char(y)) == 0) + [0 1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'h'
Look at it like this: you want to keep an element if and only if either (1) it's the first element or (2) its predecessor is different from it and either (3) it's the last element or (4) its successor is different from it. So:
y([true ~strcmp(y(1:(end-1)),y(2:end))] & [~strcmp(y(1:(end-1)),y(2:end)) true])
or, perhaps better,
different = ~strcmp(y(1:(end-1)),y(2:end));
result = y([true different] & [different true]);
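Because this rule compares whole strings with strcmp, it should also cover elements longer than one character, as well as runs longer than two. A quick check on made-up data (the list below is illustrative, not from the question):

```matlab
% 'h' occurs three times in a row; 'ij' is a two-character element
y = {'d','f','a','g','g','w','a','h','h','h','ij','ij','k'};

% different(k) is true when y{k} and y{k+1} differ
different = ~strcmp(y(1:end-1), y(2:end));

% keep an element only if it differs from both of its neighbours
result = y([true different] & [different true]);
```

Here result is 'd' 'f' 'a' 'w' 'a' 'k': every element that belongs to a run of equal neighbours is dropped and the order of the rest is unchanged.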
This should work:
y([ diff([y{:}]) ~= 0 true])
or slightly more compactly
y(diff([y{:}]) == 0) = []
Correction: the above won't remove both of the duplicates. This will:
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
BTW, this works even if there are multiple duplicate sequences, e.g.:
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h'};
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
y =
'd' 'f' 'a' 'w' 'a'