Extract Cumulative N-grams in MATLAB

I have an array of words:
x=['ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe']
I would like to extract sets of consecutive words. Assuming each set has N = 2 words, how can I get return values that look like this:
'ae' 'be'
'be' 'ce'
'ce' 'de'
'de' 'ee'
'ee' 'fe'
So if N = 2, I get back a matrix where each row contains a pair: the previous and the current word. If N = 3, each row contains the current word and the previous two words. I want to avoid loops if possible.
Any ideas?

You can use the circulant matrix that MATLAB provides through gallery('circul', ...), truncate it as needed, and use it as an index matrix (note that x is a cell array here):
x = {'ae' ; 'be' ; 'ce' ; 'de' ; 'ee' ; 'fe'}
N = 3;
n = numel(x);
A = gallery('circul',n:-1:1)
B = fliplr( A(1:n-N+1,n-N+1:end) )
result = x(B)
or a little shorter:
A = fliplr( gallery('circul',n:-1:1) )
result = x( A(1:n-N+1,1:N) )
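or another option, using a Hankel matrix. Choosing the column 1:n-N+1 and the row n-N+1:n so that they share their corner element avoids hankel's "column wins anti-diagonal conflict" warning:
A = hankel(1:n-N+1, n-N+1:n)
result = x(A)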
All three variants give (for N = 3):
result =
'ae' 'be' 'ce'
'be' 'ce' 'de'
'ce' 'de' 'ee'
'de' 'ee' 'fe'
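The same index-matrix idea can also be written without gallery or hankel. The sketch below assumes x is the char matrix from the question, converted to a cell array with cellstr; it uses bsxfun so it runs on older releases, and on R2016b or later (1:n-N+1).' + (0:N-1) builds the same index matrix through implicit expansion.

```matlab
% Sliding windows of N consecutive words, built from an explicit index matrix.
x = cellstr(['ae'; 'be'; 'ce'; 'de'; 'ee'; 'fe']);  % char matrix -> 6x1 cell array
N = 2;
n = numel(x);
% Row k of idx is k, k+1, ..., k+N-1 (column vector plus row vector expands to a matrix)
idx = bsxfun(@plus, (1:n-N+1).', 0:N-1);
result = x(idx)   % (n-N+1)-by-N cell array, one window per row
```

For N = 2 this returns the five pairs from the question, one per row.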

Related

In Perl, how to generate all possible patterns

How can I (in Perl) generate all the possible patterns below, writing them both to a file and to the screen, so that each slot in a pattern can be accessed?
Many thanks.
Input values:
1. number of slots
2. number of objects
For example, with 2 objects { a, b } and 4 slots, the number of all possible patterns is 2^4 = 16, so there are 16 rows and 8 columns.
Each slot, eachSlot[i][j], should be accessible so that its value can be assigned or changed.
The output format looks like this:
a a a a
a a a b
a a b a
a a b b
a b a a
a b a b
a b b a
a b b b
b a a a
b a a b
b a b a
b a b b
b b a a
b b a b
b b b a
b b b b
Then, if I see 'a', I want to do actionX;
if I see 'b', I want to do actionY.
Many thanks for all the advice and help.
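use strict;
use warnings;
use feature qw( say );
use Algorithm::Loops qw( NestedLoops );

my @syms = qw( a b );
my $num_slots = 4;
# One copy of the symbol list per slot; without a code ref, NestedLoops returns an iterator
my $iter = NestedLoops([ ( \@syms ) x $num_slots ]);
while ( my @items = $iter->() ) {
    say "@items";
}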
I made Set::CrossProduct:
use v5.10;
use Set::CrossProduct;
my $set = Set::CrossProduct->new( [ [ qw(a b) ] x 4 ] );
while( my $next = $set->get ) {
say "@$next";
}
ikegami showed the Algorithm::Loops module, which is also fine to get all the combinations.

Create an index to table

I have a table T as below:
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'},...
{'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'},...
[4;2;4;1;2;5;2;1;1;5], ...
'VariableNames', {'memberId', 'productId','Rating'});
T =
memberId productId Rating
________ _________ ______
'A' 'xd' 4
'A' 'z' 2
'B' 'x' 4
'B' 'y' 1
'B' 'z' 2
'B' 'w' 5
'C' 'x' 2
'C' 'wh' 1
'D' 'z' 1
'D' 'w' 5
I need to index it by memberId and productId so the result is:
A: {'xd' 'z'}
B: {'x' 'y' 'z' 'w'}
C: {'x' 'wh'}
.......
You can use categorical arrays, crosstab (from the Statistics and Machine Learning Toolbox), and a structure to do this:
% convert to categorical arrays
T.memberId = categorical(T.memberId);
T.productId = categorical(T.productId);
% cross-tabulate memberId vs. productId
cross_T = crosstab(T.memberId,T.productId);
% a function returning the productIds with a nonzero count in row x
productId = categories(T.productId).';
row = @(x) productId(logical(cross_T(x,:)));
% perform on all rows
rBy_c = arrayfun(row,1:size(cross_T,1),'UniformOutput',false).';
% convert to structure for readability
s = cell2struct(rBy_c,categories(T.memberId))
This gives the output (s):
A: {'xd' 'z'}
B: {'w' 'x' 'y' 'z'}
C: {'wh' 'x'}
D: {'w' 'z'}
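If the Statistics and Machine Learning Toolbox is not available, a grouping sketch with findgroups and splitapply (base MATLAB, assuming R2015b or later) builds the same kind of structure. Wrapping each group's products in a cell lets groups of different sizes be returned together.

```matlab
% Group productId by memberId with findgroups/splitapply (assumes R2015b or later).
T = table({'A';'A';'B';'B';'B';'B';'C';'C';'D';'D'}, ...
          {'xd';'z';'x';'y';'z';'w';'x';'wh';'z';'w'}, ...
          [4;2;4;1;2;5;2;1;1;5], ...
          'VariableNames', {'memberId', 'productId', 'Rating'});
% g(k) is the group number of row k; memberIds lists the unique members in sorted order
[g, memberIds] = findgroups(T.memberId);
% Each group's products become one 1-by-M cell, wrapped so sizes may differ
products = splitapply(@(p) {p.'}, T.productId, g);
% One field per member
s = cell2struct(products, memberIds, 1)
```

Unlike the crosstab version, this keeps each member's products in their original table order (for example B gives x, y, z, w) instead of sorting them.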

Extract numbers from string in MATLAB

I'm working with sscanf to extract a number from a string. The strings are usually in the form of:
'44 ppm'
'10 gallons'
'23.4 inches'
but occasionally they are in the form of:
'<1 ppm'
If I use the following code:
x = sscanf('1 ppm','%f')
I get an output of
1
But if I add the less than sign in front of the one:
x = sscanf('<1 ppm','%f')
I get:
[]
How can I write this code so that it actually produces a number? I'm not sure yet what number it should return, but let's say it should be 1 for now.
You can use regexp:
s= '<1 ppm';
x=regexp(s, '.*?(\d+(\.\d+)*)', 'tokens' )
x{1}
Demo:
>> s= {'44 ppm', '10 gallons', '23.4 inches', '<1 ppm' } ;
>> x = regexp(s, '.*?(\d+(\.\d+)*)', 'tokens' );
>> cellfun( @(x) disp(x{1}), x ) % Demo for all
'44'
'10'
'23.4'
'1'
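To get actual numbers rather than strings, the match can be passed to str2double. The sketch below assumes the values are plain unsigned decimals (no signs or exponents) and also records which entries carried a leading '<', since that qualifier is lost once only the number is kept.

```matlab
s = {'44 ppm', '10 gallons', '23.4 inches', '<1 ppm'};
% 'once' returns only the first match for each string, as a char vector
numStr = regexp(s, '\d+(?:\.\d+)?', 'match', 'once');
values = str2double(numStr)      % numeric array, same size as s
% true for entries reported as "less than" (first character is '<')
isBelow = strncmp(s, '<', 1)
```

Here values is [44 10 23.4 1] and isBelow flags only the last entry.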

Replacing string with empty vector

I'm trying to modify this code so that if the input to this function contains the letter 'Q' or 'Z', it returns an empty vector. This works when 'Q' or 'Z' is at the beginning of the string, but unfortunately not when either letter is at the end.
function d = change(a)
new_claim = regexprep(a, 'A', '2');
new_claim1 = regexprep(new_claim, 'B', '2');
new_claim2 = regexprep(new_claim1, 'C', '2');
new_claim3 = regexprep(new_claim2, 'D', '3');
new_claim4 = regexprep(new_claim3, 'E', '3');
new_claim5 = regexprep(new_claim4, 'F', '3');
new_claim6 = regexprep(new_claim5, 'G', '4');
new_claim7 = regexprep(new_claim6, 'H', '4');
new_claim8 = regexprep(new_claim7, 'I', '4');
new_claim9 = regexprep(new_claim8, 'J', '5');
new_claim10 = regexprep(new_claim9, 'K', '5');
new_claim11 = regexprep(new_claim10, 'L', '5');
new_claim12 = regexprep(new_claim11, 'M', '6');
new_claim13 = regexprep(new_claim12, 'N', '6');
new_claim14 = regexprep(new_claim13, 'O', '6');
new_claim15 = regexprep(new_claim14, 'P', '7');
new_claim16 = regexprep(new_claim15, 'R', '7');
new_claim17 = regexprep(new_claim16, 'S', '7');
new_claim18 = regexprep(new_claim17, 'T', '8');
new_claim19 = regexprep(new_claim18, 'U', '8');
new_claim20 = regexprep(new_claim19, 'V', '8');
new_claim21 = regexprep(new_claim20, 'W', '9');
new_claim22 = regexprep(new_claim21, 'X', '9');
new_claim23 = regexprep(new_claim22, 'Y', '9');
new_claim24 = regexprep(new_claim23, '-', ' ');
new_claim25 = regexprep(new_claim24, '(', '');
new_claim26 = regexprep(new_claim25, ')','');
d = new_claim26;
if strfind(d,'Q') == true
d = [];
elseif strfind(d,'Z') == true
d = [];
else
return;
end
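Your check fails because strfind returns the positions of the matches, not a logical value, so comparing the result with true only succeeds when the letter appears at position 1 (and nowhere else). If you want to check whether the string contains the letter Z or z, put this at the beginning of your code: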
if ~isempty(regexp(a, '[Zz]'))
d = [];
return;
end
If you also wanted to check for Q or q, you can do:
if ~isempty(regexp(a, '[ZzQq]'))
d = [];
return;
end
The above uses a regular expression to find any characters in your string that are Z or z (or Q or q, depending on which pattern you use). regexp returns the indices where these characters were found, so if any were found the result is non-empty, hence the ~isempty check; if none were found, the result is empty and the statement is skipped. The important part is that as soon as such a character is found, d is set to empty and the function returns immediately, so the rest of the logic does not run.
You can then carry on with the rest of your code.
You can check if a character is in a string with: any(d == 'Q') || any(d == 'Z')
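Putting the early return together with the letter mapping, one possible condensed version of change is sketched below. It is a hypothetical rewrite, not the asker's code: it replaces the chain of regexprep calls with a lookup table over the same letter-to-digit mapping (case-sensitive, like the original), and rejects Q or Z in either case as in the answer above. Save it as change.m.

```matlab
function d = change(a)
% Return [] if the input contains Q or Z (either case).
if any(ismember(upper(a), 'QZ'))
    d = [];
    return;
end
% Lookup table with the same mapping as the original regexprep chain.
letters = 'ABCDEFGHIJKLMNOPRSTUVWXY';
digits  = '222333444555666777888999';
d = a;
[isMapped, loc] = ismember(d, letters);
d(isMapped) = digits(loc(isMapped));   % uppercase letters become keypad digits
d(d == '-') = ' ';                     % dashes become spaces
d(d == '(' | d == ')') = [];           % parentheses are removed
end
```

For example, change('1-800-FLOWERS') returns '1 800 3569377', and change('ZOO') returns [].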

Removing duplicates only when the duplicates occur in sequence

I would like to do something like the following: remove only 'g' and 'g', because they are the duplicates that occur one right after the other. I would also like to keep the order of the remaining elements the same.
Any help would be appreciated!
I have this cell array in MATLAB:
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
ans =
'd' 'f' 'a' 'w' 'a' 'h'
There was an error in my first answer (below) when used on multiple duplicates (thanks grantnz). Here's an updated version:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h' 'i' 'i' 'j'};
>> i = find(diff(char(y)) == 0);
>> y([i; i+1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'j'
OLD ANSWER
If your "cell vector" always contains only single-character elements, you can do the following:
>> y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'}
y =
'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h'
>> y(find(diff(char(y)) == 0) + [0 1]) = []
y =
'd' 'f' 'a' 'w' 'a' 'h'
Look at it like this: you want to keep an element if and only if either (1) it's the first element or (2) its predecessor is different from it and either (3) it's the last element or (4) its successor is different from it. So:
y([true ~strcmp(y(1:(end-1)),y(2:end))] & [~strcmp(y(1:(end-1)),y(2:end)) true])
or, perhaps better,
different = ~strcmp(y(1:(end-1)),y(2:end));
result = y([true different] & [different true]);
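Because this version compares neighbours with strcmp instead of concatenating the elements with char(y) or [y{:}], it is not limited to single-character elements, and it removes runs longer than two as well. A short usage example with made-up multi-character words:

```matlab
y = {'dog' 'cat' 'cat' 'cat' 'ant' 'bee' 'bee' 'cow'};
% true where an element differs from its successor
different = ~strcmp(y(1:(end-1)), y(2:end));
% keep an element only if it differs from both neighbours
result = y([true different] & [different true])
```

This keeps 'dog', 'ant' and 'cow', dropping the whole run of 'cat' and both 'bee' elements.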
This should work:
y([ diff([y{:}]) ~= 0 true])
or slightly more compactly
y(diff([y{:}]) == 0) = []
Correction: the above won't remove both duplicates. This does:
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
BTW, this works even if there are multiple duplicate sequences,
e.g.,
y = { 'd' 'f' 'a' 'g' 'g' 'w' 'a' 'h' 'h'};
ind = diff([y{:}]) == 0;
y([ind 0] | [0 ind]) = []
y =
'd' 'f' 'a' 'w' 'a'