Discriminate between empty and non-empty regexp matches in Matlab

I'm trying to match some strings in Matlab and create a new table from the matches.
The variable txt contains:
Columns 1 through 4
'Time' 'LR1R2' 'LR1R2_SD' 'LR1R2_I'
Columns 5 through 8
'LR1R2_SD' 'R1' 'R1_SD' 'R1_I'
Columns 9 through 12
'R1_I_SD' 'R2' 'R2_SD' 'R2_I'
Column 13
'R2_I_SD'
I want to select all those that end with '_SD':
pattern='_SD';
match=regexp(txt,pattern)
which returns:
match =
Columns 1 through 8
[] [] [6] [] [6] [] [3] []
Columns 9 through 13
[5] [] [3] [] [5]
Does anybody know how to discriminate between the empty and non-empty matches? My aim is to build a new table from the matches. Here is what I've tried:
for i=match,
~isempty(i)
end
But this returns true for everything.
Thanks

When its input is a cell array, the regexp function returns a cell array in which each cell contains either an empty array (i.e. []) or the match position(s) (e.g. [6]). To go through all cells of this cell array, you can use the cellfun function and apply the isempty function to each cell:
~cellfun(@isempty,match)
which returns
ans =
0 0 1 0 1 0 1 0 1 0 1 0 1
As @Divakar correctly remarks, using
~cellfun('isempty',match)
is much faster.
When the command is run 100,000 times, I measured the following run times:
With @isempty:
Elapsed time is 0.757626 seconds.
With 'isempty':
Elapsed time is 0.118241 seconds.
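A minimal sketch for repeating the comparison on your own machine, assuming a shortened version of txt. Absolute timings, and the size of the gap, depend on the MATLAB release and hardware, so treat the figures above as one measurement rather than a fixed ratio. The names m1, m2, tHandle and tString are illustrative only.

```matlab
% Shortened version of txt from the question (illustration only)
match = regexp({'Time' 'LR1R2' 'LR1R2_SD' 'R1_SD'}, '_SD');
n = 1e4;   % increase to 1e5 to match the measurement above

tic
for k = 1:n
    m1 = ~cellfun(@isempty, match);    % function handle
end
tHandle = toc;

tic
for k = 1:n
    m2 = ~cellfun('isempty', match);   % legacy function-name string
end
tString = toc;

fprintf('@isempty: %.3f s, ''isempty'': %.3f s\n', tHandle, tString);
```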
Note that this syntax is not available for all functions. From the MATLAB documentation on cellfun:
cellfun accepts function name strings for function func, rather than a
function handle, for these function names: isempty, islogical, isreal,
length, ndims, prodofsize, size, isclass. Enclose the function name in
single quotes.
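To see why the loop in the question returns true for every element: iterating with for i = match walks over the columns of the cell array, so each i is a 1x1 cell such as {[6]} or {[]}, and a 1x1 cell is never empty. The sketch below, using a shortened, hypothetical txt, contrasts that with indexing the cell contents via curly braces.

```matlab
% Shortened version of txt from the question (illustration only)
txt = {'Time' 'LR1R2' 'LR1R2_SD' 'R1_SD'};
match = regexp(txt, '_SD');

% Iterating over the cell array itself: each i is a 1x1 cell
loopEmpty = false(1, 0);
for i = match
    loopEmpty(end+1) = isempty(i);   % always false: {[]} is not empty
end

% Curly braces extract the contents, i.e. the match result itself
hasSD = false(size(match));
for k = 1:numel(match)
    hasSD(k) = ~isempty(match{k});
end
disp(hasSD)
```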

The answer I was looking for is something like:
for i = 1:numel(match)
    if ~isempty(match{i})   % match{i} is the content of cell i
        num(:,i)            % num: the asker's numeric data (not shown)
    end
end
As hbaderts suggested, the following is also a way to do this:
~cellfun(@isempty,match)
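Since the stated aim is to build a new table from the matches, the logical mask can be used directly as an index. A minimal sketch: the '_SD$' pattern (anchored with $ so that only names ending in _SD match) and the 'once' option are a variation on the question's pattern, and num is a hypothetical numeric matrix with one column per name, standing in for the data indexed in the loop above.

```matlab
txt = {'Time' 'LR1R2' 'LR1R2_SD' 'LR1R2_I' 'LR1R2_SD' 'R1' 'R1_SD' ...
       'R1_I' 'R1_I_SD' 'R2' 'R2_SD' 'R2_I' 'R2_I_SD'};

% '$' anchors the match at the end of each name; 'once' returns a single
% start index (or empty) per cell instead of all matches
isSD = ~cellfun('isempty', regexp(txt, '_SD$', 'once'));

sdNames = txt(isSD)          % names ending in _SD

% Hypothetical data: one column per name in txt
num = rand(5, numel(txt));
sdData = num(:, isSD);       % keep only the _SD columns
```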

Related

Iterate (/) a multivalent function

How do you iterate a function of multivalent rank (>1), e.g. f:{[x;y] ...} where the function inputs in the next iteration step depend on the last iteration step? Examples in the reference manual only iterate unary functions.
I was able to achieve this indirectly (and verbosely) by passing a dictionary of arguments (state) into a unary function:
f:{[arg] key[arg]!(min arg;arg[`y]-2)}
f/[{0<x`x};`x`y!6 3]
Note that a projection, e.g. f[x;]/[whilecond;y], would only work in the scenario where the x in the next iteration step does not depend on the result of the last iteration (i.e. when x is path-independent).

In relation to Rahul's answer, you could use one of the following (slightly less verbose) methods to achieve the same result:
q)g:{(min x,y;y-2)}
q)(g .)/[{0<x 0};6 3]
-1 -3
q).[g]/[{0<x 0};6 3]
-1 -3
Alternatively, you could use the .z.s self function, which recursively calls the function g and takes the output of the last iteration as its arguments. For example,
q)g:{[x;y] x: min x,y; y:y-2; $[x<0; (x;y); .z.s[x;y]]}
q)g[6;3]
-1 -3

A function used with / or \ receives the result of the previous iteration as a single item, which means only one function parameter is reserved for that result. It is unary in that sense.
For a function whose multiple input parameters depend on the result of the last iteration, one solution is to wrap it inside a unary function and use the apply operator (.) to execute it on the last iteration's result.
Ex:
q) g:{(min x,y;y-2)} / function with rank 2
q) f:{x . y}[g;] / function g wrapped inside unary function to iterate
q) f/[{0<x 0};6 3]

Over time I stumbled upon an even shorter way that does not require parentheses or brackets:
q)g:{(min x,y;y-2)}
q){0<x 0} g//6 3
-1 -3
Why does double over (//) work? The / adverb can sometimes be used in place of the . (apply) operator:
q)(*) . 2 3
6
q)(*/) 2 3
6

Find substring in cell array of numbers and strings

I have a cell array consisting of numbers, strings, and empty arrays. I want to find the position (linear or indexed) of all cells containing a string in which a certain substring of interest appears.
mixedCellArray = {
'adpo' 2134 []
0 [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
}
If the substring of interest is 'dp', then I should get the indices for three cells.
The only solutions I can find work when the cell array contains only strings:
http://www.mathworks.com/matlabcentral/answers/2015-find-index-of-cells-containing-my-string
http://www.mathworks.com/matlabcentral/newsreader/view_thread/255090
One work-around is to find all cells not containing strings, and fill them with '', as hinted by this posting. Unfortunately, my approach requires a variation of that solution, probably something like cellfun('ischar',mixedCellArray). This causes the error:
Error using cellfun
Unknown option.
Thanks for any suggestions on how to figure out the error.
I've also posted this to Usenet.
EDUCATIONAL AFTERNOTE: This is for those who don't have Matlab at home and end up bouncing back and forth between Matlab and Octave. I asked above why cellfun doesn't accept 'ischar' as its first argument. The answer turns out to be that the argument must be a function handle in Matlab, so you really need to pass @ischar. There are some functions whose names can be passed as strings, for backward compatibility, but ischar is not one of them.
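A minimal sketch of the workaround described in the question, now using a function handle: replace every non-char cell with '' and then search the resulting cell array of character vectors. The names isText, textOnly and hits are illustrative only.

```matlab
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};

% cellfun needs a function handle for ischar
isText = cellfun(@ischar, mixedCellArray);

% Work on a copy in which every non-char cell is replaced by ''
textOnly = mixedCellArray;
textOnly(~isText) = {''};

% On a cell array of char vectors, strfind returns one cell per element
hits = ~cellfun(@isempty, strfind(textOnly, 'dp'));
[row, col] = find(hits)
```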

How about this one-liner:
>> mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
>> index = cellfun(@(c) ischar(c) && ~isempty(strfind(c, 'dp')), mixedCellArray)
index =
3×3 logical array
1 0 0
0 0 0
0 1 1
You could get by without the ischar(c) && ..., but you will likely want to keep it there since strfind will implicitly convert any numeric values/arrays into their equivalent ASCII characters to do the comparison. That means you could get false positives, as in this example:
>> C = {65, 'A'; 'BAD' [66 65 68]} % Note there's a vector in there
C =
2×2 cell array
[ 65] 'A'
'BAD' [1×3 double]
>> index = cellfun(@(c) ~isempty(strfind(c, 'A')), C) % Removed ischar(c) &&
index =
2×2 logical array
1 1 % They all match!
1 1

Just use a loop, testing with ischar and contains (added in R2016b). The various *funs are basically loops and, in general, do not offer any performance advantage over the explicit loop.
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
querystr = 'dp';
test = false(size(mixedCellArray));
for ii = 1:numel(mixedCellArray)
if ischar(mixedCellArray{ii})
test(ii) = contains(mixedCellArray{ii}, querystr);
end
end
Which returns:
test =
3×3 logical array
1 0 0
0 0 0
0 1 1
Edit:
If you don't have a MATLAB version with contains, you can substitute a regex:
test(ii) = ~isempty(regexp(mixedCellArray{ii}, querystr, 'once'));

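z = cellfun(@(x)strfind(x,'dp'), mixedCellArray, 'un', 0);   % match positions in each cell
idx = cellfun(@(x)x>0, z, 'un', 0);                          % empty where there was no match
find(~cellfun(@isempty, idx))                                % linear indices of matching cells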

Here is a solution from the usenet link in my original post:
>> mixedCellArray = {
'adpo' 2134 []
0 [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
}
mixedCellArray =
'adpo' [2134] []
[ 0] [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
>> ~cellfun( @isempty , ...
cellfun( @(x)strfind(x,'dp') , ...
mixedCellArray , ...
'uniform',0) ...
)
ans =
1 0 0
0 0 0
0 1 1
The inner cellfun is able to apply strfind even to numerical cells because, I presume, Matlab treats numerical arrays and strings the same way: a string is just an array of numbers representing the character codes. The outer cellfun identifies all cells for which the inner cellfun found no match (an empty result), and the prefix tilde negates that, giving all cells for which there was a match.
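The point that a character array is an array of character codes can be checked directly. This small sketch only illustrates the conversion between char and double; it does not show how a particular MATLAB release treats numeric inputs to strfind.

```matlab
codes = double('dp')      % character codes: 'd' is 100, 'p' is 112
back = char([100 112])    % and back to text
% Arithmetic on a char array operates on the codes and returns double
shifted = 'dp' + 1
```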
Thanks to dpb.

Give each variable a name based on an already existing logical-ID vector (MATLAB)

I have length(C) variables. Each index represents a unique type of variable in my optimization model, e.g. whether it is electricity generation, transmission line capacity, etc.
I also have a logical vector with the same length as C (all variables) indicating whether a variable is, e.g., generation:
% length(genoidx)=length(C), i.e. the number of variables
genoidx = [1 1 1 1 1 1 0 0 ... 1 1 1 1 1 1 0 0]
In this case, there are 6 generators in 2 time steps, amounting to 12 variables.
I want to name each variable to get a better overview of the output from the optimization model, e.g. like this:
% This is only a try on pseudo coding
varname = cell(length(C),1)
varname(genoidx) = 'geno' (1 2 3 4 5 6 ... 1 2 3 4 5 6)
varname(lineidx) = 'line' (...
Any suggestions on how to name the variables in C with string and number, based on logical ID-vector?
Thanks!

Using dynamic names may be OK for seeing the results of a calculation in the workspace, but I wouldn't use them if any code is ever going to read them.
You can use the assignin('base', ...) function to do this.
I'm not quite sure what your pseudo code is attempting to do, but you could do something like:
>> varname={'aaa','bbb','ccc','ddd'}
varname =
'aaa' 'bbb' 'ccc' 'ddd'
>> genoidx=logical([1,0,1,1])
genoidx =
1 0 1 1
>> assignin('base', sprintf('%s_',varname{genoidx}), 22)
which would create the variable aaa_ccc_ddd_ in the workspace and assign the number 22 to it.
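Alternatively, you could use an expression like:
sum(genoidx.*(length(genoidx):-1:1))
to reduce the logical vector to a single number and use it to index a cell array of bespoke names (note that this weighted sum is not a binary-to-decimal conversion, so different vectors can give the same number; for example, [1 0 0 0] and [0 1 0 1] both give 4):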
>> varname={'aaa','bbb','ccc','ddd','eee','fff','ggg','hhh'}
varname =
'aaa' 'bbb' 'ccc' 'ddd' 'eee' 'fff' 'ggg' 'hhh'
>> assignin('base', varname{sum(genoidx.*(length(genoidx):-1:1))}, 33)
which would create the variable ggg and assign 33 to it.
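If the goal is just readable labels for the output, as the pseudo code in the question suggests, a cell array of names avoids dynamic variables altogether. A minimal sketch, assuming (as in the question) 6 generators per time step and using a hypothetical expanded genoidx without the ... elision; numGen, genPos and genNum are illustrative names.

```matlab
% Hypothetical logical ID vector: 6 generators plus 2 other variables,
% repeated for 2 time steps (16 variables in total)
genoidx = logical([1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 0]);
numGen = 6;                                  % generators per time step (assumption)

varname = cell(numel(genoidx), 1);           % one label per variable in C
genPos = find(genoidx);                      % positions of the generation variables
genNum = mod(0:numel(genPos)-1, numGen) + 1; % 1..6, then 1..6 again
varname(genPos) = arrayfun(@(n) sprintf('geno%d', n), genNum, ...
                           'UniformOutput', false);
disp(varname)
```

Other logical vectors, such as the lineidx from the pseudo code, could fill the remaining entries in the same way.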

Matlab: How do I check whether a string's length exceeds a certain number

I want to check whether each string is longer than 20 characters, returning 1 if it is and 0 otherwise, in matrix form [n x 1]. But right now I get a [1x1] answer. How do I modify the if-else statement in my code to get this?
str = {'http://www.mathworks.com/matlabcentral/newsreader/view_thread/324182',
'http://jitkomut.lecturer.eng.chula.ac.th/matlab/text.html',
'http://www.ee.ic.ac.uk/pcheung/teaching/ee2_signals/Introduction%20to%20Matlab2.pdf'};
a = cellfun(@length,str)
if a > 20
'1'
else
'0'
end
Output:
a =
68
57
83
ans =
1
I want the output of, lets say
ans =
1
1
1

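In Matlab, you can simply use a logical comparison (no if statement is needed). The transpose assumes a is a row vector, as in the output shown below; if a is already a column, drop it:
a = cellfun(@length,str)
(a>20)'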
This will give you:
a =
68 57 83
ans =
1
1
1

As @herohuyongtao mentions, you don't actually need an if. An if statement on an array is true only when the array is non-empty and all of its elements are nonzero, so it runs just one branch and gives you only a single value.
But you could actually do this all in your cellfun by using an anonymous function:
cellfun(@(x)(length(x) > 20), str)
And get the result in one shot.
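A sketch of the anonymous-function approach with a column cell array like the one in the question; the third string is a hypothetical short one added so that the result contains a 0. Because the anonymous function returns a logical scalar for each cell, cellfun returns an array with the same shape as str, so no transpose is needed here.

```matlab
% Column cell array as in the question; the last string is a short,
% hypothetical one added so that the result contains a 0
str = {'http://www.mathworks.com/matlabcentral/newsreader/view_thread/324182';
       'http://jitkomut.lecturer.eng.chula.ac.th/matlab/text.html';
       'short.html'};

% One logical scalar per cell, so isLong has the same 3 x 1 shape as str
isLong = cellfun(@(x) length(x) > 20, str)

% Convert to double if numeric 1/0 values are preferred
isLongNum = double(isLong)
```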

As there is no equivalent of the C ternary operator (?:) in MATLAB, you can use the following two statements to replace your if-then-else statement and achieve what you ask for:
b(a==a)='0'
b(a>20)='1'
The first line initializes the result array, so that every element of b defaults to the value of the else branch, i.e. '0';
the second line changes the elements for which the condition a > 20 holds to the value of the then branch, i.e. '1'.
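A variant of the two statements above, sketched with hypothetical lengths in a: initializing b with repmat fixes its size and orientation to match a explicitly, rather than relying on how indexed assignment creates b.

```matlab
a = [68; 57; 15];            % hypothetical string lengths, n x 1
b = repmat('0', size(a));    % "else" value for every element, same shape as a
b(a > 20) = '1';             % "then" value where the condition holds
disp(b)
```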
If the output values are boolean, you can simply do:
(a>20)
as @herohuyongtao suggested, or use @Dan's answer.

How can I filter my array of numbers in Matlab/Octave?

I have a very trivial example where I'm trying to filter by matching a string:
A = [0:1:999];
B = A(int2str(A) == '999');
This works:
A(A > 990);
This also works:
int2str(5) == '5'
I just can't figure out why I cannot put the two together. I get an error about nonconformant arguments.

int2str(A) produces a very long char array (of size 1 x 4996) containing the string representations of all those numbers (including spacing) appended together end to end.
int2str(A) == '999'
So, in the statement above, you're trying to compare a matrix of size 1 x 4996 with another of size 1 x 3. This, of course, fails as the two either need to be of the same size, or at least one needs to be a scalar, in which case scalar expansion rules apply.
A(A > 990);
The above works because of logical indexing: the result consists of the elements of A at the indices for which the condition holds true.
int2str(5) == '5'
This only works because the result of the int2str call is a 1 x 1 matrix ('5') and you're comparing it to another matrix of the same size. Try int2str(555) == '55' and it'll fail with the same error as above.
I'm not sure what result you expected from the original statements, but maybe you're looking for this:
A = [0:1:999];
B = int2str(A(A == 999)) % outputs '999'

I am not sure that the int2str() conversion is what you are looking for. (Also, why do you need to convert numbers to strings and then carry out a char comparison?)
Suppose you have a simpler case:
A = 1:3;
strA = int2str(A)
strA =
1 2 3
Note that this is a 1x7 char array. Thus, comparing it against a scalar char:
strA == '2'
ans =
0 0 0 1 0 0 0
Now, you might want to transpose A and carry out the comparison:
int2str(A')=='2'
ans =
0
1
0
However, this approach will not work if the numbers do not all have the same number of digits, because shorter numbers will be padded with spaces (try creating A = 1:10 and comparing against '2').
Instead, create a cell array of strings without whitespace and use strcmp():
csA = arrayfun(@int2str,A','un',0)
csA =
'1'
'2'
'3'
strcmp('2',csA)

It should be much faster, and more correct, to turn the string into a number rather than the other way around. Try:
B = A(A == str2double ('999'));
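Putting the pieces together, a sketch that filters A both ways: through a cell array of unpadded strings compared with strcmp, and through the numeric comparison with str2double. B and Bnum are illustrative names; both should give the same result.

```matlab
A = 0:999;

% One unpadded string per element, so '999' is compared as a whole
csA = arrayfun(@int2str, A, 'UniformOutput', false);
B = A(strcmp(csA, '999'))

% Numeric comparison: convert the query string once instead of every element
Bnum = A(A == str2double('999'))
```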
