Find substring in cell array of numbers and strings - matlab

I have a cell array consisting of numbers, strings, and empty arrays. I want to find the position (linear or indexed) of all cells containing a string in which a certain substring of interest appears.
mixedCellArray = {
'adpo' 2134 []
0 [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
}
If the substring of interest is 'dp', then I should get the indices for three cells.
The only solutions I can find work when the cell array contains only strings:
http://www.mathworks.com/matlabcentral/answers/2015-find-index-of-cells-containing-my-string
http://www.mathworks.com/matlabcentral/newsreader/view_thread/255090
One work-around is to find all cells not containing strings, and fill them with '', as hinted by this posting. Unfortunately, my approach requires a variation of that solution, probably something like cellfun('ischar',mixedCellArray). This causes the error:
Error using cellfun
Unknown option.
Thanks for any suggestions on how to figure out the error.
I've posted this to usenet
EDUCATIONAL AFTERNOTE: For those who don't have Matlab at home, and end up bouncing back and forth between Matlab and Octave. I asked above why cellfun doesn't accept 'ischar' as its first argument. The answer turns out to be that the argument must be a function handle in Matlab, so you really need to pass #ischar. There are some functions whose names can be passed as strings, for backward compatibility, but ischar is not one of them.

How about this one-liner:
>> mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
>> index = cellfun(#(c) ischar(c) && ~isempty(strfind(c, 'dp')), mixedCellArray)
index =
3×3 logical array
1 0 0
0 0 0
0 1 1
You could get by without the ischar(c) && ..., but you will likely want to keep it there since strfind will implicitly convert any numeric values/arrays into their equivalent ASCII characters to do the comparison. That means you could get false positives, as in this example:
>> C = {65, 'A'; 'BAD' [66 65 68]} % Note there's a vector in there
C =
2×2 cell array
[ 65] 'A'
'BAD' [1×3 double]
>> index = cellfun(#(c) ~isempty(strfind(c, 'A')), C) % Removed ischar(c) &&
index =
2×2 logical array
1 1 % They all match!
1 1

Just use a loop, testing with ischar and contains (added in R2016b). The various *funs are basically loops and, in general, do not offer any performance advantage over the explicit loop.
mixedCellArray = {'adpo' 2134 []; 0 [] 'daesad'; 'xxxxx' 'dp' 'dpdpd'};
querystr = 'dp';
test = false(size(mixedCellArray));
for ii = 1:numel(mixedCellArray)
if ischar(mixedCellArray{ii})
test(ii) = contains(mixedCellArray{ii}, querystr);
end
end
Which returns:
test =
3×3 logical array
1 0 0
0 0 0
0 1 1
Edit:
If you don't have a MATLAB version with contains you can substitute a regex:
test(ii) = ~isempty(regexp(mixedCellArray{ii}, querystr, 'once'));

z=cellfun(#(x)strfind(x,'dp'),mixedCellArray,'un',0);
idx=cellfun(#(x)x>0,z,'un',0);
find(~cellfun(#isempty,idx))

Here is a solution from the usenet link in my original post:
>> mixedCellArray = {
'adpo' 2134 []
0 [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
}
mixedCellArray =
'adpo' [2134] []
[ 0] [] 'daesad'
'xxxxx' 'dp' 'dpdpd'
>> ~cellfun( #isempty , ...
cellfun( #(x)strfind(x,'dp') , ...
mixedCellArray , ...
'uniform',0) ...
)
ans =
1 0 0
0 0 0
0 1 1
The inner cellfun is able to apply strfind to even numerical cells because, I presume, Matlab treats numerical arrays and strings the same way. A string is just an array of numbers representing the character codes. The outer cellfun identifies all cells for which the inner cellfun found a match, and the prefix tilde turns that into all cells for which there was NO match.
Thanks to dpb.

Related

find string in non-scalar structure matlab

Here's a non-scalar structure in matlab:
clearvars s
s=struct;
for id=1:3
s(id).wa='nko';
s(id).test='5';
s(id).ad(1,1).treasurehunt='asdf'
s(id).ad(1,2).treasurehunt='as df'
s(id).ad(1,3).treasurehunt='foobar'
s(id).ad(2,1).treasurehunt='trea'
s(id).ad(2,2).treasurehunt='foo bar'
s(id).ad(2,3).treasurehunt='treasure'
s(id).ad(id,4).a=magic(5);
end
is there an easy way to test if the structure s contains the string 'treasure' without having to loop through every field (e.g. doing a 'grep' through the actual content of the variable)?
The aim is to see 'quick and dirtily' whether a string exists (regardless of where) in the structure. In other words (for Linux users): I'd like to use 'grep' on a matlab variable.
I tried arrayfun(#(x) any(strcmp(x, 'treasure')), s) with no success, output:
ans =
1×3 logical array
0 0 0
One general approach (applicable to any structure array s) is to convert your structure array to a cell array using struct2cell, test if the contents of any of the cells are equal to the string 'treasure', and recursively repeat the above for any cells that contain structures. This can be done in a while loop that stops if either the string is found or there are no structures left to recurse through. Here's the solution implemented as a function:
function found = string_hunt(s, str)
c = reshape(struct2cell(s), [], 1);
found = any(cellfun(#(v) isequal(v, str), c));
index = cellfun(#isstruct, c);
while ~found && any(index)
c = cellfun(#(v) {reshape(struct2cell(v), [], 1)}, c(index));
c = vertcat(c{:});
found = any(cellfun(#(c) isequal(c, str), c));
index = cellfun(#isstruct, c);
end
end
And using your sample structure s:
>> string_hunt(s, 'treasure')
ans =
logical
1 % True!
This is one way to avoid an explicit loop
% Collect all the treasurehunt entries into a cell with strings
s_cell={s(1).ad.treasurehunt, s(2).ad.treasurehunt, s(3).ad.treasurehunt};
% Check if any 'treasure 'entries exist
find_treasure=nonzeros(strcmp('treasure', s_cell));
% Empty if none
if isempty(find_treasure)
disp('Nothing found')
else
disp(['Found treasure ',num2str(length(find_treasure)), ' times'])
end
Note that you can also just do
% Collect all the treasurehunt entries into a cell with strings
s_cell={s(1).ad.treasurehunt, s(2).ad.treasurehunt, s(3).ad.treasurehunt};
% Check if any 'treasure 'entries exist
find_treasure=~isempty(nonzeros(strcmp('treasure', s_cell)));
..if you're not interested in the number of occurences
Depending on the format of your real data, and if you can find strings that contain your string:
any( ~cellfun('isempty',strfind( arrayfun( #(x)[x.ad.treasurehunt],s,'uni',0 ) ,str)) )

Discriminate between empty and non empty regexp matches in Matlab

I'm trying to match some strings in Matlab and create a new table from the matches.
The variable txt contains:
Columns 1 through 4
'Time' 'LR1R2' 'LR1R2_SD' 'LR1R2_I'
Columns 5 through 8
'LR1R2_SD' 'R1' 'R1_SD' 'R1_I'
Columns 9 through 12
'R1_I_SD' 'R2' 'R2_SD' 'R2_I'
Column 13
'R2_I_SD'
And I want to select all those with '_SD' on the end of the string
pattern='_SD';
match=regexp(txt,pattern)
which returns:
match =
Columns 1 through 8
[] [] [6] [] [6] [] [3] []
Columns 9 through 13
[5] [] [3] [] [5]
Does anybody know how to discriminate between the empty and non empty matches? My aim is to build a new table from the matches. Here is what I've tried
for i=match,
~isempty(i)
end
But this returns true for everything.
Thanks
The regexp function returns a cell array, where each cell contains either an empty array (i.e. []), or a number (e.g. [6]). To go through all cells of this cell array, you can use the cellfun function and apply the isempty function to each cell:
~cellfun(#isempty,match)
which returns
ans =
0 0 1 0 1 0 1 0 1 0 1 0 1
As #Divakar correctly remarks, using
~cellfun('isempty',match)
is much faster.
When the command is run 100'000 times, I measured the following run times:
With #isempty:
Elapsed time is 0.757626 seconds.
With 'isempty':
Elapsed time is 0.118241 seconds.
Note that this syntax is not available for all functions. From the MATLAB documentation on cellfun:
cellfun accepts function name strings for function func, rather than a
function handle, for these function names: isempty, islogical, isreal,
length, ndims, prodofsize, size, isclass. Enclose the function name in
single quotes.
The answer I was looking for is something like:
for i=1:length(match),
if ~isequal(match(i),{[]}),
num(:,i)
end
end
As hbaderts suggested the following is also a way to do this:
~cellfun(#isempty,match)

Matlab: How do I check the length string got more than certain number

I want to check the length of string got more than 20 characters, if more than 20 then will return 1 else return 0 in matrix form [n x 1]. But now, I get the answer of [1x1]. How do I modify my code in if-else statement to get the ans?
str = {'http://www.mathworks.com/matlabcentral/newsreader/view_thread/324182',
'http://jitkomut.lecturer.eng.chula.ac.th/matlab/text.html',
'http://www.ee.ic.ac.uk/pcheung/teaching/ee2_signals/Introduction%20to%20Matlab2.pdf'};
a = cellfun(#length,str)
if a > 20
'1'
else
'0'
end
Output:
a =
68
57
83
ans =
1
I want the output of, lets say
ans =
1
1
1
In Matlab, you can simply use (no if statement is needed):
a = cellfun(#length,str)
(a>20)'
This will give you:
a =
68 57 83
ans =
1
1
1
As #herohuyongtao mentions, you don't actually need an if, the if will only consider the first element of the matrix that it returns, hence giving you only a single value.
But you could actually do this all in your cellfun by using an anonymous function:
cellfun(#(x)(length(x) > 20), str)
And get the result in one shot.
As there is no equivalent of the c ternary operator (?:) in matlab, you can use the following two statements to replace your if then else statement, and achieve what you ask for:
b(a==a)='0'
b(a>20)='1'
The first line initializes the result array, where all value b defaults to the value of the else branch, i.e. '0',
the second line changes the elements for which the conditional > 20 holds to the value in the then branch, i.e. '1'.
If the output values are boolean, you can simply do:
(a>20)
as #herohuyongtao suggested or use #Dan's answer.

How can I filter my array of numbers in Matlab/Octave?

I have a very trivial example where I'm trying to filter by matching a String:
A = [0:1:999];
B = A(int2str(A) == '999');
This
A(A > 990);
works
This
int2str(5) == '5'
also works
I just can't figure out why I cannot put the two together. I get an error about nonconformant arguments.
int2str(A) produces a very long char array (of size 1 x 4996) containing the string representations of all those numbers (including spacing) appended together end to end.
int2str(A) == '999'
So, in the statement above, you're trying to compare a matrix of size 1 x 4996 with another of size 1 x 3. This, of course, fails as the two either need to be of the same size, or at least one needs to be a scalar, in which case scalar expansion rules apply.
A(A > 990);
The above works because of logical indexing rules, the result will be the elements from the indices of A for which that condition holds true.
int2str(5) == '5'
This only works because the result of the int2str call is a 1 x 1 matrix ('5') and you're comparing it to another matrix of the same size. Try int2str(555) == '55' and it'll fail with the same error as above.
I'm not sure what result you expected from the original statements, but maybe you're looking for this:
A = [0:1:999];
B = int2str(A(A == 999)) % outputs '999'
I am not sure that the int2str() conversion is what you are looking for. (Also, why do you need to convert numbers to strings and then carry out a char comparison?)
Suppose you have a simpler case:
A = 1:3;
strA = int2str(A)
strA =
1 2 3
Note that this is a 1x7 char array. Thus, comparing it against a scalar char:
strA == '2'
ans =
0 0 0 1 0 0 0
Now, you might wanna transpose A and carry out the comparison:
int2str(A')=='2'
ans =
0
1
0
however, this approach will not work if the number of digits of each number is not the same because lower numbers will be padded with spaces (try creating A = 1:10 and comparing against '2').
Then, create a cell array of string without whitespaces and use strcmp():
csA = arrayfun(#int2str,A','un',0)
csA =
'1'
'2'
'3'
strcmp('2',csA)
Should be much faster, and correct to turn the string into a number, than the other way around. Try
B = A(A == str2double ('999'));

Trouble identifying space char with isspace

Help! For some reason my function is not identifying spaces on all sets of data. See below:
I am using the following function in my code:
function [ll]=f_get_length(A)
l1=length(A);
for ii=1:l1
if A(ii) == ' '
ll=ii;
break
end
end
but I get to a dataset that gives the following error:
Error in ==> f_get_length at 3
l1=length(A);
??? Output argument "ll" (and maybe others) not assigned during call to
"/home/geovault-01/abutcher/scripts/meghans_codes/SdP_codes/3DKirchhof/f_get_length.m>f_get_length".
Error in ==> process_sacdataSP10_PICASSO at 62
ll=f_get_length(SS);
When I try to figure out the problem, I discovered that the space is not being identified as a space when using isspace, but the following proves that there are spaces after the 4th character:
strtrim(A)
ans =
CAVN
length(A)
ans =
8
display(['test' A(6) 'test'])
test test
display(['test' A(5) 'test'])
test test
display(['test' A(4) 'test'])
testNtest
display(['test' A(7) 'test'])
test test
display(['test' A(8) 'test'])
test test
length(A)
ans =
8
strtrim(A)
ans =
CAVN
length(A)
ans =
8
isspace(A(6))
ans =
0
isspace(A)
ans =
0 0 0 0 0 0 0 0
If there is no space in your input A, the output ll is not assigned.
To solve that problem, you should add at the end of your function:
ll=l1;
Instead of using isspace or the much more limited A(ii) == ' ', you could use the condition
A(ii) <= ' ' || isspace(A(ii))
The first part will take care of all non-printing ASCII characters (e.g., everything in ASCII that comes before space), and isspace will then take care of the remaining non-printing Unicode characters.
Alternatively, you could use
if ~isletter(A(ii)), break; end
and, as Oli already indicated, make sure that ll gets assigned in all possible paths. Meaning, your function will fail if A = '' (empty) or A = char(1) (NULL character) or A = ' THE_REAL_STRING' (it will return length 1 because of leading space), etc.
Moreover, you can vectorize the whole thing like so:
ll = find(A <= ' ' | isspace(A), 1);
making the whole loop unnecessary.