MATLAB: comparison of cell arrays of string

MATLAB: comparison of cell arrays of string - matlab

I have two cell arrays of strings, and I want to check if they contain the same strings (they do not have to be in the same order, nor do we know if they are of the same lengths).
For example:
a = {'2' '4' '1' '3'};
b = {'1' '2' '4' '3'};
or
a = {'2' '4' '1' '3' '5'};
b = {'1' '2' '4' '3'};
First I thought of strcmp but it would require looping over one cell contents and compare against the other. I also considered ismember by using something like:
ismember(a,b) & ismember(b,a)
but then we don't know in advance that they are of the same length (obvious case of unequal). So how would you perform this comparison in the most efficient way without writing too many cases of if/else.

You could use the function SETXOR, which will return the values that are not in the intersection of the two cell arrays. If it returns an empty array, then the two cell arrays contain the same values:
arraysAreEqual = isempty(setxor(a,b));
EDIT: Some performance measures...
Since you were curious about performance measures, I thought I'd test the speed of my solution against the two solutions listed by Amro (which use ISMEMBER and STRCMP/CELLFUN). I first created two large cell arrays:
a = cellstr(num2str((1:10000).')); %'# A cell array with 10,000 strings
b = cellstr(num2str((1:10001).')); %'# A cell array with 10,001 strings
Next, I ran each solution 100 times over to get a mean execution time. Then, I swapped a and b and reran it. Here are the results:
Method | Time | a and b swapped
---------------+---------------+------------------
Using SETXOR | 0.0549 sec | 0.0578 sec
Using ISMEMBER | 0.0856 sec | 0.0426 sec
Using STRCMP | too long to bother ;)
Notice that the SETXOR solution has consistently fast timing. The ISMEMBER solution will actually run slightly faster if a has elements that are not in b. This is due to the short-circuit && which skips the second half of the calculation (because we already know a and b do not contain the same values). However, if all of the values in a are also in b, the ISMEMBER solution is significantly slower.

You can still use ISMEMBER function like you did with a small modification:
arraysAreEqual = all(ismember(a,b)) && all(ismember(b,a))
Also, you can write the loop version with STRCMP as one line:
arraysAreEqual = all( cellfun(#(s)any(strcmp(s,b)), a) )
EDIT: I'm adding a third solution adapted from another SO question:
g = grp2idx([a;b]);
v = all( unique(g(1:numel(a))) == unique(g(numel(a)+1:end)) );
In the same spirit, Im performed the time comparison (using the TIMEIT function):
function perfTests()
a = cellstr( num2str((1:10000)') ); %#' fix SO highlighting
b = a( randperm(length(a)) );
timeit( #() func1(a,b) )
timeit( #() func2(a,b) )
timeit( #() func3(a,b) )
timeit( #() func4(a,b) )
end
function v = func1(a,b)
v = isempty(setxor(a,b)); %# #gnovice answer
end
function v = func2(a,b)
v = all(ismember(a,b)) && all(ismember(b,a));
end
function v = func3(a,b)
v = all( cellfun(#(s)any(strcmp(s,b)), a) );
end
function v = func4(a,b)
g = grp2idx([a;b]);
v = all( unique(g(1:numel(a))) == unique(g(numel(a)+1:end)) );
end
and the results in the same order of functions (lower is better):
ans =
0.032527
ans =
0.055853
ans =
8.6431
ans =
0.022362

Take a look at the function intersect
What MATLAB Help says:
[c, ia, ib] = intersect(a, b) also
returns column index vectors ia and ib
such that c = a(ia) and b(ib) (or c =
a(ia,:) and b(ib,:)).

Related

How to find an matching element (either number or string) in a multi level cell?

I am trying to search a cell of cell arrays for a matching number (for example, 2) or string ('text'). Example for a cell:
A = {1 {2; 3};4 {5 'text' 7;8 9 10}};
There is similar question. However, this solution works only, if you want to find a number value in cell. I would need a solution as well for numbers as for strings.
The needed output should be 1 or 0 (the value is or is not in the cell A) and the cell level/deepness where the matched element was found.

For your example input, you can match character vectors as well as numbers by replacing ismember in the linked solution with isequal. You can get the depth at which the search value was found by tracking how many times the function has to go round the while loop.
function [isPresent, depth] = is_in_cell(cellArray, value)
depth = 1;
f = #(c) isequal(value, c);
cellIndex = cellfun(#iscell, cellArray);
isPresent = any(cellfun(f, cellArray(~cellIndex)));
while ~isPresent
depth = depth + 1;
cellArray = [cellArray{cellIndex}];
cellIndex = cellfun(#iscell, cellArray);
isPresent = any(cellfun(f, cellArray(~cellIndex)));
if ~any(cellIndex)
break
end
end
end
Using isequal works because f is only called for elements of cellArray that are not themselves cell arrays. Use isequaln if you want to be able to search for NaN values.
Note this now won't search inside numeric, logical or string arrays:
>> A = {1 {2; 3};4 {5 'text' 7;8 9 [10 11 12]}};
>> is_in_cell(A, 10)
ans =
logical
0
If you want that, you can define f as
f = #(c) isequal(value, c) || isequal(class(value), class(c)) && ismember(value, c);
which avoids calling ismember with incompatible data types, because of the 'short-circuiting' behaviour of || and &&. This last solution is still a bit inconsistent in how it matches strings with character vectors, just in case that's important to you - see if you can figure out how to fix that.

Fastest way of finding the only index of vector b where array A(i,j) == b

I have 2 big arrays A and b:
A: 10.000++ rows, 4 columns, not unique integers
b: vector with 500.000++ elements, unique integers
Due to the uniqueness of the values of b, I need to find the only index of b, where A(i,j) == b.
What I started with is
[rows,columns] = size(A);
B = zeros(rows,columns);
for i = 1 : rows
for j = 1 : columns
B(i,j) = find(A(i,j)==b,1);
end
end
This takes approx 5.5 seconds to compute, which is way to long, since A and b can be significantly bigger... That in mind I tried to speed up the code by using logical indexing and reducing the for-loops
[rows,columns] = size(A);
B = zeros(rows,columns);
for idx = 1 : numel(b)
B(A==b(idx)) = idx;
end
Sadly this takes even longer: 21 seconds
I even tried to do use bsxfun
for i = 1 : columns
[I,J] = find(bsxfun(#eq,A(:,i),b))
... stitch B together ...
end
but with a bigger arrays the maximum array size is quickly exceeded (102,9GB...).
Can you help me find a faster solution to this? Thanks in advance!
EDIT: I extended find(A(i,j)==b,1), which speeds up the algorithm by factor 2! Thank you, but overall still too slow... ;)

The function ismember is the right tool for this:
[~,B] = ismember(A,b);
Test code:
function so
A = rand(1000,4);
b = unique([A(:);rand(2000,1)]);
B1 = op1(A,b);
B2 = op2(A,b);
isequal(B1,B2)
tic;op1(A,b);op1(A,b);op1(A,b);op1(A,b);toc
tic;op2(A,b);op2(A,b);op2(A,b);op2(A,b);toc
end
function B = op1(A,b)
B = zeros(size(A));
for i = 1:numel(A)
B(i) = find(A(i)==b,1);
end
end
function B = op2(A,b)
[~,B] = ismember(A,b);
end
I ran this on Octave, which is not as fast with loops as MATLAB. It also doesn't have the timeit function, hence the crappy timing using tic/toc (sorry for that). In Octave, op2 is more than 100 times faster than op1. Timings will be different in MATLAB, but ismember should still be the fastest option. (Note I also replaced your double loop with a single loop, this is the same but simpler and probably faster.)
If you want to repeatedly do the search in b, it is worthwhile to sort b first, and implement your own binary search. This will avoid the checks and sorting that ismember does. See this other question.

Assuming that you have positive integers you can use array indexing:
mm = max(max(A(:)),max(b(:)));
idxs = sparse(b,1,1:numel(b),mm,1);
result = full(idxs(A));
If the range of values is small you can use dense matrix instead of sparse matrix:
mm = max(max(A(:)),max(b(:)));
idx = zeros(mm,1);
idx(b)=1:numel(b);
result = idx(A);

How do I assign the output from a Matlab function to a variable in a single line?

Is there a way in matlab to assign the output from a function in matlab to a vector within a single line?
For example, this function should assign a perimeter and an area value
function [p,a] = square_geom(side)
p = perimeter_square(side)
a = side.^2
[p,a]
however when I try to store the data like this
v = square_geom(3)
It doesn't work. However it is possible to do it like this
[p,a] = square_geom(3)
v=[p,a]
But that just doesn't look as good as keeping it to a single line. Any ideas?

You can change the definition of your function by using varargout as output variable:
Edit
Updated the definition of the function to include the check on the number of output
function varargout = square_geom(side)
p = 3;% %perimeter_square(side)
a = side.^2;
v=[p,a];
switch(nargout)
case 0 disp('No output specified, array [p,a] returned')
varargout{1}=v;
case 1 varargout{1}=v;
case 2 varargout{1}=p;
varargout{2}=a;
case 3 varargout{1}=v;
varargout{2}=p;
varargout{3}=a;
otherwise disp(['Error: too many (' num2str(nargout) ') output specified'])
disp('array [p,a,NaN, NaN ...] returned (NaN for each extra output)')
varargout{1}=v;
varargout{2}=p;
varargout{3}=a;
for i=4:nargout
varargout{i}=NaN
end
end
This allows you either calling your function in several ways
square_geom(3)
v=square_geom(3)
[a,b]=square_geom(3)
[a,b,d]=square_geom(3)
[a,b,d,e,f,g]=square_geom(3)
In the first case you get the array v as the automatic variable ans
square_geom(3)
No output specified, array [p,a] returned
ans =
3 9
In the second case, you get the array v
v=square_geom(3)
v =
3 9
In the third case, you get the two variables
[a,b]=square_geom(3)
a = 3
b = 9
In the fourth case, you get the array v and the two sigle variables a and b
[v,b,d]=square_geom(3)
v =
3 9
b = 3
d = 9
In the latter case (too many output specified), you get the array v, the two single variables a and b and the exceeding variables (e, f and g) set to NaN
[v,b,d,e,f,g]=square_geom(3)
Error: too many (6) output specified
array [p,a,NaN, NaN ...] returned (NaN for each extra output)
v =
3 9
b = 3
d = 9
e = NaN
f = NaN
g = NaN
Notice, to thest the code I've modified the function byh replacing the call toperimeter_square with 3

The value outputted by the function are two different variables, not an array.
So the output should be stored as two cells.
[v(1),v(2)]=fn(val)
Here, the value is stored in two different cell v(1) and v(2) of the same array 'v'.

The simplest solution that comes into my mind is:
c = test();
function result = test()
a = ...; % complex computation
b = ...; % complex computation
result = [a b];
end
The C variable will then be a column vector containing two values. But this solution is only suitable if your function output doesn't need to be flexible.

You can get your multiple outputs in a cell array.
In your case:
v=cell(1,2);
[v{:}]=square_geom(3)

Employing conditions in cellfun MATLAB

The questions is:
A=[9 10];
And I want to obtain B={'09','10'};
I made this:
for hij=1:size(A,1)
if A{hij}<9
B{hij}=strcat('0',num2str(A{hij}),'');
else
B{hij}=strcat('',num2str(A{hij}),'');
end
end
But I was wondering if there is any posibility to make that without using a loop, maybe using a "cellfun"; thanks!

Is this what you want?
>> B = num2str(A(:),'%02d'); %// second argument to num2str is format spec
B =
09
10
This gives a string matrix B. To convert B into a cell array of strings:
>> B = mat2cell(B,ones(1,size(B,1))).';
B =
'09' '10'
or, as noted by Divakar,
>> B = cellstr(B).';
B =
'09' '10'

Luis answer is good. Just for completeness, you can use arrayfun if you really want:
C = arrayfun(#num2str, A,'UniformOutput', 0 );

Best way to count all elements in a cell array?

I want to count all elements in a cell array, including those in "nested" cells.
For a cell array
>> C = {{{1,2},{3,4,5}},{{{6},{7},{8}},{9}},10}
C = {1x2 cell} {1x2 cell} [10]
The answer should be 10.
One way is to use [C{:}] repeatedly until there are no cells left and then use numel but there must be a better way?

Since you are only interested in the number of elements, here is a simplified version of flatten.m that #Ansari linked to:
function n = my_numel(A)
n = 0;
for i=1:numel(A)
if iscell(A{i})
n = n + my_numel(A{i});
else
n = n + numel(A{i});
end
end
end
The result:
>> C = {{{1,2},{3,4,5}},{{{6},{7},{8}},{9}},10};
>> my_numel(C)
ans =
10
EDIT:
If you are feeling lazy, we can let CELLPLOT do the counting:
hFig = figure('Visible','off');
num = numel( findobj(cellplot(C),'type','text') );
close(hFig)
Basically we create an invisible figure, plot the cell array, count how many "text" objects were created, then delete the invisible figure.
This is how the plot looks like underneath:

Put this in a function (say flatten.m) (code from MATLAB Central):
function C = flatten(A)
C = {};
for i=1:numel(A)
if(~iscell(A{i}))
C = [C,A{i}];
else
Ctemp = flatten(A{i});
C = [C,Ctemp{:}];
end
end
Then do numel(flatten(C)) to find the total number of elements.
If you don't like making a separate function, you can employ this clever (but nasty) piece of code for defining a flatten function using anonymous functions (code from here):
flatten = #(nested) feval(Y(#(flat) #(v) foldr(#(x, y) ifthenelse(iscell(x) | iscell(y), #() [flat(x), flat(y)], #() {x, y}), [], v)), nested);
Either way, you need to recurse to flatten the cell array then count.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

MATLAB: comparison of cell arrays of string - matlab

Take a look at the function intersect What MATLAB Help says: [c, ia, ib] = intersect(a, b) also returns column index vectors ia and ib such that c = a(ia) and b(ib) (or c = a(ia,:) and b(ib,:)).

Related

How to find an matching element (either number or string) in a multi level cell?

Fastest way of finding the only index of vector b where array A(i,j) == b

How do I assign the output from a Matlab function to a variable in a single line?

Employing conditions in cellfun MATLAB

Best way to count all elements in a cell array?

Categories

Resources