MATLAB: strmatch vs strcmp - matlab

If I'm using a char string as the needle and a cell array of chars as the haystack, would the following achieve the same results every time? I'm looking at their documentation, but I don't see anything that would suggest otherwise. I wanted to check with SO's community as well.
Basically,
k = strmatch('abc', cellArray, 'exact');
k2 = find(strcmp('abc', cellArray));
where cellArray is an Nx1 cell array of chars that has 'abc' values at arbitrary indices. For example, if cellArray has 'abc' at indices 10, 20, and 30, would the following be true every time, for any cellArray?
k = [10 20 30];
k2 = [10 20 30];
Also, if both methods return the same answers, when would you use strmatch over strcmp in this kind of search scenario (looking for a char string in a cell array of same data type)? strmatch is extremely slow, if anyone is wondering why I'm even asking.

No, the results will be different. The function strmatch returns a vector of indexes where the cell array (haystack) matches the string (needle):
>> arr = {'a', 'b', 'c', 'a', 'b'};
>> strmatch('a', arr, 'exact')
ans =
1
4
The strcmp function returns a logical vector, with 1s where the haystack matches and 0s where it doesn't match:
>> strcmp('a', arr)
ans =
1 0 0 1 0
On the other hand, the expression find(strcmp('a', arr)) is equivalent to strmatch('a', arr, 'exact').
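A minimal sketch of the find(strcmp(...)) form on a small column cell array (the sample data is made up for illustration). For an Nx1 cell array the result is a column of indices, and entries that merely start with the needle are not matched:

```matlab
cellArray = {'abc'; 'abcd'; 'xyz'; 'abc'};  % hypothetical sample data
mask = strcmp('abc', cellArray);            % logical column, true at rows 1 and 4
k2 = find(mask);                            % numeric indices: [1; 4]
```

In many cases the logical mask can be used for indexing directly, so the find call is only needed when the numeric positions themselves are required.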

strmatch is not recommended. Use strncmp or validatestring instead. strmatch will be removed from a future version of MATLAB.
*Warning given in MATLAB R2017a.
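If the prefix-matching behaviour of strmatch (without 'exact') is what you need, the strncmp suggestion from the warning could be applied roughly as follows. This is only a sketch with made-up data; needle and cellArray are illustrative names:

```matlab
needle = 'ab';
cellArray = {'abc'; 'xab'; 'ab'; 'b'};      % hypothetical sample data
% Compare only the first numel(needle) characters of each entry.
k = find(strncmp(needle, cellArray, numel(needle)));   % [1; 3]
```

Entries shorter than the needle, such as 'b', do not match.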

Related

How to find a matching element (either number or string) in a multi level cell?

I am trying to search a cell of cell arrays for a matching number (for example, 2) or string ('text'). Example for a cell:
A = {1 {2; 3};4 {5 'text' 7;8 9 10}};
There is a similar question. However, its solution only works if you want to find a numeric value in the cell. I need a solution that works for numbers as well as for strings.
The needed output should be 1 or 0 (the value is or is not in the cell A) and the cell level/depth at which the matched element was found.
For your example input, you can match character vectors as well as numbers by replacing ismember in the linked solution with isequal. You can get the depth at which the search value was found by tracking how many times the function has to go round the while loop.
function [isPresent, depth] = is_in_cell(cellArray, value)
depth = 1;
f = @(c) isequal(value, c);
cellIndex = cellfun(@iscell, cellArray);
isPresent = any(cellfun(f, cellArray(~cellIndex)));
while ~isPresent
depth = depth + 1;
cellArray = [cellArray{cellIndex}];
cellIndex = cellfun(@iscell, cellArray);
isPresent = any(cellfun(f, cellArray(~cellIndex)));
if ~any(cellIndex)
break
end
end
end
Using isequal works because f is only called for elements of cellArray that are not themselves cell arrays. Use isequaln if you want to be able to search for NaN values.
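A short illustration of why isequaln is needed for NaN searches (the variable names are only for illustration):

```matlab
r1 = isequal(NaN, NaN);              % false: NaN never compares equal to itself
r2 = isequaln(NaN, NaN);             % true: NaN values are treated as equal
r3 = isequaln({1, NaN}, {1, NaN});   % true as well for NaNs nested inside cells
```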
Note this now won't search inside numeric, logical or string arrays:
>> A = {1 {2; 3};4 {5 'text' 7;8 9 [10 11 12]}};
>> is_in_cell(A, 10)
ans =
logical
0
If you want that, you can define f as
f = @(c) isequal(value, c) || isequal(class(value), class(c)) && ismember(value, c);
which avoids calling ismember with incompatible data types, because of the 'short-circuiting' behaviour of || and &&. This last solution is still a bit inconsistent in how it matches strings with character vectors, just in case that's important to you - see if you can figure out how to fix that.

Vectorizing the Notion of Colon (:) - values between two vectors in MATLAB

I have two vectors, idx1 and idx2, and I want to obtain the values between them. If idx1 and idx2 were numbers and not vectors, I could do that the following way:
idx1=1;
idx2=5;
values=idx1:idx2
% Result
% values =
%
% 1 2 3 4 5
But in my case, idx1 and idx2 are vectors of variable length. For example, for length=2:
idx1=[5,9];
idx2=[9 11];
Can I use the colon operator to directly obtain the values in between? This is, something similar to the following:
values = [5 6 7 8 9 9 10 11]
I know I can do idx1(1):idx2(1) and idx1(2):idx2(2), this is, extract the values for each column separately, so if there is no other solution, I can do this with a for-loop, but maybe Matlab can do this more easily.
Your sample output is not legal. A matrix cannot have rows of different length. What you can do is create a cell array using arrayfun:
values = arrayfun(@colon, idx1, idx2, 'Uniform', false)
To convert the resulting cell array into a vector, you can use cell2mat:
values = cell2mat(values);
Alternatively, if all vectors in the resulting cell array have the same length, you can construct an output matrix as follows:
values = vertcat(values{:});
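As a sketch of the cell-array approach on the question's data: because the two ranges have different lengths, the pieces are joined horizontally here. An anonymous function is used in place of the colon handle, and pieces is an illustrative name:

```matlab
idx1 = [5 9];
idx2 = [9 11];
% One cell per range; each cell holds a row vector idx1(k):idx2(k).
pieces = arrayfun(@(a, b) a:b, idx1, idx2, 'UniformOutput', false);
values = [pieces{:}];   % [5 6 7 8 9 9 10 11]
```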
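Try taking the union of the sets. Given the values of idx1 and idx2 you supplied, run
values = union(idx1(1):idx2(1), idx1(2):idx2(2));
which will yield a vector with the values [5 6 7 8 9 10 11]. Note that union sorts its output and removes duplicates, so the repeated 9 from the desired output appears only once.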
I couldn't get @Eitan's solution to work; apparently you need to specify parameters to colon. The small modification that follows got it working on my R2010b version:
step = 1;
idx1 = [5, 9];
idx2 = [9, 11];
values = arrayfun(@(x,y)colon(x, step, y), idx1, idx2, 'UniformOutput', false);
values=vertcat(cell2mat(values));
Note that step = 1 is actually the default value in colon, and Uniform can be used in place of UniformOutput, but I've included these for the sake of completeness.
There is a great blog post by Loren called Vectorizing the Notion of Colon (:). It includes an answer that is about 5 times faster (for large arrays) than using arrayfun or a for-loop and is similar to run-length-decoding:
The idea is to expand the colon sequences out. I know the lengths of
each sequence so I know the starting points in the output array. Fill
the values after the start values with 1s. Then I figure out how much
to jump from the end of one sequence to the beginning of the next one.
If there are repeated start values, the jumps might be negative. Once
this array is filled, the output is simply the cumulative sum or
cumsum of the sequence.
function x = coloncatrld(start, stop)
% COLONCAT Concatenate colon expressions
% X = COLONCAT(START,STOP) returns a vector containing the values
% [START(1):STOP(1) START(2):STOP(2) START(END):STOP(END)].
% Based on Peter Acklam's code for run length decoding.
len = stop - start + 1;
% keep only sequences whose length is positive
pos = len > 0;
start = start(pos);
stop = stop(pos);
len = len(pos);
if isempty(len)
x = [];
return;
end
% expand out the colon expressions
endlocs = cumsum(len);
incr = ones(1, endlocs(end));
jumps = start(2:end) - stop(1:end-1);
incr(endlocs(1:end-1)+1) = jumps;
incr(1) = start(1);
x = cumsum(incr);
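To make the cumsum idea concrete, here is a step-by-step trace of the same computation on the question's data. It is a sketch that inlines the function body for one small input rather than a replacement for the function:

```matlab
start = [5 9];
stop  = [9 11];
len = stop - start + 1;                 % [5 3]: length of each sequence
endlocs = cumsum(len);                  % [5 8]: last output position of each sequence
incr = ones(1, endlocs(end));           % default step of 1 everywhere
% At the first position of each later sequence, step from the previous stop
% value to the new start value (here 9 - 9 = 0, at position 6).
incr(endlocs(1:end-1) + 1) = start(2:end) - stop(1:end-1);
incr(1) = start(1);                     % incr is now [5 1 1 1 1 0 1 1]
x = cumsum(incr);                       % [5 6 7 8 9 9 10 11]
```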

find NaN values in cell array

Let's assume I have the following array:
a = {1; 'abc'; NaN}
Now I want to find out in which indices this contains NaN, so that I can replace these with '' (empty string).
If I use cellfun with isnan I get a useless output
cellfun(@isnan, a, 'UniformOutput', false)
ans =
[ 0]
[1x3 logical]
[ 1]
So how would I do this correctly?
Indeed, as you found yourself, this can be done by
a(cellfun(@(x) any(isnan(x)),a)) = {''}
Breakdown:
Fx = @(x) any(isnan(x))
will return a logical scalar, irrespective of whether x is a scalar or vector.
Using this function inside cellfun will then eliminate the need for 'UniformOutput', false:
>> inds = cellfun(Fx,a)
inds =
0
0
1
These can be used as indices to the original array:
>> a(inds)
ans =
[NaN]
which in turn allows assignment to these indices:
>> a(inds) = {''}
a =
[1]
'abc'
''
Note that the assignment must be done to a cell array itself. If you don't understand this, read up on the differences between a(inds) and a{inds}.
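A small sketch of the difference between the two indexing forms, using the array from the question (sub and val are illustrative names):

```matlab
a = {1; 'abc'; NaN};
inds = cellfun(@(x) any(isnan(x)), a);   % logical column: [0; 0; 1]
sub = a(inds);     % parentheses: a 1x1 cell array that contains NaN
val = a{inds};     % braces: the content itself, a double NaN
a(inds) = {''};    % assign a cell to the selected cells
```

Parentheses select cells, so the right-hand side has to be a cell as well; braces reach into the contents.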
I found the answer on http://www.mathworks.com/matlabcentral/answers/42273
a(cellfun(@(x) any(isnan(x)),a)) = {''}
However, I do not understand it...
a(ind) = [] will remove the entries from the array
a(ind)= {''} will replace the NaN with an empty string.
If you want to delete the entry use = [] instead of = {''}.
If you wanted to replace the NaNs with a different value just set it equal to that value using curly braces:
a(ind) = {value}

Difference between [] and [1x0] in MATLAB

I have a loop in MATLAB that fills a cell array in my workspace (2011b, Windows 7, 64 bit) with the following entries:
my_array =
[1x219 uint16]
[ 138]
[1x0 uint16] <---- row #3
[1x2 uint16]
[1x0 uint16]
[] <---- row #6
[ 210]
[1x7 uint16]
[1x0 uint16]
[1x4 uint16]
[1x0 uint16]
[ 280]
[]
[]
[ 293]
[ 295]
[1x2 uint16]
[ 298]
[1x0 uint16]
[1x8 uint16]
[1x5 uint16]
Note that some entries hold [], as in row #6, while others hold [1x0] items, as in row #3.
Is there any difference between them? (other than the fact that MATLAB displays them differently). Any differences in how MATLAB represents them in memory?
If the difference is only about how MATLAB internally represents them, why should the programmer be aware of this difference ? (i.e. why display them differently?). Is it a (harmless) bug? or is there any benefit in knowing that such arrays are represented differently?
In most cases (see below for an exception) there is no real difference. Both are considered "empty", since at least one dimension has a size of 0. However, I wouldn't call this a bug, since as a programmer you may want to see this information in some cases.
Say, for example, you have a 2-D matrix and you want to index some rows and some columns to extract into a smaller matrix:
>> M = magic(4) %# Create a 4-by-4 matrix
M =
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
>> rowIndex = [1 3]; %# A set of row indices
>> columnIndex = []; %# A set of column indices, which happen to be empty
>> subM = M(rowIndex,columnIndex)
subM =
Empty matrix: 2-by-0
Note that the empty result still tells you some information, specifically that you tried to index 2 rows from the original matrix. If the result just showed [], you wouldn't know if it was empty because your row indices were empty, or your column indices were empty, or both.
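A quick check that both kinds of array count as empty while still differing in size (the variable names are illustrative):

```matlab
a = zeros(1, 0);                          % 1-by-0 empty
c = [];                                   % 0-by-0 empty
bothEmpty = isempty(a) && isempty(c);     % true: each has a zero-length dimension
sameSize  = isequal(size(a), size(c));    % false: [1 0] versus [0 0]
```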
The Caveat...
There are some cases when an empty matrix defined as [] (i.e. all of its dimensions are 0) may give you different results than an empty matrix that still has some non-zero dimensions. For example, matrix multiplication can give you different (and somewhat non-intuitive) results when dealing with different kinds of empty matrices. Let's consider these 3 empty matrices:
>> a = zeros(1,0); %# A 1-by-0 empty matrix
>> b = zeros(0,1); %# A 0-by-1 empty matrix
>> c = []; %# A 0-by-0 empty matrix
Now, let's try multiplying these together in different ways:
>> b*a
ans =
[] %# We get a 0-by-0 empty matrix. OK, makes sense.
>> a*b
ans =
0 %# We get a 1-by-1 matrix of zeroes! Wah?!
>> a*c
ans =
Empty matrix: 1-by-0 %# We get back the same empty matrix as a.
>> c*b
ans =
Empty matrix: 0-by-1 %# We get back the same empty matrix as b.
>> b*c
??? Error using ==> mtimes
Inner matrix dimensions must agree. %# The second dimension of the first
%# argument has to match the first
%# dimension of the second argument
%# when multiplying matrices.
Getting a non-empty matrix by multiplying two empty matrices is probably enough to make your head hurt, but it kinda makes sense since the result still doesn't really contain anything (i.e. it has a value of 0).
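One way to read these results: an m-by-k matrix times a k-by-n matrix is always m-by-n, and each entry is a sum over k products. With k = 0 that sum has no terms, and an empty sum is 0. A short sketch (the variable names are illustrative):

```matlab
a = zeros(1, 0);
b = zeros(0, 1);
outerSize = size(b*a);              % [0 0]: (0-by-1) * (1-by-0)
innerSize = size(a*b);              % [1 1]: (1-by-0) * (0-by-1)
innerVal  = a*b;                    % 0, a sum over zero terms
emptySum  = sum(zeros(1, 0));       % 0 for the same reason
bigger    = zeros(3, 0)*zeros(0, 2);  % 3-by-2 matrix of zeros
```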
When concatenating matrices, the common dimension has to match.
It's not currently an error if it doesn't match when one of the operands is empty, but you do get a nasty warning that future versions might be more strict.
Examples:
>> [ones(1,2);zeros(0,9)]
Warning: Concatenation involves an empty array with an incorrect number of columns.
This may not be allowed in a future release.
ans =
1 1
>> [ones(2,1),zeros(9,0)]
Warning: Concatenation involves an empty array with an incorrect number of rows.
This may not be allowed in a future release.
ans =
1
1
Another difference is in the internal representation of the two kinds of empty, especially when it comes to bundling objects of the same class together in an array.
Say you have a dummy class:
classdef A < handle
%A Summary of this class goes here
% Detailed explanation goes here
properties
end
methods
end
end
If you try to start an array from empty and grow it into an array of A objects:
clear all
clc
% Try to use the default [] for an array of A objects.
my_array = [];
my_array(1) = A;
Then you get:
??? The following error occurred converting from A to double:
Error using ==> double
Conversion to double from A is not possible.
Error in ==> main2 at 6
my_array(1) = A;
But if you do:
% Now try to use the class dependent empty for an array of A objects.
my_array = A.empty;
my_array(1) = A;
Then all is fine.
I hope this adds to the explanations given before.
If concatenation and multiplication are not enough to worry about, there is still looping. Here are two ways to observe the difference:
1. Loop over the variable size
for t = 1:size(zeros(0,0),1); % Or simply []
'no'
end
for t = 1:size(zeros(1,0),1); % Or zeros(0,1)
'yes'
end
This will print 'yes'. If you replace size with length, it will not print anything at all.
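The loop bounds involved can be inspected directly. A minimal sketch for the 1-by-0 case (the variable names are illustrative):

```matlab
x = zeros(1, 0);
nRows = size(x, 1);    % 1, so 1:nRows is the one-element range 1:1
nLen  = length(x);     % 0, because length returns 0 for any empty array
nEl   = numel(x);      % 0 elements in total
```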
If this is not a surprise, perhaps the next one will be.
2. Iterating an empty matrix using a for loop
for t = [] %// Iterate an empty 0x0 matrix
1
end
for t = ones(1, 0) %// Iterate an empty 1x0 matrix
2
end
for t = ones(0, 1) %// Iterate an empty 0x1 matrix
3
end
Will print:
ans =
3
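A for loop in MATLAB iterates over the columns of its expression, so the number of iterations should equal the number of columns. A quick check of the column counts for the three cases above (inputs and nCols are illustrative names):

```matlab
inputs = {[], ones(1, 0), ones(0, 1)};
nCols = cellfun(@(x) size(x, 2), inputs);   % [0 0 1]
```

Only the 0-by-1 matrix has a column, which is consistent with 3 being the only value printed; inside that single iteration t is itself an empty 0-by-1 array.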
To conclude with a concise answer to both of your questions:
Yes there is definitely a difference between them
Indeed I believe the programmer will benefit from being aware of this difference as the difference may produce unexpected results

Map function in MATLAB?

I'm a little surprised that MATLAB doesn't have a Map function, so I hacked one together myself since it's something I can't live without. Is there a better version out there? Is there a somewhat-standard functional programming library for MATLAB out there that I'm missing?
function results = map(f,list)
% why doesn't MATLAB have a Map function?
results = zeros(1,length(list));
for k = 1:length(list)
results(1,k) = f(list(k));
end
end
usage would be e.g.
map( @(x)x^2,1:10)
The short answer: the built-in function arrayfun does exactly what your map function does for numeric arrays:
>> y = arrayfun(@(x) x^2, 1:10)
y =
1 4 9 16 25 36 49 64 81 100
There are two other built-in functions that behave similarly: cellfun (which operates on elements of cell arrays) and structfun (which operates on each field of a structure).
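A brief sketch of the other two functions, with made-up inputs:

```matlab
% cellfun: apply a function to each cell; scalar results are collected in an array.
lens = cellfun(@numel, {'a', 'bcd', ''});   % [1 3 0]
% structfun: apply a function to each field; results come back as a column.
s = struct('a', 1, 'b', 2);
doubled = structfun(@(v) 2*v, s);           % [2; 4]
```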
However, these functions are often not necessary if you take advantage of vectorization, specifically using element-wise arithmetic operators. For the example you gave, a vectorized solution would be:
>> x = 1:10;
>> y = x.^2
y =
1 4 9 16 25 36 49 64 81 100
Some operations will automatically operate across elements (like adding a scalar value to a vector) while other operators have a special syntax for element-wise operation (denoted by a . before the operator). Many built-in functions in MATLAB are designed to operate on vector and matrix arguments using element-wise operations (often applied to a given dimension, such as sum and mean for example), and thus don't require map functions.
To summarize, here are some different ways to square each element in an array:
x = 1:10; % Sample array
f = @(x) x.^2; % Anonymous function that squares each element of its input
% Option #1:
y = x.^2; % Use the element-wise power operator
% Option #2:
y = f(x); % Pass a vector to f
% Option #3:
y = arrayfun(f, x); % Pass each element to f separately
Of course, for such a simple operation, option #1 is the most sensible (and efficient) choice.
In addition to vector and element-wise operations, there's also cellfun for mapping functions over cell arrays. For example:
cellfun(@upper, {'a', 'b', 'c'}, 'UniformOutput',false)
ans =
'A' 'B' 'C'
If 'UniformOutput' is true (or not provided), it will attempt to concatenate the results according to the dimensions of the cell array, so
cellfun(@upper, {'a', 'b', 'c'})
ans =
ABC
A rather simple solution, using Matlab's vectorization would be:
a = [ 10 20 30 40 50 ]; % the array with the original values
b = [ 10 8 6 4 2 ]; % the mapping array
c = zeros( 1, 10 ); % your target array
Now, typing
c( b ) = a
returns
c = 0 50 0 40 0 30 0 20 0 10
c( b ) is a reference to a vector of size 5 with the elements of c at the indices given by b. Now if you assign values to this reference vector, the original values in c are overwritten, since c( b ) contains references to the values in c and no copies.
It seems that the built-in arrayfun doesn't work with its default settings if the function returns an array rather than a scalar, e.g.:
map(@(x)[x x^2 x^3],1:10)
The slight modifications below make map handle this case:
function results = map(f,list)
% why doesn't MATLAB have a Map function?
for k = 1:length(list)
if (k==1)
r1=f(list(k));
results = zeros(length(r1),length(list));
results(:,k)=r1;
else
results(:,k) = f(list(k));
end;
end;
end
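For comparison, arrayfun can also handle non-scalar results when 'UniformOutput' is set to false, in which case each result is returned in its own cell. A sketch, assuming every call returns a row of the same length so the rows can be stacked (rows and M are illustrative names):

```matlab
% Each call returns a 1-by-3 row; the rows are collected in a cell array.
rows = arrayfun(@(x) [x x^2 x^3], 1:4, 'UniformOutput', false);
M = vertcat(rows{:});   % 4-by-3 matrix, one row per input value
```

Unlike the modified map above, which stores each result as a column, this layout puts one result per row.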
If matlab does not have a built in map function, it could be because of efficiency considerations. In your implementation you are using a loop to iterate over the elements of the list, which is generally frowned upon in the matlab world. Most built-in matlab functions are "vectorized", i.e. it is more efficient to call a function on an entire array than to iterate over it yourself and call the function for each element.
In other words, this
a = 1:10;
a.^2
is much faster than this
a = 1:10;
map(@(x)x^2, a)
assuming your definition of map.
You don't need map since a scalar-function that is applied to a list of values is applied to each of the values and hence works similar to map. Just try
l = 1:10
f = @(x) x + 1
f(l)
In your particular case, you could even write
l.^2
Vectorizing the solution as described in the previous answers is probably the best solution for speed. Vectorizing is also very Matlaby and feels good.
With that said, Matlab does now have a Map container class.
See http://www.mathworks.com/help/matlab/map-containers.html
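Note that containers.Map is a key-value dictionary rather than a higher-order map function. A minimal usage sketch, with made-up keys and values:

```matlab
m = containers.Map();        % by default: char keys, values of any type
m('one') = 1;                % insert key-value pairs
m('two') = 2;
hasTwo = isKey(m, 'two');    % true
v = m('two');                % look up a value by key: 2
```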