Vectorized operations on cell arrays

This post was triggered by the following discussion on whether cell arrays are "normal arrays" and the claim that vectorization does not work for cell arrays.
I wonder why the following vectorization syntax is not implemented in MATLAB, and what speaks against it:
>> {'hallo','matlab','world'} == 'matlab'
??? Undefined function or method 'eq' for input arguments of type 'cell'.
Internally it would be equivalent to
[{'hallo'},{'matlab'},{'world'}] == {'matlab'}
Because MATLAB knows when to cast, the following works:
[{'hallo','matlab'},'world']
A cell array is an array of pointers. If the left and right sides point to equal objects, isequal('hallo','hallo') returns true as expected, so why does MATLAB still not allow the topmost example?
I know I can use strmatch or cellfun.
SUMMARY:
The operator == required for vectorization in the above example is eq, not isequal (other operators work the same way: < is lt, etc.).
eq is built in for numeric types; for all other types (like strings) MATLAB gives us the freedom to overload this (and other) operators.
Operator vectorization is thus entirely possible for cell arrays of a defined type (like strings), but not by default for any type.
Function vectorization, like myFun( myString ) or myFun( myCellOfStrings ), is also possible; you just have to implement it inside myFun. The functions sin(val) and sin(array) do not work by witchcraft either, but because both cases are implemented internally.
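For the specific case of a cell array of strings, the built-in alternatives mentioned in the question already give an element-wise result without any overloading. A minimal sketch (the variable names c, target, mask1 and mask2 are only for illustration):

```matlab
% Element-wise string comparison on a cell array without overloading ==.
c = {'hallo', 'matlab', 'world'};
target = 'matlab';

% strcmp accepts a cell array of strings and returns a logical array
% of the same size as c.
mask1 = strcmp(c, target);

% cellfun applies isequal to every cell; with the default uniform output
% the result is again a logical array.
mask2 = cellfun(@(s) isequal(s, target), c);

disp(mask1)
disp(mask2)
```

Both masks mark only the second element, so either can be used for logical indexing such as c(mask1).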

Firstly, == is not the same as isequal. The function that gets called when you use == is eq, and the scope of each of those is different.
For example, in eq(A,B), if B is a scalar, the function checks each element of A for equality with B and returns a logical vector.
eq([2,5,4,2],2)
ans =
1 0 0 1
However, isequal(A,B) checks whether A is identical to B in all respects. In other words, it returns true only if MATLAB cannot tell the difference between A and B. Doing this for the above example:
isequal([2,5,4,2],2)
ans =
0
I think what you really intended to ask in the question, but didn't, is:
"Why is == not defined for cell arrays?"
Well, a simple reason is: Cells were not intended for such use. You can easily see how implementing such a function for cells can quickly get complicated when you start considering individual cases. For example, consider
{2,5,{4,2}}==2
What would you expect the answer to be? A reasonable guess would be
ans = {1,0,0}
which is fair. But let's say, I disagree. Now I'd like the equality operation to walk down nested cells and return
ans = {1,0,{0,1}}
Can you disagree with this interpretation? Perhaps not. It's equally valid, and in some cases that's the behavior you want.
This was just a simple example. Now add to this a mixture of nested cells, different types, etc. within the cell and think about handling each of those corner cases. It quickly becomes a nightmare for the developers to implement such a functionality that can be satisfactorily used by everyone.
So the solution is to overload the function, implementing only the specific functionality that you need in your application. MATLAB provides a way to do that too: create a @cell directory and define an eq.m in it that handles cells the way you want. Ramashalanka has demonstrated this in his answer.
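To make the second interpretation concrete, here is a rough sketch of a comparison that walks down nested cells. It is written as an ordinary function rather than an overload, and the name nestedCellEq is hypothetical (it is not a MATLAB built-in); it only handles comparison against a single non-cell value.

```matlab
function c = nestedCellEq(a, b)
% nestedCellEq  Hypothetical helper, not part of MATLAB.
% Compares every element of the cell array a with the value b and
% recurses into nested cells, so the output mirrors the nesting of a.
c = cell(size(a));
for n = 1:numel(a)
    if iscell(a{n})
        % Nested cell: descend and keep the same structure in the output.
        c{n} = nestedCellEq(a{n}, b);
    else
        % Leaf element: compare the whole element with b.
        c{n} = isequal(a{n}, b);
    end
end
end
```

Called as nestedCellEq({2,5,{4,2}}, 2), this returns a cell with the shape {1,0,{0,1}}. Swapping the recursive branch for a plain isequal(a{n}, b) would give the first interpretation, {1,0,0}, instead.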

There are many things that would seem natural for MATLAB to do that they have chosen not to. Perhaps they don't want to consider many special cases (see below). You can do it yourself by overloading. Make a directory @cell and put the following in a new function file eq.m:
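function c = eq(a,b)
% Overloaded == for cells: returns a cell of logicals, one per element.
if iscell(b) && ~iscell(a)
    c = eq(b,a);                         % put the cell argument first
else
    c = cell(size(a));
    for n = 1:numel(c)
        if iscell(a) && iscell(b)
            c{n} = isequal(a{n},b{n});   % cell vs. cell: compare pairwise
        else
            c{n} = isequal(a{n},b);      % cell vs. non-cell: compare each element with b
        end
    end
end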
Then you can do, e.g.:
>> {'hallo','matlab','world'} == 'matlab'
ans = [0] [1] [0]
>> {'hallo','matlab','world'} == {'a','matlab','b'}
ans = [0] [1] [0]
>> {'hallo','matlab','world'} == {'a','dd','matlab'}
ans = [0] [0] [0]
>> { 1, 2, 3 } == 2
ans = [0] [1] [0]
But, even though I considered a couple of cases in my simple function, there are lots of things I didn't consider (checking cells are the same size, checking a multi-element cell against a singleton etc etc).
I used isequal even though it's called with eq (i.e. ==) since it handles {'hallo','matlab','world'} == 'matlab' better, but really I should consider more cases.
(EDIT: I made the function slightly shorter, but less efficient)

This is not unique to strings. Even the following does not work:
{ 1, 2, 3 } == 2
Cell arrays are not the same as "normal" arrays: they offer different syntax, different semantics, different capabilities, and are implemented differently (an extra layer of indirection).
Consider if == on cell arrays were defined in terms of isequal on an element-by-element basis. So, the above example would be no problem. But what about this?
{ [1 0 1], [1 1 0] } == 1
The resulting behaviour wouldn't be terribly useful in most circumstances. And what about this?
1 == { 1, 2, 3 }
And how would you define this? (I can think of at least three different interpretations.)
{ 1, 2, 3 } == { 4, 5, 6 }
And what about this?
{ 1, 2, 3 } == { { 4, 5, 6 } }
Or this?
{ 1, 2, 3 } == { 4; 5; 6 }
Or this?
{ 1, 2, 3 } == { 4, 5 }
You could add all sorts of special-case handling, but that makes the language less consistent, more complex, and less predictable.

The reason for this problem is: cell arrays can store different types of variables in different cells. Thus the operator == can't be defined well for the entire array. It is even possible for a cell to contain another cell, further exacerbating the problem.
Think of {4,'4',4.0,{4,4,4}} == '4'. What should the result be? Each type is evaluated in a different way.
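One way around the ambiguity is to state the comparison rule explicitly with cellfun instead of relying on ==. The sketch below picks just one possible rule, "the element is a char array equal to '4'"; this choice is an assumption for illustration, not the only sensible one.

```matlab
% Mixed-type cell: a double, a char, another double, and a nested cell.
c = {4, '4', 4.0, {4, 4, 4}};

% Explicit rule: an element matches only if it is a char array
% and its contents equal '4'.
isCharFour = cellfun(@(x) ischar(x) && isequal(x, '4'), c);

disp(isCharFour)
```

Under this rule only the second element matches. A rule that also accepted the numeric value 4, or that descended into the nested cell, would give a different answer, which is exactly the difficulty described above.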

Related

Turn off Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1) for gfortran?

I am intentionally casting an array of boolean values to integers but I get this warning:
Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1)
which I don't want. Can I either
(1) Turn off that warning in the Makefile?
or (more favorably)
(2) Explicitly make this cast in the code so that the compiler doesn't need to worry?
The code will look something like this:
A = (B.eq.0)
where A and B are both size (n,1) integer arrays. B will be filled with integers ranging from 0 to 3. I need to use this type of command again later with something like A = (B.eq.1) and I need A to be an integer array where it is 1 if and only if B is the requested integer, otherwise it should be 0. These should act as boolean values (1 for .true., 0 for .false.), but I am going to be using them in matrix operations and summations where they will be converted to floating point values (when necessary) for division, so logical values are not optimal in this circumstance.
Specifically, I am looking for the fastest, most vectorized version of this command. It is easy to write a wrapper for testing elements, but I want this to be a vectorized operation for efficiency.
I am currently compiling with gfortran, but would like whatever methods are used to also work in ifort as I will be compiling with intel compilers down the road.
update:
Both merge and where work perfectly for the example in question. I will look into performance metrics on these and select the best for vectorization. I am also interested in how this will work with matrices, not just arrays, but that was not my original question so I will post a new one unless someone wants to expand their answer to how this might be adapted for matrices.

I have not found a compiler option to solve (1).
However, the type conversion is pretty simple. The documentation for gfortran specifies that .true. is mapped to 1, and .false. to 0.
Note that the conversion is not specified by the standard, and different values could be used by other compilers. Specifically, you should not depend on the exact values.
A simple merge will do the trick for scalars and arrays:
program test
integer :: int_sca, int_vec(3)
logical :: log_sca, log_vec(3)
log_sca = .true.
log_vec = [ .true., .false., .true. ]
int_sca = merge( 1, 0, log_sca )
int_vec = merge( 1, 0, log_vec )
print *, int_sca
print *, int_vec
end program
To address your updated question, this is trivial to do with merge:
A = merge(1, 0, B == 0)
This can be performed on scalars and arrays of arbitrary dimensions. For the latter, this can easily be vectorized by the compiler. You should consult the manual of your compiler for that, though.
The where statement in Casey's answer can be extended in the same way.
Since you convert them to floats later on, why not assign them as floats right away? Assuming that A is real, this could look like:
A = merge(1., 0., B == 0)

Another method, to complement @AlexanderVogt's answer, is to use the where construct.
program test
implicit none
integer :: int_vec(5)
logical :: log_vec(5)
log_vec = [ .true., .true., .false., .true., .false. ]
where (log_vec)
int_vec = 1
elsewhere
int_vec = 0
end where
print *, log_vec
print *, int_vec
end program test
This will assign 1 to the elements of int_vec that correspond to true elements of log_vec and 0 to the others.
The where construct will work for any rank array.

For this particular example you could avoid the logical altogether:
A=1-(3-B)/3
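Of course this is not so good for readability, but it might be OK performance-wise. (Note that with integer division the expression as written evaluates to 0 where B is 0 and to 1 otherwise; A=(3-B)/3 gives the B.eq.0 mask directly.)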
Edit: in my performance tests this is 2-3x faster than the where construct, and of course it is fully standard-conforming. In fact you can throw in an absolute value and generalize as:
integer,parameter :: h=huge(1)
A=1-(h-abs(B))/h
and still beat the where loop.

eq returns true for non equal lists

I got a strange piece of code to debug which in my opinion should throw an exception, but instead it produced totally odd results. I reduced it to these two lines:
EDU>> A={0,0}
A =
[0] [0]
EDU>> A{1:2}==A{2:1}
ans =
1
Why is the comparison of two non equal comma separated lists true?

The line of code A{1:2}==A{2:1} is not checking the equality of two comma-separated lists because 2:1 is an empty array. I think the intended indexing was 2:-1:1; this will create a comma-separated list but also throw an error since == cannot handle the list.
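The difference between the two index expressions is easy to check in isolation. A small sketch (the variable names are only for illustration):

```matlab
% A colon range whose start exceeds its stop is empty unless a
% negative step is given explicitly.
idxForward = 2:1;       % empty, size 1-by-0
idxReverse = 2:-1:1;    % the row vector [2 1]

disp(isempty(idxForward))
disp(size(idxForward))
disp(idxReverse)

% Indexing a cell array with the empty range selects no elements.
A = {0, 0};
nSelected = numel(A(idxForward));
disp(nSelected)
```

So A{2:1} expands to a comma-separated list with zero entries, whereas A{2:-1:1} expands to two entries.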
However, in my mind it is odd that A{1:2}==A{2:1} produces a valid output of any kind. The code is literally saying A{1:2} == A{[]}, and the question is "what is A{[]}?" According to my MATLAB R2014b, it is nothing, which makes some sense, although even a simple double array indexed with an empty index returns an empty double. I guess the actual content, which is what { and } retrieve, is nothing.
But then how is MATLAB producing the answer of true?
Consider the following code from the command window:
>> A = {0,0}; A{1:2} == A{[]}
ans =
1
>> A = {0,1}; A{1:2} == A{[]}
ans =
0
From that, I surmise that MATLAB places the comma-separated list as the first two arguments to eq, appends the "nothing" from A{[]} to it, and interprets the expressions simply as
eq(0,0,A{[]})
eq(0,1,A{[]})
which is, apparently, valid syntax (eq(a,b,) is not). It is very interesting for a binary operation on elements of a cell array. This also works:
>> A = {[2,3],[3,2]};
>> A{1:2} .* A{[]}
ans =
6 6
>> A{1:2} ./ A{[]}
ans =
0.6667 1.5000
And just for fun, because I'm finding this quite interesting:
>> A = {rand(2),rand(2,1)};
>> A{1:2} \ A{[]}
ans =
0.8984
-0.7841
But I guess it makes sense. The parser finds a token, followed by an infix operator, followed by another token. It resolves the infix operator to its function, and then places the left and right tokens into the argument list in turn.
I guess I just find the existence of "nothing" a bit odd, although that would explain how [1,2,3,] is valid syntax.
That said, I'm sure this is a quirk of the language and not a bug nor a feature.
Of course, the only way to know what is actually going on is to have an intimate knowledge of how MATLAB is interpreting the cell array expansion and application of the operator. Of course, I do not have this experience nor the source required (I'd imagine).

Because both elements are 0. Try A = {1, 2}; then you will get ans = 0.

Setting property in array of Matlab objects

I am working with arrays of structs and objects in Matlab. I want to set properties for all the members of a certain array as fast as possible.
For the problem of setting a certain struct field, I reached a solution that involves using arrayfun and setfield. The following works like a charm:
myStru.id = 0;
myStru.name = 'blah';
arrayStru = repmat(myStru,10,1); % Array of 10 elements. All of them have id=0
arrayStru = cell2mat( arrayfun( @(x,y)setfield(x,'id',y), arrayStru, (1:10)', 'UniformOutput', false ) ); % ids ranging from 1 to 10 :D
The problem is that, for objects, this does not work. I understand that setfield is for structures, so I have tried some other alternatives. The most excruciating error pops out when I try the following:
arrayfun( @(x,y) eval(['x.id=y;']), arrayOfObjects, arrayOfValues, 'UniformOutput', false );
(The class is a very simple one, which accepts empty constructor and has a real public property called 'id'). It results in:
Error using setFieldOfStructArray>@(x,y)eval(['x.id=y;']) (line 17)
Error: The expression to the left of the equals sign is not a valid target for an
assignment.
ALTHOUGH if I put a breakpoint in that line, it seems that the expression can be executed with the expected effects.
My two (three) questions:
Why does the above solution fail? How can I get that to work?
My final goal is to set properties fast and simple in arrays of objects. Which is the best technique for this?
(Note: I can write loops, but I always feel itchy when I have to do that :P)

I think the problem may be that your property is read-only, because setfield also works for classes.
Anyway, there are some alternatives: if your class inherits from hgsetget, you can use set instead of setfield.
You can also use
subsasgn(x,struct('type','.','subs','id'),y)
instead of
setfield(x,'id',y)

You can pass a cell array of values to struct, which will automatically be interpreted as a struct array:
>> s = struct('a', num2cell(1:10)', 'b', 's')
s =
10x1 struct array with fields:
a
b
>> [s.a]
ans =
1 2 3 4 5 6 7 8 9 10
>> [s.b]
ans =
ssssssssss
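For an array that already exists, the same comma-separated-list idea can be used on the left-hand side of an assignment. The sketch below uses the struct array from the question; the same syntax is often used on object arrays with public properties as well, but whether it works there depends on the class, so treat that part as an assumption to verify.

```matlab
% Build the struct array from the question: 10 elements, all with id = 0.
myStru.id = 0;
myStru.name = 'blah';
arrayStru = repmat(myStru, 10, 1);

% Put the new values into a cell array so they can be expanded
% into a comma-separated list.
ids = num2cell(1:10);

% Assign one value to the id field of each element in a single statement.
[arrayStru.id] = ids{:};

disp([arrayStru.id])
```

This avoids both the explicit loop and the arrayfun/cell2mat round trip.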

Could max introduce round-off error?

In general, the == operator is not suited to testing for "numeric" equality; one should rather do something like abs(a - b) < eps. However, when I want to find the location of the largest element in an array, is it safe to assume that max will return the element unchanged? Is it OK to do
[row, col] = find(a == max(a(:)));

Yes.
max only compares two values, and does not do any operations on them that might change their values.
Here's a typical C++ implementation of a max:
template <class T>
T max(T a, T b) {
return a>b ? a : b;
}
As you see, this function will return the exact same value as either a or b.
Matlab just adds matrix formalism, fancy formatting wrappers etc. to it, but its kernel will follow the same principles as the example above.
So yes, it is OK to use equality here.
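As a small illustration, the equality test finds every position holding the maximum, while the second output of max gives the first one without any comparison at all. The matrix a below is made-up example data.

```matlab
% Example data with the maximum value appearing twice.
a = [0.1 0.7; 0.7 0.3];

% max returns one of the stored values unchanged, so == is exact here.
m = max(a(:));
[row, col] = find(a == m);      % all positions of the maximum

% Alternative: take the linear index from max and convert it
% to subscripts; this yields only the first occurrence.
[~, idx] = max(a(:));
[firstRow, firstCol] = ind2sub(size(a), idx);

disp([row col])
disp([firstRow firstCol])
```

If only one location is needed, the second form avoids the question of equality entirely.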

How can I count the number of properties in a structure in MATLAB?

I have a function that returns one or more variables, but as it changes (depending on whether the function is successful or not), the following does NOT work:
[resultA, resultB, resultC, resultD, resultE, resultF] = func(somevars);
This will sometimes return an error, varargout{2} not defined, since only the first variable resultA is actually given a value when the function fails. Instead I put all the output in one variable:
output = func(somevars);
However, the variables are defined as properties of a struct, meaning I have to access them with output.A. This is not a problem in itself, but I need to count the number of properties to determine if I got the proper result.
I tried length(output), numel(output) and size(output) to no avail, so if anyone has a clever way of doing this I would be very grateful.

length(fieldnames(output))
There's probably a better way, but I can't think of it.

It looks like Matthew's answer is the best for your problem:
nFields = numel(fieldnames(output));
There's one caveat which probably doesn't apply for your situation but may be interesting to know nonetheless: even if a structure field is empty, FIELDNAMES will still return the name of that field. For example:
>> s.a = 5;
>> s.b = [1 2 3];
>> s.c = [];
>> fieldnames(s)
ans =
'a'
'b'
'c'
If you are interested in knowing the number of fields that are not empty, you could use either STRUCTFUN:
nFields = sum(~structfun(@isempty,s));
or a combination of STRUCT2CELL and CELLFUN:
nFields = sum(~cellfun('isempty',struct2cell(s)));
Both of the above return an answer of 2, whereas:
nFields = numel(fieldnames(s));
returns 3.
