Could max introduce round-off error? - matlab

In general, the == operator is not suited to testing floating-point values for equality; one should instead do something like abs(a - b) < eps. However, when I want to find the location of the largest element in an array, is it safe to assume that max will return the element unchanged? Is it OK to do
[row, col] = find(a == max(a(:)));

Yes.
max only compares two values, and does not do any operations on them that might change their values.
Here's a typical C++ implementation of a max:
template <class T>
T max(T a, T b) {
    return a > b ? a : b;
}
As you see, this function will return the exact same value as either a or b.
MATLAB just adds matrix handling, formatting wrappers, etc. on top of this, but its core will follow the same principle as the example above.
So yes, it is OK to use equality here.
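As a rough cross-check of the same idea outside MATLAB, here is a minimal Python sketch (the list `values` is made up for illustration). Python's built-in `max` likewise only compares its inputs and returns one of them, so an exact equality test against the result locates it.

```python
# Illustrative values; 0.1 + 0.2 is deliberately not exactly 0.3.
values = [0.1 + 0.2, 0.3, 1.0 / 3.0]

# max only compares elements and hands one of them back unchanged.
largest = max(values)

# Exact equality is therefore enough to locate the maximum.
positions = [i for i, v in enumerate(values) if v == largest]
print(positions)
```

The comparison is exact because no arithmetic is performed on the returned value.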

Related

Minizinc: declare explicit set in decision variable

I'm trying to implement the 'Sport Scheduling Problem' (with a Round-Robin approach to break symmetries). The actual problem is of no importance. I simply want to declare the value at x[1,1] to be the set {1,2} and base the sets in the same column upon the first set. This is modelled as in the code below. The output is included in a screenshot below it. The problem is that the first set is not printed as a set but rather some sort of range while the values at x[2,1] and x[3,1] are indeed printed as sets and x[4,1] again as a range. Why is this? I assume that in the declaration of x that set of 1..n is treated as an integer but if it is not, how to declare it as integers?
EDIT: ONLY the first column of the output is of importance.
int: n = 8;
int: nw = n-1;
int: np = n div 2;
array[1..np, 1..nw] of var set of 1..n: x;
% BEGIN FIX FIRST WEEK $
constraint(
x[1,1] = {1, 2}
);
constraint(
forall(t in 2..np) (x[t,1] = {t+1, n+2-t} )
);
solve satisfy;
output[
"\(x[p,w])" ++ if w == nw then "\n" else "\t" endif | p in 1..np, w in 1..nw
]
Backend solver: Gecode
(Here's a summary of my comments above.)
The range syntax is simply a shorthand for contiguous values in a set: 1..8 is a shorthand of the set {1,2,3,4,5,6,7,8}, and 5..6 is a shorthand for the set {5,6}.
The reason for this shorthand is probably that it's often, arguably, easier to read than the full list, especially for a long list of integers, e.g. 1..1024. It also saves space in the output of solutions.
For two-element sets, e.g. {1,2}, the explicit enumeration might be clearer to read than 1..2, though I tend to prefer the shorthand version in all cases.
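To make the printing convention concrete, here is a small Python sketch that mimics it. The helper `format_set` is hypothetical and only imitates the output style described above; it is not MiniZinc's actual printer. It renders a set as lo..hi when its elements are contiguous and as an explicit enumeration otherwise.

```python
def format_set(values):
    """Render a set of ints, using lo..hi when the elements are contiguous."""
    items = sorted(values)
    if items and items == list(range(items[0], items[-1] + 1)):
        return f"{items[0]}..{items[-1]}"
    return "{" + ",".join(str(v) for v in items) + "}"

# First column of x from the model in the question (n = 8, np = 4).
n = 8
first_column = [{1, 2}] + [{t + 1, n + 2 - t} for t in range(2, n // 2 + 1)]
for s in first_column:
    print(format_set(s))
```

Under this convention the first and last sets of the column ({1,2} and {5,6}) are contiguous and print as ranges, while the two middle ones print as enumerated sets, which matches the pattern described in the question.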

Fastest type to use for comparing hashes in matlab

I have a table in Matlab with some columns representing 128 bit hashes.
I would like to match rows, to one or more rows, based on these hashes.
Currently, the hashes are represented as hexadecimal strings, and compared with strcmp(). Still, it takes many seconds to process the table.
What is the fastest way to compare two hashes in matlab?
I have tried turning them into categorical variables, but that is much slower. Matlab as far as I know does not have a 128 bit numerical type. nominal and ordinal types are deprecated.
Are there any others that could work?
The code below is analogous to what I am doing:
nodetype = { 'type1'; 'type2'; 'type1'; 'type2' };
hash = {'d285e87940fb9383ec5e983041f8d7a6'; 'd285e87940fb9383ec5e983041f8d7a6'; 'ec9add3cf0f67f443d5820708adc0485'; '5dbdfa232b5b61c8b1e8c698a64e1cc9' };
entries = table(categorical(nodetype),hash,'VariableNames',{'type','hash'});
%nodes to match. filter by type or some other way so rows don't match to
%themselves.
A = entries(entries.type=='type1',:);
B = entries(entries.type=='type2',:);
%pick a node/row with a hash to find all counterparts of
row_to_match_in_A = A(1,:);
matching_rows_in_B = B(strcmp(B.hash,row_to_match_in_A.hash),:);
% do stuff with matching rows...
disp(matching_rows_in_B);
The hash strings are faithful representations of what I am using, but they are not necessarily read or stored as strings in the original source. They are just converted for this purpose because it's the fastest way to do the comparison.
Optimization is nice, if you need it. Try it out yourself and measure the performance gain for relevant test cases.
Some suggestions:
Sorted arrays are easier/faster to search
Matlab's default numbers are double, but you can also construct integers. Why not use 2 uint64's instead of the 128bit column? First search for the upper 64bit, then for the lower; or even better: use ismember with the row option and put your hashes in rows:
A = uint64([0 0;
0 1;
1 0;
1 1;
2 0;
2 1]);
srch = uint64([1 1;
0 1]);
[ismatch, loc] = ismember(srch, A, 'rows')
> loc =
4
2
Look into the compare functions you use (e.g. edit ismember) and strip out unnecessary operations (e.g. sort) and safety checks that you know in advance won't pose a problem, like this solution does. Or, if you intend to call a search function multiple times, sort in advance and skip the check/sort in the search function later on.
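The two-halves idea can be sketched in Python as follows. This is only an illustration of the technique, not MATLAB code: the helper `split_hash` and the dictionary index are assumptions made for this example. Each 128-bit hex string is split into an (upper, lower) pair of 64-bit integers, and an index is built once so that repeated lookups avoid rescanning the table.

```python
MASK64 = (1 << 64) - 1

def split_hash(hex_hash):
    """Split a 128-bit hex string into (upper 64 bits, lower 64 bits)."""
    value = int(hex_hash, 16)
    return (value >> 64, value & MASK64)

hashes = [
    "d285e87940fb9383ec5e983041f8d7a6",
    "d285e87940fb9383ec5e983041f8d7a6",
    "ec9add3cf0f67f443d5820708adc0485",
    "5dbdfa232b5b61c8b1e8c698a64e1cc9",
]

# Build the index once: key is the pair of 64-bit halves, value is the row list.
index = {}
for row, h in enumerate(hashes):
    index.setdefault(split_hash(h), []).append(row)

# Each later lookup is a single dictionary access.
matches = index.get(split_hash("d285e87940fb9383ec5e983041f8d7a6"), [])
print(matches)
```

Whether this pays off depends on how many lookups are done per table, so it is worth timing on realistic data.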

Turn off Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1) for gfortran?

I am intentionally casting an array of boolean values to integers but I get this warning:
Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1)
which I don't want. Can I either
(1) Turn off that warning in the Makefile?
or (more favorably)
(2) Explicitly make this cast in the code so that the compiler doesn't need to worry?
The code will look something like this:
A = (B.eq.0)
where A and B are both size (n,1) integer arrays. B will be filled with integers ranging from 0 to 3. I need to use this type of command again later with something like A = (B.eq.1) and I need A to be an integer array where it is 1 if and only if B is the requested integer, otherwise it should be 0. These should act as boolean values (1 for .true., 0 for .false.), but I am going to be using them in matrix operations and summations where they will be converted to floating point values (when necessary) for division, so logical values are not optimal in this circumstance.
Specifically, I am looking for the fastest, most vectorized version of this command. It is easy to write a wrapper for testing elements, but I want this to be a vectorized operation for efficiency.
I am currently compiling with gfortran, but would like whatever methods are used to also work in ifort as I will be compiling with intel compilers down the road.
update:
Both merge and where work perfectly for the example in question. I will look into performance metrics on these and select the best for vectorization. I am also interested in how this will work with matrices, not just arrays, but that was not my original question so I will post a new one unless someone wants to expand their answer to how this might be adapted for matrices.
I have not found a compiler option to solve (1).
However, the type conversion is pretty simple. The documentation for gfortran specifies that .true. is mapped to 1, and .false. to 0.
Note that the conversion is not specified by the standard, and different values could be used by other compilers. Specifically, you should not depend on the exact values.
A simple merge will do the trick for scalars and arrays:
program test
integer :: int_sca, int_vec(3)
logical :: log_sca, log_vec(3)
log_sca = .true.
log_vec = [ .true., .false., .true. ]
int_sca = merge( 1, 0, log_sca )
int_vec = merge( 1, 0, log_vec )
print *, int_sca
print *, int_vec
end program
To address your updated question, this is trivial to do with merge:
A = merge(1, 0, B == 0)
This can be performed on scalars and arrays of arbitrary dimensions. For the latter, this can easily be vectorized by the compiler. You should consult the manual of your compiler for that, though.
The where statement in Casey's answer can be extended in the same way.
Since you convert them to floats later on, why not assign them as floats right away? Assuming that A is real, this could look like:
A = merge(1., 0., B == 0)
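For readers less familiar with Fortran, the semantics of merge with scalar sources can be sketched in Python. The function below is a hypothetical stand-in named after the intrinsic; it only covers the scalar-source, one-dimensional case used in this answer.

```python
def merge(tsource, fsource, mask):
    """Elementwise pick: tsource where mask is true, fsource otherwise."""
    return [tsource if m else fsource for m in mask]

B = [0, 3, 1, 0, 2]

# Equivalent in spirit to the Fortran statement A = merge(1, 0, B == 0).
A = merge(1, 0, [b == 0 for b in B])
print(A)
```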
Another method to complement @AlexanderVogt's answer is to use the where construct.
program test
implicit none
integer :: int_vec(5)
logical :: log_vec(5)
log_vec = [ .true., .true., .false., .true., .false. ]
where (log_vec)
int_vec = 1
elsewhere
int_vec = 0
end where
print *, log_vec
print *, int_vec
end program test
This will assign 1 to the elements of int_vec that correspond to true elements of log_vec and 0 to the others.
The where construct will work for any rank array.
For this particular example you could avoid the logical altogether. With integer division, (3-B)/3 is 1 only when B is 0, and 0 for B = 1, 2, 3:
A=(3-B)/3
Of course this is not so good for readability, but it might be OK performance-wise.
Edit: running performance tests, this is 2-3x faster than the where construct, and of course absolutely standards conforming. In fact you can throw in an absolute value and generalize as:
integer,parameter :: h=huge(1)
A=(h-abs(B))/h
and still beat the where loop.
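A quick way to check which form of the integer-division trick reproduces B == 0 is to tabulate it. The Python sketch below uses `//`, which agrees with Fortran's truncating integer division for the non-negative operands involved here; `H` stands in for huge(1) and assumes a 32-bit default integer.

```python
B = [0, 1, 2, 3]

# Reference result: 1 where B == 0, else 0.
mask = [1 if b == 0 else 0 for b in B]

# The quotient itself is 1 only for b == 0 ...
quotient = [(3 - b) // 3 for b in B]
# ... while one minus the quotient is the complement (1 where b != 0).
complement = [1 - (3 - b) // 3 for b in B]

# Generalization with an absolute value; H is an assumed stand-in for huge(1).
H = 2**31 - 1
general = [(H - abs(b)) // H for b in [-7, 0, 5, H]]

print(mask, quotient, complement, general)
```

So the quotient alone gives the B == 0 indicator, and subtracting it from 1 gives the B /= 0 indicator.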

How do I determine if *exactly* one boolean is true, without type conversion?

Given an arbitrary list of booleans, what is the most elegant way of determining that exactly one of them is true?
The most obvious hack is type conversion: converting them to 0 for false and 1 for true and then summing them, and returning sum == 1.
I'd like to know if there is a way to do this without converting them to ints, actually using boolean logic.
(This seems like it should be trivial, idk, long week)
Edit: In case it wasn't obvious, this is more of a code-golf / theoretical question. I'm not fussed about using type conversion / int addition in PROD code, I'm just interested if there is way of doing it without that.
Edit2: Sorry folks it's a long week and I'm not explaining myself well. Let me try this:
In boolean logic, ANDing a collection of booleans is true if all of the booleans are true, ORing the collection is true if at least one of them is true. Is there a logical construct that will be true if exactly one boolean is true? XOR is this for a collection of two booleans for example, but any more than that and it falls over.
You can actually accomplish this using only boolean logic, although there's perhaps no practical value of that in your example. The boolean version is much more involved than simply counting the number of true values.
Anyway, for the sake of satisfying intellectual curiosity, here goes. First, the idea of using a series of XORs is good, but it only gets us half way. For any two variables x and y,
x ⊻ y
is true whenever exactly one of them is true. However, this does not continue to be true if you add a third variable z,
x ⊻ y ⊻ z
The first part, x ⊻ y, is still true if exactly one of x and y is true. If either x or y is true, then z needs to be false for the whole expression to be true, which is what we want. But consider what happens if both x and y are true. Then x ⊻ y is false, yet the whole expression can become true if z is true as well. So either one variable or all three must be true. In general, if you have a statement that is a chain of XORs, it will be true if an uneven number of variables are true.
Since one is an uneven number, this might prove useful. Of course, checking for an uneven number of truths is not enough. We additionally need to ensure that no more than one variable is true. This can be done in a pairwise fashion by taking all pairs of variables and checking that they are not both true. Taken together, these two conditions ensure that exactly one of the variables is true.
Below is a small Python script to illustrate the approach.
from itertools import product
print("x|y|z|only_one_is_true")
print("======================")
for x, y, z in product([True, False], repeat=3):
    uneven_number_is_true = x ^ y ^ z
    max_one_is_true = (not (x and y)) and (not (x and z)) and (not (y and z))
    only_one_is_true = uneven_number_is_true and max_one_is_true
    print(int(x), int(y), int(z), only_one_is_true)
And here's the output.
x|y|z|only_one_is_true
======================
1 1 1 False
1 1 0 False
1 0 1 False
1 0 0 True
0 1 1 False
0 1 0 True
0 0 1 True
0 0 0 False
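The same two conditions extend to any number of variables. The sketch below is one possible generalization; the function name `exactly_one` is chosen here for illustration. It folds XOR over the list for the parity check and tests every pair for the at-most-one check.

```python
from functools import reduce
from itertools import combinations
from operator import xor

def exactly_one(bools):
    """True if exactly one element of bools is True, using only boolean ops."""
    # Parity check: a chain of XORs is true for an odd number of true values.
    odd_number_true = reduce(xor, bools, False)
    # Pairwise check: no two values may be true at the same time.
    at_most_one_true = all(not (a and b) for a, b in combinations(bools, 2))
    return odd_number_true and at_most_one_true

print(exactly_one([False, True, False, False]))
print(exactly_one([True, True, True]))
```

The number of pairs grows quadratically with the length of the list, so this is mainly of theoretical interest.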
Sure, you could do something like this (pseudocode, since you didn't mention language):
found = false;
alreadyFound = false;
for (boolean in booleans):
    if (boolean):
        found = true;
        if (alreadyFound):
            found = false;
            break;
        else:
            alreadyFound = true;
return found;
After your clarification, here it is with no integers.
bool IsExactlyOneBooleanTrue( bool *boolAry, int size )
{
    bool areAnyTrue = false;
    bool areTwoTrue = false;
    for(int i = 0; (!areTwoTrue) && (i < size); i++) {
        areTwoTrue = (areAnyTrue && boolAry[i]);
        areAnyTrue |= boolAry[i];
    }
    return ((areAnyTrue) && (!areTwoTrue));
}
No-one mentioned that this "operation" we're looking for is shortcut-able similarly to boolean AND and OR in most languages. Here's an implementation in Java:
public static boolean exactlyOneOf(boolean... inputs) {
    boolean foundAtLeastOne = false;
    for (boolean bool : inputs) {
        if (bool) {
            if (foundAtLeastOne) {
                // found a second one that's also true, shortcut like && and ||
                return false;
            }
            foundAtLeastOne = true;
        }
    }
    // we're happy if we found one, but if none found that's less than one
    return foundAtLeastOne;
}
With plain boolean logic, it may not be possible to achieve what you want, because what you are asking for is a truth evaluation based not just on the truth values but also on additional information (a count, in this case). Boolean evaluation is binary logic; it cannot depend on anything but the operands themselves. And there is no way to reverse-engineer the operands from a truth value, because there are four possible combinations of operands but only two results. Given a false, can you tell whether it came from F ^ F or T ^ T, so that the next evaluation can be determined based on that?
booleanList.Where(y => y).Count() == 1;
Due to the large number of reads by now, here comes a quick clean up and additional information.
Option 1:
Ask if only the first variable is true, or only the second one, ..., or only the n-th variable.
x1 & !x2 & ... & !xn |
!x1 & x2 & ... & !xn |
...
!x1 & !x2 & ... & xn
This approach scales as O(n^2). The evaluation stops after the first positive match is found, so it is preferred if a positive match is likely.
Option 2:
Ask if there is at least one variable true in total. Additionally check every pair to contain at most one true variable (Anders Johannsen's answer)
(x1 | x2 | ... | xn) &
(!x1 | !x2) &
...
(!x1 | !xn) &
(!x2 | !x3) &
...
(!x2 | !xn) &
...
This option also scales as O(n^2) due to the number of possible pairs. Lazy evaluation stops the formula after the first counterexample, so it is preferred if a negative match is likely.
(Option 3):
This option involves a subtraction and is thus not a valid answer for the restricted setting. Nevertheless, it shows how looping over the values might not be the most beneficial solution in an unrestricted setting.
Treat x1 ... xn as a binary number x. Subtract one, then AND the results. The output is zero <=> x1 ... xn contains at most one true value. (the old "check power of two" algorithm)
x 00010000
x-1 00001111
AND 00000000
If the bits are already stored in such a bitboard, this might be beneficial over looping. Though, keep in mind this kills the readability and is limited by the available board length.
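Option 3 can be sketched in Python as below. The function names are chosen for this illustration, and `pack` is an assumed helper that builds the bitboard by converting each boolean to an integer bit, which is exactly the kind of conversion the restricted setting rules out. Note that the subtract-and-AND test alone also accepts zero, so the exactly-one variant adds a non-zero check.

```python
def pack(bools):
    """Pack a list of booleans into an integer, first element as highest bit."""
    x = 0
    for b in bools:
        x = (x << 1) | int(b)
    return x

def at_most_one_bit_set(x):
    # Clearing the lowest set bit leaves zero only if there was at most one.
    return (x & (x - 1)) == 0

def exactly_one_bit_set(x):
    # Zero passes the test above, so it has to be excluded explicitly.
    return x != 0 and (x & (x - 1)) == 0

x = pack([False, False, False, True, False, False, False, False])
print(bin(x), exactly_one_bit_set(x))
```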
A last note to raise awareness: there is now a Stack Exchange site called Computer Science, which is intended for exactly this type of algorithmic question.
It can be done quite nicely with recursion, e.g. in Haskell
-- there isn't exactly one true element in the empty list
oneTrue [] = False
-- if the list starts with False, discard it
oneTrue (False : xs) = oneTrue xs
-- if the list starts with True, all other elements must be False
oneTrue (True : xs) = not (or xs)
// JavaScript
Use .filter() on the array and check the length of the new array.
// Example using an array
function isExactly1BooleanTrue(booleans) {
    return booleans.filter(value => value === true).length === 1;
}
// Alternative using ...booleans (rest parameters)
function isExactly1BooleanTrue(...booleans) {
    return booleans.filter(value => value === true).length === 1;
}
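One way to do it is to perform pairwise AND and then use a chained OR to check whether any of the pairwise comparisons returned true. If none did, at most one value is true; combining that with a check that at least one value is true gives exactly one. In Python I would implement it using
from itertools import combinations
def one_true(bools):
    # True entries mark pairs in which both values are true
    pairwise_comp = [a and b for a, b in combinations(bools, 2)]
    # exactly one: at least one value is true, and no pair is true together
    return any(bools) and not any(pairwise_comp)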
This approach easily generalizes to lists of arbitrary length, although for very long lists, the number of possible pairs grows very quickly.
Python:
boolean_list.count(True) == 1
OK, another try. Call the different booleans b[i], and call a slice of them (a range of the array) b[i .. j]. Define functions none(b[i .. j]) and just_one(b[i .. j]) (can substitute the recursive definitions to get explicit formulas if required). We have, using C notation for logical operations (&& is and, || is or, ^ for xor (not really in C), ! is not):
none(b[i .. i + 1]) ~~> !b[i] && !b[i + 1]
just_one(b[i .. i + 1]) ~~> b[i] ^ b[i + 1]
And then recursively:
none(b[i .. j + 1]) ~~> none(b[i .. j]) && !b[j + 1]
just_one(b[i .. j + 1]) ~~> (just_one(b[i .. j]) && !b[j + 1]) ^ (none(b[i .. j]) && b[j + 1])
And you are interested in just_one(b[1 .. n]).
The expressions will turn out horrible.
Have fun!
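Rather than expanding the recursion into one large formula, the two definitions can be evaluated step by step. The Python sketch below is one way to do that; it starts from the empty prefix (none is true, just_one is false) instead of the two-element base case above, which is an assumption made here so that lists of any length are handled.

```python
def just_one(b):
    """Evaluate the none/just_one recurrence over a list of booleans."""
    none_so_far = True    # none(...) of the empty prefix
    one_so_far = False    # just_one(...) of the empty prefix
    for x in b:
        # just_one uses the previous value of none, so update it first.
        one_so_far = (one_so_far and not x) ^ (none_so_far and x)
        none_so_far = none_so_far and not x
    return one_so_far

print(just_one([False, True, False]))
print(just_one([True, False, True]))
```

Only NOT, AND, and XOR appear in the update step, so unrolling the loop for a fixed length yields the explicit formula.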
That python script does the job nicely. Here's the one-liner it uses:
((x ∨ (y ∨ z)) ∧ (¬(x ∧ y) ∧ (¬(z ∧ x) ∧ ¬(y ∧ z))))
Retracted for Privacy and Anders Johannsen have already provided correct and simple answers. But both solutions do not scale very well (O(n^2)). If performance is important you can stick to the following solution, which performs in O(n):
def exact_one_of(array_of_bool):
    exact_one = more_than_one = False
    for array_elem in array_of_bool:
        more_than_one = (exact_one and array_elem) or more_than_one
        exact_one = (exact_one ^ array_elem) and (not more_than_one)
    return exact_one
(I used python and a for loop for simplicity. But of course this loop could be unrolled to a sequence of NOT, AND, OR and XOR operations)
It works by tracking two states per boolean variable/list entry:
is there exactly one "True" from the beginning of the list until this entry?
are there more than one "True" from the beginning of the list until this entry?
The states of a list entry can be simply derived from the previous states and corresponding list entry/boolean variable.
Python:
Let's see this using an example.
Steps:
The function exactly_one_topping below takes three parameters.
It stores their values (True or False) in a list.
It checks whether exactly one value is true by checking that the count of True is exactly 1.
def exactly_one_topping(ketchup, mustard, onion):
    args = [ketchup, mustard, onion]
    if args.count(True) == 1:  # check if exactly one value is True
        return True
    else:
        return False
How do you want to count how many are true without, you know, counting? Sure, you could do something messy like (C syntax, my Python is horrible):
for(i = 0; i < last && !booleans[i]; i++)
    ;
if(i == last)
    return 0; /* No true one found */
/* We have a true one, check there isn't another */
for(i++; i < last && !booleans[i]; i++)
    ;
if(i == last)
    return 1; /* No more true ones */
else
    return 0; /* Found another true */
I'm sure you'll agree that the win (if any) is slight, and the readability is bad.
It is not possible without looping. Check BitSet cardinality() in java implementation.
http://fuseyism.com/classpath/doc/java/util/BitSet-source.html
We can do it this way:-
if (A=true or B=true)and(not(A=true and B=true)) then
<enter statements>
end if

Vectorized operations on cell arrays

This post was triggered by the following discussion on whether cell arrays are "normal arrays" and the claim that vectorization does not work for cell arrays.
I wonder why following vectorization syntax is not implemented in MATLAB, what speaks against it:
>> {'hallo','matlab','world'} == 'matlab'
??? Undefined function or method 'eq' for input arguments of type 'cell'.
internally it would be equivalent to
[{'hallo'},{'matlab'},{'world'}] == {'matlab'}
because MATLAB knows when to cast, following works:
[{'hallo','matlab'},'world']
Cell array is an array of pointers. If both left and right side point to equal objects, isequal('hallo','hallo') returns as expected true, then why MATLAB still does not allow topmost example?
I know I can use strmatch or cellfun.
SUMMARY:
operator == which is required for vectorization in above example is eq and not isequal (other operators are < which is lt, etc.)
eq is built-in for numeric types; for all other types (like strings) MATLAB gives us the freedom to overload this (and other) operators.
operator vectorization is thus well possible with cell arrays of defined type (like string) but not by default for any type.
function vectorization like myFun( myString ) or myFun( myCellOfStrings ), is also possible, you have just to implement it internally in myFun. Functions sin(val) and sin(array) work also not by witchcraft but because both cases are implemented internally.
Firstly, == is not the same as isequal. The function that gets called when you use == is eq, and the scope of each of those is different.
For e.g., in eq(A,B), if B is a scalar, the function checks each element of A for equality with B and returns a logical vector.
eq([2,5,4,2],2)
ans =
1 0 0 1
However, isequal(A,B) checks if A is identically equal to B in all aspects. In other words, MATLAB cannot tell the difference between A and B. Doing this for the above example:
isequal([2,5,4,2],2)
ans =
0
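The same distinction exists in other languages. As a loose analogy in Python (not MATLAB semantics), an explicit elementwise comparison plays the role of eq, while comparing the whole list to a scalar plays the role of isequal:

```python
A = [2, 5, 4, 2]

# Elementwise comparison against a scalar, analogous to eq(A, 2).
elementwise = [a == 2 for a in A]

# Comparing the container as a whole, analogous to isequal(A, 2).
whole = (A == 2)

print(elementwise, whole)
```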
I think what you really intended to ask in the question, but didn't, is:
"Why is == not defined for cell arrays?"
Well, a simple reason is: Cells were not intended for such use. You can easily see how implementing such a function for cells can quickly get complicated when you start considering individual cases. For example, consider
{2,5,{4,2}}==2
What would you expect the answer to be? A reasonable guess would be
ans = {1,0,0}
which is fair. But let's say, I disagree. Now I'd like the equality operation to walk down nested cells and return
ans = {1,0,{0,1}}
Can you disagree with this interpretation? Perhaps not. It's equally valid, and in some cases that's the behavior you want.
This was just a simple example. Now add to this a mixture of nested cells, different types, etc. within the cell and think about handling each of those corner cases. It quickly becomes a nightmare for the developers to implement such a functionality that can be satisfactorily used by everyone.
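The two competing interpretations can be made concrete with a short Python sketch, using nested lists as a stand-in for nested cells. Both function names are made up for this illustration; neither corresponds to anything MATLAB provides.

```python
def eq_flat(cell, value):
    """Compare only top-level items; a nested list never equals a scalar."""
    return [item == value for item in cell]

def eq_nested(cell, value):
    """Walk down nested lists and mirror their structure in the result."""
    return [
        eq_nested(item, value) if isinstance(item, list) else item == value
        for item in cell
    ]

cell = [2, 5, [4, 2]]
print(eq_flat(cell, 2))
print(eq_nested(cell, 2))
```

Both results are defensible, which is the point: a built-in operator would have to pick one of them for everybody.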
So the solution is to overload the function, implementing only the specific functionality that you desire, for use in your application. MATLAB provides a way to do that too, by creating an @cell directory and defining an eq.m for use with cells the way you want it. Ramashalanka has demonstrated this in his answer.
There are many things that would seem natural for MATLAB to do that they have chosen not to. Perhaps they don't want to consider many special cases (see below). You can do it yourself by overloading. If you make a directory @cell and put the following in a new function eq.m:
function c = eq(a,b)
if iscell(b) && ~iscell(a)
c = eq(b,a);
else
c = cell(size(a));
for n = 1:numel(c)
if iscell(a) && iscell(b)
c{n} = isequal(a{n},b{n});
else
c{n} = isequal(a{n},b);
end
end
end
Then you can do, e.g.:
>> {'hallo','matlab','world'} == 'matlab'
ans = [0] [1] [0]
>> {'hallo','matlab','world'} == {'a','matlab','b'}
ans = [0] [1] [0]
>> {'hallo','matlab','world'} == {'a','dd','matlab'}
ans = [0] [0] [0]
>> { 1, 2, 3 } == 2
ans = [0] [1] [0]
But, even though I considered a couple of cases in my simple function, there are lots of things I didn't consider (checking cells are the same size, checking a multi-element cell against a singleton etc etc).
I used isequal even though it's called with eq (i.e. ==) since it handles {'hallo','matlab','world'} == 'matlab' better, but really I should consider more cases.
(EDIT: I made the function slightly shorter, but less efficient)
This is not unique to strings. Even the following does not work:
{ 1, 2, 3 } == 2
Cell arrays are not the same as "normal" arrays: they offer different syntax, different semantics, different capabilities, and are implemented differently (an extra layer of indirection).
Consider if == on cell arrays were defined in terms of isequal on an element-by-element basis. So, the above example would be no problem. But what about this?
{ [1 0 1], [1 1 0] } == 1
The resulting behaviour wouldn't be terribly useful in most circumstances. And what about this?
1 == { 1, 2, 3 }
And how would you define this? (I can think of at least three different interpretations.)
{ 1, 2, 3 } == { 4, 5, 6 }
And what about this?
{ 1, 2, 3 } == { { 4, 5, 6 } }
Or this?
{ 1, 2, 3 } == { 4; 5; 6 }
Or this?
{ 1, 2, 3 } == { 4, 5 }
You could add all sorts of special-case handling, but that makes the language less consistent, more complex, and less predictable.
The reason for this problem is: cell arrays can store different types of variables in different cells. Thus the operator == can't be defined well for the entire array. It is even possible for a cell to contain another cell, further exacerbating the problem.
Think of {4,'4',4.0,{4,4,4}} == '4'. What should the result be? Each type evaluates in a different way.